Build both human-powered and automated, large language model (LLM)-powered evaluation systems to assess model performance.
Establish clear metrics to measure aspects like grounding, coherence, safety, and helpfulness.
Use platforms and tools to run evaluations efficiently across different models and datasets.
Provide actionable insights from evaluations to improve model quality, often in collaboration with research and cross-functional teams.
Create tools and systems that make the evaluation process more efficient and effective.

Staff Software Engineer, Model Quality

Job Description