Job Description
- Build human-powered and automated, Large Language Model (LLM)-powered evaluation systems to assess model performance.
- Establish clear metrics to measure aspects like grounding, coherence, safety, and helpfulness.
- Use platforms and tools to run evaluations efficiently across different models and datasets.
- Provide actionable insights from evaluations to improve model quality, often in collaboration with research and cross-functional teams.
- Create tools and systems that make the evaluation process more efficient and effective.
