Job Description
- Define key capabilities and translate them into measurable evaluation goals for each model release cycle.
- Curate and evolve the post-training evaluation suite to accurately gauge model quality, readiness, and performance.
- Anticipate measurement needs 2–3 release cycles ahead, partnering with teams to develop evaluations for emerging capabilities.
- Collaborate closely with researchers and training leads to analyze checkpoints, interpret results, and guide daily training runs.
- Investigate and resolve high-impact issues, such as output regressions and behavioral artifacts, by rallying cross-functional teams to implement fixes.
