Staff Software Engineer, Gemini Evals, GenAI, DeepMind
Posted 2 weeks ago
Job Description
- Design and optimize distributed evaluation execution engines capable of orchestrating large volumes of inference steps across TPU and Google compute unit (GCU) pools with high throughput and low latency.
- Build foundational abstractions to evaluate complex LLM agent loops, tool use, and automated LLM-as-a-judge rating systems.
- Design error classification, automated retry policies, and observability dashboards to maintain strict service level objectives (SLOs) for evaluation pipeline success rates.
- Partner closely with GDM research scientists and Data Science teams to anticipate frontier model evaluation requirements and translate them into elegant infrastructure solutions.
- Mentor fellow engineers, set high standards for code quality (Python in Google3), and advocate for rigorous testing and system design practices.