Staff Software Engineer, Gemini Evals, GenAI, DeepMind at Dahl Consulting

Design and optimize distributed evaluation execution engines capable of orchestrating large volumes of inference steps across TPU and Google compute unit (GCU) pools with high throughput and low latency.
Build foundational abstractions to evaluate complex LLM agent loops, tool use, and automated LLM-as-a-judge rating systems.
Design error classification, automated retry policies, and observability dashboards to maintain strict service level objectives (SLOs) for evaluation pipeline success rates.
Partner closely with GDM research scientists and Data Science teams to anticipate frontier model evaluation requirements and translate them into elegant infrastructure solutions.
Mentor fellow engineers, set high standards for code quality (Python in Google3), and advocate for sound testing and system design practices.
