Research Engineer, Benchmarking, Robotics, DeepMind at Google

Design, implement, and maintain scalable, robust frameworks to enable large-scale evaluation of robot policies across offline open-loop testing and real-world hardware evaluations.
Partner with researchers to design benchmark content that maximizes evaluation signal and stress-tests model capabilities.
Build diagnostic and visualization tools that allow the team to easily root-cause policy failures and track performance regressions.
Establish evaluation criteria for model releases and own the stability and benchmarking of models slated for critical demos.
Develop new approaches to make real-world hardware evaluation faster, more reproducible, and less reliant on manual intervention.
