Identify and maintain ML training and serving benchmarks that are representative to Google production and the broader ML industry.
Achieve performance for customer launches, and in case of third-party/OSS models, for engaged benchmark submissions (ML Commons, InferenceX, etc).
Use the benchmarks to identify performance opportunities and drive both near-term state of the art (e.g. custom kernels) and out-of the box performance (compiler/runtime optimizations, agentic tooling, auto-sharding) directly and in collaboration with partner teams.
Participate in algorithmic innovations exploiting new TPU hardware features and model-preserving optimizations (speculative decoding, sparsity, quantization, LoRA, etc).
Participate in co-designing models that are TPU-friendly to showcase model quality at performance excellent to OSS models typically designed on GPUs.

Senior Staff Software Engineer, TPU Performance

Job Description