Job Description
- Identify and maintain ML training and serving benchmarks that are representative of Google production and the broader ML industry.
- Deliver the performance needed for customer launches and, in the case of third-party/OSS models, for the benchmark submissions the team engages in (ML Commons, InferenceX, etc.).
- Use the benchmarks to identify performance opportunities and drive both near-term state-of-the-art performance (e.g. custom kernels) and out-of-the-box performance (compiler/runtime optimizations, agentic tooling, auto-sharding), directly and in collaboration with partner teams.
- Participate in algorithmic innovations that exploit new TPU hardware features and model-preserving optimizations (speculative decoding, sparsity, quantization, LoRA, etc.).
- Participate in co-designing TPU-friendly models that showcase model quality at performance that compares favorably with OSS models, which are typically designed on GPUs.
