Job Description
- Identify and maintain ML training and serving benchmarks.
- Achieve state-of-the-art performance for customer launches and, in the case of 3P/OSS models, for competitive benchmark submissions (MLCommons, InferenceX, etc.).
- Use the benchmarks to identify performance opportunities and directly drive both near-term SOTA (e.g. custom kernels) and out-of-the-box performance (e.g. compiler/runtime optimizations, agentic tooling, auto-sharding) in collaboration with partner teams.
- Participate in algorithmic innovation, exploiting new TPU hardware features and model-preserving optimizations (e.g. speculative decoding, sparsity, quantization, LoRA).
- Participate in co-designing TPU-friendly models to showcase model quality at the performance level of OSS models, which are typically designed on GPUs.
