Staff Software Engineer, TPU Performance at Google

Focus on Tensor Processing Unit (TPU) fleet efficiency analysis and performance optimization, while identifying and maintaining Machine Learning (ML) training and serving benchmarks.
Use the benchmarks to identify performance opportunities and drive out-of-the-box performance by improving the compiler, runtime, etc. in collaboration with partner teams.
Collaborate with Google product teams and researchers to solve performance problems, such as onboarding new ML models and products onto new TPU hardware so that larger models can train efficiently at very large scale.
Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions at Google fleet-wide scale.
Explore model and data efficiency techniques, e.g., model co-design, quantization, and sparsity.
