Job Description
- Identify and maintain LLM training and serving benchmarks, and use them to find performance opportunities and drive XLA:GPU/Triton performance improvements toward XLA releases.
- Engage with various teams, such as DeepMind, to solve challenging ML model performance problems.
- Run architecture-level simulations on GPU designs and perform roofline analysis to guide partner teams.
- Analyze performance and efficiency metrics to identify bottlenecks and then design and implement solutions at Google fleet-wide scale.
- Run performance benchmarks on GPU hardware using internal and external tools such as TRT-LLM, vLLM, and SGLang.
