Identify and maintain LLM training and serving benchmarks, and use them to find performance opportunities and drive XLA:GPU/Triton performance improvements into XLA releases.
Engage with teams such as DeepMind to solve challenging ML model performance problems.
Run architecture-level simulations on GPU designs and perform roofline analysis to guide partner teams.
Analyze performance and efficiency metrics to identify bottlenecks, then design and implement solutions across Google's fleet.
Run performance benchmarks on GPU hardware using internal and external tools such as TRT-LLM, vLLM, and SGLang.

Staff Software Engineer, GPU Performance

Job Description