Google

Software Engineer, GDC LLM Serving and GPU Performance

Posted Today

Job Description

  • Design, develop, and implement enhancements to the LLM serving stack, focusing on performance, scalability, and resource efficiency (e.g., on systems such as Wiz and Servomatic).
  • Contribute to the design and implementation of advanced serving architectures, including disaggregated serving.
  • Build and maintain infrastructure and tooling for in-depth performance analysis, profiling, and benchmarking of LLMs on GPU accelerators.
  • Identify and address performance bottlenecks across the stack, working closely with teams providing core GPU libraries and kernels.
  • Collaborate with research, engineering, and SRE teams to optimize and deploy LLMs in production.