Dahl Consulting

Software Engineer, GDC LLM Serving and GPU Performance

Posted 1 week ago

Job Description

  • Design, develop, and implement enhancements to the LLM serving stack, focusing on performance, scalability, and resource efficiency (e.g., on systems such as Wiz and Servomatic).
  • Contribute to the design and implementation of advanced serving architectures, including disaggregated serving.
  • Build and maintain infrastructure and tooling for in-depth performance analysis, profiling, and benchmarking of LLMs on GPU accelerators.
  • Identify and address performance bottlenecks across the stack, working closely with teams providing core GPU libraries and kernels.
  • Collaborate with research, engineering, and SRE teams to optimize and deploy LLMs in production.
