Software Engineer, GDC LLM Serving and GPU Performance
Posted 1 week ago
Job Description
- Design, develop, and implement enhancements to the LLM serving stack, focusing on performance, scalability, and resource efficiency (e.g., in systems such as Wiz and Servomatic).
- Contribute to the design and implementation of advanced serving architectures, including disaggregated serving.
- Build and maintain infrastructure and tooling for in-depth performance analysis, profiling, and benchmarking of LLMs on GPU accelerators.
- Identify and address performance bottlenecks across the stack, working closely with teams providing core GPU libraries and kernels.
- Collaborate with research, engineering, and SRE teams to optimize and deploy LLMs in production.