Job Description
- Enable and optimize foundational models (e.g., LLMs and Diffusion) within key frameworks like vLLM, MaxText, and MaxDiffusion, providing Google Cloud customers with immediate access to AI capabilities.
- Partner with customers to measure Artificial Intelligence/Machine Learning (AI/ML) model performance on Google Cloud infrastructure. Work with Customer Engineer teams to identify and resolve technical bottlenecks and drive customer success.
- Collaborate with internal infrastructure teams to enhance support for demanding AI workloads. Contribute to product improvement by identifying bugs and recommending enhancements.
- Conduct performance profiling, debugging, and troubleshooting of training and inference workloads. Maintain and update documentation and educational content based on product changes and user feedback. Triage, debug, and resolve system issues by analyzing root causes and operational impact.
- Design and implement specialized Machine Learning solutions leveraging advanced ML infrastructure.
