Technical Program Manager III, ML Infrastructure Resource Management, Google Cloud
Sunnyvale, CA, USA
Posted 1 week ago
hybrid
Job Description
- Act as a trusted advisor to Product Area partners, understanding their TPU/GPU requirements and delivering a guided, seamless resource management experience.
- Collaborate closely with Software Engineering (SWE) and Site Reliability Engineering (SRE) teams to uncover, analyze, and execute on efficiency opportunities across our managed resource footprints.
- Own the operational execution of capacity allocations and allied workflows using core Google tooling. A technical or engineering background is critical to navigating this significant operational component.
- Partner cross-functionally to drive tool and process optimizations. Leverage strong data analysis skills to convert fleet metrics into actionable business value and automated scalability.
- Utilize an understanding of ML fundamentals to inform resourcing decisions, with a preference for practical experience in deploying large-scale ML models.