
Technical Program Manager, ML Fleet Capacity, Systems Enablement
Job Description
- Serve as the lead Technical Program Manager (TPgM) delivering an automated solver for complex supply/demand matching scenarios. This solver will be the foundational capability that Google leadership relies on to make decisions based on the scenarios it models.
- Lead complex, cross-functional programs related to ML Fleet capacity management, including the design, update, and maintenance of ML Fleet's cluster-level allocation plan of record.
- Drive the development, implementation, and ongoing maintenance of fleet-wide accelerator and auxiliary resource usage metrics, policies, and robust governance frameworks.
- Identify gaps and drive initiatives to improve existing tooling and processes, enhancing the efficiency, agility, and responsiveness of ML capacity allocation and management.
- Partner with key stakeholders, including ML Strategy and Allocation (MLSA), Product Area Resource Management Teams (PARMs), Capital Engineering, Supply teams, tooling engineering, and system infrastructure SREs.