Dahl Consulting

Technical Program Manager III, GPU Infrastructure Reliability, Google Cloud

Posted Today

Job Description

  • Lead the end-to-end development, project planning, and delivery of next-gen AI Infra GPU products from concept to production.
  • Lead software qualifications, release strategy, and test infrastructure management for AI hypercompute clusters.
  • Manage escalations and critical incidents while proactively identifying and mitigating risks that could impact project success.
  • Coordinate with TPMs in AI2 (e.g., ACI, Platforms, and CSCO) and ACI leadership on cross-functional initiatives related to AI Infra customer onboarding and production support.
  • Participate in the development of core management software, monitoring, and diagnostic tooling for scalable Cloud ML solutions.
