Back to jobs
Google

System Debug Engineer Manager, Cloud AI Infrastructure

Kirkland, WA, USAPosted 2 weeks ago
onsite

Job Description

  • Drive technical team performance across on-call activities and system management by delivering leadership, mentorship, and career development while collaborating with primary responders to address system issues.
  • Debug platform hardware, silicon, and AI/ML workloads to drive root-cause resolution, develop permanent infrastructure improvements, and build tools for faster diagnosis through troubleshooting and reproduction.
  • Collaborate cross-functionally with Product, Quality, and Engineering teams to ehance product outcomes, and engage with Site Reliability Engineering (SRE) teams to ensure high-quality production and reliability.
  • Resolve customer challenges on AI/ML infrastructure through effective diagnosis, resolution, and the implementation of investigation tools to increase productivity for critical reported issues.
  • Serve as a consultant and subject matter expert for internal stakeholders to resolve deployment and operational obstacles across AI infrastructure environments daily.

See Your Match Score

Sign up and Renata will show you how this job matches your skills and experience.

Get Started Free
System Debug Engineer Manager, Cloud AI Infrastructure at Google | Renata