Google

Software Engineer, ML Fleet Intelligence

Posted 2 weeks ago

Job Description

  • Lead the design and implementation of solutions in specialized ML areas, optimize ML infrastructure, and guide the development of model optimization and data processing strategies.
  • Design and implement AI/ML models to predict, detect, and mitigate hardware and software faults across a global fleet.
  • Analyze petabytes of telemetry and performance data to uncover insights that improve the reliability of ML TPUs and traditional compute infrastructure.
  • Build scalable automated systems that allow Google’s data center footprint to grow while maintaining industry-leading uptime.
  • Partner with hardware designers and site reliability engineers (SREs) to integrate intelligent diagnostics into the core data center lifecycle.
