Software Engineer, ML Fleet Intelligence at Dahl Consulting

Lead the design and implementation of solutions in specialized ML areas, optimize ML infrastructure, and guide the development of model optimization and data processing strategies.
Design and implement AI/ML models to predict, detect, and mitigate hardware and software faults across a global fleet.
Analyze petabytes of telemetry and performance data to uncover insights that improve the reliability of ML TPUs and traditional compute infrastructure.
Build scalable, automated systems that allow Google’s data center footprint to grow while maintaining industry-leading uptime.
Partner with hardware designers and site reliability engineers (SREs) to integrate intelligent diagnostics into the core data center lifecycle.
