Coach, mentor, and scale a Reliability Engineering team across planning, validation, and fleet failure analysis, optimizing resource allocation to keep pace with fast-evolving data center complexity.
Oversee manufacturing stability to ensure intrinsic product reliability across all verticals at APAC contract manufacturer locations, proactively identifying workflow opportunities to better support dynamic business needs.
Drive Design for Reliability (DfR) methodologies and DFMEAs from the initial concept phase, formalizing a lessons-learned pipeline that directly shapes design rules for next-generation ML hardware.
Lead high-priority investigations for complex, intermittent field reliability failures, guiding internal teams, OEMs, and external laboratories through advanced failure analysis techniques to validate conclusions and enforce strict remediation standards.
Use statistical tools, physics-of-failure models, and internal reliability data to predict product life performance, feed back application stress, enable early detection, and define comprehensive end-of-life strategies.

System Hardware Reliability Manager, AI Infrastructure

Job Description