System Hardware Reliability Manager, AI Infrastructure
Taipei, Taiwan
Posted 2 weeks ago
hybrid
Job Description
- Coach, mentor, and scale a Reliability Engineering team across planning, validation, and fleet failure analysis, optimizing resource allocation to keep pace with rapidly evolving data center complexity.
- Oversee manufacturing stability to ensure intrinsic product reliability across all verticals at APAC contract manufacturer locations, proactively identifying workflow improvements to better support dynamic business needs.
- Drive Design for Reliability (DfR) methodologies and DFMEAs from the initial concept phase, formalizing a lessons-learned pipeline that directly shapes design rules for next-generation ML hardware.
- Lead high-priority investigations for complex, intermittent field reliability failures, guiding internal teams, OEMs, and external laboratories through advanced failure analysis techniques to validate conclusions and enforce strict remediation standards.
- Use statistical tools, physics-of-failure models, and internal reliability data to predict product life performance, feed back application stress, enable early detection, and define comprehensive end-of-life strategies.