Job Description
- Participate in on-call activities and manage domain systems, collaborating with incident responders to resolve issues.
- Resolve customer issues and troubleshoot AI/ML workloads by developing effective diagnostic and investigation tools.
- Partner with Product, Quality, and SRE teams to improve product quality and production standards.
