
ML Infrastructure Service Reliability Engineer - Apple Services Engineering
Job Description
At Apple, we don’t just build products; we create transformative experiences that have reshaped entire industries. Our innovation is driven by the diversity of our people and their ideas, inspiring everything we do. Imagine the impact you could make. Join Apple and help us leave the world better than we found it.

The ML Infrastructure team is responsible for managing Apple’s largest ML compute platform, as well as a multi-cloud storage abstraction and caching platform, which together support critical machine learning training workloads that power user-facing features across the Apple ecosystem. Operating across both first-party and third-party cloud environments brings complex and unique challenges.

As a Site Reliability Engineer (SRE) on the ML Infrastructure team, you’ll be expected to address these challenges through a strong foundation in cloud object storage, data analysis, automation, and collaboration, together with advanced expertise in Kubernetes. Our team oversees the full infrastructure stack, from low-level nodes to the complete network architecture, ensuring our platform remains highly available, resilient, and efficient at scale.