Site Reliability Engineering Manager, Google Distributed Cloud
Sunnyvale, CA, USA
Posted 1 week ago
hybrid
Job Description
- Engage with and improve the entire lifecycle of services, from inception and design, through capacity planning and launch reviews, to deployment, operation, and continual improvement.
- Solve large, ambiguous problems and drive solutions across Site Reliability Engineering (SRE) and Development teams, where our reliability expertise lets us act as a team multiplier.
- Balance the need for a reliable system, efficient incident response, and blameless postmortems with highly dynamic customer priorities.
- Maintain services' long-term health by creating and monitoring service level objectives (SLOs) and by scaling systems and processes sustainably through mechanisms such as automation.
- Mentor and train team members on design, coding, and reliability best practices, and grow knowledge of Gemini and Vertex AI within GDC SRE.