Senior Systems Engineer, Site Reliability Engineering, Distributed Cloud
Posted 3 weeks ago
Job Description
- Improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Provide guidance to other team members on managing the availability and performance of mission-critical services, building automation to prevent problem recurrence, and building automated responses for non-exceptional service conditions.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health. Lead sustainable incident response and blameless postmortems.
- Scale systems sustainably through mechanisms like automation and evolve systems by driving changes that improve reliability and velocity.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.