Job Description
What You Will Do
- Design, implement, and operate observability and AIOps capabilities for cloud-native and hybrid environments, supporting reliable, production-grade services
- Lead the onboarding of early adopter teams and services, defining and applying standards for telemetry, SLIs, SLOs, and alerting in real-world systems
- Work hands-on with engineering, SRE, and operations teams to gather requirements and translate them into actionable observability and automation solutions
- Build and maintain telemetry pipelines, dashboards, and alerting, leveraging OpenTelemetry to deliver meaningful insights and reduce operational noise
- Run and evolve observability services in Kubernetes environments using Helm and Infrastructure as Code (Terraform), and integrate them with ITSM, ticketing, and event management systems
