
Observability and Automation Architect
Job Description
Position Highlights
The Observability and Automation Architect is a senior technology leader responsible for designing enterprise-wide monitoring and automation strategies that improve system reliability, performance visibility, and operational efficiency. The architect designs scalable observability solutions (metrics, logs, traces) across cloud and hybrid environments, implements SRE practices (SLIs/SLOs), and drives automation and self-healing capabilities to reduce manual effort and MTTR. This role combines deep expertise in cloud-native architectures, Kubernetes, and modern observability tools with strong automation skills (IaC, scripting, CI/CD integration).
Main responsibilities:
• Define and implement enterprise observability architecture (metrics, logs, traces, events).
• Design end-to-end monitoring for cloud-native, microservices, and distributed systems.
• Standardize logging, tracing, alerting, and dashboarding practices across platforms.
• Implement APM, infrastructure monitoring, synthetic monitoring, and user experience monitoring.
• Establish SLI/SLO frameworks and integrate reliability metrics into engineering workflows.
• Architect and implement automation solutions for incident response, remediation, and operational tasks.
• Develop auto-remediation workflows and self-healing infrastructure.
• Drive Infrastructure as Code (IaC) and policy-as-code adoption.
• Integrate automation pipelines within CI/CD and DevOps toolchains.
• Enable event-driven automation across hybrid/cloud environments.
• Drive OpenTelemetry adoption and telemetry standardization.
• Ensure integration across ITSM, ITOM, and incident management tools (e.g., ServiceNow).
Tools and platforms:
• Observability: Dynatrace, Datadog, New Relic, Prometheus, Grafana, ELK, Splunk, OpenTelemetry
• Automation: Ansible, Terraform, Puppet, Chef, Jenkins, GitHub Actions
• Cloud: AWS, Azure, GCP
• Containers: Kubernetes, OpenShift, Docker
Main requirements:
• 7+ years of experience in IT infrastructure, DevOps, SRE, or platform engineering.
• 5+ years of hands-on experience with enterprise observability platforms.
• Expertise in Kubernetes and container observability.
• Expertise in cloud-native architecture.
• Proficiency with Infrastructure as Code (Terraform, CloudFormation).
• Proficiency in scripting (Python, Bash, PowerShell, or similar).
• Deep understanding of SRE principles (SLI/SLO/Error Budgets).
• Advanced knowledge of AWS, Azure, and/or GCP.
• Experience with containerized and microservices architectures.
• Experience with hybrid and multi-cloud monitoring architectures.