Job Description
As the Senior Platform Engineer for Monitoring and Logging, you will be a key engineer responsible for building, scaling, and maintaining our enterprise-wide observability and log management ecosystem. Working closely with your team and the principal engineer, you will focus on the technical execution of our OpenTelemetry pipeline transformation, ensuring our systems can seamlessly ingest terabytes of logs, metrics, and traces every day. You will directly configure and maintain our data distribution pipelines in Cribl, set up analytical environments in Sumo Logic (log management) and Datadog (monitoring), and help internal customers manage alerting and on-call response through PagerDuty.
This role spans core platform engineering, ongoing infrastructure maintenance, and site reliability engineering. You will ensure that product and system teams across the company have the visibility they need into their software stacks, while keeping tight control over data configurations, filtering workflows, and ingestion costs through automated configuration baselines.
What you will do
Observability Platform Maintenance: Support and extend dashboards, metrics tracking, and APM tracing infrastructure inside Datadog and Sumo Logic. Maintain multi-tenant workspaces and universal tagging compliance across teams.
Incident Response Configuration: Configure and manage PagerDuty infrastructure, including service orchestrations, alert routing rules, event intelligence settings, on-call schedules, and native alert integrations with collaboration platforms such as Slack.
FinOps Execution & Data Controls: Optimize telemetry pipeline data flows using Cribl to eliminate noise, drop duplicate fields, and strip out bloated payloads. Ensure high-value signals reach Sumo Logic and Datadog while directing low-value compliance logs to archival cold storage.
Ansible Configuration Management: Fully automate the deployment, onboarding, patch management, and state consistency of monitoring agents (Datadog agents, Sumo collectors, Cribl Edge) and pipeline configurations using Ansible Playbooks and Roles.
Standardization Compliance: Enforce telemetry schemas, log signatures, and operational golden signals across the enterprise. Collaborate on the implementation and configuration of OpenTelemetry (OTel) collectors.
Team Upskilling & Collaboration: Serve as an engineering mentor to internal product teams by building out technical documentation and runbooks and by leading enablement sessions on modern logging and alerting practices.
What you will bring
3 years of Python development experience.
Proven expertise in Datadog, including AWS integrations and dashboard templating.
Experience with SignalFx/Splunk Observability Cloud and legacy monitoring paradigms.
Experience working across infrastructure, application, and DevOps teams to define relevant metrics.
Experience applying Site Reliability Engineering (SRE) concepts.
Strong understanding of AWS architecture and cloud-native observability.
Strong understanding of monitoring distributed systems.
Familiarity with OpenShift or Kubernetes.
Familiarity with Ansible.
Familiarity with Infrastructure-as-Code concepts.
Familiarity with OpenTelemetry.
Excellent communication and stakeholder management skills.
Preferred Qualifications
Certifications in Datadog, AWS, or related observability platforms.
Experience in enterprise-scale monitoring transformations.
About Red Hat
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.
Inclusion at Red Hat
Red Hat’s culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village.
Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
