Ontrac Solutions

Senior Production Engineer (IC4)

Posted Today
Onsite

Job Description

About Ontrac Solutions

At Ontrac Solutions, we partner with elite engineering organizations to build systems that operate at planetary scale. Our team supports complex cloud, infrastructure, automation, and production engineering initiatives for organizations modernizing critical platforms and high-availability environments.

We are seeking a highly skilled Senior Production Engineer (IC4) to support a critical customer engagement. This role is ideal for a hands-on engineering professional with deep experience in infrastructure modernization, Linux systems, Python automation, production support, and large-scale migration execution.

Role Overview

The Senior Production Engineer will work closely with Cloud Platform Engineering, CloudTech SRE, internal engineering teams, and customer stakeholders to support the modernization of legacy infrastructure into production-ready environments.

This individual will help lead complex operating system upgrades, packaging migrations, configuration management transitions, observability improvements, CI/CD hardening, and service onboarding efforts across a large-scale infrastructure footprint.

The ideal candidate is comfortable executing independently, owning technical workstreams, resolving complex production issues, and documenting repeatable processes for long-term operational success.

Key Responsibilities

  • Lead and execute large-scale OS modernization efforts, including migrations from RHEL7 to EL8/EL9 across approximately 1,700 systems and virtual machines.
  • Support configuration management transitions, including Chef to CINC, and the migration of legacy packages and configuration from yinst to RPM.
  • Build, maintain, and configure RPM packages to support infrastructure modernization and application migration efforts.
  • Develop, execute, and improve automated runbooks for OS upgrades, configuration changes, service onboarding, and production support.
  • Triage, own, and resolve complex production issues, including high-priority S-bugs and infrastructure-related incidents.
  • Harden CI/CD pipelines, observability frameworks, and rollout/rollback mechanisms for legacy-to-modern infrastructure transitions.
  • Partner closely with CloudTech SRE to provide follow-the-sun Tier-2 production support, including hands-on incident response and break/fix operations.
  • Onboard services to modern monitoring, logging, and observability stacks.
  • Support migrations from legacy monitoring tools such as Yamas to platforms such as Chronosphere, Prometheus, and Grafana.
  • Assist with log management and Splunk integration strategies.
  • Partner with application development teams during cloud cutovers, component migrations, and production readiness activities.
  • Automate repetitive operational tasks using Python and related tooling.
  • Document technical procedures, runbooks, migration steps, and operational standards.

Required Qualifications

  • 5+ years of professional software engineering, production engineering, SRE, DevOps, or infrastructure engineering experience.
  • Strong hands-on experience with Python for automation, tooling, scripting, and operational workflows.
  • Experience supporting Linux infrastructure in production environments, ideally including RHEL7, EL8, and EL9.
  • Experience with OS modernization, infrastructure migration, or large-scale systems upgrade initiatives.
  • Hands-on experience with package management and build processes, preferably including RPM packaging.
  • Experience with configuration management tools such as Chef, CINC, Ansible, Puppet, or similar platforms.
  • Strong understanding of production support, incident response, break/fix workflows, and Tier-2 operational support.
  • Experience hardening CI/CD pipelines and supporting safe rollout/rollback processes.
  • Familiarity with observability, monitoring, logging, and alerting frameworks.
  • Ability to work independently, manage technical tasks, and communicate clearly with engineering and stakeholder teams.
  • Strong documentation skills and the ability to create repeatable runbooks and operational procedures.

Preferred Qualifications

  • Experience with Chef to CINC migrations.
  • Experience with yinst to RPM migration or similar legacy packaging transitions.
  • Experience supporting monitoring migrations from Yamas to Chronosphere, Prometheus, or Grafana.
  • Experience with Splunk log management strategy and integration.
  • Experience supporting developers through cloud cutovers and application migration phases.
  • Experience working with Cloud Platform Engineering, SRE, or infrastructure modernization teams.
  • Familiarity with NetAuto or similar network automation / operational support tooling.
  • Experience operating in a follow-the-sun support model.
  • Prior experience supporting high-scale cloud, infrastructure, or platform engineering environments.

Scope of Work / Delivery Expectations

The contractor will help drive the technical transition of legacy systems to modern infrastructure environments. Expected workstreams include:

  • Migrating and updating configurations across approximately 1,700 systems and virtual machines from RHEL7 to EL8/EL9.
  • Developing and executing automated runbooks for OS upgrades and configuration management changes.
  • Building and maintaining RPM packages to replace legacy configuration and packaging processes.
  • Supporting the transition of monitoring infrastructure to a modern observability stack, including Chronosphere, Prometheus, and Grafana.
  • Supporting Splunk integration and logging strategies.
  • Providing Tier-2 operational support and incident response under a follow-the-sun model.
  • Partnering with application developers during cloud migration and cutover phases.
  • Improving CI/CD pipelines, deployment safety, and rollback readiness.
  • Creating documentation to support repeatable operational processes and long-term platform maintainability.
