Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Lead Site Reliability Engineer based in Canada.
This is a high-impact technical leadership role focused on improving reliability across large-scale distributed systems that directly affect millions of customers. You will sit at the core of incident response and production stability, working across engineering teams to identify systemic failure patterns and eliminate them at the root. The role blends hands-on engineering with cross-functional influence, requiring you to translate real production incidents into durable architectural and operational improvements. You will help define and elevate reliability standards across the organization, shaping how systems are built, deployed, and operated. Beyond incident response, you will drive long-term resilience through observability, automation, and safer deployment practices. This is a highly collaborative environment where influence matters as much as execution, and where your work compounds across teams and services. You will also help mature a growing SRE practice, moving it from reactive incident handling to proactive reliability engineering.
