Job Description
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Site Reliability Consultant based in Canada.
This role sits at the intersection of cloud infrastructure, software reliability, and large-scale distributed systems engineering. You will be responsible for designing, operating, and continuously improving highly available platforms that support critical workloads across cloud-native environments. The position involves deep hands-on work with Kubernetes, observability tooling, and automation frameworks to ensure systems remain resilient, scalable, and performant. You will collaborate closely with engineering, data, and AI/ML teams to enable reliable infrastructure for complex workloads. This is a highly technical and impact-driven role where your work directly influences system uptime, performance, and engineering efficiency. You will also contribute to incident response, root cause analysis, and long-term reliability improvements across global systems.
