
Senior Platform Engineer
Job Description
Company description
Saatchi & Saatchi is an advertising agency with the belief that creativity, data, media and technology should all work together, and we use that to influence human behavior and drive success for clients. S&S is one of the world's largest agency networks with 114 offices and more than 6000 employees globally. Here in our Toronto office we work with some of the country's most valued brands including Toyota, Buckley's, and Children's Advil, as well as several under the Mondelez banner (Crispers, Oreo, and Chips Ahoy just to name a few). We’re an award-winning agency in both creativity and effectiveness, so it’s really important for us here to convert that innovation and the great ideas into real tangible business results for the organization.

Overview
Saatchi & Saatchi Canada is currently opening a new Senior Platform Engineer position to support an innovative AI initiative. This role is part of a growing team with an immediate start date. We are seeking a candidate with a minimum of 6 years of experience in cloud infrastructure and DevOps practices, who is passionate about automation, reliability, and enabling development teams to deliver value efficiently. A strong understanding of security, scalability, and operational excellence is essential.

Responsibilities
- Own the platform across many concurrent environments (dev, sit, staging, sandbox, live-preview, prod, and chaos): provisioning, configuration, scaling, and recovery.
- Manage Google Kubernetes Engine (GKE) at production scale: node pools, autoscaling, HPA tuning, PDBs, blue-green and rolling cluster migrations, and Cloud NAT / private node hardening.
- Own Terraform across cloud resources (GKE, Cloud SQL, Memorystore, GCS, IAM, networking, Cloud Armor, WIF), including the judgement calls about blast radius when state and reality diverge.
- Build and maintain CI/CD across multiple tiers (frontend, backend, agentic services) on GitHub Actions, including gated deployments, image promotion, and safe rollbacks.
- Run live infrastructure changes during high-pressure live events (Builder Nights, client demos, integration partner releases), communicating intent, status, and risk to the broader team in real time.
- Lead incident response on infrastructure issues: triage, mitigate, communicate, post-mortem, and turn the learnings into runbooks and follow-up tickets.
- Establish and operate the observability stack (metrics, logs, traces, alerts) so the team finds problems before users do, without drowning in noise.
- Implement and enforce security and compliance baselines: secrets management, IAM least-privilege, network segmentation, audit logging, supply-chain hygiene.
- Partner with the Dev Lead and application engineers on the application-side reliability concerns (graceful shutdown, retry semantics, capacity planning) so the platform and the apps share the same operational contract.
- Write down what you know (runbooks, decision logs, environment maps, on-call guides) so the next platform engineer can read in instead of starting over.

Qualifications
- You enjoy chaos. Live events, unpredictable load, and infrastructure surprises are why you took this kind of job, not what makes you want to leave it.
- Strong written and verbal communication. You narrate decisions in real time, write clear status updates while things are on fire, and explain trade-offs to non-infra people without condescension.
- Calm, deliberate decision-making under pressure, especially around irreversible or high-blast-radius changes.
- Clear ownership instincts: you can say 'I've got this' and follow through, and you can say 'this isn't mine' and explain why.
- Generous documentation habit: you leave the platform more legible than you found it.
- Pragmatic: you reach for the boring, well-understood solution before the novel one.
- Comfortable disagreeing with engineers, leads, and stakeholders when you see a real risk; respectful when you do.

Nice to Have
- Prior experience inside Publicis Groupe or another large holding-company agency environment.

Additional information
Location & Eligibility: Candidates must be based in the Greater Toronto Area with valid Canadian work authorization of at least 12 months.

Remote Work: This role is mostly remote. If you are GTA-based, the team meets once a month at our Toronto office (111 Queen St. E, Suite 200, Toronto, ON M5C 1S2).

Time Off:
- Up to 3 weeks vacation, with additional paid closure between Christmas and New Year's
- Extended long weekends for provincial holidays: we give you both the Monday and Friday so you get a full 4-day break
- 6 sick days and 2 personal days per year

Flexibility & Global Mobility: Work remotely for up to 6 weeks per year from any of our 50+ global offices through our Work Your World program.

Benefits: Comprehensive group coverage including:
- Medical, dental, and vision care
- Psychological and paramedical services
- Disability insurance
- Fertility support and gender-affirming care
- Dedicated internal guidance programs for employees navigating cancer, fertility treatments, or gender transition

Compensation: The salary range for this position is $90,000–$120,000 per year, based on experience, skills, and relevant certifications. We believe in pay transparency and are committed to offering competitive, market-aligned compensation.

We use artificial intelligence (AI) tools to support parts of our hiring process, such as reviewing applications or analyzing resumes. These tools assist our recruitment team but never replace human decision-making. We believe in a human-first approach, where your experience and potential are recognized by people.

Saatchi & Saatchi is committed to building a diverse workforce representative of our community.
We encourage all qualified candidates to apply and are pleased to consider them without regard to race, colour, citizenship, religion, sex, marital / family status, sexual orientation, gender identity, aboriginal status, age, disability, or need for accommodation. If you require a specific accommodation, please contact Human Resources at 416-925-7733 or by email at [email protected].