Job Description
What's the opportunity?
We're looking for a Senior ML Platform Engineer to join the AI Farm team — RBC's enterprise GPU compute and data platform for machine learning. You'll own and deliver critical platform capabilities that enable hundreds of ML researchers and engineers to train models, access data, and deploy at scale.
This isn't a typical MLOps role. You'll be building the platform itself — the Kubernetes infrastructure, data access layer, compliance automation, and developer tooling that our ML teams depend on daily. You'll work at the intersection of distributed systems, data engineering, and platform engineering, solving problems like multi-tenant GPU scheduling, data governance enforcement, and self-serve infrastructure provisioning.
At RBC Borealis, you'll join a small, high-impact team that operates AI Farm, an on-premises OpenShift + Run:AI cluster with H100, B300, and A100 GPUs serving multiple business units. You'll have direct ownership over system design decisions and ship features that immediately impact researcher productivity.
Your responsibilities include:
Designing and building Kubernetes-native automation for platform operations: PV lifecycle management, namespace provisioning, compliance scanning, and workload enforcement
Owning the data infrastructure layer: Trino/Starburst cluster operations, column-level data masking, resource group management, and catalog provisioning automation
Building developer-facing tools and libraries (Python SDK, CLI) that reduce cognitive load for ML teams accessing data and compute
Implementing data governance and compliance systems: automated scanning, classification integration, retention enforcement, and audit reporting
Designing and operating observability pipelines: Grafana dashboards for GPU utilization, developer experience metrics, pipeline throughput measurement, and compliance coverage
Collaborating with INFRA, security, and compliance teams to design and enforce platform policies (OPA admission webhooks, image enforcement, access controls)
Contributing to architecture decisions (ADRs) and owning end-to-end delivery of multi-sprint epics with cross-team dependencies
You're our ideal candidate if you have:
Must Have:
5+ years of industry experience in software/platform engineering
Deep hands-on experience with Kubernetes in production (pod security, RBAC, storage classes, CronJobs, admission webhooks, custom controllers). OpenShift experience is a strong plus.
Proficiency in Python for building production tools, automation scripts, CLIs, and libraries
Experience operating distributed data systems (Trino/Presto/Spark, SQL engines, Iceberg/Hive catalogs, or similar)
Strong CI/CD and automation skills (GitHub Actions, Helm, GitOps, infrastructure-as-code)
Experience building multi-tenant platforms with self-serve provisioning for internal teams
Ability to own and deliver complex, ambiguous projects end-to-end with minimal direction
Strong Preference:
Experience with data governance, compliance automation, or security enforcement on shared platforms
Hands-on Prometheus/Grafana: building dashboards, alerting, and instrumentation from scratch
Container image lifecycle management (registries, scanning, enforcement policies)
Experience with GPU compute platforms (Run:AI, Slurm, or cloud GPU scheduling)
Familiarity with S3-compatible object storage and persistent volume management
Nice to Have:
Experience with Trino/Starburst (resource groups, connectors, column masking, SEP)
OPA/Gatekeeper policy-as-code experience
Familiarity with ML workflows (training jobs, experiment tracking, model serving) — enough to empathize with platform users
Experience in regulated industries (financial services, healthcare) with compliance requirements
Strong fundamentals in networking, storage, and distributed systems
What's in it for you?
Own significant platform capabilities on a small team with high autonomy and direct business impact
Work with cutting-edge GPU hardware (NVIDIA B300, H100, A100) powering real ML research
Collaborate with high-performing engineers and AI researchers solving problems in finance
A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock options where applicable
Leaders who support your development through coaching and managing opportunities
Clear growth path: Senior Engineer → Staff Engineer, with increasing scope over platform architecture
About RBC Borealis
RBC Borealis is the driving force behind Royal Bank of Canada's AI and data innovation. As part of Canada's largest financial institution, we bring together a team of architects, engineers, scientists, and product experts on a mission to revolutionize finance through world-class research, solutions, and a resilient data platform. With locations in Toronto, Waterloo, Montreal, Calgary, and Vancouver, we're at the forefront of AI research and platform development. Focusing on cutting-edge research in areas like time series forecasting, causal machine learning, and responsible AI, we integrate AI research and data engineering to solve critical challenges in the financial industry. We are building intelligent, scalable, data-driven solutions that will help communities thrive and drive innovation for our customers across the bank.
Inclusion and Equal Opportunity Employment
RBC is an equal opportunity employer committed to diversity and inclusion. We are pleased to consider all qualified applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, Aboriginal/Native American status or any other legally protected factors. Disability-related accommodations during the application process are available upon request.
Job Skills
CI/CD, Cloud Infrastructure, Container Orchestration, Data Infrastructure, Data Systems, DevOps, End-to-End Testing, Kubernetes, Platform Engineering, Programming Languages, Python (Programming Language), SRE Observability, Structured Query Language (SQL)
Additional Job Details
Address:
City:
Country:
Work hours/week:
Employment Type:
Platform:
Job Type:
Pay Type:
Posted Date:
Application Deadline:
Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above
Our Employment Opportunities
At RBC, we are guided by living shared values of Client First, Integrity, Collaboration, Respect and Excellence and winning together as One RBC. We believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world. Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation, and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programs intended to foster a workplace based on respect, belonging and opportunity for all.
Join our Talent Community
Stay in the know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips and recruitment events that matter to you.
Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at jobs.rbc.com.
RBC is presently inviting candidates to apply for this existing vacancy. Applying to this posting allows you to express your interest in this current career opportunity at RBC. Qualified applicants may be contacted to review their resume in more detail.
