Job Description
Location: Ontario - Remote
Responsibilities
- Design, build, and operate the AI runtime platform, including deployment pipelines, environment configuration, and scaling strategies
- Implement observability and monitoring across AI systems, with dashboards, alerting, and incident response processes
- Manage prompt and model configuration lifecycle, including versioning, approvals, routing logic, and rollback mechanisms
- Ensure security and compliance standards are met through access controls, auditability, and safe data handling practices
- Optimize AI system performance and cost through usage monitoring, caching strategies, and capacity planning
Qualifications
- 6+ years of experience in platform engineering, DevOps/SRE, or backend operations for production systems
- Hands-on experience building CI/CD pipelines and managing cloud infrastructure in production environments
- Strong understanding of observability practices (metrics, logging, distributed tracing) and incident management workflows
- Familiarity with operational considerations for AI/LLM systems (latency, rate limits, token usage, cost drivers)
- Experience managing infrastructure or configuration changes through code review and controlled release processes
- Experience optimizing system reliability, performance, and cost in a production environment
Benefits
- Your own world-class coach to help you grow personally and professionally.
- Coaching for friends and family, because coaching is a gift worth passing on.
- Charity Days to support the causes close to your heart - because doing good feels good.
- Learning Budget to fuel your curiosity. If it helps you grow, we’re in.
- Weekly Wellbeing Hour just for you. No meetings. No emails. Just space to breathe, reflect, or reset.
- Regional benefits that flex to fit your location and lifestyle.
- A welcoming place to do your best work. Comfortable, collaborative and inclusive.
The salary range for this role is $180,000 - $195,000 (CAD).
