Role Overview
The Infrastructure & DevOps Engineer at Precision AI will focus on building and maintaining reliable infrastructure, automation, and deployment systems across diverse technical platforms. This role emphasizes DevOps/EngOps expertise with a support mindset. You will build automated CI/CD pipelines, support AWS infrastructure, clean up systems architecture, and contribute to building agentic AI tools for internal use.
Working closely with AI/ML and Engineering teams, this role requires someone comfortable operating in ambiguous, fast-paced startup environments without relying on formal processes or extensive documentation.
This is a hybrid role, working from our Southeast Calgary office 3 days per week.
Key Responsibilities
Infrastructure & DevOps
- Design, build, and maintain automated CI/CD pipelines using GitHub Actions; implement deployment automation across development, staging, and production environments.
- Support and optimize AWS infrastructure; manage cloud resources, monitoring, and cost optimization.
- Clean up and refactor systems architecture to improve reliability, scalability, and maintainability.
System Support & Maintenance
- Triage and resolve infrastructure issues across AI infrastructure, website platforms, and agentic AI tools; prioritize production incidents and system reliability improvements.
- Monitor system health, respond to incidents, and implement fixes in a high-velocity startup environment.
- Maintain runbooks and infrastructure documentation to enable team self-sufficiency.
Quality & Collaboration
- Implement basic QA processes and automated testing within CI/CD pipelines.
- Collaborate with AI/ML teams on infrastructure requirements, deployment needs, and tooling support.
Relevant Experience (Ideal Candidate Background)
- 3-5+ years of professional experience in DevOps, EngOps, or infrastructure engineering roles at startups, with demonstrated experience in automation and systems reliability.
- Strong proficiency with GitHub Actions for CI/CD pipeline development and automation.
- Proficiency in scripting (Python/Bash) and infrastructure-as-code practices; experience debugging web applications and cloud systems.
- Hands-on experience managing AWS infrastructure (EC2, S3, Lambda, RDS, CloudWatch, etc.) in production environments.
- Experience diagnosing connectivity issues using tools like ping, curl, traceroute, and log analysis.
- Comfortable handling large image datasets (e.g., on S3), CSV/JSON files, and performing data quality checks and validation.
- Ability to identify and resolve CPU/memory bottlenecks, slow services, and system performance issues.
- Basic Java and Linux experience.
What You Bring
- Bias for action and excellence in maintenance work within scrappy, resource-constrained environments.
- Ability to anticipate downstream issues and handle second-order effects independently without constant oversight.
- Collaborative yet autonomous working style; excited by variety and breadth over narrow specialization.
- Attention to detail in troubleshooting, performance monitoring, and incident documentation.
Bonus
- Startup background with end-to-end ownership of support across multiple teams.
- Exposure to AI/ML infrastructure, model deployment pipelines, or agentic AI tools.
- Exposure to edge AI deployment on NVIDIA Jetson platforms, including running and testing GPU-accelerated workloads (CUDA/TensorRT).
- Experience deploying, running, and troubleshooting containerized (Docker) applications.