
Cloud Infrastructure and Automation Engineer
Job Description
THE COMPANY:
STACK INFRASTRUCTURE (STACK) provides digital infrastructure to scale the world’s most innovative companies. We are an award-winning industry leader in building, owning, and operating highly efficient, cost-effective wholesale, colocation, and cloud data centers. Each of our national facilities meets or exceeds the highest industry standards in all operational categories of availability, security, connectivity, and physical resilience.
STACK offers the scale and geographic reach that rapidly growing hyperscale and enterprise companies need. The world runs on data. Data runs on STACK.
THE POSITION:
The Cloud Infrastructure & Automation Engineer owns the cloud platform, DevOps pipelines, automation runtime environments, and operational infrastructure that power all of STACK’s AI, automation, and data initiatives. This is a hands-on leadership role, responsible for ensuring that every intelligent agent, automation workflow, RAG platform, and data pipeline moves from prototype to production rapidly, runs reliably, and scales cost-effectively. The scope spans Azure infrastructure provisioning using Terraform and Bicep, CI/CD pipeline engineering with Azure DevOps and GitHub Actions, container orchestration on AKS and Azure Container Apps, model serving and vector search infrastructure, automation runtime hosting, security hardening, and FinOps cost management. The role also owns the deployment infrastructure for agentic and hybrid model workloads, including LLM/SLM serving endpoints, embedding compute, GPU/inference scaling, and multi-model routing. The ideal candidate is equally comfortable writing Terraform modules and reviewing architecture diagrams, with a relentless focus on deployment velocity, reliability, cost optimization, and security.
Azure Infrastructure & Platform Engineering
Design, deploy, and manage Azure infrastructure across dual EA subscriptions (Dev/Non-Prod and Production) including Databricks workspaces, AI Search clusters, Cosmos DB instances, ADLS Gen2, Azure OpenAI Service endpoints, and Azure Functions.
Implement Infrastructure-as-Code using Terraform, Bicep, or ARM templates with modular, version-controlled patterns enabling new workloads to deploy within hours.
Configure Azure networking (VNets, Private Endpoints, NSGs, Private DNS) for secure, globally distributed platform environments across AMER, EMEA, and APAC.
Build container-based deployment patterns (Azure Container Apps, AKS) for API serving, agent hosting, model inference, and automation execution.
Provision and manage LLM/SLM serving infrastructure: Azure OpenAI deployments, model endpoints, token-based scaling, and multi-region failover.
CI/CD, MLOps & Automation Runtime
Design end-to-end CI/CD pipelines (Azure DevOps, GitHub Actions) for application deployment, model promotion, data pipeline orchestration, and automated testing with blue/green and canary patterns.
Build MLOps pipelines for model registration, versioning, A/B testing, canary deployment, and automated rollback of LLM endpoints and RAG configurations.
Deploy and manage automation runtime infrastructure: Azure Logic Apps, Power Automate, Azure Functions, Durable Functions, and event-driven triggers for intelligent workflows.
Maintain agent hosting environments (Chainlit, FastAPI, Teams bots) for the HR PM Agent and future agentic solutions, with auto-scaling and health monitoring.
Create reusable deployment accelerators (Terraform modules, Helm charts, pipeline templates) to reduce time-to-production for each successive initiative.
FinOps, Security & Compliance
Drive Azure cost optimization: commitment-tier analysis, right-sizing, automated shutdown policies, and token consumption tracking across LLM endpoints.
Implement RBAC, managed identities, Key Vault integration, and least-privilege access across all platform components.
Ensure SOX compliance, data residency, and governance using Microsoft Purview, Defender XDR, and Azure Policy.
Manage secrets, certificates, API key rotation, and Entra ID integration for platform authentication across global regions.
Produce monthly infrastructure cost and performance reports with spend trends, cost-per-query, and optimization metrics.
THE DETAILS:
Location: Denver, CO or Dallas, TX
Travel: <10%
Benefits: Healthcare, Dental Care, Vision Insurance, Life Insurance, Paid Time Off, and Paid Leave Programs
Must be eligible to work in the United States
Must pass comprehensive background and drug screening
MUST-HAVE QUALIFICATIONS:
7+ years of cloud infrastructure/DevOps experience with at least 2 years supporting AI/ML, automation, or data platform workloads at scale.
Expert-level Azure skills: Databricks, Cosmos DB, Azure Functions, Logic Apps, ADLS Gen2, Azure AI Search, Azure OpenAI Service, Container Apps/AKS, and Azure Monitor.
Strong IaC proficiency: Terraform (modules, state, workspaces), Bicep, or ARM templates with environment-templated patterns.
Hands-on CI/CD engineering: Azure DevOps, GitHub Actions, container registries, Helm charts, and blue/green/canary deployment automation.
Solid Python and Bash skills for infrastructure tooling, automation scripts, and deployment utilities.
Deep understanding of Azure networking, security (RBAC, managed identities, Key Vault, Private Endpoints, Azure Policy), and cost management.
Experience with containerization (Docker) and orchestration (AKS or Container Apps) for production workloads and model serving.
Familiarity with AI platform infrastructure: Databricks provisioning, Cosmos DB scaling, AI Search management, and LLM endpoint deployment.
PREFERRED QUALIFICATIONS:
Experience deploying RAG platform infrastructure, vector search clusters, and LLM/SLM serving endpoints in production.
Hands-on MLOps: model registries, experiment tracking, automated deployment pipelines, and A/B testing infrastructure.
Background in enterprise IT environments with M365, Intune, and Entra ID.
Azure certifications: AZ-104, AZ-400, AZ-305.
FinOps certification or demonstrated cloud cost optimization experience delivering measurable savings.
Experience supporting global operations across AMER, EMEA, and APAC with high-availability requirements.
Compensation Range:
$128,260.00 - $146,017.59
THIS MIGHT BE RIGHT FOR YOU IF:
You are a strong communicator: persuasive and clear, blending analytics with experience in decision-making.
You do not get flustered easily. You can juggle multiple priorities while balancing urgent requests with shifting timelines and deliverables.
You are a team builder. You take the time to understand and develop the strengths of your resources while formulating long-term plans for the growth and success of the team.
You are naturally curious and driven toward continual improvement. While you celebrate your successes, you take time to review and analyze campaigns for future learning.
WHY STACK?
We offer a competitive compensation package with strong benefits, including medical, dental, and vision insurance, a 401(k) program, flexible spending accounts – even a cell phone subsidy.
We foster a culture of appreciation, including peer-to-peer recognition and rewards programs.
Fun is part of our DNA, with events, game nights, happy hours, and barbecues.
We’re growing – this is a great time to join and make an impact!
STACK is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity and expression, age, national origin, mental or physical disability, genetic information, veteran status, or any other status protected by federal, state, or local law.
Note to external agencies: We are not accepting any blind submissions or resumes/CVs from recruitment agencies. Any candidates sent to STACK Infrastructure, Inc. will not be accepted or considered as a submission without a signed agreement in place. Fees will not be paid in the event a candidate submitted by a recruiter without an agreement in place is hired; such resumes will be deemed the sole property of STACK Infrastructure, Inc.