SkyWater Technology

Senior Dev Ops Engineer

Bloomington, MN, US | Posted 1 month ago
Onsite

Job Description

About the team
We build and operate the platforms that power our enterprise AI/ML, data engineering, and reporting/BI workloads. We run in a regulated Google Cloud environment (FedRAMP High), where reliability, security, and operational rigor are non-negotiable.

Role summary
We are hiring a seasoned DevOps engineer who can join our team and be self-sufficient from day one, owning the infrastructure, CI/CD, observability, and security guardrails that keep our AI, data, and reporting systems secure, compliant, and reliable. You will serve as a hands-on engineer and mentor others: you'll standardize environments, reduce toil, harden delivery pipelines, and improve incident response, all while working within the constraints of a FedRAMP High environment.

What you'll do (responsibilities)

Own production reliability for AI + data platforms
- Operate and continuously improve platform reliability for batch and streaming pipelines, reporting SLAs, and ML workloads.
- Define and run SLOs/SLIs, alerting standards, and incident response processes (on-call, postmortems, measurable follow-ups).
- Build runbooks, dashboards, and automation that reduce MTTR and recurring incidents.

Build secure, compliant delivery "paved roads"
- Design and maintain CI/CD for services, pipelines, infrastructure, and (where applicable) model artifacts.
- Implement safe deployment patterns: progressive delivery, automated rollbacks, change controls, and release governance appropriate for regulated environments.
- Own "golden paths" and templates so engineering teams can ship reliably without reinventing the wheel.

Infrastructure as Code and environment standardization
- Design and maintain Terraform modules and IaC standards for repeatable GCP provisioning.
- Operate GCP organization/folder/project structures, network patterns, and environment separation (dev/stage/prod) aligned to compliance requirements.
- Establish secure baseline configurations and guardrails (policy-as-code where relevant).

Security controls aligned to FedRAMP High / NIST expectations (highly desirable)
- Implement and operate security controls aligned to FedRAMP High / NIST 800-53 High baseline concepts: IAM hardening, audit logging, encryption, vulnerability management, secure configuration, incident handling, and continuous monitoring.
- Partner with compliance and security stakeholders to support audit readiness through evidence automation, control mapping, and operational documentation.

GCP-first operations (typical focus areas)
- GKE platform operations: cluster lifecycle, upgrades, node pools, Workload Identity, RBAC, network policy, and resource governance.
- Centralized logging, monitoring, and audit: alert hygiene, retention, routing, and security event visibility.
- Secrets and key management: Secret Manager, Cloud KMS, key rotation patterns, and access controls.
- Network controls for regulated environments: private connectivity patterns, service perimeters, and controlled egress.

Minimum qualifications
- 10+ years in DevOps, platform engineering, or production operations for critical systems.
- Proven experience operating in regulated cloud environments (FedRAMP High and/or similarly constrained government high-side environments).
- Strong hands-on capability with:
  - GCP operations (projects, IAM/service accounts, networking fundamentals)
  - Kubernetes/GKE in production
  - Terraform (or equivalent IaC) at scale
  - CI/CD systems and release automation
  - Observability (logs/metrics/traces), alerting, and incident response
- Experience working with GCP big data deployments (e.g., BigQuery, Bigtable, Cloud Build, Cloud Run, Cloud Functions, Managed Instance Groups, Airflow, gsutil, Cloud Composer, Vertex AI and ML tools).
- Proficiency in Linux, at least one of Python or SQL, and shell scripting (e.g., Zsh, Bash, PowerShell).
- Experience working with BI, ETL, and data management tools (e.g., dbt, Power BI, Tableau).
- Demonstrated ability to work independently: take ambiguous problems, drive execution end to end, communicate clearly, and land durable solutions.
- Knowledge of API development (REST APIs).
- Experience creating and hardening Docker images and Compose files.
- Experience with Virtual Private Cloud (VPC) and cloud network segmentation.
- Comfortable working in an Agile/Scrum environment.

Preferred qualifications
- Experience supporting AI/ML platforms (training/inference workflows, model packaging/versioning, GPU capacity planning).
- Experience supporting data platforms (warehouse, ETL/ELT, orchestration, streaming) in regulated environments.
- Familiarity with compliance artifacts and workflows (e.g., SSP/POA&M concepts, control narratives, evidence collection), without needing constant direction from Governance, Risk, and Compliance teams.
- Experience with:
  - GCP security posture tooling and workflows
  - Policy-as-code / admission controls (OPA/Gatekeeper or similar patterns)
  - Supply chain security (artifact signing, SBOMs, dependency management, container scanning)

What defines success in the first 90 days
- CI/CD is more consistent and safer (fewer manual steps, better rollback, clearer promotion paths).
- Alerting becomes more actionable (less noise, faster detection, faster recovery).
- Clear operational standards exist (runbooks, dashboards, ownership boundaries).
- Compliance readiness improves via practical, automated evidence signals, without slowing delivery.

On-call
Participate in an on-call rotation for platform-owned services, with a strong expectation that you reduce noisy alerts and recurring incidents through engineering.

This position will report out of the MN or TX site. The ideal candidate should be able to obtain and maintain government clearances as required.