Engineer, Production Engineering at Guild.ai

Location: San Francisco Bay Area (Hybrid/Onsite)
Type: Full-time
Stage: Early-stage startup

About the Role

We are building the control plane for AI agents in teams and companies.

As a Production Engineer, you will own the infrastructure, security, and compliance systems that allow our platform to ship fast and run reliably at scale. This is not a traditional ops role — you will write real code, contribute directly to the product, and own the full security and compliance surface of an early-stage company.

You'll work across Kubernetes infrastructure, cloud delivery, agent sandboxing, SOC2 compliance, IT systems, and production observability — and you'll contribute to the product itself, building security-sensitive features and auditing application code for vulnerabilities.

If you want to own the production backbone for the agent-native era — from a Terraform module to a pentest to an API key implementation — we want to talk.

What You'll Own

1. Cloud & Kubernetes Infrastructure

Our Stack: Manage and evolve our production and staging infrastructure on GCP (GKE) using Terraform. Own DNS, networking, and environment configuration end-to-end.
Customer Environments: Deploy and operate within customer VPCs across AWS, Azure, and GCP — adapting to varied infrastructure constraints, security requirements, and enterprise networking configurations.
Agent Sandboxing: Build and maintain Kubernetes-based sandboxing for agent execution, ensuring agents operate within strict network boundaries and route outbound traffic through our API gateway rather than having unfettered internet access.
Observability: Own our observability stack, including OpenTelemetry instrumentation and integrations with New Relic and Splunk, to give the team deep visibility into system performance and agent runtime behavior.

2. Security, Compliance & IT

SOC2 & Audits: Lead infrastructure and operational work to support SOC2 compliance, including audit preparation, evidence collection, and control implementation.
Penetration Testing & Bug Bounty: Manage our HackerOne engagement — coordinating pentests, triaging incoming bug bounty reports, and driving remediation.
Product Security: Audit application code for security vulnerabilities, contribute security-sensitive product features (e.g., API key management), and ensure product and infrastructure security are coherent end-to-end.
IT & Identity: Own our IT stack — Okta, device management, and access controls — keeping the company secure as we scale.

3. CI/CD & Progressive Delivery

Deployment Pipelines: Design and maintain safe, automated CI/CD workflows supporting rollout strategies like canary and blue-green deployments.
Release Velocity: Make shipping to production a routine, boring, highly automated non-event.

What We're Looking For

Strong Fit

Experience: 5+ years in Production Engineering, Platform Engineering, or a security-focused infrastructure role, ideally at a fast-growing startup or SaaS company.
Our Stack: Strong hands-on experience with Kubernetes and GCP in production; comfortable with Terraform for managing real infrastructure.
Code over Click: Strong programming skills (Python, Go, TypeScript, etc.) with a passion for automating away toil.
Security Depth: Hands-on experience with compliance frameworks (SOC2), vulnerability management, and secure system design.

Bonus Points

Background in multi-tenant SaaS, or with enterprise security and procurement requirements.
Exposure to AI/ML infrastructure, particularly agent runtimes.
Experience building security-sensitive product features alongside infrastructure work.
Experience supporting pentests and bug bounty programs.
Experience deploying and operating in customer VPCs or other external cloud environments across AWS, Azure, and/or GCP — navigating enterprise networking, security, and access constraints.

Why This Role is Unique

Broad Ownership: You'll own the full security and compliance surface of an early-stage company — from SOC2 to sandboxed agent execution to IT — while also contributing directly to the product.
Agent Infrastructure: You'll design infrastructure for autonomous AI agents, not just traditional web services, which brings unique sandboxing, observability, and security challenges.
Our Infra and Theirs: You'll operate across both our own production environment and customer cloud environments, requiring you to be fluent across AWS, Azure, and GCP.
High Autonomy: As an early hire, you'll have a seat at the table in choosing the tools and defining the architecture that carries us to scale.

Who Thrives Here

Engineers who are as comfortable reading application code for vulnerabilities as they are writing a Terraform module.
People who enjoy owning the full security and compliance surface, not just one layer of it.
Builders who can navigate the constraints of customer enterprise environments without losing velocity.
Those who are energized — not overwhelmed — by the breadth of an early-stage technical operations role.
