
Member of Technical Staff - Distributed Systems

San Francisco | $150K - $350K | Posted 3 months ago
Full-time | Remote

Job Description

About Us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About the role

Gimlet Labs is seeking a Member of Technical Staff focused on distributed systems. In this role, you will build the core platform that schedules, routes, and operates AI workloads reliably at production scale. You will work on systems that coordinate execution across thousands of nodes, expose stable production APIs, and ensure workloads run predictably under real-world load and failure conditions.

This role is well-suited for engineers who enjoy building foundational infrastructure, understanding systems end-to-end, and operating at scale.

What you will work on

  • Design and build distributed systems that orchestrate and operate AI workloads at large scale

  • Develop scheduling, routing, and resource management components that coordinate execution across many nodes and services

  • Build production-grade APIs and control planes for deploying and managing workloads

  • Implement mechanisms for reliability, availability, and fault tolerance in distributed environments

  • Instrument systems for observability and debugging at scale

  • Work closely with compilers, runtimes, and hardware to ensure end-to-end system correctness and performance

You may be a good fit if you have

  • Strong software engineering fundamentals

  • Experience building or operating distributed systems in production environments

  • Comfort reasoning about concurrency, failure modes, and tradeoffs in large-scale systems

Strong candidates may also have

  • Experience with Kubernetes or Kubernetes-adjacent systems beyond basic usage

  • Experience designing service-oriented architectures using RPC or asynchronous messaging

  • Familiarity with scheduling, queues, or resource management systems

  • Experience building reliable APIs and running systems under high load

  • Software development experience in languages commonly used for systems development (e.g., Go, C++, Python)

What Makes Gimlet Different

At Gimlet, you will work on infrastructure problems that span the full stack of modern AI systems. Our team operates across datacenters, networking, distributed systems, compilers, runtimes, orchestration, and performance engineering to build the foundation for the next generation of AI infrastructure.

As an early member of the team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.

We value people who are excited to work across domains, take ownership of meaningful problems, and build technology that enables the next generation of AI.
