Job Description
Remedy Robotics is a medical technology company developing robotic systems for endovascular intervention. Its proprietary technology combines robotics, machine learning, and advanced computer vision to help physicians perform highly precise endovascular procedures and expand access to life-saving stroke and cardiovascular care. Initially focused on neurovascular intervention, Remedy is addressing the limited availability of specialized treatment for time-critical cardiovascular emergencies, with the long-term goal of enabling expert intervention regardless of patient location. Headquartered in San Francisco, Remedy is backed by DCVC, Blackbird, and Tony Fadell's Build Collective, among others.
We are looking to hire a DevOps Engineer for our Software Team.
What You’ll Do:
Own the developer platform end-to-end for a multidisciplinary team building an autonomous surgical robot
Design, build, and operate CI/CD pipelines across Python (core application and ML), C++ (robot control), and TypeScript (surgical UI) codebases with distinct testing and deployment requirements
Own lab compute infrastructure, including on-prem Ubuntu servers, GPU workstations, and supporting network systems
Improve developer experience across the organization, including local development environments, build systems, package management, and test reliability
Integrate hardware-in-the-loop testing into CI workflows where appropriate to support system-level regression testing
Manage and harden infrastructure security across both on-prem and cloud environments
Support ML workflows, including GPU compute pipelines, experiment tracking, and model deployment
Own cloud infrastructure for training, data processing, and remote services
Partner closely with software, ML, hardware, and data teams to optimize tooling, workflows, and overall development velocity
Knowledge, Skills, Abilities:
Experience operating CI/CD pipelines for polyglot codebases, including troubleshooting CI systems (e.g., GitHub Actions), building non-trivial workflows, and evaluating tradeoffs between self-hosted and managed runners
Strong Linux systems administration experience and proficiency with infrastructure-as-code practices
Advanced Python proficiency, with ability to contribute to C++ and TypeScript codebases as needed
Experience with cloud infrastructure (e.g., AWS or equivalent platforms)
Proficiency with modern developer tooling and AI-assisted coding environments (e.g., Claude Code, Cursor, or similar) as part of daily workflow
Strong communication skills and a service-oriented mindset focused on improving developer productivity across engineering teams
Minimum Qualifications:
5+ years of experience in DevOps, platform engineering, or infrastructure engineering on complex production systems
Preferred Qualifications:
Background in robotics, embedded systems, or scientific computing, including experience working with hardware-dependent testing environments
Experience with ML pipeline orchestration tools (e.g., SkyPilot, Metaflow, Ray, or similar frameworks)
Experience operating self-hosted GitHub Actions runners at scale
Familiarity with Python monorepo tooling (e.g., uv, Poetry, Bazel) and C++ dependency and packaging systems (e.g., Conan, vcpkg)
Experience with real-time Linux systems
Experience building audit-ready build systems, including signed builds, traceable artifacts, and reproducible build processes (in preparation for standards such as IEC 62304)
Prior experience in medical device development or other regulated industries
Familiarity with containerization and orchestration technologies (e.g., Docker and Kubernetes), with awareness that Kubernetes adoption may evolve over time