Job Description
- Own and operate the cloud infrastructure that powers Progress Agentic RAG across multiple cloud providers and regions.
- Design and run production-grade, multi-cloud platforms using Infrastructure as Code and GitOps principles.
- Lead our GitOps-driven infrastructure workflows, ensuring reliable, secure, and auditable changes.
- Operate and scale Kubernetes environments globally, supporting highly available, secure, and scalable workloads.
- Enable platform and infrastructure delivery through modern CI/CD and automation practices.
- Design and maintain secure, zero-trust networking and identity models across cloud and on-prem environments.
- Build and evolve monitoring, incident response, and operational readiness for a 24/7 production platform.
- Collaborate with engineering, security, and product teams to continuously improve reliability and developer experience.
- Mentor engineers and help define infrastructure standards, documentation, and best practices.
- Strong experience designing and operating cloud or platform infrastructure at scale, including high availability, security, and disaster recovery.
- Deep hands-on expertise with Terraform and Infrastructure as Code.
- Proven experience running Kubernetes in production (GKE, EKS, AKS), including scaling, security, and observability.
- Experience with GitOps-based workflows and tools such as ArgoCD or similar.
- Solid hands-on experience with AWS and/or GCP and their core networking, compute, storage, and identity services.
- Experience building and maintaining CI/CD pipelines, ideally with GitHub Actions.
- Good understanding of cloud networking and identity, including zero-trust concepts and workload identity.
- Ability to automate operational tasks using Python or Go.
- Proven experience architecting on-premises infrastructure and managing Kubernetes on bare metal.
- Hands-on expertise in hybrid release management and artifact distribution for restricted environments.
- Strong communication and collaboration skills in cross-functional, distributed teams.
- Comfortable owning complex systems end to end and making decisions in production environments.
- A proactive, automation-first mindset with a focus on operational excellence.
- Willingness to mentor others and contribute to shared standards and documentation.
- Comfortable working in a remote-first, global environment.
- Exposure to cloud cost optimization and governance.
- Experience with security posture or compliance tooling.
- Familiarity with AI/ML infrastructure, including GPU-based workloads.
- Contributions to open-source or community infrastructure projects.
Compensation
- Generous remuneration package
- Employee Stock Purchase Plan Enrollment
- 23 vacation days annually
- Birthday day off
- Community service time off
- International Women's Day (March 8) as an official holiday for all employees
- Life and Medical Insurance
Together, We Make Progress
Progress is an inclusive workplace where opportunities to succeed are available to everyone. As a multicultural company serving a global community, we encourage a wide range of points of view and celebrate our diverse backgrounds. Our unique combination of perspectives inspires innovation, connects us to our customers and positively affects our communities. It is only by working together and learning from each other that we make Progress. Join us!
