
Principal Site Reliability Engineer
Job Description
At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.
The Position
A healthier future. It’s what drives us to innovate. To continuously advance science and ensure everyone has access to the healthcare they need today and for generations to come. Creating a world where we all have more time with the people we love. That’s what makes us Roche.
Advances in AI, data and computational sciences are transforming drug discovery and development. Roche’s Research and Early Development organizations at Genentech (gRED) and Pharma (pRED) have demonstrated how these technologies accelerate R&D, leveraging data and novel computational models to drive impact. Seamless data sharing and access to models across gRED and pRED are essential to maximising these opportunities. The Computational Sciences Center of Excellence (CS CoE) is a strategic, unified group whose goal is to harness the transformative power of data and Artificial Intelligence (AI) to assist our scientists in both pRED and gRED to deliver more innovative and life-changing medicines for patients worldwide.
Within the CS CoE, the Data and Digital Catalyst (DDC) organization leads the modernization of our computational and data ecosystems by integrating digital technologies across Research and Early Development to empower stakeholders, advance data-driven science and accelerate decision-making.
The Technical Services and Operations (TSO) group within DDC is accountable for providing application support (Level 1-3) for hundreds of software solutions. In addition, TSO is responsible for DevOps and Operations for DDC’s application landscape, both cloud and on-premises. Finally, TSO provides technical support for the systems used to train large language models (LLMs) and build autonomous agents, and ensures users are effectively enabled to leverage these advanced systems.
As a Principal Site Reliability Engineer, you will join the TSO Leadership team and work closely with DDC colleagues as well as directly with our key stakeholders, including Computational Scientists, ML Scientists and Research Scientists. You will also work closely with vendors whose managed services form the core of many TSO activities.
You are an experienced, hands-on technical expert with a proven ability to lead and deliver enterprise-scale technical operations. You will manage operations for applications and software and also contribute as an SRE supporting our infrastructure. Together with the Technical Operations leadership team, you will integrate expertise across on-premises operations, cloud services, and advanced AI/ML technologies. Your strategic focus will be on ensuring high performance, user enablement, and efficient service delivery while effectively managing relationships with external vendors. This comprehensive approach will drive operational excellence and support the evolving technical needs of our organization.
The Opportunity
Strategic Planning & Service Integration – Develop and implement strategic plans aligned with organizational goals, integrating managed services, on-premises operations, and cloud initiatives while leveraging AI as a key accelerator.
Vendor & Stakeholder Engagement – Manage vendor relationships, negotiate contracts, and ensure adherence to SLAs.
Operational Excellence & Change Management – Apply software engineering principles to operations to scale and maintain highly reliable production systems, balancing the rapid release of new features with uncompromising system stability and performance. Establish service-level metrics (SLAs, SLOs, SLIs), error budgets, and change management practices to drive continuous improvement and innovation.
Security, Compliance & Risk Management – Ensure adherence to security standards, regulatory requirements, and proactively mitigate risks to safeguard critical systems and data.
User Enablement & Technical Support – Champion user onboarding, training, and support to enhance system usability, troubleshooting capabilities, and AI/ML adoption.
Leadership & Team Development – Foster a collaborative, high-performing team culture through mentorship, standardization, and alignment with gCS values (impact, collaboration, diversity, scientific excellence, and curiosity).
Who you are
A PhD in Computer Science or a related field with 2-7 years of experience, an MS with 5-10 years of experience, or a BS with 7-12 years of experience is required.
5+ years leading technical operations in on-premises and cloud environments within medium to large organizations is required.
A deep understanding of on-premises infrastructure, cloud ecosystems (especially Kubernetes-based AWS environments), and AI/ML systems, together with the ability to manage technical risks and ensure system reliability and scalability, is required.
Strong ability to think strategically and drive long-term optimization while acting with urgency; experience in reducing tech debt, consolidating platforms, and deprecating legacy solutions is required.
Proven experience mentoring diverse teams while fostering a culture of collaboration, accountability, and continuous learning is preferred.
Strong oral and written communication skills, with the ability to engage stakeholders, provide clear direction, and navigate complex organizational structures, are required.
Staying up to date on emerging technologies and industry best practices to guide technical decision-making is required.
Not sure you meet all qualifications? Let us decide! Research shows that women and members of other under-represented groups tend to not apply to jobs when they think they may not meet every qualification, when, in fact, they often do! We are committed to creating a diverse and inclusive environment and strongly encourage you to apply.
Who we are
A healthier future drives us to innovate. Together, more than 100,000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.
Let’s build a healthier future, together.
Roche is an Equal Opportunity Employer.