Engineering Manager, SRE/DevOps - Platform Reliability Engineering Group, Cloud Services Department (CLSD)
Job Description:
Business Overview
The Technology Platforms Division (TPD) drives the growth of Rakuten's ecosystem by delivering innovative, high-quality technology platforms characterized by integrated control and strategic partnerships.
Within TPD, the Cloud Platform Supervisory Department (CPSD) develops and manages Rakuten's state-of-the-art cloud platform, empowering global scalability and accelerating innovation across its diverse business units.
Department Overview
The Cloud Services Department (CLSD) at Rakuten Group provides high-quality cloud infrastructure and platform services to application developers across Rakuten. Our mission is to enable secure, scalable, and efficient digital innovation. We deliver key domain services, including compute, storage, core infrastructure components, databases, container platform, observability, and gateway solutions, empowering Rakuten application teams to focus on their core business objectives.
Position:
Why We Hire
We are scaling our infrastructure and reliability capabilities and continuously growing our platform engineering organization. To support our mission as the Cloud Services Department, we are looking for engineering managers who can lead teams responsible for the reliability, scalability, and operational excellence of our production systems.
Position Details
As an Engineering Manager for SRE/DevOps, you will play a critical role in our organization, providing leadership and strategic direction to one of our platform engineering groups. You will be accountable for your team's ability to deliver reliable, scalable infrastructure and tooling, and for the operational health of the services they own. We are looking for someone with expertise in growing the careers of SRE and DevOps engineers through mentorship and technical guidance, hands-on experience owning production reliability, and a passion for engineering quality, automation, and operational maturity.
Mandatory Qualifications:
- Lead and coach your SRE/DevOps team to deliver high-quality infrastructure, tooling, and automation that align with organizational reliability and scalability goals, while upholding sound software engineering and operational practices
- Drive a culture of reliability, including SLO/SLA ownership, incident management, post-mortem culture, and continuous improvement of system observability and resilience
- Grow and guide your engineers on their career paths, from junior SREs to senior platform engineers, serving as their primary source of mentorship and technical direction
- Partner with software engineering teams, architects, and product leaders to promote a shared ownership model of production reliability across the organization
- Maintain a clear mid- and long-term technical vision for platform infrastructure, developer experience, and operational capabilities
Desired Qualifications:
- Experience coaching and guiding 5–10 engineers across 1–2 SRE or platform/DevOps teams
- More than 3 years of engineering management experience, with a background in SRE, platform engineering, or DevOps
- Hands-on experience with cloud infrastructure (AWS, GCP, or Azure), CI/CD pipelines, container orchestration (Kubernetes), and observability tooling
- Familiarity with reliability engineering concepts: SLOs, error budgets, incident response, and chaos engineering
- Experience managing large-scale platforms (API Gateways, service meshes, or internal developer platforms) is a strong bonus
Other Information:
Additional information on English Qualification
- Business-level proficiency in English (TOEIC score of 800 or above, or equivalent ability).
- Japanese proficiency is a bonus but not mandatory.
Languages:
English (Overall - 3 - Advanced)