TikTok

Site Reliability Engineer, Global E-Commerce

San Jose, California, United States of America
Posted 1 week ago
Full-time, Hybrid

Job Description

The Global E-commerce Service Architecture team ensures the availability, scalability, and resilience of TikTok’s e-commerce platform in the U.S., partnering closely with product and engineering teams to operate reliable, large-scale production systems.

We are seeking a Senior Site Reliability Engineer (SRE) to advance the stability and resilience of TikTok Global E-commerce services in the U.S. In this role, you will strengthen disaster recovery readiness, optimize infrastructure capacity, and elevate service stability.

Key Responsibilities:

  • Data Center Disaster Recovery: Ensure services maintain disaster recovery capabilities under normal operations, including contingency planning and drills, capacity assurance, and effective response in disaster scenarios.
  • Resource Management & Capacity Planning: Manage and plan server and compute resources, including resource restructuring, overall capacity planning, and dynamic scaling, to support reliable business deployment and operations.
  • Service Stability Improvement: Establish and enhance service monitoring systems to enable timely alerting on failures and rapid issue identification and resolution. Partner with business stakeholders to conduct ongoing stability governance.

Minimum Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • At least 5 years of experience in Site Reliability Engineering, infrastructure, or production engineering roles.
  • Proficiency in at least one programming language (e.g., Go, Python, or Java).
  • Strong understanding of Linux systems, networking fundamentals, and distributed systems architecture.
  • Experience operating services in cloud-native or large-scale production environments.

Preferred Qualifications:

  • Experience supporting high-traffic e-commerce or internet platforms.
  • Experience in designing, operating, and troubleshooting large-scale distributed systems.
  • Strong communication and cross-functional collaboration skills, with a high sense of ownership and accountability.
