
Senior Site Reliability Engineer (US Hours)
Job Description
Department: Engineering
Employment Type: Permanent - Full Time
Location: Ahmedabad/GiftCity
Description
The Senior Site Reliability Engineer (SRE) is a development-first role focused on coding, automation, and platform stability. The ideal candidate is a skilled software developer who values a balance between building new tools and maintaining operational excellence. To maintain this balance, we follow a four-week rotation: three weeks of pure development, then one week on dispatch responding to alerts and mitigating production issues. This position supports the US time zone.
What you’ll be doing
Platform Reliability & Automation
- Design, build, and maintain advanced telemetry and automation tooling to monitor global platform health and trigger automated corrective actions.
- Own and improve incident response runbooks and automated remediation workflows, reducing MTTR over time.
- Participate in on-call rotations, diagnosing and resolving system issues and escalations from the customer support team (this is an internal-facing role, not customer-facing).
- Drive continuous improvement through post-incident reviews (PIRs) and engineering initiatives that eliminate classes of failure.
- Develop advanced monitoring software in Python and Go.
- Contribute to full-stack troubleshooting across our React.js frontend, Python backend services (Flask, Litestar, Celery), and AWS-managed Kafka (MSK/ESK).
- Write infrastructure-as-code using Terraform, building reusable modules and submodules to provision and manage cloud resources.
- Development Cycle (3 Weeks): Focus on coding advanced telemetry, implementing automation strategies, and building tools that proactively monitor platform health.
- Operations Cycle (1 Week): Rotate into an operational role to swiftly diagnose system issues and handle internal escalations, ensuring continuous platform stability.
- Continuous Improvement: Use insights gained during the operations week to develop automated solutions that reduce future incidents and optimize system performance.
- Work US hours to support the US time zone.
Skills, Knowledge and Expertise
Essential Skills & Experience
*It is essential that you are able to cover US working hours
- Extensive professional Python development experience, including object-oriented design and multi-threaded applications.
- Experience with Go (Golang).
- Substantial hands-on Terraform experience, including the ability to author modules and submodules from scratch.
- Substantial hands-on AWS experience across EC2, Lambda, CloudWatch, EKS, ECS, MSK, ELB, RDS, DynamoDB, and SQS.
- Solid Linux systems experience, including monitoring critical system health parameters.
Desirable Skills & Experience
- Familiarity with trading systems, financial markets, or low-latency environments.
- AWS Associate-level certification or higher (preferred but not required).
- Experience with chaos engineering, SLO/SLI frameworks, or formal reliability programs.
- Prior on-call experience at a high-traffic or mission-critical platform.
- Experience building or supporting React.js applications.
- Working understanding of TCP/IP, DNS, HTTP, and load balancing concepts.
What We Bring to the Table
Health & Financial Security:
- Medical, Dental, and Vision coverage
Time Off & Flexibility:
- Enjoy the best of both worlds: the energy and collaboration of in-person work, combined with the convenience and focus of remote days. This is a hybrid position requiring three days of in-office collaboration per week, with the flexibility to work remotely for the remaining two days. Our hybrid model is designed to balance individual flexibility with the benefits of in-person collaboration: enhanced team cohesion, spontaneous innovation, hands-on mentorship opportunities, and a stronger company culture.
- 21 days of Paid Time Off (PTO) per year, with the option to roll over unused days.
- One dedicated day per year for volunteering.
- Two professional development days per year, set aside for uninterrupted learning.
- An additional PTO day is added during milestone anniversary years.
- Robust paid holiday schedule with early dismissal.
- Generous parental leave for all parents (including adoptive parents).
Work-Life Support & Resources:
- Budget for tech accessories, including monitors, headphones, keyboards, and other office equipment.
- Milestone anniversary bonuses.
Wellness & Lifestyle Perks:
- Subsidy contributions toward gym memberships and health/wellness initiatives.
Our Culture:
- Forward-thinking, culture-based organization with collaborative teams that promote diversity and inclusion.