Photon

SRE + Dynatrace | Mexico

Mexico
Posted 3 months ago
Remote

Job Description

Location: Guadalajara (Mexico)

What you'll do: 

  • Experience working with large-scale distributed systems, including scalability, disaster recovery, and fault tolerance.
  • Expertise in Python scripting.
  • Define, implement, and own SLIs, SLOs, and error budgets for critical microservices in collaboration with product and engineering teams. 
  • Use error budgets to influence release decisions, prioritize reliability work, and manage operational risk. 
  • Design and maintain observability platforms including metrics, logs, traces, and real-time telemetry. 
  • Track, manage, and reduce operational toil by converting repetitive operational work into Jira stories and epics with clear ownership and measurable outcomes. 
  • Design, implement, and validate resiliency mechanisms such as graceful degradation, redundancy, automated failover, and disaster recovery. 
  • Lead incident response, act as an escalation point for high-severity incidents, and drive blameless postmortems. 
  • Partner with scrum teams to improve reliability through release readiness reviews, production change validation, and testing strategies. 
  • Capture incident action items and reliability improvements in Jira, ensuring closure, accountability, and continuous improvement. 
  • Perform deep root cause analysis, debugging, and performance tuning across distributed systems. 
  • Provide technical leadership and mentoring to junior SREs and engineers. 
  • Promote shift-left reliability by embedding operability, monitoring, and failure testing early in the SDLC. 
  • Strong knowledge of CI/CD pipelines, Git, and AWS/Azure/GCP as a PaaS.
  • Demonstrated knowledge of configuration management and deployment automation tools.
  • Strong experience with networking concepts and protocols (HTTP, HTTPS, Telnet, SSH, firewalls, VPNs, routing, and load balancing).
  • Strong experience with Linux.
  • Experience with monitoring solutions such as Prometheus and Grafana, and with products such as ELK and Splunk.
  • Experience working with large-scale systems.
  • Experience with containers and orchestration technologies such as Docker and Kubernetes.
  • Experience with a service mesh such as Istio is an added advantage.
  • Experience with a CDN such as Akamai.

What you'll bring: 

  • Bachelor's Degree in Computer Science or related technical field. 
  • 4+ years of experience in SRE, software engineering, or production operations supporting large-scale eCommerce platforms. 
  • Hands-on experience with Java/J2EE-based distributed systems. React experience is a plus. 
  • Proven ability to design and operate systems using SLO-driven reliability models. 
  • Experience defining and measuring SLIs (availability, latency, error rates, throughput, saturation). 
  • Good understanding of NoSQL technologies and RDBMS, including the ability to write queries to fetch results from a database.
  • Experience deploying and operating services on cloud platforms (AWS, Azure, or Google Cloud). 
  • Expertise with observability, APM, and caching tools (Dynatrace, Splunk, ELK, Akamai, QuantumMetric/Tealeaf, etc.). 
  • Strong experience using Jira for backlog management, incident follow-ups, toil reduction tracking, and cross-team coordination. 
  • Ability to independently own services and drive reliability initiatives end-to-end. 
  • Strong communication skills and ability to influence engineering and product teams. 
  • Experience serving in an on-call rotation and handling critical and high-severity incidents.

 

Good to have: 

  • Candidates with application support experience can be considered. 
  • Experience with any monitoring tool, such as New Relic or Datadog, is acceptable.
  • Candidates with 3 to 4 years of experience can be considered, including junior candidates with 3 years of experience.
  • Akamai experience is optional. 
  • Any cloud experience is acceptable. 
