Job Description
Job Posting End Date:
Worker Type:
Maximum Term/Fixed Term (Fixed Term)
About the Role
We are seeking an Analyst‑level Site Reliability Engineer (SRE) to play a critical role in Production Operations, Incident Management, and Platform Resilience.
This role is hands‑on, operationally focused, and suited for engineers who enjoy keeping systems stable, observable, and recoverable, while continuously improving reliability through automation and better ways of working.
You will work closely with engineering and platform teams to ensure our services are available, secure, and supportable across cloud environments (AWS, Azure, GCP).
Key Responsibilities
Production & Incident Management
Provide 24×7 production support through on‑call rotations.
Actively participate in incident response, including monitoring, triage, escalation, and service restoration.
Coordinate with development and platform teams during Major Incidents (MI).
Ensure incidents are managed efficiently with clear communication and documented outcomes.
Platform & Resilience Support
Operate and support applications across cloud environments (AWS, Azure, GCP).
Perform and support deployments, releases, patching, and environment maintenance.
Participate in Disaster Recovery (DR) testing, failover exercises, and resilience reviews.
Monitor system health, capacity, and performance to proactively identify risks.
Monitoring & Observability
Maintain and improve monitoring, alerting, and logging to ensure actionable signals.
Reduce alert noise and improve signal‑to‑noise ratio for on‑call teams.
Ensure production issues are detectable before customer impact where possible.
Continuous Improvement
Lead and contribute to Root Cause Analysis (RCA) and post‑incident reviews.
Create, maintain, and continuously improve runbooks, SOPs, and operational documentation.
Identify repetitive or manual tasks and automate operational processes.
Feed operational learnings back into system design to improve long‑term reliability.
Skills & Experience
Required
3–5+ years of experience in SRE, DevOps, or Production Support roles.
Strong experience supporting production systems in cloud or hybrid environments.
Hands‑on experience with AWS and/or Azure.
Experience operating containerized platforms (Kubernetes – EKS/AKS).
Familiarity with CI/CD pipelines, deployments, and release processes.
Solid understanding of incident management, on‑call operations, and RCA.
Experience working with monitoring, logging, and alerting tools.
Strong documentation, communication, and collaboration skills.
Nice to Have
Infrastructure as Code experience (Terraform, CloudFormation, etc.).
Experience automating operational tasks using scripting (Python, Bash).
Understanding of microservices and distributed systems.
Exposure to security, compliance, or regulated environments (e.g. banking).
What Success Looks Like
Incidents are resolved quickly, with clear ownership and calm execution.
Production systems are stable, observable, and recoverable.
Runbooks and SOPs are reliable and actually used during incidents.
Operational toil is continuously reduced through automation.
Teams trust Production and SRE support as a force multiplier, not a bottleneck.
It's more than just a career at NAB!
We believe in people with ideas and dreams, and we want you to achieve your aspirations. More than just a career, NAB Vietnam offers you the flexibility to balance work and life, the opportunity to grow as a professional and as a person, and a complete set of well-being offerings. If you have an appetite to learn, grow and elevate others around you, this is the place for you.
IT'S MORE THAN MONEY
We naturally also provide a very competitive remuneration package, but a career with us is about a lot more than money. We believe in people with ideas and dreams, and we want you to achieve your aspirations. We will work together to deliver exceptional products and outcomes that push the limits of our own aspirations. Our passion for creating value and exceeding our customers' expectations means we are constantly striving to redefine our standards of excellence. You will have our backing to develop and our encouragement to explore, realize and reach your full potential.
