Grosvenor Casinos

Site Reliability Engineer

Quatre Bornes, Plaines Wilhems District, MU
Posted 1 week ago
onsite

Job Description

As a Site Reliability Engineer (SRE), you will ensure our customers get the best quality of service and uptime we can give them. You will identify where we can expect IT failures, and how we can tolerate them, both in our own systems and in those we depend upon. You will be responsible for the availability, performance, monitoring, incident response, and general service management of the platforms and services that our company runs and owns. You will work closely with our developers and infrastructure engineers to build and run services and systems that respond consistently to failures by degrading gracefully, and help ensure they are thinking about operational deliverables such as monitoring, logging, and runbooks, which can be make-or-break for diagnosing and fixing critical issues.

Main Accountabilities & Responsibilities:
Identify where we can expect IT failures, and how we can tolerate them, both in our own systems and in those we depend upon.
Take responsibility for the availability, performance, monitoring, incident response, and general service management of the platforms and services that our company runs and owns.
Work closely with our developers and architects to build and run services and systems that respond consistently to failures by degrading gracefully, and help ensure they are thinking about operational deliverables such as monitoring, logging, and runbooks, which can be make-or-break for diagnosing and fixing critical issues.
Ensure the systems and applications we launch remain available, reliable and efficient at accomplishing their duties, even as those duties scale and evolve.
Be involved in every part of our site, from product conception and development to deployment, troubleshooting and analysis.
Design, build and automate tools and processes to ensure and improve scalability, availability and performance across areas of technology.
In addition, build, integrate and run tools to inject, predict and identify infrastructure and service failures on an ongoing basis to help optimise our sites.

What’s needed for success – Experience & Qualifications:
Some experience in IT Service Management (ITIL) and an understanding of which parts apply in an agile DevOps environment
A desire to learn new technologies and apply them where appropriate to improve the quality of our software and processes
Experience with AWS services, Docker/Kubernetes, CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI), and Infrastructure as Code, ideally Terraform
Experience in implementing Agentic Ops solutions (designing and configuring AI agents for incident troubleshooting and automated issue resolution)
UNIX/Linux systems administration background
Experience in at least one configuration management solution (preferably Ansible)
Experience in using monitoring tools (Splunk/New Relic/Elasticsearch/AWS CloudWatch)
Programming skills (Python, Bash, Java)
You love to automate everything!!
