Job Description
OPERATIONS ENGINEER
The Role
We're looking for someone who has a passion for technology, SRE, and DevOps, a hunger to learn, and the professionalism required in a mission-critical role. Our close-knit team of operations engineers automates everything: system administration, continuous integration, deployments, monitoring, metrics, and tooling. They enjoy working on complex problems with others and are not afraid to span the stack from the network all the way to building and extending tools.
As an operations engineer, you'll be responsible for defining, innovating, and improving our CI/CD process. You'll make sure an idea flows from a developer's workstation to production in a clean, predictable and automated fashion. Once it's in production, it's in front of users all over the world. You must have an analytical mind, be highly organised and possess strong Linux skills. Ideally, you will have experience in production operations, and you'll be looking for the next step.
Responsibilities
- Building, monitoring and maintaining a 100% cloud environment (AWS)
- Helping to define SLOs and measure SLIs
- Developing tooling that enhances the lives and happiness of the Dev and Ops teams alike
- Investigating new technologies that advance the observability and performance of our platform
- Taking ownership of tasks, communicating ideas and decisions throughout the team and ensuring tasks are fully completed with high quality
Requirements
- Strong Unix/Linux administration skills, including an understanding of TCP/IP networking and scripting (Bash or Python)
- Experience with containers and container orchestration tools such as Kubernetes, and related tooling such as Helm
- Proven experience in an agile, fast-paced and product-oriented environment in an SRE or related 'DevOps' role
- A dislike of ad-hoc or manual processes, enjoyment of automating them away and experience of doing this using one or more of Ansible, Puppet, Chef, Salt, etc.
- Experience with Continuous Integration/Delivery (tooling and approach), using tools such as Jenkins or GitHub Actions
- Participation in a 24/7/365 operations on-call rotation responsible for mission-critical systems
- Strong teamwork skills and strong written and oral communication skills
Desirable Attributes
- An understanding of modern infrastructure design & administration, with experience of using AWS, Google Cloud, Microsoft Azure or one of the other IaaS providers (Linode, DigitalOcean, OpenStack, VMware, Xen) and of using Terraform, CloudFormation, boto or other orchestration tools
- An appreciation for security, both in design & operation
- Ability to work in a small team that is part of a larger organisation, both independently & with distributed team members
- Experience of or a strong desire to be part of a larger engineering organisation
- Experience building, operating, maintaining & scaling RabbitMQ/AMQP & Postgres
