Site Reliability Engineer

Riyadh

Posted 1 month ago

Full-time · Mid-level

About Lucidya

Lucidya is an AI-native platform for customer experience (CX) intelligence that manages entire customer lifecycles autonomously, from initial engagement through retention and growth.

Unlike platforms that only surface insights and leave the action to you, Lucidya closes the loop with proprietary NLU technology built in-house and trained on millions of multilingual conversations. This enables marketing, support, CX, and research teams to deliver personalized experiences that drive measurable improvements in customer satisfaction, retention, and lifetime value.

As we continue scaling globally, the reliability, performance, and resilience of our infrastructure become mission-critical to everything we do.

Why this role matters

At Lucidya, our platform processes massive volumes of real-time customer data. Any downtime, latency, or instability directly impacts our customers’ ability to make decisions and serve their own users.

This role exists to make sure that doesn’t happen.

As a Site Reliability Engineer, you’ll sit at the heart of our platform’s stability, owning the reliability of our cloud infrastructure and ensuring it scales seamlessly as we grow. You won’t just react to issues; you’ll anticipate them, design systems that prevent them, and build automation that removes them entirely.

If you enjoy solving complex infrastructure challenges, eliminating inefficiencies, and building systems that “just work,” this is where you’ll thrive.

What You’ll Do

You’ll be responsible for outcomes, not just tasks. Here’s what success looks like in this role:

You’ll make reliability the default

You’ll design and maintain infrastructure that is highly available, fault-tolerant, and scalable
You’ll proactively identify and eliminate single points of failure before they become incidents
You’ll ensure our production systems remain stable, even under increasing scale and load

You’ll own and optimize our cloud environments

You’ll manage and continuously improve workloads on AWS, GCP, or Azure
You’ll use Infrastructure as Code (Terraform) to standardize and scale infrastructure
You’ll optimize resource usage to balance performance and cost

You’ll run and improve Kubernetes in production

You’ll operate and scale Kubernetes clusters (EKS, GKE, etc.) with confidence
You’ll troubleshoot issues quickly and ensure smooth deployments and upgrades
You’ll ensure our containerized workloads perform reliably at scale

You’ll build strong observability and respond to incidents

You’ll implement and refine monitoring systems using tools like Prometheus, Grafana, Datadog, or ELK
You’ll define alerting that is meaningful, not noisy
You’ll respond to incidents, lead root cause analysis, and ensure we learn from every failure

You’ll automate everything that shouldn’t be manual

You’ll write scripts and build tooling to eliminate repetitive operational work
You’ll continuously improve infrastructure efficiency through automation
You’ll promote a culture where manual work is a temporary state, not the norm

You’ll collaborate to improve the entire system

You’ll work closely with DevOps and engineering teams to solve performance bottlenecks
You’ll contribute to CI/CD improvements and deployment reliability
You’ll help shape reliability best practices across the organization

What success looks like (First 90 Days)

First 30 days:

You’ve built a strong understanding of our infrastructure, systems, and workflows
You’re contributing to day-to-day operations with support from the team
You’ve started identifying areas for improvement in automation and reliability

By 90 days:

You’re independently managing infrastructure tasks and troubleshooting issues
You’re actively contributing to reliability and scalability improvements
You’ve taken ownership of parts of our infrastructure and are improving them

About Lucidya | لوسيديا

201-500 employees

Riyadh, Riyadh, SA

More jobs at Lucidya | لوسيديا

Customer Support Manager

Cairo

Voice AI Engineer (Mid/Senior)

Egypt; Kuwait; Jordan; Oman; Qatar; India

Technical Customer Support Executive

Cairo

Staff / Senior AI Engineer (Video AI & LLM Systems)

Riyadh

Sr. Customer Success Manager

Riyadh

Sr Account Executive - Agentic AI

Riyadh, Riyadh Province

Similar roles

Site Reliability Engineer - FedRAMP

Cisco · San Francisco, California, US; New York, New York, US

Associate Manufacturing Technician (Weekend Day Shift - Onsite)

Insulet Corporation · US - Massachusetts

Materials Handler (Weekend Day Shift - Onsite)

Insulet Corporation · US - Massachusetts

Sr Manufacturing Engineer (Onsite)

Insulet Corporation · US - Massachusetts (Acton - Office)

Assistant Site Manager Hourly

COBBLESTONE AUTO SPA · Meridian, ID

Miller Electric Company - Prefabrication Site Manager

EMCOR Facilities Services ·
