Lyzr AI

Production Support Lead

Bengaluru, India · Posted 1 month ago
hybrid

Job Description

Professional Services - Service Practice

Full-time · Remote (India / US / EU)
Experience: 5–8 years


About Lyzr

Lyzr.ai's agentic AI platform powers intelligent, autonomous workflows for enterprise clients. Production Support Engineers are the front line that keeps those workflows healthy — triaging incidents, resolving tickets, digging into logs, and escalating the right issues to the right teams before clients feel the pain.

This role suits someone who thrives in a fast-paced technical environment, takes ownership seriously, and genuinely enjoys the detective work of diagnosing why something broke in production. You will work within a global follow-the-sun support model, reporting to the Production Support Lead.

What you’ll do

Incident command & escalation

  • Own the full incident lifecycle — detection, triage, war-room coordination, resolution, and post-mortem — for P1/P2 issues across all production tenants.

  • Act as the primary escalation point for Production Support Engineers; make the call on severity reclassification and client communication timing.

  • Drive RCA completion within SLA windows and ensure corrective actions are tracked to closure in Jira/Confluence.

  • Maintain and continuously improve the P1 runbook library, escalation trees, and on-call rotation schedules.

Team leadership & operations

  • Manage and mentor a team of 3–6 Production Support Engineers; run weekly 1:1s, set KPIs, and own the performance review cycle.

  • Build and optimise the shift rota for 24x7x365 follow-the-sun coverage across India, EU, and US time zones.

  • Define and track operational metrics: MTTR, SLA attainment by priority tier, re-open rate, and backlog aging.

  • Partner with Engineering and Platform teams to advocate for supportability improvements, observability tooling, and bug-fix prioritisation.

Client & commercial accountability

  • Serve as the named support contact for strategic accounts during critical incidents; provide executive-level written updates under pressure.

  • Review monthly SLA performance reports with client stakeholders; identify systemic patterns and propose proactive remediation.

  • Contribute to SLA definition in new SOWs, ensuring commitments are operationally deliverable.

  • Support the renewal and expansion process by demonstrating support maturity and service quality data.


Process & tooling

  • Own the support toolchain: ticketing (Jira Service Management or equivalent), monitoring dashboards, alerting rules, and on-call tooling (PagerDuty / OpsGenie).

  • Establish knowledge management practices — internal runbooks, known-error database, and a tiered FAQ — to reduce repeat escalations to Engineering.

  • Define and enforce severity classification criteria and ticket hygiene standards across the team.

What you bring

  • Experience: 5–8 years in production/application support; 2+ years in a lead or senior role

  • Domain: SaaS / AI / ML platform support; ideally agentic or LLM-based systems

  • Incident management: ITIL Foundation or equivalent; proven track record as a P1 incident commander

  • Tooling: Jira SM, PagerDuty / OpsGenie, Datadog / Grafana, Confluence

  • Leadership: Direct team management experience; mentoring junior engineers

  • Communication: Executive-level written updates under high-pressure conditions


Additionally, you will have:

  • Hands-on familiarity with cloud infrastructure (AWS / GCP / Azure) and container environments (Kubernetes, Docker).

  • Ability to read logs, traces, and basic Python/SQL to independently diagnose issues before engaging Engineering.

  • Bonus: experience supporting multi-tenant SaaS at scale, or prior work with AI/ML pipelines in production.

  • Bonus: familiarity with enterprise client SLA frameworks — P1/P2/P3 tiering, OLA/UC structures.

