
Senior Tier-4 Model Serving Support Lead
Job Description
Everforth ECS is seeking a Senior Tier-4 Model Serving Support Lead to work in the National Capital Region, covering the Pentagon, Falls Church, and Fairfax. Please note: This position is contingent upon contract award.
The War Data Platform (WDP) is a key initiative within the U.S. Department of War's (DoW) AI-First strategy introduced in early 2026. The WDP focuses on operational warfighting data and aims to accelerate the deployment of artificial intelligence (AI) on the battlefield. The WDP extends to Unclassified, Secret, and Top Secret environments, and supports collaboration between Combatant Commands, Joint Staff directorates, Senior Executive Service leaders, and operational analysts.
The Senior Tier-4 Model Serving Support Lead serves as the authoritative escalation owner for AI and machine learning model-serving pipelines, production endpoints, and model zoo operations across WDP Core Integration's full multi-enclave environment. This role bridges platform engineering, cybersecurity, and cross-service mission partners to sustain uninterrupted AI model-serving performance in direct support of DoW missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership.
• Owns Tier-4 escalation coordination for AI and machine learning model-serving pipelines, production endpoints, and model zoo operations within WDP Core Integration environments supporting DoW missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership.
• Directs escalation workflows by activating incident bridges, coordinating engineering response actions, validating operational impact, and aligning escalation playbooks with service-level agreement requirements.
• Applies Kubernetes, GitLab Continuous Integration, VMware environments, Elastic Stack, Prometheus metrics, Grafana dashboards, and enterprise observability tooling to diagnose serving failures, analyze telemetry, and guide stabilization activities across unclassified and higher-domain enclaves.
• Leads coordination with Platform One, Cloud One, multinational engineering teams, and cross-service mission partners to maintain operational readiness for serving pipelines, cross-domain transfer workflows, API endpoints, and model-runtime components.
• Conducts structured post-incident analysis by collecting operational evidence, reconstructing failure sequences, validating remediation steps, and documenting mission-assurance considerations for future release cycles.
• Produces mission-critical deliverables including escalation playbooks, incident-response documentation, service-level alignment reports, operational risk assessments, and restoration summaries.
• Reinforces deployment consistency, mission assurance, and operational continuity across all enclaves.
• Supports enterprise release operations by coordinating readiness checks, validating rollback pathways, and maintaining the authoritative Tier-4 support artifacts required for uninterrupted AI model-serving performance.
• Performs other duties as assigned.