
Senior - Cloud Engineer, Development (51410)
Job Description
Citrin Cooperman offers a dynamic work environment, fostering professional growth and collaboration. We’re continuously seeking talented individuals who bring a problem-solving mindset, fresh perspectives, and sharp technical expertise. We know you have choices, so our team of collaborative, innovative professionals is ready to support your professional development. At Citrin Cooperman, we offer competitive compensation and benefits and, most importantly, the flexibility to manage your personal and professional life to focus on what matters most to you!
We are seeking a Senior – Cloud Engineer, Development, to join our Development team within the Information Technology department. The newly formed AI Solutions team is the vanguard of our firm’s enterprise AI competency, tasked with transitioning successful AI and Generative AI pilots out of the sandbox into secure, scalable, and resilient production environments. You’ll architect the deployment standards, guardrails, and infrastructure that’ll define how AI operates across the business.
In this pioneering role, you’ll be responsible for the “industrialization” of AI infrastructure. You’ll look past the pilot phase to solve complex challenges related to enterprise integrations, operational readiness, AI tool standard-setting, and network security. With our pilots leveraging a mix of proprietary models (Anthropic, Google, OpenAI), evaluation tools, and custom agent platforms like LangGraph, you’ll design the robust CI/CD pipelines, API routing, and infrastructure-as-code required to manage this diverse ecosystem. This role requires a high-caliber engineer who views pilot transition not as technical debt resolution, but as the critical engineering required to turn a proof-of-concept into a mission-critical, secure enterprise asset.
Responsibilities include, but are not limited to:
- Multi-LLM Infrastructure Architecture: Design and provision the secure networking, API gateways, and load balancing required to seamlessly and securely route traffic between internal applications and multiple LLM providers (Anthropic, OpenAI, Google, etc.).
- Agentic Framework Deployment: Engineer the hosting and scaling strategies for complex, stateful AI applications (e.g., containerizing and deploying LangGraph agents), ensuring high availability and fault tolerance.
- AI Security & Guardrails: Implement stringent network security, including private endpoints, VNET integration, and firewall rules for all AI services. Establish infrastructure guardrails to prevent data leakage and ensure compliance with enterprise infosec policies.
- LLMOps & CI/CD Tooling: Build and maintain automated CI/CD pipelines specifically tailored for non-deterministic applications. Deploy and manage the infrastructure required to host internal LLM evaluation and tracing tools.
- Infrastructure-as-Code (IaC) Standards: Develop modular, reusable IaC templates (e.g., Terraform, Bicep) for standardizing how AI pilots are stamped out into production environments.
- Cost Management & Operations Readiness: Implement robust monitoring, alerting, and cost-tracking mechanisms for token usage and computing resources. Prepare the environment and documentation for eventual handover to enterprise operations.
The ideal candidate must:
- Have a bachelor’s degree in computer science, information technology, engineering, or equivalent practical experience.
- Be Microsoft Certified: Azure Administrator Associate (AZ-104).
- Be Microsoft Certified: Fabric Analytics Engineer Associate (DP-600).
- Be Microsoft Certified: Azure Network Engineer Associate (AZ-700).
- Be Microsoft Certified: Azure Security Engineer Associate (AZ-500).
- Have 5+ years of cloud engineering and architecture experience in highly regulated or complex enterprise environments.
- Have deep expertise in cloud networking, security, and identity management (Azure/AWS preferred) with a strong command of Infrastructure-as-Code (Terraform, Bicep, or CloudFormation).
- Have experience with containerization and orchestration (Docker, Kubernetes) and serverless compute paradigms.
- Have hands-on experience configuring and securing API Gateways and managing routing for third-party SaaS or AI model APIs.
- Be familiar with the infrastructure needs of modern AI/ML workflows, including MLOps/LLMOps principles, model registries, and evaluation platform hosting.
- Have experience building resilient CI/CD pipelines for complex software deployments.
- Have a Security-First Mentality: Naturally anticipates vulnerabilities and builds infrastructure with defense-in-depth principles.
- Be a Systems Thinker: Able to see the entire enterprise architecture and understand how a localized AI pilot will impact the broader network and compute ecosystems.
- Be a Resilient Problem Solver: Thrives when navigating the ambiguity of emerging AI technologies and bringing order, standardization, and operational rigor to them.