Technical Consultant Int - Azure Platform, Monitoring & Observability, Identity Access Management
Job Description
Scope:
The Monitoring & Observability Engineer is responsible for ensuring end-to-end visibility across the Azure platform by implementing and maintaining monitoring, alerting, and observability solutions. This role focuses on proactively monitoring platform health, performance, availability, and security, enabling rapid detection and resolution of issues while supporting reliable and efficient platform operations. The engineer plays a key role in fostering a proactive operational culture through continuous monitoring, analytics, and operational insights.
Our current technical environment:
Our technical environment includes modern cloud and DevOps tooling with IaC (Terraform, Ansible, ARM, Bicep), CI/CD (Azure DevOps, Jenkins, GitHub Actions), and container orchestration (Docker, Kubernetes, AKS). We leverage Azure and OCI platforms, automation frameworks, microservices architecture, and observability tools, while also adopting emerging technologies such as GenAI and AI/ML.
What you’ll do:
- Monitor the health, availability, performance, and security of Azure platform services using Azure Monitor, Log Analytics, Application Insights, and Elastic.
- Maintain and monitor dashboards, alerts, and key operational metrics across platform services including IAM, APIM, MongoDB, Stratosphere, Portal Shell, Portal Collaboration, and Event Framework.
- Respond to monitoring alerts, perform initial triage, and escalate incidents to appropriate L2/L3 teams in accordance with defined procedures.
- Monitor authentication services, token issuance processes, and access management operations within Azure AD / Entra ID to ensure service availability and compliance.
- Track API gateway performance metrics, including latency, error rates, throttling events, and quota utilization, and report anomalies to support teams.
- Review logs, traces, and monitoring data to identify operational issues, performance degradation, and potential service disruptions.
- Execute synthetic monitoring checks and validate end-to-end user journeys to ensure platform functionality and availability.
- Follow established runbooks and operational procedures to support incident resolution and routine maintenance activities.
- Collaborate with engineering and operations teams to improve monitoring coverage, alert accuracy, and operational efficiency.
- Participate in shift operations, monitoring reviews, and continuous improvement initiatives aimed at reducing incident response times and enhancing platform reliability.
What we are looking for:
- 2–4 years in Azure cloud operations or SRE roles.
- Immediate joiners preferred.
- BE/BTech or equivalent engineering degree required.
- Relocation to Coimbatore preferred.
- Interviews are conducted in person.
- Strong command of Azure platform fundamentals.
- Expertise in identity and access management (IAM) tools.
- Experience with monitoring tools and Kubernetes.
- Hands-on experience with KQL, Log Analytics workspaces, and Azure Workbooks.
- Demonstrated ability to design alert hierarchies and reduce alert fatigue.
- Familiarity with APIM diagnostic settings and event hub log forwarding.
- Experience monitoring MongoDB Atlas or similar NoSQL databases.
- Knowledge of OAuth 2.0 / OIDC flows for IAM health monitoring.
- Exposure to event-driven architectures (Azure Event Grid, Service Bus, Event Hubs).
- Strong communication skills, including the ability to translate metrics into business impact.
- AZ-900 / AZ-104 / AZ-204 certifications preferred.
- Strong exposure to cloud technologies.
- Experience in application and production monitoring and support.
Our Values
If you want to know the heart of a company, take a look at its values. Ours unite us. They are what drive our success and the success of our customers. Does your heart beat like ours? Find out here: Core Values
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.