Job Description
WHAT MAKES US, US
Join some of the most innovative thinkers in FinTech as we lead the evolution of financial technology. If you are an innovative, curious, collaborative person who embraces challenges and wants to grow, learn and pursue outcomes with our prestigious financial clients, say Hello to SimCorp!
At its foundation, SimCorp is guided by our values: caring, customer success-driven, collaborative, curious, and courageous. Our people-centered organization focuses on skills development, relationship building, and client success. We take pride in cultivating an environment where all team members can grow and feel heard, valued, and empowered.
If you like what we’re saying, keep reading!
WHY THIS ROLE IS IMPORTANT TO US
SimCorp’s Observability strategy is to deliver a consistent and coherent Observability approach across the full SimCorp One ecosystem, spanning the different technology stacks, products, and services across the organization. This is a prerequisite for observing SimCorp products and services seamlessly and for investigating emerging problems efficiently, so that we can provide high-quality software to our clients and stay within agreed resolution times. It also provides insight into KPIs, SLOs, SLAs, and cost attribution.
As a Lead Site Reliability Engineer – Observability, you will blend site reliability engineering principles with deep telemetry expertise to ensure system visibility, uptime, and performance. You must have in-depth knowledge of and expertise in telemetry data collection, analysis, and implementation, and fully understand the intricacies of different telemetry sources, such as metrics, traces, logs, and events, and how to derive meaningful insights from them.
You will work closely with product management, architects, and engineering teams to establish unified visibility across the full stack, from LLM‑driven agents to backend services. You won’t just monitor systems; you’ll define the patterns and tools that are a core part of empowering and driving SimCorp’s engineering culture. Your contributions will drive stability, continuous improvement, and operational excellence in our Azure-based environments. This role blends hands-on engineering, incident response, platform configuration, and service quality, guided by ITIL and SRE best practices.
WHAT YOU WILL BE RESPONSIBLE FOR
Support the operation and enhancement of mission-critical environments for both new and existing Cloud Native products & services.
Deploy and manage instrumentation for applications to gain granular insights into service health.
Assist engineering teams in implementing and maintaining metrics, logs, and traces for applications & infrastructure
Unify observability tooling across teams, ensuring metrics, logs, and traces flow into a central platform (e.g., Application Insights or equivalent).
Enable and configure OpenTelemetry-based data collection within Azure Monitor Application Insights by leveraging the Azure Monitor OpenTelemetry Distro.
Make sure AI agent frameworks adopt the OpenTelemetry semantic conventions to ensure interoperability and consistency in observability data.
Work with product development teams to enable structured logging, basic distributed tracing, and core metrics.
Support incident response by gathering logs, metrics, and traces to perform root cause analysis using observability tools.
Build tools and automation to eliminate toil and improve engineering velocity, developer experience, and system reliability.
Define and manage SLOs and error budgets in partnership with Engineering teams.
Work flexibly across regular and evening shifts on a rotational basis, and provide weekend or on-call support as needed.
Collaborate with Agile teams and take part in design discussions with clients, vendors, and stakeholders.
Contribute to knowledge sharing across multiple Product Areas.
Leverage a strong foundation in ITIL practices, including problem, change, and incident management
WHAT WE VALUE
Bachelor’s degree in Computer Science or related field (Master’s is a plus)
5+ years of experience in Site Reliability, Observability, DevOps, or Cloud Engineering roles
Must have expertise with Microsoft Azure Cloud.
Must have experience working with observability frameworks like OpenTelemetry and distributed tracing systems
Expertise in Infrastructure as Code (IaC) using Bicep, ARM and Terraform.
Strong understanding of instrumenting, tracing, and correlating AI/LLM workflows with infrastructure telemetry.
Solid experience with monitoring and logging tools (Azure Monitor, Application Insights, Datadog, Log Analytics).
Knowledge of AI/ML-based anomaly detection, log aggregation and analysis tools like Microsoft Azure Anomaly Detector or equivalent.
Experience with Agentic/LLM‑based systems (like LangChain, Celery, OpenAI APIs, orchestration frameworks)
Experience working with application reliability platforms like Checkly or equivalent
Experience setting up synthetic monitoring using Playwright or equivalent
Solid understanding of networking and containerization (Kubernetes, Docker)
Good understanding of APIs, scripting and query languages like PowerShell, Bash, and Kusto, and databases like SQL, Cosmos DB, and PostgreSQL
Familiarity with SimCorp Dimension and experience as a Salesforce user is a plus
Proficiency in IT service management (ITSM) frameworks like ITIL, focusing on incident, change, and problem management to improve operational efficiency
Experience managing both onboarding projects and live production operations
Collaborative mindset and ability to work in cross-functional teams
Interest in continuous learning and growth within our Product Area
BENEFITS
Global hybrid work policy – We ask you to work 2 days a week from the office. If you choose, you can work remotely on the other days. Of course, you are welcome at the office if that is your preference.
Culture – Inclusive and diverse company culture
Work-life balance – We believe that an equilibrium between professional responsibilities and private life makes us all the best version of ourselves, both in private life and as colleagues in the workplace
Empowerment – We believe that all voices are valuable and must be heard. You will be involved in shaping our work processes
Career & Growth – SimCorp offers opportunities for professional development. There is never just one route, so we offer an individual approach to professional development to support the direction you want to take.
NEXT STEPS
Please send us your application in English via our career site as soon as possible; we process incoming applications continually. Please note that only applications sent through our system will be processed. At SimCorp, we recognize that bias can unintentionally occur in the recruitment process. To uphold fairness and equal opportunities for all applicants, we kindly ask you to exclude personal data such as photos, age, or any non-professional information from your application. Thank you for helping us mitigate bias in our recruitment process.
For any questions, you are welcome to contact Shweta Goyal ([email protected]), Talent Acquisition Partner. If you are interested in being a part of SimCorp but are not sure this role is suitable, submit your CV anyway. SimCorp is on an exciting growth journey, and our Talent Acquisition Team is ready to help you discover the right role for you. We will need approximately three weeks to consider your CV.
We are eager to continually improve our talent acquisition process and make everyone’s experience positive and valuable. Therefore, during the process we will ask you to provide your feedback, which is highly appreciated.
WHO WE ARE
For over 50 years, we have worked closely with investment and asset managers to become the world’s leading provider of integrated investment management solutions. We are 3,000+ colleagues with a broad range of nationalities, education, professional experiences, ages, and backgrounds.
SimCorp is an independent subsidiary of the Deutsche Börse Group. Following the recent merger with Axioma, we leverage the combined strength of our brands to provide an industry-leading, full, front-to-back offering for our clients.
SimCorp is an equal opportunity employer and welcomes applicants from all backgrounds, without regard to race, gender, age, disability, or any other protected status under applicable law. We are committed to building a culture where diverse perspectives and expertise are integrated into our everyday work. We believe in the continual growth and development of our employees, so that we can provide best-in-class solutions to our clients.
