Senior Developer, Data Engineer

Hyderabad, India

Posted 1 month ago

Full-time, hybrid

Job Description

Job Purpose

Intercontinental Exchange, Inc. (ICE) is seeking an experienced Senior Data Engineer to join the AI Centre of Excellence (AI CoE) team. The AI CoE drives the adoption of artificial intelligence and advanced analytics across ICE's global exchange and financial data business, bringing together data engineering, machine learning, and applied research to deliver intelligent solutions at scale. This role sits at the foundation of that work, ensuring the data infrastructure, pipelines, and platforms the AI CoE depends on are robust, scalable, and fit for purpose.

The successful candidate will take ownership of the orchestration layer that keeps our data moving reliably, contribute to the lakehouse architecture that provides clean and trusted data to analytical and ML consumers, and develop the transformation logic that converts raw inputs into high-quality, governed data products. The role requires close collaboration with data scientists, ML engineers, and business analysts, and carries meaningful input into the tooling decisions and engineering standards the team adopts.

We are looking for a candidate with demonstrable, production-scale experience in data engineering and pipeline orchestration. Technical depth is essential, as is the ability to produce well-structured, maintainable code and to operate effectively within a collaborative engineering team.

Responsibilities

Own Apache Airflow end-to-end. You will write DAGs that handle complex multi-step workflows across ingestion, transformation, validation, and delivery. That means getting the fundamentals right: idempotency, backfill safety, SLA alerting, dynamic task mapping, and thorough testing. You will also review other engineers' DAGs and raise the standard across the team.
Keep Airflow running well on Kubernetes. You will manage the deployment using KubernetesExecutor or CeleryKubernetesExecutor, handle Helm upgrades, tune pod templates and resource limits, and deal with scheduler performance issues before they become incidents.
Build ETL and ELT pipelines that hold up in production. Sources will vary: APIs, message queues, databases, object storage. The expectation is that pipelines are fault-tolerant, incremental where it makes sense, and straightforward to debug when something goes wrong.
Treat data quality as part of the job, not an afterthought. Schema validation, row-count reconciliation, and anomaly detection should be baked into pipelines from the start. Freshness and accuracy are part of your definition of done.
Design and maintain a lakehouse architecture using Apache Iceberg as the primary table format, following a Bronze, Silver, and Gold medallion structure. Schema evolution, partitioning strategies, and time-travel queries will be regular concerns, not edge cases.
Use Databricks to manage cluster compute, Delta Lake workflows, Unity Catalog, and scheduled jobs. You will keep things cost-efficient, well-organised, and easy for others to navigate.
Build and maintain dbt projects that sit alongside the Airflow orchestration layer. Models, tests, snapshots, and macros should be well-structured and documented. Test failures and freshness checks should surface where people can act on them.
Contribute reusable components to the shared platform: pipeline templates, custom Airflow operators, and utility libraries that make the whole team faster and more consistent.
Partner with data scientists and ML engineers to deliver feature pipelines and training datasets with the freshness and reproducibility their models need.
Instrument your work. Pipelines should have metrics, dashboards, and alerts that surface real problems early. You will own runbooks and participate in on-call cover.
Play an active part in architecture discussions, tooling evaluations, and mentoring. Your experience should benefit the team, not just your own workstream.

Knowledge and Experience

Apache Airflow: Expert level, with at least three years running Airflow in production. You should know the scheduler internals, executor types, XComs, Connections, Variables, pools, and the TaskFlow API well enough to make trade-off decisions and explain them clearly.
Airflow on Kubernetes: Real experience deploying and operating Airflow on Kubernetes via KubernetesExecutor or CeleryKubernetesExecutor, including pod templates, resource tuning, persistent volume claims, Helm chart management, and upgrades that do not take the scheduler offline.
DAG Development: Strong Python applied to DAG authoring: dynamic task mapping, custom operators and sensors, cross-DAG triggers, parameterised pipelines, and proper test coverage. Writing testable DAGs should feel natural, not optional.
ETL and ELT Pipelines: Demonstrated experience building batch and incremental pipelines at scale. You understand transformation logic, data quality validation, and lineage capture, and you have shipped pipelines that other people maintain without you.
Data Lakehouse Architecture: A solid grasp of lakehouse design in practice: how to implement a medallion model, when to use which table format, and how to make data reliably available to downstream consumers without constant hand-holding.
Apache Iceberg: Hands-on knowledge of the Iceberg table spec: snapshot isolation, partition evolution, hidden partitioning, and time-travel queries. You have managed Iceberg tables in a real catalog environment, whether Hive metastore, a REST catalog, or otherwise.
Databricks: Practical Databricks experience covering cluster management, Delta Lake, Workflows or Jobs, and Unity Catalog. Experience with Databricks Asset Bundles or MLflow is a welcome bonus.
dbt: Confident building dbt projects from the ground up: models, sources, tests, snapshots, seeds, and macros. You know how to wire dbt into Airflow, write documentation that people actually use, and make freshness checks meaningful.
Python: Strong engineering-grade Python: well-structured, tested, and packaged. Comfortable with pandas, PySpark, SQLAlchemy, and Pydantic. You write code other engineers want to read.

Preferred Knowledge and Experience

Apache Kafka and Streaming: Experience with near-real-time ingestion using Kafka, Kafka Connect, or Flink, including landing data into lakehouse storage.
Apache Spark: PySpark proficiency for large-scale distributed processing, with a feel for memory management, shuffle optimisation, and partition sizing.
Kubernetes and OpenShift: Platform knowledge beyond Airflow: Helm, RBAC, and namespace management, enough to self-serve on infrastructure without leaning on platform teams.
Data Governance and Contracts: Familiarity with data contract approaches, catalogue tools such as DataHub or Collibra, and quality frameworks like Great Expectations or Soda.
Cloud Platforms: Experience with managed data services on AWS, Azure, or GCP: S3, Glue, ADLS, Synapse, GCS, Dataproc, and similar.
Infrastructure as Code: Ability to provision data infrastructure using Terraform or Pulumi.
ML Feature Engineering: An understanding of feature stores and experience building pipelines that serve model training and inference reliably.
