ICE

Senior Developer, Data Engineer

Hyderabad, India
Posted 1 month ago
Full-time, hybrid

Job Description

Job Purpose

Intercontinental Exchange, Inc. (ICE) is seeking an experienced Senior Data Engineer to join the AI Centre of Excellence (AI CoE) team. The AI CoE drives the adoption of artificial intelligence and advanced analytics across ICE's global exchange and financial data business, bringing together data engineering, machine learning, and applied research to deliver intelligent solutions at scale. This role sits at the foundation of that work, ensuring the data infrastructure, pipelines, and platforms the AI CoE depends on are robust, scalable, and fit for purpose.

The successful candidate will take ownership of the orchestration layer that keeps our data moving reliably, contribute to the lakehouse architecture that provides clean and trusted data to analytical and ML consumers, and develop the transformation logic that converts raw inputs into high-quality, governed data products. The role requires close collaboration with data scientists, ML engineers, and business analysts, and carries meaningful input into the tooling decisions and engineering standards the team adopts.

We are looking for a candidate with demonstrable, production-scale experience in data engineering and pipeline orchestration. Technical depth is essential, as is the ability to produce well-structured, maintainable code and to operate effectively within a collaborative engineering team.

Responsibilities

  • Own Apache Airflow end-to-end. You will write DAGs that handle complex multi-step workflows across ingestion, transformation, validation, and delivery. That means getting the fundamentals right: idempotency, backfill safety, SLA alerting, dynamic task mapping, and thorough testing. You will also review other engineers' DAGs and raise the standard across the team.

  • Keep Airflow running well on Kubernetes. You will manage the deployment using KubernetesExecutor or CeleryKubernetesExecutor, handle Helm upgrades, tune pod templates and resource limits, and deal with scheduler performance issues before they become incidents.

  • Build ETL and ELT pipelines that hold up in production. Sources will vary: APIs, message queues, databases, and object storage. The expectation is that pipelines are fault-tolerant, incremental where it makes sense, and straightforward to debug when something goes wrong.

  • Treat data quality as part of the job, not an afterthought. Schema validation, row-count reconciliation, and anomaly detection should be baked into pipelines from the start. Freshness and accuracy are part of your definition of done.

  • Design and maintain a lakehouse architecture using Apache Iceberg as the primary table format, following a Bronze, Silver, and Gold medallion structure. Schema evolution, partitioning strategies, and time-travel queries will be regular concerns, not edge cases.

  • Use Databricks to manage cluster compute, Delta Lake workflows, Unity Catalog, and scheduled jobs. You will keep things cost-efficient, well-organised, and easy for others to navigate.

  • Build and maintain dbt projects that sit alongside the Airflow orchestration layer. Models, tests, snapshots, and macros should be well-structured and documented. Test failures and freshness checks should surface where people can act on them.

  • Contribute reusable components to the shared platform: pipeline templates, custom Airflow operators, and utility libraries that make the whole team faster and more consistent.
  • Partner with data scientists and ML engineers to deliver feature pipelines and training datasets with the freshness and reproducibility their models need.
  • Instrument your work. Pipelines should have metrics, dashboards, and alerts that surface real problems early. You will own runbooks and participate in on-call cover.

  • Play an active part in architecture discussions, tooling evaluations, and mentoring. Your experience should benefit the team, not just your own workstream.

Knowledge and Experience

  • Apache Airflow: Expert level, with at least three years running Airflow in production. You should know the scheduler internals, executor types, XComs, Connections, Variables, pools, and the TaskFlow API well enough to make trade-off decisions and explain them clearly.

  • Airflow on Kubernetes: Real experience deploying and operating Airflow on Kubernetes via KubernetesExecutor or CeleryKubernetesExecutor: pod templates, resource tuning, persistent volume claims, Helm chart management, and upgrades that do not take the scheduler offline.
  • DAG Development: Strong Python applied to DAG authoring: dynamic task mapping, custom operators and sensors, cross-DAG triggers, parameterised pipelines, and proper test coverage. Writing testable DAGs should feel natural, not optional.

  • ETL and ELT Pipelines: Demonstrated experience building batch and incremental pipelines at scale. You understand transformation logic, data quality validation, and lineage capture, and you have shipped pipelines that other people maintain without you.

  • Data Lakehouse Architecture: A solid grasp of lakehouse design in practice: how to implement a medallion model, when to use which table format, and how to make data reliably available to downstream consumers without constant hand-holding.
  • Apache Iceberg: Hands-on knowledge of the Iceberg table spec: snapshot isolation, partition evolution, hidden partitioning, and time-travel queries. You have managed Iceberg tables in a real catalog environment, whether Hive metastore, a REST catalog, or otherwise.

  • Databricks: Practical Databricks experience covering cluster management, Delta Lake, Workflows or Jobs, and Unity Catalog. Experience with Databricks Asset Bundles or MLflow is a welcome bonus.

  • dbt: Confident building dbt projects from the ground up: models, sources, tests, snapshots, seeds, and macros. You know how to wire dbt into Airflow, write documentation that people actually use, and make freshness checks meaningful.

  • Python: Strong engineering-grade Python: well-structured, tested, and packaged. Comfortable with pandas, PySpark, SQLAlchemy, and Pydantic. You write code other engineers want to read.

Preferred Knowledge and Experience

  • Apache Kafka and Streaming: Experience with near-real-time ingestion using Kafka, Kafka Connect, or Flink, including landing data into lakehouse storage.
  • Apache Spark: PySpark proficiency for large-scale distributed processing, with a feel for memory management, shuffle optimisation, and partition sizing.
  • Kubernetes and OpenShift: Platform knowledge beyond Airflow: Helm, RBAC, and namespace management, enough to self-serve on infrastructure without leaning on platform teams.
  • Data Governance and Contracts: Familiarity with data contract approaches, catalogue tools such as DataHub or Collibra, and quality frameworks like Great Expectations or Soda.
  • Cloud Platforms: Experience with managed data services on AWS, Azure, or GCP: S3, Glue, ADLS, Synapse, GCS, Dataproc, and similar.
  • Infrastructure as Code: Ability to provision data infrastructure using Terraform or Pulumi.
  • ML Feature Engineering: An understanding of feature stores and experience building pipelines that serve model training and inference reliably.
