Job Description
The Data Engineering team is responsible for designing, building, and maintaining the Market Data Platform — a lakehouse infrastructure spanning the full path from raw exchange feeds to reliable, petabyte-scale data for research, backtesting, and real-time trading.
Key Responsibilities
- Capture & Ingestion. Own the full capture path from wire to lake: decode and normalize raw exchange feeds (pcap, multicast UDP / ITCH / FIX) and vendor sources (OneTick, Refinitiv, Bloomberg, ICE) into a unified canonical model with nanosecond timestamps. Build batch + stream pipelines (Airflow, Spark, dbt) for tick and reference data. Own L2/L3 order-book reconstruction with gap handling. Provide Python and Rust producer SDKs for internal feed handlers.
- Storage & Modeling (Apache Iceberg). Own the Iceberg-over-S3 lakehouse: design partitioning, sort orders, and row-group layout for fast scans; manage schema evolution, snapshots, time travel, compaction, and TTL. Maintain reference data as slowly-changing tables with point-in-time correctness for backtests. Drive storage cost optimization via compaction, tiering, and snapshot expiry.
- Tooling & Libraries. Build libraries for schema management, data contracts, validation, and lineage on top of the Iceberg catalog. Develop shared access services (Spark + Polars) so research, backtesting, and trading share one normalized data layer, including gap detection and pcap-vs-lake reconciliation.
- Reliability & Observability. Embed monitoring, alerting, SLAs/SLOs, and CI/CD across capture and pipeline layers on Kubernetes (EKS). Own data-quality dashboards and incident runbooks for the capture fleet.
- Collaboration. Partner with Quant Research, Data Science, Backend, and DevOps to translate requirements into platform capabilities and champion market-data engineering best practices.
Requirements
- 5+ years building production-grade data systems, with proven expertise architecting and launching data lakes / lakehouses from scratch.
- Hands-on experience with Apache Iceberg (or comparable table formats — Delta / Hudi): partitioning, schema evolution, snapshots, compaction, and catalog operations; familiarity with Apache Arrow for zero-copy, columnar in-memory interchange.
- Experience with market data and/or network packet capture — decoding pcap, exchange feed protocols (ITCH, FIX/FAST, multicast UDP), order-book reconstruction, and time-series at scale (strong plus; willingness to learn required).
- Experience normalizing market data from multiple vendors — e.g. OneTick, Refinitiv/Reuters, Bloomberg, ICE — into a unified schema and symbology (strong plus).
- Expert-level Python (incl. Polars and/or PySpark); Rust is a strong plus (relevant for high-performance capture/decoding).
- Experience with modern orchestration (Airflow) and distributed processing (Apache Spark).
- Advanced SQL: complex aggregations, window functions, query optimization, partition pruning.
- Solid fundamentals in Linux, containerization (Docker, Kubernetes / EKS), and cloud object storage (AWS S3).
- DevOps & observability: CI/CD, infrastructure-as-code (Terraform), GitOps (ArgoCD), and metrics/dashboards/alerting (Grafana, Prometheus).
- Strong grasp of structured, unstructured, and binary data, and of storage optimization: partitioning, compression, and cost management.
- English fluency for documentation and collaboration in an international team.
We Offer
- Work in a modern IT company — no bureaucracy or legacy systems.
- Real opportunities for professional growth and to make your mark.
- Fully remote work from anywhere in the world, on a flexible schedule.
- Compensation for health insurance, sports, professional development, and more.