Staff Data Engineer - Data Lake

New York · Posted Yesterday

Full time · Hybrid

Job Description

At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit h1.co to learn more about us.

Data Engineering is responsible for the development and delivery of our most important asset—our data. With thousands of data sources from around the world, the team ensures that data is accurate, normalized, and delivered at a velocity that keeps up with real-world changes. As we expand our markets and the scope of data we provide to our customers, our team must scale to meet that demand.

WHAT YOU'LL DO AT H1

As a Staff Data Engineer on the Data Lake team at H1, you will play a critical role in shaping the architecture, scalability, reliability, and long-term direction of our core data platform. This role is designed for a highly technical engineer who is excited to move toward an Engineering Manager track while remaining deeply hands-on.

The Data Lake is the foundation of H1’s platform, responsible for the validation, accuracy, standardization, and quality of the data powering every downstream product and team across the organization. You will help lead the evolution of this platform while supporting and mentoring a growing team of engineers.

You will:
- Architect, build, and scale distributed ETL/ELT pipelines and large-scale ingestion frameworks across structured and unstructured healthcare datasets.
- Lead the evolution of H1’s Data Lake architecture with a focus on scalability, observability, reliability, and cost optimization.
- Own and improve data quality, validation, normalization, and standardization workflows across thousands of global data sources.
- Design and optimize batch and near real-time data processing frameworks using cloud-native distributed systems.
- Optimize distributed compute and storage systems, including Spark workloads, query performance, partitioning strategies, and infrastructure efficiency.
- Drive improvements in monitoring, governance, operational excellence, and production reliability across the platform.
- Troubleshoot complex production data and infrastructure issues across distributed systems.
- Partner closely with Product, Infrastructure, Security, Compliance, and downstream engineering teams to support scalable and secure data delivery.
- Mentor engineers through technical leadership, architecture reviews, and engineering best practices.
- Help define technical roadmap priorities and contribute to long-term platform strategy and execution planning.
- Support production operations, incident response, and platform health as part of overall ownership of the Data Lake ecosystem.

ABOUT YOU

You are a highly technical data engineer who thrives in lean, high-ownership environments and enjoys solving complex distributed systems challenges. You are excited by the opportunity to influence technical direction, mentor engineers, and grow into broader engineering leadership responsibilities while remaining hands-on.

- You have deep experience designing and scaling distributed data platforms and large-scale pipelines in cloud-native environments.
- You excel at building reliable, observable, and maintainable data systems supporting critical business and analytics workloads.
- You have strong expertise in distributed processing, performance optimization, and modern data architecture patterns.
- You are comfortable leading technical initiatives and influencing architecture decisions across teams.
- You communicate effectively with both technical and non-technical stakeholders.
- You enjoy mentoring engineers and helping raise the engineering bar across teams.
- You are energized by ownership, autonomy, and solving ambiguous technical challenges.

REQUIREMENTS

- 8+ years of experience in data engineering, software engineering, or related fields with significant experience building and scaling distributed data platforms.
- Demonstrated technical leadership, with interest in or experience mentoring and leading engineers.
- Strong proficiency in Python (PySpark), Java, Scala, or similar programming languages.
- Advanced SQL expertise, including performance tuning and optimization across large datasets.
- Deep experience with Apache Spark and cloud-native big data platforms, preferably within AWS environments (EMR, Glue, S3, Athena, Redshift, or similar).
- Experience designing and scaling modern cloud-native data lake architectures and large-scale ingestion frameworks.
- Experience with orchestration and workflow management tools such as Argo, Airflow, or similar technologies.
- Strong understanding of distributed storage systems, partitioning strategies, and file formats such as Parquet, Avro, and ORC.
- Experience with Docker, Kubernetes, and modern containerization technologies.
- Experience implementing monitoring, observability, and data quality frameworks within production environments.
- Experience with large-scale data cleaning, parsing, normalization, and validation workflows preferred.
- Experience working with healthcare, life sciences, publication, or large-scale entity-resolution datasets preferred.
- Exposure to ML/AI-driven data enrichment, parsing, or validation workflows is a plus.
- Experience using AI-assisted coding tools (e.g., GitHub Copilot, Claude Code) to accelerate development while maintaining quality is encouraged.

COMPENSATION

This role pays $170,000 to $190,000 per year, based on experience, in addition to stock options.

Anticipated role close date: 8/1/2026

H1 OFFERS

- Full suite of health insurance options, in addition to generous paid time off
- Pre-planned company-wide wellness holidays
- Retirement options
- Health & charitable donation stipends
- Impactful Business Resource Groups
- Flexible work hours & the opportunity to work from anywhere
- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe

H1 is proud to be an equal opportunity employer that celebrates diversity and is committed to creating an inclusive workplace with equal opportunity for all applicants and teammates. Our goal is to recruit the most talented people from a diverse candidate pool regardless of race, color, ancestry, national origin, religion, disability, sex (including pregnancy), age, gender, gender identity, sexual orientation, marital status, veteran status, or any other characteristic protected by law.

H1 is committed to working with and providing access and reasonable accommodation to applicants with mental and/or physical disabilities. If you require an accommodation, please reach out to your recruiter once you've begun the interview process. All requests for accommodations are treated discreetly and confidentially, as practical and permitted by law.