About Us:

Proofpoint is a global leader in human- and agent-centric cybersecurity. We protect how people, data, and AI agents connect across email, cloud, and collaboration tools. Over 80 of the Fortune 100, 10,000 large enterprises, and millions of smaller organizations trust Proofpoint to stop threats, prevent data loss, and build resilience across their people and AI workflows. Our mission is simple: safeguard the digital world and empower people to work securely and confidently. Join us in our pursuit to defend data and protect people.

How We Work:

At Proofpoint, you’ll be part of a global team that breaks barriers to redefine cybersecurity, guided by our BRAVE core values:

Bold in how we dream and innovate

Responsive to feedback, challenges and opportunities

Accountable for results and best-in-class outcomes

Visionary in future-focused problem-solving

Exceptional in execution and impact

The Role

We're seeking a Senior Data Engineer to build and maintain the ML/AI data infrastructure powering our email security platform. In this role, you'll design and optimize scalable data pipelines that enable threat detection and investigation while supporting both machine learning models and LLM-powered agents that provide context-aware security insights.

You'll work on our Detection Intelligence Platform (DIP), building feature engineering frameworks and offline/online feature stores that serve as the foundation for ML model research and context engineering for AI agents. You'll collaborate with data scientists, ML engineers, and security researchers to build data models and context stores that power our detection systems and enable human security analysts to investigate threats effectively.

Key Responsibilities:

Develop and maintain scalable data pipelines on AWS/Azure using technologies such as Spark, Airflow, Athena, and Kubernetes to process structured and unstructured email data at scale

Design and optimize Iceberg-based data lake tables and schemas for efficient storage, querying and versioning across petabyte-scale datasets distributed across data centers globally

Build and manage feature engineering frameworks that support offline batch processing and online real-time feature serving for ML model training and inference

Develop and maintain training data pipelines optimized for distributed ML model training, ensuring data lineage and reproducibility

Collaborate with data scientists and security researchers to understand data requirements and translate them into robust, production-grade data solutions

Monitor and optimize data pipeline performance, implementing observability and alerting to ensure data freshness and quality

Mentor junior engineers and foster a culture of engineering excellence and knowledge sharing

Required Experience:

Several years of industry experience building and maintaining distributed data systems and high-scale data pipelines in a managed cloud environment (AWS / Azure / GCP) using big data processing engines such as Spark, Flink, Dask, Ray, Beam, Databricks Workflows or similar

Deep proficiency in Python for developing production-grade data processing code

Strong experience with Infrastructure-as-Code frameworks, particularly Terraform

Solid understanding and hands-on experience with open table formats for data lakes (Apache Iceberg, Hudi, Delta Lake) and data modeling best practices

Experience with AWS Athena, Glue, or similar data query and cataloging services

Experience with Apache Airflow or similar workflow orchestration tools for batch and real-time pipeline management

Demonstrated ability to design and implement scalable ETL/ELT pipelines handling complex data transformations

Excellent communication skills and ability to collaborate effectively with technical and non-technical stakeholders

Good to Have:

Experience with feature engineering frameworks and feature stores (e.g., Feast, Tecton, or custom solutions)

Familiarity with Kubernetes for containerized data workloads and orchestration

Background in building data infrastructure for machine learning and AI applications

Experience with data quality frameworks and observability tools for data pipelines

Why Proofpoint?

At Proofpoint, we believe that an exceptional career experience includes a comprehensive compensation and benefits package. Here are just a few reasons you’ll love working with us:

Competitive compensation
Comprehensive benefits
Career success on your terms
Flexible work environment
Annual wellness and community outreach days
Always-on recognition for your contributions
Global collaboration and networking opportunities

Our Culture:

Our culture is rooted in values that inspire belonging, empower purpose, and drive success, every day, for everyone.

We encourage applications from individuals of all backgrounds, experiences, and perspectives. If you need accommodation during the application or interview process, please reach out to [email protected].

How to Apply

Interested? Submit your application along with any supporting information. We can’t wait to hear from you!
