Data Engineer / 1

Warsaw, Masovian Voivodeship, Poland · Posted 3 months ago

Full-time · Hybrid · Associate

Job Description

Join our team to leverage your data engineering skills in a dynamic environment, ensuring seamless data migration and optimization for advanced AI and ML projects. Apply now to be part of our innovative journey!

Key responsibilities

Data pipeline development:

Design, develop, and deploy Python-based ETL/ELT pipelines to migrate data from the on-premises MS SQL Server into the Databricks instance,
Ensure efficient ingestion of historical Parquet datasets into Databricks.

Data quality & validation:

Implement validation, reconciliation, and quality assurance checks to ensure accuracy and completeness of migrated data,
Handle schema mapping, field transformations, and metadata enrichment to standardize datasets,
Ensure data governance, quality assurance, and compliance are integral to all migration activities.

Performance optimization:

Tune pipelines for speed and efficiency, leveraging Databricks capabilities such as Delta Lake when appropriate,
Manage resource usage and scheduling for large dataset transfers.

Collaboration:

Work closely with AI engineers, data scientists, and business stakeholders to define data access patterns required for upcoming AI POCs,
Partner with infrastructure teams to ensure secure connections between legacy systems and Databricks.

Documentation & governance:

Maintain technical documentation for all data pipelines,
Adhere to data governance, compliance, and security best practices throughout the migration process.

Required skills & experience:

Proven experience in Python for data engineering tasks (PySpark, Pandas, etc.),
Hands-on experience with Databricks and the Spark ecosystem,
Solid understanding of ETL/ELT concepts, data modeling, and pipeline orchestration,
Experience working with Microsoft SQL Server, including direct database connections,
Practical experience ingesting Parquet data and managing large historical datasets,
Knowledge of Delta Lake and structured streaming in Databricks is a plus,
Familiarity with secure data transfer protocols between on-premises environments and cloud platforms,
Strong problem-solving skills and ability to work independently.

Preferred qualifications:

Experience with AI/ML data preparation workflows,
Understanding of data governance and compliance requirements related to customer and contract data,
Familiarity with orchestration tools such as Databricks Workflows or Airflow,
Experience setting up Databricks environments from scratch.

We hereby inform you that Inetum Polska sp. z o.o. has implemented an internal reporting (whistleblowing) procedure. The content of the procedure and the possibility to submit an internal report are available at:

https://inetum.whispli.com/speakup?locale=pl

More jobs at Inetum

PLM/ALM Functional Consultant

Saint-Ouen, France

Java/Angular Developer

Niort, Nouvelle-Aquitaine, France

Senior Integration Engineer / 1

Warsaw, Masovian Voivodeship, Poland

RPA Technical Consultant

Lyon, France

DevOps Integrator (M/F)

Niort, Nouvelle-Aquitaine, France

IT Project Manager (Infrastructure)

Santurtzi, PV, Spain

Similar roles

Data & Digital Solutions Engineer

ASM · US, Arizona, Phoenix

Software Engineer II (IND)-Snowflake/Data fabric-GR-39944-72496-1-JR189252

Carelon Global Solutions India · IND-KA-Bengaluru, Bagmane Solarium City

Energy Data Analyst

World Kinect · Budapest, HU

Summer Intern - Data Science and Analytics

TransUnion · Hong Kong

Data Scientist - MLOps

Barclays UK · Pune, Gera Commerzone SEZ

Director, Data Operations

Barclays UK · Glasgow, Clyde Place