Data Engineer / 1

Warsaw, Masovian Voivodeship, Poland
Posted 3 months ago
Full-time, hybrid, Associate

Job Description

Join our team to leverage your data engineering skills in a dynamic environment, ensuring seamless data migration and optimization for advanced AI and ML projects. Apply now to be part of our innovative journey!

Key responsibilities

Data pipeline development:

  • Design, develop, and deploy Python-based ETL/ELT pipelines to migrate data from the on-premises MS SQL Server into the Databricks instance,
  • Ensure efficient ingestion of historical Parquet datasets into Databricks.

Data quality & validation:

  • Implement validation, reconciliation, and quality assurance checks to ensure accuracy and completeness of migrated data,
  • Handle schema mapping, field transformations, and metadata enrichment to standardize datasets,
  • Ensure data governance, quality assurance, and compliance are integral to all migration activities.

Performance optimization:

  • Tune pipelines for speed and efficiency, leveraging Databricks capabilities such as Delta Lake when appropriate,
  • Manage resource usage and scheduling for large dataset transfers.

Collaboration:

  • Work closely with AI engineers, data scientists, and business stakeholders to define data access patterns required for upcoming AI POCs,
  • Partner with infrastructure teams to ensure a secure connection between legacy systems and Databricks.

Documentation & governance:

  • Maintain technical documentation for all data pipelines,
  • Adhere to data governance, compliance, and security best practices throughout the migration process.

Required skills & experience:

  • Proven experience in Python for data engineering tasks (PySpark, Pandas, etc.),
  • Hands-on experience with Databricks and the Spark ecosystem,
  • Solid understanding of ETL/ELT concepts, data modeling, and pipeline orchestration,
  • Experience working with Microsoft SQL Server, including direct database connections,
  • Practical experience ingesting Parquet data and managing large historical datasets,
  • Knowledge of Delta Lake and Structured Streaming in Databricks is a plus,
  • Familiarity with secure data transfer protocols between on-premises environments and cloud platforms,
  • Strong problem-solving skills and ability to work independently.

Preferred qualifications:

  • Experience with AI/ML data preparation workflows,
  • Understanding of data governance and compliance requirements related to customer and contract data,
  • Familiarity with orchestration tools such as Databricks Workflows or Airflow,
  • Experience in setting up Databricks environments from scratch.

We hereby inform you that Inetum Polska sp. z o.o. has implemented an internal reporting (whistleblowing) procedure. The content of the procedure and the possibility to submit an internal report are available at:

https://inetum.whispli.com/speakup?locale=pl

Data Engineer / 1 at Inetum | Renata