Data Engineer / 1
Job Description
Join our team to leverage your data engineering skills in a dynamic environment, ensuring seamless data migration and optimization for advanced AI and ML projects. Apply now to be part of our innovative journey!
Key responsibilities
Data pipeline development:
- Design, develop, and deploy Python-based ETL/ELT pipelines to migrate data from the on-premises MS SQL Server into the Databricks instance,
- Ensure efficient ingestion of historical Parquet datasets into Databricks.
Data quality & validation:
- Implement validation, reconciliation, and quality assurance checks to ensure accuracy and completeness of migrated data,
- Handle schema mapping, field transformations, and metadata enrichment to standardize datasets,
- Ensure data governance, quality assurance, and compliance are integral to all migration activities.
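To illustrate the kind of reconciliation check described above, here is a minimal sketch in plain Python. It assumes source and target rows have already been fetched as lists of dictionaries; the names `row_fingerprint`, `ReconciliationReport`, and `reconcile` are hypothetical and are not part of any Databricks or SQL Server API. In a real migration the same idea would more likely be expressed with PySpark aggregations over the full tables.

```python
from dataclasses import dataclass
import hashlib


def row_fingerprint(row: dict, columns: list[str]) -> str:
    """Hash the selected columns in a fixed order so equal rows give equal digests."""
    joined = "\x1f".join(repr(row.get(col)) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


@dataclass
class ReconciliationReport:
    source_count: int
    target_count: int
    missing_in_target: list      # keys present in source only
    unexpected_in_target: list   # keys present in target only
    mismatched: list             # keys present in both, with differing values

    @property
    def ok(self) -> bool:
        """True only when counts match and no key-level differences were found."""
        return (
            self.source_count == self.target_count
            and not self.missing_in_target
            and not self.unexpected_in_target
            and not self.mismatched
        )


def reconcile(source_rows, target_rows, key, columns) -> ReconciliationReport:
    """Compare two row sets by key and by a fingerprint of the chosen columns."""
    source_rows = list(source_rows)
    target_rows = list(target_rows)
    # Duplicate keys collapse in these dicts; the raw counts below still expose them.
    src = {r[key]: row_fingerprint(r, columns) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target_rows}
    return ReconciliationReport(
        source_count=len(source_rows),
        target_count=len(target_rows),
        missing_in_target=sorted(src.keys() - tgt.keys()),
        unexpected_in_target=sorted(tgt.keys() - src.keys()),
        mismatched=sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    )
```

A caller would pass the migrated and original rows plus a key column, for example `reconcile(source, target, key="id", columns=["name"])`, and inspect `report.ok` before marking a table as migrated.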
Performance optimization:
- Tune pipelines for speed and efficiency, leveraging Databricks capabilities such as Delta Lake when appropriate,
- Manage resource usage and scheduling for large dataset transfers.
Collaboration:
- Work closely with AI engineers, data scientists, and business stakeholders to define data access patterns required for upcoming AI POCs,
- Partner with infrastructure teams to ensure a secure connection between legacy systems and Databricks.
Documentation & governance:
- Maintain technical documentation for all data pipelines,
- Adhere to data governance, compliance, and security best practices throughout the migration process.
Required skills & experience:
- Proven experience in Python for data engineering tasks (PySpark, Pandas, etc.),
- Hands-on experience with Databricks and the Spark ecosystem,
- Solid understanding of ETL/ELT concepts, data modeling, and pipeline orchestration,
- Experience working with Microsoft SQL Server, including direct database connections,
- Practical experience ingesting Parquet data and managing large historical datasets,
- Knowledge of Delta Lake and Structured Streaming in Databricks is a plus,
- Familiarity with secure data transfer protocols between on-premises environments and cloud platforms,
- Strong problem-solving skills and ability to work independently.
Preferred qualifications:
- Experience with AI/ML data preparation workflows,
- Understanding of data governance and compliance requirements related to customer and contract data,
- Familiarity with orchestration tools such as Databricks Workflows or Airflow,
- Experience setting up Databricks environments from scratch.
We hereby inform you that Inetum Polska sp. z o.o. has implemented an internal reporting (whistleblowing) procedure. The content of the procedure and the possibility to submit an internal report are available at: