Job Description
Key Responsibilities
- Contribute to the design and development of scalable data pipelines and a growing data lake
- Build and extend data processing workflows using Python, Apache Spark, and Databricks
- Define technical standards, best practices, and reusable frameworks for data engineering
- Ensure data quality, reliability, performance, and maintainability across data solutions
- Support data modeling, data integration, and transformation processes for analytics and reporting
- Drive automation, monitoring, and CI/CD improvements to ensure operational excellence
- Collaborate across teams, acting as a technical interface between the data platform and engineering, analytics, and business stakeholders
- Contribute to architecture decisions and long-term data platform strategy
Your Profile
- Outstanding programming experience, preferably in Python; able to write clean, testable, production-grade code
- Strong SQL skills and familiarity with structured and semi-structured data formats (JSON, Protobuf, Delta format)
- Hands-on experience with Apache Spark, ideally on Databricks, and understanding of the medallion architecture
- Solid grasp of data lakehouse principles, data modeling, and data governance concepts
- Experience building and maintaining CI/CD pipelines (e.g. GitLab CI); familiarity with IaC and deployment
- Cloud Platforms: Experience with AWS or comparable cloud providers; familiarity with Databricks as a managed Lakehouse platform
- Experience with event-driven architectures or streaming platforms (e.g. Kafka)
- Proven track record deploying, monitoring, and maintaining data pipelines and services in production environments; experience with testing practices
- Able to work autonomously and take ownership of tasks end-to-end
- Clear and concise communicator — comfortable working across engineering and data teams
Education & experience
- Bachelor's degree in Software Engineering, Mathematics, Physics, or a related field
- 3 years of professional project/coding experience
- Cloud experience:
  - Databricks, AWS/Azure
  - Staging, testing, Git, pipelines
  - Distributed systems, data pipelines
- Python experience:
  - Versioning, package management, requirements/environments
  - Change data capture
- Data engineering experience:
  - SQL
  - Partitioning
  - Data structures (tables, relations)
  - Lakehouse
