Job Description
Key Responsibilities
· Contribute to the design and development of scalable data pipelines and a growing data lake
· Build and extend data processing workflows using Python, Apache Spark, and Databricks
· Define technical standards, best practices, and reusable frameworks for data engineering
· Ensure data quality, reliability, performance, and maintainability across data solutions
· Support data modeling, data integration, and transformation processes for analytics and reporting
· Drive automation, monitoring, and CI/CD improvements to ensure operational excellence
· Collaborate across teams, acting as a technical interface between the data platform and engineering, analytics, and business stakeholders
· Contribute to architecture decisions and long-term data platform strategy
Your Profile
· Outstanding programming experience, preferably in Python; able to write clean, testable, production-grade code
· Strong SQL skills and familiarity with structured and semi-structured data formats (JSON, Protobuf, Delta format)
· Hands-on experience with Apache Spark, ideally on Databricks, and understanding of the medallion architecture
· Solid grasp of data lakehouse principles, data modeling, and data governance concepts
· Experience building and maintaining CI/CD pipelines (e.g. GitLab CI); familiarity with infrastructure as code (IaC) and automated deployment
· Experience with AWS or a comparable cloud provider; familiarity with Databricks as a managed lakehouse platform
· Experience with event-driven architectures or streaming platforms (e.g. Kafka)
· Proven track record deploying, monitoring, and maintaining data pipelines and services in production environments, backed by solid testing practices
· Able to work autonomously and take ownership of tasks end-to-end
· Clear and concise communicator, comfortable working across engineering and data teams
