Bachelor’s degree in computer science, data management, information systems, information science or a related field; advanced degree in computer science, data management, information systems, information science or a related field preferred.
3+ years in data engineering building production data pipelines (batch and/or streaming) with Spark on cloud.
2+ years hands-on Azure Databricks (PySpark/Scala, Spark SQL, Delta Lake) including:
- Delta Lake operations (MERGE/CDC, OPTIMIZE/Z-ORDER, VACUUM, partitioning, schema evolution).
- Unity Catalog (RBAC, permissions, lineage, data masking/row-level access).
- Databricks Jobs/Workflows or Delta Live Tables.
Azure Data Factory for orchestration (pipelines, triggers, parameterization, IRs) and integration with ADLS Gen2, Key Vault.
Strong SQL across large datasets; performance tuning (joins, partitions, file sizing).
Data quality at scale (e.g., Great Expectations/Deequ), monitoring and alerting; debug/backfill playbooks.
DevOps for data: Git branching, code reviews, unit/integration testing (pytest/dbx), CI/CD (Azure DevOps/GitHub Actions).
Infrastructure as Code (Terraform or Bicep) for Databricks workspaces, cluster policies, ADF, storage.
Observability & cost control: Azure Monitor/Log Analytics; cluster sizing, autoscaling, Photon; cost/perf trade-offs.
Proven experience collaborating with cross-functional stakeholders (analytics, data governance, product, security) to ship and support data products.
Job Description
