Job Description
This role is accountable for building and maintaining scalable data pipelines from source systems. The Data Engineer will ensure the availability, reliability, and performance of data products by integrating raw data from various sources. Key responsibilities include data modeling, ETL (Extract, Transform, Load) development, and ensuring data quality and security. The role covers data coming in from one to three source systems.
Key Responsibilities
1. Execute
- Collaborate with data product managers to gather data requirements.
- Execute ETL solutions, including data security, data quality, and performance requirements.
- Prepare documentation of data product lineage and other related ETL topics.
2. Data Extraction, Load and Transformation
- Implement and maintain ELT pipelines to efficiently ingest and transform data from a wide variety of data sources and deliver datasets that meet business requirements.
- Ensure efficient and reliable data mapping to support business needs.
- Deliver complete documentation and knowledge transfer sessions for the Team and business partners.
- Maintain existing solutions, implement optimizations and enhancements, monitor data quality.
- Support the development and maintenance of scalable data pipelines leveraging Azure Synapse, PySpark, APIs, and SQL, performing advanced data cleaning, transformation, and manipulation to ensure high-quality, reliable data flows (a brief sketch follows this list).
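For illustration only, a minimal PySpark sketch of the kind of ingest-and-clean step described above; the source path, column names, and target table are hypothetical placeholders, not references to any actual system.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_ingest").getOrCreate()

# Hypothetical landing-zone path -- a placeholder, not a real system.
raw = spark.read.json("abfss://landing@account.dfs.core.windows.net/orders/")

# Basic cleaning: drop exact duplicates, normalize types, filter bad rows.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull())
)

# Deliver a dataset that downstream consumers can query.
clean.write.mode("overwrite").saveAsTable("silver.orders")
```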
3. Process Improvement, Performance, and Cost Optimization
- Collaborate with Data Science, AI, and Data product teams to optimize performance and cost effectiveness of their solutions.
- Identify and support the design of internal process improvements, including automating manual processes, optimizing data product delivery, and redesigning solutions for enhanced scalability.
- Implement solution adjustments to improve the performance and cost-effectiveness of data products (see the partitioning sketch after this list).
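As one example of the kind of adjustment this covers, partitioning a large table by a commonly filtered column lets Spark prune files rather than scan the full dataset; the table and column names below are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_optimize").getOrCreate()

events = spark.read.table("silver.events")

# Writing partitioned by event_date means queries that filter on that
# column read only the matching partitions, cutting scan time and cost.
(events.withColumn("event_date", F.to_date("event_ts"))
       .write.mode("overwrite")
       .partitionBy("event_date")
       .saveAsTable("silver.events_partitioned"))

# Downstream reads now prune partitions automatically:
recent = (spark.read.table("silver.events_partitioned")
               .filter(F.col("event_date") >= "2024-01-01"))
```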
4. Issue Resolution and Support
- Assist stakeholders with data-related product pipeline issues and support their data product needs.
- Work with the Analytics Operational Support team to investigate, troubleshoot, and resolve data errors / discrepancies.
Required Qualifications:
Level of Education and Discipline
- Bachelor’s degree in Mathematics, Statistics, Computer Science, Data Analytics/Science, or related field
Certifications and/or Licenses
- Microsoft Certified: Azure Data Engineer Associate (DP-203), Microsoft Certified: Fabric Data Engineer Associate, or a related cloud certification; Fabric IQ/Databricks certifications a plus.
Experience
- 3-5 years of data engineering experience.
- Demonstrated ability coding in one or more languages (PySpark preferred).
- Experience with building data pipelines.
- Demonstrated ability to manage multiple priorities simultaneously.
- Basic SQL performance tuning.
- Handling of structured and semi-structured data.
- Unit testing for pipelines (a test sketch follows this list).
- Git-based workflows.
- Exposure to medallion architecture.
- Understanding of data privacy basics.
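A minimal sketch of what pipeline unit testing can look like, using pytest and a local SparkSession; the clean_orders transform is hypothetical, standing in for a real pipeline step.

```python
import pytest
from pyspark.sql import SparkSession, functions as F

# Hypothetical transform under test: a cleaning step from an ELT pipeline.
def clean_orders(df):
    return (df.dropDuplicates(["order_id"])
              .filter(F.col("order_id").isNotNull()))

@pytest.fixture(scope="session")
def spark():
    # Local single-threaded session is enough for small test fixtures.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()

def test_clean_orders_drops_nulls_and_duplicates(spark):
    rows = [("a1", 10.0), ("a1", 10.0), (None, 5.0)]
    df = spark.createDataFrame(rows, ["order_id", "amount"])
    result = clean_orders(df)
    # One valid, deduplicated row should remain.
    assert result.count() == 1
    assert result.first()["order_id"] == "a1"
```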
Interpersonal Skills
- Strong communication skills, with the ability to discuss technical and business issues effectively both internally and externally.
- Ability to work independently, navigate problems, resolve conflicts, and bring solutions to the table.
- Ability to build strong interpersonal relationships.
- Time management and prioritization skills.
- Strong technical curiosity and passion for problem solving and innovation.
- Ongoing aspiration to learn about new industry tools.
Other Skills & Competencies
- Knowledge of data analysis, visualization techniques, and frameworks.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Experience with the following tooling:
  - Primary: SQL, Fabric, Databricks, Synapse, Azure Data Factory, Azure ML, Azure DevOps (for CI/CD); or
  - Secondary: GCP BigQuery, Fivetran, GCP Cloud Composer, GCP DLP (Data Loss Prevention), GCP Cloud Run, Vertex AI, etc.
