Data Flow Engineer
Job Description
- Design, develop, and maintain complex data flows within Cloudera DataFlow (Apache NiFi), ensuring scalable, reliable, and high-performance data movement across systems.
- Develop and optimize real-time and near-real-time data pipelines using NiFi, Kafka, and CDC technologies (e.g., Debezium, SQL-based connectors).
- Implement integrations with internal and external systems using REST APIs, JDBC, Kafka, and other communication protocols, ensuring secure and resilient data exchange.
- Design and manage data schemas (Avro), metadata, and lineage using Apache Atlas, ensuring full traceability and governance of data flows.
- Define and enforce data security and access control policies using Apache Ranger in alignment with enterprise governance frameworks.
- Monitor, troubleshoot, and optimize data pipelines for performance, reliability, and scalability, including proactive alerting and issue resolution.
- Collaborate with data engineers, architects, and business stakeholders to define requirements, design architectures, and deliver robust data flow solutions.
- Create and maintain technical documentation, SOPs, and runbooks for operational support and knowledge sharing.
- Support platform lifecycle activities, including upgrades, migrations, and enhancements across CDP, NiFi, and Kafka environments.
- Perform other related duties as assigned by the team leader.
Qualifications
- Advanced university degree (Master’s or equivalent) in computer science, information systems, data engineering, or a related field; a first-level degree combined with additional experience may be accepted in lieu of the advanced degree.
- At least one of the following certifications:
  - Cloudera Certified Developer for Apache NiFi (or equivalent)
  - Cloudera DataFlow (CFM) certification (or equivalent)
  Equivalent certifications must be internationally recognized and accepted as valid credentials.
- At least 2 years (preferably 3 or more) of hands-on experience with Apache NiFi, preferably within the Cloudera Data Platform (CDP) environment, including flow design, deployment, monitoring, and troubleshooting.
- Proven experience delivering at least one large-scale integration project using NiFi as a core technology (API integrations, database connectivity, transformation, routing, and delivery).
- Expertise in designing, implementing, and maintaining complex data flows using Apache NiFi / Cloudera DataFlow.
- Advanced Python programming skills for data processing, automation, and custom flow development.
- Strong experience building and integrating REST APIs, including authentication (OAuth/JWT), rate limiting, and error-handling strategies.
- Hands-on experience with Change Data Capture (CDC) approaches, using NiFi processors/connectors and SQL-based methods.
- Practical experience with Apache Iceberg, including table design, schema evolution, partitioning, and integration with processing engines (e.g., Spark, Flink).
- Solid knowledge of data governance and catalog tools within CDP, including Apache Atlas (metadata, lineage, tagging) and Apache Ranger (security policies, authorization).
- Experience working with Apache Kafka as a messaging platform, including topics, producers/consumers, schema management, and NiFi integration.
- Good understanding of data serialization using Apache Avro, including schema evolution and compatibility principles.
- Strong analytical and problem-solving skills, with the ability to diagnose and resolve complex data pipeline issues.
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Fluency in written and spoken English.