
Data Flow Engineer

Warsaw, Masovian Voivodeship, Poland | Posted 3 weeks ago
Full-time | Hybrid | Mid-Senior Level

Job Description

  • Design, develop, and maintain complex data flows within Cloudera DataFlow (Apache NiFi), ensuring scalable, reliable, and high-performance data movement across systems.
  • Develop and optimize real-time and near-real-time data pipelines leveraging NiFi, Kafka, and CDC technologies (e.g., Debezium, SQL-based connectors).
  • Implement integrations with internal and external systems using REST APIs, JDBC, Kafka, and other communication protocols, ensuring secure and resilient data exchange.
  • Design and manage data schemas (Avro), metadata, and lineage using Apache Atlas, ensuring full traceability and governance of data flows.
  • Define and enforce data security and access control policies using Apache Ranger in alignment with enterprise governance frameworks.
  • Monitor, troubleshoot, and optimize data pipelines for performance, reliability, and scalability, including proactive alerting and issue resolution.
  • Collaborate with data engineers, architects, and business stakeholders to define requirements, design architectures, and deliver robust data flow solutions.
  • Create and maintain technical documentation, SOPs, and runbooks for operational support and knowledge sharing.
  • Support platform lifecycle activities, including upgrades, migrations, and enhancements across CDP, NiFi, and Kafka environments.
  • Perform other related duties as assigned by the team leader.

Requirements

  • Advanced university degree (Master’s or equivalent) in computer science, information systems, data engineering, or a related field; a first-level degree combined with additional experience may be accepted in lieu of the advanced degree.
  • At least one of the following certifications:
      1. Cloudera Certified Developer for Apache NiFi (or equivalent)
      2. Cloudera DataFlow (CFM) certification (or equivalent)
    Equivalent certifications must be internationally recognized and accepted as valid credentials.
  • Minimum 2–3 years of hands-on experience working with Apache NiFi, preferably within the Cloudera Data Platform (CDP) environment, including flow design, deployment, monitoring, and troubleshooting.
  • Proven experience delivering at least one large-scale integration project using NiFi as a core technology (API integrations, database connectivity, transformation, routing, and delivery).
  • Expert knowledge in designing, implementing, and maintaining complex data flows using Apache NiFi / Cloudera DataFlow.
  • Advanced Python programming skills for data processing, automation, and custom flow development.
  • Strong experience in building and integrating REST APIs, including authentication (OAuth/JWT), rate limiting, and error handling strategies.
  • Hands-on experience with CDC (Change Data Capture) approaches, using NiFi processors/connectors and SQL-based methods.
  • Practical experience with Apache Iceberg, including table design, schema evolution, partitioning, and integration with processing engines (e.g., Spark, Flink).
  • Solid knowledge of data governance and catalog tools within CDP, including Apache Atlas (metadata, lineage, tagging) and Apache Ranger (security policies, authorization).
  • Experience working with Apache Kafka as a messaging platform, including topics, producers/consumers, schema management, and NiFi integration.
  • Good understanding of data serialization using Apache Avro, including schema evolution and compatibility principles.
  • Strong analytical and problem-solving skills, with the ability to diagnose and resolve complex data pipeline issues.
  • Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
  • Fluency in written and spoken English.
Data Flow Engineer at ARHS | Renata