We are seeking a skilled and passionate Data Engineer to join our team and play a vital role in building and maintaining our data infrastructure. The ideal candidate will have extensive experience with AWS cloud services, particularly EMR, and strong proficiency in Spark and PySpark for data processing and transformation. This role will focus on designing, developing, and optimizing data pipelines to support our growing data needs.

Responsibilities:

AWS Data Services:
- Design, implement, and manage data solutions on AWS, leveraging services such as EMR, S3, Glue, and others.
- Optimize AWS data infrastructure for performance, scalability, and cost-effectiveness.
- Implement best practices for data security and compliance on AWS.
Apache Spark & PySpark:
- Develop and maintain scalable data pipelines using Apache Spark and PySpark.
- Perform data extraction, transformation, and loading (ETL/ELT) processes.
- Optimize Spark jobs for performance and efficiency.
- Develop and maintain data quality checks and validation processes.
Amazon EMR:
- Configure and manage EMR clusters for large-scale data processing.
- Troubleshoot and resolve EMR cluster issues.
- Optimize EMR cluster configurations for performance and cost.
- Deploy and monitor Spark applications on EMR.
Data Pipeline Development:
- Design and implement robust and reliable data pipelines.
- Automate data ingestion, processing, and storage processes.
- Monitor data pipeline performance and troubleshoot issues.
- Work with various data sources, both structured and unstructured.
Collaboration and Communication:
- Collaborate with data scientists, analysts, and other engineers to understand data requirements.
- Document data pipelines and infrastructure.
- Communicate effectively with technical and non-technical stakeholders.
- Participate in code reviews.
Performance Optimization:
- Analyze query plans and optimize Spark jobs.
- Monitor and tune data processing performance.
- Identify and resolve performance bottlenecks.

Qualifications:

Bachelor's degree in Computer Science, Data Science, or a related field (or equivalent experience).
6-9 years of experience in a Data Engineering role.
Strong experience with Amazon Web Services (AWS) data services, particularly EMR.
Proficiency in Apache Spark and PySpark for data processing.
Experience with data warehousing and data lake concepts.
Strong SQL skills.
Experience with scripting languages (e.g., Python).
Understanding of data modeling and database design principles.
Experience with version control systems (e.g., Git).
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration skills.
Experience with other big data technologies (e.g., Hadoop, Hive, Kafka) is a plus.
Experience with data orchestration tools (e.g., Airflow, Step Functions) is a plus.

Compensation, Benefits and Duration

Minimum Compensation: USD 37,000
Maximum Compensation: USD 130,000
Compensation is based on the candidate's actual experience and qualifications. The above is a reasonable, good-faith estimate for the role.
Medical, vision, and dental benefits, 401k retirement plan, variable pay/incentives, paid time off, and paid holidays are available for full-time employees.
This position is not available for independent contractors.
No applications will be considered if received more than 120 days after the date of this post.

Data Engineer - Smithfield, RI

More jobs at Photon
