Job Description
Job Responsibilities:
AdvanSix is seeking a Big Data Engineer to build and operate our enterprise Unified Data Layer (UDL), spanning IT and OT, to deliver trustworthy, performant data products that power Finance, Operations, Supply Chain & Logistics, HSE, Commercial, and corporate analytics. You’ll engineer batch, CDC, and streaming pipelines, model curated and semantic layers, and harden production operations with testing, CI/CD, security, and observability. You’ll partner closely with the data team and the wider IT organization.
Mission
Design and deliver scalable, secure data pipelines and data models that safely connect operational systems to analytics, ensure trusted and well‑governed data, and enable repeatable delivery of BI, ML, AI, and automation solutions.
Data Engineering & Modeling
· Build ingestion pipelines (batch, CDC, streaming) from S/4HANA/DataSphere, PHD/historian, LIMS, TMS, HSE, and other sources into landing → curated → semantic layers.
· Implement data contracts, schema/versioning, SCD handling, partitioning, and performance tuning (file formats, clustering, caching).
· Develop dimensional/semantic models that back certified Power BI datasets and APIs for apps/agents.
OT/IT Integration & Safety
· Integrate OT data via OPC UA/MQTT, broker/DMZ patterns, read-only historian feeds, and event/batch frames, with no direct reads from the control network.
· Collaborate with plant controls on change control, signal quality, and downtime windows.
Quality, Security & Observability
· Embed data quality rules, unit/integration tests, and validation checks (freshness, completeness, drift/PSI).
· Instrument lineage and end-to-end monitoring; build alerting and on-call runbooks to minimize MTTR.
· Enforce RBAC, secrets management, PII/HSE classifications, and retention aligned to Governance/MDM policies.
CI/CD, Cost & Reliability
· Automate build/test/deploy with Git-based CI/CD (environments, approvals, blue/green).
· Track and optimize cost/performance (cluster sizing, autoscaling, cache strategy); contribute to FinOps reviews.
Collaboration & Documentation
· Partner with Reporting & BI on semantic model contracts, RLS, and performance SLAs; avoid direct system scraping.
· Produce “readme” docs, data dictionaries, runbooks, and post-incident reviews; support knowledge transfer with vendors.
Basic Qualifications:
· Minimum of 5 years' experience in data engineering, building production pipelines at scale (batch/CDC/streaming).
· Hands-on with Azure data stack: Databricks or Fabric/Synapse, ADF/Pipelines, ADLS/OneLake, Azure SQL/SQL MI, Key Vault.
· Strong SQL and Python/PySpark; comfort with Spark Structured Streaming and performance tuning.
· Experience implementing tests and observability (freshness, schema, expectations) and Git-based CI/CD.
· Familiarity with SAP S/4HANA structures and SAP DataSphere semantic modeling.
· Knowledge of OT concepts: historians (PHD/PI), OPC UA/MQTT, event/batch frames, ISA-95/99 basics.
· Understanding of Power BI consumption (semantic models, RLS) and APIs for downstream AI/ML apps/agents.
Preferred Qualifications:
· Time-series/data-quality tooling (e.g., Great Expectations or equivalent patterns), feature/metric stores.
· MDM concepts (keys, survivorship), lineage/catalog tooling.
· TMS/WMS, LIMS, Historian, HSE domain exposure; Lean/Six Sigma mindset; FinOps awareness.
The base salary range for this position is $104,600 to $156,800 annually.
