Mach9

Data Engineer

San Francisco | Posted 2 days ago
Full-time | Remote

Job Description

The Role

We're seeking a Data Engineer to transform large-scale geospatial datasets into structured, reliable, and accessible formats that power Mach9's ML and product pipelines. You'll work with high-volume data sources — laser scan point clouds, imagery, and a long tail of geospatial formats — and own the systems that get them ingested, standardized, stored, and made available for training, perception, and production use in a consistent and efficient way.

This role sits at the front of everything we do: our models are only as good as the data feeding them, and you'll be the one making that data trustworthy at scale.

Responsibilities

  • Develop and maintain scalable, reproducible workflows for ingesting and processing large volumes of point cloud, imagery, and geospatial data.

  • Convert datasets from various sensor providers into Mach9's standardized internal formats.

  • Build CI/CD pipelines and automated checks that guarantee the correctness and consistency of data pipelines, including regression detection on dataset processing.

  • Optimize processing performance, query speed, and storage efficiency across large geospatial datasets.

  • Work closely with the customer success team to efficiently resolve issues and unblock customer projects.

    • Build and maintain an agentic harness for automated dataset triage and code patching that automatically proposes or applies fixes and escalates when human judgment is needed.

  • Work closely with ML and product teams to make data readily usable for training, inference, and visualization.

  • Work closely with customers and data-provider partners to facilitate data integration (with occasional travel).

  • Puzzle-hunting: work with data formats that have sparse or missing documentation.

Requirements

  • Strong software development, problem-solving, and debugging skills, with hands-on experience building production systems in Python.

  • Solid foundation in distributed systems and parallel computing.

  • Comfort operating with ambiguity — able to dig into undocumented or messy data formats, reverse-engineer how they work, and make steady progress without a clear spec.

  • Experience building agentic systems and setting up agent harnesses — orchestrating LLM-driven workflows for triage, debugging, or automated code patching.

  • Strong communication and collaboration skills, with the ability to work across ML, product, and customer-facing teams.

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience.

Bonus qualifications

  • Understanding of geospatial data formats (e.g., LAS/LAZ, COPC, E57, GeoTIFF, Shapefiles) and tooling (e.g., GDAL, PDAL, untwine, laz-perf).

  • Expertise designing and managing data schemas and storage systems for geospatial data (e.g., Postgres/PostGIS, AWS S3).

  • Experience with large-scale data processing frameworks and cloud platforms (e.g., Spark, AWS Batch).

  • Familiarity with coordinate reference systems and transforms (CRS, WKT, pyproj, affine transforms).

  • Experience building data versioning, lineage, or artifact-tracking systems.

  • Experience operating data pipelines that feed ML training and inference.

  • Familiarity with C++.
