Data Pipeline Engineer (ML)

OSI Engineering • Seattle, WA, US
  • Full-time

Job Description

Our client is scaling production ML systems and needs a hands-on engineer to help build, maintain, and run essential ML data pipelines. You’ll own high-throughput data ingestion and transformation workflows (including image and other large data modalities), enforce rigorous data quality standards, and partner with research and platform teams to keep models fed with reliable, versioned datasets.

  • Design, build, and operate reliable ML data pipelines for batch and/or streaming use cases across cloud environments.
  • Develop robust ETL/ELT processes (ingest, validate, cleanse, transform, and publish) with clear SLAs and monitoring.
  • Implement data quality gates (schema checks, null/outlier handling, drift and bias signals) and data versioning for reproducibility.
  • Optimize pipelines for distributed computing and large modalities (e.g., images and other multi-dimensional data).
  • Automate repetitive workflows with CI/CD and infrastructure-as-code; document, test, and harden for production.
  • Collaborate with ML, Data Science, and Platform teams to align datasets, features, and model training needs.

Minimum Qualifications:

5+ years building and operating data pipelines in production.

  • Cloud: Hands-on with AWS, Azure, or GCP services for storage, compute, orchestration, and security.
  • Programming: Strong proficiency in Python and common data/ML libraries (pandas, NumPy, etc.).
  • Distributed compute: Experience with at least one of Spark, Dask, or Ray.
  • Modalities: Experience handling image and other large data types at scale.
  • Automation: Proven ability to automate repetitive tasks (shell/Python scripting, CI/CD).
  • Data Quality: Implemented validation, cleansing, and transformation frameworks in production.
  • Data Versioning: Familiar with tools/practices such as DVC, LakeFS, or similar.
  • Languages: Fluent in English or Farsi.

Strongly Preferred:

  • SQL expertise (writing performant queries; optimizing on large datasets).
  • Data warehousing/lakehouse concepts and tools (e.g., Snowflake/BigQuery/Redshift; Delta/Lakehouse patterns).
  • Data virtualization/federation exposure (e.g., Presto/Trino) and semantic/metadata layers.
  • Orchestration (Airflow, Dagster, Prefect) and observability/monitoring for data pipelines.
  • MLOps practices (feature stores, experiment tracking, lineage, artifacts).
  • Containers & IaC (Docker; Terraform/CloudFormation) and CI/CD for data/ML workflows.
  • Testing for data/ETL (unit/integration tests, great_expectations or similar).
  • Soft Skills: Executes independently and creatively; comfortable owning outcomes in ambiguous environments.
  • Proactive communicator who collaborates cross-functionally with DS/ML/Platform stakeholders.

Location: Seattle, WA

Duration: 1+ year

Pay: $56/hr
