Data Pipeline Engineer (ML)

OSI Engineering • Seattle, WA, US
  • Full-time

Job Description

Our client is scaling production ML systems and needs a hands-on engineer to help build, maintain, and run essential ML data pipelines. You’ll own high-throughput data ingestion and transformation workflows (including image and other large data modalities), enforce rigorous data quality standards, and partner with research and platform teams to keep models fed with reliable, versioned datasets.

  • Design, build, and operate reliable ML data pipelines for batch and/or streaming use cases across cloud environments.
  • Develop robust ETL/ELT processes (ingest, validate, cleanse, transform, and publish) with clear SLAs and monitoring.
  • Implement data quality gates (schema checks, null/outlier handling, drift and bias signals) and data versioning for reproducibility.
  • Optimize pipelines for distributed computing and large modalities (e.g., images and other multi-dimensional data).
  • Automate repetitive workflows with CI/CD and infrastructure-as-code; document, test, and harden for production.
  • Collaborate with ML, Data Science, and Platform teams to align datasets, features, and model training needs.

Minimum Qualifications:

5+ years building and operating data pipelines in production.

  • Cloud: Hands-on with AWS, Azure, or GCP services for storage, compute, orchestration, and security.
  • Programming: Strong proficiency in Python and common data/ML libraries (pandas, NumPy, etc.).
  • Distributed compute: Experience with at least one of Spark, Dask, or Ray.
  • Modalities: Experience handling image and other large data types at scale.
  • Automation: Proven ability to automate repetitive tasks (shell/Python scripting, CI/CD).
  • Data Quality: Implemented validation, cleansing, and transformation frameworks in production.
  • Data Versioning: Familiar with tools/practices such as DVC, LakeFS, or similar.
  • Languages: Fluent in English or Farsi.

Strongly Preferred:

  • SQL expertise (writing performant queries; optimizing on large datasets).
  • Data warehousing/lakehouse concepts and tools (e.g., Snowflake/BigQuery/Redshift; Delta/Lakehouse patterns).
  • Data virtualization/federation exposure (e.g., Presto/Trino) and semantic/metadata layers.
  • Orchestration (Airflow, Dagster, Prefect) and observability/monitoring for data pipelines.
  • MLOps practices (feature stores, experiment tracking, lineage, artifacts).
  • Containers & IaC (Docker; Terraform/CloudFormation) and CI/CD for data/ML workflows.
  • Testing for data/ETL (unit/integration tests, great_expectations or similar).
  • Soft Skills: Executes independently and creatively; comfortable owning outcomes in ambiguous environments.
  • Proactive communicator who collaborates cross-functionally with DS/ML/Platform stakeholders.

Location: Seattle, WA

Duration: 1+ year

Pay: $56/hr
