Talent.com
Machine Learning Engineer, Training Infrastructure

Hedra
San Francisco, CA, United States
Posted 30 days ago
Job type: Full-time

Job description

Overview

We are looking for an ML Engineer with 3+ years of experience in high-performance computing systems to manage and optimize the computational infrastructure we use to train and deploy our machine learning models. The ideal candidate has broad experience managing ML workloads at scale and will support our 3DVAE and video diffusion models. We encourage you to apply even if you don’t meet every requirement; we value curiosity, creativity, and the drive to solve hard problems.

Responsibilities

  • Design, implement, and maintain scalable computing solutions for training and deploying ML models, ensuring infrastructure can handle large video datasets.
  • Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed training.
  • Ensure that our infrastructure can handle the resource-intensive tasks associated with training large generative models.
  • Monitor system performance and implement improvements to maximize efficiency and utilization, using tools like Airflow for orchestration.
  • Collaborate across research teams to understand their computational needs and provide appropriate solutions, facilitating seamless model deployment.

Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, or a related field, with a focus on system administration.
  • Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, essential for managing large-scale ML workloads.
  • Values sound engineering processes, including version control and CI/CD.
  • Knowledge of containerization and orchestration technologies such as Docker and Kubernetes, required for deployments at scale.
  • Understanding of distributed training techniques and how to scale models across multi-node clusters, in line with video generation needs.
  • Strong problem-solving and communication skills, given the need to collaborate with diverse teams.

Benefits

  • Competitive compensation + equity
  • 401k (no match)
  • Healthcare (Silver PPO Medical, Vision, Dental)
  • Lunch and snacks at the office

Location: San Francisco, CA
