Research Engineer, Training Infrastructure Lead

Goodfire • San Francisco, CA, United States
Posted 30 days ago
Job type: Full-time

About Goodfire

Behind our name: Like fire, AI holds the potential for both immense benefit and significant risk. Just as mastering fire transformed human history, we believe the safe and intentional development of AI will shape the future of our species. Our goal is to tame this new fire.

Goodfire is an AI interpretability research company focused on understanding and intentionally designing advanced AI systems. We believe advances in interpretability will unlock the next frontier of safe and powerful foundation models and that deep research breakthroughs are necessary to make this possible.

Everything we do is in service of that mission. We move fast, take ownership, and constantly push to improve. We believe in acting today rather than tomorrow. We care deeply about the success of the organization and put the team above ourselves.

Goodfire is a public benefit corporation headquartered in San Francisco with a team of the world’s top interpretability researchers and engineers from organizations like OpenAI and DeepMind. We’ve raised $57M from investors like Menlo, Lightspeed and Anthropic and work with customers including Arc Institute, Mayo Clinic, and Rakuten.

The role:

We're seeking a senior engineering leader to own and evolve our research platform and training infrastructure. You'll define both the technical vision and the implementation strategy for the systems that power our research breakthroughs.

Key responsibilities:

  • Design and build customizable training pipelines that scale from experimentation to production
  • Architect and implement large-scale model serving infrastructure for interpretability (reference: NDIF, Garcon)
  • Identify and execute on opportunities to dramatically accelerate research velocity
  • Lead technical decision-making for infrastructure that supports cutting-edge AI research

Who you are:

Goodfire is looking for experienced individuals who embody our values and share our deep commitment to making interpretability accessible. We care deeply about building a team that shares our values:

Put mission and team first

All we do is in service of our mission. We trust each other, deeply care about the success of the organization, and choose to put our team above ourselves.

Improve constantly

We are constantly looking to improve every piece of the business. We proactively critique ourselves and others in a kind and thoughtful way that translates to practical improvements in the organization. We are pragmatic and consistently implement the obvious fixes that work.

Take ownership and initiative

There are no bystanders here. We proactively identify problems and take full responsibility for getting a strong result. We are self-driven, own our mistakes, and feel deep responsibility for what we’re building.

Action today

We have a small amount of time to do something incredibly hard and meaningful. The pace and intensity of the organization are high. If we can take action today or tomorrow, we will choose to do it today.

What we are looking for:

Required experience:

  • 5+ years of experience in ML infrastructure, research engineering, and/or systems programming
  • Leadership experience as a senior architect, tech lead, and/or engineering manager
  • Cross-functional expertise bridging research and engineering domains
  • Technical proficiency in Python, PyTorch/JAX, and distributed systems
  • Production experience deploying and maintaining ML systems at scale
  • Mission alignment with advancing AI safety and interpretability

Core competencies:

  • High-ownership leadership
      • Owns broad areas with autonomy, driving architectural and strategic decisions even amid uncertainty
      • Balances technical depth with speed, adapting as priorities evolve
  • Research-to-production mindset
      • Bridges fast research iteration with reliable, scalable production systems
      • Designs abstractions that preserve flexibility while ensuring robustness
  • Deep experience in Python, PyTorch, and large-scale training strategies
  • Hands-on with end-to-end ML infrastructure: from experiments to serving
  • Strong track record of scaling systems and debugging complex runs

Preferred qualifications:

  • Contributions to open-source ML infrastructure projects
  • Experience in fast-paced startup or research lab environments

This role offers a market-competitive salary, equity, and competitive benefits. More importantly, you'll have the opportunity to work on groundbreaking technology with a world-class team on the critical path to ensuring a safe and beneficial future for humanity.

The expected salary range for this position is $200,000 - $400,000 USD.
