Talent.com
AIML - Staff ML Infrastructure Engineer, ML Platform & Technology - Pre-training Compute

Apple Inc. • San Francisco, CA, United States
Job type: Full-time

San Francisco Bay Area, California, United States • Machine Learning and AI

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other’s ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It’s the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something — you’ll add something!

Description

As an engineer on the ML Compute team, you will:

  • Drive large-scale pre-training initiatives to support cutting-edge foundation models, focusing on resiliency, efficiency, scalability, and resource optimization.
  • Enhance distributed training techniques for foundation models.
  • Research and implement new patterns and technologies to improve system performance, maintainability, and design.
  • Optimize execution and performance of workloads built with JAX, PyTorch, XLA, and CUDA on large distributed systems.
  • Leverage high-performance networking technologies such as NCCL for GPU collectives and TPU interconnect (ICI / Fabric) for large-scale distributed training.
  • Architect a robust MLOps platform to streamline and automate pre-training operations.
  • Operationalize large-scale ML workloads on Kubernetes, ensuring distributed training jobs are robust, efficient, and fault-tolerant.
  • Lead complex technical projects, defining requirements and tracking progress with team members.
  • Collaborate with cross-functional engineers to solve large-scale ML training challenges.
  • Mentor engineers in areas of your expertise, fostering skill growth and knowledge sharing.
  • Cultivate a team centered on collaboration, technical excellence, and innovation.

Minimum Qualifications

  • Bachelor's degree in Computer Science, engineering, or a related field
  • 6+ years of hands‑on experience in building scalable backend systems for training and evaluation of machine learning models
  • Proficient in relevant programming languages, like Python or Go
  • Strong expertise in distributed systems, reliability and scalability, containerization, and cloud platforms
  • Proficient in cloud computing infrastructure and tools: Kubernetes, Ray, PySpark
  • Ability to clearly and concisely communicate technical and architectural problems, while working with partners to iteratively find solutions

Preferred Qualifications

  • Advanced degree in Computer Science, engineering, or a related field
  • Proficient in working with and debugging accelerators, such as GPUs, TPUs, and AWS Trainium
  • Proficient in ML training and deployment frameworks, such as JAX, TensorFlow, PyTorch, TensorRT, and vLLM

Compensation

Apple’s base pay for this role ranges from $181,100 to $318,400, depending on your skills, qualifications, experience, and location.

Benefits

  • Comprehensive medical and dental coverage
  • Retirement benefits
  • A range of discounted products and free services
  • Reimbursement for certain educational expenses, including tuition

This role may also be eligible for discretionary bonuses or commission payments as well as relocation assistance.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
