Software Engineer - AI/ML, AWS Neuron Distributed Training - Multimodality

Annapurna Labs (U.S.) Inc.
Seattle, Washington, USA
$129.3K a year
Full-time

AWS Neuron is the complete software stack for AWS Inferentia (Inf1 / Inf2) and Trainium (Trn1), our cloud-scale machine learning accelerators.

This role is for a machine learning engineer on the Distributed Training team for AWS Neuron, responsible for development, enablement and performance tuning of a wide variety of ML model families, including massive-scale Large Language Models (LLMs) such as GPT and Llama, as well as Stable Diffusion, Vision Transformers (ViT) and many more.

The ML Distributed Training team works side by side with chip architects, compiler engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances.

Experience with training these large models using Python is a must. FSDP (Fully Sharded Data Parallel), DeepSpeed and other distributed training libraries are central to this work, and extending them to Neuron-based systems is key.

Key job responsibilities

You will help lead the effort to build distributed training support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks.

You will help tune these models for the highest performance and maximize their efficiency on the custom AWS Trainium and Inferentia silicon and the Trn1, Inf1 and Inf2 servers.

Strong software development and Machine Learning knowledge are both critical to this role.

About the team

Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS.

Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations.

Products we have delivered over the last few years include AWS Nitro, ENA, EFA, Graviton and F1 EC2 instances, AWS Neuron, the Inferentia and Trainium ML accelerators, and scalable NVMe storage.

Inclusive Team Culture

Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally.

We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.

Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.

Work / Life Balance

Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life.

We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment.

We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.

Mentorship & Career Growth

Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship.

We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded professional and enable them to take on more complex tasks in the future.

We are open to hiring candidates to work out of one of the following locations:

Seattle, WA, USA

BASIC QUALIFICATIONS

  • Bachelor's degree in computer science or equivalent
  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • Experience programming with at least one software programming language
  • Experience in machine learning, data mining, information retrieval, statistics or natural language processing

PREFERRED QUALIFICATIONS

  • Master's degree in computer science or equivalent
  • 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Experience in computer architecture
  • Previous software engineering expertise with PyTorch / JAX / TensorFlow, distributed libraries and frameworks, and end-to-end model training
  • Previous experience with training multi-modal models for understanding and generating images / video / audio

Posted 25 days ago