
Senior Software Engineer, Distributed Systems - DGX Cloud

NVIDIA
Santa Clara, California, US
$180K-$276K a year
Full-time

NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, familiarity with software testing and deployment, and excellent communication and planning abilities.

We also welcome out-of-the-box thinkers who pair new ideas with a strong bias for execution. Expect to be constantly challenged, improving, and evolving for the better.

You and other engineers on this team will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science.

If you're creative, passionate about what you do, and love having fun, apply today!

Please read the following job description thoroughly to ensure you are the right fit for this role before applying.

For two decades, we have pioneered visual computing, the art and science of computer graphics. With the invention of the GPU - the engine of modern visual computing - the field has expanded to encompass video games, movie production, product design, medical diagnosis, and scientific research.

Today, we stand at the beginning of the next era, the AI computing era, ignited by a new computing model, GPU deep learning.

What You Will Be Doing

  • Designing and architecting a comprehensive platform that automates GPU asset provisioning, configuration, and lifecycle management across cloud providers.
  • Implementing monitoring and health management capabilities that enable industry-leading reliability, availability, and scalability of GPU assets.
  • Harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry, to predict system failures and optimize workload success rates.
  • Working with engineering teams across NVIDIA to ensure your software integrates seamlessly from the hardware all the way up to the AI training applications.

What We Need To See

  • Strong motivation and communication skills, with the ability to work successfully with multi-functional teams, principals, and architects, and to coordinate effectively across organizational boundaries and geographies.
  • 5+ years of software engineering experience on large-scale production systems.
  • BS in Computer Science, Engineering, Physics, Mathematics, or another comparable degree, or equivalent experience.
  • Expert-level knowledge of a systems programming language (Go, Python) and a solid understanding of data structures and algorithms.
  • Understanding of performance, security, and reliability in complex distributed systems. Familiarity with system-level architecture, data synchronization, fault tolerance, and state management.

Ways To Stand Out From The Crowd

  • Proficiency in architecting and managing large-scale distributed systems, independent of cloud providers.
  • Advanced hands-on experience and deep understanding of cluster management systems (Kubernetes, Slurm, Bright Cluster Manager).
  • Proven operational excellence in designing and maintaining AI infrastructure.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us.

If you are creative and autonomous, we want to hear from you!

The base salary range is 180,000 USD - 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.


2 days ago