
Senior Software Engineer, Distributed Systems - DGX Cloud

NVIDIA
Santa Clara, CA, US
$180K-$276K a year
Remote
Full-time

NVIDIA is hiring engineers to scale up its AI Infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, familiarity with software testing and deployment, and excellent communication and planning abilities.

We also welcome out-of-the-box thinkers who can contribute new ideas and have a strong bias toward execution. Expect to be constantly challenged, improving, and evolving for the better.

You and other engineers in this team will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science.

If you're creative, passionate about what you do, and love having fun, what are you waiting for? Apply today!

For two decades, we have pioneered visual computing, the art and science of computer graphics. With the invention of the GPU - the engine of modern visual computing - the field has expanded to encompass video games, movie production, product design, medical diagnosis and scientific research.

Today, we stand at the beginning of the next era, the AI computing era, ignited by a new computing model, GPU deep learning.

What you will be doing:

Designing and architecting a comprehensive platform that automates GPU asset provisioning, configuration, and lifecycle management across cloud providers.

Implementing monitoring and health management capabilities that enable industry-leading reliability, availability, and scalability of GPU assets.

Harnessing multiple data streams, ranging from GPU hardware diagnostics to cluster and network telemetry, to predict system failures and optimize workload success rates.

Working with engineering teams across NVIDIA to ensure your software integrates seamlessly from the hardware all the way up to the AI training applications.

What we need to see:

Highly motivated with strong communication skills, you can work successfully with multi-functional teams, principals, and architects, and coordinate effectively across organizational boundaries and geographies.

5+ years of software engineering experience on large-scale production systems.

You possess a BS in Computer Science, Engineering, Physics, Mathematics, or another comparable degree, or equivalent experience.

Expert-level knowledge of a systems programming language (Go, Python) and a solid understanding of data structures and algorithms.

Understanding of performance, security and reliability in complex distributed systems. Familiarity with system level architecture, data synchronization, fault tolerance and state management.

Ways to stand out from the crowd:

Proficiency in architecting and managing large-scale distributed systems, independent of cloud providers

Advanced hands-on experience with and deep understanding of cluster management systems (Kubernetes, Slurm, Bright Cluster Manager)

Proven operational excellence in designing and maintaining AI infrastructure

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us.

If you are creative and autonomous, we want to hear from you!

The base salary range is 180,000 USD - 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

30+ days ago