Senior Deep Learning Systems Software Engineer - AI Infrastructure

NVIDIA
Santa Clara, CA, US
Remote
Full-time

NVIDIA is an industry leader with groundbreaking developments in High-Performance Computing, Artificial Intelligence and Visualization.

The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.

Our work opens up new universes to explore, enables amazing creativity and discovery and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

NVIDIA is seeking senior engineers with a focus on performance analysis and optimization to help us squeeze every last clock cycle out of all facets of Deep Learning, including training and inference, one of the most important workloads in the world today.

If you are unafraid to work across all layers of the hardware / software stack from GPU architecture to Deep Learning Framework to achieve peak performance, we want to hear from you! This role offers an opportunity to directly impact the hardware and software roadmap in a fast-growing technology company that leads the AI revolution while helping deep learning users around the globe enjoy ever-higher training speeds.

What you'll be doing:

Understand, analyze, profile, and optimize deep learning workloads on state-of-the-art hardware and software platforms.

Build tools to automate workload analysis, workload optimization, and other critical workflows.

Collaborate with cross-functional teams to analyze and optimize cloud application performance on diverse GPU architectures.

Identify bottlenecks and inefficiencies in application code and propose optimizations to enhance GPU utilization.

Drive end-to-end platform optimization from the hardware level to the application and service levels.

Design and implement performance benchmarks and testing methodologies to evaluate application performance.

Provide guidance and recommendations on optimizing cloud-native applications for speed, scalability, and resource efficiency.

Share knowledge and best practices with domain expert teams as they transition applications to distributed environments.

What we need to see:

Master's degree in CS, EE, or CSEE, or equivalent experience

8+ years of experience in application performance engineering

Experience using large-scale, multi-node GPU infrastructure on premises or in CSPs

Background in deep learning model architectures and experience with PyTorch and large-scale distributed training

Experience with application profiling tools such as NVIDIA Nsight and Intel VTune.

Deep understanding of computer architecture and familiarity with the fundamentals of GPU architecture. Experience with NVIDIA's infrastructure and software stacks.

Proven experience analyzing, modeling and tuning DL application performance.

Proficiency in Python and C / C++ for analyzing and optimizing application code

Ways to stand out from the crowd:

Strong fundamentals in algorithms and GPU programming experience (CUDA or OpenCL)

Understanding of NVIDIA's server and software ecosystem

Hands-on experience in performance optimization and benchmarking on large-scale distributed systems

Hands-on experience with NVIDIA GPUs, HPC storage, networking, and cloud computing.

In-depth understanding of storage systems, Linux file systems, and RDMA networking

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us.

If you're creative and autonomous, we want to hear from you.

The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

30+ days ago