Principal Engineer, Performance Analysis - AI Applications and Services

NVIDIA Corporation
Santa Clara, CA, US
Full-time

We are seeking a highly motivated performance engineer to join our AI Applications organization and work on distributed, cloud-native, accelerated video analytics applications.

Our team is building distributed, cloud-native, accelerated platforms for real-time video streaming AI inference and video analytics. These platforms run at the edge and in the cloud in a Kubernetes environment as part of the Metropolis ecosystem.

As a performance engineer, you will work with the application teams to understand the architecture, profile the applications, identify bottlenecks, and optimize performance.

You will build a good understanding of application resource utilization characteristics across CPU, GPU and network accelerators.

A good understanding of distributed systems performance is a must for scaling these applications across multiple CPU and GPU nodes.

Your duties include collecting data on the applications you are optimizing, identifying areas for improvement, and developing strategies to deliver those improvements.

What you'll be doing:

  • Plan, enable, and drive performance initiatives across our cloud-native application teams.
  • Review, develop, deploy and manage tools and strategies to systematically run performance experiments.
  • Collect and organize performance data and share with key partners.
  • Work closely with application teams to understand application resource utilization characteristics. Identify performance issues through profiling of the various components.
  • Learn the various accelerators available in the system for a given application workload, and recommend end-to-end (E2E) performance optimizations relative to the system's capabilities.
  • Advise developers and product teams on the best accelerators and systems for E2E system performance.
  • Improve and standardize performance measurement processes across our applications and GPU systems.
  • Work closely with GPU cloud-native teams at NVIDIA to deploy the latest and most effective GPU resource sharing strategies for our applications in a Kubernetes environment.

What we need to see:

  • Master's degree or PhD in Computer Science or a related field, or equivalent experience.
  • 15+ years of experience in optimizing system design, complexity analysis, software design in Unix / Linux systems, performance, and application issues.
  • Experience in real-time streaming AI inference systems.
  • A history of working on distributed accelerated systems and solving sophisticated performance problems.
  • Deep hands-on experience with distributed systems based on Kubernetes.
  • Experience with on-prem and cloud systems and ability to work with partners across multiple teams.
  • Experience using, managing, and optimizing modern cloud and container-based enterprise computing architectures.
  • Strong verbal and written communication and teamwork skills.
  • Ability to multitask effectively in a multifaceted environment; action-driven, with strong analytical skills.

Ways to stand out from the crowd:

  • Background with real-time computer vision AI inference and / or analytics platforms.
  • Experience in application issues, algorithms, and data structures.
  • Understanding of the functioning of AI services, deep learning and AI.
  • Exposure to scheduling and resource management systems.
  • Knowledge of GPU programming (such as OpenCL or CUDA) and of multi-node GPU setups, GPU clusters, or cloud computing.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization.

The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services.

Our work opens new universes to explore, enables outstanding creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence. We are widely considered to be one of the technology world's most desirable employers.

We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and passionate about new technologies, we want you on our team!

The base salary range is 272,000 USD - 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
