Talent.com
Engineering Manager - AI DevOps

NVIDIA • Santa Clara, CA, US
Job type: Full-time

AI DevOps Engineering Manager

NVIDIA is looking for an outstanding AI DevOps Engineering Manager to lead and expand our next-gen inference operations infrastructure. Join us in transforming AI inference delivery, supporting NVIDIA's innovative products like Dynamo, Triton, NIXL, and our quickly growing range of AI inference solutions. This role is essential for our GitHub First initiative, enabling public CI/CD infrastructure with GPU and Kubernetes capabilities to deliver high-throughput, low-latency inferencing solutions in distributed environments. Lead a team that ensures our AI products achieve outstanding performance and reliability worldwide.

What You'll Be Doing

  • Supervise a team of DevOps engineers with expertise in AI inference infrastructure, test automation (SDET), and Infrastructure as Code (IaC)
  • Architect and implement scalable test automation strategies for AI inference workloads, including performance benchmarking and automated quality gates
  • Lead the maintenance of our GitHub First public CI infrastructure, focusing on single- and multi-GPU testing, Kubernetes multi-node GPU testing, and CSP validation
  • Drive Infrastructure as Code efforts using Terraform, Ansible, and Kubernetes to support scaling across multiple clouds and manage GPU clusters effectively
  • Achieve operational excellence through 24x7 on-call rotations, SRE practices, automated monitoring, and self-healing systems to maintain uptime above 99.9%
  • Lead release coordination, cost optimization, and management of multi-cloud deployments

What We Need To See

  • Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience
  • 4+ years leading DevOps/SRE organizations with direct SDET leadership experience
  • 8+ years hands-on experience in software development, test automation, or infrastructure engineering with AI/ML or GPU-intensive workloads
  • Proficiency in Infrastructure as Code (IaC) platforms: Terraform, Ansible, or CloudFormation, with exposure to multiple cloud environments (AWS, GCP, Azure, OCI)
  • Strong technical leadership in test automation frameworks, CI/CD pipeline development, and quality engineering practices
  • Familiarity with containerization and orchestration tools such as Docker and Kubernetes for managing AI/ML workloads and GPU resources
  • Proven success building and scaling teams in fast-paced, high-growth environments
  • Effective interpersonal skills to collaborate with remote teams and build consensus
  • Proficiency in Python, Rust, or related programming languages, along with the ability to engage in architecture discussions
  • Demonstrated history of operational excellence, including 24x7 on-call oversight, SRE practices, and robust high-availability infrastructure

Ways To Stand Out From The Crowd

  • Experience with CI/CD (specifically GitHub Actions) and releasing open-source AI software
  • Deep AI/ML infrastructure proficiency, with expertise in NVIDIA technologies such as CUDA, TensorRT, Dynamo, and Triton Inference Server, including coordinating GPU cluster operations and benchmarking GPU workload performance
  • Background in DevOps and system software testing, with previous experience leading teams on inference engines, model serving platforms, or AI acceleration frameworks
  • Track record with monitoring tools (Prometheus, Grafana), security scanning, static and dynamic analysis tools, and license compliance automation for critical AI inferencing frameworks

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD for Level 3, and 272,000 USD - 425,500 USD for Level 4. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until September 29, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

