Talent.com
Senior Product Manager - Observability and Resilience

NVIDIA • Santa Clara, CA, US
Posted 30 days ago
Job type:
  • Full time
Job description

Product Manager For Resiliency And Observability

NVIDIA has become the platform upon which every new AI-powered application is built. From healthcare research applications to autonomous vehicles and voice-recognition systems, there is a need to simplify AI applications and workflows and make them predictable, and NVIDIA is right at the center of this revolution. Resiliency and observability are key to delivering customer value and an exhilarating customer experience. This product manager will lead the development of foundational tools that ensure the resiliency and observability of large-scale accelerated computing platforms. By creating essential tools for system diagnostics, performance monitoring, and automated recovery, they will enable customers to confidently operate both complex AI training and demanding inference workloads with maximum uptime and efficiency.

What You Will Be Doing:

  • Be a subject-matter expert on resiliency and observability. Deeply understand failure modes across the GPU hardware, network, and software stack, the telemetry signals that reveal them, and how they correlate with workload health and SLOs. Master modern reliability architectures and keep up to date with industry trends.
  • Build for everyone who will use the product. Drive joint project planning with external partners, defining concrete milestones, tasks, and deliverables for resiliency and observability initiatives.
  • Fuel innovation in reliability tooling. Lead ideation sessions to propose novel approaches and shape new proofs of concept.
  • Bridge development, SRE, and partner teams. Facilitate clear communication, triage emergent issues rapidly, and keep feedback loops between engineering and customer operations tight.
  • Coordinate execution across functions. Work with engineering, design, operations, sales, and marketing to embed resiliency and observability requirements into every product launch, capacity expansion, and lifecycle transition.

What We Need To See:

  • BS or MS in Computer Science, Computer Engineering, or a related field (or equivalent experience) and 12+ years of product-management experience in enterprise technology.
  • Experience with GPU observability (DCGM, NVML, etc.) and integration into large-scale telemetry systems.
  • Deep knowledge of AI / ML infrastructure, high-performance computing (HPC), networking, and cloud technologies (IaaS, PaaS) including containerization, Kubernetes, and automation tools.
  • Familiarity with modern observability stacks: metrics, logs, traces, OpenTelemetry, Prometheus/Grafana, and ELK/OpenSearch.
  • Experience building, and preferably a deep understanding of, secure, compliance-focused telemetry pipelines (SOC 2, FedRAMP).
  • Ability to articulate trade-offs among latency, throughput, cost, and reliability to both engineering and executive audiences.
  • Data-driven approach: defines SLIs/SLOs, manages error budgets, and develops value models.
  • Strong cross-functional execution: writes clear specs and PRDs, produces GTM collateral, and leads agile processes.

Ways To Stand Out From The Crowd:

  • Master's or PhD, or expertise in distributed systems, performance modeling, or fault-tolerant computing.
  • Experience with MLOps and LLMOps ecosystems and integrating with enterprise platforms; deployments at modern data-center scale; delivered ML / AI observability solutions for LLMOps, predictive incident detection, or anomaly classification.
  • Startup or 0-to-1 experience building cloud-native observability or resilience tools; proven success bringing open-source observability products to market and shaping GTM strategy.

  • Familiarity with MLOps toolchains and integrations with monitoring platforms such as Splunk, Datadog, and Grafana Cloud.
  • Expertise with containerization technologies like Docker and Kubernetes, plus virtualization. Proficiency in network architecture and high-performance interconnects (InfiniBand, Ethernet, RoCE).

We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our elite engineering teams are growing fast. NVIDIA is widely considered to be one of the industry's most desirable employers. NVIDIA is at the center of deep learning, artificial intelligence, and autonomous vehicles. If you're looking for a challenge, thrive in an ambiguous environment, and share our passion for technology, we want to hear from you. We are looking for great people to help us accelerate the next wave of artificial intelligence.

    Applications for this job will be accepted at least until August 21, 2025. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

