Site Reliability Engineer

NVIDIA
Santa Clara, California, US
$160K-$247.3K a year
Full-time

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and outstanding people.

Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world.

Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work.

Come join the team and see how you can make a lasting impact on the world.

Do you have the right skills and experience for this role? Read on to find out, then submit your application.

We are looking for a Staff Site Reliability Engineer to join our team. You should have experience supporting and working with teams across the company to improve the usability, reliability, and performance of enterprise applications.

What You'll Be Doing

  • Design, develop, and evolve the Site Reliability Engineering practice.
  • Deploy and support tools from a systems engineering perspective, and troubleshoot issues in depth.
  • Help the SRE teams define technology and business strategies that deliver iterative enhancements to the tools and processes that improve availability, observability, and scalability.
  • Recognize, validate, and publish emerging technologies and architectures that align with business objectives.
  • Lead and build a proven foundation for the infrastructure and application lifecycle, covering installation, monitoring, observability, and user experience.
  • Build tooling that lowers the barrier to entry for engineering teams to plug in and enjoy the benefits of reliability.
  • Document institutional knowledge.
  • Build software to help operations and support teams.

What We Need To See

  • Bachelor’s and/or Master’s degree in computer science or a related field of study (or equivalent experience).
  • 8+ years of demonstrable experience deploying and supporting applications in a cloud environment.
  • Confluence, Jira, and Service Desk experience is a plus.
  • Excellent Windows and Linux system skills.
  • Good understanding of security components such as SSL, load balancers, and firewalls.
  • Extensive experience supporting applications in high-availability environments.
  • Scripting skills to automate repetitive and basic tasks.
  • Experience in documenting processes and procedures.
  • Strong interpersonal skills with the ability to understand and explain technical issues to a non-technical audience.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package.

As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com.

The base salary range is 160,000 USD - 247,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
