Mgr, Site Reliability Engineering

NetApp
SAN JOSE, California, United States
$180.2K-$250.3K a year
Full-time

About NetApp

NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer.

No matter the data type, workload or environment, we help our customers identify and realize new business possibilities.

And it all starts with our people.

If this sounds like something you want to be part of, NetApp is the place for you. You can help bring new ideas to life, approaching each challenge with fresh eyes.

We embrace diversity and openness because it's in our DNA. Of course, you won't be doing it alone. At NetApp, we're all about asking for help when we need it, collaborating with others, and partnering across the organization - and beyond.

"At NetApp, we fully embrace and advance a diverse, inclusive global workforce with a culture of belonging that leverages the backgrounds and perspectives of all employees, customers, partners, and communities to foster a higher performing organization." - George Kurian, CEO

Job Summary

The Site Reliability Engineering (SRE) Manager will lead a dynamic team responsible for ensuring our critical systems' reliability, performance, and efficiency.

This role involves a strategic blend of engineering and operations and requires a strong background in software development, systems engineering, and leadership.

This is a pivotal role in our operations, demanding a dedicated individual who excels in a fast-paced and collaborative environment.

We invite you to apply if you are driven by system reliability and ready to lead a high-performing team.

Job Responsibilities

  • Lead and mentor a team of SREs, fostering a culture of continuous improvement and innovation.
  • Collaborate with product and engineering teams to design and implement scalable solutions.
  • Develop and maintain a reliable monitoring and alerting system to detect and mitigate issues proactively.
  • Drive incident management processes and conduct post-mortem analyses to prevent future outages.
  • Manage priorities, projects, and the overall workflow of the SRE team.
  • Ensure compliance with security best practices and company policies.
  • Stay ahead of industry trends and emerging technologies to continuously improve system reliability and performance.

Job Requirements

  • Minimum of 8 years of experience in SRE, DevOps, or similar roles, with at least 2 years in a leadership position with direct reports.
  • Experience leading geographically dispersed teams.
  • Proficiency in programming languages such as Python, Go, or Java.
  • Extensive experience with cloud services (AWS, GCP, Azure) and container orchestration tools (Kubernetes, Docker).
  • Solid understanding of CI/CD pipelines and automation tools (Jenkins, Ansible, Terraform).
  • Exceptional knowledge of observability tools and experience setting up architecture for proactive monitoring of the product.
  • Proven track record of designing and implementing scalable, high-availability systems.
  • Exceptional problem-solving skills and the ability to work under pressure.
  • Excellent communication and team-building skills.

Education

Bachelor’s degree in Computer Science, Engineering, or a related field; Master’s preferred.

Compensation

The base salary range for this position is $180,200 to $250,300 and will be determined by the candidate's location, qualifications, experience, and education.

Final compensation packages are competitive and in line with industry standards, reflecting a variety of factors, and include a comprehensive benefits package.

This may cover Health Insurance, Life Insurance, Retirement or Pension Plans, Paid Time Off (PTO), various Leave options, Performance-Based Incentives, an employee stock purchase plan, and/or restricted stock units (RSUs), with all offerings subject to regional variations and governed by local laws, regulations, and company policies.

Benefits may vary by country and region, and further details will be provided as part of the recruitment process.

Equal Opportunity Employer

NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state and local laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, protected veteran status, and any other protected classification.

Did you know...

Statistics show women apply to jobs only when they're 100% qualified. But no one is 100% qualified. We encourage you to shift the trend and apply anyway! We look forward to hearing from you.

Why NetApp?

We are all about helping customers turn challenges into business opportunity. It starts with bringing new thinking to age-old problems, like how to use data most effectively to run better - but also to innovate.

We tailor our approach to the customer's unique needs with a combination of fresh thinking and proven approaches.

We enable a healthy work-life balance. Our volunteer time off program is best in class, offering employees 40 hours of paid time per year to volunteer with their favorite organizations.

We provide comprehensive medical, dental, wellness, and vision plans for you and your family. We offer educational assistance, legal services, and access to discounts.

Finally, we provide financial savings programs to help you plan for your future.

If you want to help us build knowledge and solve big problems, let's talk.
