Site Reliability Engineer/Lead

Diamondpick
Cupertino, CA, United States
Full-time

Note: This is a W2 opportunity.

Primary Focus: This role involves close collaboration with the customer's tech leads and teams from development, infrastructure, cloud deployment, and DevOps. The SRE Tech Lead presents findings, works through challenges, and provides solutions for reliability improvements.

Required Skills:

o Communication & Collaboration: Strong ability to work directly with customer teams, understand their design challenges, and communicate complex technical concepts clearly.

o DevOps & Cloud: Hands-on experience in DevOps practices and cloud infrastructure (AWS, Kubernetes, Rancher, etc.), with the ability to recommend reliability solutions.

o Performance Tuning & Reliability: Strong knowledge of performance bottlenecks, capacity planning, and reliability engineering best practices.

o Problem Solving: Analytical skills to work through customer challenges and limitations, and to co-develop short-term and long-term solutions.

o Dev Engineering & Infra Design: Background in software development engineering and infrastructure design, enabling effective discussions with customer teams on the technical aspects.

o CI/CD Tools: Experience with Jenkins, ArgoCD, and other CI/CD tools to discuss automation and deployment pipelines with customers.

Educational Qualification:

o Bachelor’s or Master’s Degree in Computer Science, Information Systems, or related fields.

Years of Experience:

o 8-10 years of experience in Site Reliability Engineering, DevOps, or related fields, with a significant focus on cloud and infrastructure performance.

o Proven experience working in a customer-facing role where direct engagement and collaborative problem-solving are key.

3 days ago