Senior Site Reliability Engineer (SRE)

Collective Health
San Francisco, CA
Full-time

What you'll do:

  • Establish service level indicators and data-driven objectives, and develop SRE standards and processes to uphold and improve uptime, latency, and system health.
  • Define and execute initiatives to continuously improve our deployed cloud footprint in areas such as observability and monitoring, risk detection and mitigation, disaster recovery, and cost optimization.
  • Collaborate across engineering and other stakeholders to ensure that key stability and maintainability requirements are understood and maintained.
  • Create automation in areas such as monitoring, alerting, deployment, and others to enable scale and efficiency.
  • Be part of the SRE on-call rotation, including responsibility for incident response.
  • Implement best practices around incident management and root cause analysis while being part of on-call rotations.
  • Provide mentorship to junior site reliability engineers on best practices.

To be successful in this role, you'll need:

  • Bachelor's degree in Computer Science, Management Information Systems, or equivalent practical experience.
  • 4+ years of experience in site reliability engineering focused on maintaining production-grade cloud infrastructure.
  • Familiarity with a wide range of cloud-based infrastructure technologies, such as those used in container orchestration, data orchestration, business middleware, security, and governance. This includes AWS (S3, EC2, RDS, and more), Kubernetes, Docker, Kafka, Jenkins, and Grafana.
  • Demonstrated track record in effectively analyzing and troubleshooting large-scale distributed systems.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.

Pay Transparency Statement

This is a hybrid position based out of our offices: San Francisco, CA; Plano, TX; or Lehi, UT. Hybrid employees are expected to be in the office three days per week (Plano, TX) or two days per week (all other locations).

The actual pay rate offered within the range will depend on factors including geographic location, qualifications, experience, and internal equity.

In addition to the salary, you will be eligible for stock options and benefits such as health insurance, a 401(k), and paid time off.
