Staff Site Reliability Engineer - Observability

Hispanic Alliance for Career Enhancement • Boston, MA, United States
Job type: Full-time

Overview

At CVS Health, we’re building a world of health around every consumer and surrounding ourselves with dedicated colleagues who are passionate about transforming health care.

As the nation’s leading health solutions company, we reach millions of Americans through our local presence, digital channels and more than 300,000 purpose-driven colleagues – caring for people where, when and how they choose in a way that is uniquely more connected, more convenient and more compassionate. And we do it all with heart, each and every day.

Responsibilities

Position Summary

The PCW (Pharmacy & Consumer Wellness) Edge SRE team is seeking a Staff Site Reliability Engineer (SRE) with a primary focus on observability. This role will lead the design, implementation, and optimization of observability systems to ensure the reliability, performance, and scalability of our environments, with an emphasis on edge environments. You will collaborate with cross-functional teams to build robust monitoring, alerting, and telemetry solutions, enabling proactive issue detection and resolution across distributed systems. As a senior member of the SRE team, you will drive best practices, mentor others, and shape the strategic evolution of our observability ecosystem in a complex, edge-centric architecture.

Observability Strategy & Implementation:

Design and implement comprehensive observability solutions tailored for edge computing environments, including monitoring, logging, tracing, and metrics collection, to provide deep visibility into system performance and health across distributed remote facilities.

Define and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and business KPIs to measure and enhance system reliability in edge and centralized infrastructure.

Build and optimize dashboards, visualizations, and alerting systems to enable real-time insights and rapid incident response for edge nodes and remote facilities.

Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing environments.

System Reliability & Performance in Edge Computing:

Collaborate with engineering teams to ensure applications and infrastructure at edge locations are designed with observability in mind, incorporating best practices for instrumentation and monitoring in resource-constrained environments.

Drive proactive identification of issues in edge facilities through advanced observability tools, reducing Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) across distributed systems.

Lead incident postmortems, analyzing root causes specific to edge environments and implementing observability-driven improvements to prevent recurrence.

Tooling & Automation for Edge Environments:

Develop and maintain tools, scripts, and automation to enhance observability pipelines, optimizing for the unique challenges of edge computing, such as bandwidth limitations and intermittent connectivity.

Evaluate and integrate industry-standard observability tools (e.g., Prometheus, Grafana, ELK Stack, OpenTelemetry) and recommend solutions tailored for edge computing use cases.

Optimize observability data storage, retention, and querying to balance performance, cost, and scalability across a large number of remote facilities.

Leadership & Collaboration:

Mentor and guide junior SREs and engineers on observability best practices for edge computing, fostering a culture of reliability and proactive monitoring.

Partner with solution, engineering, and business teams to align observability efforts with business objectives, ensuring seamless operation of edge and centralized systems.

Lead cross-functional initiatives to improve observability, reliability, and operational efficiency across distributed edge infrastructure.

Continuous Improvement:

Stay current with emerging observability trends, tools, and methodologies, particularly those suited for edge computing and distributed systems, and advocate for their adoption.

Contribute to the development of observability standards, runbooks, and documentation tailored for edge environments to ensure consistency and scalability.

Drive cost optimization for observability infrastructure while maintaining high-quality monitoring and alerting capabilities across remote facilities.

Required Qualifications

7+ years of experience in Site Reliability Engineering, Observability Engineering, or a related field.

5+ years of experience with observability tools and platforms such as Prometheus, Grafana, Splunk, ELK, OpenTelemetry, or similar.

3+ years of experience with microservices, containerized environments (e.g., Kubernetes, Docker), and distributed systems, particularly in edge deployments.

Preferred Qualifications

Experience implementing AIOps.

Demonstrated ability to handle observability challenges in environments with intermittent connectivity, high latency, or geographically dispersed infrastructure.

Strong proficiency in programming/scripting languages (e.g., Python, Java) for automation and tooling in distributed environments.

Expertise working in edge computing environments with a large number of remote facilities, managing observability for distributed, high-latency, or resource-constrained systems.

Experience with OpenTelemetry or other open-source observability frameworks optimized for edge computing.

Familiarity with chaos engineering principles to validate observability systems in edge environments.

Certifications in cloud platforms (e.g., a Google Cloud Professional certification) or Kubernetes.

Strong problem-solving skills with a proactive, analytical mindset, particularly for addressing edge computing challenges.

Excellent communication and collaboration skills to work effectively with cross-functional teams across centralized and remote locations.

Ability to mentor and lead technical initiatives with a focus on observability and reliability in edge environments.

Comfortable working in a fast-paced, dynamic environment with a focus on delivering customer value.

Knowledge of incident management processes and tools (e.g., ServiceNow, xMatters, Opsgenie) tailored for distributed systems.

Deep understanding of monitoring, logging, and tracing concepts, including metrics collection, log aggregation, and distributed tracing for edge and centralized systems.

Familiarity with cloud infrastructure, CI/CD pipelines, and edge-specific deployment patterns.

Education

Bachelor’s degree, or equivalent experience (high school diploma plus 4 years of relevant experience)

Business Overview

Bring your heart to CVS Health

Every one of us at CVS Health shares a single, clear purpose: Bringing our heart to every moment of your health. This purpose guides our commitment to deliver enhanced human-centric health care for a rapidly changing world. Anchored in our brand — with heart at its center — our purpose sends a personal message that how we deliver our services is just as important as what we deliver. Our Heart At Work Behaviors support this purpose. We want everyone who works at CVS Health to feel empowered by the role they play in transforming our culture and accelerating our ability to innovate and deliver solutions to make health care more personal, convenient and affordable. We strive to promote and sustain a culture of diversity, inclusion and belonging every day. CVS Health is an affirmative action employer, and is an equal opportunity employer, as are the physician-owned businesses for which CVS Health provides management services. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law. We proudly support and encourage people with military experience (active, veterans, reservists and National Guard) as well as military spouses to apply for CVS Health job opportunities.

Pay Range

The typical pay range for this role is:

$118,450.00 - $284,280.00

This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Benefits & Additional Information

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people

We take pride in our comprehensive and competitive mix of pay and benefits – investing in the physical, emotional and financial wellness of our colleagues and their families to help them be the healthiest they can be. In addition to our competitive wages, our great benefits include:

Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan.

No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching.

Benefit solutions that address the different needs and preferences of our colleagues including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility.

For more information, please refer to the benefits section of the CVS Health careers site.

We anticipate the application window for this opening will close on: 10/23/2025

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.

We are an equal opportunity and affirmative action employer. We do not discriminate in recruiting, hiring, promotion, or any other personnel action based on race, ethnicity, color, national origin, sex/gender, sexual orientation, gender identity or expression, religion, age, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.
