Senior Site Reliability Engineer

Ahold Delhaize USA • Quincy, MA, US
Job type: Full-time

Job description

Ahold Delhaize USA, a division of global food retailer Ahold Delhaize, is part of the U.S. family of brands, which includes five leading omnichannel grocery brands - Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop.

The Site Reliability Engineer (SRE) III is responsible for ensuring the scalability, reliability, and performance of production systems through automation, observability, incident response, and infrastructure engineering. This role involves designing and implementing robust operational processes and tooling to support highly available, fault-tolerant systems in a cloud-native environment.

The SRE III collaborates closely with engineering squads, product teams, and stakeholders to embed reliability best practices across the software delivery lifecycle. The role includes ownership of system uptime, service level objectives (SLOs), and operational excellence, along with mentoring junior engineers and leading cross-functional initiatives that improve system resilience.

Our flexible, hybrid work schedule includes 3 in-person days at our Chicago, IL; Quincy, MA; or Salisbury, NC office and 2 remote days.

Applicants must be currently authorized to work in the United States on a full-time basis.

Responsibilities

  • Design and implement infrastructure solutions that ensure system availability, scalability, and reliability across cloud-native environments like AKS and Kubernetes.
  • Develop automation for provisioning, deployment, configuration, monitoring, and incident remediation using tools such as Terraform, ArgoCD, and GitHub Actions.
  • Collaborate with engineering teams to define and track service level objectives (SLOs) and service level indicators (SLIs).
  • Build and manage microservices-based platforms leveraging Spring Boot, Java, Tomcat, and Redis.
  • Monitor production environments using Datadog and proactively address performance and reliability issues.
  • Perform root cause analysis and lead post-incident reviews to drive continual improvement.
  • Manage CI/CD pipelines and deployment automation using GitHub, Docker, and container orchestration technologies.
  • Create and maintain infrastructure as code (IaC) using Terraform, with deployment pipelines integrated into GitOps workflows.
  • Lead and support operational readiness reviews, game days, chaos engineering practices, and failure mode analysis.
  • Build scalable observability and alerting frameworks with Datadog.
  • Implement resilient, asynchronous architectures using Kafka for event-driven services.
  • Reduce operational toil through self-healing automation and proactive system tuning.
  • Troubleshoot Linux-based environments such as Ubuntu and optimize them for reliability.
  • Provide on-call support and ensure 24/7/365 system reliability for mission-critical applications.
  • Collaborate with the security team to enforce secure operational practices and cloud compliance.
  • Mentor junior engineers and contribute to documentation, technical design, and knowledge-sharing across the organization.

Qualifications

  • Bachelor's Degree in Computer Science, Information Systems, or a related technical field; equivalent training, certifications, or experience will be considered.
  • 5+ years of experience in a Site Reliability Engineering, DevOps, or Java programming role.
  • Experience managing production-grade systems and services on AKS/Kubernetes in distributed environments.
  • Proficiency in programming and scripting languages including Python, Java, Bash, or Go.
  • Proven experience with Spring Boot, Tomcat, Redis, and microservices architecture.
  • Hands-on experience in managing Linux environments, particularly Ubuntu.
  • Proficiency with observability stacks and performance monitoring using Datadog, Prometheus, and ELK.
  • Deep understanding of containerization and orchestration using Docker, Kubernetes, and ArgoCD.
  • Experience managing event-driven systems using Kafka.
  • Expertise in IaC and automation using Terraform and GitHub Actions.
  • Familiarity with networking concepts, DNS, load balancing, and cloud infrastructure (AWS, Azure, or GCP).
  • Strong analytical, debugging, and problem-solving skills.
  • Excellent verbal and written communication skills and the ability to collaborate effectively across teams.

Salary Range: $125,040 - $187,560

Ahold Delhaize USA is an equal opportunities employer.

