Talent.com
Senior Manager, Site Reliability Engineering

Western Governors University, Salt Lake City, UT, United States
Posted 30 days ago
Job type
  • Full-time
  • Permanent

If you're passionate about building a better future for individuals, communities, and our country, and you're committed to working hard to play your part in building that future, consider WGU as the next step in your career.

Driven by a mission to expand access to higher education through online, competency-based degree programs, WGU is also committed to being a great place to work for a diverse workforce of student-focused professionals. The university has pioneered a new way to learn in the 21st century, one that has received praise from academic, industry, government, and media leaders. Whatever your role, working for WGU gives you a part to play in helping students graduate, creating a better tomorrow for themselves and their families.

The salary range for this position takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs.

At WGU, it is not typical for an individual to be hired at or near the top of the range for their position, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is:

Grade: Management Technical 715

Pay Range: $170,400.00 - $281,200.00

Job Description

The Senior Manager of Site Reliability Engineering (SRE) leads the function responsible for ensuring that critical systems and services are reliable, scalable, and resilient. The role combines technical leadership with organizational management, directing SRE teams in designing, implementing, and operating infrastructure that supports business needs. This position defines service reliability standards, drives incident response practices, oversees automation initiatives, and partners with other engineering and product teams to balance reliability with delivery velocity. This position's main objective is to improve reliability, performance, and operational efficiency to ensure our students and faculty are delighted with the fully online educational experience.

Primary Responsibilities

  • Leads and mentors SRE teams, creating an environment that encourages ownership, collaboration, and continuous improvement.
  • Establishes the SRE vision, goals, and operational strategies in alignment with organizational objectives.
  • Defines reliability roadmaps and communicates priorities to engineering and executive stakeholders.
  • Develops, drives, and supports Service Level Objectives (SLOs), Indicators (SLIs), and Agreements (SLAs) across systems.
  • Directs incident management processes, including response coordination, root cause analysis, and follow-up actions.
  • Implements practices that reduce downtime and ensure systems meet availability, scalability, and performance expectations.
  • Drives adoption of infrastructure as code, CI/CD pipelines, and automated testing to improve operational efficiency.
  • Oversees monitoring, alerting, and observability systems that provide insight into service health.
  • Evaluates and implements emerging tools that enhance service reliability and reduce manual toil.
  • Collects and evaluates system and application data to proactively improve the performance and reliability of the environment.
  • Partners with software engineering, security, and product teams to integrate reliability into all development lifecycle phases.
  • Provides senior leadership and other stakeholders with transparent reporting on reliability trends, risks, and improvement initiatives.
  • Fosters a culture of blameless postmortems and shared accountability for uptime and performance.
  • Promotes best practices for resilience, scalability, and disaster recovery.
  • Regularly assesses and improves reliability processes and team workflows.
  • Stays informed of evolving technologies and practices in SRE, DevOps, AI, Machine Learning, and cloud infrastructure.
  • Performs other related duties as assigned.

This job description includes a general representation of job requirements rather than a comprehensive inventory of all required responsibilities or work activities. The contents of this document or related job requirements may change at any time with or without notice.

Qualifications

Knowledge, Skills, and Abilities

  • Strong understanding of distributed systems, cloud-native architectures, and infrastructure design.
  • Deep familiarity with cloud service providers (AWS, GCP, Azure) and their reliability and security best practices.
  • Knowledge of software development lifecycles, DevOps principles, and SRE practices such as SLOs, SLIs, and error budgets.
  • Understanding of networking, storage, and systems performance concepts.
  • Knowledge of compliance, data security, and regulatory requirements relevant to system reliability and operations.

Skills

  • Technical proficiency in infrastructure as code, automation frameworks, and modern programming/scripting languages (Python, Go, Bash, etc.).
  • Expertise in monitoring, logging, and observability platforms (Prometheus, Grafana, Datadog, Splunk, etc.).
  • Skilled in incident management, root cause analysis, and postmortem processes.
  • Strong leadership and people management skills, with experience developing and scaling technical teams.
  • Effective communication skills, including the ability to explain technical concepts to both engineers and executives.
  • Strong problem-solving, prioritization, and decision-making skills under pressure.

Abilities

  • Ability to balance short-term operational needs with long-term reliability and scalability goals.
  • Ability to foster a culture of reliability, accountability, and continuous improvement within technical teams.
  • Ability to collaborate across engineering, product, and business teams to align reliability efforts with strategic goals.
  • Ability to anticipate system weaknesses and proactively design resilience into infrastructure and applications.
  • Ability to lead through influence, driving adoption of SRE practices across the organization.
  • Ability to adapt to evolving technologies, industry practices, and organizational needs.

Education

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field, or equivalent professional experience.

Experience

  • 8+ years of experience in software engineering or development, with some knowledge of SRE.
  • 3+ years of experience managing or leading technical teams, preferably in a reliability or infrastructure-focused capacity.
  • Proven track record of delivering reliable, scalable systems in complex environments.
  • Strong expertise with cloud platforms such as AWS, GCP, or Azure.
  • Hands-on experience with Kubernetes, container orchestration, and microservices architectures.
  • Proficiency with infrastructure as code and automation tools (Terraform, Ansible, Pulumi, etc.).
  • Solid programming or scripting ability in Python, Go, Java, JavaScript, and/or Bash.
  • Deep understanding of monitoring, logging, and observability systems (e.g., New Relic, Grafana, Datadog, Splunk, Dynatrace).
  • Experience implementing and managing SLOs, SLIs, and SLAs to measure and improve service reliability.

Leadership Qualifications

  • Demonstrated ability to build, mentor, and lead high-performing engineering teams.
  • Strong communication skills with the ability to engage technical teams and executive leadership.
  • Ability to balance immediate operational demands with long-term reliability strategy.
  • Experience fostering a blameless culture of incident management and continuous improvement.
  • Strategic mindset with the ability to align technical priorities to business goals.

Pay Range: $170,400.00 - $281,200.00

Experience in lieu of education

An equivalent combination of training, experience, credentials, or accomplishments demonstrating the ability to perform the essential functions of this job may substitute for education degree requirements.

Position & Application Details

Full-Time Regular Positions (classified as regular and working 40 standard weekly hours): This is a full-time, regular position (classified for 40 standard weekly hours) that is eligible for bonuses; medical, dental, vision, telehealth and mental healthcare; health savings account and flexible spending account; basic and voluntary life insurance; disability coverage; accident, critical illness and hospital indemnity supplemental coverages; legal and identity theft coverage; retirement savings plan; wellbeing program; discounted WGU tuition; and flexible paid time off for rest and relaxation with no need for accrual, flexible paid sick time with no need for accrual, 11 paid holidays, and other paid leaves, including up to 12 weeks of parental leave.

How to Apply: If interested, an application will need to be submitted online. Internal WGU employees will need to apply through the internal job board in Workday.

Additional Information

Disclaimer: The job posting highlights the most critical responsibilities and requirements of the job. It's not all-inclusive.

Accommodations: Applicants with disabilities who require assistance or accommodation during the application or interview process should contact our Talent Acquisition team at recruiting@wgu.edu.

Equal Employment Opportunity: All qualified applicants will receive consideration for employment without regard to any protected characteristic as required by law.
