Principal Site Reliability Engineer - Hybrid

Charles Schwab
South Shore, Illinois, United States
Full-time
We are sorry. The job offer you are looking for is no longer available.

Position Type: Regular

Your opportunity

At Schwab, you are empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us challenge the status quo and transform the finance industry together.

As a Principal Site Reliability Engineer for Schwab's Technology Solutions organization, you will be responsible for building a purposeful, proactive, and sustainable approach to reliability on a foundation of SRE principles.

You will partner with multiple support teams, architects, developers, and other stakeholders to develop common tools and guidance and drive adoption of key reliability engineering practices in support of large-scale and mission-critical services.

Through your deep SRE knowledge and history of implementation, you will have open, candid conversations with senior leaders and engineers and play a pivotal role in establishing a foundational SRE practice at Schwab.

The SRE Enablement team is part of Schwab’s Strategy & Business Management (SBM) organization, where you will work with your counterparts to help implement the SRE practice through evangelization, organizational engagement, resiliency programs, office hours, leadership forums, and an engineering-focused Community of Practice.

What you have

Required Qualifications

5+ years in an SRE role, plus at least 3 years in an architect or technical leadership role.

3+ years of experience designing and implementing highly scalable and fault-tolerant systems.

In-depth knowledge of resilience patterns (e.g., circuit breakers, timeouts, retries) and how to design and implement them.

In-depth knowledge of CI/CD processes and tools to ensure software is delivered safely using known deployment strategies (e.g., blue/green, canary deployments, feature toggles).

3+ years of hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk), with a proven track record of setting up dashboards and alerts.

Developed at least 5 scripts or tools that reduced repetitive operational toil.

Authored technical postmortems (at least weekly) with root cause analyses and documented action items that resulted in measurable resiliency improvements.

Authored and maintained comprehensive SRE documentation for at least 3 critical systems or workflows, including incident response guides, runbooks, operational playbooks, SLO implementation, and observability.

Contributed to the SLO strategy for at least 5 teams, ensuring alignment with business and client objectives.

Presented findings or led training sessions at least twice annually to share SRE practices, enhancing team performance or adoption rates for reliability engineering methods.

Managed or mentored at least 2 junior engineers or teams in SRE best practices, with improvements in incident resolution speed and reliability metrics.

Led or participated in at least three cross-functional SRE-focused initiatives that included key stakeholders from both technical and business units.

Participated in resilience or chaos engineering exercises at least yearly, with documentation showing a reduction in unplanned downtime.

Preferred Qualifications

Evangelize SRE mindset and practices across the Schwab Technology Solutions organization.

Partner with support, development, and business stakeholders to develop, measure, and leverage service level objectives.

Design and develop solutions to eliminate toil and manual effort from day-to-day support responsibilities.

Identify and implement improvements to logging, metrics, and tracing telemetry and triaging capabilities across a diverse technology stack.

Lead complex triage and postmortem activities for critical issues and drive prioritization / resolution of remediation items.

Perform chaos engineering experiments to improve application resilience to known and unknown failures.

Document reliability guidance and best practices. Advocate for and drive adoption of said practices.

Foster a culture of learning through coaching, mentoring, and knowledge sharing around reliability practices, processes, and tools.

Develop tools, frameworks, and instrumentation to validate and increase release success for applications.

What’s in it for you

At Schwab, we’re committed to empowering our employees’ personal and professional success.

Our purpose-driven, supportive culture and focus on your development mean you’ll get the tools you need to make a positive difference in the finance industry.

Our Hybrid Work and Flexibility approach balances our ongoing commitment to workplace flexibility, serving our clients, and our strong belief in the value of being together in person on a regular basis.

We offer a competitive benefits package that takes care of the whole you, both today and in the future:

401(k) with company match and Employee Stock Purchase Plan
Paid time for vacation, volunteering, and 28-day sabbatical after every 5 years of service for eligible positions
Paid parental leave and family building benefits
Tuition reimbursement
Health, dental, and vision insurance
