Lead Site Reliability Engineer - Platform Infrastructure

Klaviyo
Boston, Massachusetts, US
$192K-$288K a year
Full-time

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day.

We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements.

If you're a close but not exact match with the description, we hope you'll still consider applying. Want to learn more about life at Klaviyo?

Visit to see how we empower creators to own their own destiny.

Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem.

The mission of the Site Reliability Engineering team is to ensure uninterrupted service for Klaviyo customers and to act as a force multiplier for Klaviyo product teams so they can deliver better software. Building self-healing applications and eking out every drop of performance, you will own foundational Klaviyo services and make a big impact on the productivity of our product engineering teams.

Klaviyo is growing fast and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at

How You'll Make a Difference

  • Ship foundational services to enable Klaviyo engineering to move faster with confidence
  • Design and develop systems and processes that enable highly available & scalable systems
  • Achieve breakthroughs in systems throughput by identifying and eliminating bottlenecks
  • Leverage technology such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, Redis, Cassandra, and PostgreSQL to advance Klaviyo's platform
  • Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design review
  • Contribute to the company in multiple areas, constantly pushing yourself to be a better engineer and to level up your peers within your team and across Klaviyo
  • Design, write and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo's services
  • Participate in periodic on-call duties with a focus on solving issues when they are discovered, preventing recurrences and minimizing alert fatigue
  • Implement architectural improvements to achieve breakthrough results in the operational scalability and reliability of Klaviyo systems
  • Work hand-in-hand with product-facing engineers and other SREs to ship impactful code
  • Perform quantitative analysis to understand and scale Klaviyo systems
  • Uncover and advocate for preventative, upstream solutions with internal stakeholders
  • Evangelize Site Reliability best practices across the engineering organization

Who You Are

  • Solid 10+ years of experience in the SRE / DevOps field
  • BA or BS Degree in Computer Science, related field, or equivalent experience
  • Ability to stay composed in outage situations and to drive failures through root cause analysis to prevention of future issues
  • Understanding of Linux (we run Ubuntu) and all layers of the networking stack
  • Experience working on an engineering team building software
  • Experience writing code using best practices in a language such as Python, Ruby, Go, etc.

The pay range for this role is listed below. Sales roles are also eligible for variable compensation and hourly non-exempt roles are eligible for overtime in accordance with applicable law.

This role is eligible for benefits, including: medical, dental and vision coverage, health savings accounts, flexible spending accounts, 401(k), flexible paid time off and company-paid holidays, and a culture of learning that includes a learning allowance and access to a professional coaching service for all employees.

Base Pay Range For US Locations:

$192,000 - $288,000 USD

Get to Know Klaviyo

We're Klaviyo (pronounced clay-vee-oh). We empower creators to own their destiny by making first-party data accessible and actionable like never before.

We see limitless potential for the technology we're developing to nurture personalized experiences in ecommerce and beyond.

To reach our goals, we need our own crew of remarkable creators, ambitious and collaborative teammates who stay focused on our north star: delighting our customers.

If you're ready to do the best work of your career, where you'll be welcomed as your whole self from day one and supported with generous benefits, we hope you'll join us.

Klaviyo is committed to a policy of equal opportunity and non-discrimination. We do not discriminate on the basis of race, ethnicity, citizenship, national origin, color, religion or religious creed, age, sex (including pregnancy), gender identity, sexual orientation, physical or mental disability, veteran or active military status, marital status, criminal record, genetics, retaliation, sexual harassment or any other characteristic protected by applicable law.

IMPORTANT NOTICE: Our company takes the security and privacy of job applicants very seriously. We will never ask for payment, bank details, or personal financial information as part of the application process.

All our legitimate job postings can be found on our official career site. Please be cautious of job offers that come from non-company email addresses (@), instant messaging platforms, or unsolicited calls.

You can find our Job Applicant Privacy Notice here.

30+ days ago