Senior Site Reliability Engineer - Platform Infrastructure

Klaviyo
Boston, MA
$156.8K-$235.2K a year
Full-time

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day.

We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements.

If you’re a close but not exact match with the description, we hope you’ll still consider applying. Want to learn more about life at Klaviyo?

Visit to see how we empower creators to own their own destiny.

Engineers come to Klaviyo with experience in a variety of languages and from a number of disciplines. All engineers are expected to become extremely proficient in the technologies we use (not exhaustive):

  • Python, Django
  • MySQL, Cassandra, RabbitMQ, Redis, Pulsar
  • Amazon Web Services (EC2, RDS, Aurora, etc.), Kubernetes on EKS

The SRE team builds foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably.

SREs are team players who embed themselves within product teams as needed to advance the architecture and performance of software systems and train their peers in topics such as debugging distributed systems, building self-healing applications and eking out every drop of performance possible.

Internally, we call this role Senior Site Reliability Engineer on the Platform Infrastructure team. As a Senior Site Reliability Engineer you will own multiple foundational Klaviyo services and make a big impact on the productivity of our product engineering teams.

Mission and Vision of the Platform Infrastructure SRE Team

Vision: Offer a programmatically accessible catalog of durable, reliable, and easy to use and maintain infrastructure components enabling quality product delivery and maintenance.

Mission: Provide self-service tooling that enables use of our infrastructure components in a consolidated, consistent fashion via API-bound schemas and automation.

What You'll be Working With

  • New Kubernetes infrastructure with ArgoCD - in the testing / iterating phase of the project, lots of teams to onboard, lots to learn and build out
  • Abstractions to create ease of use for engineering teams
  • Lots of collaboration opportunities both within and outside of SRE
  • EC2-based infrastructure tooling - just starting to design and implement this, want to make it on par with what we do for Kubernetes

How You'll Make a Difference

  • Ship foundational services to enable Klaviyo engineering to move faster with confidence
  • Design and develop systems and processes that enable highly available & scalable systems
  • Design, build and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo’s services
  • Achieve breakthroughs in systems throughput by identifying and eliminating bottlenecks
  • Leverage technology such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, RabbitMQ, Redis, Cassandra, and PostgreSQL to advance Klaviyo’s platform
  • Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design review
  • Contribute to the company as a subject matter expert in multiple areas, constantly pushing yourself to be a better engineer and to level up all of your peers within your team and within Klaviyo
  • Mentor and pair with other Klaviyo engineers to build better software by focusing on performance, self-healing systems, configuration as code, defensive programming, application security, etc.

  • Participate in periodic on call duties with a focus on solving issues when they are discovered, preventing recurrences and minimizing alert fatigue
  • Prototype and advocate for architectural improvements to achieve breakthrough results in Klaviyo systems’ operational scalability and reliability
  • Work hand-in-hand with product-facing engineers to ship impactful code
  • Perform quantitative investigation to understand and scale Klaviyo systems and manage the cross-functional effort to resolve scalability issues
  • Produce and advocate for preventative, upstream solutions with internal stakeholders and external vendors and dependencies
  • Confidently make informed, data-driven choices in a fast-paced environment with competing priorities

Who You Are

  • Knowledge of Linux operating systems and computer networking
  • Experience writing code in a programming language such as Python, Ruby, Go, etc.
  • Experience administering cloud-based infrastructure (e.g. AWS)
  • Ability to troubleshoot production issues related to computer infrastructure, configuration, monitoring, deployments, and continuous integration and delivery
  • Ability and willingness to learn
  • Ability to communicate clearly and mentor and coach others on a team
  • Ability to participate in an on-call rotation

The pay range for this role is listed below. Sales roles are also eligible for variable compensation and hourly non-exempt roles are eligible for overtime in accordance with applicable law.

This role is eligible for benefits, including: medical, dental and vision coverage, health savings accounts, flexible spending accounts, 401(k), flexible paid time off and company-paid holidays and a culture of learning that includes a learning allowance and access to a professional coaching service for all employees.

Base Pay Range For US Locations: $156,800 - $235,200 USD

Get to Know Klaviyo

We’re Klaviyo (pronounced clay-vee-oh). We empower creators to own their destiny by making first-party data accessible and actionable like never before.

We see limitless potential for the technology we’re developing to nurture personalized experiences in ecommerce and beyond.

To reach our goals, we need our own crew of remarkable creators, ambitious and collaborative teammates who stay focused on our north star: delighting our customers.

If you’re ready to do the best work of your career, where you’ll be welcomed as your whole self from day one and supported with generous benefits, we hope you’ll join us.

Klaviyo is committed to a policy of equal opportunity and non-discrimination. We do not discriminate on the basis of race, ethnicity, citizenship, national origin, color, religion or religious creed, age, sex (including pregnancy), gender identity, sexual orientation, physical or mental disability, veteran or active military status, marital status, criminal record, genetics, retaliation, sexual harassment or any other characteristic protected by applicable law.

IMPORTANT NOTICE: Our company takes the security and privacy of job applicants very seriously. We will never ask for payment, bank details, or personal financial information as part of the application process.

All our legitimate job postings can be found on our official career site. Please be cautious of job offers that come from non-company email addresses (@klaviyo), instant messaging platforms, or unsolicited calls.

You can find our Job Applicant Privacy Notice .

30+ days ago