Site Reliability Engineer

Retool
San Francisco, CA, United States
Full-time

WHY WE'RE LOOKING FOR YOU

As our first Site Reliability Engineer, you will be instrumental in defining and shaping the processes and practices for a pivotal new business offering.

You will play a crucial role in ensuring the reliability, scalability, and performance of our services while collaborating closely with our product and GTM teams.

This is a unique opportunity to significantly impact the direction and success of a key initiative within our company.

Reducing friction in deploying Retool is one of the largest levers for us to grow efficiently as a business. You’ll be figuring out how to productize a scalable deployment solution that is both effective and delightful for our customers.

This role requires a blend of deep technical expertise in site reliability engineering and a keen product sense to create solutions that not only perform well but also provide an exceptional developer experience.

IN THIS ROLE YOU'LL

  • Infrastructure Management: Design, implement, and manage scalable and resilient infrastructure using AWS, Kubernetes, and Terraform.
  • Process Shaping: Define and implement processes and practices that will support our new business offering, ensuring they are robust, scalable, and aligned with industry best practices.
  • Automation: Automate deployment and maintenance tasks to improve efficiency and scalability of this offering.
  • Documentation & Knowledge Sharing: Create and maintain comprehensive documentation for systems, processes, and procedures. Mentor and guide other team members on best practices.
  • Monitoring & Alerting: Leverage existing observability systems to build new products that ensure the health and performance of our services.

THE SKILLSET YOU'LL BRING

  • Technical Expertise:
      • Strong experience with AWS and Kubernetes.
      • Proficiency in managing PostgreSQL databases.
      • Extensive experience with infrastructure as code (IaC) using Terraform.
  • Operational Experience:
      • Previous experience in a similar SRE or DevOps role, ideally within a SaaS environment.
      • Strong background in monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Datadog).
  • Programming Skills:
      • Proficiency in one or more programming languages (e.g., Python, Go, Java).
  • Problem-Solving Skills:
      • Excellent problem-solving skills and the ability to troubleshoot complex issues.
  • Collaboration & Communication:
      • Strong interpersonal and communication skills, with the ability to work effectively in a team-oriented environment.

NICE TO HAVE

  • Experience with CI/CD pipelines and tools (e.g., Buildkite, GitLab CI).
  • Knowledge of security best practices and tools.

12 days ago