Site Reliability Engineer

Altimetrik
Mountain View, CA, United States
Full-time

Design, implement, and maintain complex data systems that support millions of customers, applying cloud-native principles and best practices to keep database systems highly available, secure, performant, and scalable

  • Build and maintain CI/CD pipelines in Jenkins
  • Build and deploy services in a Kubernetes cluster using Helm, Kustomize, etc.
  • Contribute to infrastructure changes in AWS, drawing on a deep understanding of AWS services
  • Participate in on-call for pre-production and production systems supporting millions of users
  • Write and review RCA docs to prevent incidents from recurring, and share the lessons learned
  • Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes
  • Create operational playbooks, contribute to how-to articles, and gain domain knowledge to drive changes in the team
  • Participate in and contribute to FMEA / chaos testing, security remediations, etc.
  • Share best practices and patterns for operational excellence and cost optimization
  • Reduce or eliminate manual steps by automating as much as possible
  • Continuously look for opportunities to increase developer velocity and productivity

Qualifications:

  • Bachelor’s or master’s degree in computer science or a related technical field. Equivalent experience will be considered
  • 4+ years of hands-on development & operational experience with building and maintaining infrastructure in AWS
  • Extensive performance monitoring, troubleshooting & tuning experience
  • Experience with AWS services and hands-on knowledge of hosting on Cloud
  • Experience with scripting languages for DevOps automation
  • Experience with one of the following programming languages: Java, Python, or Ruby
  • Knowledge of Docker, Kubernetes, and ArgoCD
  • Experience with monitoring and observability using Splunk, Wavefront, AppDynamics, Prometheus, tracing, etc.

Posted 1 day ago