Site Reliability Engineer

Altimetrik
Mountain View, CA, United States
Full-time

Design, implement, and maintain complex data systems that support millions of customers, applying cloud-native principles and best practices to keep database systems highly available, secure, performant, and scalable.

  • Build and maintain CI/CD pipelines in Jenkins
  • Build and deploy services to Kubernetes clusters using Helm, Kustomize, and similar tools
  • Contribute infrastructure changes in AWS, drawing on a deep understanding of AWS services
  • Participate in on-call rotations for pre-production and production systems that support millions of users
  • Write and review RCA docs to prevent incidents from recurring, and share the lessons learned
  • Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes
  • Create operational playbooks, contribute to how-to articles, and gain domain knowledge to drive changes in the team
  • Participate in and contribute to FMEA and chaos testing, security remediations, and similar efforts
  • Share best practices and patterns for operational excellence and cost optimization
  • Reduce or eliminate manual steps by automating as much as possible
  • Continuously look for opportunities to increase developer velocity and productivity

Qualifications:

  • Bachelor’s or master’s degree in computer science or a related technical field. Equivalent experience will be considered
  • 4+ years of hands-on development and operational experience building and maintaining infrastructure in AWS
  • Extensive experience with performance monitoring, troubleshooting, and tuning
  • Experience with AWS services and hands-on knowledge of hosting in the cloud
  • Experience with scripting languages for DevOps automation
  • Experience with one of the following programming languages: Java, Python, or Ruby
  • Knowledge of Docker, Kubernetes, and ArgoCD
  • Experience with monitoring and observability using Splunk, Wavefront, AppDynamics, Prometheus, tracing, etc.