Site Reliability Engineer - Senior (NE)

Ursus
San Diego, CA
Full-time

Description

  • Hands-on application management and support for AWS cloud environments, including full-stack diagnosis, fault resolution and root cause analysis.
  • Proactively monitor production systems and identify issues before they impact service.
  • Drive and implement monitoring tools / metrics / reports for tracking application / service performance.
  • Collaborate with engineering and system teams to drive changes and ensure optimal application performance and resiliency.
  • Lead service and system performance analysis, service capacity planning, and service continuity validation for multiple applications.
  • Identify areas for process automation, and develop automated scripts / tools for regular operational activities.
  • Review and influence design, architecture, standards, and methods for deploying, monitoring and operating services and applications.
  • Actively participate in and / or commit to the execution of tasks required to meet milestones and deliverables set by the Scrum team throughout the release cycle.
  • Provide rotational on-call support.

Qualifications

  • BS in Computer Science or equivalent experience
  • 3+ years of professional Site Reliability experience operating at scale in a fast-paced environment
  • 4+ years of hands-on experience with AWS, Kubernetes, Infrastructure as Code, monitoring and alerting
  • Experience building out Kubernetes clusters from scratch, preferably using EKS
  • Extensive use of automation for Infrastructure as Code, preferably via Terraform
  • Strong development experience in Python or Go
  • Experienced user of one or more source code management tools, preferably Git
  • Experience with continuous integration and continuous delivery / deployment tools such as Jenkins and ArgoCD

