Site Reliability Engineer - Senior (NE)

Ursus

San Diego, CA

Full-time

Description

Hands-on application management and support for AWS cloud environments, including full-stack diagnosis, fault resolution and root cause analysis.
Proactively monitor production systems and identify issues before they impact service.
Drive and implement monitoring tools, metrics, and reports for tracking application and service performance.
Collaborate with engineering and system teams to drive changes and ensure optimal application performance and resiliency.
Lead service and system performance analysis, service capacity planning, and service continuity validation for multiple applications.
Identify areas for process automation, and develop automated scripts and tools for regular operational activities.
Review and influence design, architecture, standards, and methods for deploying, monitoring and operating services and applications.
Actively participate in, and commit to, the execution of tasks required to meet milestones and deliverables set by the Scrum team throughout the release cycle.
Provide rotational on-call support.

Qualifications

BS in Computer Science or equivalent experience
3+ years of professional site reliability experience operating at scale in a fast-paced environment
4+ years of hands-on experience with AWS, Kubernetes, Infrastructure as Code, monitoring, and alerting
Experience building out Kubernetes clusters from scratch, preferably using EKS
Extensive use of automation for Infrastructure as Code, preferably via Terraform
Strong development experience in Python or Go
Experienced user of one or more source code management tools, preferably Git
Experience with continuous integration and continuous delivery / deployment tools such as Jenkins and ArgoCD
