Site Reliability Engineer

Altius Technologies, Inc.
San Jose, CA, United States
Full-time

Responsibilities:
- Create and support automation scripts (shell / Ansible / Python) for infrastructure deployments, validations and monitoring to improve operational tasks
- Schedule monitoring scripts using cron and Airflow
- Monitor using tools including Dynatrace, Apica, Grafana, etc.
- Database handling
- Build CI/CD pipelines
- Incident handling and problem management

Mandatory Skills:
- Experience in Ansible / Python
- Monitoring tools: Dynatrace / Apica / Grafana

Required Experience:
- 14 plus years of IT infrastructure experience
- Extensive experience working with Linux flavors like RHEL / CentOS, shells, filesystems and utilities
- Experience with Python and Ansible
- Knowledge of distributed computing and experience working with container orchestration frameworks, including on-prem and Rancher Kubernetes, and good knowledge of Kubernetes objects
- Experience working with storage (ONTAP preferable): volumes, aggregates, backups, DR planning
- Experience scheduling monitoring scripts using cron and Airflow
- Experience with monitoring tools including Dynatrace, Apica, Grafana, etc.
- Database knowledge including SQL and NoSQL databases
- Experience building CI/CD pipelines (preferred)
- Cloud platform knowledge (specifically AWS) is required

8 days ago