Creating and supporting automation scripts (shell / Ansible / Python) for infrastructure deployments, validations, and monitoring to improve operational tasks. Scheduling monitoring scripts using cron and Airflow. Monitoring using tools including Dynatrace, Apica, and Grafana. Database handling. Building CI/CD pipelines. Incident handling and problem management.
Mandatory Skills: Experience in Ansible / Python. Monitoring tools: Dynatrace / Apica / Grafana.
Required Experience: 14-plus years of IT infrastructure experience. Extensive experience working with Linux flavors such as RHEL / CentOS, shells, filesystems, and utilities. Experience with Python and Ansible. Knowledge of distributed computing and experience working with container orchestration frameworks, including on-prem and Rancher Kubernetes, with good knowledge of Kubernetes objects. Experience working with storage, ONTAP preferred: volumes, aggregates, backups, DR planning. Experience scheduling monitoring scripts using cron and Airflow. Experience with monitoring tools including Dynatrace, Apica, and Grafana. Database knowledge including SQL and NoSQL databases. Experience building CI/CD pipelines (preferred). Cloud platform knowledge (specifically AWS) is required.
Senior Site Reliability Engineer - Storage Platform
Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining large-scale production systems with high efficiency and availability. It encompasses various areas, including software and systems engineering practices, storage, data management, and s...
Software Engineer III, Site Reliability Engineering, Google Cloud
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Master's degree in Computer Science or Engineering. SRE ensures that Google Cloud's services—both our internally critical and our externally-visib...
Site Reliability Engineer
Work with development teams to ensure that applications have scalability and reliability built-in from day one - Agile is second nature to you and you're excited to work in scrum teams and represent the SRE perspective. Design and enhance software architecture to improve scalability, service reliabi...
Site Reliability Engineer (SRE)
Azure DevOps Engineer @ Sunnyvale, CA. Create frameworks, processes, and best practices to be used across Engineering. ...
Site Reliability Engineer
As a Site Reliability Engineer (SRE) you will actively work to improve the performance and reliability of services as well as address root causes of incidents and reduce incident rates. Love staying ahead of the growth curve and experimenting with new software and environments? Get on board as an At...
Senior Site Reliability Engineer - DGX Cloud
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using the combination of software and systems engineering practices. Senior Site Reliability Engineer - DGX Cloud. SRE at NVIDI...
Site Reliability Engineer Graduate (Edge Platform) - 2024 Start (BS/MS)
Participate in technical operations and rotations in response to performance and reliability issues. Graduate with Bachelor's or Master's degree in Software Development, Computer Science, Computer Engineering, or a related technical discipline. ...
Staff Site Reliability Engineer - Federal (US Citizen)
Position: Staff Site Reliability Engineer. Resolve escalations and help prevent reiteration of incidents with process, monitoring and reliability improvements. Relevant experience preferably in an Operations or Engineering environment. ...
Principal Site Reliability Engineer (SASE)
Experience in Site Reliability Engineering, Production Engineering, or DevOps. As a Principal Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, observability, troubleshooting, security, a...
Site Reliability Engineer Graduate (Technical Infrastructure) - 2025 Start (BS/MS)
Our data infrastructure Site Reliability Engineering (SRE) team is a pioneer in innovation. Establish sustainable mechanisms for scaling systems, such as automation, to drive enhancements in reliability, efficiency, and velocity. ...