Site Reliability Engineer

Altius Technologies, Inc.
San Jose, CA, United States
Full-time

Responsibilities:
- Create and support automation scripts (shell / Ansible / Python) for infrastructure deployments, validations and monitoring to improve operational tasks
- Schedule monitoring scripts using cron and Airflow
- Monitor using tools including Dynatrace, Apica, Grafana, etc.
- Database handling
- Build CI/CD pipelines
- Incident handling and problem management

Mandatory Skills:
- Experience in Ansible / Python
- Monitoring tools: Dynatrace / Apica / Grafana

Required Experience:
- 14+ years of IT infrastructure experience
- Extensive experience working with Linux flavors like RHEL / CentOS, shells, filesystems and utilities
- Experience programming and automating with Python and Ansible
- Knowledge of distributed computing and experience working with container orchestration frameworks, including on-prem and Rancher Kubernetes, and good knowledge of Kubernetes objects
- Experience working with storage, ONTAP preferable: volumes, aggregates, backups, DR planning
- Experience scheduling monitoring scripts using cron and Airflow
- Experience with monitoring tools including Dynatrace, Apica, Grafana, etc.
- Database knowledge including SQL and NoSQL databases
- Experience building CI/CD pipelines (preferred)
- Cloud platform knowledge (specifically AWS) is required

8 days ago
Related jobs
Apple
Cupertino, California

Support and improve the Hardware Technology engineering environment from design through deployment, including additional refinement and scale-up to support future growth - Support the day-to-day operations of the environment including monitoring, measuring, and troubleshooting infrastructure and ser...

NVIDIA
Santa Clara, California

Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial role in designing, implementing, and optimizing on-prem High-Performance Computing (HPC) storage solutions while harnessing the power of cloud computing. You will collaborate closely with engineer...

Google
Sunnyvale, California

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. Master's degree in Computer ...

CV Library
San Jose, California

Extensive experience working with Linux flavors like RHEL/CentOS OS, shells, filesystems, and utilities. Knowledge of distributed computing and experience working with container orchestration frameworks including on-prem and Rancher Kubernetes, with good knowledge of Kubernetes objects. Experience wor...

Zoom
San Jose, California

You will also design and implement reliability best practices to accomplish a highly available service. Additionally, you will identify and fix problems in Kubernetes operators, submitting code fixes to OSS if needed. ...

TikTok
Mountain View, California

The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more. Preferred Qualifications: Master's degree in Computer Science, Engineering or a related field. ...

ByteDance
San Jose, California

Team Introduction: The Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale systems and infrastructures. Participate in technical operations and rotations in response to performance and reliability issues. Demonstrated software engineering experience ...

Optomi
CA, United States

Senior Site Reliability Engineer (SRE). The ideal candidate will be responsible for enhancing the reliability and performance of our complex, large-scale enterprise applications, while collaborating with various engineering teams. Collaborate and provide technical leadership across engineering teams...

TikTok
Mountain View, California

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed services and infrastructures. As a site reliability engineer in the Ads data platform area, you will have the opportunity to manage the services and infrastructures in one...

LinkedIn
Mountain View, California

LinkedIn is looking to hire a Senior Site Reliability Engineer within the production Storage Engineering group. ... DWDM, CWDM, MMF, SMF, SR, LR, ZR, SONET, MPLS). Software engineering skills with efficient, maintainable and testable C/C++/Python. Experience deploying storage for shared-nothing applicatio...