Site Reliability Engineer

Altius Technologies, Inc.

San Jose, CA, United States

Full-time

Creating and supporting automation scripts (shell / Ansible / Python) for infrastructure deployments, validations and monitoring to improve operational tasks
Scheduling monitoring scripts using cron and Airflow
Monitoring using tools including Dynatrace, Apica, Grafana, etc.
Database handling
Building CI/CD pipelines
Incident handling and problem management

Mandatory Skills:
Experience in Ansible / Python
Monitoring tools: Dynatrace / Apica / Grafana

Required Experience:
14 plus years of IT infrastructure experience
Extensive experience working with Linux flavors like RHEL / CentOS, shells, filesystems and utilities
Experience in programming languages and tools like Python and Ansible
Knowledge of distributed computing and experience working with container orchestration frameworks, including on-prem and Rancher Kubernetes, and good knowledge of Kubernetes objects
Experience working with storage, ONTAP preferable: volumes, aggregates, backups, DR planning
Experience scheduling monitoring scripts using cron and Airflow
Experience with monitoring tools including Dynatrace, Apica, Grafana, etc.
Database knowledge including SQL and NoSQL databases
Experience building CI/CD pipelines (preferred)
Cloud platform knowledge (specifically AWS) is required

8 days ago

Related jobs

Site Reliability Engineer - Software CSG

Apple

Cupertino, California

Support and improve the Hardware Technology engineering environment from design through deployment, including additional refinement and scale-up to support future growth - Support the day-to-day operations of the environment including monitoring, measuring, and troubleshooting infrastructure and ser...

Senior Site Reliability Engineer - Storage

NVIDIA

Santa Clara, California

Join our team at NVIDIA as a Senior Site Reliability Engineer focused on HPC storage and play a crucial role in designing, implementing, and optimizing on-prem High-Performance Computing (HPC) storage solutions while harnessing the power of cloud computing. You will collaborate closely with engineer...

Senior Software Engineer, Site Reliability Engineering, Google Cloud

Google

Sunnyvale, California

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. Master's degree in Computer ...

Site Reliability Engineer

CV Library

San Jose, California

Extensive experience working with Linux flavors like RHEL/CentOS, shells, filesystems, and utilities. Knowledge of distributed computing and experience working with container orchestration frameworks including on-prem and Rancher Kubernetes, with good knowledge of Kubernetes objects. Experience wor...

Sr. Site Reliability Engineer

Zoom

San Jose, California

You will also design and implement reliability best practices to accomplish a highly available service ( Additionally, you will identify and fix problems in Kubernetes operators, submitting code fixes to OSS if needed. ...

Site Reliability Engineer, Infrastructure and Assurance Services - USDS

TikTok

Mountain View, California

The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more. Preferred Qualifications: Master's degree in Computer Science, Engineering or a related field. ...

Site Reliability Engineer Intern (Cloud and System)- 2025 Summer (BS/MS)

ByteDance

San Jose, California

Team Introduction: The Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale systems and infrastructures. Participate in technical operations and rotations in response to performance and reliability issues. Demonstrated software engineering experience ...

Senior Site Reliability Engineer

Optomi

CA, United States

Senior Site Reliability Engineer (SRE). The ideal candidate will be responsible for enhancing the reliability and performance of our complex, large-scale enterprise applications, while collaborating with various engineering teams. Collaborate and provide technical leadership across engineering teams...

Site Reliability Engineer, TikTok Ads- USDS

TikTok

Mountain View, California

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed services and infrastructures. As a site reliability engineer in the Ads data platform area, you will have the opportunity to manage the services and infrastructures in one...

Senior Site Reliability Engineer - Storage Engineering

LinkedIn

Mountain View, California

LinkedIn is looking to hire Senior Site Reliability Engineer within the production Storage Engineering group. DWDM, CWDM, MMF, SMF, SR, LR, ZR, SONET, MPLS)· Software engineering skills with efficient, maintainable and testable C/C++/Python· Experience deploying storage for shared-nothing applicatio...
