Site Reliability Engineer

Remotely Inc
Chicago, Illinois, US
$56K-$66K a year
Full-time

This is a remote position.

Please make sure you read the following details carefully before making any applications.

Site Reliability Engineer (1 year experience, remote)

Be part of our future! This job posting builds our talent pool for potential future openings. We'll compare your skills and experience against both current and future needs.

If there's a match, we'll contact you directly. There is no guarantee of immediate placement, and we only consider applications from residents of the US and Canada.

Hiring Type: Full-Time

Base Salary: $56K-$66K per annum

About The Job

As an SRE, you'll troubleshoot and resolve technical issues, optimize performance, and establish reliability-based release management processes.

The SRE role is the practical implementation of DevOps principles, where speed and stability are carefully balanced, and the team acts as versatile problem solvers, filling gaps in knowledge and expertise to ensure efficient software operations.

You will:

  • Apply SRE principles to maintain the reliability, availability, and performance of software systems.
  • Automate deployment processes, configuration management, and CI/CD pipelines to streamline software development and delivery.
  • Plan and assist with the migration of Windows and Linux-based machines to containerized environments.
  • Plan and assist with overall disaster recovery (DR) for infrastructure and operations (InfraOps).
  • Manage and maintain software infrastructure, ensuring proper configuration, security, and scalability.
  • Perform system administration tasks, monitor system performance, troubleshoot issues, and apply necessary fixes.
  • Act as a versatile problem solver, filling gaps in team knowledge and expertise to ensure smooth and efficient software operations.
  • Facilitate smooth team and project transitions, providing guidance, training, and support for development teams to manage their infrastructure independently.
  • Develop a reliability rating system to assess team and project performance, collecting and analyzing metrics to evaluate adherence to best practices.
  • Respond quickly and effectively to critical incidents, conducting post-incident reviews to identify root causes and implement preventive measures.
  • Develop and maintain automation tools and scripts to improve operational efficiency.
  • Identify performance bottlenecks and implement optimizations to enhance system response times and resource utilization.
  • Stay up to date with the latest industry trends, technologies, and best practices related to SRE, DevOps, and infrastructure management.
  • Collaborate effectively with cross-functional teams and communicate technical concepts and recommendations clearly to both technical and non-technical stakeholders.
  • Implement a reliability-based release management process, allowing teams with higher reliability scores to perform quick and frequent releases.
  • Proactively identify potential issues and implement preventive measures to reduce incidents and outages.
  • Implement observability practices to detect abnormal behaviors in the software and collect information for effective problem resolution.
  • Set and monitor critical metrics to gain insights into system reliability, including latency, traffic, errors, and saturation levels.
  • Establish Service-Level Objectives (SLOs) and measure Service-Level Indicators (SLIs) to assess the quality and reliability of service delivery.
  • Plan, participate in, and manage on-call rotations to ensure prompt response to reported software issues.
  • Utilize incident response tools to categorize the severity of reported cases and handle them promptly.
  • Implement configuration management tools to automate software workflows and enhance team productivity.

Projects you could work on:

  • Implementing automated CI/CD pipelines for smooth software deployment.
  • Setting up and maintaining a reliable and scalable cloud infrastructure.
  • Designing and implementing the migration of physical machines to virtual machines.
  • Designing incident response procedures and post-incident review processes.
  • Developing automation tools to streamline repetitive tasks and improve team productivity.
  • Analyzing system performance metrics and optimizing resources for better efficiency.
  • Establishing observability practices to detect and resolve software issues proactively.
  • Defining SLOs and SLIs to assess service quality and reliability across projects.
  • Planning and managing on-call rotations to ensure timely issue resolution.
  • Configuring and maintaining software workflows using configuration management tools.

5 days ago