Site Reliability Engineer

Softworld, a Kelly Company
Atlanta, GA, United States
Full-time

The Cloud Site Reliability Engineer (SRE) works closely with the cloud development team, the IT operations team, and business partners to streamline and implement enhanced monitoring and alerting capabilities across the infrastructure and application layers.

By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and enhancing system scalability and reliability.

Their core focus lies in standardization and automation to build and run fault-tolerant systems. Typically, SREs possess a background in software engineering, system engineering, or system administration, coupled with substantial IT operations experience.

SREs oversee availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

  • Write and develop code to automate processes such as analyzing logs, testing production environments, and responding to issues
  • Collaborate with agile teams and business partners to develop specifications that resolve problems and address enhancement needs, with a focus on monitoring and metrics for operational readiness
  • Identify bottlenecks in development and deployment processes and design automation solutions to mitigate them
  • Develop new capabilities for displaying, monitoring, and alerting on key performance indicators by tracking business transactions in real time
  • Maintain and grow knowledge of platform configuration management, monitoring of established metrics, and troubleshooting
  • Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements
  • Design and develop alert escalation and incident response automation
  • Provide production support for cloud service outages and incidents, and work on both tactical and strategic plans for outage prevention
  • Provide feedback on the resiliency and maintainability of solutions to Cloud and App architects
  • Generate and test disaster recovery scenarios
  • Implement sustainable, audit-ready processes that support information technology controls, including deployment execution, access management, audits, incident management, and related requirements

Must-have technical skills:

  • At least 3 years’ experience as a site reliability engineer on a cross-functional agile team working in Azure
  • Working knowledge of agile development methodologies (Scrum, sprints, Kanban, etc.) and tools (Azure DevOps, etc.)
  • At least 3 years’ hands-on experience using IaC tools such as Terraform, GitHub, Ansible, and Packer
  • Proven experience across testing, integration, source code management, deployment, and containerization
  • Sound problem-solving skills, with the ability to quickly process complex information and present it clearly and simply
  • Experience with cloud technologies and services, including those for Compute, Storage, Databases, and API Management
  • On-premises to cloud migration experience

Posted 4 days ago