Site Reliability Engineer

Softworld, a Kelly Company

Atlanta, GA, United States

Full-time

The Cloud Site Reliability Engineer (SRE) works closely with the cloud development team, the IT operations team, and business partners to streamline and implement enhanced monitoring and alerting capabilities across the infrastructure and application layers.

By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and enhancing system scalability and reliability.

Their core focus lies in standardization and automation to build and run fault-tolerant systems. Typically, SREs possess a background in software engineering, system engineering, or system administration, coupled with substantial IT operations experience.

SREs oversee availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

Write and develop code to automate processes such as analyzing logs, testing production environments, and responding to issues.
Collaborate with agile teams and business partners to develop specifications that resolve problems and address enhancement needs, with a focus on monitoring and metrics for operational readiness.
Identify bottlenecks in development and deployment processes and design automation solutions to mitigate them.
Develop new capabilities for displaying, monitoring, and alerting on key performance indicators by tracking business transactions in real time.
Maintain and grow knowledge of platform configuration management, monitoring of established metrics, and troubleshooting.
Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements.
Design and develop alert escalation and incident response automation.
Provide production support for cloud service outages and incidents, and work on both tactical and strategic plans for outage prevention.
Provide feedback on the resiliency and maintainability of solutions to Cloud and App architects.
Conduct disaster recovery scenario generation and testing.
Implement sustainable, audit-ready processes that support information technology controls, including deployment execution, access management, audits, incident management and related requirements.

Must-have technical skills:

Should have at least 3 years’ experience as a site reliability engineer on a cross-functional agile team working in Azure.
Have working knowledge of agile development methodologies (Scrum, sprints, Kanban, etc.) and tools (Azure DevOps, etc.)
Have at least 3 years of hands-on experience using the IaC tools Terraform, GitHub, Ansible, and Packer
Proven experience across testing, integration, source code management, deployment and containerization
Sound problem-solving skills with the ability to quickly process complex information and present it clearly and simply
Experience with cloud technologies and services including those for Compute, Storage, Databases and API Management
On-premises to cloud migration experience
