
Site Reliability Engineer

Kobiton
Atlanta, GA, US
Full-time

What we do :

Kobiton empowers enterprises to accelerate mobile app delivery through manual, automated, and no-code testing on real devices.

Kobiton’s AI-augmented mobile testing platform uniquely delivers one-hour continuous testing and integration. Founded in 2016, Kobiton is venture-backed and headquartered in Atlanta.

At Kobiton, we care a lot about experience - the experience we provide our enterprise customers, the experience our platform enables them to provide their users, and especially the experience we provide our internal customers - our employees.

As we empower enterprises to deliver a better mobile experience, we strive to empower our employees by delivering a better work experience.

We do this by committing to transparency (no, really) and a culture of collaboration, curiosity, and action in which we strive to work well together on things that actually matter.

We also offer benefits that actually matter, including company-paid employee health benefits, self-managed paid time off (otherwise known as unlimited PTO), and an annual stipend for employee development through our Growbition program.

Kobiton was ranked the 18th fastest-growing company in North America on the 2022 Deloitte Technology Fast 500™ and was named one of Georgia’s Top 40 Technology Companies in 2022 and 2023.

As one of Atlanta’s Best & Brightest, we’re searching for the best and brightest to join our team and help us continue to deliver the best experiences - internally and externally.

What you’ll do :

As a Site Reliability Engineer at Kobiton, you will be responsible for ensuring the reliability, performance, and scalability of our systems and services.

You will work closely with development and operations teams to build and maintain robust infrastructure, automate processes, and troubleshoot complex issues.

Your role is crucial in providing a seamless and reliable experience for our customers.

Key Responsibilities :

  • System Reliability and Performance : You’ll help us monitor and maintain the reliability, availability, and performance of our data center and AWS cloud systems. You’ll implement and manage systems to detect and resolve performance issues proactively.

  • Automation and Tools Development : You’ll help automate repetitive tasks and processes to improve efficiency and reduce manual intervention. You’ll also develop and maintain tools and scripts for system monitoring, deployment, and maintenance.
  • Infrastructure Management : We strive to create remotely managed systems and infrastructure. You will have the opportunity to define and create approaches to automation, failover, and automated recovery systems.
  • Collaboration : We work closely with our development teams to integrate reliability and performance best practices into the software development lifecycle. You’ll provide guidance and support to ensure that applications are designed for reliability and scalability.

  • Capacity Planning and Scaling : We pay attention to the performance (and cost) of our systems. You’ll analyze system capacity, forecast future needs, and implement scaling strategies to handle growth and ensure system performance under varying loads.
  • Building custom ISOs : Kobiton offers proprietary software on a range of hardware configurations, giving you the opportunity to work with a variety of Linux distributions, create bootable ISO infrastructure, and build different hardware platforms for customers.

Requirements :

  • Technical Expertise : You’ll need strong experience in systems administration, infrastructure management, and cloud platforms, along with proficiency in scripting and automation tools such as Bash and Ansible. Ideally, you’ll also have experience with GitHub Actions, Terraform, troubleshooting and incident recovery, hardware management, and configuration as code.

  • Experience with containers and container orchestration systems (e.g., Docker, Kubernetes).
  • Understanding of microservices architecture and distributed systems.
  • Experience using Nexus and GitHub to automate build and deployment pipelines.
  • Experience managing DNS, DHCP, and virtualization platforms (VMware / Proxmox).
  • Monitoring and Incident Management : Experience with monitoring tools such as AWS Grafana Suite and incident management best practices.
  • Collaboration Skills : You’ll have strong communication and collaboration skills and the ability to work effectively with cross-functional teams. You’ll be expected to provide technical support and guidance across the organization.
  • Problem-Solving Abilities : You’ll bring excellent analytical and problem-solving skills with a proactive approach to identifying, resolving, and preventing recurring issues.

Benefits

  • 100% company-paid Medical, Dental, & Vision insurance for you and 80% company-paid coverage for your family.
  • Self-Managed Paid Time Off (aka Unlimited PTO).
  • 401(k) Retirement Plan.
  • $1,000 annual stipend for professional development through our Growbiton program.
  • Paid Parental Leave Program, available from day one.
  • Free access to coffee at Bellwood Coffee Shop and to the fitness center in the 1776 office.
  • Quarterly Culture program that provides a variety of team-building, social, educational, and wellness events for all team members on the third Wednesday of each month.

Kobiton is proud to be an equal opportunity employer. We care about our people and celebrate our differences. We want to work with talented, collaborative, and innovative people.

We do not discriminate in hiring or any employment decision based on race, color, religion, national origin, age, sex (including pregnancy, childbirth, or related medical conditions), marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or other characteristics protected by law.
