
Site Reliability Engineer

Kobiton
Atlanta, GA, US
Full-time

What we do:

Kobiton empowers enterprises to accelerate mobile app delivery through manual, automated, and no-code testing on real devices.

Kobiton’s AI-augmented mobile testing platform uniquely delivers one-hour continuous testing and integration. Founded in 2016, Kobiton is venture-backed and headquartered in Atlanta.

At Kobiton, we care a lot about experience - the experience we provide our enterprise customers, the experience our platform enables them to provide their users, and especially the experience we provide our internal customers - our employees.

As we empower enterprises to deliver a better mobile experience, we strive to empower our employees by delivering a better work experience.

We do this by committing to transparency (no, really) and a culture of collaboration, curiosity, and action in which we strive to work well together on things that actually matter.

We also offer benefits that actually matter, including company-paid employee health benefits, self-managed PTO (otherwise known as unlimited PTO), and an annual stipend for employee development through our Growbition program.

Kobiton was ranked the 18th Fastest-Growing Company in North America on the 2022 Deloitte Technology Fast 500™ and was named one of Georgia’s Top 40 Technology Companies in 2022 and 2023.

As one of Atlanta’s Best & Brightest, we’re searching for the best and brightest to join our team and help us continue to deliver the best experiences - internally and externally.

What you’ll do:

As a Site Reliability Engineer at Kobiton, you will be responsible for ensuring the reliability, performance, and scalability of our systems and services.

You will work closely with development and operations teams to build and maintain robust infrastructure, automate processes, and troubleshoot complex issues.

Your role is crucial in providing a seamless and reliable experience for our customers.

Key Responsibilities:

  • System Reliability and Performance: You’ll help us monitor and maintain the reliability, availability, and performance of our data center and AWS cloud systems. You’ll implement and manage systems to detect and resolve performance issues proactively.

  • Automation and Tools Development: You’ll help automate repetitive tasks and processes to improve efficiency and reduce manual intervention, and you’ll develop and maintain tools and scripts for system monitoring, deployment, and maintenance.
  • Infrastructure Management: We strive to create remotely managed systems and infrastructure. You will have the opportunity to define and create approaches to automation, failover, and automated recovery.
  • Collaboration: We work closely with our development teams to integrate reliability and performance best practices into the software development lifecycle. You’ll provide guidance and support to ensure that applications are designed for reliability and scalability.

  • Capacity Planning and Scaling: We pay attention to the performance (and cost) of our systems. You’ll analyze system capacity, forecast future needs, and implement scaling strategies to handle growth and ensure system performance under varying loads.
  • Building Custom ISOs: Kobiton offers proprietary software on a range of hardware configurations, giving you the opportunity to work with a variety of Linux distributions, create bootable ISO infrastructure, and build different hardware platforms for customers.

Requirements:

  • Technical Expertise: You’ll need strong experience in systems administration, infrastructure management, and cloud platforms, along with proficiency in scripting and automation tools such as Bash and Ansible. Ideally, you’ll also have experience with GitHub Actions, Terraform, troubleshooting and incident recovery, hardware management, and configuration as code.

  • Experience with containers and container orchestration (e.g., Docker, Kubernetes).
  • Understanding of microservices architecture and distributed systems.
  • Experience using Nexus and GitHub to automate build and deployment pipelines.
  • Experience managing DNS, DHCP, and virtualization platforms (VMware / Proxmox).
  • Monitoring and Incident Management: Experience with monitoring tools such as AWS Grafana Suite and incident management best practices.
  • Collaboration Skills: You’ll have strong communication and collaboration skills and the ability to work effectively with cross-functional teams. You’ll be expected to provide technical support and guidance across the organization.
  • Problem-Solving Abilities: You’ll bring excellent analytical and problem-solving skills with a proactive approach to identifying, resolving, and preventing recurring issues.

Benefits

  • 100% company-paid Medical, Dental, & Vision insurance for you and 80% company-paid coverage for your family.
  • Self-Managed Paid Time Off (aka Unlimited PTO).
  • 401(k) Retirement Plan.
  • $1,000 annual stipend for professional development through our Growbiton program.
  • Paid Parental Leave Program, available from day one.
  • Free coffee at Bellwood Coffee Shop and free access to the fitness center in the 1776 office.
  • Quarterly Culture program that provides a variety of team-building, social, educational, and wellness events for all team members on the third Wednesday of each month.

Kobiton is proud to be an equal opportunity employer. We care about our people and celebrate our differences. We want to work with talented, collaborative, and innovative people.

We do not discriminate in hiring or any employment decision based on race, color, religion, national origin, age, sex (including pregnancy, childbirth, or related medical conditions), marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or other characteristics protected by law.
