Senior Site Reliability Engineer

Procare Solutions
Denver, Colorado, US
$110K-$135K a year
Full-time

About Procare

Our mission is to simplify childcare operations and create meaningful connections by providing technology, expertise, and unparalleled service.

Procare Solutions is the #1 name in childcare software, used by more than 35,000 childcare businesses across the country. For over 30 years, childcare professionals have looked to Procare to provide real-time information for making critical decisions, maintaining compliance with local and state regulations, and adhering to business best practices.

We make childcare management run smoothly so that our customers can spend more time focusing on the kiddos, not back-office administrative duties.

A Little About The Role

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding and extensive experience working with AWS, a thorough knowledge of the Linux operating system, and a robust background in managing and optimizing infrastructure and services in a cloud environment.

As an SRE, you will be responsible for maintaining the reliability, availability, and performance of our applications and infrastructure.

What You Will Do

  • Infrastructure Management : Design, implement, and maintain scalable, reliable, and secure AWS infrastructure using best practices.
  • Monitoring & Alerting : Develop and maintain monitoring, logging, and alerting solutions to ensure the health and performance of our systems. Utilize tools such as New Relic, AWS CloudWatch, Prometheus, Grafana, and ELK stack.
  • Automation & Scripting : Automate infrastructure provisioning, configuration, and deployment processes using tools like Terraform, CloudFormation, and Ansible.
  • Incident Management : Respond to and resolve production incidents, conduct root cause analysis, and implement corrective measures to prevent recurrence.
  • Performance Optimization : Continuously analyze system performance and implement tuning improvements to enhance the overall efficiency and scalability of the infrastructure.
  • Security Compliance : Ensure all systems and infrastructure comply with security best practices and policies. Implement and manage IAM roles and policies, VPC configurations, and security groups.
  • Collaboration : Work closely with development teams to integrate reliability into the software development lifecycle, including CI / CD pipeline management using tools such as Jenkins or AWS CodePipeline.
  • Documentation : Maintain comprehensive documentation of infrastructure, processes, and incident reports to ensure knowledge sharing and transparency.

Our Ideal Candidate Will Have

  • AWS Expertise : Minimum 5 years of hands-on experience with AWS services including EC2, S3, RDS, Lambda, ECS / EKS, CloudFormation, CloudWatch, VPC, and IAM.
  • Linux Expertise : Deep knowledge and extensive experience with Linux operating systems, including system administration, shell scripting, and troubleshooting.
  • SRE Tools & Technologies : Familiarity with common SRE-related services and tools such as Kubernetes, Docker, Prometheus, Grafana, Elasticsearch, Logstash, Kibana (ELK), and Splunk.
  • Automation & Configuration Management : Proficiency in infrastructure as code (IaC) tools like Terraform, Ansible, and CloudFormation.
  • Monitoring & Logging : Experience with monitoring and logging solutions, including setting up metrics, dashboards, and alerts.
  • Networking : Strong understanding of networking concepts, including DNS, load balancing, VPN, firewalls, and network security.
  • Programming & Scripting : Proficiency in at least one programming / scripting language such as Python, Go, or Bash.
  • Problem-Solving : Excellent problem-solving skills with a proactive and analytical approach to resolving issues.
  • Communication : Strong written and verbal communication skills, with the ability to collaborate effectively with cross-functional teams.
  • Certifications : AWS Certified Solutions Architect Professional, AWS Certified DevOps Engineer, or similar certifications.
  • DevOps Engineering Background : Experience in DevOps engineering, including continuous integration and continuous deployment (CI / CD) practices and tools.
  • Experience : Previous experience in a similar SRE role within a large-scale, complex environment.

Why Procare?

  • Excellent comprehensive benefits packages, including medical, dental, and vision plans.
  • HSA option with employer contributions.
  • Vacation time, holidays, sick days, volunteer & personal days.
  • 401K Plan with employer match and immediate vesting.
  • Employee Stock Purchase Plan.
  • Employee Discount Program.
  • Medical, Dependent Care, and Transportation FSA Plans.
  • Company-paid Short- and Long-Term Disability and Life Insurance.
  • RTD EcoPass for all Denver employees.
  • Tuition Reimbursement and continued Professional Development.
  • Fast-paced, high-energy workplace in a prime downtown location.
  • Regular company-provided meals.

Salary

$110,000-$135,000 / year DOE

Location

While our preference is a candidate located in Denver, CO, this role is open to remote candidates in the following states : AL, AZ, CA, CO, CT, FL, GA, ID, IL, IN, IA, KY, ME, MD, MA, MI, MN, MO, NV, NJ, NY, NC, OH, OR, PA, TN, TX, VA, WA, WI.
