Site Reliability Engineer (SRE)

E-Solutions
California, United States
Permanent
Full-time

Title : Site Reliability Engineer (SRE)

Location : United States (Remote)

Full Time only

Job Overview : We are seeking a skilled Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have hands-on experience in managing and securing cloud infrastructure, with expertise in AWS services, infrastructure as code, and cloud security tools.

You will be responsible for ensuring the availability and reliability of our SaaS products, which host customer data and require 24x7 uptime.

Key Responsibilities :

Cloud Infrastructure Management : Design, deploy, and manage cloud environments using AWS services such as Elastic Beanstalk, CloudFormation, IAM, etc.

Implement infrastructure as code using Terraform for consistent and repeatable deployments. Manage code repositories and CI/CD pipelines with Jenkins and Git.

Reliability & Performance : Ensure the reliability, availability, and performance of cloud-based systems. Develop and implement monitoring strategies using tools like CloudWatch, Grafana, Prometheus, and SolarWinds to ensure early detection of issues.

Optimize system performance and plan for capacity to meet growing demand.

Security & Compliance : Ensure cloud infrastructure complies with regulated security and compliance programs such as FedRAMP, PCI, HIPAA, etc.

Utilize security tools like Twistlock and Tenable Nessus to identify vulnerabilities and maintain a secure environment. Conduct regular security scans, patching, and upgrades to maintain compliance and security posture.

Incident Response & On-Call Duty : Participate in on-call rotations to respond to incidents, troubleshoot issues, and ensure service availability within defined SLAs.

Handle escalations in the on-call process, ensuring timely response and resolution of critical issues. Field internal requests related to system access and availability, ensuring issues are addressed promptly.

Collaboration & Change Management : Work closely with engineering, compliance, and change management teams to plan and execute changes in the cloud environment.

Create and manage change tickets, ensuring thorough documentation and adherence to change management processes.

Compensation & Benefits :

  • Salary : Open to discussion
  • Health Insurance
  • Dental Insurance
  • Permanent Opportunity

Required Skills :

  • Proficiency in AWS services (Elastic Beanstalk, CloudFormation, IAM, etc.).
  • Strong experience with infrastructure as code tools, particularly Terraform.
  • Scripting skills in Python, Bash, and PowerShell.
  • Experience with CI/CD tools like Jenkins and Git.
  • Knowledge of security and compliance standards (FedRAMP, PCI, HIPAA).
  • Familiarity with cloud security tools such as Twistlock and Tenable Nessus.
  • Experience with monitoring and incident management tools (CloudWatch, Grafana, Prometheus, SolarWinds, PagerDuty).
  • Ability to participate in on-call rotations and handle 24x7 support responsibilities.

Preferred Qualifications :

  • Previous experience working in a SaaS environment.
  • Strong problem-solving skills and the ability to troubleshoot complex issues in a cloud environment.
  • Excellent communication skills and ability to work collaboratively with cross-functional teams.

Disclaimer : E-Solutions Inc. provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws.

We especially invite women, minorities, veterans, and individuals with disabilities to apply. EEO/AA/M/F/Vet/Disability.
