Title : Site Reliability Engineer (SRE)
Location : United States (Remote)
Full Time only
Job Overview : We are seeking a skilled Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have hands-on experience in managing and securing cloud infrastructure, with expertise in AWS services, infrastructure as code, and cloud security tools.
You will be responsible for ensuring the availability and reliability of our SaaS products, which host customer data and require 24x7 uptime.
Key Responsibilities :
Cloud Infrastructure Management : Design, deploy, and manage cloud environments using AWS services such as Elastic Beanstalk, CloudFormation, IAM, etc.
Implement infrastructure as code using Terraform for consistent and repeatable deployments. Manage code repositories and CI / CD pipelines with Jenkins and Git.
Reliability & Performance : Ensure the reliability, availability, and performance of cloud-based systems. Develop and implement monitoring strategies using tools like CloudWatch, Grafana, Prometheus, and SolarWinds to ensure early detection of issues.
Optimize system performance and plan for capacity to meet growing demand.
Security & Compliance : Ensure cloud infrastructure complies with regulatory security and compliance programs such as FedRAMP, PCI, HIPAA, etc.
Utilize security tools like Twistlock and Tenable Nessus to identify vulnerabilities and maintain a secure environment. Conduct regular security scans, patching, and upgrades to maintain compliance and security posture.
Incident Response & On-Call Duty : Participate in on-call rotations to respond to incidents, troubleshoot issues, and ensure service availability within defined SLAs.
Handle escalations in the on-call process, ensuring timely response and resolution of critical issues. Field internal requests related to system access and availability, ensuring issues are addressed promptly.
Collaboration & Change Management : Work closely with engineering, compliance, and change management teams to plan and execute changes in the cloud environment.
Create and manage change tickets, ensuring thorough documentation and adherence to change management processes.
Compensation & Benefits :
- Salary : Open to discussion
- Health Insurance
- Dental Insurance
- Permanent Opportunity
Required Skills :
- Proficiency in AWS services (Elastic Beanstalk, CloudFormation, IAM, etc.).
- Strong experience with infrastructure as code tools, particularly Terraform.
- Scripting skills in Python, Bash, and PowerShell.
- Experience with Git and CI / CD tools like Jenkins.
- Knowledge of security and compliance standards (FedRAMP, PCI, HIPAA).
- Familiarity with cloud security tools such as Twistlock and Tenable Nessus.
- Experience with monitoring and incident management tools (CloudWatch, Grafana, Prometheus, SolarWinds, PagerDuty).
- Ability to participate in on-call rotations and handle 24x7 support responsibilities.
Preferred Qualifications :
- Previous experience working in a SaaS environment.
- Strong problem-solving skills and the ability to troubleshoot complex issues in a cloud environment.
- Excellent communication skills and ability to work collaboratively with cross-functional teams.
Disclaimer : E-Solutions Inc. provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws.
We especially invite women, minorities, veterans, and individuals with disabilities to apply. EEO / AA / M / F / Vet / Disability.