Site Reliability Engineer 2
Description :
Reason : Special Project
Department : Stock US Eng 6 Months
Job Summary
We are seeking a skilled Site Reliability Engineer (SRE) Level 2 to join our dynamic team. The ideal candidate will have a strong technical background, excellent problem-solving skills, and a passion for enhancing system reliability and performance.
You will play a crucial role in monitoring, automating, and optimizing our infrastructure to ensure the seamless operation of our services.
Key Responsibilities :
- System Monitoring and Incident Response : Monitor system health, performance metrics, and availability. Respond promptly to incidents and outages, ensuring minimal downtime.
- Infrastructure Management : Manage and optimize both cloud and on-premise infrastructure using Infrastructure as Code (IaC) tools.
- Automation : Develop and maintain automation scripts and tools to enhance operational efficiency and reduce manual tasks.
- Collaboration : Work closely with development teams to implement CI/CD practices and improve deployment processes.
- Capacity Planning : Analyze usage patterns and forecast capacity needs to ensure system scalability and reliability.
- Documentation : Create and maintain comprehensive documentation for systems, processes, and incident response protocols.
- Security Best Practices : Implement and enforce security measures to protect infrastructure and data.
- Post-Incident Reviews : Conduct post-mortems on incidents to identify root causes and implement corrective actions.
Required Skills :
- Strong knowledge of Linux/Unix systems and proficiency in scripting languages (e.g., Python, Bash).
- Familiarity with cloud platforms (e.g., AWS) and their services.
- Experience with containers and container orchestration (e.g., Docker, Kubernetes).
- Proficiency in using monitoring and alerting tools (e.g., Prometheus, Grafana, Nagios).
- Experience with version control systems (e.g., Git).
- Strong troubleshooting skills with the ability to diagnose complex system issues.
- Excellent verbal and written communication skills for collaboration with cross-functional teams.
- Understanding of Agile development practices and methodologies.
Experience :
1-4 years of experience in Site Reliability Engineering or a similar role.
Location : Newton, MA
Schedule :
- Start Date : 10/14/2024
- Estimated End Date : 04/07/2025
- Hours Per Week : 40.00
- Hours Per Day : 8.00
Compensation :
Up to $38.73/hr (W2/Non-Exempt)
33897117