We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in AWS, Windows, and Linux environments.
This role requires expertise in automation, infrastructure as code, and system administration, with a focus on Terraform and Ansible automation.
Key Responsibilities:
- Design, implement, and maintain infrastructure using Terraform.
- Automate configuration management and deployments with Ansible for both Linux and Windows environments.
- Use GitHub Actions to automate Ansible and Docker builds.
- Manage and secure infrastructure with AWS Key Management Service (KMS).
- Monitor system performance and ensure high availability.
- Manage SSL certificates and ensure secure communications.
- Write and maintain PowerShell and Bash scripts for automation tasks run through Ansible or Octopus Deploy (Octo).
- Collaborate with development teams to ensure smooth deployment and operation of applications.
- Troubleshoot and resolve issues in production and non-production environments.
- Implement and manage logging solutions, including Filebeat.
Required Skills:
- Extensive experience with AWS services and architecture.
- Proficiency in Terraform for infrastructure as code.
- Expertise in Ansible for configuration management and automation.
- Experience with Docker and containerization.
- Strong knowledge of GitHub Actions for automation tasks.
- Experience with AWS KMS and security best practices.
- Familiarity with monitoring tools and practices.
- Solid understanding of SSL certificate management.
- Proficiency in PowerShell and Bash scripting.
- Knowledge of logging tools and practices, including Filebeat.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
Nice to Have:
- Experience with Octopus Deploy (Octo) for CI/CD.
- Knowledge of the F# programming language.
- Experience with Kubernetes, especially Amazon EKS.
- Familiarity with additional monitoring and logging tools.