Exciting opportunity in Sandy Springs, GA! This company sells software for an e-commerce website focused on the retail industry.
They are seeking an experienced Azure Site Reliability Engineer to join their team. This is a full-time, on-site position.
In this role, you'll work with cutting-edge technologies such as Azure Services and Datadog!
Our client is looking for hard-working individuals who work well on a team. Here, you will have the chance to grow your skills, work on meaningful projects, and enjoy a supportive work-life balance.
If that sounds like what you are looking for, then this is the place for you!
Required Skills & Experience
- Proficiency with Azure services
- Strong experience with Datadog
- 5+ years of experience in site reliability engineering
- Proficient in scripting languages like Python, PowerShell, or Bash
- Strong skills in diagnosing and troubleshooting performance issues and in optimizing system performance across large-scale environments
Desired Skills & Experience
- Knowledge of Datadog integrations for Azure services, Kubernetes, and CI/CD pipeline monitoring.
- Familiarity with managing and optimizing databases such as Azure SQL, Cosmos DB, or MySQL.
- Knowledge of SRE principles such as error budgets, automation, and incident postmortems.
- Familiarity with IaC (Terraform and Ansible)
- Understanding of compliance standards (ISO, SOC 2, GDPR) and security practices specific to cloud environments.
What You Will Be Doing
Tech Breakdown:
- Core services: Azure Kubernetes Service (AKS), Azure Functions, Azure App Services.
- Set up Datadog to monitor Azure resources, including Virtual Machines, AKS clusters, and storage accounts.
- Use Datadog’s dashboards and anomaly detection features to proactively detect and resolve system issues before they impact users.
- Monitor deployments through Datadog to detect any application errors or performance issues introduced during updates.
- Develop and optimize CI/CD pipelines for efficient, reliable application deployment.
Daily Responsibilities
- Automate resource provisioning, updates, and deployment with Infrastructure as Code (IaC) tools like Terraform, Ansible, or ARM templates.
- Continuously monitor Azure infrastructure and applications using Datadog for performance, uptime, and resource utilization.
- Develop, maintain, and improve CI/CD pipelines to automate Docker image builds and Kubernetes deployments.
- Respond to system alerts, production issues, and incidents. Work to resolve outages quickly and perform root cause analysis to prevent future incidents.
The Offer
Bonus or commission eligible
You will receive the following benefits:
- Health insurance, including medical, dental, and vision
- Vacation time, PTO, and paid holidays
- 401(k) with a company match
- Commuter benefits
- Quarterly bonuses, and more
Applicants must be authorized to work in the US on a full-time basis, both now and in the future.