Job Description
Hope you are doing well.
We are looking for a Site Reliability Engineer with Azure experience. Please share your resume if you are interested.
Site Reliability Engineer with Azure
Location : Plano, TX (onsite from day 1)
Full Time
Required Skills :
Site Reliability Engineers for Cloud (Azure) proactively drive resiliency as new functionality is launched by reviewing the user stories and code changes made by Scrum teams to eliminate points of failure, configuring alerting, monitoring, dashboarding, and impact analysis tools, optimizing on-call processes and procedures, documenting "tribal" knowledge, conducting post-incident reviews, and reducing MTTR.
During outages, help assess impact, drive mitigation, and uphold our SLAs.
Key Roles & Responsibilities :
Coordinate and guide the cloud migration of microservice-based architectures across various cloud environments.
Build and implement cloud services for high availability, performance, monitoring, and incident response.
Enable and provide infrastructure support for the DevOps team, including on-prem and cloud administration.
Implement and enhance the automation framework for delivering microservices-based applications using Java, J2EE, Jenkins, Maven, Linux, and K8s, both on-prem and in the cloud.
Work with application developers on a day-to-day basis to collect requirements for the next release.
Implement monitoring and alerting: create dashboards for specific metrics, set thresholds, trigger alerts based on those thresholds, interpret the alerts, and automatically heal the system.
Lead root cause analysis and brainstorming sessions on incident resolution, and work with DevOps teams to provide corrective and preventative measures that avoid or mitigate future incidents.
Exercise a high degree of responsibility for the processes, systems, and tools created and managed.
Work across teams to continuously analyze system performance in production, troubleshoot consumer- and engineering-reported issues, and proactively identify areas in need of optimization.
Work with team to gather requirements, research, evaluate, design, plan, deploy, and support the ELK stack on Linux. Build highly-resilient, high-performance, scalable, and flexible systems.
Requirements :
Azure Cloud / Linux systems administration and scripting / automation experience.
Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
Debug production issues across services and levels of the stack.
Experience with one or more orchestration and deployment tools: Azure Resource Manager (ARM), Terraform, Ansible.
Familiarity with Git or other source control systems.
Experience with TFS or Visual Studio Team Services (VSTS).
Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
Experience working with Microsoft Azure Public Cloud.
PowerShell or Python experience, specifically for systems automation.
Experience with RESTful and WebSocket APIs.
Working knowledge of the TCP / IP stack, internet routing and load balancing.
Experience with monitoring and alerting using technologies like Log Analytics, Dynatrace, Prometheus, Nagios, Kafka.
Experience designing, implementing, and deploying Docker, Kubernetes, and serverless (Functions or Lambdas).
Previous experience working with geographically-distributed coworkers.
Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
Bachelor's degree in Computer Science, Information Systems or related field.
3 to 5 years of experience as an Azure Cloud admin or solution architect; Azure certification desirable.
Regards,
Krishna Verma
Talent Acquisition Specialist, Amaze Systems