Site Reliability Engineer with Azure

Amaze Systems Inc.
Plano, TX, US
Full-time

Job Description

Hope you are doing well.

We are looking for a Site Reliability Engineer with Azure experience. Please share your resume if you are interested.

Location: Plano, TX (onsite from day 1)

Full Time

Required Skills:

Site Reliability Engineers for Cloud (Azure) proactively drive resiliency as new functionality is launched by reviewing the user stories and code changes made by Scrum teams to eliminate points of failure; configuring alerting, monitoring, dashboarding, and impact-analysis tools; optimizing on-call processes and procedures; documenting "tribal" knowledge; conducting post-incident reviews; and reducing MTTR.

During outages, help assess impact, drive mitigation, and meet our SLAs.

Key Roles & Responsibilities:

Coordinate and guide the cloud migration of microservice-based architectures across various cloud environments.

Build and implement cloud services for high availability, performance, monitoring, and incident response.

Enable and provide infrastructure support for the DevOps team, including on-prem and cloud administration.

Implement and enhance the automation framework for delivering microservices-based applications using Java, J2EE, Jenkins, Maven, Linux, and Kubernetes (K8s), both on-prem and in the cloud.

Work with application developers on a day-to-day basis to collect requirements for the next release.

Implement monitoring and alerting: create dashboards for specific metrics, set thresholds, trigger alerts based on those thresholds, interpret the alerts, and automatically heal the system.

Perform root cause analysis and brainstorming sessions on incident resolutions, and work with DevOps teams to provide corrective and preventive measures that avoid or mitigate future incidents.

Exercise a high degree of responsibility for the processes, systems, and tools created and managed.

Work across teams to continuously analyze system performance in production, troubleshoot consumer- and engineering-reported issues, and proactively identify areas in need of optimization.

Work with the team to gather requirements, research, evaluate, design, plan, deploy, and support the ELK stack on Linux. Build highly resilient, high-performance, scalable, and flexible systems.

Requirements:

Azure Cloud / Linux systems administration and scripting / automation experience.

Experience in designing, analyzing, and troubleshooting large-scale distributed systems.

Ability to debug production issues across services and levels of the stack.

Experience with one or more orchestration and deployment tools: Azure Resource Manager (ARM), Terraform, or Ansible.

Familiarity with Git or other source control systems.

Experience with TFS or Visual Studio Team Services (VSTS).

Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.

Experience working with Microsoft Azure Public Cloud.

PowerShell or Python experience, specifically for systems automation.

Experience with RESTful and WebSocket APIs.

Working knowledge of the TCP/IP stack, internet routing, and load balancing.

Experience with monitoring and alerting using technologies like Log Analytics, Dynatrace, Prometheus, Nagios, and Kafka.

Experience designing, implementing, and deploying Docker, Kubernetes, and serverless workloads (Functions or Lambdas).

Previous experience working with geographically-distributed coworkers.

Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.

Bachelor's degree in Computer Science, Information Systems or related field.

3 to 5 years of experience as an Azure Cloud administrator or solution architect; Azure certification desirable.

Regards

Krishna Verma

Talent Acquisition Specialist, Amaze Systems
