Site Reliability Engineer II

Microsoft

Redmond, Washington, United States

$98.3K-$193.2K a year

Full-time

Overview

The Azure Dedicated team plays a unique role in the Azure ecosystem. Through unique integrations of bare metal infrastructure, we are powering many of the latest AI services and innovations for the entire company.

We're seeking a Site Reliability Engineer II to join us in this mission to power the biggest AI training workloads imaginable.

As a Site Reliability Engineer II on our team, you will be exposed to some of the biggest AI infrastructure in the world, and you will help us build the most reliable AI training services possible.

This opportunity will allow you to connect to the AI mission in a real and tangible way by building a service-oriented view of the infrastructure that allows common High Performance Computing building blocks to execute flawlessly on it.

You will be exposed to the biggest names in the AI industry and have the opportunity to get hands-on with the key Graphics Processing Unit and Infiniband infrastructure that powers everything.

This opportunity has flexible working arrangements for the successful candidate as appropriate.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.

Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required Qualifications :

4+ years technical experience in software engineering, network engineering, or systems administration

OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration

OR Master's Degree in Computer Science, Information Technology, or related field

Other Requirements :

Ability to meet Microsoft, customer and / or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings :
Microsoft Cloud Background Check : This position will be required to pass the Microsoft Cloud Background Check upon hire / transfer and every two years thereafter.

Preferred Qualifications :

Experience in Infiniband networks and their management
Experience in High Performance Computing workload topologies and schedulers

5+ years technical experience in software engineering, network engineering, or systems administration

OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration

OR Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration.

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year.

There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $127,200 - $208,800 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here :

Microsoft will accept applications for the role until July 13, 2024.

Responsibilities

Help build the mission control automation and insights to manage the AI infrastructure such as Ethernet networks, Server Management and Infiniband Management.
Use your skills to bring Service Level Agreements in line with Service Level Obligations and with customers' asks for AI training reliability.
Take part in livesite response and troubleshooting anywhere in the stack, and help identify key gaps in telemetry that impede the Service Level Agreements.
Embody our culture and values.

Benefits / perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Industry leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect
