Job Description
Serve as part of the incident management team in a 24x7 Microsoft Azure environment. The candidate will diagnose, mitigate, and/or escalate system issues to maintain a high level of system and platform availability.
The candidate will serve as part of the Live Site work stream and will need an understanding of core Windows Azure components and tools to diagnose issues.
Duties and Responsibilities
- Respond to incident tickets in a 24x7 operational environment to meet SLA objectives.
- Troubleshoot system issues using diagnostic tools like netmon, windbg, and custom application tools.
- Review system logs to identify and mitigate system issues.
- Use the knowledge base to help troubleshoot, identify, and resolve system issues.
- Update knowledge base troubleshooting guides and lessons learned as required.
- Document incident fixes and recommend system improvements to the engineering team for consideration in future releases.
- Document system issues resulting in system outages and coordinate changes through the change management process.
- Support collaboration across operations, development teams and external partners.
- Support tiger team calls to streamline knowledge sharing and timely resolution of system issues.
- Monitor solution performance against client specifications and SLAs, escalating as needed.
- Other supporting duties, as directed.
- Willingness to work overtime and varying hours as required.
Minimum Qualifications
- BS in Computer Science or other technical discipline is preferred.
- 1-2 years of operations experience providing application infrastructure support; 1 year of experience performing system administrator support.
Clearance Requirement
TS / SCI with Full Scope Poly required.
Other Job Specific Skills
- Experience with system administration support on platforms such as Windows / Linux
- Experience supporting a 24x7 cloud-based environment.
- Strong interpersonal skills
- Strong oral and written communication skills
- Experience supporting cloud-based environments and tools such as Azure / AWS
- Experience analyzing, troubleshooting, and providing solutions for technical issues
- Ability to solve problems and collaborate with team members
- Strong organizational and multi-tasking skills
- Strong technical communication skills with both technical and non-technical peers
- Able to maintain professionalism under pressure
- Strong customer focus