Role : Production / Application support
Location : Irving, TX (hybrid, 3 days work from office)
Contract : 6 months
Job Description :
Key Responsibilities and Duties
- Handle incoming tickets for supporting client’s stores
- Work in a fast-paced 24x7 role that requires rotating on-call shifts
- Monitor our dashboards, reports and alerts to ensure the highest availability.
- Work closely with other SRE members to improve our observability and SRE Maturity
“Must Have” Specific Knowledge and Skills
- Ability to work from office minimum 3 days per week
- Exceptional verbal and written communication skills
- Strong technical troubleshooting skills
- Bachelor's degree or equivalent work experience
- 5+ years of experience supporting complex distributed systems
- 3+ years of experience in managing / supporting public cloud-based infrastructure (AWS or Azure)
- 3+ years of experience with running and / or managing large infrastructure services across multiple availability regions in a public cloud (AWS, GCP, Azure)
- Work cross-functionally with the various teams in the organization to help establish SLOs and then help teams consistently achieve those SLOs
- Working experience with IoT devices and Microsoft Intune
- Experience with RCAs, monitoring and alarming in all environments, and familiarity with tools like Mongo Charts, New Relic, CloudWatch, ServiceNow
- Experience participating in Scrum / Kanban and Agile workflows, and using JIRA, Confluence and OneDrive
- Retail SRE operations experience managing in-store devices and IoT
- Experience using Postman
Additional Skills and Other Requirements
- Experience managing IoT devices in Microsoft Intune
- Experience implementing and evangelizing the principles of the Google SRE handbook
- Experience building MongoDB and NoSQL queries
- Experience managing PM2 batch processes
- Experience managing and supporting AWS Lambda or other serverless workloads