Location: Irving, TX
Term: 6 months
Day-to-Day Responsibilities:
-
Handle incoming tickets for supporting our stores.
-
Work in a fast-paced 24x7 role that requires participating in an on-call rotation.
-
Monitor our dashboards, reports and alerts to ensure the highest availability.
-
Work closely with other SRE members to improve our observability and SRE maturity.
-
Work cross-functionally with teams across the organization to establish SLOs and help those teams consistently achieve them.
Is this a good fit? (Requirements):
-
Bachelor's degree.
-
5+ years of experience supporting complex distributed systems.
-
3+ years of experience in managing/supporting public cloud-based infrastructure (AWS or Azure).
-
3+ years of experience running and/or managing large infrastructure services across multiple availability regions in the public cloud (AWS, GCP, Azure).
-
2+ years of DevOps experience.
-
Experience managing IoT devices in Microsoft Intune.
-
Experience implementing and evangelizing the principles of the Google SRE handbook.
-
Experience building MongoDB and NoSQL queries.
-
Experience managing PM2 batch processes.
-
Experience with Python, JavaScript, Node.js, YAML and similar languages.
-
Experience managing and supporting AWS Lambda or other serverless workloads.
-
Experience with RCAs, monitoring and alarming in all environments, and familiarity with tools like Mongo Charts, New Relic, CloudWatch and ServiceNow.
-
Experience using Postman.
-
Experience participating in Scrum/Kanban and other Agile workflows, and using Jira, Confluence and OneDrive.
-
Working experience with IoT devices and Microsoft Intune.
-
Retail SRE operations experience managing in-store devices and IoT.
-
Ability to work from the office a minimum of 3 days per week.
-
Exceptional verbal and written communication skills.
-
Strong technical troubleshooting skills.