The Site Reliability Engineer (SRE) / Tanzu must be able to work collaboratively with and across agile teams. The SRE shall build and maintain infrastructure as code on large-scale, multi-site deployments and shall draw on their experience to evaluate and assess new ways to scale platform capabilities. The SRE will troubleshoot issues on high-traffic production systems until root causes are understood, participate in design and code review processes, coordinate infrastructure changes with product owners, and be responsible for identifying bottlenecks and improving platform performance. The SRE will deploy and maintain third-party observability tooling and collaborate with customers to build and tune Service Level Indicators and Service Level Agreements. The SRE will also demonstrate proficiency in facilitating post-mortem retrospectives and in executing or adjudicating root cause analyses.
- Experience with VMware Tanzu.
- Exceptionally proficient in software incident response, troubleshooting, problem isolation, and incident command with extreme ownership of software availability.
- 8+ years of experience working in Software Engineering or Site Reliability Engineering.
- 5+ years of experience building and maintaining container orchestration across hybrid-cloud infrastructure.
- 5+ years of experience deploying and configuring modern observability tooling for monitoring and alerting.
- Proficiency writing automation with Bash or Python.
- Experience writing or troubleshooting software delivery pipelines, e.g., GitLab CI and Concourse.
- Bachelor's degree in Computer Science, Mathematics, or an equivalent technical field, or equivalent industry experience, is highly preferred.