Work with the brightest minds at one of the largest financial institutions in the world. This is a long-term contract opportunity that includes a competitive benefits package!
Our client has been around for over 150 years and is continuously innovating in today's digital age. If you want to work for a company that is not only a household name, but also truly cares about satisfying customers' financial needs and helping people succeed financially, apply today.
Position: Senior Site Reliability Engineer
Location: Multiple locations across the US
Term: 12 months
- Support our client's assessment to baseline the SRE environment and organization.
- Support alignment of development and operations through shared goals and a balance between functional and nonfunctional requirements.
- Support setup of the Service Management reference architecture (Service Strategy, Design, Transition, and Improvement) and operational process management (Incident, Problem, Change, Release, Capacity, etc.).
- Support development and operations teams in defining each stakeholder's service-level indicators (SLIs) that reflect reliability (e.g., availability, response time, latency, throughput).
- Support our client's SRE team resource building (acquire, develop, manage) through the forming, storming, norming, performing, and adjourning stages.
- Support establishing a charter that defines the SRE team's priorities, how the team operates, and what the team should not engage in.
- Support identifying and organizing the team: a mix of established domain knowledge and fresh viewpoints; a broad talent mix; small, fast, and nimble, with authority and reduced bureaucracy.
- Support defining and agreeing on service-level objectives (SLOs): measurable benchmarks that quantify the target value of each SLI (e.g., availability of 99.999%, response time < 1 second).
- Support defining how SLOs are calculated (formula), including how data is collected, aggregated, analyzed, and reported.
- Support SRE principles: use automation to scale with load; balance operational toil against improvement and development work; create an error budget to control velocity, self-regulating the balance of new features against stability; practice observability; use actionable, automated runbooks; hold a blameless postmortem for every event; and apply the error budget, that is, the difference between negotiated performance (the SLO) and actual performance.
- Define and measure production availability, accounting for known downtime and service-level outages.
- Debug problems at scale for our mission-critical services, and help our platform and service teams implement lasting fixes to recurring issues.
- Support provisioning of environments using infrastructure as code provided by the automation team, and execute, debug, and configure CI/CD pipelines.
- Support onboarding of new projects/applications onto the platform using automation.
- Support acceptance testers for automation developed by the automation team, and use that automation to satisfy service requests in a more timely fashion.
- Designs, codes, tests, debugs and documents programs using Agile development practices.
- Maintains broad knowledge of other technology engineering disciplines and collaborates with other key experts to ensure we are making the right technology choices for the client.
- Translates advanced technology experience and in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions.
- Responsible for being an expert resource for architects in the development of target architectures to ensure that they can be properly designed and implemented through sound engineering practices.
- Maintains knowledge of industry best practices and new technologies and recommends innovations that enhance operations and/or provide a competitive advantage to the organization.
- Provides expert counsel to senior technology leadership and advises and mentors others with the goal of knowledge transfer.
- Represents the Company to external industry groups, influencing industry standards.
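
For candidates who want a concrete picture of the SLI, SLO, and error budget items above, here is a minimal sketch of one common way to compute them. It is an illustration only, not the client's formula: the function names, the event-based availability SLI, and the 99.9% target are all assumptions made for this example.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Fraction of events that met the reliability criterion (hypothetical helper)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    budget = 1.0 - slo   # unreliability the SLO allows
    if budget <= 0:
        raise ValueError("slo must be below 1.0")
    spent = 1.0 - sli    # unreliability actually observed
    return (budget - spent) / budget


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 requests, 50 of them failed.
    sli = availability_sli(good_events=999_950, total_events=1_000_000)
    remaining = error_budget_remaining(sli, slo=0.999)  # assumed 99.9% target
    print(f"SLI={sli:.5f}, error budget remaining={remaining:.1%}")
```

In this sketch, a team would typically slow feature releases as the remaining budget approaches zero and prioritize stability work once it goes negative.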
Is this a good fit? (Requirements):
- 10+ years of application development and implementation experience
- 3+ years of development experience with languages such as Python or Java
- 3+ years of build-deploy automation and configuration experience within Linux and Unix environments
- 2+ years of experience in DevOps or as a Site Reliability Engineer
- Working knowledge of Cloud, API, and NoSQL databases
- Working knowledge of Grafana, ELK, AppDynamics
- 7+ years of software engineering experience
- An industry-standard technology certification
- Strong verbal, written, and interpersonal communication skills