Title: Site Reliability Engineer I
Location: US – Remote
Summary of Major Responsibilities
This position provides strategic direction for, and carries out, infrastructure, security, continuous integration, deployment, and IT operations practices, including scaling and metrics. It also runs day-to-day operations of production and development infrastructure for cloud-based hosted platforms.
The Site Reliability Engineer (SRE) will work with Software Engineers, Database Engineers, and Product Managers to analyze system and network loads, address stability and performance challenges, and collaborate on operating various systems. The SRE performs ongoing application support by diagnosing and resolving issues, maintaining applications, and evaluating and recommending options for improving performance, maintainability, and operability. This also includes streamlining processes to increase system scalability and reliability, improve efficiency, and minimize errors.
Essential Duties and Responsibilities
- Understanding of security and encryption best practices.
- Responsible for designing, building, maintaining, and scaling production services and server farms across multiple data centers for complex and data-intensive cloud services.
- Design and enhance software architecture to improve scalability, service reliability, capacity, and performance.
- Write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you are an experienced software engineer focused on operations.
- Work with development teams to make sure applications fit well within the infrastructure and that scalability and reliability are designed and implemented from the ground up. You will work with QA on building pipelines and automation for delivering and deploying applications to production.
- Roll up your sleeves to troubleshoot incidents, formulate and test hypotheses, and narrow down possibilities to find the root cause.
- Write postmortem reviews and remediation recommendations.
- Identify adverse trends before they become problems; respond to automated system alerts, troubleshoot system errors effectively, and work incidents to return systems to normal operating conditions.
- Author and update high-quality documentation of all relevant specifications, systems and procedures.
- Other duties as assigned.
- Uphold company mission and values through accountability, innovation, integrity, quality, and teamwork.
- Support and comply with the company’s Quality Management System policies and procedures.
- On-call rotation supporting the infrastructure.
- Regular and reliable attendance.
- Ability to work designated schedule.
- Ability to work nights and/or weekends.
- Ability to work on a mobile device, tablet, or in front of a computer screen and/or perform typing for approximately 90% of a typical working day.
- Ability and means to travel between Madison locations.
- Bachelor’s degree in computer science or a related field; or high school diploma/General Education Diploma and 2 years of relevant experience in lieu of a degree.
- 3+ years of experience in systems engineering.
- 3+ years of work and/or formal classroom experience with modern application design and cloud environments.
- 3+ years of work and/or formal classroom experience working with software development and operations teams.
- Authorized to work in the United States without sponsorship.
- Demonstrated ability to perform the Essential Duties of the position with or without accommodation.
- 1+ years of experience developing highly available systems architecture using modern technologies.
- Track record of successfully addressing performance, scalability, and latency challenges.
- AWS Solutions Architect, AWS SysOps Administrator, or AWS Developer certification.