Works closely with software architects and technical leads to ensure decisions meet long-term enterprise growth needs
SRE professionals are engineers who specialize in reliability and combine knowledge and skills in both software and systems. They are responsible for analyzing business needs, determining problems, advising on design, and building, testing, deploying, changing, and maintaining well-engineered information systems.
- Collaborates with leaders, business analysts, project managers, IT architects, technical leads, and other developers, along with internal customers, to understand requirements and develop solutions that meet business needs
- Maintains and enhances existing enterprise services, applications, and platforms using domain-driven design and test-driven development
- Troubleshoots and debugs complex issues; identifies and implements solutions
- Creates detailed project specifications, requirements, and estimates
- Researches and implements new technologies to enhance current processes, security, and performance
- Supports the development of coding standards and adheres to best practices and security guidelines
- Implement CI/CD for deployments, create branching strategies for repositories, and handle Git issues.
- Engage with all development and project teams as primary Azure DevOps SME.
- Maintain and support integrations with Cherwell, App Center, SauceLabs, SonarQube, Artifactory, Git hooks, etc.
- Maintain and support hosted and on-premises build agents for Azure DevOps.
- Create and maintain documentation for Azure DevOps CI/CD.
- Create Android and iOS build servers.
- Respond to support notifications during and after normal work hours, including nights and weekends, when necessary.
- Help the team reduce its Azure footprint and make development environments dynamic with deployment slots.
- End-to-end systems thinking: broad understanding of enterprise architectures and complex (backend) systems, beyond the individual component
- Understanding of systems from a reliability perspective. Ability to identify the root causes of instability in a high-traffic, distributed system.
- Passion for resolving reliability issues and identifying strategies to mitigate them going forward.
- Understanding of and practical experience with operating system and hypervisor internals; familiarity with the TCP/IP stack, network routing, and load balancing. Experience with configuration and troubleshooting.
- Approach troubleshooting systematically and have a deep sense of ownership for whatever you work on.
- Maintains personal responsibility and commitment to respond to and address incidents quickly
- Good software engineering skills (with experience in Python, Go, and/or Java/JavaScript, Node.js, Angular, NoSQL)
- Solve difficult engineering problems (and don’t mind getting your hands dirty)
- Passionate about automation and innovations that improve productivity by reducing toil
- Data-driven / scientific approach to fact-finding and prioritization.
- Fair understanding of mathematical and statistical models to assess trends.
- Organizational knowledge; strong communication (verbal and written), collaboration, and negotiation skills; ability to work in diverse teams across business units
- Monitoring, event management, and instrumentation of complex systems