Job Description: Site Reliability Engineer
Overview
The Site Reliability Engineer is responsible for keeping all user-facing services and other production systems running smoothly and efficiently through the effective use of automation and development practices. Your hands-on knowledge of system design, application development, testing, and operational stability will help the team deliver highly reliable products and solutions.
Position Responsibilities and Duties
- Design, code, test, and deliver software to automate manual operational work
- Ensure high-availability and disaster-recovery abilities across solutions
- Improve existing monitoring and alerting capabilities to help prevent incidents
- Monitor system capacity and performance
- Design, build, and maintain infrastructure that enables auto-scaling for peak performance
- Design self-healing and resiliency patterns using Chaos Engineering practices
- Become a technical leader and contributor to projects, including coding, code reviews, and architectural discussions
- Assist in debugging production issues across services and levels of the stack
- Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure incidents are permanently resolved
Qualifications
- Strong written and verbal communication skills and a positive, can-do attitude required
- DevOps / Infrastructure Engineer with a development background (must possess coding skills)
- Proficiency in modern programming languages and runtimes such as Node.js, Python, Java, and PHP
- Experience developing automation and monitoring scripts, and an understanding of interfaces
- Working knowledge of infrastructure components (e.g., routers, load balancers, containers, storage, and networks)
- Expertise in AWS Cloud, CloudFormation, SAM templates, and IaC
- Experience in automated Quality Assurance techniques and practices
- Knowledge of performance, monitoring, and telemetry tools
- Experience in managing DevOps practices and toolsets – CI/CD, Ansible, AWS CodePipeline
- A Bachelor's degree in IT or equivalent experience in a software engineering discipline
Job Type
- Full-time
Recommended Skills
Systems Design
PHP (Scripting Language)
Infrastructure
Testing
Storage (Computing)
Application Development