Senior Site Reliability Developer Michael Page Dallas, TX $100,000-175,000 per year

Kate · 6 Nov 2021

Our client is looking for a Senior Site Reliability Developer - Technical Lead to join our rapidly growing technology team. The Senior SRE-TL will join the SRE squad and will be responsible for keeping all user-facing services and other production systems running smoothly. The Senior SRE - Technical Lead will be accountable for the reliability, scalability and resilience of complex infrastructure components.

MPI does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, disability, veteran status, marital status, or based on an individual's status in any group or class protected by applicable federal, state or local law. MPI encourages applications from minorities, women, the disabled, protected veterans and all other qualified applicants.

Description

Team leadership, knowledge sharing & coaching - 25%

Enforce an effective and efficient scrum process where all team members work in the same direction
Guide SRE engineers, when needed, to break down user stories into manageable tasks
Propose and drive a development process that emphasizes quality through code reviews, automated testing, continuous integration pipelines and documentation
Develop a deep understanding of the team's roadmap and influence it with fact-based technical arguments
Ensure proper documentation of team activities
Ensure demos of developed features are well prepared and presented to stakeholders
Review pull requests and documentation with the objective of guiding and upskilling junior developers on various technical/SRE topics
Provide fact-based technical feedback on each squad member to managers as part of the evaluation cycle
Actively contribute to SSENSE University, the internal peer learning platform, to promote continuous learning
Participate in the onboarding of new developers
Mentor junior SREs in all areas and other SREs in their area of deep knowledge
Set an example for the team of SREs through positive, inclusive leadership and discussion of work
Be trusted to de-escalate conflicts within the team

Production Operations - 20%

Handle emergency response, either while on call or by reacting to symptoms surfaced by monitoring, and escalate when needed
Accountable for ensuring & improving documentation on site reliability measures, either in application documentation, or in runbooks, explaining the issues encountered and the solutions implemented
Actively identify and implement opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation
Identify parts of the system that do not scale, provide immediate palliative measures and drive long-term resolution of the resulting incidents
Improve the SSENSE codebase by resolving issues
Optimize cloud cost and reduce system resource usage by setting clear requirements through efficiency and capacity planning

Maintain Service Level Objectives (SLOs) / Service Level Indicators (SLIs) - 20%

Plan, design and execute solutions within the infrastructure team to reach specific goals agreed upon
Share the learnings publicly, either by creating issues that provide context for anyone to understand it or by writing blog posts
Propose ideas and solutions within the infrastructure team to reduce the workload through automation
Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives
Perform and run blameless RCAs on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again

Product delivery - 15%

Anticipate the technical challenges the squad will face when delivering solutions and propose and implement technical solutions to those issues
Write testable, efficient, and reusable code suitable for continuous integration and automated deployments, that respects best practices and SSENSE development standards
Raise the bar for professional SRE engineers, lead by example, and help others learn the craft through rigorous code reviews and coaching

Ownership and accountability - 10%

Be accountable for performance, reliability, scalability and resilience of complex and critical infrastructure components (web servers, data stores, hosted services, load balancers, etc.) through the proper use of replication, sharding, load balancing, monitoring, SLAs, alerting, and auto-scaling
Be an active participant in the incident escalation chain and in prompt incident resolution
Upgrade and patch systems as required while ensuring availability of service
Contribute to cross-squad initiatives, acting as a change agent amongst peers to foster adoption of new processes or technical solutions

Remote|Ecommerce Leadership Opportunity

Bachelor's degree in Computer Science, Engineering, or a related technical field; a Master's degree is an asset
Minimum 8 years of experience working as an SRE
A minimum of 8 years of experience administering Linux-based environments (Red Hat, CentOS, Debian or Ubuntu)
A minimum of 8 years of experience with service-oriented architectures and microservices
Must have at least 2 years of experience working in an Agile development life cycle
A minimum of 8 years of experience practicing continuous integration and continuous delivery
Minimum 5 years of experience with infrastructure automation frameworks, in at least two of these technologies: SaltStack, Terraform, or Cloud Foundation engine
Expertise in infrastructure to support a microservice architecture
A minimum of 4 years of experience in infrastructure as code, specifically with Terraform
Strong knowledge of caching technologies (Fastly, Redis) with the ability to identify opportunities for improvement
Expertise with RDBMS (MySQL, PostgreSQL) and NoSQL (DynamoDB, DocumentDB, MongoDB) databases at scale
Proficiency with cloud resources (AWS), with the ability to operate them for the components owned; certification preferred
Ability to use containers and orchestration frameworks (Kubernetes, Docker, container registries, etc.)
Proficiency in Git
Must have at least 4 years of experience with Kubernetes; Amazon EKS or ECS experience is nice to have

Working in an agile environment, our squads are made up of experienced innovators in Product Management, QA, Design, DevOps, Software Development, Machine Learning, Data Engineering, and Security. Headquartered in Montreal, our technology organization has been growing at a rate of 2X year-over-year and is doubling once again in 2021 as we expand across Canada, US, and Europe.
Michael Page
