As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you'll take the lead on relevant projects, backed by an organization that provides the mentorship and support you need to learn and grow. As an SRE, you'll be focused on running better production applications and systems.
Job Responsibilities
- As part of the SRE team, provide operations and RTE support for the applications within the product group
- Develop solutions to automate manual development, deployment, and operations tasks
- Facilitate application modernization through the adoption and implementation of a DevOps and CI/CD toolchain
- Own the security, availability, performance, change and incident management, telemetry, and capacity management of the application
- Define, measure, and improve Service Level Objectives (SLOs) by applying software engineering principles
- Develop tools and visualizations to measure application compliance with defined Service Level Indicators (SLIs) and SLOs
- Partner with the App Dev teams throughout the life cycle to help build products that conform to applicable Non-Functional Requirements (NFRs)
Qualifications
- 2-4 years of enterprise-level professional experience in operations support: incident management, change management, and RTE support
- Experience implementing a DevOps toolchain and Continuous Integration/Continuous Delivery (CI/CD) pipeline automation
- Development or support experience with Java, Spark, Kubernetes, or Hadoop-based data platforms
- Familiarity with SRE principles, processes, and tools
- Experience with an RDBMS product such as Oracle or SQL Server
- Development and production monitoring experience with the BMC Control-M scheduling tool
- Report and dashboard development experience using Tableau
- Experience with the ServiceNow Change Management and Incident Management modules
- Familiarity with Agile methodology and experience working in Agile Scrum or Kanban teams
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as any mental health or physical disability needs.