As a Site Reliability Engineer (SRE) for Controls, Operational Risk, Compliance and Practices Technology, you will work in a collaborative team of software professionals and be responsible for improving the health of the applications. You will be part of a horizontal function responsible for putting in place the practices, processes and tools that keep each application stable and functional. This team ensures the highest level of quality and success in supporting technical issues, disaster recovery (DR) testing, and hardware/software updates. The SRE is expected to implement DevOps practices, automate the release process, and develop scripts that automate manual processes.
You will be working directly with other SRE members and development team members in the development and support of innovative technology solutions including user interfaces, middle-tier and server-side components, and will need to ensure adherence to architecture standards, risk management, and security policies.
As a Site Reliability Engineer for our technology teams, you will have the opportunity to instrument, build and maintain complex applications and also maintain vendor applications from a development and risk perspective.
Primary Responsibilities:
Troubleshoots incidents, conducts blameless post-mortems and ensures permanent closure of incidents.
Engages with the development team throughout the life cycle to help build reliability into the software.
Applies analytics on historic data, such as incidents and usage patterns, to predict issues and take proactive action.
Drives adoption of self-healing and resiliency patterns such as circuit breaker and bulkhead.
Designs and conducts performance tests, identifies bottlenecks and opportunities for optimization.
Defines and drives adoption of best-in-class monitoring frameworks to achieve end-to-end flow monitoring and noiseless alerting.
Designs, develops, tests and delivers software to automate manual operational work.
Deploys software and product upgrades.
Adds value to team delivery, works with the team to complete tasks to a high standard, and actively learns new skills.
Facilitates maximum speed of delivery by objectively adhering to the service's error budgets.
Manages the effort split between manual operational work and engineering work.
Coaches other team members and manages teams as needed.
Required Skills:
Excellent debugging and troubleshooting skills.
Expert in performance monitoring and capacity management of large systems using various tools.
Expert in at least one technology stack (Java/J2EE/Python), with experience designing, coding, testing, and delivering software.
Expert in at least one relational database (such as SQL Server, Oracle or DB2).
Hands-on experience with cloud technologies (Cloud Foundry, Kubernetes, AWS).
Hands-on experience with big data services (Hadoop, HDFS, Hive, YARN, HBase, Kafka, ZooKeeper).
Working knowledge of Groovy, batch scripting, PowerShell or shell scripting.
Experience developing, deploying and debugging distributed systems in a Linux and Hadoop environment.
Experience with monitoring tools such as AppDynamics (AppD), Splunk, ELK and Geneos.
Experience analyzing SLI metrics and performance data, and interpreting and correlating them with SLOs and SLAs.
Experience with deployment automation, CI/CD, DevOps, Jenkins, Git and Bitbucket.
Experience with cloud/container environments, big data, analytical tools (Tableau, Alteryx).
Expert practitioner in one or more technology domains, or a cross-domain expert, able to solve complex and mission-critical problems within a business or across the firm.
Working knowledge of infrastructure components like routers, load balancers and networks.
Comfortable working in Agile mode and proficient in continuous integration and continuous delivery.
Solid understanding of microservice design methodologies.
Solid analytical and problem-solving skills.
A proven team lead with excellent communication skills.
Attention to detail and time-management skills.
Endless curiosity about applications and application stability.
JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as any mental health or physical disability needs.
Equal Opportunity Employer/Disability/Veterans