Site Reliability Engineering (SRE) Practitioner, Peritus Inc, Jersey City, NJ

Kate

Administrator
Forum team
Role: Site Reliability Engineering (SRE) Practitioner

Type: Full-time only

Location: Jersey City, NJ

The Senior Transformation SRE Practitioner will work with our largest clients to help them define and implement a new way of delivering operations that meets their business needs and improves the quality of the services they provide to their customers.

The practitioner will assist our Financial Services clients and other enterprise-level customers in creating a resilient, ongoing run function for their private/public cloud environments by leveraging automation, production KPIs, predictive analytics, and AI/ML. The consultant works with the customer's development, SRE, operations, change, and service management teams to identify, define, and build the expected operational resiliency. They will set best practices and direction, and drive operations as they relate to resiliency, using automation. They will also be responsible for defining and driving proactive support best practices for our customers, using predictive analysis and AI/ML.

Furthermore, the practitioner will lead discovery (a baseline assessment of the current state), awareness (training sessions developed and delivered for product managers, engineering teams, and operations team members), and adoption (implementation of runbook/toil automation, and architecture and development of observability/self-healing platforms).

We Are Looking For

Seasoned leaders who have hands-on production experience and can provide thought leadership, architectural definition, and direction, with a passion for data and automation

Senior individuals who have run operations teams, worked on company transformations, and managed critical systems, and who are looking to use their own experience to benefit some of the world's biggest financial institutions

Deep prior experience creating automation within SRE, support, and change management organizations, DevOps, and development pipelines

Deep understanding of, and commitment to, operational KPIs: defining and using KPIs to drive continuous improvement, automation, and proactive support models

Cloud transformation leaders who want to go deep on cloud operating models and operations capabilities as they relate to system resiliency

Responsibilities and Abilities

Collaborate with other specialist consultants, account and sales teams, and Client Partners to develop the SRE practice

Solutions - Define and deliver on-site Professional Services engagements with partners and customers

Delivery - Engagements include on-site, semi-remote or remote projects to plan, build and mature operations capabilities

Insights - Work with Altimetrik engineering and support teams to convey partner and customer needs and feedback as input to technology roadmaps

Qualifications

Experienced with the use of automation in the context of IT operations

Hands-on experience with key operations technologies such as: Monitoring (New Relic, Datadog, AWS X-Ray, AppDynamics, Netcool, Zabbix, etc.), Alerting (PagerDuty, Netcool, Opsgenie, etc.), ITSM (ServiceNow, Jira Service Desk, etc.), Scripting (PowerShell, Bash, Batch files, etc.), Dashboarding (Grafana, Kibana, Prometheus, Zabbix, etc.), Logging (Elastic, Splunk, etc.)

Understanding of enterprise IT operational capabilities; examples include Change, Release, and Incident Management, infrastructure management, and applications management

Track record of hands-on delivery of processes, procedures, or technical solutions, e.g., runbooks, ITSM processes, governance, monitoring/alerting scripts, or automation

Understanding of modern application delivery methods (such as DevOps and CI/CD pipelines) and how to transition operations from traditional approaches to supporting product-led teams


Preferred Qualifications

Understanding of how engineering teams, cloud operations, and infrastructure teams function, and of their core responsibilities and interfaces

Understanding of the change in responsibility and accountability that comes with making the most of an SRE transformation program

Understanding of how to transition ITIL processes to Agile-based frameworks

Understanding of how to translate theoretical models into customer needs without overlooking nuances such as institutional memory and culture

Demonstrated ability to think strategically about the business, product, and technical challenges of operating enterprise production environments

Familiarity with Application Delivery frameworks and approaches (COTS, DevOps, CI/CD, Waterfall)

Understanding of a public cloud platform from an operations perspective, with experience running transactional systems at scale ($1bn+) and managing complex multi-service environments

Infrastructure delivery knowledge, skills and experience

Experience of leading teams with a mix of technical and operational roles
 