RESPONSIBILITIES:
Kforce has a client that is seeking a Site Reliability Engineer in Beaverton, OR.
Summary: Within the Site Reliability Engineering team, our goal is to provide technical solutions to complex production problems, with a focus on reducing incident and problem toil, speeding detection and recovery of critical incidents through observability, and driving continuous improvement through operational health measurement and sharing.
Duties Include:
- Drive reliability throughout the Engineering Organizations through Observability, informed architectural improvements, and automation
- Collaborate closely with Engineering teams to build cohesive service operation solutions into the overall service design
- Build and enhance the DevOps processes, environments, and toolchains for high service reliability and availability
- Exercise and optimize the service operation process to support the whole service with all partner teams; mitigate and recover from live-site incidents efficiently
REQUIREMENTS:
- Bachelor's degree in Computer Science, Engineering, Math, Science or another technical field
- 5+ years of working experience in the IT industry building large-scale applications/services on platforms like AWS/Azure
- 3+ years of experience in software development automating business processes using Java, Node, or Python on a cloud platform
- Proficient in building microservices using Java on a cloud platform
- Understanding of distributed systems architecture
- Experience supporting highly available and scalable systems, with the ability to debug/troubleshoot live systems
- Adaptable and flexible enough to manage multiple tasks with changing priorities
- Hands-on experience with observability tools like Splunk, New Relic, Azure Monitor, or CloudWatch
- Good troubleshooting skills and a deep understanding of metrics, logs, and traces
- Experience with, interest in, and adaptability to working in a Lean Scaled Agile delivery environment
- Exceptional written, verbal, and interpersonal communication skills with management, technical peers, and business stakeholders
Recommended Skills
Reliability
Adaptability
Engineering
Splunk
Amazon Web Services
Business Process