Job Description:
Site Reliability Engineer - SRE
Description:
UST Global® is looking for a Site Reliability Engineer to manage the end-to-end application and system stack and to work with one of the leading financial services organizations in the US. Site Reliability Engineering (SRE) is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations.
SRE is also an engineering approach to building and running production systems: SREs engineer solutions to operational problems. SREs are responsible for overall system operation and use a breadth of tools and approaches to solve a broad set of problems. Core practices include limiting time spent on operational work, holding blameless postmortems, and proactively identifying and preventing potential outages.
Responsibilities:
· You will be part of the team to migrate and transform the on-prem applications and data centers to public Cloud (GCP), and then.
· You will engage in and improve the software development lifecycle - from inception and design, through development, deployment, operation and refinement
· Develop and maintain the large-scale infrastructure
· Own build tools and CI/CD automation pipeline
· You will influence and design infrastructure, architecture, standards and methods for large-scale systems
· You will support services prior to production via infrastructure design, software platform development, load testing, capacity planning and launch reviews
· You will maintain services during deployment and in production by measuring and monitoring key performance and service level indicators including availability, latency, and overall system health
· You will automate system scalability and continually work to improve system resiliency, performance and efficiency
· Investigate, diagnose, and resolve performance and reliability problems in a wide range of large-scale and high-throughput services
· Collaborate with architects and application engineers to ensure applications are maintainable, scalable, and follow appropriate disaster recovery and high availability strategies
· Contribute to the handbook, runbooks, and general documentation
· You will remediate tasks within the corrective action plan via sustainable, preventative, and automated measures whenever possible
Requirements:
· BS degree in Computer Science or related technical field, or equivalent job experience required
· 4+ years of SRE experience in Cloud environments
· 2+ years of experience developing and/or administering software in public cloud
· Strong working knowledge and working experience on GCP (Google Cloud Platform)
· Experience in DevOps and CI/CD pipelines and build tools like Jenkins.
· 2-4 years of experience in languages such as Python, Ruby, Bash, Java, Go, Perl, JavaScript and/or Node.js
· Experience managing Infrastructure as code via tools such as Terraform or CloudFormation
· Must have great communication skills
· Experience operating a production environment at high scale with an emphasis on availability and latency
· Deep knowledge of container and orchestration tools such as Docker and Kubernetes
· Familiarity with configuration management and deployment tools such as Chef and Octopus
· Experience in software development in one or more of the following: C, C++, Java, Go and/or Perl, Python.
· Prior experience in developing and/or administering software on Windows with .NET applications
· Strong team player with a “can do” attitude, and the flexibility to jump in wherever needed
· Demonstrable cross-functional knowledge with systems, storage, networking, security and databases
· System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.)
· Proficiency with continuous integration and continuous delivery tooling and practices
· Strong analytical and troubleshooting skills
· Ability and willingness to learn and apply new tools and technologies
· Extra Points for any of the following:
· Prior experience in developing applications in .NET technologies or Java
· You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
· You take a system problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
· You are passionate about automation, with a desire to eliminate toil whenever possible
· You've built software or maintained systems in a highly secure, regulated or compliant industry
· You thrive in and have experience and passion for working within a DevOps culture and as part of a team
About Us:
UST Global is a technology partner dedicated to transforming businesses, communities, and the people who live within them. Operating in 25 countries, we deliver future-ready digital transformation strategy services, products, and platforms that create new possibilities and help you imagine what's next in financial services, healthcare, retail, manufacturing, semiconductor, and communications. But what matters most is the deep partnership we forge with you to solve the unique challenges you face today, while preparing you for tomorrow.
That's us together. That's UST Global.
EEO Statement
For US region: Equal Employment Opportunity and Diversity and Inclusion Strategy. UST is committed to a policy of equal employment opportunity for applicants and employees. Employment decisions comply with all applicable laws prohibiting discrimination in employment, including Title VII of the Civil Rights Act of 1964, the Age Discrimination in Employment Act of 1967, the Americans with Disabilities Act of 1990, the Immigration and Nationality Act, the Genetic Information Nondiscrimination Act, and any applicable state and local laws. Further, UST is committed to having a workplace that encourages diversity and inclusion.