Lead Level Data Engineer
End client is the GAP
Duration is 1+ year
Work can be done remotely, but we prefer people willing to work EST or CST hours.
US Citizen or GC or TN
Communication skills will be very important. The right person in this role will lead to another 20+ hires by the end of this year!
RESPONSIBILITIES:
- Provision, configure, and monitor batch analytics pipelines using technologies such as Spark/Databricks, Kafka, and Event Hubs on Microsoft Azure or another major cloud platform
- Build from scratch highly scalable, available, fault-tolerant data processing systems using cloud technologies, HDFS, YARN, MapReduce, Hive, Kafka, Spark, and other big data technologies; these systems should handle both batch and real-time data processing
- Develop and manage ETL jobs to source data from various sales and operational systems and create a unified data model for analytics and reporting
- Design, implement, and support data warehouse / data lake infrastructure using the Azure big data stack: Azure Synapse, Cosmos DB, Python, Spark, etc.
- Maintain and support existing platforms and evolve them to newer technology stacks and architectures
- Create ETL/ingestion jobs using Apache Spark and monitor/maintain the jobs in the cloud (Azure)
- Identify and surface performance and optimization opportunities, implementing caching to improve the performance of derived views and applying different join methods such as hash, nested-loop, and merge joins
- Schedule jobs using a scheduler, import and export views to different databases and pre-prod environments, document best practices, and be actively involved in design discussions
REQUIREMENTS:
- Bachelor's degree in Computer Science or equivalent experience, plus 8+ years of software development experience
- Strong experience with the Spark platform, Python, and Python libraries
- Strong SQL skills
- Working experience with cloud-based environments such as Databricks, Azure Data Factory (ADF), PySpark, etc.
- Proficiency writing notebooks using Python/Spark
- Well versed with relational databases, nonrelational databases, data streams, and file stores
- Experience in building distributed environments utilizing streaming architecture with Kafka, Spark, etc.
- Automation using Linux shell & Python scripting
- Hands-on experience with the DevOps model, production support, and Agile methodologies, as well as an understanding of Test-Driven Development and Behavior-Driven Development
- Possess strong analytical and problem-solving skills and proven ability to think objectively and interpret meaningful themes from quantitative and qualitative data
- Experience with Azure is a plus
- Experience with Azure DevOps, GitLab, Maven, Gradle, Jenkins, Codefresh, or other CI/CD tools
- Strong knowledge of Hadoop and associated technologies such as MapReduce and Spark
- Understanding of MLOps concepts