KEY RESPONSIBILITIES AND DUTIES
- Design and build data integration pipeline architecture, and create the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using Spark, SQL, and other technologies
- Build robust and scalable applications using SQL, Scala/Python, and Spark
- Create data ingestion processes to maintain the Global Datalake
- Write complex, highly optimized queries across large datasets
- Perform code deployments and code reviews
- Develop and maintain design documentation, test cases, and performance monitoring and evaluation using Git, Confluence, and Cluster Manager
“MUST HAVE” SPECIFIC KNOWLEDGE AND SKILLS
- Proficiency in Python, PySpark, SQL, and Azure technologies
- 5+ years of experience with big data
ADDITIONAL SKILLS AND OTHER REQUIREMENTS
- Retail experience and knowledge of commercial data are a huge plus