- Implement ETL processes using the defined framework.
- Monitor performance and recommend any necessary infrastructure changes.
- Create/modify tables and views in Hive.
- Write shell scripts to execute Hive-on-Spark jobs.
- Automate the shell scripts with the job scheduling tool Autosys.
- Improve job performance by tuning Hive parameters, applying Spark configuration-level changes, and using Spark optimization techniques (a minimal configuration sketch follows this list).
- Create/modify HQL scripts to retrieve data from Hive tables or to perform data processing.
- Work with the team to define data retention logic per business requirements.
- Perform and oversee tasks such as writing scripts, writing T-SQL queries, and calling APIs.
- Customize and oversee integration tools, warehouses, databases, and analytical systems.
- Design data flows, create data flow diagrams, and implement design-level changes.
- Design and implement data stores that support scalable processing and storage of high-frequency data.
- Work with the Admin/Support team to resolve any ongoing issues with operating the cluster.
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related technical field is required.
- 7+ years of hands-on data engineering experience with data warehouses, data lakes, and enterprise big data platforms required.
- 2+ years of experience with real-time data streaming platforms such as Flume, Kafka, and Spark Streaming.
- Experience working in an Agile/iterative methodology required.
- Working experience with the Big Data/Hadoop ecosystem (NoSQL: Hive, Impala; Spark; Scala; shell scripting) and RDBMS (MS SQL Server) required.
- Experience integrating data from multiple data sources with full, incremental, and real-time loads (an incremental-load sketch follows this list).
- Working experience with development/deployment tools: Jira, Bitbucket, Jenkins, and RLM.
- Experience with Spark, Hadoop v2, MapReduce, and HDFS required.
- Good knowledge of Big Data querying tools such as Pig, Hive, and Impala required.
- Experience with various ETL techniques and frameworks required.
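
A minimal sketch of the kind of Hive parameter and Spark configuration-level tuning referenced in the performance bullet above, written in Scala since Spark and Scala are part of this stack. The configuration values, database, and table names are hypothetical placeholders for illustration only, not requirements of the role:

```scala
import org.apache.spark.sql.SparkSession

object TunedHiveJob {
  def main(args: Array[String]): Unit = {
    // Hypothetical settings; real values depend on cluster size and data volume.
    val spark = SparkSession.builder()
      .appName("tuned-hive-job")
      .config("spark.sql.shuffle.partitions", "400")   // match shuffle parallelism to data volume
      .config("spark.sql.adaptive.enabled", "true")    // let AQE coalesce small shuffle partitions
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .enableHiveSupport()
      .getOrCreate()

    // Hive-level parameters can be set per session via SQL.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Example optimization: repartition before writing to avoid many small files.
    val orders = spark.table("sales_db.orders")        // hypothetical Hive table
    orders.filter("order_date >= '2024-01-01'")
      .repartition(50)
      .write.mode("overwrite")
      .saveAsTable("sales_db.orders_recent")

    spark.stop()
  }
}
```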
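And a sketch of one common approach to the incremental-load requirement above: a high-water-mark (watermark) load that appends only rows newer than what the target already holds. Table and column names (stage_db.customer_updates, warehouse_db.customer, last_modified_ts) are assumptions for illustration; a full load would simply overwrite the target instead:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, max}

object IncrementalLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incremental-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical tables: staged source data and an existing Hive target.
    val target = spark.table("warehouse_db.customer")

    // High-water mark: latest modification timestamp already present in the target.
    // Assumes the target is non-empty; a first run would fall back to a full load.
    val watermark = target.agg(max("last_modified_ts")).collect()(0).getTimestamp(0)

    // Pull only rows newer than the watermark from the staged source.
    val delta = spark.table("stage_db.customer_updates")
      .filter(col("last_modified_ts") > lit(watermark))

    // Append the delta; column order must match the target schema for insertInto.
    delta.write.mode("append").insertInto("warehouse_db.customer")

    spark.stop()
  }
}
```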