Job Title: Cloud Data Engineer
Job Location: Middletown, NJ
Job Duration: 12 months
The Cloud Data Engineer will develop, support, and optimize production data pipelines hosted in Azure Databricks. The Engineer will use Azure cloud-native services such as Azure Databricks, Azure SQL (Postgres), and Azure Data Factory, as well as PySpark and MS SQL. Working closely with SMEs, this individual will both develop and validate data pipelines to ensure they are accurate, optimized, and complete. The Engineer will also develop tools in Python, Spark, and Scala and will test modules as needed.
- Develop data pipelines to support business requirements
- Perform data quality validation. Employ data mining techniques to achieve data synchronization, redundancy elimination, source identification, data reconciliation, and problem root cause analysis.
- Investigate and resolve data issues across platforms and applications, including discrepancies of definition, format, and function.
- Tune data models to optimize performance
- Build the data infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and Azure data tier technologies.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Review PySpark code and SQL query logic to ensure accuracy
- Develop and maintain scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity
- 8+ years of hands-on data engineering experience with major Big Data technologies and frameworks including but not limited to Hadoop, MapReduce, Databricks, Apache Spark, Snowflake, Flume, ZooKeeper, Hive, HBase, etc.
- 5+ years of experience working with cloud platforms in data engineering disciplines and developing automation projects.
- 3+ years of strong experience in Azure data platform and tools, including Azure Data Lake Storage (ADLS) Gen 2, Azure Databricks, Azure Data Factory, etc.
- Strong hands-on programming experience in Python and Spark required.
- Experience building DevOps pipelines within the Azure stack
- Experience with Databricks Testing Framework
- Experience with schema design (e.g., star schema) and dimensional data modeling
- SQL knowledge and experience with relational databases, including query authoring, as well as working familiarity with a variety of databases
- Experience working in Unix-like environments (Linux, Ubuntu, etc.) and with virtualization
- Knowledge of various NoSQL databases such as Cassandra, HBase, MongoDB, etc.
- DP-203: Data Engineering on Microsoft Azure certification
- BS in Computer Science or Software Engineering; Master’s degree preferred