Description
Summary
The Data Engineer will be responsible for building and maintaining data pipelines and data products to ingest and process large volumes of structured and unstructured data from various sources.
The Data Engineer will analyze data needs, migrate data into an enterprise data lake, and build data products and reports.
The role requires experience building real-time and batch ETL pipelines, along with a strong understanding of big data technologies and distributed processing frameworks.
Skills Needed
o Expertise working with large-scale distributed systems (Hadoop, Spark).
o Strong understanding of big data clusters and their architecture.
o Experience building and optimizing big data ETL pipelines.
o Advanced programming skills in Python, Java, and Scala.
o Good knowledge of Spark internals and performance tuning of Spark jobs.
o Strong SQL skills and comfort working with relational data models and structures.
o Ability to access data via a variety of APIs / RESTful services.
o Experience with messaging systems such as Kafka.
o Experience with NoSQL databases (Neo4j, MongoDB, etc.).
o Expertise with Continuous Integration / Continuous Delivery workflows and supporting applications.
o Exposure to cloud environments and architectures (preferably Azure).
o Ability to work collaboratively with other teams.
o Experience with containerization using tools such as Docker.
o Strong knowledge of Linux and Bash; able to interact with the OS at the command line and create shell scripts to automate workflows.
o Advanced understanding of software development and collaboration practices, including experience with tools such as Git.
o Excellent written and verbal communication skills; comfortable presenting to non-technical audiences.
Essential responsibilities include, but are not limited to:
o Design and develop ETL workflows to migrate data from varied sources, including SQL Server, Netezza, and Kafka, in both batch and real time.
o Develop checks and balances to ensure the integrity of ingested data.
o Design and develop Spark jobs for data processing needs as per requirements.
o Work with analysts and data scientists to help them build scalable data products.
o Design systems, alerts, and dashboards to monitor data products in production.