JD : ETL Engineer
Role : ETL Engineer
Location : Remote (Pittsburgh preferred)
No. of years of experience : 8+ years
Core Technical Skills :
- ETL : Strong, 8+ years
- Hadoop : Strong, 8+ years
- PySpark : Strong, 8+ years
- Spark : Strong, 8+ years
- Dask : Required; must be able to set up the Dask framework
- JupyterHub : Setup experience required
Job Description :
We are seeking an experienced ETL Engineer to join our team. The ideal candidate will have 8 to 10 years of experience in designing, developing, and optimizing ETL processes, with strong expertise in Hadoop, PySpark, Spark, and Dask.
The role involves setting up and managing data workflows, ensuring data integrity and efficiency, and collaborating with cross-functional teams to meet business needs.
If you are passionate about data engineering and have a track record of delivering high-quality ETL solutions, we encourage you to apply.
Key Responsibilities :
- ETL Processes : Design, develop, and optimize ETL pipelines to efficiently extract, transform, and load data from various sources.
- Hadoop : Implement and manage Hadoop-based solutions for large-scale data processing and storage, ensuring optimal performance and scalability.
- PySpark : Develop and maintain PySpark applications for processing and analyzing big data, leveraging Spark's capabilities for distributed computing.
- Spark : Utilize Apache Spark for data processing, including batch and streaming data applications, ensuring high performance and reliability.
- Dask : Set up and manage the Dask framework for parallel computing and distributed data processing, optimizing workflows and handling large-scale data tasks.
- JupyterHub Setup : Configure and maintain JupyterHub environments for collaborative data analysis and notebook sharing, ensuring a seamless user experience.
Must-Have Skills :
- ETL Expertise : Strong experience in designing and managing ETL processes, including data extraction, transformation, and loading.
- Hadoop : Proven proficiency with Hadoop ecosystem tools and technologies for big data processing and storage.
- PySpark : Extensive experience with PySpark for data processing, including writing and optimizing Spark jobs.
- Spark : Deep understanding of Apache Spark for both batch and real-time data processing.
- Dask : Hands-on experience setting up and managing the Dask framework for distributed computing and large-scale data processing.
- JupyterHub Setup : Experience configuring and maintaining JupyterHub for data analysis and notebook collaboration.
- Communication Skills : Strong verbal and written communication skills, with the ability to articulate complex technical concepts to diverse audiences.
- Independent Work : Ability to work independently, manage multiple tasks, and deliver high-quality results with minimal supervision.
Good-to-Have Skills :
- Cloud Platforms : Familiarity with cloud-based data platforms for deploying and managing big data solutions.
- Data Visualization : Experience with data visualization tools (e.g., Tableau, Power BI) for creating insightful visualizations and reports.
- Data Engineering Tools : Knowledge of additional data engineering tools and frameworks, including ETL and data integration technologies.
- Agile Methodologies : Experience with Agile development practices and methodologies for managing data projects and tasks.
Qualifications :
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Software Engineering, or a related field.
- 8 to 10 years of experience in data engineering, with strong expertise in ETL processes, Hadoop, PySpark, Spark, and Dask.
- Proven experience setting up and managing JupyterHub environments.