Job Summary:
To be considered for an interview, please make sure your application is fully in line with the job specification below.
We are looking for a highly skilled and experienced Lead Data Engineer with over 10 years of IT experience in software analysis, design, development, testing, and implementation of Big Data, Hadoop, Java, ETL, and database technologies.
The ideal candidate should have a deep understanding of the application lifecycle, from initiation to deployment and support, with hands-on experience in designing and implementing complex data engineering solutions.
Key Responsibilities:
- Architect and develop the most suitable business logic and application framework, including selection of the technical stack for data engineering projects.
- Build and maintain ingestion frameworks that detect and read data from source folders using a Change Data Capture (CDC) strategy.
- Convert Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala.
- Manage Spark applications and launch clusters on GCP Dataproc, using CI/CD pipelines for deployment.
- Develop data ingestion pipelines using Kafka and Spark Streaming APIs.
- Perform Spark RDD transformations and actions, mapping business requirements to optimal data processing logic.
- Integrate real-time data streams from GCP Pub/Sub into Spark applications.
- Develop and maintain Spark SQL tables and queries for ad-hoc data analysis.
- Create Lambda workflow jobs for automation, scheduling them with Airflow and passing configurations dynamically.
- Migrate Hive queries into Spark transformations using DataFrames, SQLContext, and Scala.
- Implement test scripts supporting test-driven development and continuous integration (CI).
- Perform data processing with GCP Dataflow and load data into GCP BigQuery.
- Write and execute Shell scripts for automating deployment processes.
- Collaborate with cross-functional teams including clients, stakeholders, and business analysts to ensure seamless integration and delivery of data engineering projects.
- Maintain and manage Hadoop infrastructure, including log files and security integrations, leveraging Cloudera Manager.
Required Skills and Qualifications:
- 10+ years of experience in IT, focusing on Big Data, Hadoop, and related technologies.
- Expertise in Hadoop ecosystem tools including HDFS, MapReduce, Hive, Pig, Impala, Spark, Kafka, Zookeeper, Oozie, and Sqoop.
- Hands-on experience with Hadoop shell commands, Spark RDDs, Spark SQL, and Scala programming.
- Proficiency in data analysis, transformation, validation, and cleansing.
- Experience with Java, Scala, Python, and familiarity with databases like Oracle and MySQL.
- Proficiency with version control tools such as Git and SVN, collaboration platforms such as GitHub, and project management tools such as Jira.
- Experience with GCP services such as Dataproc, Pub/Sub, BigQuery, and Dataflow.
- Strong understanding of Agile methodology and Software Development Lifecycle (SDLC).
- Excellent interpersonal, technical, and communication skills.
- Self-driven and quick to learn, with the ability to adapt to changing technologies and environments.
Skills:
- Experience with CI/CD pipelines for deploying applications in GCP environments.
- Experience in cloud security, including integration with Kerberos authentication and authorization.
- Familiarity with tools like Airflow for job scheduling and orchestration.
- Experience with VPN, and with file transfer tools and protocols such as WinSCP, FileZilla, SFTP, and FTP.
Environment:
Hadoop, Scala, PySpark, Spark SQL, Hive, GCP Dataproc, Cloud Storage, Secret Manager, gsutil, BigQuery, MySQL, UNIX Shell Scripting, Pub/Sub, Spring Boot APIs.