Lead the migration of datasets and ETL workflows from Cloudera Hadoop (Hive, Impala, HDFS, etc.) to an Apache Iceberg-based architecture.
Analyze existing data pipelines and storage formats (e.g., Parquet, ORC) to plan and execute a smooth migration strategy.
Design and implement scalable data ingestion and transformation pipelines using Apache Spark, Flink, or equivalent tools.
Optimize data partitioning, schema evolution, compaction, and metadata management using Iceberg best practices.
Integrate Iceberg tables with query engines like Trino or Presto to support data analytics use cases.
Ensure compatibility and data quality throughout the migration through robust testing, validation, and lineage tracking.
Establish monitoring, logging, and performance tuning for migrated pipelines and Iceberg tables.
Seniority level
Mid-Senior level
Employment type
Contract
Job function
Information Technology
Industries
IT Services and IT Consulting
Data Lead • Strongsville, OH, US