Role : Azure Databricks Data Engineer - Generative AI & Advanced Lakehouse Solutions
Location : Atlanta, GA (On-site)
Duration : 12 months, contract-to-hire (C2H)
Job Summary :
Our client is seeking an innovative Azure Databricks Data Engineer experienced in building scalable data lakehouse architectures and Generative AI (GenAI) solutions. The ideal candidate will design and operationalize advanced data pipelines using Azure Databricks, Unity Catalog, Delta Lake, and Lakeflow, and will integrate LLM-based AI assistants and chatbots powered by Azure OpenAI.
Key Responsibilities :
(a) Data Engineering & Lakehouse Development :
- Design, develop, and optimize data pipelines using Azure Databricks, Auto Loader, and Spark Structured Streaming for batch and real-time data processing.
- Implement Delta Lake for unified, reliable, and ACID-compliant data storage.
- Build Lakeflow Declarative Pipelines for simplified orchestration and dependency management across the ingestion, transformation, and serving layers.
- Apply Medallion Architecture (Bronze, Silver, Gold) principles for modular, reusable data modeling.
- Use Unity Catalog and its metastore for centralized governance, lineage, and fine-grained data access control across workspaces.
- Configure and maintain Databricks SQL warehouses for analytics and BI consumption.
- Implement Slowly Changing Dimension (SCD) Type 1 and Type 2 logic for historical tracking and data consistency.
- Build and maintain streaming tables to enable continuous processing and near real-time analytics.
(b) Data Transformation & Optimization :
- Design reusable, version-controlled data transformation workflows using dbt (Data Build Tool) within Databricks.
- Optimize Spark jobs through adaptive query execution, caching, partitioning, and Z-ordering for performance and cost efficiency.
- Implement alerts and notification mechanisms (e.g., via Databricks Jobs, Lakeflow, or Azure Monitor) for proactive pipeline monitoring.
- Package and deploy reusable data artifacts using Databricks Asset Bundles (DABs) to standardize deployments across environments.
(c) Generative AI & Intelligent Automation :
- Develop GenAI-powered assistants and bots using Azure OpenAI, LangChain, and vector databases (e.g., Azure AI Search, formerly Azure Cognitive Search, or Pinecone).
- Integrate Retrieval-Augmented Generation (RAG) pipelines for context-aware enterprise chatbot experiences.
- Enable conversational analytics and document-based Q&A over company data sources through LLM integrations.
- Collaborate with ML engineers and solution architects to deploy AI features on Azure Kubernetes Service (AKS) or Azure App Service.
Required Skills & Qualifications :
- Bachelor's or Master's degree in Computer Science, Information Systems, or a related field.
- 8+ years of hands-on data engineering experience on Azure Databricks.
- Proficiency in PySpark, SQL, Delta Lake, and Lakeflow.
- Deep understanding of Unity Catalog and metastore governance, Spark Structured Streaming and Auto Loader ingestion, ACID transactions and Delta Lake optimization, Medallion Architecture best practices, SCD Type 1 / Type 2 implementation, streaming tables, and SQL warehouse operations.
- Experience with dbt (Data Build Tool) for modular data transformations.
- Strong knowledge of CI/CD pipelines (Azure DevOps, GitHub Actions) and environment management using Databricks Asset Bundles.
- Familiarity with Generative AI frameworks (Azure OpenAI, LangChain) and LLM integration patterns.
Preferred Skills :
- Experience with Azure Synapse, Power BI, and Azure Data Factory (ADF).
- Familiarity with data governance tools (e.g., Microsoft Purview, formerly Azure Purview).
- Understanding of MLOps/AIOps and Lakehouse AI convergence patterns.
- Knowledge of cost optimization and workload tuning in Databricks.
Soft Skills :
- Excellent analytical and troubleshooting skills.
- Strong communication and documentation skills.
- Ability to work cross-functionally with AI, data science, and business teams.
- Passion for exploring cutting-edge data and AI innovations.