Key Responsibilities:
Develop, maintain, and optimize scalable ETL/ELT pipelines using PySpark and Databricks.
Collaborate with cross-functional teams to design and implement data models and data integration solutions.
Create and maintain robust SQL scripts for querying, transforming, and analyzing data.
Use Databricks to manage big data workloads and ensure optimal performance on large-scale datasets.
Ensure data quality, integrity, and governance across the organization's data assets.
Automate data workflows and deploy reliable solutions for real-time data processing.
Debug and troubleshoot performance issues with data pipelines and implement enhancements.
Stay up to date with emerging trends and best practices in data engineering and big data technologies.
Required Skills and Qualifications:
Educational Background:
Bachelor's or Master's degree in Computer Science, Information Technology, Data Science, or a related field.
Certifications in Databricks, Azure, or related technologies are a plus.
Technical Skills:
Proficiency in SQL for complex queries, database design, and optimization.
Strong experience with PySpark for data transformation and processing.
Hands-on experience with Databricks for building and managing big data solutions.
Familiarity with cloud platforms like AWS, Azure, or Google Cloud.
Knowledge of data warehousing concepts and tools (e.g., Snowflake, Redshift).
Experience with version control and orchestration tools such as Git, Airflow, or Dagster.
Solid understanding of big data ecosystems (Hadoop, Hive, etc.).
Preferred Qualifications:
7+ years of relevant work experience in data engineering or equivalent software engineering experience.
3+ years of experience implementing big data processing technologies: AWS/Azure/GCP, Apache Spark, Python.
Experience writing and optimizing SQL queries in a business environment with large-scale, complex datasets.
Data Engineer • Dallas, TX, United States