Data Engineer

eTek IT Services, Inc.
Cincinnati, OH, US
Full-time

Job Description

We are seeking a skilled Data Engineer to join our Data Science team. The ideal candidate will design, build, and maintain scalable data pipelines and infrastructure that support data analytics, machine learning, and Retrieval-Augmented Generation (RAG) workflows for Large Language Models (LLMs).

This role requires a strong technical background, excellent problem-solving skills, and the ability to work collaboratively with data scientists, analysts, and other stakeholders.

Key Responsibilities:

  • Data Pipeline Development:
      • Design, develop, and maintain robust and scalable ETL (Extract, Transform, Load) processes.
      • Ensure data is collected, processed, and stored efficiently and accurately.
  • Data Integration:
      • Integrate data from various sources, including databases, APIs, and third-party data providers.
      • Ensure data consistency and integrity across different systems.
  • RAG-Type LLM Workflows:
      • Develop and maintain data pipelines specifically tailored for Retrieval-Augmented Generation (RAG) type Large Language Model (LLM) workflows.
      • Ensure efficient data retrieval and augmentation processes to support LLM training and inference.
      • Collaborate with data scientists to optimize data pipelines for LLM performance and accuracy.
  • Semantic / Ontology Data Layers:
      • Develop and maintain semantic and ontology data layers to enhance data integration and retrieval.
      • Ensure data is semantically enriched to support advanced analytics and machine learning models.
  • Collaboration:
      • Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions.
      • Provide technical support and guidance on data-related issues.
  • Data Quality and Governance:
      • Implement data quality checks and validation processes to ensure data accuracy and reliability.
      • Adhere to data governance policies and best practices.
  • Performance Optimization:
      • Monitor and optimize the performance of data pipelines and infrastructure.
      • Troubleshoot and resolve data-related issues in a timely manner.
  • Support for Analysis:
      • Support short-term ad hoc analysis by providing quick and reliable data access.
      • Contribute to longer-term goals by developing scalable and maintainable data solutions.
  • Documentation:
      • Maintain comprehensive documentation of data pipelines, processes, and infrastructure.
      • Ensure knowledge transfer and continuity within the team.

Technical Requirements:

  • Education and Experience:
      • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
      • 3+ years of experience in data engineering or a related role.
  • Technical Skills:
      • Proficiency in Python (mandatory).
      • Experience with other programming languages such as Java or Scala is a plus.
      • Experience with SQL and NoSQL databases (e.g., MySQL, PostgreSQL, MongoDB).
      • Familiarity with big data technologies (e.g., Hadoop, Spark, Kafka).
      • Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and their data services.
  • RAG-Type LLM Skills:
      • Experience with data pipelines for LLM workflows, including data retrieval and augmentation.
      • Familiarity with natural language processing (NLP) techniques and tools.
      • Understanding of LLM architectures and their data requirements.
  • Semantic / Ontology Data Layers:
      • Familiarity with semantic and ontology data layers and their application in data integration and retrieval.
  • Tools and Frameworks:
      • Experience with ETL tools and frameworks (e.g., Apache NiFi, Airflow, Talend).
      • Familiarity with data visualization tools (e.g., Tableau, Power BI) is a plus.
  • Soft Skills:
      • Strong analytical and problem-solving skills.
      • Excellent communication and collaboration abilities.
      • Ability to work in a fast-paced, dynamic environment.

Preferred Qualifications:

  • Experience with machine learning and data science workflows.
  • Knowledge of data governance and compliance standards.
  • Certification in cloud platforms or data engineering.

Posted 30+ days ago