Data Engineer

Catalytic Data Science
Kansas City, Missouri, US
Full-time

Data Engineer III (Large Language Models)

Please read the requirements below carefully to make sure you fit this role before applying.

About Catalytic Data Science (CDS):

Catalytic Data Science is a groundbreaking cloud R&D platform designed to integrate volumes of scientific resources, data, and analytic tools while providing the ability to network with colleagues in one secure and scalable environment.

By enabling R&D teams to work more collaboratively and improving productivity company-wide, the Catalytic platform helps teams achieve key R&D milestones faster and with greater accuracy.

Our customers are passionate about making the world a better place, and we are inspired by the opportunity to help them.

The Role

You are a Data Engineer with experience in processing terabytes of data and working with large language models (LLMs). You have experience in creating and automating scalable, fault-tolerant, and reproducible data pipelines for natural language processing (NLP) using AWS technologies.

You will design and implement data ingestion, processing, and storage solutions that can handle massive amounts of text data from various sources.

You are interested in helping to create a platform built entirely on top of AWS. You are eager to join a team of life scientists and software engineers who believe the brightest minds in research should have the best tools to drive innovation.

What You’ll Do

  • Build, test, and operate automated Extract, Transform, and Load (ETL) pipelines that process terabytes of text data nightly.
  • Develop service frontends around our various backend data stores (AWS Aurora, MySQL, Elasticsearch, S3).
  • Rapidly prototype, test, and deploy data pipelines for LLMs using AWS.
  • Collaborate with data scientists and NLP engineers to understand the data requirements and specifications for LLMs and related tasks such as text summarization, translation, and question answering.
  • Optimize the performance, reliability, and scalability of the data pipelines and LLMs by applying best practices and techniques such as data partitioning, caching, compression, and monitoring.
  • Ensure the quality, integrity, and security of the data by implementing data validation, cleaning, and governance policies and procedures.
  • Research and evaluate new technologies and methods for data engineering and LLMs and stay updated with the latest trends and developments in the field.
  • Participate in data architecture and engineering decisions, bringing your strong experience and knowledge to bear.

Qualifications

  • Bachelor's degree or higher in computer science, engineering, or a related field.
  • 3+ years of experience in data engineering, preferably with large-scale text data and LLMs, and 6+ years of overall software engineering experience (including data engineering).
  • Proficient in Python 3 or Java, preferably both.
  • Experience with data modeling, ETL, and data warehouse design and implementation.
  • Expertise with ETL schedulers such as Airflow, Prefect, or similar frameworks.
  • Familiar with LLMs and NLP concepts and frameworks such as Transformers, BERT, GPT, PaLM, and LLaMA.
  • Day-to-day experience using AWS technologies such as Lambda, ECS Fargate, SQS, and SNS.
  • Experience extracting, processing, storing, and querying petabyte-scale datasets.
  • Familiarity with building and using containers.
  • Familiarity with event-based microservices.
  • Strong communication, collaboration, and problem-solving skills.

Core Skills:

  • ETL Processes
  • Data Modeling and Database Design
  • Proficiency in Large Language Models
  • Data Pipeline Optimization
  • Cross-functional Collaboration
  • Problem-solving and Analytical Skills

Nice-to-Haves

  • Prior experience with Elasticsearch (custom development and/or administration) is a huge plus.
  • Knowledge of Graph databases.

What Do We Love in Team Members?

Your specialization is less important than your ability to learn fast and adapt to shifting technologies. We’re especially fond of people who:

  • Focus on customers’ needs and our company’s goals, not just writing code.
  • Iterate until customers love what you’ve built.
  • Self-start and initiate.
  • Self-organize.
  • Strive to grow personally and professionally, beyond just expanding technical abilities.
  • Love to experiment with new technology and share knowledge with the team.

In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification form upon hire.
