(Senior) Bioinformatics Data Engineer, Omics Pipelines, Translational and Quantitative Sciences Data Engineering

Genmab
Princeton, NJ
Full-time

The Role

The successful candidate will contribute to the mission of the global data engineering function and be responsible for many aspects of data, including the creation of data-as-a-product, architecture, access, classification, standards, integration, and pipelines.

Although your role will involve a diverse set of data-related responsibilities, your key focus will be on creating bioinformatics pipelines that process bulk and single-cell genomics and transcriptomics data, enabling downstream interpretation by Translational and Quantitative Sciences functions, including Data Science, Translational Medicine, Precision Medicine, and Translational Research.

You will balance subject matter expertise in life science data, terminology, and processes with the technical expertise needed for hands-on implementation.

You will be expected to create workflows that standardize and automate data processing, connect systems, enable data tracking, and implement triggers and data cataloging.

With your experience in the Research domain, you will have knowledge of diverse assay types such as IHC, flow cytometry, and cytokine assays, while specializing in genomics and transcriptomics.

Your ultimate goal will be to place data at the fingertips of stakeholders and enable science to go faster. You will join an enthusiastic, agile, fast-paced and explorative global data engineering team.

Responsibilities

  • Design, implement and manage ETL data pipelines that process and transform vast amounts of scientific data from public, internal and partner sources into various repositories on a cloud platform (AWS)
  • Incorporate bioinformatics tools and libraries into the processing pipelines for omics assays such as bulk and single-cell RNA-seq
  • Enhance end-to-end workflows with automation that accelerates data flow, using pipeline management tools such as Step Functions, Airflow, or Databricks Workflows in combination with specialized bioinformatics pipeline tools such as WDL, Nextflow, or Snakemake
  • Implement and maintain bespoke databases for scientific data (RWE, in-house lab, and CRO data) for consumption by analysis applications and AI products
  • Innovate and advise on the latest technologies and standard methodologies in Data Engineering and Data Management, including recent advancements in GenAI and the latest bioinformatics tools, modules, and techniques in RNA sequencing analysis
  • Manage relationships and project coordination with external parties such as Contract Research Organizations (CRO) and vendor consultants / contractors
  • Define and contribute to data engineering practices for the group, establishing shareable templates and frameworks, determining the best use of specific cloud services and tools, and working with vendors to provision cutting-edge tools and technologies
  • Collaborate with stakeholders to determine best-suited data enablement methods to optimize the interpretation of the data, including creating presentations and leading tutorials on data usage as appropriate
  • Apply value-balanced approaches to the development of the data ecosystem and pipeline initiatives
  • Proactively communicate data ecosystem and pipeline value propositions to partnering collaborators, specifically around data strategy and management practices
  • Participate in GxP validation processes

Requirements

  • BS / MS in Computer Science, Bioinformatics, or a related field with 5+ years of software engineering experience (8+ years for senior role) or a PhD in Computer Science, Bioinformatics or a related field and 2+ years of software engineering experience (5+ years for senior role)
  • Excellent skills and deep knowledge of ETL pipeline, automation, and workflow management tools such as Airflow, AWS Glue, AWS Step Functions, and CI/CD are a must. Strong preference specifically for AWS Step Functions and Lambda.
  • Excellent skills with bioinformatics pipeline tools such as Snakemake, WDL, and Nextflow, including troubleshooting for quality. Strong preference for Nextflow.
  • Excellent skills and deep knowledge in Python, Pythonic design, and object-oriented programming are a must, including common Python libraries such as pandas. Experience with R is a plus.

  • Excellent understanding of different bioinformatics tools and modules such as STAR, HISAT2, featureCounts, FastQC, RSeQC, and Cell Ranger, and how they are applied to different types of genomic and transcriptomic data, such as single-cell transcriptomics
  • Solid understanding of modern data architectures and their implementations, such as Databricks Delta tables, Athena, Glue, and Iceberg, and their application to lakehouse and medallion architectures
  • Experience working with clinical data and understanding of GxP compliance and validation processes
  • Proficiency with modern software development practices such as Agile, source control, and project management and issue tracking with JIRA
  • Proficiency with container strategies using Docker, Fargate, and ECR
  • Proficiency with AWS cloud computing services such as Lambda, ECS, Batch, and Elastic Load Balancing, and with other compute frameworks such as Spark, EMR, and Databricks. Strong preference for experience with AWS Omics.

For US-based candidates, the proposed salary band for this position is as follows:

$,.00 $,.00

The actual salary offer will carefully consider a wide range of factors, including your skills, qualifications, experience, and location.

Also, certain positions are eligible for additional forms of compensation, such as bonuses.

About You

  • You are passionate about our purpose and genuinely care about our mission to transform the lives of patients through innovative cancer treatment
  • You bring rigor and excellence to all that you do. You are a fierce believer in our rooted-in-science approach to problem-solving
  • You are a generous collaborator who can work in teams with diverse backgrounds
  • You are determined to do and be your best and take pride in enabling the best work of others on the team
  • You are not afraid to grapple with the unknown and be innovative
  • You have experience working in a fast-growing, dynamic company (or a strong desire to)
  • You work hard and are not afraid to have a little fun while you do so

Locations

Genmab leverages the effectiveness of an agile working environment, when possible, for the betterment of employee work-life balance.

Our offices are designed as open, community-based spaces that work to connect employees while being immersed in our state-of-the-art laboratories.

Whether you’re in one of our collaboratively designed office spaces or working remotely, we thrive on connecting with each other to innovate.
