The Position
If you are a big data engineer and want to work on something that can truly change the world, this job is for you. Biology is approaching an inflection point where we can directly leverage data to understand the cellular basis of human diseases and, from this understanding, generate therapeutics to treat them.
Our Translational Genomics initiative is spearheading this effort and bringing together data from human genetics, functional genomics, molecular biology, disease model engineering, and tissue and cellular profiling.
We need a Data Engineering Lead to help us create a next-generation data engine that scalably and rigorously ingests and transforms the data generated by this initiative so that they are ready for machine-driven analysis.
The Data Engineering Lead will act as an architect and engineering manager tasked with overseeing the construction and operation of this data engine.
This data engine will help assemble an exabyte-scale, connected, and computable data universe, composed of high-value internally and externally generated data and results, on which we can build our data science efforts.
Your efforts will therefore directly enable the computational discovery of disease targets and, from these, potentially life-saving therapies.
A person hired in this position will:
Manage a team that will architect and deliver a next-generation data engine that enables scalable, flexible, and rigorous data transformations using modern data management practices.
Help architect and deliver data infrastructure that will enable machines to crawl and compute on and across all our data.
Work with a cross-functional team of scientists and engineers to design and deliver these solutions.
Exert influence across the informatics organization via presentations and collaborations.
Successful candidates will meet the following requirements:
A BS in a computational discipline with 12 years of work experience, or a Master's with 7 years of experience.
7+ years of experience architecting and developing scalable pipelines, frameworks, and platforms to power data science efforts in distributed cloud environments, including 5 years on AWS.
Multiple years of experience leading a distributed team of engineers to deliver solutions.
Practical understanding of the data management practices required to power rigorous data science and enable advanced analytics such as AI and ML.
Exceptional communication skills.
Experience leading projects focused on omics data.
Hands-on experience working with the following technologies, frameworks, and languages: Java, Scala, Python, Spark, Airflow, RabbitMQ, Spring.
What to expect from us
A highly collaborative and dynamic research environment where we aim to accelerate scientific discovery using purpose-built solutions.
Access to large multimodal omics datasets focused on disease biology, as well as samples and compute resources.
Access to state-of-the-art technologies and pioneering research.
Participation in seminar series featuring academic and industry scientists.
Campus-like lifestyle with a healthy work-life balance.
Mentored opportunities to further develop professional skills.