Software Engineer - Data

Twelve Labs
San Francisco, CA, United States
Full-time
We are sorry. The job offer you are looking for is no longer available.

Who we are

At Twelve Labs, we are pioneering the development of cutting-edge multimodal foundation models that have the ability to comprehend videos just like humans do.

Our models have redefined the standards in video-language modeling, empowering us with more intuitive and far-reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.

With a remarkable $77 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA's NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang, and more.

Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

We are a global company that values the uniqueness of each person's journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo.

We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world.

Join us as we revolutionize video understanding and multimodal AI.

As a Software Engineer, Data at Twelve Labs, you will build core data infrastructure for acquiring, preprocessing, cleaning, filtering, and labeling multimodal text-vision datasets for model training.

In this role, you will have a larger impact on the quality of our models than perhaps any other engineering role at the entire company: well-filtered & labeled data is core to everything we do.

This role is a perfect fit for distributed systems engineers who want to advance video understanding by delivering world-class systems for *unstructured* multimodal corpora.

In this role, you will

  • Acquire, filter, label (leveraging techniques like RLAIF), and sanitize large-scale vision-language datasets for LLM / VLM pretraining
  • Scale our data systems to enable our evolution from double-digit to triple-digit billion parameter models (and beyond!)
  • Mentor junior engineers / researchers, and hold a high bar around code quality / engineering best practices
  • Establish strong relationships with third-party data vendors and human-in-the-loop data labeling services
  • Build the highest-impact, not the flashiest, libraries and services
  • Lead by example in interviewing, hiring, and onboarding passionate and empathetic engineers
  • Work across teams to understand and manage project priorities and product deliverables, evaluate trade-offs, and drive technical initiatives from ideation to execution to shipment

You may be a good fit if you have

  • 7+ years of industry experience (or 4+ with a PhD in a related technical domain)
  • A PhD, or a Master's degree, in machine learning or a closely related discipline
  • Led teams of 3+ engineers as a technical lead
  • Experience building model-bootstrapped language or vision-language datasets (RLAIF, etc.)
  • Managed data acquisition for large generative or contrastive models
  • Experience with FFmpeg or other high performance image / video processing libraries (bonus points for past work with such processing on GPUs / accelerators)
  • Deep experience as a backend and / or data engineer & an interest in ML / AI systems
  • Strong Python expertise and considerable prior work history with at least one statically typed language (we use Golang)
  • Strong communication skills in written and spoken English

Interview and Onboarding Process:

1) Recruiter Phone Screen

2) Initial Technical Assessment

3) Final Round Technical Assessment & Culture Interview

4) Reference Checks

We're also excited to share that we'll do global onboarding in Seoul for all new hires (paid company travel!).

Even if there are a few checkboxes that aren't ticked through your prior experience, we still encourage you to apply! If you are a 0-to-1 achiever, a ferocious learner, and a kind and fun team player who motivates others, you will find a home at Twelve Labs.

We welcome applicants from all walks of life and are committed to equal-opportunity employment. We cherish and celebrate diversity not just because it is the right thing to do, but because it makes our company much stronger.

Benefits and Perks

An open and inclusive culture and work environment.

Work closely with a collaborative, mission-driven team on cutting-edge AI technology.

Full health, dental, and vision benefits

Extremely flexible PTO and parental leave policy. Office closed the week of Christmas and New Year's.

Remote-flexible, with offices in San Francisco and Seoul, and a coworking stipend

Visa support (such as H-1B and OPT transfer for US employees)
