MongoDB’s mission is to empower innovators to create, transform, and disrupt industries by unleashing the power of software and data.
We enable organizations of all sizes to easily build, scale, and run modern applications by helping them modernize legacy workloads, embrace innovation, and unleash AI.
Our industry-leading developer data platform, MongoDB Atlas, is the only globally distributed, multi-cloud database and is available in more than 115 regions across AWS, Google Cloud, and Microsoft Azure.
Atlas allows customers to build anywhere on the edge, on premises, or across cloud providers. With offices worldwide and over 175,000 developers joining MongoDB every month, it’s no wonder that leading organizations, like Samsung and Toyota, trust MongoDB to build next-generation, AI-powered applications.
The Data Pipelines Engineering team is responsible for building ETL pipelines that populate the Internal Data Platform, which drives analytics that help the company run more efficiently.
Our team builds highly performant, scalable processes that extract massive datasets and make those datasets available for querying in an optimal way.
We are also building a Generative AI framework that will help teams across the company tap into the data we store through their Retrieval-Augmented Generation (RAG)-based applications.
We are looking to speak to candidates who are based in New York City for our hybrid working model.
What you’ll do:
- Innovate strategies for building AI tools, including how to optimally chunk and retrieve RAG-based data and which LLMs are best suited to support use cases
- Stay abreast of industry trends in the AI space, and evaluate and incorporate new concepts/tools into MongoDB’s internal AI architecture
- Understand the risks that come with chatbots, and build guardrails that prevent them
- Build solutions that evaluate the content returned by AI tools using a variety of frameworks, and use the evaluation results to catch and reduce hallucinations
- Work with Security and Compliance teams to ensure that datasets carry appropriate permissions and comply with applicable regulations
- Work with our Data Platform, Architecture, and Governance sibling teams to make data scalable, consumable, and discoverable
We’re looking for someone with:
- 1+ years building AI and RAG-based applications
- 3+ years building ML models
- 5+ years of Python experience
- Experience with Hive, Iceberg, Glue, or other technologies that expose big data as tables
- Familiarity with different big data file types such as Parquet, Avro, and JSON
- 1+ years of Spark experience (nice-to-have)
- 5+ years of building ETL pipelines for a Data Lake/Warehouse (nice-to-have)
Success Measures:
- In 3 months, you'll have a thorough understanding of the architecture of MongoDB’s internal Data and AI ecosystem
- In 6 months, you'll have owned the delivery of a large project from start (scoping, design) to finish (delivery)
- In 12 months, you'll have designed new features, led development work, and become a go-to expert on parts of the system