About the Team
You’ll join the team behind OpenAI’s data infrastructure, which powers the critical engineering, product, and alignment work at the core of what we do at OpenAI.
The systems we support include our data warehouse, batch compute infrastructure, streaming infrastructure, data orchestration system, data lake, vector databases, critical integrations, and more.
About the Role
The Applied Data Platform team designs, builds, and operates the foundational data infrastructure that enables products and teams at OpenAI.
You are comfortable with work such as scaling Kubernetes services and OLAP systems, debugging Kafka consumer lag, diagnosing distributed key-value store failures, and designing systems to retrieve image vectors with low latency.
You are well versed in infrastructure tooling such as Terraform, have worked with Kubernetes, and have an SRE skill set.
This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.
In this role, you will:
Design, build, and maintain data infrastructure systems such as distributed compute, data orchestration, distributed storage, and streaming infrastructure, while ensuring scalability, reliability, and security
Ensure our data platform can scale reliably to the next several orders of magnitude
Accelerate company productivity by empowering your fellow engineers and teammates with excellent data tooling and systems, providing a best-in-class experience
Bring new features and capabilities to the world by partnering with product engineering, trust & safety, and other teams to build the technical foundations
Like all other teams, we are responsible for the reliability of the systems we build. This includes an on-call rotation to respond to critical incidents as needed
You might thrive in this role if you:
Have 4+ years in data infrastructure engineering OR
Have 4+ years in infrastructure engineering with a strong interest in data
Take pride in building and operating scalable, reliable, secure systems
Are comfortable with ambiguity and rapid change
Have a voracious, intrinsic desire to learn and fill in missing skills, and an equally strong talent for sharing what you learn clearly and concisely with others
Some of the technologies you’ll be working with include Apache Spark, ClickHouse, Python, Terraform, Kafka, Azure Event Hubs, and vector databases.