Engineering Manager - Supercomputing Storage

OpenAI
San Francisco, CA
Full-time

About the Team

Storage Infrastructure provides APIs for data access, placement, and lifecycle management, while ensuring that the storage systems’ capacity, throughput, and IOPS satisfy the needs of our AI researchers.

Scalability, reliability, security, and usability are the core concerns of the team.

About the Role

As a TLM / engineering manager in the Storage Pillar, you will lead a team to design, build, and operate exascale systems that scalably and reliably manage our research data across multiple regions.

We’re looking for distributed systems engineers who have worked on exascale data management systems or distributed filesystems.

You do not need to be an ML / DL expert to deliver world-class infrastructure, but you do need to be able to quickly obtain a deep technical understanding of new domains.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:

Directly manage ICs responsible for software that manages exascale data and makes it accessible to researchers

Drive the reliability, predictability, and cost effectiveness of our storage systems

Interface with researchers to understand and accommodate data use-cases

Ensure the security of our critical datasets

Build and grow high-performing teams in a deeply iterative, collaborative, fast-paced environment to bring our technology to millions of users around the world, and ensure it’s delivered with safety and reliability in mind.

You might thrive in this role if you:

Have a deep understanding of distributed systems principles and a proven track record in designing and building scalable, reliable, and secure storage solutions.

Possess strong programming skills

Have experience working in public clouds (especially Azure)

Have a bias for action and are comfortable building in a fast-paced, dynamic environment

Can create a diverse, equitable, and inclusive culture that makes all feel welcome while enabling radical candor and the challenging of groupthink.

Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.

Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.

Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.

Have excellent communication skills. Expressing ideas clearly and listening carefully are among the most important requirements for success in this role.

As a bonus, have an understanding of AI / ML workloads

17 days ago