About the Team
Storage Infrastructure provides APIs for data access, placement, and lifecycle management, while ensuring that the storage systems’ capacity, throughput, and IOPS satisfy the needs of our AI researchers.
Scalability, reliability, security, and usability are the core concerns of the team.
About the Role
As a TLM / engineering manager in the Storage Pillar, you will lead a team to design, build, and operate exascale systems to scalably and reliably manage our research data across multiple regions.
We’re looking for distributed systems engineers who have worked on exascale data management systems or distributed filesystems.
You do not need to be an ML / DL expert to deliver world-class infrastructure, but you do need to be able to quickly obtain a deep technical understanding of new domains.
This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.
In this role, you will:
Directly manage ICs responsible for software that manages exascale data and makes it accessible to researchers
Drive the reliability, predictability, and cost-effectiveness of our storage systems
Interface with researchers to understand and accommodate data use-cases
Ensure the security of our critical datasets
Build and grow high-performing teams in a deeply iterative, collaborative, fast-paced environment to bring our technology to millions of users around the world, and ensure it’s delivered with safety and reliability in mind.
You might thrive in this role if you:
Have a deep understanding of distributed systems principles and a proven track record in designing and building scalable, reliable, and secure storage solutions.
Possess strong programming skills
Have experience working in public clouds (especially Azure)
Have a bias for action and comfort building in a fast-paced, dynamic environment
Can create a diverse, equitable, and inclusive culture that makes all feel welcome while enabling radical candor and the challenging of groupthink.
Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.
Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.
Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
Have excellent communication skills. Expressing ideas clearly and listening carefully are among the most important requirements for success in this role.
As a bonus, have an understanding of AI / ML workloads