
Software Engineer, Storage System

TikTok
San Jose, California, US
$145K-$355K a year
Full-time

Responsibilities

Please read the following job description thoroughly to ensure you are the right fit for this role before applying.

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.

Our team was established to help realize our company vision of building a global platform for creation and communication. We are doing world-class work in machine learning, computer vision, natural language processing, speech and audio, and knowledge, and turning that work into products used by hundreds of millions of users worldwide.

As vital AI infrastructure for the company, our machine learning system integrates our most up-to-date R&D results in AI algorithms and systems.

Come and join us: you will get the chance to build large-scale machine learning systems and work with the best AI system and algorithm researchers and engineers.

What You’ll Be Doing

  • Build a unified data storage format and query engine for different scenarios (high availability / high throughput, large volume / sequential or random access).
  • Build an efficient system for model parameter management, sharding, and deduplication for LLMs.
  • Develop multi-level / hierarchical storage architecture, not limited to HBM / DDR / disk.
  • Optimize the training system for availability and fault tolerance; improve the system's data consistency and capacity.
  • Research and implement state-of-the-art indexing / storage structures for machine learning on the latest hardware.

Qualifications

  • Proficient in the use of C++ / Python in the Linux environment.
  • Proficient in the design, development, maintenance and continuous optimization of large-scale distributed systems, and able to identify potential problems in complex systems.
  • Have participated in optimizations for parameter-server-like systems or the indexing structures of query engines, or have experience in using / optimizing large-scale distributed storage systems such as HDFS and PFS.
  • Strong communication skills and the ability to develop new solutions as issues arise.

Bonus

  • Understand open-source storage / engine projects such as Redis, RocksDB, Presto, etc.; understand common machine learning file storage formats such as Parquet, TFRecord, IndexRecordIO, etc.
  • Familiar with one of the machine learning frameworks (TensorFlow / PyTorch / JAX).
  • Have experience in one of the following fields: database systems, distributed storage, AI infrastructure, HW / SW co-design, high-performance computing, ML hardware architecture (GPU, accelerators, networking), machine learning frameworks, operating systems.
  • ACM / OI competitive programming experience.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives.

Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy.

To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws.

If you need assistance or a reasonable accommodation, please reach out to us at [email protected].

Job Information:

The base salary range for this position in the selected city is $145,000 - $355,000 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience, and location.

Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses / incentives, and restricted stock units.

Our company benefits are designed to convey company culture and values, to create an efficient and inspiring work environment, and to support our employees to give their best in both work and life.

We offer the following benefits to eligible employees:

We cover 100% of the premium for employee medical insurance and approximately 75% of the premium for dependents, and offer a Health Savings Account (HSA) with a company match.

We also offer Dental, Vision, Short / Long Term Disability, Basic Life, Voluntary Life and AD&D insurance plans, in addition to Flexible Spending Account (FSA) options such as Health Care, Limited Purpose and Dependent Care.

Our time off and leave plans are: 10 paid holidays per year, plus 17 days of Paid Personal Time Off (PPTO) (prorated upon hire and increased by tenure), 10 paid sick days per year, 12 weeks of paid Parental leave, and 8 weeks of paid Supplemental Disability.

We also provide generous benefits such as mental and emotional health benefits through our EAP and Lyra, a 401K company match, and gym and cellphone service reimbursements.

The Company reserves the right to modify or change these benefits programs at any time, with or without notice.


12 days ago