
Data Scientist - LLM

TikTok
San Jose
Full-time

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Mumbai, Singapore, Jakarta, Seoul and Tokyo.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day. To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together.

That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

Our Data Science team is a diverse group of problem solvers, located in Singapore, China, Canada and the US, who are passionate about translating complex data into clear, actionable insights.

By pioneering state-of-the-art data science techniques and fostering a culture of data-driven decision-making, we aim to unlock unprecedented growth opportunities and operational excellence.

We are responsible for developing innovative methods, models and algorithms to ensure the supply of high-quality and diverse data for both SFT and pretraining of LLMs / VLMs.

Responsibilities

  • Design and develop data collection pipelines to gather and preprocess diverse datasets from various sources.
  • Design and develop data processing pipelines, including data labeling, data filtering, data cleaning, data visualization, data auditing, etc.
  • Implement machine learning models to improve the quality and diversity of data.
  • Develop machine learning models and algorithms to detect issues in the current moderation system and in the TikTok ecosystem.

Minimum Qualifications

  • Major in computer science or a related technical discipline.
  • Strong proficiency in building large-scale data processing pipelines, and familiarity with distributed workloads (multiprocessing).
  • Proficiency in at least one programming language commonly used in machine learning, such as Python, and the ability to write clean, maintainable code.
  • Proficiency in at least one deep learning framework, such as PyTorch.
  • At least 3 years of experience in at least one of the following areas: machine learning, pattern recognition, NLP, data mining, multimodality, LLM.