Site Reliability Engineer, Recommendation Infrastructure - USDS

TikTok

Los Angeles

Full-time

About TikTok Security

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy.

U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained.

The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day. To us, every challenge, no matter how difficult, is an opportunity; to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together.

That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities

Engage in and improve the whole lifecycle of Recommendation systems from system design consulting through to launch reviews, deployment, operation and refinement
Deliver tools/software to improve the reliability and scalability of services, automate operations and improve R&D efficiency
Build availability of large-scale services deployed across global data centers
Plan, manage and optimize cloud resources utilization, ensuring SLA of large-scale clusters
Measure and monitor availability, latency and overall service health
Practice sustainable incident response and postmortems

Qualifications

Bachelor's degree or higher in Computer Science or a related field, with at least 5 years of related work experience
SRE experience deploying large-scale systems with high reliability and scalability
Familiar with Linux system administration and networking
Experience programming in at least one of the following languages: Python, Perl, Go, or C/C++
Experience in designing, analyzing and troubleshooting large-scale distributed systems
Familiar with popular CI/CD procedures and environments
Effective communication skills and a sense of ownership and drive
