Site Reliability Engineer - Video Platform - USDS (LA)

TikTok
Los Angeles
Full-time

About TikTok U.S. Data Security

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy.

U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained.

The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day. To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

Team Intro

TikTok's video system is a world-leading video platform that provides multimedia storage, delivery, and transcoding services.

As part of USDS, the Video Platform team is responsible for building the next-generation video processing platform, which provides excellent experiences for billions of users around the world.

The USDS Video Platform team is seeking an experienced Site Reliability Engineer to help us continue improving TikTok's video system.

If you are passionate about ensuring software reliability, love problem-solving, and are prepared for exciting challenges, we would like you on our team.

In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager / department.

We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities

  • Take responsibility for the overall reliability of TikTok's video system, including video publishing and distribution.
  • Perform lifecycle management of production systems, including change management, service deployment, operations, and emergency response.
  • Monitor the system and respond to incidents to maintain the system's service level agreement (SLA); review and follow up on all production incidents.
  • Perform capacity management of compute, storage, and network bandwidth resources to ensure system stability and reduce infrastructure costs.
  • Provide strong support during big events to ensure the system is capable of handling a large volume of Internet traffic.
  • Build tools, automations, visualizations, and monitors to facilitate the operation and optimization of the global infrastructure.

Minimum Qualifications

  • Bachelor's degree in Computer Science or a related technical background involving software / system engineering, or equivalent working experience.
  • 2+ years of SRE or DevOps experience in large-scale online services.
  • Programming experience with at least one of the following languages: C, C++, Java, Python, C# or Go.

Preferred Qualifications

  • Extensive knowledge of networking, operating systems, database systems, and container technology.
  • Good understanding of every aspect of microservice architecture, and hands-on experience troubleshooting large-scale distributed systems.
  • Hands-on experience with common open-source systems such as Linux, MySQL, MongoDB, Redis, and ELK.
  • Experience building solutions with AWS, Google, Azure, and other cloud services is a plus.
  • Passionate and self-motivated, with good teamwork skills.

Posted 30+ days ago