
Site Reliability Engineer - USDS

TikTok
Los Angeles
Full-time

About TikTok U.S. Data Security

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy.

U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained.

The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day. To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

Site Reliability Engineering (SRE) at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems.

In our team, you’ll have the opportunity to manage the complex challenges of scale, while using expertise in coding, algorithms, complexity analysis, and large-scale system design.

We embrace a culture of diversity, intellectual curiosity, openness, and problem-solving. We encourage close collaboration while promoting self-direction.

In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager / department.

We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities

  • Develop and maintain automation procedures to maximize system efficiency and minimize human intervention.
  • Work closely with software engineering teams to design, deploy and operate elements to ensure that systems are functionally robust.
  • Ensure system scalability to handle growth in web traffic and data.
  • Implement monitoring tools and set up metrics to keep track of system health and performance.
  • Participate in on-call rotations, assist with incident management, and diagnose, resolve, and prevent production issues.
  • Conduct performance tests to find and address system bottlenecks.
  • Collaborate with teams across the organization to define Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
  • Practice sustainable user support, incident response, and blameless postmortems.

Qualifications

  • Bachelor's degree in Computer Science, Information Technology, or a related field with 3+ years of experience.
  • Proven work experience as a Site Reliability Engineer, Systems Engineer, or similar software engineering role.
  • Proficient knowledge of high-level programming languages (e.g., Python, Go, Java, and Shell script).
  • Experience in network architecture, database modeling, cloud systems and large-scale distributed systems.
  • Strong understanding of Linux operating systems and open-source technologies.
  • Preferred: experience in MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, etc.
  • Preferred: knowledge of monitoring tools and methodologies (such as Prometheus, Grafana).
  • Excellent problem-solving skills, strategic thinking, and a strong ability to debug complex systems.
  • Exceptional communication skills and the ability to effectively collaborate with cross-functional teams.