Senior Site Reliability Engineer, Recommendation Infrastructure

TikTok
San Jose, California, US
Full-time

Responsibilities

TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy.

TikTok's global headquarters are in Los Angeles and Singapore, and its offices include New York, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.

Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.

To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

Our Recommendation Infrastructure Team is responsible for building and optimizing the architecture of our recommendation system to give TikTok users the most stable and best possible experience.

SREs on our team keep the systems up and running with the highest level of availability, and create highly automated systems and pipelines.

What You'll Do

  • Engage in and improve the whole lifecycle of recommendation systems, from system design consulting through launch reviews, deployment, operation, and refinement.
  • Deliver tools and software to improve the reliability and scalability of services, automate operations, and improve R&D efficiency.
  • Build availability of large-scale services deployed across global data centers.
  • Plan, manage, and optimize cloud resource utilization, ensuring the SLAs of large-scale clusters are met.
  • Measure and monitor availability, latency and overall service health.
  • Practice sustainable incident response and postmortems.

Qualifications

  • Bachelor's degree or above majoring in Computer Science or related fields.
  • At least 2 years of SRE experience deploying large-scale systems with high reliability and scalability.
  • Familiarity with Linux system operations and networking.
  • Experience programming in at least one of the following languages: Python, Perl, Go, or C/C++.
  • Experience in designing, analyzing and troubleshooting large-scale distributed systems.
  • Familiarity with popular CI/CD procedures and environments.
  • Effective communication skills and a sense of ownership and drive.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives.

Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy.

To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach.

We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws.

If you need assistance or a reasonable accommodation, please reach out to us at this link.

Job Information:

Compensation Description (annually)

The base salary range for this position in the selected city is $334,000 - $435,000 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience, and location.

Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives and restricted stock units.

Our company benefits are designed to convey company culture and values, to create an efficient and inspiring work environment, and to support our employees to give their best in both work and life.

We offer the following benefits to eligible employees:

  • 100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents, and a Health Savings Account (HSA) with a company match.
  • Dental, Vision, Short/Long-Term Disability, Basic Life, Voluntary Life, and AD&D insurance plans.
  • 10 paid holidays per year, plus 17 days of Paid Personal Time Off (PPTO) (prorated upon hire and increased by tenure) and 10 paid sick days per year, as well as 12 weeks of paid Parental leave and 8 weeks of paid Supplemental Disability.
  • Mental and emotional health benefits through our EAP and Lyra.
  • A 401(k) company match, plus gym and cellphone service reimbursements.

The Company reserves the right to modify or change these benefits programs at any time, with or without notice.
