
Site Reliability Engineer, Cloud Native Platform

TikTok
San Jose, California, US
$136.8K-$280K a year
Full-time

Responsibilities

Please read the following job description thoroughly to ensure you are the right fit for this role before applying.

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.

Our infrastructure team is seeking experienced site reliability engineers to build a globally distributed edge platform for provisioning and deploying edge services.

Our team operates a large network of edge POPs around the world to accelerate site traffic and cache CDN content. We use Kubernetes to manage on-prem / cloud nodes and build an ecosystem around it, including tools for monitoring, alerting, logging, and CI / CD, as well as various services with automated deployment and scaling, in order to maximize daily operational efficiency.

On top of the Kubernetes infrastructure, we build the edge computing platform (PaaS) to help deploy and manage global edge services.

Key Responsibilities

  • Deploy and administer Kubernetes clusters both on-prem and in the cloud (AWS, GCP, etc.).
  • Collaborate with software engineers to build an enterprise-level edge computing platform (PaaS) with cutting-edge Cloud Native Computing Foundation (CNCF) technologies.
  • Design, develop, automate, and continuously improve platform services and pipelines, such as monitoring, alerting, logging, tracing, CI / CD, etc.
  • Improve Kubernetes system efficiency and debug issues related to networking, storage, scheduling, etc.
  • Collaborate with open-source communities to advance Kubernetes and edge computing technologies.

Qualifications

Minimum Qualifications

  • Master’s degree (or Bachelor's degree with 3+ years of experience) in Computer Engineering, Computer Science, or related fields.
  • 1+ years of experience in Kubernetes administration.
  • 3+ years of experience in Unix / Linux systems from kernel to shell and beyond.
  • Experience with Kubernetes CNI deployment and troubleshooting, including (but not limited to) the following CNIs: Cilium, Kube-Router, Calico, Flannel.
  • Experience in designing, analyzing, and building automation tools for large-scale, complex systems.

Preferred Qualifications

  • CKA (Certified Kubernetes Administrator) certification.
  • Experience in using and contributing to open-source projects in the Kubernetes ecosystem, e.g. Kubespray, CNI, Helm, Istio / Linkerd, Prometheus, ArgoCD, OPA, Harbor, Envoy.
  • Experience in networking technologies such as TCP / IP, BGP, DNS, and load balancers.
  • Experience in CI / CD pipeline design and development.
  • Experience in Kubernetes API, Operator, and Custom Resource Definition (CRD) development.

Inclusivity Commitment

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives.

Our platform connects people from across the globe and so does our workplace. We are passionate about this and hope you are too.

Job Information

The base salary range for this position in the selected city is $136,800 - $280,000 annually. Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies and experience, and location.

Our company benefits are designed to convey company culture and values, to create an efficient and inspiring work environment, and to support our employees to give their best in both work and life.

