
Site Reliability Engineer, Cloud Native Platform

ByteDance
San Jose
Full-time

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok and Helo as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

Why Join Us

Creation is the core of ByteDance's purpose. Our products are built to help imaginations thrive. This is doubly true of the teams that make our innovations possible.

Together, we inspire creativity and enrich life, a mission we work toward every day. To us, every challenge, no matter how ambiguous, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At ByteDance, we create together and grow together.

That's how we drive impact - for ourselves, our company, and the users we serve. Join us.

The Edge Platform team is seeking an experienced SRE to develop our edge platform networking, maintain its stability, and drive the functionality and capability of our infrastructure to the next level.

Our team builds a Kubernetes (k8s) based global edge platform to manage TikTok's self-built CDN PoPs, supporting workload and service management, providing L4 / L7 ingress and service mesh capability, and enabling dynamic traffic routing and scheduling.

The Edge Platform team operates hundreds of PoPs around the world, along with their networking and traffic, to run edge workloads (CDN cache, live streaming, gaming, real-time communication, etc.).

The L4 / L7 ingress with traffic scheduling capability is the core of the platform, as most edge workloads are networking-intensive.

We are looking for passionate engineers to join and work together to build a cloud-native edge platform that provides one-stop solutions for edge services.

Responsibilities

  • Deploy and administer Kubernetes clusters both on-prem and in the cloud (AWS, GCP, etc.).
  • Collaborate with software engineers to build an enterprise-level platform (PaaS) with cutting-edge Cloud Native Computing Foundation (CNCF) technologies.
  • Design, develop, automate, and continuously improve platform services and pipelines, such as monitoring, alerting, logging, tracing, CI / CD, etc.
  • Improve Kubernetes system efficiency and debug issues related to networking, storage, scheduling, etc.
  • Collaborate with open-source communities to advance Kubernetes and Cloud Native technologies.

Qualifications

Minimum Qualifications
  • Bachelor's degree in Computer Engineering, Computer Science, or a related field, with 2+ years of experience.
  • 2+ years of experience in Kubernetes administration.
  • 3+ years of experience in Unix / Linux systems from kernel to shell and beyond.
  • Experience with Kubernetes CNI deployment and troubleshooting, including (but not limited to) the following CNIs: Cilium, Kube-Router, Calico, Flannel.

Preferred Qualifications
  • CKA (Certified Kubernetes Administrator) certification.
  • Experience in using and contributing to open-source projects in the Kubernetes ecosystem, such as Kubespray, CNI, Helm, KubeEdge, Istio / Linkerd, Prometheus, ArgoCD, OPA, Harbor, Envoy, etc.
  • Experience in networking technologies such as TCP / IP, BGP, DNS, load balancers, etc.
  • Experience in CI / CD pipeline design and development.
  • Experience in Kubernetes API, Operator, and Custom Resource Definition (CRD) development.
  • Experience in designing, analyzing, and building automation tools for large-scale and complex systems.

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives.

Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life.

To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach.

We are passionate about this and hope you are too.

ByteDance Inc. is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws.

If you need assistance or a reasonable accommodation,

30+ days ago