Senior Site Reliability Engineer

San Jose, California, US
Full-time

About the company


It is the leading destination for short-form mobile video. It is the largest unicorn startup and the leader in short-form video hosting, surpassing 1.3 billion mobile downloads in the United States and 2 billion worldwide. With 1.5 billion monthly active users, it ranks as one of the most popular social entertainment apps.

About the team

Our data infrastructure Site Reliability Engineering (SRE) team is a pioneer in innovation. We seamlessly merge software development and infrastructure operations to design, build, and manage large-scale, highly distributed systems.

We take pride in overseeing one of the industry's most extensive cloud infrastructures. As software development evolves, building systems from a mix of components has become the new standard.

In this era, SRE takes a central role. This role demands the ability to design, develop, and operate these components, transforming them into cloud-managed, scalable, and reliable elements.

Our professionals play a critical role as connectors, ensuring the seamless integration of these diverse components to deliver high-performing systems.

Our dynamic SRE field is about actively shaping the future of technology, not just keeping pace with it. We contribute significantly to the next chapter of data infrastructure.

We're currently building teams around the world. Join us today and embark on this transformative journey!

Responsibilities:

  • Participate in and enhance the complete service lifecycle, from inception and design, through development, capacity planning, launch reviews, deployment, operation, and refinement.
  • Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
  • Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, Flink, and more.
  • Establish sustainable mechanisms for scaling systems, such as automation, to drive enhancements in reliability, efficiency, and velocity.
  • Provide sustainable user support, manage incident responses, and conduct blameless postmortems as part of our ongoing efforts to improve our systems.

Requirements:

  • Bachelor's degree in Computer Science or a related technical field with 5+ years of experience.
  • Experience programming in at least one of the following: C, C++, Java, Python, Go, or Rust.
  • Familiarity with Unix / Linux system internals, networking, and distributed systems.
  • Experience with MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, etc.
  • Experience in designing and analyzing large-scale distributed systems.
  • Strong skills in problem solving and communication.
  • Bilingual in Mandarin and English.

Benefits:

Our company benefits are designed to convey company culture and values, create an efficient and inspiring work environment, and support our employees to give their best in both work and life.

We offer the following benefits to eligible employees:

  • 100% premium coverage for employee medical insurance, approximately 75% premium coverage for dependents, and a Health Savings Account (HSA) with a company match.
  • Dental, Vision, Short / Long term, Basic Life, Voluntary Life, and AD&D insurance plans.
  • 10 paid holidays per year plus 17 days of Paid Personal Time Off (PPTO) (prorated upon hire and increased by tenure) and 10 paid sick days per year, as well as 12 weeks of paid Parental leave and 8 weeks of paid Supplemental leave.
  • Mental and emotional health benefits through our EAP and Lyra.
  • A 401K company match, gym, and cellphone service reimbursements.

The Company reserves the right to modify or change these benefits programs at any time, with or without notice.

