
Site Reliability Engineer - Redis

Apple
Cupertino
Full-time

Summary:

The Apple Service Engineering - Redis SRE team is looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments.

Our SRE team combines software engineering, systems engineering, and system administration practices to build and run large-scale, massively distributed, fault-tolerant systems.

Our software ensures that Apple’s services are reliable, scalable and secure, and we leverage both open source and home-grown technologies to provide managed data infrastructure services.

You will help build next-generation search infrastructure and platform services, collaborating cross-functionally with various ASE teams, from store and commerce to search and recommendations.

You’ll create platforms that can rapidly scale to serve personalized and non-personalized data with very low latencies. You should be someone who is not afraid to question assumptions, is a standout colleague under tight deadlines, and can take on problems with elegant technical solutions.

Key Qualifications:

  • Demonstrated expertise in developing database systems, storage engines, or distributed systems, or in performance engineering.
  • Experience developing critical internet services and/or platform infrastructure.
  • Proficiency in modern Java and, optionally, Python or Go.
  • Optional: experience managing services that run on Kubernetes.
  • Optional: experience with EC2, EBS, and Terraform.

Description:

The ASE Redis SRE team develops applications and tooling that are safe, reliable, scalable, and fast. This work requires an innovative spirit and an extraordinary degree of care in difficult engineering work.

Team members contribute to all major components of Redis deployment infrastructure, including maintenance automation, the backup service application, monitoring and alerting tooling and dashboards, and deployment architecture, with a focus on stability, performance, and scaling.

Success in this role requires expertise in several of the following:

  • Understanding of core SRE concepts (monitoring, alerting, incident management).
  • Understanding of database concepts (consistency models, isolation levels, crash and recovery semantics).
  • Performance engineering (design concepts, profile-guided optimization).
  • Service management across bare-metal, virtualized (EC2), and Kubernetes platforms.
  • Fundamentals of system-level hardware and networking components (storage devices and controllers, network interfaces, CPU and memory layout in server-class systems).
  • Operating systems concepts (process scheduling, disk and network I/O, performance).
  • Datacenter architecture (networking topologies, host placement strategies, and failure modes); design of multi-datacenter systems; failure domains; and wide-area networking.

This role also requires excellent communication and a high degree of customer focus when engaging with internal platform customers.

Because we are a distributed team, the ability to work effectively with colleagues based in other locations is also essential; prior experience in this area is a plus.

Prior experience with development or maintenance of distributed databases or storage systems is recommended. Apple values craftsmanship, and performance is a key ingredient.

Come join us at Apple Services Engineering and help us deliver services and applications that are fluid and responsive. You will collaborate with engineers from across Apple to define the metrics, set targets, uncover optimization opportunities, define quality guardrails, and ship a product / service that will delight our customers.

This role is for engineers who enjoy deep technical engineering that spans large cross-organizational projects. Your openness to learning and implementing new technologies will contribute to the continuous evolution of our organization.

Additional Requirements:

30+ days ago