Site Reliability Engineer

Apple Inc.
Sunnyvale, California, US
Full-time

Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly.

Bring passion and dedication to your job and there's no telling what you could accomplish. The people here at Apple don’t just create products; they create the kind of wonder that’s revolutionized entire industries.

It’s the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts.

Join Apple, and help us leave the world better than we found it.


Apple's Manufacturing Systems & Infrastructure (MSI) team is responsible for gathering, consolidating and tracking all manufacturing data for Apple’s products and modules worldwide.

This data is used throughout the company and across each product's lifecycle: from validating that units being built are fully tested and of high quality before they leave the factory, all the way through to warranty support for customers.

As a Senior Site Reliability Engineer, you will play a critical role in maintaining and enhancing the reliability of our production systems.

You will collaborate with engineering teams to design, implement, and monitor infrastructure and services, employing your expertise in automation and performance optimization.

Description

  • Design, develop, and maintain scalable, reliable, and efficient infrastructure.
  • Implement monitoring, alerting, and logging systems to ensure the health and performance of applications.
  • Automate repetitive tasks and improve system efficiency through scripting and tool development.
  • Collaborate with development teams to improve service reliability and promote best practices in software development and deployment.
  • Conduct root cause analysis of system failures and implement corrective actions to prevent recurrence.
  • Participate in on-call rotations and respond to incidents, minimizing downtime and impact on users.
  • Drive continuous improvement initiatives to enhance system performance, scalability, and reliability.
  • Mentor and provide guidance to junior team members, fostering a culture of learning and innovation.

Minimum Qualifications

  • 7+ years of experience in site reliability engineering, DevOps, or a related field.
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Preferred Qualifications

  • Strong experience with cloud platforms: AWS, Google Cloud Platform, or Microsoft Azure.
  • Proficiency in infrastructure as code tools: Terraform, Ansible, or CloudFormation.
  • Expertise in containerization and orchestration: Docker, Kubernetes, and Helm.
  • Experience with CI/CD pipelines and tools: Jenkins, Argo CD.
  • Strong scripting and programming skills: Python, Go, Shell, or Ruby.
  • In-depth knowledge of monitoring and observability tools: Prometheus, Grafana, OpenTelemetry, Splunk.
  • Familiarity with version control systems: Git.
  • Solid understanding of Linux/Unix system administration and networking.
  • Excellent problem-solving skills and a proactive approach to incident management.
  • Experience with database management and optimization: MySQL, PostgreSQL, or NoSQL databases such as MongoDB and Cassandra.
  • Knowledge of message brokers and streaming platforms: Kafka, RabbitMQ, or Amazon Kinesis.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

Learn more about your EEO rights as an applicant.
