Senior Site Reliability Engineer

Mango • Los Angeles, CA, United States

Posted 1 day ago

Job type: Full-time

Job description

We are seeking a Senior Site Reliability Engineer to own and evolve the infrastructure that supports our on-premise instruments, data systems, and machine learning pipelines. This role combines systems-level engineering with software craftsmanship and requires a deep understanding of how compute, storage, and networking layers interact under real workloads.

You will be the go-to expert for diagnosing performance issues in our on-prem systems, from kernel-level I/O bottlenecks to distributed service latency, and for building robust automation that keeps our systems consistent and observable.

Key Responsibilities

Infrastructure Design & Reliability: Design, deploy, and maintain our on-premise and hybrid infrastructure, which includes Dell PowerEdge and PowerVault servers, prosumer NAS units, and high-throughput data processing clusters. Implement fault-tolerant systems with reproducible deployments and clear observability.

Performance & Systems Analysis: Investigate complex performance issues across hardware, OS, and software boundaries. You will use Linux tooling in addition to in-house application-level metrics to uncover root causes in filesystems, caching layers, or I/O scheduling.

Automation & Tooling: Build automation for system provisioning, configuration management, and software deployment using Python, Go, Ansible, or similar frameworks. Develop lightweight services and tools that make reliability visible and maintainable.

Collaboration: Work closely with our software and hardware teams to co-design systems that meet the needs of high-resolution imaging and ML inference workloads. Translate hardware realities into software reliability guarantees.

Observability & Incident Response: Develop and maintain monitoring, alerting, and logging systems to ensure early detection of issues. Lead incident response and post-mortem efforts with a focus on learning and prevention.

Documentation & Communication: Produce clear documentation and communicate findings effectively to the broader team, from network topology diagrams to kernel tuning rationales.

General Qualifications

Deep understanding of Linux systems and performance (I/O schedulers, RAID, caching, NUMA, kernel parameters).

Hands-on experience designing and managing on-premise servers, storage arrays, or HPC clusters.

Comfort with automation and software development (Python, Go, Bash, or similar).

Strong diagnostic and analytical skills: ability to decompose performance problems across multiple layers.

Proven track record of improving system reliability, throughput, and maintainability in a fast-paced environment.

Excellent written and verbal communication skills for cross-disciplinary collaboration.

Self-driven, curious, and motivated by understanding systems deeply rather than just maintaining them.

Bonus Qualities (Not Required)

5-10 years of relevant industry experience in systems engineering, SRE, or infrastructure software roles.

Experience tuning Linux filesystems (ext4, btrfs) and software RAID (mdadm).

Familiarity with containerization and orchestration (Docker, Compose, Kubernetes).

Knowledge of networking fundamentals (VLANs, bonding, LACP, 10 GbE / 40 GbE).

Experience supporting data-heavy scientific or ML workloads.

Demonstrated technical leadership: mentoring others in debugging, reliability, or performance analysis.

Related jobs

Systems Engineer (Reliability, Maintainability & Availability – RMA)

G2 Ops, Inc. • El Segundo, CA, US

Full-time

El Segundo, CA at our customer site. Work Setting: In person, some remote opportunity, and/or flexible working hours, not a fully remote position. Salary Range: $105,000 – 160,000 plus com...

Senior Production Engineer

VirtualVocations • Garden Grove, California, United States

Full-time

A company is looking for a Senior Production Engineer to join their infrastructure and reliability engineering team. Key Responsibilities: Design, automate, scale, and support production systems in...

Senior Vulnerability Management Engineer

VirtualVocations • Garden Grove, California, United States

Full-time

A company is looking for a Senior Vulnerability Management Engineer to lead the identification, assessment, and remediation of security vulnerabilities across enterprise systems. Key Responsibilitie...

Senior Observability Engineer

VirtualVocations • Huntington Beach, California, United States

Full-time

A company is looking for a Senior Engineer, Observability. Key Responsibilities: Configure and tune monitoring tools for proactive management of customer environments. Document processes and standa...

Lead Plumbing Engineer

ACCO Engineered Systems • Costa Mesa, CA, United States

Full-time

This position is responsible for independently delivering engineering services, from conceptual design through construction completion. Essential Duties & Responsibilities: Complete project planning,...

Senior Plumbing Engineer

ACCO Engineered Systems • Pasadena, CA, United States

Full-time

This position is responsible for independently delivering engineering services, from conceptual design through construction completion. Essential Duties & Responsibilities: Expert in project planning...

Senior Site Reliability Engineer

VirtualVocations • Pasadena, California, United States

Full-time

A company is looking for a Senior Site Reliability Engineer (DevOps / DevSecOps). Key Responsibilities: Architect, maintain, and optimize AWS GovCloud infrastructure for scalability and cost efficien...

Reliability Engineering Manager

FLIR Systems • El Segundo, CA, US

Full-time

Teledyne Technologies Incorporated provides enabling technologies for industrial growth markets that require advanced technology and high reliability. These markets include aerospace and defense, fa...

Customer Reliability Engineer

VirtualVocations • Garden Grove, California, United States

Full-time

A company is looking for a Customer Reliability Engineer III. Key Responsibilities: Manage and resolve customer technical issues via support tickets and real-time interactions. Act as a liaison bet...

Senior Plumbing Engineer - Healthcare - P2S

P2S Inc. • Long Beach, CA, United States

Full-time

Our specialties include electrical, mechanical, plumbing, fire protection, and technology integration. Our offered services range from engineering and commissioning to construction management. With o...

Infrastructure, DevOps & Reliability Engineer (Multiple Roles, Remote & On-Site)

MLabs • Los Angeles, CA, US

Full-time

We’re recruiting Infrastructure, DevOps, and Reliability Engineers for high-growth startups including AirGarage, Dyno Therapeutics, Codex Health, and Banquet Health. These roles focus on scali...

Sediment Remediation Engineer

VirtualVocations • North Hollywood, California, United States

Full-time

A company is looking for a Sediment Remediation Technical Director. Key Responsibilities: Develop and supervise environmental engineering products, including technical reports and design specificat...

Plumbing Design Engineer - Healthcare - P2S

P2S Inc. • Long Beach, CA, United States

Full-time

Kubernetes Resident Engineer

VirtualVocations • Whittier, California, United States

Full-time

A company is looking for a Portworx Engineer. Key Responsibilities: Deploy and manage Portworx Enterprise across various Kubernetes environments. Lead troubleshooting efforts related to persistent ...

Site Reliability Engineer

VirtualVocations • Van Nuys, California, United States

Full-time

A company is looking for a Site Reliability Engineer to join their SRE platform team. Key Responsibilities: Contribute to the design, implementation, and maintenance of scalable and reliable system...

Lead Site Reliability Engineer

VirtualVocations • Huntington Beach, California, United States

Full-time

A company is looking for a Lead Site Reliability Engineer (SRE). Key Responsibilities: Drive incident response best practices, lead postmortems, and define SLAs/SLOs across platform services. Colla...

Principal Site Reliability Engineer

VirtualVocations • Glendale, California, United States

Full-time

A company is looking for a Principal Site Reliability Engineer. Key Responsibilities: Design, implement, and manage scalable systems ensuring high availability and optimal performance. Lead automat...

Senior Test Engineer

Jobot • Long Beach, CA, US

Full-time

Senior Test Engineer Needed For Innovative and Growing Space, Defense, and Aerospace Engineering and Manufacturing Company. This Jobot Job is hosted by: Billy Surch. Are you a fit? Easy Apply now by ...

Senior Database Reliability Engineer

VirtualVocations • North Hollywood, California, United States

Full-time

A company is looking for a Senior Database Reliability Engineer to ensure the performance, scalability, and reliability of its databases and supporting applications. Key Responsibilities: Optimize ...

Illinois Licensed Solutions Engineer

VirtualVocations • Carson, California, United States

Full-time

Solutions Engineer, Retail - CPG. Key Responsibilities: Serve as the main technical voice for 1-5 clients, guiding them through their transformational journey. Implement technical strategies aligne...