Talent.com
Senior / Staff Site Reliability Engineer, Storage

Fluidstack, San Francisco, CA, United States
Job type: Full-time

About Fluidstack

Fluidstack is building GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

Our team is small, highly motivated, and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.

We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.

You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.

About the Role

Our Senior / Staff Site Reliability Engineer (Storage) is the connective tissue of Fluidstack’s platform. As part of a small, senior team you’ll own the availability, performance and cost-efficiency of our storage, compute and networking layers. You’ll combine software engineering, systems thinking and a relentless customer focus to keep our SLIs and SLOs razor-sharp — and to raise the bar every quarter.

Focus

Automate everything. Replace repetitive ops with Python / Go tooling, Kubernetes operators and GitOps workflows.

Tune the stack. Profile and optimise storage I/O paths, hypervisors and kernel parameters to crush tail latency.

Harden for scale. Design failure-tolerant architectures, run game-days and embed chaos engineering to validate them.

Own incidents. Lead 24×7 on-call rotations, drive blameless post-mortems and turn lessons into lasting fixes.

Partner with engineers. Review designs, instrument new services and evangelise reliability patterns across product teams.

Measure what matters. Define SLIs / SLOs that map directly to customer experience and build dashboards / alerts to track them.

Drive continuous improvement. Identify tech debt, propose roadmap items and mentor engineers on reliability best practice.

About you

10+ years of professional SRE / production-engineering experience, including large-scale architecture & design.

Proficiency in Python, Go or similar; able to write clean, tested, maintainable code.

Deep hands-on knowledge of Docker, Kubernetes, Terraform / Ansible, and modern CI/CD (GitLab, GitHub Actions, etc.).

Expertise in observability stacks (Prometheus, Grafana, OpenTelemetry) and incident-management workflows.

Strong grasp of Linux internals, TCP/IP networking and security best practices.

Excellent written & verbal communication; comfortable leading cross-functional deep-dives.

Benefits

Competitive total compensation package (cash + equity).

Retirement or pension plan, in line with local norms.

Health, dental, and vision insurance.

Generous PTO policy, in line with local norms.


