Talent.com
Site Reliability Engineer - Inference
Jobright.ai • San Francisco, CA, United States

Jobright is an AI-powered career platform that helps job seekers discover the top opportunities in the US. We are NOT a staffing agency. Jobright does not hire directly for these positions. We connect you with verified openings from employers you can trust.

Job Summary:

Lambda is the #1 GPU cloud for ML/AI teams, providing tools for building, testing, and deploying AI products at scale. The Site Reliability Engineer - Inference will help develop a large-scale platform for running AI models and build a high-throughput, low-latency API and routing system that runs at geographically distributed scale.

Responsibilities:

  • Work on our Inference service, helping develop our large-scale platform for running new, cutting-edge models across tens of thousands of GPUs
  • Help build a high-throughput, low-latency API and routing system running at geographically distributed scale
  • Shape a highly reliable distributed system, with a focus on reducing operational overhead, deep observability, and capacity management
  • Work with the team and our internal ML researchers to adopt and improve new inference engines, models, and architectures across a variety of media (such as text, image, video, and audio)
  • Tackle global networking challenges to deliver the lowest possible latency to our users across all of Lambda’s available capacity
  • Help push Lambda to the state of the art as part of a team operating at the leading edge of new developments in the industry

Qualifications:

Required:

  • 8 or more years of experience as a site reliability engineer or software engineer working on large-scale, internet-facing production services
  • Highly skilled at writing Go and Python
  • Experience with bare-metal system installation and administration
  • Experience deploying applications and operators on Kubernetes
  • Product-focused, balancing operational needs and low overhead against the need to ship features at a rapid pace
  • Proven track record of working in rapid-deployment environments and staying on top of shifting priorities as the industry develops
  • Willingness to take ownership of projects and drive them forward through design, implementation, launch, and maintenance

Preferred:

  • Experience working with machine learning models
  • Experience operating large-scale, geographically distributed systems
  • Experience developing Kubernetes operators and components

Company:

Lambda provides infrastructure, cloud services, and software for training and running inference on AI models. Founded in 2012 and headquartered in San Jose, California, USA, Lambda has 201-500 employees and is currently a late-stage company. Lambda has a track record of offering H-1B sponsorship.

Seniority level: Mid-Senior level

Employment type: Full-time

Industries: Software Development


Benefits (inferred from the description for this job):

  • Medical insurance
  • Vision insurance
  • 401(k)

Related jobs:

Site Reliability Engineer
ConductorOne • San Francisco, CA, United States
Full-time
Shape the future of identity with the highest-caliber team. If you’re amazing at what you do and want to solve big challenges in identity and security, come on board. Identity is how companies are be...

Site Reliability Engineer
Fortinet • Sunnyvale, CA, United States
Full-time
At Fortinet, we strive to provide a supportive, collaborative environment where people are empowered to do the best work of their careers. Our team members enjoy solving complex problems, and obsess...

Site Reliability Engineer I
prosper.com • San Francisco, CA, United States
Full-time
As a Site Reliability Engineer I at Prosper, you will play a crucial role in enhancing the reliability, scalability, and maintainability of our technology platform. This entry-level position is desi...

Site Reliability Engineer
Latent • San Francisco, CA, United States
Full-time
Latent is building the intelligence infrastructure for American healthcare. Our products are already helping hospitals and clinics dramatically increase workflow output, speed up patient access to m...

Site Reliability Engineer
Bits to Atoms • San Francisco, CA, United States
Full-time
Site Reliability Engineer (SRE). You’ll work at the intersection of infrastructure, AI/ML systems, and mission-critical physical operations. You’ll collaborate directly with engineering, AI, and oper...

Site Reliability Engineer, Consultant
Blue Shield of CA • Oakland, CA, US
Full-time
The Technology Operations Center (TOC) team provides 24 x 7 coverage of observability monitoring events including batch operations to assure successful execution and completion of critical business...

Site Reliability Engineer
PsiQuantum • Palo Alto, CA, United States
Full-time
Quantum computing holds the promise of humanity's mastery over the natural world, but only if we can build a. PsiQuantum is on a mission to build the first real, useful quantum computers, capable of...

Site Reliability Engineer
Redwood Materials, Inc. • San Francisco, CA, United States
Full-time
Redwood is localizing a global battery supply chain that seamlessly integrates recovery, reuse, and recycling—keeping critical minerals in circulation and driving the energy transition. Founded in 2...

Site Reliability Engineer
Sigmaways Inc • San Francisco, CA, United States
Full-time
As a Site Reliability Engineer, you will partner with development and IT teams to implement CI/CD pipelines, develop automation and monitoring solutions to ensure our platforms are secure, scalable...

Site Reliability Engineer
WorkOS • San Francisco, CA, United States
Full-time
About WorkOS 🚀 WorkOS builds tools and services for developers to help them implement authentication, identity, authorization, and overall enterprise readiness. We’re a fully distributed team with ...

Site Reliability Engineer I
Prosper Marketplace • San Francisco, CA, United States
Full-time
As a Site Reliability Engineer I at Prosper, you will play a crucial role in enhancing the reliability, scalability, and maintainability of our technology platform. This entry-level position is desi...

Site Reliability Engineer
Together AI • San Francisco, CA, United States
Full-time
As a Site Reliability Engineer (SRE) at Together, you are responsible for keeping all user-facing services and production systems running smoothly. You are a blend of a pragmatic operator and a soft...

Site Reliability Engineer
Redwood Materials • San Francisco, CA, United States
Full-time
Redwood is localizing a global battery supply chain that seamlessly integrates recovery, reuse, and recycling — keeping critical minerals in circulation and driving the energy transition. Founded in...

Site Reliability Engineer
Fractal • San Francisco, CA, United States
Full-time
This range is provided by Fractal. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Fractal Analytics is a strategic AI partner to Fortune 500 com...

Site Reliability Engineer
Writemed • San Francisco, CA, United States
Full-time
Would you like to join one of the fastest-growing organizations with a goal of using the latest AI, GenAI, LLM, Cloud, and Digital Technologies to advance drug development and improve patient care ...

Site Reliability Engineer
Primer • San Francisco, CA, United States
Full-time
Primer helps B2B products break out of the B2C-centric marketing box. Our platform turns consumer ad channels, data streams, and emerging AI workflows into measurable growth engines for go-to-market...

Site Reliability Engineer II
Hinge Health • San Francisco, CA, United States
Full-time
From scaling Kubernetes clusters to improving observability with Datadog, we build the tooling and automation that empower product teams to ship with confidence. Collaborate with engineering teams t...

Site Reliability Engineer
LTD Global • Berkeley, CA, US
Full-time
We are seeking a Site Reliability Engineer to join our Operations Group. This role plays a key part in advancing scientific discovery by supporting high-performance computing (HPC) and data analysis...