Talent.com
Site Reliability Engineer (SRE)

Baseten, San Francisco, CA, United States
Job type: Full-time

Baseten powers inference for the world's most dynamic AI companies, like OpenEvidence, Clay, Mirage, Gamma, Sourcegraph, Writer, Abridge, Bland, and Zed. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. With our recent $150M Series D funding, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction, we’re scaling our team to meet accelerating customer demand.

About the Role

As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient. This can range from automating deployments and monitoring systems to optimizing performance and managing incidents.

We all work closely with our users, learning from their past struggles in operationalizing ML, onboarding them onto our platform, and turning our learnings into ideas for improving Baseten.

Example Initiatives

  • Multi-cloud capacity management
  • Inference on B200 GPUs
  • Multi-node inference
  • Fractional H100 GPUs for efficient model serving

Responsibilities

  • Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
  • Establish standards and best practices for reliability and performance across the infrastructure.
  • Automate processes when relevant, particularly for managing CI/CD pipelines.
  • Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution.
  • Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.
  • Mentor junior team members and contribute to knowledge sharing within the organization.
  • Navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
  • Demonstrate pride, ownership, and accountability for your work, expecting the same from your teammates.

Requirements

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
  • 3+ years of professional work experience in a fast-paced, high-growth environment.
  • Extensive experience with Kubernetes.
  • Experience in building and maintaining scalable infrastructure.
  • Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
  • Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.
  • Ability to own projects end-to-end, from project specification to execution.
  • No prior machine learning experience is required, but you should be open to learning about it.

Benefits

  • Competitive compensation package.
  • A unique opportunity to be part of a rapidly growing startup in one of the most exciting engineering fields of our era.
  • An inclusive and supportive work culture that fosters learning and growth.
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.
  • Base pay range: $150,000 to $250,000 per year.