Talent.com
Software Engineer, Infrastructure Reliability

OpenAI, San Francisco, CA, United States
Job type: Full-time

About the Team

We’re hiring Software Engineers to join our Applied Infrastructure organization, and more specifically for our Database Systems and Online Storage teams. These teams operate with a high degree of autonomy and are deeply collaborative, with a shared mandate to raise the bar on safety, reliability, and velocity across OpenAI.

About the Role

You’ll be at the heart of scaling and hardening the infrastructure that powers some of the most widely used AI systems in the world. You’ll help ensure our systems are highly reliable, observable, performant, and secure—so researchers can iterate quickly, and products like ChatGPT and the OpenAI API can serve millions of users safely and effectively.

This is a hands-on, high-leverage role for engineers who thrive on ownership, love solving deep technical problems across the stack, and want to work on systems that support cutting-edge research and deploy at global scale. You’ll play a key part in shaping technical direction, proactively improving system resilience, and collaborating closely with infra, product, and research teams to turn complex infrastructure into reliable platforms.

You might thrive in this role if you

  • Have a deep understanding of distributed systems principles and a proven track record in building and operating scalable and reliable systems.
  • Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally-distributed systems.
  • Have experience operating orchestration systems such as Kubernetes at scale and building abstractions over cloud platforms.
  • Are comfortable working in Linux environments, and with tools like Kubernetes, Terraform, CI/CD pipelines, and modern observability stacks.
  • Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.
  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.
  • Are comfortable with ambiguity and rapid change.

Responsibilities (from the original descriptions)

  • Design, build, and operate reliable and performant systems used across engineering.
  • Identify and fix performance bottlenecks and inefficiencies, ensuring our infrastructure can scale to the next order of magnitude.
  • Dig deep to resolve complex issues.
  • Continuously improve automation to reduce manual work. Improve internal tooling and our developer experience.
  • Contribute to incident response, postmortems, and the development of best practices around system reliability and scalability.

Qualifications

  • 4+ years of relevant industry experience, with 2+ years leading large-scale, complex projects or teams as an engineer or tech lead.
  • A passion for distributed systems at scale with a focus on reliability, scalability, security, and continuous improvement.
  • Proven experience as a reliability engineer, production engineer, or a similar role in a fast-paced, rapidly scaling company.
  • Strong proficiency in cloud infrastructure (like AWS, GCP, Azure) and IaC tools such as Terraform. Proficiency in programming / scripting languages.
  • Experience with containerization technologies and container orchestration platforms like Kubernetes.
  • Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack.
  • Experience with microservices architecture and service mesh technologies.
  • Knowledge of security best practices in cloud environments.
  • Strong understanding of distributed systems, networking, and database technologies.
  • Excellent problem-solving skills and ability to work in a fast-paced environment.

About OpenAI

    OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

    We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

    For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

    Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

    To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

    We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

    OpenAI Global Applicant Privacy Policy

    At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
