Talent.com
AI Infra SRE Engineer

Nastech Global, San Jose, CA, United States
Job type: Full-time

Position: AI Infra SRE Engineer (DGX)

Location: Remote

Duration: Full-time

Must-have

  • NVIDIA DGX or equivalent high-performance computing (HPC) clusters (e.g., Cray, HPE, IBM)
  • Cisco UCS C885A
  • Docker

Good to have

  • DevOps Automation
  • CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins)
  • Terraform, Ansible, Jenkins
  • Python
  • Go, C/C++
  • Enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and/or Google Anthos
  • Full software development lifecycle in Go: design, development, testing, packaging, and deployment

Roles & Responsibilities

  • Technical knowledge of high-performance computing, NVIDIA DGX systems / GPUs, and/or Cisco Unified Computing System (UCS).
  • Handle availability, latency, scalability, and efficiency of NVIDIA and Cisco UCS infrastructure by engineering reliability into the development life cycle, with a focus on fault-tolerant approaches.
  • Drive capacity planning, performance analysis, instrumentation, and other non-functional systems requirements.
  • Automate operational capabilities using Python, Ansible, Terraform, Go, etc.
  • Deliver automation through CI/CD pipelines, chatbots, etc.
  • Implement metrics-driven processes to ensure service quality targets are met.