Chief Architect ML/AI Infrastructure – Cloud Resource Optimization – REMOTE

Living Talent Company
San Francisco, California, US
Remote
Full-time

Chief Architect, AI / ML Infrastructure: Cloud Cost Optimization & Resource Utilization

  • Startup (revenue-generating, Series A)
  • Company size : 30
  • Future unicorn
  • REMOTE first culture
  • Smart, fun, low-ego team culture
  • Compensation : Base Salary 250k+, Equity

Key Responsibilities

  • Architecture & Development : Architect and develop Kubernetes-based ML / AI optimization platforms.
  • Leadership & Collaboration : Work with C-staff, product management, engineering, and design partners.
  • Communication : Create detailed architecture diagrams, documents, and presentations.
  • User Experience Focus : Prioritize the experience of Infrastructure Admin and MLOps staff.
  • Open Source Community : Stay actively involved with CNCF and related projects.
  • Enterprise-Class Solutions : Drive & deliver solutions for enterprise-class data, ML, AI applications.
  • FinOps & SRE Best Practices : Apply FinOps for cloud financial management and modern SRE practices.

Qualifications

  • Entrepreneurial, Startup Experience
  • 10+ years of infrastructure-level software architecture and development.

Extensive Experience

  • Linux, Virtualization platforms (hands-on)
  • AWS, GCP or Azure.

Strong Experience

  • Kubernetes-based ML / AI systems (Kubeflow, Kueue, KServe, GPU Operators, DRA, Karpenter)

Deep Knowledge

  • ML / AI use cases & customer stories of model development, training, inference, & hardware accelerator usage (CPU, GPU, TPU).
  • Modern cloud-native architectures (scalability, availability, reliability, security, observability).
  • Proven track record of delivering complex distributed systems.
  • Active involvement in open-source communities, particularly CNCF and related projects.
  • Strong leadership and team collaboration skills.
  • Excellent communication skills, both verbal and written.

Preferred Qualifications

  • Knowledge of additional ML / AI frameworks and tools.
  • Experience in DevOps practices and tools.
  • Certification in Kubernetes or related technologies.
  • Awareness of FinOps and SRE best practices.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.

Remote working / work at home options are available for this role.
