Talent.com
Cluster Infrastructure Engineer

Cartesia • San Francisco, CA, United States
Job type: Full-time

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text (1B text tokens, 10B audio tokens and 1T video tokens), let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.

About the Role

We’re looking for a Cluster Infrastructure Engineer to help build and scale the compute backbone that powers Cartesia’s research on real-time, multimodal intelligence. In this role, you’ll work at the intersection of distributed systems and infrastructure engineering, designing and operating the large-scale GPU clusters that train and serve Cartesia’s foundation models. You’ll own systems that need to be fast, reliable, and highly automated — ensuring our researchers and product teams can move at the speed of innovation. You’ll build the tooling, automation, and monitoring needed to keep clusters resilient under load, quickly diagnose and resolve issues, and continuously push the boundaries of scalability and efficiency.

Your Impact

Design and build large-scale GPU clusters for model training and low-latency inference

Develop automation for provisioning, scaling, and monitoring to ensure clusters are fast, resilient, and self-healing

Collaborate closely with research and product teams to enable distributed training at scale, optimizing for speed, reliability, and utilization

Implement robust observability and alerting systems to monitor GPU health, node stability, and job performance

Diagnose and triage hardware, networking, and distributed training issues across environments, coordinating with provider support as needed

Continuously improve cluster reliability, developer ergonomics, and overall system efficiency across Cartesia’s research and production workloads

What You Bring

Strong engineering fundamentals and experience building and operating large-scale distributed systems

Deep familiarity with GPU cluster management using Kubernetes and Slurm

A blend of developer empathy and raw performance engineering, designing systems and tools that are intuitive to use and fast

Ability to balance principled engineering with the urgency of keeping mission-critical systems alive

Proficiency with Infrastructure-as-Code tools (Terraform, Ansible, etc.) and observability tools (Prometheus, Grafana, etc.)

Strong debugging skills: comfortable diagnosing NCCL issues, CUDA errors, and network or driver-level faults

What Sets You Apart

Experience optimizing large-scale distributed training frameworks such as DeepSpeed, Megatron-LM, or similar

Familiarity with advanced parallelization techniques such as FSDP, context parallelism, or tensor parallelism

Our Culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting-edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality and design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.
