Overview
DevOps Engineer (Founding Team)
Location: San Francisco Bay Area
Type: Full-Time
Compensation: Competitive salary + meaningful equity (founding tier)
Backed by 8VC, we're building a world-class team to tackle one of the industry’s most critical infrastructure problems.
About the Role
We're building an AI-native, multi-tenant enterprise platform for complex domains in industrial verticals. In this architecture, DevOps isn't just about shipping features; it's about operationalizing intelligent agents, ensuring traceability across AI systems, and supporting mission-critical ML infrastructure at scale.
We're looking for a DevOps engineer who can own infrastructure from Day 1, automating everything from CI/CD and observability to cloud governance and security. You’ll work with a highly technical team building real-time AI pipelines and multi-agent systems. If you want to be the person who makes the platform run (fast, secure, reliable, and explainable), this is your role.
Responsibilities
- Build and maintain scalable cloud infrastructure across AWS/GCP/Azure with a focus on secure, tenant-isolated deployments
- Own and evolve CI/CD systems (e.g. GitHub Actions, ArgoCD) with progressive rollout, testing, and rollback flows
- Establish observability tooling across services, agents, and pipelines (OpenTelemetry, Prometheus, Grafana, Sentry)
- Implement policy-as-code (OPA, Rego) for deployment safety, RBAC, audit logging, and approval workflows
- Define and enforce SLAs, uptime targets (99.99%+), incident response, and remediation workflows
- Secure infrastructure: IAM, VPC, encryption, key management, image scanning, secrets rotation
- Automate deployments, infrastructure provisioning (Terraform, Helm), and environment replication
What We’re Looking For
Core Experience:
- 4–10+ years in DevOps, platform engineering, or SRE in production-grade systems
- Strong experience with Docker, Kubernetes (EKS/GKE), Terraform or Pulumi
- Hands-on experience deploying and monitoring distributed cloud-native systems
- Familiar with GitOps practices, CI/CD design, progressive delivery, and secure SDLC
- Clear understanding of how to implement monitoring, alerting, and failure simulation in dynamic environments
Engineering Mindset:
- Obsessed with reliability, latency, uptime, and repeatability
- Security-aware and compliance-conscious
- Proactive: you don’t wait for alerts to fix things
- Comfortable collaborating with backend, AI, and data teams
Bonus: Agent-Native / ML Ops Capabilities
We’re building an agentic, AI-native platform from the ground up. Experience here isn’t required, but would be a strong differentiator:
- Experience running LLM orchestration frameworks (e.g. LangChain, LangGraph, Dust, ReAct agents)
- Building retrieval-augmented generation (RAG) pipelines, and deploying them safely and repeatably
- Familiarity with vector DBs (Weaviate, Qdrant, Pinecone) and embedding pipelines
- Monitoring and governing long-running or multi-agent chains
- Auditability and replay systems for agent decision-making
- Serving fine-tuned or open-source LLMs with model versioning and GPU scaling (e.g. vLLM, TGI)
- Interest in auto-remediation using agents (e.g. observability + alert → insight → response via LLM)
Why This Role Matters
DevOps is the nervous system of the platform — every agent, every data fabric component, every pipeline flows through what you build. This is a rare opportunity to design that system early, the right way, and future-proof it for scale, compliance, and trust.
If you're excited by intelligent systems, distributed data, and deeply technical infrastructure problems — and you want your work to have immediate real-world impact — we’d love to hear from you.