We are seeking a highly skilled MLOps Engineer to design, build, and operate scalable machine learning infrastructure that supports modern AI applications. You’ll play a key role in enabling reliable data flows, embedding pipelines, prompt orchestration, and intelligent agent management. This role is ideal for an engineer who thrives at the intersection of ML, DevOps, and systems reliability.
- Pipeline Development & Operations: Build and operate robust data, embedding, and prompt pipelines to support production AI/ML workloads.
- Agent Registry & Identity: Maintain a secure and scalable system for managing AI agent identity, versioning, and registration.
- CI/CD & Infrastructure as Code (IaC): Deliver automated workflows for model deployment and infrastructure provisioning using modern DevOps tooling.
- Coordination Primitives: Design and implement primitives for distributed coordination and orchestration of ML agents and services.
- Observability & Guardrails: Implement observability, monitoring, and guarded execution frameworks to ensure safe and reliable AI system behavior.
Requirements
- Strong experience with MLOps, DevOps, or SRE practices in production environments.
- Hands-on expertise with CI/CD pipelines and Infrastructure as Code (Terraform, Pulumi, etc.).
- Solid understanding of data engineering, feature/embedding pipelines, and ML model deployment.
- Familiarity with observability tooling (Prometheus, Grafana, ELK, OpenTelemetry, etc.).
- Experience with distributed systems and coordination mechanisms (e.g., Kubernetes, service meshes, message queues).
- Proficiency in one or more languages: Python, Go, or similar.
- Bonus: Knowledge of LLM ops, prompt engineering infrastructure, or agent frameworks.