Overview
We are seeking a skilled MLOps Engineer to join our team and ensure the seamless deployment, monitoring, and optimization of AI models in production.
The MLOps Engineer will design, implement, and maintain end-to-end machine learning pipelines, focusing on automating model deployment, monitoring model health, detecting data drift, and managing AI-related logging. This role will involve building scalable infrastructure and dashboards for real-time and historical insights, ensuring models are secure, performant, and aligned with business needs.
Key Responsibilities
Deploy and manage machine learning models in production using tools like MLflow, Kubeflow, or AWS SageMaker, ensuring scalability and low latency.
Build and maintain dashboards using Grafana, Prometheus, or Kibana to track real-time model health (e.g., accuracy, latency) and historical trends.
Implement drift detection pipelines using tools like Evidently AI or Alibi Detect to identify shifts in data distributions and trigger alerts or retraining.
Set up centralized logging with ELK Stack or OpenTelemetry to capture AI inference events, errors, and audit trails for debugging and compliance.
Develop CI/CD pipelines with GitHub Actions or Jenkins to automate model updates, testing, and deployment.
Apply secure-by-design principles to protect data pipelines and models, using encryption and access controls, and ensure compliance with regulations and frameworks such as GDPR and the NIST AI RMF.
Work with data scientists, AI Integration Engineers, and DevOps teams to align model performance with business requirements and infrastructure capabilities.
Optimize models for production (e.g., via quantization or pruning) and ensure efficient resource usage on cloud platforms like AWS, Azure, or Google Cloud.
Maintain clear documentation of pipelines, dashboards, and monitoring processes for cross-team transparency.
Qualifications
Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
5+ years of experience in MLOps, DevOps, or software engineering with a focus on AI/ML systems.
Proficiency in Python and SQL; familiarity with JavaScript or Go is a plus.
Understanding of model performance metrics (e.g., precision, recall, AUC) and drift detection methods (e.g., KS test, PSI).
Strong problem-solving and debugging skills for resolving pipeline and monitoring issues.
Preferred Qualifications
We are an equal opportunity employer and welcome applications from veterans and individuals with disabilities.
MLOps Engineer • Arlington, VA, US