Cloud Machine Learning Operations (MLOps) Engineer

Applied Insight
Hybrid in Hanover, MD / Remote
Remote
Full-time

About Us : Innovating to solve real-world problems

Applied Insight enhances the ability of federal government customers to preserve national security, deliver justice and serve the public with advanced technologies and quality analysis.

We work closely with agencies and industry to overcome technical and cultural hurdles to innovation, empowering them with the latest end-to-end cloud infrastructure, big data and cyber capabilities.

Our expertise in cross-domain and boundary solutions, network analytics, DevOps and low-to-high development is unique in our industry.

We develop and deliver innovative products and applications that are deployed in highly sensitive customer environments and have broad applications for federal missions.

On joining the Applied Insight team, you’ll be working to solve real-world problems on missions that matter with people who share your passions and encourage your ambition.

It’s vital to us that we hire committed people who are great at what they do. We return that commitment by empowering them with the autonomy, the support and the tools they need to fulfill their true potential.

A day in the life (just a few of the things you may do on any given day) :

Enhance your current skill set by disrupting traditional workflows and processes while building an enterprise-scale environment with AWS technologies and DevOps methodologies.

You will be an integral part of a team of knowledgeable technologists responsible for helping to build an enterprise-scale cloud presence within the Intelligence Community (IC) for software development, web hosting, research, and more! This is a multi-faceted position that requires you to work directly with AWS services and the underlying operating systems themselves to improve security automation, support collaboration with software engineers, and streamline infrastructure processes.

This position offers the opportunity to use your existing infrastructure, IT, or systems engineering experience and apply it to solve problems with tools and concepts unique to AWS and Cloud Service Provider environments.

This role supports dynamic, rotating professional services engagements lasting from 6 weeks to 6 months. You will focus continuously on forward-leaning migration, evolution, and optimization tasking within AWS-based environments (no long-term operations and maintenance tasking!).

You will regularly be assigned to new customers and exciting new challenges, helping to advance customer adoption of AWS across multiple classification domains.

  • NVIDIA Triton Inference Server Expertise : Leverage your in-depth knowledge of NVIDIA Triton to design and manage scalable, high-performance inference pipelines in a production enterprise system.
  • Model Deployment : Collaborate with data scientists and software engineers to deploy machine learning models, ensuring optimal performance, resource utilization, and cost tracking and savings.
  • Scalability : Architect and implement solutions to scale machine learning inference to handle large workloads efficiently.
  • Performance Optimization : Monitor and fine-tune model inference for optimal speed and resource utilization.
  • Automation : Implement automation tools and processes for model deployment, monitoring, and scaling.
  • Monitoring and Logging : Develop robust monitoring and logging solutions to track model performance, system health, and data quality in real time.
  • Security : Help implement security best practices to protect machine learning models and data.
  • Documentation : Maintain detailed documentation of machine learning operations processes and best practices.
  • Collaboration : Work closely with a cross-functional Product team to understand business requirements and translate them into technical solutions.
  • Troubleshooting : Provide technical support for debugging and resolving issues related to model deployment and inference.

You will excel in this role if you are :

  • Embracing Emerging Technology : You will leverage AWS and its accompanying tools daily as you help build and stand up a game-changing development environment.
  • Well-rounded : You appreciate the opportunity to work across multiple technologies such as scripting, development / test / QA tools, cloud, container, and orchestration tools, Linux and Windows operating systems, networking, security and automation.
  • Motivated : You want to continually learn new things and work with new technologies.
  • Agile : You work well as part of a small team developing solutions for both commercial and government customers.
  • Focused on Automation : Wherever possible, you look for ways to automate manual processes to increase efficiency, speed, and operability of tasks.

What we are expecting from you (the qualifications you must have) :

  • TS/SCI clearance with polygraph required.
  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
  • Proven experience (3+ years) as a Machine Learning Operations Engineer with a focus on NVIDIA Triton.
  • Experience with other MLOps tools and platforms.
  • Strong programming skills in Python.
  • Familiarity with machine learning frameworks like TensorFlow or PyTorch.
  • Experience with GPU hardware and optimization for deep learning workloads.
  • Strong problem-solving skills and the ability to work effectively in a collaborative team environment.
  • Excellent communication skills and the ability to convey technical concepts to both technical and non-technical stakeholders.
  • AWS Solutions Architect Associate credential or another Associate-level certification (in progress acceptable).

What we would like to see from you (the nice-to-have qualifications) :

  • Proficiency in containerization technologies and orchestration tools (e.g., Docker, AWS Fargate, Amazon Elastic Container Service, Amazon Elastic Kubernetes Service).
  • Knowledge of DevOps practices and continuous integration / continuous deployment (CI/CD) pipelines.
  • Familiarity with the AWS cloud platform.
  • Previous experience in the deployment of machine learning models in production environments.

What we will provide in return : Excellent compensation and amazing benefits

  • Multiple health insurance options.
  • 401(k) with immediate vesting. The company matches 100% of the first 3% contributed and 50% of the next 2% contributed.
  • Fully paid long-term disability, short-term disability, and life insurance.
  • Flexible Spending Account options.
  • Generous paid time off.
  • Flexible work schedules with the ability to bank extra hours for additional time off.
  • Government shutdown protection : employees do not have to use leave for up to 3 days per year for inclement weather or budget issues.
  • An employee-centric culture and a belief that we should empower those who are good at what they do, then give them the tools they need to succeed and grow their careers.
  • A commitment to learning and growth, with easy ways to achieve both, including a training budget, education assistance, mentorship programs, and collaborative learning sessions.
  • A collaborative environment that fosters communication and an open-door policy.

including Vets and Disabled.

5 days ago