Principal AI/ML Infrastructure and Ops Engineer - Remote

UnitedHealth Group

Seattle, WA, US

Remote

Full-time

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best.

Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits and career development opportunities.

Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together.

Optum AI is chartered to drive value on high impact enterprise AI problems, democratize AI through the enterprise ML platform, accelerate the adoption of Generative Artificial Intelligence (Gen AI) and drive Responsible AI.

These efforts are projected to deliver $8.4B of benefit value over the next 5 years and to reduce risk through safe, accurate, and unbiased AI, making this a key focus of the enterprise.

As the Principal AI / ML Infrastructure and Ops Engineer, you will be responsible for the overall operations related to United AI Studio (enterprise AI / ML platform).

This individual contributor (IC) role requires deep expertise in building and managing large-scale AI / ML platforms, providing strategic guidance, and hands-on technical leadership.

You will play a critical role in ensuring the stability, reliability, scalability, and performance of United AI Studio in compliance with enterprise standards, working with other engineering teams, customers, and our leadership.

Experience with modern infrastructure and DevOps tools and paradigms, as well as hands-on knowledge of major cloud services such as Azure, AWS and GCP, is a must.

You’ll enjoy the flexibility to work remotely from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities :

Infrastructure Strategy & Planning : Lead the design and implementation of scalable infrastructure solutions that align with the company’s strategic goals and operational needs
Cloud & Hybrid Environment Management : Oversee the management of multi-cloud (Azure, AWS, GCP) and hybrid infrastructure environments, enabling secure and scalable solution hosting while balancing performance against budgetary constraints
Automation & DevOps : Drive automation across the infrastructure lifecycle, leveraging Infrastructure as Code (IaC) and DevOps principles to streamline deployment and management processes
Systems Monitoring & Performance Tuning : Develop and implement monitoring frameworks for infrastructure, identify areas for performance improvement and optimization, and ensure high availability
Disaster Recovery & Business Continuity : Design, test, and implement disaster recovery and business continuity plans to ensure minimal downtime and data integrity
Security & Compliance : Collaborate with cybersecurity teams to ensure all systems and operations comply with industry standards and are secure against evolving threats
Capacity Planning & Cost Optimization : Forecast and manage capacity requirements for the AI / ML infrastructure while identifying opportunities to reduce costs without compromising performance
Thought Leadership : Stay updated with the latest in cloud technologies, AI / ML infrastructure advancements, and DevOps practices, providing leadership within the organization on best practices
Mentorship & Leadership : Act as a technical mentor for junior team members, fostering a culture of continuous learning and professional development within the team
Cross-Departmental Collaboration : Work closely with software engineering, cybersecurity, and AI / ML teams to ensure infrastructure supports the broader technical ecosystem

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications :

Bachelor’s degree in computer science, information technology, or a related field
10+ years of infrastructure experience : proven experience managing large-scale, cloud-based, enterprise-level software platforms and a deep understanding of multi-cloud architectures, specifically Azure, AWS, and GCP, with hands-on experience in cloud management
6+ years of practical experience with Infrastructure-as-Code and CI / CD tools such as Terraform, GitHub Actions, and the like
5+ years of practical experience in containerization technologies (Kubernetes, Docker) and orchestration for large-scale workloads
5+ years of practical experience in scripting and automation : advanced proficiency in scripting languages such as Python and Bash to support automation and system integration efforts

Preferred Qualifications :

Master’s degree in computer science, information technology, or a related field
Experience in monitoring and optimizing performance of distributed systems, particularly AI / ML pipelines and data processing workflows
High-availability systems experience : demonstrated success in building and maintaining highly available, fault-tolerant infrastructure
Proven security & compliance knowledge : solid understanding of security best practices and experience ensuring compliance with relevant regulatory frameworks
Machine Learning and LLM Operations experience : exposure to modern tools and techniques in MLOps and LLMOps fields
Experience with AI / ML-specific infrastructure tools (e.g., MLflow, Kubeflow) for managing and deploying models at scale
Proven leadership in a healthcare environment : experience working within a healthcare or regulated industry, with a deep understanding of the unique challenges and compliance requirements
Proven disaster recovery expertise : hands-on experience designing and implementing business continuity and disaster recovery solutions
Demonstrated familiarity with GPU-accelerated computing and the management of AI / ML hardware infrastructure, including AI-specific cloud services and GPU clusters
Ability to work independently, manage multiple projects simultaneously, and adapt to changing priorities in a fast-paced environment
Demonstrated innovative problem solving : track record of introducing innovative infrastructure solutions that improve efficiency, reduce costs, or enhance performance across the enterprise
All employees working remotely will be required to adhere to UnitedHealth Group’s Telecommuter Policy.

California, Colorado, Connecticut, Hawaii, Nevada, New Jersey, New York, Rhode Island, Washington, Washington, D.C. Residents Only : The salary range for this role is $, to $, annually.

Pay is based on several factors including but not limited to local labor markets, education, work experience, certifications, etc.

UnitedHealth Group complies with all minimum wage laws as applicable. In addition to your salary, UnitedHealth Group offers benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements).

No matter where or when you begin a career with UnitedHealth Group, you’ll find a far-reaching choice of benefits and incentives.

Application Deadline : This will be posted for a minimum of 2 business days or until a sufficient candidate pool has been collected.

Job posting may come down early due to volume of applicants.

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone.

We believe everyone of every race, gender, sexuality, age, location and income deserves the opportunity to live their healthiest life.

Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes.

We are committed to mitigating our impact on the environment and to enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.

Diversity creates a healthier atmosphere : UnitedHealth Group is an Equal Employment Opportunity / Affirmative Action employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, national origin, protected veteran status, disability status, sexual orientation, gender identity or expression, marital status, genetic information, or any other characteristic protected by law.

UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.
