Talent.com
Senior HPC Cluster Engineer
VirtualVocations • Rockville, Maryland, United States
Job type: Full-time

Job Description

A company is looking for a Senior AI and ML HPC Cluster Engineer.

Key Responsibilities

Provide leadership and strategic guidance on managing large-scale HPC systems, including deployment of compute, networking, and storage

Develop and enhance the ecosystem around GPU-accelerated computing, including scalable automation solutions

Build and maintain AI and ML heterogeneous clusters both on-premises and in the cloud

Required Qualifications

Bachelor's degree in Computer Science, Electrical Engineering, or related field, or equivalent experience

At least 5 years of experience designing and operating large-scale compute infrastructure

Experience with advanced AI/HPC job schedulers such as Slurm, Kubernetes (K8s), PBS, RTDA, or LSF

Proficient in administering CentOS/RHEL and/or Ubuntu Linux distributions

Solid understanding of cluster configuration management tools such as Ansible, Puppet, or Salt
