A company is looking for a Senior AI and ML HPC Cluster Engineer.
Key Responsibilities
Provide leadership and strategic guidance on managing large-scale HPC systems, including the deployment of compute, networking, and storage resources
Develop and enhance the ecosystem around GPU-accelerated computing, including scalable automation solutions
Build and maintain AI and ML heterogeneous clusters both on-premises and in the cloud
Required Qualifications
Bachelor's degree in Computer Science, Electrical Engineering, or related field, or equivalent experience
Minimum of 5 years of experience designing and operating large-scale compute infrastructure
Experience with advanced AI/HPC job schedulers and orchestrators, such as Slurm, Kubernetes (K8s), PBS, RTDA, or LSF
Proficient in administering CentOS/RHEL and/or Ubuntu Linux distributions
Solid understanding of cluster configuration management tools such as Ansible, Puppet, or Salt
Senior HPC Engineer • Flushing, New York, United States