A company is looking for a GPU and HPC Infrastructure Engineer - New College Grad 2025.
Key Responsibilities
Contribute to the automation of datacenter operations and lifecycle management for large-scale Machine Learning systems
Implement monitoring and health management capabilities for GPU assets to ensure reliability and scalability
Develop software for NVLink topology management and build automated test infrastructure for distributed systems
Required Qualifications
Pursuing or recently completed a BS or MS in Computer Science, Engineering, Physics, Mathematics, or a comparable degree
Software engineering experience on large-scale production systems
Strong knowledge of a systems programming language (Go, Python) and understanding of Data Structures and Algorithms
High-level knowledge of Linux system administration and cluster management systems (Kubernetes, SLURM)
Understanding of performance, security, and reliability in complex distributed systems
Infrastructure Engineer • Kansas City, Missouri, United States