Overview
We are seeking an experienced HPC System Administrator to manage, maintain, and optimize HPC infrastructure, ensuring reliability, performance, and security.
Responsibilities
- Administer HPC systems (installation, configuration, patching, tuning).
- Support HPC users (job submission, troubleshooting, training).
- Monitor system health, performance, and resource utilization.
- Diagnose and resolve hardware, software, and network issues.
- Ensure compliance with security policies and implement data protection measures.
- Maintain documentation and generate performance reports.
Qualifications
- Experience with workload managers (SLURM, PBS, LSF, Torque).
- Knowledge of parallel filesystems (Lustre, GPFS) and high-speed interconnects (InfiniBand).
- Familiarity with monitoring tools (Nagios, Grafana, Prometheus).
- Understanding of HPC security best practices.
- Strong problem-solving skills; able to work independently and collaboratively.
Seniority level
Mid-Senior level
Employment type
Contract
Job function
Information Technology
Industries
Software Development