
Senior HPC Systems Administrator

Khoury College
Boston, MA, US
Full-time

About the Opportunity

Job Summary:

The Research Computing (RC) team at Northeastern University seeks a motivated, self-starting individual to be a member of our dynamic team as a Senior HPC Systems Administrator.

The successful candidate will help lead the administration, development, and expansion of the on-premise HPC resources and research computing environment (VMs, storage, software, etc.), potentially leveraging cloud-computing resources.

As the Senior HPC Systems Administrator, your primary objective will be to set up, monitor, operate, and support high-performance clusters and solutions for faculty research and teaching.

You will assist research groups in taking full advantage of Northeastern’s HPC cluster installed at the Massachusetts Green High Performance Computing Center.

In addition, you will help ensure that researchers’ needs are met and develop novel solutions to support research and teaching as they pertain to RC systems and solutions.

Minimum Qualifications:

  • A bachelor’s degree in engineering or computational sciences with strong Linux systems experience, or 5 years of professional experience in Linux system administration.
  • Minimum of 5 years of Unix/Linux system administration experience.
  • Minimum of 5 years of experience managing an HPC environment.
  • Strong work ethic and the ability to communicate complex ideas, both orally and in writing, to stakeholders with diverse backgrounds.
  • Demonstrated experience in scripting and coding: Bash, Python, Ruby, C/C++.
  • Experience with multiple programming languages (including Fortran, C, C++, Perl, Rust).
  • Experience with large-scale parallel filesystems (GPFS, Lustre, etc.).
  • Experience configuring and managing batch schedulers, specifically Slurm.
  • Experience with common HPC software stacks and management tools (e.g., CMake, Spack).
  • Experience with containers (e.g., Docker, Singularity, Podman).
  • Knowledge of standard networking, security practices, and user management in a large computing environment.
  • Experience identifying and debugging system problems and performance issues at scale, especially on GPU-accelerated systems.

Preferred Qualifications:

  • 7+ years of experience as an HPC systems administrator at a technology company or research institution.
  • Experience managing and operating HPC clusters at other leading institutions, DOE/DoD/NSF-funded HPC centers, or HPC-focused cloud computing providers.
  • Experience implementing scientific software on GPU-based HPC systems, especially systems with heavy AI/ML workloads.
  • Regular use of identity management software such as LDAP/Active Directory.
  • Knowledge of virtualization environments (Xen, Eucalyptus, VMware), cloud computing environments (AWS, Azure, GCP, OpenNebula), and container environments (Docker, Singularity, Shifter).
  • Experience using configuration management tools such as Puppet, Chef, Ansible, or Salt.
  • Knowledge of database concepts and basic administration for MySQL, Postgres, MongoDB, Microsoft SQL Server, and Oracle.
  • Knowledge of a variety of filesystems and storage deployments: NFS, SMB, Lustre, GPFS, Ceph, S3, etc.
  • Strong track record of working with academic researchers funded by federal sponsors and adapting to their evolving needs.

Key Responsibilities & Accountabilities:

  • Manage, monitor, and maintain HPC systems to meet the needs of a diverse set of Northeastern researchers and students.
  • Quickly identify operational issues, debug performance problems, and optimize HPC systems.
  • Build system software infrastructure, in collaboration with other RC staff, to move jobs among on-premise Northeastern clusters and to offload jobs to non-Northeastern resources (e.g., HPC clusters at other academic institutions, national labs, or cloud computing providers).
  • Interact with and support faculty with diverse computational needs and backgrounds.
  • Develop and build novel systems/frameworks and workflows to meet researchers’ changing needs.
  • Communicate progress and participate in reviews with technical staff and senior management.
  • Anticipate and proactively communicate computational needs and challenges to the Northeastern research computing leadership.
  • Document and track progress on multiple ongoing needs/open issues.
  • Effectively participate in external collaborations (locally/regionally) and funding opportunities.
  • Work with RC staff to develop and maintain technical documentation, both internal (admin-facing) and external (user-facing).
  • Work with the research computing leadership on short- and long-term strategies for expanding RC support and solutions.
  • Attend conferences and workshops relevant to HPC to advance skills.
  • Stay current with emerging HPC technologies and trends.
  • Promote diversity, equity, inclusion, and accessibility by fostering a collaborative workplace and group culture.

Cover Letter:

Applicants are encouraged to include a cover letter that answers two key questions:

1) One technical incident/experience related to HPC system administration that you are most proud of (that is, an incident where you identified an issue/challenge and led the charge in devising and implementing a technical solution).

2) One non-technical incident/experience where you believe your strong work ethic, proactiveness, inclusive nature, and/or team-player spirit led to significant success for the whole team.

Please limit the cover letter to a single page.

Position Type

Information Technology
