HPC Systems Manager

Frederick National Laboratory for Cancer Research
Frederick, Maryland, United States
Full-time

Job ID: req4124

Employee Type: exempt full-time

Division: Enterprise Information Technology

Facility: Frederick: Ft Detrick

Location: PO Box B, Frederick, MD 21702 USA

The Frederick National Laboratory is a Federally Funded Research and Development Center (FFRDC) sponsored by the National Cancer Institute (NCI) and operated by Leidos Biomedical Research, Inc.

The lab addresses some of the most urgent and intractable problems in the biomedical sciences in cancer and AIDS, drug development and first-in-human clinical trials, applications of nanotechnology in medicine, and rapid response to emerging threats of infectious diseases.

Accountability, Compassion, Collaboration, Dedication, Integrity and Versatility; it's the FNL way.

PROGRAM DESCRIPTION

The mission of Enterprise Information Technology (EIT) is to develop an enterprise-level, consolidated information technology infrastructure that provides exceptional IT capabilities to the Frederick National Laboratory for Cancer Research (NCI-Frederick / FNLCR) in support of basic, translational, and clinical cancer and AIDS research.

The IT Operations Group (ITOG) is a part of Enterprise Information Technology (EIT) within Leidos Biomedical Research, Inc.

ITOG is responsible for computational servers, storage servers, virtual machine infrastructure, and the FNLCR network. ITOG focuses on implementing enterprise IT best practices in the areas of computational services, storage, backup, and archiving; batch and application support; server consolidation and virtualization; network infrastructure; unification of voice, teleconferencing, and video communication technologies; and improved infrastructure for collocation of dedicated servers.

KEY ROLES / RESPONSIBILITIES

  • Work with scientific researchers to architect, implement, and deploy the HPC clusters, high-capacity, high-bandwidth storage, and scientific software applications necessary to support scientific research.
  • Manage and grow a small and technically strong team of HPC engineers who develop, build, and deploy HPC systems that are part of our product.
  • Partner with enterprise storage and networking teams to optimize workflows and workloads needed by scientific labs with large data generators.
  • Model, characterize, and tune the performance of HPC systems to achieve the most efficient and cost-effective solution.
  • Manage the HPC capacity plan, develop deployment schedules, and identify critical science deliverables.
  • Identify and manage risks for the HPC systems and develop mitigation plans.
  • Work without considerable direction, and mentor and supervise employees as needed.

BASIC QUALIFICATIONS

This position can be filled as an HPC Systems Manager I or II.

To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:

HPC Systems Manager I

  • Possession of a Bachelor’s degree from an accredited college/university according to the Council for Higher Education Accreditation (CHEA), or four (4) years of relevant experience in lieu of degree. Foreign degrees must be evaluated for U.S. equivalency.
  • In addition to the education requirement, a minimum of four (4) years of experience in managing Linux and Windows systems in a high-throughput, data-intensive environment, including two (2) years as a technical lead and/or managing a technical team.
  • Solid knowledge of HPC systems, storage, high-speed interconnect, and GPU architecture.
  • Excellent client-facing skills.
  • Experience with batch control software such as SLURM.
  • Strong understanding of Linux internals.
  • Broad experience with high performance storage systems, NFS, SMB, POSIX.
  • Familiarity with system performance analysis, monitoring, and tuning.
  • Excellent written and verbal communication skills.

HPC Systems Manager II

  • In addition to the education requirement, a minimum of six (6) years of experience in managing Linux and Windows systems in a high-throughput, data-intensive environment, including four (4) years as a technical lead and/or managing a technical team.
  • Experience with AI/ML models and their application across a variety of domains.
  • Experience with programming in a variety of languages, both traditional and nontraditional.
  • Experience with container technologies and associated infrastructure.
  • Experience with cloud and hybrid models.
  • Knowledge of emerging computing technologies.
  • Knowledge of various microarchitectures and of developing firmware.
  • Ability to rapidly evaluate scientific research on new and emerging technologies.
  • This position is considered a safety-sensitive position and will be subject to random drug testing per the Leidos Biomedical Research Drug Free Workplace Program.
  • Ability to obtain and maintain a security clearance.

EXPECTED COMPETENCIES

Candidates with these desired skills will be given preferential consideration:

SLURM, GPU, HPC Architecture, Linux

Commitment to Diversity

All qualified applicants will receive consideration for employment without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, or any other basis prohibited by law.

Leidos will also consider for employment qualified applicants with criminal histories consistent with relevant laws.
