Senior HPC Architect

GDIT
Rockville, MD, USA
$136.3K-$184.5K a year
Full-time

Job Description :

GDIT is seeking a Senior HPC Architect to join our Scientific Infrastructure Team, providing High Performance Computing (HPC) services for a large biomedical research community with the National Institute of Allergy and Infectious Diseases (NIAID).

Our Scientific Infrastructure Team is responsible for enabling and managing HPC and its associated infrastructure and interconnects across multiple locations, hundreds of COTS and open-source scientific applications, and 40 PB of data storage, including data archiving, lifecycle policy management, and data sharing services.

This team serves as a customer-facing presence for the NIAID research community, providing a single point of support for new initiatives, ongoing projects, and scientific infrastructure needs.

In your role as a Senior HPC Architect, you will be a subject matter expert architecting, implementing, and managing multiple high performance compute clusters and their associated infrastructure for a large biomedical research community.

Work Visa sponsorship will not be provided for this position.

HOW A SENIOR HPC ARCHITECT WILL MAKE AN IMPACT :

  • Provide hands-on administration and support for two HPC clusters: a GPU-focused cluster with 4,000+ cores and a second cluster with 1,500+ cores, including monitoring the performance and health of both
  • Install and support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI / ML
  • Architect and design HPC clusters, including designing new clusters and expanding existing components such as storage, InfiniBand, and compute
  • Monitor and report on cluster performance and generate data to show usage and trends
  • Perform troubleshooting and problem-solving for complex HPC operational and performance issues
  • Collaborate with researchers to guide them in effective use of the HPC resources, such as job scheduler submission, data formats, and building data workflows that move data efficiently from scientific instruments to the HPC clusters for analysis
  • Provide input to the Scientific Infrastructure team leader on setting priorities for cluster operations, scheduling policies, resources needed, etc.
  • Develop and maintain documentation and diagrams for the HPC clusters, review GitHub pull requests, and update content and training materials on the user wiki portal
  • Teach and mentor team members on system design, best practices, and troubleshooting techniques

WHAT YOU’LL NEED TO SUCCEED :

Education : BS / BA (or equivalent)

Required Experience : Minimum of 10 years related experience

Required Technical Skills :

  • Minimum of 5 years’ experience as an engineer or architect with HPC technologies
  • Hands-on architecture design experience with HPC to include storage, file system, InfiniBand, security, authentication, and compute architectures
  • Experience with Slurm job scheduling, including troubleshooting job status and optimizing submission scripts
  • Experience using Git to manage shared software configuration code bases
  • Hands-on experience with cloud-based services (e.g. Azure, AWS, GCP)
  • Minimum of 5 years’ experience in Linux systems administration
  • Good understanding of storage administration and optimization, such as performing upgrades and defining RAID configurations
  • Good understanding of fundamental networking concepts and their practical applications
  • Experience with the Spack or EasyBuild package managers, including building packages from PyPI, R, and GitHub
  • Knowledge and experience in one or more scripting languages applicable to Linux (e.g. Bash, Perl, Python)

Security Clearance Level : Must be able to obtain an NIH Public Trust

Preferred Skills :

  • Experience administering Red Hat / CentOS-based systems
  • Experience working in a life-sciences-oriented environment
  • Experience configuring and using monitoring systems to monitor HPC clusters
  • Ability to determine meaningful metrics and usage data for monthly status reports and health dashboards
  • Experience with DevOps or DevSecOps methodologies, such as automation and configuration management
  • Strong troubleshooting skills

Location : This position is primarily remote. However, you must reside within commuting distance of the client site at 5601 Fishers Lane in Rockville, MD, and be able to work onsite as required to meet contractual obligations and project needs.

Travel expenses will not be reimbursed.

GDIT IS YOUR PLACE :

  • 401K with company match
  • Comprehensive health and wellness packages
  • Internal mobility team dedicated to helping you own your career
  • Professional growth opportunities including paid education and certifications
  • Cutting-edge technology you can learn from
  • Rest and recharge with paid vacation and holidays

The likely salary range for this position is $136,340 - $184,460. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location, and possibly contractual requirements, and could fall outside of this range.

Scheduled Weekly Hours :

Travel Required : None

Telecommuting Options : Hybrid

Work Location : USA MD Rockville

30+ days ago