HPC Systems Engineer

University of California - San Francisco Campus and Health
San Francisco, California, US
$132.4K-$198.6K a year
Full-time

IT EDW Operations

Full Time

81533BR

Job Summary

The CoreHPC team at UCSF is seeking an HPC Systems Engineer to play a key role in the development, maintenance, and day-to-day operations of the new CoreHPC cluster.

This next-generation institutional HPC cluster, currently being built, is focused on AI and data science.

The HPC Systems Engineer will:

  • Apply advanced systems infrastructure concepts and skills to the operation and improvement of large-scale, highly complex research cyberinfrastructure (CI) with unique computing, networking, and storage systems designed to address cutting-edge research problems.
  • Apply their engineering and design skills to develop new CI solutions.
  • Select methods, techniques, and evaluation criteria to develop new CI solutions to address complex research problems.
  • Be an active member of the support and maintenance efforts for the CoreHPC cluster, resolving user issues, fixing technical problems, resolving outages, patching, and maintaining systems uptime and availability.
  • Provide consultation, support, and guidance to researchers on how to address computational problems using standard tools, packages, and approaches.
  • Develop and enhance monitoring to maintain the integrity of CI systems.
  • Participate in multiple technical projects simultaneously.
  • Apply working knowledge of security control frameworks to maintain the integrity of the CI systems and the research being performed on them.
  • Give presentations to associated teams and other technical units.
  • Evaluate new technologies, including performing moderate to complex cost/benefit analyses.

This position may lead cross-functional technical working groups and projects in support of onboarding research customers or making systems improvements.

The final salary and offer components are subject to additional approvals based on UC policy. Our placement within the salary range is dependent on a number of factors including your work experience and internal equity within this position classification at UCSF.

For positions that are represented by a labor union, placement within the salary range will be guided by the rules in the collective bargaining agreement.

The salary range for this position is $132,400 - $198,600 (Annual Rate).

Department Description

Academic Research Systems (ARS) serves the needs of the UCSF research community by providing an integrated repository of HIPAA-compliant clinical and life sciences data and a centralized, secure, professionally managed infrastructure for the storage and management of research data.

ARS supports medical and scientific research by offering secure computing environments, data capture, management and analysis tools, and support services that meet researchers' needs.

The CoreHPC team of Academic Research Systems (ARS) focuses on large-scale, high-performance computational and storage services for UCSF researchers so they can address complex computational, AI, and data science problems.

Required Qualifications

  • Bachelor's degree in a related area such as computer science or engineering and 6+ years of experience with large-scale or HPC systems, or 10+ years of related experience with large-scale or HPC systems and/or equivalent experience/training.
  • Knowledge of HPC job scheduler system design and operation such as SLURM or PBS.
  • Demonstrated skill (5+ years) deploying, managing, and troubleshooting Warewulf (or similar) InfiniBand-based clusters.
  • Expert knowledge of HPC systems infrastructure design.
  • Strong knowledge of high-performance parallel filesystems and storage such as GPFS, Lustre, VAST, DDN, etc.
  • Advanced experience writing and editing complex scripts used to perform system maintenance and administration.
  • Advanced knowledge of computer security best practices and policies, including demonstrated experience securing research cyberinfrastructure systems to meet NIST 800-171/800-223, HIPAA, or IS-3 requirements.
  • Ability to elicit and communicate technical and non-technical information in a clear and concise manner.
  • Self-motivated; able to work independently and as part of a team. Demonstrated problem-solving skills. Able to learn effectively and meet deadlines.
  • Understanding of system performance monitoring and actions that can be taken to improve or correct performance.
  • Knowledge of the design, development, and application of technology and systems to meet business needs.
  • General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance.
  • Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users, and dependent/related functions.
  • Demonstrated advanced knowledge, skills, and abilities associated with system problem identification and resolution. Experience with design, configuration, operation, repair, and tuning of technology systems.
  • Demonstrated testing and test planning skills. Demonstrated ability to create automated tests.
  • Ability to write technical documentation in a clear and concise manner. Ability to develop runbooks that define complex technical processes clearly and concisely.

Preferred Qualifications

n/a

About UCSF

The University of California, San Francisco (UCSF) is a leading university dedicated to promoting health worldwide through advanced biomedical research, graduate-level education in the life sciences and health professions, and excellence in patient care.

It is the only campus in the 10-campus UC system dedicated exclusively to the health sciences. We bring together the world's leading experts in nearly every area of health.

We are home to five Nobel laureates who have advanced the understanding of cancer, neurodegenerative diseases, aging and stem cells.

Pride Values

UCSF is a diverse community made up of people with many skills and talents. We seek candidates whose work experience or community service has prepared them to contribute to our commitment to professionalism, respect, integrity, diversity and excellence, also known as our PRIDE values.

In addition to our PRIDE values, UCSF is committed to equity, both in how we deliver care and in our workforce. We are committed to building a broadly diverse community, nurturing a culture that is welcoming and supportive, and engaging diverse ideas for the provision of culturally competent education, discovery, and patient care.

Additional information about UCSF is available at diversity.ucsf.edu. Join us to find a rewarding career contributing to improving healthcare worldwide.

Equal Employment Opportunity

The University of California, San Francisco is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information.

Organization

Campus

Job Code and Payroll Title

000520 SYS ADM 4

Job Category

Clinical Systems/IT Professionals

Bargaining Unit

99 - Policy-Covered (No Bargaining Unit)

Employee Class

Career

Percentage

100%

Location

Flexible (combination of onsite and remote work), San Francisco, CA

Shift

Days

Shift Length

8 Hours
