Lead High Performance Computing (HPC) Engineer

University of California - San Francisco
San Francisco, California, US
$132.4K-$198.6K a year
Full-time

IT EDW Operations

81014BR

Job Summary

Applies advanced systems infrastructure concepts to the planning and operations of advanced cyberinfrastructure (HPC and HTC) engineering duties on large-scale and highly complex infrastructure with unique computing, networking, and storage systems designed to address complex, cutting-edge research problems.

Selects methods, techniques, and evaluation criteria to develop new CI solutions to address complex research problems. Develops monitoring enhancements to maintain the integrity of CI systems.

Leads cross-functional technical teams and projects alongside research programs. Participates in multiple technical projects simultaneously.

Applies working knowledge of security control frameworks to maintain the integrity of the CI systems and the research being performed on them.

Gives presentations to the associated team and other technical units. Evaluates new technologies, including performing moderate to complex cost-benefit analyses.

May lead a team of systems / infrastructure professionals.

The salary range for this position is $132,400 - $198,600 (Annual Rate).

Department Description

Academic Research Systems (ARS) serves the needs of the UCSF research community by providing an integrated repository of HIPAA compliant clinical and life sciences data and a centralized, secure, professionally managed infrastructure for the storage and management of research data.

ARS empowers medical scientific investigations by offering secure computing environments, data capture, management and analysis tools, and support services which meet researchers' needs.

The Core HPC team of Academic Research Systems (ARS) focuses on large-scale, high-performance computational and storage services for UCSF researchers so they can address complex computational, AI, and data science problems.

Required Qualifications

  • Bachelor's degree in a related area such as computer science or engineering and 6+ years of experience with large-scale or High-Performance Computing (HPC) systems, or 10+ years of related experience with large-scale or HPC systems
  • Knowledge of HPC job scheduler system design and operation such as SLURM or PBS
  • Demonstrated skill (5+ years) deploying, managing, and troubleshooting Warewulf (or similar) InfiniBand-based clusters
  • Expert knowledge of HPC systems infrastructure design
  • Strong knowledge of high-performance parallel filesystems and storage such as GPFS, Lustre, VAST, DDN, etc.
  • Advanced knowledge of computer security best practices and policies, including demonstrated experience securing research cyberinfrastructure systems to meet NIST 800-171 / 800-223, HIPAA, or IS-3 requirements
  • Ability to write technical documentation in a clear and concise manner. Ability to develop runbooks defining complex technical processes in a clear and concise manner.
  • Demonstrated testing and test planning skills. Demonstrated ability to create automated testing.
  • Ability to elicit and communicate technical and non-technical information in a clear and concise manner.
  • Self-motivated and works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines.
  • Understanding of system performance monitoring and actions that can be taken to improve or correct performance.
  • Demonstrated advanced knowledge, skills, and abilities associated with system problem identification and resolution. Experience with design, configuration, operation, repair, and tuning of technology systems.
  • Advanced experience writing and editing the most complex scripts used to perform system maintenance and administration.

Preferred Qualifications

  • Knowledge of the design, development, and application of technology and systems to meet business needs.
  • General knowledge of other areas of IT. Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance.
  • Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users, and dependent / related functions.

About UCSF

The University of California, San Francisco (UCSF) is a leading university dedicated to promoting health worldwide through advanced biomedical research, graduate-level education in the life sciences and health professions, and excellence in patient care.

It is the only campus in the 10-campus UC system dedicated exclusively to the health sciences. We bring together the world's leading experts in nearly every area of health.

We are home to five Nobel laureates who have advanced the understanding of cancer, neurodegenerative diseases, aging and stem cells.

Pride Values

UCSF is a diverse community made of people with many skills and talents. We seek candidates whose work experience or community service has prepared them to contribute to our commitment to professionalism, respect, integrity, diversity and excellence - also known as our PRIDE values.

In addition to our PRIDE values, UCSF is committed to equity, both in how we deliver care and in our workforce. We are committed to building a broadly diverse community, nurturing a culture that is welcoming and supportive, and engaging diverse ideas for the provision of culturally competent education, discovery, and patient care.

Additional information about UCSF is available at diversity.ucsf.edu. Join us to find a rewarding career contributing to improving healthcare worldwide.

Equal Employment Opportunity

The University of California San Francisco is an Equal Opportunity / Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran or disabled status, or genetic information.

Organization

Campus

Job Code and Payroll Title

000520 SYS ADM 4

Job Category

Clinical Systems / IT Professionals

Bargaining Unit

99 - Policy-Covered (No Bargaining Unit)

Employee Class

Career

Percentage

100%

Location

Flexible (combination of onsite and remote work), Mission Center Building (SF), San Francisco, CA

Shift

Days

Shift Length

8 Hours
