
HPC Systems Engineer III

University Corporation for Atmospheric Research (UCAR)
Cheyenne, WY, United States
$140K-$175K a year
Full-time

Job Description Summary:

UCAR is excited to announce the job opening for an HPC Systems Engineer III role. This position is responsible for providing system engineering service and support for the Computational & Information Systems Laboratory's (CISL) high-performance supercomputers, high-performance networks, service infrastructures (e.g., JupyterHub, Globus, containers, Open Science Data Federation (OSDF)), and storage services. The environment is composed of multi-vendor resources with numerous specialized hardware components and requires coordination and communication with the other groups and divisions within CISL.

Production systems supported are located at the NSF NCAR Wyoming Supercomputing Center (NWSC) in Cheyenne, Wyoming.

May be required to work at the NWSC during periods of system installation, system upgrade, or system troubleshooting.

NSF NCAR's Computational and Information Systems Laboratory (CISL) is a leader in supercomputing and data services necessary for the advancement of atmospheric and geospace science.

CISL's mission is to remain at the forefront of ensuring that research universities, NCAR, and the larger atmospheric, oceanographic, and related research communities have access to the computational resources they need for their research.

To fulfill the need for a stronger workforce at the intersection of High Performance Computing (HPC) and geoscience problems, CISL engages in education and outreach activities to inspire and attract a diverse future workforce.

Position Details:

Visa Sponsored Job: No

Relocation Assistance Eligible: Yes (partial)

Job Location: Boulder, Colorado

Position Type & Term:

Full-time, Regular

Compensation Range:

Salary Range $140,000 - $175,000

Final salary and rates are based on education, experience, and skills relevant to the role.

Application Notes

Application Deadline: This position will be posted until 11:59 PM MT on Monday, November 25, 2024. Applications will not be accepted past this date.

Required application materials (preferably in PDF format):

Resume

Cover Letter - Please address how your skills and experience meet the needs of this position (for more information, please refer to the Key Responsibilities and Knowledge, Skills, and Abilities sections of this job posting).

Questionnaire - Included in the Workday application; please reflect on your own personal and professional experiences to provide examples.

  • Please share ONE specific example of how you have created an environment where your teammates feel safe to provide feedback on initiatives you are working on.
  • Please explain how you have used horizontal scaling in your work.
  • Please explain how you see AI tools, like large language models, potentially enhancing your work.

Partial relocation assistance is available for this position to an eligible candidate.

UCAR/NCAR will not sponsor a work visa (e.g., J-1, H-1B, etc.) for this position. U.S. Citizenship, Permanent Residency, or other protected status under 8 U.S.C. 1324b(a)(3) is required for this position.

What you will do:

As part of the High Performance Computing Systems Group (HSG), the Systems Engineer provides system engineering leadership and support for the Computational & Information Systems Laboratory's (CISL) high-performance supercomputers, block and object storage systems, data archival systems, high-performance networks, and data transfer services.


The primary job location is Boulder, Colorado. Production systems supported are located at the NCAR Wyoming Supercomputing Center (NWSC) in Cheyenne, Wyoming.


Responsibilities:

Software Engineering and Development

Develops, implements, and documents new features or capabilities in system administration and system monitoring software.

Develops and maintains systems software as necessary for the deployment and management of all aspects of high-performance supercomputers, clusters, storage, and network fabrics.

Develops and maintains security monitoring and analysis software. Performs installation and necessary hardware and software integration as part of the HPC infrastructure deployments and upgrades.

Helps define group standards and guidelines for software development and documentation.

Leads software development projects including requirements gathering, design, and project management. Writes code to enhance system management capabilities of the HPC infrastructure and automate repeatedly performed system administration tasks.

Manages, designs, and develops benchmarking tool suites for use during procurement and for ongoing performance monitoring of the high-performance computing environment.

Develops acceptance testing criteria and applications for system procurement.

Research and Evaluation

Researches new and emerging technology (e.g., cloud), evaluates the potential impact of the new hardware and software technology on workflows and plans, and makes recommendations to the HPCD division and CISL management for future procurement of hardware and software products, configurations, and functional enhancements or upgrades in support of the high-performance computing environment.

Performs evaluations and benchmarks, and compiles reports on new hardware and software systems related to the high-performance computing environment (i.e., computing, storage, and networking).

Participates in projects relating to the high-performance computing environment and may have direct responsibility for design and procurement decisions.

This may include development of systems-level code to support the various aspects of the HPC infrastructure software and hardware.

Participates in the RFP process by contributing to the technical specification, requirements definition, review, decision making, acceptance, and implementation for future procurement.

Operational Monitoring and Troubleshooting

Operates and monitors the behavior of the group-managed supercomputers, clusters, servers, storage, and network fabrics on a routine, daily basis to ensure proper and efficient operations.

Alerts other HPC Systems Group staff, vendor representatives, and/or NWSC staff of abnormal conditions or behaviors, as appropriate, and takes remedial actions as necessary.

Diagnoses and may repair failed software and/or hardware components, or may mentor and assist other staff in doing so.

Provides service on a 7x24 on-call basis, troubleshooting and resolving system-related problems presented by users, other sections in CISL, and vendor-employed engineers and analysts.

Refers and escalates problems to senior members of the HPC Systems Group or appropriate staff as necessary. Documents troubleshooting and operational techniques and best practices, and mentors other team members when necessary.

Systems Administration

Provides systems support for diverse hardware and software architectures. Leads the installation and upgrades of system hardware and software, including computational systems, clusters, standalone machines, storage systems and a variety of network fabrics including Ethernet, InfiniBand, and Fibre Channel.

Helps define standards and guidelines for operation and maintenance, and produces systems operation and procedural documentation.

Compiles, installs and maintains commercial and open source application software. Documents system administration tasks and mentors other team members when necessary.

Project Management

Leads team projects utilizing standard project management tools and techniques. Under the direction of the HSG group lead, provides project coordination, technical expertise, and planning for system deployment projects.

Develops budgets, project timelines, and task structures for the group. May guide and review the tasks of team members and provide guidance as necessary.

May participate in cross-group and cross-division projects as necessary including taking a lead role.

Organizational Representation and Reporting

Provides regular reports on HSG activities to management and may contribute to CISL or NCAR annual reports and development plans.

Attends group, division, and laboratory meetings and may represent HSG and its activities at such meetings. May represent the group at larger organizational meetings and broader community events as appropriate.

Who We'd Love to Join Our Team

Successful candidates will ensure their application materials speak to the following criteria:

Education & Experience

Bachelor's degree and eight to twelve years of progressive experience, or an equivalent combination of education and experience, in one or more of the following fields: Computer Science, Mathematics, Computer/Electrical Engineering, Information Sciences, Software Engineering, or an equivalent related field.

Knowledge, Skills, and Abilities

Demonstrated skill in the installation, configuration, administration, troubleshooting, and securing of compute clusters

Experience with deploying and maintaining infrastructure for hardware and software stacks for services such as Globus, JupyterHub, Kubernetes

Demonstrated skill in the configuration and troubleshooting of high-performance Ethernet fabrics

Demonstrated skill in operating container infrastructure

Demonstrated skill in common scripting and programming languages (e.g., ANSI/GNU C, Python, etc.) and general software engineering practices

Demonstrated skill in performing tasks requiring organization and attention to detail

Excellent written and verbal communication skills and the ability to write and interpret systems documentation

Communicates effectively with lab and/or program. May communicate with entire organization.

Able to explain concepts with high technical complexity to others of various technical backgrounds. This may include risks, control, and impacts.

Listens actively to lab or program needs to create solutions to technical problems at a high level of complexity.

Makes formal presentations at lab or program level and advocates for proposed solutions.

Ability to work collaboratively with teams of different skill levels and backgrounds

Ability to mentor team members and collaborators

Ability to function effectively within a matrixed, multidisciplinary team

Maintains professional contact with members of industry and sponsors.

May interact with sponsors and give presentations at the national level.

Desired, but not required:

Experience with infrastructure as code solutions, such as Ansible

Experience with on-premises as well as commercial clouds

Experience with and/or interest in project and team management

Experience with infrastructure for CI/CD workflows

Experience with high-performance computing and related technologies

OTHER REQUIREMENTS:

Occasional travel to the NCAR Wyoming Supercomputing Center (NWSC), which is approximately 90 miles north of Boulder

Periodic 7x24 on-call support in rotation with other staff

Providing assessment and feedback on vendor technology roadmaps and RFIs/RFPs to the HSG group head and the HPCD division director

Work location requirements:

This position is expected to support a hybrid format (remote and in-person work) with some days each week in-person at the primary Boulder, CO office.

Production systems supported are located at the NCAR Wyoming Supercomputing Center (NWSC) in Cheyenne, Wyoming and the Systems Engineer will be required to work at the NWSC to assist with new supercomputers, storage commissioning, major upgrades, outages, downtimes, etc.

Benefits Overview

UCAR affirms its commitment to employees through competitive benefits. In addition to medical, dental, vision, retirement, and life insurance, UCAR offers a variety of programs focused on work-life balance and professional and personal development. These include:

Tuition Assistance, time off allowance to attend classes, and other professional development opportunities

UCAR contributes 10% of your eligible pay into your retirement account; 100% fully vested on day one

Starting minimum accrual of 20 days of personal time off each year (prorated for less than full-time positions)

10 paid holidays

10 days of sick leave each year

12 weeks of paid parental leave

Short-term medical leave paid at 100% of your regular salary

EcoPass for local Colorado residents to use the Denver and Boulder-area transit system at no cost

Commitment to Diversity, Equity & Inclusion

Our organization is committed to creating a diverse, equitable, and inclusive work environment and fostering a culture where everyone feels welcome and supported.

To learn more about these efforts, visit the Office of Diversity, Equity & Inclusion Strategic Plan and our Diversity & Inclusion: A Welcoming Workplace site.

Research shows that women and people of color are less likely to apply for a position if they do not meet almost 100% of the desired skills and experience.

Please note this is not necessary! If you meet the minimum requirements and have a passion for the work, you are encouraged to apply.

We can provide on-the-job training for the rest!

Commitment to Job Application Fairness

Applicants are not required to provide age or age-related information and may redact information related to age, date of birth, or dates of attendance at or graduation from an educational institution from any submissions during the initial application process.

Some Final Considerations

At UCAR, NCAR, and UCP, you will work alongside a dedicated team of professionals conducting critical research and community outreach to solve complex Earth system science problems including climate change, air pollution, extreme weather, floods, drought, wildfires, and space weather, all with the goal of improving human life and reducing economic loss.

Each of us, from scientists to the professionals who support their work, serves the public and a collaborative community of scientists in our mission to understand the complex processes that make up the Earth system, from the ocean floor to the Sun's core.

Flexible Work

At UCAR, we are committed to supporting our mission by giving staff the flexibility to find the schedule and location that best fit their work-life circumstances and help them reach their full potential as professionals.

Many positions within our organization are eligible for fully on-site, hybrid, fully remote, and/or flexible work schedules.

Equal Opportunity Employer

UCAR is committed to providing equal opportunity for all employees and applicants for employment and does not discriminate on the basis of race, age, creed, color, religion, national origin or ancestry, sex, gender, disability, veteran status, genetic information, sexual orientation, gender identity or expression, or pregnancy.

Whatever your intersection of identities, you are welcome at UCAR.

Export Control

All positions are required to comply with U.S. export compliance regulations and work location requirements regarding access to facilities and research systems.

Visa Wait Times

Please consider the length of visa procurement when applying for this posting, understanding that you will not be able to begin employment until you are able to get a visa and enter the U.S.
