High-Performance Computing (HPC) Systems Administrator

Charlestown, MA

Posted 4 weeks ago

Full-time, on-site

Job Description

Site: The General Hospital Corporation

Mass General Brigham relies on a wide range of professionals, including doctors, nurses, business people, tech experts, researchers, and systems analysts to advance our mission. As a not-for-profit, we support patient care, research, teaching, and community service, striving to provide exceptional care. We believe that high-performing teams drive groundbreaking medical discoveries and invite all applicants to join us and experience what it means to be part of Mass General Brigham.

Job Summary

Responsible for ensuring the efficient and effective operation of computer systems, networks, and software applications.

Does this position require Patient Care?
No

Essential Functions
- Install and configure computer systems, networks, and software applications.

- Manage system and network performance, including monitoring and analyzing system logs and performance metrics to identify issues and optimize performance.

- Troubleshoot and resolve system issues and outages, including coordinating with other IT professionals to identify and resolve issues.

- Conduct system upgrades, installations, and migrations, including developing project plans, testing, and implementing changes.

- Keep up to date with advances in computer systems, networks, and software applications, including attending industry conferences, completing continuing education and professional development courses, and participating in online forums and user groups.

Qualifications

Education
Bachelor's degree in a related field of study required

Can this role accept experience in lieu of a degree?
Yes

Licenses and Credentials
Class D Passenger Vehicle Driver's License preferred

Experience
2-3 years of experience in systems/applications administration required.

Knowledge, Skills and Abilities
- Proficiency in a variety of operating systems.
- Experience with virtualization technologies.
- Strong knowledge of networking technologies.
- Experience with backup and recovery technologies and disaster recovery planning.
- Experience with scripting languages.
- Excellent problem-solving, analytical, and critical-thinking skills.
- Strong communication, collaboration, and interpersonal skills.

Additional Job Details (if applicable)

The Martinos Center for Biomedical Imaging at Massachusetts General Hospital seeks a dedicated and highly motivated High-Performance Computing (HPC) Systems Administrator (Sysadmin) to oversee and optimize the center's HPC cluster, a core computational resource supporting cutting-edge biomedical and neuroimaging research. The HPC Sysadmin will play a critical role in maintaining and enhancing the cluster's performance, supporting researchers in their computational workflows, and ensuring the scalability and reliability of the system.

This role is ideal for an individual with strong experience in HPC systems administration, an understanding of scientific computing needs, and the ability to work collaboratively with researchers from diverse disciplines.

This position is based at the Martinos Center for Biomedical Imaging in the Charlestown Navy Yard. This position offers a hybrid work environment, allowing for a combination of remote work and on-site responsibilities. The candidate must be located within a commutable distance to Charlestown, MA, and be available to attend regular in-person meetings with the Center’s Faculty and Leadership.

Why Join Us?
• Work in a multidisciplinary environment supporting groundbreaking research in computational methods, neuroscience, cancer, and cardiovascular health.
• Operate a state-of-the-art HPC cluster in collaboration with world-class researchers and scientists.
• Be part of a team dedicated to pushing the boundaries of technology in biomedical imaging.

Key Responsibilities

Cluster Management:
- Oversee the day-to-day operations, maintenance, and optimization of the Martinos Center's HPC cluster, ensuring high availability, reliability, and performance.
- Perform hardware and software upgrades, patching, and troubleshooting of HPC nodes, storage, and networking.
User Support:
- Provide technical support and guidance to researchers and staff using the HPC cluster for computational tasks, such as neuroimaging, machine learning, and data analysis.
- Assist users with job scheduling, resource allocation, and troubleshooting.
System Monitoring and Performance Optimization:
- Develop and implement robust monitoring tools to track resource utilization and identify performance bottlenecks.
- Analyze workloads and provide recommendations for optimization of computational workflows.
Collaboration and Training:
- Collaborate with researchers to understand their computational needs and assist in designing tailored HPC solutions for their projects.
- Develop training materials and lead workshops to educate researchers on best practices for using the cluster.

Qualifications

Experience with job scheduling using Slurm required.
Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
3+ years of experience in HPC systems administration or equivalent.
Strong expertise in Linux systems administration (e.g., CentOS, RHEL, Ubuntu) in an HPC environment.
Proficiency in HPC-related programming and scripting languages (e.g., Bash, Python, Perl).
Familiarity with parallel computing, distributed systems, and scientific computing frameworks.
Hands-on experience with storage systems, networking, and security in an HPC environment.
Excellent interpersonal and communication skills for interacting with researchers and non-technical staff, along with previous experience working with researchers.
Demonstrated ability to adapt to changing technologies, workflows, and priorities in a dynamic research environment.
Strong organizational and time-management skills to efficiently manage multiple concurrent projects and tasks.

Preferred:

Advanced degree in Computer Science, Engineering, or a related field.
Knowledge of biomedical or neuroimaging applications and related software (e.g., FreeSurfer, FSL, SPM, ANTs, MATLAB).
Experience with machine learning workflows and GPU-based computing (e.g., PyTorch, CUDA, TensorFlow).
Familiarity with data-intensive workflows and large-scale storage systems.

Candidate experience is thoughtfully considered throughout the recruitment process, and we offer flexibility in salary based on qualifications and experience.

Remote Type

Hybrid

Work Location

149 Thirteenth Street Building 149

Employee Type

Regular

Work Shift

Day (United States of America)

Pay Range

$63,648.00 - $90,750.40/Annual

At Mass General Brigham, we believe in recognizing and rewarding the unique value each team member brings to our organization. Our approach to determining base pay is comprehensive, and any offer extended will take into account your skills, relevant experience if applicable, education, certifications and other essential factors. The base pay information provided offers an estimate based on the minimum job qualifications; however, it does not encompass all elements contributing to your total compensation package. In addition to competitive base pay, we offer comprehensive benefits, career advancement opportunities, differentials, premiums and bonuses as applicable and recognition programs designed to celebrate your contributions and support your professional growth. We invite you to apply, and our Talent Acquisition team will provide an overview of your potential compensation and benefits package.

EEO Statement:

1200 The General Hospital Corporation is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religious creed, national origin, sex, age, gender identity, disability, sexual orientation, military service, genetic information, and/or other status protected under law. We will ensure that all individuals with a disability are provided a reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. To ensure reasonable accommodation for individuals protected by Section 503 of the Rehabilitation Act of 1973, the Vietnam Era Veterans' Readjustment Assistance Act of 1974, and Title I of the Americans with Disabilities Act of 1990, applicants who require accommodation in the job application process may contact Human Resources at (857)-282-7642.

Mass General Brigham Competency Framework

At Mass General Brigham, our competency framework defines what effective leadership “looks like” by specifying which behaviors are most critical for successful performance at each job level. The framework comprises ten competencies (half People-Focused, half Performance-Focused), each defined by observable and measurable skills and behaviors that contribute to workplace effectiveness and career success. These competencies are used to evaluate performance, make hiring decisions, identify development needs, mobilize employees across our system, and establish a strong talent pipeline.
