
High-Performance Computing (HPC) Systems Administrator
Job Description
Mass General Brigham relies on a wide range of professionals, including doctors, nurses, business people, tech experts, researchers, and systems analysts to advance our mission. As a not-for-profit, we support patient care, research, teaching, and community service, striving to provide exceptional care. We believe that high-performing teams drive groundbreaking medical discoveries and invite all applicants to join us and experience what it means to be part of Mass General Brigham.
Job Summary
Responsible for ensuring the efficient and effective operation of computer systems, networks, and software applications.
Does this position require Patient Care?
No
Essential Functions
-Installs and configures computer systems, networks, and software applications.
-Manages system and network performance, including monitoring and analyzing system logs and performance metrics to identify issues and optimize performance.
-Troubleshoots and resolves system issues and outages, including coordinating with other IT professionals to identify and resolve issues.
-Conducts system upgrades, installations, and migrations, including developing project plans, testing, and implementing changes.
-Keeps up to date with advances in computer systems, networks, and software applications, including attending industry conferences, completing continuing education and professional development courses, and participating in online forums and user groups.
Qualifications
Education
Bachelor's degree in a related field of study required
Can this role accept experience in lieu of a degree?
Yes
Licenses and Credentials
Class D Passenger Vehicle Driver's License preferred
Experience
2-3 years of experience in systems/applications administration required.
Knowledge, Skills and Abilities
- Proficiency in a variety of operating systems.
- Experience with virtualization technologies.
- Strong knowledge of networking technologies.
- Experience with backup and recovery technologies and disaster recovery planning.
- Experience with scripting languages.
- Excellent problem-solving, analytical, and critical-thinking skills.
- Strong communication, collaboration, and interpersonal skills.
Additional Job Details (if applicable)
The Martinos Center for Biomedical Imaging at Massachusetts General Hospital seeks a dedicated and highly motivated High-Performance Computing (HPC) Systems Administrator (Sysadmin) to oversee and optimize the center's HPC cluster, a core computational resource supporting cutting-edge biomedical and neuroimaging research. The HPC Sysadmin will play a critical role in maintaining and enhancing the cluster's performance, supporting researchers in their computational workflows, and ensuring the scalability and reliability of the system.
This role is ideal for an individual with strong experience in HPC systems administration, an understanding of scientific computing needs, and the ability to work collaboratively with researchers from diverse disciplines.
This position is based at the Martinos Center for Biomedical Imaging in the Charlestown Navy Yard. This position offers a hybrid work environment, allowing for a combination of remote work and on-site responsibilities. The candidate must be located within a commutable distance to Charlestown, MA, and be available to attend regular in-person meetings with the Center’s Faculty and Leadership.
Why Join Us?
• Work in a multidisciplinary environment supporting groundbreaking research in computational methods, neuroscience, cancer, and cardiovascular health.
• Operate a state-of-the-art HPC cluster in collaboration with world-class researchers and scientists.
• Be part of a team dedicated to pushing the boundaries of technology in biomedical imaging.
Key Responsibilities
Cluster Management:
Oversee the day-to-day operations, maintenance, and optimization of the Martinos Center's HPC cluster, ensuring high availability, reliability, and performance.
Perform hardware and software upgrades, patching, and troubleshooting of HPC nodes, storage, and networking.
User Support:
Provide technical support and guidance to researchers and staff using the HPC cluster for computational tasks, such as neuroimaging, machine learning, and data analysis.
Assist users with job scheduling, resource allocation, and troubleshooting.
System Monitoring and Performance Optimization:
Develop and implement robust monitoring tools to track resource utilization and identify performance bottlenecks.
Analyze workloads and provide recommendations for optimization of computational workflows.
Collaboration and Training:
Collaborate with researchers to understand their computational needs and assist in designing tailored HPC solutions for their projects.
Develop training materials and lead workshops to educate researchers on best practices for using the cluster.
Qualifications
Experience with job scheduling using Slurm required.
Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
3+ years of experience in HPC systems administration or equivalent.
Strong expertise in Linux systems administration (e.g., CentOS, RHEL, Ubuntu) in an HPC environment.
Proficiency in HPC-related programming and scripting languages (e.g., Bash, Python, Perl).
Familiarity with parallel computing, distributed systems, and scientific computing frameworks.
Hands-on experience with storage systems, networking, and security in an HPC environment.
Excellent interpersonal and communication skills for interacting with researchers and non-technical staff, and previous experience working with researchers.
Demonstrated ability to adapt to changing technologies, workflows, and priorities in a dynamic research environment.
Strong organizational and time-management skills to efficiently manage multiple concurrent projects and tasks.
Preferred:
Advanced degree in Computer Science, Engineering, or a related field.
Knowledge of biomedical or neuroimaging applications and related software (e.g., FreeSurfer, FSL, SPM, ANTs, MATLAB).
Experience with machine learning workflows and GPU-based computing (e.g., PyTorch, CUDA, TensorFlow).
Familiarity with data-intensive workflows and large-scale storage systems.
Candidate experience is thoughtfully considered throughout the recruitment process, and we offer flexibility in salary based on qualifications and experience.
Pay Range
$63,648.00 - $90,750.40/Annual
Grade
6
Mass General Brigham Competency Framework
At Mass General Brigham, our competency framework defines what effective leadership “looks like” by specifying which behaviors are most critical for successful performance at each job level. The framework comprises ten competencies (half People-Focused, half Performance-Focused), each defined by observable and measurable skills and behaviors that contribute to workplace effectiveness and career success. These competencies are used to evaluate performance, make hiring decisions, identify development needs, mobilize employees across our system, and establish a strong talent pipeline.