PROGRAM DESCRIPTION

We are seeking a skilled and motivated Computational Scientist to join the Cancer Genomics Research Laboratory (CGR), located at the National Cancer Institute (NCI) Shady Grove campus in Rockville, MD. CGR is operated by Leidos Biomedical Research, Inc., and collaborates with the NCI’s Division of Cancer Epidemiology and Genetics (DCEG), the world’s leading cancer epidemiology research group. Our scientific team leverages cutting-edge technologies to investigate genetic, epigenetic, transcriptomic, proteomic, and molecular factors that drive cancer susceptibility and outcomes. We are deeply committed to the mission of discovering the causes of cancer and advancing new prevention strategies through our contributions to DCEG’s pioneering research.

Our team of CGR bioinformaticians supports DCEG’s multidisciplinary family- and population-based studies by working closely with epidemiologists, biostatisticians, and basic research scientists in DCEG’s intramural research program. We provide end-to-end bioinformatics support for genome-wide association studies (GWAS) using SNP arrays, methylation arrays, targeted sequencing, whole-exome sequencing, whole-transcriptome sequencing, and whole-genome sequencing, along with viral and metagenomic studies from both short- and long-read sequencing platforms. This includes the analysis of germline and somatic variants, structural variations, copy number variations, microsatellites, gene and isoform expression, base modifications, viral and bacterial genomics, and more. Additionally, we advance cancer research by integrating the latest technologies, such as single-cell, multi-omics, spatial transcriptomics, and proteomics, in collaboration with the Functional and the Molecular and Digital Pathology Laboratory (MDPL) groups within CGR.

We extensively analyze large population databases such as All of Us, UK Biobank, gnomAD, and the 1000 Genomes Project to inform and validate GWAS signals, study associations between genetic variation and gene expression, protein levels, and metabolites, and develop polygenic risk scores across multiple populations. The bioinformatics team develops and implements sophisticated HPC- and cloud-enabled pipelines and data analysis methodologies, blending traditional bioinformatics and statistical approaches with cutting-edge techniques such as machine learning, deep learning, and generative AI models. We prioritize reproducibility through the use of containerization, workflow and code management tools, thorough benchmarking, and detailed workflow documentation. Our infrastructure and data management team works closely with researchers and bioinformaticians to maintain and optimize a high-performance computing (HPC) cluster, provision cloud environments, and curate and share large datasets.

The successful candidate will demonstrate scientific and technical leadership in analyzing large-scale single-cell, multi-omics, spatial transcriptomics, and proteomics datasets across diverse cancer types, supported by a strong publication record and code repositories that reflect advanced expertise in single-cell and spatial omics data analysis and interpretation. The computational scientist will develop and test hypotheses, design analytical plans, execute end-to-end analyses, and summarize, interpret, and present results while collaborating closely with investigators and scientists. The candidate will utilize strong knowledge of experimental design, upstream quality control (QC) metric interpretation and visualization, nuclear and cell segmentation approaches, and post-segmentation sample-level QC, filtering, and clustering to generate high-quality results from large projects.

Additional expertise should include multi-sample data integration, batch correction, QC, coarse- and fine-grained clustering, label transfer, and cell-type annotation. The candidate should also possess expertise in advanced downstream statistical modeling tailored to address important scientific questions, with a strong foundation in statistical, machine learning, and deep learning approaches for biological data analysis. In addition, the candidate must possess strong scientific literature review and research skills to stay current with emerging developments in the field and incorporate new analytical approaches into their work. This role requires demonstrated expertise in handling large and complex datasets and collaborating effectively within multidisciplinary research teams to generate meaningful biological insights.

KEY ROLES/RESPONSIBILITIES

- Lead end-to-end analyses and discussions of single-cell, multi-omics, spatial transcriptomics, and proteomics projects through close collaboration with DCEG investigators, CGR wet-lab scientists, MDPL scientists, and bioinformaticians.
- Demonstrate strong knowledge in experimental design, hypothesis formulation and testing, and development of analytical aims, leveraging expertise in cancer biology and spatial omics.
- Evaluate existing spatial infrastructure and analytical capabilities and build upon them by implementing state-of-the-art methods for single-cell RNA-seq, single-cell ATAC-seq, multi-omics, and spatial omics analyses.
- Use strong knowledge of community standards and best practices to benchmark existing and emerging software tools and incorporate them into workflows.
- Perform nuclear segmentation and expansion on high-resolution images, comparing tools and optimizing for different tissue and cancer types.
- Evaluate QC metrics and appropriate filtering criteria for downstream processing.
- Perform batch correction, data integration, cell clustering, and annotation.
- Identify and create new single-cell references across different cancer and tissue types.
- Study tumor microenvironments and immune cell infiltration.
- Apply statistical, machine learning, and deep learning approaches required for both upstream and downstream analyses.
- Develop clear, interpretable visualizations and analytical reports to communicate findings and support scientific discovery.
- Perform integration of multi-modal omics datasets to support large-scale oncology research initiatives.
- Conduct reproducible scientific research through documentation of software versions, processes, and pipelines, along with the use of tools such as Markdown documents, Conda and R environments, Docker, Singularity, GitHub, and Snakemake/Nextflow.
- Use high-performance computing (HPC) clusters and the Slurm scheduler, optimizing computational resources and data storage requirements to ensure scalability for large datasets.
- Summarize and interpret findings through clear visualizations and reports, and present results to senior leadership and scientists from diverse backgrounds.
- Contribute to manuscript preparation, submission, and revision processes, with strong opportunities for scientific co-authorship.
- Stay current with advances in the field through scientific literature review, seminars, workshops, and cross-disciplinary collaborations.

BASIC QUALIFICATIONS

To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:

- Possession of a PhD degree from an accredited college or university according to the Council for Higher Education Accreditation (CHEA) in Bioinformatics, Computational Biology, Biostatistics, or a related field. Foreign degrees must be evaluated for U.S. equivalency.
- Demonstrated experience with single-cell, single-nucleus multiomic, and spatial omics data analysis, including a solid understanding of statistical and analytical methods for biomarker discovery and spatial profiling.
- Experience working with both sequencing- and imaging-based spatial omics platforms.
- Strong publication record demonstrating the ability to analyze and interpret single-cell and spatial omics datasets.
- Strong programming skills in R and Python, with the ability to work in RStudio, VS Code, and Jupyter Notebooks.
- Strong knowledge of reproducibility practices and version control using Docker, Singularity, GitHub, workflow management systems (Snakemake/Nextflow), R and Conda environments, and Markdown documents.
- Proficiency in shell scripting (e.g., Bash, AWK, sed).
- Proficiency working in Linux-based HPC or cloud environments, with a strong understanding of Slurm and the ability to work with large datasets using best practices for algorithmic efficiency, parallelization, and scalability.
- Ability to work independently and collaboratively with internal and external investigators.
- Strong written, verbal, and presentation skills.
- Ability to work effectively in a multidisciplinary research environment and communicate technical findings clearly to non-specialist audiences through reports and presentations detailing methodologies and results.
- Efficient and organized data management for large projects.
- Strong work ethic and a proactive, solution-oriented mindset.
- Self-motivated, research-focused professional with a passion for advancing cancer genomics.
- Ability to obtain and maintain a security clearance.

PREFERRED QUALIFICATIONS

Candidates with these desired skills will be given preferential consideration:

- Minimum of three years of postdoctoral or equivalent experience in academia or industry.
- Experience analyzing data from major single-cell and spatial platforms, including 10x Visium HD, Xenium, and Ultivue.
- Strong working knowledge of end-to-end processing and analysis of single-cell RNA-seq, single-cell ATAC-seq, single-nucleus multi-omics, and spatial transcriptomics datasets.
- Proficiency in Cell Ranger, Space Ranger, StarDist, and QuPath, and familiarity with HALO, for evaluating tissue and data quality, nuclear segmentation, and image analysis.
- Familiarity with nuclear segmentation approaches such as Cellpose, Baysor, and Proseg, and cell-free spatial segmentation approaches such as FICTURE.
- Proficiency in Seurat, Bioconductor, SpatialData, Scanpy, and Squidpy frameworks for end-to-end analysis.
- Strong knowledge of:
  - Harmony, FastMNN, and RPCA for batch correction
  - Non-spatial (Leiden, Louvain) and spatial (BANKSY, SpaGCN) clustering algorithms
  - RCTD, Azimuth, and SingleR for label transfer and cell annotation
  - Cell-cell communication tools such as LIANA, CellChat, and CellPhoneDB
  - DESeq2 for differential expression analysis
  - Niche and pathway enrichment analysis (GSEA)
  - Visualization tools such as Loupe Browser, IGV, and UCSC Genome Browser
  - Major file formats such as BAM and Parquet
- Proficiency in creating high-quality visualizations using ggplot2 in R and data visualization libraries in Python (e.g., Matplotlib, Seaborn, Plotly).
- A public code portfolio (e.g., GitHub, GitLab) demonstrating relevant expertise.
- Additional experience analyzing bulk transcriptomics, proteomics, metabolomics, or cancer genomics data from next-generation sequencing platforms is a plus.

Computational Scientist I, Single-cell/Spatial Cancer Genomics, CGR


More jobs at Frederick National Laboratory for Cancer Research
