Data Scientist
Job Description
Job Summary:
The NeuroAI Lab at UCSF, led by Assistant Professor Pedro Pinheiro-Chagas, invites applications for a Data Scientist to support and advance our research initiatives at the intersection of AI and cognitive neuroscience. The successful candidate will contribute to a multidisciplinary team developing cutting-edge AI systems for the discovery, diagnosis, and care of neurodegenerative diseases. The Data Scientist will work across the lab’s core research areas, including: (1) foundation AI models that leverage large-scale clinical databases to identify disease subtypes and risk factors; (2) agentic AI decision support systems that synthesize multimodal patient data for specialist-level diagnoses; and (3) AI conversational agents for scalable clinical assessments.
The role involves designing and maintaining robust data pipelines, curating and analyzing complex multimodal datasets (clinical notes, neuropsychological assessments, neuroimaging), developing and evaluating machine learning models, and contributing to research publications. The ideal candidate will have strong programming skills, experience with both generative AI and traditional ML algorithms, a passion for translational research, and the ability to work effectively across disciplines. This position offers exceptional opportunities for professional growth at the forefront of health AI, with direct exposure to world-class clinical collaborators and the potential to impact real-world patient care.
Department Description:
The NeuroAI Lab operates at the UCSF Memory and Aging Center, Department of Neurology, Weill Institute for Neurosciences. We are also affiliated with the Bakar Computational Health Sciences Institute and the Center for Intelligent Imaging (ci2).
% of time | Key Responsibilities (To be completed by Supervisor) | Essential Function (Yes/No) |
20% | Plans long-term statistical studies, including preparing proposals, designing survey instruments, and determining sampling procedures. | |
20% | Gathers, analyzes, prepares, and summarizes information and data; recommends statistical approaches, trends, sources, and uses. | |
5% | Prepares data for presentation. | |
10% | Identifies multivariate strategies. | |
5% | Prepares study reports for internal validation and cross-validation studies. | |
5% | Analyzes the interrelationships of data and defines logical aspects of data sets. | |
10% | Develops systems for organizing data to analyze, identify and report trends. | |
10% | Manages database of research data for projects. | |
5% | Reviews new software instruments and potential effects on statistical testing. May make programming modifications. | |
5% | Participates in development and implementation of data security policies and procedures. | |
5% | Keeps abreast of technical advances in storage, documentation and dissemination of computerized data. | |
100% | Total | |
Required Qualifications:
- Bachelor’s degree in a related area and/or equivalent experience or training.
- Proficiency in Python, with demonstrated experience in data analysis, machine learning, and statistical modeling.
- Experience building and maintaining data pipelines, integrating APIs, and working with large-scale datasets.
- Familiarity with version control (Git) and collaborative software development practices.
- Strong analytical and problem-solving skills, with the ability to work independently and manage multiple projects.
- Excellent written and verbal communication skills.
Preferred Qualifications:
- Master’s or Ph.D. in Computer Science, Data Science, Biomedical Informatics, Computational Neuroscience, Biostatistics, or a related quantitative field.
- Experience with large language models (LLMs).
- Familiarity with clinical, cognitive, or health-related datasets (e.g., electronic health records, neuroimaging, neuropsychological assessments).
- Knowledge of data security, privacy, and regulatory standards for clinical applications (e.g., HIPAA compliance).
- Experience with deep learning frameworks (PyTorch, TensorFlow) and cloud computing platforms.
- Track record of peer-reviewed publications or contributions to open-source projects.
- Interest in or experience with agentic AI architectures, retrieval-augmented generation (RAG), or multi-agent systems.
- Ability to thrive in a fast-paced, multidisciplinary research environment.