Faculty in the Statistical Learning and Data Science Hub advance statistical and machine learning methods tailored to the unique challenges of biomedical and epidemiologic data, including high-dimensionality, heterogeneity, and longitudinal structure. Their work spans supervised and unsupervised learning, causal inference with complex dependencies, and interpretable modeling for health prediction and individualized treatment. By developing scalable, robust, and reproducible algorithms, this Hub enables inference across diverse domains, from precision oncology and risk stratification to real-time decision support in clinical and public health settings.
Hub Faculty

Elizabeth Brown
Bayesian Biostatistics · Joint Modeling · Complex Data Analysis
Elizabeth Brown advances Bayesian methods and joint models for analyzing complex longitudinal and survival data. Her work supports the development of novel analytic approaches for vaccine trials and large-scale population studies.

Chongzhi Di
Real-World Data · Longitudinal Health Data · Functional Data Analysis
Chongzhi Di develops statistical learning methods for analyzing complex, high-dimensional, and real-time data in biomedical and behavioral research. His work focuses on wearable devices, accelerometry, and mobile-health applications, advancing methods for functional data analysis, causal inference, and adaptive study design to uncover links between physical activity, chronic disease, and health outcomes.

Youyi Fong
Machine Learning Methods · Bayesian Modeling · Complex Biomedical Data
Youyi Fong leads the Fong Group at Fred Hutch, developing machine‑learning, deep‑learning, threshold‑regression, and Bayesian methods to extract robust statistical inference from complex biomedical data, including immunologic assays, cell‑imaging, and protein‑sequence data.

Fei Gao
Clinical Trial Analysis · Public Health Epidemiology · HIV Incidence Estimation
Fei Gao develops statistical methods for biomedical and public health research, including semiparametric inference, causal inference frameworks, and survival analysis models. Her group, the Gao Lab, focuses on improving study design and inference for complex data structures.

Peter Gilbert
Causal Inference Methods · Immune Correlates Analysis · Vaccine Trial Statistics
Peter Gilbert develops statistical methods for vaccine trials, including immune-correlates evaluation, sieve analysis, causal inference, and survival and competing-risk models. The Gilbert Group focuses on rigorous inference for high-dimensional immunologic and pathogen-genetic data.

M. Elizabeth Halloran
Vaccine Study Methodology · Causal Inference · Infectious Disease Modeling
M. Elizabeth "Betz" Halloran, Professor Emeritus and member of the National Academy of Sciences, develops statistical and mathematical methods for infectious disease and vaccine studies. She leads the Center for Inference and Dynamics of Infectious Diseases (CIDID) at Fred Hutch, which translates methodological innovations into tools for outbreak and prevention research. Her work in causal inference and infectious disease modeling remains central to the statistical foundations of vaccine efficacy and outbreak dynamics.

Ying Huang
Biomarker Methodology · High‑Dimensional Data Analysis · Individualized Treatment Rules
Ying Huang leads the Huang Group where she develops efficient and robust statistical methods for biomarker studies. Her work focuses on selecting biomarkers from high-dimensional data, constructing individualized risk‑ and treatment‑selection rules, and improving inference for surrogate endpoints in clinical trials.

Holly Janes
Biomarker Evaluation · Risk Prediction Methods · Adaptive Trial Design
Holly Janes develops statistical methods to evaluate biomarkers and optimize treatment or prevention decisions in vaccine and infectious‑disease studies. Her work supports rigorous risk prediction, surrogate‑endpoint validation, and adaptive design strategies in prevention trials.

Jeff Leek
Statistical Learning · Reproducible Research · Scalable Data Tools
Jeff Leek develops methods to extract insight from large, noisy biomedical datasets. His work advances statistical learning, responsible AI, and open science through scalable tools and data resources such as the Fred Hutch Data Science Lab that promote rigorous, reproducible, and inclusive data analysis across biomedical and public health domains.

Jingyi Jessica Li
High-Dimensional Inference · Classification · False Discovery Rate Control · Simulation-Based Inference
Jingyi Jessica Li develops statistical and computational methods for high-dimensional biological data, with a focus on transcriptomics, single-cell and spatial omics, and multi-omics integration. Her work advances rigorous inference and reproducibility through novel frameworks for FDR control, simulation, and benchmarking, enabling robust discovery in genomic science through her Junction of Statistics and Biology Lab.

Jing Ma
Causal Inference · Semiparametric Inference · Network-Based Methods
Jing Ma develops statistical methods for estimating causal effects and analyzing complex biomedical data, particularly microbiome and high-dimensional -omics studies. Her research combines semiparametric inference with network-based models to uncover microbial associations, enhance biomarker discovery, and support reproducible integration of multi-scale biological data. Learn more at the Jing Ma Lab.

Ross Prentice
Survival Analysis Methodology · Case‑Cohort Design · Surrogate Endpoint Methods
Ross L. Prentice developed statistical methods that have shaped the analysis of survival data, measurement error, case‑cohort studies, surrogate endpoints, competing risks, and longitudinal data. As Professor Emeritus, his foundational work in survival analysis, case‑cohort design, and surrogate endpoint methodology continues to influence clinical trials, nutritional epidemiology, and prevention research.

Tim Randolph
High‑Dimensional Data Analysis · Multi‑Omic Integration · Machine Learning Methods
Tim Randolph leads the Randolph Lab, where he develops statistical and machine‑learning methods to analyze complex high‑dimensional molecular data, including genomics, proteomics, metabolomics, microbiome, and neuroimaging. His methodological work supports integrated multi‑omic analyses and enables robust inference from structured, high‑volume biological data.

Wei Sun
High-Dimensional Data Analysis · Statistical Learning · Statistical Integration
Wei Sun develops statistical and computational methods to understand the genetic and molecular basis of complex diseases. His research integrates multi-omic data (e.g., single-cell and spatial omics), imaging data, and statistical learning approaches, with applications spanning tumor immune microenvironments, cancer early detection, and defining gene functions from knockout studies. He is co‑principal investigator of the MorPhiC U01 project, a multi-institutional consortium that maps gene–phenotype associations using large-scale knockout and omic data, reinforcing his leadership in causal and integrative modeling. Learn more at the Sun Lab.

Michael C. Wu
Longitudinal Data Analysis · Clinical Trials · Translational Methods
Michael Wu develops statistical learning methods with a focus on microbiome analysis, -omics integration, and translational medicine. He leads The Wu Group, which advances methodological innovations in clinical trial design, longitudinal and spatial data modeling, and precision oncology applications, improving both biomedical discovery and patient care. Through his leadership of the Biostatistics Consulting and Collaboration Center (BC3), he also helps deploy cutting-edge methods across the Cancer Consortium, ensuring broad impact of data science innovations.

Qian (Vicky) Wu
Biomarker Modeling · Molecular Data Analysis · Trial Design Algorithms
Qian “Vicky” Wu leads development of statistical models and computational tools for biomarker, genetic, and molecular data analysis to predict treatment responses and optimize clinical trial design. Her work enables rigorous inference from complex molecular datasets and supports personalized medicine strategies in immunotherapy and oncology studies.

Lue Ping Zhao
Statistical Learning · High-Dimensional Data · Risk Modeling
Lue Ping Zhao develops statistical learning methodology for analyzing complex, high-dimensional genetic and genomic data arising from both translational and basic science research. His current work focuses on the immunogenetics of autoimmune type 1 diabetes, assessing potentially causal associations of DNA polymorphisms and functional amino acids with susceptibility to type 1 diabetes and with progression from seroconversion to disease onset, with the ultimate goal of discovering novel immunotherapeutic targets.

Yingqi Zhao
Dynamic Treatment Regimes · Causal Inference · Personalized Medicine
Yingqi Zhao develops statistical and machine learning methods to inform individualized treatment strategies in clinical and public health settings. Her Zhao Group advances causal inference, reinforcement learning, and adaptive trial design, providing rigorous frameworks for decision-making under uncertainty in precision oncology and chronic disease management.