Statistical Learning and Data Science Hub

Faculty develop methods for structured and unstructured biomedical data that advance statistical inference, machine learning, causal inference, and algorithmic modeling. Their work delivers principled uncertainty quantification (e.g., confidence intervals, Bayesian posteriors, conformal risk control), supporting individualized prediction and robust data integration in observational, population health, and translational research settings.

Hub Faculty

Photo of Elizabeth Brown

Elizabeth Brown

Bayesian Biostatistics  ·  Joint Modeling  ·  Complex Data Analysis

Elizabeth Brown advances Bayesian methods and joint models for analyzing complex longitudinal and survival data. Her work supports the development of novel analytic approaches for vaccine trials and large-scale population studies.

Photo of Chongzhi Di

Chongzhi Di

Real-World Data  ·  Mobile and Wearable Devices  ·  Longitudinal Health Data  ·  Functional Data Analysis

Chongzhi Di develops statistical learning methods for analyzing complex, high-dimensional, and real-time data in biomedical and behavioral research. His work focuses on wearable devices, accelerometry, and mobile-health applications, advancing methods for functional data analysis, causal inference, and adaptive study design to uncover links between physical activity, chronic disease, and health outcomes.

Photo of Youyi Fong

Youyi Fong

Deep Learning Methods  ·  Threshold Regression  ·  Complex Biomedical Data

Youyi Fong leads the Fong Group at Fred Hutch, which develops deep learning, threshold regression, Bayesian methods, and efficient study sampling designs to draw robust statistical inferences from complex biomedical data, including immunologic assays, cell‑imaging, and protein‑sequence data.

Photo of Fei Gao

Fei Gao

Causal Inference  ·  Public Health Epidemiology  ·  HIV Incidence Estimation

Fei Gao develops statistical methods for biomedical and public health research, including semiparametric inference, causal inference frameworks, and survival analysis models. Her group, the Gao Lab, focuses on improving methodological strategies and inference for complex data structures in epidemiological research.

Photo of Peter Gilbert

Peter Gilbert

Causal Inference Methods  ·  Immune Correlates/Surrogate Endpoint Analysis  ·  Vaccine Trial Statistics

Peter Gilbert develops statistical methods for vaccine and monoclonal antibody clinical trials, including immune correlates evaluation, sieve analysis, causal inference, and survival analysis models. The Gilbert Group focuses on rigorous inference for high-dimensional immunologic and pathogen-genetic data.

Photo of Elizabeth Halloran

M. Elizabeth Halloran

Vaccine Study Methodology  ·  Causal Inference  ·  Infectious Disease Modeling

M. Elizabeth "Betz" Halloran, Professor Emeritus and member of the National Academy of Sciences, develops statistical and mathematical methods for infectious disease and vaccine studies. She leads the Center for Inference and Dynamics of Infectious Diseases (CIDID) at Fred Hutch, which translates methodological innovations into tools for outbreak and prevention research. Her innovations in causal inference and infectious disease modeling remain central to the statistical foundations of vaccine efficacy evaluation and outbreak dynamics.

Photo of Ying Huang

Ying Huang

Biomarker Methodology  ·  High‑Dimensional Data Analysis  ·  Individualized Treatment Rules

Ying Huang leads the Huang Group, where she develops efficient and robust statistical methods to advance biomarker research. Her work centers on selecting biomarkers from high-dimensional data, constructing individualized risk- and treatment-selection rules, and improving inference for surrogate endpoints in clinical trials.

Photo of Holly Janes

Holly Janes

Biomarker Evaluation  ·  Risk Prediction Methods  ·  Adaptive Trial Design

Holly Janes develops statistical methods to evaluate biomarkers and optimize treatment or prevention decisions in vaccine and infectious‑disease studies. Her work supports rigorous risk prediction, surrogate‑endpoint validation, and adaptive design strategies in prevention trials.

Photo of Jeff Leek

Jeff Leek

Statistical Learning  ·  Reproducible Research  ·  Scalable Data Tools

Jeff Leek develops methods to extract insight from large, noisy biomedical datasets. His work advances statistical learning, responsible AI, and open science through scalable tools and data resources, and through the Fred Hutch Data Science Lab, which promotes rigorous, reproducible, and inclusive data analysis across biomedical and public health domains.

Photo of Jingyi Jessica Li

Jingyi Jessica Li

High-Dimensional Inference  ·  Classification  ·  False Discovery Rate Control  ·  Simulation-Based Inference

Jingyi Jessica Li develops statistical and computational methods for high-dimensional biological data, with a focus on transcriptomics, single-cell and spatial omics, and multi-omics integration. Her work advances rigorous inference and reproducibility through novel frameworks for FDR control, simulation, and benchmarking, enabling robust discovery in genomic science. Learn more at the Junction of Statistics and Biology Lab.

Photo of Jing Ma

Jing Ma

Causal Inference  ·  Semiparametric Inference  ·  Microbiome Data Analysis

Jing Ma develops statistical methods for estimating causal effects and analyzing complex biomedical data, particularly microbiome and high-dimensional -omics studies. Her research combines semiparametric inference with network-based models to uncover microbial associations, enhance biomarker discovery, and support reproducible integration of multi-scale biological data. Learn more at the Jing Ma Lab.

Photo of Ross Prentice

Ross Prentice

Survival Analysis Methodology  ·  Case‑Cohort Design  ·  Surrogate Endpoint Methods

Ross L. Prentice developed statistical methods that have shaped the analysis of survival data, measurement error, case‑cohort studies, surrogate endpoints, competing risks, and longitudinal data. As Professor Emeritus, his methodological innovations remain foundational to clinical trials, nutritional epidemiology, and prevention research, and continue to shape the development of statistical tools across biomedical research.

Photo of Tim Randolph

Tim Randolph

High‑Dimensional Data Analysis  ·  Multi‑Omic Integration  ·  Machine Learning Methods

Tim Randolph leads the Randolph Lab, where he develops statistical and machine‑learning methods to analyze complex high‑dimensional molecular data, including genomics, proteomics, metabolomics, microbiome, and neuroimaging. His methodological work supports integrated multi‑omic analyses and enables robust inference from structured, high‑volume biological data.

Photo of Wei Sun

Wei Sun

High-Dimensional Data Analysis  ·  Statistical Learning  ·  Data Integration

Wei Sun develops statistical and computational methods to understand the genetic and molecular basis of complex diseases. His research integrates multi-omic data (e.g., single-cell and spatial omics), imaging data, and statistical learning approaches, with applications spanning tumor immune microenvironments, cancer early detection, and defining gene functions from knock-out studies. He is co‑principal investigator of the MorPhiC U01 project, a multi-institutional consortium that maps gene–phenotype associations using large-scale knockout and omic data. Learn more at the Sun Lab.

Photo of Mike Wu

Michael C. Wu

Longitudinal Data Analysis  ·  Clinical Trials  ·  Translational Methods

Michael Wu develops statistical learning methods with a focus on microbiome analysis, -omics integration, and translational medicine. He leads The Wu Group, which advances methodological innovations in clinical trial design, longitudinal and spatial data modeling, and precision oncology applications, improving both biomedical discovery and patient care. Through his leadership of the Biostatistics Consulting and Collaboration Center (BC3), he also helps deploy cutting-edge methods across the Cancer Consortium, ensuring broad impact of data science innovations. 

Photo of Vicky Wu

Qian (Vicky) Wu

Biomarker Modeling  ·  Statistical Genetics  ·  Statistical Software

Qian “Vicky” Wu leads the development of statistical models and computational tools for biomarker, genetic, and molecular data analysis, including serum cytokine data, immunohistochemical (IHC) data, epigenetic assays (CUT&RUN, ChIP-seq, DNase-seq), GWAS, RNA-seq, gene regulatory networks, and copy number variant (CNV) analysis. Much of her work relates to CAR-T immunotherapy: she is the lead biostatistician for more than 20 CAR-T trials running at Fred Hutch and Seattle Children’s. She has also developed several R tools, including TrialSize, CNVtest, ChIPtest, Spacelog, HDI_Shiny, TVCurves, and SampleN.

Photo of Lue Ping Zhao

Lue Ping Zhao

Statistical Learning  ·  High-Dimensional Data  ·  Risk Modeling

Lue Ping Zhao develops statistical learning methodology for analyzing complex, high-dimensional genetic and genomic data arising from both translational and basic science research projects. His current work focuses on the immunogenetics of autoimmune type 1 diabetes, assessing potentially causal associations of DNA polymorphisms and functional amino acids with susceptibility to type 1 diabetes and with progression from seroconversion to disease onset, with the ultimate goal of discovering novel immunotherapeutic targets.

Photo of Yingqi Zhao

Yingqi Zhao

Dynamic Treatment Regimes  ·  Causal Inference  ·  Personalized Medicine

Yingqi Zhao develops statistical and machine learning methods to inform individualized treatment strategies in clinical and public health settings. Her Zhao Group advances causal inference, reinforcement learning, and adaptive trial design, providing rigorous frameworks for decision-making under uncertainty in precision oncology and chronic disease management.