Statistical Learning and Data Science Hub

Faculty in the Statistical Learning and Data Science Hub advance statistical and machine learning methods tailored to the unique challenges of biomedical and epidemiologic data, including high dimensionality, heterogeneity, and longitudinal structure. Their work spans supervised and unsupervised learning, causal inference with complex dependencies, and interpretable modeling for health prediction and individualized treatment. By developing scalable, robust, and reproducible algorithms, this Hub enables inference across diverse domains, from precision oncology and risk stratification to real-time decision support in clinical and public health settings.

Hub Faculty

Elizabeth Brown

Bayesian Biostatistics  ·  Joint Modeling  ·  Complex Data Analysis

Elizabeth Brown advances Bayesian methods and joint models for analyzing complex longitudinal and survival data. Her work supports the development of novel analytic approaches for vaccine trials and large-scale population studies.

Chongzhi Di

Real-World Data  ·  Longitudinal Health Data  ·  Functional Data Analysis

Chongzhi Di develops statistical learning methods for analyzing complex, high-dimensional, and real-time data in biomedical and behavioral research. His work focuses on wearable devices, accelerometry, and mobile-health applications, advancing methods for functional data analysis, causal inference, and adaptive study design to uncover links between physical activity, chronic disease, and health outcomes.

Youyi Fong

Machine Learning Methods  ·  Bayesian Modeling  ·  Complex Biomedical Data

Youyi Fong leads the Fong Group at Fred Hutch, which develops machine-learning, deep-learning, threshold-regression, and Bayesian methods to extract robust statistical inference from complex biomedical data, including immunologic assay, cell-imaging, and protein-sequence data.

Fei Gao

Clinical Trial Analysis  ·  Public Health Epidemiology  ·  HIV Incidence Estimation

Fei Gao develops statistical methods for biomedical and public health research, including semiparametric inference, causal inference frameworks, and survival analysis models. Her group, the Gao Lab, focuses on improving study design and inference for complex data structures.

Peter Gilbert

Causal Inference Methods  ·  Immune Correlates Analysis  ·  Vaccine Trial Statistics

Peter Gilbert develops statistical methods for vaccine trials, including immune-correlates evaluation, sieve analysis, causal inference, and survival and competing-risk models. The Gilbert Group focuses on rigorous inference for high-dimensional immunologic and pathogen-genetic data.

M. Elizabeth Halloran

Vaccine Study Methodology  ·  Causal Inference  ·  Infectious Disease Modeling

M. Elizabeth "Betz" Halloran, Professor Emeritus and member of the National Academy of Sciences, develops statistical and mathematical methods for infectious disease and vaccine studies. She leads the Center for Inference and Dynamics of Infectious Diseases (CIDID) at Fred Hutch, which translates methodological innovations into tools for outbreak and prevention research. Dr. Halloran’s innovations in causal inference and infectious disease modeling remain central to the statistical foundations of vaccine efficacy and outbreak dynamics.

Ying Huang

Biomarker Methodology  ·  High‑Dimensional Data Analysis  ·  Individualized Treatment Rules

Ying Huang leads the Huang Group where she develops efficient and robust statistical methods for biomarker studies. Her work focuses on selecting biomarkers from high-dimensional data, constructing individualized risk‑ and treatment‑selection rules, and improving inference for surrogate endpoints in clinical trials.

Holly Janes

Biomarker Evaluation  ·  Risk Prediction Methods  ·  Adaptive Trial Design

Holly Janes develops statistical methods to evaluate biomarkers and optimize treatment or prevention decisions in vaccine and infectious‑disease studies. Her work supports rigorous risk prediction, surrogate‑endpoint validation, and adaptive design strategies in prevention trials.

Jeff Leek

Statistical Learning  ·  Reproducible Research  ·  Scalable Data Tools

Jeff Leek develops methods to extract insight from large, noisy biomedical datasets. His work advances statistical learning, responsible AI, and open science through scalable tools, data resources, and the Fred Hutch Data Science Lab, promoting rigorous, reproducible, and inclusive data analysis across biomedical and public health domains.

Jingyi Jessica Li

High-Dimensional Inference  ·  Classification  ·  False Discovery Rate Control  ·  Simulation-Based Inference

Jingyi Jessica Li develops statistical and computational methods for high-dimensional biological data, with a focus on transcriptomics, single-cell and spatial omics, and multi-omics integration. Her work advances rigorous inference and reproducibility through novel frameworks for FDR control, simulation, and benchmarking, enabling robust discovery in genomic science through her Junction of Statistics and Biology Lab.

Jing Ma

Causal Inference  ·  Semiparametric Inference  ·  Microbiome Data Analysis

Jing Ma develops statistical methods for estimating causal effects and analyzing complex biomedical data, particularly from microbiome and high-dimensional -omics studies. Her research combines semiparametric inference with network-based models to uncover microbial associations, enhance biomarker discovery, and support reproducible integration of multi-scale biological data. Learn more at the Jing Ma Lab.

Ross Prentice

Survival Analysis Methodology  ·  Case‑Cohort Design  ·  Surrogate Endpoint Methods

Ross L. Prentice developed statistical methods that have shaped survival analysis, measurement-error modeling, case-cohort study design, surrogate endpoints, competing risks, and longitudinal data analysis. These methodological innovations remain foundational across clinical trials, nutritional epidemiology, and prevention research, and as Professor Emeritus he continues to influence the development of statistical tools across biomedical research.

Tim Randolph

High‑Dimensional Data Analysis  ·  Multi‑Omic Integration  ·  Machine Learning Methods

Tim Randolph leads the Randolph Lab, where he develops statistical and machine‑learning methods to analyze complex high‑dimensional molecular data, including genomics, proteomics, metabolomics, microbiome, and neuroimaging. His methodological work supports integrated multi‑omic analyses and enables robust inference from structured, high‑volume biological data.

Wei Sun

High-Dimensional Data Analysis  ·  Statistical Learning  ·  Multi-Omic Data Integration

Wei Sun develops statistical and computational methods to understand the genetic and molecular basis of complex diseases. His research integrates multi-omic data (e.g., single-cell and spatial omics), imaging data, and statistical learning approaches, with applications spanning tumor immune microenvironments, cancer early detection, and defining gene functions from knockout studies. He is co-principal investigator of the MorPhiC U01 project, a multi-institutional consortium that maps gene–phenotype associations using large-scale knockout and omic data, reinforcing his leadership in causal and integrative modeling. Learn more at the Sun Lab.

Michael C. Wu

Longitudinal Data Analysis  ·  Clinical Trials  ·  Translational Methods

Michael Wu develops statistical learning methods with a focus on microbiome analysis, -omics integration, and translational medicine. He leads The Wu Group, which advances methodological innovations in clinical trial design, longitudinal and spatial data modeling, and precision oncology applications, improving both biomedical discovery and patient care. Through his leadership of the Biostatistics Consulting and Collaboration Center (BC3), he also helps deploy cutting-edge methods across the Cancer Consortium, ensuring broad impact of data science innovations. 

Qian (Vicky) Wu

Biomarker Modeling  ·  Molecular Data Analysis  ·  Trial Design Algorithms

Qian “Vicky” Wu leads development of statistical models and computational tools for biomarker, genetic, and molecular data analysis to predict treatment responses and optimize clinical trial design. Her work enables rigorous inference from complex molecular datasets and supports personalized medicine strategies in immunotherapy and oncology studies.

Lue Ping Zhao

Statistical Learning  ·  High-Dimensional Data  ·  Risk Modeling

Lue Ping Zhao develops statistical learning methodology for analyzing complex, high-dimensional genetic and genomic data arising from both translational and basic science research projects. His current work focuses on the immunogenetics of autoimmune type 1 diabetes, assessing potentially causal associations of DNA polymorphisms and functional amino acids with susceptibility to type 1 diabetes and with progression from seroconversion to disease onset, with the ultimate goal of discovering novel immunotherapeutic targets.

Yingqi Zhao

Dynamic Treatment Regimes  ·  Causal Inference  ·  Personalized Medicine

Yingqi Zhao develops statistical and machine learning methods to inform individualized treatment strategies in clinical and public health settings. Her Zhao Group advances causal inference, reinforcement learning, and adaptive trial design, providing rigorous frameworks for decision-making under uncertainty in precision oncology and chronic disease management.