University of Oregon, PhD (Mathematics)
University of Puget Sound, BS (Mathematics)
Tim Randolph is a statistician/mathematician who works with clinical, laboratory and public health scientists to analyze data from studies that collect a wide range of molecular measurements: genes, proteins, metabolites, microbes, biochemical markers and neuro-connectivity. Some analyses simply involve testing whether these measurements differ between two groups of people or samples — drug versus placebo, or healthy versus not. However, molecular data often comprise thousands of interacting measurements: genes interact with one another, genes code for proteins, enzymes regulate metabolite production, metabolites function together in pathways, and microbial communities (and their genes) influence many host molecular functions. These complex interactions — and their effects on human health — are the essence of many Fred Hutch scientific projects. Tim works with these scientists by applying appropriate statistical and machine-learning tools, or by developing his own methods, to aid their research.
My work focuses on mathematical and statistical methods to facilitate the analysis of data in clinical, laboratory and public health sciences research. These data often comprise many (thousands or more) measurements per sample, each representing the presence and/or activity of molecular functions. For example, patterns of gene expression may be predictive of disease; groups of metabolites may be indicative of a drug’s success; or communities of gut bacteria may be associated with health. A particularly interesting challenge is to help researchers understand how these varied types of data, when analyzed together, may reveal additional insights. The goal is to put all of this into a statistical framework that accounts for the many uncertainties in the data so that inferences can be made about relationships between molecular measurements and health or disease.
I collaborate with clinical and laboratory scientists who are using data, as described above, in studies on a wide range of topics: early detection of colorectal cancer; body fat and cancer risk; brain imaging studies of HIV-associated cognitive decline; personalizing drug dose in hematopoietic stem cell transplant recipients; and understanding why leukemia patients respond differently to the same drug.
Kernel Penalized Regression: This topic is motivated by the popular use of distance- and kernel-based association tests for analyzing multivariate data. Examples include investigating whether patterns of metabolite measures (and/or microbial abundances and/or gene expressions) are related to disease. Briefly, given n samples each with p measurements, an n-by-n (dis)similarity matrix can be formed to summarize relationships between the samples. These relationships may be plotted or further summarized to investigate whether the p measurements are associated with a disease or phenotype of the n individuals. The goal is to incorporate these structures into a high-dimensional linear regression model that selects variables (i.e., which metabolites, genes, etc.) that are associated with the outcome — with A Shojaie, S Zhao and others.
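The basic construction described above can be sketched in a few lines. This is a minimal illustration of the general idea — forming an n-by-n similarity (kernel) matrix from an n-by-p data matrix and regressing an outcome through it — not an implementation of the kernel penalized regression method itself; the Gaussian kernel, bandwidth, and function names are illustrative choices.

```python
import numpy as np

def gaussian_kernel(X, bandwidth=1.0):
    """n-by-n similarity matrix from pairwise Euclidean distances."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2 * bandwidth**2))

def kernel_ridge_fit(K, y, lam=1.0):
    """Fitted values from kernel ridge regression: K (K + lam I)^{-1} y."""
    n = K.shape[0]
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return K @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))   # n = 20 samples, p = 100 metabolite-like features
y = X[:, 0] + 0.1 * rng.normal(size=20)  # simulated outcome
K = gaussian_kernel(X, bandwidth=5.0)    # summarizes sample-to-sample relationships
yhat = kernel_ridge_fit(K, y, lam=0.5)
```

Note that the kernel matrix depends only on between-sample similarities, which is what makes this approach attractive when p is much larger than n; variable selection (which metabolites or genes drive the association) requires the additional structure the project above develops.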
PEER (Partially Empirical Eigenvectors for Regression): This family of projects is aimed at analyses of data having many (many) variables which may relate to one another based on spatial, temporal, biochemical or functional structure. These are extensions of penalized regression (such as ridge regression and the lasso) and provide a statistically tractable way of incorporating biological context into the process of estimation for an otherwise ill-posed (high-dimensional, underdetermined) problem. Applications include: longitudinally-sampled functions (e.g., time-course or spectroscopy data); genomic or metabolic networks; neuro-connectivity data; phylogenetic structure for microbiome data — with J Harezlak, MG Kundu, Z Feng, D Brzyski, M Karas and others.
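To illustrate how external structure can regularize an underdetermined regression, the sketch below uses a generalized ridge penalty built from a graph Laplacian, which encourages neighboring coefficients (e.g., adjacent time points in a sampled function) to be similar. This shows the general structured-penalty idea, not the PEER estimator itself; the path-graph structure and all names here are illustrative assumptions.

```python
import numpy as np

def chain_laplacian(p):
    """Graph Laplacian of a path graph: penalizes differences
    between neighboring coefficients (e.g., adjacent time points)."""
    L = np.zeros((p, p))
    for j in range(p - 1):
        L[j, j] += 1.0
        L[j + 1, j + 1] += 1.0
        L[j, j + 1] -= 1.0
        L[j + 1, j] -= 1.0
    return L

def structured_ridge(X, y, Q, lam=1.0):
    """beta_hat = argmin ||y - X beta||^2 + lam * beta' Q beta."""
    p = X.shape[1]
    # small diagonal term for numerical safety (Q has a nontrivial nullspace)
    return np.linalg.solve(X.T @ X + lam * Q + 1e-8 * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n, p = 30, 60                                  # underdetermined: p > n
beta_true = np.sin(np.linspace(0, np.pi, p))   # smooth "functional" signal
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.5 * rng.normal(size=n)
Q = chain_laplacian(p)
beta_hat = structured_ridge(X, y, Q, lam=10.0)
```

Swapping in a Laplacian derived from a genomic network or a phylogenetic tree, rather than a chain, is what lets the same machinery carry biological context into the estimation.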
TACOMA (Tissue Array Co-Occurrence Matrix Analysis): an accurate and interpretable open-source machine-learning algorithm for quantifying immunohistochemically-stained tissue images — with Donghui Yan and Pei Wang.
Functional principal components (FPC) for longitudinal HIV data and survival analysis: an effective statistical approach for exploiting longitudinal patterns in CD4 counts and viral load. In contrast to classical summaries of these data, FPC scores are able to extract enough information from CD4-count and viral-load trajectories to reveal an association between CD4 counts and survival — led by Dr S Holte; with J Baeten, J Ding, J Tien and J Overbaugh.
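For intuition about what an FPC score is, the sketch below computes scores for trajectories observed on a common time grid via an SVD of the centered data matrix. Real longitudinal CD4 data are sparse and irregularly sampled, which requires more elaborate machinery than this; the simulation and function names are illustrative assumptions.

```python
import numpy as np

def fpc_scores(Y, n_components=2):
    """FPC scores for curves observed on a common grid.
    Y: n-by-T matrix, one row per subject trajectory."""
    mu = Y.mean(axis=0)
    Yc = Y - mu                       # center around the mean curve
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    phi = Vt[:n_components]           # principal modes of variation
    scores = Yc @ phi.T               # one low-dimensional summary per subject
    return scores, phi, mu

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)
n = 40
# simulated CD4-like trajectories: subject-specific level and slope plus noise
a = rng.normal(500, 100, size=(n, 1))
b = rng.normal(-50, 30, size=(n, 1))
Y = a + b * t + 20 * rng.normal(size=(n, 50))
scores, phi, mu = fpc_scores(Y, n_components=2)
```

Each subject's trajectory is thereby reduced to a handful of scores, which can then enter a survival model in place of coarse summaries such as baseline CD4 count or slope.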
Sahale MS: Aimed at elucidating properties of the various methods of protein and/or peptide quantification (e.g., spectral counting or ion-abundance measures). This Java software (SahaleJ) provides computational methods to quantify peptide and protein abundance, and an R package (SahaleR) provides some basic statistical methods for comparative "shotgun" proteomics experiments — led by T Milac; with P Wang.