A look at the statistics of outliers with Youyi Fong

Vaccine and Infectious Disease Division

Dr. Youyi Fong

Assistant member Dr. Youyi Fong joined VIDD in July 2010, after completing a PhD in Biostatistics from the University of Washington, where he worked with biostatistics professor Dr. Jon Wakefield. Fong is interested in improving statistical methods for biological assays used in HIV vaccine trials.

In HIV vaccine trials, and in large clinical trials in general, there is often little room for error.  In a long-term, expensive study such as a large vaccine trial, scientists can address only a limited number of questions using the blood samples or other data collected from study volunteers.  Experiments must therefore be planned precisely, and methods for dealing with outlying data points are especially important.  Such outliers deviate markedly from other data points and fall beyond the expected variability of an experiment.  New VIDD assistant member Dr. Youyi Fong is working on statistical methods to deal with these outliers in experiments that cannot be repeated.

Fong is specifically looking at improving statistical methods to deal with outliers for data from Luminex experiments, a type of assay that simultaneously measures concentrations of many different molecules in blood (or other) samples.  This type of assay is essential in learning more about the biological mechanisms of an effective or non-effective vaccine.  Following on the partial success of the Thai trial, many HIV vaccine researchers are hard on the hunt for a “correlate of protection,” the presence or increased amount of a certain molecule in the blood of vaccine recipients who were protected from HIV infection.  Identifying such a correlate would help improve vaccine design and speed up future vaccine trials, as researchers could look for the presence of that molecule rather than wait years to study the vaccine’s effects on infection rates.

Fong’s approaches focus on outlying data points in the “standards” used in Luminex assays.  In each experiment, researchers determine the concentrations of a number of different molecules in blood samples from study volunteers by comparing the readings to a standard curve made from a set of samples with known concentrations.  These standards are essential to determining the amount of a given molecule present in the blood samples.  Fong is devising robust statistical methods, which are less affected by deviations from the assumed model or curve, to better handle gross outliers should they occur within the standard curves.
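To make the role of the standards concrete, here is a minimal sketch in Python of fitting a standard curve and reading an unknown sample off it. It is an illustration only, not Fong's method: it assumes a straight-line relationship between log concentration and log signal, whereas real immunoassay standard curves are usually sigmoidal. The function names and all numbers are made up for the example.

```python
import math

def fit_standard_curve(concentrations, signals):
    """Least-squares line through (log10 concentration, log10 signal).

    Assumption for illustration: the curve is linear on the log-log scale.
    """
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def estimate_concentration(signal, intercept, slope):
    """Invert the fitted curve: from an observed signal back to a concentration."""
    return 10 ** ((math.log10(signal) - intercept) / slope)

# Hypothetical standards: known concentrations and their measured signals.
standard_conc = [1.0, 10.0, 100.0, 1000.0]
standard_signal = [20.0, 200.0, 2000.0, 20000.0]

intercept, slope = fit_standard_curve(standard_conc, standard_signal)

# A volunteer sample with an observed signal of 630 is mapped to a concentration.
unknown = estimate_concentration(630.0, intercept, slope)
print(round(unknown, 2))
```

Because every unknown sample is read off the same fitted curve, an error in one standard propagates to all the concentration estimates from that experiment.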

“You can do a good job with existing methods, but what justifies the investment in improving the statistical method is that the blood samples are a limiting resource in this project,” Fong said.  “Luminex is a cutting edge technology because now we can measure 40-100 things from very small samples, which means if something happens, if there is an outlier, the cost of not doing anything about it is higher.  We don’t want to waste the patient samples, those are very precious.”

The problem with outliers, Fong said, is that a standard data point’s influence on the standard curve increases as its value increases, meaning that if an error produces an outlying point that is too high, that point will have a very large and incorrect influence on the curve.  His methods use a novel type of distribution model for measurement error, called a correlated mixture Gaussian distribution, that takes into account the mixture of different types of data points present in the population being modeled.  Also critical to the success of the method is a probability-based framework that incorporates information known about the standard curves from previous experiments.
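The general idea of a mixture error model can be sketched in a few lines of Python. This is not Fong's correlated mixture Gaussian model, and it ignores the prior information from previous experiments. It is a simpler stand-in that assumes each measurement error comes either from a narrow Gaussian (an ordinary point) or from a much wider Gaussian (a gross outlier), and it refits a straight line with weights based on how likely each point is to be ordinary. The data, the parameter values, and the function names are all hypothetical.

```python
import math

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def normal_pdf(r, sd):
    return math.exp(-0.5 * (r / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_line_fit(x, y, sd=0.1, inflate=10.0, p_out=0.1, n_iter=50):
    """Line fit under a two-component Gaussian mixture for the errors.

    Assumed values (illustrative only): ordinary errors have standard
    deviation `sd`, outlier errors have `sd * inflate`, and a fraction
    `p_out` of points are outliers.
    """
    w = [1.0] * len(x)  # first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line_fit(x, y, w)
        w = []
        for xi, yi in zip(x, y):
            r = yi - (a + b * xi)
            good = (1 - p_out) * normal_pdf(r, sd)
            bad = p_out * normal_pdf(r, sd * inflate)
            post_good = good / (good + bad)  # probability the point is ordinary
            # Expected precision relative to an ordinary point.
            w.append(post_good + (1 - post_good) / inflate ** 2)
    return a, b

# Made-up standards on a log scale: true line y = 1 + 2x, with the
# highest point pushed up by 3 to mimic a gross outlier.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 14.0]

ols_intercept, ols_slope = weighted_line_fit(x, y, [1.0] * len(x))
mix_intercept, mix_slope = mixture_line_fit(x, y)
print(ols_slope, mix_slope)
```

In this toy data set the single high point pulls the ordinary least-squares slope well above the true value of 2, while the mixture fit assigns that point a small weight and stays close to 2.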

Fong’s statistical approaches to Luminex data connect with his doctoral work in the Department of Biostatistics at the University of Washington.  There, Fong studied statistical ways to classify proteins into subfamilies based on their structure, function and sequence.  Both his doctoral work and the Luminex project involve making inferences about the composition of a mixed population, which is no easy task because it is not known which data points belong to which subpopulations.  Fong is working on extending a novel algorithm he devised for the protein work so that it can be applied to the Luminex data.  He was drawn to join VIDD because he saw that vaccines in general, and HIV vaccines in particular, were a great societal need that would spawn large technological advances, including big advances in statistics.

“I’m here because vaccines are an urgent problem, so there are a lot of people working on them, and we all see a need for developing better statistical methods,” Fong said.  “Statistical methods for vaccines are not that different from methods in cancer, for example.  These models can be applied in multiple fields.  Here, vaccines are the driver for the statistical methods.”

For more about Fong’s work, see: http://labs.fhcrc.org/fong/index.html