A look at computational methods for analyzing large amounts of data with Raphael Gottardo

Vaccine and Infectious Disease Division

VIDD associate member Dr. Raphael Gottardo came to the Pacific Northwest from Lyon, France, to complete a PhD in Statistics at the University of Washington. He spent the last five years in Canada, as an Assistant Professor of Statistics at the University of British Columbia in Vancouver, and leading a group of computational biology researchers at the Institut de Recherches Cliniques de Montréal, before moving back to Seattle in August to join VIDD. Photo by Phil Meadows.

Biologists often face daunting volumes of numerical data and must find ways to sift through it before they can understand the results of their experiments. In some cases, scientists turn to statistics to interpret large sets of results. In other cases, the type of experiment requires some mathematical or computational processing before the results are even available. New VIDD associate member Dr. Raphael Gottardo is developing computational tools for the latter case: methods that will help scientists better extract answers from their experiments.

“We’re trying to develop new tools to help scientists better analyze their data and answer biological questions in these very large data sets,” Gottardo said.

Specifically, his group focuses mainly on flow cytometry, an experimental technique that can detect subtle differences in a large group of mixed cell types or other particles by passing the cells in liquid one at a time past a source of light and an instrument that measures how that light bends around each passing cell.  VIDD and other immunology researchers rely heavily on flow cytometry in much of their work, as it is a useful technique to learn about the myriad different types of immune cells present in a given patient’s blood. 

Often, these cells are first marked with different fluorescent dyes that react only with specific proteins, and then the flow cytometer reads how many cells in the overall sample have that protein present.  Modern techniques allow scientists to use many different dyes that react to different wavelengths of light, enabling them to concurrently detect the presence of many types of immune or other cells in a sample.  The problem comes when “gating” the results, or determining from the raw data the machine produces which groups of cells truly have or don’t have that fluorescent marker.  Where to place these cut-offs is not always straightforward, Gottardo said, and different scientists may gate the same experiment in different ways.
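To make the idea of a gating cut-off concrete, here is a minimal sketch in Python. It is not Gottardo's method, which the article does not describe in detail. It assumes a single fluorescent marker and places the cut-off halfway between the average intensity of the dim cells and the average intensity of the bright cells (a one-dimensional two-means split). The function name and the intensity values are made up for illustration.

```python
def two_means_cutoff(intensities, iterations=50):
    """Place a cut-off between dim and bright cells with 1-D two-means.

    Illustrative stand-in for an automated gate, not a published algorithm.
    """
    lo, hi = min(intensities), max(intensities)  # initial group centers
    for _ in range(iterations):
        cutoff = (lo + hi) / 2.0
        dim = [x for x in intensities if x <= cutoff]
        bright = [x for x in intensities if x > cutoff]
        if not dim or not bright:
            break  # only one group found; nothing to separate
        new_lo = sum(dim) / len(dim)
        new_hi = sum(bright) / len(bright)
        if new_lo == lo and new_hi == hi:
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


# Hypothetical readings: 300 unstained cells near 100, 200 stained cells near 1000.
negatives = [90 + i % 21 for i in range(300)]
positives = [900 + (i * 7) % 201 for i in range(200)]
readings = negatives + positives

cutoff = two_means_cutoff(readings)
positive_count = sum(1 for x in readings if x > cutoff)
print(f"cut-off: {cutoff:.1f}, cells called positive: {positive_count}")
```

Real samples are rarely this clean. When the dim and bright groups overlap, the cut-off depends on the rule used to place it, which is one reason two scientists can gate the same experiment differently.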

Gottardo came to VIDD in August from a faculty position in the Department of Statistics at the University of British Columbia and is new to the field of infectious diseases. At VIDD, Gottardo uses his computational background to develop novel tools and methods for high-throughput biological assays used in immunological and vaccine research, such as antigen microarrays and flow cytometry. One of his group’s main areas of focus is the development of automated algorithms that could be used to gate flow cytometry data. This approach is challenging in part because there is no precise, agreed-upon standard against which to judge the results. “It’s hard to know if the tool works, because there’s no agreement among the scientists on what the correct answer should be,” Gottardo said.

Additionally, since scientists using flow cytometry are traditionally used to gating their experiments by hand, some are resistant to trying out Gottardo’s computational methods.  However, as the technology advances, researchers are able to use many more types of dyes in a single experiment and search for many more types of cells, making the piles of resulting data much larger and the time to manually gate all the results longer.  Gottardo feels his methods could save scientists significant chunks of time.

To help convince scientists to try out his approach, Gottardo and colleagues from four other institutions are leading the Flow Cytometry Critical Assessment of Population Identification (FlowCAP) project. The goal of FlowCAP is to advance the development of computational methods for the identification of cell populations of interest in flow cytometry data. FlowCAP will provide the means to objectively test these methods, first by comparison to manual analysis by experts using common data sets, and second by comparison to synthetic data sets having known properties. Along with other FlowCAP committee members, Gottardo organized a special flow cytometry meeting this past September, sponsored by the National Institutes of Health. The meeting allowed the computational scientists to demonstrate to other researchers how their methods compared to manual gating. Gottardo and his colleagues collected many different flow cytometry data sets from past experiments and used the consensus of the manual gating on these sets to calibrate his computational tools, then allowed the flow cytometry experts to try their hand at gating the same sets in comparison to the automatic gating. The meeting was a success, Gottardo said. Many of the attendees were convinced that the computational methods were just as good as manual gating, but faster, and he drummed up so much interest in the methods that FlowCAP2 is already in the works for summer 2011.
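The comparison against a manual consensus can be sketched as follows. The article does not say which scoring rule the meeting used. This example assumes a majority vote across experts to form the consensus and an F-measure (the harmonic mean of precision and recall) to score the automated calls against it. All names and per-cell calls below are hypothetical.

```python
def consensus(gatings):
    """Majority vote across experts' per-cell calls (ties count as negative)."""
    return [sum(votes) * 2 > len(votes) for votes in zip(*gatings)]


def f_measure(manual, automated):
    """F-measure of automated positive calls against manual positive calls."""
    pairs = list(zip(manual, automated))
    tp = sum(1 for m, a in pairs if m and a)      # both call the cell positive
    fp = sum(1 for m, a in pairs if not m and a)  # only the algorithm does
    fn = sum(1 for m, a in pairs if m and not a)  # only the experts do
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical calls for six cells: True means "has the marker".
expert_a = [True, True, False, False, True, False]
expert_b = [True, False, False, False, True, False]
expert_c = [True, True, False, True, True, False]
automated = [True, True, False, False, False, False]

manual = consensus([expert_a, expert_b, expert_c])
score = f_measure(manual, automated)
print(f"agreement with manual consensus: {score:.2f}")
```

A score of 1.0 would mean the automated gate matches the consensus on every positive cell. Scoring against a consensus rather than a single expert softens the disagreement among individual scientists that Gottardo describes.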

Gottardo stresses that these computational methods may never surpass what a trained human can do by hand, but the overall goal is to make the scientists’ lives easier. 

“What I’m hoping is that we’ll get to the level where people will use our algorithms, and they will still have to sit in front of the computer, but will only have to review a handful of the samples,” Gottardo said.  “Hopefully this will save them a lot of time at the end of the day.”


For more information on Gottardo’s work, visit the Gottardo lab website.