Featured Researchers

Paul Fearn, biomedical informatics and natural language processing

Finding the stories in medical data that help researchers forecast future health care

By Dr. Sabrina Richards

The path to finding the best treatments for patients may be found by weaving together even the smallest details and data about an individual, says Paul Fearn.

“Data is a story. Medical data is your health story that plays out over time,” explains Fearn, leader of the Hutch Integrated Data Repository and Archive (HIDRA) project at Fred Hutch and the Cancer Consortium.

Fearn is working out how to make these stories intelligible to researchers. He is creating a database of clinical records, treatment outcomes and even tissue sample data from thousands of patients, which researchers can stitch into a coherent tale to help them devise the best treatments for every cancer patient.

The Fred Hutch and University of Washington Cancer Consortium is “the best place to build something like this,” says Fearn, who is a graduate student in Biomedical and Health Informatics at the University of Washington. Beyond its desire to rethink how cancer research is done, the Consortium is also just the right size: large enough to have the patient data needed to underpin quality studies, but small enough that researchers across disciplines can collaborate on a database integration project the size of HIDRA.

HIDRA will integrate information from cancer care databases across Fred Hutch, the University of Washington, Seattle Cancer Care Alliance, and Seattle Children’s Hospital into one tool available to both clinicians and researchers. A pilot project dubbed Argos, which focuses on brain tumors, will go live this summer. HIDRA will be fully operational and available to researchers in any discipline in 2017.

Despite his enthusiasm for tackling such a project, Fearn’s interest in biomedical informatics was captured almost by happenstance. On the cusp of graduating from the University of Houston with a Spanish degree, Fearn took a last-semester class on cancer and aging, and changed his goals completely.

“I’d never heard of public health,” Fearn remembers, but the class was enough to inspire him to enter the University of Texas’ Department of Public Health graduate program, despite his liberal arts background. He needed two calculus classes, one linear algebra class, and self-study during the summer to “make it through the program’s classes.”

But two years of analyzing data left Fearn more interested in how data sets are stored before researchers can draw on them. This led him to “jump out” of his graduate program and into another self-teaching opportunity: building databases for medical researchers.

Initially relying on a book on “databases for dummies,” he began learning everything he could about building databases to help researchers house and analyze their data. Fearn’s statistical background and experience wading through public health data sets stood him in good stead, he remembers. “I knew what data needed to be in order to be analyzed.”

Fearn wanted databases to enable researchers to build predictive models of cancer to help physicians and patients weigh their treatment options. This culminated in the development of Caisis, an open-source database management system Fearn created with Mike Kattan at Memorial Sloan-Kettering Cancer Center in New York to help scientists see the health care stories in patients’ medical data — and use these stories to make better medical decisions.

But even after 10 years building databases at MSKCC, Fearn felt the need to learn more, and decided to shore up his education with a formal degree in biomedical informatics. Fearn landed at the University of Washington and Fred Hutch (an early adopter of Caisis) after a whirlwind tour of almost 70 cancer centers, which he visited to learn more about “the big trends” in biomedical informatics. “I wanted to see the whole landscape,” he says.

His studies focus on natural language processing, a method by which computers can pull information from patient records, such as the chart notes physicians write during treatment, so that it can be integrated with many other types of data.

The Cancer Consortium at Fred Hutch and the University of Washington is the perfect launching pad for a project the size and scope of HIDRA, says Fearn. HIDRA cannot be built without drawing on a wide variety of expertise, including statistics, medicine, genomics and computer science, all of which can be found here. When completed, HIDRA will be able to combine every kind of data, from information written by clinicians to the molecular data that geneticists pull from tissue samples, and will solidify Fred Hutch’s status as a world-class cancer center.

Fred Hutch “is a place where I can learn, but also contribute to making one of the top cancer centers in the world,” says Fearn.

Paul Fearn

Photo by Robert Hood

HIDRA - Hutch Integrated Data Repository and Archive