As a new year dawns in Seattle, we are witnessing an amazing convergence of biology, data science and technology.
I am convinced that concepts such as machine learning, deep neural networks, natural language processing and cloud computing will become increasingly familiar parts of the cancer conversation in 2019.
At Fred Hutchinson Cancer Research Center, I see how a new generation of computationally intensive technologies is cataloging the working molecular components of cells involved in cancer and other life-threatening illnesses. From DNA and RNA sequencing to digital imaging, we are gathering vast amounts of data about health and disease. Information on the genes, proteins and processes involved in cancer and the immune system can be mined for previously hidden patterns: clues that can lead to cures.
In 2001, it cost $95 million to sequence a single human genome. Today, it can be done for about $1,200. It took roughly 12 years to sequence the first human genome; now, it only takes a day or two. These kinds of economies are occurring in every facet of human biology, expanding the use of high-throughput studies and unleashing a deluge of data. The sheer volume of this potentially lifesaving information is mind-boggling.
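For readers who want to check that arithmetic, here is a minimal back-of-the-envelope sketch in Python. The two cost figures come straight from the paragraph above; the 18-year window (2001 to the time of writing) is the only added assumption.

```python
# Back-of-the-envelope check of the sequencing-cost figures quoted above.
cost_2001 = 95_000_000   # dollars per genome in 2001 (from the text)
cost_today = 1_200       # dollars per genome today (from the text)
years = 2019 - 2001      # assumed window, based on the article's date

fold_drop = cost_2001 / cost_today
annual_decline = 1 - (cost_today / cost_2001) ** (1 / years)

print(f"Cost fell roughly {fold_drop:,.0f}-fold")             # ~79,000-fold
print(f"That is about a {annual_decline:.0%} drop per year")  # ~47% per year
```

In other words, the price of sequencing a genome has roughly halved every year for nearly two decades.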
In the next decade, these data are going to transform cancer prevention, diagnosis and treatment. The American Cancer Society reported this week that the death rate from cancer in the United States has steadily declined over the past 25 years, thanks to a sharp drop in smoking and advances in early detection and treatment. This is a wonderful milestone and a compelling call to redouble our efforts, because we have reached an inflection point where immunotherapies and other curative approaches can make an even greater difference in the years to come.
When we talk of harnessing the immune system to fight cancer, much of what we need to know is discoverable in digital code from these next-generation lab tools. Our success depends on how well we learn to slice, store and interpret these data, which are rolling in by the terabyte.
To put that in perspective, a terabyte is roughly equivalent to a trillion keystrokes on a computer. It is enough data to stream 100 hours of high-definition movies to your television screen.
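A quick sketch makes those equivalences concrete. The one-byte-per-keystroke figure and the roughly 10 gigabytes per hour for high-bitrate HD video are my assumptions, chosen to match the analogies in the text.

```python
# Rough sanity check of the terabyte analogies above.
TERABYTE = 10**12                 # bytes (decimal terabyte)

bytes_per_keystroke = 1           # assumption: one byte per typed character
keystrokes = TERABYTE / bytes_per_keystroke     # ~1 trillion keystrokes

hd_bytes_per_hour = 10 * 10**9    # assumption: ~10 GB/hour of high-bitrate HD
hours_of_hd = TERABYTE / hd_bytes_per_hour      # ~100 hours

print(f"{keystrokes:.0e} keystrokes")           # ~1e+12
print(f"{hours_of_hd:.0f} hours of HD video")   # ~100
```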
And today, a single cancer patient can generate a terabyte of data.
What a challenge. What an opportunity.
To meet that challenge, we at Fred Hutch are fortunate to live among the world’s leading experts in collecting, moving and analyzing large data sets. For practical reasons, this sort of computational power is migrating to the cloud, where data can be stored securely and economically in massive systems run by our Seattle neighbors, including Microsoft and Amazon. There, those data can be pooled with comparable data sets from other research centers and analyzed with the kind of computational horsepower no single institution could afford on its own.
This confluence is happening right here, right now. Microsoft and Amazon are, of course, well-established in the neighborhood. And just a few hundred yards from our campus, Google is nearing completion of a Seattle headquarters that will focus on cloud computing.