Hutch Data Commonwealth

Frequently Asked Questions

General

What is big data?

The term “big data” refers to large, complex data sets, ranging from genomics to social media, that are extremely high-dimensional and may include hundreds or even thousands of predictors. Such data sets can be unstructured and messy, and they present challenges for traditional data processing and analytic procedures, which may not scale to data of this size and complexity.

Explain data-driven science.

Data-driven science is science in which data is front and center. The scientific questions and hypotheses are either developed around a specific data resource, or a new data resource is developed so that the scientific questions can be addressed. With the evolving universe of biomedical big data, newly available data resources are enabling scientists to formulate novel scientific questions that might not have been imaginable previously.

What is translational research?

Translational research is a term originally coined to describe the movement of scientific findings from basic, preclinical experiments through to clinical and population studies. In the case of biomedical big data, translational research refers to the use of findings from predictive modeling to improve or enhance clinical care or population health.

What is population science?

Population science is the study of how behaviors, environment, and patterns of care affect population health. In the case of cancer, population science is the study of what causes cancer, and how interventions to prevent, detect and treat cancer affect patterns of disease incidence and mortality.

How is molecular research defined?

We use the term “molecular research” to refer to studies in which molecular-level data (such as genomic profiles, RNA sequencing or methylation data) are used to predict cancer outcomes.
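For illustration only, the minimal sketch below shows what a simple version of such a study can look like: a penalized regression model trained on a gene-expression-style matrix to predict a binary cancer outcome. The data are simulated, the scikit-learn library is assumed, and none of the names or numbers refer to an actual HDC project.

# Hypothetical sketch (not an HDC tool): predicting a binary cancer outcome
# from a simulated gene-expression matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated data: 200 patients x 5,000 genes (a stand-in for RNA-seq profiles).
n_patients, n_genes = 200, 5000
X = rng.normal(size=(n_patients, n_genes))

# Simulated outcome (e.g., recurrence yes/no) driven by a handful of genes.
informative = rng.choice(n_genes, size=10, replace=False)
logits = X[:, informative].sum(axis=1)
y = (logits + rng.normal(scale=0.5, size=n_patients) > 0).astype(int)

# L1-penalized logistic regression handles the "more predictors than patients"
# setting that is typical of molecular data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")

Real molecular studies involve far more careful preprocessing, validation and interpretation; the sketch is only meant to make the idea of “predicting outcomes from molecular-level data” concrete.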

Why is creating a data hub so important?

The world of biomedical big data is changing quickly, and new data resources are constantly becoming available. The skills necessary to access these data resources and make them useful for scientific research are common across data types. A data hub will centralize these skills and provide a focus for data-driven science at Fred Hutch. In addition, the data hub will be a place investigators can turn to when they are considering developing or generating novel data resources.

Fred Hutch / Hutch Data Commonwealth

Will it be open to scientists external to Fred Hutch?

Initially the Hutch Data Commonwealth will focus on internal, investigator-driven initiatives utilizing biomedical big data at Fred Hutch. Ultimately we envision the Hutch Data Commonwealth becoming a regional resource for biomedical data science in the Pacific Northwest.

How is data currently managed? How will HDC change this?

Data at Fred Hutch exists within labs or projects and is managed in a decentralized fashion using customized procedures tailored to the needs of each project. The HDC will offer tools and services to facilitate management of data at the project level, and will provide centralized data engineering and analytics skills for new, big data projects at Fred Hutch.

What are the current data acquisition strategies? How will HDC change these?

Currently, each lab and project at Fred Hutch acquires data in a customized way. HDC will offer a strategic approach to data acquisition, helping investigators to identify data resources that might be appropriate for a project and assisting them with selecting and acquiring the right ones. If investigators need to develop their own data acquisition tool (e.g., using a mobile device to collect specific individual-level variables), HDC will assist them with planning and implementation.

Who will comprise the Hutch Data Commonwealth?

The Hutch Data Commonwealth will bring together data scientists, data analysts, and software engineers in a novel cross-center organization to develop data resources and capabilities in collaboration with scientists from all Hutch Divisions. Visit our team site to learn more.

What new competencies and collaborations will the HDC help develop?

The HDC will bring data engineering, data integration, database development, natural language processing, machine learning, bioinformatics and medical informatics skills to Fred Hutch in a centralized infrastructure. Some of these competencies already exist at the center; the Hutch Data Commonwealth will expand our existing capabilities to provide a resource that will enable center investigators to develop new initiatives and compete for the many evolving opportunities in biomedical big data research.

What is the role of the steering committee?

The role of the steering committee is to ensure that the Hutch Data Commonwealth is closely connected with Fred Hutch investigators and serves the scientific mission of the center. We hope that the steering committee will be a source of ideas for new projects and initiatives that will help the HDC to evolve in a way that enhances the scope of scientific research conducted at Fred Hutch.

How is the HDC protecting the people from whom this medical data comes?

All data acquired by the HDC will be handled in full compliance with Fred Hutch IRB protocols and procedures, including proper patient consent and confidentiality protection.

How will HDC encourage participation in this initiative?

HDC will provide funding for pilot projects and training opportunities for center faculty, staff, postdocs and students.

What kind of projects would HDC engage in?

Projects that acquire and leverage biomedical big data to provide answers to pressing questions in cancer research. Examples may include the deployment of tools for mobile health data collection, acquisition of medical record data in combination with patient-reported outcomes, or implementation of novel machine learning algorithms for integrative genomics analyses.

Are there current similar projects at the Hutch?

Yes, here are some examples:

Hutch Integrated Data Repository and Archive (HIDRA) is a database that merges the thousands of medical records, databases and tissue inventories maintained by Cancer Consortium partners Fred Hutch, UW Medicine, Seattle Children’s and Seattle Cancer Care Alliance into a single system. These records contain a wealth of medical information, from patient background and diagnoses to treatment responses, and will eventually contain tumor genetic and molecular data.

ImmuneSpace is the data repository and analysis platform of the Human Immunology Project Consortium (HIPC). The HIPC program, funded by the NIH, is a multi-center collaborative effort to characterize the status of the immune system in different populations under diverse stimulations and disease states. This ongoing effort has generated large amounts of varied high-throughput, high-dimensional biological data (flow cytometry, CyTOF, RNA-Seq, Luminex, among others). All data generated to date by HIPC, along with selected datasets generated by other NIAID-funded projects, have been made publicly available through ImmuneSpace and are ready to be explored using the visualization and analysis tools built into ImmuneSpace.

HICOR IQ is a database of population-based cancer incidence and survival information and insurance claims data. HICOR IQ contains enrollment and claims data from Premera Blue Cross and Regence that are securely provided to HICOR and linked to the Cancer Surveillance System to incorporate clinical outcomes data.

The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort of researchers from North America, Australia, and Europe, using data from over 40,000 participants. The coordinating center for this international consortium is based at the Fred Hutchinson Cancer Research Center. GECCO aims to accelerate the discovery of colorectal cancer-related variants by replicating and characterizing Genome Wide Association Study (GWAS) findings, conducting a large-scale meta-analysis of existing and newly generated GWAS data, and investigating how genetic variants are modified by environmental risk factors.