Hutch Data Commonwealth
The term “big data” refers to large, complex data sets, from genomics to social media, that are extremely high-dimensional and can include hundreds or even thousands of predictors. Such data sets can be unstructured and messy, and they present challenges for traditional data processing and analytic procedures, which may not scale to data of this size and complexity.
Data-driven science is science in which data is front and center. Either the scientific questions and hypotheses are developed around a specific data resource, or a new data resource is developed so that the scientific questions can be addressed. With the evolving universe of biomedical big data, newly available data resources are enabling scientists to formulate novel scientific questions that might not have been imaginable previously.
Translational research is a term that was originally developed to connote the cycle of scientific findings from basic, preclinical experiments through to clinical and population studies. In the case of biomedical big data, translational research refers to the use of findings from predictive modeling to improve or enhance clinical care or population health.
Population science is the study of how behaviors, environment, and patterns of care affect population health. In the case of cancer, population science is the study of what causes cancer, and how interventions to prevent, detect and treat cancer affect patterns of disease incidence and mortality.
We use molecular research to refer to studies in which molecular-level data (such as genomic profiles, RNA sequencing or methylation data) are used to predict cancer outcomes.
The world of biomedical big data is changing rapidly, and new data resources are quickly becoming available. The skills necessary to access these data resources and make them useful for scientific research are universal across data types. A data hub will centralize these skills and provide a focus for data-driven science at Fred Hutch. In addition, the data hub will be a place investigators can come to if they are considering developing or generating novel data resources.
Initially, the Hutch Data Commonwealth will focus on internal, investigator-driven initiatives utilizing biomedical big data at Fred Hutch. Ultimately, we envision the Hutch Data Commonwealth becoming a regional resource for biomedical data science in the Pacific Northwest.
Data at Fred Hutch exists within labs or projects and is managed in a decentralized fashion using customized procedures tailored to the needs of each project. The HDC will offer tools and services to facilitate management of data at the project level, and will provide centralized data engineering and analytics skills for new, big data projects at Fred Hutch.
Currently, each lab and project at Fred Hutch acquires data in a customized way. HDC will offer a strategic approach to data acquisition, helping investigators identify data resources that might be appropriate for a project and assisting them with selecting and acquiring the right ones. If investigators need to develop their own data acquisition tool (e.g., one that uses a mobile device to collect specific individual-level variables), HDC will assist them with planning and implementation.
The Hutch Data Commonwealth will bring together data scientists, data analysts, and software engineers in a novel cross-center organization to develop data resources and capabilities in collaboration with scientists from all Hutch Divisions. Visit our team site to learn more.
The HDC will bring data engineering, data integration, database development, natural language processing, machine learning, bioinformatics and medical informatics skills to Fred Hutch in a centralized infrastructure. Some of these competencies already exist at the center; the Hutch Data Commonwealth will expand our existing capabilities to provide a resource that will enable center investigators to develop new initiatives and compete for the many evolving opportunities in biomedical big data research.
The role of the steering committee is to ensure that the Hutch Data Commonwealth is closely connected with Fred Hutch investigators and serves the scientific mission of the center. We hope that the steering committee will be a source of ideas for new projects and initiatives that will help the HDC to evolve in a way that enhances the scope of scientific research conducted at Fred Hutch.
All data acquisition by HDC will comply fully with Fred Hutch IRB protocols and procedures, including proper patient consent procedures and confidentiality protections.
HDC will provide funding for pilot projects and training opportunities for center faculty, staff, postdocs and students.
Projects that acquire and leverage biomedical big data to provide answers to pressing questions in cancer research. Examples may include the deployment of tools for mobile health data collection, acquisition of medical record data in combination with patient-reported outcomes, or implementation of novel machine learning algorithms for integrative genomics analyses.
Yes, here are some examples:
Hutch Integrated Data Repository and Archive (HIDRA) is a database that merges the thousands of medical records, databases and tissue inventories maintained by Cancer Consortium partners Fred Hutch, UW Medicine, Seattle Children’s and Seattle Cancer Care Alliance into a single system. These records contain a wealth of medical information, from patient background and diagnoses to treatment responses, and will eventually contain tumor genetic and molecular data.
ImmuneSpace is the data repository and analysis platform of the Human Immunology Project Consortium (HIPC). The HIPC program, funded by the NIH, is a multi-center collaborative effort to characterize the status of the immune system in different populations under diverse stimulations and disease states. This ongoing effort has generated large amounts of varied high-throughput, high-dimensional biological data (flow cytometry, CyTOF, RNA-Seq and Luminex, among others). All data generated to date by HIPC, along with selected datasets generated by other NIAID-funded projects, have been made publicly available through ImmuneSpace and are ready to be explored using the visualization and analysis tools built into ImmuneSpace.
HICOR IQ is a database of population-based cancer incidence and survival information and insurance claims data. HICOR IQ contains enrollment and claims data from Premera Blue Cross and Regence that are securely provided to HICOR and linked to the Cancer Surveillance System to incorporate clinical outcomes data.
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort of researchers from North America, Australia, and Europe, using data from over 40,000 participants. The coordinating center for this international consortium is based at the Fred Hutchinson Cancer Research Center. GECCO aims to accelerate the discovery of colorectal cancer-related variants by replicating and characterizing genome-wide association study (GWAS) findings, conducting a large-scale meta-analysis of existing and newly generated GWAS data, and investigating how the effects of genetic variants are modified by environmental risk factors.