Lack of diversity in genetic research a problem

Using primarily white populations to draw broad conclusions is misleading and exacerbates health disparities – but it’s fixable
Illustration of the "draft" book of human life
Illustration by Kimberly Carney / Fred Hutch News Service

When the Human Genome Project was completed back in 2003, its top researcher Dr. Francis Collins, now the head of the National Institutes of Health, referred to it as “the first draft of the human book of life.”

Collins, and scientists in general, have since acknowledged that it was a rough first draft, since most of the contributions were “written” by people of European descent.

The lack of diversity in genetics research — recently called out in journals like Cell and covered on PBS — was highlighted again this week with a comprehensive multi-center analysis by a consortium of researchers, co-led by geneticists, epidemiologists and biostatisticians at Fred Hutchinson Cancer Research Center. Their findings were published Wednesday in the journal Nature.

The consortium, named PAGE (short for Population Architecture using Genomics and Epidemiology), analyzed the data of nearly 50,000 U.S. participants of non-European ancestry to determine, among other things, if the Human Genome Project’s “draft” results could be generalized across ancestral groups.

The short answer: They can’t.

This new analysis found even more evidence that large-scale genomic studies — used for everything from drug development to figuring out an individual’s disease risk — need to include diverse, multi-ethnic populations to accurately represent genetics-related disease risks in all populations. Not doing so is misleading, and potentially dangerous.

Fred Hutch's Dr. Ulrike "Riki" Peters, a senior scientist on the PAGE project. The long-running multi-center research project highlighted the lack of diversity in genetic research in a new paper. Fred Hutch file photo

“Genetic research is predominantly conducted in European descent populations which leads to a bias in the genetic risk variants that have been identified,” said Dr. Ulrike “Riki” Peters, associate director of Fred Hutch’s Public Health Sciences Division and a senior scientist on the PAGE project. “We demonstrate the bias and we demonstrate that this can be corrected by studying non-European minority populations.”

The Hutch’s Dr. Chris Carlson, another senior author, said the PAGE team was basically trying to determine whether current polygenic risk scores, also called genetic risk scores (scores based on genetic risk variants and used to predict risk of disease), derived from people of European ancestry could be extrapolated accurately to minority populations.

“If you’re going to have next-generation medicine and derive polygenic risk scores, those risk scores should be equally accurate regardless of what an individual’s genetic ancestry is,” he said. “And they’re not.

“Our study proves with a large-scale real data analysis that these risk scores underperform in non-European populations. That’s what makes this paper important.”

Incomplete data … on everyone

The Human Genome Project was based on the genetic sequencing of a handful of volunteers, most of whom were of European descent, so it makes a certain amount of sense that the initial data was limited. From a scientific standpoint, though, it’s extremely problematic, especially as genome-wide association studies, or GWAS, continue to gather data primarily from the exact same population.

According to well-documented research in Nature and elsewhere, about 78 percent of data used in GWAS comes from people of European descent. But that particular group makes up only 16 percent of the global population.
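To put those two figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the 78 percent and 16 percent figures quoted above. The variable names are illustrative, and the shares for everyone else are simply the remainders, not numbers reported in the research.

```python
# Shares quoted in the article, in percent.
gwas_share_european = 78        # share of GWAS data from people of European descent
population_share_european = 16  # their share of the global population

# Remainders for all other ancestries (derived here, not quoted in the article).
gwas_share_other = 100 - gwas_share_european
population_share_other = 100 - population_share_european

# Data share divided by population share.
# A value of 1.0 would mean a group is represented in proportion to its size.
representation_european = gwas_share_european / population_share_european
representation_other = gwas_share_other / population_share_other

print(f"European descent: {representation_european:.2f}x")
print(f"All other ancestries: {representation_other:.2f}x")
```

By this rough measure, people of European descent appear in GWAS data at nearly five times their share of the world’s population, while everyone else appears at roughly a quarter of theirs.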

Since our genome is constantly shifting and evolving in response to environmental and biological cues, there is always genetic variation from person to person and population to population. Some genetic variants are completely insignificant; others can have a profound effect on a person’s health (think single-gene, or Mendelian, disorders like Huntington’s disease).

More often, tiny changes in hundreds or even thousands of genes can add up to a risk for, or a protection from, a particular disease. GWAS, which identify gene mutations or variants involved in disease, are the foundation for investigations into the biology of complex traits, drug development and even clinical guidelines.

But if the genetic data used in these studies is limited to one population — people of European descent — then it’s missing a vast array of genetic variants, either because they’re absent in people of European descent or they’re present, but only at low frequencies. 

Dr. Charles Kooperberg, head of the Biostatistics Program at Fred Hutch, was another senior author on the newly published study. Fred Hutch file photo

That means therapies and drugs developed on the basis of those variants will most likely work best in people who share that same ancestry. And polygenic risk scores, used to compute our genetic risk for cardiovascular disease, diabetes, sickle cell anemia, cancer and other diseases, are less valuable — and less accurate — for large swaths of the population.
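As a rough illustration of how such a score is put together, the Python sketch below computes a polygenic risk score in its common textbook form: a weighted sum of how many copies of each risk variant a person carries. The variant names, weights and genotypes are invented for the example and do not come from the PAGE study or any commercial test.

```python
def polygenic_risk_score(genotypes, weights):
    """Return the weighted sum of risk-allele counts.

    genotypes: variant name -> copies of the risk allele carried (0, 1 or 2)
    weights:   variant name -> effect size estimated in a GWAS
    Variants missing from `genotypes` are treated as zero copies.
    """
    return sum(weight * genotypes.get(variant, 0)
               for variant, weight in weights.items())


# Hypothetical effect sizes; in practice these come from GWAS results.
weights = {"variant_a": 0.25, "variant_b": 0.5, "variant_c": -0.125}

# Hypothetical person: two copies of A, one copy of B, none of C.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(person, weights)
print(score)  # 1.0
```

Because the weights are estimated from GWAS data, a score is only as reliable as those estimates are for the person being scored. A variant that is missing from the weights, or whose estimated effect differs by ancestry, contributes nothing or the wrong amount.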

“Commercial DNA tests will tell you what your risk is for heart disease, ingrown toenails, or whatever, but those risk scores are based on the results from people of European descent,” said Dr. Charles Kooperberg, head of the Hutch’s Biostatistics Program and another senior author. “So the predictions are much more accurate for Europeans.”

Even more worrisome: That bias is now baked into the system and could harm even more people by exacerbating existing disease and health care disparities.

“Even though there’s a shared biology, the current models are imprecise,” said Hutch staff scientist Stephanie Bien, who also worked on the study. “And they’re more imprecise if you’re not of European ancestry. You have to study all populations to see things that are relevant in all populations.”

Completing the PAGE

Established a decade ago and funded by the National Institutes of Health’s National Human Genome Research Institute, the PAGE consortium pools large groups of study participants to extract high-powered findings regarding our “epidemiological architecture,” that is, who is more prone to what disease or health issue, or who might be protected from it, because of their unique genetic makeup.

PAGE used groups from a handful of large studies for this analysis, including the Women’s Health Initiative; the Hispanic Community Health Study / Study of Latinos (HCHS/SOL); the California- and Hawaii-sourced Multiethnic Cohort (MEC) and the BioMe™ BioBank.

All told, the group represented 22,216 self-identified Hispanic/Latinos; 17,299 African Americans; 4,680 Asians; 3,940 Native Hawaiians; 652 Native Americans and 1,052 individuals who self-identified as Other.  
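For readers who want to check the “nearly 50,000” figure, here is a small Python sketch that tallies the group sizes listed above. The counts are taken from the article; the dictionary layout and variable names are just one convenient way to hold them.

```python
# Self-identified group sizes as listed in the article.
participants = {
    "Hispanic/Latino": 22_216,
    "African American": 17_299,
    "Asian": 4_680,
    "Native Hawaiian": 3_940,
    "Native American": 652,
    "Other": 1_052,
}

total = sum(participants.values())
print(f"Total participants: {total:,}")  # Total participants: 49,839

# Each group's share of the total, largest first.
for group, count in sorted(participants.items(), key=lambda item: -item[1]):
    print(f"{group}: {count / total:.1%}")
```

The total of 49,839 matches the article’s description of nearly 50,000 participants.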

The PAGE team ran a GWAS of 26 separate clinical and behavioral phenotypes, or traits, across their nearly 50,000 multi-ethnic participants to see how genetic ancestry affected each trait. The traits ranged from height and waist-to-hip ratio to fasting insulin level, white blood cell count, high- and low-density lipoprotein (HDL and LDL) and coffee consumption.

Using a tool they’d created known as a Multi-Ethnic Genotyping Array (MEGA), the researchers were able to gain a deeper biological understanding of the genetic underpinning of many complex diseases, including diabetes, stroke, obesity, and cardiovascular disease. They also created a blueprint for analyzing genetic associations in diverse populations moving forward and identified 27 new trait-variant associations.

“As we anticipated, by examining previously underrepresented populations, we found new ancestry-specific associations, which furthers our understanding of the genetic architecture of traits and underscores the importance of including diverse populations in these studies,” Peters said.

Fred Hutch staff scientist Stephanie Bien. Fred Hutch file photo

How diseases differ in populations

The scientists found one such variant in the gene HBB, which provides instructions for making beta-globin, a component of the larger protein hemoglobin. HBB is known for its role in sickle-cell anemia and for its ability to affect the performance of some HbA1c assays, used to test for glucose control in diabetes.

The researchers discovered the variant in Hispanic/Latinos; previously, it had been reported only in African Americans.

This means tests run on Hispanic/Latinos with this variant could “potentially lead practitioners to incorrectly believe that a patient has achieved glucose control, increasing the risk of type 2 diabetes complications,” they wrote in their paper.

C-reactive protein or CRP — a biomarker found in blood that's used to detect, diagnose and treat various inflammation-related diseases and conditions (think infection, lupus, rheumatoid arthritis, etc.) — was another instance where results of common tests could be skewed in certain populations.

“Those with African ancestry are more likely to carry a genetic variant that lowers their level of C-reactive protein,” said Bien, the Hutch staff scientist. “So you might be suffering from rheumatoid arthritis, but your CRP levels are below the diagnostic criteria, meaning you’re not going to be prescribed the medication you need. As with HbA1c, you might think you have no underlying disease. But you actually have a genetic variant that is masking or distorting this particular biomarker.”

In other words, GWAS built on incomplete data are potentially hurting large segments of the population.

“Genome-wide studies with diverse populations can help level the playing field in clinical practice and expand the reach of precision medicine to individuals who otherwise would not be included,” said Kooperberg.

Polygenic risk scores “should be equally accurate regardless of what an individual’s genetic ancestry is,” said Fred Hutch's Dr. Chris Carlson, another senior author on the new paper. Fred Hutch file photo

Carlson pointed to the multi-trait results as a key example of why accurate genetic data is crucial.

“Genetic prediction of cholesterol levels or height may not be critical in the clinic, because you can measure these traits quickly and cheaply,” he said. “But these traits provide insight into how well genetic risk models could perform for diseases that don’t currently have good biomarkers, ranging from autoimmune disease to cancer.”

Most of the genetic variants evaluated in the project were discovered in Europeans. Without further evaluation in non-Europeans, genetic models built on these variants make weaker predictions for non-Europeans.

“Across 26 traits, on average, the effect size was about 58 percent in African Americans,” Carlson said. “So on average you’re predicting less accurately in this minority population than in European Americans.”

This is a bias that needs to be corrected and can be corrected, the researchers said.

Think: genetic continuum

As usual, science offers good perspective.

“In a modern diverse population like the U.S., genetic ancestry is a continuum,” Peters said. “You have to embrace that and utilize it to help uncover new scientific insights.”

Within this continuum, though, each of us is a unique individual, with our own individual risks.

“There are both social and genetic components to minority health disparities,” said Carlson. “But when it comes to the genetics of how we interpret a patient’s HbA1c, it’s not about whether you’re African-American or Hispanic or European. It’s about whether you’re a carrier for sickle cell.

“If we’re going to do individualized medicine, then we need to know which genetic variation matters,” he said. “And we need to study these genetic factors in all populations.”

Diane Mapes is a staff writer at Fred Hutchinson Cancer Center. She has written extensively about health issues for NBC News, TODAY, CNN, MSN, Seattle Magazine and other publications. A breast cancer survivor, she blogs and tweets @double_whammied.
