Using data in common to discover rare variants

Science Spotlight

From the Reiner Lab, Public Health Sciences Division.

Sept. 18, 2017

Genome-wide association studies (GWAS) have identified thousands of associations between common genetic variants and various human phenotypes. For many complex diseases or traits such as cancer or red blood cell counts, however, linking these general genetic associations to functional changes in the genes responsible remains elusive. One approach for identifying causal genes is evaluating the effects of rare coding variants within GWAS loci. In a recent article in PLoS Genetics, Dr. Alex Reiner and colleagues in the Public Health Sciences Division used this ‘rare coding variant association study’ approach to identify several variants and genes associated with hematological phenotypes such as red blood cell, white blood cell, and platelet counts.

Previous GWAS have mostly focused on common single nucleotide polymorphisms (SNPs), genetic variants with a minor allele frequency (MAF) of at least 5%. “A major limitation of GWAS is that in most instances, these disease- or trait-associations fall short of identifying the actual causal genes,” said senior author Dr. Reiner. This is because most of these SNPs are non-coding and are in linkage disequilibrium with many other variants, so an associated SNP may simply tag a nearby causal variant rather than be causal itself. As such, an important step in the clinical translation of GWAS findings is identifying the specific causal gene and functional variants driving the association.

The most straightforward strategy for linking genes with human phenotypes is to identify rare coding or splice site variants. Because these variants are rare (MAF <1%), however, much larger sample sizes are needed to detect their effects. In this study, the authors evaluated over 135,000 such variants for an association with 15 hematological traits in over 300,000 individuals. A study of this size was made possible by analyzing data from existing large-scale efforts. Said Reiner, “the focus on rare coding genetic variants was made available through the ability to combine results from two recently published very large blood cell count consortia: the Blood-Cell Consortium (BCX) (see PMIDs 27346689, 27346686, 27346685) and phase 1 of the UK Biobank (see PMID 27863252). Separately, each of these prior studies was comprised of ~150,000 individuals.”
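To give a rough sense of why rare variants demand such large cohorts, the short Python sketch below counts how many copies and carriers of a minor allele one would expect at different sample sizes. It is an illustration only, not part of the study's analysis: the function names are hypothetical, the cohort sizes other than 300,000 are arbitrary, and the carrier estimate assumes Hardy-Weinberg proportions.

```python
def expected_minor_allele_count(n_individuals: int, maf: float) -> float:
    """Expected number of minor-allele copies among n diploid individuals."""
    # Each individual carries two copies of each autosomal site.
    return 2 * n_individuals * maf


def expected_carriers(n_individuals: int, maf: float) -> float:
    """Expected number of individuals with at least one minor allele.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch):
    the chance of carrying no minor allele is (1 - maf) ** 2.
    """
    return n_individuals * (1 - (1 - maf) ** 2)


if __name__ == "__main__":
    # 5% is the common-variant threshold and 1% the rare-variant threshold
    # mentioned in the article; 0.1% is an extra illustrative value.
    for maf in (0.05, 0.01, 0.001):
        for n in (10_000, 300_000):
            copies = expected_minor_allele_count(n, maf)
            carriers = expected_carriers(n, maf)
            print(f"MAF={maf:<6} N={n:>7}: "
                  f"~{copies:,.0f} allele copies, ~{carriers:,.0f} carriers")
```

At a MAF of 0.1%, a cohort of 10,000 would be expected to contain only about 20 carriers, too few to estimate an effect on a blood cell trait with much precision, while a cohort of 300,000 would be expected to contain roughly 600.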

Boxplots of platelet aggregation

Differences in platelet aggregation by genotype at the plasminogen (PLG) rare coding variant rs145535174. Box plots compare participants homozygous for the A allele with participants heterozygous for the rare G allele.

Image provided by Dr. Alex Reiner

By meta-analyzing across these large studies, the authors were able to find 56 rare coding variants, including 31 variants not previously identified in other large-scale efforts. Said Reiner, “we were able to leverage a rare coding variant association strategy to pinpoint ~30 causal genes associated with red blood cell, white blood cell, and platelet counts. These included an association between a rare interleukin 33 (IL33) splice site variation and eosinophil count and asthma and hay fever risk, as well as a new rare missense variant in plasminogen (PLG) associated with platelet count and platelet reactivity.”
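For readers unfamiliar with meta-analysis, the sketch below shows one common textbook approach, inverse-variance-weighted fixed-effects pooling, applied to a single variant. The article does not state which method the authors used, so this should be read as an assumption made for illustration; the function name and the effect sizes and standard errors in the example are hypothetical, not results from the study.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    Returns the pooled effect, its standard error, the z-score,
    and a two-sided p-value under a normal approximation.
    """
    # More precise studies (smaller standard error) receive more weight.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value for a standard normal test statistic.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p_value


if __name__ == "__main__":
    # Hypothetical estimates for one variant in two equally sized studies.
    beta, se, z, p = fixed_effect_meta([0.10, 0.10], [0.04, 0.04])
    print(f"pooled beta={beta:.3f}, SE={se:.4f}, z={z:.2f}, p={p:.2e}")
```

In this made-up example, each study alone gives a z-score of 2.5, while the pooled estimate has a smaller standard error and a z-score of about 3.5. This is the basic reason combining two cohorts of ~150,000 individuals can reveal associations that neither could establish alone.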

Importantly, many of these novel associations prioritize strong candidate genes at loci previously implicated by GWAS. Additional conditional analyses suggested that many of these variants are associated with blood-cell traits independently of previously identified variants. Studies such as this one, which narrow down GWAS signals to likely causal variants, will be key for both shedding light on disease pathophysiology and guiding the development of new therapeutic targets.

Moving forward, the authors plan to continue searching for variants by going even bigger. Said Reiner, “We are currently assembling such a collaborative effort across U.S., European, and Japanese investigators that is comprised of nearly 1 million individuals (~3 times larger than the current publication). The new collaboration will also feature a larger number of U.S. minority participants (African ancestry, Hispanics/Latinos) to help elucidate genetic factors that underlie inter-ethnic differences in blood cell counts.”

These larger and more diverse studies should continue to uncover rare variants that contribute to the genetic architecture of complex human phenotypes. Said Reiner, “the use of very large sample sizes along with methodologic improvements in the accuracy of imputation for lower frequency genetic variants suggests that additional studies in even larger samples derived from population-based studies and biobank repositories, may continue to identify new variants (and causal genes) that contribute to the heritability of blood cell counts and related hematologic, oncologic, and immune-related disorders.”


Also contributing to this project from Fred Hutch was Dr. Paul Auer.



Mousas A, Ntritsos G, Chen MH, Song C, Huffman JE, Tzoulaki I, Elliott P, Psaty BM; Blood-Cell Consortium, Auer PL, Johnson AD, Evangelou E, Lettre G, Reiner AP. Rare coding variants pinpoint genes that control human hematological traits. PLoS Genet 2017; 13(8):e1006925. doi: 10.1371/journal.pgen.1006925.


Funding for this study was provided by the National Heart, Lung, and Blood Institute, the National Institutes of Health, the Canadian Institutes of Health Research, and the “Stavros Niarchos” Foundation.