If I asked you, dear reader, to come up with a shortlist of the most monumental scientific achievements of the last century, chances are the Human Genome Project (HGP) would appear somewhere on that list. Many know of this herculean effort, which culminated in 2003 with a reference sequence of our genetic blueprints. The detail-oriented among us, however, may be quick to point out that, technically speaking, the HGP didn’t sequence the entirety of the human genome! Certain complex regions of our genome evaded the sequencing technologies available at the time, so the ‘complete’ human genome announced in 2003 actually represented roughly 92% of the full sequence. To be sure, this was still a monumental result, especially considering that the sequenced portion contains virtually all of our protein-coding genes.
If you’re Dr. Andrew Stergachis—an associate professor in the Division of Medical Genetics at the University of Washington who is dedicated to studying how changes in noncoding genomic regions contribute to human disease—that 8% isn’t just ‘leftover.’ In a recent publication in Cell Genomics, Stergachis and colleagues combined a powerful chromatin profiling technique with new, complete genome sequencing data to discover exotic chromatin biology hiding in the most difficult-to-reach portions of our genomes. Their results have potentially far-reaching implications for our understanding of how our genomes are maintained, regulated, and replicated.
“This work was actually one of the first projects we started when I opened the lab around five years ago,” begins Stergachis. “At its core, it revolves around two technologies: one was a sequencing-based approach that we developed called Fiber-seq, which lets us determine the landscape of genome accessibility at the level of single DNA strands. The other was the advent of truly complete genome sequences produced by the Telomere-to-Telomere (T2T) Consortium, which for the first time gave us a look at the entire human genome, including those parts which previously evaded accurate sequencing.”
One might assume that the ~8% of ‘gaps’ left by the HGP are randomly dispersed throughout the genome and relatively unimportant, but nothing could be further from the truth. In fact, one major component of these gaps was the centromeres: the constricted regions, often smack dab in the middle of our chromosomes, that cells use as ‘handles’ to divide the replicated genome between the two daughter cells during cell division. “Once the first complete human genome (called CHM13) was available, we applied our Fiber-seq technique to the same cell line, mapped the resulting genome accessibility to CHM13, and immediately noticed something very strange happening at the centromeres,” notes Stergachis.
What Stergachis and colleagues noticed was an unusual pattern of genome accessibility that challenged the dogma of how genomes work. Classically, chromatin is thought to exist in one of two distinct states: euchromatin, which is accessible to genome-interacting proteins and generally transcriptionally active, and heterochromatin, which is dense, inaccessible, and generally transcriptionally silent. On individual centromeric DNA strands, the team saw something in between: stretches of ultra-dense, inaccessible chromatin punctuated by patches of open, accessible chromatin. Because this chromatin had features of both traditional types, they named it ‘dichromatin.’ Beyond its mixed character, dichromatin was also heterogeneous: different DNA strands from the same genomic locus showed different patterns of accessible and inaccessible chromatin, though the accessible patches clustered non-randomly within each individual strand.