Mapping uncharted pieces of the human genome

Nearly 12 years after it was first declared complete, the map of the human genome still contains vast stretches that may as well be labeled “Here Be Dragons.”

About 250 million of humankind’s 3 billion DNA letters remain unmapped, mysterious sequences tucked into gene-less gaps littered across our chromosomes.

A study published Thursday in the newly launched journal Science Advances sheds light on large chunks of those genomic black holes by revealing the sequences of many human centromeres, the middle regions of our chromosomes essential for cell division.

There are no genes to be found in those chromosomal waistbands, but don’t call them junk DNA, said Dr. Steven Henikoff, a basic sciences researcher at Fred Hutchinson Cancer Research Center who led the sequencing study. Centromeres are indispensable guideposts when cells split in two, providing an anchor point for cellular machinery to attach and pull as chromosomes duplicate and then segregate, one copy to each new cell.

“This is more than just the uncharted frontier, it happens to be a frontier that is vitally important to cells,” Henikoff said. “So you would think that we would understand it … I don’t think there’s anything else we do in basic science that is so familiar yet so unknown.”

A one-dimensional puzzle

Like the rest of the genome’s uncharted regions, human centromeres are made up of small stretches of DNA that are identical or nearly identical. This is a problem for researchers using conventional genome sequencing techniques, which break the genome into millions of little pieces and then piece it back together using the unique aspects of each little fragment to figure out where it fits in the whole.

Mapping the genome using those techniques is like doing a one-dimensional jigsaw puzzle, Henikoff said, gesturing with one hand over the other as he pieced together an imaginary stretch of DNA. “But if (the pieces) are identical, you can’t go any farther.”
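The jigsaw problem can be illustrated with a toy sketch in Python. It is not the method used in the study or in any real assembler; the two sequences, the fragment length and the function names below are all invented for illustration. The sketch cuts a sequence into short overlapping fragments and counts how many positions each fragment could have come from. In a non-repetitive stretch every fragment fits in exactly one place; in a stretch built from one repeated unit, every fragment fits in several.

```python
def count_placements(fragment, sequence):
    """Count every position (overlaps included) where fragment matches sequence."""
    count = 0
    start = sequence.find(fragment)
    while start != -1:
        count += 1
        start = sequence.find(fragment, start + 1)
    return count


def placement_counts(sequence, frag_len=8, step=2):
    """Cut sequence into overlapping fragments and count where each could fit."""
    fragments = [sequence[i:i + frag_len]
                 for i in range(0, len(sequence) - frag_len + 1, step)]
    return [count_placements(f, sequence) for f in fragments]


# Hypothetical sequences, chosen only to make the contrast visible.
unique_region = "ACGTTGCAAGCTTAGGCTAACCGGTA"   # no 8-letter fragment occurs twice
repeat_region = "ACGTTG" * 5                   # one 6-letter unit repeated 5 times

unique_counts = placement_counts(unique_region)
repeat_counts = placement_counts(repeat_region)

print("unique region:", unique_counts)
print("repeat region:", repeat_counts)
```

When every fragment has exactly one possible position, the puzzle can be solved. When every fragment has several, the fragments alone cannot say how many copies of the repeat there are or in what order they sit.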

With that method, researchers were able to capture sequences at the edges of centromeres but not the centromeres themselves. These edge regions are also made up of repeated stretches of DNA, tiny sequences that are dizzyingly arrayed in the thousands into larger patterns called higher order repeats.

The outer regions, which aren’t necessary for centromeres to work, have enough differences among the repeats that the jigsaw puzzle approach works – to a point. The closer you get to the actual centromere, the more identical the repeats become, and the technique starts to fall apart.

“That’s as far as we can travel into the black hole,” Henikoff said.

‘You can’t predict the middle from what’s on the edge’

Other research teams had modeled what human centromeres might look like based on their immediate flanking sequences.

“I didn’t believe that this approach would identify the functional centromere,” Henikoff said. He and his colleagues have spent many years studying how and why centromeres evolved to do their crucial job. “What we think we know about how centromeres have evolved would say that you can’t predict what’s in the middle from what’s on the edge.”

So he and his team decided to take a different approach. Ignoring the edges, they isolated certain proteins found only at centromeres from two different types of human cells, male and female, and looked at the DNA stuck to those centromere proteins in an unbiased way.

The term “unbiased” is important, Henikoff said: The researchers didn’t make any assumptions about what they might find before they found it. Such assumptions are common, and often useful, in DNA sequencing studies, but they didn’t apply here. The techniques the team used weren’t available when the human genome sequence was first published.

And what they found wasn’t exactly more of the same, as researchers had previously hypothesized by looking at the edge sequences. Actually, it was even more of the same.

Instead of a complex series of slightly different higher order repeats, as in the edges, Henikoff’s team found just two small stretches of DNA that repeat, over and over. Those paired repeats dominated centromeres in the male and female human cells the researchers tested. The team also found them in a publicly available database of another complete human genome.

Precision where precision is required

Their findings showed that centromeres are even more uniformly repetitive than their flanking sequences. And that is almost certainly not an accident, Henikoff said. He thinks that what made the centromeres so hard to sequence in the first place is also what allows them to work so perfectly every time a cell divides.

“The way I interpret it is that chromosome segregation has to be as close to 100 percent as is physically possible,” Henikoff said. “Centromeres are really precise because they have to work so well.”

If chromosomes aren’t perfectly distributed when cells divide, the consequences are often dire. Depending on the cell type, the wrong number of chromosomes could trigger cancer or even kill the entire organism – miscarriages are often due to imperfect chromosome shuffling early in embryonic growth.

The researchers also found that the centromere proteins they’d used to access centromere sequences are precisely positioned along that repetitive stretch of DNA, together forming a protein-DNA unit that repeats at an exact frequency. That’s different from the higher order repeats just to the sides of centromeres, where there’s nearly no pattern in how similar proteins bind the DNA. When cells divide, the machinery responsible for partitioning chromosomes to each progeny cell attaches to those precisely repeated protein-DNA units – their regularity may be crucial for that process, Henikoff said.

The team’s findings not only help fill in the gaps in the human genome; they may also help build better human artificial chromosomes, very small engineered chromosomes with applications in basic and applied research, Henikoff said.

Current artificial chromosomes were constructed with sequences from the edges of human centromeres and have high failure rates, Henikoff said. He thinks researchers may be able to lower those failure rates by using a true centromere’s sequence, and his team is now testing that idea.

But for now, he’s happy to have simply made some headway on a biological mystery.

“We’re thrilled about being able to understand something that I didn’t know we’d ever understand,” Henikoff said. “It makes sense, finally.”
