Circling back to RNA sequencing data to map recursive splice sites

From the Robert Bradley Lab, Public Health Sciences and Basic Sciences Divisions & Cancer Consortium

The human genome is just over 6 feet 8 inches long, about 2 inches taller than the average NBA basketball player, and in total, that is a lot of nucleotides! But what if the amount of sequence diversity packed into all that DNA isn’t enough? When DNA is transcribed into RNA, introns or “non-coding” regions are removed by a process called splicing, and the exons or “coding” regions are stitched together to provide the sequence template for newly synthesized proteins. However, more than 95% of human multi-exon genes undergo alternative splicing, which further expands the diversity and complexity of the products encoded by our NBA-sized genome! As complex organisms live and adapt, they need a rapid means of generating molecular variation for processes such as development and immune responses, but these events can also contribute to disease, such as cancer. Thus, it is important to dissect the “how” and “where” of alternative splicing events to understand their effects on normal processes and disease states. The Bradley lab in the Public Health Sciences and Basic Sciences Divisions at Fred Hutchinson Cancer Center devised a strategy to reanalyze RNA sequencing data to discover new recursive splice sites in human genes. Using their novel approach, they discovered 100 RNA regions with features of recursive splice sites and identified a new location for such sites at the far end of exons, bypassing the proximal end of the exon that is typically spliced during recursive events. Their findings were published recently in the journal Life Science Alliance.

Recursive splicing differs from canonical splicing, which removes a complete intron in a single splicing event. Instead, recursive splicing occurs in two or more steps: an initial splicing event that removes part of the intron but regenerates a 5’ splice site at the same position, and one or more subsequent splicing events that remove the remainder of the intron’s sequence. The hypothesis of “resplicing,” or splicing the same RNA two or more times, was first tested in 1988, but studies to map recursive splice sites in human genes were not conducted until much later, in 2015. “Much of the prior work in the field has centered on ultra-long introns where recursive splicing is thought to regulate the speed and accuracy of splicing,” shared Dr. Emma Hoppe, a recent Ph.D. graduate from the Bradley lab.
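To make the two-step process concrete, here is a toy Python sketch on a made-up sequence. It assumes the simplest possible splice-site rules (an intron begins with GT and ends with AG, and a recursive site is an internal AG immediately followed by GT). The sequences and the helper names `splice_once`, `canonical_splice`, and `recursive_splice` are hypothetical, and the model ignores branch points, polypyrimidine tracts, and regulatory factors.

```python
# Toy model of canonical vs. recursive splicing (hypothetical sequences).
# Simplifying assumption: every intron starts with GT and ends with AG, and a
# recursive splice site is an internal "AG|GT" junction inside the intron.

EXON1 = "ATGGCC"
INTRON_PART1 = "GTAAGTTTTCCTTTAG"  # ends with the AG of the recursive site
INTRON_PART2 = "GTGAGTTTTCTTCCAG"  # begins with the GT of the recursive site
EXON2 = "GGCTAA"
PRE_MRNA = EXON1 + INTRON_PART1 + INTRON_PART2 + EXON2


def splice_once(rna: str, donor: int, acceptor_end: int) -> str:
    """Excise rna[donor:acceptor_end]: from a GT 5' splice site through an AG 3' splice site."""
    if rna[donor:donor + 2] != "GT":
        raise ValueError(f"no GT 5' splice site at position {donor}")
    if rna[acceptor_end - 2:acceptor_end] != "AG":
        raise ValueError(f"no AG 3' splice site ending at position {acceptor_end}")
    return rna[:donor] + rna[acceptor_end:]


def canonical_splice(rna: str) -> str:
    """Remove the whole intron in a single event."""
    donor = len(EXON1)
    return splice_once(rna, donor, len(rna) - len(EXON2))


def recursive_splice(rna: str) -> tuple[str, str]:
    """Remove the same intron in two events via the internal AG|GT site."""
    donor = len(EXON1)
    # Step 1: the recursive AG serves as the 3' splice site.
    intermediate = splice_once(rna, donor, donor + len(INTRON_PART1))
    # The GT that followed the recursive AG now abuts EXON1,
    # reconstituting a 5' splice site at the same position.
    # Step 2: remove the remainder of the intron.
    mature = splice_once(intermediate, donor, donor + len(INTRON_PART2))
    return intermediate, mature


if __name__ == "__main__":
    intermediate, mature = recursive_splice(PRE_MRNA)
    print(intermediate == EXON1 + INTRON_PART2 + EXON2)  # True
    print(mature == canonical_splice(PRE_MRNA))          # True
```

In this intronic case both routes produce the same mature mRNA (EXON1 joined to EXON2); recursive splicing changes the path the spliceosome takes, not the final product.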

The Bradley lab set out to chart new territory by searching for recursive splice sites in shorter introns within human genes. Because recursive splicing had been linked to long introns, and because low sequencing depth made these rare events hard to detect, previous groups first focused on ultra-long intron sequences. The quality and depth of sequencing data are continually improving, and thanks to these advances the Bradley lab attempted to search for recursive splice sites across all introns, even though such events are likely infrequent. Like previous studies, they searched for a minimal 3’ splice site motif immediately followed by a minimal GT 5’ splice site, so that the first splicing event regenerates a 5’ splice site for the second. The first attempt at identifying new recursive splice sites produced more sites than anticipated and had an extremely high false discovery rate (70.4%), as estimated by including “decoy” splice sites in the analysis. Only after several filtering steps were applied to better define the sequence requirements of recursive splice sites were the researchers able to lower the false discovery rate to 2.51% and uncover 100 likely recursive splice sites in human genes. Excitingly, each of these sites differed from those previously mapped within long and ultra-long introns, providing an alternative method for mapping new recursive splice sites. Together, their arduous refinement process to tackle this uncharted territory substantially expanded the set of mapped recursive splice sites in human genes.
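A minimal sketch of the two computational ideas in this paragraph follows: scanning an intron for candidate sites where a minimal YAG 3’ splice site motif (Y = C or T) is immediately followed by GT, and estimating a false discovery rate from decoys. The article does not describe how the decoys were built or which filters were applied, so this sketch assumes the generic target-decoy ratio (decoy calls divided by target calls). The function names and the counts in the usage example are hypothetical and do not reproduce the study’s numbers.

```python
import re

# Y = pyrimidine (C or T). The lookahead lets overlapping candidates all be found.
_YAG_GT = re.compile(r"(?=[CT]AGGT)")


def candidate_recursive_sites(intron: str) -> list[int]:
    """Return 0-based positions inside the intron where ...YAG|GT... could split it.

    Each position is the index of the G in GT, i.e. where the downstream
    portion (and its regenerated 5' splice site) would begin.
    """
    return [m.start() + 3 for m in _YAG_GT.finditer(intron.upper())]


def target_decoy_fdr(n_target_calls: int, n_decoy_calls: int) -> float:
    """Generic target-decoy FDR estimate (an assumption, not the study's exact method).

    Decoy sites should only be called by chance, so the number of decoy calls
    approximates the number of false calls among the target calls.
    """
    if n_target_calls == 0:
        return 0.0
    return min(1.0, n_decoy_calls / n_target_calls)


if __name__ == "__main__":
    intron = "GTAAGTTTTCCTTTAGGTGAGTTTTCTTCCAG"  # hypothetical intron
    print(candidate_recursive_sites(intron))     # [16]
    # Hypothetical counts: a permissive filter vs. a stricter one.
    print(target_decoy_fdr(1000, 700))           # 0.7
    print(target_decoy_fdr(400, 10))             # 0.025
```

The motif scan alone is deliberately permissive, which is why a decoy-based estimate is useful: tightening the filters should shrink decoy calls faster than genuine calls.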

In addition to mapping new recursive splice sites, the Bradley lab further characterized unique features of recursive splicing by expanding on previous observations. Dr. Hoppe shared some of their findings: “We identified a novel location of splicing at the far end of exons, which is sometimes chosen over the conventional 3’ splice site in front of the exon, causing the exon to be excluded from the mRNA transcript (model below). We [also] found that exons ending in a minimal 3’ splice site motif (YAG) have a dramatically lower inclusion rate on average, suggesting that such sites may play a role in regulating alternative splicing genome-wide.” Lastly, “our analyses uncovered a subgroup of these [YAG] sites where two exons exist adjacent to one another,” sites that are termed exonic recursive sites. Such sites have been described by other labs, but “our work provides further evidence of their usage in exon exclusion and suggests an intriguing path of exon birth/death in the genome.” In these cases, “an intronic recursive splice site may be paired with one or more splice sites that develop to generate a novel exon that selection may more easily act upon.”
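The YAG observation can be illustrated with a small calculation. The sketch below assumes that “inclusion rate” is measured as percent spliced in (PSI), the fraction of reads supporting inclusion out of all reads supporting inclusion or skipping; the article does not specify the metric, and the exon sequences and read counts are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Exon:
    name: str
    sequence: str          # exon sequence, 5' to 3'
    inclusion_reads: int   # reads supporting inclusion (hypothetical counts)
    exclusion_reads: int   # reads supporting skipping (hypothetical counts)


def psi(exon: Exon) -> float:
    """Percent spliced in, as a fraction: inclusion / (inclusion + exclusion)."""
    return exon.inclusion_reads / (exon.inclusion_reads + exon.exclusion_reads)


def ends_in_yag(sequence: str) -> bool:
    """True if the exon's last three bases are YAG (Y = C or T)."""
    tail = sequence.upper()[-3:]
    return len(tail) == 3 and tail[0] in "CT" and tail[1:] == "AG"


def mean_psi_by_yag(exons: list[Exon]) -> dict[bool, float]:
    """Average PSI for exons that do (True) or do not (False) end in YAG."""
    groups: dict[bool, list[float]] = {True: [], False: []}
    for exon in exons:
        if exon.inclusion_reads + exon.exclusion_reads == 0:
            continue  # no informative reads, so PSI is undefined
        groups[ends_in_yag(exon.sequence)].append(psi(exon))
    return {key: mean(values) for key, values in groups.items() if values}


if __name__ == "__main__":
    exons = [
        Exon("e1", "GGATCCCAG", 25, 75),   # ends in CAG (YAG)
        Exon("e2", "GGATCCTAG", 50, 150),  # ends in TAG (YAG)
        Exon("e3", "GGATCCGAG", 75, 25),   # ends in GAG (not YAG)
        Exon("e4", "GGATCCAAG", 100, 0),   # ends in AAG (not YAG)
        Exon("e5", "GGATCCCAG", 0, 0),     # no reads: skipped
    ]
    print(mean_psi_by_yag(exons))  # {True: 0.25, False: 0.875}
```

A difference in group means like this is only descriptive; on real data it would be the starting point for asking whether the YAG ending itself drives exon skipping.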

Researchers discovered a new location of recursive splice sites that can lead to splicing at the distal side of the exon (distal recursive splicing) as opposed to the proximal side of the exon (proximal recursive splicing) to exclude the exon from the mature RNA. Image taken from primary publication
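The model in the figure can be sketched with a toy sequence. In this hypothetical example, the middle exon ends in CAG (a YAG motif) and the downstream intron begins with GT, so the exon’s far end forms an AG|GT recursive site. Using the conventional 3’ splice site in front of the exon includes it; using the distal site removes the upstream intron and the exon together, regenerates a 5’ splice site, and the exon is skipped. The sequences and function names are assumptions, and the sketch says nothing about how the cell chooses between the two sites.

```python
# Toy model of conventional vs. distal recursive splicing (hypothetical sequences).

EXON1 = "ATGGCC"
INTRON_A = "GTAAGTCTTTTCCCAG"  # ends in the conventional 3' splice site in front of EXON2
EXON2 = "GATCCTCAG"            # ends in CAG (YAG); with INTRON_B's GT this forms AG|GT
INTRON_B = "GTGAGTTTTCTTCCAG"
EXON3 = "GGCTAA"
PRE_MRNA = EXON1 + INTRON_A + EXON2 + INTRON_B + EXON3


def splice_once(rna: str, donor: int, acceptor_end: int) -> str:
    """Excise rna[donor:acceptor_end]: from a GT 5' splice site through an AG 3' splice site."""
    if rna[donor:donor + 2] != "GT":
        raise ValueError(f"no GT 5' splice site at position {donor}")
    if rna[acceptor_end - 2:acceptor_end] != "AG":
        raise ValueError(f"no AG 3' splice site ending at position {acceptor_end}")
    return rna[:donor] + rna[acceptor_end:]


def conventional(rna: str) -> str:
    d1 = len(EXON1)
    # Join EXON1 to the 3' splice site in front of EXON2 (removes INTRON_A).
    step1 = splice_once(rna, d1, d1 + len(INTRON_A))
    d2 = d1 + len(EXON2)
    # Then remove INTRON_B, joining EXON2 to EXON3.
    return splice_once(step1, d2, d2 + len(INTRON_B))


def distal_recursive(rna: str) -> str:
    d1 = len(EXON1)
    # Join EXON1 to the YAG at the far end of EXON2: INTRON_A and EXON2 leave together.
    step1 = splice_once(rna, d1, d1 + len(INTRON_A) + len(EXON2))
    # INTRON_B's GT now abuts EXON1, regenerating a 5' splice site; remove INTRON_B.
    return splice_once(step1, d1, d1 + len(INTRON_B))


if __name__ == "__main__":
    print(conventional(PRE_MRNA) == EXON1 + EXON2 + EXON3)  # True: exon included
    print(distal_recursive(PRE_MRNA) == EXON1 + EXON3)      # True: exon skipped
```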

“Our work provides a framework for identifying recursively spliced lariats from bulk RNA sequencing with a low false discovery rate, allowing for the identification of recursive splice sites genome-wide with a high degree of confidence that can scale well with increasing sequencing depth and availability,” summarized Dr. Hoppe. Alternative filtering strategies will likely lead to the discovery of additional recursive splice sites, as will applying the Bradley lab pipeline to RNA sequencing data from other cell types and disease-linked samples. Additionally, the Bradley lab’s novel finding of distal recursive splicing expands the known locations of recursive splice sites, which will aid future efforts to map them. “We hope our study spurs further work on splice site choice and the evolution of exons,” concluded Dr. Hoppe.

The spotlighted research was funded by the National Institutes of Health (NIH)/National Heart, Lung, and Blood Institute, NIH/National Cancer Institute, NIH/NIGMS, Blood Cancer Discoveries Grant Program through the Leukemia & Lymphoma Society, Mark Foundation for Cancer Research, Paul G. Allen Frontiers Group, and the Department of Defense Breast Cancer Research Program.

Fred Hutch/University of Washington/Seattle Children's Cancer Consortium member Robert Bradley contributed to this work.

Hoppe ER, Udy DB, Bradley RK. 2023. Recursive splicing discovery using lariats in total RNA sequencing. Life Sci Alliance. 6(7):e202201889.