RR tracks: a new way to measure viral transmission

From the Bedford Lab, Vaccine and Infectious Disease, Public Health and Human Biology Divisions

Most infectious disease researchers are familiar with the story of how Dr. John Snow tracked a devastating cholera outbreak in London in 1854. By marking the houses affected by cholera on a map, he found that most cases clustered geographically around a communal water pump on Broad Street. Cases from outside this block were ultimately traced back to the same water pump, which had indeed become contaminated with Vibrio cholerae. Shutting down this pump helped end the outbreak and ultimately paved the way for the modern sanitation and public health measures we know today.

I like this story because it has a simple moral: when tracking infectious diseases, we often need to focus on what is constant in order to find patterns that could otherwise be lost in a mess of variables.

This is the same principle that Dr. Cécile Tran-Kiem, a post-doc in the Bedford Lab, is using to trace SARS-CoV-2 spread in Washington state. In a new publication in Nature, she and her co-authors use the geographic proximity of identical viral genomic sequences to understand viral transmission patterns.

One of the Bedford Lab’s go-to tools is drawing phylogenetic trees to map viral genome diversity and mutations over time. However, phylogenetic trees are impractical for large-scale pathogen tracking because generating trees and inferring geographic patterns require significant computational power, which limits how many viral sequences can be included in the analysis. Furthermore, the conclusions can be thrown off by uneven sampling in different locations.

So, instead of focusing on how and where the viral genome changes, the authors tracked spread by following identical SARS-CoV-2 sequences across Washington state. The principle is this: a newly infected person is likely to carry a pathogen that is genetically identical to that of the person who infected them, and that person is probably geographically nearby. After all, SARS-CoV-2 does not mutate every time it transmits; in fact, mutations during acute infection are relatively rare. “It’s really about 1 mutation every 2 weeks along a transmission chain,” says Dr. Bedford.
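
To get a feel for what “about 1 mutation every 2 weeks” implies, here is a back-of-the-envelope sketch in Python. It is not a calculation from the paper. It assumes that mutations arrive as a Poisson process at the quoted rate, and the function name and the intervals between infections (3, 5, and 7 days) are hypothetical values chosen only for illustration.

```python
import math

# Rate taken from the quote above: roughly 1 mutation every 14 days.
MUTATION_RATE_PER_DAY = 1 / 14


def prob_identical(days_between_infections, rate_per_day=MUTATION_RATE_PER_DAY):
    """Probability of zero mutations over the interval.

    Assumption: mutations follow a Poisson process, so
    P(no mutation) = exp(-rate * time).
    """
    return math.exp(-rate_per_day * days_between_infections)


# Hypothetical gaps (in days) between one infection and the next.
for days in (3, 5, 7):
    print(f"{days} days apart: P(identical) = {prob_identical(days):.2f}")
```

Under these assumptions, the shorter the gap between linked infections, the more likely the two genomes are to be identical, which is why identical sequences are a reasonable stand-in for closely linked cases.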

Grouping viral sequences into clusters of identical sequences allowed the authors to capture epidemiologically linked infections. More detail on this clustering method and how it can be used to understand viral variant tracking can be found here.

To make these clusters, the authors used genomes obtained through comprehensive genomic surveillance in Washington state conducted by the WA Public Health Lab, UW Virology, and the Seattle Flu Study. This genomic surveillance includes de-identified information on age, vaccination status, and geographic residence down to the zip code.

Out of more than 100,000 SARS-CoV-2 genomes analyzed from this project, there were 17,231 clusters made of two or more identical viral genomes. Mapping these clusters with collection date and location information allowed the researchers to understand how the clusters changed in space and time.
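
The grouping step can be pictured with a minimal Python sketch. This is an illustration, not the authors’ pipeline: the function name, sample IDs, and toy sequences are invented, and it treats two genomes as identical only if their strings match exactly. Real surveillance data would also need a rule for ambiguous bases and incomplete coverage, which this sketch ignores.

```python
from collections import defaultdict


def cluster_identical(records):
    """Group sample IDs whose sequences match exactly.

    `records` is an iterable of (sample_id, sequence) pairs.
    Returns a list of clusters, keeping only those with two or more members.
    """
    by_sequence = defaultdict(list)
    for sample_id, sequence in records:
        by_sequence[sequence].append(sample_id)
    return [ids for ids in by_sequence.values() if len(ids) >= 2]


# Toy, made-up data: six samples, three distinct sequences.
records = [
    ("s1", "ACGTACGT"),
    ("s2", "ACGTACGT"),
    ("s3", "ACGTACGA"),
    ("s4", "ACGTACGT"),
    ("s5", "ACGTTCGA"),
    ("s6", "ACGTACGA"),
]

clusters = cluster_identical(records)
print(clusters)
```

In this toy example, the sample with a unique sequence drops out, and the remaining five samples fall into two clusters. Attaching a collection date and location to each sample ID is what lets clusters like these be followed in space and time.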

The clusters were then used to develop a metric dubbed relative risk (RR), which can be used to understand transmission between locations and subgroups within those locations. The RR is calculated by first measuring how many genetically identical sequence clusters are shared by the subgroups, then comparing this to how many would be statistically expected based on the sequencing effort between the sampled locations.

“A RR greater than 1 means that we are observing more sequences between the two subgroups we are looking at than expected from where sequences are coming from,” explains Dr. Tran-Kiem. “This could be within the same group (by looking at RR(A,A)) or between groups (by looking at RR(A,B)).”

Essentially, RR is a measure of how much identical sequences are enriched in groups A and B, “which we use to quantify how frequent transmission is,” says Dr. Tran-Kiem. For more about the RR method, read the explanation published on the Bedford Lab blog here.
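
For readers who think in code, here is a rough Python sketch of an observed-versus-expected ratio in the spirit of the RR described above. It is one plausible reading of the description, not necessarily the authors’ exact formula. The assumptions are mine: pairs of identical sequences are counted as ordered pairs within each cluster, the expected count comes from each group’s overall share of pairs, and the function names and county labels in the toy data are for illustration only.

```python
from collections import Counter
from itertools import permutations


def pair_counts(clusters):
    """Count ordered pairs of identical sequences by (group_i, group_j).

    `clusters` is a list of clusters; each cluster is a list holding the
    group label (for example, a county) of every sequence in that cluster.
    """
    counts = Counter()
    for groups in clusters:
        # permutations works by position, so two sequences from the same
        # group still form a pair with each other.
        for g_i, g_j in permutations(groups, 2):
            counts[(g_i, g_j)] += 1
    return counts


def relative_risk(counts, a, b):
    """Observed share of A-B pairs divided by the share expected by chance.

    Assumed form: RR(A,B) = n(A,B) * n(all) / (n(A,any) * n(any,B)).
    """
    n_ab = counts[(a, b)]
    n_a = sum(v for (g, _), v in counts.items() if g == a)
    n_b = sum(v for (_, g), v in counts.items() if g == b)
    n_all = sum(counts.values())
    if n_a == 0 or n_b == 0:
        return float("nan")  # a group with no pairs gives no information
    return (n_ab * n_all) / (n_a * n_b)


# Toy, made-up clusters labeled by county.
toy_clusters = [
    ["King", "King", "Pierce"],
    ["King", "King"],
    ["Yakima", "Yakima"],
    ["Pierce", "Yakima"],
]
counts = pair_counts(toy_clusters)
for a, b in [("King", "King"), ("King", "Pierce"), ("King", "Yakima")]:
    print(a, b, round(relative_risk(counts, a, b), 2))
```

Because each group’s total pair count sits in the denominator, a group that simply contributes more sequences does not automatically receive a higher RR. That is the intuition behind comparing the observed count with what the sequencing effort alone would predict.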

Left: a map of Washington state with clusters of identical sequences represented by multicolored bubbles. Text reads: “Identical sequences reflect epidemiologically linked individuals.” Middle: clusters in counties are grouped into pie charts with colors that represent the proportion of each cluster. Text reads: “Their geographical clustering is imprinted by underlying patterns of spread.” Right: black lines of varying thickness are drawn between Washington counties to reflect transmission patterns. Text reads: “This can be used to characterize transmission between groups.”
Geographical clustering of identical viral sequences allows comparison within and between spatial groups. Image adapted from original article.

One strength of the RR metric is a huge computational advantage: running an analysis on a test dataset of 1,300 sequences took a phylogenetic method between 1 and 24 days, but only 3 seconds using the RR framework. Another huge plus is that it accounts for sequencing effort, so it should no longer be impacted by sampling bias. In other words, you can use this RR measure to directly compare one county with high sequencing effort to a county with fewer sequences.

Using RR as a measure of transmission, the authors first analyzed the risk of transmission across different Washington state counties. They found that the RR was higher within counties than between counties, which makes intuitive sense considering that an identical sequence is most likely to spread within a geographically constrained community. Counties that were adjacent to one another had higher RR measures than those that were farther apart, and RR tended to fall as distance increased.

An exception that proves the rule was found between counties on either side of the Cascades: despite being adjacent to one another on a map, Eastern and Western Washington counties had low pairwise RRs. This again makes sense given the physical barrier the Cascades pose to transmission. During different waves of SARS-CoV-2 outbreaks, the authors found with their RR metric that transmission overall flows from Western Washington to Eastern Washington, which is consistent with other published reports.

Where things get really interesting is looking at counties that don’t fit the usual patterns. For example, Mason County in Western Washington appears to share more clusters of identical viral sequences with both Franklin and Walla Walla counties (both more than 300 miles away and across the Cascades) than with counties adjacent to Mason. How could these three counties share so many identical viral sequences?

The authors hypothesize that male prisons may be one cause: the Mason County prison is an intake center and transfer hub with frequent prisoner and staff movement to and from Franklin and Walla Walla counties. To test this hypothesis, they focused on the postcodes where these prisons are located and found that the RRs between these postcodes are larger than those for adjacent postcodes. This suggests that prisons may be an under-recognized network of SARS-CoV-2 transmission.

The authors did much more in the publication than I can cover in this article, including analyzing patterns of viral transmission across age groups over geographical space. However, there’s a lot more work that they’d like to do in the future. “We’re interested in pursuing method development leveraging identical sequences (or genetically proximal sequences in general) to both better account for asymmetry in transmission (and better be able to say something about the directionality of transmission) and to characterize which groups tend to contribute more to transmission,” Dr. Tran-Kiem says.

In the 171 years since John Snow tracked cholera through London, public health measures have evolved substantially. However, as the COVID pandemic showed, scientists have not fully solved the highly complex problem of how pathogens spread through a population. This latest work by Tran-Kiem and co-authors helps reduce the number of variables in play and is an exciting addition to the field of molecular epidemiology.


Fred Hutch/University of Washington/Seattle Children’s Cancer Consortium Member Dr. Lea Starita contributed to this research.

The spotlighted research was funded by the National Institutes of Health, the Centers for Disease Control and Prevention, Gates Ventures, the Howard Hughes Medical Institute, Fast Grants, and the Washington Department of Health.

Tran-Kiem C, Paredes MI, Perofsky AC, Frisbie LA, Xie H, Kong K, Weixler A, Greninger AL, Roychoudhury P, Peterson JM, Delgado A, Halstead H, MacKellar D, Dykema P, Gamboa L, Frazar CD, Ryke E, Stone J, Reinhart D, Starita L, Thibodeau A, Yun C, Aragona F, Black A, Viboud C, Bedford T. Fine-scale patterns of SARS-CoV-2 spread from identical pathogen sequences. Nature. 2025 Apr;640(8057):176-185.

Hannah Lewis

Hannah Lewis is a postdoctoral research fellow with Jim Boonyaratanakornkit’s group in the Vaccine and Infectious Disease Division (VIDD). She is developing screens to find rare B cells that produce protective antibodies against human herpesviruses. She obtained her PhD in molecular and cellular biology from the University of Washington.