Every fall, millions of people roll up their sleeves for the seasonal flu shot, and many wonder why the ritual is repeated year after year. The reason is not a short-lived immune memory but a virus that is constantly changing, reshaping itself (specifically its surface protein, hemagglutinin) just enough to stay ahead of our defenses. Because vaccine strains must be selected months in advance, scientists are increasingly turning to evolutionary forecasting models that use viral genetic data to predict which influenza variants are most likely to succeed next.
Just as meteorologists struggle to say which day it will rain next week, viral evolution forecasting models must balance predicting far enough ahead to be useful against predicting accurately enough to be reliable. Flu forecasters face a much longer horizon than a weather forecaster's two weeks: they are trying to look as far as a year ahead, the lead time needed to pick vaccine strains. But the farther out the forecast, the harder it is to get right. The speed of data sharing adds another layer of difficulty: when viral sequences are shared late, models work with slightly outdated information, which makes predictions tougher.
A recent study in eLife from the Bedford lab tackles these challenges by systematically testing how prediction accuracy changes when forecasts are made closer to the flu season and when viral genome data are shared more quickly. Using influenza genetic data, the authors simulated forecasts based on the strains circulating at the time each prediction would have been made, then compared those predictions to what actually happened months later.
Led by staff scientist Dr. John Huddleston, the study found that forecasts made closer to the flu season tracked real viral changes much more closely, and that faster data sharing mattered most for these shorter-term forecasts. As Huddleston notes, these results raise new geographic questions, such as “in which parts of the world faster turnaround [in viral sequencing] would most improve our estimates of global influenza populations,” and whether expanding sequencing capacity in key regions could have an outsized impact.
Forecasting models are designed to predict which current viral groups will grow or shrink in the population. These predictions depend on knowing how common each group is at the starting point, and those starting frequencies turn out to be surprisingly sensitive to delays in data sharing. The Bedford lab found that when viral sequences are submitted months late, the frequencies of fast-growing groups tend to be underestimated, sometimes by a wide margin. Cutting those delays from about three months to one month significantly reduced this bias and made frequency estimates more consistent. In other words, faster data sharing doesn’t just help forecasts in the long run; it improves the baseline picture of the virus that all predictions are built on.
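To see why late submissions shortchange fast-growing groups, consider a toy illustration in Python. This is a minimal sketch, not the model used in the study: it assumes a single hypothetical viral group whose true frequency follows a logistic curve, and it assumes that a sample collected on a given day only becomes visible after a fixed submission delay. The function names, growth rate, midpoint, and 30-day averaging window are all invented for illustration.

```python
import math


def true_frequency(day, growth_rate=0.03, midpoint_day=180):
    """True share of a hypothetical fast-growing viral group on a given day.

    Assumes logistic growth; growth_rate and midpoint_day are made-up values.
    """
    return 1.0 / (1.0 + math.exp(-growth_rate * (day - midpoint_day)))


def estimated_frequency(today, delay_days, window_days=30):
    """Frequency estimate built only from samples already submitted by `today`.

    A sample collected on day t is assumed to become visible on day
    t + delay_days, so the newest usable sample dates from today - delay_days.
    The estimate averages the true frequency over the last `window_days`
    of visible collection days.
    """
    last_visible_day = today - delay_days
    days = range(last_visible_day - window_days + 1, last_visible_day + 1)
    return sum(true_frequency(d) for d in days) / len(days)


today = 180
truth = true_frequency(today)
for delay in (90, 30):
    estimate = estimated_frequency(today, delay)
    print(
        f"delay={delay:>2} days: estimate={estimate:.2f}, "
        f"truth={truth:.2f}, bias={estimate - truth:+.2f}"
    )
```

In this toy setup both delays underestimate the growing group, because the visible data describe the virus as it was weeks ago, but the one-month delay lands much closer to the truth than the three-month delay. Real submission delays vary from sample to sample and region to region, so the actual bias is messier than this fixed-delay picture suggests.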