Numbers don’t lie, but sometimes our brains do

Here’s an equation to solve: What do you get when you add research jargon to statistics and multiply by the human brain’s tricks of perception? The answer, for many of us, is confusion.

Statistics are written in the language of math, a subject in which many people have received an unfortunately insufficient education. But understanding statistics is critical to weighing everyday risks, interpreting medical information and (perhaps less life-or-death) winning arguments with friends about sports.

We spoke with three top statisticians at Fred Hutchinson Cancer Research Center about the statistics they often see misinterpreted by the public — and misreported in science news stories, which can spread misunderstandings far and wide.

Here are some of their insights along with resources that can help, including pro tips for those already more experienced with numbers.

Risk reporting: Absolute vs. relative risk

Many people have a hard time wrapping their heads around risks and probabilities, and this can threaten our health and safety.

Our brains are wired to downgrade familiar risks, such as the risk posed by traveling in a car, which most of us do every day without fear.

Weak fundamental math skills don’t help. Many people see a 10-in-100 chance as bigger than a one-in-10 chance, when in fact they are equivalent.

Research shows that even doctors frequently misinterpret risks directly relevant to their medical specialty. This misunderstanding can affect patient care by changing how likely a doctor is to recommend particular treatments or tests.

While there are many ways to improve risk communication, here’s a simple one: use absolute risks, not just relative risks.

What does this mean? Let’s say that there is an updated medication that is effective for its intended purpose but doubles the risk of a potentially deadly side effect compared to the older version. Put another way, the risk of that dangerous side effect, on average, has increased 100%.

Sounds scary.

That is the relative risk — how the risk gets bigger or smaller in different situations. But what that example leaves out is the size of the risk to begin with.

The same risk in absolute terms: Among 7,000 people who take the older version of the medication, one of them will have the side effect. Among 7,000 people who take the newer version, two will have the side effect.

Grid of 7,000 dots animated to switch between 1 and 2 dots highlighted in black (all others are gray) — This 7,000-dot grid illustrates the difference between one and two in 7,000, which represents a small difference in absolute risk but a large difference in relative risk. Graphic by Jim Woolace / Fred Hutch

The same change in risk level looks quite different in absolute terms.
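For readers who like to check the arithmetic, here is a minimal Python sketch of the two ways to describe that change. The function name `risk_summary` and its output labels are our own illustrative choices; the only inputs are the one-in-7,000 and two-in-7,000 figures above.

```python
from fractions import Fraction

def risk_summary(baseline_cases, new_cases, group_size):
    """Describe one change in risk in both relative and absolute terms."""
    baseline = Fraction(baseline_cases, group_size)
    new = Fraction(new_cases, group_size)
    relative_risk = new / baseline
    return {
        "relative_risk": relative_risk,                       # how many times bigger
        "relative_increase_pct": (relative_risk - 1) * 100,   # the "scary" number
        "absolute_increase": new - baseline,                  # extra risk per person
        "extra_cases_per_group": new_cases - baseline_cases,  # extra cases per group
    }

summary = risk_summary(baseline_cases=1, new_cases=2, group_size=7000)
print(f"Relative risk: {float(summary['relative_risk']):.1f}x")
print(f"Relative increase: {float(summary['relative_increase_pct']):.0f}%")
print(f"Absolute increase: {float(summary['absolute_increase']) * 100:.4f} percentage points")
print(f"Extra cases per 7,000 people: {summary['extra_cases_per_group']}")
```

The same data produce a 100% relative increase and an absolute increase of roughly one-hundredth of a percentage point, which is why reporting only one of the two can mislead.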

This example is real, by the way, and illustrates how risks communicated exclusively in relative terms can mislead. In the 1990s, researchers discovered that the risk of blood clots among women taking “third-generation” birth control pills was two in 7,000, a small increase in absolute risk over the one-in-7,000 chance among those taking second-generation pills. That finding was widely reported in relative, not absolute, terms. Due to this and other factors in how the findings were communicated, large numbers of women and girls, fearing clots, then switched to less-effective forms of contraception, which was followed by an uptick in pregnancies. (Ironically, out of 7,000 pregnant women, between 3.5 and 14 of them will develop blood clots before or after birth, and clots are a leading cause of death among pregnant and postpartum women.)

That’s not to say that a small change in absolute risk isn’t real, or isn’t important. But it is important to understand what that risk means in context, so people can make better-informed medical decisions.

Fred Hutch biostatistician Dr. Ruth Etzioni is an expert in determining the benefits and harms of cancer screening tests, like PSA tests for prostate cancer and mammograms for breast cancer, a field in which communicating risk is particularly important. She emphasized that even something that changes the absolute risk of a disease a tiny bit can still be meaningful.

Photo of Ruth Etzioni speaking at a conference podium. "NWMBCC" appears on a screen behind her. — Fred Hutch faculty member and biostatistician Dr. Ruth Etzioni, shown here speaking at the Northwest Metastatic Breast Cancer Conference on Sept. 23, 2019, studies cancer screening and early detection. Photo by Robert Hood / Fred Hutch News Service

“A little plus a little plus a little … ” Etzioni said. “So I have my colonoscopy; I reduce my risk of colon cancer death by a little more, maybe 30%. And then I go on a health kick and I reduce my risk of cancer by more, absolutely; I reduce my risk of other things too.

“I think that the things that we do for our health, individually, the absolute might seem trivial, but they can add up.”

Pro tip: That adding-up effect, especially over time, is key for understanding the long-term impact of changes that alter a person’s lifetime risk of disease. Research studies, especially of prevention/risk-reduction and early-detection interventions, typically cite a time period over which researchers measured a change in risk, e.g., “Over X years, screening reduced the risk of death from cancer by Y amount.” Knowing that time period is critical to understanding an intervention's impact over the longer term of a lifetime.
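To see why the time period matters, here is a rough Python sketch. It rests on a strong simplifying assumption that is ours, not the researchers': the yearly risk is the same every year and each year is independent. The risk figures are hypothetical and chosen only for illustration.

```python
def cumulative_risk(annual_risk, years):
    """Chance of at least one event over `years`, assuming the same
    independent annual risk every year (a simplifying assumption)."""
    return 1 - (1 - annual_risk) ** years

# Hypothetical numbers, not taken from any study.
baseline_annual = 0.002                 # 2-in-1,000 chance per year without the intervention
reduced_annual = baseline_annual * 0.7  # the same risk after a 30% relative reduction

for years in (1, 10, 30):
    without = cumulative_risk(baseline_annual, years)
    with_it = cumulative_risk(reduced_annual, years)
    print(f"{years:>2} years: {without:.2%} without vs. {with_it:.2%} with "
          f"(absolute difference {without - with_it:.2%})")
```

Under these assumptions, the absolute difference between the two groups grows as the time window lengthens, so a result quoted "over X years" cannot be compared directly with one quoted over a different span.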

‘Sophomore slump is a myth’

No matter what’s being measured, the numbers in the data set are almost never going to be the same.

Why are the numbers different? Is it because there’s some fundamental difference between the things being measured? Or are the differences just due to random chance?

We are hardwired to find patterns and make meaning. When it comes to making sense of data, this tendency can be problematic, said Fred Hutch biostatistician Dr. Peter Gilbert, who oversees statistical analyses of international trials of new HIV vaccines.

A key statistical concept he wishes people understood better is the idea of “regression to the mean.”

Gilbert illustrated it with an example — not from cancer or HIV research, but from baseball.

Don't take our word for it: Peruse the rookies' regression to the mean yourself at this table we compiled. Highlighted players are those whose batting averages increased the year following their rookie years.

Data sources: wikipedia.org; baseball-reference.com

Through 2018, there have been 30 non-pitcher MLB Rookies of the Year with rookie-season batting averages of an impressive .300 or higher. Among this group of stellar sluggers, all but four of them had lower batting averages the following year.

Choose any stat and pull out any other high-performing subgroup of these rookies, and this notorious pattern shows up again and again: the “sophomore slump.”

“It’s not a ‘sophomore slump,’” Gilbert said. “That’s a myth. It’s regression to the mean.”

Photo of Peter Gilbert speaking and gesturing at a screen full of graphs — Dr. Peter Gilbert, shown here at a meeting of the HIV Vaccine Trials Network on Oct. 4, 2016, is the principal investigator of the Statistical Data Management Center of the Fred Hutch-headquartered HVTN. Photo by Robert Hood / Fred Hutch News Service

That is, all these players have some real skill at the bat, and over a long time, their batting averages will reflect it. But fluctuation is to be expected: Even a team of clones would have a variety of batting averages. That’s driven by sheer random chance — colloquially known as luck.

Over the short span of two seasons, luck’s role is particularly apparent. These 30 standout players happened to have rookie years with batting averages on the high end of their natural fluctuation: luckier seasons. Thus, it’s likely their next years’ averages are lower, closer to what will end up being their career batting averages. That is a regression (return) to the mean (average).
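The "team of clones" idea can be tried out in a short simulation. This is a sketch under made-up assumptions, not an analysis of real MLB data: every simulated player has exactly the same hypothetical skill (a .270 chance of a hit per at-bat) and 500 at-bats per season, so any difference between seasons is pure luck.

```python
import random

TRUE_SKILL = 0.270  # hypothetical chance of a hit per at-bat, identical for every clone
AT_BATS = 500       # hypothetical at-bats per season
N_PLAYERS = 1000    # size of the simulated league of clones

def season_average(rng, skill=TRUE_SKILL, at_bats=AT_BATS):
    """Simulate one season: each at-bat is a hit with probability `skill`."""
    hits = sum(rng.random() < skill for _ in range(at_bats))
    return hits / at_bats

rng = random.Random(2018)  # fixed seed so the run is repeatable
rookie = [season_average(rng) for _ in range(N_PLAYERS)]
sophomore = [season_average(rng) for _ in range(N_PLAYERS)]

# Pick out the "standout rookies" (.300 or better), then see what happens next.
standouts = [i for i, avg in enumerate(rookie) if avg >= 0.300]
declined = [i for i in standouts if sophomore[i] < rookie[i]]

print(f"Standout rookies: {len(standouts)} of {N_PLAYERS}")
print(f"Lower average the next year: {len(declined)} of {len(standouts)}")
```

Because the clones' skill never changes, most of the simulated standouts post a lower average the following season with no slump involved. Selecting the luckiest first seasons is enough to produce the pattern.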

Although random forces can trip us up, they also can be harnessed in the form of a randomized study, a powerful source of truth.

Randomized studies, in which patients are randomly assigned (for example) to receive Treatment A or Treatment B, provide the strongest form of scientific evidence there is — especially when they enroll hundreds or thousands of patients per group. You can feel good about trusting the results from that kind of study.

Pro tip: Fred Hutch statisticians often see randomized studies in which scientists try to make comparisons between nonrandomized subgroups of patients — and that can be a problem. For example, say a study is pulling out and comparing the quality of life of survivors a year out from experimental treatment. If half of patients on Treatment A survive to one year but only a tiny fraction of patients on Treatment B do, unknown factors may be driving any observed differences in quality of life between the survivor groups. Maybe the few survivors of Treatment B are unusually hearty, so that anything notable in their quality of life is due to that underlying heartiness rather than the treatment they received.

Mean vs. median

A mean, or average, is all the numbers in a data set added up, and then divided by how many numbers there are.

A median is a midpoint: the number right at the middle of a data set so that half the data points are bigger than it and half are smaller.

Researchers typically report their findings using a median rather than a mean, because any outliers or asymmetries in the data will pull the mean far to one side, giving a misleading summary of the overall data set. (Imagine measuring the mean and median wealth of a group of 10 people, nine of whom are baristas and one of whom is the Queen of England.)
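The baristas-and-queen thought experiment is easy to run in Python. The wealth figures below are invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical net worths: nine baristas and one monarch (made-up figures).
barista_wealth = [30_000] * 9
queen_wealth = [500_000_000]
group = barista_wealth + queen_wealth

print(f"Mean wealth:   {mean(group):,.0f}")
print(f"Median wealth: {median(group):,.0f}")
```

With these made-up numbers, the mean comes out above 50 million while the median stays at 30,000, which is a far better description of the typical person in the room.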

Means, medians and summary statistics, oh my

In making sense of the data from a research study, the basic tools are summary statistics: averages and other numbers that summarize the overall direction of the data and the basic differences between groups.

Fred Hutch statistician Dr. Mary Redman leads statistical analyses for large national clinical trials in lung cancer, so she keeps tabs on how clinical trial results get reported. “One thing I think is oftentimes misconstrued is what different summary measures mean,” Redman said.

For example, a clinical trial finds that people who receive a new treatment have a median survival time of one year. Redman will see that misinterpreted as: Everyone makes it to one year and then dies.

photo of Mary Redman sitting at a table and speaking to a person sitting across from her — Fred Hutch biostatistician Dr. Mary Redman speaks at a meeting of the Hutch's Lung Cancer Specialized Program of Research Excellence on Nov. 14, 2019. Photo by Robert Hood / Fred Hutch News Service

That’s extremely unlikely (see sidebar). Redman stressed that a median, or any one-number summary, isn’t enough to understand how a data set looks. Don’t focus on one number, and don’t trust stories about research that do. “Life is complicated,” Redman said.

This graph of three different data sets illustrates the problem:

This animated gif shows a graph of three different data sets displayed in histogram format. The green data set contains a wide spread of data points from below 0 to above 100, centered around 50. The dark blue set appears as a spike centered around 50. The teal set has a smaller spike around 48 with a few data trailing up to around 80. — Three data sets with the same median can look very different. In dark blue, all the data points are clustered close to the center: Most of the numbers in that set are at or close to 50. In green, the data points are spread out more or less evenly on both sides. In teal, there’s a cluster towards the center, with some tailing out toward the right. Graphic by Jim Woolace / Fred Hutch

All three data sets have the same median (50 in this case). But you can see how different they are in their overall shape, with dots clustered or spread out, arranged evenly or lopsidedly.
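A tiny Python sketch makes the same point with numbers. The three lists below are invented stand-ins, not the data behind the graphic: one tightly clustered, one evenly spread and one with a tail to the right.

```python
from statistics import mean, median

# Hypothetical data sets, each built to have a median of 50.
data_sets = {
    "clustered": [48, 49, 50, 50, 50, 51, 52],
    "spread":    [5, 20, 35, 50, 65, 80, 95],
    "skewed":    [46, 47, 48, 50, 60, 70, 80],
}

for name, values in data_sets.items():
    print(f"{name:>9}: median={median(values)}, mean={mean(values):.1f}, "
          f"min={min(values)}, max={max(values)}")
```

All three lists share a median of 50, yet their ranges differ widely, and in the skewed list the mean is pulled above the median by the long right tail.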

Why is this important? Let’s say the graphs above show how long patients survive a particular cancer. Each dot is now a real person. The farther the dot is to the right, the longer that person lives.

Now, let’s zoom in on one person.

After his diagnosis with the cancer mesothelioma in 1982, influential scientist and writer Dr. Stephen Jay Gould asked his doctor to point him to the best research studies about his cancer. She demurred. Gould went looking anyway. As he described in his essay “The Median Isn’t the Message,” Gould was blindsided by what he found:

“The literature couldn't have been more brutally clear: mesothelioma is incurable, with a median mortality of only eight months after discovery. I sat stunned for about fifteen minutes, then smiled and said to myself: So that's why they didn't give me anything to read. Then my mind started to work again, thank goodness.”

Gould then learned that survival from mesothelioma was more like the teal graph above: it had a long tail of fewer people who live years after diagnosis. And for a number of reasons, he had a decent chance of being in that group of longer-term survivors.

Gould’s scientific training helped him interpret the eight-month median, “… and I am convinced that it played a major role in saving my life,” he wrote. Propelled by the hope sparked by his knowledge, he enrolled in a clinical trial of a new treatment.

He died 20 years later, not of mesothelioma.

More statistics to watch out for

Here are some more statistics Fred Hutch experts say to watch out for, questions to ask and resources that can help:

Look at the metric the researchers use to report the benefit of a treatment or another intervention. It’s worth taking a close look at what the metric is really measuring and asking if something else is going on. Example: a study measures whether Treatment A or Treatment B reduces cancer spread (metastasis). The results show that there are fewer patients with cancer metastasis on Treatment A compared to B. It’s tempting to conclude that Treatment A is better, but the data don’t prove that. Perhaps Treatment A is so toxic it kills many patients outright, before their cancer can spread.

Clinical studies use a slew of metrics for measuring patient outcomes that can be tricky to interpret. The health news criticism website healthnewsreview.org has guides for understanding types of metrics like composite endpoints, surrogate endpoints and more. And here’s a table compiled from FDA guidance, comparing common outcome measures used in cancer clinical trials, such as overall survival and progression-free survival.

What’s the population being studied? Are the patients enrolled in the study representative of real-life patients with the disease? The conclusions being drawn from the study, both by researchers and journalists, should be limited to the types of people included in the study.

“Statistically significant” does not mean “important.” (Similarly, a result that doesn’t reach statistical significance is not proof that the hypothesis is wrong; the study simply did not find evidence for it.) The science magazine Undark covered the problems with “statistical significance” and its interpretation recently.

Don’t get frustrated by studies with different numbers but similar conclusions. “When science is not giving a simple answer, it’s not the fault of science, it’s just that the topic is complicated,” Etzioni pointed out. There are different ways to do a study of the same topic that can lead to slightly different conclusions: X increases your risk of cancer by two times, by three times, by hardly anything at all. “You might get frustrated with the different numbers, but if they’re in the same direction, then maybe that’s what you want to take away,” Etzioni advised.

Savvy readers and science reporters: Do you have other go-to resources for understanding math and statistics used in research studies? Send them to me at skeown@fredhutch.org or Tweet them to me at @sejkeown and I might add links to them into this story.
