How statistics skew research results

19 Jun, 2012 at 23:23 | Posted in Statistics & Econometrics | 3 Comments

“Most patients using the new analgesia reported significantly reduced pain.”

Such research findings sound exciting because the word significant suggests important and large. But researchers often use the word with a narrow statistical meaning that has nothing to do with importance.

Consider this statement – a change is statistically significant if we are unlikely to get results like those observed (or even more extreme), assuming the treatment under study actually has no effect.

If you find that difficult to understand, you’re in good company. Statistical significance testing relies on weird backward logic, and there’s clear evidence that most students and many researchers don’t understand it.

Another problem is that statistical significance is very sensitive to how many people we observe. A small experiment studying only a few patients probably won’t identify even a large effect as statistically significant. On the other hand, a very large experiment is likely to label even a tiny, worthless effect as statistically significant.
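This sample-size sensitivity is easy to see with a little arithmetic. Here is a minimal sketch (the effect size and sample sizes are illustrative assumptions, not figures from the article): a two-sided z-test applied to the very same small standardized effect, first in a small study and then in a large one.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(effect_size, n):
    """p-value of a two-sided z-test for a standardized mean
    difference `effect_size` observed in a sample of size n."""
    z = effect_size * sqrt(n)
    return 2 * (1 - NormalDist().cdf(z))

# The same modest effect (0.2 standard deviations) in two studies:
print(round(two_sided_p(0.2, 25), 3))    # small study:  0.317 – "not significant"
print(round(two_sided_p(0.2, 2500), 6))  # large study:  0.0   – "highly significant"
```

The effect is identical in both cases; only the number of patients changed. That is why a verdict of "significant" by itself says nothing about how large or important an effect is.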

For this and other reasons, it’s far better to avoid statistical significance as a measure and use estimation, an alternative statistical approach that’s well known, but sadly, little used.

Estimation tells us things such as “the average reduction in pain was 1.2 ± 0.5 points on the 10-point pain scale” (1.2 plus or minus 0.5). That’s far more informative than any statement about significance. And we can interpret the 1.2 (the average improvement) in clinical terms — in terms of how patients actually felt.

The “± 0.5” tells us the precision of our estimate. Instead of 1.2 ± 0.5 we could write 0.7 to 1.7. Such a range is called a confidence interval. The usual convention is to report 95% confidence intervals, which mean we can be 95% confident the interval includes the true average reduction in pain. That’s a highly informative summary of the findings.
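The arithmetic behind such an interval is simple. A minimal sketch (the pain scores below are hypothetical, invented only to illustrate the calculation; a t-based interval would be slightly wider for samples this small):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci95(data):
    """Approximate 95% confidence interval for the mean, using the
    normal critical value 1.96 (a z-interval)."""
    m = mean(data)
    margin = 1.96 * stdev(data) / sqrt(len(data))
    return m - margin, m + margin

# Hypothetical pain-reduction scores on a 10-point scale
scores = [0.5, 1.8, 1.0, 2.1, 0.9, 1.4, 0.7, 1.6]
lo, hi = ci95(scores)
print(f"mean reduction {mean(scores):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting the estimate together with its interval carries everything a significance verdict does, plus the size of the effect and the precision with which it was measured.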

We have published evidence that confidence intervals prompt better interpretation of research results than significance testing.

So why has statistical significance testing become entrenched in many disciplines, and why is it widely used in medicine and biosciences? One reason may be that saying something is significant strongly suggests importance, or even truth — even though statistical significance doesn’t tell us either.

Another possible reason is that confidence intervals are often embarrassingly wide. It’s hardly reassuring to report that the average improvement was 12 ± 9, or even 12 ± 15. But such wide intervals accurately report the large amount of uncertainty in research data.

Geoff Cumming


  1. […] there already are far better and more relevant testing that can be done (see e. g. here and  here)- it is high time to consider what should be the proper function of what has now really become a […]

  2. […] already are far better and more relevant testing that can be done (see e. g. my posts here and here) – it is high time to consider what should be the proper function of what has now really […]

  3. There are many problems with the use of statistical significance, and Cumming points elegantly to some of them. The confidence interval is, as he states, often so wide that few researchers will report it. Then we still have the problem of presenting results. Maybe it is better, then, to introduce the standard deviation as a possible way out. Its qualities must be explained better than they are now, of course.
