Statistics — a science in deep crisis

11 March, 2016 at 18:57 | Posted in Statistics & Econometrics | 3 Comments

As most of you are aware … there is a statistical crisis in science, most notably in social psychology research but also in other fields. For the past several years, top journals such as JPSP, Psych Science, and PPNAS have published lots of papers that have made strong claims based on weak evidence. Standard statistical practice is to take your data and work with it until you get a p-value of less than .05. Run a few experiments like that, attach them to a vaguely plausible (or even, in many cases, implausible) theory, and you got yourself a publication …

The claims in all those wacky papers have been disputed in three, mutually supporting ways:

1. Statistical analysis shows how it is possible — indeed, easy — to get statistical significance in an uncontrolled study in which rules for data inclusion, data coding, and data analysis are determined after the data have been seen …

Researchers do cheat, but we don’t have to get into that here. If someone reports a wrong p-value that just happens to be below .05, when the correct calculation would give a result above .05, or if someone claims that a p-value of .08 corresponds to a weak effect, or if someone reports the difference between significant and non-significant, I don’t really care if it’s cheating or just a pattern of sloppy work.

2. People try to replicate these studies and the replications don’t show the expected results. Sometimes these failed replications are declared to be successes … other times they are declared to be failures … I feel so bad partly because this statistical significance stuff is how we all teach introductory statistics, so I, as a representative of the statistics profession, bear much of the blame for these researchers’ misconceptions …

3. In many cases there is prior knowledge or substantive theory that the purported large effects are highly implausible …

Researchers can come up with theoretical justifications for just about anything, and indeed research is typically motivated by some theory. Even if I and others might be skeptical of a theory such as embodied cognition or himmicanes, that skepticism is in the eye of the beholder, and even a prior history of null findings (as with ESP) is no guarantee of future failure: again, the researchers studying these things have new ideas all the time … I do think that theory and prior information should and do inform our understanding of new claims. It’s certainly relevant that in none of these disputed cases is the theory strong enough on its own to hold up a claim. We’re disputing power pose and fat-arms-and-political-attitudes, not gravity, electromagnetism, or evolution.

Andrew Gelman
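
Gelman's first point, that significance is easy to get when the analysis is chosen after the data have been seen, can be illustrated with a small simulation. What follows is a rough sketch of my own, not anything taken from Gelman's text. It assumes a deliberately simple setup: two groups with no true difference, ten independent outcome measures per study, known unit variance (so a plain z-test applies), and a researcher who reports whichever outcome gives the smallest p-value. All function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_study_p_values(rng, n_per_group=20, n_outcomes=10):
    """Simulate one study in which the true effect is zero for every outcome.

    Assumption: outcomes are independent N(0, 1) draws in both groups,
    so a z-test with known variance is exact under the null.
    """
    p_values = []
    for _ in range(n_outcomes):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(a) / n_per_group - sum(b) / n_per_group
        z = diff / math.sqrt(2 / n_per_group)
        p_values.append(two_sided_p(z))
    return p_values


def false_positive_rates(n_studies=2000, seed=0, alpha=0.05):
    """Compare a pre-specified analysis with a pick-the-best analysis."""
    rng = random.Random(seed)
    fixed_hits = 0     # outcome chosen before seeing the data
    flexible_hits = 0  # outcome chosen after seeing the data
    for _ in range(n_studies):
        ps = null_study_p_values(rng)
        if ps[0] < alpha:
            fixed_hits += 1
        if min(ps) < alpha:
            flexible_hits += 1
    return fixed_hits / n_studies, flexible_hits / n_studies


if __name__ == "__main__":
    fixed, flexible = false_positive_rates()
    print(f"pre-specified outcome: {fixed:.3f}")
    print(f"best of ten outcomes:  {flexible:.3f}")
```

Under these assumptions the pre-specified analysis should produce false positives at close to the nominal 5 percent rate, while the pick-the-best analysis should land near 1 - 0.95^10, or about 40 percent. Real studies have correlated outcomes and subtler choices (exclusion rules, coding, covariates), so the exact number will differ, but the direction of the distortion is the same.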



  1. Is it statistics or psychology that is in a deep crisis?

  2. Even Milton Friedman knew that the adequacy of a theory must be judged by examining the concordance of its logical consequences with the phenomena the theory is designed to explain, not by assessing the realism of its assumptions.

  3. Standard hypothesis testing turns falsification on its head. Instead of testing the hypothesis in question, a different hypothesis is tested, and its disconfirmation is taken as evidence confirming the hypothesis in question. So it is, but confirmatory evidence is weak.
