Put null hypothesis significance testing where it belongs – in the garbage can!

26 Jun, 2012 at 17:40 | Posted in Statistics & Econometrics | 2 Comments

A couple of weeks ago I had a very interesting luncheon discussion with Professor Deirdre McCloskey about her controversy with Kevin Hoover over significance testing. It got me thinking about where the fetish status of significance testing comes from, and why we are still teaching and practising it despite its obvious inadequacies.

A non-trivial part of teaching statistics consists of teaching students to perform significance tests. A problem I have noticed repeatedly over the years, however, is that no matter how carefully you explain what the probabilities generated by these tests (p values) really are, most students still misinterpret them.

A couple of years ago I gave a statistics course for the Swedish National Research School in History, and on the exam I asked the students to explain how p values should be correctly interpreted. The correct definition is p(data|null hypothesis): the probability of obtaining data at least as extreme as those observed, given that the null hypothesis is true. Yet a majority of the students misinterpreted the p value either as the probability of a sampling error (which is wrong, since the very computation of the p value is based on the assumption that sampling error is what makes the sample statistic differ from the null hypothesis) or as the probability that the null hypothesis is true, given the data (which is also wrong, since that is p(null hypothesis|data) rather than the correct p(data|null hypothesis)).
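
To make the difference between p(data|null hypothesis) and p(null hypothesis|data) concrete, here is a minimal Python sketch. The data (60 heads in 100 coin tosses), the single alternative hypothesis (a heads probability of 0.6) and the 50/50 prior are all hypothetical assumptions chosen only for illustration; the posterior probability depends entirely on them, which is exactly why it cannot be read off a p value.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(k: int, n: int, p0: float) -> float:
    """p(data at least as extreme | H0): P(X >= k) when X ~ Binomial(n, p0)."""
    return sum(binom_pmf(i, n, p0) for i in range(k, n + 1))


def posterior_h0(k: int, n: int, p0: float, p1: float, prior_h0: float) -> float:
    """p(H0 | data) via Bayes' theorem, for a point null p0 against a
    single (hypothetical) point alternative p1 with prior P(H0) = prior_h0."""
    weighted_h0 = prior_h0 * binom_pmf(k, n, p0)
    weighted_h1 = (1 - prior_h0) * binom_pmf(k, n, p1)
    return weighted_h0 / (weighted_h0 + weighted_h1)


# Hypothetical data: 60 heads in 100 tosses; H0: the coin is fair (p = 0.5).
n, k = 100, 60
p_value = one_sided_p_value(k, n, p0=0.5)
posterior = posterior_h0(k, n, p0=0.5, p1=0.6, prior_h0=0.5)

print(f"p(data|H0) = {p_value:.4f}")    # the p value
print(f"p(H0|data) = {posterior:.4f}")  # what students often think the p value is
```

Under these assumptions the p value falls below the conventional 0.05 threshold, while the posterior probability of the null hypothesis is several times larger. Change the prior or the alternative and the posterior moves, but the p value does not.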

This is not to be blamed on students’ ignorance, but rather on significance testing not being particularly transparent (conditional probability inference is difficult even for those of us who teach and practise it). Many researchers fall prey to the same mistakes. So, given that it is in any case very unlikely that any population parameter is exactly zero, and that, contrary to assumption, most samples in social science and economics are neither random nor of the right distributional shape, why continue to press students and researchers to do null hypothesis significance testing, testing that relies on a weird backward logic that students and researchers usually don’t understand?

Statistical significance doesn’t say that something is important or true. And since there are already far better and more relevant tests that can be done (see e.g. here and here), it is high time to give up on this statistical fetish.
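
That significance is not importance can be sketched with a two-sample z-test in Python. Everything here is a hypothetical illustration: a fixed difference in means of 0.01 standard deviations, a known common standard deviation of 1, and the simplifying assumption that the observed difference equals the true one, so that only the sample size changes.

```python
from math import erfc, sqrt


def two_sample_z_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p value of a z-test for a difference in two means,
    assuming a known common standard deviation and equal group sizes."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return erfc(abs(z) / sqrt(2))


effect = 0.01  # hypothetical, substantively negligible difference (in SD units)
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_p_value(effect, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>9,}  effect = {effect}  p = {p:.3g}")
```

The p value shrinks towards zero as the samples grow, although the effect itself never changes: it is no larger, and no more important, with a million observations per group than with a hundred.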


  1. Dear Lars,

    Yes! In the Dark Ages in the 1960s when I was a student at Harvard I offered econometrics as one of my specialties for the PhD exam. I had taken three graduate courses from very good people, which was way above the average in those days (now, alas, it has become standard: over-education in merely one out of a dozen important techniques of empirical inquiry, others—not taught—being simulation, experiments, upper bound estimation, survey research, archival research, accounting, careful introspection, literature, philosophy). I remember so well when Dwight Perkins, an economic historian of China, who was assigned to examine me in econometrics, asked me to explain what statistical significance was. I couldn’t. It was crazy that I passed!

    Kind regards,

    Deirdre McCloskey

  2. Dear Sir: As a PhD in molecular biology who uses only the most elementary statistics (I mean, really basic), I’ve always thought that statistics seemed somehow too convoluted and complicated. There had to be a better way!

    That we still have these basic issues in many disciplines is a symptom of that.
    But nothing will change until there is a syllabus, which is sort of your job, no?

    (I’m sure you are aware of Pauli’s famous comment: it’s not that old scientists unlearn the errors of the past, but that they die and young students learn the correct things.)

