Significance tests and the quest for the Holy Grail of “true” models

3 August, 2012 at 14:46 | Posted in Statistics & Econometrics
After having mastered all the technicalities of regression analysis and econometrics, students often feel as though they are masters of the universe. I usually cool them down with a required reading of Christopher Achen‘s modern classic Interpreting and Using Regression.
It usually gets them back on track, and they come to understand that
no increase in methodological sophistication … will alter the fundamental nature of the subject. It remains a wondrous mixture of rigorous theory, experienced judgment, and inspired guesswork. And that, finally, is its charm.
After giving a statistics course, I usually ask students at the exam to explain how one should correctly interpret p-values. The correct definition is p(data|null hypothesis) – the probability, assuming the null hypothesis is true, of observing data at least as extreme as those actually observed. Yet a majority of the students misinterpret the p-value in one of two ways. Some take it to be the likelihood of a sampling error – which is wrong, since the very computation of the p-value is based on the assumption that sampling error is what causes the sample statistic to deviate from the null hypothesis. Others take it to be the probability of the null hypothesis being true, given the data – which is also wrong, since that would be p(null hypothesis|data) rather than the correct p(data|null hypothesis).
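The difference between p(data|null hypothesis) and p(null hypothesis|data) can be made concrete with a small simulation. The sketch below is purely illustrative – the effect size, sample size, and the 50/50 prior on the null are my own assumed numbers, not anything from the post. It shows that a z-test rejects a true null about 5% of the time (that is what the p-value controls), while among the significant results the null is true far more than 5% of the time:

```python
import math
import random

random.seed(0)

def p_value_two_sided(xs):
    """Two-sided z-test p-value for H0: mean = 0, with known sd = 1."""
    n = len(xs)
    z = abs(sum(xs) / n * math.sqrt(n))
    return math.erfc(z / math.sqrt(2))  # P(|Z| >= z) under H0

# Assumed illustrative setup: H0 is true in half of all studies;
# when it is false, the true mean is a modest 0.2.
N_TRIALS, N_OBS, EFFECT = 20_000, 50, 0.2

null_total = null_sig = sig = null_true_and_sig = 0
for _ in range(N_TRIALS):
    h0_true = random.random() < 0.5
    mu = 0.0 if h0_true else EFFECT
    xs = [random.gauss(mu, 1.0) for _ in range(N_OBS)]
    p = p_value_two_sided(xs)
    if h0_true:
        null_total += 1
        if p < 0.05:
            null_sig += 1
    if p < 0.05:
        sig += 1
        if h0_true:
            null_true_and_sig += 1

# This is p(significant data | H0): close to the nominal 5%.
print("P(p < .05 | H0 true) =", round(null_sig / null_total, 3))
# This is p(H0 | significant data): clearly NOT 5%.
print("P(H0 true | p < .05) =", round(null_true_and_sig / sig, 3))
```

Under this setup the second printed fraction comes out well above 0.05 – exactly the gap between the two conditional probabilities that students (and researchers) so often collapse.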
This is not to be blamed on the students’ ignorance, but rather on significance testing not being particularly transparent – conditional probability inference is difficult even for those of us who teach and practice it. A lot of researchers fall prey to the same mistakes. So – given that it is in any case very unlikely that any population parameter is exactly zero, and that, contrary to assumption, most samples in social science and economics are not random and do not have the right distributional shape – why continue to press students and researchers to do null hypothesis significance testing, a procedure that relies on a backward logic they usually do not understand? And as Achen writes:
Significance testing as a search for specification errors substitutes calculations for substantive thinking. Worse, it channels energy toward the hopeless search for functionally correct specifications and diverts attention from the real tasks, which are to formulate a manageable description of the data and to exclude competing ones.
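The point above that no population parameter is ever exactly zero has a simple arithmetic consequence: with a large enough sample, even a trivially small true effect becomes “statistically significant”. A minimal sketch, assuming an illustrative true mean of 0.01 (my own hypothetical number) and a known standard deviation of 1:

```python
import math

def p_two_sided(z):
    """Two-sided p-value for a z-statistic: P(|Z| >= z) under the null."""
    return math.erfc(abs(z) / math.sqrt(2))

TRUE_MEAN, SD = 0.01, 1.0  # a substantively negligible, nonzero effect
for n in (100, 10_000, 1_000_000):
    # Expected z-statistic at sample size n: the true effect divided by
    # the standard error SD / sqrt(n).
    z = TRUE_MEAN / (SD / math.sqrt(n))
    print(f"n = {n:>9,}   expected z = {z:6.2f}   p ~ {p_two_sided(z):.2e}")
```

At n = 100 the effect is nowhere near significant; by n = 1,000,000 the expected p-value is astronomically small. Nothing about the substance has changed – only the sample size – which is why rejecting a sharp null by itself tells us so little.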