Significance testing and the real tasks of social science

21 Apr, 2022 at 11:36 | Posted in Statistics & Econometrics | 2 Comments

After having mastered all the technicalities of regression analysis and econometrics, students often feel as though they are masters of the universe. I usually cool them down with the required reading of Christopher Achen’s modern classic Interpreting and Using Regression. It usually gets them back on track again, and they understand that

no increase in methodological sophistication … alter the fundamental nature of the subject. It remains a wondrous mixture of rigorous theory, experienced judgment, and inspired guesswork. And that, finally, is its charm.

And in case they get too excited about having learned to master the intricacies of proper significance tests and p-values, I ask them to also ponder Achen’s apt warning:

Significance testing as a search for specification errors substitutes calculations for substantive thinking. Worse, it channels energy toward the hopeless search for functionally correct specifications and diverts attention from the real tasks, which are to formulate a manageable description of the data and to exclude competing ones.

2 Comments

  1. I pondered the Achen warning and found it misleadingly polemical and exclusionary, perhaps because it is out of context. Sensible analysts like D.R. Cox and George Box used P-values as one item among others to consider in forming judgements about whether to proceed with a model. Simply rejecting P-values because of their misuse seems as mindless as using them for model selection alone and mechanically based on unjustified cutoffs (which presumably is the problem Achen had in mind).

    Computing a P-value (the continuous observed “significance level”) does not forbid one from substantive thinking. On the contrary, a small P-value p for a model fit can alert one to problems with the model that might have otherwise gone unnoticed; proper use of P-values requires understanding that a large p does not mean the model fits well or makes sense to use (a point made by Karl Pearson in 1906). Stated in the negative, the problems arise from misuses of P-values and are not intrinsic to P-values themselves. Such misuses include (a) fitting a model that does make sense and then mechanically rejecting it because p was small, instead of looking at the reason p was small; and (b) fitting a model that makes no sense, then using it because p was large. Either way, p does not address all aspects of the model fit, nor does it address whether the model makes sense contextually. But other model-checking procedures can (and usually should) take place in tandem, including graphical assessments and comparisons against external information.

    To illustrate, consider that we should expect large age effects for most diseases. Thus, if a disease model with age fits poorly, one reason might be that the age component is too simple to fit the disease pattern (I have seen this in real data, where the use of a single age term was clearly inadequate to either summarize the data pattern or control confounding by age); conversely, a model without age makes no sense and should be rejected even if the p for fit and for the age term are both large (for a real example see the neonatal death analysis in Greenland & Neutra (1980). Control of confounding in the assessment of medical technology. International Journal of Epidemiology 9, 361-367).

    • As you say, Sander, “p does not address all aspects of the model fit, nor does it address whether the model makes sense contextually. But other model-checking procedures can (and usually should) take place in tandem, including graphical assessments and comparisons against external information.” I do agree, and so would, I guess, Christopher. The problem with the p-value (and significance testing) is that it is too often used (at least in social science) as a substitute for substantive thinking, rather than as a complement. And as David Freedman once noticed: “Statistical significance is little more than technical jargon. Over the years, however, the jargon has acquired enormous — and richly undeserved — emotional power.”
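
The point made in the first comment, that a large p for fit does not show that a model makes sense, can be illustrated with a minimal sketch. The counts below are hypothetical, made up for illustration and not taken from the Greenland & Neutra analysis, and the function names are likewise assumptions of this sketch. It fits a model with no age term (one common risk for all age bands) to two data sets that share the same age pattern of risk and differ only in size, then computes the Pearson goodness-of-fit P-value for each.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fit_no_age_model(cases, persons):
    """Pearson goodness-of-fit test of a model with one common risk for all age bands."""
    pooled_risk = sum(cases) / sum(persons)
    chi2 = 0.0
    for c, n in zip(cases, persons):
        exp_cases = n * pooled_risk
        exp_noncases = n * (1.0 - pooled_risk)
        chi2 += (c - exp_cases) ** 2 / exp_cases
        chi2 += ((n - c) - exp_noncases) ** 2 / exp_noncases
    df = len(cases) - 1  # one fitted parameter (the common risk)
    return chi2, chi2_sf_even_df(chi2, df)


# Hypothetical counts for five age bands, youngest to oldest.
# Risk rises sixfold with age in both data sets: 2%, 4%, 4%, 8%, 12%.
small_cases, small_persons = [1, 2, 2, 4, 6], [50] * 5
large_cases, large_persons = [10, 20, 20, 40, 60], [500] * 5

chi2_small, p_small = fit_no_age_model(small_cases, small_persons)
chi2_large, p_large = fit_no_age_model(large_cases, large_persons)

print(f"small sample: chi2 = {chi2_small:.2f}, p = {p_small:.3f}")
print(f"large sample: chi2 = {chi2_large:.2f}, p = {p_large:.2e}")
```

With these made-up counts, the small data set gives p of about 0.22 and the large one gives p far below 0.001, although the age pattern and the model are identical in both. The large p in the small sample reflects limited data, not an adequate model, which is why contextual knowledge (here, that age matters) has to enter alongside the test.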

