Cleaning p-values

12 March, 2017 at 08:49 | Posted in Statistics & Econometrics

The one place that preregistration is really needed … is if you want clean p-values. A p-value is very explicitly a statement about how you would’ve analyzed the data, had they come out differently. Sometimes when I’ve criticized published p-values on the grounds of forking paths, the original authors have fought back angrily, saying how unfair it is for me to first make an assumption about what they would’ve done under different conditions, and then make conclusions based on these assumptions. But they’re getting things backward: By stating a p-value at all, they’re the ones who are making a very strong assumption about their hypothetical behavior—an assumption that, in general, I have no reason to believe.

Preregistration is in fact the only way to ensure that p-values can be taken at their nominal values. In that way, preregistration is like random sampling which, strictly speaking, is the only way that sampling probabilities, estimates, standard errors, etc., can be taken at their nominal values …

Yes, you can do surveys and get estimates and standard errors without ever taking a random sample … but to do this we need to make assumptions.

And, yes, you can do causal inference from observational studies—indeed, in many settings this is absolutely necessary—but, again, assumptions are needed …

Just as a serious social science journal—or even Psychological Science or PPNAS—would never accept a paper on sampling without some discussion of the representativeness of the sample, and just as they would never accept a causal inference based on a simple regression with no identification strategy and no discussion of imbalance between treatment and control groups, so should they not take seriously a p-value without a careful assessment of the assumptions underlying it.

Andrew Gelman
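
The forking-paths point in the passage above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original text: a two-group comparison with no true effect, a known standard deviation of 1 (so a simple z-test is exact), and a hypothetical analyst who also looks at two arbitrary subgroups and reports whichever comparison gives the smallest p-value. The function names and sample sizes are illustrative.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(x, y, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming known sigma."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=5000, n=50, alpha=0.05, seed=0):
    """Return the false-positive rates of a fixed and a forked analysis."""
    rng = random.Random(seed)
    half = n // 2
    prereg_hits = 0
    forked_hits = 0
    for _ in range(n_sims):
        # The null is true: both groups come from the same distribution.
        treat = [rng.gauss(0, 1) for _ in range(n)]
        ctrl = [rng.gauss(0, 1) for _ in range(n)]

        # Preregistered analysis: one test on the full sample, fixed in advance.
        p_full = z_test_p(treat, ctrl)

        # Forked analysis: also test two (arbitrary) subgroups and keep
        # whichever of the three comparisons looks best.
        p_a = z_test_p(treat[:half], ctrl[:half])
        p_b = z_test_p(treat[half:], ctrl[half:])

        prereg_hits += p_full < alpha
        forked_hits += min(p_full, p_a, p_b) < alpha
    return prereg_hits / n_sims, forked_hits / n_sims


prereg_rate, forked_rate = simulate()
print(f"preregistered: {prereg_rate:.3f}  forked: {forked_rate:.3f}")
```

Under these assumptions the preregistered test should reject at close to the nominal 5% rate, while the forked analysis rejects noticeably more often even though each individual p-value is computed correctly. The reported p-value is only "clean" if the choice of comparison would have been the same however the data had come out.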
