The two statistics books every social scientist should read

6 Jul, 2022 at 10:08 | Posted in Statistics & Econometrics | 1 Comment

Mathematical statistician David Freedman‘s Statistical Models and Causal Inference (Cambridge University Press, 2010)  and Statistical Models: Theory and Practice (Cambridge University Press, 2009) are marvellous books. They ought to be mandatory reading for every serious social scientist — including economists and econometricians — who doesn’t want to succumb to ad hoc assumptions and unsupported statistical conclusions!

How do we calibrate the uncertainty introduced by data collection? Nowadays, this question has become quite salient, and it is routinely answered using wellknown methods of statistical inference, with standard errors, t -tests, and P-values … These conventional answers, however, turn out to depend critically on certain rather restrictive assumptions, for instance, random sampling …

Thus, investigators who use conventional statistical technique turn out to be making, explicitly or implicitly, quite restrictive behavioral assumptions about their data collection process … More typically, perhaps, the data in hand are simply the data most readily available …

The moment that conventional statistical inferences are made from convenience samples, substantive assumptions are made about how the social world operates … When applied to convenience samples, the random sampling assumption is not a mere technicality or a minor revision on the periphery; the assumption becomes an integral part of the theory …

In particular, regression and its elaborations … are now standard tools of the trade. Although rarely discussed, statistical assumptions have major impacts on analytic results obtained by such methods.

Invariance assumptions need to be made in order to draw causal conclusions from non-experimental data: parameters are invariant to interventions, and so are errors or their distributions. Exogeneity is another concern. In a real example, as opposed to a hypothetical, real questions would have to be asked about these assumptions. Why are the equations “structural,” in the sense that the required invariance assumptions hold true? Applied papers seldom address such assumptions, or the narrower statistical assumptions: for instance, why are errors IID?

The tension here is worth considering. We want to use regression to draw causal inferences from non-experimental data. To do that, we need to know that certain parameters and certain distributions would remain invariant if we were to intervene. Invariance can seldom be demonstrated experimentally. If it could, we probably wouldn’t be discussing invariance assumptions. What then is the source of the knowledge?

“Economic theory” seems like a natural answer, but an incomplete one. Theory has to be anchored in reality. Sooner or later, invariance needs empirical demonstration, which is easier said than done.

1 Comment

1. I am going to get these two books. I have been having fun with a simple coin-flipping study that has taken me down a statistical rabbit hole. I think I may be able to tie this back to economic modeling but need more perspective.
John Lounsbury.

Sorry, the comment form is closed at this time.