What inferential leverage do statistical models provide?
18 December, 2016 at 14:19 | Posted in Statistics & Econometrics
Experimental (and non-experimental) data are often analyzed using a regression model of the form
Yi = a + bZi + Wiβ + εi,
where Wi is a vector of control variables for subject i, while a, b, and β are parameters (if Wi is 1×p, then β is p×1). The effect of treatment is measured by b. The disturbances εi would be assumed independent across subjects, with expectation 0 and constant variance. The Zi and Wi would also need to be independent of the disturbances (this is the exogeneity assumption).
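To make the role of these assumptions concrete, here is a minimal simulation sketch in which every one of them holds by construction. The parameter values (a = 1, b = 2, β = 0.5), the single control variable (p = 1), the normal distributions, and the helper names `solve`, `ols`, and `simulate` are all illustrative assumptions, not part of the quoted text. Under these conditions, least squares should recover b closely.

```python
import random

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) c = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def simulate(n=20000, a=1.0, b=2.0, beta=0.5, seed=0):
    """Generate data that satisfies the model's assumptions, then fit it."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0  # randomized assignment Zi
        w = rng.gauss(0, 1)                      # one control Wi, drawn independently
        eps = rng.gauss(0, 1)                    # disturbance: independent, mean 0, constant variance
        rows.append([1.0, z, w])                 # intercept, Zi, Wi
        y.append(a + b * z + beta * w + eps)
    return ols(rows, y)

a_hat, b_hat, beta_hat = simulate()
print(f"a_hat={a_hat:.3f}  b_hat={b_hat:.3f}  beta_hat={beta_hat:.3f}")
```

The estimate of b lands near its true value here only because the simulated disturbances were drawn to be independent of Zi and Wi. Nothing in the fitting step checks that this is so.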
Randomization guarantees that the Zi are independent of the Wi and εi. But why are Wi and εi independent? Why are the εi independent across subjects, with expectation 0 and constant variance? Replacing the indicator Zi for assignment by an indicator Xi for treatment received makes the model less secure: why is choice of treatment independent of the disturbance term? With observational data, such questions are even thornier. Of course, there are models with assumptions that are more general and harder to fathom. But that only postpones the reckoning. More-complicated questions can in turn be asked about more-complicated models …
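The point about treatment received can be illustrated with a small sketch. It assumes, purely for illustration, a model without controls, a true effect b = 2, standard normal disturbances, and a hypothetical compliance rule in which assigned subjects take the treatment only when their disturbance is positive. With a single binary regressor and an intercept, the least-squares slope equals the difference in group means, so that is what the code computes.

```python
import random

def diff_in_means(y, x):
    """OLS slope of y on a binary regressor x (with intercept) = difference in group means."""
    treated = [yi for yi, xi in zip(y, x) if xi == 1]
    control = [yi for yi, xi in zip(y, x) if xi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def simulate(n=50000, a=1.0, b=2.0, seed=1):
    rng = random.Random(seed)
    z, x, y_full, y_self = [], [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0   # randomized assignment Zi
        eps = rng.gauss(0, 1)                 # disturbance
        # Hypothetical compliance rule: assigned subjects take the
        # treatment only if their disturbance is positive, so the
        # choice of treatment depends on the disturbance.
        xi = 1 if (zi == 1 and eps > 0) else 0
        z.append(zi)
        x.append(xi)
        y_full.append(a + b * zi + eps)       # world 1: everyone complies, treatment = Zi
        y_self.append(a + b * xi + eps)       # world 2: treatment received = Xi
    b_assigned = diff_in_means(y_full, z)     # regressor independent of the disturbance
    b_received = diff_in_means(y_self, x)     # regressor correlated with the disturbance
    return b_assigned, b_received

b_assigned, b_received = simulate()
print(f"true b = 2.0, by assignment: {b_assigned:.3f}, by treatment received: {b_received:.3f}")
```

Under this particular compliance rule, the estimate based on treatment received overshoots the true effect by roughly one unit, because treated subjects are selected on having favorable disturbances. A different rule would give a different bias, which is the difficulty: the data alone do not reveal which rule is operating.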
With models, it is easy to lose track of three essential points: (i) results depend on assumptions, (ii) changing the assumptions in apparently innocuous ways can lead to drastic changes in conclusions, and (iii) familiarity with a model’s name is no guarantee of the model’s truth. Under the circumstances, it may be the assumptions behind the model that provide the leverage, not the data fed into the model. This is a danger with experiments, and even more so with observational studies.