Regression analysis and randomisation distract us from the real scientific issues
15 November, 2016 at 18:37 | Posted in Statistics & Econometrics
In my view, regression models are not a particularly good way of doing empirical work in the social sciences today, because the technique depends on knowledge that we do not have. Investigators who use the technique are not paying adequate attention to the connection – if any – between the models and the phenomena they are studying. Their conclusions may be valid for the computer code they have created, but the claims are hard to transfer from that microcosm to the larger world …
Given the limits to present knowledge, I doubt that models can be rescued by technical fixes. Arguments about the theoretical merit of regression or the asymptotic behavior of specification tests for picking one version of a model over another seem like the arguments about how to build desalination plants with cold fusion as the energy source. The concept may be admirable, the technical details may be fascinating, but thirsty people should look elsewhere …
Causal inference from observational data presents many difficulties, especially when underlying mechanisms are poorly understood. There is a natural desire to substitute intellectual capital for labor, and an equally natural preference for system and rigor over methods that seem more haphazard. These are possible explanations for the current popularity of statistical models.
Indeed, far-reaching claims have been made for the superiority of a quantitative template that depends on modeling – by those who manage to ignore the far-reaching assumptions behind the models. However, the assumptions often turn out to be unsupported by the data. If so, the rigor of advanced quantitative methods is a matter of appearance rather than substance.
Freedman is absolutely spot on in his critique of how regression analysis has been applied in the social sciences.
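To make Freedman's point concrete, here is a minimal toy simulation. It is a sketch under assumptions of our own, not anything from Freedman: a made-up world where a variable z causes both x and y, and the true effect of x on y is 2. The function names (`ols_slope`, `residualize`, `simulate`) are hypothetical. The same least-squares arithmetic gives a "valid" coefficient either way. It estimates the causal effect only when the model already contains the right causes, which here is done by adjusting for z via the Frisch-Waugh-Lovell residualisation step.

```python
import random


def ols_slope(predictor, response):
    """Least-squares slope of `response` on `predictor`, with an intercept."""
    n = len(predictor)
    mp = sum(predictor) / n
    mr = sum(response) / n
    sxy = sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
    sxx = sum((p - mp) ** 2 for p in predictor)
    return sxy / sxx


def residualize(x, z):
    """Part of x left over after a least-squares fit on z (Frisch-Waugh-Lovell step)."""
    b = ols_slope(z, x)
    mx = sum(x) / len(x)
    mz = sum(z) / len(z)
    return [xi - mx - b * (zi - mz) for xi, zi in zip(x, z)]


def simulate(n=50_000, seed=1):
    """Hypothetical toy world: z is a common cause of x and y; x's true effect on y is 2."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    naive = ols_slope(x, y)                     # model that omits the cause z
    adjusted = ols_slope(residualize(x, z), y)  # model that already "knows" z is a cause
    return naive, adjusted


naive, adjusted = simulate()
print(f"omitting z: {naive:.2f}   adjusting for z: {adjusted:.2f}")
```

In this toy setup the first estimate lands near 2 + 3·Cov(x, z)/Var(x) = 2 + 3·½ = 3.5 rather than 2. The code runs fine in both cases. What separates the two answers is background knowledge about z that the regression itself cannot supply.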
But a growing number of social scientists today seem to think that randomization may somehow solve the causality problems surrounding regression analysis and econometrics. By randomizing, we are supposed to get different ‘populations’ (‘treatment’ and ‘control’ groups) that are homogeneous with regard to all variables except the one we think is a genuine cause. In this way, we supposedly do not need to know what all these other factors actually are.
If you succeed in performing an ideal randomization with different treatment and control groups, that is attainable. But it presupposes that you really have been able to establish – and not just assume – that all causes other than the putative one have the same probability distribution in the ‘treatment’ and ‘control’ groups, and that the probability of assignment to the ‘treatment’ or ‘control’ group is independent of all other possible causal variables.
Unfortunately, real experiments and real randomizations seldom or never achieve this. So, yes, we may do without knowing all causes, but it takes ideal experiments and ideal randomizations to do that, not real ones.
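One way to see the gap between ideal and real randomization is a small simulation. This is only a sketch, under assumptions of our own: a single standard-normal background cause, 50 units split evenly, and hypothetical function names. It illustrates one aspect of the argument and is not offered as the author's method. Here, assignment is independent of the other cause by construction. Even so, the groups are balanced only on average over many imagined re-randomizations. Any single realized split, which is all a real experiment ever has, leaves them different.

```python
import random
import statistics


def one_randomization(n_units, rng):
    """Difference in a background cause's mean between groups after one random split."""
    background = [rng.gauss(0, 1) for _ in range(n_units)]  # an unmeasured other cause
    assignment = [1] * (n_units // 2) + [0] * (n_units - n_units // 2)
    rng.shuffle(assignment)  # assignment independent of `background` by construction
    treated = [b for b, a in zip(background, assignment) if a == 1]
    control = [b for b, a in zip(background, assignment) if a == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def study(n_units=50, n_trials=5_000, seed=7):
    rng = random.Random(seed)
    diffs = [one_randomization(n_units, rng) for _ in range(n_trials)]
    mean_diff = statistics.fmean(diffs)                        # the "ideal", in-expectation view
    mean_abs_diff = statistics.fmean([abs(d) for d in diffs])  # typical imbalance in one trial
    return mean_diff, mean_abs_diff


mean_diff, mean_abs_diff = study()
print(f"average imbalance: {mean_diff:+.3f}   typical single-trial imbalance: {mean_abs_diff:.3f}")
```

With these settings the average difference is close to zero. A typical single split, however, differs by roughly 0.2 standard deviations of the background cause. Whether that imbalance matters depends on knowing what the other causes are in the first place.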
That means that in practice we have to have sufficient background knowledge to deduce causal knowledge. Without old knowledge, we can’t get new knowledge. No causes in, no causes out.
Conclusion: neither regression analysis nor randomisation is a substitute for doing real science.