## Statistical Models and Causal Inference

11 Jun, 2012 at 19:33 | Posted in Statistics & Econometrics

Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not exist at all – without specifying such system-contexts. Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just like Fisher’s “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’ “collective” or Gibbs’ “ensemble” – also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

As David Salsburg once noted – in his lovely The Lady Tasting Tea – on probability theory:

[W]e assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify [this] abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.
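For reference, the axioms alluded to here are standardly taken to be Kolmogorov's. The notation below ($\Omega$ for the space of outcomes, $\mathcal{F}$ for the collection of events, $P$ for the measure) is an assumption of this sketch and does not appear in Salsburg's text:

$$
\begin{aligned}
&P(A) \ge 0 \quad \text{for every event } A \in \mathcal{F}, \\
&P(\Omega) = 1, \\
&P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = \sum_{k=1}^{\infty} P(A_k) \quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{F}.
\end{aligned}
$$

These axioms constrain $P$ only once $\Omega$ and $\mathcal{F}$ have been specified for the problem at hand, which is exactly the identification step the quotation insists on.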

Just like, e.g., Keynes and Georgescu-Roegen, Salsburg is very critical of the way social scientists – including economists and econometricians – have uncritically, and without argument, come to assume that probability distributions from statistical theory can simply be applied to their own area of research:

Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

This importantly also means that if you cannot show that the data satisfy all the conditions of the probabilistic nomological machine – including, e.g., the deviations being distributed according to a normal curve – then the statistical inferences drawn lack sound foundations.

In his great book Statistical Models and Causal Inference: A Dialogue with the Social Sciences, David Freedman also touched on these fundamental problems, which arise when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels:

Of course, statistical models are applied not only to coin tossing but also to more complex systems. For example, “regression models” are widely used in the social sciences, as indicated below; such applications raise serious epistemological questions.

A case study would take us too far afield, but a stylized example – regression analysis used to demonstrate sex discrimination in salaries, adapted from (Kaye and Freedman, 2000) – may give the idea. We use a regression model to predict salaries (dollars per year) of employees in a firm from:

• education (years of schooling completed),

• experience (years with the firm),

• the dummy variable “man,” which takes the value 1 for men and 0 for women.

Employees are indexed by the subscript $i$; for example, $\text{salary}_i$ is the salary of the $i$th employee. The equation is

$$\text{salary}_i = a + b \times \text{education}_i + c \times \text{experience}_i + d \times \text{man}_i + \varepsilon_i. \tag{3}$$

Equation (3) is a statistical model for the data, with unknown parameters a, b, c, d; here, a is the “intercept” and the others are “regression coefficients”; $\varepsilon_i$ is an unobservable error term. … In other words, an employee’s salary is determined as if by computing

$$a + b \times \text{education} + c \times \text{experience} + d \times \text{man}, \tag{4}$$

then adding an error drawn at random from a box of tickets. The display (4) is the expected value for salary given the explanatory variables (education, experience, man); the error term in (3) represents deviations from the expected.

The parameters in (3) are estimated from the data using least squares. If the estimated coefficient d for the dummy variable turns out to be positive and “statistically significant” (by a “t-test”), that would be taken as evidence of disparate impact: men earn more than women, even after adjusting for differences in background factors that might affect productivity. Education and experience are entered into equation (3) as “statistical controls,” precisely in order to claim that adjustment has been made for differences in backgrounds.
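To make the estimation step concrete, here is a minimal sketch (an illustration, not part of Freedman's text) of fitting equation (3) by least squares and computing the conventional t-statistic for d. The data are simulated, and the "true" parameter values, sample size, and noise level are hypothetical choices made only for this example. The standard errors use the textbook formula, which is valid only under the i.i.d. error assumption the passage goes on to question.

```python
import numpy as np

def ols_with_t_stats(X, y):
    """Least-squares fit with conventional standard errors and t-statistics.

    The standard errors assume independent, identically distributed errors.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical covariance of beta
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Simulated firm; every number below is a hypothetical assumption.
rng = np.random.default_rng(0)
n = 500
education = rng.integers(10, 21, size=n).astype(float)   # years of schooling
experience = rng.uniform(0, 30, size=n)                   # years with the firm
man = rng.integers(0, 2, size=n).astype(float)            # 1 for men, 0 for women

a, b, c, d = 20000.0, 1500.0, 800.0, 3000.0               # assumed "true" values
eps = rng.normal(0.0, 5000.0, size=n)                     # i.i.d. errors by construction
salary = a + b * education + c * experience + d * man + eps

X = np.column_stack([np.ones(n), education, experience, man])
beta, se, t = ols_with_t_stats(X, salary)

for name, est, s, tv in zip(["a", "b", "c", "d"], beta, se, t):
    print(f"{name}: estimate={est:10.2f}  se={s:8.2f}  t={tv:6.2f}")
```

Because the errors here are generated as i.i.d. draws, the t-statistic means what the textbook says it means. With real salary data that property has to be argued for, not assumed.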

…

The story about the error term – that the ε’s are independent and identically distributed from person to person in the data set – turns out to be critical for computing statistical significance. Discrimination cannot be proved by regression modeling unless statistical significance can be established, and statistical significance cannot be established unless conventional presuppositions are made about unobservable error terms.
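A small simulation can show how much rides on that story. The sketch below (again an illustration, not from Freedman's book) sets d = 0, so there is no discrimination by construction, and counts how often the conventional t-test nevertheless declares d "significant" at the 5% level. In one scenario the errors are i.i.d.; in the other, employees in the same department share a common error component. The department structure, the share of men per department, and all numerical values are hypothetical assumptions.

```python
import numpy as np

def t_stat_last_coef(X, y):
    """Conventional (i.i.d.-based) t-statistic for the last regression coefficient."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta[-1] / np.sqrt(sigma2 * xtx_inv[-1, -1])

def rejection_rate(clustered, n_sims=1000, n_groups=20, group_size=25, seed=0):
    """Share of simulations in which |t| > 1.96 although the true d is zero."""
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    group = np.repeat(np.arange(n_groups), group_size)   # department of each employee
    rejections = 0
    for _ in range(n_sims):
        education = rng.integers(10, 21, size=n).astype(float)
        experience = rng.uniform(0, 30, size=n)
        # Assumption: the share of men differs from department to department.
        share_men = rng.uniform(0, 1, size=n_groups)
        man = (rng.uniform(size=n) < share_men[group]).astype(float)
        if clustered:
            # Shared department shock plus individual noise (total variance 1).
            eps = (rng.normal(0, np.sqrt(0.5), size=n_groups)[group]
                   + rng.normal(0, np.sqrt(0.5), size=n))
        else:
            eps = rng.normal(0, 1, size=n)               # i.i.d. errors
        salary = 1.0 + 0.5 * education + 0.2 * experience + 0.0 * man + eps
        X = np.column_stack([np.ones(n), education, experience, man])
        if abs(t_stat_last_coef(X, salary)) > 1.96:
            rejections += 1
    return rejections / n_sims

iid_rate = rejection_rate(clustered=False)
clustered_rate = rejection_rate(clustered=True)
print(f"false-positive rate, i.i.d. errors:     {iid_rate:.3f}")
print(f"false-positive rate, clustered errors:  {clustered_rate:.3f}")
```

With i.i.d. errors the rejection rate should sit near the nominal 5%. With dependent errors the same test should reject far more often, even though nothing about the salary-setting rule has changed.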

Lurking behind the typical regression model will be found a host of such assumptions; without them, legitimate inferences cannot be drawn from the model. There are statistical procedures for testing some of these assumptions. However, the tests often lack the power to detect substantial failures. Furthermore, model testing may become circular; breakdowns in assumptions are detected, and the model is redefined to accommodate. In short, hiding the problems can become a major goal of model building.

Using models to make predictions of the future, or the results of interventions, would be a valuable corrective. Testing the model on a variety of data sets – rather than fitting refinements over and over again to the same data set – might be a good second-best … Built into the equation is a model for non-discriminatory behavior: the coefficient d vanishes. If the company discriminates, that part of the model cannot be validated at all.
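The "second-best" check described above can be sketched as follows: fit the model on one data set and measure its prediction error on another that played no part in the fitting. This is an illustration, not Freedman's procedure; the helper names (`simulate_firm`, `fit_ols`, `rmse`) and all parameter values are hypothetical. Both data sets here come from the same simulated mechanism, so the model validates by construction.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rmse(y, y_hat):
    """Root-mean-square prediction error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def simulate_firm(rng, n, noise_sd):
    """Hypothetical data generator following equation (3) with assumed parameters."""
    education = rng.integers(10, 21, size=n).astype(float)
    experience = rng.uniform(0, 30, size=n)
    man = rng.integers(0, 2, size=n).astype(float)
    X = np.column_stack([np.ones(n), education, experience, man])
    true_params = np.array([20000.0, 1500.0, 800.0, 3000.0])  # assumed a, b, c, d
    salary = X @ true_params + rng.normal(0.0, noise_sd, size=n)
    return X, salary

rng = np.random.default_rng(1)
noise_sd = 5000.0
X_fit, y_fit = simulate_firm(rng, 500, noise_sd)   # data used for estimation
X_new, y_new = simulate_firm(rng, 500, noise_sd)   # fresh data, never used in fitting

beta = fit_ols(X_fit, y_fit)
in_sample = rmse(y_fit, X_fit @ beta)
out_of_sample = rmse(y_new, X_new @ beta)
print(f"in-sample RMSE:     {in_sample:.1f}")
print(f"out-of-sample RMSE: {out_of_sample:.1f}")
```

With real data, an out-of-sample error much larger than the in-sample error would be a warning that the fitted equation describes the sample at hand and not the process that generated it.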

Regression models like (3) are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals. However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions. Under the circumstances, reliance on model outputs may be quite unjustified. Making the ideas of validation somewhat more precise is a serious problem in the philosophy of science. That models should correspond to reality is, after all, a useful but not totally straightforward idea – with some history to it. Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious.

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science.