Econometric causality and Simpson’s paradox
5 December, 2016 at 18:20 | Posted in Statistics & Econometrics
Which causal relationships we see depend on which model we use and its conceptual/causal articulation; which model is best depends on our purposes and pragmatic interests.
Take the case of Simpson’s paradox, which can be described as the situation in which conditional probabilities (often related to causal relations) run in one direction within each subpopulation and in the opposite direction for the whole population. Let academic salaries be higher for economists than for sociologists, and let salaries within each group be higher for women than for men. But let there be twice as many men as women in economics and twice as many women as men in sociology. By construction, the average salary of women is higher than that of men in each group; yet, for the right values of the different salaries, women are paid less on average, taking both groups together. [Example: in economics, 2 men earn 100$ each and 1 woman earns 101$; in sociology, 1 man earns 90$ and 2 women earn 91$ each. Average female earnings: (101 + 2×91)/3 = 94.3; average male earnings: (2×100 + 90)/3 = 96.7. LPS]
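The arithmetic of the salary example can be checked with a few lines of Python. This is only a minimal sketch: the six-person list and the helper name `mean_salary` are illustrative assumptions built from the figures in the bracketed example, not part of the original argument.

```python
# Salary data from the bracketed example: (field, sex, salary) for each person.
# The layout and names are illustrative assumptions.
people = [
    ("economics", "M", 100), ("economics", "M", 100), ("economics", "F", 101),
    ("sociology", "M", 90), ("sociology", "F", 91), ("sociology", "F", 91),
]


def mean_salary(rows, sex, field=None):
    """Average salary for one sex, optionally restricted to a single field."""
    salaries = [s for f, g, s in rows if g == sex and (field is None or f == field)]
    return sum(salaries) / len(salaries)


# Disaggregated view: (female mean, male mean) within each field.
within = {
    field: (mean_salary(people, "F", field), mean_salary(people, "M", field))
    for field in ("economics", "sociology")
}

# Aggregate view: (female mean, male mean) over both fields together.
overall = (mean_salary(people, "F"), mean_salary(people, "M"))

for field, (female, male) in within.items():
    print(f"{field}: women {female:.1f}, men {male:.1f}")
print(f"overall: women {overall[0]:.1f}, men {overall[1]:.1f}")
```

Within each field the female mean is the higher one, while in the pooled data it is the lower one. The reversal comes entirely from how the sexes are distributed across the better and worse paid field.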
An aggregate model leads to the conclusion that being female causes a lower salary. We might feel an uneasiness with such a model, since I have already filled in the details that show more precisely why the result comes about. The temptation is to say that the aggregate model shows that being female apparently causes lower salaries; but the more refined description of a disaggregated model shows that really being female causes higher salaries. A true paradox, however, is not a contradiction, but a seeming contradiction. Another way to look at it is to say that the aggregate model is really true at that level of aggregation and is useful for policy, and that the equally true, more disaggregated model gives an explanation of the mechanism behind the true aggregate model.
It is not wrong to take an aggregate perspective and to say that being female causes a lower salary. We may not have access to the refined description. Even if we do, we may as a matter of policy think (a) that the choice of field is not susceptible to useful policy intervention, and (b) that our goal is to equalize income by sex and not to enforce equality of rates of pay. That we may not believe the factual claim of (a) nor subscribe to the normative end of (b) is immaterial. The point is that they mark out a perspective in which the aggregate model suits both our purposes and the facts: it tells the truth as seen from a particular perspective.
Simpson’s paradox is an interesting paradox in itself. But it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against (say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), and 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted), giving male-to-female odds ratios of 0.63 and 0.37).
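The admission figures can be checked without fitting any model, since in a logistic regression whose only regressor is a sex dummy the exponentiated coefficient equals the odds ratio of the corresponding 2×2 table. The sketch below computes those ratios directly from the counts given above; the dictionary layout and the function names (`odds`, `odds_ratio`, `pooled`) are illustrative assumptions.

```python
# Admission counts from the example: (applicants, admitted) by department and sex.
# The data layout and helper names are illustrative assumptions.
counts = {
    "economics": {"male": (800, 680), "female": (100, 90)},
    "physics": {"male": (200, 20), "female": (900, 210)},
}


def odds(applicants, admitted):
    """Odds of admission: admitted divided by rejected."""
    return admitted / (applicants - admitted)


def odds_ratio(male, female):
    """Male odds of admission relative to female odds."""
    return odds(*male) / odds(*female)


def pooled(sex):
    """Applicants and admitted for one sex, summed over departments."""
    applicants = sum(counts[dept][sex][0] for dept in counts)
    admitted = sum(counts[dept][sex][1] for dept in counts)
    return applicants, admitted


# Aggregate view: ignores which department people applied to.
aggregate_or = odds_ratio(pooled("male"), pooled("female"))

# Disaggregated view: one odds ratio per department.
by_department = {
    dept: odds_ratio(c["male"], c["female"]) for dept, c in counts.items()
}

print(f"aggregate odds ratio: {aggregate_or:.2f}")
for dept, ratio in by_department.items():
    print(f"{dept} odds ratio: {ratio:.2f}")
```

The aggregate ratio is well above one, while both departmental ratios are below one: the sign of the apparent ‘discrimination’ flips as soon as department is conditioned on.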
Econometric patterns should never be seen as anything other than possible clues to follow. From a critical realist perspective it is obvious that behind observable data there are real structures and mechanisms operating. If we really want to understand, explain and (possibly) predict things in the real world, these are more important to get hold of than simple correlations and regressions of observable variables.
Math cannot establish the truth value of a fact. Never has. Never will.