Econometrics textbooks — vague and confused causal analysis

25 December, 2016 at 12:17 | Posted in Statistics & Econometrics
Econometric textbooks fall on all sides of this debate. Some explicitly ascribe causal meaning to the structural equation while others insist that it is nothing more than a compact representation of the joint probability distribution. Many fall somewhere in the middle – attempting to provide the econometric model with sufficient power to answer economic problems but hesitant to anger traditional statisticians with claims of causal meaning. The end result for many textbooks is that the meaning of the econometric model and its parameters are vague and at times contradictory …
The purpose of this report is to examine the extent to which these and other advances in causal modeling have benefited education in econometrics. Remarkably, we find that not only have they failed to penetrate the field, but even basic causal concepts lack precise definitions and, as a result, continue to be confused with their statistical counterparts.
Pearl and Chen’s article addresses two important questions about the teaching of modern econometrics and its textbooks: how causality is treated in general and, more specifically, to what extent the textbooks use a distinct causal notation.
For years the authors have been part of an extended effort to advance explicit causal modeling (especially graphical models) in the applied sciences, and this article is a first examination of the extent to which these endeavours have found their way into econometrics textbooks.
Although parts of the text are of a rather demanding ‘technical’ nature, I definitely recommend reading it, especially for social scientists with an interest in these issues.
Pearl’s seminal contribution to this research field is well known and indisputable. When it comes to the ‘taming’ and ‘resolve’ of the issues, however, I have to admit that, under the influence of David Freedman in particular, I still have some doubts about the reach of these solutions, especially in terms of realism and relevance, for the social sciences in general and economics in particular. With regard to the present article, I think that since the distinction between the ‘interventionist’ E[Y|do(X)] and the more traditional ‘conditional expectationist’ E[Y|X] is so crucial for the subsequent argumentation, a more elaborate presentation would have been valuable. Not least because the authors could then have explained more fully why the former is so important, and whether and why it can be exported from ‘engineer’ contexts, where it arguably applies easily and universally, to ‘socio-economic’ contexts, where, in yours truly’s and Freedman’s view, ‘manipulability’ and ‘modularity’ are perhaps not so universally at hand.
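A small simulation may help readers new to the notation see why the two quantities can differ. The sketch below assumes a hypothetical linear model (not taken from Pearl and Chen) in which an unobserved variable U affects both X and Y, and X has a true effect of 1.0 on Y. Conditioning on X = 1 also selects units with high U, while setting X from outside, do(X), leaves the distribution of U untouched. The function names and coefficients are illustrative choices only.

```python
import random

def simulate(n=100_000, do_x=None, seed=0):
    """Draw n units from a hypothetical model: U -> X, U -> Y, and X -> Y.

    If do_x is None, X is generated from U (observational regime); otherwise
    every unit's X is set to do_x (interventional regime, arrow U -> X cut)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)                            # unobserved confounder U
        if do_x is None:
            x = 1 if u + rng.gauss(0, 1) > 0 else 0    # X "listens" to U
        else:
            x = do_x                                   # X set from outside
        y = 1.0 * x + 2.0 * u + rng.gauss(0, 1)        # assumed true effect of X: 1.0
        rows.append((x, y))
    return rows

def mean_y(rows, x_value):
    """Average Y among the rows with X == x_value."""
    ys = [y for x, y in rows if x == x_value]
    return sum(ys) / len(ys)

obs = simulate()
seeing = mean_y(obs, 1) - mean_y(obs, 0)                  # E[Y|X=1] - E[Y|X=0]
doing = (mean_y(simulate(do_x=1, seed=1), 1)
         - mean_y(simulate(do_x=0, seed=2), 0))           # E[Y|do(X=1)] - E[Y|do(X=0)]
print(f"conditional contrast: {seeing:.2f}, interventional contrast: {doing:.2f}")
```

Under these assumed coefficients the conditional contrast comes out around 3.3, while the interventional contrast stays close to 1.0; the difference is produced entirely by U. The simulation simply takes for granted that X can be set from outside, which is exactly the ‘manipulability’ assumption in question.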
A popular idea in the quantitative social sciences is to think of a cause (C) as something that increases the probability of its effect or outcome (O). That is:
P(O|C) > P(O|¬C)
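The criterion itself is plain arithmetic on conditional frequencies. A minimal sketch, using hypothetical counts invented purely for illustration:

```python
# Hypothetical counts, keyed by (C present?, O present?)
counts = {
    (True, True): 30, (True, False): 70,
    (False, True): 10, (False, False): 90,
}

def p_outcome(cause_present):
    """Relative frequency of O among units where C is present (or absent)."""
    with_o = counts[(cause_present, True)]
    return with_o / (with_o + counts[(cause_present, False)])

p_o_given_c = p_outcome(True)        # 30 / 100
p_o_given_not_c = p_outcome(False)   # 10 / 100
print(p_o_given_c > p_o_given_not_c)  # True: C "raises the probability" of O
```

Nothing in this computation distinguishes a genuine cause from a variable that merely travels together with one.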
However, as is also well known, a correlation between two variables, say A and B, does not necessarily imply that one is a cause of the other, since they may both be effects of a common cause, C.
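The common-cause pattern is easy to reproduce. In this sketch, which assumes standard-normal noise and unit coefficients chosen purely for illustration, C drives both A and B and neither affects the other, yet the two end up correlated (at 0.5 in theory under these settings).

```python
import random

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
c = [rng.gauss(0, 1) for _ in range(50_000)]   # common cause C
a = [ci + rng.gauss(0, 1) for ci in c]         # A depends on C only
b = [ci + rng.gauss(0, 1) for ci in c]         # B depends on C only
r_ab = corr(a, b)
print(round(r_ab, 2))
```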
In statistics and econometrics we usually solve this ‘confounder’ problem by ‘controlling for’ C, i.e., by holding C fixed. This means that we actually look at different ‘populations’: those in which C occurs in every case, and those in which C does not occur at all. Within each such population, knowing the value of A does not influence the probability of C [P(C|A) = P(C)]. So if a correlation between A and B still exists in either of these populations, some other cause must be operating. But if all other possible causes have been ‘controlled for’ too, and there is still a correlation between A and B, we may safely conclude that A is a cause of B, since by ‘controlling for’ all other possible causes, the correlation between the putative cause A and all the other possible causes (D, E, F …) is broken.
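‘Controlling for’ C in the sense just described, splitting the data into the population where C occurs and the one where it does not, can be sketched as follows. The binary C and the effect size of 2.0 are hypothetical choices made for illustration.

```python
import random

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(7)
rows = []
for _ in range(50_000):
    c = rng.random() < 0.5                     # binary common cause: present or absent
    a = (2.0 if c else 0.0) + rng.gauss(0, 1)  # A depends on C only
    b = (2.0 if c else 0.0) + rng.gauss(0, 1)  # B depends on C only
    rows.append((c, a, b))

# Pooled data: A and B are correlated through C.
pooled = corr([a for _, a, _ in rows], [b for _, _, b in rows])

# "Controlling for" C: compute the correlation separately within each population.
within = {
    level: corr([a for c, a, _ in rows if c == level],
                [b for c, _, b in rows if c == level])
    for level in (True, False)
}
print(round(pooled, 2), {k: round(v, 2) for k, v in within.items()})
```

In the pooled data the correlation is about 0.5; within each stratum it is close to zero, because in this assumed setup C is the only link between A and B.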
This is of course a very demanding prerequisite, since we can never actually be sure that we have identified all putative causes. Even in scientific experiments, the number of uncontrolled causes may be innumerable. Since nothing less will do, we all understand how hard it is to actually get from correlation to causality. It also means that relying on statistics or econometrics alone is not enough to deduce causes from correlations.
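The difficulty shows up as soon as one cause is missing from the list. In this hypothetical extension, a second common cause D operates alongside C but is never recorded. Holding C fixed leaves A and B still correlated; only when D is also held fixed does the correlation vanish. An analyst who did not know about D could easily read the residual correlation as A causing B.

```python
import random

def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(11)
rows = []
for _ in range(50_000):
    c = rng.random() < 0.5   # observed common cause C
    d = rng.random() < 0.5   # hypothetical common cause D that nobody measured
    shift = (2.0 if c else 0.0) + (2.0 if d else 0.0)
    rows.append((c, d, shift + rng.gauss(0, 1), shift + rng.gauss(0, 1)))  # (C, D, A, B)

def corr_where(keep):
    """Correlation of A and B within the sub-population selected by keep(c, d)."""
    sel = [(a, b) for c, d, a, b in rows if keep(c, d)]
    return corr([a for a, _ in sel], [b for _, b in sel])

c_fixed = corr_where(lambda c, d: c)               # C held fixed (present), D free
c_and_d_fixed = corr_where(lambda c, d: c and d)   # both C and D held fixed
print(round(c_fixed, 2), round(c_and_d_fixed, 2))
```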
Some people think that randomization may solve this empirical problem. By randomizing, we obtain different ‘populations’ that are homogeneous with regard to all variables except the one we think is a genuine cause. In that way we are supposedly able to do without actually knowing what all these other factors are.
If you succeed in performing an ideal randomization with different treatment and control groups, that is attainable. But it presupposes that you have really been able to establish, and not just assume, that all causes other than the putative one (A) have the same probability distribution in the treatment and control groups, and that assignment to the treatment or control group is independent of all other possible causal variables.
Unfortunately, real experiments and real randomizations seldom or never achieve this. So, yes, we may do without knowing all causes, but it takes ideal experiments and ideal randomizations to do that, not real ones.
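The gap between ideal and real randomization can be illustrated numerically. The sketch assumes a single background cause Z with a standard normal distribution, and an assignment that randomly shuffles units into two equal halves. Assignment is independent of Z by construction, yet any finite sample still shows some difference in Z between the groups; the average difference shrinks as the sample grows but never reaches zero. The function name `covariate_gap` is a hypothetical helper.

```python
import random

def covariate_gap(n, rng):
    """Split n units at random into equal treatment and control groups and return
    the absolute difference in the group means of a background cause Z ~ N(0, 1)."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    idx = list(range(n))
    rng.shuffle(idx)                          # assignment ignores z entirely
    treated, control = idx[: n // 2], idx[n // 2 :]
    mean_t = sum(z[i] for i in treated) / len(treated)
    mean_c = sum(z[i] for i in control) / len(control)
    return abs(mean_t - mean_c)

rng = random.Random(3)
avg_gap = {}
for n in (20, 200, 2000):
    gaps = [covariate_gap(n, rng) for _ in range(200)]
    avg_gap[n] = sum(gaps) / len(gaps)        # shrinks as n grows, but stays above zero
print(avg_gap)
```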
This means that in practice we need sufficient background knowledge to deduce causal knowledge. Without old knowledge, we cannot get new knowledge: no causes in, no causes out.
Econometrics is basically a deductive method. Given the assumptions (such as manipulability, transitivity, Reichenbach’s probability principles, separability, additivity, linearity, etc.), it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Real target systems are seldom epistemically isomorphic to axiomatic-deductive models/systems, and even if they were, we would still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by statistical/econometric procedures may be valid in ‘closed’ models, but what we are usually interested in is causal evidence in the real target system we happen to live in.
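A deliberately simple, noise-free sketch of the external-validity problem: assume, hypothetically, that the true mechanism is y = x², and fit a linear model, as a linearity assumption would have us do, in two populations that differ only in their range of x. Inside each ‘closed’ model the least-squares slope follows deductively from the data, but the estimated ‘effect of x’ is 1 in one population and 3 in the other. The helper names are illustrative only.

```python
def ols_slope(xs, ys):
    """Least-squares slope of a straight line fitted to (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def mechanism(x):
    return x ** 2  # hypothetical "true" causal mechanism, nonlinear

grid_a = [i / 100 for i in range(101)]        # population A: x evenly spread over [0, 1]
grid_b = [1 + i / 100 for i in range(101)]    # population B: x evenly spread over [1, 2]
slope_a = ols_slope(grid_a, [mechanism(x) for x in grid_a])
slope_b = ols_slope(grid_b, [mechanism(x) for x in grid_b])
print(round(slope_a, 6), round(slope_b, 6))
```

Neither estimate is wrong given its premises; the linearity premise simply does not hold in this assumed target system, so neither number carries over to the other population.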
Advocates of econometrics want deductively automated answers to fundamental causal questions. But to apply ‘thin’ methods we need ‘thick’ background knowledge of what is going on in the real world, not in idealized models. Conclusions can only be as certain as their premises, and that also applies to the quest for causality in econometrics.