Which causal relationships we see depends on which model we use and its conceptual/causal articulation; which model is best depends on our purposes and pragmatic interests.
Take the case of Simpson’s paradox, which can be described as the situation in which conditional probabilities (often related to causal relations) run in the opposite direction for subpopulations compared with the whole population. Let academic salaries be higher for economists than for sociologists, and let salaries within each group be higher for women than for men. But let there be twice as many men as women in economics and twice as many women as men in sociology. By construction, the average salary of women is higher than that of men in each group; yet, for the right values of the different salaries, women are paid less on average when both groups are taken together. [Example: Economics — 2 men earn $100 each, 1 woman earns $101; Sociology — 1 man earns $90, 2 women earn $91 each. Average female earnings: (101 + 2×91)/3 ≈ 94.3; average male earnings: (2×100 + 90)/3 ≈ 96.7 — LPS]
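The arithmetic of the bracketed example can be checked directly; here is a minimal sketch using only the numbers given in the text:

```python
# Salaries from the example: women earn more within each field,
# but less when the two fields are pooled (Simpson's paradox).
econ = {"men": [100, 100], "women": [101]}  # economics salaries ($)
soc = {"men": [90], "women": [91, 91]}      # sociology salaries ($)

def mean(xs):
    return sum(xs) / len(xs)

# Within each field, women are ahead
assert mean(econ["women"]) > mean(econ["men"])  # 101 > 100
assert mean(soc["women"]) > mean(soc["men"])    # 91 > 90

# Pooled across fields, women are behind
women_all = econ["women"] + soc["women"]
men_all = econ["men"] + soc["men"]
print(round(mean(women_all), 1))  # ≈ 94.3
print(round(mean(men_all), 1))    # ≈ 96.7
```

The reversal is driven entirely by composition: women are concentrated in the lower-paying field.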
An aggregate model leads to the conclusion that being female causes a lower salary. We might feel uneasy with such a model, since I have already filled in the details that show more precisely why the result comes about. The temptation is to say that the aggregate model shows that being female apparently causes lower salaries, but that the more refined description of a disaggregated model shows that really being female causes higher salaries. A true paradox, however, is not a contradiction, but a seeming contradiction. Another way to look at it is to say that the aggregate model is really true at that level of aggregation and is useful for policy, and that the equally true, more disaggregated model gives an explanation of the mechanism behind the true aggregate model.
It is not wrong to take an aggregate perspective and to say that being female causes a lower salary. We may not have access to the refined description. Even if we do, we may as a matter of policy think (a) that the choice of field is not susceptible to useful policy intervention, and (b) that our goal is to equalize income by sex and not to enforce equality of rates of pay. That we may not believe the factual claim of (a) nor subscribe to the normative end of (b) is immaterial. The point is that they mark out a perspective in which the aggregate model suits both our purposes and the facts: it tells the truth as seen from a particular perspective.
Simpson’s paradox is an interesting paradox in itself. But it can also highlight a deficiency in the traditional econometric approach to causality. Say you have 1,000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of the men are admitted but only 30% of the women. Running a logistic regression to find the odds (and probabilities) of admission for men and women, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments, we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against (say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted) — giving male-to-female odds ratios of roughly 0.63 and 0.37).
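The odds ratios in the admission example can be recomputed in a few lines, using only the counts given in the text:

```python
# Admission odds from the text: aggregated, women look 'discriminated'
# against; department by department, the comparison reverses.

def odds(admitted, applied):
    """Odds of admission: admitted / rejected."""
    return admitted / (applied - admitted)

# Aggregate: 700/1000 men admitted, 300/1000 women
male_odds = odds(700, 1000)     # 700/300 ≈ 2.33
female_odds = odds(300, 1000)   # 300/700 ≈ 0.43
print(round(male_odds / female_odds, 2))  # 5.44

# By department: male-to-female odds ratios below 1 in both
econ_or = odds(680, 800) / odds(90, 100)     # ≈ 0.63
physics_or = odds(20, 200) / odds(210, 900)  # ≈ 0.37
print(round(econ_or, 2), round(physics_or, 2))
```

Note that the department totals are consistent with the aggregate: 680 + 20 = 700 men and 90 + 210 = 300 women admitted overall.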
Econometric patterns should never be seen as anything other than possible clues to follow. From a critical realist perspective it is obvious that behind observable data there are real structures and mechanisms operating, and if we really want to understand, explain and (possibly) predict things in the real world, getting hold of these is more important than simply correlating and regressing observable variables.
Math cannot establish the truth value of a fact. Never has. Never will.
Individuals, households and firms behave so irrationally and their behaviour in groups is so little understood that it is hard to think of an economic law with any claim to universality. This is a strong statement. If the statement is true, this is unfortunate, not only for its own sake, but also because of its consequences. Let me briefly discuss one consequence of a universal law. The example is a very famous one taken from physics, a discipline where everything is easier than in applied econometrics, and it shows that when one law is right it can be used to find another one.
If we observe the moons of Jupiter over a long period of time, we discover that the moons are sometimes eight minutes ahead of time and sometimes eight minutes behind time, where the time is calculated according to Newton’s laws. We also discover that the moons are ahead of time when Jupiter is close to the earth and behind when Jupiter is far away. If we believe Newton, then we must conclude that it takes light some time to travel from the moons of Jupiter to the earth, and what we are looking at when we see the moons is not how they are now but how they were when the light that now reaches us left them. Olaus Rømer thus demonstrated in 1675 that light has a finite speed, and one year later he estimated that speed at 214,300 km/s — a remarkable achievement, only about 30% too low.
Nothing of such sweeping beauty would ever be possible in econometrics. This is because people, firms, organizations, and their interactions at various levels of aggregation are so much richer and more interesting than planets and therefore, inevitably, much more difficult to model and predict …
Fortunately, there is still a lot of work to be done. Econometrics has had wonderful successes and econometric theory has developed with terrific speed and depth. Nevertheless, we have not been searching for the keys where we expect them to be. So, we find all sorts of interesting things under the lamppost: a coin, a piece of string, and so on, things that we proudly report on in our research papers, but we do not find the keys. This is depressing. There is only one solution, namely to move the lamppost. That done, econometricians can continue to make important contributions and eventually, perhaps, become respectable scientists.
What is 0.999 …, really?
It appears to refer to a kind of sum:
0.9 + 0.09 + 0.009 + 0.0009 + …
But what does that mean? That pesky ellipsis is the real problem. There can be no controversy about what it means to add up two, or three, or a hundred numbers. But infinitely many? That’s a different story. In the real world, you can never have infinitely many heaps. What’s the numerical value of an infinite sum? It doesn’t have one — until we give it one. That was the great innovation of Augustin-Louis Cauchy, who introduced the notion of limit into calculus in the 1820s.
The British number theorist G. H. Hardy … explains it best: “It is broadly true to say that mathematicians before Cauchy asked not, ‘How shall we define 1 – 1 + 1 – 1 + …?’ but ‘What is 1 – 1 + 1 – 1 + …?’”
No matter how tight a cordon we draw around the number 1, the sum will eventually, after some finite number of steps, penetrate it, and never leave. Under those circumstances, Cauchy said, we should simply define the value of the infinite sum to be 1.
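Cauchy’s “cordon” can be watched shrinking numerically. The sketch below tracks the gap between the partial sums of 0.9 + 0.09 + 0.009 + … and 1, which falls by a factor of ten with each extra term:

```python
# Partial sums of 0.9 + 0.09 + 0.009 + ...: the gap to 1 shrinks
# tenfold per term, so the sums eventually enter (and never leave)
# any cordon drawn around 1 -- which is Cauchy's limit definition.

def partial_sum(n):
    """Sum of the first n terms 9/10 + 9/100 + ... + 9/10^n."""
    return sum(9 / 10 ** k for k in range(1, n + 1))

for n in (1, 2, 5, 10):
    print(round(1 - partial_sum(n), 12))  # 0.1, 0.01, 1e-05, 1e-10
```

Since the gap after n terms is exactly 10⁻ⁿ, for any ε > 0 the sums are within ε of 1 after finitely many steps and stay there.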
I have no problem with solving problems in mathematics by defining them away. But how about the real world? Maybe that ought to be a question to consider even for economists all too fond of uncritically following the mathematical way when applying their models to the real world, where indeed ‘you can never have infinitely many heaps.’
In econometrics we often run into the ‘Cauchy logic’ — the data is treated as if it were sampled from a larger population, a ‘superpopulation’ in which repeated realizations of the data are imagined. Just imagine that there could be more worlds than the one we live in and the problem is fixed …
In practice Prof. Tinbergen seems to be entirely indifferent whether or not his basic factors are independent of one another … But my mind goes back to the days when Mr. Yule sprang a mine under the contraptions of optimistic statisticians by his discovery of spurious correlation. In plain terms, it is evident that if what is really the same factor is appearing in several places under various disguises, a free choice of regression coefficients can lead to strange results. It becomes like those puzzles for children where you write down your age, multiply, add this and that, subtract something else, and eventually end up with the number of the Beast in Revelation.
Prof. Tinbergen explains that, generally speaking, he assumes that the correlations under investigation are linear … I have not discovered any example of curvilinear correlation in this book, and he does not tell us what kind of evidence would lead him to introduce it. If, as he suggests above, he were in such cases to use the method of changing his linear coefficients from time to time, it would certainly seem that quite easy manipulation on these lines would make it possible to fit any explanation to any facts. Am I right in thinking that the uniqueness of his results depends on his knowing beforehand that the correlation curve must be a particular kind of function, whether linear or some other kind?
Apart from this, one would have liked to be told emphatically what is involved in the assumption of linearity. It means that the quantitative effect of any causal factor on the phenomenon under investigation is directly proportional to the factor’s own magnitude … But it is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous. Yet this is what Prof. Tinbergen is throughout assuming …
Keynes’ comprehensive critique of econometrics and the assumptions it is built around — completeness, measurability, independence, homogeneity, and linearity — is still valid today.
Most work in econometrics is done on the assumption that the researcher has a theoretical model that is ‘true.’ But to think that we are able to construct a model where all relevant variables are included and the functional relationships between them are correctly specified is not only a belief without support; it is a belief impossible to support.
The theories we work with when building our econometric regression models are insufficient. No matter what we study, there are always some variables missing, and we don’t know the correct way to functionally specify the relationships between the variables.
Every econometric model constructed is misspecified. There is always an endless list of possible variables to include, and endless possible ways to specify the relationships between them. So every applied econometrician comes up with his own specification and ‘parameter’ estimates. The econometric Holy Grail of consistent and stable parameter values is nothing but a dream.
A rigorous application of econometric methods in economics really presupposes that the phenomena of our real-world economies are ruled by stable causal relations between variables. Parameter values estimated in specific spatio-temporal contexts are presupposed to be exportable to totally different contexts. To warrant this assumption, however, one has to convincingly establish that the targeted acting causes are stable and invariant, so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.
The theoretical conditions that have to be fulfilled for econometrics to really work are nowhere even close to being met in reality. Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science and economics. Although econometrics has become the most used quantitative method in economics today, it is still a fact that the inferences made from it are as a rule invalid.
Econometrics is basically a deductive method. Given the assumptions it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Conclusions can only be as certain as their premises — and that also applies to econometrics.
Reading an applied econometrics paper could leave you with the impression that the economist (or any social science researcher) first formulated a theory, then built an empirical test based on the theory, then tested the theory. But in my experience what generally happens is more like the opposite: with some loose ideas in mind, the econometrician runs a lot of different regressions until something looks plausible, then tries to fit it into a theory (existing or new) … Statistical theory itself tells us that if you do this for long enough, you will eventually find something plausible by pure chance!
This is bad news because as tempting as that final, pristine looking causal effect is, readers have no way of knowing how it was arrived at. There are several ways I’ve seen to guard against this:
(1) Use a multitude of empirical specifications to test the robustness of the causal links, and pick the one with the best predictive power …
(2) Have researchers submit their paper for peer review before they carry out the empirical work, detailing the theory they want to test, why it matters and how they’re going to do it. Reasons for inevitable deviations from the research plan should be explained clearly in an appendix by the authors and (re-)approved by referees.
(3) Insist that the paper be replicated. Firstly, by having the authors submit their data and code and seeing if referees can replicate it (think this is a low bar? Most empirical research in ‘top’ economics journals can’t even manage it). Secondly — in the truer sense of replication — wait until someone else, with another dataset or method, gets the same findings in at least a qualitative sense. The latter might be too much to ask of researchers for each paper, but it is a good thing to have in mind as a reader before you are convinced by a finding.
All three of these should, in my opinion, be a prerequisite for research that uses econometrics …
Naturally, this would result in a lot more null findings and probably a lot less research. Perhaps it would also result in fewer papers which attempt to tell the entire story: that is, which go all the way from building a new model to finding (surprise!) that even the most rigorous empirical methods support it.
Good suggestions, but unfortunately there are many more deep problems with econometrics that have to be ‘solved.’
In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture. The assumption of imaginary ‘superpopulations’ is one of the many dubious assumptions used in modern econometrics.
Misapplication of inferential statistics to non-inferential situations is a non-starter for doing proper science. And when choosing which models to use in our analyses, we cannot get around the fact that the evaluation of our hypotheses, explanations, and predictions cannot be made without reference to a specific statistical model or framework. The probabilistic-statistical inferences we make from our samples depend decisively on what population we choose to refer to. The reference class problem shows that there are usually many such populations to choose from, and that the one we choose decides which probabilities we come up with and, a fortiori, which predictions we make. Not consciously contemplating the relativity effects this choice of ‘nomological-statistical machines’ has is probably one of the reasons econometricians have a false sense of the amount of uncertainty that really afflicts their models.
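The reference class problem can be shown in miniature. In the sketch below, the records are entirely invented for illustration; the point is that one and the same individual (say, a female smoker) gets a different ‘probability’ of illness depending on which reference class we place her in:

```python
# Toy reference class problem: the probability assigned to the same
# individual shifts with the population we refer them to.
# All records below are made up for illustration.

records = [
    {"sex": "f", "smoker": True,  "ill": True},
    {"sex": "f", "smoker": True,  "ill": True},
    {"sex": "f", "smoker": True,  "ill": False},
    {"sex": "f", "smoker": False, "ill": False},
    {"sex": "m", "smoker": True,  "ill": False},
    {"sex": "m", "smoker": False, "ill": False},
    {"sex": "m", "smoker": False, "ill": False},
    {"sex": "m", "smoker": False, "ill": True},
]

def p_ill(**given):
    """Relative frequency of illness within the chosen reference class."""
    ref = [r for r in records if all(r[k] == v for k, v in given.items())]
    return sum(r["ill"] for r in ref) / len(ref)

# Three reference classes, three different probabilities for one person:
print(p_ill())                          # whole population: 3/8
print(p_ill(sex="f"))                   # females: 2/4
print(p_ill(sex="f", smoker=True))      # female smokers: 2/3
```

None of the three numbers is ‘the’ probability; each is relative to a chosen class, which is exactly the relativity the text warns about.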
As economists and econometricians we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just as Fisher’s ‘hypothetical infinite population,’ von Mises’s ‘collective’ or Gibbs’s ‘ensemble’ – also implies that judgments are made on the basis of observations that are actually never made! Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.
Economists — and econometricians — have (uncritically and often without argument) come to simply assume that one can apply probability distributions from statistical theory to their own area of research. However, fundamental problems arise when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels.
Of course one could arguably treat our observational or experimental data as random samples from real populations. But probabilistic econometrics does not content itself with such populations. Instead it creates imaginary populations of ‘parallel universes’ and assumes that our data are random samples from them. This is actually nothing but hand-waving! Doing econometrics, it is always wise to remember C. S. Peirce’s remark that universes are not as common as peanuts …
The results reported here suggest that an exam school education produces only scattered gains for applicants, even among students with baseline scores close to or above the mean in the target school. Because the exam school experience is associated with sharp increases in peer achievement, these results weigh against the importance of peer effects in the education production function …
Of course, test scores and peer effects are only part of the exam school story. It may be that preparation for exam school entrance is itself worthwhile … The many clubs and activities found at some exam schools may expose students to ideas and concepts not easily captured by achievement tests or our post-secondary outcomes. It is also possible that exam school graduates earn higher wages, a question we plan to explore in future work. Still, the estimates reported here suggest that any labor market gains are likely to come through channels other than peer composition and increased cognitive achievement …
Our results are also relevant to the economic debate around school quality and school choice … As with the jump in house prices at school district boundaries, heavy rates of exam school oversubscription suggest that parents believe peer composition matters a great deal for their children’s welfare. The fact that we find little support for causal peer effects suggests that parents either mistakenly equate attractive peers with high value added, or that they value exam schools for reasons other than their impact on learning. Both of these scenarios reduce the likelihood that school choice in and of itself has strong salutary demand-side effects in education production.
Results based on one of the latest fads in econometrics — regression discontinuity design: compare units just above and just below an assignment cutoff, and read off the jump in outcomes at the cutoff as a causal effect.
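In simulated form the mechanics look roughly like this. All numbers (cutoff, ‘treatment effect’, noise level, bandwidth) are invented purely for illustration:

```python
# A rough simulated sketch of a regression discontinuity design (RDD):
# treatment is assigned by a cutoff on a 'running variable', and the
# effect is estimated as the jump in the outcome at the cutoff.
import random

random.seed(1)

cutoff = 0.0
true_jump = 2.0  # assumed treatment effect at the cutoff

# outcome = running variable + jump if treated (x >= cutoff) + noise
data = []
for _ in range(2000):
    x = random.uniform(-1, 1)
    y = x + (true_jump if x >= cutoff else 0.0) + random.gauss(0, 0.1)
    data.append((x, y))

def local_mean(lo, hi):
    """Average outcome for observations with lo <= x < hi."""
    ys = [y for x, y in data if lo <= x < hi]
    return sum(ys) / len(ys)

width = 0.05  # narrow bandwidth around the cutoff
estimated_jump = (local_mean(cutoff, cutoff + width)
                  - local_mean(cutoff - width, cutoff))
print(round(estimated_jump, 2))  # should land close to the true jump of 2.0
```

A real RDD analysis would fit local regressions on each side of the cutoff rather than raw band means, and would check for manipulation of the running variable; this is only the skeleton of the idea, and whether the estimated jump has any causal meaning depends on exactly the kinds of assumptions the rest of this post questions.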
Frank randomly assigned the subjects to one of three diet groups. One group followed a low-carbohydrate diet. Another followed the same low-carb diet plus a daily 1.5 oz. bar of dark chocolate. And the rest, a control group, were instructed to make no changes to their current diet. They weighed themselves each morning for 21 days, and the study finished with a final round of questionnaires and blood tests …
Both of the treatment groups lost about 5 pounds over the course of the study, while the control group’s average body weight fluctuated up and down around zero. But the people on the low-carb diet plus chocolate? They lost weight 10 percent faster. Not only was that difference statistically significant, but the chocolate group had better cholesterol readings and higher scores on the well-being survey.
I know what you’re thinking. The study did show accelerated weight loss in the chocolate group—shouldn’t we trust it? Isn’t that how science works?
Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.
Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one “statistically significant” result were pretty good.
Whenever you hear that phrase, it means that some result has a small p value. The letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data. The conventional cutoff for being “significant” is 0.05, which means that there is just a 5 percent chance that your result is a random fluctuation. The more lottery tickets, the better your chances of getting a false positive. So how many tickets do you need to buy?
P(winning) = 1 – (1 – p)^n
With our 18 measurements, we had a 60% chance of getting some “significant” result with p < 0.05. (The measurements weren’t independent, so it could be even higher.) The game was stacked in our favor.
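The lottery-ticket arithmetic is easy to check with the formula above:

```python
# Chance of at least one 'significant' result among n independent
# tests, each with false-positive rate p: P(winning) = 1 - (1 - p)^n.

def p_any_false_positive(p, n):
    return 1 - (1 - p) ** n

print(round(p_any_false_positive(0.05, 18), 2))  # 0.6, i.e. ~60%
```

And since the 18 measurements were correlated rather than independent, 60% is only the idealized figure; as the text notes, the real chance could be even higher.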
It’s called p-hacking—fiddling with your experimental design and data to push p under 0.05—and it’s a big problem. Most scientists are honest and do it unconsciously. They get negative results, convince themselves they goofed, and repeat the experiment until it “works.” Or they drop “outlier” data points.
Statistical inferences depend on both what actually happens and what might have happened. And Bohannon’s (in)famous chocolate con more than anything else underscores the dangers of confusing the model with reality. Or as W.V.O. Quine had it: “Confusion of sign and object is the original sin.”
There are no such things as free-standing probabilities – simply because probabilities are, strictly seen, only defined relative to chance set-ups – probabilistic nomological machines like flipping coins or roulette wheels. And even these machines can be tricky to handle. Although prob(fair coin lands heads | I toss it) = prob(fair coin lands heads & I toss it)/prob(I toss it) may be well-defined, it is not certain we can use it, since we cannot define the probability that I will toss the coin, given that I am not a nomological machine producing coin tosses.
No nomological machine – no probability.
What is distinctive about structural models, in contrast to forecasting models, is that they are supposed to be – when successfully supported by observation – informative about the impact of interventions in the economy. As such, they carry causal content about the structure of the economy. Therefore, structural models do not model mere functional relations supported by correlations, their functional relations have causal content which support counterfactuals about what would happen under certain changes or interventions.
This suggests an important question: just what is the causal content attributed to structural models in econometrics? And, from the more restricted perspective of this paper, what does this imply with respect to the interpretation of the error term? What does the error term represent causally in structural equation models in econometrics? And finally, what constraints are imposed on the error term for successful causal inference? …
I now consider briefly a key constraint that may be necessary for the error term to meet for using the model for causal inference. To keep the discussion simple, I look only at the simplest model, y = αx + u, where u is the unobservable error term.
The obvious experiment that comes to mind is to vary x, to see by how much y changes as a result. This sounds straightforward: one changes x, y changes, and one calculates α as follows:
α = ∆y/∆x
Everything seems straightforward. However, there is a concern, since u is unobservable: how does one know that u has not also changed in changing x? Suppose that u does change, so that hidden in the change in y there is a change in u; that is, the change in y is incorrectly measured by
∆y_false = ∆y + ∆u
and thus α is falsely measured as
α_false = ∆y_false/∆x = ∆y/∆x + ∆u/∆x = α + ∆u/∆x
Therefore, in order for the experiment to give the correct measurement of α, one needs either to know that u has not also changed or to know by how much it has changed (if it has). Since u is unobservable, it is not known by how much u has changed. This leaves as the only option the need to know that, in changing x, u has not also been unwittingly changed. Intuitively, this requires knowing that whatever cause(s) of x are used to change x are not also causes of any of the factors hidden in u …
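The point can be made concrete with a toy numerical check, assuming the simple model y = αx + u and using invented values throughout: if the intervention on x leaves u alone, ∆y/∆x recovers α; if it unwittingly shifts u as well, the estimate absorbs the hidden ∆u/∆x.

```python
# Toy illustration of the identification problem: all values invented.
alpha = 2.0  # the true structural coefficient in y = alpha*x + u

def y(x, u):
    return alpha * x + u

x0, x1 = 1.0, 2.0
u_fixed = 0.5

# Case 1: u stays fixed when we change x -> alpha measured correctly
alpha_hat = (y(x1, u_fixed) - y(x0, u_fixed)) / (x1 - x0)
print(round(alpha_hat, 2))  # 2.0, the true alpha

# Case 2: the intervention on x unwittingly shifts u as well
du = 0.3  # hidden change in u accompanying the change in x
alpha_false = (y(x1, u_fixed + du) - y(x0, u_fixed)) / (x1 - x0)
print(round(alpha_false, 2))  # 2.3 = alpha + du/dx, biased
```

Since u is unobservable, nothing in the data distinguishes case 1 from case 2; only outside knowledge that the causes used to move x do not also move u licenses the causal reading.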
More generally, the example above shows the need to constrain the error term in a non-simultaneous structural equation model as follows: each right-hand variable must have a cause that causes y but not via any factor hidden in the error term. This imposes a limit on the common causes that the factors in the error term can have with the factors explicitly modelled …
Consider briefly the testability of the two key assumptions brought to light in this section: (i) that the error term denotes the net impact of a set of omitted causal factors, and (ii) that each right-hand variable has at least one cause which does not also cause anything hidden in the error term. Given that these assumptions directly involve the factors omitted in the error term, testing them empirically seems impossible without information about what is hidden in the error term. This places the modeller in a difficult situation: how to know that something important has not been hidden? In practice, there will always be an element of faith in the assumptions about the error term — assuming that assumptions like (i) and (ii) have been met, even though it is impossible to test them conclusively.
In econometrics textbooks it is often said that the error term in the regression models used represents the effect of the variables that were omitted from the model. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model, necessary to include in order to ‘save’ the assumed deterministic relation between the other random variables in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since the error terms are unobservable, this assumption is impossible to test empirically. And without justification of the orthogonality assumption, there is as a rule nothing to ensure identifiability:
With enough math, an author can be confident that most readers will never figure out where a FWUTV (fact with unknown truth value) is buried. A discussant or referee cannot say that an identification assumption is not credible if they cannot figure out what it is and are too embarrassed to ask.
Distributional assumptions about error terms are a good place to bury things because hardly anyone pays attention to them. Moreover, if a critic does see that this is the identifying assumption, how can she win an argument about the true expected value of the level of aether? If the author can make up an imaginary variable, “because I say so” seems like a pretty convincing answer to any question about its properties.
• Achen, Christopher (1982). Interpreting and Using Regression. SAGE
• Berk, Richard (2004). Regression Analysis: A Constructive Critique. SAGE
• Freedman, David (1991). ‘Statistical Models and Shoe Leather’. Sociological Methodology
• Kennedy, Peter (2002). ‘Sinning in the Basement: What are the Rules? The Ten Commandments of Applied Econometrics’. Journal of Economic Surveys
• Keynes, John Maynard (1939). ‘Professor Tinbergen’s Method’. Economic Journal
• Klees, Steven (2016). ‘Inferences from Regression Analysis: Are They Valid?’ Real-World Economics Review
• Lawson, Tony (1989). ‘Realism and Instrumentalism in the Development of Econometrics’. Oxford Economic Papers
• Leamer, Edward (1983). ‘Let’s Take the Con Out of Econometrics’. American Economic Review
• Lieberson, Stanley (1987). Making It Count: The Improvement of Social Research and Theory. University of California Press
• Zaman, Asad (2012). ‘Methodological Mistakes and Econometric Consequences’. International Econometric Review