Renowned ‘error-statistician’ Aris Spanos maintains — in a comment on this blog a couple of weeks ago — that Keynes’ critique of econometrics and the reliability of inferences made when it is applied, “have been addressed or answered.”
One could, of course, say that, but the valuation of the statement hinges completely on what we mean by a question or critique being ‘addressed’ or ‘answered’. As I will argue below, Keynes’ critique is still valid and unanswered in the sense that the problems he pointed at are still with us today and ‘unsolved.’ Ignoring them — the most common practice among applied econometricians — is not to solve them.
To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong assumption. In a review of Tinbergen’s econometric work — published in The Economic Journal in 1939 — Keynes gave a comprehensive critique of Tinbergen’s work, focussing on the limiting and unreal character of the assumptions that econometric analyses build on:
Completeness: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of all the relevant factors to avoid misspecification and spurious causal claims. Usually this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes is, to put it mildly, unconvinced:
It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his a priori, that would make a difference to the outcome.
Homogeneity: To make inductive inferences possible — and being able to apply econometrics — the system we try to analyse has to have a large degree of ‘homogeneity.’ According to Keynes most social and economic systems — especially from the perspective of real historical time — lack that ‘homogeneity.’ As he had argued already in Treatise on Probability (ch. 22), it wasn’t always possible to take repeated samples from a fixed population when we were analysing real-world economies. In many cases there simply are no reasons at all to assume the samples to be homogenous. Lack of ‘homogeneity’ makes the principle of ‘limited independent variety’ non-applicable, and hence makes inductive inferences, strictly seen, impossible since one its fundamental logical premisses are not satisfied. Without “much repetition and uniformity in our experience” there is no justification for placing “great confidence” in our inductions (TP ch. 8).
And then, of course, there is also the ‘reverse’ variability problem of non-excitation: factors that do not change significantly during the period analysed, can still very well be extremely important causal factors.
Stability: Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyze. But as Keynes had argued already in his Treatise on Probability it was not really possible to make inductive generalisations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.
Measurability: Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions if it is possible to adequately quantify and measure things like expectations and political and psychological factors. And more than anything, he questioned — both on epistemological and ontological grounds — that it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, “only lead to error and delusion.”
Independence: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models from that kind of simplistic and unrealistic assumptions risk to produce nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic evolving organic systems — such as economies.
Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real-world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.
The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be much less simple than the bare principle of uniformity. They appear to assume something much more like what mathematicians call the principle of the superposition of small effects, or, as I prefer to call it, in this connection, the atomic character of natural law. The system of the material universe must consist, if this kind of assumption is warranted, of bodies which we may term (without any implication as to their size being conveyed thereby) legal atoms, such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state. We do not have an invariable relation between particular bodies, but nevertheless each has on the others its own separate and invariable effect, which does not change with changing circumstances, although, of course, the total effect may be changed to almost any extent if all the other accompanying causes are different. Each atom can, according to this theory, be treated as a separate cause and does not enter into different organic combinations in each of which it is regulated by different laws …
The scientist wishes, in fact, to assume that the occurrence of a phenomenon which has appeared as part of a more complex phenomenon, may be some reason for expecting it to be associated on another occasion with part of the same complex. Yet if different wholes were subject to laws qua wholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts. Given, on the other hand, a number of legally atomic units and the laws connecting them, it would be possible to deduce their effects pro tanto without an exhaustive knowledge of all the coexisting circumstances.
Linearity: To make his models tractable, Tinbergen assumes the relationships between the variables he study to be linear. This is still standard procedure today, but as as Keynes writes:
It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.
To Keynes it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).
The unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.
J. M. Keynes “Ethics in Relation to Conduct” (1903)
And as even one of the founding fathers of modern econometrics — Trygve Haavelmo — wrote:
What is the use of testing, say, the significance of regression coefficients, when maybe, the whole assumption of the linear regression equation is wrong?
Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established, are laws and relations about entities in models that presuppose causal mechanisms and variables — and the relationship between them — being linear, additive, homogenous, stable, invariant and atomistic. But — when causal mechanisms operate in the real world they only do it in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since statisticians and econometricians — as far as I can see — haven’t been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence, additivity as being ontologically isomorphic to real-world economic systems, Keynes’ critique is still valid . As long as — as Keynes writes in a letter to Frisch in 1935 — “nothing emerges at the end which has not been introduced expressively or tacitly at the beginning,” I remain doubtful of the scientific aspirations of econometrics.
In his critique of Tinbergen, Keynes points us to the fundamental logical, epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyse repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.
Econometric modeling should never be a substitute for thinking. From that perspective it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today.
The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.
Keynes, letter to E.J. Broster, December 19, 1939
A common goal of statistical analysis in the social sciences is to draw inferences about the relative contributions of different variables to some outcome variable. When regressing academic performance, political affiliation, or vocabulary growth on other variables, researchers often wish to determine which variables matter to the prediction and which do not—typically by considering whether each variable’s contribution remains statistically significant after statistically controlling for other predictors. When a predictor variable in a multiple regression has a coefficient that differs significantly from zero, researchers typically conclude that the variable makes a “unique” contribution to the outcome. And because measured variables are typically viewed as proxies for latent constructs of substantive interest—for example, two cognitive ability measures might be taken to index spatial versus verbal ability—it is natural to generalize the operational conclusion to the latent variable level; that is, to conclude that the latent construct measured by a given predictor variable itself has incremental validity in predicting the outcome, over and above other latent constructs that were examined.
Incremental validity claims pervade the social and biomedical sciences. In some fields, these claims are often explicit … More commonly, however, incremental validity claims are implicit—as when researchers claim that they have statistically “controlled” or “adjusted” for putative confounds—a practice that is exceedingly common in fields ranging from epidemiology to econometrics to behavioral neuroscience … The sheer ubiquity of such appeals might well give one the impression that such claims are unobjectionable, and if anything, represent a foundational tool for drawing meaningful scientific inferences.
Unfortunately, incremental validity claims can be deeply problematic. As we demonstrate below, even small amounts of error in measured predictor variables can result in extremely poorly calibrated Type 1 error probabilities. This basic problem has been discussed in a number of literatures—most extensively, in epidemiology and biostatistics, where concerns about incremental validity claims are often discussed under the heading of residual confounding, but also in fields ranging from psychology to education to econometrics. The common thread is that measurement unreliability and model misspecification will often have a deleterious and large effect on parameter estimates (and associated error rates) when covariates are entered into regression-based model. Consequently, under realistic assumptions, it can be shown that a large proportion of incremental validity claims in many disciplines are likely to be false …
In any given analysis, there is a simple fact of the matter as to whether or not the unique contribution of one or more variables in a regression is statistically significant when controlling for other variables; what room is there for inferential error? Trouble arises, however, when researchers behave as if statistical conclusions obtained at the level of observed measures can be automatically generalized to the level of latent constructs — a near-ubiquitous move, given that most scientists are not interested in prediction purely for prediction’s sake, and typically choose their measures precisely so as to stand in for latent constructs of interest. That is, researchers typically do not care to show that, say, school vouchers are associated with improved academic performance after controlling for a specific survey item asking about respondents’ income bracket; rather, the goal is to show that the vouchers may improve performance after accounting for the general construct of income (or, more generally, socioeconomic status).
The wide conviction of the superiority of the methods of the science has converted the econometric community largely to a group of fundamentalist guards of mathematical rigour. It is often the case that mathemical rigour is held as the dominant goal and the criterion for research topic choice as well as research evaluation, so much so that the relevance of the research to business cycles is reduced to empirical illustrations. To that extent, probabilistic formalization has trapped econometric business cycle research in the pursuit of means at the expense of ends.
Once the formalization attempts have gone significantly astray from what is needed for analysing and forecasting the multi-faceted characteristics of business cycles, the research community should hopefully make appropriate ‘error corrections’ of its overestimation of the power of a priori postulated models as well as its underestimation of the importance of the historical approach, or the ‘art’ dimension of business cycle research.
Duo Qin A History of Econometrics (OUP 2013)
The tool of statistical inference becomes available as the result of a self-imposed limitation of the universe of discourse. It is assumed that the available observations have been generated by a probability law or stochastic process about which some incomplete knowledge is available a priori …
It should be kept in mind that the sharpness and power of these remarkable tools of inductive reasoning are bought by willingness to adopt a specification of the universe in a form suitable for mathematical analysis.
Yes indeed — using statistics and econometrics to make inferences you have to make lots of (mathematical) tractability assumptions. And especially since econometrics aspires to explain things in terms of causes and effects, it needs loads of assumptions, such as e.g. invariance, additivity and linearity.
Limiting model assumptions in economic science always have to be closely examined since if we are going to be able to show that the mechanisms or causes that we isolate and handle in our models are stable in the sense that they do not change when we ‘export’ them to our ‘target systems,’ we have to be able to show that they do not only hold under ceteris paribus conditions. If not, they are of limited value to our explanations and predictions of real economic systems.
Unfortunately, real world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established, are laws and relations about entities in models that presuppose causal mechanisms being invariant, atomistic and additive. But — when causal mechanisms operate in the real world they mostly do it in ever-changing and unstable ways. If economic regularities obtain they do so as a rule only because we engineered them for that purpose. Outside man-made ‘nomological machines’ they are rare, or even non-existant.
So — if we want to explain and understand real-world economies we should perhaps be a little bit more cautious with using universe specifications ‘suitable for mathematical analysis’ …
Econometricians usually aim for unbiased estimates. And in econometrics textbooks you learn that if it’s not BLUE, it’s not good.
But if you really think about it, there is no real unbiased estimates. As soon as you weigh in the fact that in all econometric applications you always get your ‘unbiased’ estimates based on non-ideal randomized samples, measurement errors, non-additive and non-linear relationships, and so forth — well, then you realize there is no such a thing as ‘unbiasedness’ in the real-world.
And it’s even worse than this! ‘Randomistas’ are usually very keen to point out that their RCTs give results based on ‘unbiased’ estimator. But that doesn’t take us very far …
One should not jump to the conclusion that there is necessarily a substantive difference between drawing inferences from experimental as opposed to nonexperimental data …
In the experimental setting, the fertilizer treatment is “randomly” assigned to plots of land, whereas in the other case nature did the assignment … “Random” does not mean adequately mixed in every sample. It only means that on the average, the fertilizer treatments are adequately mixed …
Randomization implies that the least squares estimator is “unbiased,” but that definitely does not mean that for each sample the estimate is correct. Sometimes the estimate is too high, sometimes too low …
In particular, it is possible for the randomization to lead to exactly the same allocation as the nonrandom assignment … Many econometricians would insist that there is a difference, because the randomized experiment generates “unbiased” estimates. But all this means is that, if this particular experiment yields a gross overestimate, some other experiment yields a gross underestimate.
So — as soon as we realise that ‘unbiasedness’ is not the Holy Grail of econometrics, it’s easier to accept that it’s better get close to the truth with a biased estimator, than to be stuck with an ‘unbiased’ estimator that is typically not even close to truth.
It has long been recognized by some that when any parameter estimates are discarded, the sampling distribution of the remaining parameter estimates can be distorted …
For example, suppose the model a researcher selects depends on the day of the week. On Mondays it’s model A, on Tuesdays it’s model B, and so onup to seven diﬀerent models on seven diﬀerent days. Each model, therefore,is the “ﬁnal” model with a probability of 1/7th that has nothing to do with the values of the regression parameters. Then, if the data analysis happens to be done on a Thursday, say, it is the results from model D that are reported. All of the other model results that could have been reported are not. Those parameter estimates are summarily discarded …
Model selection is a procedure by which some models are chosen over others. But model selection is subject to uncertainty. Because regression parameter estimates depend on the model in which they are embedded, there is in post-model-selection estimates additional uncertainty not present when a model is speciﬁed in advance. The uncertainty translates into sampling distributions that are a mixture of distributions, whose properties can diﬀer dramatically from those required for convention statistical inference.
Modern econometrics is fundamentally based on assuming — usually without any explicit justification — that we can gain causal knowledge by considering independent variables that may have an impact on the variation of a dependent variable. This is however, far from self-evident. Often the fundamental causes are constant forces that are not amenable to the kind of analysis econometrics supplies us with. As Stanley Lieberson has it in Making It Count:
One can always say whether, in a given empirical context, a given variable or theory accounts for more variation than another. But it is almost certain that the variation observed is not universal over time and place. Hence the use of such a criterion first requires a conclusion about the variation over time and place in the dependent variable. If such an analysis is not forthcoming, the theoretical conclusion is undermined by the absence of information …
Moreover, it is questionable whether one can draw much of a conclusion about causal forces from simple analysis of the observed variation … To wit, it is vital that one have an understanding, or at least a working hypothesis, about what is causing the event per se; variation in the magnitude of the event will not provide the answer to that question.
Trygve Haavelmo was making a somewhat similar point back in 1941, when criticizing the treatmeant of the interest variable in Tinbergen’s regression analyses. The regression coefficient of the interest rate variable being zero was according to Haavelmo not sufficient for inferring that “variations in the rate of interest play only a minor role, or no role at all, in the changes in investment activity.” Interest rates may very well play a decisive indirect role by influencing other causally effective variables. And:
the rate of interest may not have varied much during the statistical testing period, and for this reason the rate of interest would not “explain” very much of the variation in net profit (and thereby the variation in investment) which has actually taken place during this period. But one cannot conclude that the rate of influence would be inefficient as an autonomous regulator, which is, after all, the important point.
This problem of ‘nonexcitation’ — when there is too little variation in a variable to say anything about its potential importance, and we can’t identify the reason for the factual influence of the variable being ‘negligible’ — strongly confirms that causality in economics and other social sciences can never solely be a question of statistical inference. Causality entails more than predictability, and to really in depth explain social phenomena requires theory.
Analysis of variation — the foundation of all econometrics — can never in itself reveal how these variations are brought about. First when we are able to tie actions, processes or structures to the statistical relations detected, can we say that we are getting at relevant explanations of causation. Too much in love with axiomatic-deductive modeling, neoclassical economists especially tend to forget that accounting for causation — how causes bring about their effects — demands deep subject-matter knowledge and acquaintance with the intricate fabrics and contexts. As already Keynes argued in his A Treatise on Probability, statistics and econometrics should primarily be seen as means to describe patterns of associations and correlations, means that we may use as suggestions of possible causal realations.
If contributions made by statisticians to the understanding of causation are to be taken over with advantage in any specific field of inquiry, then what is crucial is that the right relationship should exist between statistical and subject-matter concerns …
Where the ultimate aim of research is not prediction per se but rather causal explanation, an idea of causation that is expressed in terms of predictive power — as, for example, ‘Granger’ causation — is likely to be found wanting. Causal explanations cannot be arrived at through statistical methodology alone: a subject-matter input is also required in the form of background knowledge and, crucially, theory …
Likewise, the idea of causation as consequential manipulation is apt to research that can be undertaken primarily through experimental methods and, especially to ‘practical science’ where the central concern is indeed with ‘the consequences of performing particular acts’. The development of this idea in the context of medical and agricultural research is as understandable as the development of that of causation as robust dependence within applied econometrics. However, the extension of the manipulative approach into sociology would not appear promising, other than in rather special circumstances … The more fundamental difficulty is that, under the — highly anthropocentric — principle of ‘no causation without manipulation’, the recognition that can be given to the action of individuals as having causal force is in fact peculiarly limited.
Causality in social sciences — and economics — can never solely be a question of statistical inference. Causality entails more than predictability, and to really in depth explain social phenomena require theory. Analysis of variation — the foundation of all econometrics — can never in itself reveal how these variations are brought about. First when we are able to tie actions, processes or structures to the statistical relations detected, can we say that we are getting at relevant explanations of causation.
Most facts have many different, possible, alternative explanations, but we want to find the best of all contrastive (since all real explanation takes place relative to a set of alternatives) explanations. So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep, causal, features and mechanisms that we have warranted and justified reasons to believe in. Statistical — especially the variety based on a Bayesian epistemology — reasoning generally has no room for these kinds of explanatory considerations. The only thing that matters is the probabilistic relation between evidence and hypothesis. That is also one of the main reasons I find abduction — inference to the best explanation — a better description and account of what constitute actual scientific reasoning and inferences.
For more on these issues — see the chapter “Capturing causality in economics and the limits of statistical inference” in my On the use and misuse of theories and models in economics.
In the social sciences … regression is used to discover relationships or to disentangle cause and effect. However, investigators have only vague ideas as to the relevant variables and their causal order; functional forms are chosen on the basis of convenience or familiarity; serious problems of measurement are often encountered.
Regression may offer useful ways of summarizing the data and making predictions. Investigators may be able to use summaries and predictions to draw substantive conclusions. However, I see no cases in which regression equations, let alone the more complex methods, have succeeded as engines for discovering causal relationships.
Some statisticians and data scientists think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like faithfulness or stability is not to give proofs. It’s to assume what has to be proven. Deductive-axiomatic methods used in statistics do no produce evidence for causal inferences. The real casuality we are searching for is the one existing in the real-world around us. If their is no warranted connection between axiomatically derived theorems and the real-world, well, then we haven’t really obtained the causation we are looking for.