The desire in the profession to make universalistic claims following certain standard procedures of statistical inference is simply too strong to embrace procedures which explicitly rely on the use of vernacular knowledge for model closure in a contingent manner. More broadly, such a desire has played a vital role in the decisive victory of mathematical formalization over conventionally verbal based economic discourses as the proncipal medium of rhetoric, owing to its internal consistency, reducibility, generality, and apparent objectivity. It does not matter that [as Einstein wrote] ‘as far as the laws of mathematics refer to reality, they are not certain.’ What matters is that these laws are ‘certain’ when ‘they do not refer to reality.’ Most of what is evaluated as core research in the academic domain has little direct bearing on concrete social events in the real world anyway.
One may wonder how much calibration adds to the knowledge of economic structures and the deep parameters involved … Micro estimates are imputed in general equilibrium models which are confronted with new data, not used for the construction of the imputed parameters … However this procedure to impute parameter values into calibrated models has serious weaknesses …
Second, even where estimates are available from micro-econometric investigations, they cannot be automatically importyed into aggregated general equlibrium models …
Third, calibration hardly contributes to growth of knowledge about ‘deep parameters’. These deep parameters are confronted with a novel context (aggregate time-series), but this is not used for inference on their behalf. Rather, the new context is used to fit the model to presumed ‘laws of motion’ of the economy …
This leads to the fourth weakness. The combination of different pieces of evidence is laudable, but it can be done with statistical methods as well … This statistical approach has the advantage that it takes the parameter uncertainty into account: even if uncontroversial ‘deep parameters’ were available, they would have standard errors. Specification uncertainty makes things even worse. Negecting this leads to self-deception.
There are many kinds of useless economics held in high regard within mainstream economics establishment today. Few — if any — are less deserved than the macroeconomic theory/method — mostly connected with Nobel laureates Finn Kydland, Robert Lucas, Edward Prescott and Thomas Sargent — called calibration.
Hugo Keuzenkamp and yours truly are certainly not the only ones having doubts about the scientific value of calibration. In Journal of Economic Perspective (1996, vol. 10) Lars Peter Hansen and James J. Hickman writes:
It is only under very special circumstances that a micro parameter such as the inter-temporal elasticity of substitution or even a marginal propensity to consume out of income can be ‘plugged into’ a representative consumer model to produce an empirically concordant aggregate model … What credibility should we attach to numbers produced from their ‘computational experiments’, and why should we use their ‘calibrated models’ as a basis for serious quantitative policy evaluation? … There is no filing cabinet full of robust micro estimats ready to use in calibrating dynamic stochastic equilibrium models … The justification for what is called ‘calibration’ is vague and confusing.
Mathematical statistician Aris Spanos — in Error and Inference (Mayo & Spanos, 2010, p. 240) — is no less critical:
Given that “calibration” purposefully foresakes error probabilities and provides no way to assess the reliability of inference, how does one assess the adequacy of the calibrated model? …
The idea that it should suffice that a theory “is not obscenely at variance with the data” (Sargent, 1976, p. 233) is to disregard the work that statistical inference can perform in favor of some discretional subjective appraisal … it hardly recommends itself as an empirical methodology that lives up to the standards of scientific objectivity
In physics it may possibly not be straining credulity too much to model processes as ergodic – where time and history do not really matter – but in social and historical sciences it is obviously ridiculous. If societies and economies were ergodic worlds, why do econometricians fervently discuss things such as structural breaks and regime shifts? That they do is an indication of the unrealisticness of treating open systems as analyzable with ergodic concepts.
The future is not reducible to a known set of prospects. It is not like sitting at the roulette table and calculating what the future outcomes of spinning the wheel will be. Reading Lucas, Sargent, Prescott, Kydland and other calibrationists one comes to think of Robert Clower’s apt remark that
much economics is so far removed from anything that remotely resembles the real world that it’s often difficult for economists to take their own subject seriously.
Instead of assuming calibration and rational expectations to be right, one ought to confront the hypothesis with the available evidence. It is not enough to construct models. Anyone can construct models. To be seriously interesting, models have to come with an aim. They have to have an intended use. If the intention of calibration and rational expectations is to help us explain real economies, it has to be evaluated from that perspective. A model or hypothesis without a specific applicability is not really deserving our interest.
To say, as Edward Prescott that
one can only test if some theory, whether it incorporates rational expectations or, for that matter, irrational expectations, is or is not consistent with observations
is not enough. Without strong evidence all kinds of absurd claims and nonsense may pretend to be science. We have to demand more of a justification than this rather watered-down version of “anything goes” when it comes to rationality postulates. If one proposes rational expectations one also has to support its underlying assumptions. None is given, which makes it rather puzzling how rational expectations has become the standard modeling assumption made in much of modern macroeconomics. Perhaps the reason is, as Paul Krugman has it, that economists often mistake
beauty, clad in impressive looking mathematics, for truth.
But I think Prescott’s view is also the reason why calibration economists are not particularly interested in empirical examinations of how real choices and decisions are made in real economies. In the hands of Lucas, Prescott and Sargent, rational expectations has been transformed from an – in principle – testable hypothesis to an irrefutable proposition. Believing in a set of irrefutable propositions may be comfortable – like religious convictions or ideological dogmas – but it is not science.
Tinbergen’s results cannot be judged by ordinary tests of statistical significance. The reason is that the variables with which he winds up, the particular series measuring these variables, the leads and lags, and various other aspects of the equations besides the particular values of the parameters (which alone can be tested by the usual statistical technique) have been selected after an extensive process of trial and error because they yield high coefficients of correlation. Tinbergen is seldom satisfied with a correlation coefficient less than 0.98. But these attractive correlation coefficients create no presumption that the relationships they describe will hold in the future. The multiple regression equations which yield them are simply tautological reformulations of selected economic data. Taken at face value, Tinbergen’s work “explains” the errors in his data no less than their real movements; for although many of the series employed in the study would be accorded, even by their compilers, a margin of error in excess of 5 per cent, Tinbergen’s equations “explain” well over 95 per cent of the observed variation.
As W. C. Mitchell put it some years ago, “a competent statistician, with sufficient clerical assistance and time at his command, can take almost any pair of time series for a given period and work them into forms which will yield coefficients of correlation exceeding ±.9 …. So work of [this] sort … must be judged, not by the coefficients of correlation obtained within the periods for which they have manipulated the data, but by the coefficients which they get in earlier or later periods to which their formulas may be applied.” But Tinbergen makes no attempt to determine whether his equations agree with data other than those which they translate …
The methods used by Tinbergen do not and cannot provide an empirically tested explanation of business cycle movements.
Inflationsförväntningarnas roll: Som svar på vårt påpekande att hans ekonometriska kalkyler inte är robusta, redogör Svensson i sina två inlägg för hur han kommer fram till att inflationsförväntningarna inte är signifikanta. Han avslutar sitt första inlägg med en uppmaning till andra att kontrollera resultaten. Det är precis vad vi gjort. Vi har använt hans datakällor och modeller för att replikera hans beräkningar — enligt vedertagen vetenskaplig metod …
Här vill vi komma med ett viktigt påpekande som gäller både för Svenssons och våra beräkningar. I alla ekonometriska studier tvingas forskaren göra olika antaganden rörande hur modellen, i vårt fall Phillipskurvan, bör specificeras och rörande de ekonometriska metoder som modellen ska skattas med. I vår och i Svenssons studie finns i huvudsak två ekonometriska problem att lösa, vilka vi diskuterar i detalj nedan. Dessa problem gäller tidsförskjuten data samt överlappande data. Svensson väljer en metod som tar hänsyn till ett av dessa problem medan vi väljer en annan metod som tar har hänsyn till bägge. Vi säger inte att vår metod är den bästa eller att Svenssons är den sämsta. Båda har sina för- och nackdelar – som vi diskuterar i vår studie och i det bifogade appendixet. Det centrala budskapet med våra skattningar är att valet av metod påverkar resultaten på ett sådant sätt att vi inte kan dra bestämda slutsatser om penningpolitikens effekter.
Lars E O Svensson svarar på kritiken här.
Och för en vecka sedan skrev yours truly …
Rigour and elegance in the analysis does not make up for the gap between reality and model. It is the distribution of the phenomena in itself and not its estimation that ought to be at the centre of the stage. A crucial ingredient to any economic theory that wants to use probabilistic models should be a convincing argument for the view that “there can be no harm in considering economic variables as stochastic variables” [Haavelmo 1943:13]. In most cases no such arguments are given.
Of course you are entitled – like Haavelmo and his modern probabilistic followers – to express a hope “at a metaphysical level” that there are invariant features of reality to uncover and that also show up at the empirical level of observations as some kind of regularities.
But is it a justifiable hope? I have serious doubts. The kind of regularities you may hope to find in society is not to be found in the domain of surface phenomena, but rather at the level of causal mechanisms, powers and capacities. Persistence and generality has to be looked out for at an underlying deep level. Most econometricians do not want to visit that playground. They are content with setting up theoretical models that give us correlations and eventually “mimic” existing causal properties.
[Haavelmo’s] effort to create foundations for the probability approach in econometrics finally results in an inconsistent set of claims in its defence. First, there are vast amounts of experience which warrant a frequency interpretation. This is supported by repetitive discussions of experimental design, but the inability to expeeriment inspires an epistemological interpretation. Then Haavelmo mentions the futility of bothering with these issues because the probability approach is most of all a useful tool. This would be an instrumentalistic justification for its use if Haavelmo gave supportive evidence for his claim. There is not one example which attempts to do so. […]
The founders of exonometrics tried to adapt the sampling approach to a non-experimental small sample domain. They tried to justify this with a priori and analytcal arguments. However, the ultimate argument for a ‘probability approach in econometrics’ consists of a mixture of metaphors, metaphysics and a pinch of bluff.
Neoclassical economists often hold the view that criticisms of econometrics are the conclusions of sadly misinformed and misguided people who dislike and do not understand much of it. This is really a gross misapprehension. To be careful and cautious is not the same as to dislike. And as any perusal of the mathematical-statistical and philosophical works of people like for example David Freedman, Nancy Cartwright, Chris Chatfield, Hugo Keuzenkamp, Rudolf Kalman, John Maynard Keynes or Tony Lawson would show, the critique is put forward by respected authorities. I would argue, against “common knowledge”, that they do not misunderstand the crucial issues at stake in the development of econometrics. Quite the contrary. They know them all too well — and are not satisfied with the validity and philosophical underpinning of the assumptions made for applying its methods.
Let me try to do justice to the critical arguments on the logic of probabilistic induction and shortly elaborate — mostly from a philosophy of science vantage point — on some insights a critical realist perspective gives us on econometrics and its methodological foundations.
My critique is that the currently accepted notion of a statistical model is not scientific; rather, it is a guess at what might constitute (scientific) reality without the vital element of feedback, that is, without checking the hypothesized, postulated, wished-for, natural-looking (but in fact only guessed) model against that reality. To be blunt, as far as is known today, there is no such thing as a concrete i.i.d. (independent, identically distributed) process, not because this is not desirable, nice, or even beautiful, but because Nature does not seem to be like that … As Bertrand Russell put it at the end of his long life devoted to philosophy, “Roughly speaking, what we know is science and what we don’t know is philosophy.” In the scientific context, but perhaps not in the applied area, I fear statistical modeling today belongs to the realm of philosophy.
To make this point seem less erudite, let me rephrase it in cruder terms. What would a scientist expect from statisticians, once he became interested in statistical problems? He would ask them to explain to him, in some clear-cut cases, the origin of randomness frequently observed in the real world, and furthermore, when this explanation depended on the device of a model, he would ask them to continue to confront that model with the part of reality that the model was supposed to explain. Something like this was going on three hundred years ago … But in our times the idea somehow got lost when i.i.d. became the pampered new baby.
It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his a priori, that would make a difference to the outcome.
It is clearly the case that experienced modellers could easily come up with significantly different models based on the same set of data thus undermining claims to researcher-independent objectivity. This has been demonstrated empirically by Magnus and Morgan (1999) who conducted an experiment in which an apprentice had to try to replicate the analysis of a dataset that might have been carried out by three different experts (Leamer, Sims, and Hendry) following their published guidance. In all cases the results were different from each other, and different from that which would have been produced by the expert, thus demonstrating the importance of tacit knowledge in statistical analysis.
Magnus and Morgan conducted a further experiment which involved eight expert teams, from different universities, analysing the same sets of data each using their own particular methodology. The data concerned the demand for food in the US and in the Netherlands and was based on a classic study by Tobin (1950) augmented with more recent data. The teams were asked to estimate the income elasticity of food demand and to forecast per capita food consumption. In terms of elasticities, the lowest estimates were around 0.38 whilst the highest were around 0.74 – clearly vastly different especially when remembering that these were based on the same sets of data. The forecasts were perhaps even more extreme – from a base of around 4000 in 1989 the lowest forecast for the year 2000 was 4130 while the highest was nearly 18000!
Sherlock Holmes stated that ‘It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.’ True this may be in the circumstance of crime investigation, the principle does not apply to testing. In a crime investigation one wants to know what actually happened: who did what, when and how. Testing is somewhat different.
With testing, not only what happened is interesting, but what could have happened, and what would have happened were the circumstances to repeat itself. The particular events under study are considered draws from a larger population. It is the distribution of this population one is primarily interested in, and not so much the particular realizations of that distribution. So not the particular sequence of head and tails in coin flipping is of interest, but whether that says something about a coin being biased or not. Not (only) whether inflation and unemployment went together in the sixties is interesting, but what that tells about the true trade-off between these two economic variables. In short, one wants to test.
The tested hypothesis has to come from somewhere and to base it, like Holmes, on data is valid procedure … The theory should however not be tested on the same data they were derived from. To use significance as a selection criterion in a regression equation constitutes a violation of this principle …
Consider for example time series econometrics … It may not be clear a priori which lags matter, while it is clear that some definitely do … The Box-Jenkins framework models the auto-correlation structure of a series as good as possible first, postponing inference to the next stage. In this next stage other variables or their lagged values may be related to the time series under study. While this justifies why time series uses data mining, it leaves unaddressed the issue of the true level of significance …
This is sometimes recommended in a general-to-specific approach where the most general model is estimated and insignificant variables are subsequently discarded. As superfluous variables increase the variance of estimators, omitting irrelevant variables this way may increase efficiency. Problematic is that variables were included in the first place because they were thought to be (potentially) relevant. If then for example twenty variables, believed to be potentially relevant a priori, are included, then one or more will bound to be insignificant (depending on the power, which cannot be trusted to be high). Omitting relevant variables, whether they are insignificant or not, generally biases all other estimates as well due to the well-known omitted variable bias. The data are thus used both to specify the model and test the model; this is the problem of estimation. Without further notice this double use of the data is bound to be misleading if not incorrect. The tautological nature of this procedure is apparent; as significance is the selection criterion it is not very surprising selected variables are significant.
D. A. Hollanders Five methodological fallacies in applied econometrics
Suppose you test a highly confirmed hypothesis, for example, that the price elasticity of demand is negative. What would you do if the computer were to spew out a positive coefficient? Surely you would not claim to have overthrown the law of demand … Instead, you would rerun many variants of your regression until the recalcitrant computer finally acknowledged the sovereignty of your theory …
Only the naive are shocked by such soft and gentle testing … Easy it is. But also wrong, when the purpose of the exercise is not to use a hypothesis, but to determine its validity …
Econometric tests are far from useless. They are worth doing, and their results do tell something … But many economists insist that economics can deliver more, much more, than merely, more or less, plausible knowledge, that it can reach its results with compelling demonstrations. By such a standard how should one describe our usual way of testing hypotheses? One possibility is to interpret it as Blaug [The Methodology of Economics, 1980, p. 256] does, as ‘playing tennis with the net down’ …
Perhaps my charge that econometric testing lacks seriousness of purpose is wrong … But regardless of the cause, it should be clear that most econometric testing is not rigorous. Combining such tests with formalized theoretical analysis or elaborate techniques is another instance of the principle of the strongest link. The car is sleek and elegant; too bad the wheels keep falling off.