Tinbergen’s results cannot be judged by ordinary tests of statistical significance. The reason is that the variables with which he winds up, the particular series measuring these variables, the leads and lags, and various other aspects of the equations besides the particular values of the parameters (which alone can be tested by the usual statistical technique) have been selected after an extensive process of trial and error because they yield high coefficients of correlation. Tinbergen is seldom satisfied with a correlation coefficient less than 0.98. But these attractive correlation coefficients create no presumption that the relationships they describe will hold in the future. The multiple regression equations which yield them are simply tautological reformulations of selected economic data. Taken at face value, Tinbergen’s work “explains” the errors in his data no less than their real movements; for although many of the series employed in the study would be accorded, even by their compilers, a margin of error in excess of 5 per cent, Tinbergen’s equations “explain” well over 95 per cent of the observed variation.
As W. C. Mitchell put it some years ago, “a competent statistician, with sufficient clerical assistance and time at his command, can take almost any pair of time series for a given period and work them into forms which will yield coefficients of correlation exceeding ±.9 …. So work of [this] sort … must be judged, not by the coefficients of correlation obtained within the periods for which they have manipulated the data, but by the coefficients which they get in earlier or later periods to which their formulas may be applied.” But Tinbergen makes no attempt to determine whether his equations agree with data other than those which they translate …
The methods used by Tinbergen do not and cannot provide an empirically tested explanation of business cycle movements.
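Mitchell's point about in-sample versus out-of-sample correlations can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not a reconstruction of Tinbergen's procedure: the series are independent random walks (so there is no true relationship at all), and the function names, sample lengths and number of candidates are hypothetical choices made for illustration. The "statistician" simply keeps whichever candidate series correlates best with the target in the fitting period, and the same pairing is then checked on later data.

```python
import random
import statistics


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(n, rng):
    """Cumulative sum of independent standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def search_best_fit(n_fit=30, n_holdout=30, n_candidates=200, seed=0):
    """Pick the candidate with the highest in-sample |r|; report its hold-out r.

    All series are unrelated by construction (an assumption of this sketch),
    so any high correlation is produced purely by the search.
    """
    rng = random.Random(seed)
    n = n_fit + n_holdout
    target = random_walk(n, rng)
    results = []
    for _ in range(n_candidates):
        candidate = random_walk(n, rng)
        r_fit = correlation(target[:n_fit], candidate[:n_fit])
        r_holdout = correlation(target[n_fit:], candidate[n_fit:])
        results.append((r_fit, r_holdout))
    # The selection step: keep whatever fits best in the fitting period.
    best = max(results, key=lambda pair: abs(pair[0]))
    return best, results


if __name__ == "__main__":
    (r_fit, r_holdout), _ = search_best_fit()
    print(f"selected in-sample r: {r_fit:.3f}")
    print(f"same pair, later period r: {r_holdout:.3f}")
```

The in-sample figure is the maximum over many tries and says nothing about the later period, which is the comparison Mitchell asks for.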
The role of inflation expectations: In response to our observation that his econometric calculations are not robust, Svensson explains in his two posts how he arrives at the conclusion that inflation expectations are not significant. He ends his first post by urging others to check the results. That is exactly what we have done. We have used his data sources and models to replicate his calculations, in accordance with accepted scientific method …
Here we want to make an important point that applies to both Svensson’s calculations and our own. In all econometric studies the researcher is forced to make various assumptions about how the model, in our case the Phillips curve, should be specified and about the econometric methods with which the model is to be estimated. In our study and in Svensson’s there are essentially two econometric problems to solve, which we discuss in detail below. These problems concern lagged data and overlapping data. Svensson chooses a method that takes account of one of these problems, while we choose another method that takes account of both. We are not saying that our method is the best or that Svensson’s is the worst. Both have their advantages and disadvantages, which we discuss in our study and in the attached appendix. The central message of our estimates is that the choice of method affects the results in such a way that we cannot draw firm conclusions about the effects of monetary policy.
Lars E O Svensson responds to the criticism here.
And a week ago yours truly wrote …
Rigour and elegance in the analysis do not make up for the gap between reality and model. It is the distribution of the phenomena in itself and not its estimation that ought to be at the centre of the stage. A crucial ingredient to any economic theory that wants to use probabilistic models should be a convincing argument for the view that “there can be no harm in considering economic variables as stochastic variables” [Haavelmo 1943:13]. In most cases no such arguments are given.
Of course you are entitled – like Haavelmo and his modern probabilistic followers – to express a hope “at a metaphysical level” that there are invariant features of reality to uncover and that also show up at the empirical level of observations as some kind of regularities.
But is it a justifiable hope? I have serious doubts. The kind of regularities you may hope to find in society is not to be found in the domain of surface phenomena, but rather at the level of causal mechanisms, powers and capacities. Persistence and generality have to be looked for at an underlying deep level. Most econometricians do not want to visit that playground. They are content with setting up theoretical models that give us correlations and eventually “mimic” existing causal properties.
[Haavelmo's] effort to create foundations for the probability approach in econometrics finally results in an inconsistent set of claims in its defence. First, there are vast amounts of experience which warrant a frequency interpretation. This is supported by repetitive discussions of experimental design, but the inability to experiment inspires an epistemological interpretation. Then Haavelmo mentions the futility of bothering with these issues because the probability approach is most of all a useful tool. This would be an instrumentalistic justification for its use if Haavelmo gave supportive evidence for his claim. There is not one example which attempts to do so. [...]
The founders of econometrics tried to adapt the sampling approach to a non-experimental small sample domain. They tried to justify this with a priori and analytical arguments. However, the ultimate argument for a ‘probability approach in econometrics’ consists of a mixture of metaphors, metaphysics and a pinch of bluff.
Neoclassical economists often hold the view that criticisms of econometrics are the conclusions of sadly misinformed and misguided people who dislike and do not understand much of it. This is really a gross misapprehension. To be careful and cautious is not the same as to dislike. And as any perusal of the mathematical-statistical and philosophical works of people like for example David Freedman, Nancy Cartwright, Chris Chatfield, Hugo Keuzenkamp, Rudolf Kalman, John Maynard Keynes or Tony Lawson would show, the critique is put forward by respected authorities. I would argue, against “common knowledge”, that they do not misunderstand the crucial issues at stake in the development of econometrics. Quite the contrary. They know them all too well — and are not satisfied with the validity and philosophical underpinning of the assumptions made for applying its methods.
Let me try to do justice to the critical arguments on the logic of probabilistic induction and briefly elaborate, mostly from a philosophy of science vantage point, on some insights a critical realist perspective gives us on econometrics and its methodological foundations.
My critique is that the currently accepted notion of a statistical model is not scientific; rather, it is a guess at what might constitute (scientific) reality without the vital element of feedback, that is, without checking the hypothesized, postulated, wished-for, natural-looking (but in fact only guessed) model against that reality. To be blunt, as far as is known today, there is no such thing as a concrete i.i.d. (independent, identically distributed) process, not because this is not desirable, nice, or even beautiful, but because Nature does not seem to be like that … As Bertrand Russell put it at the end of his long life devoted to philosophy, “Roughly speaking, what we know is science and what we don’t know is philosophy.” In the scientific context, but perhaps not in the applied area, I fear statistical modeling today belongs to the realm of philosophy.
To make this point seem less erudite, let me rephrase it in cruder terms. What would a scientist expect from statisticians, once he became interested in statistical problems? He would ask them to explain to him, in some clear-cut cases, the origin of randomness frequently observed in the real world, and furthermore, when this explanation depended on the device of a model, he would ask them to continue to confront that model with the part of reality that the model was supposed to explain. Something like this was going on three hundred years ago … But in our times the idea somehow got lost when i.i.d. became the pampered new baby.
It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his a priori, that would make a difference to the outcome.
It is clearly the case that experienced modellers could easily come up with significantly different models based on the same set of data thus undermining claims to researcher-independent objectivity. This has been demonstrated empirically by Magnus and Morgan (1999) who conducted an experiment in which an apprentice had to try to replicate the analysis of a dataset that might have been carried out by three different experts (Leamer, Sims, and Hendry) following their published guidance. In all cases the results were different from each other, and different from that which would have been produced by the expert, thus demonstrating the importance of tacit knowledge in statistical analysis.
Magnus and Morgan conducted a further experiment which involved eight expert teams, from different universities, analysing the same sets of data each using their own particular methodology. The data concerned the demand for food in the US and in the Netherlands and was based on a classic study by Tobin (1950) augmented with more recent data. The teams were asked to estimate the income elasticity of food demand and to forecast per capita food consumption. In terms of elasticities, the lowest estimates were around 0.38 whilst the highest were around 0.74 – clearly vastly different especially when remembering that these were based on the same sets of data. The forecasts were perhaps even more extreme – from a base of around 4000 in 1989 the lowest forecast for the year 2000 was 4130 while the highest was nearly 18000!
Sherlock Holmes stated that ‘It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.’ True this may be in the circumstance of crime investigation, the principle does not apply to testing. In a crime investigation one wants to know what actually happened: who did what, when and how. Testing is somewhat different.
With testing, not only what happened is interesting, but what could have happened, and what would have happened were the circumstances to repeat itself. The particular events under study are considered draws from a larger population. It is the distribution of this population one is primarily interested in, and not so much the particular realizations of that distribution. So not the particular sequence of head and tails in coin flipping is of interest, but whether that says something about a coin being biased or not. Not (only) whether inflation and unemployment went together in the sixties is interesting, but what that tells about the true trade-off between these two economic variables. In short, one wants to test.
The tested hypothesis has to come from somewhere and to base it, like Holmes, on data is valid procedure … The theory should however not be tested on the same data they were derived from. To use significance as a selection criterion in a regression equation constitutes a violation of this principle …
Consider for example time series econometrics … It may not be clear a priori which lags matter, while it is clear that some definitely do … The Box-Jenkins framework models the auto-correlation structure of a series as good as possible first, postponing inference to the next stage. In this next stage other variables or their lagged values may be related to the time series under study. While this justifies why time series uses data mining, it leaves unaddressed the issue of the true level of significance …
This is sometimes recommended in a general-to-specific approach where the most general model is estimated and insignificant variables are subsequently discarded. As superfluous variables increase the variance of estimators, omitting irrelevant variables this way may increase efficiency. Problematic is that variables were included in the first place because they were thought to be (potentially) relevant. If then for example twenty variables, believed to be potentially relevant a priori, are included, then one or more will be bound to be insignificant (depending on the power, which cannot be trusted to be high). Omitting relevant variables, whether they are insignificant or not, generally biases all other estimates as well due to the well-known omitted variable bias. The data are thus used both to specify the model and test the model; this is the problem of estimation. Without further notice this double use of the data is bound to be misleading if not incorrect. The tautological nature of this procedure is apparent; as significance is the selection criterion it is not very surprising selected variables are significant.
D. A. Hollanders, Five methodological fallacies in applied econometrics
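The tautology Hollanders describes, that variables selected for significance will of course be significant, can be seen from the other side in a small simulation. This is a hedged sketch and not taken from his paper: it assumes twenty candidate regressors that are pure noise and independent of the dependent variable, screens each one with a simple-regression t-test, and uses a normal approximation to the 5 per cent critical value. The function names and default sample sizes are hypothetical. If the twenty tests were independent and exactly at the 5 per cent level, at least one candidate would pass in roughly 1 − 0.95^20 ≈ 64 per cent of samples.

```python
import random
import statistics


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def t_statistic(r, n):
    """t-statistic of the slope in a simple regression, computed from r and n."""
    return r * ((n - 2) / (1.0 - r * r)) ** 0.5


def screen_noise(n_obs=50, n_candidates=20, n_replications=300, seed=1):
    """Screen irrelevant regressors by significance and record what survives.

    Returns the share of replications in which at least one pure-noise
    regressor is kept, together with the t-statistics of all kept regressors.
    """
    rng = random.Random(seed)
    # Normal approximation to the two-sided 5 per cent critical value
    # (an assumption; the exact t critical value is slightly larger).
    critical = statistics.NormalDist().inv_cdf(0.975)
    replications_with_hit = 0
    kept_t_values = []
    for _ in range(n_replications):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        hit = False
        for _ in range(n_candidates):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
            t = t_statistic(correlation(x, y), n_obs)
            if abs(t) > critical:  # significance used as the selection rule
                kept_t_values.append(t)
                hit = True
        replications_with_hit += hit
    return replications_with_hit / n_replications, kept_t_values


if __name__ == "__main__":
    share, kept = screen_noise()
    print(f"share of samples with at least one 'significant' noise variable: {share:.2f}")
    print(f"smallest |t| among the kept variables: {min(abs(t) for t in kept):.2f}")
```

Every kept variable clears the threshold by construction, so reporting its significance afterwards adds no information; the nominal 5 per cent level no longer describes the procedure as a whole.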
Suppose you test a highly confirmed hypothesis, for example, that the price elasticity of demand is negative. What would you do if the computer were to spew out a positive coefficient? Surely you would not claim to have overthrown the law of demand … Instead, you would rerun many variants of your regression until the recalcitrant computer finally acknowledged the sovereignty of your theory …
Only the naive are shocked by such soft and gentle testing … Easy it is. But also wrong, when the purpose of the exercise is not to use a hypothesis, but to determine its validity …
Econometric tests are far from useless. They are worth doing, and their results do tell something … But many economists insist that economics can deliver more, much more, than merely, more or less, plausible knowledge, that it can reach its results with compelling demonstrations. By such a standard how should one describe our usual way of testing hypotheses? One possibility is to interpret it as Blaug [The Methodology of Economics, 1980, p. 256] does, as ‘playing tennis with the net down’ …
Perhaps my charge that econometric testing lacks seriousness of purpose is wrong … But regardless of the cause, it should be clear that most econometric testing is not rigorous. Combining such tests with formalized theoretical analysis or elaborate techniques is another instance of the principle of the strongest link. The car is sleek and elegant; too bad the wheels keep falling off.
Because I was there when the economics department of my university got an IBM 360, I was very much caught up in the excitement of combining powerful computers with economic research. Unfortunately, I lost interest in econometrics almost as soon as I understood how it was done. My thinking went through four stages:
1. Holy shit! Do you see what you can do with a computer’s help?
2. Learning computer modeling puts you in a small class where only other members of the caste can truly understand you. This opens up huge avenues for fraud.
3. The main reason to learn stats is to prevent someone else from committing fraud against you.
4. More and more people will gain access to the power of statistical analysis. When that happens, the stratification of importance within the profession should be a matter of who asks the best questions.
Disillusionment began to set in. I began to suspect that all the really interesting economic questions were FAR beyond the ability to reduce them to mathematical formulas. Watching computers being applied to other pursuits than academic economic investigations over time only confirmed those suspicions.
1. Precision manufacture is an obvious application for computing. And for many applications, this worked magnificently. Any design that combined straight lines and circles could be easily described for computerized manufacture. Unfortunately, the really interesting design problems can NOT be reduced to formulas. A car’s fender, for example, cannot be described using formulas; it can only be described by specifying an assemblage of multiple points. If math formulas cannot describe something as common and uncomplicated as a car fender, how can they hope to describe human behavior?
2. When people started using computers for animation, it soon became apparent that human motion was almost impossible to model correctly. After a great deal of effort, the animators eventually put tracing balls on real humans and recorded that motion before transferring it to the animated character. Formulas failed to describe simple human behavior, like a toddler trying to walk.
Lately, I have discovered a Swedish economist who did NOT give up econometrics merely because it sounded so impossible. In fact, he still teaches the stuff. But for the rest of us, he systematically destroys the pretensions of those who think they can describe human behavior with some basic formulas.
Wonder who that Swedish guy is …
- Always, but always, plot your data.
- Remember that data quality is at least as important as data quantity.
- Always ask yourself, “Do these results make economic/common sense”?
- Check whether your “statistically significant” results are also
- Be sure that you know exactly what assumptions are used/needed to obtain
the results relating to the properties of any estimator or test that you
- Just because someone else has used a particular approach to analyse a
problem that looks like yours, that doesn’t mean they were right!
- “Test, test, test”! (David Hendry). But don’t forget that “pre-testing”
raises some important issues of its own.
- Don’t assume that the computer code that someone gives to you is
relevant for your application, or that it even produces correct results.
- Keep in mind that published results will represent only a fraction of the
results that the author obtained, but is not publishing.
- Don’t forget that “peer-reviewed” does NOT mean “correct results”, or
even “best practices were followed”.