Sherlock Holmes stated that ‘It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.’ True this may be in the circumstance of crime investigation, the principle does not apply to testing. In a crime investigation one wants to know what actually happened: who did what, when and how. Testing is somewhat different.
With testing, not only what happened is interesting, but what could have happened, and what would have happened were the circumstances to repeat itself. The particular events under study are considered draws from a larger population. It is the distribution of this population one is primarily interested in, and not so much the particular realizations of that distribution. So not the particular sequence of head and tails in coin flipping is of interest, but whether that says something about a coin being biased or not. Not (only) whether inflation and unemployment went together in the sixties is interesting, but what that tells about the true trade-off between these two economic variables. In short, one wants to test.
The tested hypothesis has to come from somewhere and to base it, like Holmes, on data is valid procedure … The theory should however not be tested on the same data they were derived from. To use significance as a selection criterion in a regression equation constitutes a violation of this principle …
Consider for example time series econometrics … It may not be clear a priori which lags matter, while it is clear that some definitely do … The Box-Jenkins framework models the auto-correlation structure of a series as good as possible first, postponing inference to the next stage. In this next stage other variables or their lagged values may be related to the time series under study. While this justifies why time series uses data mining, it leaves unaddressed the issue of the true level of significance …
This is sometimes recommended in a general-to-specific approach where the most general model is estimated and insignificant variables are subsequently discarded. As superfluous variables increase the variance of estimators, omitting irrelevant variables this way may increase efficiency. Problematic is that variables were included in the first place because they were thought to be (potentially) relevant. If then for example twenty variables, believed to be potentially relevant a priori, are included, then one or more will bound to be insignificant (depending on the power, which cannot be trusted to be high). Omitting relevant variables, whether they are insignificant or not, generally biases all other estimates as well due to the well-known omitted variable bias. The data are thus used both to specify the model and test the model; this is the problem of estimation. Without further notice this double use of the data is bound to be misleading if not incorrect. The tautological nature of this procedure is apparent; as significance is the selection criterion it is not very surprising selected variables are significant.
D. A. Hollanders Five methodological fallacies in applied econometrics
Suppose you test a highly confirmed hypothesis, for example, that the price elasticity of demand is negative. What would you do if the computer were to spew out a positive coefficient? Surely you would not claim to have overthrown the law of demand … Instead, you would rerun many variants of your regression until the recalcitrant computer finally acknowledged the sovereignty of your theory …
Only the naive are shocked by such soft and gentle testing … Easy it is. But also wrong, when the purpose of the exercise is not to use a hypothesis, but to determine its validity …
Econometric tests are far from useless. They are worth doing, and their results do tell something … But many economists insist that economics can deliver more, much more, than merely, more or less, plausible knowledge, that it can reach its results with compelling demonstrations. By such a standard how should one describe our usual way of testing hypotheses? One possibility is to interpret it as Blaug [The Methodology of Economics, 1980, p. 256] does, as ‘playing tennis with the net down’ …
Perhaps my charge that econometric testing lacks seriousness of purpose is wrong … But regardless of the cause, it should be clear that most econometric testing is not rigorous. Combining such tests with formalized theoretical analysis or elaborate techniques is another instance of the principle of the strongest link. The car is sleek and elegant; too bad the wheels keep falling off.
Because I was there when the economics department of my university got an IBM 360, I was very much caught up in the excitement of combining powerful computers with economic research. Unfortunately, I lost interest in econometrics almost as soon as I understood how it was done. My thinking went through four stages:
1.Holy shit! Do you see what you can do with a computer’s help.
2.Learning computer modeling puts you in a small class where only other members of the caste can truly understand you. This opens up huge avenues for fraud:
3.The main reason to learn stats is to prevent someone else from committing fraud against you.
4.More and more people will gain access to the power of statistical analysis. When that happens, the stratification of importance within the profession should be a matter of who asks the best questions.
Disillusionment began to set in. I began to suspect that all the really interesting economic questions were FAR beyond the ability to reduce them to mathematical formulas. Watching computers being applied to other pursuits than academic economic investigations over time only confirmed those suspicions.
1.Precision manufacture is an obvious application for computing. And for many applications, this worked magnificently. Any design that combined straight line and circles could be easily described for computerized manufacture. Unfortunately, the really interesting design problems can NOT be reduced to formulas. A car’s fender, for example, can not be describe using formulas—it can only be described by specifying an assemblage of multiple points. If math formulas cannot describe something as common and uncomplicated as a car fender, how can it hope to describe human behavior?
2.When people started using computers for animation, it soon became apparent that human motion was almost impossible to model correctly. After a great deal of effort, the animators eventually put tracing balls on real humans and recorded that motion before transferring it to the the animated character. Formulas failed to describe simple human behavior—like a toddler trying to walk.
Lately, I have discovered a Swedish economist who did NOT give up econometrics merely because it sounded so impossible. In fact, he still teaches the stuff. But for the rest of us, he systematically destroys the pretensions of those who think they can describe human behavior with some basic Formulas.
Wonder who that Swedish guy is …
- Always, but always, plot your data.
- Remember that data quality is at least as important as data
- Always ask yourself, “Do these results make economic/common sense”?
- Check whether your “statistically significant” results are also
- Be sure that you know exactly what assumptions are used/needed to obtain
the results relating to the properties of any estimator or test that you
- Just because someone else has used a particular approach to analyse a
problem that looks like yours, that doesn’t mean they were right!
- “Test, test, test”! (David Hendry). But don’t forget that “pre-testing”
raises some important issues of its own.
- Don’t assume that the computer code that someone gives to you is
relevant for your application, or that it even produces correct results.
- Keep in mind that published results will represent only a fraction of the
results that the author obtained, but is not publishing.
- Don’t forget that “peer-reviewed” does NOT mean “correct results”, or
even “best practices were followed”.
Some variants of ‘data mining’ can be classified as the greatest of the basement sins, but other variants of ‘data mining’ can be viewed as important ingredients in data analysis. Unfortunately, these two variants usually are not mutually exclusive and so frequently conflict in the sense that to gain the benefits of the latter, one runs the risk of incurring the costs of the former.
Hoover and Perez (2000, p. 196) offer a general definition of data mining as referring to “a broad class of activities that have in common a search over different ways to process or package data statistically or econometrically with the purpose of making the final presentation meet certain design criteria.” Two markedly different views of data mining lie within the scope of this general definition. One view of ‘data mining’ is that it refers to experimenting with (or ‘fishing through’) the data to produce a specification … The problem with this, and why it is viewed as a sin, is that such a procedure is almost guaranteed to produce a specification tailored to the peculiarities of that particular data set, and consequently will be misleading in terms of what it says about the underlying process generating the data. Furthermore, traditional testing procedures used to ‘sanctify’ the specification are no longer legitimate, because these data, since they have been used to generate the specification, cannot be judged impartial if used to test that specification …
An alternative view of ‘data mining’ is that it refers to experimenting with (or ‘fishing through’) the data to discover empirical regularities that can inform economic theory … Hand et al (2000) describe data mining as the process of seeking interesting or valuable information in large data sets. Its greatest virtue is that it can uncover empirical regularities that point to errors/omissions in theoretical specifications …
In summary, this second type of ‘data mining’ identifies regularities in or characteristics of the data that should be accounted for and understood in the context of the underlying theory. This may suggest the need to rethink the theory behind one’s model, resulting in a new specification founded on a more broad-based understanding. This is to be distinguished from a new specification created by mechanically remolding the old specification to fit the data; this would risk incurring the costs described earlier when discussing the first variant of ‘data mining.’
The issue here is how should the model specification be chosen? As usual, Leamer (1996, p. 189) has an amusing view: “As you wander through the thicket of models, you may come to question the meaning of the Econometric Scripture that presumes the model is given to you at birth by a wise and beneficent Holy Spirit.”
In practice, model specifications come from both theory and data, and given the absence of Leamer’s Holy Spirit, properly so.
Brad DeLong wonders why Cliff Asness is clinging to a theoretical model that has clearly been rejected by the data …
There’s a version of this in econometrics, i.e. you know the model is correct, you are just having trouble finding evidence for it. It goes as follows. You are testing a theory you came up with, but the data are uncooperative and say you are wrong. But instead of accepting that, you tell yourself “My theory is right, I just haven’t found the right econometric specification yet. I need to add variables, remove variables, take a log, add an interaction, square a term, do a different correction for misspecification, try a different sample period, etc., etc., etc.” Then, after finally digging out that one specification of the econometric model that confirms your hypothesis, you declare victory, write it up, and send it off (somehow never mentioning the intense specification mining that produced the result).
Too much econometric work proceeds along these lines. Not quite this blatantly, but that is, in effect, what happens in too many cases. I think it is often best to think of econometric results as the best case the researcher could make for a particular theory rather than a true test of the model.
Mark touches the spot — and for the sake of balancing the overly rosy picture of econometric achievements given in the usual econometrics textbooks today, it may also be interesting to see how Trygve Haavelmo, with the completion (in 1958) of the twenty-fifth volume of Econometrica, assessed the the role of econometrics in the advancement of economics. Although mainly positive of the “repair work” and “clearing-up work” done, Haavelmo also found some grounds for despair:
We have found certain general principles which would seem to make good sense. Essentially, these principles are based on the reasonable idea that, if an economic model is in fact “correct” or “true,” we can say something a priori about the way in which the data emerging from it must behave. We can say something, a priori, about whether it is theoretically possible to estimate the parameters involved. And we can decide, a priori, what the proper estimation procedure should be … But the concrete results of these efforts have often been a seemingly lower degree of accuracy of the would-be economic laws (i.e., larger residuals), or coefficients that seem a priori less reasonable than those obtained by using cruder or clearly inconsistent methods.
There is the possibility that the more stringent methods we have been striving to develop have actually opened our eyes to recognize a plain fact: viz., that the “laws” of economics are not very accurate in the sense of a close fit, and that we have been living in a dream-world of large but somewhat superficial or spurious correlations.
And as the quote below shows, Frisch also shared some of Haavelmo’s — and Keynes’s — doubts on the applicability of econometrics:
I have personally always been skeptical of the possibility of making macroeconomic predictions about the development that will follow on the basis of given initial conditions … I have believed that the analytical work will give higher yields – now and in the near future – if they become applied in macroeconomic decision models where the line of thought is the following: “If this or that policy is made, and these conditions are met in the period under consideration, probably a tendency to go in this or that direction is created”.
Peter Dorman is one of those rare economists that it is always a pleasure to read. Here his critical eye is focused on economists’ infatuation with homogeneity and averages:
You may feel a gnawing discomfort with the way economists use statistical techniques. Ostensibly they focus on the difference between people, countries or whatever the units of observation happen to be, but they nevertheless seem to treat the population of cases as interchangeable—as homogenous on some fundamental level. As if people were replicants.
You are right, and this brief talk is about why and how you’re right, and what this implies for the questions people bring to statistical analysis and the methods they use.
Our point of departure will be a simple multiple regression model of the form
y = β0 + β1 x1 + β2 x2 + …. + ε
where y is an outcome variable, x1 is an explanatory variable of interest, the other x’s are control variables, the β’s are coefficients on these variables (or a constant term, in the case of β0), and ε is a vector of residuals. We could apply the same analysis to more complex functional forms, and we would see the same things, so let’s stay simple.
What question does this model answer? It tells us the average effect that variations in x1 have on the outcome y, controlling for the effects of other explanatory variables. Repeat: it’s the average effect of x1 on y.
This model is applied to a sample of observations. What is assumed to be the same for these observations? (1) The outcome variable y is meaningful for all of them. (2) The list of potential explanatory factors, the x’s, is the same for all. (3) The effects these factors have on the outcome, the β’s, are the same for all. (4) The proper functional form that best explains the outcome is the same for all. In these four respects all units of observation are regarded as essentially the same.
Now what is permitted to differ across these observations? Simply the values of the x’s and therefore the values of y and ε. That’s it.
Thus measures of the difference between individual people or other objects of study are purchased at the cost of immense assumptions of sameness. It is these assumptions that both reflect and justify the search for average effects …
In the end, statistical analysis is about imposing a common structure on observations in order to understand differentiation. Any structure requires assuming some kinds of sameness, but some approaches make much more sweeping assumptions than others. An unfortunate symbiosis has arisen in economics between statistical methods that excessively rule out diversity and statistical questions that center on average (non-diverse) effects. This is damaging in many contexts, including hypothesis testing, program evaluation, forecasting—you name it …
The first step toward recovery is admitting you have a problem. Every statistical analyst should come clean about what assumptions of homogeneity are being made, in light of their plausibility and the opportunities that exist for relaxing them.
Firmly stuck in an empiricist tradition, econometrics is only concerned with the measurable aspects of reality. But there is always the possibility that there are other variables – of vital importance and although perhaps unobservable and non-additive, not necessarily epistemologically inaccessible – that were not considered for the model.
Real world social systems are not governed by stable causal mechanisms or capacities. If economic regularities obtain they — as a rule — do it only because we engineered them for that purpose. Outside man-made “nomological machines” they are rare, or even non-existant. Unfortunately that also makes them rather useless.
Remember that a model is not the truth. It is a lie to help you get your point across. And in the case of modeling economic risk, your model is a lie about others, who are probably lying themselves. And what’s worse than a simple lie? A complicated lie.
Sam L. Savage The Flaw of Averages
In Gretl it’s extremely simple to do this kind of bootstrapping. Run the regression and you get an output-window with the regression results. Click on Analysis at the top of the window and then on Bootstrap and select the options Confidence interval and Resample residuals. After having selected the coefficient for which you want to you get bootstrapped estimates, you just click OK and a window will appear showing the 95% confidence interval for the coefficient. It’s as simple as that!