In order to make causal inferences from simple regression, it is now conventional to assume something like the setting in equation (1) … The equation makes very strong invariance assumptions, which cannot be tested from data on X and Y.
(1) Y = a + bx + δ
What happens without invariance? The answer will be obvious. If intervention changes the intercept a, the slope b, or the mean of the error distribution, the impact of the intervention becomes difficult to determine. If the variance of the error term is changed, the usual confidence intervals lose their meaning.
How would any of this be possible? Suppose, for instance, that — unbeknownst to the statistician — X and Y are both the effects of a common cause operating through linear statistical laws like (1). Suppose errors are independent and normal, while Nature randomizes the common cause to have a normal distribution. The scatter diagram will look lovely, a regression line is easily fitted, and the straightforward causal interpretation will be wrong.
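The common-cause scenario can be sketched in a few lines of simulation. This is only an illustration under assumptions of my own: the coefficients 2 and 3, the unit variances, and the names `ols_slope` and `simulate_common_cause` are hypothetical and do not come from the quoted text. With these numbers the population slope of Y on X is 2·3/(2² + 1) = 1.2, although X has no effect on Y at all.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple regression with intercept)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_common_cause(n=20000, seed=0):
    """Return (observational slope, slope after intervening on X).

    Assumed data-generating process (hypothetical numbers):
        Z ~ N(0, 1)              common cause, unseen by the statistician
        X = 2 Z + noise          X is an effect of Z
        Y = 3 Z + noise          Y is an effect of Z; X does not enter
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [2.0 * zi + rng.gauss(0, 1) for zi in z]
    y = [3.0 * zi + rng.gauss(0, 1) for zi in z]
    observational = ols_slope(x, y)

    # Intervention: X is now set by fiat, independently of Z.
    # Y is generated by the same law as before, which never involved X.
    x_set = [rng.gauss(0, 1) for _ in range(n)]
    y_after = [3.0 * zi + rng.gauss(0, 1) for zi in z]
    interventional = ols_slope(x_set, y_after)
    return observational, interventional


if __name__ == "__main__":
    obs, intv = simulate_common_cause()
    print(f"slope in observational data: {obs:.3f}")
    print(f"slope after intervening on X: {intv:.3f}")
```

The first slope is what the lovely scatter diagram delivers; the second is what an intervention on X would actually reveal. Nothing in the observational data on X and Y alone distinguishes the two situations.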
One should not jump to the conclusion that there is necessarily a substantive difference between drawing inferences from experimental as opposed to nonexperimental data …
In the experimental setting, the fertilizer treatment is “randomly” assigned to plots of land, whereas in the other case nature did the assignment … “Random” does not mean adequately mixed in every sample. It only means that on the average, the fertilizer treatments are adequately mixed …
Randomization implies that the least squares estimator is “unbiased,” but that definitely does not mean that for each sample the estimate is correct. Sometimes the estimate is too high, sometimes too low …
In particular, it is possible for the randomization to lead to exactly the same allocation as the nonrandom assignment … Many econometricians would insist that there is a difference, because the randomized experiment generates “unbiased” estimates. But all this means is that, if this particular experiment yields a gross overestimate, some other experiment yields a gross underestimate.
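The point that "unbiased" describes the average over many randomizations, and not any single experiment, can be illustrated with a small simulation. It is a sketch under assumptions that are mine and not from the quoted passage: twenty plots, a fixed unobserved fertility for each plot, a constant true treatment effect of 1.0, and the hypothetical function name `randomized_estimates`.

```python
import random
import statistics


def randomized_estimates(n_plots=20, true_effect=1.0,
                         n_experiments=5000, seed=1):
    """Difference-in-means estimates across many random assignments.

    Each plot has a fixed, unobserved fertility (hypothetical numbers).
    In every experiment half of the plots are chosen at random for
    treatment, and the effect is estimated as mean(treated) - mean(control).
    """
    rng = random.Random(seed)
    fertility = [rng.gauss(0, 2) for _ in range(n_plots)]
    estimates = []
    for _ in range(n_experiments):
        treated = set(rng.sample(range(n_plots), n_plots // 2))
        y_treated = [fertility[i] + true_effect
                     for i in range(n_plots) if i in treated]
        y_control = [fertility[i]
                     for i in range(n_plots) if i not in treated]
        estimates.append(statistics.fmean(y_treated)
                         - statistics.fmean(y_control))
    return estimates


if __name__ == "__main__":
    est = randomized_estimates()
    print(f"average over experiments: {statistics.fmean(est):.3f}")
    print(f"spread (std. dev.):       {statistics.stdev(est):.3f}")
    print(f"lowest / highest:         {min(est):.3f} / {max(est):.3f}")
```

The average across experiments sits close to the true effect, while individual experiments land well above or well below it, depending on which plots happened to receive the treatment.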
With interactive confounders explicitly included, the overall treatment effect β0 + β′zt is not a number but a variable that depends on the confounding effects. Absent observation of the interactive compounding effects, what is estimated is some kind of average treatment effect which is called by Imbens and Angrist (1994) a “Local Average Treatment Effect,” which is a little like the lawyer who explained that when he was a young man he lost many cases he should have won but as he grew older he won many that he should have lost, so that on the average justice was done. In other words, if you act as if the treatment effect is a random variable by substituting βt for β0 + β′zt , the notation inappropriately relieves you of the heavy burden of considering what are the interactive confounders and finding some way to measure them. Less elliptically, absent observation of z, the estimated treatment effect should be transferred only into those settings in which the confounding interactive variables have values close to the mean values in the experiment. If little thought has gone into identifying these possible confounders, it seems probable that little thought will be given to the limited applicability of the results in other settings.
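A minimal numerical sketch of the transfer problem follows. It assumes a single scalar interactive confounder z (the passage allows a vector), a randomized binary treatment, and hypothetical values β0 = 1 and β′ = 2; the function name `average_effect` is likewise mine. Under these assumptions the experiment recovers roughly β0 + β′ · mean(z), which is only the right number for settings where z is centred on the same mean.

```python
import random
import statistics


def average_effect(z_mean, beta0=1.0, beta1=2.0, n=40000, seed=2):
    """Difference in means from a randomized experiment in which the
    unit-level treatment effect is beta0 + beta1 * z and z is unobserved.

    All parameter values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(z_mean, 1)           # unobserved interactive confounder
        is_treated = rng.random() < 0.5    # randomized assignment
        effect = beta0 + beta1 * z         # effect varies with z
        y = (effect if is_treated else 0.0) + rng.gauss(0, 1)
        (treated if is_treated else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)


if __name__ == "__main__":
    # Setting of the original experiment: z centred on 0.
    print(f"estimated effect, z centred on  0: {average_effect(0.0):.2f}")
    # A different setting: z centred on -1, same structural parameters.
    print(f"estimated effect, z centred on -1: {average_effect(-1.0):.2f}")
```

With these numbers the average effect is about +1 in the first setting and about -1 in the second, so carrying the first estimate into the second setting would get even the sign wrong.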
I cannot offer a course in mathematics in this slim volume, but I will do what I can to hit a few of the highlights, when genuinely needed. I will issue one early warning: do not be intimidated by what you don’t completely understand. Statistical Science is not really very helpful for understanding or forecasting complex evolving self-healing organic ambiguous social systems – economies, in other words.
A statistician may have done the programming, but when you press a button on a computer keyboard and ask the computer to find some good patterns, you had better get clear on a sad fact: computers do not think. They do exactly what the programmer told them to do and nothing more. They look for the patterns that we tell them to look for, those and nothing more. When we turn to the computer for advice, we are only talking to ourselves. This works in a simple setting in which there is a very well-defined set of alternative theories and we can provide the computer with clear instructions. But in complex nonexperimental settings, Sherlock Holmes admonishes: “Never theorize before you have all the evidence. It biases the judgments” …
Mathematical analysis works great to decide which horse wins, if we are completely confident which horses are in the race, but it breaks down when we are not sure. In experimental settings, the set of alternative models can often be well agreed on, but with nonexperimental economics data, the set of models is subject to enormous disagreements. You disagree with your model made yesterday, and I disagree with your model today. Mathematics does not help much to resolve our internal intellectual disagreements.
I have lost count of the number of times I have heard students and faculty repeat the idea in seminars, that “all models are wrong”. This aphorism, attributed to George Box, is the battle cry of the Minnesota calibrator, a breed of macroeconomist, inspired by Ed Prescott, one of the most important and influential economists of the last century.
All models are wrong … all models are wrong …
Of course all models are wrong. That is trivially true: it is the definition of a model. But the cry has been used for three decades to poke fun at attempts to use serious econometric methods to analyze time series data. Time series methods were inconvenient to the nascent Real Business Cycle Program that Ed pioneered because the models that he favored were, and still are, overwhelmingly rejected by the facts. That is inconvenient.
Ed’s response was pure genius. If the model and the data are in conflict, the data must be wrong. Time series econometrics, according to Ed, was crushing the acorn before it had time to grow into a tree. His response was not only to reformulate the theory, but also to reformulate the way in which that theory was to be judged. In a puff of calibrator’s smoke, the history of time series econometrics was relegated to the dustbin of history to take its place alongside alchemy, the ether, and the theory of phlogiston.
How did Ed achieve this remarkable feat of prestidigitation? First, he argued that we should focus on a small subset of the properties of the data. Since the work of Ragnar Frisch, economists have recognized that economic time series can be modeled as linear difference equations, hit by random shocks. These time series move together in different ways at different frequencies …
After removing trends, Ed was left with the wiggles. He proposed that we should evaluate our economic theories of business cycles by how well they explain co-movements among the wiggles. When his theory failed to clear the 8ft hurdle of the Olympic high jump, he lowered the bar to 5ft and persuaded us all that leaping over this high school bar was a success.
Keynesians protested. But they did not protest loudly enough and ultimately it became common, even among serious econometricians, to filter their data with the eponymous Hodrick-Prescott filter …
By accepting the neo-classical synthesis, Keynesian economists had agreed to play by real business cycle rules. They both accepted that the economy is a self-stabilizing system that, left to itself, would gravitate back to the unique natural rate of unemployment. And for this reason, the Keynesians agreed to play by Ed’s rules. They filtered the data and set the bar at the high school level …
We don’t have to play by Ed’s rules … Once we allow aggregate demand to influence permanently the unemployment rate, the data do not look kindly on either real business cycle models or on the new-Keynesian approach. It’s time to get serious about macroeconomic science and put back the Olympic bar.
Endogeneity problems are of course nothing new in growth regressions. But what is special here is that policy endogeneity is not just an econometric nuisance, but typically an integral part of the null hypothesis that is being tested. The supposition that governments are trying to achieve some economic or political objective is at the core of the theoretical framework that is subjected to empirical tests. In such a setting, treating policy as if it were exogenous or random is problematic not just from an econometric standpoint, but also conceptually …
The cross-national variation we observe in government ownership is unlikely to be random by the very logic of the theories that are tested. Under the developmental perspective, this variation will be driven by the magnitude of the financial market failures that need to be addressed and the governments’ capacity to do so effectively. Under the political motive, the variation will be generated by the degree of “honesty” or “corruption” of political leaders. I show in this paper that the cross-national association between performance and policy will have a very different interpretation depending on which of these fundamental drivers dominate. Unfortunately, none of these drivers is likely to be observable to the analyst. In such a setting the estimated coefficient on state ownership is not informative about either the positive or the normative questions at stake. It cannot help us distinguish between the developmental and political views, because the estimated coefficient on government ownership will be negative in both cases.
Controlled experiments are the gold standard in science for proving causality. The FDA, for example, requires controlled experiments (randomized clinical trials) for approving drugs. In the software world, online controlled experiments are being used heavily to make data-driven decisions, especially in areas where the forefront of knowledge is being pushed …
The statistical theory of controlled experiments is well understood, but the devil is in the details and the difference between theory and practice is greater in practice than in theory. We have shared five puzzling experiment outcomes, which we were able to analyze deeply and explain …
Generalizing from these puzzles, we see two themes. One is that instrumentation is not as precise as we would like it to be, interacting in subtle ways with experiments … A second theme is that lessons from offline experiments don’t always map well online …
Anyone can run online controlled experiments and generate numbers with six digits after the decimal point. It’s easy to generate p-values and beautiful 3D graphs of trends over time. But the real challenge is in understanding when the results are invalid, not at the sixth decimal place, but before the decimal point, or even at the plus/minus for the percent effect; that’s what these analyses did to the initial results. We hope we’ve managed to shed light on puzzling outcomes and we encourage others to drill deep and share other similar results. Generating numbers is easy; generating numbers you should trust is hard!
Neoclassical economics nowadays usually assumes that agents that have to make choices under conditions of uncertainty behave according to Bayesian rules, axiomatized by Ramsey (1931) and Savage (1954) – that is, they maximize expected utility with respect to some subjective probability measure that is continually updated according to Bayes’ theorem. If not, they are supposed to be irrational, and ultimately – via some “Dutch book” or “money pump” argument – susceptible to being ruined by some clever “bookie”.
Bayesianism reduces questions of rationality to questions of internal consistency (coherence) of beliefs, but – even granted this questionable reductionism – do rational agents really have to be Bayesian? As I have been arguing elsewhere, there is no strong warrant for believing so.
In many of the situations that are relevant to economics one could argue that there is simply not enough of adequate and relevant information to ground beliefs of a probabilistic kind, and that in those situations it is not really possible, in any relevant way, to represent an individual’s beliefs in a single probability measure.
Say you have come to learn (based on your own experience and tons of data) that the probability of your becoming unemployed in Sweden is 10%. Having moved to another country (where you have no experience of your own and no data) you have no information on unemployment and a fortiori nothing on which to base any probability estimate. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1, if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign probability 10% to becoming unemployed and 90% to becoming employed.
That feels intuitively wrong though, and I guess most people would agree. Bayesianism cannot distinguish between symmetry-based probabilities from information and symmetry-based probabilities from an absence of information. In these kinds of situations most of us would rather say that it is simply irrational to be a Bayesian and better instead to admit that we “simply do not know” or that we feel ambiguous and undecided. Arbitrary and ungrounded probability claims are more irrational than being undecided in the face of genuine uncertainty, so if there is not sufficient information to ground a probability distribution it is better to acknowledge that simpliciter, rather than pretending to possess a certitude that we simply do not possess.
I think this critique of Bayesianism is in accordance with the views of John Maynard Keynes’ A Treatise on Probability (1921) and General Theory (1937). According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but rational expectations. Sometimes we “simply do not know.” Keynes would not have accepted the view of Bayesian economists, according to whom expectations “tend to be distributed, for the same information set, about the prediction of the theory.” Keynes, rather, thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes expectations are a question of weighing probabilities by “degrees of belief”, beliefs that have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents modeled by Bayesian economists.
Stressing the importance of Keynes’ view on uncertainty, John Kay writes in the Financial Times:
Keynes believed that the financial and business environment was characterised by “radical uncertainty”. The only reasonable response to the question “what will interest rates be in 20 years’ time?” is “we simply do not know” …
For Keynes, probability was about believability, not frequency. He denied that our thinking could be described by a probability distribution over all possible future events, a statistical distribution that could be teased out by shrewd questioning – or discovered by presenting a menu of trading opportunities. In the 1920s he became engaged in an intellectual battle on this issue, in which the leading protagonists on one side were Keynes and the Chicago economist Frank Knight, opposed by a Cambridge philosopher, Frank Ramsey, and later by Jimmie Savage, another Chicagoan.
Keynes and Knight lost that debate, and Ramsey and Savage won, and the probabilistic approach has maintained academic primacy ever since. A principal reason was Ramsey’s demonstration that anyone who did not follow his precepts – anyone who did not act on the basis of a subjective assessment of probabilities of future events – would be “Dutch booked” … A Dutch book is a set of choices such that a seemingly attractive selection from it is certain to lose money for the person who makes the selection.
I used to tell students who queried the premise of “rational” behaviour in financial markets – where “rational” means based on Bayesian subjective probabilities – that people had to behave in this way because if they did not, people would devise schemes that made money at their expense. I now believe that observation is correct but does not have the implication I sought. People do not behave in line with this theory, with the result that others in financial markets do devise schemes that make money at their expense.
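The mechanics of the Dutch book argument referred to above can be made concrete with a toy example. It is a sketch under textbook assumptions that are not spelled out in the quoted passage: the agent treats a bet paying one unit if an event occurs as fairly priced at her stated probability, and is willing to take either side at that price. The function name `agent_payoffs` and the numbers are hypothetical.

```python
def agent_payoffs(p_a, p_not_a, stake=1.0):
    """Agent's net payoff in each state of the world when a bookie exploits
    stated probabilities for A and not-A that do not add up to 1.

    If the probabilities sum to more than 1, the bookie sells the agent both
    bets; if they sum to less than 1, the bookie buys both bets from her.
    Either way exactly one of the two bets pays out `stake`.
    """
    total = p_a + p_not_a
    direction = 1 if total > 1 else -1   # +1: agent buys both, -1: agent sells both
    payoffs = {}
    for a_occurs in (True, False):
        winnings = stake                 # one bet wins whatever happens
        cost = total * stake             # combined price of the two bets
        payoffs[a_occurs] = direction * (winnings - cost)
    return payoffs


if __name__ == "__main__":
    # Incoherent beliefs: 0.6 for A and 0.6 for not-A (sum 1.2).
    print(agent_payoffs(0.6, 0.6))
    # Coherent beliefs: 0.1 and 0.9 (sum 1.0) leave no sure loss.
    print(agent_payoffs(0.1, 0.9))
```

With incoherent probabilities the agent loses the same amount whether or not A occurs, which is the "certain to lose money" feature of the definition quoted above. Note what the sketch does not show: coherence only rules out a sure loss, and says nothing about whether the coherent probabilities are grounded in any information.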
Although this on the whole gives a succinct and correct picture of Keynes’s view on probability, I think it’s necessary to somewhat qualify in what way and to what extent Keynes “lost” the debate with the Bayesians Frank Ramsey and Jim Savage.
In economics it’s an indubitable fact that few mainstream neoclassical economists work within the Keynesian paradigm. All more or less subscribe to some variant of Bayesianism. And some even say that Keynes acknowledged he was wrong when presented with Ramsey’s theory. This is a view that has unfortunately also been promulgated by Robert Skidelsky in his otherwise masterly biography of Keynes. But I think it’s fundamentally wrong. Let me elaborate on this point (the argumentation is more fully presented in my book John Maynard Keynes (SNS, 2007)).
It’s a debated issue in newer research on Keynes whether he, as some researchers maintain, fundamentally changed his view on probability after the critique levelled against his A Treatise on Probability by Frank Ramsey. It has been exceedingly difficult to present evidence for this being the case.
Ramsey’s critique was mainly that the kind of probability relations that Keynes was speaking of in Treatise actually didn’t exist and that Ramsey’s own procedure (betting) made it much easier to find out the “degrees of belief” people had. I question this both from a descriptive and a normative point of view.
What Keynes is saying in his response to Ramsey is only that Ramsey “is right” in that people’s “degrees of belief” basically emanate from human nature rather than from formal logic.
Patrick Maher, former professor of philosophy at the University of Illinois, even suggests that Ramsey’s critique of Keynes’s probability theory in some regards is invalid:
Keynes’s book was sharply criticized by Ramsey. In a passage that continues to be quoted approvingly, Ramsey wrote:
“But let us now return to a more fundamental criticism of Mr. Keynes’ views, which is the obvious one that there really do not seem to be any such things as the probability relations he describes. He supposes that, at any rate in certain cases, they can be perceived; but speaking for myself I feel confident that this is not true. I do not perceive them, and if I am to be persuaded that they exist it must be by argument; moreover, I shrewdly suspect that others do not perceive them either, because they are able to come to so very little agreement as to which of them relates any two given propositions.” (Ramsey 1926, 161)
I agree with Keynes that inductive probabilities exist and we sometimes know their values. The passage I have just quoted from Ramsey suggests the following argument against the existence of inductive probabilities. (Here P is a premise and C is the conclusion.)
P: People are able to come to very little agreement about inductive probabilities.
C: Inductive probabilities do not exist.
P is vague (what counts as “very little agreement”?) but its truth is still questionable. Ramsey himself acknowledged that “about some particular cases there is agreement” (28) … In any case, whether complicated or not, there is more agreement about inductive probabilities than P suggests.
“If … we take the simplest possible pairs of propositions such as “This is red” and “That is blue” or “This is red” and “That is red,” whose logical relations should surely be easiest to see, no one, I think, pretends to be sure what is the probability relation which connects them.” (162)
I agree that nobody would pretend to be sure of a numeric value for these probabilities, but there are inequalities that most people on reflection would agree with. For example, the probability of “This is red” given “That is red” is greater than the probability of “This is red” given “That is blue.” This illustrates the point that inductive probabilities often lack numeric values. It doesn’t show disagreement; it rather shows agreement, since nobody pretends to know numeric values here and practically everyone will agree on the inequalities.
“Or, perhaps, they may claim to see the relation but they will not be able to say anything about it with certainty, to state if it is more or less than 1/3, or so on. They may, of course, say that it is incomparable with any numerical relation, but a relation about which so little can be truly said will be of little scientific use and it will be hard to convince a sceptic of its existence.” (162)
Although the probabilities that Ramsey is discussing lack numeric values, they are not “incomparable with any numerical relation.” Since there are more than three different colors, the a priori probability of “This is red” must be less than 1/3 and so its probability given “This is blue” must likewise be less than 1/3. In any case, the “scientific use” of something is not relevant to whether it exists. And the question is not whether it is “hard to convince a sceptic of its existence” but whether the sceptic has any good argument to support his position …
Ramsey concluded the paragraph I have been quoting as follows:
“Besides this view is really rather paradoxical; for any believer in induction must admit that between “This is red” as conclusion and “This is round” together with a billion propositions of the form “a is round and red” as evidence, there is a finite probability relation; and it is hard to suppose that as we accumulate instances there is suddenly a point, say after 233 instances, at which the probability relation becomes finite and so comparable with some numerical relations.” (162)
Ramsey is here attacking the view that the probability of “This is red” given “This is round” cannot be compared with any number, but Keynes didn’t say that and it isn’t my view either. The probability of “This is red” given only “This is round” is the same as the a priori probability of “This is red” and hence less than 1/3. Given the additional billion propositions that Ramsey mentions, the probability of “This is red” is high (greater than 1/2, for example) but it still lacks a precise numeric value. Thus the probability is always both comparable with some numbers and lacking a precise numeric value; there is no paradox here.
I have been evaluating Ramsey’s apparent argument from P to C. So far I have been arguing that P is false and responding to Ramsey’s objections to unmeasurable probabilities. Now I want to note that the argument is also invalid. Even if P were true, it could be that inductive probabilities exist in the (few) cases that people generally agree about. It could also be that the disagreement is due to some people misapplying the concept of inductive probability in cases where inductive probabilities do exist. Hence it is possible for P to be true and C false …
I conclude that Ramsey gave no good reason to doubt that inductive probabilities exist.
Ramsey’s critique made Keynes more strongly emphasize the individuals’ own views as the basis for probability calculations, and less stress that their beliefs were rational. But Keynes’s theory doesn’t stand or fall with his view on the basis for our “degrees of belief” as logical. The core of his theory – when and how we are able to measure and compare different probabilities – he doesn’t change. Unlike Ramsey he wasn’t at all sure that probabilities always were one-dimensional, measurable, quantifiable or even comparable entities.
The desire in the profession to make universalistic claims following certain standard procedures of statistical inference is simply too strong to embrace procedures which explicitly rely on the use of vernacular knowledge for model closure in a contingent manner. More broadly, such a desire has played a vital role in the decisive victory of mathematical formalization over conventionally verbal based economic discourses as the principal medium of rhetoric, owing to its internal consistency, reducibility, generality, and apparent objectivity. It does not matter that [as Einstein wrote] ‘as far as the laws of mathematics refer to reality, they are not certain.’ What matters is that these laws are ‘certain’ when ‘they do not refer to reality.’ Most of what is evaluated as core research in the academic domain has little direct bearing on concrete social events in the real world anyway.
One may wonder how much calibration adds to the knowledge of economic structures and the deep parameters involved … Micro estimates are imputed in general equilibrium models which are confronted with new data, not used for the construction of the imputed parameters … However this procedure to impute parameter values into calibrated models has serious weaknesses …
Second, even where estimates are available from micro-econometric investigations, they cannot be automatically imported into aggregated general equilibrium models …
Third, calibration hardly contributes to growth of knowledge about ‘deep parameters’. These deep parameters are confronted with a novel context (aggregate time-series), but this is not used for inference on their behalf. Rather, the new context is used to fit the model to presumed ‘laws of motion’ of the economy …
This leads to the fourth weakness. The combination of different pieces of evidence is laudable, but it can be done with statistical methods as well … This statistical approach has the advantage that it takes the parameter uncertainty into account: even if uncontroversial ‘deep parameters’ were available, they would have standard errors. Specification uncertainty makes things even worse. Neglecting this leads to self-deception.
There are many kinds of useless economics held in high regard within the mainstream economics establishment today. Few, if any, deserve that regard less than the macroeconomic theory/method called calibration, mostly connected with Nobel laureates Finn Kydland, Robert Lucas, Edward Prescott and Thomas Sargent.
Hugo Keuzenkamp and yours truly are certainly not the only ones having doubts about the scientific value of calibration. In the Journal of Economic Perspectives (1996, vol. 10), Lars Peter Hansen and James J. Heckman write:
It is only under very special circumstances that a micro parameter such as the inter-temporal elasticity of substitution or even a marginal propensity to consume out of income can be ‘plugged into’ a representative consumer model to produce an empirically concordant aggregate model … What credibility should we attach to numbers produced from their ‘computational experiments’, and why should we use their ‘calibrated models’ as a basis for serious quantitative policy evaluation? … There is no filing cabinet full of robust micro estimates ready to use in calibrating dynamic stochastic equilibrium models … The justification for what is called ‘calibration’ is vague and confusing.
Mathematical statistician Aris Spanos — in Error and Inference (Mayo & Spanos, 2010, p. 240) — is no less critical:
Given that “calibration” purposefully forsakes error probabilities and provides no way to assess the reliability of inference, how does one assess the adequacy of the calibrated model? …
The idea that it should suffice that a theory “is not obscenely at variance with the data” (Sargent, 1976, p. 233) is to disregard the work that statistical inference can perform in favor of some discretional subjective appraisal … it hardly recommends itself as an empirical methodology that lives up to the standards of scientific objectivity.
In physics it may possibly not be straining credulity too much to model processes as ergodic – where time and history do not really matter – but in social and historical sciences it is obviously ridiculous. If societies and economies were ergodic worlds, why do econometricians fervently discuss things such as structural breaks and regime shifts? That they do is an indication of the unrealisticness of treating open systems as analyzable with ergodic concepts.
The future is not reducible to a known set of prospects. It is not like sitting at the roulette table and calculating what the future outcomes of spinning the wheel will be. Reading Lucas, Sargent, Prescott, Kydland and other calibrationists one comes to think of Robert Clower’s apt remark that
much economics is so far removed from anything that remotely resembles the real world that it’s often difficult for economists to take their own subject seriously.
Instead of assuming calibration and rational expectations to be right, one ought to confront the hypothesis with the available evidence. It is not enough to construct models. Anyone can construct models. To be seriously interesting, models have to come with an aim. They have to have an intended use. If the intention of calibration and rational expectations is to help us explain real economies, they have to be evaluated from that perspective. A model or hypothesis without a specific applicability does not really deserve our interest.
To say, as Edward Prescott does, that
one can only test if some theory, whether it incorporates rational expectations or, for that matter, irrational expectations, is or is not consistent with observations
is not enough. Without strong evidence all kinds of absurd claims and nonsense may pretend to be science. We have to demand more of a justification than this rather watered-down version of “anything goes” when it comes to rationality postulates. If one proposes rational expectations one also has to support its underlying assumptions. None is given, which makes it rather puzzling how rational expectations has become the standard modeling assumption made in much of modern macroeconomics. Perhaps the reason is, as Paul Krugman has it, that economists often mistake
beauty, clad in impressive looking mathematics, for truth.
But I think Prescott’s view is also the reason why calibration economists are not particularly interested in empirical examinations of how real choices and decisions are made in real economies. In the hands of Lucas, Prescott and Sargent, rational expectations has been transformed from an – in principle – testable hypothesis to an irrefutable proposition. Believing in a set of irrefutable propositions may be comfortable – like religious convictions or ideological dogmas – but it is not science.