I loved this album 40 years ago. I still do.
Modern econometrics is fundamentally based on assuming — usually without any explicit justification — that we can gain causal knowledge by considering independent variables that may have an impact on the variation of a dependent variable. This is however, far from self-evident. Often the fundamental causes are constant forces that are not amenable to the kind of analysis econometrics supplies us with. As Stanley Lieberson has it in his modern classic Making It Count:
One can always say whether, in a given empirical context, a given variable or theory accounts for more variation than another. But it is almost certain that the variation observed is not universal over time and place. Hence the use of such a criterion first requires a conclusion about the variation over time and place in the dependent variable. If such an analysis is not forthcoming, the theoretical conclusion is undermined by the absence of information …
Moreover, it is questionable whether one can draw much of a conclusion about causal forces from simple analysis of the observed variation … To wit, it is vital that one have an understanding, or at least a working hypothesis, about what is causing the event per se; variation in the magnitude of the event will not provide the answer to that question.
Trygve Haavelmo was making a somewhat similar point back in 1941, when criticizing the treatmeant of the interest variable in Tinbergen’s regression analyses. The regression coefficient of the interest rate variable being zero was according to Haavelmo not sufficient for inferring that “variations in the rate of interest play only a minor role, or no role at all, in the changes in investment activity.” Interest rates may very well play a decisive indirect role by influencing other causally effective variables. And:
the rate of interest may not have varied much during the statistical testing period, and for this reason the rate of interest would not “explain” very much of the variation in net profit (and thereby the variation in investment) which has actually taken place during this period. But one cannot conclude that the rate of influence would be inefficient as an autonomous regulator, which is, after all, the important point.
Causality in economics — and other social sciences — can never solely be a question of statistical inference. Causality entails more than predictability, and to really in depth explain social phenomena requires theory. Analysis of variation — the foundation of all econometrics — can never in itself reveal how these variations are brought about. First when we are able to tie actions, processes or structures to the statistical relations detected, can we say that we are getting at relevant explanations of causation. Too much in love with axiomatic-deductive modeling, neoclassical economists especially tend to forget that accounting for causation — how causes bring about their effects — demands deep subject-matter knowledge and acquaintance with the intricate fabrics and contexts. As already Keynes argued in his A Treatise on Probability, statistics and econometrics should not primarily be seen as means of inferring causality from observational data, but rather as description of patterns of associations and correlations that we may use as suggestions of possible causal realations.
No doubt one of the most important books you will read this year!
General equilibrium is fundamental to economics on a more normative level as well. A story about Adam Smith, the invisible hand, and the merits of markets pervades introductory textbooks, classroom teaching, and contemporary political discourse. The intellectual foundation of this story rests on general equilibrium, not on the latest mathematical excursions. If the foundation of everyone’s favourite economics story is now known to be unsound — and according to some, uninteresting as well — then the profession owes the world a bit of an explanation.
Almost a century and a half after Léon Walras founded general equilibrium theory, economists still have not been able to show that markets lead economies to equilibria.
We do know that — under very restrictive assumptions — equilibria do exist, are unique and are Pareto-efficient.
But after reading Frank Ackerman’s article — or Franklin M. Fisher’s The stability of general equilibrium – what do we know and why is it important? — one has to ask oneself — what good does that do?
As long as we cannot show that there are convincing reasons to suppose there are forces which lead economies to equilibria — the value of general equilibrium theory is nil. As long as we cannot really demonstrate that there are forces operating — under reasonable, relevant and at least mildly realistic conditions — at moving markets to equilibria, there cannot really be any sustainable reason for anyone to pay any interest or attention to this theory.
A stability that can only be proved by assuming Santa Claus conditions is of no avail. Most people do not believe in Santa Claus anymore. And for good reasons. Santa Claus is for kids, and general equilibrium economists ought to grow up, leaving their Santa Claus economics in the dustbin of history.
Continuing to model a world full of agents behaving as economists — “often wrong, but never uncertain” — and still not being able to show that the system under reasonable assumptions converges to equilibrium (or simply assume the problem away), is a gross misallocation of intellectual resources and time. As Ackerman writes:
The guaranteed optimality of market outcomes and laissez-faire policies died with general equilibrium. If economic stability rests on exogenous social and political forces, then it is surely appropriate to debate the desirable extent of intervention in the market — in part, in order to rescue the market fromits own instability.
I do not claim that the technique developed in the present paper will, like a stone of the wise, solve all the problems of testing “significance” with which the economic statistician is confronted. No statistical technique, however, refined, will ever be able to do such a thing. The ultimate test of significance must consist in a network of conclusions and cross checks where theoretical economic considerations, intimate and realistic knowledge of the data and a refined statistical technique concur.
Noah Smith has a post up trying to defend p-values and traditional statistical significance testing against the increasing attacks launched against it:
Suddenly, everyone is getting really upset about p-values and statistical significance testing. The backlash has reached such a frenzy that some psych journals are starting to ban significance testing. Though there are some well-known problems with p-values and significance testing, this backlash doesn’t pass the smell test. When a technique has been in wide use for decades, it’s certain that LOTS of smart scientists have had a chance to think carefully about it. The fact that we’re only now getting the backlash means that the cause is something other than the inherent uselessness of the methodology.
That doesn’t sound very convincing.
Maybe we should apply yet another smell test …
A non-trivial part of teaching statistics is made up of learning students to perform significance testing. A problem I have noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests – p values – really are, still most students misinterpret them.
This is not to blame on students’ ignorance, but rather on significance testing not being particularly transparent (conditional probability inference is difficult even to those of us who teach and practice it). A lot of researchers fall pray to the same mistakes. So — given that it anyway is very unlikely than any population parameter is exactly zero, and that contrary to assumption most samples in social science and economics are not random or having the right distributional shape — why continue to press students and researchers to do null hypothesis significance testing, testing that relies on weird backward logic that students and researchers usually don’t understand?
Statistical significance doesn’t say that something is important or true. And since there already are far better and more relevant testing that can be done, it is high time to give up on this statistical fetish.
Jager and Leek may well be correct in their larger point, that the medical literature is broadly correct. But I don’t think the statistical framework they are using is appropriate for the questions they are asking. My biggest problem is the identification of scientific hypotheses and statistical “hypotheses” of the “theta = 0″ variety.
Based on the word “empirical” title, I thought the authors were going to look at a large number of papers with p-values and then follow up and see if the claims were replicated. But no, they don’t follow up on the studies at all! What they seem to be doing is collecting a set of published p-values and then fitting a mixture model to this distribution, a mixture of a uniform distribution (for null effects) and a beta distribution (for non-null effects). Since only statistically significant p-values are typically reported, they fit their model restricted to p-values less than 0.05. But this all assumes that the p-values have this stated distribution. You don’t have to be Uri Simonsohn to know that there’s a lot of p-hacking going on. Also, as noted above, the problem isn’t really effects that are exactly zero, the problem is that a lot of effects are lots in the noise and are essentially undetectable given the way they are studied.
Jager and Leek write that their model is commonly used to study hypotheses in genetics and imaging. I could see how this model could make sense in those fields … but I don’t see this model applying to published medical research, for two reasons. First … I don’t think there would be a sharp division between null and non-null effects; and, second, there’s just too much selection going on for me to believe that the conditional distributions of the p-values would be anything like the theoretical distributions suggested by Neyman-Pearson theory.
So, no, I don’t at all believe Jager and Leek when they write, “we are able to empirically estimate the rate of false positives in the medical literature and trends in false positive rates over time.” They’re doing this by basically assuming the model that is being questioned, the textbook model in which effects are pure and in which there is no p-hacking.
Indeed. If anything, this underlines how important it is — and on this Noah Smith and yours truly agree — not to equate science with statistical calculation. All science entail human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero – even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science. Or as a noted German philosopher once famously wrote:
There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.
In its standard form, a significance test is not the kind of “severe test” that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypothesis. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.
And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10 % result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
Most importantly — we should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean next to nothing if the model is wrong. As eminent mathematical statistician David Freedman writes:
I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.
Statistical significance tests DO NOT validate models!
In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p – 1 degrees of freedom in the numerator and n – p in the denominator. The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.
If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this: if the model is right and the coefficients are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:
i) An unlikely event occurred.
ii) Or the model is right and some of the coefficients differ from 0.
iii) Or the model is wrong.
The concept of “critical mass” was originally created by Thomas Schelling to explain a variety of different “tipping point” actions and behaviours in society.
The concept was elaborated on in Schelling’s masterful Micromotives and Macrobehavior (1978).
Here’s what it’s (almost) all about …
The tiny little problem that there is no hard empirical evidence that verifies rational expectations models doesn’t usually bother its protagonists too much. Rational expectations überpriest Thomas Sargent has defended the epistemological status of the rational expectations hypothesis arguing that since it “focuses on outcomes and does not pretend to have behavioral content,” it has proved to be “a powerful tool for making precise statements.”
Precise, yes, but relevant and realistic? I’ll be dipped!
In their attempted rescue operations, rational expectationists try to give the picture that only heterodox economists like yours truly are critical of the rational expectations hypothesis.
But, on this, they are, simply … eh … wrong.
Let’s listen to Nobel laureate Edmund Phelps — hardly a heterodox economist — and what he has to say (emphasis added):
Question: In a new volume with Roman Frydman, “Rethinking Expectations: The Way Forward for Macroeconomics,” you say the vast majority of macroeconomic models over the last four decades derailed your “microfoundations” approach. Can you explain what that is and how it differs from the approach that became widely accepted by the profession?
Answer: In the expectations-based framework that I put forward around 1968, we didn’t pretend we had a correct and complete understanding of how firms or employees formed expectations about prices or wages elsewhere. We turned to what we thought was a plausible and convenient hypothesis. For example, if the prices of a company’s competitors were last reported to be higher than in the past, it might be supposed that the company will expect their prices to be higher this time, too, but not that much. This is called “adaptive expectations:” You adapt your expectations to new observations but don’t throw out the past. If inflation went up last month, it might be supposed that inflation will again be high but not that high.
Q: So how did adaptive expectations morph into rational expectations?
A: The “scientists” from Chicago and MIT came along to say, we have a well-established theory of how prices and wages work. Before, we used a rule of thumb to explain or predict expectations: Such a rule is picked out of the air. They said, let’s be scientific. In their mind, the scientific way is to suppose price and wage setters form their expectations with every bit as much understanding of markets as the expert economist seeking to model, or predict, their behavior. The rational expectations approach is to suppose that the people in the market form their expectations in the very same way that the economist studying their behavior forms her expectations: on the basis of her theoretical model.
Q: And what’s the consequence of this putsch?
A: Craziness for one thing. You’re not supposed to ask what to do if one economist has one model of the market and another economist a different model. The people in the market cannot follow both economists at the same time. One, if not both, of the economists must be wrong. Another thing: It’s an important feature of capitalist economies that they permit speculation by people who have idiosyncratic views and an important feature of a modern capitalist economy that innovators conceive their new products and methods with little knowledge of whether the new things will be adopted — thus innovations. Speculators and innovators have to roll their own expectations. They can’t ring up the local professor to learn how. The professors should be ringing up the speculators and aspiring innovators. In short, expectations are causal variables in the sense that they are the drivers. They are not effects to be explained in terms of some trumped-up causes.
Q: So rather than live with variability, write a formula in stone!
A: What led to rational expectations was a fear of the uncertainty and, worse, the lack of understanding of how modern economies work. The rational expectationists wanted to bottle all that up and replace it with deterministic models of prices, wages, even share prices, so that the math looked like the math in rocket science. The rocket’s course can be modeled while a living modern economy’s course cannot be modeled to such an extreme. It yields up a formula for expectations that looks scientific because it has all our incomplete and not altogether correct understanding of how economies work inside of it, but it cannot have the incorrect and incomplete understanding of economies that the speculators and would-be innovators have.
Q: One of the issues I have with rational expectations is the assumption that we have perfect information, that there is no cost in acquiring that information. Yet the economics profession, including Federal Reserve policy makers, appears to have been hijacked by Robert Lucas.
A: You’re right that people are grossly uninformed, which is a far cry from what the rational expectations models suppose. Why are they misinformed? I think they don’t pay much attention to the vast information out there because they wouldn’t know what to do what to do with it if they had it. The fundamental fallacy on which rational expectations models are based is that everyone knows how to process the information they receive according to the one and only right theory of the world. The problem is that we don’t have a “right” model that could be certified as such by the National Academy of Sciences. And as long as we operate in a modern economy, there can never be such a model.
The rational expectations hypothesis presumes consistent behaviour, where expectations do not display any persistent errors. In the world of rational expectations we are always, on average, hitting the bull’s eye. In the more realistic, open systems view, there is always the possibility (danger) of making mistakes that may turn out to be systematic. It is because of this, presumably, that we put so much emphasis on learning in our modern knowledge societies.
So, where does all this leave us? I think John Kay sums it up pretty well:
A scientific idea is not seminal because it influences the research agenda of PhD students. An important scientific advance yields conclusions that differ from those derived from other theories, and establishes that these divergent conclusions are supported by observation. Yet as Prof Sargent disarmingly observed, “such empirical tests were rejecting too many good models” in the programme he had established with fellow Nobel laureates Bob Lucas and Ed Prescott. In their world, the validity of a theory is demonstrated if, after the event, and often with torturing of data and ad hoc adjustments that are usually called “imperfections”, it can be reconciled with already known facts – “calibrated”. Since almost everything can be “explained” in this way, the theory is indeed universal; no other approach is necessary, or even admissible …