Frequentist hypothesis testing has come under sustained and vigorous attack in recent years … But there are a couple of good things about Frequentist hypothesis testing that I haven’t seen many people discuss. Both of these have to do not with the formal method itself, but with social conventions associated with the practice …
Why do I like these social conventions? Two reasons. First, I think they cut down a lot on scientific noise. “Statistical significance” is sort of a first-pass filter that tells you which results are interesting and which ones aren’t. Without that automated filter, the entire job of distinguishing interesting results from uninteresting ones falls to the reviewers of a paper, who have to read through the paper much more carefully than if they can just scan for those little asterisks of “significance”.
A non-trivial part of teaching statistics is made up of teaching students to perform significance testing. A problem I have noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests – p-values – really are, still most students misinterpret them. And a lot of researchers obviously also fall pray to the same mistakes:
Are women three times more likely to wear red or pink when they are most fertile? No, probably not. But here’s how hardworking researchers, prestigious scientific journals, and gullible journalists have been fooled into believing so.
The paper I’ll be talking about appeared online this month in Psychological Science, the flagship journal of the Association for Psychological Science, which represents the serious, research-focused (as opposed to therapeutic) end of the psychology profession.
“Women Are More Likely to Wear Red or Pink at Peak Fertility,” by Alec Beall and Jessica Tracy, is based on two samples: a self-selected sample of 100 women from the Internet, and 24 undergraduates at the University of British Columbia. Here’s the claim: “Building on evidence that men are sexually attracted to women wearing or surrounded by red, we tested whether women show a behavioral tendency toward wearing reddish clothing when at peak fertility. … Women at high conception risk were more than three times more likely to wear a red or pink shirt than were women at low conception risk. … Our results thus suggest that red and pink adornment in women is reliably associated with fertility and that female ovulation, long assumed to be hidden, is associated with a salient visual cue.”
Pretty exciting, huh? It’s (literally) sexy as well as being statistically significant. And the difference is by a factor of three—that seems like a big deal.
Really, though, this paper provides essentially no evidence about the researchers’ hypotheses …
The way these studies fool people is that they are reduced to sound bites: Fertile women are three times more likely to wear red! But when you look more closely, you see that there were many, many possible comparisons in the study that could have been reported, with each of these having a plausible-sounding scientific explanation had it appeared as statistically significant in the data.
The standard in research practice is to report a result as “statistically significant” if its p-value is less than 0.05; that is, if there is less than a 1-in-20 chance that the observed pattern in the data would have occurred if there were really nothing going on in the population. But of course if you are running 20 or more comparisons (perhaps implicitly, via choices involved in including or excluding data, setting thresholds, and so on), it is not a surprise at all if some of them happen to reach this threshold.
The headline result, that women were three times as likely to be wearing red or pink during peak fertility, occurred in two different samples, which looks impressive. But it’s not really impressive at all! Rather, it’s exactly the sort of thing you should expect to see if you have a small data set and virtually unlimited freedom to play around with the data, and with the additional selection effect that you submit your results to the journal only if you see some catchy pattern. …
Statistics textbooks do warn against multiple comparisons, but there is a tendency for researchers to consider any given comparison alone without considering it as one of an ensemble of potentially relevant responses to a research question. And then it is natural for sympathetic journal editors to publish a striking result without getting hung up on what might be viewed as nitpicking technicalities. Each person in this research chain is making a decision that seems scientifically reasonable, but the result is a sort of machine for producing and publicizing random patterns.
There’s a larger statistical point to be made here, which is that as long as studies are conducted as fishing expeditions, with a willingness to look hard for patterns and report any comparisons that happen to be statistically significant, we will see lots of dramatic claims based on data patterns that don’t represent anything real in the general population. Again, this fishing can be done implicitly, without the researchers even realizing that they are making a series of choices enabling them to over-interpret patterns in their data.
Indeed. If anything, this underlines how important it is not to equate science with statistical calculation. All science entail human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero – even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science. Or as a noted German philosopher once famously wrote:
There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.
Statistical significance doesn’t say that something is important or true. Since there already are far better and more relevant testing that can be done (see e. g. here and here)- it is high time to consider what should be the proper function of what has now really become a statistical fetish. Given that it anyway is very unlikely than any population parameter is exactly zero, and that contrary to assumption most samples in social science and economics are not random or having the right distributional shape – why continue to press students and researchers to do null hypothesis significance testing, testing that relies on a weird backward logic that students and researchers usually don’t understand?
Suppose that we as educational reformers have a hypothesis that implementing a voucher system would raise the mean test results with 100 points (null hypothesis). Instead, when sampling, it turns out it only raises it with 75 points and has a standard error (telling us how much the mean varies from one sample to another) of 20.
Does this imply that the data do not disconfirm the hypothesis? Given the usual normality assumptions on sampling distributions the one-tailed p-value is approximately 0.11. Thus, approximately 11% of the time we would expect a score this low or lower if we were sampling from this voucher system population. That means – using the ordinary 5% significance-level — we would not reject the null hypothesis although the test has shown that it is “likely” that the hypothesis is false.
In its standard form, a significance test is not the kind of “severe test” that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypothesis. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypothesis by making it hard to disconfirm the null hypothesis.
And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” But looking at our example, standard scientific methodology tells us that since there is only 11% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
And, most importantly, of course we should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-value of 0.11 means next to nothing if the model is wrong. As David Freedman writes in Statistical Models and Causal Inference:
I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.
There is a nice YouTube video with Tony O’Hagan interviewing Dennis Lindley. Of course, Dennis is a legend and his impact on the field of statistics is huge.
At one point, Tony points out that some people liken Bayesian inference to a religion. Dennis claims this is false. Bayesian inference, he correctly points out, starts with some basic axioms and then the rest follows by deduction. This is logic, not religion.
I agree that the mathematics of Bayesian inference is based on sound logic. But, with all due respect, I think Dennis misunderstood the question. When people say that “Bayesian inference is like a religion,” they are not referring to the logic of Bayesian inference. They are referring to how adherents of Bayesian inference behave.
(As an aside, detractors of Bayesian inference do not deny the correctness of the logic. They just don’t think the axioms are relevant for data analysis. For example, no one doubts the axioms of Peano arithmetic. But that doesn’t imply that arithmetic is the foundation of statistical inference. But I digress.)
The vast majority of Bayesians are pragmatic, reasonable people. But there is a sub-group of die-hard Bayesians who do treat Bayesian inference like a religion. By this I mean:
They are very cliquish.
They have a strong emotional attachment to Bayesian inference.
They are overly sensitive to criticism.
They are unwilling to entertain the idea that Bayesian inference might have flaws.
When someone criticizes Bayes, they think that critic just “doesn’t get it.”
They mock people with differing opinions …
No evidence you can provide would ever make the die-hards doubt their ideas. To them, Sir David Cox, Brad Efron and other giants in our field who have doubts about Bayesian inference, are not taken seriously because they “just don’t get it.”
So is Bayesian inference a religion? For most Bayesians: no. But for the thin-skinned, inflexible die-hards who have attached themselves so strongly to their approach to inference that they make fun of, or get mad at, critics: yes, it is a religion.
Neoclassical economics nowadays usually assumes that agents that have to make choices under conditions of uncertainty behave according to Bayesian rules (preferably the ones axiomatized by Ramsey (1931), de Finetti (1937) or Savage (1954)) – that is, they maximize expected utility with respect to some subjective probability measure that is continually updated according to Bayes theorem. If not, they are supposed to be irrational, and ultimately – via some “Dutch book” or “money pump” argument – susceptible to being ruined by some clever “bookie”.
Bayesianism reduces questions of rationality to questions of internal consistency (coherence) of beliefs, but – even granted this questionable reductionism – do rational agents really have to be Bayesian? As I have been arguing elsewhere (e. g. here and here) there is no strong warrant for believing so.
In many of the situations that are relevant to economics one could argue that there is simply not enough of adequate and relevant information to ground beliefs of a probabilistic kind, and that in those situations it is not really possible, in any relevant way, to represent an individual’s beliefs in a single probability measure.
The view that Bayesian decision theory is only genuinely valid in a small world was asserted very firmly by Leonard Savage when laying down the principles of the theory in his path-breaking Foundations of Statistics. He makes the distinction between small and large worlds in a folksy way by quoting the proverbs ”Look before you leap” and ”Cross that bridge when you come to it”. You are in a small world if it is feasible always to look before you leap. You are in a large world if there are some bridges that you cannot cross before you come to them.
As Savage comments, when proverbs conflict, it is proverbially true that there is some truth in both—that they apply in different contexts. He then argues that some decision situations are best modeled in terms of a small world, but others are not. He explicitly rejects the idea that all worlds can be treated as small as both ”ridiculous” and ”preposterous” … Frank Knight draws a similar distinction between making decision under risk or uncertainty …
Bayesianism is understood [here] to be the philosophical principle that Bayesian methods are always appropriate in all decision problems, regardless of whether the relevant set of states in the relevant world is large or small. For example, the world in which financial economics is set is obviously large in Savage’s sense, but the suggestion that there might be something questionable about the standard use of Bayesian updating in financial models is commonly greeted with incredulity or laughter.
Someone who acts as if Bayesianism were correct will be said to be a Bayesianite. It is important to distinguish a Bayesian like myself—someone convinced by Savage’s arguments that Bayesian decision theory makes sense in small worlds—from a Bayesianite. In particular, a Bayesian need not join the more extreme Bayesianites in proceeding as though:
• All worlds are small.
• Rationality endows agents with prior probabilities.
• Rational learning consists simply in using Bayes’ rule to convert a set of prior
probabilities into posterior probabilities after registering some new data.
Bayesianites are often understandably reluctant to make an explicit commitment to these principles when they are stated so baldly, because it then becomes evi-dent that they are implicitly claiming that David Hume was wrong to argue that the principle of scientific induction cannot be justified by rational argument …
Bayesianites believe that the subjective probabilities of Bayesian decision theory can be reinterpreted as logical probabilities without any hassle. Its adherents therefore hold that Bayes’ rule is the solution to the problem of scientific induction. No support for such a view is to be found in Savage’s theory—nor in the earlier theories of Ramsey, de Finetti, or von Neumann and Morgenstern. Savage’s theory is entirely and exclusively a consistency theory. It says nothing about how decision-makers come to have the beliefs ascribed to them; it asserts only that, if the decisions taken are consistent (in a sense made precise by a list of axioms), then they act as though maximizing expected utility relative to a subjective probability distribution …
A reasonable decision-maker will presumably wish to avoid inconsistencies. A Bayesianite therefore assumes that it is enough to assign prior beliefs to as decisionmaker, and then forget the problem of where beliefs come from. Consistency then forces any new data that may appear to be incorporated into the system via Bayesian updating. That is, a posterior distribution is obtained from the prior distribution using Bayes’ rule.
The naiveté of this approach doesn’t consist in using Bayes’ rule, whose validity as a piece of algebra isn’t in question. It lies in supposing that the problem of where the priors came from can be quietly shelved.
Savage did argue that his descriptive theory of rational decision-making could be of practical assistance in helping decision-makers form their beliefs, but he didn’t argue that the decision-maker’s problem was simply that of selecting a prior from a limited stock of standard distributions with little or nothing in the way of soulsearching. His position was rather that one comes to a decision problem with a whole set of subjective beliefs derived from one’s previous experience that may or may not be consistent …
But why should we wish to adjust our gut-feelings using Savage’s methodology? In particular, why should a rational decision-maker wish to be consistent? After all, scientists aren’t consistent, on the grounds that it isn’t clever to be consistently wrong. When surprised by data that shows current theories to be in error, they seek new theories that are inconsistent with the old theories. Consistency, from this point of view, is only a virtue if the possibility of being surprised can somehow be eliminated. This is the reason for distinguishing between large and small worlds. Only in the latter is consistency an unqualified virtue.
Say you have come to learn (based on own experience and tons of data) that the probability of you becoming unemployed in the US is 10%. Having moved to another country (where you have no own experience and no data) you have no information on unemployment and a fortiori nothing to help you construct any probability estimate on. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1, if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign probability 10% to becoming unemployed and 90% of becoming employed.
That feels intuitively wrong though, and I guess most people would agree. Bayesianism cannot distinguish between symmetry-based probabilities from information and symmetry-based probabilities from an absence of information. In these kinds of situations most of us would rather say that it is simply irrational to be a Bayesian and better instead to admit that we “simply do not know” or that we feel ambiguous and undecided. Arbitrary an ungrounded probability claims are more irrational than being undecided in face of genuine uncertainty, so if there is not sufficient information to ground a probability distribution it is better to acknowledge that simpliciter, rather than pretending to possess a certitude that we simply do not possess.
I think this critique of Bayesianism is in accordance with the views of Keynes’ A Treatise on Probability (1921) and General Theory (1937). According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but rational expectations. Sometimes we “simply do not know.” Keynes would not have accepted the view of Bayesian economists, according to whom expectations “tend to be distributed, for the same information set, about the prediction of the theory.” Keynes, rather, thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes expectations are a question of weighing probabilities by “degrees of belief”, beliefs that have preciously little to do with the kind of stochastic probabilistic calculations made by the rational agents modeled by Bayesian economists.
The bias toward the superficial and the response to extraneous influences on research are both examples of real harm done in contemporary social science by a roughly Bayesian paradigm of statistical inference as the epitome of empirical argument. For instance the dominant attitude toward the sources of black-white differential in United States unemployment rates (routinely the rates are in a two to one ratio) is “phenomenological.” The employment differences are traced to correlates in education, locale, occupational structure, and family background. The attitude toward further, underlying causes of those correlations is agnostic … Yet on reflection, common sense dictates that racist attitudes and institutional racism must play an important causal role. People do have beliefs that blacks are inferior in intelligence and morality, and they are surely influenced by these beliefs in hiring decisions … Thus, an overemphasis on Bayesian success in statistical inference discourages the elaboration of a type of account of racial disadavantages that almost certainly provides a large part of their explanation.
One of my favourite bloggers – Noah Smith – has a nice post up today on Bayesianism:
Consider Proposition H: “God is watching out for me, and has a special purpose for me and me alone. Therefore, God will not let me die. No matter how dangerous a threat seems, it cannot possibly kill me, because God is looking out for me – and only me – at all times.”
Suppose that you believe that there is a nonzero probability that H is true. And suppose you are a Bayesian – you update your beliefs according to Bayes’ Rule. As you survive longer and longer – as more and more threats fail to kill you – your belief about the probability that H is true must increase and increase. It’s just mechanical application of Bayes’ Rule:
P(H|E) = (P(E|H)P(H))/P(E)
Here, E is “not being killed,” P(E|H)=1, and P(H) is assumed not to be zero. P(E) is less than 1, since under a number of alternative hypotheses you might get killed (if you have a philosophical problem with this due to the fact that anyone who observes any evidence must not be dead, just slightly tweak H so that it’s possible to receive a “mortal wound”).
So P(H|E) is greater than P(H) – every moment that you fail to die increases your subjective probability that you are an invincible superman, the chosen of God. This is totally and completely rational, at least by the Bayesian definition of rationality.
The nodal point here is — of course — that although Bayes’ Rule is mathematically unquestionable, that doesn’t qualify it as indisputably applicable to scientific questions. As another of my favourite bloggers — statistician Andrew Gelman – puts it:
The fundamental objections to Bayesian methods are twofold: on one hand, Bayesian methods are presented as an automatic inference engine, and this raises suspicion in anyone with applied experience, who realizes that di erent methods work well in different settings … Bayesians promote the idea that a multiplicity of parameters can be handled via hierarchical, typically exchangeable, models, but it seems implausible that this could really work automatically. In contrast, much of the work in modern non-Bayesian statistics is focused on developing methods that give reasonable answers using minimal assumptions.
The second objection to Bayes comes from the opposite direction and addresses the subjective strand of Bayesian inference: the idea that prior and posterior distributions represent subjective states of knowledge. Here the concern from outsiders is, first, that as scientists we should be concerned with objective knowledge rather than subjective belief, and second, that it’s not clear how to assess subjective knowledge in any case.
Beyond these objections is a general impression of the shoddiness of some Bayesian analyses, combined with a feeling that Bayesian methods are being oversold as an allpurpose statistical solution to genuinely hard problems. Compared to classical inference, which focuses on how to extract the information available in data, Bayesian methods seem to quickly move to elaborate computation. It does not seem like a good thing for a generation of statistics to be ignorant of experimental design and analysis of variance, instead becoming experts on the convergence of the Gibbs sampler. In the short-term this represents a dead end, and in the long term it represents a withdrawal of statisticians from the deeper questions of inference and an invitation for econometricians, computer scientists, and others to move in and fill in the gap …
Bayesian inference is a coherent mathematical theory but I don’t trust it in scientific applications. Subjective prior distributions don’t transfer well from person to person, and there’s no good objective principle for choosing a noninformative prior (even if that concept were mathematically defined, which it’s not). Where do prior distributions come from, anyway? I don’t trust them and I see no reason to recommend that other people do, just so that I can have the warm feeling of philosophical coherence …
As Brad Efron wrote in 1986, Bayesian theory requires a great deal of thought about the given situation to apply sensibly, and recommending that scientists use Bayes’ theorem is like giving the neighborhood kids the key to your F-16 …
So what do I mean by methodo-logical arrogance? I mean an attitude that invokes micro-foundations as a methodo-logical principle — philosophical reductionism in Popper’s terminology — while dismissing non-microfounded macromodels as unscientific. To be sure, the progress of science may enable us to reformulate (and perhaps improve) explanations of certain higher-level phenomena by expressing those relationships in terms of lower-level concepts. That is what Popper calls scientific reduction. But scientific reduction is very different from rejecting, on methodological principle, any explanation not expressed in terms of more basic concepts.
And whenever macrotheory seems inconsistent with microtheory, the inconsistency poses a problem to be solved. Solving the problem will advance our understanding. But simply to reject the macrotheory on methodological principle without evidence that the microfounded theory gives a better explanation of the observed phenomena than the non-microfounded macrotheory … is arrogant. Microfoundations for macroeconomics should result from progress in economic theory, not from a dubious methodological precept.
For more on microfoundations and methodological arrogance, read yours truly’s Micro versus Macro in Real-World Economics Review (issue no. 66, January 2014).
During the last weekend of June, hundreds of students, university lecturers, professors and interested members of the public descended on the halls of University College London to attend the Rethinking Economics conference. They all shared a similar belief: that economics education in most universities had become narrow, insular and detached from the real world …
Despite attempts to shore up the orthodoxy, students have sensed that something is wrong: Over the past two years, they have been organizing across more than 60 countries with the aim of forcing the vampire that is the economics profession into the light of day. While the students in the movement have a diversity of opinions on various issues, they have all come to believe that the best way to reform economics is to demand that a plurality of approaches be taught. They have rightly identified the key fault with contemporary economics teaching: the monoculture it engenders. Currently only one approach to economics is taught in the vast majority of departments in the U.S. and Europe: what is usually called neoclassical or marginalize economics …
Donald Gillies, a former president of the British Society for the Philosophy of Science, told a stunned audience that he had examined three well-known Nobel Prize–winning papers in economics and could find nothing in them that he could call scientific.
When I spoke with the students, they were struck by how even those who dissented from contemporary economic policies like austerity shared this overarching vision. Paul Krugman, for example, to whom many turned after the crisis to provide context — including many of the students I met — also accepted the orthodox view (although he has not embraced some of the worst excesses echoed by his peers).
Almost a hundred years after John Maynard Keynes wrote his seminal A Treatise on Probability (1921), it is still very difficult to find mainstream economists that seriously try to incorporate his far-reaching and incisive analysis of induction and evidential weight into their theories and models.
The standard view in economics – and the axiomatic probability theory underlying it – is to a large extent based on the rather simplistic idea that “more is better.” But as Keynes argues – “more of the same” is not what is important when making inductive inferences. It’s rather a question of “more but different.”
Variation, not replication, is at the core of induction. Finding that p(x|y) = p(x|y & w) doesn’t make w “irrelevant.” Knowing that the probability is unchanged when w is present gives p(x|y & w) another evidential weight (“weight of argument”). Running 10 replicative experiments do not make you as “sure” of your inductions as when running 10 000 varied experiments – even if the probability values happen to be the same.
According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but “rational expectations.” Keynes rather thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes expectations are a question of weighing probabilities by “degrees of belief,” beliefs that often have preciously little to do with the kind of stochastic probabilistic calculations made by the rational agents as modeled by “modern” social sciences. And often we “simply do not know.” As Keynes writes in Treatise:
The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be [that] the system of the material universe must consist of bodies … such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state … Yet there might well be quite different laws for wholes of different degrees of complexity, and laws of connection between complexes which could not be stated in terms of laws connecting individual parts … If different wholes were subject to different laws qua wholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts … These considerations do not show us a way by which we can justify induction … /427 No one supposes that a good induction can be arrived at merely by counting cases. The business of strengthening the argument chiefly consists in determining whether the alleged association is stable, when accompanying conditions are varied … /468 In my judgment, the practical usefulness of those modes of inference … on which the boasted knowledge of modern science depends, can only exist … if the universe of phenomena does in fact present those peculiar characteristics of atomism and limited variety which appears more and more clearly as the ultimate result to which material science is tending.
Science according to Keynes should help us penetrate to “the true process of causation lying behind current events” and disclose “the causal forces behind the apparent facts.” Models can never be more than a starting point in that endeavour. He further argued that it was inadmissible to project history on the future. Consequently we cannot presuppose that what has worked before, will continue to do so in the future. That models can get hold of correlations between different “variables” is not enough. If they cannot get at the causal structure that generated the data, they are not really “identified.”
How strange that mainstream economics as a rule do not even touch upon these aspects of scientific methodology that seems to be so fundamental and important for anyone trying to understand how we learn and orient ourselves in an uncertain world. An educated guess on why this is a fact would be that Keynes concepts are not possible to squeeze into a single calculable numerical “probability.” In the quest for quantities one puts a blind eye to qualities and looks the other way – but Keynes ideas keep creeping out from under the carpet.
It’s high time to give Keynes his due.
Alex Rosenberg — chair of the philosophy department at Duke University, renowned economic methodologist and author of Economics — Mathematical Politics or Science of Diminshing Returns? – had an interesting article on What’s Wrong with Paul Krugman’s Philosophy of Economics in 3:AM Magazine the other day. Writes Rosenberg:
Krugman writes: ‘So how do you do useful economics? In general, what we really do is combine maximization-and-equilibrium as a first cut with a variety of ad hoc modifications reflecting what seem to be empirical regularities about how both individual behavior and markets depart from this idealized case.’
But if you ask the New Classical economists, they’ll say, this is exactly what we do—combine maximizing-and-equilibrium with empirical regularities. And they’d go on to say it’s because Krugman’s Keynesian models don’t do this or don’t do enough of it, they are not “useful” for prediction or explanation.
When he accepts maximizing and equilibrium as the (only?) way useful economics is done Krugman makes a concession so great it threatens to undercut the rest of his arguments against New Classical economics:
‘Specifically: we have a body of economic theory built around the assumptions of perfectly rational behavior and perfectly functioning markets. Any economist with a grain of sense — which is to say, maybe half the profession? — knows that this is very much an abstraction, to be modified whenever the evidence suggests that it’s going wrong. But nobody has come up with general rules for making such modifications.’
The trouble is that the macroeconomic evidence can’t tell us when and where maximization-and-equilibrium goes wrong, and there seems no immediate prospect for improving the assumptions of perfect rationality and perfect markets from behavioral economics, neuroeconomics, experimental economics, evolutionary economics, game theory, etc.
But these concessions are all the New Classical economists need to defend themselves against Krugman. After all, he seems to admit there is no alternative to maximization and equilibrium …
One thing that’s missing from Krugman’s treatment of economics is the explicit recognition of what Keynes and before him Frank Knight, emphasized: the persistent presence of enormous uncertainty in the economy … Why is uncertainty so important? Because the more of it there is in the economy the less scope for successful maximizing and the more unstable are the equilibria the economy exhibits, if it exhibits any at all …
There is a second feature of the economy that Krugman’s useful economics needs to reckon with, one that Keynes and after him George Soros, emphasized. Along with uncertainty, the economy exhibits pervasive reflexivity: expectations about the economic future tend to actually shift that future …
When combined uncertainty and reflexivity together greatly limit the power of maximizing and equilibrium to do predictively useful economics. Reflexive relations between future expectations and outcomes are constantly breaking down at times and in ways about which there is complete uncertainty.
I think Rosenberg is on to something important here regarding Krugman’s neglect of methodological reflection.
When Krugman earlier this year responded to my critique of IS-LM this hardly came as a surprise. As Rosenberg notes, Krugman works with a very simple modelling dichotomy — either models are complex or they are simple. For years now, self-proclaimed “proud neoclassicist” Paul Krugman has in endless harpings on the same old IS-LM string told us about the splendour of the Hicksian invention — so, of course, to Krugman simpler models are always preferred.
In an earlier post on his blog, Krugman has argued that “Keynesian” macroeconomics more than anything else “made economics the model-oriented field it has become.” In Krugman’s eyes, Keynes was a “pretty klutzy modeler,” and it was only thanks to Samuelson’s famous 45-degree diagram and Hicks’s IS-LM that things got into place. Although admitting that economists have a tendency to use ”excessive math” and “equate hard math with quality” he still vehemently defends — and always have — the mathematization of economics:
I’ve seen quite a lot of what economics without math and models looks like — and it’s not good.
Sure, “New Keynesian” economists like Krugman — and their forerunners, “Keynesian” economists like Paul Samuelson and (young) John Hicks — certainly have contributed to making economics more mathematical and “model-oriented.”
But if these math-is-the-message-modelers aren’t able to show that the mechanisms or causes that they isolate and handle in their mathematically formalized macromodels are stable in the sense that they do not change when we “export” them to our “target systems,” these mathematical models do only hold under ceteris paribus conditions and are consequently of limited value to our understandings, explanations or predictions of real economic systems.
Science should help us disclose the causal forces at work behind the apparent facts. But models — mathematical, econometric, or what have you — can never be more than a starting point in that endeavour. There is always the possibility that there are other (non-quantifiable) variables – of vital importance, and although perhaps unobservable and non-additive, not necessarily epistemologically inaccessible – that were not considered for the formalized mathematical model.
The kinds of laws and relations that “modern” economics has established, are laws and relations about mathematically formalized entities in models that presuppose causal mechanisms being atomistic and additive. When causal mechanisms operate in real world social target systems they only do it in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. If economic regularities obtain they do it (as a rule) only because we engineered them for that purpose. Outside man-made mathematical-statistical “nomological machines” they are rare, or even non-existant. Unfortunately that also makes most of contemporary mainstream neoclassical endeavours of mathematical economic modeling rather useless. And that also goes for Krugman and the rest of the “New Keynesian” family.
When it comes to modeling philosophy, Paul Krugman has in an earlier piece defended his position in the following words (my italics):
I don’t mean that setting up and working out microfounded models is a waste of time. On the contrary, trying to embed your ideas in a microfounded model can be a very useful exercise — not because the microfounded model is right, or even better than an ad hoc model, but because it forces you to think harder about your assumptions, and sometimes leads to clearer thinking. In fact, I’ve had that experience several times.
The argument is hardly convincing. If people put that enormous amount of time and energy that they do into constructing macroeconomic models, then they really have to be substantially contributing to our understanding and ability to explain and grasp real macroeconomic processes. If not, they should – after somehow perhaps being able to sharpen our thoughts – be thrown into the waste-paper-basket (something the father of macroeconomics, Keynes, used to do), and not as today, being allowed to overrun our economics journals and giving their authors celestial academic prestige.
Krugman’s explications on this issue is really interesting also because they shed light on a kind of inconsistency in his art of argumentation. During a couple of years Krugman has in more than one article criticized mainstream economics for using too much (bad) mathematics and axiomatics in their model-building endeavours. But when it comes to defending his own position on various issues he usually himself ultimately falls back on the same kind of models. In his End This Depression Now — just to take one example — Paul Krugman maintains that although he doesn’t buy “the assumptions about rationality and markets that are embodied in many modern theoretical models, my own included,” he still find them useful “as a way of thinking through some issues carefully.”
When it comes to methodology and assumptions, Krugman obviously has a lot in common with the kind of model-building he otherwise criticizes.
The same critique – that when it comes to defending his own position on various issues he usually himself ultimately falls back on the same kind of models that he otherwise criticize – can be directed against his new post. Krugman has said these things before, but I am still waiting for him to really explain HOW the silly assumptions behind IS-LM helps him work with the fundamental issues. If one can only use those assumptions with — as Krugman says, “tongue in cheek” – well, why then use them at all? Wouldn’t it be better to use more adequately realistic assumptions and be able to talk clear without any tongue in cheek?
I have noticed again and again, that on most macroeconomic policy issues I find myself in agreement with Krugman. To me that just shows that Krugman is right in spite of and not thanks to those neoclassical models — IS-LM included — he ultimately refers to. When he is discussing austerity measures, Ricardian equivalence or problems with the euro, he is actually not using those models, but rather (even) simpler and more adequate and relevant thought-constructions much more in the vein of Keynes.
The final court of appeal for macroeconomic models is the real world, and as long as no convincing justification is put forward for how the inferential bridging de facto is made, macroeconomic model building is little more than “hand waving” that give us rather little warrant for making inductive inferences from models to real world target systems. If substantive questions about the real world are being posed, it is the formalistic-mathematical representations utilized to analyze them that have to match reality, not the other way around. As Keynes has it:
Economics is a science of thinking in terms of models joined to the art of choosing models which are relevant to the contemporary world. It is compelled to be this, because, unlike the natural science, the material to which it is applied is, in too many respects, not homogeneous through time.
If macroeconomic models – no matter of what ilk – make assumptions, and we know that real people and markets cannot be expected to obey these assumptions, the warrants for supposing that conclusions or hypotheses of causally relevant mechanisms or regularities can be bridged, are obviously non-justifiable. Macroeconomic theorists – regardless of being New Monetarist, New Classical or ”New Keynesian” – ought to do some ontological reflection and heed Keynes’ warnings on using thought-models in economics:
The object of our analysis is, not to provide a machine, or method of blind manipulation, which will furnish an infallible answer, but to provide ourselves with an organized and orderly method of thinking out particular problems; and, after we have reached a provisional conclusion by isolating the complicating factors one by one, we then have to go back on ourselves and allow, as well as we can, for the probable interactions of the factors amongst themselves. This is the nature of economic thinking. Any other way of applying our formal principles of thought (without which, however, we shall be lost in the wood) will lead us into error.
So let me — respectfully — summarize: A gadget is just a gadget — and brilliantly silly simple models — IS-LM included — do not help us working with the fundamental issues of modern economies any more than brilliantly silly complicated models — calibrated DSGE and RBC models included. And as Rosenberg rightly notices:
When he accepts maximizing and equilibrium as the (only?) way useful economics is done Krugman makes a concession so great it threatens to undercut the rest of his arguments against New Classical economics.
Mathematics can be beguilingly elegant. It can also be dangerous when people mistake its elegance for truth.
Albert Einstein’s theory of general relativity might be the best example of elegant math, capturing a wide range of subtle and surprising phenomena with remarkable simplicity. Step toward the practical, though, and physics moves quickly away from elegance to makeshift usefulness. There’s no pretty expression for the operation of a nuclear reactor, or for how air flows past the swept wings of an aircraft. Understanding demands ugly approximations, or brute-force simulation on a large computer …
In one very practical and consequential area, though, the allure of elegance has exercised a perverse and lasting influence. For several decades, economists have sought to express the way millions of people and companies interact in a handful of pretty equations.
The resulting mathematical structures, known as dynamic stochastic general equilibrium models, seek to reflect our messy reality without making too much actual contact with it. They assume that economic trends emerge from the decisions of only a few “representative” agents — one for households, one for firms, and so on. The agents are supposed to plan and act in a rational way, considering the probabilities of all possible futures and responding in an optimal way to unexpected shocks.
Surreal as such models might seem, they have played a significant role in informing policy at the world’s largest central banks. Unfortunately, they don’t work very well, and they proved spectacularly incapable of accommodating the way markets and the economy acted before, during and after the recent crisis …
If economists jettisoned elegance and got to work developing more realistic models, we might gain a better understanding of how crises happen, and learn how to anticipate similarly unstable episodes in the future. The theories won’t be pretty, and probably won’t show off any clever mathematics. But we ought to prefer ugly realism to beautiful fantasy.
The economy, the system of market relationships between members of society, is viewed as a relatively closed network of forces, as a system, which indeed receives a certain external impetus, but functions independently of factors such as those mentioned above, which cannot be ascertained with economic tools …
If one gains clarity about these relation-ships, then one can begin to understand why the models constructed with the help of simple behavioral assumptions by neoclassical oriented theoreticians must be immunized against experience in one way or another if their failure is to be avoided. It is not by chance that the attempts of some proponents of pure economics to achieve autonomous theory formation tend to be translated methodologically into model Platonism: the immunization from the inﬂuence of non-economic factors leads to the immunization from experience in general. It appears that the diagnosis of the fundamental methodological weakness of the neoclassical way of thinking must lead to an aversion to sociology. By contrast, regardless of all methodological differences, all heterodox currents in economics characteristically share one element: the accentuation of the signiﬁcance of social factors for economic relationships and the consciousness of the fact that the social domain analyzed by pure economics is embedded in a more comprehensive social complex that cannot be abstracted away from with no further ado if useful explanations are being sought. The methodological weakness of these currents should not prevent one from seeing what is, in my view, the decisive point, which generally tends to be buried amidst an array of irrelevant arguments about subordinate problems or pseudo-problems, such as those about the applicability of mathematical expressions, the usage of certain types of terms, the question of the preferability of generalizing or pointedly emphasizing abstraction, etc.