Brad DeLong is wrong on realism and inference to the best explanation
31 Aug, 2015 at 13:43 | Posted in Theory of Science & Methodology | 4 Comments
Brad DeLong has a new post up where he gets critical about scientific realism and inference to the best explanation:
Daniel Little: The Case for Realism in the Social Realm:
“The case for scientific realism in the case of physics is a strong one…
The theories… postulate unobservable entities, forces, and properties. These hypotheses… are not individually testable, because we cannot directly observe or measure the properties of the hypothetical entities. But the theories as wholes have a great deal of predictive and descriptive power, and they permit us to explain and predict a wide range of physical phenomena. And the best explanation of the success of these theories is that they are true: that the world consists of entities and forces approximately similar to those hypothesized in physical theory. So realism is an inference to the best explanation…”
“WTF?!” is the only reaction I can have when I read Daniel Little.
Ptolemy’s epicycles are a very good model of planetary motion–albeit not as good as General Relativity. Nobody believes that epicycles are real …
There is something there. But just because your theory is good does not mean that the entities in your theory are “really there”, whatever that might mean…
Although Brad sounds upset, I can’t really see any good reasons why.
In a time when scientific relativism is expanding, it is important to insist that science cannot be reduced to a purely discursive level. We have to maintain the Enlightenment tradition of thinking of reality as principally independent of our views of it, and of the main task of science as studying the structure of this reality. Perhaps the most important contribution a researcher can make is to reveal what this reality that is the object of science actually looks like.
Science is made possible by the fact that there are structures that are durable and are independent of our knowledge or beliefs about them. There exists a reality beyond our theories and concepts of it. It is this independent reality that our theories in some way deal with. Contrary to positivism, I would as a critical realist argue that the main task of science is not to detect event-regularities between observed facts. Rather, that task must be conceived as identifying the underlying structure and forces that produce the observed events.
In a truly wonderful essay – chapter three of Error and Inference (Cambridge University Press, 2010, eds. Deborah Mayo and Aris Spanos) – Alan Musgrave gives strong arguments why scientific realism and inference to the best explanation are the best alternatives for explaining what’s going on in the world we live in:
For realists, the name of the scientific game is explaining phenomena, not just saving them. Realists typically invoke ‘inference to the best explanation’ or IBE …
IBE is a pattern of argument that is ubiquitous in science and in everyday life as well. van Fraassen has a homely example:
“I hear scratching in the wall, the patter of little feet at midnight, my cheese disappears – and I infer that a mouse has come to live with me. Not merely that these apparent signs of mousely presence will continue, not merely that all the observable phenomena will be as if there is a mouse, but that there really is a mouse.” (1980: 19-20)
Here, the mouse hypothesis is supposed to be the best explanation of the phenomena, the scratching in the wall, the patter of little feet, and the disappearing cheese.
What exactly is the inference in IBE, what are the premises, and what the conclusion? van Fraassen says “I infer that a mouse has come to live with me”. This suggests that the conclusion is “A mouse has come to live with me” and that the premises are statements about the scratching in the wall, etc. Generally, the premises are the things to be explained (the explanandum) and the conclusion is the thing that does the explaining (the explanans). But this suggestion is odd. Explanations are many and various, and it will be impossible to extract any general pattern of inference taking us from explanandum to explanans. Moreover, it is clear that inferences of this kind cannot be deductively valid ones, in which the truth of the premises guarantees the truth of the conclusion. For the conclusion, the explanans, goes beyond the premises, the explanandum. In the standard deductive model of explanation, we infer the explanandum from the explanans, not the other way around – we do not deduce the explanatory hypothesis from the phenomena, rather we deduce the phenomena from the explanatory hypothesis …
The intellectual ancestor of IBE is Peirce’s abduction, and here we find a different pattern:
The surprising fact, C, is observed.
But if A were true, C would be a matter of course.
Hence, … A is true.
(C. S. Peirce, 1931-58, Vol. 5: 189)
Here the second premise is a fancy way of saying “A explains C”. Notice that the explanatory hypothesis A figures in this second premise as well as in the conclusion. The argument as a whole does not generate the explanans out of the explanandum. Rather, it seeks to justify the explanatory hypothesis …
Abduction is deductively invalid … IBE attempts to improve upon abduction by requiring that the explanation is the best explanation that we have. It goes like this:
F is a fact.
Hypothesis H explains F.
No available competing hypothesis explains F as well as H does.
Therefore, H is true
(William Lycan, 1985: 138)
This is better than abduction, but not much better. It is also deductively invalid …
There is a way to rescue abduction and IBE. We can validate them without adding missing premises that are obviously false, so that we merely trade obvious invalidity for equally obvious unsoundness. Peirce provided the clue to this. Peirce’s original abductive scheme was not quite what we have considered so far. Peirce’s original scheme went like this:
The surprising fact, C, is observed.
But if A were true, C would be a matter of course.
Hence, there is reason to suspect that A is true.
(C. S. Peirce, 1931-58, Vol. 5: 189)
This is obviously invalid, but to repair it we need the missing premise “There is reason to suspect that any explanation of a surprising fact is true”. This missing premise is, I suggest, true. After all, the epistemic modifier “There is reason to suspect that …” weakens the claims considerably. In particular, “There is reason to suspect that A is true” can be true even though A is false. If the missing premise is true, then instances of the abductive scheme may be both deductively valid and sound.
IBE can be rescued in a similar way. I even suggest a stronger epistemic modifier, not “There is reason to suspect that …” but rather “There is reason to believe (tentatively) that …” or, equivalently, “It is reasonable to believe (tentatively) that …” What results, with the missing premise spelled out, is:
It is reasonable to believe that the best available explanation of any fact is true.
F is a fact.
Hypothesis H explains F.
No available competing hypothesis explains F as well as H does.
Therefore, it is reasonable to believe that H is true.
This scheme is valid and instances of it might well be sound. Inferences of this kind are employed in the common affairs of life, in detective stories, and in the sciences.
Of course, to establish that any such inference is sound, the ‘explanationist’ owes us an account of when a hypothesis explains a fact, and of when one hypothesis explains a fact better than another hypothesis does. If one hypothesis yields only a circular explanation and another does not, the latter is better than the former. If one hypothesis has been tested and refuted and another has not, the latter is better than the former. These are controversial issues, to which I shall return. But they are not the most controversial issue – that concerns the major premise. Most philosophers think that the scheme is unsound because this major premise is false, whatever account we can give of explanation and of when one explanation is better than another. So let me assume that the explanationist can deliver on the promises just mentioned, and focus on this major objection.
People object that the best available explanation might be false. Quite so – and so what? It goes without saying that any explanation might be false, in the sense that it is not necessarily true. It is absurd to suppose that the only things we can reasonably believe are necessary truths.
What if the best explanation not only might be false, but actually is false? Can it ever be reasonable to believe a falsehood? Of course it can. Suppose van Fraassen’s mouse explanation is false, that a mouse is not responsible for the scratching, the patter of little feet, and the disappearing cheese. Still, it is reasonable to believe it, given that it is our best explanation of those phenomena. Of course, if we find out that the mouse explanation is false, it is no longer reasonable to believe it. But what we find out is that what we believed was wrong, not that it was wrong or unreasonable for us to have believed it.
People object that being the best available explanation of a fact does not prove something to be true or even probable. Quite so – and again, so what? The explanationist principle – “It is reasonable to believe that the best available explanation of any fact is true” – means that it is reasonable to believe or think true things that have not been shown to be true or probable, more likely true than not.
I do appreciate when mainstream economists like Brad make an effort at doing some methodological-ontological-epistemological reflection. On this issue, unfortunately — although it’s always interesting and thought-provoking to read what Brad has to say — his arguments are too weak to warrant the negative stance on scientific realism and inference to the best explanation.
Unbiased econometric estimates? Forget it!
30 Aug, 2015 at 20:42 | Posted in Statistics & Econometrics | Comments Off on Unbiased econometric estimates? Forget it!
Following our recent post on econometricians’ traditional privileging of unbiased estimates, there were a bunch of comments echoing the challenge of teaching this topic, as students as well as practitioners often seem to want the comfort of an absolute standard such as best linear unbiased estimate or whatever. Commenters also discussed the tradeoff between bias and variance, and the idea that unbiased estimates can overfit the data.
I agree with all these things but I just wanted to raise one more point: In realistic settings, unbiased estimates simply don’t exist. In the real world we have nonrandom samples, measurement error, nonadditivity, nonlinearity, etc etc etc.
So forget about it. We’re living in the real world …
It’s my impression that many practitioners in applied econometrics and statistics think of their estimation choice kinda like this:
1. The unbiased estimate. It’s the safe choice, maybe a bit boring and maybe not the most efficient use of the data, but you can trust it and it gets the job done.
2. A biased estimate. Something flashy, maybe Bayesian, maybe not, it might do better but it’s risky. In using the biased estimate, you’re stepping off base—the more the bias, the larger your lead—and you might well get picked off …
If you take the choice above and combine it with the unofficial rule that statistical significance is taken as proof of correctness (in econ, this would also require demonstrating that the result holds under some alternative model specifications, but “p less than .05” is still key), then you get the following decision rule:
A. Go with the safe, unbiased estimate. If it’s statistically significant, run some robustness checks and, if the result doesn’t go away, stop.
B. If you don’t succeed with A, you can try something fancier. But . . . if you do that, everyone will know that you tried plan A and it didn’t work, so people won’t trust your finding.
So, in a sort of Gresham’s Law, all that remains is the unbiased estimate. But, hey, it’s safe, conservative, etc, right?
And that’s where the present post comes in. My point is that the unbiased estimate does not exist! There is no safe harbor. Just as we can never get our personal risks in life down to zero … there is no such thing as unbiasedness. And it’s a good thing, too: recognition of this point frees us to do better things with our data right away.
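To make the bias-variance point above concrete, here is a minimal simulation sketch in Python (my own illustration with made-up numbers, not code from the quoted post): when many small effects are measured noisily, a deliberately biased shrinkage estimator beats the "safe" unbiased estimate on mean squared error.

```python
import numpy as np

# Illustrative setup (made-up numbers): many small true effects, each
# measured once with independent noise of standard deviation 1.
rng = np.random.default_rng(0)
n_effects = 10_000
true_effects = rng.normal(loc=0.0, scale=0.5, size=n_effects)
estimates = true_effects + rng.normal(scale=1.0, size=n_effects)  # unbiased but noisy

# The "safe" unbiased estimator: report the raw estimate.
mse_unbiased = np.mean((estimates - true_effects) ** 2)

# A deliberately biased estimator: shrink every estimate halfway towards zero.
shrunk = 0.5 * estimates
mse_shrunk = np.mean((shrunk - true_effects) ** 2)

print(f"MSE, unbiased estimate: {mse_unbiased:.2f}")   # about 1.0
print(f"MSE, shrunken estimate: {mse_shrunk:.2f}")     # about 0.3, clearly smaller
```

The shrinkage factor of 0.5 is arbitrary; the point is only that unbiasedness is not the relevant criterion once variance is taken into account.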
‘New Keynesian’ models are not too simple. They are just wrong.
30 Aug, 2015 at 11:30 | Posted in Economics | 4 Comments
Simon Wren-Lewis has a nice post discussing Paul Romer’s critique of macro. In Simon’s words:
“It is hard to get academic macroeconomists trained since the 1980s to address [large scale Keynesian models], because they have been taught that these models and techniques are fatally flawed because of the Lucas critique and identification problems … But DSGE models as a guide for policy are also fatally flawed because they are too simple. The unique property that DSGE models have is internal consistency … Take a DSGE model, and alter a few equations so that they fit the data much better, and you have what could be called a structural econometric model. It is internally inconsistent, but because it fits the data better it may be a better guide for policy.”
Nope! Not too simple. Just wrong!
I disagree with Simon. NK models are not too simple. They are simply wrong. There are no ‘frictions’. There is no Calvo Fairy. There are simply persistent nominal beliefs.
Period.
Yes indeed. There really is something about the way macroeconomists construct their models nowadays that obviously doesn’t sit right.
Empirical evidence only plays a minor role in neoclassical mainstream economic theory, where models largely function as a substitute for empirical evidence. One might have hoped that, humbled by the manifest failure of its theoretical pretences during the latest economic-financial crisis, mainstream economics would let its one-sided, almost religious, insistence on axiomatic-deductivist modeling as the only scientific activity worth pursuing give way to a methodological pluralism based on ontological considerations rather than formalistic tractability. That has, so far, not happened.
Fortunately — when you’ve got tired of the kind of macroeconomic apologetics produced by “New Keynesian” macroeconomists and other DSGE modellers — there still are some real Keynesian macroeconomists to read. One of them — Axel Leijonhufvud — writes:
For many years now, the main alternative to Real Business Cycle Theory has been a somewhat loose cluster of models given the label of New Keynesian theory. New Keynesians adhere on the whole to the same DSGE modeling technology as RBC macroeconomists but differ in the extent to which they emphasise inflexibilities of prices or other contract terms as sources of short-term adjustment problems in the economy. The “New Keynesian” label refers back to the “rigid wages” brand of Keynesian theory of 40 or 50 years ago. Except for this stress on inflexibilities this brand of contemporary macroeconomic theory has basically nothing Keynesian about it …
I conclude that dynamic stochastic general equilibrium theory has shown itself an intellectually bankrupt enterprise. But this does not mean that we should revert to the old Keynesian theory that preceded it (or adopt the New Keynesian theory that has tried to compete with it). What we need to learn from Keynes … is how to view our responsibilities and how to approach our subject.
If macroeconomic models — no matter of what ilk — build on microfoundational assumptions of representative actors, rational expectations, market clearing and equilibrium, and we know that real people and markets cannot be expected to obey these assumptions, there is obviously no warrant for transferring conclusions or hypotheses about causally relevant mechanisms or regularities from the models to the real world. The incompatibility between actual behaviour and the behaviour assumed in macroeconomic models built on representative actors and rational-expectations microfoundations is not a symptom of “irrationality”. It rather shows the futility of trying to represent real-world target systems with models flagrantly at odds with reality.
A gadget is just a gadget — and no matter how brilliantly silly the DSGE models you come up with may be, they do not help us work on the fundamental issues of modern economies. Using DSGE models only confirms Robert Gordon’s dictum that today
rigor competes with relevance in macroeconomic and monetary theory, and in some lines of development macro and monetary theorists, like many of their colleagues in micro theory, seem to consider relevance to be more or less irrelevant.
Funeral Ikos
30 Aug, 2015 at 09:31 | Posted in Varia | Comments Off on Funeral Ikos
If thou hast shown mercy
unto man, o man,
that same mercy
shall be shown thee there;
and if on an orphan
thou hast shown compassion,
that same shall there
deliver thee from want.
If in this life
the naked thou hast clothed,
the same shall give thee
shelter there,
and sing the psalm:
Alleluia.
A life without the music of people like John Tavener and Arvo Pärt would be unimaginable.
Has macroeconomics — really — progressed?
25 Aug, 2015 at 09:32 | Posted in Economics | 3 Comments
A typical DSGE model has a key property that from my work seems wrong. A good example is the model in Galí and Gertler (2007). In this model a positive price shock — a “cost push” shock — is explosive unless the Fed raises the nominal interest rate more than the increase in the inflation rate.
In other words, positive price shocks with the nominal interest rate held constant are expansionary (because the real interest rate falls). In my work, however, they are contractionary. If there is a positive price shock like an oil price increase, nominal wages lag output prices, and so the real wage initially falls. This has a negative effect on consumption. In addition, household real wealth falls because nominal asset prices don’t initially rise as much as the price level. This has a negative effect on consumption through a wealth effect. There is little if any offset from lower real interest rates because households appear to respond more to nominal rates than to real rates. Positive price shocks are thus contractionary even if the Fed keeps the nominal interest rate unchanged. This property is important for a monetary authority in deciding how to respond to a positive price shock. If the authority used the Galí and Gertler (2007) model, it would likely raise the nominal interest rate too much thinking that the price shock is otherwise expansionary. Typical DSGE models are thus likely to be misleading for guiding monetary policy if this key property of the models is wrong.
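A back-of-the-envelope sketch (illustrative numbers only, not taken from Fair's paper) of the mechanism at stake: with the nominal rate held constant, the Fisher relation implies that a positive inflation shock lowers the ex ante real rate, which is why the standard model reads the shock as expansionary.

```python
# Fisher relation (approximate): real rate = nominal rate - expected inflation.
# All figures are hypothetical percentages, chosen only to illustrate the sign.
nominal_rate = 3.0                                # held fixed by the central bank
inflation_before, inflation_after = 2.0, 4.0      # a positive "cost push" shock

real_before = nominal_rate - inflation_before     # 1.0: mildly restrictive
real_after = nominal_rate - inflation_after       # -1.0: now stimulative
print(f"real rate before: {real_before}%, after: {real_after}%")
```

Fair's argument is that the wage-lag and wealth effects he describes dominate this real-rate channel, so the shock is contractionary even without a policy response.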
Deirdre McCloskey’s shallow and misleading rhetoric
20 Aug, 2015 at 17:07 | Posted in Economics | 7 Comments
This is not new to most of you of course. You are already steeped in McCloskey’s Rhetoric. Or you ought to be. After all economists are simply telling stories about the economy. Sometimes we are taken in. Sometimes we are not.
Unfortunately McCloskey herself gets a little too caught up in her stories. As in her explanation as to how she can be both a feminist and a free market economist:
“The market is the great liberator of women; it has not been the state, which is after all an instrument of patriarchy … The market is the way out of enslavement from your dad, your husband, or your sons. … The enrichment that has come through allowing markets to operate has been a tremendous part of the learned freedom of the modern women.” — Quoted in “The Changing Face of Economics – Conversations With Cutting Edge Economists” by Colander, Holt, and Rosser
Notice the binary nature of the world in this story. There are only the market (yea!) and the state (boo!). There are no other institutions. Whole swathes of society vanish or are flattened into insignificance. The state is viewed as a villain that the market heroically battles against to advance us all.
It is a ripping tale.
It is shallow and utterly misleading.
Top universities — preserves of bad economics
19 Aug, 2015 at 09:56 | Posted in Economics | Comments Off on Top universities — preserves of bad economics
There are certainly some things that the top institutions offer which lower-ranked ones simply can’t: great buildings and history for starters. To walk around Cambridge, to see its grand architecture, and to feel drenched in its history, is an amazing experience …
But the quality of the education you get at University depends very much on the individual people you are taught by, and here University rankings are far from a perfect guide. Extremely gifted teachers and researchers can be at lower ranked Universities, for a multitude of reasons from personal preferences to sheer lock-in: a capable person can start in a lower-ranked institution, and find that the “Old Boys Network” locks them out of the higher ranked ones.
In my own field of economics, there is also a paradox at play: in many ways the top universities have become preserves of bad economics, both in content and in teaching quality, while the best education in economics often comes from the lower ranked Universities.
In fact, there’s a case to be made that the better the University is ranked, the worse the education in economics will be. And before you think I’m just flogging my own wares here, consider what the American Economic Association had to say about the way that economics education appeared to be headed in the USA back in 1991:
“The Commission’s fear is that graduate programs may be turning out a generation with too many idiot savants, skilled in technique but innocent of real economic issues.” (“Report of the Commission on Graduate Education in Economics”, American Economic Association 1991)
The graduates of 1991 have become the University lecturers of today, and thanks to them, the trend the report identified at the graduate level has trickled down to undergraduate education at the so-called leading Universities.
Things economists could learn from kids
18 Aug, 2015 at 19:26 | Posted in Economics | Comments Off on Things economists could learn from kids
Kids, somehow, seem to be more in touch with real science than can-opener-assuming economists …
A physicist, a chemist, and an economist are stranded on a desert island. One can only imagine what sort of play date went awry to land them there. Anyway, they’re hungry. Like, desert island hungry. And then a can of soup washes ashore. Progresso Reduced Sodium Chicken Noodle, let’s say. Which is perfect, because the physicist can’t have much salt, and the chemist doesn’t eat red meat.
But, famished as they are, our three professionals have no way to open the can. So they put their brains to the problem. The physicist says “We could drop it from the top of that tree over there until it breaks open.” And the chemist says “We could build a fire and sit the can in the flames until it bursts open.”
Those two squabble a bit, until the economist says “No, no, no. Come on, guys, you’d lose most of the soup. Let’s just assume a can opener.”
Statistical power and significance
18 Aug, 2015 at 09:41 | Posted in Statistics & Econometrics | 1 Comment
Much has been said about significance testing – most of it negative. Methodologists constantly point out that researchers misinterpret p-values. Some say that it is at best a meaningless exercise and at worst an impediment to scientific discoveries. Consequently, I believe it is extremely important that students and researchers correctly interpret statistical tests. This visualization is meant as an aid for students when they are learning about statistical hypothesis testing.
Kristoffer Magnusson
Great stuff!
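In the same spirit as Magnusson's visualization, here is a minimal simulation sketch (my own Python, not his code) of what statistical power actually is: the share of repeated studies, for a given true effect and sample size, in which the null hypothesis is rejected at the 5% level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulated_power(effect_size=0.5, n_per_group=30, alpha=0.05, n_sims=5_000):
    """Share of simulated two-sample t-tests that reject H0 at level alpha."""
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        treatment = rng.normal(loc=effect_size, scale=1.0, size=n_per_group)
        _, p_value = stats.ttest_ind(treatment, control)
        rejections += p_value < alpha
    return rejections / n_sims

# With 30 observations per group and a 'medium' true effect, power is only
# about 0.5, so a non-significant result says very little; quadrupling the
# sample size pushes power above 0.9.
print(simulated_power(effect_size=0.5, n_per_group=30))
print(simulated_power(effect_size=0.5, n_per_group=120))
```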
Robert Solow kicking Lucas and Sargent in the pants
16 Aug, 2015 at 20:34 | Posted in Economics | 6 Comments
McNees documented the radical break between the 1960s and 1970s. The question is: what are the possible responses that economists and economics can make to those events?
One possible response is that of Professors Lucas and Sargent. They describe what happened in the 1970s in a very strong way with a polemical vocabulary reminiscent of Spiro Agnew. Let me quote some phrases that I culled from the paper: “wildly incorrect,” “fundamentally flawed,” “wreckage,” “failure,” “fatal,” “of no value,” “dire implications,” “failure on a grand scale,” “spectacular recent failure,” “no hope” … I think that Professors Lucas and Sargent really seem to be serious in what they say, and in turn they have a proposal for constructive research that I find hard to talk about sympathetically. They call it equilibrium business cycle theory, and they say very firmly that it is based on two terribly important postulates — optimizing behavior and perpetual market clearing. When you read closely, they seem to regard the postulate of optimizing behavior as self-evident and the postulate of market-clearing behavior as essentially meaningless. I think they are too optimistic, since the one that they think is self-evident I regard as meaningless and the one that they think is meaningless, I regard as false. The assumption that everyone optimizes implies only weak and uninteresting consistency conditions on their behavior. Anything useful has to come from knowing what they optimize, and what constraints they perceive. Lucas and Sargent’s casual assumptions have no special claim to attention …
It is plain as the nose on my face that the labor market and many markets for produced goods do not clear in any meaningful sense. Professors Lucas and Sargent say after all there is no evidence that labor markets do not clear, just the unemployment survey. That seems to me to be evidence. Suppose an unemployed worker says to you “Yes, I would be glad to take a job like the one I have already proved I can do because I had it six months ago or three or four months ago. And I will be glad to work at exactly the same wage that is being paid to those exactly like myself who used to be working at that job and happen to be lucky enough still to be working at it.” Then I’m inclined to label that a case of excess supply of labor and I’m not inclined to make up an elaborate story of search or misinformation or anything of the sort. By the way I find the misinformation story another gross implausibility. I would like to see direct evidence that the unemployed are more misinformed than the employed, as I presume would have to be the case if everybody is on his or her supply curve of employment. Similarly, if the Chrysler Motor Corporation tells me that it would be happy to make and sell 1000 more automobiles this week at the going price if only it could find buyers for them, I am inclined to believe they are telling me that price exceeds marginal cost, or even that marginal revenue exceeds marginal cost, and regard that as a case of excess supply of automobiles. Now you could ask, why do not prices and wages erode and crumble under those circumstances? Why doesn’t the unemployed worker who told me “Yes, I would like to work, at the going wage, at the old job that my brother-in-law or my brother-in-law’s brother-in-law is still holding”, why doesn’t that person offer to work at that job for less? Indeed why doesn’t the employer try to encourage wage reduction? That doesn’t happen either. Why does the Chrysler Corporation not cut the price? Those are questions that I think an adult person might spend a lifetime studying. They are important and serious questions, but the notion that the excess supply is not there strikes me as utterly implausible.
No unnecessary beating around the bush here.
The always eminently quotable Solow says it all.
The purported strength of New Classical macroeconomics is that it has firm anchorage in preference-based microeconomics, and especially the decisions taken by inter-temporal utility maximizing “forward-looking” individuals.
To some of us, however, this has come at too high a price. The almost quasi-religious insistence that macroeconomics has to have microfoundations – without ever presenting either ontological or epistemological justifications for this claim – has turned a blind eye to the weakness of the whole enterprise of trying to depict a complex economy on the basis of an all-embracing representative actor equipped with superhuman knowledge, forecasting abilities and forward-looking rational expectations. It is as if – after having swallowed the sour grapes of the Sonnenschein-Mantel-Debreu theorem – these economists want to resurrect the omniscient Walrasian auctioneer in the form of all-knowing representative actors equipped with rational expectations and assumed to somehow know the true structure of our model of the world.
That anyone should take that kind of stuff seriously is totally and unbelievably ridiculous. Or as Solow has it:
Suppose someone sits down where you are sitting right now and announces to me that he is Napoleon Bonaparte. The last thing I want to do with him is to get involved in a technical discussion of cavalry tactics at the battle of Austerlitz. If I do that, I’m getting tacitly drawn into the game that he is Napoleon. Now, Bob Lucas and Tom Sargent like nothing better than to get drawn into technical discussions, because then you have tacitly gone along with their fundamental assumptions; your attention is attracted away from the basic weakness of the whole story. Since I find that fundamental framework ludicrous, I respond by treating it as ludicrous – that is, by laughing at it – so as not to fall into the trap of taking it seriously and passing on to matters of technique.
Our kids and the American dream
15 Aug, 2015 at 17:37 | Posted in Education & School | Comments Off on Our kids and the American dream
No doubt one of the most important books you will read this year!
General equilibrium theory — a gross misallocation of intellectual resources and time
15 Aug, 2015 at 10:30 | Posted in Economics | 2 Comments
General equilibrium is fundamental to economics on a more normative level as well. A story about Adam Smith, the invisible hand, and the merits of markets pervades introductory textbooks, classroom teaching, and contemporary political discourse.
The intellectual foundation of this story rests on general equilibrium, not on the latest mathematical excursions. If the foundation of everyone’s favourite economics story is now known to be unsound — and according to some, uninteresting as well — then the profession owes the world a bit of an explanation.
Almost a century and a half after Léon Walras founded general equilibrium theory, economists still have not been able to show that markets lead economies to equilibria.
We do know that — under very restrictive assumptions — equilibria do exist, are unique and are Pareto-efficient.
But after reading Frank Ackerman’s article — or Franklin M. Fisher’s The stability of general equilibrium – what do we know and why is it important? — one has to ask oneself — what good does that do?
As long as we cannot show that there are convincing reasons to suppose there are forces which lead economies to equilibria, the value of general equilibrium theory is nil. As long as we cannot demonstrate that such forces operate — under reasonable, relevant and at least mildly realistic conditions — to move markets towards equilibria, there cannot really be any sustainable reason for anyone to take an interest in, or pay any attention to, this theory.
A stability that can only be proved by assuming Santa Claus conditions is of no avail. Most people do not believe in Santa Claus anymore. And for good reasons. Santa Claus is for kids, and general equilibrium economists ought to grow up, leaving their Santa Claus economics in the dustbin of history.
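For readers who want to see the stability problem concretely, here is a rough sketch, coded from memory of what is usually presented as Scarf's (1960) three-good, three-consumer example with Leontief preferences, so treat the details as an assumption rather than gospel: under textbook tâtonnement price adjustment, prices circle the unique equilibrium instead of converging to it.

```python
import numpy as np

def excess_demand(p):
    """Excess demand in a three-good Scarf-type economy (my reconstruction).

    Consumer i owns one unit of good i and has Leontief preferences over
    goods i and i+1, so demand for good j comes from consumers j and j-1.
    """
    z = np.empty(3)
    for j in range(3):
        nxt, prv = (j + 1) % 3, (j - 1) % 3
        z[j] = p[j] / (p[j] + p[nxt]) + p[prv] / (p[prv] + p[j]) - 1.0
    return z

# Tatonnement: raise prices of goods in excess demand, lower the others.
p = np.array([1.0, 0.8, 1.2])     # start away from the equilibrium (1, 1, 1)
step = 0.01
for t in range(60_001):
    if t % 20_000 == 0:
        print(f"t={t:6d}  p={np.round(p, 3)}  "
              f"distance from (1,1,1)={np.linalg.norm(p - 1.0):.3f}")
    p = p + step * excess_demand(p)

# The distance never shrinks towards zero: prices orbit the equilibrium
# instead of converging to it.
```

The equilibrium at p = (1, 1, 1) exists and is unique, but the adjustment process never finds it, which is exactly the gap between existence results and stability results that Fisher's book is about.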
Continuing to model a world full of agents behaving as economists — “often wrong, but never uncertain” — while still not being able to show that the system converges to equilibrium under reasonable assumptions (or simply assuming the problem away) is a gross misallocation of intellectual resources and time. As Ackerman writes:
The guaranteed optimality of market outcomes and laissez-faire policies died with general equilibrium. If economic stability rests on exogenous social and political forces, then it is surely appropriate to debate the desirable extent of intervention in the market — in part, in order to rescue the market from its own instability.
Les garçons de la plage
14 Aug, 2015 at 21:24 | Posted in Varia | Comments Off on Les garçons de la plage
Ragnar Frisch on the limits of statistics and significance testing
13 Aug, 2015 at 12:00 | Posted in Statistics & Econometrics | Comments Off on Ragnar Frisch on the limits of statistics and significance testing
I do not claim that the technique developed in the present paper will, like a stone of the wise, solve all the problems of testing “significance” with which the economic statistician is confronted. No statistical technique, however refined, will ever be able to do such a thing. The ultimate test of significance must consist in a network of conclusions and cross checks where theoretical economic considerations, intimate and realistic knowledge of the data and a refined statistical technique concur.
Noah Smith thinks p-values work. Read my lips — they don’t!
12 Aug, 2015 at 16:24 | Posted in Statistics & Econometrics | 5 Comments
Noah Smith has a post up trying to defend p-values and traditional statistical significance testing against the increasing attacks launched against it:
Suddenly, everyone is getting really upset about p-values and statistical significance testing. The backlash has reached such a frenzy that some psych journals are starting to ban significance testing. Though there are some well-known problems with p-values and significance testing, this backlash doesn’t pass the smell test. When a technique has been in wide use for decades, it’s certain that LOTS of smart scientists have had a chance to think carefully about it. The fact that we’re only now getting the backlash means that the cause is something other than the inherent uselessness of the methodology.
Hmm …
That doesn’t sound very convincing.
Maybe we should apply yet another smell test …
A non-trivial part of teaching statistics consists of teaching students to perform significance testing. A problem I have noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests – p-values – really are, most students still misinterpret them.
This is not to be blamed on students’ ignorance, but rather on significance testing itself not being particularly transparent (conditional probability inference is difficult even for those of us who teach and practice it). A lot of researchers fall prey to the same mistakes. So — given that it anyway is very unlikely that any population parameter is exactly zero, and that, contrary to assumption, most samples in social science and economics are not random and do not have the right distributional shape — why continue to press students and researchers to do null hypothesis significance testing, a procedure that relies on a weird backward logic that students and researchers usually don’t understand?
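A hedged sketch of that backward logic (my own illustration, with made-up base rates): even when every single test is carried out correctly, "p < 0.05" does not mean the null has only a 5% chance of being true. If most tested hypotheses are in fact null and power is modest, a large share of the significant findings are false positives.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical research field: 90% of tested effects are truly zero and
# studies of the real effects have only about 50% power (n = 30 per group).
n_studies, share_null, n_per_group, true_effect = 20_000, 0.9, 30, 0.5

false_pos = true_pos = 0
for study in range(n_studies):
    is_null = rng.random() < share_null
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(0.0 if is_null else true_effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(b, a)
    if p < 0.05:
        if is_null:
            false_pos += 1
        else:
            true_pos += 1

# Roughly half of the 'significant' findings are false positives, even though
# every single test was done by the book.
print("false positive share among significant results:",
      round(false_pos / (false_pos + true_pos), 2))
```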
Statistical significance doesn’t say that something is important or true. And since there already are far better and more relevant tests that can be performed, it is high time to give up on this statistical fetish.
Jager and Leek may well be correct in their larger point, that the medical literature is broadly correct. But I don’t think the statistical framework they are using is appropriate for the questions they are asking. My biggest problem is the identification of scientific hypotheses and statistical “hypotheses” of the “theta = 0” variety.
Based on the word “empirical” in the title, I thought the authors were going to look at a large number of papers with p-values and then follow up and see if the claims were replicated. But no, they don’t follow up on the studies at all! What they seem to be doing is collecting a set of published p-values and then fitting a mixture model to this distribution, a mixture of a uniform distribution (for null effects) and a beta distribution (for non-null effects). Since only statistically significant p-values are typically reported, they fit their model restricted to p-values less than 0.05. But this all assumes that the p-values have this stated distribution. You don’t have to be Uri Simonsohn to know that there’s a lot of p-hacking going on. Also, as noted above, the problem isn’t really effects that are exactly zero, the problem is that a lot of effects are lost in the noise and are essentially undetectable given the way they are studied.
Jager and Leek write that their model is commonly used to study hypotheses in genetics and imaging. I could see how this model could make sense in those fields … but I don’t see this model applying to published medical research, for two reasons. First … I don’t think there would be a sharp division between null and non-null effects; and, second, there’s just too much selection going on for me to believe that the conditional distributions of the p-values would be anything like the theoretical distributions suggested by Neyman-Pearson theory.
So, no, I don’t at all believe Jager and Leek when they write, “we are able to empirically estimate the rate of false positives in the medical literature and trends in false positive rates over time.” They’re doing this by basically assuming the model that is being questioned, the textbook model in which effects are pure and in which there is no p-hacking.
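To see why the mixture-model move worries Gelman, here is a small sketch (my own code, not Jager and Leek's): p-values from true nulls are uniform on [0, 1], p-values from real effects pile up near zero, and once only "significant" results are reported both distributions are truncated, before any p-hacking has even entered the picture.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def p_values(effect, n_studies=5_000, n=30):
    """Two-sample t-test p-values for studies with a given true effect size."""
    out = []
    for _ in range(n_studies):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        out.append(stats.ttest_ind(b, a).pvalue)
    return np.array(out)

p_null = p_values(effect=0.0)   # roughly uniform on [0, 1]
p_real = p_values(effect=0.5)   # piled up near zero

for label, ps in [("null effects", p_null), ("real effects", p_real)]:
    published = ps[ps < 0.05]   # selective reporting: only 'significant' p-values
    print(f"{label}: {np.mean(ps < 0.05):.2f} significant, "
          f"mean published p = {published.mean():.3f}")
```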
Indeed. If anything, this underlines how important it is — and on this Noah Smith and yours truly agree — not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. When you work with misspecified models, the scientific value of significance testing is actually zero – even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science. Or as a noted German philosopher once famously wrote:
There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.
In its standard form, a significance test is not the kind of “severe test” that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis simply because it can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.
And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10 % result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
Most importantly — we should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean next to nothing if the model is wrong. As eminent mathematical statistician David Freedman writes:
I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.
Statistical significance tests DO NOT validate models!
In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p – 1 degrees of freedom in the numerator and n – p in the denominator. The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.
If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this: if the model is right and the coefficients are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:
i) An unlikely event occurred.
ii) Or the model is right and some of the coefficients differ from 0.
iii) Or the model is wrong.
So?
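A small sketch of Freedman's point (my own illustration, hypothetical data): generate data from a process that violates the regression's assumptions, fit a linear model anyway, and the F-test still comes out wildly significant. A big F-statistic cannot tell possibility (ii) apart from possibility (iii).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical data from a process that is non-linear with non-constant
# error variance, i.e. the linear regression model is simply wrong here.
n = 200
x = rng.uniform(0, 4, n)
y = np.sin(3 * x) + x**2 + rng.normal(scale=0.5 + x)

# Fit the (wrong) model y = a + b*x by least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# F-test of the usual null that all coefficients except the intercept vanish.
k = X.shape[1]                                     # p = 2: intercept + slope
ss_model = np.sum((fitted - y.mean()) ** 2)
ss_error = np.sum(residuals ** 2)
F = (ss_model / (k - 1)) / (ss_error / (n - k))
p_value = stats.f.sf(F, k - 1, n - k)

print(f"F = {F:.1f}, p = {p_value:.1e}")
# The p-value is microscopic, yet the fitted model is wrong: a significant
# F-statistic cannot distinguish case (ii) from case (iii).
```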