Yesterday I had a post up on the small sample fallacy. Today Dwayne Woods was kind enough to direct me to one of my favourite statisticians — Andrew Gelman — who has a post up touching on the same problematique:
Here’s Alan Turing in 1950:
“I assume that the reader is familiar with the idea of extra-sensory perception, and the meaning of the four items of it, viz. telepathy, clairvoyance, precognition and psycho-kinesis. These disturbing phenomena seem to deny all our usual scientific ideas. How we should like to discredit them! Unfortunately the statistical evidence, at least for telepathy, is overwhelming.”
Wow! Overwhelming evidence isn’t what it used to be …
How could Turing have thought this? I don’t know much about Turing but it does seem, when reading old-time literature, that belief in the supernatural was pretty common back then, lots of mention of ghosts etc …
To move things forward a few decades, Wagenmakers et al. mention “the phenomenon of social priming, where a subtle cognitive or emotional manipulation influences overt behavior. The prototypical example is the elderly walking study from Bargh, Chen, and Burrows (1996); in the priming phase of this study, students were either confronted with neutral words or with words that are related to the concept of the elderly (e.g., ‘Florida’, ‘bingo’). The results showed that the students’ walking speed was slower after having been primed with the elderly-related words.”
They then pop our this 2011 quote from Daniel Kahneman:
“When I describe priming studies to audiences, the reaction is often disbelief . . . The idea you should focus on, however, is that disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.”
No, you don’t have to accept that the major conclusions of these studies are true …
Where did Turing and Kahneman go wrong?
Overstating the strength of empirical evidence. How does that happen? … Statistically significant comparisons are not hard to come by, even by researchers who are not actively fishing through the data.
The other issue is that when any real effects are almost certainly tiny (as in ESP, or social priming, or various other bank-shot behavioral effects such as ovulation and voting), statistically significant patterns can be systematically misleading …
Still and all, it’s striking to see brilliant people such as Turing and Kahneman making this mistake …
It’s good to have an open mind. When a striking result appears in the dataset, it’s possible that this result does not represent an enduring truth or even a pattern in the general population but rather is just an artifact of a particular small and noisy dataset.
One frustration I’ve had in recent discussions regarding controversial research is the seeming unwillingness of researchers to entertain the possibility that their published findings are just noise. Maybe not, maybe these are real effects being discovered, but you should at least consider the possibility that you’re chasing noise …
Turing and Kahneman both made big contributions to science, almost certainly much bigger than anything I will ever do. And I’m not criticizing them for believing in ESP and social priming. What I am criticizing them for is their insistence that the evidence is “overwhelming” and that the rest of us “have no choice” but to accept these hypotheses. Both Turing and Kahneman, great as they are, overstated the strength of the statistical evidence.
And that’s interesting. When stupid people make a mistake, that’s no big deal. But when brilliant people make a mistake, it’s worth noting.
Modern probabilistic econometrics relies on the notion of probability. To at all be amenable to econometric analysis, economic observations allegedly have to be conceived as random events.
But is it really necessary to model the economic system as a system where randomness can only be analyzed and understood when based on an a priori notion of probability?
In probabilistic econometrics, events and observations are as a rule interpreted as random variables as if generated by an underlying probability density function, and a fortiori – since probability density functions are only definable in a probability context – consistent with a probability. As Haavelmo (1944:iii) has it:
For no tool developed in the theory of statistics has any meaning – except , perhaps for descriptive purposes – without being referred to some stochastic scheme.
When attempting to convince us of the necessity of founding empirical economic analysis on probability models, Haavelmo – building largely on the earlier Fisherian paradigm – actually forces econometrics to (implicitly) interpret events as random variables generated by an underlying probability density function.
This is at odds with reality. Randomness obviously is a fact of the real world. Probability, on the other hand, attaches to the world via intellectually constructed models, and a fortiori is only a fact of a probability generating machine or a well constructed experimental arrangement or “chance set-up”.
Just as there is no such thing as a “free lunch,” there is no such thing as a “free probability.” To be able at all to talk about probabilities, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events – in statistics one refers to any process where you observe or measure as an experiment (rolling a die) and the results obtained as the outcomes or events (number of points rolled with the die, being e. g. 3 or 5) of the experiment –there strictly seen is no event at all.
Probability is a relational element. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be shown to coincide with (or at least converge to) real data generating processes or structures – something seldom or never done!
And this is the basic problem with economic data. If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!
From a realistic point of view we really have to admit that the socio-economic states of nature that we talk of in most social sciences – and certainly in econometrics – are not amenable to analyze as probabilities, simply because in the real world open systems that social sciences – including econometrics – analyze, there are no probabilities to be had!
The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot really be maintained – as in the Haavelmo paradigm of probabilistic econometrics – that it even should be mandatory to treat observations and data – whether cross-section, time series or panel data – as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes – at least outside of nomological machines like dice and roulette-wheels – are not self-evidently best modeled with probability measures.
If we agree on this, we also have to admit that probabilistic econometrics lacks a sound justification. I would even go further and argue that there really is no justifiable rationale at all for this belief that all economically relevant data can be adequately captured by a probability measure. In most real world contexts one has to argue one’s case. And that is obviously something seldom or never done by practitioners of probabilistic econometrics.
Econometrics and probability are intermingled with randomness. But what is randomness?
In probabilistic econometrics it is often defined with the help of independent trials – two events are said to be independent if the occurrence or nonoccurrence of either one has no effect on the probability of the occurrence of the other – as drawing cards from a deck, picking balls from an urn, spinning a roulette wheel or tossing coins – trials which are only definable if somehow set in a probabilistic context.
But if we pick a sequence of prices – say 2, 4, 3, 8, 5, 6, 6 – that we want to use in an econometric regression analysis, how do we know the sequence of prices is random and a fortiori being able to treat as generated by an underlying probability density function? How can we argue that the sequence is a sequence of probabilistically independent random prices? And are they really random in the sense that is most often applied in probabilistic econometrics – where X is called a random variable only if there is a sample space S with a probability measure and X is a real-valued function over the elements of S?
Bypassing the scientific challenge of going from describable randomness to calculable probability by just assuming it, is of course not an acceptable procedure. Since a probability density function is a “Gedanken” object that does not exist in a natural sense, it has to come with an export license to our real target system if it is to be considered usable.
Among those who at least honestly try to face the problem – the usual procedure is to refer to some artificial mechanism operating in some “games of chance” of the kind mentioned above and which generates the sequence. But then we still have to show that the real sequence somehow coincides with the ideal sequence that defines independence and randomness within our – to speak with science philosopher Nancy Cartwright (1999) – “nomological machine”, our chance set-up, our probabilistic model.
As the originator of the Kalman filter, Rudolf Kalman (1994:143), notes:
Not being able to test a sequence for ‘independent randomness’ (without being told how it was generated) is the same thing as accepting that reasoning about an “independent random sequence” is not operationally useful.
So why should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts (how many sides do the dice have, are the cards unmarked, etc)
If we do adhere to the Fisher-Haavelmo paradigm of probabilistic econometrics we also have to assume that all noise in our data is probabilistic and that errors are well-behaving, something that is hard to justifiably argue for as a real phenomena, and not just an operationally and pragmatically tractable assumption.
Maybe Kalman’s (1994:147) verdict that
Haavelmo’s error that randomness = (conventional) probability is just another example of scientific prejudice
is, from this perspective seen, not far-fetched.
Accepting Haavelmo’s domain of probability theory and sample space of infinite populations– just as Fisher’s (1922:311) “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’ “collective” or Gibbs’ ”ensemble” – also implies that judgments are made on the basis of observations that are actually never made!
Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.
As David Salsburg (2001:146) notes on probability theory:
[W]e assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify [this] abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.
Just as e. g. Keynes (1921) and Georgescu-Roegen (1971), Salsburg (2001:301f) is very critical of the way social scientists – including economists and econometricians – uncritically and without arguments have come to simply assume that one can apply probability distributions from statistical theory on their own area of research:
Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.
Some wise words that ought to be taken seriously by probabilistic econometricians is also given by mathematical statistician Gunnar Blom (2004:389):
If the demands for randomness are not at all fulfilled, you only bring damage to your analysis using statistical methods. The analysis gets an air of science around it, that it does not at all deserve.
Richard von Mises (1957:103) noted that
Probabilities exist only in collectives … This idea, which is a deliberate restriction of the calculus of probabilities to the investigation of relations between distributions, has not been clearly carried through in any of the former theories of probability.
And obviously not in Haavelmo’s paradigm of probabilistic econometrics either. It would have been better if one had heeded von Mises warning (1957:172) that
the field of application of the theory of errors should not be extended too far.
This importantly also means that if you cannot show that data satisfies all the conditions of the probabilistic nomological machine – including randomness – then the statistical inferences used, lack sound foundations!
Gunnar Blom et al: Sannolikhetsteori och statistikteori med tillämpningar, Lund: Studentlitteratur.
Cartwright, Nancy (1999), The Dappled World. Cambridge: Cambridge University Press.
Fisher, Ronald (1922), On the mathematical foundations of theoretical statistics. Philosophical Transactions of The Royal Society A, 222.
Georgescu-Roegen, Nicholas (1971), The Entropy Law and the Economic Process. Harvard University Press.
Haavelmo, Trygve (1944), The probability approach in econometrics. Supplement to Econometrica 12:1-115.
Kalman, Rudolf (1994), Randomness Reexamined. Modeling, Identification and Control 3:141-151.
Keynes, John Maynard (1973 (1921)), A Treatise on Probability. Volume VIII of The Collected Writings of John Maynard Keynes, London: Macmillan.
Pålsson Syll, Lars (2007), John Maynard Keynes. Stockholm: SNS Förlag.
Salsburg, David (2001), The Lady Tasting Tea. Henry Holt.
von Mises, Richard (1957), Probability, Statistics and Truth. New York: Dover Publications.
I just finished reading Nate Silver’s newish book, The Signal and the Noise: Why so many predictions fail – but some don’t.
First off, let me say this: I’m very happy that people are reading a book on modeling in such huge numbers – it’s currently eighth on the New York Times best seller list and it’s been on the list for six weeks. This means people are starting to really care about modeling, both how it can help us remove biases to clarify reality and how it can institutionalize those same biases and go bad.
As a modeler myself, I am extremely concerned about how models affect the public, so the book’s success is wonderful news. The first step to get people to think critically about something is to get them to think about it at all …
Having said all that, I have major problems with this book and what it claims to explain. In fact, I’m angry.
It would be reasonable for Silver to tell us about his baseball models, which he does. It would be reasonable for him to tell us about political polling and how he uses weights on different polls to combine them to get a better overall poll. He does this as well. He also interviews a bunch of people who model in other fields, like meteorology and earthquake prediction, which is fine, albeit superficial.
What is not reasonable, however, is for Silver to claim to understand how the financial crisis was a result of a few inaccurate models, and how medical research need only switch from being frequentist to being Bayesian to become more accurate …
The truth is somewhat harder to understand, a lot less palatable, and much more important than Silver’s gloss. But when independent people like myself step up to denounce a given statement or theory, it’s not clear to the public who is the expert and who isn’t. From this vantage point, the happier, shorter message will win every time.
This raises a larger question: how can the public possibly sort through all the noise that celebrity-minded data people like Nate Silver hand to them on a silver platter? Whose job is it to push back against rubbish disguised as authoritative scientific theory?
It’s not a new question, since PR men disguising themselves as scientists have been around for decades. But I’d argue it’s a question that is increasingly urgent considering how much of our lives are becoming modeled. It would be great if substantive data scientists had a way of getting together to defend the subject against sensationalist celebrity-fueled noise …
There’s an easy test here to determine whether to be worried. If you see someone using a model to make predictions that directly benefit them or lose them money – like a day trader, or a chess player, or someone who literally places a bet on an outcome (unless they place another hidden bet on the opposite outcome) – then you can be sure they are optimizing their model for accuracy as best they can. And in this case Silver’s advice on how to avoid one’s own biases are excellent and useful.
But if you are witnessing someone creating a model which predicts outcomes that are irrelevant to their immediate bottom-line, then you might want to look into the model yourself.
Suppose you are concerned with determining what the most visited parks in a city are. One idea is to take a momentary snapshot: to see how many people are this moment in park A, how many are in park B and so on. Another idea is to look at one individual (or few of them) and to follow him for a certain period of time, e.g. a year. Then, you observe how often the individual is going to park A, how often he is going to park B and so on.
Thus, you obtain two different results: one statistical analysis over the entire ensemble of people at a certain moment in time, and one statistical analysis for one person over a certain period of time. The first one may not be representative for a longer period of time, while the second one may not be representative for all the people. The idea is that an ensemble is ergodic if the two types of statistics give the same result. Many ensembles, like the human populations, are not ergodic.
The importance of ergodicity becomes manifest when you think about how we all infer various things, how we draw some conclusion about something while having information about something else. For example, one goes once to a restaurant and likes the fish and next time he goes to the same restaurant and orders chicken, confident that the chicken will be good. Why is he confident? Or one observes that a newspaper has printed some inaccurate information at one point in time and infers that the newspaper is going to publish inaccurate information in the future. Why are these inferences ok, while others such as “more crimes are committed by black persons than by white persons, therefore each individual black person is not to be trusted” are not ok?
The answer is that the ensemble of articles published in a newspaper is more or less ergodic, while the ensemble of black people is not at all ergodic. If one searches how many mistakes appear in an entire newspaper in one issue, and then searches how many mistakes one news editor does over time, one finds the two results almost identical (not exactly, but nonetheless approximately equal). However, if one takes the number of crimes committed by black people in a certain day divided by the total number of black people, and then follows one random-picked black individual over his life, one would not find that, e.g. each month, this individual commits crimes at the same rate as the crime rate determined over the entire ensemble. Thus, one cannot use ensemble statistics to properly infer what is and what is not probable that a certain individual will do.
Structural econometrics aims to infer causes from probabilities, inferred from sample data generated in non-experimental settings. Arguably, it is the most ambitious part of econometrics. It aims to identify economic structures, robust parts of the economy to which interventions can be made to bring about desirable events. This part of econometrics is distinguished from forecasting econometrics in its attempt to capture something of the ‘real’ economy in the hope of allowing policy makers to act on and control events …
By making many strong background assumptions, the deductivist [the conventional logic of structural econometrics] reading of the regression model allows one — in principle — to support a structural reading of the equations and to support many rich causal claims as a result. Here, however, the difficulty is that of finding good evidence for many of the assumptions on which the approach rests. It seems difficult to believe, even in cases where we have good background economic knowledge, that the background information will be sufficiently to do the job that the deductivist asks of it. As a result, the deductivist approach may be difficult to sustain, at least in economics.
The difficulties in providing an evidence base for the deductive approach show just how difficult it is to warrant such strong causal claims. In short, as might be expected there is a trade-off between the strength of causal claims we would like to make from non-experimental data and the possibility of grounding these in evidence. If this conclusion is correct — and an appropriate elaboration were done to take into account the greater sophistication of actual structural econometric methods — then it suggests that if we want to do evidence-based structural econometrics, then we may need to be more modest in the causal knowledge we aim for. Or failing this, we should not act as if our causal claims — those that result from structural econometrics — are fully warranted by the evidence and we should acknowledge that they rest on contingent, conditional assumptions about the economy and the nature of causality.