## What is randomness?

28 May, 2013 at 08:38 | Posted in Statistics & Econometrics | 2 CommentsModern probabilistic econometrics relies on the notion of probability. To at all be amenable to econometric analysis, economic observations allegedly have to be conceived as random events.

But is it really necessary to model the economic system as a system where randomness can only be analyzed and understood when based on an *a priori* notion of probability?

In probabilistic econometrics, events and observations are as a rule interpreted as random variables as if generated by an underlying probability density function, and a fortiori – since probability density functions are only definable in a probability context – consistent with a probability. As Haavelmo (1944:iii) has it:

For no tool developed in the theory of statistics has any meaning – except , perhaps for descriptive purposes – without being referred to some stochastic scheme.

When attempting to convince us of the necessity of founding empirical economic analysis on probability models, Haavelmo – building largely on the earlier Fisherian paradigm – actually forces econometrics to (implicitly) interpret events as random variables generated by an underlying probability density function.

This is at odds with reality. Randomness obviously is a fact of the real world. Probability, on the other hand, attaches to the world via intellectually constructed models, and *a fortiori* is only a fact of a probability generating machine or a well constructed experimental arrangement or “chance set-up”.

Just as there is no such thing as a “free lunch,” there is no such thing as a “free probability.” To be able at all to talk about probabilities, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events – in statistics one refers to any process where you observe or measure as an experiment (rolling a die) and the results obtained as the *outcomes* or *events* (number of points rolled with the die, being e. g. 3 or 5) of the experiment –there strictly seen is no event at all.

Probability is a relational element. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be *shown* to coincide with (or at least converge to) real data generating processes or structures – something seldom or never done!

And this is the basic problem with economic data. If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!

From a realistic point of view we really have to admit that the socio-economic states of nature that we talk of in most social sciences – and certainly in econometrics – are not amenable to analyze as probabilities, simply because in the real world open systems that social sciences – including econometrics – analyze, there are no probabilities to be had!

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot really be maintained – as in the Haavelmo paradigm of probabilistic econometrics – that it even should be mandatory to treat observations and data – whether cross-section, time series or panel data – as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes – at least outside of nomological machines like dice and roulette-wheels – are not self-evidently best modeled with probability measures.

If we agree on this, we also have to admit that probabilistic econometrics lacks a sound justification. I would even go further and argue that there really is no justifiable rationale at all for this belief that all economically relevant data can be adequately captured by a probability measure. In most real world contexts one has to *argue* one’s case. And that is obviously something seldom or never done by practitioners of probabilistic econometrics.

Econometrics and probability are intermingled with randomness. But what is randomness?

In probabilistic econometrics it is often defined with the help of independent trials – two events are said to be independent if the occurrence or nonoccurrence of either one has no effect on the probability of the occurrence of the other – as drawing cards from a deck, picking balls from an urn, spinning a roulette wheel or tossing coins – trials which are only definable if somehow set in a probabilistic context.

But if we pick a sequence of prices – say 2, 4, 3, 8, 5, 6, 6 – that we want to use in an econometric regression analysis, how do we know the sequence of prices is random and *a fortiori* being able to treat as generated by an underlying probability density function? How can we argue that the sequence is a sequence of probabilistically independent random prices? And are they really random in the sense that is most often applied in probabilistic econometrics – where X is called a random variable only if there is a sample space S with a probability measure and X is a real-valued function over the elements of S?

Bypassing the scientific challenge of going from describable randomness to calculable probability by just assuming it, is of course not an acceptable procedure. Since a probability density function is a “Gedanken” object that does not exist in a natural sense, it has to come with an export license to our real target system if it is to be considered usable.

Among those who at least honestly try to face the problem – the usual procedure is to refer to some artificial mechanism operating in some “games of chance” of the kind mentioned above and which generates the sequence. But then we still have to show that the real sequence somehow coincides with the ideal sequence that defines independence and randomness within our – to speak with science philosopher Nancy Cartwright (1999) – “nomological machine”, our chance set-up, our probabilistic model.

As the originator of the Kalman filter, Rudolf Kalman (1994:143), notes:

Not being able to test a sequence for ‘independent randomness’ (without being told how it was generated) is the same thing as accepting that reasoning about an “independent random sequence” is not operationally useful.

So why should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts (how many sides do the dice have, are the cards unmarked, etc)

If we do adhere to the Fisher-Haavelmo paradigm of probabilistic econometrics we also have to assume that all noise in our data is probabilistic and that errors are well-behaving, something that is hard to justifiably argue for as a real phenomena, and not just an operationally and pragmatically tractable assumption.

Maybe Kalman’s (1994:147) verdict that

Haavelmo’s error that randomness = (conventional) probability is just another example of scientific prejudice

is, from this perspective seen, not far-fetched.

Accepting Haavelmo’s domain of probability theory and sample space of infinite populations– just as Fisher’s (1922:311) “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’ “collective” or Gibbs’ ”ensemble” – also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

As David Salsburg (2001:146) notes on probability theory:

[W]e assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify [this] abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.

Just as e. g. Keynes (1921) and Georgescu-Roegen (1971), Salsburg (2001:301f) is very critical of the way social scientists – including economists and econometricians – uncritically and without arguments have come to simply assume that one can apply probability distributions from statistical theory on their own area of research:

Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

Some wise words that ought to be taken seriously by probabilistic econometricians is also given by mathematical statistician Gunnar Blom (2004:389):

If the demands for randomness are not at all fulfilled, you only bring damage to your analysis using statistical methods. The analysis gets an air of science around it, that it does not at all deserve.

Richard von Mises (1957:103) noted that

Probabilities exist only in collectives … This idea, which is a deliberate restriction of the calculus of probabilities to the investigation of relations between distributions, has not been clearly carried through in any of the former theories of probability.

And obviously not in Haavelmo’s paradigm of probabilistic econometrics either. It would have been better if one had heeded von Mises warning (1957:172) that

the field of application of the theory of errors should not be extended too far.

**This importantly also means that if you cannot show that data satisfies all the conditions of the probabilistic nomological machine – including randomness – then the statistical inferences used, lack sound foundations! **

*References*

Gunnar Blom et al: *Sannolikhetsteori och statistikteori med tillämpningar*, Lund: Studentlitteratur.

Cartwright, Nancy (1999), *The Dappled World*. Cambridge: Cambridge University Press.

Fisher, Ronald (1922), On the mathematical foundations of theoretical statistics. *Philosophical Transactions of The Royal Society A*, 222.

Georgescu-Roegen, Nicholas (1971), *The Entropy Law and the Economic Process*. Harvard University Press.

Haavelmo, Trygve (1944), The probability approach in econometrics. *Supplement to Econometrica *12:1-115.

Kalman, Rudolf (1994), Randomness Reexamined. *Modeling, Identification and Control *3:141-151.

Keynes, John Maynard (1973 (1921)), *A Treatise on Probability*. Volume VIII of *The Collected Writings of John Maynard Keynes*, London: Macmillan.

Pålsson Syll, Lars (2007), *John Maynard Keynes*. Stockholm: SNS Förlag.

Salsburg, David (2001), *The Lady Tasting Tea*. Henry Holt.

von Mises, Richard (1957), *Probability, Statistics and Truth*. New York: Dover Publications.

## 2 Comments »

RSS feed for comments on this post. TrackBack URI

### Leave a Reply

Blog at WordPress.com. | The Pool Theme.

Entries and comments feeds.

It is maybe a little unfortunate that teachers often use frequency of cars on a road as an example of Poisson distribution. But it can be used as a further teaching tool.

If I were to stand by the sida of the road counting cars, I collect a huge data set. I then do what any good economist do. I start to try and fit this to a model, a psobability density function, I create a perfect fit between my data and the model, have I gained any knowledge at all? Sadly the mathematical masturbation I have engaged in after collecting the data has not increased my knowledge at all, I have not tried to explain the data.

Now I show my data plotted in a graph to a cretin who is not schooled in economics at a prestigous university. He looks at the data and instantly says:

It looks like people start work between 8 and 9 in the morning and get off work between 16-17 in the afternoon, it also appears most people sleep at night.

Instantly and without any maths, he extracted the knowledge that was available in the data. Actually a model would not have been able to do it, no amount of mathematical excercise would be able to find that knowledge.

I propose the following excercise for economists who believe in models and statistics as tools for creating knowledge. Stand beside the road and count cars, model the data anyway you like. Collect as much data as you like. When you are satisfied with your model, put a bag over your head and cross the road without anything other than your model as guide.

Comment by Martin Kullberg— 29 May, 2013 #

[…] of these “solutions” for social sciences in general and economics in specific (see here, here, here and here). And with regards to the present article I think that since the distinction between […]

Pingback by Judea Pearl on regression and causation | Real-World Economics Review Blog— 28 September, 2013 #