Randomness reexamined (wonkish)

9 February, 2016 at 15:39 | Posted in Statistics & Econometrics | 10 Comments

If the centuries-old struggle with the problem of finding an analytical definition of probability has produced only endless controversies between the various doctrines, it is, in my opinion, because too little attention has been paid to the singular notion of random. For the dialectical root, in fact, lies in this notion: probability is only an arithmetical aspect of it.

Nicholas Georgescu-Roegen

Modern probabilistic econometrics relies on the notion of probability. To be amenable to econometric analysis at all, economic observations allegedly have to be conceived as random events.

But is it really necessary to model the economic system as a system where randomness can only be analyzed and understood when based on an a priori notion of probability?

In probabilistic econometrics, events and observations are as a rule interpreted as random variables, as if generated by an underlying probability density function and, a fortiori (since probability density functions are only definable in a probability context), as consistent with a probability measure. As Haavelmo has it in ‘The Probability Approach in Econometrics’ (1944):

For no tool developed in the theory of statistics has any meaning – except, perhaps for descriptive purposes – without being referred to some stochastic scheme.

When attempting to convince us of the necessity of founding empirical economic analysis on probability models, Haavelmo – building largely on the earlier Fisherian paradigm – actually forces econometrics to (implicitly) interpret events as random variables generated by an underlying probability density function.

This is at odds with reality. Randomness obviously is a fact of the real world. Probability, on the other hand, attaches to the world via intellectually constructed models, and a fortiori is only a fact of a probability-generating machine or a well-constructed experimental arrangement or “chance set-up”.

Just as there is no such thing as a “free lunch,” there is no such thing as a “free probability.” To be able to talk about probabilities at all, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events – in statistics one refers to any process where you observe or measure as an experiment (rolling a die) and the results obtained as the outcomes or events (number of points rolled with the die, being e.g. 3 or 5) of the experiment – there is, strictly speaking, no event at all.

Probability is a relational element. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be shown to coincide with (or at least converge to) real data generating processes or structures – something seldom or never done!

And this is the basic problem with economic data. If you have a fair roulette-wheel, you can arguably specify probabilities and probability distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution, etc.? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probability distributions!

From a realistic point of view we really have to admit that the socio-economic states of nature that we talk of in most social sciences – and certainly in econometrics – are not amenable to analysis as probabilities, simply because in the real-world open systems that the social sciences – including econometrics – analyze, there are no probabilities to be had!

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot really be maintained – as in the Haavelmo paradigm of probabilistic econometrics – that it even should be mandatory to treat observations and data – whether cross-section, time series or panel data – as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes – at least outside of nomological machines like dice and roulette-wheels – are not self-evidently best modeled with probability measures.

If we agree on this, we also have to admit that probabilistic econometrics lacks sound foundations. I would even go further and argue that there really is no justifiable rationale at all for this belief that all economically relevant data can be adequately captured by a probability measure. In most real world contexts one has to argue one’s case. And that is obviously something seldom or never done by practitioners of probabilistic econometrics.

Econometrics and probability are intermingled with randomness. But what is randomness?

In probabilistic econometrics it is often defined with the help of independent trials, such as drawing cards from a deck, picking balls from an urn, spinning a roulette wheel or tossing coins. Two events are said to be independent if the occurrence or nonoccurrence of either one has no effect on the probability of the occurrence of the other. But such trials are only definable if somehow set in a probabilistic context.
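To make the point about independence concrete, here is a minimal sketch, assuming a hypothetical chance set-up of two fair six-sided dice. The sample space, the uniform measure and the three events are my own illustrative choices, not anything taken from Haavelmo or Kalman. What it shows is simply that "independent" cannot even be checked until the sample space and its probability measure have been written down.

```python
from fractions import Fraction
from itertools import product

# Hypothetical chance set-up (an assumption for illustration):
# two fair six-sided dice, all 36 ordered outcomes equally likely.
SAMPLE_SPACE = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under the assumed uniform measure."""
    favourable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favourable, len(SAMPLE_SPACE))

def independent(a, b):
    """True if P(A and B) equals P(A) * P(B) within the assumed model."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_is_even(outcome):
    return outcome[0] % 2 == 0

def second_is_three(outcome):
    return outcome[1] == 3

def sum_is_eight(outcome):
    return outcome[0] + outcome[1] == 8

print(independent(first_is_even, second_is_three))  # True
print(independent(first_is_even, sum_is_eight))     # False
```

Change the model (load one die, say) and the same pair of events may switch from independent to dependent. The verdict belongs to the model, not to the events as such.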

But if we pick a sequence of prices – say 2, 4, 3, 8, 5, 6, 6 – that we want to use in an econometric regression analysis, how do we know that the sequence of prices is random and, a fortiori, that we are able to treat it as generated by an underlying probability density function? How can we argue that the sequence is a sequence of probabilistically independent random prices? And are they really random in the sense that is most often applied in probabilistic econometrics – where X is called a random variable only if there is a sample space S with a probability measure and X is a real-valued function over the elements of S?
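A small sketch may help here. It takes the price sequence above and computes the probability of observing exactly that sequence under three hypothetical chance set-ups, each treating prices as independent draws, uniform on a range of integers. The three ranges and the function name are assumptions made purely for illustration; nothing in the data tells us which, if any, is appropriate.

```python
from fractions import Fraction

prices = [2, 4, 3, 8, 5, 6, 6]  # the sequence from the text

def iid_uniform_probability(sequence, lowest, highest):
    """Probability of the exact sequence if every element were an independent
    draw, uniform on the integers lowest..highest (a hypothetical model)."""
    if any(not (lowest <= x <= highest) for x in sequence):
        return Fraction(0)  # the assumed model rules this sequence out
    support_size = highest - lowest + 1
    return Fraction(1, support_size) ** len(sequence)

model_a = iid_uniform_probability(prices, 1, 8)    # assumed support 1..8
model_b = iid_uniform_probability(prices, 1, 10)   # assumed support 1..10
model_c = iid_uniform_probability(prices, 1, 6)    # assumed support 1..6

print(model_a)  # 1/2097152
print(model_b)  # 1/10000000
print(model_c)  # 0
```

The same seven numbers receive three different probabilities, one of them zero. The sequence itself carries no probability; only the sequence together with a specified model does.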

Bypassing the scientific challenge of going from describable randomness to calculable probability by just assuming it is of course not an acceptable procedure. Since a probability density function is a “Gedanken” object that does not exist in a natural sense, it has to come with an export license to our real target system if it is to be considered usable.

Among those who at least honestly try to face the problem, the usual procedure is to refer to some artificial mechanism, operating in some “game of chance” of the kind mentioned above, which generates the sequence. But then we still have to show that the real sequence somehow coincides with the ideal sequence that defines independence and randomness within our – to speak with philosopher of science Nancy Cartwright – “nomological machine”, our chance set-up, our probabilistic model.

As Rudolf Kalman, the originator of the Kalman filter, notes in ‘Randomness Reexamined’ (1994):

Not being able to test a sequence for ‘independent randomness’ (without being told how it was generated) is the same thing as accepting that reasoning about an “independent random sequence” is not operationally useful.

Probability is a property of the model we choose to use in our endeavour to understand and explain the world in which we live — but, contrary to randomness, probability is not a property of that world.

So why should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not exist at all – without specifying such system-contexts (how many sides do the dice have, are the cards unmarked, etc.).

If we do adhere to the Fisher-Haavelmo paradigm of probabilistic econometrics we also have to assume that all noise in our data is probabilistic and that errors are well-behaved, something that is hard to justify as a real phenomenon, and not just as an operationally and pragmatically tractable assumption.

Maybe Kalman’s verdict that

Haavelmo’s error that randomness = (conventional) probability is just another example of scientific prejudice

is, seen from this perspective, not far-fetched.

Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just as Fisher’s “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’ “collective” or Gibbs’ “ensemble” – also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

This importantly also means that if you cannot show that data satisfies all the conditions of the probabilistic nomological machine – including randomness – then the statistical inferences used lack sound foundations.



  1. There are similarities between medical and economic statistics and their use.
    The argument here suggests that President Obama is fundamentally mistaken in his recent proposal for $1.8 billion in emergency funding to respond to the Zika virus.

    Regarding Zika infections and microcephaly, Obama says: “There’s enough correlation that we have to take this very seriously”.
    This argument fails to satisfy Prof. Syll’s requirement that “all the conditions of the probabilistic nomological machine – including randomness” must be satisfied. So, according to Prof. Syll, “the statistical inferences used lack sound foundations.”

    If so, on what basis should we respond to Zika other than on the basis of correlations within the data? Apparently there is no known explanation for the observed correlation.

    Should we do nothing until there is progress in medical understanding of the disease?
    Should we wait until a new scientific methodology or new and practical econometric philosophy is found?

    • We should follow the master and say: “We simply do not know!”. Obama’s claim lacks sound ontological foundations and does not have an export license to the target system (usually the real world).

  2. The games of a casino are deliberately designed to produce random outcomes and nothing more. The operations of a factory, say, are designed to produce an outcome of near certainty and value, and the observed randomness belongs to the unintended residual, beyond the limits of control. Failing to place randomness in a context of intention, knowledge and control creates confusion.

  3. If we take Keynes’ treatise seriously then we can never know that a roulette wheel is ‘really’ random, but we can still apply probability theory. But we shouldn’t claim too much for the result.

  4. “The games of a casino are deliberately designed to produce random outcomes and nothing more.”

    If that were the case, the House would never make a profit.

    “deliberately designed” suggests there is nothing random about it.

    • Yes, I suppose that is the point I was fumbling to make: only a finely controlled process generates a well-defined probability distribution. Somehow, the context of such control has to come into the center of how we think about the meanings of information, knowledge, certainty, variability and risk.

    • Why on earth couldn’t the House make a profit if the outcome of games at a casino was random? The odds matter anyway.

      • So how is it that the House makes a profit?

      • In roulette, for instance, if the outcome follows a uniform distribution over the pockets of a European single-zero wheel, the probability of red (or black) is 18/37, and the expected return on an even-money bet is therefore less than the amount staked. The expected profit for the house is therefore positive.

  5. Thanks Pontus. Didn’t realize there were more than 30 slots in the wheel.
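
The roulette arithmetic discussed in the comments above can be sketched in a few lines. This is only an illustration, assuming a standard European single-zero wheel (18 red, 18 black and one green pocket, 37 in all), a uniform distribution over pockets, and an even-money payout on red. The wheel layout is my assumption, since the comments do not specify one, and the function name is hypothetical.

```python
from fractions import Fraction

def expected_return_even_money(winning_pockets, total_pockets, stake=1):
    """Expected net return to the player on an even-money bet, assuming every
    pocket is equally likely: win the stake with probability p, lose it otherwise."""
    p_win = Fraction(winning_pockets, total_pockets)
    return stake * p_win - stake * (1 - p_win)

# Assumed European wheel: 18 red pockets out of 37.
player_edge = expected_return_even_money(18, 37)
house_edge = -player_edge

print(player_edge)  # -1/37
print(house_edge)   # 1/37
```

Under these assumptions the player expects to lose 1/37 of the stake per bet on red, which is where the house's positive expected profit comes from even though each individual spin is random.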
