## On probabilism and statistics

27 Jan, 2018 at 16:51 | Posted in Statistics & Econometrics | 1 Comment

‘Mr Brown has exactly two children. At least one of them is a boy. What is the probability that the other is a girl?’ What could be simpler than that? After all, the other child either is or is not a girl. I regularly use this example on the statistics courses I give to life scientists working in the pharmaceutical industry. They all agree that the probability is one-half.

So they are all wrong. I haven’t said that the older child is a boy. The child I mentioned, the boy, could be the older or the younger child. This means that Mr Brown can have one of three possible combinations of two children: both boys, elder boy and younger girl, elder girl and younger boy, the fourth combination of two girls being excluded by what I have stated. But of the three combinations, in two cases the other child is a girl so that the requisite probability is 2/3 …

This example is typical of many simple paradoxes in probability: the answer is easy to explain but nobody believes the explanation. However, the solution I have given is correct.

Or is it? That was spoken like a probabilist. A probabilist is a sort of mathematician. He or she deals with artificial examples and logical connections but feels no obligation to say anything about the real world. My demonstration, however, relied on the assumption that the three combinations boy–boy, boy–girl and girl–boy are equally likely and this may not be true. The difference between a statistician and a probabilist is that the latter will define the problem so that this is true, whereas the former will consider whether it is true and obtain data to test its truth.
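The dependence on that assumption can be made concrete with a small sketch. The function below is purely illustrative (the name `prob_other_is_girl` and the parameter `p_boy` are my own, not part of the quoted text). It assumes the two births are independent and that each child is a boy with probability `p_boy`, enumerates the four elder/younger combinations, and conditions on at least one boy.

```python
from fractions import Fraction
from itertools import product


def prob_other_is_girl(p_boy=Fraction(1, 2)):
    """P(other child is a girl | at least one boy).

    Assumes independent births, each a boy with probability p_boy
    (an illustrative modelling assumption, not a fact about Mr Brown).
    """
    p = {"B": p_boy, "G": 1 - p_boy}
    at_least_one_boy = Fraction(0)
    one_boy_one_girl = Fraction(0)
    # Enumerate (elder, younger): BB, BG, GB, GG.
    for elder, younger in product("BG", repeat=2):
        weight = p[elder] * p[younger]
        if "B" in (elder, younger):       # GG is excluded by the statement
            at_least_one_boy += weight
            if "G" in (elder, younger):   # the other child is a girl
                one_boy_one_girl += weight
    return one_boy_one_girl / at_least_one_boy


print(prob_other_is_girl())                   # 2/3
print(prob_other_is_girl(Fraction(51, 100)))  # 98/149
```

With equally likely sexes the answer is exactly 2/3. Nudging `p_boy` to 0.51 already moves it to 98/149, about 0.658, and dropping independence would change it again. Which inputs are right is a question for data, not for the enumeration.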

Statistical reasoning certainly seems paradoxical to most people.

Take for example the well-known Simpson’s paradox.

From a theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities, unless you are — miraculously — able to keep constant all other factors that influence the probability of the outcome studied.

To understand causality we always have to relate it to a specific causal structure. Statistical correlations are never enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. If you run a logistic regression to find the odds ratios (and probabilities) of admission for men and women, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against: say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), and 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted), giving male-to-female odds ratios of 0.63 and 0.37.
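The arithmetic behind the reversal is easy to check by hand or with a few lines of code. The sketch below uses only the admission counts from the example. The helper names (`odds`, `odds_ratio`, `pooled`) are my own, and it computes raw odds ratios directly from the counts rather than fitting a logistic regression. With a single binary regressor the two should give the same odds ratio.

```python
def odds(admitted, applicants):
    """Odds of admission: admitted divided by rejected."""
    return admitted / (applicants - admitted)


def odds_ratio(male, female):
    """Male-to-female odds ratio from (admitted, applicants) pairs."""
    return odds(*male) / odds(*female)


# (admitted, applicants) per department and sex, taken from the example.
data = {
    "economics": {"male": (680, 800), "female": (90, 100)},
    "physics": {"male": (20, 200), "female": (210, 900)},
}


def pooled(sex):
    """Sum admitted and applicants over departments for one sex."""
    admitted = sum(dept[sex][0] for dept in data.values())
    applicants = sum(dept[sex][1] for dept in data.values())
    return admitted, applicants


overall = odds_ratio(pooled("male"), pooled("female"))
by_department = {
    name: odds_ratio(dept["male"], dept["female"]) for name, dept in data.items()
}

print(f"pooled: {overall:.2f}")
for name, ratio in by_department.items():
    print(f"{name}: {ratio:.2f}")
```

Pooled over departments the ratio is about 5.44 in favour of males, yet within each department it falls below one (roughly 0.63 for economics and 0.37 for physics). The same counts support opposite conclusions depending on whether department is taken into account, and nothing in the numbers themselves says which breakdown reflects the causal structure.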

Econometric patterns should never be seen as anything other than possible clues to follow. From a critical realist perspective it is obvious that behind observable data there are real structures and mechanisms operating, things that are, if we really want to understand, explain and (possibly) predict things in the real world, more important to get hold of than to simply correlate and regress observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.

Paul Romer

## 1 Comment

1. Senn’s example is very similar to one I discuss at https://djmarsay.wordpress.com/notes/puzzles/the-two-daughter-problem/. We both make an important distinction between what he calls Probabilists and statisticians. But, whereas Senn claims that Probabilists keep quiet about reality, I can only wish it were so. Probabilists are often not only steeped in probability theory, but also human.

For example, at popular maths lectures dealing with probability it is very common for the lecturer to get out a coin and ask for the probabilities of various combinations of ‘Heads’ and ‘Tails’. It seems to me that, far from keeping quiet on the grounds that their mathematics is silent on such questions, those who have (dangerously) studied a little probability theory will often be the first to shout out, particularly for the more obscure combinations. So far, they have always been very wrong (the coin is generally double-headed).

The problem is that the word ‘probability’ is used in different mathematical theories, and (as in other areas) the unwary can easily conflate the meanings, at best confusing everyone. In the lectures there is a case to be made that the lecturer seemed to be asking for the Probabilists’ probability and that it was reasonable for the Probabilists to be taken by surprise by how things turned out, as such things were outside their domain of expertise. So the lesson for practical people is that when offered a probability estimate they should enquire whether it is just a Probabilists’ estimate or more soundly based. A discussion of the Principle of Indifference might be a good place to start.
