Statisticism — confusing statistics and research

20 May, 2018 at 12:46 | Posted in Statistics & Econometrics | 3 Comments

Coupled with downright incompetence in statistics, we often find the syndrome that I have come to call statisticism: the notion that computing is synonymous with doing research, the naïve faith that statistics is a complete or sufficient basis for scientific methodology, the superstition that statistical formulas exist for evaluating such things as the relative merits of different substantive theories or the “importance” of the causes of a “dependent variable”; and the delusion that decomposing the covariations of some arbitrary and haphazardly assembled collection of variables can somehow justify not only a “causal model” but also, praise the mark, a “measurement model.” There would be no point in deploring such caricatures of the scientific enterprise if there were a clearly identifiable sector of social science research wherein such fallacies were clearly recognized and emphatically out of bounds.

Dudley Duncan

Statistical reasoning certainly seems paradoxical to most people.

Take for example the well-known Simpson’s paradox.

From a theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities unless you are — miraculously — able to keep constant all other factors that influence the probability of the outcome studied.

To understand causality we always have to relate it to a specific causal structure. Statistical correlations are never enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against. Say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted). The male-to-female odds ratios within the two departments are then 0.63 and 0.37.
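The arithmetic behind this example is easy to check by hand, but a short script makes the reversal explicit. What follows is only a sketch in Python: it uses the counts given above, computes the odds directly from those counts instead of fitting a logistic regression, and the names (`ADMISSIONS`, `odds`, `odds_ratio`, `pooled`) are my own illustrative choices.

```python
# Counts taken from the admissions example in the text: (applicants, admitted).
ADMISSIONS = {
    "economics": {"male": (800, 680), "female": (100, 90)},
    "physics": {"male": (200, 20), "female": (900, 210)},
}


def odds(applicants, admitted):
    """Odds of admission: number admitted divided by number rejected."""
    return admitted / (applicants - admitted)


def odds_ratio(male, female):
    """Male odds of admission relative to female odds."""
    return odds(*male) / odds(*female)


def pooled(sex):
    """Sum applicants and admitted over all departments for one sex."""
    applicants = sum(dept[sex][0] for dept in ADMISSIONS.values())
    admitted = sum(dept[sex][1] for dept in ADMISSIONS.values())
    return applicants, admitted


# Ignoring departments: males look strongly favoured.
aggregate_or = odds_ratio(pooled("male"), pooled("female"))

# Within each department: the direction flips.
department_or = {
    name: odds_ratio(dept["male"], dept["female"])
    for name, dept in ADMISSIONS.items()
}

print(f"aggregate odds ratio: {aggregate_or:.2f}")   # 5.44
for name, value in department_or.items():
    print(f"{name} odds ratio: {value:.2f}")         # 0.63, then 0.37
```

The pooled odds ratio is well above one while both within-department ratios are below one. Nothing in the numbers themselves tells us which comparison is the relevant one; that depends on the causal structure we assume.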

Statistical (and econometric) patterns should never be seen as anything other than possible clues to follow. Behind observable data, there are real structures and mechanisms operating. If we really want to understand, explain and (possibly) predict things in the real world, getting hold of these is more important than simply correlating and regressing observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.

Lars P. Syll


  1. As often, I agree with the substantive point, but not the pithy summary. Suppose that Alf claims that a coin is fair, Bert tosses it 1,000 times and Charlene does the stats. She may be able to establish the truth value of the fact that Bert’s tosses are consistent (or not) with Alf’s claims. Setting aside some technicalities, I think this is a reasonable 3/4 truth. But this says nothing about what might happen if Alf then uses the coin in a gamble. It could be that Alf knows how to achieve a significant bias, but Bert doesn’t.

    I am not aware of a single practical example where the above limitation does not apply. As with probabilities, you can get something that is both useful and ‘true’, but there is normally a gap with what you really want to know, and sometimes this gap matters. My ‘beef’ is that while ‘proper’ statisticians (at least in the UK) are well aware of this, they don’t always communicate it to decision-makers. (Judges, politicians, regulators, investors, … .) But then, I don’t seem to be able to do much better.

    • Dave, maybe it delivers “a reasonable 3/4 truth” but don’t forget that often
      1/2 truth + 1/2 truth = LIE 🙂

  2. So do you break down statistics also for prediction purposes (in addition to causal inference)?
    I agree that causal inference is a problem with observational data (rather than experimental data), but for prediction tasks the distinction between correlation and causal effect doesn’t matter.
