Getting causality into statistics

24 Feb, 2023 at 10:39 | Posted in Statistics & Econometrics | 6 Comments

Because statistical analyses need a causal skeleton to connect to the world, causality is not extra-statistical but instead is a logical antecedent of real-world inferences. Claims of random or “ignorable” or “unbiased” sampling or allocation are justified by causal actions to block (“control”) unwanted causal effects on the sample patterns. Without such actions of causal blocking, independence can only be treated as a subjective exchangeability assumption whose justification requires detailed contextual information about absence of factors capable of causally influencing both selection (including selection for treatment) and outcomes. Otherwise it is essential to consider pathways for the causation of biases (nonrandom, systematic errors) and their interactions …

Probability is inadequate as a foundation for applied statistics, because competent statistical practice integrates logic, context, and probability into scientific inference and decision, using causal narratives to explain diverse data. Thus, given the absence of elaborated causality discussions in statistics textbooks and coursework, we should not be surprised at the widespread misuse and misinterpretation of statistical methods and results. This is why incorporation of causality into introductory statistics is needed as urgently as other far more modest yet equally resisted reforms involving shifts in labels and interpretations for P-values and interval estimates.

Sander Greenland

Causality can never be reduced to a question of statistics or probabilities unless you are — miraculously — able to keep constant all other factors that influence the probability of the outcome studied. To understand causality we always have to relate it to a specific causal structure. Statistical correlations are never enough. No structure, no causality.
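
To make the point concrete, here is a minimal simulation sketch, not a rendering of Greenland's own argument. It assumes a hypothetical hidden factor z that drives both x and y, while x has no effect on y at all. The names simulate and pearson, the sample size and the unit-variance noise are all illustrative choices. In the observational data x and y are clearly correlated; once x is assigned at random, which blocks the path from z to x, the correlation should shrink towards zero.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Return (observational, randomized) correlations between x and y.

    Assumed causal structure (purely illustrative): z -> x and z -> y,
    with no arrow from x to y.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]

    # Observational world: z influences both x and y.
    x_obs = [zi + rng.gauss(0, 1) for zi in z]
    y_obs = [zi + rng.gauss(0, 1) for zi in z]

    # Randomized world: x is assigned independently of z,
    # while y is generated exactly as before.
    x_rand = [rng.gauss(0, 1) for _ in range(n)]
    y_rand = [zi + rng.gauss(0, 1) for zi in z]

    return pearson(x_obs, y_obs), pearson(x_rand, y_rand)


if __name__ == "__main__":
    r_obs, r_rand = simulate()
    print(f"observational correlation: {r_obs:.3f}")
    print(f"randomized correlation:    {r_rand:.3f}")
```

Nothing in the observational numbers alone distinguishes this structure from one in which x really does cause y. Only knowledge of how the data were generated does that.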

Statistical patterns should never be seen as anything other than possible clues to follow. Behind observable data, there are real structures and mechanisms operating. If we really want to understand, explain and (possibly) predict things in the real world, getting hold of these is more important than simply correlating and regressing observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.


  1. Due to fundamental uncertainty, how can we be certain that there can’t be many causes which can include statistical models? Do physicists still not know what causes a quantum waveform to collapse to a particular measured quantity? Can there be as many causes of a jazz improviser’s next note as there are stories being told in each listener’s head?

    • One feature of intelligent, social behavior that does not get enough philosophic attention is that its causal patterns are overdetermined. The movement of billiard balls, considering only the classical physics of action, reaction and angles, is in every case almost perfectly determined and therefore predictable by reference to models of force and momentum. There is a small bit of “uncertainty” introduced by friction and imperfections in the shape and elasticity of balls and table surfaces and stick.
      By contrast, the behavior of billiard players is massively overdetermined. As fits the theme of this blog, the physics of billiard ball movement is arguably “ergodic” and explainable in terms of forces and inertia in the moment; history simply doesn’t matter. The players in contrast bring to bear history embodied in skills and experience and project their calculations into the future, attempting to anticipate the strategic behavior of their opponents.

        “[…] the topic is billiards. The calculations were done by Prof. Sir Michael Berry in 1978 in his paper Regular and Irregular Motion, in Nonlinear Mechanics and recounted in The Black Swan.
        If you know a set of basic parameters concerning the ball at rest, can compute the resistance of the table (quite elementary), and can gauge the strength of the impact, then it is rather easy to predict what would happen at the first hit. The second impact becomes more complicated, but possible; and more precision is called for. The problem is that to correctly compute the ninth impact, you need to take into account the gravitational pull of someone standing next to the table (modestly, Berry’s computations use a weight of less than 150 pounds). And to compute the fifty-sixth impact, every single elementary particle in the universe needs to be present in your assumptions! An electron at the edge of the universe, separated from us by 10 billion light-years, must figure in the calculations, since it exerts a meaningful effect on the outcome. (p. 178)”
        Also, why do stars (the biggest “billiard balls” we can see) in many galaxies move faster than simple physics predicts?

  2. In many instances, denying causality is a more straightforward case to make than the inference of causality. For example, an R-squared value of 0.25 cannot in and of itself constitute a causality of 25% of A from the correlated variable B. But it does imply that a direct causality of more than 25% of A from B is not likely.
    – – John Lounsbury

  3. post hoc, propter hoc only with equations

    You can protest that correlation is not causation but some will never quite grasp why that should be

    it is in stories after all — one damn thing after another
