What does randomisation guarantee? Nothing!

26 Jun, 2021 at 17:43 | Posted in Theory of Science & Methodology | 3 Comments

Does not randomization somehow or other guarantee (or perhaps, much more plausibly, provide the nearest thing that we can have to a guarantee) that any possible links to … outcome, aside from the link to treatment …, are broken?

Although he does not explicitly make this claim, and although there are issues about how well it sits with his own technical programme, this seems to me the only way in which Pearl could, in the end, ground his argument for randomizing. Notice, first, however, that even if the claim works then it would provide a justification, on the basis of his account of cause, only for randomizing after we have deliberately matched for known possible confounders … Once it is accepted that for any real randomized allocation known factors might be unbalanced — and more sensible defenders of randomization do accept this (though curiously, as we saw earlier, they recommend rerandomizing until the known factors are balanced rather than deliberately balancing them!) — then it seems difficult to deny that a properly matched experimental and control group is better, so far as preventing known confounders from producing a misleading outcome, than leaving it to the happenstance of the tosses …

The random allocation may ‘sever the link’ with this unknown factor or it may not (since we are talking about an unknown factor, then, by definition, we will not and cannot know which). Pearl’s claim that Fisher’s method ‘guarantees’ that the link with the possible confounders is broken is then, in practical terms, pure bluster. 

John Worrall

The point of making a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and effect indicates a causal relation. This is believed to hold since randomization (allegedly) ensures that a supposed causal variable does not correlate with other variables that may influence the effect.

The problem with that (rather simplistic) view on randomization is that the claims made are both exaggerated and, strictly speaking, false:

• Even if you manage to make the assignment to treatment and control groups ideally random, the sample selection is certainly not random, except in extremely rare cases. Even with a properly randomized assignment, if the sample is biased there is always the risk that the experimental findings will not apply elsewhere. What works ‘there’ does not work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ making the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization per se does not guarantee anything!

• Even if both sampling and assignment are made in an ideal random way, performing standard randomized experiments only gives you averages. The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer of the average causal effect being 0, those who are ‘treated’ may have causal effects equal to -100 and those ‘not treated’ may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the average effect particularly enlightening.

• There is almost always a trade-off between bias and precision. In real-world settings, a little bias is often a price worth paying for greater precision. And, most importantly, if we have a population with sizeable heterogeneity, the average treatment effect of the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

• Since most real-world experiments and trials build on a single randomization, what would happen if you kept on randomizing forever does not help you to ‘ensure’ or ‘guarantee’ that you do not draw false causal conclusions in the one particular randomized experiment you actually perform. It is indeed difficult to see why thinking about what you know you will never do would make you happy about what you actually do.
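To make the points about averages and sample selection concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption and not taken from any real trial: a hypothetical population in which half the units respond to treatment with +100 and half with -100, a helper `biased_sample` that enrols mostly positive responders, and the function names themselves. It uses only the standard library.

```python
import random
from statistics import mean

def simulate_population(n=10_000, seed=0):
    """Hypothetical population: each unit is (baseline outcome, individual effect).
    Half the units have effect +100 and half -100, so the true average effect is 0."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), 100.0 if i % 2 == 0 else -100.0)
            for i in range(n)]

def biased_sample(units, n_sample=1000, share_positive=0.8):
    """Non-random enrolment (an assumption): mostly positive responders get in."""
    positives = [u for u in units if u[1] > 0]
    negatives = [u for u in units if u[1] < 0]
    n_pos = round(n_sample * share_positive)
    return positives[:n_pos] + negatives[:n_sample - n_pos]

def run_trial(units, seed=1):
    """One coin-flip randomization; returns the difference in mean outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for baseline, effect in units:
        if rng.random() < 0.5:
            treated.append(baseline + effect)  # observed outcome under treatment
        else:
            control.append(baseline)           # observed outcome under control
    return mean(treated) - mean(control)

population = simulate_population()
sample = biased_sample(population)

print("true average effect, population:", mean(e for _, e in population))
print("trial estimate on whole population:", round(run_trial(population), 2))
print("true average effect, biased sample:", mean(e for _, e in sample))
print("trial estimate on biased sample:", round(run_trial(sample, seed=2), 2))
```

In this toy setup the trial on the whole population should return an estimate near 0 even though no single unit has an effect anywhere near 0, and the perfectly randomized trial on the biased sample should estimate that sample's average effect (60 by construction), not the population's.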

The problem many ‘randomistas’ end up with when underestimating heterogeneity and interaction is not only an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal validity problem for the millions of regression estimates that economists produce every year.

‘Ideally controlled experiments’ tell us with certainty what causes what effects, but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural, or quasi) experiments to different settings, populations, or target systems is not easy. And since trials usually are not repeated, unbiasedness and balance on average over repeated trials say nothing about any one trial. ‘It works there’ is no evidence for ‘it will work here.’ Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods (and ‘on-average-knowledge’) is despairingly small.
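The difference between balance ‘on average over repeated trials’ and balance in any one trial can be illustrated with a small simulation. This is only a sketch under stated assumptions: a single hypothetical prognostic covariate drawn from a standard normal distribution, equal-sized arms, and made-up function names. It re-randomizes the same units many times and records the treated-minus-control difference in covariate means for each allocation.

```python
import random
from statistics import mean, pstdev

def imbalance(covariate, rng):
    """Difference in covariate means (treated minus control) for one random split."""
    idx = list(range(len(covariate)))
    rng.shuffle(idx)
    half = len(idx) // 2
    treated = [covariate[i] for i in idx[:half]]
    control = [covariate[i] for i in idx[half:]]
    return mean(treated) - mean(control)

def imbalance_distribution(n_units, n_randomizations=1000, seed=0):
    """Re-randomize the same hypothetical units many times and collect imbalances."""
    rng = random.Random(seed)
    covariate = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    return [imbalance(covariate, rng) for _ in range(n_randomizations)]

for n in (10, 100, 1000):
    diffs = imbalance_distribution(n)
    print(f"n={n:5d}  mean imbalance={mean(diffs):+.3f}  "
          f"spread={pstdev(diffs):.3f}  "
          f"worst single allocation={max(abs(d) for d in diffs):.3f}")
```

Across many allocations the mean imbalance should sit close to zero for every sample size, which is the sense in which randomization is unbiased. The spread and the worst single allocation are what matter for the one trial actually run, and in small trials they can be large.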

RCTs have very little reach beyond giving descriptions of what has happened in the past. From the perspective of the future and for policy purposes they are as a rule of limited value since they cannot tell us what background factors were held constant when the trial intervention was being made.

RCTs usually do not provide evidence that the results are exportable to other target systems. RCTs cannot be taken for granted to give generalisable results. That something works somewhere for someone is no warranty for us to believe it to work for us here or even that it works generally.

Randomisation may often — in the right contexts — help us to draw causal conclusions. But it certainly is not necessary to secure scientific validity or establish causality. Randomisation guarantees nothing. Just as observational studies may be subject to different biases, so are randomised studies and trials.

3 Comments

  1. While I agree with much in these two comments, I think they go way overboard and become utterly, logically wrong when they state that “randomisation guarantees nothing”. And I mean 2+2=5 type wrong. Drop that extreme statement and focus on the problem that what randomization guarantees is narrow and far from everything the real-world applied scientist needs to know about a study to judge its import.

    The information randomization does provide (what it “guarantees”) is technical: It is only about the study considered in strictest isolation from its context and intended use (which limits it right there). That information is proportional to its effective sample size (as measured for example by its Fisher information); randomization of 2 units tells us almost nothing (only that the direction of confounding is random rather than selected by someone) while randomization of a million units tells us quite a lot more (that the potential outcomes will appear balanced). Real studies are between these extremes and their position is probability-measurable via information metrics such as Fisher’s and its transforms (such as standard errors). The details of this provision are explained by Senn at several places, e.g. in “Seven myths of randomization in clinical trials”
    https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.5713
    and subsequent discussions such as “Randomisation isn’t perfect but doing better is harder than you think”

    Regardless, randomization alone says nothing at all about how the study results will apply to the next unit we see, and yet forecasting about units outside the study is the whole purpose of most studies. The credibility of randomization critics would be well served if that point were emphasized instead of sweeping erroneous claims like “randomization guarantees nothing”. The problem is it guarantees just one item among the many needed to interpret and apply the study results, a limitation which “randomistas” underplay.

    • I did read Senn’s article a couple of times when it was published back in 2013. There are parts of it that I agree with. But I still think that most of the critique raised by people like Freedman, Cartwright, Deaton, and Worrall is valid. Randomization is usually an effective (but not necessarily the only or always best) way to get rid of selection bias. But there are other forms of biases and (mostly unobservable) confounding factors that REAL randomization does not GUARANTEE to ‘control for’. Most real randomizations (and here I am not talking of the theoretical construct ‘ideal randomization’) are not done more than once, and so (as Freedman used to stress) what would happen if we performed infinite re-randomization in ‘superpopulations’ or ‘hypothetical populations’ is of dubious practical real-world value. And it still is a fact that most RCTs performed, although randomizing assignment, do not build on randomized populations (which, as is common knowledge, may lead to severe problems of ‘transportability’, ‘generalizability’, ‘external validity’ etc). Although these ‘problematiques’ are not directed especially at the causation question, they certainly do apply to that question. REAL randomizations performed do not, from an epistemological point of view, give us any guarantees when it comes to causality. At times they may certainly be of great value. At times they may be the best ‘gold standard’ we can hope for. But they do not give us any guarantees.

      • Most if not all of any disagreement here is the nuance we assign to “guarantees” – Nothing is guaranteed except death, taxes, and semantic disagreements, so I would try to avoid the word. But randomization does supply an instrumental variable, namely the randomized treatment-assignment indicator. In fact one modern causal view is that the only purpose of randomization is creation of a variable one can credibly argue is instrumental for estimating effects of treatment on the randomized cohort; e.g., see this primer: Greenland S. “An introduction to instrumental variables for epidemiologists”, Int J Epidemiol 2000;29:722–729 (the appendix was misprinted; see the Erratum: Int J Epidemiol 2000;29:1102). For a much more advanced and detailed treatment see Hernan & Robins, “Instruments for Causal Inference”, Epidemiology 2006;17:360-372.

        In ordinary clinical trials, random assignment is a very strong instrument compared to what are often offered as instruments in nonexperimental studies. Still, I agree completely and it has long been argued that confounding is left after randomization and needs to be dealt with by further covariate adjustment (e.g., see Rothman KJ. “Epidemiologic methods in clinical trials”, Cancer 1977;39:1771–5). And none of that adjustment addresses the transportability of the results (or rather, lack thereof). Hence I also agree that the obsession with randomized trials as a “gold standard” has perversely shifted attention away from the adjustments such trials need and the extreme selectivity of trial cohorts.

