Do RCTs really control for ‘lack of balance’?

16 Feb, 2022 at 11:14 | Posted in Statistics & Econometrics | 1 Comment

Mike Clarke, the Director of the Cochrane Centre in the UK, for example, states on the Centre’s Web site: ‘In a randomized trial, the only difference between the two groups being compared is that of most interest: the intervention under investigation’.

This seems clearly to constitute a categorical assertion that by randomizing, all other factors — both known and unknown — are equalized between the experimental and control groups; hence the only remaining difference is exactly that one group has been given the treatment under test, while the other has been given either a placebo or conventional therapy; and hence any observed difference in outcome between the two groups in a randomized trial (but only in a randomized trial) must be the effect of the treatment under test.

Clarke’s claim is repeated many times elsewhere and is widely believed. It is admirably clear and sharp, but it is clearly unsustainable … Clearly the claim taken literally is quite trivially false: the experimental group contains Mrs Brown and not Mr Smith, whereas the control group contains Mr Smith and not Mrs Brown, etc. Some restriction on the range of differences being considered is obviously implicit here; and presumably the real claim is something like that the two groups have the same means and distributions of all the [causally?] relevant factors. Although this sounds like a meaningful claim, I am not sure whether it would remain so under analysis … And certainly, even with respect to a given (finite) list of potentially relevant factors, no one can really believe that it automatically holds in the case of any particular randomized division of the subjects involved in the study. Although many commentators often seem to make the claim … no one seriously thinking about the issues can hold that randomization is a sufficient condition for there to be no difference between the two groups that may turn out to be relevant …

In sum, despite what is often said and written, no one can seriously believe that having randomized is a sufficient condition for a trial result to be reasonably supposed to reflect the true effect of some treatment. Is randomizing a necessary condition for this? That is, is it true that we cannot have real evidence that a treatment is genuinely effective unless it has been validated in a properly randomized trial? Again, some people in medicine sometimes talk as if this were the case, but again no one can seriously believe it. Indeed, as pointed out earlier, modern medicine would be in a terrible state if it were true. As already noted, the overwhelming majority of all treatments regarded as unambiguously effective by modern medicine today — from aspirin for mild headache through diuretics in heart failure and on to many surgical procedures — were never (and now, let us hope, never will be) ‘validated’ in an RCT.

John Worrall

For more on the question of ‘balance’ in randomized experiments, the recent paper by Marco Martinez & David Teira gives some valuable insights.


  1. While raising worthwhile points, most discussions I see misunderstand randomization in both causal and statistical ways. Notably, randomization can be valuable but does not induce balance in the ordinary English sense of the word, nor does it deal with most problems of real experiments. Furthermore, the use of the word “balance” to describe what randomization actually does invites confusion with the ordinary English meaning of “balance” (as does use of ordinary words like “significance” and “confidence” to describe other technical concepts).

    Causally, a controlled experiment is one in which the experimenter causally controls the causes of (inputs to) the treatment (studied cause) or the outcome (studied effect) – preferably both. A randomized experiment is one in which the causes of the treatment are fully determined by a known randomizing device (at least within levels of fully measured covariates), so that there is nothing unmeasured that causes both treatment and outcome. Provided the outcome is shielded from any effect of the randomizing device except that through (mediated by) the treatment, the random assignment variable becomes a perfect instrumental variable (IV), and statistical techniques based on such perfect IVs can be justified without recourse to dodgy empirical tests. A similar view can be found in Pearl’s book (Causality, 2nd ed. 2009).
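
The point that randomization severs the link between unmeasured causes and treatment can be illustrated with a small simulation (all numbers invented for illustration; this is a sketch, not anyone's actual analysis). An unmeasured factor `u` influences the outcome in both scenarios, but only in the observational scenario does it also influence treatment; the naive difference in means is then biased there and unbiased under randomized assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)  # unmeasured cause of the outcome

# Observational scenario: u also pushes people into treatment (confounding)
t_obs = (u + rng.normal(size=n) > 0).astype(float)
# Randomized scenario: treatment fully determined by a known randomizing device
t_rct = rng.integers(0, 2, size=n).astype(float)

true_effect = 2.0
y_obs = true_effect * t_obs + u + rng.normal(size=n)
y_rct = true_effect * t_rct + u + rng.normal(size=n)

naive_obs = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()
naive_rct = y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean()

print(naive_obs)  # biased well above the true effect of 2.0
print(naive_rct)  # close to the true effect of 2.0
```

Nothing here requires the two arms to be "balanced" on `u` in any given draw; the unbiasedness is a property of the assignment mechanism, not of the realized sample.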

    Statistically, a frequentist can use the randomization distribution of the assignment variable A to construct a reference distribution for test statistics under various models (hypotheses) about the treatment effect (usually only a test of a no-effect model is described in this fashion, but most so-called confidence intervals are summaries of tests of models across which the treatment effect but nothing else is varied). This view can be seen in writings by Stephen Senn and James Robins. In parallel, a Bayesian can use the same distribution to provide prior distributions for counterfactual outcomes under the same variety of models (Cornfield, American Journal of Epidemiology 1976).
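
The frequentist use of the randomization distribution is most transparent in a randomization (permutation) test of the sharp no-effect model. A minimal sketch with made-up trial data (the outcome values below are invented for illustration): under the hypothesis that treatment did nothing for anyone, every re-randomization of the same subjects would have produced the same outcomes, so re-shuffling assignment labels generates the reference distribution directly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outcomes from a small two-arm trial (invented data)
treated = np.array([5.1, 6.3, 4.8, 7.0, 5.9])
control = np.array([4.2, 5.0, 4.7, 5.5, 4.9])
observed = treated.mean() - control.mean()

pooled = np.concatenate([treated, control])
n_t = len(treated)
reps = 20_000
diffs = np.empty(reps)
for i in range(reps):
    perm = rng.permutation(pooled)  # re-randomize the assignment labels
    diffs[i] = perm[:n_t].mean() - perm[n_t:].mean()

# One-sided p-value under the sharp null of no effect for any subject
p = (diffs >= observed).mean()
print(observed, p)
```

Repeating this test across a family of hypothesized effect sizes (subtracting each candidate effect from the treated outcomes before shuffling) yields the P-value function mentioned below; a single confidence interval is just a summary of that family of tests.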

    Note that none of these descriptions use or need the term “balance”, nor need they make claims that randomization corresponds to no confounding (Greenland and Mansournia, European Journal of Epidemiology 2015). Proper randomization can be said to provide “balance in probability”, but for frequentists this property is over a purely hypothetical long run, while for Bayesians it is a subjective prior probability induced by knowing that allocation was random (Cornfield 1976 again). Neither use of “balance” applies to the actual state of observed trial populations, which both theories concede may be arbitrarily out of balance on unmeasured covariates due to “bad luck of the draw” (“random confounding” as per Greenland & Mansournia 2015). By properly merging the randomization distribution (information on allocation) and models for treatment effect, frequentists can deal with this chance element via P-value functions (“confidence distributions”) while Bayesians can deal with it via posterior distributions. Again, neither need invoke “balance” – and I would argue that, to avoid the confusion seen in much literature, they shouldn’t.
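
The gap between "balance in probability" and balance of an actual sample is easy to exhibit numerically. In this sketch (parameters invented for illustration), a covariate difference between randomized arms averages to roughly zero over many hypothetical repetitions, yet individual draws can be badly imbalanced – the "bad luck of the draw" the comment describes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 10_000  # a small trial, repeated hypothetically many times
imbalances = np.empty(reps)
for i in range(reps):
    x = rng.normal(size=n)                # an unmeasured covariate
    assign = rng.permutation(n) < n // 2  # randomize exactly half to treatment
    imbalances[i] = x[assign].mean() - x[~assign].mean()

print(imbalances.mean())         # near 0: "balance in probability"
print(np.abs(imbalances).max())  # a single realized trial can be far from 0
```

The long-run average is a property of the hypothetical repetitions, not a guarantee about the one trial actually run – which is exactly why describing what randomization does as "balance" invites confusion.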

    None of this should be taken as sanctifying or criticizing randomization; I am simply pointing out that randomization, if done and described precisely, does something valuable – but that is not balance of actual samples (as opposed to hypothetical infinite samples or repetitions, or bets about actual samples). Real randomized studies of humans and their groupings must deal with many other considerations such as selectivity of study subjects (hence lack of generalizability), blinding (masking) of subjects and evaluators, outcome measurement error, nonadherence, drop-out, competing risks, etc. Randomization can help deflect concerns about confounding in probability, but is no panacea, and can increase other concerns such as selectivity of participation.

    For a variety of views on randomization and its limitations see the collection of articles in this journal volume:
