What RCTs can and cannot tell us

5 May, 2023 at 09:54 | Posted in Statistics & Econometrics | 4 Comments

Unfortunately, social sciences’ hope that we can control simultaneously for a range of factors like education, labor force attachment, discrimination, and others is simply more wishful thinking.

The problem is that the causal relations underlying such associations are so complex and so irregular that the mechanical process of regression analysis has no hope of unpacking them. One hope for quantitative researchers who recognize the problems I have discussed is the use of experimentation – with the preferred terminology these days being randomized controlled trials (RCTs). RCTs supposedly get around the issues faced by regression analysis through the use of careful physical, experimental controls instead of statistical ones. The idea is that doing so will let one look at the effect of an individual factor, such as whether a student attended a particular reading program. To do this, one randomly assigns students to an experimental group and a control group, which, in theory, will allow for firm attribution of cause and effect. Having done this, one hopes that the difference in achievement between the groups is a result of being in the reading program.

Unfortunately, it may or may not be. You still have the problem that social and pedagogical processes are so complex, with so many aspects to account for, that along some relevant dimensions the control and experimental groups will not be similar. That is, if you look closely at all potentially relevant factors, control groups almost always turn out to be systematically different from the experimental group, and the result is that we no longer have the ability to make clear inferences. Instead, we need to use some form of statistical analysis to control for differences between the two groups. The application of statistical controls, however, becomes an ad hoc exercise, even worse than the causal modeling regression approach: in the latter there is at least a pretence of developing a complete model of potentially intervening variables, whereas in the former a few covariates are selected rather arbitrarily as controls. In the end, one does not know whether to attribute achievement differences to the reading program or to other factors.

Steven Klees

Klees’ interesting article highlights some of the fundamental problems with the present idolatry of ‘evidence-based’ policies and randomization designs in the field of education. Unfortunately, we face the same problems in economics.
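
Klees’ point about ad hoc statistical controls is easy to reproduce in a toy simulation. The sketch below uses made-up variables and numbers throughout (a hypothetical unmeasured ‘motivation’ that drives both program take-up and achievement, with ‘prior score’ and ‘parental support’ as the covariates a researcher might happen to pick) and shows how the estimated ‘program effect’ swings with the chosen control set even though the true effect is zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

motivation = rng.normal(0, 1, n)                     # unmeasured driver
prior = 0.8 * motivation + rng.normal(0, 0.6, n)     # measured covariate 1
support = 0.5 * motivation + rng.normal(0, 0.9, n)   # measured covariate 2

# Program take-up is tilted along the unmeasured dimension:
program = (motivation + rng.normal(0, 1.5, n)) > 0

# The true program effect is ZERO; achievement is driven by motivation alone.
score = 10 * motivation + rng.normal(0, 3, n)

def program_effect(*controls):
    """OLS coefficient on the program dummy for a given ad hoc control set."""
    X = np.column_stack([np.ones(n), program.astype(float), *controls])
    return np.linalg.lstsq(X, score, rcond=None)[0][1]

print(f"no controls:      {program_effect():+.2f}")
print(f"prior only:       {program_effect(prior):+.2f}")
print(f"prior + support:  {program_effect(prior, support):+.2f}")
# Three different 'program effects' for a program whose true effect is zero.
```

Which of the three numbers one reports depends entirely on which covariates one happened to select, which is exactly Klees’ point.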

The point of running a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and effect indicates a causal relation. This is believed to hold since randomization (allegedly) ensures that the supposed causal variable does not correlate with other variables that may influence the effect.
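
To see what that claim amounts to, here is a minimal simulation sketch (numpy only, with made-up numbers: a hypothetical unobserved ‘ability’ confounder and a 50-subject trial). Averaged over many re-randomizations the groups are indeed balanced on the confounder, but, as the points below stress, an actual trial gets exactly one draw:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50                           # a small trial (hypothetical size)
ability = rng.normal(0, 1, n)    # an unobserved confounder

# Imbalance on 'ability' between treated and control, across many
# hypothetical re-randomizations of the same 50 subjects:
gaps = []
for _ in range(10_000):
    treated = rng.permutation(n) < n // 2    # random half to treatment
    gaps.append(ability[treated].mean() - ability[~treated].mean())
gaps = np.array(gaps)

print(f"average imbalance over 10,000 draws: {gaps.mean():+.3f}")   # ~0
# A real trial gets ONE draw, and that draw can be far from balanced:
print(f"single draws off by more than 0.5 sd: {(np.abs(gaps) > 0.5).mean():.1%}")
```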

The problem with that simplistic view of randomization is that the claims made are exaggerated and sometimes even false:

• Even if you manage to make the assignment to treatment and control groups ideally random, the sample selection almost certainly is not: except in extremely rare cases, trial participants are not a random draw from the population of interest. Even with a properly randomized assignment, if we apply the results to a biased sample there is always the risk that the experimental findings will not carry over. What works ‘there’ does not necessarily work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ that we make the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization per se guarantees nothing!

• Even if both sampling and assignment are made in an ideally random way, performing standard randomized experiments only gives you averages. The problem is that although we may get an estimate of the ‘true’ average causal effect, this average may ‘mask’ important heterogeneity in the underlying causal effects. We may get the right answer that the average causal effect is 0 while, underneath, half the population reacts to treatment with an effect of +100 and the other half with an effect of -100 (the simulation sketch after this list illustrates the point). Contemplating whether or not to be treated, most people would want to know about this underlying heterogeneity and would not consider the average effect particularly enlightening.

• There is almost always a trade-off between bias and precision, and in real-world settings a little bias may well be an acceptable price for greater precision. Most importantly, if we have a population with sizeable heterogeneity, the average treatment effect in the trial sample may differ substantially from the average treatment effect in the population (this, too, shows up in the sketch after the list). If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

• Most real-world experiments and trials build on a single randomization. Knowing what would happen if you kept on randomizing forever does not ‘ensure’ or ‘guarantee’ that you avoid false causal conclusions in the one particular randomized experiment you actually perform. It is indeed difficult to see why thinking about what you know you will never do should make you confident about what you actually do.

• And then there is also the problem that ‘Nature’ may not always supply us with the random experiments we are most interested in. If we are interested in X, why should we study Y only because design dictates that? Method should never be prioritized over substance!
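
The second and third points above can also be put in numbers. The following sketch (again numpy with made-up figures, not any actual trial) posits a hypothetical population in which half react to treatment with +100 and half with -100, so the true average effect is 0, and then runs one trial on a random sample and one on a selected pool:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population: half react with +100, half with -100,
# so the 'true' average causal effect is exactly 0.
n = 100_000
effect = np.where(rng.random(n) < 0.5, 100.0, -100.0)
print(f"population average effect: {effect.mean():+.1f}")        # ~0

# An ideal RCT on a *random* sample recovers that average ...
sample = rng.choice(n, 2_000, replace=False)
treated = rng.random(2_000) < 0.5
outcome = rng.normal(0, 10, 2_000) + treated * effect[sample]
ate = outcome[treated].mean() - outcome[~treated].mean()
print(f"estimated ATE, random sample: {ate:+.1f}")               # ~0

# ... but if the trial recruits from a biased pool (here: only the
# +100 reactors), its ATE says little about the population:
biased = np.flatnonzero(effect > 0)[:2_000]
treated_b = rng.random(2_000) < 0.5
outcome_b = rng.normal(0, 10, 2_000) + treated_b * effect[biased]
ate_b = outcome_b[treated_b].mean() - outcome_b[~treated_b].mean()
print(f"estimated ATE, biased sample: {ate_b:+.1f}")             # ~+100
```

The random-sample trial dutifully reports an average effect of roughly zero, which tells an individual facing effects of +100 or -100 almost nothing; the selected-pool trial reports roughly +100, which tells the rest of the population even less.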

Nowadays many mainstream economists maintain that ‘imaginative empirical methods’ — especially ‘as-if-random’ natural experiments and RCTs — can help us answer questions concerning the external validity of economic models. In this view, such experiments are, more or less, tests of ‘an underlying economic model’ and enable economists to make the right selection from the ever-expanding ‘collection of potentially applicable models.’

It is widely believed among mainstream economists that the scientific value of randomization — contrary to other methods — is more or less uncontroversial and that randomized experiments are free from bias. Looked at carefully, however, there are in fact few real reasons to share this optimism about the alleged ‘experimental turn’ in economics. Strictly seen, randomization guarantees nothing.

‘Ideally’ controlled experiments tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural, or quasi) experiments to different settings, populations, or target systems, is not easy. Causes deduced in an experimental setting still have to show that they come with an export warrant to the target population. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

The almost religious conviction with which its proponents — including ‘Nobel prize’ winners like Duflo, Banerjee, and Kremer — promote it cannot hide the fact that RCTs cannot be taken for granted to give generalizable results. That something works somewhere is no warrant for believing that it will work for us here, or that it works generally.

Leaning on an interventionist approach often means that instead of posing interesting questions at the social level, the focus is on individuals. Instead of asking about the structural socio-economic factors behind, e.g., gender or racial discrimination, the focus is on the choices individuals make. Esther Duflo is a typical example of the dangers of this limiting approach. Duflo et consortes want to give up on ‘big ideas’ like political economy and institutional reform and instead go for solving more manageable problems ‘the way plumbers do.’ Yours truly is far from sure that this is the right way to move economics forward and make it a relevant and realist science. A plumber can fix minor leaks in your system, but if the whole system is rotten, something more than good old-fashioned plumbing is needed. The big social and economic problems we face today are not going to be solved by plumbers performing interventions or manipulations in the form of RCTs.

The present RCT idolatry is dangerous. Believing randomization is the only way to achieve scientific validity blinds people to searching for and using other methods that in many contexts are better. Insisting on using only one tool often means using the wrong tool.

Randomization is not a panacea. It is not the best method for all questions and circumstances. Proponents of randomization make claims about its ability to deliver causal knowledge that are simply wrong. There are good reasons to share Klees’ scepticism about the now popular — and ill-informed — view that randomization is the only valid and best method on the market. It is not.

4 Comments

  1. Social developments and events are almost always overdetermined and the relationship between dependent and in(ter)dependent variables is usually non-linear. To further imagine that the chaotic, emergent evolving social world is somehow as precisely engineered as a casino’s gaming machines seems to edge into madness.
    .
    I understand why futility is not a rallying cry, but maybe we need some metaphysical exploration of what we mean by cause-and-effect in a social context. Surely, no one imagines a kindergarten classroom is a billiard table on which precise trajectories can be plotted from calculations of force and inertia? When “causes” are information, roles, expectations, conditions and semantic generalization, what is it we imagine the “treatment” is? And, how is the “treatment” acting on the actors, who are the subjects and objects?
    .
    There is a lot of deep thinking needed on the foundations for any social or biological science that simply is not being done while RCTs are glibly embraced.

    • “as precisely engineered as a casino’s gaming machines”
      .
      Didn’t Ed Thorp prove that there are free lunches to be had with actual existing casino gaming machines?
      .
      “a billiard table on which precise trajectories can be plotted from calculations of force and inertia”
      .
      At what point must one include quantum entanglement or dark matter or some other spooky action at a distance, to match actual trajectories in the actual world?

      • “Didn’t Ed Thorp prove that there are free lunches to be had with actual existing casino gaming machines?”
        .
        He showed that there was enough unaccounted-for information about the direction and magnitude of bias that a calculating player, relying on careful observation, could beat the house under conventional house rules, rules that would advantage the house if the machinery worked perfectly and realized the idealized odds. The card counter can win at blackjack. A worn roulette wheel can reward the observant.
        .
        That’s not a “free lunch”. The house loses what the player wins.
        .
        My point is that too few social scientists are willing to humble themselves enough to account for the possibility that their “objects of study” — people behaving strategically as well as socially — might sometimes be smarter than the scientists studying their behavior.

      • Bruce, are you adroitly demonstrating how to take advantage of a social free lunch by conveniently assuming only one of the common definitions of free lunch in economics? Can I tell a story of physics being unable to operationalize a perfect machine because of unmodelable forces such as friction, etc., and Thorp getting a free lunch from arbitraging what a conventional physics model says the odds should be against what reality says?
        .
        In other words, can I answer your point with a point that even in physics, “the universe is not only stranger than you imagine, but stranger than you can imagine”?

