On ‘randomistas’ and causal inference from randomization (wonkish)

6 Jul, 2014 at 15:05 | Posted in Economics, Statistics & Econometrics | 3 Comments

Yesterday, Dwayne Woods was kind enough to direct me to an interesting new article by Stephen Ziliak & Edward Teather-Posadas on The Unprincipled Randomization Principle in Economics and Medicine:

Over the past decade randomized field experiments have gained prominence in the toolbox of economics and policy making. Yet enthusiasts for randomization have perhaps paid not enough attention to conceptual and ethical errors caused by complete randomization.

Many but by no means all of the randomized experiments are being conducted by economists on poor people in developing nations. The larger objective of the new development economics is to use randomized controlled trials to learn about behavior in the field and to eradicate poverty …


There are … prudential and other ethical implications of a practice that deliberately withholds already-known-to-be best practice treatments from one or more human subjects. Randomized trials often give nil placebo or no treatment at all to vulnerable individuals, withholding (in the name of science) best treatments from the control group.

Although I don’t want to trivialize the ethical aspects of randomization studies (there’s an interesting discussion on the issue here), I still don’t think that Ziliak & Teather-Posadas get at the heart of the problem with randomization as a research strategy.

Field studies and experiments face the same basic problem as theoretical models — they are built on rather artificial conditions and have difficulties with the “trade-off” between internal and external validity. The more artificial the conditions, the greater the internal validity, but also the lower the external validity. The more we rig experiments/field studies/models to avoid “confounding factors”, the less the conditions are reminiscent of the real “target system”. You could of course discuss field studies vs. experiments vs. theoretical models in terms of realism, but that is not the nodal issue. The nodal issue is how economists, using different isolation strategies in different “nomological machines”, attempt to learn about causal relationships. I have strong doubts about the generalizability of all three research strategies, because the probability is high that causal mechanisms differ across contexts, and a lack of homogeneity/stability/invariance does not give us warranted export licenses to the “real” societies or economies.

If we see experiments or field studies as theory tests or models that ultimately aspire to say something about the real “target system”, then the problem of external validity is central (and was for a long time also a key reason why behavioural economists had trouble getting their research results published).

Assume that you have examined how the work performance of Chinese workers A is affected by B (“treatment”). How can we extrapolate/generalize to new samples outside the original population (e.g. to the US)? How do we know that any replication attempt “succeeds”? How do we know when such replicated experimental results can be said to justify inferences made from samples of the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in making an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original? Unless we can give some really good argument for this being the case, inferences built on P(A|B) say nothing about the target system’s P'(A|B).

As I see it, this is the heart of the matter. External validity/extrapolation/generalization is founded on the assumption that we can make inferences based on P(A|B) that are exportable to other populations for which P'(A|B) applies. Sure, if one can convincingly show that P and P' are similar enough, the problems are perhaps surmountable. But arbitrarily introducing functional specification restrictions of the type invariance/stability/homogeneity is, at least for an epistemological realist, far from satisfactory. And often it is – unfortunately – exactly this that I see when I look at neoclassical economists’ models/experiments/field studies.
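To make the point concrete, here is a minimal Python sketch – with entirely made-up numbers, not data from any actual study – of what “showing that P and P' are similar enough” would at the very least involve: comparing the outcome distributions under treatment in the original and the new population, for instance with a two-sample Kolmogorov–Smirnov test. Failing such a check is a warning against mechanically exporting results; passing it is still far from a warrant.

```python
# Hypothetical illustration: compare outcome distributions under "treatment" B
# in the original sample and in a new target population. If the distributions
# differ, inferences built on P(A|B) from the original sample need not carry
# over to the target system's P'(A|B).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Original sample: outcomes A given treatment B (assumed, made-up parameters)
original = rng.normal(loc=1.0, scale=1.0, size=500)

# New sample from a different context: same treatment, different population
new = rng.normal(loc=0.3, scale=1.5, size=500)

# Two-sample Kolmogorov-Smirnov test of the hypothesis that both samples
# come from the same distribution
res = stats.ks_2samp(original, new)
print(f"KS statistic = {res.statistic:.3f}, p-value = {res.pvalue:.4f}")

# A small p-value is evidence against the "identical density" assumption,
# and with it against mechanically exporting E[P(A|B)] to the new population.
```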

By this I do not mean to say that empirical methods per se are so problematic that they can never be used. On the contrary, I am basically – though not without reservations – in favour of the increased use of experiments and field studies within economics, not least as an alternative to completely barren “bridge-less” axiomatic-deductive theory models. My criticism is rather about aspiration levels and about what we believe we can achieve with our mediational epistemological tools and methods in the social sciences.

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore, a fortiori, easy to test the robustness of experimental results. But is it really that easy? If, in the example given above, we run a test and find that our predictions were not correct, what can we conclude? That B “works” in China but not in the US? That B “works” in a backward agrarian society, but not in a post-modern service society? That B “worked” in the field study conducted in 2008 but not in 2014? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would not be a critical problem. But the really interesting inferences are those we try to make from specific labs/experiments/fields to the specific real-world situations/institutions/structures that we are interested in understanding or (causally) explaining. And then the population problem is more difficult to tackle.
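A small simulation – again with purely hypothetical numbers – illustrates why “just replicate it elsewhere” does not dissolve the problem: a flawlessly randomized trial identifies the average effect in the population it actually samples, and if the causal mechanism differs across contexts, the exported number is simply the wrong one for the new context. The trial itself cannot tell you which contextual difference is to blame.

```python
# Hypothetical simulation: a flawless RCT in context 1 recovers that context's
# average treatment effect, but the true effect in context 2 is simply different.
# Randomization secures internal validity; it says nothing about external validity.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

def run_rct(true_effect, noise_sd):
    """Simulate a simple RCT and return the difference-in-means estimate."""
    treated = rng.binomial(1, 0.5, size=n)                # random assignment
    outcome = true_effect * treated + rng.normal(0, noise_sd, size=n)
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Context 1 (where the trial is run): assumed true effect of 2.0
estimate_context1 = run_rct(true_effect=2.0, noise_sd=1.0)

# Context 2 (where the result is to be "exported"): assumed true effect of 0.5
estimate_context2 = run_rct(true_effect=0.5, noise_sd=1.0)

print(f"RCT estimate in context 1: {estimate_context1:.2f} (true effect 2.0)")
print(f"RCT estimate in context 2: {estimate_context2:.2f} (true effect 0.5)")
# The context-1 estimate is internally valid, yet exporting it to context 2
# would be badly wrong, because the causal mechanism differs across contexts.
```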

Everyone – ‘lab’ and ‘field’ experimentalists alike – should consider the following lines from David Salsburg’s The Lady Tasting Tea (Henry Holt 2001:146):

In Kolmogorov’s axiomatization of probability theory, we assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify Kolmogorov’s abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.

Or as mathematical statistician David Freedman had it:

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science …

Indeed, far-reaching claims have been made for the superiority of a quantitative template that depends on modeling – by those who manage to ignore the far-reaching assumptions behind the models. However, the assumptions often turn out to be unsupported by data. If so, the rigor of advanced quantitative methods is a matter of appearance rather than substance …

Fisher’s “constitutional hypothesis” explained the association between smoking and disease on the basis of a gene that caused both. This idea is refuted not by making assumptions but by doing some empirical work.

David Freedman: Statistical Models and Causal Inference

So, using Randomized Controlled Trials (RCTs) is not at all the “gold standard” it has lately so often been portrayed as. But I don’t see the ethical issues as the biggest problem. To me the biggest problem with RCTs is that they usually do not provide evidence that their results are exportable to other target systems. The almost religious belief with which their propagators portray them cannot hide the fact that RCTs cannot be taken for granted to give generalizable results. That something works somewhere is no warrant for it to work for us, or even for it to work generally.

3 Comments

  1. […] 4. On ‘randomistas’ and causal inference from randomization  […]

  2. It sounds as though the problem is not with conducting trials but in trying to extrapolate the results. If enough trials are carried out on different populations, then the results can be extrapolated (if they agree), or give some experimental data which might suggest why there are different results (which can then be tested by further trials).

  3. Lars, I enjoy your writing but I worry that you often fail to adequately reference for readers who are not as familiar with the literature as you. Terms like ‘nomological machines’, ‘warrant’ and ‘something works somewhere’ seem to come straight from Nancy Cartwright’s work – should you not refer to that? There have also been a number of papers recently on exporting results from experiments so I might suggest referencing those or looking them up.


