Can experiments save economics as science?

29 Oct, 2013 at 21:33 | Posted in Theory of Science & Methodology | 3 Comments

“There’s nothing so radical as empiricism,” Stumbling and Mumbling (Chris Dillow) argued in a post the other day. The main drift of Dillow’s argumentation is that there’s really no need for heterodox theoretical critiques of mainstream neoclassical economics, but rather challenges to neoclassical economics “buttressed by good empirical work.” Out with “big-think theorizing about the nature of rationality” and in with “ordinary empiricism – looking at the data and running experiments.”

Although, as always, thought-provoking, Dillow here offers a view on empiricism and experiments that I think is too simplistic. There are several reasons for this, but the main one is that the kind of experimental empiricism it favours is largely untenable. Let me elaborate a little.

Experiments are actually very similar to theoretical models in many ways. For example, they share the basic problem that they are built on rather artificial conditions and have difficulties with the “trade-off” between internal and external validity. The more artificial the conditions, the more internal validity, but also the less external validity. The more we rig experiments/models to avoid the “confounding factors”, the less the conditions are reminiscent of the real “target system”. The nodal issue is how economists using different isolation strategies in different “nomological machines” attempt to learn about causal relationships. I doubt the generalizability of both research strategies, because the probability is high that causal mechanisms are different in different contexts and that lack of homogeneity/stability/invariance doesn’t give us warranted export licenses to the “real” societies or economies.

If we see experiments as theory tests or models that ultimately aspire to say something about the real “target system”, then the problem of external validity is central.

Assume that you have examined how the work performance of Swedish workers (A) is affected by a “treatment” (B). How can we extrapolate/generalize to new samples outside the original population (e.g. to the UK)? How do we know that any replication attempt “succeeds”? How do we know when these replicated experimental results can be said to justify inferences made in samples from the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in doing an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original? Unless we can give some really good argument for this being the case, inferences built on P(A|B) are not really saying anything about the target system’s P'(A|B).

As I see it, this is the heart of the matter. External validity/extrapolation/generalization is founded on the assumption that we can make inferences based on P(A|B) that are exportable to other populations for which P'(A|B) applies. Sure, if one can convincingly show that P and P' are similar enough, the problems are perhaps surmountable. But arbitrarily just introducing functional specification restrictions of the type invariance/stability/homogeneity is, at least for an epistemological realist, far from satisfactory. And often it is – unfortunately – exactly this that we see when we look at neoclassical economists’ models/experiments.
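To make the P versus P' point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made purely for illustration and is not taken from any real study: the hypothetical context factor Z (think of some institutional feature that is common in one population and rare in the other), the effect sizes, the population shares, and the function name `run_experiment`. The treatment B is randomized in both populations, so each estimate is internally valid on its own terms, yet the estimate from the first population is a poor guide to the second.

```python
import random


def run_experiment(share_z, n=20000, seed=0):
    """Simulate a randomized experiment in one population.

    share_z is the (hypothetical) fraction of people for whom the
    context factor Z is present. Returns the difference in mean
    outcome A between treated and control groups.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.random() < share_z      # unobserved context factor
        b = rng.random() < 0.5          # randomized assignment of treatment B
        # Assumed mechanism: B raises A by 2 when Z is present,
        # and lowers it by 1 when Z is absent.
        effect = 2.0 if z else -1.0
        a = 10.0 + (effect if b else 0.0) + rng.gauss(0.0, 1.0)
        (treated if b else control).append(a)
    return sum(treated) / len(treated) - sum(control) / len(control)


# Original population: Z is common. Target population: Z is rare.
effect_original = run_experiment(share_z=0.8, seed=1)
effect_target = run_experiment(share_z=0.2, seed=2)

print(f"estimated effect, original population: {effect_original:+.2f}")
print(f"estimated effect, target population:   {effect_target:+.2f}")
```

Under these assumed numbers, the same randomized design gives a positive average effect in the first population and a negative one in the second, simply because the share of Z differs. Nothing inside the first experiment reveals this; only background knowledge about Z, and about how it is distributed in the target system, would.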

By this I do not mean to say that empirical methods per se are so problematic that they can never be used. On the contrary, I am basically – though not without reservations – in favour of the increased use of experiments within economics. Not least as an alternative to completely barren “bridge-less” axiomatic-deductive theory models. My criticism is more about aspiration levels and what we believe that we can achieve with our mediational epistemological tools and methods in the social sciences.

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? If, in the example given above, we run a test and find that our predictions were not correct, what can we conclude? That B “works” in Sweden but not in the UK? Or that B “works” in a backward agrarian society, but not in a post-modern service society? That B “worked” in the field study conducted in year 2005 but not in year 2013? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific labs/experiments to specific real world situations/institutions/structures that we are interested in understanding or (causally) explaining. And then the population problem is more difficult to tackle.

Just like traditional neoclassical modelling, randomized experimentation is basically a deductive method. Given the assumptions (such as manipulability, transitivity, separability, additivity, linearity etc.) these methods deliver deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Real target systems are seldom epistemically isomorphic to our axiomatic-deductive models/systems, and even if they were, we still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by randomization procedures may be valid in “closed” models, but what we are usually interested in is causal evidence in the real target system we happen to live in.

Ideally controlled experiments (still the benchmark even for natural and quasi experiments) tell us with certainty what causes what effects – but only given the right “closures”. Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of “rigorous” and “precise” methods is despairingly small.

Many advocates of randomization and experiments want to have deductively automated answers to fundamental causal questions. But to apply “thin” methods we have to have “thick” background knowledge of what’s going on in the real world, and not in (ideally controlled) experiments. Conclusions can only be as certain as their premises – and that also goes for methods based on randomized experiments.

The claimed strength of a social experiment, relatively to non-experimental methods, is that few assumptions are required to establish its internal validity in identifying a project’s impact. The identification is not assumption-free. People are (typically and thankfully) free agents who make purposive choices about whether or not they should take up an assigned intervention. As is well understood by the randomistas, one needs to correct for such selective compliance … The randomized assignment is assumed to only affect outcomes through treatment status (the “exclusion restriction”).

There is another, more troubling, assumption just under the surface. Inferences are muddied by the presence of some latent factor—unobserved by the evaluator but known to the participant—that influences the individual-specific impact of the program in question … Then the standard instrumental variable method for identifying [the average treatment effect on the treated] is no longer valid, even when the instrumental variable is a randomized assignment … Most social experiments in practice make the implicit and implausible assumption that the program has the same impact for everyone.

While internal validity … is the claimed strength of an experiment, its acknowledged weakness is external validity—the ability to learn from an evaluation about how the specific intervention will work in other settings and at larger scales. The randomistas see themselves as the guys with the lab coats—the scientists—while other types, the “policy analysts,” worry about things like external validity. Yet it is hard to argue that external validity is less important than internal validity when trying to enhance development effectiveness against poverty; nor is external validity any less legitimate as a topic for scientific inquiry.

Martin Ravallion


  1. The prime feature of Dillow’s approach is that it fails to mention many controversial areas. For some of his readers, this may be a ‘good thing’. One way to take his approach forward would be to note that Keynes had concerns about bubbles and suggested some mechanisms. Dillow is pointing to some experiments (‘simple’ or ‘simplistic’) that support Keynes’ observations, and to some simulations (‘simple’ or ‘simplistic’) that show how these mechanisms can lead to bubbles. Thus, while neo-classical economics may be correct in that crashes may be rare, they may still be frequent enough for us to worry about, and it may be that there are identifiable mechanisms that ‘the market’ does not fully take account of, and which policy-makers ought to consider.

    I agree with this conclusion, and would suggest that one would wish to find out which types of scientific or economic reasoning policy-makers were comfortable with, and then construct an argument that fits their comfort zone, without frightening the horses. Before 2006, this was probably a Dillow-like argument. I would hope that we have moved on. But have we?

    For me, the main thing is that if we accept anything like Dillow, then we have to accept something like Keynes’ notion of uncertainty, and strive to face it.

  2. I entirely agree that ““It works there” is no evidence for “it will work here””. However, many experiments – such as the one I cited on peer effects – are consistent with the findings of “real world” research. And I wasn’t cherry-picking there. I’ve pointed out other ways in which lab experiments are consistent with other findings:
    Sure, a freestanding experiment on its own poses the question of its external validity. But this question can often be answered in the affirmative – at least in terms of corroboration if not proof.
    I think what’s at issue here are big philosophical and sociological questions of how to make progress in social science. Purely theoretical challenges to neoclassical economics haven’t always weakened its hegemony, even when they were intellectually robust: this could be one lesson of the Cambridge capital controversy. So isn’t it possible that experimental evidence – alongside other forms – might be worth considering?

  3. Duflo and Banerjee clearly overreached in their claims.
