On the limits of experimental economics

26 Dec, 2014 at 16:25 | Posted in Theory of Science & Methodology


According to some people there’s really no need for heterodox theoretical critiques of mainstream neoclassical economics; what is needed instead are challenges to neoclassical economics “buttressed by good empirical work.” Out with “big-think theorizing” and in with “ordinary empiricism.”

Although thought-provoking, the view of empiricism and experiments on offer is too simplistic, for several reasons, but mostly because the kind of experimental empiricism it favours is largely untenable.

Experiments are actually very similar to theoretical models in many ways. For example, they share the basic problem that they are built on rather artificial conditions and struggle with the “trade-off” between internal and external validity. The more artificial the conditions, the greater the internal validity, but also the lower the external validity. The more we rig experiments/models to avoid the “confounding factors”, the less the conditions are reminiscent of the real “target system”. The nodal issue is how economists using different isolation strategies in different “nomological machines” attempt to learn about causal relationships. I doubt the generalizability of both research strategies, because the probability is high that causal mechanisms are different in different contexts and that lack of homogeneity/stability/invariance doesn’t give us warranted export licenses to the “real” societies or economies.

If we see experiments as theory tests or models that ultimately aspire to say something about the real “target system”, then the problem of external validity is central.

Assume that you have examined how the work performance (A) of Swedish workers is affected by a “treatment” (B). How can we extrapolate/generalize to new samples outside the original population (e.g. to the UK)? How do we know that any replication attempt “succeeds”? How do we know when these replicated experimental results can be said to justify inferences made in samples from the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in making an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original? Unless we can give some really good argument for this being the case, inferences built on P(A|B) do not really say anything about the target system’s P'(A|B).

As I see it, this is the heart of the matter. External validity/extrapolation/generalization is founded on the assumption that we can make inferences based on P(A|B) that are exportable to other populations for which P'(A|B) applies. Sure, if one can convincingly show that P and P' are similar enough, the problems are perhaps not insurmountable. But arbitrarily just introducing functional specification restrictions of the type invariance/stability/homogeneity is, at least for an epistemological realist, far from satisfactory. And often it is, unfortunately, exactly this that we see when we look at neoclassical economists’ models/experiments.

By this I do not mean to say that empirical methods per se are so problematic that they can never be used. On the contrary, I am basically — though not without reservations — in favour of the increased use of experiments within economics as an alternative to completely barren “bridge-less” axiomatic-deductive theory models. My criticism is more about aspiration levels and what we believe we can achieve with our mediational epistemological tools and methods in social sciences.

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? If, in the example given above, we run a test and find that our predictions were not correct – what can we conclude? That B “works” in Sweden but not in the UK? Or that B “works” in a backward agrarian society, but not in a post-modern service society? That B “worked” in the field study conducted in year 2005 but not in year 2014? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific labs/experiments to specific real world situations/institutions/structures that we are interested in understanding or explaining (causally). And then the population problem is more difficult to tackle.
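To make the Sweden/UK worry concrete, here is a minimal simulation sketch. Everything in it is a hypothetical assumption chosen for illustration, not data or a method from the post: the function name run_experiment, an unobserved context factor M that switches the effect of B on A between 2.0 and 0.0, and the shares of M in the two populations (0.9 and 0.1). Both experiments are perfectly randomized, so each estimate is internally valid, yet neither is informative about the other population.

```python
import random
import statistics


def run_experiment(n, share_moderator, rng):
    """Simulate one randomized experiment and return the estimated effect of B on A.

    Assumed (hypothetical) data-generating process: treatment B raises the
    outcome A by 2.0 when an unobserved context factor M is present, and by
    0.0 when it is absent. `share_moderator` is the population share with M.
    """
    treated, control = [], []
    for _ in range(n):
        m = rng.random() < share_moderator   # unobserved context factor M
        b = rng.random() < 0.5               # randomized assignment to B
        effect = 2.0 if m else 0.0           # effect of B depends on context
        a = 10.0 + (effect if b else 0.0) + rng.gauss(0.0, 1.0)
        (treated if b else control).append(a)
    # Difference in means: an unbiased estimate of the average effect
    # in THIS population only.
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
effect_sweden = run_experiment(20000, 0.9, rng)  # "Sweden": M is common
effect_uk = run_experiment(20000, 0.1, rng)      # "UK": M is rare

print(f"Estimated effect, original population: {effect_sweden:.2f}")
print(f"Estimated effect, target population:   {effect_uk:.2f}")
```

Under these assumptions the first estimate should land close to 0.9 × 2.0 = 1.8 and the second close to 0.1 × 2.0 = 0.2. Nothing inside either experiment reveals the share of M, so exporting the first result to the second population requires exactly the kind of background knowledge about P and P' discussed above.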

Just like traditional neoclassical modelling, the randomized experiment is basically a deductive method. Given the assumptions (such as manipulability, transitivity, separability, additivity, linearity etc.) these methods deliver deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Real target systems are seldom epistemically isomorphic to our axiomatic-deductive models/systems, and even if they were, we still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by randomization procedures may be valid in “closed” models, but what we usually are interested in is causal evidence in the real target system we happen to live in.

Ideally controlled experiments (still the benchmark even for natural and quasi experiments) tell us with certainty what causes what effects – but only given the right “closures”. Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of “rigorous” and “precise” methods is despairingly small.

Many advocates of randomization and experiments want to have deductively automated answers to fundamental causal questions. But to apply “thin” methods we have to have “thick” background knowledge of what’s going on in the real world, and not in (ideally controlled) experiments. Conclusions can only be as certain as their premises – and that also goes for methods based on randomized experiments.

The claimed strength of a social experiment, relatively to non-experimental methods, is that few assumptions are required to establish its internal validity in identifying a project’s impact. The identification is not assumption-free. People are (typically and thankfully) free agents who make purposive choices about whether or not they should take up an assigned intervention. As is well understood by the randomistas, one needs to correct for such selective compliance … The randomized assignment is assumed to only affect outcomes through treatment status (the “exclusion restriction”).

There is another, more troubling, assumption just under the surface. Inferences are muddied by the presence of some latent factor—unobserved by the evaluator but known to the participant—that influences the individual-specific impact of the program in question … Then the standard instrumental variable method for identifying [the average treatment effect on the treated] is no longer valid, even when the instrumental variable is a randomized assignment … Most social experiments in practice make the implicit and implausible assumption that the program has the same impact for everyone.

While internal validity … is the claimed strength of an experiment, its acknowledged weakness is external validity—the ability to learn from an evaluation about how the specific intervention will work in other settings and at larger scales. The randomistas see themselves as the guys with the lab coats—the scientists—while other types, the “policy analysts,” worry about things like external validity. Yet it is hard to argue that external validity is less important than internal validity when trying to enhance development effectiveness against poverty; nor is external validity any less legitimate as a topic for scientific inquiry.

Martin Ravallion
