Randomization, experiments and claims of causality in economics

28 April, 2012 at 13:37 | Posted in Theory of Science & Methodology

A few years ago Armin Falk and James Heckman published an acclaimed article titled “Lab Experiments Are a Major Source of Knowledge in the Social Sciences” in the journal Science. The authors – both renowned economists – argued that field experiments and laboratory experiments basically face the same problems of generalizability and external validity – and that, a fortiori, it is impossible to say that one is better than the other.

What strikes me when reading both Falk & Heckman and advocates of field experiments – such as John List and Steven Levitt – is that field studies and experiments are both very similar to theoretical models. They all share the same basic problem: they are built on rather artificial conditions and face a trade-off between internal and external validity. The more artificial the conditions, the greater the internal validity – but also the lower the external validity. The more we rig experiments/field studies/models to avoid “confounding factors”, the less the conditions are reminiscent of the real “target system”. To that extent, I also believe that Falk & Heckman are right in their comments on the field vs. lab discussion in terms of realism – the nodal issue is not realism as such, but how economists using different isolation strategies in different “nomological machines” attempt to learn about causal relationships. In contrast to Falk & Heckman and to advocates of field experiments such as List and Levitt, I doubt the generalizability of both research strategies, because the probability is high that causal mechanisms differ across contexts, and that the lack of homogeneity/stability/invariance does not give us warranted export licenses to the “real” societies or economies.

If you mainly conceive of experiments or field studies as heuristic tools, the dividing line between, say, Falk & Heckman and List or Levitt is probably difficult to perceive.

But if we see experiments or field studies as theory tests or models that ultimately aspire to say something about the real “target system”, then the problem of external validity is central (and was for a long time also a key reason why behavioural economists had trouble getting their research results published).

Assume that you have examined how the work performance of Chinese workers A is affected by B (“treatment”). How can we extrapolate/generalize to new samples outside the original population (e.g. to the US)? How do we know that any replication attempt “succeeds”? How do we know when these replicated experimental results can be said to justify inferences made about samples from the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in making an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original one? Unless we can give some really good argument for this being the case, inferences built on P(A|B) do not really say anything about the target system’s P'(A|B).

As I see it, this is the heart of the matter. External validity/extrapolation/generalization is founded on the assumption that we can make inferences based on P(A|B) that are exportable to other populations for which P'(A|B) applies. Sure, if one can convincingly show that P and P' are similar enough, the problems are perhaps surmountable. But arbitrarily introducing functional specification restrictions of the type invariance/stability/homogeneity is, at least for an epistemological realist, far from satisfactory. And often it is – unfortunately – exactly this that I see when I examine neoclassical economists’ models/experiments/field studies.
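To make the point concrete, here is a minimal simulated sketch (all numbers are made up purely for illustration) of what goes wrong when an effect estimated in one population is naively exported to a target population whose conditional distribution P'(A|B) differs from the original P(A|B):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# "Original" population: treatment B raises performance A by about +2.
b_orig = rng.integers(0, 2, n)
a_orig = 5 + 2 * b_orig + rng.normal(0, 1, n)

# "Target" population: same treatment, but a different conditional density
# P'(A|B) - different baseline and a much weaker treatment response.
b_targ = rng.integers(0, 2, n)
a_targ = 7 + 0.2 * b_targ + rng.normal(0, 1, n)

est_orig = a_orig[b_orig == 1].mean() - a_orig[b_orig == 0].mean()
est_targ = a_targ[b_targ == 1].mean() - a_targ[b_targ == 0].mean()

print(f"effect estimated in the original sample: {est_orig:.2f}")  # roughly 2.0
print(f"actual effect in the target population:  {est_targ:.2f}")  # roughly 0.2
# Exporting the original estimate presupposes that P(A|B) and P'(A|B) are
# similar enough - something the original sample alone can never establish.
```

Nothing in the original sample's data tells us which of these two worlds we are exporting to – that judgment has to come from substantive background knowledge, not from the estimate itself.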

By this I do not mean to say that empirical methods per se are so problematic that they can never be used. On the contrary, I am basically – though not without reservations – in favour of the increased use of experiments and field studies within economics. Not least as an alternative to completely barren “bridge-less” axiomatic-deductive theory models. My criticism is more about aspiration levels and what we believe that we can achieve with our mediational epistemological tools and methods in the social sciences.

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions, and therefore, a fortiori, easy to test the robustness of experimental results. But is it really that easy? If, in the example given above, we run a test and find that our predictions were not correct – what can we conclude? That B “works” in China but not in the US? That B “works” in a backward agrarian society, but not in a post-modern service society? That B “worked” in a field study conducted in 2008 but not in 2012? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would not be a critical problem. But the really interesting inferences are those we try to make from specific labs/experiments/field studies to the specific real-world situations/institutions/structures that we are interested in understanding or explaining (causally). And then the population problem is much more difficult to tackle.

Everyone – advocates of lab experiments and field experiments alike – should consider the following lines from David Salsburg’s The Lady Tasting Tea (Henry Holt 2001:146):

In Kolmogorov’s axiomatization of probability theory, we assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify Kolmogorov’s abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.

Evidence-based theories and policies are highly valued nowadays. Randomization is supposed to control best for bias from unknown confounders. The received opinion is that evidence based on randomized experiments is therefore the best.

More and more economists have also lately come to advocate randomization as the principal method for ensuring valid causal inferences.

Renowned econometrician Ed Leamer has responded to these claims, maintaining that randomization is not sufficient, and that the hopes of a better empirical and quantitative macroeconomics are to a large extent illusory. Randomization – just like econometrics – promises more than it can deliver, basically because it requires assumptions that in practice are not possible to maintain:

We economists trudge relentlessly toward Asymptopia, where data are unlimited and estimates are consistent, where the laws of large numbers apply perfectly and where the full intricacies of the economy are completely revealed. But it’s a frustrating journey, since, no matter how far we travel, Asymptopia remains infinitely far away. Worst of all, when we feel pumped up with our progress, a tectonic shift can occur, like the Panic of 2008, making it seem as though our long journey has left us disappointingly close to the State of Complete Ignorance whence we began.

The pointlessness of much of our daily activity makes us receptive when the Priests of our tribe ring the bells and announce a shortened path to Asymptopia … We may listen, but we don’t hear, when the Priests warn that the new direction is only for those with Faith, those with complete belief in the Assumptions of the Path. It often takes years down the Path, but sooner or later, someone articulates the concerns that gnaw away in each of us and asks if the Assumptions are valid … Small seeds of doubt in each of us inevitably turn to despair and we abandon that direction and seek another …

Ignorance is a formidable foe, and to have hope of even modest victories, we economists need to use every resource and every weapon we can muster, including thought experiments (theory), and the analysis of data from nonexperiments, accidental experiments, and designed experiments. We should be celebrating the small genuine victories of the economists who use their tools most effectively, and we should dial back our adoration of those who can carry the biggest and brightest and least-understood weapons. We would benefit from some serious humility, and from burning our “Mission Accomplished” banners. It’s never gonna happen.

Part of the problem is that we data analysts want it all automated. We want an answer at the push of a button on a keyboard … Faced with the choice between thinking long and hard versus pushing the button, the single button is winning by a very large margin.

Let’s not add a “randomization” button to our intellectual keyboards, to be pushed without hard reflection and thought.

Especially when it comes to questions of causality, randomization is nowadays considered some kind of “gold standard”. Everything has to be evidence-based, and the evidence has to come from randomized experiments.

But just like econometrics, randomization is basically a deductive method. Given the assumptions (such as manipulability, transitivity, Reichenbach probability principles, separability, additivity, linearity, etc.), these methods deliver deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. [And although randomization may contribute to controlling for confounding, it does not guarantee it, since genuine randomness presupposes infinite experimentation and we know all real experimentation is finite. And even if randomization may help to establish average causal effects, it says nothing of individual effects unless homogeneity is added to the list of assumptions.] Real target systems are seldom epistemically isomorphic to our axiomatic-deductive models/systems, and even if they were, we would still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by randomization procedures may be valid in “closed” models, but what we usually are interested in is causal evidence in the real target system we happen to live in.
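A similarly minimal simulated sketch (again with purely illustrative numbers) makes the bracketed caveat above concrete: randomized assignment can recover the average treatment effect, yet tell us nothing about individual effects once the homogeneity assumption is dropped:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Heterogeneous individual effects: +4 for roughly half the population,
# -2 for the rest, so the average effect is about +1.
individual_effect = np.where(rng.random(n) < 0.5, 4.0, -2.0)

baseline = rng.normal(10, 1, n)
treated = rng.integers(0, 2, n).astype(bool)   # randomized assignment
outcome = baseline + treated * individual_effect

ate = outcome[treated].mean() - outcome[~treated].mean()
harmed = (individual_effect < 0).mean()

print(f"estimated average treatment effect:  {ate:.2f}")     # roughly +1
print(f"share of individuals made worse off: {harmed:.0%}")  # roughly 50%
# Randomization recovers the average, but without a homogeneity assumption
# the average says nothing about what the treatment does to any individual.
```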

When does a conclusion established in population X hold for target population Y? Only under very restrictive conditions!

Philosopher of science Nancy Cartwright has succinctly summarized the value of randomization. In The Lancet 23/4 2011 she states:

But recall the logic of randomized control trials … [T]hey are ideal for supporting ‘it-works-somewhere’ claims. But they are in no way ideal for other purposes; in particular they provide no better bases for extrapolating or generalising than knowledge that the treatment caused the outcome in any other individuals in any other circumstances … And where no capacity claims obtain, there is seldom warrant for assuming that a treatment that works somewhere will work anywhere else. (The exception is where there is warrant to believe that the study population is a representative sample of the target population – and cases like this are hard to come by.)

And in BioSocieties 2/2007:

We experiment on a population of individuals each of whom we take to be described (or ‘governed’) by the same fixed causal structure (albeit unknown) and fixed probability measure (albeit unknown). Our deductive conclusions depend on that very causal structure and probability. How do we know what individuals beyond those in our experiment this applies to? … The [randomized experiment], with its vaunted rigor, takes us only a very small part of the way we need to go for practical knowledge. This is what disposes me to warn about the vanity of rigor in [randomized experiments].

Ideally controlled experiments (still the benchmark even for natural and quasi-experiments) tell us with certainty what causes what effects – but only given the right “closures”. Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of “rigorous” and “precise” methods is despairingly small.

Here I think Leamer’s “button” metaphor is appropriate. Many advocates of randomization want to have deductively automated answers to fundamental causal questions. But to apply “thin” methods we have to have “thick” background knowledge of what is going on in the real world, and not in (ideally controlled) experiments. Conclusions can only be as certain as their premises – and that also goes for methods based on randomized experiments.

