‘Rigorous evidence’? Yes — and totally useless!

1 February, 2016 at 16:31 | Posted in Economics | 8 Comments

So far we have shown that for two prominent questions in the economics of education, experimental and non-experimental estimates appear to be in tension. Furthermore, experimental results across different contexts are often in tension with each other. The first tension presents policymakers with a trade-off between the internal validity of estimates from the “wrong” context, and the greater external validity of observational data analysis from the “right” context. The second tension, between equally well-identified results across contexts, suggests that the resolution of this trade-off is not trivial. There appears to be genuine heterogeneity in the true causal parameter across contexts.

These findings imply that the common practice of ranking evidence by its level of “rigor”, without respect to context, may produce misleading policy recommendations …

Despite the fact that we have chosen to focus on extremely well-researched literatures, it is plausible that a development practitioner confronting questions related to class size, private schooling, or the labor-market returns to education would confront a dearth of well-identified, experimental or quasi-experimental evidence from the country or context in which they are working. They would instead be forced to choose between less internally valid OLS estimates, and more internally valid experimental estimates produced in a very different setting. For all five of the examples explored here, the literature provides a compelling case that policymakers interested in minimizing the error of their parameter estimates would do well to prioritize careful thinking about local evidence over rigorously-estimated causal effects from the wrong context.

Lant Pritchett & Justin Sandefur

Randomization, like econometrics, is basically a deductive method. Given the assumptions (manipulability, transitivity, separability, additivity, linearity, etc.), these methods deliver ‘deductive’ inferences. The problem, of course, is that we can never completely know whether the assumptions are right. And although randomization may contribute to controlling for confounding, it does not guarantee it, since genuine randomness presupposes infinite experimentation and all real experimentation is finite. And even if randomization helps to establish average causal effects, it says nothing about individual effects unless homogeneity is added to the list of assumptions. Real target systems are seldom epistemically isomorphic to our axiomatic-deductive models/systems, and even if they were, we would still have to argue for the external validity of the conclusions reached within these epistemically convenient models/systems. Causal evidence generated by randomization procedures may be valid in “closed” models, but what we are usually interested in is causal evidence about the real world we happen to live in.
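The point about average versus individual effects can be made with a minimal simulation (all numbers hypothetical): a perfectly randomized experiment recovers the average treatment effect, yet the same average is compatible with radically heterogeneous individual effects.

```python
# A minimal sketch (hypothetical data throughout): a randomized experiment
# recovers the *average* treatment effect, but that average can describe
# no one in the population.
import random

random.seed(0)

n = 100_000
# Hypothetical individual-level causal effects: half the population gains
# +2, the other half loses -1. The average effect is +0.5.
effects = [2.0 if i % 2 == 0 else -1.0 for i in range(n)]

treated = [random.random() < 0.5 for _ in range(n)]
baseline = [random.gauss(0, 1) for _ in range(n)]
outcome = [b + e if t else b for b, e, t in zip(baseline, effects, treated)]

mean_t = sum(y for y, t in zip(outcome, treated) if t) / sum(treated)
mean_c = sum(y for y, t in zip(outcome, treated) if not t) / (n - sum(treated))

ate = mean_t - mean_c
print(f"estimated average effect: {ate:.2f}")  # close to +0.5
# Yet no individual in this population actually has an effect of +0.5:
# the estimate says nothing about who is helped and who is harmed.
```

Without an added homogeneity assumption, nothing in the experimental design distinguishes this population from one in which everyone gains exactly +0.5.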



  1. “The second tension, between equally well-identified results across contexts, suggests that the resolution of this trade-off is not trivial. There appears to be genuine heterogeneity in the true causal parameter across contexts.”

    Again, I’m inclined to be more radical. What about genuine heterogeneity within contexts? I.e., why exactly do we assume “true” parameters (note the inevitable scare quotes — why exactly are they necessary?) exist at all in the social world in anything but closely micro-engineered situations?

    [From the abstract:] We conclude with recommendations for research and policy, including the need to evaluate programs in context, and avoid simple analogies to clinical medicine in which “systematic reviews” attempt to identify best-practices by putting most (or all) weight on the most “rigorous” evidence with no allowance for context.

    Good advice. But this seems to allow that “systematic reviews” and decontextualised “evidence-based” proclamations are valuable in clinical medicine. They’re not. David Freedman was systematically sceptical about the logic of the ever-popular “meta-analysis”, and Ben Goldacre and others have shown how utterly broken the system of medical research (especially drug trials) really is.

  2. I don’t think Ben Goldacre is an advocate of abandoning randomisation in experiments.

    I don’t want to argue from authority, but didn’t Keynes argue that it is better to be roughly right than precisely wrong? Although you (LS) say that randomisation is a deductivist procedure, it enables you (always allowing certain assumptions) to set bounds on how wrong you are likely to be, and so it does deliver a “roughly right” answer.

    • And Ben Goldacre is a fan of meta-analyses. So what? I don’t have to agree with everything he says. I was making a stronger and broader point than Pritchett & Sandefur’s. (I thought it was obvious from my first paragraph that I’m sceptical of OLS as well as RCTs.) I wasn’t claiming that Goldacre had established (or believed) that randomisation in clinical medicine was rotten in principle. I was pointing out P&S’s implicit assumption that if the analogy to (“evidence-based”) clinical medicine were valid (which they contest), the claims on behalf of experimental economics would ipso facto be lent strength. And most economists (and other social scientists, for that matter) would, I presume, buy that intuitively and without question. But strong faith in — and the ostensible prestige of — the findings of clinical medicine as a whole — including lots and lots of results produced via RCTs — simply aren’t warranted. As Goldacre “and others” have demonstrated.

      • So why isn’t it “warranted”?

  3. Oh, hooray. And I thought the champions of uniquely “rigorous” social science had sadly all departed this blog.
    To give but one example: how about massive, endemic publication bias? Researchers can do as many trials as they want, and negative, statistically-non-significant and/or unflattering results simply don’t get published.
    On drug trials specifically: the great majority of drug research is funded by the drug industry, and virtually all of that research involves contracts subjecting researchers to gag clauses that prevent them from discussing any aspect of the data they gather or produce without their sponsors’ permission. There are typically restraints on the researchers’ publication rights, and in a substantial proportion of cases the funder either outright owns the data and/or must approve publication. In another substantial proportion of cases, drug-company sponsors get access to the data as it accumulates, and/or the right to stop the trials at any time, for any reason (which can be used to suppress undesirable results, or to magnify desired ones). Most clinical-trial agreements between researchers and funders are kept confidential, and apparently a majority of researchers see no problem with that. Very many (perhaps half) of these research papers are drafted by the funder itself (!), and sponsors can often insert their own statistical analyses.
    Ethics committees, universities, professional medical associations, regulators and academic journals all permit this to happen, and measures ostensibly taken to deal with the problem — introduced with great fanfare — aren’t enforced, keep secret data collected for the purposes of “transparency”, etc.
    Sound like any rigorous social sciences you know?
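The publication-bias mechanism described in this comment can be sketched numerically (all numbers hypothetical): simulate many small trials of a treatment with zero true effect, then "publish" only those whose estimate clears a significance-style threshold, and the published literature manufactures an effect out of nothing.

```python
# A minimal sketch (hypothetical numbers throughout) of how publication
# bias alone manufactures an effect: many small trials of a treatment with
# *zero* true effect, with only "positive, significant" results published.
import random
import statistics

random.seed(1)

def run_trial(n=30):
    """One tiny two-arm trial of a treatment whose true effect is 0."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]  # same distribution
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.pvariance(treated) / n + statistics.pvariance(control) / n) ** 0.5
    return diff, diff / se  # estimated effect and a rough z-statistic

trials = [run_trial() for _ in range(2000)]
# The file drawer: negative and non-significant results never appear.
published = [d for d, z in trials if z > 1.96]

print("true effect: 0.0")
print(f"mean estimate across ALL trials:   {statistics.mean(d for d, _ in trials):+.3f}")
print(f"mean estimate in PUBLISHED trials: {statistics.mean(published):+.3f}")
```

Averaging the full set of trials recovers (roughly) zero, but averaging only the published ones yields a substantial positive "effect", which is exactly what a naive meta-analysis of the published record would report.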

    • Oh, I thought that your argument was that “strong faith in […] results produced via RCT […] aren’t warranted”, exclusively applied to social science, and not clinical medicine. But now I understand that you feel the same about any discipline.

      I believe the problems you bring up are real and certainly problematic. But it is also the case that all medical treatments available have first been tested in an RCT setting. So while a successful RCT should not be a sufficient condition for “strong faith”, I do think it should be necessary.

      • I don’t know where you got that. What part of “how utterly broken the system of medical research … really is” and “strong faith in — and the ostensible prestige of — the findings of clinical medicine as a whole … simply aren’t warranted” would lead you to think I was singling out the social sciences?
        That said, I happen to believe that RCTs face many more problems in the social sciences than in clinical medicine. (And above I gave but a sampling of the problems of RCTs in clinical medicine — and those mainly problems that might seem “external” to the logic of RCTs themselves.) Many of the social-science problems are endemic and fundamental.
        You shouldn’t presume to know my feelings about any particular discipline. My feelings about the appropriateness/value/reliability of RCTs, or any other method for that matter, differ from one discipline to the next, in a way that is connected with my impression of the nature of the various phenomena dealt with by those disciplines. But some general points. RCTs can’t deliver decisive verdicts about causation, and they are no “gold standard” of causal inference, in any domain. They can be useful evidence for causation in those domains where their core assumptions — and not just the ones identified by statisticians — are actually satisfied by the phenomena under study. Otherwise, they are at best — at best — interesting hints at the possible existence of phenomena or their features, which may or may not amount to something genuine, and say nothing about causation. They certainly establish nothing on their own.
        My original point was that those who want to make an analogy to clinical medicine in order to lend support to the reliability of experimental economics would not in fact be getting much support even if the analogy were a good one.
        I don’t see how your last statement follows from your second-last one. And is your last statement supposed to apply to economics, too? If so, that’s a remarkable stance. If not, why not?

      • I haven’t presumed anything about your feelings. I suppose I just misread you. No need to get upset.


        I use “necessary” and “sufficient” in the scientific sense, not as in everyday parlance. That is, if you believe there is a true causal relationship between X and Y, then it ought to be deeply troubling if an RCT could not confirm that relationship (necessity). But just because an RCT suggests a causal relationship does not imply that there is one (sufficiency).

        I think that statement is perfectly consistent with “[…] it is also the case that all medical treatments available have first been tested in an RCT setting”.
