Randomisation is hard. It is in fact so tricky to achieve a ‘clean’ randomisation that it can be a challenge even in a laboratory setting, i.e. where the method first arose and where one can, in principle, control for all the factors one wants to take into account …
Randomisation is not needed …
Randomisation only partly answers the relevant questions. One problem with RCT-type interventions, which even many ‘randomistas’ at least partly concede, is the relatively limited set of questions such an intervention can answer. Roughly speaking, a randomised study tells us whether this particular intervention worked in this particular context at this particular point in time, i.e. it has high internal validity. These are not unimportant answers, but they do not tell us whether the intervention should be extended to other areas and conditions. To answer such questions we need studies with high external validity.
Randomisation is not applicable to large parts of development aid. It is quite possible, perhaps even likely, that there are aid-financed projects and interventions that should be designed and implemented in a randomised way in order to evaluate their effects. But those interventions probably make up a minority of everything that aid finances. Most of it cannot be carried out in a randomised and controlled manner.
Moreover, there are obvious ethical aspects to an intervention in which one group is given a demonstrably effective treatment while another group is denied it …
Randomisation is not cost-effective. Conducting a randomised controlled trial is expensive, very expensive.
A 2005 governmental inquiry led to a trial period involving anonymous job applications in seven public sector workplaces during 2007. In doing so, the public sector aims to improve the recruitment process and to increase the ethnic diversity among its workforce. There is evidence that gender and ethnicity influence the hiring process, even though such discrimination is prohibited by current legislation …
The process of ‘depersonalising’ job applications is to make these applications anonymous. In the case of the Gothenburg trial, certain information about the applicant – such as name, sex, country of origin or other identifiable traits of ethnicity and gender – is hidden during the first phase of the job application procedure. The recruiting managers therefore do not see the full content of applications when deciding on whom to invite for interview. Once a candidate has been selected for interview, this information can then be seen.
The trial involving job applications of this nature in the city of Gothenburg is so far the most extensive in Sweden. For this reason, the Institute for Labour Market Policy Evaluation (IFAU) has carried out an evaluation of the impact of anonymous job applications in Gothenburg …
The data used in the IFAU study derive from three districts in Gothenburg … Information on the 3,529 job applicants and a total of 109 positions was collected from all three districts …
A difference-in-differences model was used to estimate the effects on the outcome variables: whether invitations to interview and job offers differ with respect to gender and ethnicity under anonymous job applications compared with traditional application procedures.
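The difference-in-differences logic behind such an estimate can be sketched in a few lines of Python. The callback rates below are invented for illustration (chosen to mirror the rough magnitude of the reported effect) — they are not the IFAU data:

```python
# Hypothetical interview-callback rates (share of applicants invited),
# invented for illustration -- these are NOT the IFAU figures.
rates = {
    ("minority", "traditional"): 0.18,
    ("minority", "anonymous"):   0.28,
    ("majority", "traditional"): 0.30,
    ("majority", "anonymous"):   0.32,
}

def diff_in_diff(r):
    """DiD: change for the group exposed to discrimination (minority)
    minus the change for the comparison group (majority) across the
    switch from traditional to anonymous applications."""
    treated_change = r[("minority", "anonymous")] - r[("minority", "traditional")]
    control_change = r[("majority", "anonymous")] - r[("majority", "traditional")]
    return treated_change - control_change

print(f"DiD estimate: {diff_in_diff(rates):+.2f}")  # +0.08, i.e. 8 percentage points
```

The subtraction of the comparison group's change is what (under the parallel-trends assumption) nets out anything that shifted callback rates for everyone between the two regimes.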
For job openings where anonymous job applications were applied, the IFAU study reveals that gender and the ethnic origin of the applicant do not affect the probability of being invited for interview. As would be expected from previous research, these factors do have an impact when compared with recruitment processes using traditional application procedures where all the information on the applicant, such as name, sex, country of origin or other identifiable traits of ethnicity and gender, is visible during the first phase of the hiring process. As a result, anonymous applications are estimated to increase the probability of being interviewed regardless of gender and ethnic origin, showing an increase of about 8% for both non-western migrant workers and women.
As yours truly has repeatedly argued (here, here and here) on this blog, RCTs usually do not provide evidence that their results are exportable to other target systems. The almost religious conviction with which its proponents promote the method cannot hide the fact that RCTs cannot be taken for granted to give generalizable results. That something works somewhere is no guarantee that it will work for us, or even that it works in general.
In an extremely interesting article on the grand claims to external validity often raised by advocates of RCTs, Lant Pritchett and Justin Sandefur now confirm this view and show that using an RCT is not at all the “gold standard” it is portrayed as:
Our point here is not to argue against any well-founded generalization of research findings, nor against the use of experimental methods. Both are central pillars of scientific research. As a means of quantifying the impact of a given development project, or measuring the underlying causal parameter of a clearly-specified economic model, field experiments provide unquestioned advantages over observational studies.
But the popularity of RCTs in development economics stems largely from the claim that they provide a guide to making “evidence-based” policy decisions. In the vast majority of cases, policy recommendations based on experimental results hinge not only on the internal validity of the treatment effect estimates, but also on their external validity across contexts.
Inasmuch as development economics is a worthwhile, independent field of study – rather than a purely parasitic form of regional studies, applying the lessons of rich-country economies to poorer settings – its central conceit is that development is different. The economic, social, and institutional systems of poor countries operate differently than in rich countries in ways that are sufficiently fundamental to require different models and different data.
It is difficult if not impossible to adjudicate the external validity of an individual experimental result in isolation. But experimental results do not exist in a vacuum. On many development policy questions, the literature as a whole — i.e., the combination of experimental and non-experimental results across multiple contexts — collectively invalidates any claim of external validity for any individual experimental result.
Yours truly has, in a number of posts here on this blog — see e.g. here, here and here — questioned the value of randomised controlled trials (RCTs) on philosophy-of-science and methodological grounds. In a readable guest post on Ekonomistas, Björn Ekman strongly questions the value of RCTs as guidance for development aid work:
Modelling by the construction of analogue economies is a widespread technique in economic theory nowadays … As Lucas urges, the important point about analogue economies is that everything is known about them … and within them the propositions we are interested in ‘can be formulated rigorously and shown to be valid’ … For these constructed economies, our views about what will happen are ‘statements of verifiable fact.’
The method of verification is deduction … We are however, faced with a trade-off: we can have totally verifiable results but only about economies that are not real …
How then do these analogue economies relate to the real economies that we are supposed to be theorizing about? … My overall suspicion is that the way deductivity is achieved in economic models may undermine the possibility … to teach genuine truths about empirical reality.
Microfounded DSGE models standardly assume rational expectations, Walrasian market clearing, unique equilibria, time invariance, linear separability and homogeneity of both inputs/outputs and technology, infinitely lived intertemporally optimizing representative household/consumer/producer agents with homothetic and identical preferences, etc., etc. At the same time the models standardly ignore complexity, diversity, uncertainty, coordination problems, non-market clearing prices, real aggregation problems, emergence, expectations formation, etc., etc.
Behavioural and experimental economics – not to speak of psychology – show beyond any doubt that “deep parameters” — peoples’ preferences, choices and forecasts — are regularly influenced by those of other participants in the economy. And how about the homogeneity assumption? And if all actors are the same – why and with whom do they transact? And why does economics have to be exclusively teleological (concerned with intentional states of individuals)? Where are the arguments for that ontological reductionism? And what about collective intentionality and constitutive background rules?
These are all justified questions – so, in what way can one maintain that these models give workable microfoundations for macroeconomics?
I think science philosopher Nancy Cartwright gives a good hint at how to answer that question:
Our assessment of the probability of effectiveness is only as secure as the weakest link in our reasoning to arrive at that probability. We may have to ignore some issues or make heroic assumptions about them. But that should dramatically weaken our degree of confidence in our final assessment. Rigor isn’t contagious from link to link. If you want a relatively secure conclusion coming out, you’d better be careful that each premise is secure going in.
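Cartwright's weakest-link point is easy to put in numbers. A toy sketch in Python — every probability below is invented, and the independence assumption used to multiply them is itself one of the ‘heroic’ ones:

```python
from math import prod

# Hypothetical probabilities that each premise in a chain of policy
# reasoning holds -- invented numbers, purely for illustration.
premises = {
    "the RCT estimated the effect without bias": 0.95,
    "the effect transfers to the new context":   0.60,
    "implementation will match the trial":       0.70,
}

# Multiplying requires (heroically) assuming the premises are independent;
# even then, one shaky link drags the joint credibility far below the
# credibility of the most rigorous single link.
joint = prod(premises.values())
print(f"upper bound on confidence in the conclusion: {joint:.3f}")
```

A 95%-secure experimental estimate buys very little once the transfer and implementation links are shaky — rigor, as Cartwright says, isn't contagious.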
Science is made possible by the fact that there are structures that are durable and are independent of our knowledge or beliefs about them. There exists a reality beyond our theories and concepts of it. It is this independent reality that our theories in some way deal with. Contrary to positivism, I would as a critical realist argue that the main task of science is not to detect event-regularities between observed facts. Rather, it is to identify the underlying structure and forces that produce the observed events.
In a truly wonderful essay in Error and Inference (Cambridge University Press, 2010, eds. Deborah Mayo and Aris Spanos), Alan Musgrave explains why scientific realism and inference to the best explanation (IBE) are the best alternatives for explaining what’s going on in the world we live in:
For realists, the name of the scientific game is explaining phenomena, not just saving them. Realists typically invoke ‘inference to the best explanation’ or IBE …
IBE is a pattern of argument that is ubiquitous in science and in everyday life as well. van Fraassen has a homely example:
“I hear scratching in the wall, the patter of little feet at midnight, my cheese disappears – and I infer that a mouse has come to live with me. Not merely that these apparent signs of mousely presence will continue, not merely that all the observable phenomena will be as if there is a mouse, but that there really is a mouse.” (1980: 19-20)
Here, the mouse hypothesis is supposed to be the best explanation of the phenomena, the scratching in the wall, the patter of little feet, and the disappearing cheese.
What exactly is the inference in IBE, what are the premises, and what the conclusion? van Fraassen says “I infer that a mouse has come to live with me”. This suggests that the conclusion is “A mouse has come to live with me” and that the premises are statements about the scratching in the wall, etc. Generally, the premises are the things to be explained (the explanandum) and the conclusion is the thing that does the explaining (the explanans). But this suggestion is odd. Explanations are many and various, and it will be impossible to extract any general pattern of inference taking us from explanandum to explanans. Moreover, it is clear that inferences of this kind cannot be deductively valid ones, in which the truth of the premises guarantees the truth of the conclusion. For the conclusion, the explanans, goes beyond the premises, the explanandum. In the standard deductive model of explanation, we infer the explanandum from the explanans, not the other way around – we do not deduce the explanatory hypothesis from the phenomena, rather we deduce the phenomena from the explanatory hypothesis …
To achieve explanatory success, a theory should, minimally, satisfy two criteria: it should have determinate implications for behavior, and the implied behavior should be what we actually observe. These are necessary conditions, not sufficient ones. Rational-choice theory often fails on both counts. The theory may be indeterminate, and people may be irrational. In what was perhaps the first sustained criticism of the theory, Keynes emphasized indeterminacy, notably because of the pervasive presence of uncertainty. His criticism applied especially to cases where agents have to form expectations about the behavior of other agents or about the development of the economy in the long run. In the wake of the current economic crisis, this objection has returned to the forefront. Before the crisis, going back to the 1970s, the main objections to the theory were based on pervasive irrational behavior. Experimental psychology and behavioral economics have uncovered many mechanisms that cause people to deviate from the behavior that rational-choice theory prescribes.
Disregarding some more technical sources of indeterminacy, the most basic one is embarrassingly simple: how can one impute to the social agents the capacity to make the calculations that occupy many pages of mathematical appendixes in the leading journals of economics and political science and that can be acquired only through years of professional training? …
I believe that much work in economics and political science that is inspired by rational-choice theory is devoid of any explanatory, aesthetic or mathematical interest, which means that it has no value at all. I cannot make a quantitative assessment of the proportion of work in leading journals that falls in this category, but I am confident that it represents waste on a staggering scale.
Recruiting Nancy Cartwright and her former student Julian Reiss to the University of Durham has definitely put Durham on the economic philosophy map.
A farmer is offered a choice between, on the one hand, getting a horse if it is raining tomorrow and a cow if it is not raining and, on the other hand, a cow if it is raining and a horse if it is not. He prefers getting a horse to getting a cow; this is a ‘pure preference’. But which of the offered alternatives does he prefer? Assume that he professes to be indifferent as between them. How shall we then understand his attitude?
To this question there is an answer, first proposed by F. P. Ramsey, which has later come to play a great role in so-called Bayesian decision theory …
Ramsey thought that an attitude of indifference here means that the person rates the two events, ‘rain’ and ‘not rain’, as equally probable. Accepting this, one can then proceed as follows:
Assume that our farmer is next presented with this option: on the one hand, a horse if it is raining and a sheep if it is not raining and, on the other hand, a cow if it is raining and a hog if it is not raining. Again he says he is indifferent. This, on Ramsey’s view, means that the value to him of a cow is as much less than the value of the horse as the value of a sheep is less than that of a hog. With this the way is open to a metrization of value and the introduction of utility functions. This done, one can use attitudes of indifference in other, more complex, conditional options for defining arbitrary degrees of (subjective) probability. The product of the value of a good and the probability of its materialization is called expected utility. Attitudes of preference in options aim at maximizing this quantity.
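The construction can be made concrete with a small numerical sketch. The utility numbers are of course invented, chosen only so that the indifference condition holds:

```python
# Ramsey's reading of the first indifference: rain and not-rain equally probable.
p_rain = 0.5

# Hypothetical utilities, chosen so that u(horse) - u(cow) == u(hog) - u(sheep),
# which is what the second indifference forces on Ramsey's account.
u = {"horse": 10.0, "cow": 4.0, "sheep": 1.0, "hog": 7.0}

def expected_utility(if_rain, if_not_rain, p=p_rain):
    """Value of a conditional option: probability-weighted sum of utilities."""
    return p * u[if_rain] + (1 - p) * u[if_not_rain]

option_a = expected_utility("horse", "sheep")  # horse if rain, sheep if not
option_b = expected_utility("cow", "hog")      # cow if rain, hog if not
print(option_a, option_b)  # equal expected utilities, hence indifference
```

With these numbers both options have expected utility 5.5, so a maximizer of expected utility is indeed indifferent between them — which is exactly the attitude Ramsey's construction reads back into the probabilities.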
Ramsey’s method is elegant and ingenious. Nevertheless, it seems to rest on a mistake. It ignores the distinction between two senses of ‘indifference’.
The farmer who, when presented with the first of the above two options, professes an attitude of indifference can do so for one of two reasons. Either he ‘simply has no idea’ about the chances of rainfall for tomorrow and therefore cannot make up his mind about which alternative is more to his advantage.
This does not mean that he thinks rain and not-rain equally likely; he simply suspends judgement. Or, he considers them equally likely and therefore judges the two alternatives to be equally advantageous. He could, for example, support his attitude with the argument that if he repeatedly opted for one of the alternatives, no matter which one, on average half the number of times he would ‘probably’ get a horse, which is to his advantage, and half the number of times a cow, which is to his disadvantage. So, therefore, he is indifferent as between the alternatives. It is, in other words, not his judgement of indifference which gives meaning to the probabilities for him; but it is his prior estimate of the probabilities which determines his attitude of indifference.
I’m fond of science philosophers like Nancy Cartwright. With razor-sharp intellects they immediately go for the essentials. They have no time for bullshit. And neither should we.
In Evidence: For Policy — downloadable here — Cartwright has assembled her papers on how better to use evidence from the sciences “to evaluate whether policies that have been tried have succeeded and to predict whether those we are thinking of trying will produce the outcomes we aim for.” Many of the collected papers center around what can and cannot be inferred from results in well-done randomised controlled trials (RCTs).
A must-read for everyone with an interest in the methodology of science.