The Bayesian Trap

21 October, 2018 at 09:58 | Posted in Statistics & Econometrics | Leave a comment

 


The connection between cause and probability

18 October, 2018 at 15:07 | Posted in Statistics & Econometrics | 2 Comments

Causes can increase the probability of their effects; but they need not. And for the other way around: an increase in probability can be due to a causal connection; but lots of other things can be responsible as well …

The connection between causes and probabilities is like the connection between a disease and one of its symptoms: The disease can cause the symptom, but it need not; and the same symptom can result from a great many different diseases …

If you see a probabilistic dependence and are inclined to infer a causal connection from it, think hard about all the other possible reasons that that dependence might occur and eliminate them one by one. And when you are all done, remember — your conclusion is no more certain than your confidence that you really have eliminated all the possible alternatives.

Causality in social sciences — and economics — can never solely be a question of statistical inference. Causality entails more than predictability, and really explaining social phenomena in depth requires theory. Analysis of variation — the foundation of all econometrics — can never in itself reveal how these variations are brought about. Only when we are able to tie actions, processes or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation.

“Mediation analysis” is this thing where you have a treatment and an outcome and you’re trying to model how the treatment works: how much does it directly affect the outcome, and how much is the effect “mediated” through intermediate variables …

In the real world, it’s my impression that almost all the mediation analyses that people actually fit in the social and medical sciences are misguided: lots of examples where the assumptions aren’t clear and where, in any case, coefficient estimates are hopelessly noisy and where confused people will over-interpret statistical significance …

More and more I’ve been coming to the conclusion that the standard causal inference paradigm is broken … So how to do it? I don’t think traditional path analysis or other multivariate methods of the throw-all-the-data-in-the-blender-and-let-God-sort-em-out variety will do the job. Instead we need some structure and some prior information.

Andrew Gelman

Most facts have many different, possible, alternative explanations, but we want to find the best of all contrastive explanations (since all real explanation takes place relative to a set of alternatives). So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep, causal features and mechanisms that we have warranted and justified reasons to believe in. Statistical reasoning — especially the variety based on a Bayesian epistemology — generally has no room for these kinds of explanatory considerations. The only thing that matters is the probabilistic relation between evidence and hypothesis. That is also one of the main reasons I find abduction — inference to the best explanation — a better description and account of what constitutes actual scientific reasoning and inferences.

In the social sciences … regression is used to discover relationships or to disentangle cause and effect. However, investigators have only vague ideas as to the relevant variables and their causal order; functional forms are chosen on the basis of convenience or familiarity; serious problems of measurement are often encountered.

Regression may offer useful ways of summarizing the data and making predictions. Investigators may be able to use summaries and predictions to draw substantive conclusions. However, I see no cases in which regression equations, let alone the more complex methods, have succeeded as engines for discovering causal relationships.

David Freedman

Some statisticians and data scientists think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like faithfulness or stability is not to give proofs. It’s to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived theorems and the real world, well, then we haven’t really obtained the causation we are looking for.

If contributions made by statisticians to the understanding of causation are to be taken over with advantage in any specific field of inquiry, then what is crucial is that the right relationship should exist between statistical and subject-matter concerns …
The idea of causation as consequential manipulation is apt to research that can be undertaken primarily through experimental methods and, especially to ‘practical science’ where the central concern is indeed with ‘the consequences of performing particular acts’. The development of this idea in the context of medical and agricultural research is as understandable as the development of that of causation as robust dependence within applied econometrics. However, the extension of the manipulative approach into sociology would not appear promising, other than in rather special circumstances … The more fundamental difficulty is that under the — highly anthropocentric — principle of ‘no causation without manipulation’, the recognition that can be given to the action of individuals as having causal force is in fact peculiarly limited.

John H. Goldthorpe

When odds ratios mislead (wonkish)

17 October, 2018 at 19:10 | Posted in Statistics & Econometrics | Leave a comment

A few years ago, some researchers from Georgetown University published in the New England Journal of Medicine a study that demonstrated systematic race and sex bias in the behavior of America’s doctors. Needless to say, this finding was widely reported in the media:

Washington Post: “Physicians said they would refer blacks and women to heart specialists for cardiac catheterization tests only 60 percent as often as they would prescribe the procedure for white male patients.”

N.Y. Times: “Doctors are only 60% as likely to order cardiac catheterization for women and blacks as for men and whites.”

Now let’s try a little test of reading comprehension. The study found that the referral rate for white men was 90.6%. What was the referral rate for blacks and women?

If you’re like most literate and numerate people, you’ll calculate 60% of 90.6%, and come up with .6*.906 = .5436. So, you’ll reason, the referral rate for blacks and women was about 54.4 %.

But in fact, what the study found was a referral rate for blacks and women of 84.7%.

What’s going on?

It’s simple — the study reported an “odds ratio”. The journalists, being as ignorant as most people are about odds and odds ratios, reported these numbers as if they were ratios of rates rather than ratios of odds.

Let’s go through the numbers. If 90.6% of white males were referred, then 9.4% were not referred, and so a white male’s odds of being referred were 90.6/9.4, or about 9.6 to 1. Since 84.7% of blacks and women were referred, 15.3% were not referred, and so for these folks, the odds of referral were 84.7/15.3 ≅ 5.5 to 1. The ratio of odds was thus about 5.5/9.6, or about 0.6 to 1. Convert to a percentage, and you’ve got “60% as likely” or “60 per cent as often”.

The ratio of odds (rounded to the nearest tenth) was truly 0.6 to 1. But when you report this finding by saying that “doctors refer blacks and women to heart specialists 60% as often as they would white male patients”, normal readers will take “60% as often” to describe a ratio of rates — even though in this case the ratio of rates (the “relative risk”) was 84.7/90.6, or (in percentage terms) about 93.5%.

Mark Liberman
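To make the distinction concrete, here is a minimal sketch in Python that redoes the arithmetic above, computing both the odds ratio the study reported and the ratio of rates the journalists thought they were reporting:

# Python sketch: odds ratio vs relative risk for the referral example above
p_white = 0.906   # referral rate, white men
p_other = 0.847   # referral rate, blacks and women

odds_white = p_white / (1 - p_white)   # about 9.6 to 1
odds_other = p_other / (1 - p_other)   # about 5.5 to 1

odds_ratio    = odds_other / odds_white   # about 0.6 ("60% as likely" in odds terms)
relative_risk = p_other / p_white         # about 0.935 (the ratio of rates)

print(round(odds_ratio, 2), round(relative_risk, 3))   # 0.57 0.935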

Too much of ‘we controlled for’

14 October, 2018 at 12:12 | Posted in Statistics & Econometrics | Leave a comment

The gender pay gap is a fact that, sad to say, to a non-negligible extent is the result of discrimination. And even though many women are not deliberately discriminated against, but rather self-select into lower-wage jobs, this in no way magically explains away the discrimination gap. As decades of socialization research have shown, women may be ‘structural’ victims of impersonal social mechanisms that in different ways aggrieve them. Wage discrimination is unacceptable. Wage discrimination is a shame.

You see it all the time in studies. “We controlled for…” And then the list starts. The longer the better. Income. Age. Race. Religion. Height. Hair color. Sexual preference. Crossfit attendance. Love of parents. Coke or Pepsi. The more things you can control for, the stronger your study is — or, at least, the stronger your study seems. Controls give the feeling of specificity, of precision. But sometimes, you can control for too much. Sometimes you end up controlling for the thing you’re trying to measure …

An example is research around the gender wage gap, which tries to control for so many things that it ends up controlling for the thing it’s trying to measure. As my colleague Matt Yglesias wrote:

“The commonly cited statistic that American women suffer from a 23 percent wage gap through which they make just 77 cents for every dollar a man earns is much too simplistic. On the other hand, the frequently heard conservative counterargument that we should subject this raw wage gap to a massive list of statistical controls until it nearly vanishes is an enormous oversimplification in the opposite direction. After all, for many purposes gender is itself a standard demographic control to add to studies — and when you control for gender the wage gap disappears entirely!” …

Take hours worked, which is a standard control in some of the more sophisticated wage gap studies. Women tend to work fewer hours than men. If you control for hours worked, then some of the gender wage gap vanishes. As Yglesias wrote, it’s “silly to act like this is just some crazy coincidence. Women work shorter hours because as a society we hold women to a higher standard of housekeeping, and because they tend to be assigned the bulk of childcare responsibilities.”

Controlling for hours worked, in other words, is at least partly controlling for how gender works in our society. It’s controlling for the thing that you’re trying to isolate.

Ezra Klein

Trying to reduce the risk of having established only ‘spurious relations’ when dealing with observational data, statisticians and econometricians standardly add control variables. The hope is that one will thereby be able to make more reliable causal inferences. But — as Keynes showed back in the 1930s when criticizing statistical-econometric applications of regression analysis — if you do not manage to get hold of all potential confounding factors, the model risks producing estimates of the variable of interest that are even worse than models without any control variables at all. Conclusion: think twice before you simply include ‘control variables’ in your models!

Relative and absolute risks

8 October, 2018 at 09:03 | Posted in Statistics & Econometrics | Leave a comment

The absolute risk of a medical intervention differs from the relative risk. An example makes this clear: suppose that in a trial a drug reduces the number of cases of a disease from 10 to 5 per 1,000 people. In relative terms, this is a 50 per cent reduction of the disease risk (5 out of 10). In absolute terms, however, it is only 0.5 per cent (5 out of 1,000). The first number is presumably the one the company that wants to market the drug will prefer to use. The second number, however, is far more informative, since it takes the totality of all cases into account.

Unfortunately, relative risks are reported far too often in the media. No wonder, since they generally sound much more spectacular. If we really want to know about dangers or successes, we should look for the absolute numbers. Above all, we should always be aware of the fact that we are not readily able to assess probabilities and risks correctly.

Florian Freistetter
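The arithmetic of the example above (10 versus 5 cases per 1,000 people), as a small Python sketch:

# Python sketch: relative vs absolute risk reduction for the example above
n = 1000
cases_without_drug = 10
cases_with_drug = 5

risk_without = cases_without_drug / n   # 0.010
risk_with    = cases_with_drug / n      # 0.005

relative_reduction = (risk_without - risk_with) / risk_without   # 0.5, i.e. "50 per cent"
absolute_reduction = risk_without - risk_with                    # 0.005, i.e. "0.5 per cent"

print(relative_reduction, absolute_reduction)   # 0.5 0.005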

What do statistical regressions — really — tell us?

30 September, 2018 at 20:16 | Posted in Statistics & Econometrics | Leave a comment

A group of ‘high-performing’ students — Ada, Beda and Cissi — apply to an independent school. Ada and Beda are admitted and start there. Cissi is also admitted, but chooses to attend a municipal school. Another group of ‘low-performing’ students — consisting of Dora and Eva — apply and are both admitted to an independent school, but Eva chooses to attend a municipal school.

If we now look at how they perform on a test, we get the following results: Ada — 22, Beda — 20, Cissi — 22, Dora — 12, Eva — 6. In the first group, the difference in test scores between the students attending the independent school and the student attending the municipal school is -1 ((22+20)/2 – 22). In the second group, the difference between the student who chooses the independent school and the student who chooses the municipal school is 6 (12 – 6). The average difference for the two groups taken together is 2.5 ((-1+6)/2). If we run an ordinary OLS regression on the data — Estimated test score = α + β*School type + ζ*Group — we get α = 8, β = 2 and ζ = 12.

The crux of the regression parameter estimates is that the weighted average — 2 — really does not tell us much about the group-specific effects, where in one group we have a negative ‘effect’ of attending an independent school and in the other a positive ‘effect.’ Once again we have an example where the heterogeneity of reality risks being ‘masked’ when traditional regression analysis is used to estimate causal ‘effects.’
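A minimal sketch in Python (using only NumPy) that reproduces the numbers in the example, both the group-specific differences of -1 and 6 and the OLS estimates α = 8, β = 2, ζ = 12:

# Python sketch: the five students in the example above
import numpy as np

# columns: intercept, school type (1 = independent, 0 = municipal), group (1 = high, 0 = low)
X = np.array([[1, 1, 1],   # Ada
              [1, 1, 1],   # Beda
              [1, 0, 1],   # Cissi
              [1, 1, 0],   # Dora
              [1, 0, 0]])  # Eva
y = np.array([22, 20, 22, 12, 6])

coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(coef, 2))   # [ 8.  2. 12.]

# the group-specific differences that the single coefficient 2 averages over
print((22 + 20) / 2 - 22)  # -1 (high-performing group)
print(12 - 6)              # 6 (low-performing group)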

Curve-fitting methods

30 September, 2018 at 15:24 | Posted in Statistics & Econometrics | 1 Comment

 

Instrumental variables and heterogeneity — a comment (wonkish)

27 September, 2018 at 17:30 | Posted in Statistics & Econometrics | Leave a comment

Instrumental variables are nowadays used extensively by economists and other social scientists, not least when they want to go beyond the ‘correlations’ of statistics and also say something about ‘causality.’

Unfortunately, the interpretation of the results obtained with the most common method used for this purpose — statistical regression analysis — often falls badly short.

An example from the school sector illustrates this well.

It is sometimes claimed by school debaters and politicians that independent schools are better than municipal schools. They are said to lead to better results. That is: if students from independent and municipal schools were to take a common test, the independent-school students would perform better (more correct answers on the test and the like).

For the sake of argument, assume that in order to find out whether this really is the case in Malmö as well, we randomly select lower-secondary students in Malmö and have them take a test. In standard regression-analytic form, the result could then turn out to be

Test score = 20 + 5*T,

where T = 1 if the student attends an independent school and T = 0 if the student attends a municipal school. This would confirm the assumption — independent-school students score on average 5 points higher than students at municipal schools in Malmö.

Now, politicians are (hopefully) not so naive as to be unaware that this statistical result cannot be interpreted in causal terms, since students attending independent schools typically do not have the same background (socio-economic, educational, cultural, etc.) as those attending municipal schools (the relation between school type and results is ‘confounded’ via ‘selection bias’).

To obtain, if possible, a better measure of the causal effect of school type, Malmö’s politicians propose that a lottery be used to make it possible for 1,000 lower-secondary students to be admitted to an independent school. The ‘chance of winning’ is 10%, so 100 students get this opportunity. Of these, 20 accept the offer to attend an independent school. Of the 900 lottery participants who do not ‘win,’ 100 choose to attend an independent school.

The lottery is often treated by education researchers as an ‘instrumental variable,’ and when the regression analysis is carried out with its help, the result turns out to be

Test score = 20 + 2*T.

The standard interpretation is that we now have a causal measure of how much better test results lower-secondary students in Malmö would, on average, get if they attended independent schools instead of municipal schools.

But is that correct? No!

Unless all of Malmö’s school students have exactly the same test score (which surely has to be considered a rather far-fetched ‘homogeneity assumption’), the stated average causal effect only applies to those students who choose to attend an independent school if they ‘win’ the lottery, but who otherwise would not choose to do so (in statistics jargon we call these ‘compliers’). It is hard to see why this group of students should be of any particular interest in this example, given that the average causal effect estimated with the help of the instrumental variable says nothing at all about the effect for the majority (the 100 out of 120 who choose an independent school without having ‘won’ the lottery) of those who choose to attend an independent school.
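A minimal simulation sketch in Python of this point. The numbers are made up for illustration (a complier effect of 2 test points and an always-taker effect of 5), but they show how the instrumental-variable (Wald) estimate recovers only the compliers' effect:

# Python sketch: with heterogeneous effects, the IV/Wald estimate is the compliers' effect only
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000   # large sample, so the point is not obscured by sampling noise

# stylized student types: 'always' attend an independent school regardless of the lottery,
# 'compliers' only if they win, 'nevers' never do (shares roughly as in the example above)
types = rng.choice(["always", "complier", "never"], size=n, p=[0.10, 0.10, 0.80])
win = rng.random(n) < 0.10                                  # 10% win the lottery

attend = (types == "always") | ((types == "complier") & win)

# hypothetical causal effects of attending: +5 for always-takers, +2 for compliers
effect = np.where(types == "always", 5.0, np.where(types == "complier", 2.0, 0.0))
score = 20 + rng.normal(0, 1, n) + effect * attend

# Wald/IV estimate: difference in mean scores divided by difference in attendance rates
wald = (score[win].mean() - score[~win].mean()) / (attend[win].mean() - attend[~win].mean())
print(round(wald, 1))   # close to 2.0, the compliers' effect, and silent about the always-takers' 5.0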

Conclusion: researchers have to be far more careful about interpreting ordinary statistical regression analyses and their ‘average estimates’ as causal. Reality exhibits a high degree of heterogeneity. And then the constant ‘average parameters’ of regression analysis as a rule tell us nothing at all!

When should we believe the unconfoundedness assumption?

26 September, 2018 at 09:38 | Posted in Statistics & Econometrics | 1 Comment


Economics may be an informative tool for research. But if its practitioners do not investigate and make an effort to provide a justification for the credibility of the assumptions on which they erect their building, it will not fulfil its task. There is a gap between its aspirations and its accomplishments, and without more supportive evidence to substantiate its claims, critics — like yours truly — will continue to consider its ultimate arguments as a mixture of rather unhelpful metaphors and metaphysics.

In mainstream economics, there is an excessive focus on formal modelling and statistics. The models and the statistical (econometric) machinery build on — often hidden and non-argued for — assumptions that are unsupported by data and whose veracity is highly uncertain.

Econometrics fails miserably over and over again. One reason is that the unconfoundedness assumption does not hold. Another important reason why it fails is that the error term in the regression models used is thought of as representing the effect of the variables that were omitted from the models. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model and necessary to include to ‘save’ the assumed deterministic relation between the other random variables included in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since they are unobservable, they are also impossible to test empirically. And without justification of the orthogonality assumption, there is, as a rule, nothing to ensure identifiability:

With enough math, an author can be confident that most readers will never figure out where a FWUTV (facts with unknown truth value) is buried. A discussant or referee cannot say that an identification assumption is not credible if they cannot figure out what it is and are too embarrassed to ask.

Distributional assumptions about error terms are a good place to bury things because hardly anyone pays attention to them. Moreover, if a critic does see that this is the identifying assumption, how can she win an argument about the true expected value of the level of aether? If the author can make up an imaginary variable, “because I say so” seems like a pretty convincing answer to any question about its properties.

Paul Romer

Regression analysis — a constructive critique

25 September, 2018 at 08:34 | Posted in Statistics & Econometrics | Comments Off on Regression analysis — a constructive critique

As a descriptive exercise, all is well. One can compare the average salary of men and women, holding constant potential confounders. The result is a summary of how salaries differ on the average by gender, conditional on the values of one or more covariates. Why the salaries may on the average differ is not represented explicitly in the regression model …

Moving to causal inference is an enormous step that needs to be thoroughly considered. To begin, one must ponder … whether the causal variable of interest can be usefully conceptualized as an intervention within a response schedule framework [a formal structure in which to consider what the value of the response y would be if an input x were set to some value]. Once again consider gender. Imagine a particular faculty member. Now imagine intervening so that the faculty member’s gender could be set to ‘male.’ One would do this while altering nothing else about this person …

Clearly, the fit between the requisite response schedule and the academic world in which salaries are determined fails for at least two reasons: The idea of setting gender to male or female is an enormous stretch, and even if gender could be manipulated, it is hard to accept that only gender would be changed. In short, the causal story is in deep trouble even before the matter of holding constant surfaces …

This is not to imply that it never makes sense to apply regression-based adjustments in causal modeling. The critical issue is that the real world must cooperate by providing interventions that could be delivered separately …

As a technical move, it is easy to apply regression-based adjustments to confounders. Whether it is sensible to do so is an entirely different matter …

The most demanding material [is] the examination of what it means to ‘hold constant’ … The problem [is] the potential incongruence between the mechanics of regression-based adjustments and the natural or social world under study.

How to control confounding (wonkish)

18 September, 2018 at 17:55 | Posted in Statistics & Econometrics | Comments Off on How to control confounding (wonkish)

 

The debate about RCTs is over. The randomistas lost. We won.

13 September, 2018 at 10:30 | Posted in Statistics & Econometrics | Comments Off on The debate about RCTs is over. The randomistas lost. We won.

 

Structural econometrics

11 September, 2018 at 10:02 | Posted in Statistics & Econometrics | 1 Comment

In the ongoing discussion of the ’empirical revolution’ in economics, some econometricians criticise — rightfully — the view that quasi-experiments and RCTs are the (only) true solutions to finding causal parameters. But the alternative they put forward — structural models — has its own monumental problems.

Structural econometrics — essentially going back to the Cowles programme — more or less takes for granted the possibility of a priori postulating relations that describe economic behaviours as invariant within a Walrasian general equilibrium system. In practice, that means the structural model is based on a straitjacket delivered by economic theory. Causal inferences in those models are — by assumption — made possible since the econometrician is supposed to know the true structure of the economy. And, of course, those exact assumptions are the crux of the matter. If the assumptions don’t hold, there is no reason whatsoever to have any faith in the conclusions drawn, since they do not follow from the statistical machinery used!

By making many strong background assumptions, the deductivist [the conventional logic of structural econometrics] reading of the regression model allows one — in principle — to support a structural reading of the equations and to support many rich causal claims as a result. Here, however, the difficulty is that of finding good evidence for many of the assumptions on which the approach rests. It seems difficult to believe, even in cases where we have good background economic knowledge, that the background information will be sufficient to do the job that the deductivist asks of it. As a result, the deductivist approach may be difficult to sustain, at least in economics.

The difficulties in providing an evidence base for the deductive approach show just how difficult it is to warrant such strong causal claims. In short, as might be expected there is a trade-off between the strength of causal claims we would like to make from non-experimental data and the possibility of grounding these in evidence. If this conclusion is correct — and an appropriate elaboration was done to take into account the greater sophistication of actual structural econometric methods — then it suggests that if we want to do evidence-based structural econometrics, then we may need to be more modest in the causal knowledge we aim for. Or failing this, we should not act as if our causal claims — those that result from structural econometrics — are fully warranted by the evidence and we should acknowledge that they rest on contingent, conditional assumptions about the economy and the nature of causality.

Damien Fennell

Econometricians still concentrate on fixed parameter models and the structuralist belief/hope that parameter-values estimated in specific spatio-temporal contexts are exportable to totally different contexts. To warrant this assumption one, however, has to convincingly establish that the targeted acting causes are stable and invariant so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.

Most of the assumptions that econometric modelling presupposes are not only unrealistic — they are plainly wrong.

If economic regularities obtain, they do so (as a rule) only because we engineered them for that purpose. Outside man-made ‘nomological machines’ they are rare, or even non-existent. Unfortunately, that also makes most of the achievements of both structural and non-structural econometric forecasting and ‘causal explanation’ rather useless.

Statistics and econometrics are not very useful for understanding the economy

10 September, 2018 at 08:02 | Posted in Statistics & Econometrics | Comments Off on Statistics and econometrics are not very useful for understanding the economy

Most work in econometrics and statistics is carried out on the assumption that the researcher has a theoretical model that is “true”. However, “to think that we can construct a model in which all the relevant variables are included and in which the functional relationships between them are correctly specified is not only an unfounded belief, but a belief that is impossible to sustain”, Syll points out.

And the fact is that “the theories we work with when we construct our regression models are insufficient. No matter what we study, there will always be missing variables, and we cannot know the correct way to specify the functional relationships between the variables”.

All the econometric models constructed are therefore fallible. “There is always an endless list of possible variables to include, and infinitely many possible ways of specifying the relationships between them. So every econometrician presents his own specification and parameter estimates. The econometric Holy Grail of consistent and stable parameter values is nothing more than a dream”, the expert concludes.

This is because the theoretical conditions that have to be fulfilled for econometrics to work are not even remotely met in reality.

As Syll explains, econometrics is basically a deductive method: given certain assumptions, it produces deductive inferences. “The problem is that we never fully know when those assumptions are correct. Conclusions can only be as valid as their premises, and that also applies to econometrics”.

Alejandro Zegada/El País

The range of men and women

8 September, 2018 at 13:59 | Posted in Statistics & Econometrics | Comments Off on The range of men and women

I happen to be reading The Puppet Masters, a Heinlein novel from the 1950s … and came across this line:

“Listen, son—most women are damn fools and children. But they’ve got more range than we’ve got. The brave ones are braver, the good ones are better—and the vile ones are viler. . . .”


What struck me about the above quote is how it goes in the opposite direction of current received wisdom about men and women, the view, associated with former U.S. Treasury Secretary Lawrence Summers, that men are more variable than women, the “wider tails” theory, which is said to explain why there are more male geniuses and more male imbeciles, more male heroes and more male villains, etc. Heinlein’s quote above says the opposite (on the moral, not the intellectual, dimension, but I think the feeling is the same).

My point here is not to use Heinlein to shoot down Summers (or vice versa). Rather, it’s just interesting how received wisdom can change over time. What seemed like robust common sense back in the 1950s has turned around 180 degrees just a few decades later.

Andrew Gelman

Causal interaction and external validity

4 September, 2018 at 11:48 | Posted in Statistics & Econometrics | 2 Comments

As yours truly has repeatedly argued on this blog, randomized control trials (RCTs) usually do not provide evidence that their results are exportable to other target systems. The almost religious belief with which their proponents portray them cannot hide the fact that RCTs cannot be taken for granted to give generalizable results.

Randomized evaluations have become widespread in development economics in recent decades, largely due to the promise of identifying policy-relevant causal effects. A number of concerns have been raised in response … [One] concern, which is the subject of the present contribution, is that current research based on experimental methods does not adequately address the problem of extrapolating from empirical findings to policy claims relating to other populations (“external validity”) …

Combining insights from prior literature on experimental methods in social science and econometric formulations of external validity yields three important insights. First, that plausibly attaining external validity requires ex ante knowledge of covariates that influence the treatment effect along with empirical information on these variables in the experimental and policy populations. This, in turn, implies that “atheoretical” replication-based resolutions to the external validity problem are unlikely to be successful except for extremely simple causal relations, or very homogeneous populations, of a kind that appears unlikely in social science. Finally, the formal requirements for external validity are conceptually analogous to the assumptions needed for causal identification using observational data. Together these imply a much more modest interpretation of the policy relevance of past work that has not addressed these issues. Furthermore, the resultant challenges for making policy claims premised on randomized evaluations are substantial, if not insurmountable, in many cases of interest.

Seán Muller

Muller’s article underlines the problem many ‘randomistas’ end up with when underestimating heterogeneity and interaction. It does not just turn up as an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal problem to the millions of regression estimates that economists produce every year.

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. ‘It works there’ is no evidence for ‘it will work here.’ Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

RCTs have very little reach beyond giving descriptions of what has happened in the past. From the perspective of the future and for policy purposes they are as a rule of limited value since they cannot tell us what background factors were held constant when the trial intervention was being made.

RCTs usually do not provide evidence that the results are exportable to other target systems. RCTs cannot be taken for granted to give generalizable results. That something works somewhere for someone is no warranty for us to believe it to work for us here or even that it works generally.

The Gambler’s Ruin (wonkish)

30 August, 2018 at 16:22 | Posted in Statistics & Econometrics | Comments Off on The Gambler’s Ruin (wonkish)

 

[In case you’re curious what happens if you start out with $25 but we change the probabilities — from 0.50, 0.50 into e.g. 0.49, 0.51 — you can check this out easily with e.g. Gretl:
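# four-state gambler's-ruin chain: the first and last states are absorbing,
# and each row of B gives the one-step transition probabilities (0.51 of moving down, 0.49 of moving up);
# v25 puts all probability on the $25 starting state, and v25*B^50 is the distribution after 50 steps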
matrix B = {1,0,0,0; 0.51,0,0.49,0;0,0.51,0,0.49;0,0,0,1}
matrix v25 = {0,1,0,0}
matrix X = v25*B^50
X

which gives X = 0.68 0.00 0.00 0.32]

Bad science

30 August, 2018 at 08:33 | Posted in Statistics & Econometrics | Comments Off on Bad science

When scientists have found something out — when can we actually rely on it? One answer is: when colleagues in the field have reviewed the study. Another: when it has been published in a renowned journal. But sometimes even both together are not enough, as researchers have now shown. And they did so in the best and most laborious way possible: they repeated the underlying experiments and checked whether the same results came out again.

The study concerned 21 social-science papers from the journals Nature and Science. It does not get more prestigious than that. And, of course, the submitted papers were reviewed by experts (peer review). Nevertheless, in almost 40 per cent of the cases the original result did not reappear — in most cases, nothing came out at all …

Even when similar effects did show up on repeating the experiments, they were noticeably smaller than in the original, on average only three-quarters as large. If the non-replicable studies are included, the average effect of all the replications even shrinks to half. That is why the research critic John Ioannidis says: “When you read an article about a social-science experiment in Nature or Science, you have to cut the effect in half right away.”

Die Zeit

Some common misunderstandings about randomization

28 August, 2018 at 14:22 | Posted in Statistics & Econometrics | Comments Off on Some common misunderstandings about randomization

Randomization is an alternative when we do not know enough to control, but is generally inferior to good control when we do. We suspect that at least some of the popular and professional enthusiasm for RCTs, as well as the belief that they are precise by construction, comes from misunderstandings about … random or realized confounding on the one hand and confounding in expectation on the other …

The RCT strategy is only successful if we are happy with estimates that are arbitrarily far from the truth, just so long as the errors cancel out over a series of imaginary experiments. In reality, the causality that is being attributed to the treatment might, in fact, be coming from an imbalance in some other cause in our particular trial; limiting this requires serious thought about possible covariates.

Angus Deaton & Nancy Cartwright

The point of making a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and effect indicates a causal relation. This is believed to hold since randomization (allegedly) ensures that a supposed causal variable does not correlate with other variables that may influence the effect.

The problem with that simplistic view on randomization is that the claims made are both exaggerated and false:

• Even if you manage to do the assignment to treatment and control groups in an ideally random way, the sample selection certainly is — except in extremely rare cases — not random. Even if we make a proper randomized assignment, if we apply the results to a biased sample, there is always the risk that the experimental findings will not apply. What works ‘there’ does not work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ making the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization per se does not guarantee anything!

• Even if both sampling and assignment are made in an ideal random way, performing standard randomized experiments only gives you averages. The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer of the average causal effect being 0, those who are ‘treated’ may have causal effects equal to -100 and those ‘not treated’ may have causal effects equal to 100 (see the sketch after this list). Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the average effect particularly enlightening.

• There is almost always a trade-off between bias and precision. In real-world settings, a little bias is often a price worth paying for greater precision. And — most importantly — in case we have a population with sizeable heterogeneity, the average treatment effect of the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

• Since most real-world experiments and trials build on performing a single randomization, what would happen if you kept on randomizing forever does not help you ‘ensure’ or ‘guarantee’ that you do not draw false causal conclusions in the one particular randomized experiment you actually do perform. It is indeed difficult to see why thinking about what you know you will never do would make you happy about what you actually do.
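A minimal simulation sketch in Python of the second point above, with made-up numbers: an ideally randomized experiment in which the estimated average treatment effect is close to zero even though every individual effect is either +100 or -100:

# Python sketch: a zero average treatment effect can mask large heterogeneous effects
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# two equally large sub-populations with opposite individual causal effects
individual_effect = np.where(rng.random(n) < 0.5, 100.0, -100.0)

treated = rng.random(n) < 0.5                              # ideal randomization
outcome = 50 + individual_effect * treated + rng.normal(0, 1, n)

ate = outcome[treated].mean() - outcome[~treated].mean()
print(round(ate, 1))                    # close to 0, the 'right' average answer ...
print(np.unique(individual_effect))     # ... masking individual effects of -100 and +100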

Randomization is not a panacea. It is not the best method for all questions and circumstances. Proponents of randomization make claims about its ability to deliver causal knowledge that are simply wrong. There are good reasons to be sceptical of the now popular — and ill-informed — view that randomization is the only valid and best method on the market. It is not.

Tractability, truth, and ignorability

25 August, 2018 at 15:37 | Posted in Statistics & Econometrics | 1 Comment

Most attempts at causal inference in observational studies are based on assumptions that treatment assignment is ignorable. Such assumptions are usually made casually, largely because they justify the use of available statistical methods and not because they are truly believed.

Marshall Joffe et al.

An interesting (but from a technical point of view rather demanding) article on a highly questionable assumption used in ‘potential outcome’ causal models. It made yours truly come to think of how tractability has come to override reality and truth also in modern mainstream economics.

Having a ‘tractable’ model is of course great, since it usually means that you can solve it. But using ‘simplifying’ tractability assumptions (rational expectations, common knowledge, representative agents, linearity, additivity, ergodicity, etc.) because the models otherwise cannot be ‘manipulated’ or made to deliver ‘rigorous’ and ‘precise’ predictions and explanations does not exempt economists from having to justify their modelling choices. Being able to ‘manipulate’ things in models cannot per se be enough to warrant a methodological choice. If economists do not think their tractability assumptions make for good and realist models, it is certainly a just question to ask for clarification of the ultimate goal of the whole modelling endeavour.

Take for example the ongoing discussion on rational expectations as a modelling assumption. Those who want to build macroeconomics on microfoundations usually maintain that the only robust policies are those based on rational expectations and representative actors models. As yours truly has tried to show in On the use and misuse of theories and models in mainstream economics there is really no support for this conviction at all. If microfounded macroeconomics has nothing to say about the real world and the economic problems out there, why should we care about it? The final court of appeal for macroeconomic models is not whether we — once we have made our tractability assumptions — can ‘manipulate’ them, but the real world. And as long as no convincing justification is put forward for how the inferential bridging de facto is made, macroeconomic model-building is little more than hand-waving that gives us rather little warrant for making inductive inferences from models to real-world target systems. If substantive questions about the real world are being posed, it is the formalistic-mathematical representations utilized to analyze them that have to match reality, not the other way around.

Berkson’s paradox or why attractive people you date tend to be jerks

24 August, 2018 at 15:16 | Posted in Statistics & Econometrics | 4 Comments

Have you ever noticed that, among the people you date, the attractive ones tend to be jerks? Instead of constructing elaborate psychosocial theories, consider a simpler explanation. Your choice of people to date depends on two factors, attractiveness and personality. You’ll take a chance on dating a mean attractive person or a nice unattractive person, and certainly a nice attractive person, but not a mean unattractive person … This creates a spurious negative correlation between attractiveness and personality. The sad truth is that unattractive people are just as mean as attractive people — but you’ll never realize it, because you’ll never date somebody who is both mean and unattractive.
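A minimal simulation sketch in Python of this selection effect: attractiveness and personality are generated independently, but conditioning on the dating rule described above induces a negative correlation among the people you actually date:

# Python sketch: Berkson's paradox, a spurious correlation created by selection
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

attractiveness = rng.normal(0, 1, n)
personality    = rng.normal(0, 1, n)    # independent of attractiveness by construction

# you date someone who is attractive or nice, but never someone who is both mean and unattractive
dated = (attractiveness > 0) | (personality > 0)

print(round(np.corrcoef(attractiveness, personality)[0, 1], 2))                 # about 0.0 in the whole population
print(round(np.corrcoef(attractiveness[dated], personality[dated])[0, 1], 2))   # about -0.3 among the people you date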

Wage discrimination and the dangers of ‘controlling for’ confounders

24 August, 2018 at 09:08 | Posted in Economics, Statistics & Econometrics | Comments Off on Wage discrimination and the dangers of ‘controlling for’ confounders

You see it all the time in studies. “We controlled for…” And then the list starts. The longer the better. Income. Age. Race. Religion. Height. Hair color. Sexual preference. Crossfit attendance. Love of parents. Coke or Pepsi. The more things you can control for, the stronger your study is — or, at least, the stronger your study seems. Controls give the feeling of specificity, of precision. But sometimes, you can control for too much. Sometimes you end up controlling for the thing you’re trying to measure …

The problem with controls is that it’s often hard to tell the difference between a variable that’s obscuring the thing you’re studying and a variable that is the thing you’re studying.

An example is research around the gender wage gap, which tries to control for so many things that it ends up controlling for the thing it’s trying to measure. As my colleague Matt Yglesias wrote:

“The commonly cited statistic that American women suffer from a 23 percent wage gap through which they make just 77 cents for every dollar a man earns is much too simplistic. On the other hand, the frequently heard conservative counterargument that we should subject this raw wage gap to a massive list of statistical controls until it nearly vanishes is an enormous oversimplification in the opposite direction. After all, for many purposes gender is itself a standard demographic control to add to studies — and when you control for gender the wage gap disappears entirely! The question to ask about the various statistical controls that can be applied to shrink the gender gap is what are they actually telling us. The answer, I think, is that it’s telling how the wage gap works.”

Take hours worked, which is a standard control in some of the more sophisticated wage gap studies. Women tend to work fewer hours than men. If you control for hours worked, then some of the gender wage gap vanishes. As Yglesias wrote, it’s “silly to act like this is just some crazy coincidence. Women work shorter hours because as a society we hold women to a higher standard of housekeeping, and because they tend to be assigned the bulk of childcare responsibilities.”

Controlling for hours worked, in other words, is at least partly controlling for how gender works in our society. It’s controlling for the thing that you’re trying to isolate.

Ezra Klein

The gender pay gap is a fact that, sad to say, to a non-negligible extent is the result of discrimination. And even though many women are not deliberately discriminated against, but rather self-select into lower-wage jobs, this in no way magically explains away the discrimination gap. As decades of socialization research have shown, women may be ‘structural’ victims of impersonal social mechanisms that in different ways aggrieve them. Wage discrimination is unacceptable. Wage discrimination is a shame.

Why most published research findings are false

19 August, 2018 at 10:45 | Posted in Statistics & Econometrics | Comments Off on Why most published research findings are false

Instead of chasing statistical significance, we should improve our understanding of the range of R values — the pre-study odds — where research efforts operate. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained … Large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test.

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections, usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs.

John P. A. Ioannidis
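The role of the pre-study odds R can be made concrete with the positive predictive value formula from Ioannidis's 2005 paper, PPV = (1 − β)R / (R − βR + α). A small sketch in Python (the values of α, β and R below are illustrative only):

# Python sketch: probability that a 'significant' finding is true, as a function of the pre-study odds R
def ppv(R, alpha=0.05, beta=0.20):
    # PPV = (1 - beta) * R / (R - beta * R + alpha), from Ioannidis (2005)
    return (1 - beta) * R / (R - beta * R + alpha)

for R in (1.0, 0.5, 0.1, 0.01):   # from even pre-study odds down to exploratory, low-odds research
    print(R, round(ppv(R), 2))    # 0.94, 0.89, 0.62 and 0.14 respectively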

Collider bias (wonkish)

8 August, 2018 at 11:51 | Posted in Statistics & Econometrics | Comments Off on Collider bias (wonkish)

 

Why data is NOT enough to answer scientific questions

7 August, 2018 at 10:50 | Posted in Statistics & Econometrics | 3 Comments

Ironically, the need for a theory of causation began to surface at the same time that statistics came into being. In fact modern statistics hatched out of the causal questions that Galton and Pearson asked about heredity and out of their ingenious attempts to answer them from cross-generation data. Unfortunately, they failed in this endeavor and, rather than pause to ask “Why?”, they declared those questions off limits, and turned to develop a thriving, causality-free enterprise called statistics.

This was a critical moment in the history of science. The opportunity to equip causal questions with a language of their own came very close to being realized, but was squandered. In the following years, these questions were declared unscientific and went underground. Despite heroic efforts by the geneticist Sewall Wright (1889-1988), causal vocabulary was virtually prohibited for more than half a century. And when you prohibit speech, you prohibit thought, and you stifle principles, methods, and tools.

Readers do not have to be scientists to witness this prohibition. In Statistics 101, every student learns to chant: “Correlation is not causation.” With good reason! The rooster crow is highly correlated with the sunrise, yet it does not cause the sunrise.

Unfortunately, statistics took this common-sense observation and turned it into a fetish. It tells us that correlation is not causation, but it does not tell us what causation is. In vain will you search the index of a statistics textbook for an entry on “cause.” Students are never allowed to say that X is the cause of Y — only that X and Y are related or associated.

A popular idea in quantitative social sciences is to think of a cause (C) as something that increases the probability of its effect or outcome (O). That is:

P(O|C) > P(O|-C)

However, as is also well-known, a correlation between two variables, say A and B, does not necessarily imply that one is a cause of the other, or the other way around, since they may both be an effect of a common cause, C.

In statistics and econometrics, we usually solve this confounder problem by controlling for C, i.e. by holding C fixed. This means that we actually look at different populations – those in which C occurs in every case, and those in which C doesn’t occur at all. This means that knowing the value of A does not influence the probability of C [P(C|A) = P(C)]. So if there then still exists a correlation between A and B in either of these populations, there has to be some other cause operating. But if all other possible causes have been controlled for too, and there is still a correlation between A and B, we may safely conclude that A is a cause of B, since by controlling for all other possible causes, the correlation between the putative cause A and all the other possible causes (D, E, F …) is broken.
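A minimal simulation sketch in Python of the logic just described: A and B are both effects of a common cause C and are therefore correlated overall, but within populations where C is held fixed the correlation disappears:

# Python sketch: a spurious A-B correlation produced by a common cause C, and broken by holding C fixed
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

C = rng.random(n) < 0.5                    # common cause, present or absent
A = 0.8 * C + rng.normal(0, 1, n)          # A is an effect of C, not of B
B = 0.8 * C + rng.normal(0, 1, n)          # B is an effect of C, not of A

print(round(np.corrcoef(A, B)[0, 1], 2))           # about 0.14: A and B are correlated overall ...
print(round(np.corrcoef(A[C], B[C])[0, 1], 2))     # ... about 0.0 in the population where C occurs in every case
print(round(np.corrcoef(A[~C], B[~C])[0, 1], 2))   # ... and about 0.0 where C does not occur at all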

This is, of course, a very demanding prerequisite, since we may never actually be sure to have identified all putative causes. Even in scientific experiments, the number of uncontrolled causes may be innumerable. Since nothing less will do, we all understand how hard it is to actually get from correlation to causality. This also means that only relying on statistics or econometrics is not enough to deduce causes from correlations.

Some people think that randomization may solve the empirical problem. By randomizing we are getting different populations that are homogeneous with regard to all variables except the one we think is a genuine cause. In that way, we are supposedly able to avoid having to know what all these other factors actually are.

If you succeed in performing an ideal randomization with different treatment and control groups, that is attainable. But it presupposes that you really have been able to establish — and not just assume — that all causes other than the putative one (A) have the same probability distribution in the treatment and control groups, and that the probability of assignment to treatment or control groups is independent of all other possible causal variables.

Unfortunately, real experiments and real randomizations seldom or never achieve this. So, yes, we may do without knowing all causes, but it takes ideal experiments and ideal randomizations to do that, not real ones.

That means that in practice we do have to have sufficient background knowledge to deduce causal knowledge. Without old knowledge, we can’t get new knowledge, and — no causes in, no causes out.

Econometrics is basically a deductive method. Given the assumptions (such as manipulability, transitivity, Reichenbach probability principles, separability, additivity, linearity, etc., etc.) it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Real target systems are seldom epistemically isomorphic to axiomatic-deductive models/systems, and even if they were, we still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by statistical/econometric procedures may be valid in closed models, but what we usually are interested in is causal evidence in the real target system we happen to live in.

Advocates of econometrics want to have deductively automated answers to fundamental causal questions. But to apply ‘thin’ methods we have to have ‘thick’ background knowledge of what’s going on in the real world, and not in idealized models. Conclusions can only be as certain as their premises — and that also applies to the quest for causality in econometrics.

The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many — falsely — think that they can get away with analysing real-world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.

Clever data-mining tricks are never enough to answer important scientific questions. Theory matters.

The connection between musical taste and intelligence

30 July, 2018 at 16:26 | Posted in Statistics & Econometrics | 3 Comments

 

Regression analysis — a case of wishful thinking

13 July, 2018 at 18:34 | Posted in Statistics & Econometrics | Comments Off on Regression analysis — a case of wishful thinking

The impossibility of proper specification is true generally in regression analyses across the social sciences, whether we are looking at the factors affecting occupational status, voting behavior, etc. The problem is that as implied by the conditions for regression analyses to yield accurate, unbiased estimates, you need to investigate a phenomenon that has underlying mathematical regularities – and, moreover, you need to know what they are. Neither seems true. I have no reason to believe that the way in which multiple factors affect earnings, student achievement, and GNP have some underlying mathematical regularity across individuals or countries. More likely, each individual or country has a different function, and one that changes over time. Even if there was some constancy, the processes are so complex that we have no idea of what the function looks like.

Researchers recognize that they do not know the true function and seem to treat, usually implicitly, their results as a good-enough approximation. But there is no basis for the belief that the results of what is run in practice are anything close to the underlying phenomenon, even if there is an underlying phenomenon. This just seems to be wishful thinking. Most regression analysis research doesn’t even pay lip service to theoretical regularities. But you can’t just regress anything you want and expect the results to approximate reality. And even when researchers take somewhat seriously the need to have an underlying theoretical framework – as they have, at least to some extent, in the examples of studies of earnings, educational achievement, and GNP that I have used to illustrate my argument – they are so far from the conditions necessary for proper specification that one can have no confidence in the validity of the results.

Steven J. Klees

The theoretical conditions that have to be fulfilled for regression analysis and econometrics to really work are nowhere even closely met in reality. Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science and economics. Although regression analysis and econometrics have become the most used quantitative methods in social sciences and economics today, it’s still a fact that the inferences made from them are — strictly seen — invalid.

The main reason why almost all econometric models are wrong

13 July, 2018 at 09:33 | Posted in Statistics & Econometrics | 3 Comments

How come econometrics and statistical regression analyses still have not taken us very far in discovering, understanding, or explaining causation in socio-economic contexts? That is the question yours truly has tried to answer in an article published in the latest issue of World Economic Association Commentaries:

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot be maintained that it even should be mandatory to treat observations and data — whether cross-section, time series or panel data — as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes — at least outside of nomological machines like dice and roulette-wheels — are not self-evidently best modelled with probability measures.

When economists and econometricians — often uncritically and without arguments — simply assume that one can apply probability distributions from statistical theory to their own area of research, they are really skating on thin ice. If you cannot show that the data satisfy all the conditions of the probabilistic nomological machine, then the statistical inferences made in mainstream economics lack sound foundations.

Statistical — and econometric — patterns should never be seen as anything other than possible clues to follow. Behind observable data, there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.

What instrumental variables analysis is all about

9 July, 2018 at 18:11 | Posted in Statistics & Econometrics | 2 Comments

 

The randomistas revolution

7 July, 2018 at 10:54 | Posted in Statistics & Econometrics | 2 Comments

In his new history of experimental social science — Randomistas: How radical researchers are changing our world — Andrew Leigh gives an introduction to the RCT (randomized controlled trial) method for conducting experiments in medicine, psychology, development economics, and policy evaluation. Although he mentions that critiques can be levelled against the method, the author does not let that overshadow his overwhelmingly enthusiastic view of RCTs.

Among mainstream economists, this uncritical attitude towards RCTs has become standard. Nowadays many mainstream economists maintain that ‘imaginative empirical methods’ — such as natural experiments, field experiments, lab experiments, RCTs — can help us to answer questions concerning the external validity of economic models. In their view, they are more or less tests of ‘an underlying economic model’ and enable economists to make the right selection from the ever-expanding ‘collection of potentially applicable models.’

When looked at carefully, however, there are in fact few real reasons to share this optimism about the alleged ‘empirical turn’ in economics.

If we see experiments or field studies as theory tests or models that ultimately aspire to say something about the real ‘target system,’ then the problem of external validity is central (and was for a long time also a key reason why behavioural economists had trouble getting their research results published).

Assume that you have examined how the performance of a group of people (A) is affected by a specific ‘treatment’ (B). How can we extrapolate/generalize to new samples outside the original population? How do we know that any replication attempt ‘succeeds’? How do we know when these replicated experimental results can be said to justify inferences made in samples from the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in doing an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original? Unless we can give some really good argument for this being the case, inferences built on P(A|B) do not really say anything about the target system’s P′(A|B).

External validity/extrapolation/generalization is founded on the assumption that we can make inferences based on P(A|B) that are exportable to other populations for which P′(A|B) applies. Sure, if one can convincingly show that P and P′ are similar enough, the problems are perhaps surmountable. But arbitrarily introducing functional specification restrictions of the type invariance/stability/homogeneity is, at least for an epistemological realist, far from satisfactory. And often it is – unfortunately – exactly this that I see when I look at mainstream economists’ RCTs and ‘experiments.’
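
A minimal simulation sketch (in Python, hypothetical numbers, not from any actual RCT) of what is at stake: the treatment-outcome relation estimated in the study population says nothing, by itself, about a target population whose P′(A|B) happens to differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Study population: 'treatment' B raises the probability of outcome A.
b_study = rng.integers(0, 2, n)
a_study = rng.random(n) < np.where(b_study == 1, 0.60, 0.40)

# Target population: same treatment, different structure -- the effect is gone.
b_target = rng.integers(0, 2, n)
a_target = rng.random(n) < np.where(b_target == 1, 0.41, 0.40)

est_study = a_study[b_study == 1].mean() - a_study[b_study == 0].mean()
est_target = a_target[b_target == 1].mean() - a_target[b_target == 0].mean()

print(f"estimated 'effect' in the study population: {est_study:.2f}")   # about 0.20
print(f"actual 'effect' in the target population:   {est_target:.2f}")  # about 0.01
# Nothing in the study data itself tells us which of these two worlds the
# target system belongs to -- that requires an argument about P versus P'.
```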

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific labs/experiments/fields to specific real-world situations/institutions/structures that we are interested in understanding or (causally) explaining. And then the population problem is more difficult to tackle.

In randomized trials the researchers try to find out the causal effects that different variables of interest may have by changing circumstances randomly — a procedure somewhat (‘on average’) equivalent to the usual ceteris paribus assumption.

Besides the fact that ‘on average’ is not always ‘good enough,’ it amounts to nothing but hand waving to simpliciter assume, without argumentation, that it is tenable to treat social agents and relations as homogeneous and interchangeable entities.

Randomization is used to basically allow the econometrician to treat the population as consisting of interchangeable and homogeneous groups (‘treatment’ and ‘control’). The regression models one arrives at by using randomized trials tell us the average effect that variations in variable X has on the outcome variable Y, without having to explicitly control for effects of other explanatory variables R, S, T, etc., etc. Everything is assumed to be essentially equal except the values taken by variable X.

In a usual regression context one would apply an ordinary least squares estimator (OLS) in trying to get an unbiased and consistent estimate:

Y = α + βX + ε,

where α is a constant intercept, β a constant ‘structural’ causal effect and ε an error term.

The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer that the average causal effect is 0, those who are ‘treated’ (X = 1) may have causal effects equal to −100 and those ‘not treated’ (X = 0) may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the OLS average effect particularly enlightening.
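
A minimal simulation sketch of the same point, in a slightly different but standard setup with purely hypothetical numbers: half the population would gain 100 from treatment and half would lose 100, and OLS on the randomized data dutifully reports an average effect of about zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Heterogeneous individual effects: half would gain 100, half would lose 100.
effect = np.where(rng.random(n) < 0.5, 100.0, -100.0)
x = rng.integers(0, 2, n)                               # randomized treatment
y = 50.0 + effect * x + rng.normal(scale=5.0, size=n)   # observed outcome

# OLS on Y = alpha + beta*X + eps
X = np.column_stack([np.ones(n), x])
alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"OLS average 'causal effect' beta: {beta_hat:.1f}")           # close to 0
print(f"share strongly helped: {np.mean(effect > 0):.2f}, "
      f"share strongly harmed: {np.mean(effect < 0):.2f}")
# The average is 'right', but it hides that the same treatment helps half
# the population a great deal and harms the other half just as much.
```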

Most ‘randomistas’ underestimate the heterogeneity problem. It does not just turn up as an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal problem to the millions of regression estimates that economists produce every year.

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

RCTs have very little reach beyond giving descriptions of what has happened in the past. From the perspective of the future and for policy purposes they are as a rule of limited value since they cannot tell us what background factors were held constant when the trial intervention was being made.

RCTs usually do not provide evidence that the results are exportable to other target systems. RCTs cannot be taken for granted to give generalizable results. That something works somewhere for someone is no warrant for believing that it will work for us here, or that it works generally. RCTs are simply not the best method for all questions and in all circumstances. And insisting on using only one tool often means using the wrong tool.
