## The most dangerous equation in the world

15 December, 2018 at 17:12 | Posted in Statistics & Econometrics | Leave a comment

Failure to take sample size into account and inferring causality from outliers can lead to incorrect policy actions. For this reason, Howard Wainer refers to the formula for the standard deviation of the mean as the "most dangerous equation in the world." For example, in the 1990s the Gates Foundation and other nonprofits advocated breaking up schools based on evidence that the best schools were small. To see the flawed reasoning, imagine that schools come in two sizes — small schools with 100 students and large schools with 1600 students — and that student scores at both types of schools are drawn from the same distribution with a mean score of 100 and a standard deviation of 80. At small schools, the standard deviation of the mean equals 8 (80/√100). At large schools, it equals 2 (80/√1600).

If we assign the label ‘high-performing’ to schools with means above 110 and the label ‘exceptional’ to schools with means above 120, then only small schools will meet either threshold. For the small schools, an average score of 110 is 1.25 standard deviations above the mean; such events occur about 10% of the time. A mean score of 120 is 2.5 standard deviations above the mean … When we do these same calculations for large schools, we find that the ‘high-performing’ threshold lies 5 standard deviations above the mean and the ‘exceptional’ threshold lies 10 standard deviations above the mean. Such events would, in practice, never occur. Thus, the fact that the very best schools are smaller is not evidence that smaller schools perform better.
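The arithmetic is easy to check in simulation. The sketch below is my own illustration (the count of 1,000 schools per size class is invented); it draws school means under the stated score distribution and counts how many clear the 'high-performing' threshold:

```python
import numpy as np

rng = np.random.default_rng(42)
n_schools = 1_000  # invented number of schools per size class

# Same score distribution everywhere: mean 100, sd 80.
small = rng.normal(100, 80, size=(n_schools, 100)).mean(axis=1)   # SE = 80/sqrt(100) = 8
large = rng.normal(100, 80, size=(n_schools, 1600)).mean(axis=1)  # SE = 80/sqrt(1600) = 2

# 'High-performing' label: school mean above 110.
print("small schools above 110:", (small > 110).sum())  # roughly 10% of 1000
print("large schools above 110:", (large > 110).sum())  # essentially none
```

Every school draws from the identical distribution, yet only small schools ever look 'exceptional', purely because their means are noisier.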

## Econometrics — analysis with incredible certitude

10 December, 2018 at 18:41 | Posted in Statistics & Econometrics | Leave a comment

There have been over four decades of econometric research on business cycles …

But the significance of the formalization becomes more difficult to identify when it is assessed from the applied perspective …

The wide conviction of the superiority of the methods of the science has converted the econometric community largely to a group of fundamentalist guards of mathematical rigour … So much so that the relevance of the research to business cycles is reduced to empirical illustrations. To that extent, probabilistic formalisation has trapped econometric business cycle research in the pursuit of means at the expense of ends.

The limits of econometric forecasting have, as noted by Qin, been critically pointed out many times before. Trygve Haavelmo assessed the role of econometrics — in an article from 1958 — and although mainly positive about the "repair work" and "clearing-up work" done, Haavelmo also found some grounds for despair:

There is the possibility that the more stringent methods we have been striving to develop have actually opened our eyes to recognize a plain fact: viz., that the “laws” of economics are not very accurate in the sense of a close fit, and that we have been living in a dream-world of large but somewhat superficial or spurious correlations.

Maintaining that economics is a science in the ‘true knowledge’ business, I remain a sceptic of the pretences and aspirations of econometrics. The marginal return on its ever higher technical sophistication in no way makes up for the lack of serious under-labouring of its deeper philosophical and methodological foundations that already Keynes complained about. The rather one-sided emphasis on usefulness and its concomitant instrumentalist justification cannot hide the fact that the legions of probabilistic econometricians who give supportive evidence for considering it ‘fruitful to believe’ in the possibility of treating unique economic data as the observable results of random drawings from an imaginary sampling of an imaginary population are skating on thin ice.

A rigorous application of econometric methods in economics really presupposes that the phenomena of our real world economies are ruled by stable causal relations between variables. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is an incredible hope for which there, really, is no other ground than hope itself.

## What RCTs can and cannot tell us

8 December, 2018 at 18:10 | Posted in Statistics & Econometrics | Leave a comment

We seek to promote an approach to RCTs that is tentative in its claims and that avoids simplistic generalisations about causality and replaces these with more nuanced and grounded accounts that acknowledge uncertainty, plausibility and statistical probability …

Whilst promoting the use of RCTs in education we also need to be acutely aware of their limitations … Whilst the strength of an RCT rests on strong internal validity, the Achilles heel of the RCT is external validity … Within education and the social sciences a range of cultural conditions is likely to influence the external validity of trial results across different contexts. It is precisely for this reason that qualitative components of an evaluation, and particularly the development of plausible accounts of generative mechanisms are so important …

Highly recommended reading.

## ‘Controlling for’ — a methodological urban legend

6 December, 2018 at 18:42 | Posted in Statistics & Econometrics | 9 Comments

Trying to reduce the risk of having established only ‘spurious relations’ when dealing with observational data, statisticians and econometricians standardly add control variables. The hope is that one thereby will be able to make more reliable causal inferences. But — as Keynes showed already back in the 1930s when criticizing statistical-econometric applications of regression analysis — if you do not manage to get hold of *all* potential confounding factors, the model risks producing estimates of the variable of interest that are even worse than models without any control variables at all. Conclusion: think twice before you simply include ‘control variables’ in your models!

The gender pay gap is a fact that, sad to say, to a non-negligible extent is the result of discrimination. And even though many women are not deliberately discriminated against, but rather self-select into lower-wage jobs, this in no way magically explains away the discrimination gap. As decades of socialization research have shown, women may be ‘structural’ victims of impersonal social mechanisms that in different ways aggrieve them. Wage discrimination is unacceptable. Wage discrimination is a shame.

You see it all the time in studies. “We controlled for…” An example is research around the gender wage gap, which tries to control for so many things that it ends up controlling for the thing it’s trying to measure. Take hours worked, which is a standard control in some of the more sophisticated wage gap studies. Women tend to work fewer hours than men. If you control for hours worked, then some of the gender wage gap vanishes. As my colleague Matt Yglesias wrote, it’s “silly to act like this is just some crazy coincidence. Women work shorter hours because as a society we hold women to a higher standard of housekeeping, and because they tend to be assigned the bulk of childcare responsibilities.”

Controlling for hours worked, in other words, is at least partly controlling for how gender works in our society. It’s controlling for the thing that you’re trying to isolate.
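A small simulation makes this concrete. All numbers below are invented for illustration (a direct wage penalty of 2 and a gendered hours channel of 5 fewer hours at 0.5 per hour); the mechanism is the one described above, where gender affects wages partly through hours worked:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Invented DGP: gender lowers wages directly (-2) and also lowers hours (-5),
# while hours raise wages (0.5 per hour). Total gap: -2 + 0.5 * (-5) = -4.5.
female = rng.integers(0, 2, n).astype(float)
hours = 40 - 5 * female + rng.normal(0, 3, n)
wage = 20 + 0.5 * hours - 2 * female + rng.normal(0, 2, n)

def ols(y, *xs):
    # OLS via least squares: column of ones for the intercept, then regressors.
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

gap_total = ols(wage, female)[1]              # the whole gap, about -4.5
gap_controlled = ols(wage, female, hours)[1]  # about -2: the hours channel is 'controlled away'
print(gap_total, gap_controlled)
```

'Controlling for' hours makes more than half of the gap vanish, not because it was not there, but because the control variable absorbs one of the mechanisms through which it operates.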

## The Model Thinker

1 December, 2018 at 16:51 | Posted in Statistics & Econometrics | Leave a comment

Scott Page’s new book is a great introduction to how to use and evaluate different kinds of mathematical models in the social sciences. Yours truly will be back soon for a lengthy review, but let me just note that — as I have over and over again emphasized on this blog — Page underscores that if we want to explain social phenomena, relations and structures, we have to go beyond data. Data do not speak for themselves. Without theory and a search for mechanisms and deep causality, we won’t be able to understand or explain what happens in the real world.

## Unbiased estimates? Forget it!

30 November, 2018 at 07:28 | Posted in Statistics & Econometrics | 1 Comment

In realistic settings, unbiased estimates simply don’t exist. In the real world we have nonrandom samples, measurement error, nonadditivity, nonlinearity, etc etc etc.

So forget about it. We’re living in the real world …

It’s my impression that many practitioners in applied econometrics and statistics think of their estimation choice kinda like this:

1. The unbiased estimate. It’s the safe choice, maybe a bit boring and maybe not the most efficient use of the data, but you can trust it and it gets the job done.

2. A biased estimate. Something flashy, maybe Bayesian, maybe not, it might do better but it’s risky. In using the biased estimate, you’re stepping off base—the more the bias, the larger your lead—and you might well get picked off …

If you take the choice above and combine it with the unofficial rule that statistical significance is taken as proof of correctness (in econ, this would also require demonstrating that the result holds under some alternative model specifications, but “p less than .05” is still key), then you get the following decision rule:

A. Go with the safe, unbiased estimate. If it’s statistically significant, run some robustness checks and, if the result doesn’t go away, stop.

B. If you don’t succeed with A, you can try something fancier. But . . . if you do that, everyone will know that you tried plan A and it didn’t work, so people won’t trust your finding … My point is that the unbiased estimate does not exist! There is no safe harbor. Just as we can never get our personal risks in life down to zero … there is no such thing as unbiasedness. And it’s a good thing, too: recognition of this point frees us to do better things with our data right away.

## Probability vs Maximum Likelihood (student stuff)

27 November, 2018 at 16:19 | Posted in Statistics & Econometrics | Leave a comment


## On manipulability and causation

26 November, 2018 at 16:08 | Posted in Statistics & Econometrics | 1 Comment

If contributions made by statisticians to the understanding of causation are to be taken over with advantage in any specific field of inquiry, then what is crucial is that the right relationship should exist between statistical and subject-matter concerns …

The idea of causation as consequential manipulation is apt to research that can be undertaken primarily through experimental methods and, especially to ‘practical science’ where the central concern is indeed with ‘the consequences of performing particular acts’. The development of this idea in the context of medical and agricultural research is as understandable as the development of that of causation as robust dependence within applied econometrics. However, the extension of the manipulative approach into sociology would not appear promising, other than in rather special circumstances … The more fundamental difficulty is that under the — highly anthropocentric — principle of ‘no causation without manipulation’, the recognition that can be given to the action of individuals as having causal force is in fact peculiarly limited.

Some statisticians and data scientists think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like faithfulness or stability is not to give proofs. It is to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived theorems and the real world, well, then we have not really obtained the causation we are looking for.

[**Added**: If you have not already read Goldthorpe’s article, you should. For every social scientist or economist interested in questions about causality, this modern minor classic is a must-read.]

## Why most published research is wrong

26 November, 2018 at 11:07 | Posted in Statistics & Econometrics | Leave a comment

After having mastered all the technicalities of regression analysis and econometrics, students often feel as though they are the masters of the universe. I usually cool them down with a required reading of **Christopher Achen**‘s modern classic *Interpreting and Using Regression*.

It usually gets them back on track again, and they understand that

no increase in methodological sophistication … will alter the fundamental nature of the subject. It remains a wondrous mixture of rigorous theory, experienced judgment, and inspired guesswork. And that, finally, is its charm.

And in case they get too excited about having learned to master the intricacies of proper significance tests and p-values, I ask them to also ponder on Achen’s warning:

Significance testing as a search for specification errors substitutes calculations for substantive thinking. Worse, it channels energy toward the hopeless search for functionally correct specifications and diverts attention from the real tasks, which are to formulate a manageable description of the data and to exclude competing ones.

## P-values are no substitute for thinking

21 November, 2018 at 22:18 | Posted in Statistics & Econometrics | Comments Off on P-values are no substitute for thinking

A non-trivial part of statistics education is made up of teaching students to perform significance testing. A problem I have noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests really are, still most students misinterpret them.

This is not to be blamed on students’ ignorance, but rather on significance testing not being particularly transparent (conditional probability inference is difficult even for those of us who teach and practice it). A lot of researchers fall prey to the same mistakes.

If anything, this underlines how important it is not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of significance testing is actually zero — even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept null hypotheses since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only, say, a 10% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

Most importantly — we should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean next to nothing if the model is wrong. Statistical significance tests DO NOT validate models!

In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p-1 degrees of freedom in the numerator and n-p in the denominator. The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.

If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this:

if the model is right and the coefficients are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:

i) An unlikely event occurred.

ii) Or the model is right and some of the coefficients differ from 0.

iii) Or the model is wrong.

So?
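Freedman's point is easy to demonstrate. In the sketch below (my own construction, with invented numbers), the fitted model is wrong by design, the truth is quadratic while the model is linear, yet the F-test comes out wildly 'significant':

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 200

# Deliberately misspecified: the truth is quadratic, the fitted model is linear.
x = rng.uniform(0, 3, n)
y = x**2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])        # linear model: intercept + slope
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)           # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
p = X.shape[1]
F = ((tss - rss) / (p - 1)) / (rss / (n - p))
p_value = stats.f.sf(F, p - 1, n - p)

print(f"F = {F:.1f}, p = {p_value:.3g}")    # highly 'significant', model still wrong
```

The F-test takes the linear specification as given and only asks whether the slope is zero. It cannot see the curvature it was never told to look for, so a tiny p-value here 'validates' nothing.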

## In search of causality

14 November, 2018 at 14:41 | Posted in Statistics & Econometrics | 3 Comments

One of the few statisticians that yours truly has on the blogroll is **Andrew Gelman**. Although not sharing his Bayesian leanings, I find his open-minded, thought-provoking and non-dogmatic statistical thinking highly recommendable. The plaidoyer below for ‘reverse causal questioning’ is typical Gelmanian:

When statistical and econometric methodologists write about causal inference, they generally focus on forward causal questions. We are taught to answer questions of the type “What if?”, rather than “Why?” Following the work by Rubin (1977) causal questions are typically framed in terms of manipulations: if x were changed by one unit, how much would y be expected to change? But reverse causal questions are important too … In many ways, it is the reverse causal questions that motivate the research, including experiments and observational studies, that we use to answer the forward questions …

Reverse causal reasoning is different; it involves asking questions and searching for new variables that might not yet even be in our model. We can frame reverse causal questions as model checking. It goes like this: what we see is some pattern in the world that needs an explanation. What does it mean to “need an explanation”? It means that existing explanations — the existing model of the phenomenon — does not do the job …

By formalizing reverse causal reasoning within the process of data analysis, we hope to make a step toward connecting our statistical reasoning to the ways that we naturally think and talk about causality. This is consistent with views such as Cartwright (2007) that causal inference in reality is more complex than is captured in any theory of inference … What we are really suggesting is a way of talking about reverse causal questions in a way that is complementary to, rather than outside of, the mainstream formalisms of statistics and econometrics.

In a time when scientific relativism is expanding, it is important to keep up the claim for not reducing science to a pure discursive level. We have to maintain the Enlightenment tradition of thinking of reality as principally independent of our views of it and of the main task of science as studying the structure of this reality. Perhaps the most important contribution a researcher can make is revealing what this reality that is the object of science actually looks like.

Science is made possible by the fact that there are structures that are durable and are independent of our knowledge or beliefs about them. There exists a reality beyond our theories and concepts of it. It is this independent reality that our theories in some way deal with. Contrary to positivism, I would as a critical realist argue that the main task of science is not to detect event-regularities between observed facts. Rather, that task must be conceived as identifying the underlying structures and forces that produce the observed events.

In Gelman’s essay there is no explicit argument for abduction — inference to the best explanation — but I would still argue that it is *de facto* nothing but a very strong argument for why scientific realism and inference to the best explanation are the best alternatives for explaining what is going on in the world we live in. The focus on causality, model checking, anomalies and context-dependence — although here expressed in statistical terms — is as close to abductive reasoning as we get in statistics and econometrics today.

## Causal models and heterogeneity (wonkish)

12 November, 2018 at 13:50 | Posted in Statistics & Econometrics | Comments Off on Causal models and heterogeneity (wonkish)

In *The Book of Why*, Judea Pearl advances several weighty reasons why the now so popular causal graph-theoretic approach is preferable to more traditional regression-based explanatory models. One of them is that causal graphs are non-parametric and therefore do not need to assume, for example, additivity and/or the absence of interaction effects — arrows and nodes replace the specifications of functional relations between the variables in the equations that regression analysis requires.

But even if Pearl and other proponents of graph theory mostly emphasize the advantages of the flexibility the new tool gives us, there are also clear risks and drawbacks in using causal graphs. The lack of clarity about whether additivity, interaction, or other variable and relational characteristics are present, and how in that case they are to be specified, can sometimes create more problems than it solves.

Many of the problems — just as with regression analysis — are bound up with the presence and degree of heterogeneity. Let me take an example from the field of education research to illustrate the point.

A question that both politicians and researchers have returned to in recent years (see e.g. here and here) is whether independent schools (Swedish ‘friskolor’) raise the level of knowledge and the test scores of the country’s pupils. To be able to answer this (in reality very difficult) causal question, we need knowledge of a host of known, observable variables and background factors (parents’ incomes and education, ethnicity, housing, etc., etc.). In addition, there are factors that we know matter but that are unobservable and/or more or less unmeasurable.

The problems begin already when we ask what lies behind the general term ‘independent school’. Not all independent schools are equivalent (homogeneity). We know that there are often large differences between them (heterogeneity). To lump them all together and try to answer the causal question without taking these differences into account is often pointless and sometimes also completely misleading.

Another problem is that a different kind of heterogeneity — one that has to do with the specification of the functional relations — may turn up. Suppose the independent-school effect is bound up with, say, ethnicity, and that pupils with a ‘Swedish background’ perform better than pupils with an ‘immigrant background.’ This does not necessarily mean that pupils with different ethnic backgrounds are in themselves affected differently by attending an independent school. The effect may instead derive, for example, from the fact that the alternative municipal schools available to the immigrant pupils were worse than those available to the ‘Swedish’ pupils. If one does not take these differences in the basis of comparison into account, the estimated independent-school effects become misleading.

Further heterogeneity problems arise if the mechanisms at work in producing the independent-school effect look substantially different for different groups of pupils. Independent schools with a ‘focus’ on immigrant groups may, for example, be more aware of the need to support these pupils and take compensatory measures to counteract prejudice and the like. Beyond the effects of the (presumed) better teaching at independent schools in general, the effects for this category of pupils are then also an effect of the heterogeneity in question, and will consequently not coincide with those for the other group of pupils.

Unfortunately, the problems do not end here. We are also confronted with a hard-to-solve and often overlooked selectivity problem. When we try to answer the causal question about the effects of independent schools, a common procedure in regression analysis is to ‘hold constant’ or ‘control for’ influencing factors beyond the ones we are primarily interested in. In the case of independent schools, a common control variable is the parents’ income or educational background. The logic is that we should thereby be able to simulate an (ideal) situation resembling as closely as possible a randomized experiment, where we only ‘compare’ (match) pupils whose parents have comparable education or income, and thus hope to obtain a better measure of the ‘pure’ independent-school effect. The crux is that within each income and education category there may lurk yet another — sometimes hidden and perhaps unmeasurable — heterogeneity, having to do with, for example, attitude and motivation, which makes some pupils tend to choose (select into) independent schools because they believe they will perform better there than in municipal schools (in the Swedish school-choice debate, a recurrent argument about the segregation effects is that pupils whose parents have high ‘socio-economic status’ have better access to information about the effects of school choice than other pupils). The income or education variable can thus *de facto* ‘mask’ other factors that may sometimes play a more decisive role. The estimates of the independent-school effect can therefore — again — be misleading, and sometimes even more misleading than if we had not ‘held constant’ any control variable at all (cf. the ‘second-best’ theorem in welfare economics)!

‘Controlling’ for possible ‘confounders’ is thus not always self-evidently the right way to go. If the very relation between independent school (X) and study results (Y) is affected by the introduction of the control variable ‘socio-economic status’ (W), this is probably the result of some kind of association between X and W. This also means that we do not have an ideal ‘experiment simulation’, since there obviously are factors that affect Y and that are not randomly distributed (randomized). Before we can proceed, we must then ask *why* the association in question obtains. To be able to causally explain the relation between X and Y, we must know more about how W affects the choice of X. Among other things, we may then find that there is a difference in the choice of X between different parts of the group with high ‘socio-economic status’ W. Without knowledge of this selection mechanism, we cannot reliably measure the effect of X on Y — the randomized explanatory model is simply not applicable. Without knowledge of why there is an association between X and W — and what it looks like — the ‘controlling’ does not help us, since it does not take the operative selection mechanism into account.

Beyond the problems touched on here, we have other long-familiar problems. The so-called context or group effect — for a pupil attending an independent school, the results may partly be an effect of her schoolmates having a similar background, so that she in some sense benefits from her surroundings in a way she would not in a municipal school — again means that ‘confounder’ elimination via control variables does not obviously work when there is an association between the control variable and hard-to-measure or unobservable attributes that themselves affect the dependent variable. In our school example, one may assume that parents of a given socio-economic status who send their children to independent schools differ from parents in the same group who choose to let their children attend municipal school. The control variables do not — once again — function as full substitutes for the randomized ‘assignment’ of a real experiment.
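The selection mechanism discussed above can be sketched numerically. In this illustrative setup (all numbers invented), an unobserved motivation variable drives both the choice of an independent school and the test result, the true school effect is zero, and 'controlling for' parental income does not rescue the estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Invented DGP: unobserved motivation drives both the choice of an independent
# school and the test result; the school itself has no effect at all.
income = rng.integers(0, 3, n).astype(float)    # observed, 'controlled for'
motivation = rng.normal(0, 1, n)                # unobserved
school = (0.3 * income + motivation + rng.normal(0, 1, n) > 0.5).astype(float)
result = 50 + 2 * income + 5 * motivation + rng.normal(0, 5, n)

def ols(y, *xs):
    # OLS via least squares: intercept column plus the listed regressors.
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# The true school effect is 0, but selection on motivation biases it upward,
# and controlling for income does nothing to remove that bias.
effect = ols(result, school, income)[1]
print("estimated 'effect' of independent school:", effect)
```

The regression confidently reports a large positive 'effect' of a treatment that, by construction, does nothing, because the control variable sits in the wrong place in the causal structure.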

Am I right in thinking that the method of multiple correlation analysis essentially depends on the economist having furnished, not merely a list of the significant causes, which is correct so far as it goes, but a complete list? For example, suppose three factors are taken into account, it is not enough that these should be in fact vera causa; there must be no other significant factor. If there is a further factor, not taken account of, then the method is not able to discover the relative quantitative importance of the first three. If so, this means that the method is only applicable where the economist is able to provide beforehand a correct and indubitably complete analysis of the significant factors. The method is one neither of discovery nor of criticism. It is a means of giving quantitative precision to what, in qualitative terms, we know already as the result of a complete theoretical analysis …

As regards the use of control variables, one must also not overlook an important aspect that is seldom touched upon by those who use the statistical methods in question. The variables included in the studies are treated ‘as if’ the relations between them in the population were random. But variables may in fact have the values they have precisely because they give rise to the consequences they do. The outcome thus to some extent determines why the ‘independent’ variables have the values they have. The ‘randomized’ independent variables turn out to be something other than what they are assumed to be, which also makes it impossible for observational studies and quasi-experiments to come even close to being real experiments. Things often look the way they do for a reason. Sometimes the reasons are precisely the consequences that rules, institutions and other factors are anticipated to give rise to! What is taken to be ‘exogenous’ is in fact not ‘exogenous’ at all.

Those variables that have been left outside of the causal system may not actually operate as assumed; they may produce effects that are nonrandom and that may become confounded with those of the variables directly under consideration.

So what conclusion do we draw from all this? Causality *is* difficult, and — despite the criticism — we should of course not throw out the baby with the bathwater. But adopting a healthy scepticism and caution when assessing and evaluating the capacity of statistical methods — whether causal graph theory or more traditional regression analysis — to really establish causal relations is definitely to be recommended.

## Good thinking — the thing statistics cannot replace

10 November, 2018 at 16:06 | Posted in Statistics & Econometrics | 4 Comments

As social researchers, we should never equate science with mathematics and statistical calculation. All science entails human judgement, and using mathematical and statistical models doesn’t relieve us of that necessity. They are no substitutes for thinking and doing real science.

Statistical — and econometric — patterns should never be seen as anything other than possible clues to follow. Behind observable data there are real structures and mechanisms operating, and if we really want to understand, explain and (possibly) predict things in the real world, getting hold of these is more important than simply correlating and regressing observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.

## Econometrics: The Keynes-Tinbergen controversy

8 November, 2018 at 13:18 | Posted in Statistics & Econometrics | 1 Comment

Mainstream economists often hold the view that Keynes’ criticism of econometrics was the product of a sadly misinformed and misguided person who disliked econometrics and did not understand much of it.

This is, however, nothing but a gross misapprehension.

To be careful and cautious is not the same as to dislike. Keynes did not misunderstand the crucial issues at stake in the development of econometrics. Quite the contrary. He knew them all too well — and was not satisfied with the validity and philosophical underpinning of the assumptions made for applying its methods.

Keynes’ critique is still valid and unanswered in the sense that the problems he pointed at are still with us today and ‘unsolved.’ Ignoring them — the most common practice among applied econometricians — is not to solve them.

To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong assumptions. In a review of Tinbergen’s econometric work — published in *The Economic Journal* in 1939 — Keynes gave a comprehensive critique of Tinbergen’s work, focusing on the limiting and unreal character of the assumptions that econometric analyses build on:

**Completeness**: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of *all* the relevant factors to avoid misspecification and spurious causal claims. Usually, this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes is, to put it mildly, unconvinced:

It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his *a priori*, that would make a difference to the outcome.

**Homogeneity**: To make inductive inferences possible — and to be able to apply econometrics — the system we try to analyse has to have a large degree of ‘homogeneity.’ According to Keynes most social and economic systems — especially from the perspective of real historical time — lack that ‘homogeneity.’ As he had argued already in *Treatise on Probability* (ch. 22), it wasn’t always possible to take repeated samples from a fixed population when analysing real-world economies. In many cases, there simply are no reasons at all to assume the samples to be homogeneous. Lack of ‘homogeneity’ makes the principle of ‘limited independent variety’ non-applicable, and hence makes inductive inferences, strictly seen, impossible, since one of their fundamental logical premises is not satisfied. Without “much repetition and uniformity in our experience” there is no justification for placing “great confidence” in our inductions (TP ch. 8).

And then, of course, there is also the ‘reverse’ variability problem of non-excitation: factors that do not change significantly during the period analysed can still very well be extremely important causal factors.

**Stability:** Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyse. But as Keynes had argued already in his *Treatise on Probability*, it is not really possible to make inductive generalisations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.

**Measurability:** Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions whether it is possible to adequately quantify and measure things like expectations and political and psychological factors. And more than anything, he questioned — both on epistemological and ontological grounds — whether it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, “only lead to error and delusion.”

**Independence**: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models on such simplistic and unrealistic assumptions risks producing nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic evolving organic systems — such as economies.

It is a great fault of symbolic pseudo-mathematical methods of formalising a system of economic analysis … that they expressly assume strict independence between the factors involved and lose all their cogency and authority if this hypothesis is disallowed; whereas, in ordinary discourse, where we are not blindly manipulating but know all the time what we are doing and what the words mean, we can keep “at the back of our heads” the necessary reserves and qualifications and the adjustments which we shall have to make later on, in a way in which we cannot keep complicated partial differentials “at the back” of several pages of algebra which assume that they all vanish.
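The ‘spurious correlations’ Keynes warned of are easy to manufacture. A minimal sketch (with purely illustrative parameters): generate pairs of completely independent random walks and look at their sample correlations. Despite the total absence of any connection between the series, the typical correlation is far from zero — the classic trap of regressing one trending series on another.

```python
import random

def walk(n):
    """Cumulative sum of i.i.d. Gaussian steps: a pure random walk."""
    s, path = 0.0, []
    for _ in range(n):
        s += random.gauss(0, 1)
        path.append(s)
    return path

def corr(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (sx * sy)

random.seed(3)
# 200 pairs of *independent* walks; their average |correlation| is nonetheless large
rs = [abs(corr(walk(500), walk(500))) for _ in range(200)]
print(round(sum(rs) / len(rs), 2))
```

No causal or statistical link exists between the series by construction; only the shared trending behaviour produces the correlation.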

Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.

The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be much less simple than the bare principle of uniformity. They appear to assume something much more like what mathematicians call the principle of the superposition of small effects, or, as I prefer to call it, in this connection, the atomic character of natural law. The system of the material universe must consist, if this kind of assumption is warranted, of bodies which we may term (without any implication as to their size being conveyed thereby) legal atoms, such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state … The scientist wishes, in fact, to assume that the occurrence of a phenomenon which has appeared as part of a more complex phenomenon, may be some reason for expecting it to be associated on another occasion with part of the same complex. Yet if different wholes were subject to laws *qua* wholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts.

**Linearity:** To make his models tractable, Tinbergen assumes the relationships between the variables he studies to be linear. This is still standard procedure today, but as Keynes writes:

It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.

To Keynes, it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).

The unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.

J. M. Keynes “Ethics in Relation to Conduct” (1903)
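Keynes’ point is trivially easy to check numerically. A sketch with an arbitrarily chosen concave ‘goodness’ function (the choice of square root is purely illustrative): the ‘goodness’ of the parts judged separately is not the ‘goodness’ of the whole.

```python
# An arbitrary concave 'goodness' function (purely illustrative)
f = lambda v: v ** 0.5

x, y = 4, 9
print(f(x) + f(y))  # 5.0: 'goodness' judged part by part
print(f(x + y))     # ~3.61: 'goodness' of the whole
```

Only for special functions (the linear ones) does f(x + y) = f(x) + f(y) hold, which is exactly why additivity needs proof rather than assumption.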

And as even one of the founding fathers of modern econometrics — Trygve Haavelmo — wrote:

What is the use of testing, say, the significance of regression coefficients, when maybe, the whole assumption of the linear regression equation is wrong?

Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established are laws and relations about entities in models that presuppose that causal mechanisms and variables — and the relationships between them — are linear, additive, homogeneous, stable, invariant and atomistic. But when causal mechanisms operate in the real world, they only do so in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since statisticians and econometricians — as far as I can see — haven’t been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence and additivity as being ontologically isomorphic to real-world economic systems, Keynes’ critique is still valid. As long as — as Keynes writes in a letter to Frisch in 1935 — “nothing emerges at the end which has not been introduced expressly or tacitly at the beginning,” I remain doubtful of the scientific aspirations of econometrics.

In his critique of Tinbergen, Keynes points us to the fundamental logical, epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyse repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.

Econometric modelling should never be a substitute for thinking. From that perspective, it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

Keynes, letter to E.J. Broster, December 19, 1939

## ‘Shoe-leather research’

1 November, 2018 at 21:08 | Posted in Statistics & Econometrics | Comments Off on ‘Shoe-leather research’

If anything, Snow’s path-breaking research underlines how important it is not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of statistics is actually zero — even though you’re making valid statistical inferences! Statistical models are no substitute for doing real science. Or as a famous German philosopher once wrote:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

We should never forget that the underlying parameters we use when performing statistical tests are *model constructions*. And if the model is wrong, the value of our calculations is nil. As ‘shoe-leather researcher’ David Freedman wrote in *Statistical Models and Causal Inference*:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.
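Freedman’s point is easy to demonstrate with a minimal sketch (all numbers are of course only illustrative). Fit a linear model to data that are in fact generated by a quadratic relation: the regression confidently reports a slope near zero with a small standard error, a perfectly ‘valid’ inference from a model that has missed the relationship entirely.

```python
import random

random.seed(42)
n = 1_000
xs = [random.uniform(-2, 2) for _ in range(n)]
ys = [x ** 2 + random.gauss(0, 0.3) for x in xs]  # true relation: quadratic

# OLS fit of the misspecified linear model y = a + b*x
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx
resid = [y - (a + b * x) for x, y in zip(xs, ys)]
se_b = (sum(r * r for r in resid) / (n - 2) / sxx) ** 0.5

# The estimated slope is near zero with a small standard error: the linear
# model 'validly' concludes x barely matters, although y is driven by x alone.
print(round(b, 2), round(se_b, 3))
```

The standard errors and significance tests are all computed correctly; it is the model, not the arithmetic, that makes the inference worthless.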

## Berkson’s fallacy (wonkish)

26 October, 2018 at 09:12 | Posted in Statistics & Econometrics | Comments Off on Berkson’s fallacy (wonkish)

## Econometrics and causality

23 October, 2018 at 15:53 | Posted in Statistics & Econometrics | Comments Off on Econometrics and causality

Judea Pearl’s and Bryant Chen’s *Regression and causation: a critical examination of six econometrics textbooks* — published in *Real-World Economics Review* no. 65 — addresses two very important questions about the teaching of modern econometrics and its textbooks: how is causality treated in general, and, more specifically, to what extent do they use a distinct causal notation?

The authors have for years been part of an extended effort to advance explicit causal modelling (especially graphical models) in the applied sciences, and this article examines to what extent these endeavours have found their way into econometrics textbooks (Pearl has later returned to the theme in his *The Book of Why* (2018)).

Although the text is partly of a rather demanding ‘technical’ nature, yours truly definitely recommends reading it, especially for social scientists with an interest in causality.

Pearl’s seminal contribution to this research field is well-known and indisputable. But when it comes to the ‘taming’ and ‘resolving’ of these issues, I have to admit that, under the influence of especially David Freedman and Nancy Cartwright, I still have some doubts about the reach of these ‘solutions’ — especially in terms of ‘realism’ and ‘relevance’ — for social sciences in general and economics in particular (see here, here, here and here). With regard to the present article, the distinction between the ‘interventionist’ E[Y|do(X)] and the more traditional ‘conditional expectationist’ E[Y|X] is so crucial for the subsequent argumentation that a more elaborate presentation would have been of value. The authors could then also have explained more fully why the first is so important, and if/why it (in my, Freedman’s and Cartwright’s view) can be exported from ‘engineering’ contexts, where it arguably applies easily and universally, to ‘socio-economic’ contexts, where ‘manipulability,’ ‘stability,’ ‘faithfulness,’ ‘invariance’ and ‘modularity’ are perhaps not so universally at hand. In real-world settings, interventions may affect variables in complex and non-deterministic ways. In socio-economic contexts, complexity and lack of control often make it impossible to treat change — and causality — in terms of easily identifiable ‘interventions.’
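The difference between E[Y|X] and E[Y|do(X)] can be made concrete with a minimal simulation (the probabilities are of course purely illustrative). With a hidden confounder driving both X and Y, conditioning on X shows a strong association even though intervening on X has no effect at all.

```python
import random

random.seed(0)

def sample(do_x=None):
    """One draw from a toy world: hidden z drives both x and y; x never affects y."""
    z = random.random() < 0.5
    if do_x is None:
        x = random.random() < (0.8 if z else 0.2)  # observational regime
    else:
        x = do_x                                   # intervention: set x by hand
    y = random.random() < (0.1 + 0.5 * z)          # y depends on z only
    return x, y

n = 200_000
obs = [sample() for _ in range(n)]
n1 = sum(1 for x, _ in obs if x)
e_y_x1 = sum(y for x, y in obs if x) / n1
e_y_x0 = sum(y for x, y in obs if not x) / (n - n1)
print(round(e_y_x1 - e_y_x0, 2))  # conditioning: a large spurious 'effect'

e_do1 = sum(y for _, y in (sample(do_x=True) for _ in range(n))) / n
e_do0 = sum(y for _, y in (sample(do_x=False) for _ in range(n))) / n
print(round(e_do1 - e_do0, 2))    # intervening: essentially zero effect
```

In this toy world we can intervene because we wrote the data-generating mechanism ourselves; the whole question at issue is whether real socio-economic systems grant us any comparable handle.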

The value of getting at precise and rigorous conclusions about causality based on ‘tractability’ conditions that are seldom met in real life is difficult to assess. Testing and constructing models is one thing, but we also need guidelines for evaluating in which situations and contexts they are applicable. Formalism may help us a bit down the road, but we have to make sure it somehow also fits the world if it is going to be really helpful in navigating that world. In all of science, conclusions are never more certain than the assumptions on which they are founded. Epistemically convenient methods and models that work in ‘well-behaved’ systems need not work in other contexts.

## The Bayesian Trap

21 October, 2018 at 09:58 | Posted in Statistics & Econometrics | Comments Off on The Bayesian Trap
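For readers unfamiliar with the ‘trap’: it is essentially the base-rate fallacy. A sketch with stylised (made-up) numbers, a rare condition and a seemingly accurate test, shows how badly intuition misleads when the prior is ignored.

```python
# Stylised numbers (assumptions, not data): a rare condition, a good test
prior = 0.001   # P(condition): 1 in 1000
sens = 0.99     # P(positive | condition)
fpr = 0.01      # P(positive | no condition)

# Bayes' theorem: P(condition | positive)
posterior = prior * sens / (prior * sens + (1 - prior) * fpr)
print(round(posterior, 3))  # ~0.09: most positive results are false positives
```

Despite the test being ‘99% accurate,’ a positive result means only about a 9% chance of actually having the condition, because the false positives from the vast healthy majority swamp the true positives.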

## The connection between cause and probability

18 October, 2018 at 15:07 | Posted in Statistics & Econometrics | 2 Comments

Causes can increase the probability of their effects; but they need not. And the other way around: an increase in probability can be due to a causal connection; but lots of other things can be responsible as well … The connection between causes and probabilities is like the connection between a disease and one of its symptoms: the disease can cause the symptom, but it need not; and the same symptom can result from a great many different diseases …

If you see a probabilistic dependence and are inclined to infer a causal connection from it, think hard about all the other possible reasons that that dependence might occur and eliminate them one by one. And when you are all done, remember — your conclusion is no more certain than your confidence that you really have eliminated all the possible alternatives.

Causality in social sciences — and economics — can never be solely a question of statistical inference. Causality entails more than predictability, and really explaining social phenomena in depth requires theory. Analysis of variation — the foundation of all econometrics — can never in itself reveal how these variations are brought about. Only when we are able to tie actions, processes or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation.

“Mediation analysis” is this thing where you have a treatment and an outcome and you’re trying to model how the treatment works: how much does it directly affect the outcome, and how much is the effect “mediated” through intermediate variables …

In the real world, it’s my impression that almost all the mediation analyses that people actually fit in the social and medical sciences are misguided: lots of examples where the assumptions aren’t clear and where, in any case, coefficient estimates are hopelessly noisy and where confused people will over-interpret statistical significance …

More and more I’ve been coming to the conclusion that the standard causal inference paradigm is broken … So how to do it? I don’t think traditional path analysis or other multivariate methods of the throw-all-the-data-in-the-blender-and-let-God-sort-em-out variety will do the job. Instead we need some structure and some prior information.

Most facts have many different possible explanations, but we want to find the best of all contrastive explanations (since all real explanation takes place relative to a set of alternatives). So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep, causal features and mechanisms that we have warranted and justified reasons to believe in. Statistical reasoning — especially the variety based on a Bayesian epistemology — generally has no room for these kinds of explanatory considerations. The only thing that matters is the probabilistic relation between evidence and hypothesis. That is also one of the main reasons I find abduction — inference to the best explanation — a better description and account of what constitutes actual scientific reasoning and inference.

In the social sciences … regression is used to discover relationships or to disentangle cause and effect. However, investigators have only vague ideas as to the relevant variables and their causal order; functional forms are chosen on the basis of convenience or familiarity; serious problems of measurement are often encountered.

Regression may offer useful ways of summarizing the data and making predictions. Investigators may be able to use summaries and predictions to draw substantive conclusions. However, I see no cases in which regression equations, let alone the more complex methods, have succeeded as engines for discovering causal relationships.

Some statisticians and data scientists think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like faithfulness or stability is not to give proofs. It is to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived theorems and the real world, then we haven’t really obtained the causation we are looking for.

If contributions made by statisticians to the understanding of causation are to be taken over with advantage in any specific field of inquiry, then what is crucial is that the right relationship should exist between statistical and subject-matter concerns …

The idea of causation as consequential manipulation is apt to research that can be undertaken primarily through experimental methods and, especially to ‘practical science’ where the central concern is indeed with ‘the consequences of performing particular acts’. The development of this idea in the context of medical and agricultural research is as understandable as the development of that of causation as robust dependence within applied econometrics. However, the extension of the manipulative approach into sociology would not appear promising, other than in rather special circumstances … The more fundamental difficulty is that under the — highly anthropocentric — principle of ‘no causation without manipulation’, the recognition that can be given to the action of individuals as having causal force is in fact peculiarly limited.
