## Econometrics in the 21st century

22 Nov, 2020 at 19:05 | Posted in Statistics & Econometrics | 6 Comments.

If you don’t have time to listen to all of the presentations (a couple of them are actually quite uninformative) you should at least scroll forward to 1:39:25 and listen to what Angus Deaton has to say. As so often, he is spot on!

As Deaton notes, evidence-based theories and policies are highly valued nowadays. Randomization is supposed to control for bias from unknown confounders. The received opinion is that evidence based on randomized experiments, therefore, is the best.

More and more economists and econometricians have also lately come to advocate randomization as the principal method for ensuring being able to make valid causal inferences.

Yours truly would however rather argue that randomization, just as econometrics, promises more than it can deliver, basically because it requires assumptions that in practice are not possible to maintain. Just as econometrics, randomization is basically a deductive method. Given the assumptions (such as manipulability, transitivity, separability, additivity, linearity, etc.) these methods deliver deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. And although randomization may contribute to controlling for confounding, it does not guarantee it, since genuine randomness presupposes infinite experimentation and we know all real experimentation is finite. And even if randomization may help to establish average causal effects, it says nothing of individual effects unless homogeneity is added to the list of assumptions. Real target systems are seldom epistemically isomorphic to our axiomatic-deductive models/systems, and even if they were, we still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by randomization procedures may be valid in ‘closed’ models, but what we usually are interested in, is causal evidence in the real target system we happen to live in.

The point of making a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and effect indicates a causal relation. This is believed to hold since randomization (allegedly) ensures that a supposed causal variable does not correlate with other variables that may influence the effect.

The problem with that simplistic view on randomization is that the claims made are both exaggerated and false:

• Even if you manage to do the assignment to treatment and control groups ideally random, the sample selection certainly is — except in extremely rare cases — not random. Even if we make a proper randomized assignment, if we apply the results to a biased sample, there is always the risk that the experimental findings will not apply. What works ‘there,’ does not work ‘here.’ Randomization hence does not ‘guarantee ‘ or ‘ensure’ making the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization per se does not guarantee anything!

• Even if both sampling and assignment are made in an ideal random way, performing standard randomized experiments only give you averages. The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer of the average causal effect being 0, those who are ‘treated’ may have causal effects equal to -100, and those ‘not treated’ may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the average effect particularly enlightening.

• There is almost always a trade-off between bias and precision. In real-world settings, a little bias often does not overtrump greater precision. And — most importantly — in case we have a population with sizeable heterogeneity, the average treatment effect of the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

• Since most real-world experiments and trials build on performing one single randomization, what would happen if you kept on randomizing forever, does not help you to ‘ensure’ or ‘guarantee’ that you do not make false causal conclusions in the one particular randomized experiment you actually do perform. It is indeed difficult to see why thinking about what you know you will never do, would make you happy about what you actually do.

Randomization is not a panacea. It is not the best method for all questions and circumstances. Proponents of randomization make claims about its ability to deliver causal knowledge that is simply wrong. There are good reasons to be skeptical of the now popular — and ill-informed — view that randomization is the only valid and best method on the market. It is not.

## Kausalitet och statistik

18 Nov, 2020 at 22:43 | Posted in Statistics & Econometrics | Leave a commentI *The Book of Why* för Judea Pearl fram flera tunga skäl till varför den numera så populära kausala grafteoretiska ansatsen är att föredra framför mer traditionella regressionsbaserade förklaringsmodeller. Ett av skälen är att kausala grafer är icke-parametriska och därför inte behöver anta exempelvis additivitet och/eller frånvaro av interaktionseffekter — pilar och noder ersätter regressionsanalysens nödvändiga specificeringar av funktionella relationer mellan de i ekvationerna ingående variablerna.

Men även om Pearl och andra av grafteorins anhängare mest framhäver fördelarna med den flexibilitet det nya verktyget ger oss, finns det också klara risker och nackdelar med användandet av kausala grafer. Bristen på klargörande om additivitet, interaktion, eller andra variabel- och relationskaraktäristika föreligger och hur de i så fall specificeras, kan ibland skapa mer problem än de löser.

Många av problemen — precis som med regressionsanalyser — hänger samman med förekomsten och graden av heterogenitet. Låt mig ta ett exempel från skolforskningens område för att belysa problematiken.

En på senare år återkommande fråga som både politiker och forskare ställt sig (se t ex här) är om friskolor leder till att höja kunskapsnivå och provresultat bland landets skolelever. För att kunna svara på denna (*realiter* mycket svåra) kausala fråga, behöver vi ha kännedom om mängder av kända, observerbara variabler och bakgrundsfaktorer (föräldrars inkomster och utbildning, etnicitet, boende, etc, etc). Därutöver också faktorer som vi vet har betydelse men är icke-observerbara och/eller mer eller mindre omätbara.

Problemen börjar redan när vi frågar oss vad som döljer sig bakom den allmänna termen ‘friskola’. Alla friskolor är inte likvärdiga (homogenitet). Vi vet att det föreligger många gånger stora skillnader mellan dem (heterogenitet). Att då lumpa ihop alla och försöka besvara den kausala frågan utan att ta hänsyn till dessa skillnader blir många gånger poänglöst och ibland också fullständigt missvisande.

Ett annat problem är att en annan typ av heterogenitet — som har med specifikation av de funktionella relationerna att göra — kan dyka upp. Anta att friskoleeffekten hänger samman med exempelvis etnicitet, och att elever med ‘svensk bakgrund’ presterar bättre än elever med ‘invandrarbakgrund.’ Detta behöver inte nödvändigtvis innebära att elever med olika etnisk bakgrund i sig påverkas olika av att gå på friskola. Effekten kan snarare härröra, exempelvis, ur det faktum att de alternativa kommunala skolor invandrareleverna kunnat gå på varit sämre än de ‘svenska’ elever kunnat gå på. Om man inte tar hänsyn till dessa skillnader i jämförelsegrund blir de skattade friskoleeffekterna missvisande.

Ytterligare heterogenitetsproblem uppstår om de mekanismer som är verksamma vid skapandet av friskoleeffekten ser väsentligt annorlunda ut för olika grupper av elever. Friskolor med ‘fokus’ på invandrargrupper kan exempelvis tänkas vara mer medvetna om behovet av att stötta dessa elever och vidta kompenserande åtgärder för att motarbeta fördomar och dylikt. Utöver effekterna av den (förmodade) bättre undervisningen i övrigt på friskolor är effekterna för denna kategori av elever också en effekt av den påtalade heterogeniteten, och kommer följaktligen inte att sammanfalla med den för den andra gruppen elever.

Tyvärr är det inte slut på problemen här. Vi konfronteras också med ett svårlöst och ofta förbisett selektivitetsproblem. När vi vill försöka få svar på den kausala frågan kring effekterna av friskolor är ett vanligt förfarande i regressionsanalyser att ‘konstanthålla’ eller ‘kontrollera’ för påverkansfaktorer utöver de vi främst är intresserade av. När det gäller friskolor är en vanlig kontrollvariabel föräldrarnas inkomst- eller utbildnings-bakgrund. Logiken är att vi på så vis ska kunna simulera en (ideal) situation som påminner så mycket som möjligt om ett randomiserat experiment där vi bara ‘jämför’ (matchar) elever till föräldrar med jämförbar utbildning eller inkomst, och på så vis hoppas kunna erhålla ett bättre mått på den ‘rena’ friskoleeffekten. Kruxet här är att det inom varje inkomst- och utbildningskategori kan dölja sig ytterligare en – ibland dold och kanske omätbar — heterogenitet som har med exempelvis inställning och motivation att göra och som gör att vissa elever tenderar välja (selektera) att gå på friskolor eftersom de tror sig veta att de kommer att prestera bättre där än på kommunala skolor (i friskoledebatten är ett återkommande argument kring segregationseffekterna att elever till föräldrar med hög ‘socio-ekonomisk status’ här bättre tillgång till information om skolvalets effekter än andra elever). Inkomst- eller utbildningsvariabeln kan på så vis *de facto* ‘maskera’ andra faktorer som ibland kan spela en mer avgörande roll än de. Skattningarna av friskoleeffekten kan därför här — åter — bli missvisande, och ibland till och med ännu mer missvisande än om vi inte ‘konstanthållit’ för någon kontrollvariabel alls (jfr med ‘second-best’ teoremet i välfärdsekonomisk teori)!

Att ‘kontrollera’ för möjliga ‘confounders’ är alltså inte alltid självklart rätt väg att gå. Om själva relationen mellan friskola (X) och studieresultat (Y) påverkas av införandet av kontrollvariabeln ‘socio-ekonomisk status'(W) är detta troligen ett resultat av att det föreligger någon typ av samband mellan X och W. Detta innebär också att vi inte har en ideal ‘experimentsimulering’ eftersom det uppenbarligen finns faktorer som påverkar Y och som inte är slumpmässigt fördelade (randomiserade). Innan vi kan gå vidare måste vi då fråga oss *varför* sambandet i fråga föreligger. För att kunna kausalt förklara sambandet mellan X och Y, måste vi veta mer om hur W påverkar valet av X. Bland annat kan vi då finna att det föreligger en skillnad i valet av X mellan olika delar av gruppen med hög ‘socio-ekonomisk status’ W. Utan kunskaper om denna selektionsmekanism kan vi inte på ett tillförlitligt sätt mäta X:s effekt på Y — den randomiserade förklaringsmodellen är helt enkelt inte applicerbar. Utan kunskap om varför det föreligger ett samband — och hur det ser ut — mellan X och W, hjälper oss inte ‘kontrollerandet’ eftersom det inte tar höjd för den verksamma selektionsmekanismen.

Utöver de här tangerade problemen har vi andra sedan gammalt välkända problem. Den så kallade kontext- eller gruppeffekten — för en elev som går på en friskola kan resultaten delvis vara en effekt av att hennes skolkamrater har liknande bakgrund och att hon därför i någon mening dra fördel av sin omgivning, vilket inte skulle ske om hon gick på en kommunal skola — innebär åter att ’confounder’ eliminering via kontrollvariabler inte självklart fungerar när det föreligger ett samband mellan kontrollvariabel och icke-eller svårmätbara icke-observerbara attribut som själva påverkar den beroende variabeln. I vårt skolexempel kan man anta att de föräldrar med en viss socio-ekonomisk status som skickar sina barn till friskolor skiljer sig från samma grupp av föräldrar som väljer låta barnen gå i kommunal skola. Kontrollvariablerna fungerar — åter igen — inte som fullödiga substitut för ett verkligt experiments randomiserade ’assignment.’

Am I right in thinking that the method of multiple correlation analysis essentially depends on the economist having furnished, not merely a list of the significant causes, which is correct so far as it goes, but a

completelist? For example, suppose three factors are taken into account, it is not enough that these should be in fact vera causa; there must be no other significant factor. If there is a further factor, not taken account of, then the method is not able to discover the relative quantitative importance of the first three. If so, this means that the method is only applicable where the economist is able to provide beforehand a correct and indubitably complete analysis of the significant factors. The method is one neither of discovery nor of criticism. It is a means of giving quantitative precision to what, in qualitative terms, we know already as the result of a complete theoretical analysis …

Vad avser användandet av kontrollvariabler får man inte heller bortse från en viktig aspekt som sällan berörs av de som använder berörda statistiska metoder. De i studierna ingående variablerna behandlas ‘som om’ relationerna mellan dem i populationen är slumpmässig. Men variabler kan ju de facto ha de värden de har just för att de ger upphov till de konsekvenser de har. Utfallet bestämmer på så vis alltså i viss utsträckning varför de ‘oberoende’ variablerna har de värden de har. De ”randomiserade’ oberoende variablerna visar sig i själva verket vara något annat än vad de antas vara, och omöjliggör därför också att observationsstudierna och kvasiexperimenten ens är i närheten av att vara riktiga experiment. Saker och ting ser ut som de gör många gånger av ett skäl. Ibland är skälen just de konsekvenser regler, institutioner och andra faktorer anteciperas ge upphov till! Det som uppfattas som ‘exogent’ är i själva verket inte alls ‘exogent’

Those variables that have been left outside of the causal system may not actually operate as assumed; they may produce effects that are nonrandom and that may become confounded with those of the variables directly under consideration.

Vad drar vi för slutsats av allt detta då? Kausalitet *är* svårt och vi ska — trots kritiken — så klart inte kasta ut barnet med badvattnet. Men att inta en hälsosam skepsis och försiktighet när det gäller bedömning och värdering av statistiska metoders — vare sig det gäller kausal grafteori eller mer traditionell regressionsanalys — förmåga att verkligen slå fast kausala relationer, är definitivt att rekommendera.

## Simpson’s paradox

12 Nov, 2020 at 11:16 | Posted in Statistics & Econometrics | Leave a commentFrom a theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities.

To understand causality we always have to relate it to a specific causal *structure*. Statistical correlations are *never* enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal amount of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against (say 800 male apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), and 100 female apply for economics studies (90 admitted) and 900 for physics studies (210 admitted) — giving odds ratios of 0.62 and 0.37).

Econometric patterns should never be seen as anything else than possible clues to follow. From a critical realist perspective, it is obvious that behind observable data there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.

## Conditional exchangeability and causal inference

11 Nov, 2020 at 18:13 | Posted in Statistics & Econometrics | Leave a commentIn observational data, it is unrealistic to assume that the treatment groups are exchangeable. In other words, there is no reason to expect that the groups are the same in all relevant variables other than the treatment. However, if we control for relevant variables by conditioning, then maybe the subgroups will be exchangeable. We will clarify what the “relevant variables” are, but for now, let’s just say they are all of the covariates 𝑋. Then, we can state conditional exchangeability formally …

The idea is that although the treatment and potential outcomes may be unconditionally associated (due to confounding), within levels of 𝑋, they are not associated. In other words, there is no confounding within levels of 𝑋 because controlling for 𝑋 has made the treatment groups comparable …

Conditional exchangeability is the main assumption necessary for causal inference. Armed with this assumption, we can identify the causal effect within levels of 𝑋, just like we did with (unconditional) exchangeability …

This marks an important result for causal inference … The main reason for moving from exchangeability to conditional exchangeability was that it seemed like a more realistic assumption. However, we often cannot know for certain if conditional exchangeability holds. There may be some unobserved confounders that are not part of 𝑋, meaning conditional exchangeability is violated. Fortunately, that is not a problem in randomized experiments. Unfortunately, it is something that we must always be conscious of in observational data. Intuitively, the best thing we can do is to observe and fit as many covariates into 𝑋 as possible to try to ensure unconfoundedness.

## Checking your statistical assumptions

9 Nov, 2020 at 13:37 | Posted in Statistics & Econometrics | 4 CommentsThe assumption of additivity and linearity means that the outcome variable is, in reality, linearly related to any predictors … and that if you have several predictors then their combined effect is best described by adding their effects together …

This assumption is the most important because if it is not true then even if all other assumptions are met, your model is invalid because you have described it incorrectly. It’s a bit like calling your pet cat a dog: you can try to get it to go in a kennel, or to fetch sticks, or to sit when you tell it to, but don’t be surprised when its behaviour isn’t what you expect because even though you’ve called it a dog, it is in fact a cat. Similarly, if you have described your statistical model inaccurately it won’t behave itself and there’s no point in interpreting its parameter estimates or worrying about significance tests of confidence intervals: the model is wrong.

Econometrics fails miserably over and over again — and not only because of the additivity and linearity assumption

Another reason why it does, is that the error term in the regression models used is thought of as representing the effect of the variables that were omitted from the models. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model and necessary to include to ‘save’ the assumed deterministic relation between the other random variables included in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since they are unobservable, they are also impossible to empirically test. And without justification of the orthogonality assumption, there is, as a rule, nothing to ensure identifiability.

Nowadays it has almost become a self-evident truism among economists that you cannot expect people to take your arguments seriously unless they are based on or backed up by advanced econometric modelling. So legions of mathematical-statistical theorems are proved — and heaps of fiction are being produced, masquerading as science. The rigour of the econometric modelling and the far-reaching assumptions they are built on is frequently simply not supported by data.

## How scientists manipulate research

9 Nov, 2020 at 10:25 | Posted in Statistics & Econometrics | Leave a comment

*All* science entails human judgment, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero — even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

Statistics is no substitute for thinking. We should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean next to nothing if the model is wrong. Statistical significance tests do not validate models!

In many social sciences, p-values and null hypothesis significance testing (NHST) is often used to draw far-reaching scientific conclusions — despite the fact that they are as a rule poorly understood and that there exist alternatives that are easier to understand and more informative.

Not the least using confidence intervals (CIs) and effect sizes are to be preferred to the Neyman-Pearson-Fisher mishmash approach that is so often practiced by applied researchers.

Running a Monte Carlo simulation with 100 replications of a fictitious sample having N = 20, confidence intervals of 95%, a normally distributed population with a mean = 10 and a standard deviation of 20, taking two-tailed p-values on a zero null hypothesis, we get varying CIs (since they are based on varying sample standard deviations), but with a minimum of 3.2 and a maximum of 26.1, we still get a clear picture of what would happen in an infinite limit sequence. On the other hand p-values (even though from a purely mathematical-statistical sense more or less equivalent to CIs) vary strongly from sample to sample, and jumping around between a minimum of 0.007 and a maximum of 0.999 doesn’t give you a clue of what will happen in an infinite limit sequence!

[In case you want to do your own Monte Carlo simulations, here’s an example yours truly made using his favourite econometrics program, Gretl:

nulldata 20

loop 100 –progressive

series y = normal(10,15)

scalar zs = (10-mean(y))/sd(y)

scalar df = $nobs-1

scalar ybar=mean(y)

scalar ysd= sd(y)

scalar ybarsd=ysd/sqrt($nobs)

scalar tstat = (ybar-10)/ybarsd

pvalue t df tstat

scalar lowb = mean(y) – critical(t,df,0.025)*ybarsd

scalar uppb = mean(y) + critical(t,df,0.025)*ybarsd

scalar pval = pvalue(t,df,tstat)

store E:\pvalcoeff.gdt lowb uppb pval

endloop]

## Good advice

22 Oct, 2020 at 17:55 | Posted in Statistics & Econometrics | Comments Off on Good advice## Dirichlet’s principle — sophisticated simplicity

31 Aug, 2020 at 15:39 | Posted in Statistics & Econometrics | 1 Comment## Keynes on the additivity fallacy

8 Aug, 2020 at 09:42 | Posted in Statistics & Econometrics | 2 CommentsThe unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.

J. M. Keynes “Ethics in Relation to Conduct(1903)

Since econometrics doesn’t content itself with only making optimal *predictions*, but also aspires to *explain* things in terms of causes and effects, econometricians need loads of assumptions — most important of these are *additivity* and *linearity*. Important, simply because if they are not true, your model is invalid and descriptively incorrect. It’s like calling your house a bicycle. No matter how you try, it won’t move you an inch. When the model is wrong — well, then it’s wrong.

## Econometrics — science based on questionable presumptions

6 Aug, 2020 at 12:11 | Posted in Statistics & Econometrics | Comments Off on Econometrics — science based on questionable presumptionsWhat we are asked to assume is that the precept can be carried out in economics by techniques which are established for linear systems, serially independent disturbances, error-free observations, and samples of a size not generally obtainable in economic time series today. In view of such limitations, anyone using these techniques must find himself appealing at every stage less to what theory is saying to him than to what solvability requirements demand of him. Certain it is that the empirical work of this school yields numerous instances in which open questions of economics are resolved in a way that saves a mathematical theorem.

Still, there are doubtless many who will be prepared to make the assumptions required by this theory on pragmatic grounds. We cannot know in advance how well or badly they will work, and they commend themselves on the practical test of convenience. Moreover, as the authors point out, a great many models are compatible with what we know in economics – that is to say, do not violate any matters on which economists are agreed. Attractive as this view is, it fails to draw a necessary distinction between what is assumed and what is merely proposed as hypothesis. This distinction is forced upon us by an obvious but neglected fact of statistical theory: the matters “assumed” are put wholly beyond test, and the entire edifice of conclusions (e.g., about identifiability, optimum properties of the estimates, their sampling distributions, etc.) depends absolutely on the validity of these assumptions. The great merit of modern statistical inference is that it makes exact and efficient use of what we know about reality to forge new tools of discovery, but it teaches us painfully little about the efficacy of these tools when their basis of assumptions is not satisfied. It may be that the approximations involved in the present theory are tolerable ones; only repeated attempts to use them can decide that issue. Evidence exists that trials in this empirical spirit are finding a place in the work of the econometric school, and one may look forward to substantial changes in the methodological presumptions that have dominated this field until now.

Maintaining that economics is a science in the ‘true knowledge’ business, yours truly remains a sceptic of the pretences and aspirations of econometrics. The marginal return on its ever higher technical sophistication in no way makes up for the lack of serious under-labouring of its deeper philosophical and methodological foundations that already Millard Hastay complained about. The rather one-sided emphasis of usefulness and its concomitant instrumentalist justification cannot hide that the legions of probabilistic econometricians who give supportive evidence for their considering it ‘fruitful to believe’ in the possibility of treating unique economic data as the observable results of random drawings from an imaginary sampling of an imaginary population, are skating on thin ice.

A rigorous application of econometric methods in economics presupposes that the phenomena of our real world economies are ruled by stable causal relations between variables. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there, really, is no other ground than hope itself.

## The dangers of calling your pet cat a dog

31 Jul, 2020 at 00:12 | Posted in Statistics & Econometrics | Comments Off on The dangers of calling your pet cat a dogThe assumption of additivity and linearity means that the outcome variable is, in reality, linearly related to any predictors … and that if you have several predictors then their combined effect is best described by adding their effects together …

This assumption is the most important because if it is not true then even if all other assumptions are met, your model is invalid because you have described it incorrectly. It’s a bit like calling your pet cat a dog: you can try to get it to go in a kennel, or to fetch sticks, or to sit when you tell it to, but don’t be surprised when its behaviour isn’t what you expect because even though you’ve called it a dog, it is in fact a cat. Similarly, if you have described your statistical model inaccurately it won’t behave itself and there’s no point in interpreting its parameter estimates or worrying about significance tests of confidence intervals: the model is wrong.

Econometrics fails miserably over and over again — and not only because of the additivity and linearity assumption

Another reason why it does, is that the error term in the regression models used is thought of as representing the effect of the variables that were omitted from the models. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model and necessary to include to ‘save’ the assumed deterministic relation between the other random variables included in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since they are unobservable, they are also impossible to empirically test. And without justification of the orthogonality assumption, there is, as a rule, nothing to ensure identifiability.

Nowadays it has almost become a self-evident truism among economists that you cannot expect people to take your arguments seriously unless they are based on or backed up by advanced econometric modelling. So legions of mathematical-statistical theorems are proved — and heaps of fiction are being produced, masquerading as science. The rigour of the econometric modelling and the far-reaching assumptions they are built on is frequently not supported by data.

## The alleged success of econometrics

4 Jul, 2020 at 12:01 | Posted in Statistics & Econometrics | 4 CommentsEconometricians typically hail the evolution of econometrics as a “big success”. For example, Geweke et al. (2006) argue that “econometrics has come a long way over a relatively short period” … Pagan (1987) describes econometrics as “outstanding success” because the work of econometric theorists has become “part of the process of economic investigation and the training of economists” …

These claims represent no more than self-glorifying rhetoric … The widespread use of econometrics is not indicative of success, just like the widespread use of drugs does not represent social success. Applications of econometric methods in almost every field of economics is not the same as saying that econometrics has enhanced our understanding of the underlying issues in every field of economics. It only shows that econometrics is no longer a means to an end but rather the end itself. The use of econometric models by government agencies has not led to improvement in policy making, as we move from one crisis to another …

The observation that econometric theory has become part of the training of economists and the other observation of excess demand for well-trained econometricians are far away from being measures of success … The alleged success of econometrics has led to the production of economics graduates who may be good at number crunching but do not know much about the various economic problems faced by humanity. It has also led to the brain drain inflicted on the society by the movement of physicists, mathematicians and engineers to economics and finance, particularly those looking for lucrative jobs in the financial sector. At the same time, some good economists have left the field or retired early because they could not cope with the success of econometrics.

Mainstream economists often hold the view that if you are critical of econometrics it can only be because you are a sadly misinformed and misguided person who dislikes and does not understand much of it.

As Moosa’s eminent article shows, this is, however, nothing but a gross misapprehension.

And just as Moosa, Keynes certainly did not misunderstand the crucial issues at stake in his critique of econometrics. Quite the contrary. He knew them all too well — and was not satisfied with the validity and philosophical underpinnings of the assumptions made for applying its methods.

Keynes’ critique is still valid and unanswered in the sense that the problems he pointed at are still with us today and ‘unsolved.’ Ignoring them — the most common practice among applied econometricians — is not to solve them.

To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong assumptions. In a review of Tinbergen’s econometric work — published in *The Economic Journal* in 1939 — Keynes gave a comprehensive critique of Tinbergen’s work, focusing on the limiting and unreal character of the assumptions that econometric analyses build on:

**Completeness**: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of *all* the relevant factors to avoid misspecification and spurious causal claims. Usually, this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes is, to put it mildly, unconvinced:

It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his

a priori, that would make a difference to the outcome.

**Homogeneity**: To make inductive inferences possible — and being able to apply econometrics — the system we try to analyze has to have a large degree of ‘homogeneity.’ According to Keynes most social and economic systems — especially from the perspective of real historical time — lack that ‘homogeneity.’ As he had argued already in *Treatise on Probability* (ch. 22), it wasn’t always possible to take repeated samples from a fixed population when we were analyzing real-world economies. In many cases, there simply are no reasons at all to assume the samples to be homogenous. Lack of ‘homogeneity’ makes the principle of ‘limited independent variety’ non-applicable, and hence makes inductive inferences, strictly seen, impossible since one of its fundamental logical premises are not satisfied. Without “much repetition and uniformity in our experience” there is no justification for placing “great confidence” in our inductions (TP ch. 8).

And then, of course, there is also the ‘reverse’ variability problem of non-excitation: factors that do not change significantly during the period analyzed, can still very well be extremely important causal factors.

**Stability:** Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyze. But as Keynes had argued already in his *Treatise on Probability* it was not really possible to make inductive generalizations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.

**Measurability:** Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions if it is possible to adequately quantify and measure things like expectations and political and psychological factors. And more than anything, he questioned — both on epistemological and ontological grounds — that it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, “only lead to error and delusion.”

**Independence**: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic, and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models from that kind of simplistic and unrealistic assumptions risk producing nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic evolving organic systems — such as economies.

It is a great fault of symbolic pseudo-mathematical methods of formalising a system of economic analysis … that they expressly assume strict independence between the factors involved and lose all their cogency and authority if this hypothesis is disallowed; whereas, in ordinary discourse, where we are not blindly manipulating but know all the time what we are doing and what the words mean, we can keep “at the back of our heads” the necessary reserves and qualifications and the adjustments which we shall have to make later on, in a way in which we cannot keep complicated partial differentials “at the back” of several pages of algebra which assume that they all vanish.

Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.

The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be much less simple than the bare principle of uniformity. They appear to assume something much more like what mathematicians call the principle of the superposition of small effects, or, as I prefer to call it, in this connection, the

atomiccharacter of natural law. The system of the material universe must consist, if this kind of assumption is warranted, of bodies which we may term (without any implication as to their size being conveyed thereby)legal atoms, such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state …The scientist wishes, in fact, to assume that the occurrence of a phenomenon which has appeared as part of a more complex phenomenon, may be some reason for expecting it to be associated on another occasion with part of the same complex. Yet if different wholes were subject to laws

quawholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts.

**Linearity:** To make his models tractable, Tinbergen assumes the relationships between the variables he study to be linear. This is still standard procedure today, but as Keynes writes:

It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.

To Keynes, it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).

The unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.

J. M. Keynes “Ethics in Relation to Conduct” (1903)

And as even one of the founding fathers of modern econometrics — Trygve Haavelmo — wrote:

What is the use of testing, say, the significance of regression coefficients, when maybe, the whole assumption of the linear regression equation is wrong?

Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established, are laws and relations about entities in models that presuppose causal mechanisms and variables — and the relationship between them — being linear, additive, homogenous, stable, invariant and atomistic. But — when causal mechanisms operate in the real world they only do it in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since statisticians and econometricians — as far as I can see — haven’t been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence, additivity as being ontologically isomorphic to real-world economic systems, Keynes’ critique is still valid. As long as — as Keynes writes in a letter to Frisch in 1935 — “nothing emerges at the end which has not been introduced expressively or tacitly at the beginning,” I remain doubtful of the scientific aspirations of econometrics.

In his critique of Tinbergen, Keynes points us to the fundamental logical, epistemological, and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyze repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.

Econometric modeling should never be a substitute for thinking. From that perspective, it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today. And that is also a reason why yours truly — as does Moosa — has to keep on criticizing it.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

J. M. Keynes, letter to E.J. Broster, December 19, 1939

Econometric patterns should never be seen as anything else than possible clues to follow. From a critical realist perspective, it is obvious that behind observable data there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.

## P-value — a poor substitute for scientific reasoning

28 Jun, 2020 at 23:16 | Posted in Statistics & Econometrics | Comments Off on P-value — a poor substitute for scientific reasoning

*All* science entails human judgment, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero — even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

Statistics is no substitute for thinking. We should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean next to nothing if the model is wrong. Statistical significance tests do not validate models!

In many social sciences, p-values and null hypothesis significance testing (NHST) is often used to draw far-reaching scientific conclusions — despite the fact that they are as a rule poorly understood and that there exist alternatives that are easier to understand and more informative.

Not the least using confidence intervals (CIs) and effect sizes are to be preferred to the Neyman-Pearson-Fisher mishmash approach that is so often practiced by applied researchers.

Running a Monte Carlo simulation with 100 replications of a fictitious sample having N = 20, confidence intervals of 95%, a normally distributed population with a mean = 10 and a standard deviation of 20, taking two-tailed p-values on a zero null hypothesis, we get varying CIs (since they are based on varying sample standard deviations), but with a minimum of 3.2 and a maximum of 26.1, we still get a clear picture of what would happen in an infinite limit sequence. On the other hand p-values (even though from a purely mathematical-statistical sense more or less equivalent to CIs) vary strongly from sample to sample, and jumping around between a minimum of 0.007 and a maximum of 0.999 doesn’t give you a clue of what will happen in an infinite limit sequence!

[In case you want to do your own Monte Carlo simulation, here’s an example I’ve made using Gretl:

nulldata 20

loop 100 –progressive

series y = normal(10,15)

scalar zs = (10-mean(y))/sd(y)

scalar df = $nobs-1

scalar ybar=mean(y)

scalar ysd= sd(y)

scalar ybarsd=ysd/sqrt($nobs)

scalar tstat = (ybar-10)/ybarsd

pvalue t df tstat

scalar lowb = mean(y) – critical(t,df,0.025)*ybarsd

scalar uppb = mean(y) + critical(t,df,0.025)*ybarsd

scalar pval = pvalue(t,df,tstat)

store E:\pvalcoeff.gdt lowb uppb pval

endloop

]

## Golden ratio (student stuff)

28 Jun, 2020 at 18:58 | Posted in Statistics & Econometrics | Comments Off on Golden ratio (student stuff)

## The rhetoric of imaginary populations

24 Jun, 2020 at 09:16 | Posted in Economics, Statistics & Econometrics | Comments Off on The rhetoric of imaginary populationsThe most

expedientpopulation and data generation model to adopt is one in which the population is regarded as a realization of an infinite super population. This setup is the standard perspective in mathematical statistics, in which random variables are assumed to exist with fixed moments for an uncountable and unspecified universe of events …This perspective is tantamount to assuming a population machine that spawns individuals forever (i.e., the analog to a coin that can be flipped forever). Each individual is born as a set of random draws from the distributions of Y¹, Y°, and additional variables collectively denoted by S …

Because of its

expediency, we will usually write with the superpopulation model in the background, even though the notions of infinite superpopulations and sequences of sample sizes approaching infinity aremanifestly unrealistic.

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes casual knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘super populations’ is one of the many dubious assumptions used in modern econometrics.

As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting a domain of probability theory and sample space of infinite populations also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

In *Statistical Models and Causal Inference: A Dialogue with the Social Sciences *David Freedman also touched on this fundamental problem, arising when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels:

Lurking behind the typical regression model will be found a host of such assumptions; without them, legitimate inferences cannot be drawn from the model. There are statistical procedures for testing some of these assumptions. However, the tests often lack the power to detect substantial failures. Furthermore, model testing may become circular; breakdowns in assumptions are detected, and the model is redefined to accommodate. In short,

hiding the problems can become a major goal of model building.Using models to make predictions of the future, or the results of interventions, would be a valuable corrective. Testing the model on a variety of data sets – rather than fitting refinements over and over again to the same data set – might be a good second-best … Built into the equation is a model for non-discriminatory behavior: the coefficient d vanishes. If the company discriminates, that part of the model cannot be validated at all.

Regression models are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals.

However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions.Under the circumstances, reliance on model outputs may be quite unjustified. Making the ideas of validation somewhat more precise is a serious problem in the philosophy of science. That models should correspond to reality is, after all, a useful but not totally straightforward idea – with some history to it. Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious …In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And

an enormous amount of fiction has been produced, masquerading as rigorous science.

And as if this wasn’t enough, one could — as we’ve seen — also seriously wonder what kind of ‘populations’ these statistical and econometric models ultimately are based on. Why should we as social scientists — and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems — unquestioningly accept models based on concepts like the ‘infinite super populations’ used in e.g. the ‘potential outcome’ framework that has become so popular lately in social sciences?

Of course one could treat observational or experimental data as random samples from real populations. I have no problem with that (although it has to be noted that most ‘natural experiments’ are *not* based on random sampling from some underlying population — which, of course, means that the effect-estimators, strictly seen, only are unbiased for the specific groups studied). But probabilistic econometrics does not content itself with that kind of populations. Instead, it creates imaginary populations of ‘parallel universes’ and assume that our data are random samples from that kind of ‘infinite super populations.’

But this is actually nothing else but hand-waving! And it is inadequate for real science. As David Freedman writes:

With this approach, the investigator does not explicitly define a population that could in principle be studied, with unlimited resources of time and money. The investigator merely

assumesthat such a population exists in some ill-defined sense. And there is a further assumption, that the data set being analyzed can be treatedas ifit were based on a random sample from the assumed population.These are convenient fictions… Nevertheless, reliance on imaginary populations is widespread. Indeed regression models are commonly used to analyze convenience samples… The rhetoric of imaginary populations is seductive because it seems to free the investigator from the necessity of understanding how data were generated.

In social sciences — including economics — it’s always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …

Blog at WordPress.com.

Entries and comments feeds.