## The alleged success of econometrics

4 Jul, 2020 at 12:01 | Posted in Statistics & Econometrics | 4 Comments

Econometricians typically hail the evolution of econometrics as a “big success”. For example, Geweke et al. (2006) argue that “econometrics has come a long way over a relatively short period” … Pagan (1987) describes econometrics as “outstanding success” because the work of econometric theorists has become “part of the process of economic investigation and the training of economists” …

These claims represent no more than self-glorifying rhetoric … The widespread use of econometrics is not indicative of success, just like the widespread use of drugs does not represent social success. Applications of econometric methods in almost every field of economics is not the same as saying that econometrics has enhanced our understanding of the underlying issues in every field of economics. It only shows that econometrics is no longer a means to an end but rather the end itself. The use of econometric models by government agencies has not led to improvement in policy making, as we move from one crisis to another …

The observation that econometric theory has become part of the training of economists and the other observation of excess demand for well-trained econometricians are far away from being measures of success … The alleged success of econometrics has led to the production of economics graduates who may be good at number crunching but do not know much about the various economic problems faced by humanity. It has also led to the brain drain inflicted on the society by the movement of physicists, mathematicians and engineers to economics and finance, particularly those looking for lucrative jobs in the financial sector. At the same time, some good economists have left the field or retired early because they could not cope with the success of econometrics.

Mainstream economists often hold the view that if you are critical of econometrics it can only be because you are a sadly misinformed and misguided person who dislikes and does not understand much of it.

As Moosa’s eminent article shows, this is, however, nothing but a gross misapprehension.

And, just like Moosa, Keynes certainly did not misunderstand the crucial issues at stake in his critique of econometrics. Quite the contrary. He knew them all too well, and he was not satisfied with the validity and philosophical underpinnings of the assumptions made in applying its methods.

Keynes’ critique is still valid and unanswered in the sense that the problems he pointed at are still with us today and ‘unsolved.’ Ignoring them — the most common practice among applied econometricians — is not to solve them.

To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong assumptions. In a review of Tinbergen’s econometric work — published in *The Economic Journal* in 1939 — Keynes gave a comprehensive critique of Tinbergen’s work, focusing on the limiting and unreal character of the assumptions that econometric analyses build on:

**Completeness**: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of *all* the relevant factors to avoid misspecification and spurious causal claims. Usually, this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes is, to put it mildly, unconvinced:

It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his *a priori*, that would make a difference to the outcome.

**Homogeneity**: To make inductive inferences possible (and to be able to apply econometrics), the system we try to analyze has to have a large degree of ‘homogeneity.’ According to Keynes, most social and economic systems, especially from the perspective of real historical time, lack that ‘homogeneity.’ As he had already argued in *Treatise on Probability* (ch. 22), it wasn’t always possible to take repeated samples from a fixed population when we were analyzing real-world economies. In many cases, there simply are no reasons at all to assume the samples to be homogeneous. Lack of ‘homogeneity’ makes the principle of ‘limited independent variety’ non-applicable, and hence makes inductive inferences, strictly seen, impossible, since one of their fundamental logical premises is not satisfied. Without “much repetition and uniformity in our experience” there is no justification for placing “great confidence” in our inductions (TP ch. 8).

And then, of course, there is also the ‘reverse’ variability problem of non-excitation: factors that do not change significantly during the period analyzed can still very well be extremely important causal factors.
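To make the non-excitation point concrete, here is a minimal Python sketch (variable names, effect sizes and the seed are all hypothetical). A factor `z` with a large true effect stays constant over the sample period, so a least-squares regression recovers the effect of `x` but has nothing to say about `z`: with zero variance in `z`, its slope is simply not identified, however important `z` is causally.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept), or None if x never varies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # No variation in x: the sample carries no information about its effect.
        return None
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(1)
n = 50
# Hypothetical data: z has a large true effect (5.0) on y, but it stays at 2.0
# throughout the observed period, so its influence is absorbed into the intercept.
z = [2.0] * n
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 * xi + 5.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

slope_x = ols_slope(x, y)  # close to the true value 1.0
slope_z = ols_slope(z, y)  # None: the data are silent about z
print(slope_x, slope_z)
```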

**Stability:** Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyze. But as Keynes had argued already in his *Treatise on Probability* it was not really possible to make inductive generalizations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.
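A minimal Python sketch of the stability problem, assuming a made-up regime shift halfway through the sample: the true slope is 2.0 in the first regime and -1.0 in the second. Estimating one ‘stable’ parameter over the whole period gives a pooled slope of roughly 0.5, a number that describes neither regime.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(42)
n = 200
# Hypothetical regime shift: slope 2.0 before period 100, -1.0 afterwards.
x = [rng.uniform(0, 10) for _ in range(n)]
y = [(2.0 if t < n // 2 else -1.0) * xi + rng.gauss(0, 1) for t, xi in enumerate(x)]

b_first = ols_slope(x[: n // 2], y[: n // 2])
b_second = ols_slope(x[n // 2 :], y[n // 2 :])
b_pooled = ols_slope(x, y)  # one 'stable' parameter for the whole sample
print(f"first regime {b_first:+.2f}, second regime {b_second:+.2f}, pooled {b_pooled:+.2f}")
```

A structural-break test would flag this particular shift, but only because the sketch builds in a single, clean break at a known date.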

**Measurability:** Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions whether it is possible to adequately quantify and measure things like expectations and political and psychological factors. And more than anything, he questioned, on both epistemological and ontological grounds, whether it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, “only lead to error and delusion.”

**Independence**: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic, and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models on that kind of simplistic and unrealistic assumption risks producing nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic, evolving, organic systems such as economies.

It is a great fault of symbolic pseudo-mathematical methods of formalising a system of economic analysis … that they expressly assume strict independence between the factors involved and lose all their cogency and authority if this hypothesis is disallowed; whereas, in ordinary discourse, where we are not blindly manipulating but know all the time what we are doing and what the words mean, we can keep “at the back of our heads” the necessary reserves and qualifications and the adjustments which we shall have to make later on, in a way in which we cannot keep complicated partial differentials “at the back” of several pages of algebra which assume that they all vanish.
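The independence point can be sketched in Python under a hypothetical data-generating process in which `x` affects `y` only through its interaction with another factor `w` (`y = x * w + noise`). There is then no separate, independent ‘effect of x’: a regression of `y` on `x` alone returns roughly the average level of `w`, so the estimated effect changes with the context in which the data happen to be collected.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def estimated_effect_of_x(mean_w, n=2000, seed=7):
    """Regress y on x alone when y = x * w + noise and w is centred at mean_w."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    w = [rng.gauss(mean_w, 1) for _ in range(n)]
    y = [xi * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return ols_slope(x, y)


# Same mechanism, three different contexts (average level of w).
effects = {m: estimated_effect_of_x(m) for m in (-2.0, 0.0, 2.0)}
for m, b in effects.items():
    print(f"average level of w = {m:+.1f}: estimated 'effect' of x = {b:+.2f}")
```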

Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.

The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be much less simple than the bare principle of uniformity. They appear to assume something much more like what mathematicians call the principle of the superposition of small effects, or, as I prefer to call it, in this connection, the *atomic* character of natural law. The system of the material universe must consist, if this kind of assumption is warranted, of bodies which we may term (without any implication as to their size being conveyed thereby) *legal atoms*, such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state … The scientist wishes, in fact, to assume that the occurrence of a phenomenon which has appeared as part of a more complex phenomenon, may be some reason for expecting it to be associated on another occasion with part of the same complex. Yet if different wholes were subject to laws *qua* wholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts.

**Linearity:** To make his models tractable, Tinbergen assumes the relationships between the variables he studies to be linear. This is still standard procedure today, but as Keynes writes:

It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.

To Keynes, it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).

The unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.

J. M. Keynes “Ethics in Relation to Conduct” (1903)
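Keynes’ contrast between f(x + y) and f(x) + f(y) can be checked directly. The tiny Python sketch below (the two functions are arbitrary illustrations) shows that additivity holds for a linear function such as 3x but fails for x², where the cross term 2xy is exactly the part that the additive formula leaves out.

```python
def is_additive_at(f, x, y, tol=1e-9):
    """Check whether f(x + y) equals f(x) + f(y) for one particular pair (x, y)."""
    return abs(f(x + y) - (f(x) + f(y))) < tol


def linear(v):
    return 3 * v  # 3(x + y) = 3x + 3y


def square(v):
    return v ** 2  # (x + y)^2 = x^2 + 2xy + y^2


print(is_additive_at(linear, 2, 5))  # True
print(is_additive_at(square, 2, 5))  # False: 49 versus 4 + 25 = 29
```

Passing the check for one pair proves nothing in general, which is Keynes’ point: the property has to be established for the function at hand, not assumed.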

And as even one of the founding fathers of modern econometrics — Trygve Haavelmo — wrote:

What is the use of testing, say, the significance of regression coefficients, when maybe, the whole assumption of the linear regression equation is wrong?
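Haavelmo’s question can be illustrated with a minimal Python sketch, assuming a made-up process in which `y` depends strongly but non-linearly on `x` (`y = x² + noise`, with `x` symmetric around zero). The linear slope comes out near zero with an unremarkable t-statistic, while regressing on `x²` reveals a strong relationship: the significance test only answers questions posed inside the chosen specification.

```python
import math
import random


def ols_fit(x, y):
    """Simple OLS of y on x with intercept; returns (slope, t-statistic of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # conventional (homoskedastic) standard error
    return slope, slope / se


rng = random.Random(3)
# Hypothetical process: y depends strongly, but not linearly, on x.
x = [rng.uniform(-3, 3) for _ in range(500)]
y = [xi ** 2 + rng.gauss(0, 0.5) for xi in x]

lin_slope, lin_t = ols_fit(x, y)                    # linear specification
sq_slope, sq_t = ols_fit([xi ** 2 for xi in x], y)  # specification matching the process
print(f"y on x:   slope {lin_slope:+.3f}, t {lin_t:+.2f}")
print(f"y on x^2: slope {sq_slope:+.3f}, t {sq_t:+.2f}")
```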

Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established are laws and relations about entities in models that presuppose causal mechanisms and variables (and the relationships between them) to be linear, additive, homogeneous, stable, invariant and atomistic. But when causal mechanisms operate in the real world, they only do so in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since statisticians and econometricians, as far as I can see, haven’t been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence and additivity as being ontologically isomorphic to real-world economic systems, Keynes’ critique is still valid. As long as, in the words of Keynes’ 1935 letter to Frisch, “nothing emerges at the end which has not been introduced expressively or tacitly at the beginning,” I remain doubtful of the scientific aspirations of econometrics.

In his critique of Tinbergen, Keynes points us to the fundamental logical, epistemological, and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyze repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.

Econometric modeling should never be a substitute for thinking. From that perspective, it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today. And that is also a reason why yours truly — as does Moosa — has to keep on criticizing it.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

J. M. Keynes, letter to E.J. Broster, December 19, 1939

Econometric patterns should never be seen as anything else than possible clues to follow. From a critical realist perspective, it is obvious that behind observable data there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.

## P-value — a poor substitute for scientific reasoning

28 Jun, 2020 at 23:16 | Posted in Statistics & Econometrics | Leave a comment

*All* science entails human judgment, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of significance testing is actually zero, even though we are making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for when trying to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis because it can’t be rejected at the standard 5% significance level. In their standard form, significance tests are biased against new hypotheses, since they make it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only, say, a 10% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
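One standard way to formalize ‘many independent tests, all at 10%’ is Fisher’s method for combining p-values. This is a swapped-in technique, not something the post itself uses, and it assumes the tests are genuinely independent and each p-value is valid. Under those assumptions, five independent results at p = 0.10 combine to an overall p-value of roughly 0.01. A minimal Python sketch:

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under the joint null."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For even degrees of freedom 2k the chi-square survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


single = fisher_combined_p([0.10])
five = fisher_combined_p([0.10] * 5)
print(f"one test at 0.10: {single:.3f}")
print(f"five independent tests at 0.10: {five:.4f}")
```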

Statistics is no substitute for thinking. We should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean next to nothing if the model is wrong. Statistical significance tests do not validate models!

In many social sciences, p-values and null hypothesis significance testing (NHST) are often used to draw far-reaching scientific conclusions, despite the fact that they are as a rule poorly understood and that there exist alternatives that are easier to understand and more informative.

Not least, confidence intervals (CIs) and effect sizes are to be preferred to the Neyman-Pearson-Fisher mishmash approach that is so often practiced by applied researchers.

Run a Monte Carlo simulation with 100 replications of a fictitious sample with N = 20, 95% confidence intervals, a normally distributed population with mean 10 and standard deviation 20, and two-tailed p-values on a zero null hypothesis. The CIs vary (since they are based on varying sample standard deviations), but with a minimum of 3.2 and a maximum of 26.1 they still give a clear picture of what would happen in an infinite limit sequence. The p-values, on the other hand (even though, from a purely mathematical-statistical point of view, they are more or less equivalent to CIs), vary strongly from sample to sample, and jumping around between a minimum of 0.007 and a maximum of 0.999, they don’t give you a clue about what will happen in an infinite limit sequence!

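In case you want to do your own Monte Carlo simulation, here’s an example I’ve made using Gretl:

```
# 100 samples of size 20 from a normal population with mean 10 and sd 15
nulldata 20
loop 100 --progressive
    series y = normal(10, 15)
    scalar df = $nobs - 1
    scalar ybar = mean(y)
    scalar ysd = sd(y)
    # standard error of the sample mean
    scalar ybarsd = ysd / sqrt($nobs)
    # t statistic against a null mean of 10
    scalar tstat = (ybar - 10) / ybarsd
    # 95% confidence interval from the t critical value
    scalar lowb = ybar - critical(t, df, 0.025) * ybarsd
    scalar uppb = ybar + critical(t, df, 0.025) * ybarsd
    # two-tailed p-value (pvalue() returns the right-tail probability)
    scalar pval = 2 * pvalue(t, df, abs(tstat))
    # originally saved as E:\pvalcoeff.gdt; adjust the path as needed
    store pvalcoeff.gdt lowb uppb pval
endloop
```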
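For readers without Gretl, here is a rough Python counterpart of the same kind of simulation, using only the standard library. The t-distribution functions are hand-rolled (numerical integration plus bisection) rather than taken from a statistics package, and the defaults mirror the Gretl script (mean 10, standard deviation 15, null value 10); the variant described in the text can be tried with `simulate(sigma=20.0, mu0=0.0)`. Since it uses a different random generator, it will not reproduce the exact minima and maxima reported above.

```python
import math
import random
import statistics


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """1 - P(-|t| < T < |t|), integrating the density with Simpson's rule (steps even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * s * h / 3  # density is symmetric, so double the [0, |t|] integral
    return min(1.0, max(0.0, 1.0 - central))


def t_critical(df, alpha=0.05):
    """Two-tailed critical value, found by bisection on two_tailed_p."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate(reps=100, n=20, mu=10.0, sigma=15.0, mu0=10.0, seed=2020):
    """Return (lower, upper, p) for `reps` samples of size n drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    df = n - 1
    tc = t_critical(df)
    rows = []
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(y)
        se = statistics.stdev(y) / math.sqrt(n)
        tstat = (ybar - mu0) / se
        rows.append((ybar - tc * se, ybar + tc * se, two_tailed_p(tstat, df)))
    return rows


rows = simulate()
lows = [r[0] for r in rows]
highs = [r[1] for r in rows]
pvals = [r[2] for r in rows]
print(f"CI lower bounds {min(lows):.1f} to {max(lows):.1f}, "
      f"upper bounds {min(highs):.1f} to {max(highs):.1f}")
print(f"p-values {min(pvals):.3f} to {max(pvals):.3f}")
```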

## Golden ratio (student stuff)

28 Jun, 2020 at 18:58 | Posted in Statistics & Econometrics | Leave a comment

## The rhetoric of imaginary populations

24 Jun, 2020 at 09:16 | Posted in Economics, Statistics & Econometrics | Leave a comment

The most *expedient* population and data generation model to adopt is one in which the population is regarded as a realization of an infinite super population. This setup is the standard perspective in mathematical statistics, in which random variables are assumed to exist with fixed moments for an uncountable and unspecified universe of events … This perspective is tantamount to assuming a population machine that spawns individuals forever (i.e., the analog to a coin that can be flipped forever). Each individual is born as a set of random draws from the distributions of Y¹, Y⁰, and additional variables collectively denoted by S …

Because of its *expediency*, we will usually write with the superpopulation model in the background, even though the notions of infinite superpopulations and sequences of sample sizes approaching infinity are *manifestly unrealistic*.
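The ‘population machine’ in the quoted passage can be written down literally. In this Python sketch (all distributions are hypothetical placeholders) the machine is an infinite generator that spawns individuals as draws of (Y¹, Y⁰, S); any data set we could ever analyze is a finite slice of it. The machine, and every probability statement about it, exists only because we wrote it ourselves.

```python
import itertools
import random


def population_machine(seed=0):
    """Hypothetical 'population machine': spawns individuals (y1, y0, s) forever.

    y1 and y0 are the potential outcomes with and without treatment, s a covariate.
    The distributions are arbitrary placeholders, not estimates of anything.
    """
    rng = random.Random(seed)
    while True:
        s = rng.gauss(0, 1)
        y0 = 1.0 + 0.5 * s + rng.gauss(0, 1)
        y1 = y0 + 2.0  # constant treatment effect, purely for illustration
        yield (y1, y0, s)


# Any data set we actually get to analyze is a finite slice of the imagined stream.
sample = list(itertools.islice(population_machine(), 500))
print(len(sample), sample[0])
```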

In econometrics, one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great, but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘super populations’ is one of the many dubious assumptions used in modern econometrics.

As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting a domain of probability theory and sample space of infinite populations also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

In *Statistical Models and Causal Inference: A Dialogue with the Social Sciences*, David Freedman also touched on this fundamental problem, arising when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels:

Lurking behind the typical regression model will be found a host of such assumptions; without them, legitimate inferences cannot be drawn from the model. There are statistical procedures for testing some of these assumptions. However, the tests often lack the power to detect substantial failures. Furthermore, model testing may become circular; breakdowns in assumptions are detected, and the model is redefined to accommodate. In short, *hiding the problems can become a major goal of model building*. Using models to make predictions of the future, or the results of interventions, would be a valuable corrective. Testing the model on a variety of data sets – rather than fitting refinements over and over again to the same data set – might be a good second-best … Built into the equation is a model for non-discriminatory behavior: the coefficient d vanishes. If the company discriminates, that part of the model cannot be validated at all.

Regression models are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals.

However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions. Under the circumstances, reliance on model outputs may be quite unjustified. Making the ideas of validation somewhat more precise is a serious problem in the philosophy of science. That models should correspond to reality is, after all, a useful but not totally straightforward idea – with some history to it. Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious … In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And *an enormous amount of fiction has been produced, masquerading as rigorous science*.

And as if this wasn’t enough, one could — as we’ve seen — also seriously wonder what kind of ‘populations’ these statistical and econometric models ultimately are based on. Why should we as social scientists — and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems — unquestioningly accept models based on concepts like the ‘infinite super populations’ used in e.g. the ‘potential outcome’ framework that has become so popular lately in social sciences?

Of course, one could treat observational or experimental data as random samples from real populations. I have no problem with that (although it has to be noted that most ‘natural experiments’ are *not* based on random sampling from some underlying population, which, of course, means that the effect estimators, strictly seen, are only unbiased for the specific groups studied). But probabilistic econometrics does not content itself with that kind of population. Instead, it creates imaginary populations of ‘parallel universes’ and assumes that our data are random samples from that kind of ‘infinite super population.’

But this is actually nothing but hand-waving! And it is inadequate for real science. As David Freedman writes:

With this approach, the investigator does not explicitly define a population that could in principle be studied, with unlimited resources of time and money. The investigator merely *assumes* that such a population exists in some ill-defined sense. And there is a further assumption, that the data set being analyzed can be treated *as if* it were based on a random sample from the assumed population. These are convenient fictions … Nevertheless, reliance on imaginary populations is widespread. Indeed regression models are commonly used to analyze convenience samples … The rhetoric of imaginary populations is seductive because it seems to free the investigator from the necessity of understanding how data were generated.

In social sciences — including economics — it’s always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …

## Econometric self-deceptions

11 Jun, 2020 at 10:47 | Posted in Statistics & Econometrics | 2 Comments

One may wonder how much calibration adds to the knowledge of economic structures and the deep parameters involved …

First, few ‘deep parameters’ have been established at all …

Second, even where estimates are available from micro-econometric investigations, they cannot be automatically imported into aggregated general equilibrium models …

Third, calibration hardly contributes to growth of knowledge about ‘deep parameters’. These deep parameters are confronted with a novel context (aggregate time-series), but this is not used for inference on their behalf. Rather, the new context is used to fit the model to presumed ‘laws of motion’ of the economy …

There are many kinds of useless economics held in high regard within the mainstream economics establishment today. Few, if any, are less deserved than the macroeconomic theory/method (mostly connected with Nobel laureates Finn Kydland, Robert Lucas, Edward Prescott and Thomas Sargent) called calibration.

Hugo Keuzenkamp and yours truly are certainly not the only ones having doubts about the scientific value of calibration. In *Journal of Economic Perspectives* (1996, vol. 10), Lars Peter Hansen and James J. Heckman write:

It is only under very special circumstances that a micro parameter such as the inter-temporal elasticity of substitution or even a marginal propensity to consume out of income can be ‘plugged into’ a representative consumer model to produce an empirically concordant aggregate model … What credibility should we attach to numbers produced from their ‘computational experiments’, and why should we use their ‘calibrated models’ as a basis for serious quantitative policy evaluation? … There is no filing cabinet full of robust micro estimates ready to use in calibrating dynamic stochastic equilibrium models … The justification for what is called ‘calibration’ is vague and confusing.

Mathematical statistician Aris Spanos — in *Error and Inference* (Mayo & Spanos, 2010, p. 240) — is no less critical:

Given that “calibration” purposefully forsakes error probabilities and provides no way to assess the reliability of inference, how does one assess the adequacy of the calibrated model? …

The idea that it should suffice that a theory “is not obscenely at variance with the data” (Sargent, 1976, p. 233) is to disregard the work that statistical inference can perform in favor of some discretional subjective appraisal … it hardly recommends itself as an empirical methodology that lives up to the standards of scientific objectivity

In physics, it may possibly not be straining credulity too much to model processes as ergodic – where time and history do not really matter – but in social and historical sciences it is obviously ridiculous. If societies and economies were ergodic worlds, why would econometricians fervently discuss things such as structural breaks and regime shifts? That they do is an indication of how unrealistic it is to treat open systems as analyzable with ergodic concepts.
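A standard textbook illustration of non-ergodicity (not taken from the post) is a repeated multiplicative gamble. In the Python sketch below, wealth is multiplied by 1.5 or 0.6 with equal probability each round. The ensemble average grows by 5% per round, yet a typical individual path shrinks by about 5% per round, so the average over many parallel ‘worlds’ says little about what happens to any one of them over time.

```python
import random
import statistics

UP, DOWN = 1.5, 0.6  # hypothetical gamble: +50% or -40% with equal probability

ensemble_factor = 0.5 * UP + 0.5 * DOWN  # expected growth per round (1.05)
time_avg_factor = (UP * DOWN) ** 0.5     # growth per round along a typical path (about 0.949)


def final_wealth(rounds, rng):
    """Play the gamble `rounds` times, starting from wealth 1."""
    w = 1.0
    for _ in range(rounds):
        w *= UP if rng.random() < 0.5 else DOWN
    return w


rng = random.Random(11)
finals = [final_wealth(100, rng) for _ in range(10_000)]
share_worse_off = sum(w < 1.0 for w in finals) / len(finals)

print(f"expected wealth after 100 rounds: {ensemble_factor ** 100:.1f}")
print(f"median simulated wealth after 100 rounds: {statistics.median(finals):.4f}")
print(f"share of paths ending below the starting wealth: {share_worse_off:.2f}")
```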

The future is not reducible to a known set of prospects. It is not like sitting at the roulette table and calculating what the future outcomes of spinning the wheel will be. Reading Lucas, Sargent, Prescott, Kydland and other calibrationists one comes to think of Robert Clower’s apt remark that

much economics is so far removed from anything that remotely resembles the real world that it’s often difficult for economists to take their own subject seriously.

Instead of assuming calibration and rational expectations to be right, one ought to confront the hypothesis with the available evidence. It is not enough to construct models. Anyone can construct models. To be seriously interesting, models have to come with an aim. They have to have an intended use. If the intention of calibration and rational expectations is to help us explain real economies, it has to be evaluated from that perspective. A model or hypothesis without specific applicability is not really deserving of our interest.

To say, as Edward Prescott does, that

one can only test if some theory, whether it incorporates rational expectations or, for that matter, irrational expectations, is or is not consistent with observations

is not enough. Without strong evidence, all kinds of absurd claims and nonsense may pretend to be science. We have to demand more of a justification than this rather watered-down version of “anything goes” when it comes to rationality postulates. If one proposes rational expectations one also has to support its underlying assumptions. None is given, which makes it rather puzzling how rational expectations has become the standard modelling assumption made in much of modern macroeconomics. Perhaps the reason is, as Paul Krugman has it, that economists often mistake

beauty, clad in impressive looking mathematics, for truth.

But I think Prescott’s view is also the reason why calibration economists are not particularly interested in empirical examinations of how real choices and decisions are made in real economies. In the hands of Lucas, Prescott and Sargent, rational expectations have been transformed from an in-principle testable hypothesis to an irrefutable proposition. Believing in a set of irrefutable propositions may be comfortable, like religious convictions or ideological dogmas, but it is not science.

## Randomization

1 Jun, 2020 at 23:52 | Posted in Statistics & Econometrics | Comments Off on Randomization

A great video, but — there’s always a but — unfortunately also not without some analytical shortcomings.

The point of making a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and effect indicates a causal relation. This is believed to hold since randomization (allegedly) ensures that a supposed causal variable does not correlate with other variables that may influence the effect.

The problem with that (rather simplistic) view on randomization is that the claims made are both exaggerated and strictly seen false:

• Even if you manage to make the assignment to treatment and control groups ideally random, the sample selection is certainly not random, except in extremely rare cases. Even if we make a proper randomized assignment, if we apply the results to a biased sample, there is always the risk that the experimental findings will not apply. What works ‘there’ does not work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ making the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization per se does not guarantee anything!

• Even if both sampling and assignment are made in an ideal random way, performing standard randomized experiments only gives you averages. The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer of the average causal effect being 0, those who are ‘treated’ may have causal effects equal to -100 and those ‘not treated’ may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the average effect particularly enlightening.

• There is almost always a trade-off between bias and precision. In real-world settings, a little bias often does not outweigh greater precision. And, most importantly, in case we have a population with sizeable heterogeneity, the average treatment effect of the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

• Since most real-world experiments and trials build on performing a single randomization, knowing what would happen if you kept on randomizing forever does not help you ‘ensure’ or ‘guarantee’ that you do not draw false causal conclusions in the one particular randomized experiment you actually perform. It is indeed difficult to see why thinking about what you know you will never do would make you happy about what you actually do.
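The heterogeneity point in the second bullet can be made concrete with a toy simulation. The Python sketch below assumes a hypothetical population split into two equal subgroups, A and B, with unit-level causal effects of -100 and +100 respectively; the population, the effect sizes and the function name are illustrative assumptions, not data from any study. A perfectly randomized trial then recovers an average effect close to 0, the ‘right’ answer, while the subgroup comparisons show what that average hides.

```python
import random
import statistics


def simulate_trial(n_units=10_000, seed=1):
    """One randomized trial on a hypothetical population whose unit-level
    causal effect is -100 in subgroup A and +100 in subgroup B."""
    rng = random.Random(seed)
    groups = ["A" if i % 2 == 0 else "B" for i in range(n_units)]  # equal-sized subgroups
    effects = [-100.0 if g == "A" else 100.0 for g in groups]       # heterogeneous effects
    baseline = [rng.gauss(50.0, 10.0) for _ in range(n_units)]      # untreated outcomes

    # Randomized assignment: a random half of the units is treated.
    treated = set(rng.sample(range(n_units), n_units // 2))
    outcome = [baseline[i] + (effects[i] if i in treated else 0.0) for i in range(n_units)]

    def diff_in_means(units):
        t = [outcome[i] for i in units if i in treated]
        c = [outcome[i] for i in units if i not in treated]
        return statistics.mean(t) - statistics.mean(c)

    everyone = range(n_units)
    ate_hat = diff_in_means(everyone)
    by_group = {g: diff_in_means([i for i in everyone if groups[i] == g]) for g in ("A", "B")}
    return ate_hat, by_group


ate_hat, by_group = simulate_trial()
print(round(ate_hat, 1))                           # close to 0
print({g: round(v) for g, v in by_group.items()})  # roughly -100 for A, +100 for B
```

Note that the subgroup split is only available here because the simulation knows where the heterogeneity lies; in a real trial that knowledge has to come from somewhere other than the average.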

The problem many ‘randomistas’ end up with when underestimating heterogeneity and interaction is not only an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal validity problem for the millions of regression estimates that economists produce every year.

‘Ideally controlled experiments’ tell us with certainty what causes what effects, but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural, or quasi) experiments to different settings, populations, or target systems is not easy. And since trials usually are not repeated, unbiasedness and balance on average over repeated trials say nothing about any one trial. ‘It works there’ is no evidence for ‘it will work here.’ Causes deduced in an experimental setting still have to be shown to come with an export warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods (and ‘on-average knowledge’) is despairingly small.

RCTs have very little reach beyond giving descriptions of what has happened in the past. From the perspective of the future, and for policy purposes, they are as a rule of limited value, since they cannot tell us which background factors were held constant when the trial intervention was made.

RCTs usually do not provide evidence that their results are exportable to other target systems, and it cannot be taken for granted that they give generalizable results. That something works somewhere for someone is no warrant for believing that it works for us here, or even that it works generally.

## Haavelmo and modern probabilistic econometrics — a critical-realist perspective (wonkish)

24 Apr, 2020 at 11:52 | Posted in Statistics & Econometrics | 2 Comments

Mainstream economists often hold the view that criticisms of econometrics are the conclusions of sadly misinformed and misguided people who dislike and do not understand much of it. This is a gross misapprehension. To be careful and cautious is not equivalent to dislike.

The ordinary deductivist ‘textbook approach’ to econometrics views the modelling process as foremost an estimation problem, since one (at least implicitly) assumes that the model provided by economic theory is a well-specified and ‘true’ model. The more empiricist general-to-specific methodology (often identified as the ‘LSE approach’), on the other hand, views models as theoretically and empirically adequate representations (approximations) of a data generating process (DGP). Diagnostic tests (mostly some variant of the F-test) are used to ensure that the models are ‘true’ (or at least ‘congruent’) representations of the DGP. Here the modelling process is seen more as a specification problem, where poor diagnostic results may indicate a possible misspecification requiring re-specification of the model. The standard objective is to identify models that are structurally stable and valid across a large time-space horizon. The DGP is not seen as something we already know, but rather as something we discover in the process of modelling it. Considerable effort is put into testing to what extent the models are structurally stable and generalizable over space and time.
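Structural-stability testing of the kind described here is often done with a Chow-type F-test: fit the model on the pooled sample and separately on two sub-periods, and ask whether letting the coefficients differ reduces the residual sum of squares by more than chance would allow. The Python sketch below assumes a single-regressor model with an intercept, a known candidate break point and simulated data; the break date, the data-generating process and the function names are hypothetical. It returns only the F statistic, which would be compared with an F(k, n − 2k) critical value (roughly 3.0 at the 5% level for k = 2 and n = 200).

```python
import random


def ols_rss(x, y):
    """Residual sum of squares from OLS of y on an intercept and a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, split, k=2):
    """Chow F statistic for a break after observation `split`;
    k = coefficients per regime (here intercept and slope)."""
    rss_pooled = ols_rss(x, y)
    rss_split = ols_rss(x[:split], y[:split]) + ols_rss(x[split:], y[split:])
    n = len(x)
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


# Hypothetical data: 200 observations, break tested after observation 100.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(200)]
y_stable = [1.0 + 1.0 * xi + rng.gauss(0, 1) for xi in x]
y_break = [1.0 + (1.0 if i < 100 else 2.0) * xi + rng.gauss(0, 1) for i, xi in enumerate(x)]

print(round(chow_f(x, y_stable, 100), 2))  # usually below the critical value
print(round(chow_f(x, y_break, 100), 2))   # far above it: slope shifted from 1.0 to 2.0
```

A large F statistic flags instability between the sub-samples tested; a small one shows only that those particular sub-samples look alike, not that the underlying relations are invariant.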

Although I have sympathy for this approach in general, there are still some unsolved ‘problematics’ with its epistemological and ontological presuppositions. There is, e.g., an implicit assumption that the DGP fundamentally has an invariant property and that models that are structurally unstable just have not been able to get hold of that invariance. But, as Keynes already maintained, one cannot just presuppose or take for granted that kind of invariance. It has to be argued for and justified. Grounds have to be given for viewing reality as satisfying conditions of model-closure. It is as if the lack of closure that shows up in the form of structurally unstable models could somehow be solved by searching for more autonomous and invariable ‘atomic uniformity.’ But whether reality is ‘congruent’ with this analytical prerequisite has to be argued for, not simply taken for granted.

A great many models are compatible with what we know in economics — that is to say, do not violate any matters on which economists are agreed. Attractive as this view is, it fails to draw a necessary distinction between what is assumed and what is merely proposed as hypothesis. This distinction is forced upon us by an obvious but neglected fact of statistical theory: the matters ‘assumed’ are put wholly beyond test, and the entire edifice of conclusions (e.g., about identifiability, optimum properties of the estimates, their sampling distributions, etc.) depends absolutely on the validity of these assumptions. The great merit of modern statistical inference is that it makes exact and efficient use of what we know about reality to forge new tools of discovery, but it teaches us painfully little about the efficacy of these tools when their basis of assumptions is not satisfied.

Even granted that closures come in degrees, we should not compromise on ontology. Some methods simply introduce improper closures, closures that make the disjuncture between models and real-world target systems inappropriately large. ‘Garbage in, garbage out.’

Underlying the search for these immutable ‘fundamentals’ lies the implicit view of the world as consisting of entities with their own separate and invariable effects. These entities are thought of as capable of being treated as separate and addible causes, thereby making it possible to infer complex interaction from knowledge of individual constituents with limited independent variety. But, again, whether this is a justified analytical procedure cannot be answered without confronting it with the nature of the objects the models are supposed to describe, explain or predict. Keynes himself thought it generally inappropriate to apply the ‘atomic hypothesis’ to such an open and ‘organic entity’ as the real world. As far as I can see, these are still appropriate strictures that all econometric approaches have to face. Grounds for believing otherwise have to be provided by the econometricians.

## Causal inference (student stuff)

21 Apr, 2020 at 18:52 | Posted in Statistics & Econometrics | Comments Off on Causal inference (student stuff)

## Read my lips — using an RCT guarantees nothing!

20 Apr, 2020 at 17:33 | Posted in Statistics & Econometrics | 1 Comment

The claimed hierarchy of methods, with randomized assignment being deemed inherently superior to observational studies, does not survive close scrutiny. Despite frequent claims to the contrary, an RCT does not equate counterfactual outcomes between treated and control units. The fact that systematic bias in estimating the mean impact vanishes in expectation (under ideal conditions) does not imply that the (unknown) experimental error in a one-off RCT is less than the (unknown) error in some alternative observational study. We obviously cannot know that. A biased observational study with a reasonably large sample size may well be closer to the truth in specific trials than an underpowered RCT …
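Ravallion's point that a biased but large observational study may land closer to the truth than an underpowered RCT can be illustrated with a toy simulation. The Python sketch below assumes, purely hypothetically, a true effect of 1.0, outcome noise with standard deviation 3.0, an ‘RCT’ estimator that is unbiased but uses 20 units per arm, and an ‘observational’ estimator with a fixed bias of 0.2 but 1,000 units per arm. It reports the mean squared error of each and the share of simulated pairs in which the biased estimate is the closer one.

```python
import random
import statistics

TRUE_EFFECT = 1.0   # hypothetical true average effect
NOISE_SD = 3.0      # hypothetical standard deviation of individual outcomes


def diff_in_means(rng, n_per_arm, bias=0.0):
    """Difference-in-means estimate; `bias` shifts treated outcomes to mimic
    residual confounding in an observational comparison."""
    treated = [TRUE_EFFECT + bias + rng.gauss(0, NOISE_SD) for _ in range(n_per_arm)]
    control = [rng.gauss(0, NOISE_SD) for _ in range(n_per_arm)]
    return statistics.mean(treated) - statistics.mean(control)


def compare(reps=500, seed=7):
    rng = random.Random(seed)
    rct_sq_err, obs_sq_err, obs_closer = [], [], 0
    for _ in range(reps):
        rct = diff_in_means(rng, n_per_arm=20)              # unbiased but small
        obs = diff_in_means(rng, n_per_arm=1000, bias=0.2)  # biased but large
        rct_sq_err.append((rct - TRUE_EFFECT) ** 2)
        obs_sq_err.append((obs - TRUE_EFFECT) ** 2)
        if abs(obs - TRUE_EFFECT) < abs(rct - TRUE_EFFECT):
            obs_closer += 1
    return statistics.mean(rct_sq_err), statistics.mean(obs_sq_err), obs_closer / reps


rct_mse, obs_mse, share_obs_closer = compare()
print(round(rct_mse, 3), round(obs_mse, 3), share_obs_closer)
```

The comparison stands or falls with the assumed bias: in practice that bias is unknown, which is exactly why neither design can be declared superior in advance.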

The questionable claims made about the superiority of RCTs as the “gold standard” have had a distorting influence on the use of impact evaluations to inform development policymaking, given that randomization is only feasible for a non-random subset of policies. When a program is community- or economy-wide or there are pervasive spillover effects from those treated to those not, an RCT will be of little help, and may well be deceptive. The tool is only well suited to a rather narrow range of development policies, and even then it will not address many of the questions that policymakers ask. Advocating RCTs as the best, or even only, scientific method for impact evaluation risks distorting our knowledge base for fighting poverty.

Even if you manage to make the assignment to treatment and control groups ideally random, the sample selection is certainly, except in extremely rare cases, not random. Even with a proper randomized assignment, if we apply the results to a biased sample, there is always the risk that the experimental findings will not apply. What works ‘there’ need not work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ making the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization per se does not guarantee anything!

There is almost always a trade-off between bias and precision. In real-world settings, a little bias often does not outweigh the gain from greater precision. And, most importantly, if we have a population with sizeable heterogeneity, the average treatment effect of the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

And, as underscored by Ravallion, since most real-world experiments and trials build on performing a single randomization, knowing what would happen if you kept on randomizing forever does not help you ‘ensure’ or ‘guarantee’ that you do not draw false causal conclusions in the one particular randomized experiment you actually perform. It is indeed difficult to see why thinking about what you know you will never do would make you happy about what you actually do.
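The gap between unbiasedness over hypothetical repeated randomizations and the error in the one trial actually run can be sketched by re-randomizing a single fixed sample many times. The Python sketch below assumes a hypothetical sample of 40 units with skewed baseline outcomes and a treatment that has no effect on anyone. The difference-in-means estimator averages to about 0 across re-randomizations, yet a sizeable share of individual randomizations give estimates well away from 0.

```python
import random
import statistics


def rerandomize(n_units=40, n_draws=5000, seed=3):
    """Difference-in-means estimates from many re-randomizations of one fixed,
    hypothetical sample in which the treatment has no effect on anyone."""
    rng = random.Random(seed)
    baseline = [rng.expovariate(1 / 10) for _ in range(n_units)]  # skewed outcomes, mean about 10
    estimates = []
    for _ in range(n_draws):
        treated = set(rng.sample(range(n_units), n_units // 2))
        t = [baseline[i] for i in treated]
        c = [baseline[i] for i in range(n_units) if i not in treated]
        estimates.append(statistics.mean(t) - statistics.mean(c))
    return estimates


est = rerandomize()
average = statistics.mean(est)                       # close to 0 across randomizations
share_far = sum(abs(e) > 3 for e in est) / len(est)  # single trials off by more than 3 units
print(round(average, 2), round(share_far, 2))
```

Only `average` speaks to unbiasedness; the one trial a researcher actually runs is a single draw from `est`.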

## ‘Doctor, it hurts when I p’

14 Apr, 2020 at 10:07 | Posted in Statistics & Econometrics | Comments Off on ‘Doctor, it hurts when I p’

A low-powered study is only going to be able to see a pretty big effect. But sometimes you know that the effect, if it exists, is small. In other words, a study that accurately measures the effect … is likely to be rejected as statistically insignificant, while any result that passes the p < .05 test is either a false positive or a true positive that massively overstates the … effect.
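The claim that significant results from low-powered studies overstate a small true effect can be checked with a short simulation. The Python sketch below assumes, hypothetically, a true effect of 0.1 standard deviations, 50 units per arm, unit variance and a two-sided z-test at the 5% level; as a shortcut it draws each study's estimate directly from its normal sampling distribution instead of simulating raw data. It returns the power and the ratio of the average significant positive estimate to the true effect (an exaggeration ratio, sometimes called a ‘type M’ error).

```python
import math
import random
import statistics


def simulate_studies(true_effect=0.1, n_per_arm=50, reps=20000, seed=11):
    """Power and exaggeration ratio for many hypothetical low-powered studies."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_arm)  # standard error of a difference in means, unit variance
    significant = []
    for _ in range(reps):
        estimate = rng.gauss(true_effect, se)  # one study's estimate
        if abs(estimate / se) > 1.96:          # two-sided test at the 5% level
            significant.append(estimate)
    power = len(significant) / reps
    positive = [e for e in significant if e > 0]
    exaggeration = statistics.mean(positive) / true_effect
    return power, exaggeration


power, exaggeration = simulate_studies()
print(round(power, 3), round(exaggeration, 1))
```

Under these assumptions the power is well below 10%, and the significant estimates are several times larger than the true effect.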

…

A conventional boundary, obeyed long enough, can be easily mistaken for an actual thing in the world. Imagine if we talked about the state of the economy this way! Economists have a formal definition of a 'recession,' which depends on arbitrary thresholds just as 'statistical significance' does. One doesn't say, 'I don't care about the unemployment rate, or housing starts, or the aggregate burden of student loans, or the federal deficit; if it's not a recession, we're not going to talk about it.' One would be nuts to say so. The critics — and there are more of them, and they are louder, each year — say that a great deal of scientific practice is nuts in just this way.

If anything, this underlines how important it is not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of significance testing is actually zero, even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science. Or as a noted German philosopher once famously wrote:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

Statistical significance doesn’t say that something is important or true. Since there already are far better and more relevant tests that can be done (see e.g. here and here), it is high time to consider what should be the proper function of what has now really become a statistical fetish. Given that it is in any case very unlikely that any population parameter is exactly zero, and that, contrary to assumption, most samples in social science and economics are neither random nor of the right distributional shape, why continue to press students and researchers to do null hypothesis significance testing, testing that relies on a weird backward logic that students and researchers usually don’t understand?
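One consequence of population parameters hardly ever being exactly zero is that a large enough sample makes even a negligible effect ‘statistically significant.’ The Python sketch below assumes a hypothetical true difference of 0.01 standard deviations, unit variance in each arm, and a two-sided z-test; it computes the z statistic one would get if the estimate landed exactly on the true value, together with its two-sided p-value, for growing sample sizes. The function names are illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def expected_z(effect_sd, n_per_arm):
    """z statistic when the estimated difference equals the true one (unit variance per arm)."""
    return effect_sd / math.sqrt(2 / n_per_arm)


for n in (100, 10_000, 1_000_000):
    z = expected_z(0.01, n)
    print(n, round(z, 2), two_sided_p(z))
```

Nothing about the size or importance of the effect changes across the rows; only the sample size does, and with it the verdict of the test.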

In its standard form, a significance test is not the kind of “severe test” that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis whenever it can’t be rejected at the standard 5% significance level. In their standard form, significance tests are biased against new hypotheses by making it hard to disconfirm the null hypothesis.

As shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” And — most importantly — we should of course never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean next to nothing if the model is wrong. As David Freedman writes in *Statistical Models and Causal Inference*:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.
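Freedman's point that significance tests rest on assumptions such as independence of errors, which usually go unvalidated, can be illustrated by letting that assumption fail. The Python sketch below assumes two independently generated series of 100 periods each and applies the conventional OLS t-test to the slope of one on the other. With white-noise series the nominal 5% test rejects about 5% of the time; with two unrelated random walks (the classic spurious-regression setup) it rejects far more often, although the true relationship is zero by construction. Series length, replication count and function names are illustrative choices.

```python
import math
import random


def ols_slope_t(x, y):
    """Conventional OLS t statistic for b in y = a + b*x + e, computed as if
    the errors were independent and identically distributed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)


def series(rng, n, random_walk):
    """A random walk (cumulated shocks) or plain white noise."""
    level, path = 0.0, []
    for _ in range(n):
        shock = rng.gauss(0, 1)
        level = level + shock if random_walk else shock
        path.append(level)
    return path


def rejection_rate(random_walk, n=100, reps=1000, seed=5):
    """Share of regressions of one series on an independent one where the
    nominal 5% t-test rejects 'no relationship'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x, y = series(rng, n, random_walk), series(rng, n, random_walk)
        if abs(ols_slope_t(x, y)) > 1.984:  # approx. two-sided 5% critical value, 98 df
            rejections += 1
    return rejections / reps


print(rejection_rate(random_walk=False))  # near the nominal 0.05
print(rejection_rate(random_walk=True))   # far above 0.05
```

The computation of the t statistic is identical in both cases; only the unchecked assumption about the errors differs.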

## An introduction to probability density functions (student stuff)

12 Apr, 2020 at 19:00 | Posted in Statistics & Econometrics | Comments Off on An introduction to probability density functions (student stuff)

## Some basic COVID-19 mathematics

10 Apr, 2020 at 10:29 | Posted in Statistics & Econometrics | 6 Comments

## Simpson’s paradox and the limits of econometrics

8 Apr, 2020 at 08:10 | Posted in Statistics & Econometrics | 2 Comments

From a more theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities.

To understand causality we always have to relate it to a specific causal *structure*. Statistical correlations are *never* enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of the men are admitted, but only 30% of the women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments, we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against (say 800 men apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 women apply for economics studies (90 admitted) and 900 for physics studies (210 admitted), giving male-to-female odds ratios of 0.63 in economics and 0.37 in physics).
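The arithmetic of the admissions example can be checked directly from the hypothetical counts given in the text. The Python sketch below computes admission odds and male-to-female odds ratios for the aggregate data and within each department by cross-tabulation rather than by fitting a logistic regression; with a single binary regressor the exponentiated logistic coefficient equals this cross-tabulated odds ratio. The dictionary layout and function names are illustrative.

```python
# Admissions counts from the example in the text: (admitted, applied).
counts = {
    "economics": {"men": (680, 800), "women": (90, 100)},
    "physics": {"men": (20, 200), "women": (210, 900)},
}


def odds(admitted, applied):
    """Odds of admission: admitted / rejected."""
    return admitted / (applied - admitted)


def odds_ratio(men, women):
    """Male-to-female odds ratio; values above 1 favour men."""
    return odds(*men) / odds(*women)


# Aggregate over departments.
total = {}
for sex in ("men", "women"):
    admitted = sum(counts[d][sex][0] for d in counts)
    applied = sum(counts[d][sex][1] for d in counts)
    total[sex] = (admitted, applied)

agg_or = odds_ratio(total["men"], total["women"])
print("aggregate", round(agg_or, 2))  # aggregate 5.44

# Within each department the sign of the comparison flips.
for dept, c in counts.items():
    print(dept, round(odds_ratio(c["men"], c["women"]), 2))
# economics 0.63
# physics 0.37
```

Which of these numbers answers the causal question cannot be read off the table; it depends on the causal structure assumed to link sex, department choice and admission.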

Econometric patterns should never be seen as anything other than possible clues to follow. From a critical realist perspective, it is obvious that behind observable data there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.

## Economic growth and the size of the ‘private sector’

26 Mar, 2020 at 17:52 | Posted in Statistics & Econometrics | 2 Comments

Economic growth has long interested economists. Not least, the question of which factors are behind high growth rates has been in focus. The factors usually pointed at are mainly economic, social and political variables. In an interesting study from the University of Helsinki, Tatu Westling expanded the potential causal variables to also include biological and sexual variables. In the report *Male Organ and Economic Growth: Does Size Matter* (2011), he was able to show, based on the ‘cross-country’ data of Mankiw et al. (1992), Summers and Heston (1988), Polity IV Project data on political regime types, and a data set on average penis size in 76 non-oil-producing countries (www.everyoneweb.com/worldpenissize), that the level and growth of GDP per capita between 1960 and 1985 varies with penis size. Replicating Westling’s study (yours truly has used his favourite program, Gretl), we obtain the following two charts:

The Solow-based model estimates show that maximum GDP is achieved with a penis size of about 13.5 cm, and that the size of the male reproductive organ (OLS without control variables) is negatively correlated with, and able to ‘explain’ 20% of the variation in, GDP growth.

Even with the reservation for problems such as endogeneity and confounders, one cannot but agree with Westling’s final assessment that “the ‘male organ hypothesis’ is worth pursuing in future research” and that it “clearly seems that the ‘private sector’ deserves more credit for economic development than is typically acknowledged.” Or? …

## Econometric modelling as junk science

20 Mar, 2020 at 12:09 | Posted in Statistics & Econometrics | Comments Off on Econometric modelling as junk science

Do you believe that 10 to 20% of the decline in crime in the 1990s was caused by an increase in abortions in the 1970s? Or that the murder rate would have increased by 250% since 1974 if the United States had not built so many new prisons? Did you believe predictions that the welfare reform of the 1990s would force 1,100,000 children into poverty?

If you were misled by any of these studies, you may have fallen for a pernicious form of junk science: the use of mathematical modeling to evaluate the impact of social policies. These studies are superficially impressive. Produced by reputable social scientists from prestigious institutions, they are often published in peer reviewed scientific journals. They are filled with statistical calculations too complex for anyone but another specialist to untangle. They give precise numerical “facts” that are often quoted in policy debates. But these “facts” turn out to be will o’ the wisps …

These predictions are based on a statistical technique called multiple regression that uses correlational analysis to make causal arguments … The problem with this, as anyone who has studied statistics knows, is that correlation is not causation. A correlation between two variables may be “spurious” if it is caused by some third variable. Multiple regression researchers try to overcome the spuriousness problem by including all the variables in analysis. The data available for this purpose simply is not up to this task, however, and the studies have consistently failed.
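The ‘third variable’ problem described in the quotation can be sketched with a hypothetical data-generating process in which a confounder z drives both x and y while x has no effect on y at all. The simple regression of y on x shows a clear slope; adding z to the regression (computed here via the Frisch–Waugh–Lovell theorem) removes it. The DGP, the seed and the function names are illustrative assumptions. The fix works only because, by construction, z is known and measured, which is precisely what real-world regression studies usually cannot guarantee.

```python
import random


def simple_slope(x, y):
    """OLS slope of y on an intercept and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residualize(v, z):
    """Residuals of v after OLS on an intercept and z."""
    b = simple_slope(z, v)
    a = sum(v) / len(v) - b * sum(z) / len(z)
    return [vi - a - b * zi for vi, zi in zip(v, z)]


def slope_controlling_for(x, y, z):
    """Coefficient on x in y ~ 1 + x + z, via the Frisch-Waugh-Lovell theorem."""
    return simple_slope(residualize(x, z), residualize(y, z))


rng = random.Random(2024)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]     # hypothetical confounder
x = [zi + rng.gauss(0, 1) for zi in z]      # x depends on z only
y = [2 * zi + rng.gauss(0, 1) for zi in z]  # y depends on z only; x has no effect

naive = simple_slope(x, y)                  # roughly 1.0: a 'spurious' effect of x
adjusted = slope_controlling_for(x, y, z)   # roughly 0.0 once z is included
print(round(naive, 2), round(adjusted, 2))
```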

Mainstream economists often hold the view that if you are critical of econometrics it can only be because you are a sadly misinformed and misguided person who dislikes and does not understand much of it.

As Goertzel’s eminent article shows, this is, however, nothing but a gross misapprehension.

To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong, limiting, and unreal assumptions (completeness, homogeneity, stability, measurability, independence, linearity, additivity, etc.).

Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.

Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established are laws and relations about entities in models that presuppose causal mechanisms and variables (and the relationships between them) to be linear, additive, homogeneous, stable, invariant and atomistic. But when causal mechanisms operate in the real world, they only do so in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since econometricians haven’t been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence and additivity as being ontologically isomorphic to real-world economic systems, I remain doubtful of the scientific aspirations of econometrics.
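The worry about additivity can be made concrete with a hypothetical DGP in which two causes matter only jointly: y = x·z plus noise, with x and z independent and centred on zero. An additive linear regression of y on x and z then finds essentially no effect of either variable, although both are causally relevant; only a term for their product picks up the effect. The Python sketch below is one illustrative case of how an additive specification can mislead; the DGP, the seed and the function names are assumptions.

```python
import random


def slope(x, y):
    """OLS slope of y on an intercept and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def partial_slope(x, y, z):
    """Coefficient on x in the additive regression y ~ 1 + x + z (Frisch-Waugh-Lovell)."""
    def resid(v):
        b = slope(z, v)
        a = sum(v) / len(v) - b * sum(z) / len(z)
        return [vi - a - b * zi for vi, zi in zip(v, z)]
    return slope(resid(x), resid(y))


rng = random.Random(99)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]
y = [xi * zi + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]  # causes act only jointly

coef_x = partial_slope(x, y, z)  # near 0: the additive model sees no effect of x
coef_z = partial_slope(z, y, x)  # near 0: nor of z
xz = [xi * zi for xi, zi in zip(x, z)]
coef_xz = slope(xz, y)           # near 1: the interaction carries the effect
print(round(coef_x, 2), round(coef_z, 2), round(coef_xz, 2))
```

Here the correct interaction term can be added only because the DGP was written down in advance; with real-world data the form of the interaction is itself unknown.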

There are fundamental logical, epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyse repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.

Econometric modelling should never be a substitute for thinking. From that perspective, it is really depressing to see how much of Keynes’ critique of the pioneering econometrics of the 1930s and 1940s is still relevant today. And that is also a reason why social scientists like Goertzel and yours truly keep on criticizing it.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

Keynes, letter to E.J. Broster, December 19, 1939
