## Econometrics — junk science with no relevance whatsoever to real-world economics

17 Oct, 2019 at 09:47 | Posted in Statistics & Econometrics | 3 CommentsDo you believe that 10 to 20% of the decline in crime in the 1990s was caused by an increase in abortions in the 1970s? Or that the murder rate would have increased by 250% since 1974 if the United States had not built so many new prisons? Did you believe predictions that the welfare reform of the 1990s would force 1,100,000 children into poverty?

If you were misled by any of these studies, you may have fallen for a pernicious form of junk science: the use of mathematical modeling to evaluate the impact of social policies. These studies are superficially impressive. Produced by reputable social scientists from prestigious institutions, they are often published in peer reviewed scientific journals. They are filled with statistical calculations too complex for anyone but another specialist to untangle. They give precise numerical “facts” that are often quoted in policy debates. But these “facts” turn out to be will o’ the wisps …

These predictions are based on a statistical technique called multiple regression that uses correlational analysis to make causal arguments … The problem with this, as anyone who has studied statistics knows, is that correlation is not causation. A correlation between two variables may be “spurious” if it is caused by some third variable. Multiple regression researchers try to overcome the spuriousness problem by including all the variables in analysis. The data available for this purpose simply is not up to this task, however, and the studies have consistently failed.

Mainstream economists often hold the view that if you are critical of econometrics it can only be because you are a sadly misinformed and misguided person who dislike and do not understand much of it.

As Goertzel’s eminent article shows, this is, however, nothing but a gross misapprehension.

And just as Goertzel, Keynes certainly did not misunderstand the crucial issues at stake in his critique of econometrics. Quite the contrary. He knew them all too well — and was not satisfied with the validity and philosophical underpinnings of the assumptions made for applying its methods.

Keynes’ critique is still valid and unanswered in the sense that the problems he pointed at are still with us today and ‘unsolved.’ Ignoring them — the most common practice among applied econometricians — is not to solve them.

To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong assumptions. In a review of Tinbergen’s econometric work — published in *The Economic Journal* in 1939 — Keynes gave a comprehensive critique of Tinbergen’s work, focusing on the limiting and unreal character of the assumptions that econometric analyses build on:

**Completeness**: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of *all* the relevant factors to avoid misspecification and spurious causal claims. Usually, this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes is, to put it mildly, unconvinced:

It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his

a priori, that would make a difference to the outcome.

**Homogeneity**: To make inductive inferences possible — and being able to apply econometrics — the system we try to analyse has to have a large degree of ‘homogeneity.’ According to Keynes most social and economic systems — especially from the perspective of real historical time — lack that ‘homogeneity.’ As he had argued already in *Treatise on Probability*, it wasn’t always possible to take repeated samples from a fixed population when we were analysing real-world economies. In many cases, there simply are no reasons at all to assume the samples to be homogenous. Lack of ‘homogeneity’ makes the principle of ‘limited independent variety’ non-applicable, and hence makes inductive inferences, strictly seen, impossible since one of its fundamental logical premises are not satisfied. Without “much repetition and uniformity in our experience” there is no justification for placing “great confidence” in our inductions.

And then, of course, there is also the ‘reverse’ variability problem of non-excitation: factors that do not change significantly during the period analysed, can still very well be extremely important causal factors.

**Stability:** Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyze. But as Keynes had argued already in his *Treatise on Probability* it was not really possible to make inductive generalisations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.

**Measurability:** Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions if it is possible to adequately quantify and measure things like expectations and political and psychological factors. And more than anything, he questioned — both on epistemological and ontological grounds — that it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, “only lead to error and delusion.”

**Independence**: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models from that kind of simplistic and unrealistic assumptions risk producing nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic evolving organic systems — such as economies.

It is a great fault of symbolic pseudo-mathematical methods of formalising a system of economic analysis … that they expressly assume strict independence between the factors involved and lose all their cogency and authority if this hypothesis is disallowed; whereas, in ordinary discourse, where we are not blindly manipulating but know all the time what we are doing and what the words mean, we can keep “at the back of our heads” the necessary reserves and qualifications and the adjustments which we shall have to make later on, in a way in which we cannot keep complicated partial differentials “at the back” of several pages of algebra which assume that they all vanish.

Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.

The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be much less simple than the bare principle of uniformity. They appear to assume something much more like what mathematicians call the principle of the superposition of small effects, or, as I prefer to call it, in this connection, the

atomiccharacter of natural law. The system of the material universe must consist, if this kind of assumption is warranted, of bodies which we may term (without any implication as to their size being conveyed thereby)legal atoms, such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state …Yet if different wholes were subject to laws

quawholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts.

**Linearity:** To make his models tractable, Tinbergen assumes the relationships between the variables he study to be linear. This is still standard procedure today, but as Keynes writes:

It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.

To Keynes, it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).

The unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.

J. M. Keynes “Ethics in Relation to Conduct” (1903)

And as even one of the founding fathers of modern econometrics — Trygve Haavelmo — wrote:

What is the use of testing, say, the significance of regression coefficients, when maybe, the whole assumption of the linear regression equation is wrong?

Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established, are laws and relations about entities in models that presuppose causal mechanisms and variables — and the relationship between them — being linear, additive, homogenous, stable, invariant and atomistic. But — when causal mechanisms operate in the real world they only do it in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since statisticians and econometricians — as far as I can see — haven’t been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence, additivity as being ontologically isomorphic to real-world economic systems, Keynes’ critique is still valid. As long as — as Keynes writes in a letter to Frisch in 1935 — “nothing emerges at the end which has not been introduced expressively or tacitly at the beginning,” I remain doubtful of the scientific aspirations of econometrics.

In his critique of Tinbergen, Keynes points us to the fundamental logical, epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyse repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.

Econometric modelling should never be a substitute for thinking. From that perspective, it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today. And that is also a reason why we — as does Goertzl — have to keep on criticizing it.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

Keynes, letter to E.J. Broster, December 19, 1939

## ‘Goodness of fit’ is not what social science is about

6 Oct, 2019 at 11:10 | Posted in Statistics & Econometrics | Leave a commentWhich independent variables should be included in the equation? The goal is a “good fit” … How can a good fit be recognized? A popular measure for the satisfactoriness of a regression is the coefficient of determination,

R. If this number is large, it is said, the regression gives a good fit …^{2}Nothing about

Rsupports these claims. This statistic is best regarded as characterizing the geometric shape of the regression points and not much more.^{2}The central difficulty with

Rfor social scientists is that the independent variables are not subject to experimental manipulation. In some samples, they vary widely, producing large variance; in other cases, the observations are more tightly grouped and there is little dispersion. The variances are a function of the^{2}sample, not of the underlying relationship. Hence they cannot have any real connection to the “strength” of the relationship as social scientists ordinarily use the term, i. e., as a measure of how much effect a given change in independent variable has on the dependent variable …Thus “maximizing

R” cannot be a reasonable procedure for arriving at a strong relationship. It neither measures causal power nor is comparable across samples … “Explaining variance” is not what social science is about.^{2}

## The lack of positive results in econometrics

12 Sep, 2019 at 11:33 | Posted in Statistics & Econometrics | Comments Off on The lack of positive results in econometricsFor the sake of balancing the overly rosy picture of econometric achievements given in the usual econometrics textbooks today, it may be interesting to see how Trygve Haavelmo — with the completion (in 1958) of the twenty-fifth volume of *Econometrica — *assessed the role of econometrics in the advancement of economics.

We have found certain general principles which would seem to make good sense. Essentially, these principles are based on the reasonable idea that, if an economic model is in fact “correct” or “true,” we can say something a priori about the way in which the data emerging from it must behave. We can say something, a priori, about whether it is theoretically possible to estimate the parameters involved. And we can decide, a priori, what the proper estimation procedure should be … But the concrete results of these efforts have often been a seemingly

lower degree of accuracyof the would-be economic laws (i.e., larger residuals), or coefficients that seem a priori less reasonable than those obtained by using cruder or clearly inconsistent methods.There is the possibility that the more stringent methods we have been striving to develop have actually opened our eyes to recognize a plain fact: viz., that the “laws” of economics are not very accurate in the sense of a close fit, and that we have been living in a dream-world of large but somewhat superficial or spurious correlations.

Since statisticians and econometricians have not been able to convincingly warrant their assumptions — homogeneity, stability, invariance, independence, additivity, and so on — as being ontologically isomorphic to real-world economic systems, there are still strong reasons to be critical of the econometric project. There are deep epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyse repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.

Econometric modelling should never be a substitute for thinking.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

Keynes, letter to E.J. Broster, December 19, 1939

## It’s not just p = 0.048 vs. p = 0.052

9 Sep, 2019 at 17:37 | Posted in Statistics & Econometrics | Comments Off on It’s not just p = 0.048 vs. p = 0.052“[G]iven the realities of real-world research, it seems goofy to say that a result with, say, only a 4.8% probability of happening by chance is “significant,” while if the result had a 5.2% probability of happening by chance it is “not significant.” Uncertainty is a continuum, not a black-and-white difference” …

My problem with the 0.048 vs. 0.052 thing is that it way, way, way understates the problem.

Yes, there’s no stable difference between p = 0.048 and p = 0.052.

But there’s also no stable difference between p = 0.2 (which is considered non-statistically significant by just about everyone) and p = 0.005 (which is typically considered very strong evidence) …

If these two p-values come from two identical experiments, then the standard error of their difference is sqrt(2) times the standard error of each individual estimate, hence that difference in p-values itself is only (2.81 – 1.28)/sqrt(2) = 1.1 standard errors away from zero …

So. Yes, it seems goofy to draw a bright line between p = 0.048 and p = 0.052. But it’s also goofy to draw a bright line between p = 0.2 and p = 0.005. There’s a lot less information in these p-values than people seem to think.

So, when we say that the difference between “significant” and “not significant” is not itself statistically significant, “we are not merely making the commonplace observation that any particular threshold is arbitrary—for example, only a small change is required to move an estimate from a 5.1% significance level to 4.9%, thus moving it into statistical significance. Rather, we are pointing out that even large changes in significance levels can correspond to small, nonsignificant changes in the underlying quantities.”

## Kitchen sink econometrics

8 Sep, 2019 at 11:45 | Posted in Statistics & Econometrics | 1 CommentWhen I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” This argument is not as persuasive as it may seem initially. First of all, if what you are doing is misspecified already, then adding or excluding other variables has no tendency to make things consistently better or worse … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables.

Still more importantly, big, mushy linear regression and probit equations seem to need a great many control variables precisely because they are jamming together all sorts of observations that do not belong together. Countries, wars, racial categories, religious preferences, education levels, and other variables that change people’s coefficients are “controlled” with dummy variables that are completely inadequate to modeling their effects. The result is a long list of independent variables, a jumbled bag of nearly unrelated observations, and often a hopelessly bad specification with meaningless (but statistically significant with several asterisks!) results.A preferable approach is to separate the observations into meaningful subsets—internally compatible statistical regimes … If this can’t be done, then statistical analysis can’t be done. A researcher claiming that nothing else but the big, messy regression is possible because, after all, some results have to be produced, is like a jury that says, “Well, the evidence was weak, but somebody had to be convicted.”

The empirical and theoretical evidence is clear. Predictions and forecasts are inherently difficult to make in a socio-economic domain where genuine uncertainty and unknown unknowns often rule the roost. The real processes that underly the time series that economists use to make their predictions and forecasts do not conform with the assumptions made in the applied statistical and econometric models. Much less is *a fortiori* predictable than standardly — and uncritically — assumed. The forecasting models fail to a large extent because the kind of uncertainty that faces humans and societies actually makes the models strictly seen inapplicable. The future is inherently unknowable — and using statistics, econometrics, decision theory or game theory, does not in the least overcome this ontological fact. The economic future is not something that we normally can predict in advance. Better then to accept that as a rule ‘we simply do not know.’

We could, of course, just *assume *that the world is ergodic and hence convince ourselves that we can predict the future by looking at the past. Unfortunately, economic systems do *not* display that property. So we simply have to accept that all our forecasts are fragile.

## A guide to econometrics

5 Sep, 2019 at 11:11 | Posted in Statistics & Econometrics | 1 Comment1. Thou shalt use common sense and economic theory.

2. Thou shalt ask the right question.

3. Thou shalt know the context.

4. Thou shalt inspect the data.

5. Thou shalt not worship complexity.

6. Thou shalt look long and hard at thy results.

7. Thou shalt beware the costs of data mining.

8. Thou shalt be willing to compromise.

9. Thou shalt not confuse statistical significance with substance.

10. Thou shalt confess in the presence of sensitivity.

## Econometric forecasting — no icing on the economist’s cake

29 Aug, 2019 at 21:44 | Posted in Statistics & Econometrics | 1 CommentIt is clearly the case that experienced modellers could easily come up with significantly different models based on the same set of data thus undermining claims to researcher-independent objectivity. This has been demonstrated empirically by Magnus and Morgan (1999) who conducted an experiment in which an apprentice had to try to replicate the analysis of a dataset that might have been carried out by three different experts (Leamer, Sims, and Hendry) following their published guidance. In all cases the results were different from each other, and different from that which would have been produced by the expert, thus demonstrating the importance of tacit knowledge in statistical analysis.

Magnus and Morgan conducted a further experiment which involved eight expert teams, from different universities, analysing the same sets of data each using their own particular methodology. The data concerned the demand for food in the US and in the Netherlands and was based on a classic study by Tobin (1950) augmented with more recent data. The teams were asked to estimate the income elasticity of food demand and to forecast per capita food consumption. In terms of elasticities, the lowest estimates were around 0.38 whilst the highest were around 0.74 – clearly vastly different especially when remembering that these were based on

the same sets of data. The forecasts were perhaps even more extreme – from a base of around 4000 in 1989 the lowest forecast for the year 2000 was 4130 while the highest was nearly 18000!

## Why attractive people you date tend to be jerks

28 Aug, 2019 at 10:17 | Posted in Statistics & Econometrics | 2 CommentsHave you ever noticed that, among the people you date, the attractive ones tend to be jerks? Instead of constructing elaborate psychosocial theories, consider a simpler explanation. Your choice of people to date depends on two factors, attractiveness and personality. You’ll take a chance on dating a mean attractive person or a nice unattractive person, and certainly a nice attractive person, but not a mean unattractive person … This creates a spurious negative correlation between attractiveness and personality. The sad truth is that unattractive people are just as mean as attractive people — but you’ll never realize it, because you’ll never date somebody who is both mean and unattractive.

The spurious correlation — ‘collider bias’ — is here induced because the outcome of the two variables is ‘controlled for’. Mean people are not necessarily attractive, and nor are nice people. Looking only at people that do date, you would however probably guess that the mean ones are attractive. In order to date lack of nicety has to be compensated with attractiveness.

If anything this should be a helpful reminder for economists who nowadays seem to be more than happy to add lots of variables to their regressions ‘controlling for’ omitted variables bias. In this case, dating someone is a collider of multiple causes — attractiveness and personality — that gives the false impression that there is a trade-off between the two variables.

## The Illusion of Certainty

27 Aug, 2019 at 12:52 | Posted in Statistics & Econometrics | Comments Off on The Illusion of Certainty

## What — if anything — do p-values test?

21 Aug, 2019 at 11:43 | Posted in Statistics & Econometrics | 1 CommentUnless enforced by study design and execution, statistical assumptions usually have no external justification; they may even be quite implausible. As result, one often sees attempts to justify specific assumptions with statistical tests, in the belief that a high p-value or ‘‘nonsignificance’’ licenses the assumption and a low p-value refutes it. Such focused testing is based on a meta-assumption that every other assumption used to derive the p-value is correct, which is a poor judgment when some of those other assumptions are uncertain. In that case (and in general) one should recognize that the p-value is simultaneously testing all the assumptions used to compute it – in particular, a null p-value actually tests the entire model, not just the stated hypothesis or assumption it is presumed to test.

All science entail human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero — even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

Statistics is no substitute for thinking. We should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean next to nothing if the model is wrong. Statistical significance tests *do not* validate models!

## Econometrics and the problem of unjustified assumptions

21 Aug, 2019 at 11:03 | Posted in Statistics & Econometrics | 1 CommentThere seems to be a pervasive human aversion to uncertainty, and one way to reduce feelings of uncertainty is to invest faith in deduction as a sufficient guide to truth. Unfortunately, such faith is as logically unjustified as any religious creed, since a deduction produces certainty about the real world only when its assumptions about the real world are certain …

Unfortunately, assumption uncertainty reduces the status of deductions and statistical computations to exercises in hypothetical reasoning – they provide best-case scenarios of what we could infer from specific data (which are assumed to have only specific, known problems). Even more unfortunate, however, is that this exercise is deceptive to the extent it ignores or misrepresents available information, and makes hidden assumptions that are unsupported by data …

Econometrics supplies dramatic cautionary examples in which complex modelling has failed miserably in important applications …

Yes, indeed, econometrics fails miserably over and over again.

One reason why it does, is that the error term in the regression models used is thought of as representing the effect of the variables that were omitted from the models. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model and necessary to include to ‘save’ the assumed deterministic relation between the other random variables included in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since they are unobservable, they are also impossible to empirically test. And without justification of the orthogonality assumption, there is, as a rule, nothing to ensure identifiability:

Distributional assumptions about error terms are a good place to bury things because hardly anyone pays attention to them. Moreover, if a critic does see that this is the identifying assumption, how can she win an argument about the true expected value the level of aether? If the author can make up an imaginary variable, “because I say so” seems like a pretty convincing answer to any question about its properties.

Nowadays it has almost become a self-evident truism among economists that you cannot expect people to take your arguments seriously unless they are based on or backed up by advanced econometric modelling. So legions of mathematical-statistical theorems are proved — and heaps of fiction are being produced, masquerading as science. The rigour of the econometric modelling and the far-reaching assumptions they are built on is frequently not supported by data.

Econometrics is basically a deductive method. Given the assumptions, it delivers deductive inferences. The problem, of course, is that we almost never know when the assumptions are right. Conclusions can only be as certain as their premises — and that also applies to econometrics.

Econometrics doesn’t establish the truth value of facts. Never has. Never will.

## Sloppy regression interpretations

14 Aug, 2019 at 16:57 | Posted in Statistics & Econometrics | Comments Off on Sloppy regression interpretationsIn most econometrics textbooks the authors give an interpretation of a linear regression such as

Y = a + bX,

saying that a one-unit *increase* in X (years of education) will cause a b unit *increase* in Y (wages).

Dealing with time-series regressions this may well be OK. The problem is that this ‘dynamic’ interpretation of b is standardly also given as the ‘explanation’ of the slope coefficient for cross-sectional data. But in that case, the only increase that can generally come in question is in the value of X (years of education) when going from individual to another individual in the population. If we are interested — as we usually are — in saying something about the dynamics of an individual’s wages and education, we have to look elsewhere (unless we assume cross-unit and cross-time invariance, which, of course, would be utterly ridiculous from a perspective of relevance and realism).

## Understanding regression assumptions

12 Aug, 2019 at 20:45 | Posted in Statistics & Econometrics | Comments Off on Understanding regression assumptionsAlthough most social scientists can recite the formal definitions of the various regression assumptions, many have little appreciation of the substantive meanings of these assumptions. And unless the meanings of these assumptions are understood, regression analysis almost inevitably will be a rigid exercise in which a handful of independent variables are cavalierly inserted into a standard linear additive regression and coefficients are estimated. Although such an exercise may occasionally produce results that are worth believing, it will do so only when an analyst is very lucky … This monograph was written to encourage students to … think of regression assumptions as a vital set of conditions the applicability of which must be explicitly analyzed each time regression analysis is utilized.

I thought this book was great when I first read it 25 years ago.

I still do.

## On the applicability of statistics in social sciences

8 Aug, 2019 at 13:32 | Posted in Statistics & Econometrics | Comments Off on On the applicability of statistics in social sciencesEminent statistician David Salsburg is rightfully very critical of the way social scientists — including economists and econometricians — uncritically and without arguments have come to simply assume that they can apply probability distributions from statistical theory on their own area of research:

We assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify [this] abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings …

Kolmogorov established the mathematical meaning of probability: Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

Wise words well worth pondering on.

As long as economists and statisticians cannot really identify their statistical theories with real-world phenomena there is no real warrant for taking their statistical inferences seriously.

Just as there is no such thing as a ‘free lunch,’ there is no such thing as a ‘free probability.’ To be able at all to talk about probabilities, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events – in statistics one refers to any process where you observe or measure as an experiment (rolling a die) and the results obtained as the outcomes or events (number of points rolled with the die, being e. g. 3 or 5) of the experiment – there strictly seen is no event at all.

Probability is — as strongly argued by Keynes — a relational thing. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be shown to coincide with (or at least converge to or approximate) real data generating processes or structures — something seldom or never done!

And this is the basic problem with economic data. If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous ‘nomological machines’ for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!

## Econometric illusions

3 Aug, 2019 at 10:07 | Posted in Statistics & Econometrics | 5 CommentsBecause I was there when the economics department of my university got an IBM 360, I was very much caught up in the excitement of combining powerful computers with economic research. Unfortunately, I lost interest in econometrics almost as soon as I understood how it was done. My thinking went through four stages:

1.Holy shit! Do you see what you can do with a computer’s help.

2.Learning computer modeling puts you in a small class where only other members of the caste can truly understand you. This opens up huge avenues for fraud:

3.The main reason to learn stats is to prevent someone else from committing fraud against you.

4.More and more people will gain access to the power of statistical analysis. When that happens, the stratification of importance within the profession should be a matter of who asks the best questions.Disillusionment began to set in. I began to suspect that all the really interesting economic questions were FAR beyond the ability to reduce them to mathematical formulas. Watching computers being applied to other pursuits than academic economic investigations over time only confirmed those suspicions.

1.Precision manufacture is an obvious application for computing. And for many applications, this worked magnificently. Any design that combined straight line and circles could be easily described for computerized manufacture. Unfortunately, the really interesting design problems can NOT be reduced to formulas. A car’s fender, for example, can not be describe using formulas—it can only be described by specifying an assemblage of multiple points. If math formulas cannot describe something as common and uncomplicated as a car fender, how can it hope to describe human behavior?

2.When people started using computers for animation, it soon became apparent that human motion was almost impossible to model correctly. After a great deal of effort, the animators eventually put tracing balls on real humans and recorded that motion before transferring it to the the animated character. Formulas failed to describe simple human behavior—like a toddler trying to walk.Lately, I have discovered a Swedish economist who did NOT give up econometrics merely because it sounded so impossible. In fact, he still teaches the stuff. But for the rest of us, he systematically destroys the pretensions of those who think they can describe human behavior with some basic Formulas.

Maintaining that economics is a science in the ‘true knowledge’ business, that Swedish economist remains a sceptic of the pretences and aspirations of econometrics. The marginal return on its ever higher technical sophistication in no way makes up for the lack of serious under-labouring of its deeper philosophical and methodological foundations that already Keynes complained about. The rather one-sided emphasis of usefulness and its concomitant instrumentalist justification cannot hide that the legions of probabilistic econometricians who give supportive evidence for their considering it ‘fruitful to believe’ in the possibility of treating unique economic data as the observable results of random drawings from an imaginary sampling of an imaginary population, are skating on thin ice.

A rigorous application of econometric methods in economics really presupposes that the phenomena of our real world economies are ruled by stable causal relations between variables. The endemic lack of both explanatory and predictive success of the econometric project indicate that this hope of finding fixed parameters is an incredible hope for which there, really, is no other ground than hope itself.

Blog at WordPress.com.

Entries and comments feeds.