## Econometrics — a Keynesian perspective

27 May, 2016 at 14:05 | Posted in Statistics & Econometrics

It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his *a priori*, that would make a difference to the outcome.

Mainstream economists today usually subscribe to the idea that although mathematical-statistical models are not ‘always the right guide for policy,’ they are still somehow necessary for making policy recommendations. The models are supposed to supply us with a necessary ‘discipline of thinking.’

This emphasis on the value of modeling should come as no surprise. Mainstreamers usually vehemently defend the formalization and mathematization that come with the insistence on using a model-building strategy in economics.

But if these math-is-the-message-modelers aren’t able to show that the mechanisms or causes that they isolate and handle in their mathematical-statistically formalised models are stable in the sense that they do not change when we ‘export’ them to our ‘target systems,’ these models hold only under *ceteris paribus* conditions and are consequently of limited value for our understanding, explanation, or prediction of real economic systems. Building models only to show ‘self-discipline’ is setting the aspiration level far too low.

According to Keynes, science should help us penetrate to ‘the true process of causation lying behind current events’ and disclose ‘the causal forces behind the apparent facts.’ We should look out for causal relations. But models — mathematical, econometric, or what have you — can never be more than a starting point in that endeavour. There is always the possibility that there are other (non-quantifiable) variables – of vital importance and although perhaps unobservable and non-additive not necessarily epistemologically inaccessible – that were not considered for the formalized mathematical model.

The kinds of laws and relations that ‘modern’ economics has established are laws and relations about mathematically formalized entities in models that presuppose causal mechanisms being atomistic and additive. When causal mechanisms operate in real world social target systems they do so only in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. If economic regularities obtain, they do so (as a rule) only because we engineered them for that purpose. Outside man-made mathematical-statistical ‘nomological machines’ they are rare, or even non-existent. Whether econometric or not, that also, unfortunately, makes most of contemporary mainstream endeavours of economic modeling rather useless.

Econometric modeling should never be a substitute for thinking. From that perspective it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

Keynes, letter to E.J. Broster, December 19, 1939

## Friedman on the limited value of econometrics

24 May, 2016 at 13:18 | Posted in Statistics & Econometrics

Tinbergen’s results cannot be judged by ordinary tests of statistical significance. The reason is that the variables with which he winds up, the particular series measuring these variables, the leads and lags, and various other aspects of the equations besides the particular values of the parameters (which alone can be tested by the usual statistical technique) have been selected after an extensive process of trial and error because they yield high coefficients of correlation. Tinbergen is seldom satisfied with a correlation coefficient less than 0.98. But these attractive correlation coefficients create no presumption that the relationships they describe will hold in the future. The multiple regression equations which yield them are simply tautological reformulations of selected economic data. Taken at face value, Tinbergen’s work “explains” the errors in his data no less than their real movements; for although many of the series employed in the study would be accorded, even by their compilers, a margin of error in excess of 5 per cent, Tinbergen’s equations “explain” well over 95 per cent of the observed variation.

As W. C. Mitchell put it some years ago, “a competent statistician, with sufficient clerical assistance and time at his command, can take almost any pair of time series for a given period and work them into forms which will yield coefficients of correlation exceeding ±.9 …. So work of [this] sort … must be judged, not by the coefficients of correlation obtained within the periods for which they have manipulated the data, but by the coefficients which they get in earlier or later periods to which their formulas may be applied.” But Tinbergen makes no attempt to determine whether his equations agree with data other than those which they translate …

The methods used by Tinbergen do not and cannot provide an empirically tested explanation of business cycle movements.
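Mitchell's point about judging correlations outside the fitting period can be tried out in a small simulation. The sketch below is an added illustration under stated assumptions, not a reconstruction of anything Tinbergen or Friedman computed: all series are unrelated random walks, and the function names and sample sizes are hypothetical. For each trial it picks, by trial and error, the candidate series that correlates best with the target in the fitting period and then looks at the same pair in a later hold-out period.

```python
import random
import statistics


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(rng, length):
    """Driftless random walk: cumulative sum of independent Gaussian shocks."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def selection_experiment(rng, n_fit=20, n_holdout=20, n_candidates=100):
    """Pick the candidate that best 'explains' the target in the fitting
    period, then check the same pair in the later hold-out period."""
    total = n_fit + n_holdout
    target = random_walk(rng, total)
    candidates = [random_walk(rng, total) for _ in range(n_candidates)]
    # Trial and error: keep whichever unrelated series correlates best in sample.
    best = max(candidates, key=lambda c: abs(correlation(c[:n_fit], target[:n_fit])))
    r_fit = correlation(best[:n_fit], target[:n_fit])
    r_holdout = correlation(best[n_fit:], target[n_fit:])
    # Orient the sign so the in-sample correlation is positive; a relationship
    # that held in the future would then show a positive hold-out value too.
    sign = 1.0 if r_fit >= 0 else -1.0
    return sign * r_fit, sign * r_holdout


rng = random.Random(0)  # fixed seed, so repeated runs give the same numbers
results = [selection_experiment(rng) for _ in range(200)]
mean_fit = statistics.fmean([r_fit for r_fit, _ in results])
mean_holdout = statistics.fmean([r_holdout for _, r_holdout in results])
print(f"mean in-sample r: {mean_fit:.2f}, mean hold-out r: {mean_holdout:.2f}")
```

Because the series are independent by construction, the selected in-sample correlation is large only as a result of the search, and the hold-out correlation should average out near zero. Real economic series are of course not pure random walks, so this shows the mechanism Mitchell describes rather than its size in any actual study.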

## Why Africa is so poor

5 May, 2016 at 19:15 | Posted in Economics, Statistics & Econometrics

A few years ago, two economics professors, Quamrul Ashraf and Oded Galor, published a paper, “The ‘Out of Africa’ Hypothesis, Human Genetic Diversity, and Comparative Economic Development,” that drew inferences about poverty and genetics based on a statistical pattern …

When the paper by Ashraf and Galor came out, I criticized it from a statistical perspective, questioning what I considered its overreach in making counterfactual causal claims … I argued (and continue to believe) that the problems in that paper reflect a more general issue in social science: There is an incentive to make strong and dramatic claims to get published in a top journal …

Recently, Shiping Tang sent me a paper criticizing Ashraf and Galor from a data-analysis perspective … I have not tried to evaluate the details of Tang’s re-analysis because I continue to think that Ashraf and Galor’s paper is essentially an analysis of three data points (sub-Saharan Africa, remote Andean countries and Eurasia). It offered little more than the already-known stylized fact that sub-Saharan African countries are very poor, Amerindian countries are somewhat poor, and countries with Eurasians and their descendants tend to have middle or high incomes.

## Pitfalls of meta-analysis

19 April, 2016 at 10:28 | Posted in Statistics & Econometrics

Including all relevant material – good, bad, and indifferent – in meta-analysis admits the subjective judgments that meta-analysis was designed to avoid. Several problems arise in meta-analysis: regressions are often non-linear; effects are often multivariate rather than univariate; coverage can be restricted; bad studies may be included; the data summarised may not be homogeneous; grouping different causal factors may lead to meaningless estimates of effects; and the theory-directed approach may obscure discrepancies. Meta-analysis may not be the one best method for studying the diversity of fields for which it has been used …

Glass and Smith carried out a meta-analysis of research on class size and achievement and concluded that “a clear and strong relationship between class size and achievement has emerged.” The study was done and analysed well; it might almost be cited as an example of what meta-analysis can do. Yet the conclusion is very misleading, as is the estimate of effect size it presents: “between class-size of 40 pupils and one pupil lie more than 30 percentile ranks of achievement.” Such estimates imply a linear regression, yet the regression is extremely curvilinear, as one of the authors’ figures shows: between class sizes of 20 and 40 there is absolutely no difference in achievement; it is only with unusually small classes that there seems to be an effect. For a teacher the major result is that for 90% of all classes the number of pupils makes no difference at all to their achievement. The conclusions drawn by the authors from their meta-analysis are normally correct, but they are statistically meaningless and particularly misleading. No estimate of effect size is meaningful unless regressions are linear, yet such linearity is seldom investigated, or, if not present, taken seriously.
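The gap between a single effect-size summary and a curvilinear relationship can be made concrete. The sketch below uses a made-up curve, chosen only so that roughly 30 percentile ranks separate a class of one pupil from a class of 40; it is not Glass and Smith's data, and the function names are hypothetical. It compares the least-squares slope over all class sizes with the slope over class sizes 20 to 40.

```python
import math
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def achievement_percentile(class_size):
    """Hypothetical curve (not Glass and Smith's data): about 30 percentile
    ranks separate a class of 1 from a class of 40, almost all of it below 20."""
    return 50.0 + 30.0 * math.exp(-(class_size - 1) / 5.0)


sizes = list(range(1, 41))
scores = [achievement_percentile(s) for s in sizes]

slope_all = ols_slope(sizes, scores)              # one linear summary for every class size
slope_large = ols_slope(sizes[19:], scores[19:])  # class sizes 20 to 40 only

print(f"slope over sizes 1-40:  {slope_all:.3f} percentile ranks per pupil")
print(f"slope over sizes 20-40: {slope_large:.3f} percentile ranks per pupil")
```

On this assumed curve the overall slope suggests that every additional pupil costs achievement, while in the range where most real classes sit the slope is close to zero. That is the sense in which a linear effect size can be arithmetically right and still misleading.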

Systematic reviews in sciences are extremely important to undertake in our search for robust evidence and explanations — simply averaging data from different populations, places, and contexts, is not.

## Kocherlakota on picking p-values

7 April, 2016 at 11:15 | Posted in Statistics & Econometrics

The word “significant” has a special place in the world of statistics, thanks to a test that researchers use to avoid jumping to conclusions from too little data. Suppose a researcher has what looks like an exciting result: She gave 30 kids a new kind of lunch, and they all got better grades than a control group that didn’t get the lunch. Before concluding that the lunch helped, she must ask the question: If it actually had no effect, how likely would I be to get this result? If that probability, or p-value, is below a certain threshold — typically set at 5 percent — the result is deemed “statistically significant.”

Clearly, this statistical significance is not the same as real-world significance — all it offers is an indication of whether you’re seeing an effect where there is none. Even this narrow technical meaning, though, depends on where you set the threshold at which you are willing to discard the “null hypothesis” — that is, in the above case, the possibility that there is no effect. I would argue that there’s no good reason to always set it at 5 percent. Rather, it should depend on what is being studied, and on the risks involved in acting — or failing to act — on the conclusions …

This example illustrates three lessons. First, researchers shouldn’t blindly follow convention in picking an appropriate p-value cutoff. Second, in order to choose the right p-value threshold, they need to know how the threshold affects the probability of a Type II error. Finally, they should consider, as best they can, the costs associated with the two kinds of errors.

Statistics is a powerful tool. But, like any powerful tool, it can’t be used the same way in all situations.
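The second and third lessons can be put into numbers. What follows is a minimal sketch, assuming a one-sided z-test with known variance; the effect size, the prior probability of the null, and the two cost figures are hypothetical inputs chosen for illustration, not values from Kocherlakota's article.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def type_ii_error(alpha, effect_size, n):
    """Probability of missing a true effect in a one-sided z-test.

    `effect_size` is the true mean shift in standard-deviation units and
    `n` the number of observations; the variance is assumed known.
    """
    critical_z = STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    return STANDARD_NORMAL.cdf(critical_z - effect_size * n ** 0.5)


def expected_cost(alpha, effect_size, n, prior_null, cost_type_i, cost_type_ii):
    """Expected loss of a threshold, weighting each kind of error by its cost."""
    beta = type_ii_error(alpha, effect_size, n)
    return prior_null * alpha * cost_type_i + (1.0 - prior_null) * beta * cost_type_ii


# Hypothetical inputs: 30 observations, a true effect of 0.3 standard deviations.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect_size=0.3, n=30)
    dear_false_alarms = expected_cost(alpha, 0.3, 30, prior_null=0.5,
                                      cost_type_i=10.0, cost_type_ii=1.0)
    dear_misses = expected_cost(alpha, 0.3, 30, prior_null=0.5,
                                cost_type_i=1.0, cost_type_ii=10.0)
    print(f"alpha={alpha:.2f}  beta={beta:.2f}  "
          f"cost (false alarms dear)={dear_false_alarms:.2f}  "
          f"cost (misses dear)={dear_misses:.2f}")
```

Tightening the threshold lowers the Type I error rate but raises the Type II error rate, so which cutoff looks best depends on the assumed costs. The calculation itself still rests on the test's model being right.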

If anything, Kocherlakota’s article underlines how important it is not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero – even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of “severe test” that we are looking for when we try to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis simply because it can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10 % result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
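One standard way to formalise the intuition about many independent tests is Fisher's method for combining p-values. The post does not name this technique; it is brought in here as an illustration. It assumes that the tests are independent and that each p-value is valid under its own model, which is exactly the assumption the post keeps questioning.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method for k independent p-values.

    Under the joint null, -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom. For even degrees of freedom the upper tail
    has the closed form exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals x / 2
    tail_sum = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum


# How the combined evidence changes when every single test reports p = 0.10.
for k in (1, 3, 5, 10):
    print(k, fisher_combined_p([0.10] * k))
```

With a single test the combined value is just the reported 10 %. As more independent tests with the same result are added the combined p-value shrinks, even though no individual test crosses the 5 % line.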

We should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean next to nothing if the model is wrong. And most importantly — statistical significance tests DO NOT validate models!

In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p – 1 degrees of freedom in the numerator and n – p in the denominator. The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.

If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this:

if the model is right and the coefficients are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:

i) An unlikely event occurred.

ii) Or the model is right and some of the coefficients differ from 0.

iii) Or the model is wrong.

So?

[h/t Tom Hickey]
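A small example shows how a large F-statistic can sit on top of a wrong model. This is a sketch under stated assumptions: the data are hypothetical and deliberately noise-free, the true relation is quadratic, and a straight line is fitted anyway. The helper `f_statistic` is an illustrative name; it uses the degrees of freedom given in the quoted passage, with p counting the intercept.

```python
import statistics


def f_statistic(r_squared, n, p):
    """Overall F with p - 1 and n - p degrees of freedom, where p counts
    the intercept plus the explanatory variables."""
    return (r_squared / (p - 1)) / ((1.0 - r_squared) / (n - p))


# Hypothetical data: the true relation is quadratic, the fitted model is a line.
xs = list(range(1, 21))
ys = [x ** 2 for x in xs]

mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1.0 - ss_residual / ss_total

f_value = f_statistic(r_squared, n=len(xs), p=2)
print(f"R^2 = {r_squared:.3f}, F = {f_value:.1f}")
# The residuals are positive at both ends and negative in the middle:
# a systematic pattern that the F-test never looks at.
print([round(e, 1) for e in (residuals[0], residuals[10], residuals[-1])])
```

The F-statistic is very large, so the "all coefficients vanish" null would be rejected comfortably, yet the linear model is wrong by construction. Only the residual pattern reveals that, and the F-test does not look at it.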

## The rhetoric of econometrics

1 April, 2016 at 09:19 | Posted in Statistics & Econometrics

The desire in the profession to make universalistic claims following certain standard procedures of statistical inference is simply too strong to embrace procedures which explicitly rely on the use of vernacular knowledge for model closure in a contingent manner. More broadly, such a desire has played a vital role in the decisive victory of mathematical formalization over conventionally verbal based economic discourses as the principal medium of rhetoric, owing to its internal consistency, reducibility, generality, and apparent objectivity. It does not matter that [as Einstein wrote] ‘as far as the laws of mathematics refer to reality, they are not certain.’ What matters is that these laws are ‘certain’ when ‘they do not refer to reality.’ Most of what is evaluated as core research in the academic domain has little direct bearing on concrete social events in the real world anyway.

Maintaining that economics is a science in the ‘true knowledge’ business, yours truly remains a skeptic of the pretences and aspirations of econometrics. So far, I cannot see that it has yielded much in terms of relevant, interesting economic knowledge. Overall, the results have been bleak indeed.

Firmly stuck in an empiricist tradition, econometrics is only concerned with the measurable aspects of reality. But there is always the possibility that there are other variables — of vital importance and although perhaps unobservable and non-additive, not necessarily epistemologically inaccessible — that were not considered for the econometric modeling.

A perusal of the leading econom(etr)ic journals shows that most econometricians still concentrate on fixed parameter models and that parameter-values estimated in specific spatio-temporal contexts are presupposed to be exportable to totally different contexts. To warrant this assumption one, however, has to convincingly establish that the targeted acting causes are stable and invariant so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.

Most of the assumptions that econometric modeling presupposes are not only unrealistic — they are plainly wrong.

If economic regularities obtain, they do so (as a rule) only because we engineered them for that purpose. Outside man-made ‘nomological machines’ they are rare, or even non-existent. Unfortunately that also makes most of the achievements of econometric forecasting and ‘explanation’ rather useless.

## The lady tasting tea

17 March, 2016 at 09:19 | Posted in Statistics & Econometrics

The mathematical formulations of statistics can be used to compute probabilities. Those probabilities enable us to apply statistical methods to scientific problems. In terms of the mathematics used, probability is well defined. How does this abstract concept connect to reality? How is the scientist to interpret the probability statements of statistical analyses when trying to decide what is true and what is not? …

Fisher’s use of a significance test produced a number Fisher called the p-value. This is a calculated probability, a probability associated with the observed data under the assumption that the null hypothesis is true. For instance, suppose we wish to test a new drug for the prevention of a recurrence of breast cancer in patients who have had mastectomies, comparing it to a placebo. The null hypothesis, the straw man, is that the drug is no better than the placebo …

Since [the p-value] is used to show that the hypothesis under which it is calculated is false, what does it really mean? It is a theoretical probability associated with the observations under conditions that are most likely false. It has nothing to do with reality. It is an indirect measurement of plausibility. It is not the probability that we would be wrong to say that the drug works. It is not the probability of any kind of error. It is not the probability that a patient will do as well on the placebo as on the drug.

## Significance tests — asking the wrong questions and getting the wrong answers

14 March, 2016 at 12:56 | Posted in Statistics & Econometrics

Scientists have enthusiastically adopted significance testing and hypothesis testing because these methods appear to solve a fundamental problem: how to distinguish “real” effects from randomness or chance. Unfortunately significance testing and hypothesis testing are of limited scientific value – they often ask the wrong question and almost always give the wrong answer. And they are widely misinterpreted.

Consider a clinical trial designed to investigate the effectiveness of new treatment for some disease. After the trial has been conducted the researchers might ask “is the observed effect of treatment real, or could it have arisen merely by chance?” If the calculated p value is less than 0.05 the researchers might claim the trial has demonstrated the treatment was effective. But even before the trial was conducted we could reasonably have expected the treatment was “effective” – almost all drugs have some biochemical action and all surgical interventions have some effects on health. Almost all health interventions have some effect, it’s just that some treatments have effects that are large enough to be useful and others have effects that are trivial and unimportant.

So what’s the point in showing empirically that the null hypothesis is not true? Researchers who conduct clinical trials need to determine if the effect of treatment is big enough to make the intervention worthwhile, not whether the treatment has any effect at all.
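The difference between "some effect" and "an effect big enough to matter" shows up as soon as the sample size is allowed to grow. The sketch below assumes a two-sample z-test with unit variance and an observed standardized difference exactly equal to a hypothetical, clinically trivial effect; the function name is illustrative.

```python
from statistics import NormalDist


def two_sided_p(effect_size, n_per_group):
    """Two-sided p-value of a two-sample z-test when the observed standardized
    mean difference equals `effect_size` (unit variance assumed known)."""
    z = effect_size * (n_per_group / 2.0) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


trivial_effect = 0.02  # hypothetical: two hundredths of a standard deviation
for n in (100, 10_000, 100_000):
    print(n, two_sided_p(trivial_effect, n))
```

The effect is the same in every row. Only the sample size changes, and with enough patients the trivial effect becomes "statistically significant", which says nothing about whether the intervention is worthwhile.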

A more technical issue is that p tells us the probability of observing the data given that the null hypothesis is true. But most scientists think p tells them the probability the null hypothesis is true given their data. The difference might sound subtle but it’s not. It is like the difference between the probability that a prime minister is male and the probability a male is prime minister! …
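Bayes' theorem makes the gap between the two conditional probabilities explicit. The sketch below is a simplification: it conditions on "the result was significant" rather than on the full data, and the prior, threshold, and power are hypothetical numbers picked for illustration.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(null is true | result is significant), by Bayes' theorem."""
    false_positives = prior_null * alpha       # null true, test rejects anyway
    true_positives = (1.0 - prior_null) * power  # null false, test rejects
    return false_positives / (false_positives + true_positives)


# Hypothetical inputs: 90 % of tested hypotheses are truly null,
# tests are run at the 5 % level with 80 % power.
posterior = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.8)
print(f"P(significant | null) = 0.05, P(null | significant) = {posterior:.2f}")
```

Under these assumed inputs the chance that a significant finding comes from a true null is far above 5 %. How far depends entirely on the prior and the power, neither of which the p-value knows anything about.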

Significance testing and hypothesis testing are so widely misinterpreted that they impede progress in many areas of science. What can be done to hasten their demise? Senior scientists should ensure that a critical exploration of the methods of statistical inference is part of the training of all research students. Consumers of research should not be satisfied with statements that “X is effective”, or “Y has an effect”, especially when support for such claims is based on the evil p.

Decisions based on statistical significance testing certainly make life easier. But significance testing doesn’t give us the knowledge we want. It only gives an answer to a question we as researchers never ask — what is the probability of getting the result we have got, assuming that there is no difference between two sets of data (e. g. control group – experimental group, sample – population). On answering the question we really are interested in — how probable and reliable is our hypothesis — it remains silent.

Significance tests are not the kind of “severe test” that we are looking for when we try to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis simply because it can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10 % result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

Most importantly — we should never forget that the underlying parameters we use when performing significance tests are *model constructions* — our p-values mean nothing if the model is wrong!

## Statistics — a science in deep crisis

11 March, 2016 at 18:57 | Posted in Statistics & Econometrics

As most of you are aware … there is a statistical crisis in science, most notably in social psychology research but also in other fields. For the past several years, top journals such as JPSP, Psych Science, and PPNAS have published lots of papers that have made strong claims based on weak evidence. Standard statistical practice is to take your data and work with it until you get a p-value of less than .05. Run a few experiments like that, attach them to a vaguely plausible (or even, in many cases, implausible) theory, and you got yourself a publication …

The claims in all those wacky papers have been disputed in three, mutually supporting ways:

1. Statistical analysis shows how it is possible — indeed, easy — to get statistical significance in an uncontrolled study in which rules for data inclusion, data coding, and data analysis are determined after the data have been seen …

Researchers do cheat, but we don’t have to get into that here. If someone reports a wrong p-value that just happens to be below .05, when the correct calculation would give a result above .05, or if someone claims that a p-value of .08 corresponds to a weak effect, or if someone reports the difference between significant and non-significant, I don’t really care if it’s cheating or just a pattern of sloppy work.

2. People try to replicate these studies and the replications don’t show the expected results. Sometimes these failed replications are declared to be successes … other times they are declared to be failures … I feel so bad partly because this statistical significance stuff is how we all teach introductory statistics, so I, as a representative of the statistics profession, bear much of the blame for these researchers’ misconceptions …

3. In many cases there is prior knowledge or substantive theory that the purported large effects are highly implausible …

Researchers can come up with theoretical justifications for just about anything, and indeed research is typically motivated by some theory. Even if I and others might be skeptical of a theory such as embodied cognition or himmicanes, that skepticism is in the eye of the beholder, and even a prior history of null findings (as with ESP) is no guarantee of future failure: again, the researchers studying these things have new ideas all the time … I do think that theory and prior information should and do inform our understanding of new claims. It’s certainly relevant that in none of these disputed cases is the theory strong enough on its own to hold up a claim. We’re disputing power pose and fat-arms-and-political-attitudes, not gravity, electromagnetism, or evolution.
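The first point, that significance is easy to obtain when the analysis is chosen after seeing the data, can be illustrated with a crude stand-in for that flexibility: measure several independent outcomes in a study where nothing is going on, and report only the best one. This is a sketch with hypothetical sample sizes and a known-variance z-test; real analytic flexibility takes many more forms than this.

```python
import random
from statistics import NormalDist, fmean


def smallest_p_value(rng, n_outcomes, n_per_group):
    """One null 'study': two groups drawn from the same distribution are
    compared on several independent outcomes; only the best p-value is kept."""
    p_values = []
    for _ in range(n_outcomes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / (2.0 / n_per_group) ** 0.5
        p_values.append(2.0 * (1.0 - NormalDist().cdf(abs(z))))
    return min(p_values)


rng = random.Random(1)  # fixed seed for reproducibility
n_studies, n_outcomes = 2000, 10
hits = sum(
    smallest_p_value(rng, n_outcomes, n_per_group=10) < 0.05
    for _ in range(n_studies)
)
simulated_rate = hits / n_studies
analytic_rate = 1.0 - 0.95 ** n_outcomes  # chance that at least one of 10 null tests passes
print(f"simulated: {simulated_rate:.3f}, analytic: {analytic_rate:.3f}")
```

With ten outcomes to choose from, roughly four in ten studies of a non-existent effect can report p below .05, without anyone reporting a wrong number.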

## Can an endless series reach its limit?

2 March, 2016 at 09:30 | Posted in Statistics & Econometrics
