## Why Africa is so poor

5 May, 2016 at 19:15 | Posted in Economics, Statistics & Econometrics | Leave a commentA few years ago, two economics professors, Quamrul Ashraf and Oded Galor, published a paper, “The ‘Out of Africa’ Hypothesis, Human Genetic Diversity, and Comparative Economic Development,” that drew inferences about poverty and genetics based on a statistical pattern …

When the paper by Ashraf and Galor came out, I criticized it from a statistical perspective, questioning what I considered its overreach in making counterfactual causal claims … I argued (and continue to believe) that the problems in that paper reflect a more general issue in social science: There is an incentive to make strong and dramatic claims to get published in a top journal …

Recently, Shiping Tang sent me a paper criticizing Ashraf and Galor from a data-analysis perspective … I have not tried to evaluate the details of Tang’s re-analysis because I continue to think that Ashraf and Galor’s paper is essentially an analysis of three data points (sub-Saharan Africa, remote Andean countries and Eurasia). It offered little more than the already-known stylized fact that sub-Saharan African countries are very poor, Amerindian countries are somewhat poor, and countries with Eurasians and their descendants tend to have middle or high incomes.

## Pitfalls of meta-analysis

19 April, 2016 at 10:28 | Posted in Statistics & Econometrics | 1 CommentIncluding all relevant material – good, bad, and indifferent – in meta-analysis admits the subjective judgments that meta-analysis was designed to avoid. Several problems arise in meta-analysis: regressions are often non -linear; effects are often multivariate rather than univariate; coverage can be restricted; bad studies may be included; the data summarised may not be homogeneous; grouping different causal factors may lead to meaningless estimates of effects; and the theory-directed approach may obscure discrepancies. Meta-analysis may not be the one best method for studying the diversity of fields for which it has been used …

Glass and Smith carried out a meta-analysis of research on class size and achievement and concluded that “a clear and strong relationship between class size and achievement has emerged.”10 The study was done and analysed well; it might almost be cited as an example of what meta-analysis can do. Yet the conclusion is very misleading, as is the estimate of effect size it presents: “between class-size of 40 pupils and one pupil lie more than 30 percentile ranks of achievement.” Such estimates imply a linear regression, yet the regression is extremely curvilinear, as one of the authors’ figures shows: between class sizes of 20 and 40 there is absolutely no difference in achievement; it is only with unusually small classes that there seems to be an effect. For a teacher the major result is that for 90% of all classes the number of pupils makes no difference at all to their achievement. The conclusions drawn by the authors from their meta-analysis are normally correct, but they are statistically meaningless and particularly misleading. No estimate of effect size is meaningful unless regressions are linear, yet such linearity is seldom investigated, or, if not present, taken seriously.

Systematic reviews in sciences are extremely important to undertake in our search for robust evidence and explanations — simply averaging data from different populations, places, and contexts, is not.

## Kocherlakota on picking p-values

7 April, 2016 at 11:15 | Posted in Statistics & Econometrics | Leave a commentThe word “significant” has a special place in the world of statistics, thanks to a test that researchers use to avoid jumping to conclusions from too little data. Suppose a researcher has what looks like an exciting result: She gave 30 kids a new kind of lunch, and they all got better grades than a control group that didn’t get the lunch. Before concluding that the lunch helped, she must ask the question: If it actually had no effect, how likely would I be to get this result? If that probability, or p-value, is below a certain threshold — typically set at 5 percent — the result is deemed “statistically significant.”

Clearly, this statistical significance is not the same as real-world significance — all it offers is an indication of whether you’re seeing an effect where there is none. Even this narrow technical meaning, though, depends on where you set the threshold at which you are willing to discard the “null hypothesis” — that is, in the above case, the possibility that there is no effect. I would argue that there’s no good reason to always set it at 5 percent. Rather, it should depend on what is being studied, and on the risks involved in acting — or failing to act — on the conclusions …

This example illustrates three lessons. First, researchers shouldn’t blindly follow convention in picking an appropriate p-value cutoff. Second, in order to choose the right p-value threshold, they need to know how the threshold affects the probability of a Type II error. Finally, they should consider, as best they can, the costs associated with the two kinds of errors.

Statistics is a powerful tool. But, like any powerful tool, it can’t be used the same way in all situations.

If anything, Kocherlakota’s article underlines how important it is not to equate science with statistical calculation. All science entail human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero – even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science.

In its standard form, a significance test is not the kind of “severe test” that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypothesis. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10 % result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

We should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean next to nothing if the model is wrong. And most importantly — statistical significance tests DO NOT validate models!

In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p – 1 degrees of freedom in the numerator and n – p in the denominator. The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.

If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this:

ifthe model is rightandthe coefficients are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:

i) An unlikely event occurred.

ii) Or the model is right and some of the coefficients differ from 0.

iii) Or the model is wrong.

So?

[h/t Tom Hickey]

## The rhetoric of econometrics

1 April, 2016 at 09:19 | Posted in Statistics & Econometrics | 1 CommentThe desire in the profession to make universalistic claims following certain standard procedures of statistical inference is simply too strong to embrace procedures which explicitly rely on the use of vernacular knowledge for model closure in a contingent manner. More broadly, such a desire has played a vital role in the decisive victory of mathematical formalization over conventionally verbal based economic discourses as the proncipal medium of rhetoric, owing to its internal consistency, reducibility, generality, and apparent objectivity. It does not matter that [as Einstein wrote] ‘as far as the laws of mathematics refer to reality, they are not certain.’ What matters is that these laws are ‘certain’ when ‘they do not refer to reality.’ Most of what is evaluated as core research in the academic domain has little direct bearing on concrete social events in the real world anyway.

Maintaining that economics is a science in the ‘true knowledge’ business, yours truly remains a skeptic of the pretences and aspirations of econometrics. So far, I cannot see that it has yielded much in terms of relevant, interesting economic knowledge. Over all the results have been bleak indeed.

Firmly stuck in an empiricist tradition, econometrics is only concerned with the measurable aspects of reality. But there is always the possibility that there are other variables — of vital importance and although perhaps unobservable and non-additive, not necessarily epistemologically inaccessible — that were not considered for the econometric modeling.

A perusal of the leading econom(etr)ic journals shows that most econometricians still concentrate on fixed parameter models and that parameter-values estimated in specific spatio-temporal contexts are presupposed to be exportable to totally different contexts. To warrant this assumption one, however, has to convincingly establish that the targeted acting causes are stable and invariant so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.

Most of the assumptions that econometric modeling presupposes are not only unrealistic — they are plainly wrong.

If economic regularities obtain they do it (as a rule) only because we engineered them for that purpose. Outside man-made ‘nomological machines’ they are rare, or even non-existant. Unfortunately that also makes most of the achievements of econometric forecasting and ‘explanation’ rather useless.

## The lady tasting tea

17 March, 2016 at 09:19 | Posted in Statistics & Econometrics | Leave a commentThe mathematical formulations of statistics can be used to compute probabilities. Those probabilities enable us to apply statistical methods to scientific problems. In terms of the mathematics used, probability is well defined. How does this abstract concept connect to reality? How is the scientist to interpret the probability statements of statistical analyses when trying to decide what is true and what is not? …

Fisher’s use of a significance test produced a number Fisher called the p-value. This is a calculated probabiity, a probability associated with the observed data under the assumption that the null hypothesis is true. For instance, suppose we wish to test a new drug for the prevention of a recurrence of breast cancer in patients who have had mastectomies, comparing it to a placebo. The null hypothesis, the straw man, is that the drug is no better than the placebo …

Since [the p-value] is used to show that the hypothesis under which it is calculated is false, what does it really mean? It is a theoretical probability associated with the observations under conditions that are most likely false. It has nothing to do with reality. It is an indirect measurement of plausibility. It is not the probability that we would be wrong to say that the drug works. It is not the probability of any kind of error. It is not the probability that a patient will do as well on the placebo as on the drug.

## Significance tests — asking the wrong questions and getting the wrong answers

14 March, 2016 at 12:56 | Posted in Statistics & Econometrics | Leave a commentScientists have enthusiastically adopted significance testing and hypothesis testing because these methods appear to solve a fundamental problem: how to distinguish “real” effects from randomness or chance. Unfortunately significance testing and hypothesis testing are of limited scientific value – they often ask the wrong question and almost always give the wrong answer. And they are widely misinterpreted.

Consider a clinical trial designed to investigate the effectiveness of new treatment for some disease. After the trial has been conducted the researchers might ask “is the observed effect of treatment real, or could it have arisen merely by chance?” If the calculated p value is less than 0.05 the researchers might claim the trial has demonstrated the treatment was effective. But even before the trial was conducted we could reasonably have expected the treatment was “effective” – almost all drugs have some biochemical action and all surgical interventions have some effects on health. Almost all health interventions have some effect, it’s just that some treatments have effects that are large enough to be useful and others have effects that are trivial and unimportant.

So what’s the point in showing empirically that the null hypothesis is not true? Researchers who conduct clinical trials need to determine if the effect of treatment is big enough to make the intervention worthwhile, not whether the treatment has any effect at all.

A more technical issue is that p tells us the probability of observing the data given that the null hypothesis is true. But most scientists think p tells them the probability the null hypothesis is true given their data. The difference might sound subtle but it’s not. It is like the difference between the probability that a prime minister is male and the probability a male is prime minister! …

Significance testing and hypothesis testing are so widely misinterpreted that they impede progress in many areas of science. What can be done to hasten their demise? Senior scientists should ensure that a critical exploration of the methods of statistical inference is part of the training of all research students. Consumers of research should not be satisfied with statements that “X is effective”, or “Y has an effect”, especially when support for such claims is based on the evil p.

Decisions based on statistical significance testing certainly make life easier. But significance testing doesn’t give us the knowledge we want. It only gives an answer to a question we as researchers never ask — what is the probability of getting the result we have got, assuming that there is no difference between two sets of data (e. g. control group – experimental group, sample – population). On answering the question we really are interested in — how probable and reliable is our hypothesis — it remains silent.

Significance tests are not the kind of “severe test” that we are looking for in our search for being able to confirm or disconfirm empirical scientific hypothesis. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis since they can’t be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10 % result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

Most importantly — we should never forget that the underlying parameters we use when performing significance tests are *model constructions — *our p-values mean nothing if the model is wrong!

## Statistics — a science in deep crisis

11 March, 2016 at 18:57 | Posted in Statistics & Econometrics | 3 CommentsAs most of you are aware … there is a statistical crisis in science, most notably in social psychology research but also in other fields. For the past several years, top journals such as JPSP, Psych Science, and PPNAS have published lots of papers that have made strong claims based on weak evidence. Standard statistical practice is to take your data and work with it until you get a p-value of less than .05. Run a few experiments like that, attach them to a vaguely plausible (or even, in many cases, implausible) theory, and you got yourself a publication …

The claims in all those wacky papers have been disputed in three, mutually supporting ways:

1. Statistical analysis shows how it is possible — indeed, easy — to get statistical significance in an uncontrolled study in which rules for data inclusion, data coding, and data analysis are determined after the data have been seen …

Researchers do cheat, but we don’t have to get into that here. If someone reports a wrong p-value that just happens to be below .05, when the correct calculation would give a result above .05, or if someone claims that a p-value of .08 corresponds to a weak effect, or if someone reports the difference between significant and non-significant, I don’t really care if it’s cheating or just a pattern of sloppy work.

2. People try to replicate these studies and the replications don’t show the expected results. Sometimes these failed replications are declared to be successes … other times they are declared to be failures … I feel so bad partly because this statistical significance stuff is how we all teach introductory statistics, so I, as a representative of the statistics profession, bear much of the blame for these researchers’ misconceptions …

3. In many cases there is prior knowledge or substantive theory that the purported large effects are highly implausible …

Researchers can come up with theoretical justifications for just about anything, and indeed research is typically motivated by some theory. Even if I and others might be skeptical of a theory such as embodied cognition or himmicanes, that skepticism is in the eye of the beholder, and even a prior history of null findings (as with ESP) is no guarantee of future failure: again, the researchers studying these things have new ideas all the time … I do think that theory and prior information should and do inform our understanding of new claims. It’s certainly relevant that in none of these disputed cases is the theory strong enough on its own to hold up a claim. We’re disputing power pose and fat-arms-and-political-attitudes, not gravity, electromagnetism, or evolution.

## Can an endless series reach its limit?

2 March, 2016 at 09:30 | Posted in Statistics & Econometrics | 2 Comments

## Ergodicity and the law of large numbers (wonkish)

1 March, 2016 at 08:51 | Posted in Statistics & Econometrics | 9 CommentsIf n identical trials A occurs v times, and if n is very large, then v/n should be near the probability p of A …This is one form of the

law of large numbersand serves as a basis for the intuitive notion of probability as a measure of relative frequencies …It is usual to read into the law of large numbers things which it definitely does not imply. If Peter and Paul toss a perfect coin 10 000 times, it is customary to expect that Peter will be in the lead roughly half the time.

This is not true. In a large number ofdifferentcoin-tossing games it is reasonable to expect that at any fixed moment heads will be in the lead in roughly half of all cases. But it is quite likely that the player who ends at the winning side has been in the lead for practically the whole duration of the game. Thus contrary to widespread belief, thetime averagefor any individual game has nothing to do with theensemble averageat any given moment.

When giving my yearly PhD course in statistics at Malmö University, Feller’s book is as self-evident a reference as when I started my own statistics studies forty years ago.

## Is 0.999 … = 1? (wonkish)

29 February, 2016 at 12:52 | Posted in Statistics & Econometrics | 7 CommentsWhat is 0.999 …, really? Is it 1? Or is it some number infinitesimally less than 1?

The right answer is to unmask the question. What is 0.999 …, really? It appears to refer to a kind of sum:

.9 + + 0.09 + 0.009 + 0.0009 + …

But what does that mean? That pesky ellipsis is the real problem. There can be no controversy about what it means to add up two, or three, or a hundred numbers. But infinitely many? That’s a different story. In the real world, you can never have infinitely many heaps. What’s the numerical value of an infinite sum? It doesn’t have one —

until we give it one. That was the great innovation of Augustin-Louis Cauchy, who introduced the notion oflimitinto calculus in the 1820s.The British number theorist G. H. Hardy … explains it best: “It is broadly true to say that mathematicians before Cauchy asked not, ‘How shall we

define1 – 1 – 1 + 1 – 1 …’ but ‘Whatis1 -1 + 1 – 1 + …?'”No matter how tight a cordon we draw around the number 1, the sum will eventually, after some finite number of steps, penetrate it, and never leave. Under those circumstances, Cauchy said, we should simply

definethe value of the infinite sum to be 1.

I have no problem with solving problems in *mathematics* by ‘defining’ them away. But how about the *real world*? Maybe that ought to be a question to ponder even for economists all to fond of uncritically following the mathematical way when applying their mathematical models to the real world, where indeed “you can never have infinitely many heaps” …

In econometrics we often run into the ‘Cauchy logic’ —the data is treated as if it were from a larger population, a ‘superpopulation’ where repeated realizations of the data are imagined. Just imagine there could be more worlds than the one we live in and the problem is fixed …

Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just as Fisher’s “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’s “collective” or Gibbs’s ”ensemble” – also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s — just as the Cauchy mathematical logic of ‘defining’ away problems — not tenable.

In social sciences — including economics — it’s always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …

Create a free website or blog at WordPress.com. | The Pool Theme.

Entries and comments feeds.