## Male circumcision — a case of selection bias

10 March, 2015 at 17:01 | Posted in Statistics & Econometrics | 1 Comment

Take a look at a map of Africa showing male circumcision rates, and superimpose on it data on HIV/AIDS prevalence. There is a very close correspondence between the two, with the exceptions being cities with large numbers of recent uncircumcised male migrants. One might therefore conclude that male circumcision reduces the chances of contracting HIV/AIDS, and indeed there are medical reasons to believe this may be so. But maybe some third, underlying variable explains both circumcision and HIV/AIDS prevalence. That is, those who select to get circumcised have special characteristics which make them less likely to contract HIV/AIDS, so a comparison of HIV/AIDS rates between circumcised and uncircumcised men will give a biased estimate of the impact of circumcision on HIV/AIDS prevalence. There is such a factor: being Muslim. Muslim men are circumcised and less likely to engage in risky sexual behaviour exposing themselves to HIV/AIDS, partly as they do not drink alcohol. Again we are not comparing like with like: circumcised men have different characteristics compared to uncircumcised men, and these characteristics affect the outcome of interest.
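The argument can be made concrete with a small simulation. What follows is only a sketch under made-up assumptions, not an estimate from any real data: a hypothetical background trait makes men both more likely to be circumcised and less likely to be infected, while circumcision itself is given no causal effect at all. All probabilities and names are invented for illustration.

```python
import random

def simulate(n=100_000, seed=1):
    """Return (trait, circumcised, infected) triples from a toy population.

    Assumption (illustrative only): infection risk depends on the background
    trait alone, so the true causal effect of circumcision is zero here.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        trait = rng.random() < 0.5                           # hypothetical third variable
        circumcised = rng.random() < (0.9 if trait else 0.2)
        infected = rng.random() < (0.05 if trait else 0.20)  # ignores circumcision
        people.append((trait, circumcised, infected))
    return people

def infection_rate(people, circumcised, trait=None):
    """Share infected among people with the given circumcision status
    (and, if `trait` is set, the given trait status)."""
    group = [p for p in people
             if p[1] == circumcised and (trait is None or p[0] == trait)]
    return sum(p[2] for p in group) / len(group)

people = simulate()

# Naive comparison: circumcised vs uncircumcised, ignoring the trait.
naive_gap = infection_rate(people, True) - infection_rate(people, False)

# Like-with-like comparison: the same contrast within each trait group.
gap_with_trait = (infection_rate(people, True, True)
                  - infection_rate(people, False, True))
gap_without_trait = (infection_rate(people, True, False)
                     - infection_rate(people, False, False))

print(f"naive gap:          {naive_gap:+.3f}")
print(f"gap, trait present: {gap_with_trait:+.3f}")
print(f"gap, trait absent:  {gap_without_trait:+.3f}")
```

In this toy population the naive gap is clearly negative although circumcision does nothing, and it shrinks to roughly zero once men are compared within the same trait group.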

## ‘Sizeless science’ and the cult of significance testing

9 March, 2015 at 15:06 | Posted in Statistics & Econometrics | Leave a comment

A couple of years ago yours truly had an interesting luncheon discussion with Deirdre McCloskey on her controversy with Kevin Hoover on significance testing. It got me thinking about where the fetish status of significance testing comes from and why we are still teaching and practising it without serious qualifications despite its obvious inadequacies.

A non-trivial part of teaching statistics consists of teaching students to perform significance testing. A problem I have noticed repeatedly over the years, however, is that no matter how careful you try to be in explicating what the probabilities generated by these statistical tests – *p-values* – really are, most students still misinterpret them.

Giving a statistics course for the *Swedish National Research School in History*, I asked the students at the exam to explain how one should correctly interpret *p-values*. Although the correct definition is p(data|null hypothesis), i.e. the probability of obtaining data at least as extreme as those observed if the null hypothesis is true, a majority of the students misinterpreted the *p-value* in one of two ways. Some took it to be the *likelihood of a sampling error* (which of course is wrong, since the very computation of the *p-value* is based on the assumption that sampling errors are what cause the sample statistics not to coincide with the null hypothesis). Others took it to be the probability of the null hypothesis being true, given the data (which is a case of the fallacy of transposing the conditional, and of course also wrong, since that is p(null hypothesis|data) rather than the correct p(data|null hypothesis)).
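How far apart the two conditional probabilities can be is easy to see with Bayes' rule. The sketch below uses hypothetical inputs chosen only for illustration (a prior of 0.9 that the null hypothesis is true, a 5% significance level and a power of 0.5), and it works with the event "significant at the 5% level" rather than with an exact *p-value*, which keeps the arithmetic simple.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(H0 | significant result), computed with Bayes' rule.

    alpha = P(significant | H0 true), power = P(significant | H0 false).
    """
    p_sig_and_null = alpha * prior_null
    p_sig_and_alt = power * (1.0 - prior_null)
    return p_sig_and_null / (p_sig_and_null + p_sig_and_alt)

# Hypothetical inputs, chosen only for illustration.
alpha = 0.05  # P(significant | H0)
posterior = prob_null_given_significant(prior_null=0.9, alpha=alpha, power=0.5)

print(f"P(significant | H0) = {alpha:.2f}")
print(f"P(H0 | significant) = {posterior:.2f}")
```

With these made-up numbers a result that is "significant at 5%" still leaves the null hypothesis with a probability of almost one half, so reading the 5% as the probability that the null is true would be badly off.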

The blame lies not with students’ ignorance but with significance testing itself, which is not particularly transparent (conditional probability inference is difficult even for those of us who teach and practise it). A lot of researchers fall prey to the same mistakes. So, given that it is in any case very unlikely that any population parameter is exactly zero, and that contrary to assumption most samples in social science and economics are neither random nor of the right distributional shape, why continue to press students and researchers to do null hypothesis significance testing, testing that relies on a weird backward logic that students and researchers usually don’t understand?

Reviewing Deirdre’s and Stephen Ziliak’s *The Cult of Statistical Significance* (University of Michigan Press 2008), mathematical statistician Olle Häggström succinctly summarizes what the debate is all about:

Stephen Ziliak and Deirdre McCloskey claim in their recent book *The Cult of Statistical Significance* [ZM] that the reliance on statistical methods has gone too far and turned into a ritual and an obstacle to scientific progress.

A typical situation is the following. A scientist formulates a *null hypothesis*. By means of a *significance test*, she tries to falsify it. The analysis leads to a *p-value*, which indicates how likely it would have been, if the null hypothesis were true, to obtain data at least as extreme as those she actually got. If the *p-value* is below a certain prespecified threshold (typically 0.01 or 0.05), the result is deemed *statistically significant*, which, although far from constituting a definite disproof of the null hypothesis, counts as evidence against it.

Imagine now that a new drug for reducing blood pressure is being tested and that the fact of the matter is that the drug does have a positive effect (as compared with a placebo) but that the effect is so small that it is of no practical relevance to the patient’s health or well-being. If the study involves sufficiently many patients, the effect will nevertheless with high probability be detected, and the study will yield statistical significance. The lesson to learn from this is that in a medical study, statistical significance is not enough – the detected effect also needs to be large enough to be *medically significant*. Likewise, empirical studies in economics (or psychology, geology, etc.) need to consider not only statistical significance but also economic (psychological, geological, etc.) significance.

A major point in *The Cult of Statistical Significance* is the observation that many researchers are so obsessed with statistical significance that they neglect to ask themselves whether the detected discrepancies are large enough to be of any subject-matter significance. Ziliak and McCloskey call this neglect *sizeless science* …

*The Cult of Statistical Significance* is written in an entertaining and polemical style. Sometimes the authors push their position a bit far, such as when they ask themselves: “If null hypothesis significance testing is as idiotic as we and its other critics have so long believed, how on earth has it survived?” (p. 240). Granted, the single-minded focus on statistical significance that they label sizeless science is bad practice. Still, to throw out the use of significance tests would be a mistake, considering how often it is a crucial tool for concluding with confidence that what we see really is a pattern, as opposed to just noise. For a data set to provide reasonable evidence of an important deviation from the null hypothesis, we typically need *both* statistical *and* subject-matter significance.

Statistical significance doesn’t say that something is important or true. Although Häggström has a point in his last remark, I still think – since there already are far better and more relevant tests that can be done (see e.g. my posts here and here) – that it is high time to consider what should be the proper function of what has now really become a statistical fetish.
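The blood-pressure example in Häggström’s review can be put into numbers. The sketch below is a simplification under stated assumptions: a two-sample z-test with a known common standard deviation, an observed mean difference set equal to a hypothetical true effect of 0.3 mmHg, and a hypothetical standard deviation of 15 mmHg. None of these figures come from the review.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value of a z-test for a difference in means between two
    equally sized groups with a common, known standard deviation."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: a 0.3 mmHg reduction against a standard deviation of
# 15 mmHg, i.e. an effect far too small to matter to any patient.
tiny_effect, sd = 0.3, 15.0

p_small_trial = two_sided_p(tiny_effect, sd, n_per_group=100)
p_huge_trial = two_sided_p(tiny_effect, sd, n_per_group=200_000)

print(f"n = 100 per group:     p = {p_small_trial:.3f}")
print(f"n = 200000 per group:  p = {p_huge_trial:.2e}")
```

The effect size is identical in both trials. Only the sample size changes, and with it the verdict of the significance test, which is exactly why the size of the effect has to be judged separately.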

## Significant results are not always significant

3 March, 2015 at 16:15 | Posted in Statistics & Econometrics | Leave a comment

The journal *Basic and Applied Social Psychology* recently decided to ban p-values in published articles. There is much to be said about this rather drastic measure … but it is entirely clear that there are problems with focusing too heavily on p-values. P-values are particularly problematic when statistical power is low in combination with publication bias, i.e. when it is mainly statistically significant results that get published (see earlier post on publication bias). If statistical power is low, low p-values may in some cases be a guarantee of misleading results rather than a sign of reliability.

An article well worth reading!

Yours truly has himself touched on the problems with the excessive fixation on p-values here, here, here and here.
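Why low power combined with publication bias is so treacherous can be shown with a small simulation. This is a sketch under invented assumptions: every study estimates the same hypothetical true effect of 0.2 with a standard error of 0.2 (which gives a power of roughly 17%), and only results significant at the 5% level are "published".

```python
import random

def simulate_studies(true_effect, standard_error, n_studies=20_000, seed=7):
    """Draw one effect estimate per study, normally distributed around the
    true effect with the given standard error."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, standard_error) for _ in range(n_studies)]

true_effect, standard_error = 0.2, 0.2  # hypothetical values giving low power
estimates = simulate_studies(true_effect, standard_error)

# "Published" studies: two-sided significance at the 5% level (|z| > 1.96).
published = [e for e in estimates if abs(e / standard_error) > 1.96]

power = len(published) / len(estimates)
mean_all = sum(estimates) / len(estimates)
mean_published_size = sum(abs(e) for e in published) / len(published)

print(f"share of studies reaching significance: {power:.2f}")
print(f"mean estimate, all studies:             {mean_all:.2f}")
print(f"mean effect size, published studies:    {mean_published_size:.2f}")
```

Across all studies the estimates centre on the true effect, but every published estimate must exceed 1.96 standard errors in size, here about twice the true effect. The significant results are therefore exaggerated by construction.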

## Forecasting time series data in Gretl

2 March, 2015 at 20:38 | Posted in Statistics & Econometrics | 2 Comments

Thanks to Allin Cottrell and Riccardo Lucchetti we today have access to a high quality tool for doing and teaching econometrics — **Gretl**. And, best of all, it is totally *free*!

Gretl is up to the tasks you may have, so why spend money on expensive commercial programs?

The latest snapshot version of Gretl can be downloaded here.

## Econom(etr)ic fictions masquerading as rigorous science

27 February, 2015 at 09:15 | Posted in Statistics & Econometrics | 2 Comments

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great, but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts.

Accepting a domain of probability theory and a sample space of “infinite populations” — which is legion in modern econometrics — also implies that judgments are made on the basis of observations that are actually never made! Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

In his great book *Statistical Models and Causal Inference: A Dialogue with the Social Sciences*, David Freedman touched on this fundamental problem, arising when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels:

Lurking behind the typical regression model will be found a host of such assumptions; without them, legitimate inferences cannot be drawn from the model. There are statistical procedures for testing some of these assumptions. However, the tests often lack the power to detect substantial failures. Furthermore, model testing may become circular; breakdowns in assumptions are detected, and the model is redefined to accommodate. In short, *hiding the problems can become a major goal of model building*.

Using models to make predictions of the future, or the results of interventions, would be a valuable corrective. Testing the model on a variety of data sets – rather than fitting refinements over and over again to the same data set – might be a good second-best … Built into the equation is a model for non-discriminatory behavior: the coefficient d vanishes. If the company discriminates, that part of the model cannot be validated at all.

Regression models are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals. *However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions.* Under the circumstances, reliance on model outputs may be quite unjustified. Making the ideas of validation somewhat more precise is a serious problem in the philosophy of science. That models should correspond to reality is, after all, a useful but not totally straightforward idea – with some history to it. Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious …

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And *an enormous amount of fiction has been produced, masquerading as rigorous science*.

Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science.

## Econometrics and the difficult art of making it count

26 February, 2015 at 20:43 | Posted in Statistics & Econometrics | Leave a comment

Modern econometrics is fundamentally based on assuming – usually without any explicit justification – that we can gain causal knowledge by considering independent variables that may have an impact on the *variation* of a dependent variable. This is, however, far from self-evident. Often the *fundamental* causes are *constant* forces that are not amenable to the kind of analysis econometrics supplies us with. As **Stanley Lieberson** has it in his modern classic *Making It Count*:

One can always say whether, in a given empirical context, a given variable or theory accounts for more variation than another. But it is almost certain that the variation observed is not universal over time and place. Hence the use of such a criterion first requires a conclusion about the variation over time and place in the dependent variable. If such an analysis is not forthcoming, the theoretical conclusion is undermined by the absence of information …

Moreover, it is questionable whether one can draw much of a conclusion about causal forces from simple analysis of the observed variation … To wit, it is vital that one have an understanding, or at least a working hypothesis, about what is causing the event per se; variation in the magnitude of the event will not provide the answer to that question.

Causality in social sciences – and economics – can never solely be a question of statistical inference. Causality entails more than predictability, and explaining social phenomena in real depth requires theory. Analysis of variation – the foundation of all econometrics – can never in itself reveal *how* these variations are brought about. Only when we are able to tie actions, processes or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation. Too much in love with axiomatic-deductive modeling, neoclassical economists especially tend to forget that accounting for causation – *how* causes bring about their effects – demands deep subject-matter knowledge and acquaintance with the intricate fabrics and contexts. As Keynes already argued in his *A Treatise on Probability*, statistics and econometrics should not primarily be seen as means of inferring causality from observational data, but rather as descriptions of patterns of associations and correlations that we may use as *suggestions* of possible causal relations.

## Model assumptions and reality

9 February, 2015 at 13:57 | Posted in Statistics & Econometrics | Leave a comment

In a previous article posted here – What are the key assumptions of linear regression models? – yours truly tried to argue that since econometrics doesn’t content itself with only making optimal *predictions*, but also aspires to *explain* things in terms of causes and effects, econometricians need loads of assumptions – and that the most important of these are *additivity* and *linearity*.

Let me take the opportunity to cite one of my favourite introductory statistics textbooks on one further reason these assumptions are made — and why they ought to be much more argued for on both epistemological and ontological grounds when used (emphasis added):

In a hypothesis test … the sample comes from an *unknown* population. If the population is really unknown, it would suggest that we do not know the standard deviation, and therefore, we cannot calculate the standard error. To solve this dilemma, we have made an assumption. Specifically, we assume that the standard deviation for the unknown population (after treatment) is the same as it was for the population before treatment.

Actually this assumption is the consequence of a more general assumption that is part of many statistical procedures. The general assumption states that the effect of the treatment is to add a constant amount to … every score in the population … You should also note that *this assumption is a theoretical ideal. In actual experiments, a treatment generally does not show a perfect and consistent additive effect.*

Additivity and linearity are the two most important of the assumptions that most applied econometric models rely on, simply because if they are not true, your model is invalid and descriptively incorrect. It’s like calling your house a bicycle. No matter how you try, it won’t move you an inch. When the model is wrong — well, then it’s wrong.
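The textbook’s point, that a purely additive treatment effect is what licenses reusing the pre-treatment standard deviation, can be checked in a few lines. The scores and effect sizes below are hypothetical and serve only as an illustration.

```python
from math import isclose
from statistics import pstdev

scores_before = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical population scores

# Additive treatment: every score shifts by the same constant.
additive = [x + 5.0 for x in scores_before]

# Non-additive treatment: the effect is proportional to the score.
proportional = [x * 1.5 for x in scores_before]

sd_before = pstdev(scores_before)
sd_additive = pstdev(additive)
sd_proportional = pstdev(proportional)

print(f"sd before treatment:       {sd_before:.3f}")
print(f"sd, additive treatment:    {sd_additive:.3f}")
print(f"sd, proportional treatment: {sd_proportional:.3f}")
```

Adding a constant leaves the standard deviation untouched, so the old value can stand in for the unknown one. As soon as the effect varies with the score the standard deviation changes, and a standard error computed from the pre-treatment value is simply wrong.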

## Markov’s Inequality Theorem (wonkish)

4 February, 2015 at 12:33 | Posted in Statistics & Econometrics | 2 Comments

One of the most beautiful results of probability theory is **Markov’s Inequality Theorem** (after the Russian mathematician Andrei Markov, 1856-1922):

**If X is a non-negative stochastic variable (X ≥ 0) with a finite expectation value E(X), then for every a > 0**

**P{X ≥ a} ≤ E(X)/a**

If, e.g., the production of cars in a factory during a week is assumed to be a (non-negative) stochastic variable with an expectation value (mean) of 50 units, we can – based on nothing else but the inequality – conclude that the probability that the production for a week would be at least 100 units cannot exceed 50%: P(X ≥ 100) ≤ 50/100 = 0.5.

I still feel a humble awe at this immensely powerful result. Without knowing anything else but an expected value (mean) of a probability distribution we can deduce upper limits for probabilities. The result strikes me as equally surprising today as thirty years ago when I first ran into it as a student of mathematical statistics.
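The bound follows because, for a non-negative X, the outcomes with X ≥ a alone contribute at least a·P{X ≥ a} to the mean, so E(X) ≥ a·P{X ≥ a}. A small sketch with two hypothetical weekly production distributions, both invented for illustration and both with mean 50, shows that the bound holds whatever the shape of the distribution and that it cannot be improved without further information.

```python
def mean(pmf):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def tail_probability(pmf, a):
    """P(X >= a) for a discrete distribution."""
    return sum(p for x, p in pmf.items() if x >= a)

def markov_bound(pmf, a):
    """Markov's upper bound E(X)/a on P(X >= a); requires X >= 0 and a > 0."""
    return mean(pmf) / a

# Two hypothetical non-negative distributions of weekly output, both with mean 50.
concentrated = {40: 0.25, 50: 0.5, 60: 0.25}  # output always close to the mean
extreme = {0: 0.5, 100: 0.5}                  # output is all or nothing

for name, pmf in [("concentrated", concentrated), ("extreme", extreme)]:
    print(f"{name}: mean = {mean(pmf):.0f}, "
          f"P(X >= 100) = {tail_probability(pmf, 100):.2f}, "
          f"Markov bound = {markov_bound(pmf, 100):.2f}")
```

For the concentrated distribution the bound of 0.5 is very loose, while the all-or-nothing distribution attains it exactly. Knowing only the mean, 50% is therefore the best upper limit one can give.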

## Wasserman on Bayesian religion

30 January, 2015 at 18:10 | Posted in Statistics & Econometrics | 4 Comments

There is a nice YouTube video with Tony O’Hagan interviewing Dennis Lindley. Of course, Dennis is a legend and his impact on the field of statistics is huge.

At one point, Tony points out that some people liken Bayesian inference to a religion. Dennis claims this is false. Bayesian inference, he correctly points out, starts with some basic axioms and then the rest follows by deduction. This is logic, not religion.

I agree that the mathematics of Bayesian inference is based on sound logic. But, with all due respect, I think Dennis misunderstood the question. When people say that “Bayesian inference is like a religion,” they are not referring to the logic of Bayesian inference. They are referring to how adherents of Bayesian inference behave.

(As an aside, detractors of Bayesian inference do not deny the correctness of the logic. They just don’t think the axioms are relevant for data analysis. For example, no one doubts the axioms of Peano arithmetic. But that doesn’t imply that arithmetic is the foundation of statistical inference. But I digress.)

The vast majority of Bayesians are pragmatic, reasonable people. But there is a sub-group of die-hard Bayesians who do treat Bayesian inference like a religion. By this I mean:

They are very cliquish.

They have a strong emotional attachment to Bayesian inference.

They are overly sensitive to criticism.

They are unwilling to entertain the idea that Bayesian inference might have flaws.

When someone criticizes Bayes, they think that critic just “doesn’t get it.”

They mock people with differing opinions …

No evidence you can provide would ever make the die-hards doubt their ideas. To them, Sir David Cox, Brad Efron and other giants in our field who have doubts about Bayesian inference, are not taken seriously because they “just don’t get it.”

So is Bayesian inference a religion? For most Bayesians: no. But for the thin-skinned, inflexible die-hards who have attached themselves so strongly to their approach to inference that they make fun of, or get mad at, critics: yes, it is a religion.

For some more thoughts on the limits of the Bayesian approach, Stephen Senn’s *You May Believe You Are a Bayesian But You Are Probably Wrong* is a good read.

## The Lady Tasting Tea

29 January, 2015 at 15:20 | Posted in Statistics & Econometrics | Leave a comment

One of my absolute favourites on the statistics shelf is David Salsburg’s insightful history of statistics, *The Lady Tasting Tea*. The book is full of deep and valuable reflections on the role of statistics in modern science. Salsburg is, just like Keynes before him, sceptical of how many social scientists – not least economists – uncritically and without argument often simply *assume* that the probability distributions of statistical theory can be applied to their own field of study. In the final chapter he writes:

Kolmogorov established the mathematical meaning of probability: Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

Wise words for econometricians and other “number crunchers” to ponder!
