## Linearity — a dangerous assumption

16 November, 2017 at 14:44 | Posted in Statistics & Econometrics | Comments Off on Linearity — a dangerous assumption

## P-hacking and data dredging

12 November, 2017 at 14:31 | Posted in Statistics & Econometrics | Comments Off on P-hacking and data dredging

P-hacking refers to when you massage your data and analysis methods until your result reaches a statistically significant p-value. I will put it to you that in practice most p-hacking is not necessarily about hacking p-s but about dredging your data until your results fit a particular pattern. That may be something you predicted but didn’t find or could even just be some chance finding that looked interesting and is amplified this way. However, the p-value is usually probably secondary to the act here. The end result may very well be the same in that you continue abusing the data until a finding becomes significant, but I would bet that in most cases what matters to people is not the p-value but the result. Moreover, while null-hypothesis significance testing with p-values is still by far the most widespread way to make inferences about results, it is not the only way. All this fussing about p-hacking glosses over the fact that the same analytic flexibility or data dredging can be applied to any inference, whether it is based on p-values, confidence intervals, Bayes factors, posterior probabilities, or simple summary statistics …

Everybody p-hacks if left to their own devices. Preregistration and open data can help protect yourself against your mind’s natural tendency to perceive patterns in noise. A scientist’s training is all about developing techniques to counteract this tendency, and so open practices are just another tool for achieving that purpose.
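To make the dredging point concrete, here is a minimal simulation sketch. It is not taken from the quoted text: the sample size, the number of candidate predictors and the rough cutoff for the correlation are all assumptions chosen for illustration. Both the outcome and every predictor are pure noise, and we simply keep whichever predictor correlates best with the outcome.

```python
import numpy as np

def dredge_once(rng, n_obs=50, n_candidates=20):
    """Largest |correlation| between a pure-noise outcome and
    n_candidates pure-noise predictors (all hypothetical variables)."""
    y = rng.standard_normal(n_obs)
    X = rng.standard_normal((n_obs, n_candidates))
    # Standardise, then compute the Pearson correlation of each column with y.
    yc = (y - y.mean()) / y.std()
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    r = Xc.T @ yc / n_obs
    return np.abs(r).max()

def false_positive_rate(n_sims=2000, n_obs=50, n_candidates=20, seed=0):
    """Share of simulated 'studies' whose best predictor clears the cutoff."""
    rng = np.random.default_rng(seed)
    # Rough two-sided 5% cutoff for |r| under the null (normal approximation).
    cutoff = 1.96 / np.sqrt(n_obs)
    hits = sum(dredge_once(rng, n_obs, n_candidates) > cutoff
               for _ in range(n_sims))
    return hits / n_sims

single = false_positive_rate(n_candidates=1)   # one pre-specified predictor
many = false_positive_rate(n_candidates=20)    # pick the best of twenty
print(f"one predictor: {single:.2f}, best of twenty: {many:.2f}")
```

With a single pre-specified predictor the share of 'findings' stays near the nominal level, while picking the best of twenty pushes it well above one half. Nothing in the mechanism depends on p-values as such. The same selection would inflate any summary statistic used as the criterion.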

## Time to abandon statistical significance

27 September, 2017 at 10:55 | Posted in Statistics & Econometrics | 6 Comments

We recommend dropping the NHST [null hypothesis significance testing] paradigm — and the p-value thresholds associated with it — as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, rather than allowing statistical significance as determined by p < 0.05 (or some other statistical threshold) to serve as a lexicographic decision rule in scientific publication and statistical decision making more broadly as per the status quo, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with the neglected factors [such factors as prior and related evidence, plausibility of mechanism, study design and data quality, real world costs and benefits, novelty of finding, and other factors that vary by research domain] as just one among many pieces of evidence.

We make this recommendation for three broad reasons. First, in the biomedical and social sciences, the sharp point null hypothesis of zero effect and zero systematic error used in the overwhelming majority of applications is generally not of interest because it is generally implausible. Second, the standard use of NHST — to take the rejection of this straw man sharp point null hypothesis as positive or even definitive evidence in favor of some preferred alternative hypothesis — is a logical fallacy that routinely results in erroneous scientific reasoning even by experienced scientists and statisticians. Third, p-value and other statistical thresholds encourage researchers to study and report single comparisons rather than focusing on the totality of their data and results.

As shown over and over again when significance tests are applied, people have a tendency to read ‘not disconfirmed’ as ‘probably confirmed.’ Standard scientific methodology tells us that when there is only say a 10 % probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more ‘reasonable’ to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10 % result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

We should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean nothing if the model is wrong. And most importantly — statistical significance tests DO NOT validate models!

In journal articles a typical regression equation will have an intercept and several explanatory variables. The regression output will usually include an F-test, with p – 1 degrees of freedom in the numerator and n – p in the denominator. The null hypothesis will not be stated. The missing null hypothesis is that all the coefficients vanish, except the intercept.

If F is significant, that is often thought to validate the model. Mistake. The F-test takes the model as given. Significance only means this:

*if* the model is right *and* the coefficients are 0, it is very unlikely to get such a big F-statistic. Logically, there are three possibilities on the table:

i) An unlikely event occurred.

ii) Or the model is right and some of the coefficients differ from 0.

iii) Or the model is wrong.

So?
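Possibility iii) is easy to illustrate. The sketch below is an assumption-laden toy example, not part of the quoted argument: the data come from a quadratic relation (an assumed 'true' process), while the fitted model is a straight line. The overall F-statistic is computed with p – 1 and n – p degrees of freedom, as described above.

```python
import numpy as np

def overall_f(y, X):
    """F-statistic for H0: all coefficients except the intercept are 0.
    X must contain a column of ones; p is the number of columns of X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid                      # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()        # total sum of squares
    f = ((tss - rss) / (p - 1)) / (rss / (n - p))
    return f, resid

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 3, n)
y = x ** 2 + rng.normal(0, 0.5, n)           # assumed true relation: quadratic
X = np.column_stack([np.ones(n), x])         # fitted model: straight line

f_stat, resid = overall_f(y, X)

# The residuals of the linear fit still track the omitted curvature.
curvature = (x - x.mean()) ** 2
corr = np.corrcoef(resid, curvature)[0, 1]
print(f"F = {f_stat:.0f}, corr(residuals, curvature) = {corr:.2f}")
```

The F-statistic comes out far above the 5% critical value for 1 and 198 degrees of freedom (about 3.9), yet the residuals are strongly correlated with the squared term the model leaves out. A significant F says nothing about whether the functional form is right.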

## Big Data — Poor Science

6 September, 2017 at 20:08 | Posted in Statistics & Econometrics | Comments Off on Big Data — Poor Science

Almost everything we do these days leaves some kind of data trace in some computer system somewhere. When such data is aggregated into huge databases it is called “Big Data”. It is claimed social science will be transformed by the application of computer processing and Big Data. The argument is that social science has, historically, been “theory rich” and “data poor” and now we will be able to apply the methods of “real science” to “social science” producing new validated and predictive theories which we can use to improve the world.

What’s wrong with this? … Firstly what is this “data” we are talking about? In its broadest sense it is some representation usually in a symbolic form that is machine readable and processable. And how will this data be processed? Using some form of machine learning or statistical analysis. But what will we find? Regularities or patterns … What do such patterns mean? Well that will depend on who is interpreting them …

Looking for “patterns or regularities” presupposes a definition of what a pattern is and that presupposes a hypothesis or model, i.e. a theory. Hence big data does not “get us away from theory” but rather requires theory before any project can commence.

What is the problem here? The problem is that a certain kind of approach is being propagated within the “big data” movement that claims to not be a priori committed to any theory or view of the world. The idea is that data is real and theory is not real. That theory should be induced from the data in a “scientific” way.

I think this is wrong and dangerous. Why? Because it is not clear or honest while appearing to be so. Any statistical test or machine learning algorithm expresses a view of what a pattern or regularity is and any data has been collected for a reason based on what is considered appropriate to measure. One algorithm will find one kind of pattern and another will find something else. One data set will evidence some patterns and not others. Selecting an appropriate test depends on what you are looking for. So the question posed by the thought experiment remains “what are you looking for, what is your question, what is your hypothesis?”

Ideas matter. Theory matters. Big data is not a theory-neutral way of circumventing the hard questions. In fact it brings these questions into sharp focus and it’s time we discuss them openly.

The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many — falsely — think that they can get away with analysing real world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.

Theory matters.

## ‘Autonomy’ in econometrics

24 August, 2017 at 22:50 | Posted in Statistics & Econometrics | 2 Comments

The point of the discussion, of course, has to do with where Koopmans thinks we should look for “autonomous behaviour relations”. He appeals to experience but in a somewhat oblique manner. He refers to the Harvard barometer “to show that relationships between economic variables … not traced to underlying behaviour equations are unreliable as instruments for prediction” … His argument would have been more effectively put had he been able to give instances of relationships that *have been* “traced to underlying behaviour equations” *and* that have been reliable instruments for prediction. He did not do this, and I know of no conclusive case that he could draw upon. There are of course cases of economic models that he could have mentioned as having been *unreliable* predictors. But these latter instances demonstrate no more than the failure of Harvard barometer: all were presumably built upon relations that were more or less unstable in time. The meaning conveyed, we may suppose, by the term “fundamental autonomous relation” is a relation stable in time and not drawn as an inference from combinations of other relations. The discovery of such relations suitable for the prediction procedure that Koopmans has in mind has yet to be publicly presented, and the phrase “underlying behaviour equation” is left utterly devoid of content.

Guess Robert Lucas didn’t read Vining …

## James Heckman — ‘Nobel prize’ winner gone wrong

20 August, 2017 at 11:32 | Posted in Statistics & Econometrics | 1 Comment

Here’s James Heckman in 2013:

“Also holding back progress are those who claim that Perry and ABC are experiments with samples too small to accurately predict widespread impact and return on investment. This is a nonsensical argument. Their relatively small sample sizes actually speak for — not against — the strength of their findings. Dramatic differences between treatment and control-group outcomes are usually not found in small sample experiments, yet the differences in Perry and ABC are big and consistent in rigorous analyses of these data.”

Wow. The “What does not kill my statistical significance makes it stronger” fallacy, right there in black and white … Heckman’s pretty much saying that if his results are statistically significant (and “consistent in rigorous analyses,” whatever that means) that they should be believed—and even more so if sample sizes are small (and of course the same argument holds in favor of stronger belief if measurement error is large).

With the extra special bonus that he’s labeling contrary arguments as “nonsensical” …

Heckman is wrong here. Actually, the smaller sample sizes (and also the high variation in these studies) speak against—not for—the strength of the published claims …

One of the first things yours truly warns his statistics students against is jumping to the conclusion that signal-to-noise levels have to be high just because they get statistically significant estimates when running regressions. One would have thought a prize winner should know that too …
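Why small samples speak against, not for, a significant finding can be seen in a short simulation. This is only a sketch under stated assumptions: a true effect of 0.2 standard deviations, 20 observations per group and a rough |t| > 2 cutoff are all hypothetical numbers, not figures from the Perry or ABC studies.

```python
import numpy as np

def significance_filter(true_effect=0.2, n_per_group=20,
                        n_studies=20000, seed=0):
    """Simulate many small two-group studies and look only at the
    'statistically significant' ones (rough |t| > 2 cutoff)."""
    rng = np.random.default_rng(seed)
    treat = rng.normal(true_effect, 1.0, (n_studies, n_per_group))
    ctrl = rng.normal(0.0, 1.0, (n_studies, n_per_group))
    diff = treat.mean(axis=1) - ctrl.mean(axis=1)
    se = np.sqrt(treat.var(axis=1, ddof=1) / n_per_group
                 + ctrl.var(axis=1, ddof=1) / n_per_group)
    significant = np.abs(diff / se) > 2.0
    # Average size of the estimates that survive the filter, and how
    # often a study is significant at all.
    return np.abs(diff[significant]).mean(), significant.mean()

mean_abs_estimate, share_significant = significance_filter()
print(f"true effect 0.2, mean |estimate| among significant studies: "
      f"{mean_abs_estimate:.2f}, share significant: {share_significant:.2f}")
```

In this set-up only a small share of studies reach significance, and those that do report effects several times larger than the true one. With noisy data and few observations, an estimate has to be exaggerated in order to clear the threshold.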

## Understanding the limits of statistical inference

6 July, 2017 at 18:35 | Posted in Statistics & Econometrics | 1 Comment

This is indeed an instructive video on what *statistical* inference is all about.

But we have to remember that economics and statistics are two quite different things, and as long as economists cannot identify their statistical theories with real-world phenomena there is no real warrant for taking their statistical inferences seriously.

Just as there is no such thing as a ‘free lunch,’ there is no such thing as a ‘free probability.’ To be able at all to talk about probabilities, you have to specify a model. In statistics one refers to any process where you observe or measure as an experiment (rolling a die) and the results obtained as the outcomes or events of the experiment (the number of points rolled with the die, e.g. 3 or 5). If there is no chance set-up or model that generates the probabilistic outcomes or events, then, strictly speaking, there is no event at all.

Probability is a relational element. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be shown to coincide with (or at least converge to) real data-generating processes or structures, something seldom or never done in economics.

And this is the basic problem!

If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous ‘nomological machines’ for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions! Not doing that, you simply conflate statistical and economic inferences.

And even worse — some economists using statistical methods think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like faithfulness or stability is to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived statistical theorems and the real world, well, then we haven’t really obtained the causation we are looking for.

## Heterogeneity and the flaw of averages

29 June, 2017 at 00:11 | Posted in Statistics & Econometrics | Comments Off on Heterogeneity and the flaw of averages

With interactive confounders explicitly included, the overall treatment effect β₀ + β′zₜ is not a number but a variable that depends on the confounding effects. Absent observation of the interactive compounding effects, what is estimated is some kind of average treatment effect which is called by Imbens and Angrist (1994) a “Local Average Treatment Effect,” which is a little like the lawyer who explained that when he was a young man he lost many cases he should have won but as he grew older he won many that he should have lost, so that on the average justice was done. In other words, if you act as if the treatment effect is a random variable by substituting βₜ for β₀ + β′zₜ, the notation inappropriately relieves you of the heavy burden of considering what are the interactive confounders and finding some way to measure them. Less elliptically, absent observation of z, the estimated treatment effect should be transferred only into those settings in which the confounding interactive variables have values close to the mean values in the experiment. If little thought has gone into identifying these possible confounders, it seems probable that little thought will be given to the limited applicability of the results in other settings.

Yes, indeed, regression-based averages are something we have reason to be cautious about.

Suppose we want to estimate the average causal effect of a dummy variable (T) on an observed outcome variable (O). In a usual regression context one would apply an ordinary least squares estimator (OLS) in trying to get an unbiased and consistent estimate:

O = α + βT + ε,

where α is a constant intercept, β a constant ‘structural’ causal effect and ε an error term.

The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer of the average causal effect being 0, those who are ‘treated’ (T=1) may have causal effects equal to -100 and those ‘not treated’ (T=0) may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the OLS average effect particularly enlightening.

The heterogeneity problem does not just turn up as an *external* validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an *internal* problem to the millions of OLS estimates that economists produce every year.
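The masking can be shown with a small simulation. The set-up below is a variation on the example above and rests on assumptions of its own: a hypothetical unobserved characteristic `group` splits the population in half, one half with an individual causal effect of +100 and the other with -100, and treatment is assigned at random. The baseline of 50 and the noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                  # hypothetical unobserved type
effect = np.where(group == 1, 100.0, -100.0)   # individual causal effects
T = rng.integers(0, 2, n)                      # randomly assigned treatment dummy
O = 50.0 + effect * T + rng.normal(0, 10, n)   # observed outcome

def ols_slope(t, o):
    """OLS estimate of beta in o = alpha + beta * t + error."""
    X = np.column_stack([np.ones(len(t)), t])
    beta, *_ = np.linalg.lstsq(X, o, rcond=None)
    return beta[1]

pooled = ols_slope(T, O)
by_group = {g: ols_slope(T[group == g], O[group == g]) for g in (0, 1)}
print(f"pooled: {pooled:.1f}, group 0: {by_group[0]:.1f}, "
      f"group 1: {by_group[1]:.1f}")
```

The pooled OLS estimate is close to 0, which is the correct average, while the within-group estimates are close to -100 and +100. Someone deciding whether to be treated learns next to nothing from the pooled number unless they know which group they belong to.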

## What is a statistical model?

24 June, 2017 at 14:05 | Posted in Statistics & Econometrics | 1 Comment

My critique is that the currently accepted notion of a statistical model is not scientific; rather, it is a guess at what might constitute (scientific) reality without the vital element of feedback, that is, without checking the hypothesized, postulated, wished-for, natural-looking (but in fact only guessed) model against that reality. To be blunt, as far as is known today, there is no such thing as a concrete i.i.d. (independent, identically distributed) process, not because this is not desirable, nice, or even beautiful, but because Nature does not seem to be like that … As Bertrand Russell put it at the end of his long life devoted to philosophy, “Roughly speaking, what we know is science and what we don’t know is philosophy.” In the scientific context, but perhaps not in the applied area, I fear statistical modeling today belongs to the realm of philosophy.

To make this point seem less erudite, let me rephrase it in cruder terms. What would a scientist expect from statisticians, once he became interested in statistical problems? He would ask them to explain to him, in some clear-cut cases, the origin of randomness frequently observed in the real world, and furthermore, when this explanation depended on the device of a model, he would ask them to continue to confront that model with the part of reality that the model was supposed to explain. Something like this was going on three hundred years ago … But in our times the idea somehow got lost when i.i.d. became the pampered new baby.

Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of — and actually, to be strict, do not at all exist — without specifying such system-contexts. Accepting Haavelmo’s domain of probability theory and sample space of infinite populations — just as Fisher’s ‘hypothetical infinite population,’ von Mises’ ‘collective’ or Gibbs’ ‘ensemble’ — also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable. And so the way social scientists — including economists and econometricians — often uncritically and without arguments have come to simply assume that one can apply probability distributions from statistical theory to their own area of research, is not acceptable.

This importantly also means that if you cannot show that data satisfies *all* the conditions of the probabilistic nomological machine — including e.g. the distribution of the deviations corresponding to a normal curve — then the statistical inferences used lack sound foundations.

Trying to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels, scientists run into serious problems, the greatest being the need for lots of more or less unsubstantiated — and sometimes wilfully hidden — assumptions to be able to make any sustainable inferences from the models. Many of the results that economists and other social scientists present with their statistical/econometric models depend to a substantial degree on the use of mostly unfounded ‘technical’ assumptions.

Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science. It is rather a recipe for producing fiction masquerading as science.

## Simpson’s paradox

21 June, 2017 at 08:29 | Posted in Statistics & Econometrics | Comments Off on Simpson’s paradox

From a more theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities, unless you are — miraculously — able to keep constant *all* other factors that influence the probability of the outcome studied.

To understand causality we always have to relate it to a specific causal *structure*. Statistical correlations are *never* enough. No structure, no causality.

Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against: say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted), giving male-to-female odds ratios of 0.63 and 0.37.
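The arithmetic of the admission example can be checked directly. The sketch below uses only the counts given in the paragraph above. The function and variable names are illustrative, and the odds ratios are computed from the raw counts rather than from a fitted logistic regression (for a single binary regressor the two should agree).

```python
def odds(admitted, applicants):
    """Odds of admission: admitted divided by rejected."""
    return admitted / (applicants - admitted)

def odds_ratio(male, female):
    """Male-to-female odds ratio from (admitted, applicants) pairs."""
    return odds(*male) / odds(*female)

# (admitted, applicants) per department, taken from the example above.
data = {
    "economics": {"male": (680, 800), "female": (90, 100)},
    "physics":   {"male": (20, 200),  "female": (210, 900)},
}

def pooled(sex):
    """Add up admissions and applications across departments."""
    admitted = sum(dept[sex][0] for dept in data.values())
    applicants = sum(dept[sex][1] for dept in data.values())
    return admitted, applicants

aggregate_or = odds_ratio(pooled("male"), pooled("female"))
dept_or = {name: odds_ratio(d["male"], d["female"])
           for name, d in data.items()}

print(f"aggregate odds ratio: {aggregate_or:.2f}")
for name, value in dept_or.items():
    print(f"{name}: {value:.2f}")
```

The aggregate odds ratio is well above 1, while both within-department odds ratios are below 1. Which of the two comparisons is the causally relevant one cannot be read off the numbers. It depends on how one thinks applicants come to choose departments, which is a question about structure and not about statistics.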

Econometric patterns should never be seen as anything else than possible clues to follow. From a critical realist perspective it is obvious that behind observable data there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Math cannot establish the truth value of a fact. Never has. Never will.
