## On the importance of power

31 January, 2013 at 17:20 | Posted in Statistics & Econometrics | Leave a comment

## The last battle

30 January, 2013 at 14:10 | Posted in Varia | Leave a comment

**In loving memory of my brother – Peter “Uncas” Pålsson**

## Did p values work? Read my lips – they didn’t!

29 January, 2013 at 21:36 | Posted in Statistics & Econometrics, Theory of Science & Methodology | Leave a commentJager and Leek may well be correct in their larger point, that the medical literature is broadly correct. But I don’t think the statistical framework they are using is appropriate for the questions they are asking. My biggest problem is the identification of scientific hypotheses and statistical “hypotheses” of the “theta = 0″ variety.

Based on the word “empirical” title, I thought the authors were going to look at a large number of papers with p-values and then follow up and see if the claims were replicated. But no, they don’t follow up on the studies at all! What they seem to be doing is collecting a set of published p-values and then fitting a mixture model to this distribution, a mixture of a uniform distribution (for null effects) and a beta distribution (for non-null effects). Since only statistically significant p-values are typically reported, they fit their model restricted to p-values less than 0.05. But this all assumes that the p-values have this stated distribution. You don’t have to be Uri Simonsohn to know that there’s a lot of p-hacking going on. Also, as noted above, the problem isn’t really effects that are exactly zero, the problem is that a lot of effects are lots in the noise and are essentially undetectable given the way they are studied.

Jager and Leek write that their model is commonly used to study hypotheses in genetics and imaging. I could see how this model could make sense in those fields … but I don’t see this model applying to published medical research, for two reasons. First … I don’t think there would be a sharp division between null and non-null effects; and, second, there’s just too much selection going on for me to believe that the conditional distributions of the p-values would be anything like the theoretical distributions suggested by Neyman-Pearson theory.

So, no, I don’t at all believe Jager and Leek when they write, “we are able to empirically estimate the rate of false positives in the medical literature and trends in false positive rates over time.” They’re doing this by basically assuming the model that is being questioned, the textbook model in which effects are pure and in which there is no p-hacking.

Indeed. If anything, this underlines how important it is not to equate science with statistical calculation. All science entail human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of significance testing is actually zero – even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science. Or as a noted German philosopher once famously wrote:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

## Neyman-Pearson vs. Fisher on p values

28 January, 2013 at 19:05 | Posted in Statistics & Econometrics | Leave a comment

## Keep It Sophisticatedly Simple

28 January, 2013 at 14:36 | Posted in Varia | 1 CommentArnold Zellner’s KISS rule – Keep It Sophisticatedly Simple – has its application even outside of econometrics. An example is the film music of Stefan Nilsson. Here in the breathtakingly beautiful “Fäboden” from Bille August’s and Ingmar Bergman’s masterpiece *The Best Intentions*.

## Monte Carlo simulation and p values (wonkish)

28 January, 2013 at 11:45 | Posted in Statistics & Econometrics | 9 CommentsIn many social sciences p values and null hypothesis significance testing (NHST) are often used to draw far-reaching scientific conclusions – despite the fact that they are as a rule poorly understood and that there exist altenatives that are easier to understand and more informative.

Not the least using confidence intervals (CIs) and effect sizes are to be preferred to the Neyman-Pearson-Fisher mishmash approach that is so often practised by applied researchers.

Running a Monte Carlo simulation with 100 replications of a fictitious sample having N = 20, confidence itervals of 95%, a normally distributed population with a mean = 10 and a standard deviation of 20, taking two-tailed p values on a zero null hypothesis, we get varying CIs (since they are based on varying sample standard deviations), but with a minimum of 3.2 and a maximum of 26.1 we still get a clear picture of what would happen in an infinite limit sequence. On the other hand p values (even though from a purely mathematical statistical sense more or less equivalent to CIs) vary strongly from sample to sample, and jumping around between a minimum of 0.007 and a maximum of 0.999 don’t give you a clue of what will happen in an infinite limit sequence! So, I can’t but agree with Geoff Cummings:

The problems are so severe we need to shift as much as possible from NHST … The first shift should be to estimation: report and interpret effect sizes and CIs … I suggest p should be given only a marginal role, its problem explained, and it should be interpreted primarily as an indicator of where the 95% CI falls in relation to a null hypothesised value.

[In case you want to do your own Monte Carlo simulation, here's an example using Gretl:

nulldata 20

loop 100 --progressive

series y = normal(10,15)

scalar zs = (10-mean(y))/sd(y)

scalar df = $nobs-1

scalar ybar=mean(y)

scalar ysd= sd(y)

scalar ybarsd=ysd/sqrt($nobs)

scalar tstat = (ybar-10)/ybarsd

pvalue t df tstat

scalar lowb = mean(y) - critical(t,df,0.025)*ybarsd

scalar uppb = mean(y) + critical(t,df,0.025)*ybarsd

scalar pval = pvalue(t,df,tstat)

store E:\pvalcoeff.gdt lowb uppb pval

endloop]

## Fun with statistics

27 January, 2013 at 00:08 | Posted in Statistics & Econometrics | 3 CommentsYours truly gives a PhD course in statistics for students in education and sports this semester. And between teaching them all about Chebyshev’s Theorem, Beta Distributions, Moment-Generating Functions and the Neyman-Pearson Lemma, I try to remind them that statistics can actually also be fun …

## Austerity – this is what it’s all about

26 January, 2013 at 23:49 | Posted in Politics & Society | Leave a comment

## Keynes on statistics and evidential weight

25 January, 2013 at 19:04 | Posted in Statistics & Econometrics, Theory of Science & Methodology | 5 CommentsAlmost a hundred years after John Maynard Keynes wrote his seminal *A Treatise on Probability* (1921), it is still very difficult to find statistics textbooks that seriously try to incorporate his far-reaching and incisive analysis of induction and evidential weight.

The standard view in statistics – and the axiomatic probability theory underlying it – is to a large extent based on the rather simplistic idea that “more is better.” But as Keynes argues – “more of the same” is not what is important when making inductive inferences. It’s rather a question of “more but different.”

Variation, not replication, is at the core of induction. Finding that p(x|y) = p(x|y & w) doesn’t make w “irrelevant.” Knowing that the probability is unchanged when w is present gives p(x|y & w) another evidential weight (“weight of argument”). Running 10 replicative experiments do not make you as “sure” of your inductions as when running 10 000 varied experiments – even if the probability values happen to be the same.

According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but “rational expectations.” Keynes rather thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes expectations are a question of weighing probabilities by “degrees of belief,” beliefs that often have preciously little to do with the kind of stochastic probabilistic calculations made by the rational agents as modeled by “modern” social sciences. And often we “simply do not know.” As Keynes writes in *Treatise*:

The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be [that] the system of the material universe must consist of bodies … such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state … Yet there might well be quite different laws for wholes of different degrees of complexity, and laws of connection between complexes which could not be stated in terms of laws connecting individual parts … If different wholes were subject to different laws qua wholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts … These considerations do not show us a way by which we can justify induction … /427 No one supposes that a good induction can be arrived at merely by counting cases. The business of strengthening the argument chiefly consists in determining whether the alleged association is stable, when accompanying conditions are varied … /468 In my judgment, the practical usefulness of those modes of inference … on which the boasted knowledge of modern science depends, can only exist … if the universe of phenomena does in fact present those peculiar characteristics of atomism and limited variety which appears more and more clearly as the ultimate result to which material science is tending.

Science according to Keynes should help us penetrate to “the true process of causation lying behind current events” and disclose “the causal forces behind the apparent facts.” Models can never be more than a starting point in that endeavour. He further argued that it was inadmissible to project history on the future. Consequently we cannot presuppose that what has worked before, will continue to do so in the future. That statistical models can get hold of correlations between different “variables” is not enough. If they cannot get at the causal structure that generated the data, they are not really “identified.”

How strange that writers of statistics textbook as a rule do not even touch upon these aspects of scientific methodology that seems to be so fundamental and important for anyone trying to understand how we learn and orient ourselves in an uncertain world. An educated guess on why this is a fact would be that Keynes concepts are not possible to squeeze into a single calculable numerical “probability.” In the quest for quantities one puts a blind eye to qualities and looks the other way – but Keynes ideas keep creeping out from under the statistics carpet.

It’s high time that statistics textbooks give Keynes his due.

Blog at WordPress.com. | The Pool Theme.

Entries and comments feeds.