## On the importance of power

31 Jan, 2013 at 17:20 | Posted in Statistics & Econometrics | Comments Off on On the importance of power

## Did p values work? Read my lips – they didn’t!

29 Jan, 2013 at 21:36 | Posted in Statistics & Econometrics, Theory of Science & Methodology | Comments Off on Did p values work? Read my lips – they didn’t!

Jager and Leek may well be correct in their larger point, that the medical literature is broadly correct. But I don’t think the statistical framework they are using is appropriate for the questions they are asking. My biggest problem is the identification of scientific hypotheses and statistical “hypotheses” of the “theta = 0” variety.

Based on the word “empirical” in the title, I thought the authors were going to look at a large number of papers with p-values and then follow up and see if the claims were replicated. But no, they don’t follow up on the studies at all! What they seem to be doing is collecting a set of published p-values and then fitting a mixture model to this distribution, a mixture of a uniform distribution (for null effects) and a beta distribution (for non-null effects). Since only statistically significant p-values are typically reported, they fit their model restricted to p-values less than 0.05. But this all assumes that the p-values have this stated distribution. You don’t have to be Uri Simonsohn to know that there’s a lot of p-hacking going on. Also, as noted above, the problem isn’t really effects that are exactly zero; the problem is that a lot of effects are lost in the noise and are essentially undetectable given the way they are studied.
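To make the criticism concrete, here is a minimal Python sketch of the kind of truncated mixture fit described above. It is not Jager and Leek’s code. The simulated data, the helper names (`truncated_density`, `simulate`, `fit`), the simplification of the non-null component to a Beta(a, 1) distribution (chosen only because its CDF is simply p**a) and the crude grid-search likelihood maximisation are all assumptions made for illustration.

```python
import math
import random

ALPHA = 0.05  # assumption: only p-values below 0.05 get reported


def truncated_density(p: float, w: float, a: float) -> float:
    """Density on (0, ALPHA) of a mixture of reported p-values.

    w: share of reported p-values that come from true nulls (uniform part).
    a: shape of the non-null part, assumed here to be Beta(a, 1), whose
       density is a * p**(a - 1) and whose CDF is p**a.
    """
    null_part = w / ALPHA                               # uniform, renormalised to (0, ALPHA)
    alt_part = (1 - w) * a * p ** (a - 1) / ALPHA ** a  # Beta(a, 1), renormalised
    return null_part + alt_part


def simulate(n: int, pi0: float, a: float, rng: random.Random) -> list[float]:
    """Draw p-values from the full mixture and keep only the 'published' ones."""
    kept = []
    while len(kept) < n:
        u = 1.0 - rng.random()                 # uniform on (0, 1], never exactly 0
        is_null = rng.random() < pi0
        p = u if is_null else u ** (1 / a)     # inverse-CDF draw from Beta(a, 1)
        if p < ALPHA:
            kept.append(p)
    return kept


def fit(pvalues: list[float]) -> tuple[float, float]:
    """Crude grid-search maximum likelihood for (w, a)."""
    best_ll, best_w, best_a = -math.inf, 0.0, 0.0
    for i in range(26):            # w = 0.00, 0.02, ..., 0.50
        w = i * 0.02
        for j in range(1, 20):     # a = 0.05, 0.10, ..., 0.95
            a = j * 0.05
            ll = sum(math.log(truncated_density(p, w, a)) for p in pvalues)
            if ll > best_ll:
                best_ll, best_w, best_a = ll, w, a
    return best_w, best_a


rng = random.Random(2013)
published = simulate(1000, pi0=0.3, a=0.2, rng=rng)
w_hat, a_hat = fit(published)
print(f"estimated share of false positives among significant p-values: {w_hat:.2f}")
print(f"estimated shape of the non-null component: {a_hat:.2f}")
```

Here the data are generated from exactly the assumed model, so one would expect the estimate to land near the true share, which in this simulation is 0.015 / 0.3995, roughly 0.04. With real published p-values, selection and p-hacking break that assumption, and nothing in the fitted numbers would signal it.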

Jager and Leek write that their model is commonly used to study hypotheses in genetics and imaging. I could see how this model could make sense in those fields … but I don’t see this model applying to published medical research, for two reasons. First … I don’t think there would be a sharp division between null and non-null effects; and, second, there’s just too much selection going on for me to believe that the conditional distributions of the p-values would be anything like the theoretical distributions suggested by Neyman-Pearson theory.

So, no, I don’t at all believe Jager and Leek when they write, “we are able to empirically estimate the rate of false positives in the medical literature and trends in false positive rates over time.” They’re doing this by basically assuming the model that is being questioned, the textbook model in which effects are pure and in which there is no p-hacking.

Indeed. If anything, this underlines how important it is not to equate science with statistical calculation. All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. When you work with misspecified models, the scientific value of significance testing is actually zero – even though you’re making valid statistical inferences! Statistical models and concomitant significance tests are no substitutes for doing real science. Or as a noted German philosopher once famously wrote:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

## Neyman-Pearson vs. Fisher on p values

28 Jan, 2013 at 19:05 | Posted in Statistics & Econometrics | Comments Off on Neyman-Pearson vs. Fisher on p values

## Keep It Sophisticatedly Simple

28 Jan, 2013 at 14:36 | Posted in Varia | 1 Comment

Arnold Zellner’s KISS rule – Keep It Sophisticatedly Simple – has its application even outside of econometrics. An example is the film music of Stefan Nilsson, here in the breathtakingly beautiful “Fäboden” from Bille August’s and Ingmar Bergman’s masterpiece *The Best Intentions*.

## Fun with statistics

27 Jan, 2013 at 00:08 | Posted in Statistics & Econometrics | 3 Comments

Yours truly gives a PhD course in statistics for students in education and sports this semester. And between teaching them all about Chebyshev’s Theorem, Beta Distributions, Moment-Generating Functions and the Neyman-Pearson Lemma, I try to remind them that statistics can actually also be fun …

## Austerity – this is what it’s all about

26 Jan, 2013 at 23:49 | Posted in Politics & Society | Comments Off on Austerity – this is what it’s all about

## Keynes on statistics and evidential weight

25 Jan, 2013 at 19:04 | Posted in Statistics & Econometrics, Theory of Science & Methodology | 6 Comments

Almost a hundred years after John Maynard Keynes wrote his seminal *A Treatise on Probability* (1921), it is still very difficult to find statistics textbooks that seriously try to incorporate his far-reaching and incisive analysis of induction and evidential weight.

The standard view in statistics – and the axiomatic probability theory underlying it – is to a large extent based on the rather simplistic idea that “more is better.” But as Keynes argues – “more of the same” is not what is important when making inductive inferences. It’s rather a question of “more but different.”

Variation, not replication, is at the core of induction. Finding that p(x|y) = p(x|y & w) doesn’t make w “irrelevant.” Knowing that the probability is unchanged when w is present gives p(x|y & w) another evidential weight (“weight of argument”). Running 10 replicative experiments does not make you as “sure” of your inductions as running 10 000 varied experiments – even if the probability values happen to be the same.

According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but “rational expectations.” Keynes rather thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes, expectations are a question of weighing probabilities by “degrees of belief,” beliefs that often have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents as modeled by “modern” social sciences. And often we “simply do not know.” As Keynes writes in *Treatise*:

The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be [that] the system of the material universe must consist of bodies … such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state … Yet there might well be quite different laws for wholes of different degrees of complexity, and laws of connection between complexes which could not be stated in terms of laws connecting individual parts … If different wholes were subject to different laws qua wholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts … These considerations do not show us a way by which we can justify induction … (p. 427) No one supposes that a good induction can be arrived at merely by counting cases. The business of strengthening the argument chiefly consists in determining whether the alleged association is stable, when accompanying conditions are varied … (p. 468) In my judgment, the practical usefulness of those modes of inference … on which the boasted knowledge of modern science depends, can only exist … if the universe of phenomena does in fact present those peculiar characteristics of atomism and limited variety which appears more and more clearly as the ultimate result to which material science is tending.

Science according to Keynes should help us penetrate to “the true process of causation lying behind current events” and disclose “the causal forces behind the apparent facts.” Models can never be more than a starting point in that endeavour. He further argued that it was inadmissible to project history on the future. Consequently we cannot presuppose that what has worked before will continue to do so in the future. That statistical models can get hold of correlations between different “variables” is not enough. If they cannot get at the causal structure that generated the data, they are not really “identified.”

How strange that writers of statistics textbooks as a rule do not even touch upon these aspects of scientific methodology that seem so fundamental and important for anyone trying to understand how we learn and orient ourselves in an uncertain world. An educated guess as to why this is so would be that Keynes’s concepts cannot be squeezed into a single calculable numerical “probability.” In the quest for quantities one turns a blind eye to qualities and looks the other way – but Keynes’s ideas keep creeping out from under the statistics carpet.

It’s high time that statistics textbooks give Keynes his due.

## A Tobin tax – yes please!

25 Jan, 2013 at 11:09 | Posted in Economics, Politics & Society | Comments Off on A Tobin tax – yes please!

Yours truly has an article in Dagens Arena today about the EU’s core countries having now decided to introduce a tax on financial transactions.

## The sky’s the limit?

23 Jan, 2013 at 17:42 | Posted in Varia | 1 Comment

Yours truly launched this blog two years ago. The number of visitors has increased steadily. From having only a couple of hundred visits per month at the start, I now have almost 55 000 visits per month. A blog is surely not a beauty contest, but given the rather “wonkish” character of the blog – with posts mostly on economic theory, statistics, econometrics, theory of science and methodology – it’s rather gobsmacking that so many are interested and take their time to read and comment on it. **I am – of course – truly awed, honoured and delighted!**

## Mainstream economics and neoliberalism

23 Jan, 2013 at 13:37 | Posted in Economics, Politics & Society | 5 Comments

Unlearning economics has an interesting post on some important shortcomings of mainstream (neoclassical) economics and libertarianism (on which I have written e.g. here, here, and here):

I’ve touched briefly before on how behavioural economics makes the central libertarian mantra of being ‘free to choose’ completely incoherent. Libertarians tend to have a difficult time grasping this, responding with things like ‘so people aren’t rational; they’re still the best judges of their own decisions’. My point here is not necessarily that people are not the best judges of their own decisions, but that the idea of freedom of choice – as interpreted by libertarians – is nonsensical once you start from a behavioural standpoint.

The problem is that neoclassical economics, by modelling people as rational utility maximisers, lends itself to a certain way of thinking about government intervention. For if you propose intervention on the grounds that people are not rational utility maximisers, you are told that you are treating people as if they are stupid. Of course, this isn’t the case – designing policy as if people are rational utility maximisers is no different ethically to designing it as if they rely on various heuristics and suffer cognitive biases.

This ‘treating people as if they are stupid’ mentality highlights a problem with neoclassical choice modelling: behaviour is generally considered either ‘rational’ or ‘irrational’. But this isn’t a particularly helpful way to think about human action – as Daniel Kuehn says, heuristics are not really ‘irrational’; they simply save time, and as this video emphasises, they often produce better results than homo economicus-esque calculation. So the line between rationality and irrationality becomes blurred.

For an example of how this flawed thinking pervades libertarian arguments, consider the case of excessive choice. It is well documented that people can be overwhelmed by too much choice, and will choose to put off the decision or just abandon trying altogether. So is somebody who is so inundated with choice that they don’t know what to do ‘free to choose’? Well, not really – their liberty to make their own decisions is hamstrung.

Another example is the case of Nudge. The central point of this book is that people’s decisions are always pushed in a certain direction, either by advertising and packaging, by what the easiest or default choice is, by the way the choice is framed, or any number of other things. This completely destroys the idea of ‘free to choose’ – if people’s choices are rarely or never made neutrally, then one cannot be said to be ‘deciding for them’ any more than the choice was already ‘decided’ for them. The best conclusion is to push their choices in a ‘good’ direction (e.g. towards healthy food rather than junk). Nudging people isn’t a decision – they are almost always nudged. The question is the direction they are nudged in.

It must also be emphasised that choices do not come out of nowhere – they are generally presented with a flurry of bright colours and offers from profit seeking companies. These things do influence us, as much as we hate to admit it, so to work from the premise that the state is the only one that can exercise power and influence in this area is to miss the point.

The fact is that the way both neoclassical economists and libertarians think about choice is fundamentally flawed – in the case of neoclassicism, it cannot be remedied with ‘utility maximisation plus a couple of constraints’; in the case of libertarianism it cannot be remedied by saying ‘so what if people are irrational? They should be allowed to be irrational.’ Both are superficial remedies for a fundamentally flawed epistemological starting point for human action.

## On significance and model validation

22 Jan, 2013 at 12:53 | Posted in Statistics & Econometrics | 9 Comments

Let us suppose that we as educational reformers have a hypothesis that implementing a voucher system would raise the mean test results by 100 points (null hypothesis). Instead, when sampling, it turns out that it raises them by only 75 points, with a standard error (telling us how much the mean varies from one sample to another) of 20.

Does this imply that the data do not disconfirm the hypothesis? Given the usual normality assumptions on sampling distributions, with a t-value of 1.25 [(100-75)/20] the one-tailed p-value is approximately 0.11. Thus, approximately 11% of the time we would expect a score this low or lower if we were sampling from this voucher system population. That means – using the ordinary 5% significance level – we would not reject the null hypothesis, although the test has shown that it is likely – the odds are 0.89/0.11 or 8-to-1 – that the hypothesis is false.
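The arithmetic in the voucher example can be checked with Python’s standard library. This is a minimal sketch that, like the text, treats the sampling distribution as normal; the variable names are illustrative only.

```python
from statistics import NormalDist

# Numbers from the voucher example.
hypothesised_gain = 100.0  # null hypothesis: vouchers raise mean scores by 100 points
observed_gain = 75.0       # estimated gain in the sample
standard_error = 20.0      # sampling variability of the estimated mean

# Standardised distance between the estimate and the hypothesis.
t = (hypothesised_gain - observed_gain) / standard_error

# One-tailed p-value under a normal approximation:
# P(estimate <= 75 | true gain = 100) = P(Z <= -t).
p_one_tailed = NormalDist().cdf(-t)

# The ratio the post forms from the rounded values 0.89 and 0.11.
ratio = (1 - p_one_tailed) / p_one_tailed

print(f"t = {t:.2f}, one-tailed p = {p_one_tailed:.3f}, ratio = {ratio:.1f}")
```

With unrounded values the p-value is about 0.106, so the ratio comes out near 8.5 rather than exactly 8.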

In its standard form, a significance test is not the kind of “severe test” that we are looking for when we want to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis whenever it cannot be rejected at the standard 5% significance level. In their standard form, significance tests are biased against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again when significance testing is applied, people tend to read “not disconfirmed” as “probably confirmed.” But looking at our example, standard scientific methodology tells us that since there is only an 11% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. And if we perform many independent tests of our hypothesis and they all give about the same result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
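One standard way to formalise the point about many independent tests, though not one the post itself uses, is Fisher’s method for combining independent p-values. The sketch below assumes every replication yields the same one-sided p-value of 0.11 and that the tests are truly independent; the helper name is hypothetical.

```python
import math


def fisher_combined_p(p_values: list[float]) -> float:
    """Fisher's method for combining independent one-sided p-values.

    Under the null, X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom. For even degrees of freedom the upper tail
    has the closed form exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    tail = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * tail


p_single = 0.11  # the p-value from the voucher example
for k in (1, 3, 5):
    print(k, round(fisher_combined_p([p_single] * k), 4))
```

A single test gives back 0.11, while three replications at that level combine to roughly 0.04 and five to roughly 0.015. Of course, this still presupposes that the underlying model is right.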

And, most importantly, of course we should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-value of 0.11 means next to nothing if the model is wrong. As David Freedman writes in *Statistical Models and Causal Inference*:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.

## Significance tests and ladies tasting tea

22 Jan, 2013 at 00:02 | Posted in Statistics & Econometrics | Comments Off on Significance tests and ladies tasting tea

The mathematical formulations of statistics can be used to compute probabilities. Those probabilities enable us to apply statistical methods to scientific problems. In terms of the mathematics used, probability is well defined. How does this abstract concept connect to reality? How is the scientist to interpret the probability statements of statistical analyses when trying to decide what is true and what is not? …

Fisher’s use of a significance test produced a number Fisher called the p-value. This is a calculated probability, a probability associated with the observed data under the assumption that the null hypothesis is true. For instance, suppose we wish to test a new drug for the prevention of a recurrence of breast cancer in patients who have had mastectomies, comparing it to a placebo. The null hypothesis, the straw man, is that the drug is no better than the placebo …

Since [the p-value] is used to show that the hypothesis under which it is calculated is false, what does it really mean? It is a theoretical probability associated with the observations under conditions that are most likely false. It has nothing to do with reality. It is an indirect measurement of plausibility. It is not the probability that we would be wrong to say that the drug works. It is not the probability of any kind of error. It is not the probability that a patient will do as well on the placebo as on the drug.
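The post’s title alludes to Fisher’s tea-tasting experiment. As a small illustration of a p-value being a probability computed under the assumption that the null hypothesis is true, here is a Python sketch assuming the classic design: eight cups, four with milk poured first, and the lady asked to pick out those four. The function name is hypothetical.

```python
from math import comb


def tea_tasting_p_value(cups_per_kind: int, correct: int) -> float:
    """One-sided p-value for a tea-tasting design with equal numbers of each kind.

    Under the null hypothesis the lady is guessing, so every way of picking
    `cups_per_kind` cups out of 2 * cups_per_kind is equally likely. The
    p-value is the share of those picks with at least `correct` milk-first cups.
    """
    total = comb(2 * cups_per_kind, cups_per_kind)
    at_least_as_extreme = sum(
        comb(cups_per_kind, k) * comb(cups_per_kind, cups_per_kind - k)
        for k in range(correct, cups_per_kind + 1)
    )
    return at_least_as_extreme / total


# Classic design: 8 cups, 4 with milk poured first.
print(tea_tasting_p_value(4, 4))  # all four identified: 1/70
print(tea_tasting_p_value(4, 3))  # three of four identified: 17/70
```

The number is the share of pure-guessing outcomes at least as good as the one observed. As the passage above stresses, it is not the probability that the lady lacks the ability she claims.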

## P-values and the real tasks of social science

21 Jan, 2013 at 15:47 | Posted in Statistics & Econometrics | Comments Off on P-values and the real tasks of social science

After having mastered all the technicalities of regression analysis and econometrics, students often feel as though they are the masters of the universe. I usually cool them down with a required reading of **Christopher Achen**’s modern classic *Interpreting and Using Regression*.

It usually gets them back on track, and they understand that

no increase in methodological sophistication … alter the fundamental nature of the subject. It remains a wondrous mixture of rigorous theory, experienced judgment, and inspired guesswork. And that, finally, is its charm.

And in case they get too excited about having mastered the intricacies of proper significance tests and p-values, I ask them to also ponder Achen’s warning:

Significance testing as a search for specification errors substitutes calculations for substantive thinking. Worse, it channels energy toward the hopeless search for functionally correct specifications and divert attention from the real tasks, which are to formulate a manageable description of the data and to exclude competing ones.

## Of the dead there shall be no silence, but speech

20 Jan, 2013 at 23:42 | Posted in Varia | 1 Comment

###### To *Fadime Sahindal*, born 2 April 1975 in Turkey, murdered 21 January 2002 in Sweden

THE DEAD

The dead shall not be silent but speak.

Scattered torment shall find its voice,

and when the rats of the cells and the murderers’ rifle butts

have turned to ashes and ancient dust

the comet’s parabola and the daring play of the stars

shall still bear witness to those who fell against their wall:

washed in fire but not burned down to embers,

trampled, beaten, yet without a wound on their bodies,

and eyes that stared in horror shall open in peace,

and the dead shall not be silent but speak.

Of the dead there shall be no silence, but speech.

Though maimed, strangled in the cell of power,

glassy-eyed, led through cynical waiting rooms

where death has posted its propaganda of peace,

they shall rest long in the display cases of conscience,

embalmed by truth and washed in fire,

and those who have already fallen shall not be broken,

and the one who begged for mercy in a moment’s forgetfulness

shall rise and bear witness to what is not broken,

for the dead shall not be silent but speak.

No, the dead shall not be silent but speak.

Those who felt triumph on their necks shall raise their heads,

and those who were choked by smoke shall see clearly,

those who were tortured into madness shall flow like springs,

those who fell to their opposite shall themselves bring down,

those who were slain with lead shall slay with fire,

those who were hurled by waves shall themselves become storm.

And the dead shall not be silent but speak.

Erik Lindegren
