Textbooks problem — teaching the wrong things all too well

25 March, 2017 at 16:01 | Posted in Statistics & Econometrics | 2 Comments

It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis is true …

It is shocking that these errors seem so hard-wired into statisticians’ thinking, and this suggests that our profession really needs to look at how it teaches the interpretation of statistical inferences. The problem does not seem just to be technical misunderstandings; rather, statistical analysis is being asked to do something that it simply can’t do, to bring out a signal from any data, no matter how noisy. We suspect that, to make progress in pedagogy, statisticians will have to give up some of the claims we have implicitly been making about the effectiveness of our methods …

It would be nice if the statistics profession was offering a good solution to the significance testing problem and we just needed to convey it more clearly. But, no, … many statisticians misunderstand the core ideas too. It might be a good idea for other reasons to recommend that students take more statistics classes—but this won’t solve the problems if textbooks point in the wrong direction and instructors don’t understand what they are teaching. To put it another way, it’s not that we’re teaching the right thing poorly; unfortunately, we’ve been teaching the wrong thing all too well.

Andrew Gelman & John Carlin
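
One of the misreadings listed in the quote, taking a small p-value as the probability that the null hypothesis is true, can be made concrete with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function name and all three inputs (the share of true nulls among tested hypotheses, the significance level and the power) are hypothetical and do not come from Gelman and Carlin. It also conditions on "the test rejected at level alpha" rather than on one exact p-value.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(null is true | test rejected), by Bayes' rule.

    prior_null: assumed share of tested hypotheses where the null is true
    alpha:      significance level, P(reject | null true)
    power:      P(reject | null false)
    """
    false_positives = alpha * prior_null
    true_positives = power * (1.0 - prior_null)
    return false_positives / (false_positives + true_positives)


# Hypothetical inputs: 90% of tested nulls are true, alpha = 0.05, power = 0.5.
share = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.5)
print(f"{share:.3f}")  # prints 0.474
```

Under these made-up inputs, almost half of the 'significant' findings come from true nulls, although each of them cleared the 0.05 threshold. The number depends entirely on the assumed prior share and power, which the p-value itself does not supply.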

Teaching both statistics and economics, yours truly can’t but notice that the statements “give up some of the claims we have implicitly been making about the effectiveness of our methods” and “it’s not that we’re teaching the right thing poorly; unfortunately, we’ve been teaching the wrong thing all too well” obviously apply not only to statistics …

And the solution? Certainly not — as Gelman and Carlin also underline — to reform p-values. Instead we have to accept that we live in a world permeated by genuine uncertainty and that it takes a lot of variation to make good inductive inferences.

Sounds familiar? It definitely should!

The standard view in statistics – and the axiomatic probability theory underlying it – is to a large extent based on the rather simplistic idea that ‘more is better.’ But as Keynes argues in his seminal A Treatise on Probability (1921), ‘more of the same’ is not what is important when making inductive inferences. It’s rather a question of ‘more but different’ — i.e., variation.

Variation, not replication, is at the core of induction. Finding that p(x|y) = p(x|y & w) doesn’t make w ‘irrelevant.’ Knowing that the probability is unchanged when w is present gives p(x|y & w) another evidential weight (‘weight of argument’). Running 10 replicative experiments does not make you as ‘sure’ of your inductions as when running 10 000 varied experiments – even if the probability values happen to be the same.
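
A small numerical sketch may help to see what p(x|y) = p(x|y & w) looks like in practice. The counts below are entirely hypothetical and chosen so that the two conditional probabilities coincide; the function name is likewise only illustrative. The point of the exercise is what the output leaves out: the two numbers are identical, and nothing in them records how much, or how varied, the evidence behind each one is.

```python
from fractions import Fraction

# Hypothetical observation counts keyed by (x, y, w); every case has y = True.
counts = {
    (True, True, True): 30,
    (False, True, True): 20,
    (True, True, False): 60,
    (False, True, False): 40,
}


def cond_prob_x(counts, require_w=None):
    """Return p(x | y) when require_w is None, else p(x | y & w == require_w)."""
    selected = {
        key: n
        for key, n in counts.items()
        if key[1] and (require_w is None or key[2] == require_w)
    }
    total = sum(selected.values())
    hits = sum(n for key, n in selected.items() if key[0])
    return Fraction(hits, total)


p_x_given_y = cond_prob_x(counts)                      # 90 / 150
p_x_given_y_and_w = cond_prob_x(counts, require_w=True)  # 30 / 50

print(p_x_given_y, p_x_given_y_and_w)  # prints 3/5 3/5
```

On the probability calculus alone, w drops out as ‘irrelevant’ because both values are 3/5. Keynes’s ‘weight of argument’ is precisely the part of the evidential situation that such a calculation does not express.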

According to Keynes, we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but ‘rational expectations.’ Keynes rather thinks that we base our expectations on the confidence or ‘weight’ we put on different events and alternatives. To Keynes, expectations are a question of weighing probabilities by ‘degrees of belief,’ beliefs that often have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents as modeled by “modern” social sciences. And often we ‘simply do not know.’



  1. It is more than 40 years since I studied statistics, but I have two stories to tell.

    I was a student in a Master of Social Work programme and was completing my practicum requirement in an agency in a nearby city. I was asked to liaise with a part-time worker conducting research into community attitudes and values and comparing the youth to the parents via survey. His survey results achieved significance at .95 if memory serves (he used a chi-square analysis) and he accepted the null hypothesis that there was no difference. Funding for programmes for youth in the community hinged on the survey results. Of course, you know already what was wrong. What was interesting for me was that two other researchers who had been hired to supervise the student allowed him to accept the null hypothesis. My supervisor, who had ultimate authority, also allowed it. So there was a panic in the upper echelons of the agency until I went through his work and realized what he had done. It was corrected and the funding was approved. My treasured copy of the research is autographed by the student who gave me a new nickname: “Stats.”

    Earlier that same year I was taking a course in advanced statistics and the night before the exam I went out partying to celebrate the end of the term. So when I wrote the exam the next day I was still somewhat impaired. There was a question on the exam requiring a plan as to how to analyze some results. When I got the exam back the professor had made the comment that “… while [my] answer was unusual or atypical for that problem, [I] had convinced him that it would work in this instance.” Being sober when I reviewed the paper I groaned because in the (literally) sober light of day, I knew it would not work. BS baffles brains sometimes!

  2. According to Prof. Syll: “To Keynes expectations are a question of weighing probabilities by ‘degrees of belief’”
    To clarify and illustrate this methodology, please reference practical examples where it is applied to the real world.
    The need for examples is explained below:
