Why p-values cannot be taken at face value

24 Oct, 2016 at 09:00 | Posted in Economics, Statistics & Econometrics | 7 Comments

A researcher is interested in differences between Democrats and Republicans in how they perform in a short mathematics test when it is expressed in two different contexts, either involving health care or the military. The research hypothesis is that context matters, and one would expect Democrats to do better in the health-care context and Republicans in the military context … At this point there is a huge number of possible comparisons that can be performed—all consistent with the data. For example, the pattern could be found (with statistical significance) among men and not among women—explicable under the theory that men are more ideological than women. Or the pattern could be found among women but not among men—explicable under the theory that women are more sensitive to context, compared to men … A single overarching research hypothesis—in this case, the idea that issue context interacts with political partisanship to affect mathematical problem-solving skills—corresponds to many different possible choices of the decision variable.

At one level, these multiplicities are obvious. And it would take a highly unscrupulous researcher to perform test after test in a search for statistical significance … Given a particular data set, it is not so difficult to look at the data and construct completely reasonable rules for data exclusion, coding, and data analysis that can lead to statistical significance—thus, the researcher needs only perform one test, but that test is conditional on the data … A researcher when faced with multiple reasonable measures can reason (perhaps correctly) that the one that produces a significant result is more likely to be the least noisy measure, but then decide (incorrectly) to draw inferences based on that one only.

Andrew Gelman & Eric Loken
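
A rough simulation can illustrate the multiplicity described in the passage above. The sketch below is not Gelman and Loken's own analysis. It assumes pure-noise test scores with known unit variance, a hypothetical two-group comparison, and a researcher who would report whichever of three comparisons (everyone, men only, women only) reached significance. The function names `two_sided_p` and `simulate` and all parameter values are invented for illustration.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(a, b):
    """Two-sided z-test p-value for a difference in means.

    Assumption: every score is drawn with known unit variance,
    so the z statistic is exactly standard normal under the null.
    """
    z = (fmean(a) - fmean(b)) / (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_studies=2000, n_per_cell=50, alpha=0.05, seed=1):
    """Return (planned_rate, flexible_rate) of false positives.

    planned:  only the pre-specified comparison on all participants.
    flexible: a 'finding' is declared if the overall, men-only, or
              women-only comparison is significant.
    """
    rng = random.Random(seed)
    planned = flexible = 0
    for _ in range(n_studies):
        # Pure noise: scores depend on neither group nor gender.
        men_a = [rng.gauss(0, 1) for _ in range(n_per_cell)]
        men_b = [rng.gauss(0, 1) for _ in range(n_per_cell)]
        women_a = [rng.gauss(0, 1) for _ in range(n_per_cell)]
        women_b = [rng.gauss(0, 1) for _ in range(n_per_cell)]

        p_all = two_sided_p(men_a + women_a, men_b + women_b)
        p_men = two_sided_p(men_a, men_b)
        p_women = two_sided_p(women_a, women_b)

        planned += p_all < alpha
        flexible += min(p_all, p_men, p_women) < alpha
    return planned / n_studies, flexible / n_studies


if __name__ == "__main__":
    planned_rate, flexible_rate = simulate()
    print(f"pre-specified comparison: {planned_rate:.3f}")
    print(f"any of three comparisons: {flexible_rate:.3f}")
```

Under these assumptions the pre-specified comparison should reject close to the nominal 5% of the time, while the "any of three" rule rejects noticeably more often, even though the simulated data contain no effect at all. Taking the minimum p-value is only a crude stand-in for the data-dependent choices described in the quoted passage, where a researcher may run just one test that was selected after looking at the data.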

7 Comments

  1. And for those unfamiliar with the Monty Hall controversy:
    .
    //An earlier version, the Three Prisoner Problem, was analyzed in 1959 by Martin Gardner in the journal Scientific American. He called it “a wonderfully confusing little problem” and presciently noted that “in no other branch of mathematics is it so easy for experts to blunder as in probability theory.”
    .
    The experts responded in force to Ms. vos Savant’s column. Of the critical letters she received, close to 1,000 carried signatures with Ph.D.’s, and many were on letterheads of mathematics and science departments.
    .
    “Our math department had a good, self-righteous laugh at your expense,” wrote Mary Jane Still, a professor at Palm Beach Junior College. Robert Sachs, a professor of mathematics at George Mason University in Fairfax, Va., expressed the prevailing view that there was no reason to switch doors.
    .
    “You blew it!” he wrote. “Let me explain: If one door is shown to be a loser, that information changes the probability of either remaining choice — neither of which has any reason to be more likely — to 1/2. As a professional mathematician, I’m very concerned with the general public’s lack of mathematical skills. Please help by confessing your error and, in the future, being more careful.” //

  2. “The technique of hypothesis testing assumes that the underlying distribution of the null hypothesis is known. Isn’t this in reality an almost intractable problem?”
    .
    It’s only an intractable problem if you don’t have a properly randomized control group.

    • Michael,
      .
      How does one know when one has one?

      • “How does one know when one has one?”
        .
        That’s sort of like asking, “How does one know how to fly an airplane?” With a lot of training and a lot of practice, and even then the best-trained professionals can make the occasional mistake.
        .
        The main difference is that in aviation, the evidence of serious error is usually immediate, obvious, and irrefutable, whereas with statistics the evidence of error is obscure, hard to understand, and harder to communicate. Consequently, such errors are far more common, far less likely to be detected, and far less likely to be corrected.
        .
        For a sobering illustration, consider the “Monty Hall Dilemma”:
        .
        “Across experiments, the probability of gaining reinforcement for switching and staying was manipulated, and birds adjusted their probability of switching and staying to approximate the optimal strategy. Replication of the procedure with human participants showed that humans failed to adopt optimal strategies, even with extensive training.”
        .
        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3086893/
        .
        Not only do pigeons outperform humans at aviation; in certain circumstances they outperform humans at statistical analysis as well.

  3. “The technique of hypothesis testing assumes that the underlying distribution of the null hypothesis is known. Isn’t this in reality an almost intractable problem?”
    .
    I think the problem is even more fundamental than that.
    .
    Any test assumes that the “truth” resides in the centre. The law of central tendency prevails.
    .
    All other measurements/events are presumed to be random and constitute what is summarily dismissed as noise. Variability is shunned.
    .
    Like “probability”, “randomness” is another abstract construct, pressed into service because really we don’t understand the phenomenon under study. Like probability, it is a way of dealing with our lack of understanding. We believe that if we can describe a phenomenon by a mean and a distribution of some sort then we have the tiger caged.
    .
    It seems to me science should get interested in variability per se and look for the truth in the noise as much as in the centre.

  4. //Arguing about p won’t help if the real issue is that we don’t know how to describe experiments, how to collect data, or how to publish results in ways that can be used effectively by others.
    .
    Here’s a radical proposal. Let’s consider the crisis of reproducibility as an opportunity to think about the scientific enterprise itself.//
    .
    https://www.oreilly.com/ideas/p-values-not-quite-considered-harmful

  5. Required reading/interaction to understand what p-values are, and are not:
    .
    http://rpsychologist.com/d3/NHST/
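
The Monty Hall result discussed in the comments above is easy to check by simulation. The sketch below assumes the standard version of the game: three doors, one car, and a host who always opens a losing door that the contestant did not pick. The names `play` and `win_rate` are made up for this illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability for always switching or always staying."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"always switch: {win_rate(True):.3f}")
    print(f"always stay:   {win_rate(False):.3f}")
```

With these rules, switching should win about two times in three and staying about one time in three, which is the answer vos Savant gave and the letter writers disputed.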

