What — if anything — do p-values test?

21 Aug, 2019 at 11:43 | Posted in Statistics & Econometrics | 1 Comment

Unless enforced by study design and execution, statistical assumptions usually have no external justification; they may even be quite implausible. As a result, one often sees attempts to justify specific assumptions with statistical tests, in the belief that a high p-value or ‘‘nonsignificance’’ licenses the assumption and a low p-value refutes it. Such focused testing is based on a meta-assumption that every other assumption used to derive the p-value is correct, which is a poor judgment when some of those other assumptions are uncertain. In that case (and in general) one should recognize that the p-value is simultaneously testing all the assumptions used to compute it – in particular, a null p-value actually tests the entire model, not just the stated hypothesis or assumption it is presumed to test.

Sander Greenland

All science entails human judgement, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of significance testing is actually zero, even though we may be making valid statistical inferences! Statistical models and their concomitant significance tests are no substitute for doing real science.

In its standard form, a significance test is not the kind of ‘severe test’ we need in order to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being a strong tendency to accept the null hypothesis whenever it cannot be rejected at the standard 5% significance level. In their standard form, significance tests bias against new hypotheses by making it hard to disconfirm the null hypothesis.

And as shown over and over again in practice, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only, say, a 10% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.

Statistics is no substitute for thinking. We should never forget that the underlying parameters we use when performing significance tests are model constructions. Our p-values mean next to nothing if the model is wrong. Statistical significance tests do not validate models!
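Greenland’s point — that a p-value tests all the assumptions used to compute it, not just the stated null — is easy to demonstrate by simulation. The sketch below (not from the post, just an illustrative assumption of mine) draws data for which the null hypothesis of zero mean is actually true, but the independence assumption behind the standard one-sample t-test is violated (the observations follow an AR(1) process). The test then rejects the true null far more often than the nominal 5% level:

```python
# Illustrative sketch: a TRUE null rejected too often when a
# background assumption (independence) of the t-test fails.
import random
import math

random.seed(1)

def ar1_sample(n, rho=0.8):
    """Mean-zero AR(1) series: x_t = rho * x_{t-1} + N(0,1) noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + random.gauss(0.0, 1.0)
        out.append(x)
    return out

def rejects_null(data, crit=2.01):
    """Two-sided one-sample t-test of mean = 0.

    crit ~ 2.01 is the approximate 5% critical value for df = 49.
    """
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    t = mean / (sd / math.sqrt(n))
    return abs(t) > crit

trials = 2000
rate = sum(rejects_null(ar1_sample(50)) for _ in range(trials)) / trials
print(f"Rejection rate under a TRUE null: {rate:.2%}")  # well above 5%
```

The low p-values this procedure produces “refute” nothing about the stated hypothesis; they are testing the whole model, and it is the independence assumption that fails.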

1 Comment

  1. Here:

    https://www.bmj.com/content/bmj/363/bmj.k5094.full.pdf

    is a send-up of the concept of pre-test Bayesian probability and the trouble one can get into.

    Here is an outtake:

    Parachutes are routinely used to prevent death or major traumatic injury among individuals jumping from aircraft. However, evidence supporting the efficacy of parachutes is weak and guideline recommendations for their use are principally based on biological plausibility and expert opinion.[1,2] Despite this widely held yet unsubstantiated belief of efficacy, many studies of parachutes have suggested injuries related to their use in both military and recreational settings,[3,4] and parachutist injuries are formally recognized in the World Health Organization’s ICD-10 (international classification of diseases, 10th revision).[5] This could raise concerns for supporters of evidence-based medicine, because numerous medical interventions believed to be useful have ultimately failed to show efficacy when subjected to properly executed randomized clinical trials.[6,7]

    Previous attempts to evaluate parachute use in a randomized setting have not been undertaken owing to both ethical and practical concerns. Lack of equipoise could inhibit recruitment of participants in such a trial. However, whether pre-existing beliefs about the efficacy of parachutes would, in fact, impair the enrolment of participants in a clinical trial has not been formally evaluated. To address these important gaps in evidence, we conducted the first randomized clinical trial of the efficacy of parachutes in reducing death and major injury when jumping from aircraft.

    Then hilarity ensues.

