The appropriate use of hypothesis testing

25 May, 2014 at 09:44 | Posted in Theory of Science & Methodology | 2 Comments

Hypothesis testing and p-values are compelling because they fit so well with the Popperian model in which science advances via refutation of hypotheses. For both theoretical and practical reasons I am supportive of a (modified) Popperian philosophy of science in which models are advanced and then refuted (Gelman and Shalizi 2013). But a necessary part of falsificationism is that the models being rejected are worthy of consideration. If a group of researchers in some scientific field develops an interesting scientific model with predictive power, then I think it very appropriate to use this model for inference and to check it rigorously, eventually abandoning it and replacing it with something better if it fails to make accurate predictions in a definitive series of experiments. This is the form of hypothesis testing and falsification that is valuable to me. In common practice, however, the “null hypothesis” is a straw man that exists only to be rejected. In this case, I am typically much more interested in the size of the effect, its persistence, and how it varies across different situations. I would like to reserve hypothesis testing for the exploration of serious hypotheses and not as an indirect form of statistical inference that typically has the effect of reducing scientific explorations to yes/no conclusions.

Andrew Gelman
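Gelman’s point that the size of an effect matters more than a bare yes/no rejection can be illustrated with a small simulation (a sketch only; the effect size, the sample sizes, and the known-variance z-test are illustrative assumptions, not anything from the post): with a large enough sample, even a scientifically negligible effect will produce a “significant” p-value, so rejecting the straw-man null tells us almost nothing by itself.

```python
import math
import random

random.seed(0)

def z_test_pvalue(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test for the mean of a sample with known sigma.

    Returns the estimated effect (sample mean) and the p-value
    against the null hypothesis mean = mu0.
    """
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return mean, p

# A "true" effect of 0.02 standard deviations: scientifically negligible,
# yet guaranteed to become "statistically significant" as n grows.
effect = 0.02
for n in (100, 100_000):
    sample = [random.gauss(effect, 1.0) for _ in range(n)]
    mean, p = z_test_pvalue(sample)
    print(f"n={n:>7}  estimated effect={mean:+.4f}  p-value={p:.4f}")
```

With n = 100 the tiny effect is invisible; with n = 100,000 the p-value drops below any conventional threshold even though the estimated effect is still about 0.02 standard deviations. The yes/no conclusion flips while the scientifically relevant quantity, the effect size, stays negligible.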

Assumptions in scientific theories/models are often chosen for (mathematical) tractability, and are therefore necessarily simplifying, or for more or less self-evident reasons of theoretical consistency. But one should also remember that assumptions are selected for a specific purpose, and so the arguments put forward for selecting a specific set of assumptions (in economics shamelessly often totally non-existent) have to be judged against that background to check whether they are warranted.

This, however, only shrinks the set of assumptions minimally. It is still necessary to decide which assumptions are innocuous and which are harmful, and what constitutes interesting/important assumptions from an ontological & epistemological point of view (explanation, understanding, prediction). Especially so if you intend to apply your theories/models to a specific target system — preferably the real world. To do this, one should start by applying a real-world filter in the form of a Smell Test: Is the theory/model reasonable given what we know about the real world? If not, why should we care about it? If not — we shouldn’t apply it (remember that time is limited and economics is a science of scarcity & optimization …)


  1. Generally, I agree with your understanding of reasonability as necessary for acceptance of a theory, model, or concept. In organizations, however, the interests behind a particular approach typically fail to provide any reasoning that would allow people to understand rather than follow orders in a bewildered state. This may also be why textbooks or articles in economics fail to address the problems or the reasoning underlying a particular approach.

    If the argument were the ‘unit of analysis’, then arguments could be presented and evaluated that include assumptions, claims, evidence, reasoning, conclusions and possible rebuttals.

    Reasonability, and reasoning generally, have however come under withering fire in the 20th century as a veil of interests and as instrumental means for undisclosed ends.

    A few words about Popper and scientists: Popper made major claims about science. He claimed that the normal practice of scientists is to hypothesize and look for confirming observations or ‘experiences.’ This was called induction, and he objected that scientists always confirmed their hypotheses. So he offered a deductive model in which disconfirming observations could falsify a hypothesis. However, scientists’ everyday work was usually a process of induction: looking for observations that confirmed one’s hypotheses, or expectations!

    Popper also rejected historicism, which he defined as teleology or determinism, even though historicism had been defined by others as historical empathizing to reconstruct prior events. In effect, Popper rejected any long-term or widespread predictions! And Popper thought that the scientific method was the same for both the natural and social sciences. In doing so, he failed to address the feedback mechanism whereby human behavior changes as a result of new information, namely, confirmed or disconfirmed hypotheses, which does not happen to physical objects. Both confirming and falsifying are mental operations that have a dual effect, so to speak.

    What is probably happening, if I may speculate, is that there are normative expectations that listeners and readers agree or identify with the presenter or writer of hypotheses, theories and models. To critique any economic model on the grounds that its assumptions fail to address certain necessary contingencies, or that its conclusions and implications would spell ruin/decline for certain groups, would be to fail to instantiate the “matching capacity” that we attribute to each other. Perhaps, in this way, bogus theories and models make their way through the paper-presentation → article-publication → book-selling process.

  2. Perhaps there’s an issue in economics around a failure to distinguish sharply enough between analytical models used to do theory, and operational models used to study and describe the working of actual institutions.

    Reflecting on the paper by Michel De Vroey, which you posted on, it seems to me that the “tractability” problem may be very deep indeed for economics, so deep that analytic models are likely to be inverted analogues of their “targets”, as you call them. The central problem would seem to be radical uncertainty and risk. To solve an optimization problem in theory, the optimum must be knowable; in the actual economy, what is known is always in the process of being discovered, and, to an unknown extent, unknown. Something like the iconic Arrow-Debreu-McKenzie model of a market economy in general equilibrium can be interesting for establishing the logical requirements of such an equilibrium, but the most striking aspect of the result is surely that the economy described is nothing like the actual economy, and cannot be like the actual economy for more or less obvious reasons.

    Treating such a theoretical result as an ideal, and merely waving one’s hands at “frictions”, is not a very scientific way of confronting the facts.

    Popper would have us build operational models, incorporating auxiliary hypotheses, to describe, measure and interpret what we observe. The institutions of the actual economy must cope with radical uncertainty, and with learning, control and error under those conditions. Modeling the operation of actual institutions and measuring the “frictions” may be the practically important thing. As Michel De Vroey hinted, something as practically important as involuntary unemployment may be one of those frictions, something our understanding can only penetrate when we get serious about operational modeling of the working of actual institutions.

