Model selection and the reference class problem (wonkish)
12 September, 2015 at 16:20 | Posted in Theory of Science & Methodology
The reference class problem arises when we want to assign a probability to a single proposition, X, which may be classified in various ways, yet its probability can change depending on how it is classified. (X may correspond to a sentence, or event, or an individual’s instantiating a given property, or the outcome of a random experiment, or a set of possible worlds, or some other bearer of probability.) X may be classified as belonging to set S1, or to set S2, and so on. Qua member of S1, its probability is p1; qua member of S2, its probability is p2, where p1 ≠ p2; and so on. And perhaps qua member of some other set, its probability does not exist at all …
Now, the bad news. Giving primacy to conditional probabilities does not so much rid us of the epistemological reference class problem as give us another way of stating it. Which of the many conditional probabilities should guide us, should underpin our inductive reasonings and decisions? Our friend John Smith is still pondering his prospects of living at least eleven more years as he contemplates buying life insurance. It will not help him much to tell him of the many conditional probabilities that apply to him, each relativized to a different reference class: “conditional on your being an Englishman, your probability of living to 60 is x; conditional on your being consumptive, it is y; …”. (By analogy, when John Smith is pondering how far away is London, it will not help him much to tell him of the many distances that there are, each relative to a different reference frame.) If probability is to serve as a guide to life, it should in principle be possible to designate one of these conditional probabilities as the right one. To be sure, we could single out one conditional probability among them, and insist that that is the one that should guide him. But that is tantamount to singling out one reference class of the many to which he belongs, and claiming that we have solved the original reference class problem. Life, unfortunately, is not that easy—and neither is our guide to life.
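To make this relativity concrete, here is a minimal Python sketch that assumes a small, made-up population. The `Person` records, their counts and the attribute names are hypothetical and are not actuarial data. John Smith belongs to every reference class listed, yet the relative frequency of living to 60 differs from class to class, and an empty class yields no probability at all.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Person:
    english: bool
    consumptive: bool
    lived_to_60: bool


# Hypothetical toy population: invented counts, for illustration only.
POPULATION = (
    [Person(True, False, True)] * 80 + [Person(True, False, False)] * 20
    + [Person(True, True, True)] * 2 + [Person(True, True, False)] * 8
    + [Person(False, False, True)] * 50 + [Person(False, False, False)] * 30
    + [Person(False, True, True)] * 3 + [Person(False, True, False)] * 7
)


def relative_frequency(population, in_class: Callable[[Person], bool]) -> Optional[float]:
    """Share of the reference class that lived to 60, or None if the class is empty."""
    members = [p for p in population if in_class(p)]
    if not members:
        return None  # no members, so no relative frequency to report
    return sum(p.lived_to_60 for p in members) / len(members)


# Reference classes that a consumptive Englishman such as John Smith belongs to.
REFERENCE_CLASSES = {
    "everyone": lambda p: True,
    "Englishman": lambda p: p.english,
    "consumptive": lambda p: p.consumptive,
    "consumptive Englishman": lambda p: p.english and p.consumptive,
}

for name, in_class in REFERENCE_CLASSES.items():
    print(f"{name:>24}: {relative_frequency(POPULATION, in_class):.3f}")
```

With these invented counts the four classes give 0.675, about 0.745, 0.25 and 0.2. Nothing in the data itself says which of these numbers is "his" probability. That choice has to be made on other grounds.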
When choosing which models to use in our analyses, we cannot get around the fact that our hypotheses, explanations, and predictions cannot be evaluated without reference to a specific statistical model or framework. What Hájek so eloquently points out is that the probabilistic-statistical inferences we draw from our samples depend decisively on which population we choose to refer to. The reference class problem shows that there are usually many such populations to choose from, and that the one we choose decides which probabilities we come up with and, a fortiori, which predictions we make. Failing to consciously consider the relativity effects that this choice of “nomological-statistical machines” has is probably one of the reasons economists have a false sense of how much uncertainty really afflicts their models.
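The point about a false sense of uncertainty can be sketched in the same way. The following minimal Python example uses hypothetical counts of survivors per reference class, invented purely for illustration. It compares the usual binomial (Wald) standard error computed inside each class with the spread of the point estimates across classes. The standard error only measures sampling variability once a class has been chosen. It says nothing about the choice of class itself.

```python
import math

# Hypothetical (survivors, class size) counts for classes John Smith belongs to.
CLASS_COUNTS = {
    "everyone": (1350, 2000),
    "Englishman": (820, 1100),
    "consumptive": (50, 200),
    "consumptive Englishman": (20, 100),
}


def binomial_estimate(successes: int, n: int) -> tuple[float, float]:
    """Relative frequency and its Wald standard error within a single class."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)


def summarize(counts: dict[str, tuple[int, int]]):
    """Per-class estimates, the largest within-class SE, and the spread across classes."""
    estimates = {name: binomial_estimate(s, n) for name, (s, n) in counts.items()}
    largest_se = max(se for _, se in estimates.values())
    point_estimates = [p for p, _ in estimates.values()]
    spread = max(point_estimates) - min(point_estimates)
    return estimates, largest_se, spread


estimates, largest_se, spread = summarize(CLASS_COUNTS)
for name, (p, se) in estimates.items():
    print(f"{name:>24}: p = {p:.3f} +/- {se:.3f}")
print(f"largest within-class SE:         {largest_se:.3f}")
print(f"spread across reference classes: {spread:.3f}")
```

With these made-up numbers the largest within-class standard error is 0.04, while the estimates range over more than 0.5. Reporting a single class's estimate together with its standard error would therefore understate how uncertain the prediction really is.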