Freedman’s Rabbit Theorem

13 Dec, 2022 at 11:10 | Posted in Statistics & Econometrics | 3 Comments

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great, but as renowned statistician David Freedman had it, first you must put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘superpopulations’ is one of the many dubious assumptions used in modern econometrics, and as Clint Ballinger has highlighted, this is a particularly questionable rabbit-pulling assumption:

Inferential statistics are based on taking a random sample from a larger population … and attempting to draw conclusions about a) the larger population from that data and b) the probability that the relations between measured variables are consistent or are artifacts of the sampling procedure.

However, in political science, economics, development studies and related fields the data often represents as complete an amount of data as can be measured from the real world (an ‘apparent population’). It is not the result of a random sampling from a larger population. Nevertheless, social scientists treat such data as the result of random sampling.

Because there is no source of further cases a fiction is propagated—the data is treated as if it were from a larger population, a ‘superpopulation’ where repeated realizations of the data are imagined. Imagine there could be more worlds with more cases and the problem is fixed …

What ‘draw’ from this imaginary superpopulation does the real-world set of cases we have in hand represent? This is simply an unanswerable question. The current set of cases could be representative of the superpopulation, and it could be an extremely unrepresentative sample, a one in a million chance selection from it …

The problem is not one of statistics that need to be fixed. Rather, it is a problem of the misapplication of inferential statistics to non-inferential situations.
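The contrast Ballinger draws can be made concrete with a small simulation. What follows is only a sketch under stated assumptions: the population of 1,000 values, the sample size of 50 and the helper name `sample_means` are all hypothetical and chosen for illustration. It shows that the sampling variation which standard errors are meant to measure exists when cases are drawn at random from a larger population, and is absent when the cases in hand are the whole population.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical finite population of 1,000 values (illustrative only).
population = [random.gauss(50, 10) for _ in range(1000)]


def sample_means(pop, n, draws):
    """Means of `draws` independent random samples of size n, drawn without replacement."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(draws)]


# Case (a): genuine random sampling from a larger population.
# The sample mean changes from draw to draw, and that spread is
# what a standard error is supposed to describe.
means = sample_means(population, n=50, draws=2000)
spread_sampling = statistics.stdev(means)

# Case (b): the data are the complete 'apparent population'.
# "Re-drawing" all cases only reorders them, so the mean never changes.
full = sample_means(population, n=len(population), draws=20)
spread_full = statistics.pstdev(full)

print(f"spread of means, samples of 50:     {spread_sampling:.3f}")
print(f"spread of means, full enumeration:  {spread_full:.3f}")
```

In case (b) any nonzero standard error has to come from somewhere other than the sampling procedure, which is the step at which the imagined superpopulation enters.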


  1. When gathering data, the timing matters.
    Data from the morning after a bank crisis show big bank exposures to what’s very risky.
    Data from the time before (when the rabbit is put in the hat) show those exposures being built up with what was perceived or decreed as very safe.

  2. Contrary to Prof. Syll’s comment at the start of this post, social scientists do not make any “dubious assumption” of “imaginary superpopulations”.
    It would be absurd to expect available data to reveal precise deterministic patterns of behaviour Y = f(X). Instead, social scientists look for relationships such as Y = f(X) + u, where u is a random error term. This error term is a technical device reflecting important characteristics of real-world human behaviour and available data.
    Ballinger’s argument in the passage quoted is invalid because it is based on the false belief that:
    “in political science, economics, development studies and related fields the data often represents as complete an amount of data as can be measured from the real world”.
    In fact the data available to social scientists is always incomplete and imperfect due to:
    – Vagueness of theory: the theory, if any, determining human behavior may be, and often is, incomplete.
    – Measurement/recording errors.
    – Unavailability of data: it is impossible to collect data on all of the multitude of complexities which may influence human behaviour. At best the most important variables will be measured but there are bound to be numerous omitted variables in any practical study.
    – The need to use imperfect proxy measures to approximately indicate factors which cannot be directly measured.
    – Intrinsic randomness in human behavior: even if we succeed in introducing all the relevant variables, there is bound to be some “intrinsic” randomness.
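To see what the commenter's Y = f(X) + u looks like in practice, here is a minimal sketch. It assumes a linear f with a slope of 2, one omitted variable that is independent of X, and classical measurement error; every number and the helper `ols_slope` are hypothetical, chosen only for illustration and not taken from any study.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


n = 5000
true_slope = 2.0  # assumed for illustration

x = [random.gauss(0, 1) for _ in range(n)]  # the measured driver X
z = [random.gauss(0, 1) for _ in range(n)]  # an omitted variable, independent of X
e = [random.gauss(0, 1) for _ in range(n)]  # 'intrinsic' randomness

# Y = f(X) + u, where u lumps together the omitted variable and the noise.
y = [true_slope * xi + 1.5 * zi + ei for xi, zi, ei in zip(x, z, e)]

# The same X, but recorded with measurement error.
x_obs = [xi + random.gauss(0, 1) for xi in x]

slope_clean = ols_slope(x, y)      # X measured exactly
slope_noisy = ols_slope(x_obs, y)  # X measured with error

print(f"slope with X measured exactly:   {slope_clean:.2f}")
print(f"slope with X measured with error: {slope_noisy:.2f}")
```

Under these assumptions the omitted variable and the intrinsic noise are absorbed into u without distorting the estimated slope, while the measurement error in X pulls the estimate toward zero. The sketch says nothing about where the randomness in u comes from, which is the point in dispute between the post and the comment.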

    • When do you see economists using error margins on the GDP statistic?
      In my economics MOOCs (from the IMF), why did they silently throw out the error terms?
