Statistics and econometrics — science building on fantasy worlds

28 Sep, 2021 at 11:04 | Posted in Statistics & Econometrics | 2 Comments

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘super populations’ is one of the many dubious assumptions used in modern econometrics.

As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and, strictly speaking, do not exist at all – without specifying such system-contexts. Accepting a domain of probability theory and a sample space of infinite populations also implies that judgments are made on the basis of observations that are never actually made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.
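The frequentist idealization at issue here, probability as the limiting relative frequency over an infinite sequence of trials, can be made concrete with a small simulation (a sketch using an invented fair coin, no real data involved): every run we can actually perform is finite and only ever approximates the limit that is supposed to define the probability.

```python
import random

random.seed(42)

# Frequentist idealization: P(heads) is *defined* as the limit of the
# relative frequency of heads over an infinite sequence of flips.
# Any real (finite) run merely approximates that never-observed limit.
def relative_frequency(n_flips: int, p: float = 0.5) -> float:
    heads = sum(random.random() < p for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

The point of the sketch is precisely that the defining object, the infinite sequence, never appears: each printed frequency is a finite stand-in for a limit no one ever observes.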

And as if this wasn’t enough, one could — as we’ve seen — also seriously wonder what kind of ‘populations’ these statistical and econometric models are ultimately based on. Why should we as social scientists — and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems — unquestioningly accept models based on concepts like the ‘infinite super populations’ used in e.g. the ‘potential outcome’ framework that has become so popular lately in the social sciences?

The theory requires that the data be embedded in a stochastic framework, complete with random variables, probability distributions, and unknown parameters. However, the data often arrive without benefit of randomness. In such cases, the investigators may still wish to separate effects of “the causes they wish to study or are trying to detect” from “accidental occurrences due to the many other circumstances which they cannot control.” What can they do? Usually, they follow Fisher (1922) into a fantasy world “by constructing a hypothetical infinite population, of which the actual data are regarded as constituting a random sample.” Unfortunately, this fantasy world is often harder to understand than the original problem which led to its invocation.

David Freedman & David Lane

Of course one could treat observational or experimental data as random samples from real populations. I have no problem with that (although it has to be noted that most ‘natural experiments’ are not based on random sampling from some underlying population — which, of course, means that the effect estimators, strictly speaking, are only unbiased for the specific groups studied). But probabilistic econometrics does not content itself with that kind of population. Instead, it creates imaginary populations of ‘parallel universes’ and assumes that our data are random samples from such ‘infinite super populations.’
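What ‘unbiased only for the specific groups studied’ means can be seen in a minimal numerical sketch (all numbers invented): treatment is randomized within the studied group, but the group itself was never randomly drawn from the population, so the estimate recovers that group’s effect, not the population’s.

```python
import random

random.seed(0)

# Hypothetical setup: the treatment effect is 2.0 in group A but 6.0 in
# group B, so the population-average effect is 4.0 when the groups are
# equally common. (Effects and noise are invented for illustration.)
def outcome(group: str, treated: bool) -> float:
    effect = 2.0 if group == "A" else 6.0
    return (effect if treated else 0.0) + random.gauss(0, 1)

# A 'natural experiment' whose subjects all happen to come from group A:
# treatment is effectively random *within* the group, but the group was
# never randomly sampled from the underlying population.
treated = [outcome("A", True) for _ in range(10_000)]
control = [outcome("A", False) for _ in range(10_000)]

est = sum(treated) / len(treated) - sum(control) / len(control)
print(round(est, 2))  # close to 2.0: group A's effect, not the population's 4.0
```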

But this is actually nothing else but hand-waving! And it is inadequate for real science. As David Freedman writes:

These are convenient fictions … Nevertheless, reliance on imaginary populations is widespread. Indeed regression models are commonly used to analyze convenience samples … The rhetoric of imaginary populations is seductive because it seems to free the investigator from the necessity of understanding how data were generated.

Modelling assumptions made in statistics and econometrics are more often than not made for reasons of mathematical tractability rather than verisimilitude. That is unfortunately also why the methodological ‘rigour’ encountered in statistical and econometric research is to a large degree nothing but deceptive appearance. The models constructed may seem technically advanced and very ‘sophisticated,’ but that is usually only because the problems discussed here have been swept under the carpet. Assuming that our data are generated by ‘coin flips’ in an imaginary ‘superpopulation’ only means that we get answers to questions we are not asking. The inferences made on the basis of imaginary ‘superpopulations’ are themselves nothing but imaginary. We should not — as Aristotle already noted — expect more rigour and precision than the object examined allows. And in the social sciences — including economics and econometrics — it is always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …
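A small sketch of what the ‘coin flip’ assumption buys (the distribution and its parameters are invented for illustration): the standard error attached to one fixed convenience sample is, by construction, a statement about the spread of results across re-draws from an assumed superpopulation, re-draws that exist only in the model.

```python
import random
import statistics

random.seed(1)

# One fixed 'convenience sample' of 200 observations (invented data).
data = [random.gauss(10, 2) for _ in range(200)]

# The textbook standard error of the mean presupposes that `data` is an
# iid draw from some superpopulation — the 'coin flips' discussed above.
se = statistics.stdev(data) / len(data) ** 0.5

# What that SE actually describes: the spread of sample means across
# hypothetical re-draws ('parallel universes') that are never observed.
redraw_means = [
    statistics.mean(random.gauss(10, 2) for _ in range(200))
    for _ in range(2_000)
]
print(round(se, 3), round(statistics.stdev(redraw_means), 3))
```

The two printed numbers agree closely, which is exactly the critic’s point: the standard error is an accurate description of something — but that something is a set of imaginary replications, not any feature of the data actually in hand.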


    “…the object of statistical methods is the reduction of data. A quantity of data, which usually by its mere bulk is incapable of entering the mind, is to be replaced by relatively few quantities which shall adequately represent the whole, or which, in other words, shall contain as much as possible, ideally the whole, of the relevant information contained in the original data.
    This object is accomplished by constructing a hypothetical infinite population, of which the actual data are regarded as constituting a random sample. The law of distribution of this hypothetical population is specified by relatively few parameters, which are sufficient to describe it exhaustively in respect of all qualities under discussion. Any information given by the sample, which is of use in estimating the values of these parameters, is relevant information. Since the number of independent facts supplied in the data is usually far greater than the number of facts sought, much of the information supplied by any actual sample is irrelevant. It is the object of the statistical processes employed in the reduction of data to exclude this irrelevant information, and to isolate the whole of the relevant information contained in the data.”

    “It should be noted that there is no falsehood in interpreting any set of independent measurements as a random sample from an infinite population; for any such set of numbers are a random sample from the totality of numbers produced by the same matrix of causal conditions: the hypothetical population which we are studying is an aspect of the totality of the effects of these conditions, of whatever nature they may be.”

    “As regards problems of specification, these are entirely a matter for the practical statistician, for those cases where the qualitative nature of the hypothetical population is known do not involve any problems of this type. In other cases we may know by experience what forms are likely to be suitable, and the adequacy of our choice may be tested a posteriori.”
    – R. A. Fisher 1922: On the Mathematical Foundations of Theoretical Statistics

    The methodology of econometricians reflects the innate learning processes found in humans and other animals:
    1) Statistical concepts are innate in humans, e.g. in the rapid learning of toddlers that bumps are correlated with pain depending on speed and direction of movement, hardness of objects, part of body, etc.
    Children begin to develop cause-and-effect thinking skills as early as eight months of age. They assume that regular causal mechanisms operate in the real world.
    Evidence of the development of causal reasoning in toddlers and young children is consistent with Causal Graphical Models (CGMs). These are “representations of a joint probability distribution — a list of all possible combinations of events and the probability that each combination occurs.”
    This is just like econometrics.
    – Sobel & Kirkham – Developmental Psychology 2006: “Blickets and Babies: The Development of Causal Reasoning in Toddlers and Infants”
    – Sobel & Legare – Cognitive Science 2014: “Causal learning in children”
    2) There is overwhelming empirical evidence that our ancestors were very good at statistical thinking – that’s how they survived and we evolved. They succeeded at hunting, fishing, scavenging, foraging, marauding, philandering etc. by making judgements about prospects in an uncertain and dangerous world with imperfect information.
    A study of indigenous Maya people found that “humans have an innate grasp of probability” and that “probabilistic reasoning does not depend on formal education”.
    3) Even chimps and many other animals naturally think probabilistically.
    From a series of 7 experiments with Bonobos, Chimpanzees, Gorillas and Orang-utans it was concluded that:
    “a basic form of drawing inferences from populations to samples is not uniquely human, but evolutionarily more ancient:
    It is shared by our closest living primate relatives, the great apes, and perhaps by other species in the primate lineage and beyond and it thus clearly antedates language and formal mathematical thinking both phylogenetically and ontogenetically.”
    – Rakoczy et al. (2014) – Apes are intuitive statisticians. Cognition, 131(1):60-8


    • Kingsley, suppose an anthropologist observed me setting up a rain shelter in the woods, could he force a statistical story on my behavior, where I was drawing inferences from samples to populations, or whatnot?
      But if they just asked me would I tell a story of hunches, rules of thumb, some application of basic physics and geometry, but a lot more just winging it?
      As long as my story keeps me dry (my self-determined preference), what does statistics add?
