What is this thing called probability?

19 Jun, 2023 at 17:24 | Posted in Statistics & Econometrics | 2 Comments

Fitting a model that has a parameter called ‘probability’ to data does not mean that the estimated value of that parameter estimates the probability of anything in the real world. Just as the map is not the territory, the model is not the phenomenon, and calling something ‘probability’ does not make it a probability, any more than drawing a mountain on a map creates a real mountain …

In summary, the word ‘probability’ is often used with little thought about why, if at all, the term applies, and many common uses of the word are rather removed from anything in the real world that can be reasonably described or modeled as random.

Philip Stark

Modern probabilistic econometrics relies on the notion of probability. To be amenable to econometric analysis at all, economic observations allegedly have to be conceived of as random events.

But is it really necessary to model the economic system as a system where uncertainty and randomness can only be analyzed and understood when based on an a priori notion of probability?

In probabilistic econometrics, events and observations are as a rule interpreted as random variables, as if generated by an underlying probability density function. And since probability density functions are only definable within a probability context, they are a fortiori interpreted as consistent with a probability measure. Insisting that empirical economic analysis must be founded on probability models thus forces econometrics to (implicitly) interpret events as random variables generated by an underlying probability density function.

This is at odds with reality. Randomness and uncertainty obviously are facts of the real world. Probability, on the other hand, attaches to the world via intellectually constructed models, and a fortiori is only a fact of a probability-generating machine or a well-constructed (randomized) experimental arrangement or ‘chance set-up.’

Just as there is no such thing as a ‘free lunch,’ there is no such thing as a ‘free probability.’ To be able to talk about probabilities at all, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events — in statistics, any process in which you observe or measure something is referred to as an experiment (rolling a die), and the results obtained are the outcomes or events of that experiment (the number of points rolled with the die, e.g. 3 or 5) — then, strictly speaking, there is no event at all.
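To make the notion of a chance set-up concrete, here is a minimal sketch in Python (the fair-die assumption and all names are mine, purely for illustration) of what specifying such a model amounts to: a sample space and a probability for each outcome have to be laid down before any talk of ‘probability’ is meaningful.

```python
import random

# The model: a chance set-up specified explicitly.
sample_space = [1, 2, 3, 4, 5, 6]                    # possible outcomes of the die roll
model_prob = {face: 1 / 6 for face in sample_space}  # fair-die assumption

# One run of the experiment: an outcome generated by the set-up.
outcome = random.choice(sample_space)
print(f"rolled {outcome}; probability under the model: {model_prob[outcome]:.3f}")
```

Note that the probability printed here is a property of the specified model, not of the world; without the model specification there would be nothing for the number 1/6 to be a probability of.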

Probability is a relational element. It must always come with a specification of the model from which it is calculated. And then, to be of any empirical scientific value, it has to be shown to coincide with (or at least converge to) real data-generating processes or structures — something seldom or never done!
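A small illustrative simulation (Python; the biased-die set-up is an invented example, not anything claimed in the post) of why this showing matters: relative frequencies converge to whatever the actual data-generating process dictates, and the model’s ‘probability’ estimates nothing unless the model actually matches that process.

```python
import random

model_prob_six = 1 / 6               # the analyst's fair-die model
actual_weights = [1, 1, 1, 1, 1, 3]  # the real set-up (unknown to the analyst) favours six

for n in (100, 10_000, 1_000_000):
    rolls = random.choices([1, 2, 3, 4, 5, 6], weights=actual_weights, k=n)
    freq_six = rolls.count(6) / n
    print(f"n={n:>9,}: empirical frequency of six {freq_six:.4f} "
          f"vs model probability {model_prob_six:.4f}")

# The empirical frequency converges to the process's value (3/8 = 0.375),
# not to the model's 1/6: the model-relative 'probability' never estimated
# anything about this particular set-up.
```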

And this is the basic problem with economic data. If you have a fair roulette wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution, etc.? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people to believe in the existence of socioeconomic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!

From a realistic point of view, we have to admit that the socio-economic states of nature that we talk of in most social sciences — and certainly in econometrics — are not amenable to analysis in terms of probabilities, simply because in the real-world open systems that the social sciences — including econometrics — analyze, there are no probabilities to be had!

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot be maintained — as in the paradigm of probabilistic econometrics — that it even should be mandatory to treat observations and data — whether cross-section, time series or panel data — as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette wheels. Data-generating processes — at least outside of nomological machines like dice and roulette wheels — are not self-evidently best modelled with probability measures.

If we agree on this, we also have to admit that probabilistic econometrics lacks sound foundations. I would even go further and argue that there really is no justifiable rationale at all for this belief that all economically relevant data can be adequately captured by a probability measure. In most real-world contexts one has to argue one’s case. And that is obviously something seldom or never done by practitioners of probabilistic econometrics.

2 Comments

  1. It is sad to see philosophers getting their knickers in a twist whenever they discuss probability.
    If they could just relax they might realise that “probability” is just a numerical index used to communicate degrees of uncertainty.
    This is so in both popular and scientific discourse in the fields of gambling, empirical research and descriptions of events, people and nature.
    Contrary to Prof Syll, the concept of “probability” is the same in all these contexts.
    .
    This was realised by the eminent statistician R. A. Fisher:
    “probability is the most elementary of statistical concepts”
    …”the object of statistical methods is the reduction of data. A quantity of data, which usually by its mere bulk is incapable of entering the mind, is to be replaced by relatively few quantities which shall adequately represent the whole, or which, in other words, shall contain as much as possible, ideally the whole, of the relevant information contained in the original data.”
    – R. A. Fisher 1922 – On the Mathematical Foundations of Theoretical Statistics
    .
    Of course, all index numbers have limitations in that they can’t summarise all details of the available information in just one or just a few statistics. Prof Syll is trivially right to point this out. But he would be fundamentally wrong if he is arguing that this invalidates scientific discourse using numerical indices.
    .
    There are many examples of other numerical indices widely used by ordinary folk and by scientists despite their limitations.
    – GDP for the scale of economic output
    – Consumer price indices for the general level of prices and cost of living
    – Heat indices for perceived/apparent temperature
    – Richter scale for the magnitude of earthquakes
    – Saffir-Simpson Scale for hurricane winds
    – Enhanced Fujita scale for tornado intensity
    – Beaufort scale for winds at sea or on land
    – Scoville scale for the pungency of chili peppers, etc.
    – Indices of Perceived Sweetness of sugars etc
    – Weber–Fechner laws relating human perception of changes in a physical stimulus (vision, hearing, taste, touch).
    .
    The construction of all of such indices requires assumptions which are never 100% true. Even so, indices may still be sufficiently accurate to render them useful for the purposes intended.
    In the case of econometrics, the main assumptions regarding uncertainty concern the distribution of the error term. The error term is intended to allow for measurement errors, omitted variables, imperfections in proxy variables and randomness in nature.
    The Central Limit Theorem and empirical evidence suggest that the effects of these factors, taken together and even individually, may in many applications be expected to result in an approximately normal distribution of errors. (See the note at the end of this comment.)
    .
    Most laymen and applied scientists are able to communicate with each other, understanding the approximations and limitations of assumptions. They are not very concerned about the scepticism and confusions of philosophers, who seem to be impotent to produce any superior indices or techniques.
    .
    “Not the cry, but the flight of the wild duck causes the flock to follow”
    – Chinese proverb.
    —————–
    Technical Note
    By the CLT, “it can be shown that if there are a large number of independent and identically distributed random variables, then, with a few exceptions, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely.
    A variant of the CLT states that, even if the number of variables is not very large or if these variables are not strictly independent, their sum may still be normally distributed.”
    – Basic Econometrics – Gujarati & Porter 5e 2009
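    A minimal simulation sketch of the quoted claim (Python; the Uniform(0,1) summands and the sample sizes are my choices for illustration, not Gujarati & Porter’s):

    ```python
    import random

    def standardized_sum(n):
        """Sum of n iid Uniform(0,1) draws, centred and scaled to unit variance."""
        s = sum(random.random() for _ in range(n))
        mean, var = n * 0.5, n / 12  # Uniform(0,1): mean 1/2, variance 1/12
        return (s - mean) / var ** 0.5

    for n in (2, 30):
        draws = [standardized_sum(n) for _ in range(100_000)]
        # For a standard normal, about 95.45% of the mass lies within +/- 2.
        inside = sum(abs(x) <= 2 for x in draws) / len(draws)
        print(f"n={n}: share within 2 sd = {inside:.4f} (standard normal: 0.9545)")
    ```

    Even with non-normal summands, the standardized sum’s tail shares move towards the normal benchmark as n grows.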

  2. Excellent discussion. It weakens my argument that correlation (a probabilistic artifact) can be used to infer what is impossible. I think that is true only for data that fits a Gaussian distribution. Some observed data may approach a Gaussian distribution to some extent, but in econometrics only a few series do, and then not very closely.
    – John Lounsbury

