## Sometimes we do not know because we cannot know

18 April, 2018 at 17:11 | Posted in Economics, Statistics & Econometrics

Some time ago, Bank of England’s Andrew G Haldane and Benjamin Nelson presented a paper with the title *Tails of the unexpected*. The main message of the paper was that we should not let ourselves be fooled by randomness:

The normal distribution provides a beguilingly simple description of the world. Outcomes lie symmetrically around the mean, with a probability that steadily decays. It is well-known that repeated games of chance deliver random outcomes in line with this distribution: tosses of a fair coin, sampling of coloured balls from a jam-jar, bets on a lottery number, games of paper/scissors/stone. Or have you been fooled by randomness?

Normality has been an accepted wisdom in economics and finance for a century or more. Yet in real-world systems, nothing could be less normal than normality. Tails should not be unexpected, for they are the rule. As the world becomes increasingly integrated – financially, economically, socially – interactions among the moving parts may make for potentially fatter tails. Catastrophe risk may be on the rise.

If public policy treats economic and financial systems as though they behave like a lottery – random, normal – then public policy risks itself becoming a lottery. Preventing public policy catastrophe requires that we better understand and plot the contours of systemic risk, fat tails and all. It also means putting in place robust fail-safes to stop chaos emerging, the sand pile collapsing, the forest fire spreading. Until then, normal service is unlikely to resume.

Since I think this is a great paper, it merits a couple of comments.

To understand real-world “non-routine” decisions and unforeseeable changes in behaviour, ergodic probability distributions are of no avail. In a world full of genuine uncertainty – where real historical time rules the roost – the probabilities that ruled the past are not those that will rule the future.

Time is what prevents everything from happening at once. To simply assume that economic processes are ergodic and concentrate on ensemble averages – and a fortiori in any relevant sense timeless – is not a sensible way of dealing with the kind of genuine uncertainty that permeates open systems such as economies.

When you assume economic processes to be ergodic, ensemble and time averages are identical. Let me give an example: assume we have a market with an asset priced at 100 €. Then imagine the price first goes up by 50% and then later falls by 50%. The ensemble average for this asset would be 100 €, because here we envision two parallel universes (markets): in one universe (market) the asset price falls by 50% to 50 €, and in the other it rises by 50% to 150 €, giving an average of 100 € ((150+50)/2). The time average for this asset would be 75 €, because here we envision one universe (market) where the asset price first rises by 50% to 150 € and then falls by 50% to 75 € (0.5*150).

From the ensemble perspective nothing really, on average, happens. From the time perspective lots of things really, on average, happen.
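The arithmetic of the example can be checked directly. This is a minimal sketch; the constants and function names are illustrative, and the `repeated_rounds` extension (ten rounds, each a fair coin flip between +50% and -50%) is an added assumption, not part of the original example. It shows the same contrast on a larger scale: the ensemble mean stays at exactly 100 €, while roughly 83% of individual histories end below where they started.

```python
import math

START = 100.0          # initial asset price in euros
UP, DOWN = 1.5, 0.5    # +50% and -50% as multiplicative factors


def ensemble_average(start):
    """Average across two parallel markets: one rises 50%, the other falls 50%."""
    return (start * UP + start * DOWN) / 2


def time_path(start):
    """A single market that first rises 50% and then falls 50%."""
    return start * UP * DOWN


def repeated_rounds(start, rounds):
    """Hypothetical extension: each round is a fair coin flip between UP and DOWN.

    Returns (ensemble mean after `rounds`, share of histories ending below `start`),
    computed exactly by enumerating how many up-moves each history contains.
    """
    total = 2 ** rounds
    mean = 0.0
    below = 0
    for ups in range(rounds + 1):
        paths = math.comb(rounds, ups)  # number of histories with this many up-moves
        final = start * UP ** ups * DOWN ** (rounds - ups)
        mean += paths * final / total
        if final < start:
            below += paths
    return mean, below / total


print(ensemble_average(START))  # 100.0
print(time_path(START))         # 75.0
mean10, share_below10 = repeated_rounds(START, 10)
print(round(mean10, 6), share_below10)  # 100.0 0.828125
```

Exact enumeration is used instead of random simulation because the ensemble mean of a multiplicative process is dominated by rare lucky paths, so a simulated average would be noisy.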

Assuming ergodicity, there would have been no difference at all. What matters about real social and economic processes being nonergodic is that uncertainty – not risk – rules the roost. That was something both Keynes and Knight basically said in their 1921 books. Thinking about uncertainty in terms of “rational expectations” and “ensemble averages” has had seriously bad repercussions on the financial system.

Knight’s concept of uncertainty has an epistemological foundation, and Keynes’ definitely an ontological one. Of course, this also has repercussions on the issue of ergodicity in a strict methodological and mathematical-statistical sense. I think Keynes’ view is the more warranted of the two.

The most interesting and far-reaching difference between the epistemological and the ontological view is that if one subscribes to the former, Knightian view – as Taleb, Haldane & Nelson and “black swan” theorists basically do – you open the door to the mistaken belief that with better information and greater computing power we somehow should always be able to calculate probabilities and describe the world as an ergodic universe. As Keynes convincingly argued, that is ontologically just not possible.

If probability distributions do not exist for certain phenomena, those distributions are not only not knowable, but the whole question regarding whether they can or cannot be known is beside the point. Keynes essentially says this when he asserts that sometimes they are simply unknowable.

To Keynes, the source of uncertainty was in the nature of the real — nonergodic — world. It had to do, not only — or primarily — with the epistemological fact of us not knowing the things that today are unknown, but rather with the much deeper and far-reaching ontological fact that there often is no firm basis on which we can form quantifiable probabilities and expectations at all.

Sometimes we *do not* know because we *cannot* know.

## Shortcomings of regression analysis

16 April, 2018 at 08:45 | Posted in Statistics & Econometrics

Distinguished social psychologist Richard E. Nisbett has a somewhat atypical aversion to multiple regression analysis. In his *Intelligence and How to Get It* (Norton 2011) he writes:

Researchers often determine the individual’s contemporary IQ or IQ earlier in life, socioeconomic status of the family of origin, living circumstances when the individual was a child, number of siblings, whether the family had a library card, educational attainment of the individual, and other variables, and put all of them into a multiple-regression equation predicting adult socioeconomic status or income or social pathology or whatever. Researchers then report the magnitude of the contribution of each of the variables in the regression equation, net of all the others (that is, holding constant all the others). It always turns out that IQ, net of all the other variables, is important to outcomes. But … the independent variables pose a tangle of causality – with some causing others in goodness-knows-what ways and some being caused by unknown variables that have not even been measured. Higher socioeconomic status of parents is related to educational attainment of the child, but higher-socioeconomic-status parents have higher IQs, and this affects both the genes that the child has and the emphasis that the parents are likely to place on education and the quality of the parenting with respect to encouragement of intellectual skills and so on. So statements such as “IQ accounts for X percent of the variation in occupational attainment” are built on the shakiest of statistical foundations. What nature hath joined together, multiple regressions cannot put asunder.
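Nisbett's "tangle of causality" can be made concrete with a small simulation. The data-generating process below is invented for illustration and is not taken from Nisbett or the post: parental SES (`ses`) affects both a child's IQ score (`iq`) and an adult outcome, and a hypothetical unmeasured factor (`hidden`) affects both as well. The true effect of `iq` is set to 0.5. The "net of all the others" coefficient on `iq` then depends on which variables happen to be in the equation: about 1.11 with `iq` alone, about 0.81 net of `ses`, and 0.5 only when the factor we would not normally observe is included. These are the population values implied by the assumed coefficients; the sample estimates differ slightly. The small `ols` helper solves the normal equations directly so the sketch needs only the standard library.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *values] for values in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(42)
N = 20_000
ses = [rng.gauss(0, 1) for _ in range(N)]     # parental SES (measured)
hidden = [rng.gauss(0, 1) for _ in range(N)]  # hypothetical unmeasured factor
iq = [0.8 * s + 0.6 * h + rng.gauss(0, 1) for s, h in zip(ses, hidden)]
outcome = [1.0 * s + 0.5 * q + 0.7 * h + rng.gauss(0, 1)
           for s, q, h in zip(ses, iq, hidden)]

# Coefficient on iq (index 1) under three specifications
iq_alone = ols([iq], outcome)[1]                    # population value about 1.11
iq_net_of_ses = ols([iq, ses], outcome)[1]          # population value about 0.81
iq_net_of_all = ols([iq, ses, hidden], outcome)[1]  # population value 0.5
print(round(iq_alone, 2), round(iq_net_of_ses, 2), round(iq_net_of_all, 2))
```

Adding `ses` moves the estimate toward the assumed true value but does not reach it, because the remaining confounder is not in the data.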

Now, I think that what Nisbett says is right as far as it goes, although it would certainly have strengthened Nisbett’s argument if he had elaborated more on the methodological question of causality, or at least had given some mathematical-statistical-econometric references. Unfortunately, his alternative approach is not more convincing than regression analysis. Like so many other social scientists today, Nisbett seems to think that randomization may solve the empirical problem. By randomizing we are getting different “populations” that are homogeneous with regard to all variables except the one we think is a genuine cause. In this way, we supposedly do not need to know what all these other factors actually are.

If you succeed in performing an *ideal* randomization with different treatment groups and control groups, that is attainable. *But* it presupposes that you really have been able to establish (and not just assume) that all causes other than the putative one have the same probability distribution in the treatment and control groups, and that the probability of assignment to treatment or control groups is independent of all other possible causal variables.

Unfortunately, *real* experiments and *real* randomizations seldom or never achieve this. So, yes, we may do without knowing *all* causes, but it takes *ideal* experiments and *ideal* randomizations to do that, not *real* ones.

As I have argued — e. g. here — that means that in practice we do have to have sufficient background knowledge to deduce causal knowledge. Without old knowledge, we can’t get new knowledge — and, no causes in, no causes out.
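The gap between ideal and real randomization can be seen in a toy example with invented numbers. Twenty units carry a background cause that nobody measures. Averaged over many possible random assignments, the treated-minus-control difference in that cause is close to zero, which is the sense in which randomization balances it in expectation. Any single assignment, the one a real experiment actually runs with, can still be noticeably unbalanced; the sketch counts how often the imbalance exceeds 0.5, half the standard deviation of the distribution the background cause is drawn from (an arbitrary threshold chosen for illustration).

```python
import random
import statistics


def assignment_imbalance(background, rng):
    """Treated-minus-control difference in mean background cause for one random split."""
    units = list(range(len(background)))
    rng.shuffle(units)
    half = len(units) // 2
    treated = [background[i] for i in units[:half]]
    control = [background[i] for i in units[half:]]
    return statistics.fmean(treated) - statistics.fmean(control)


rng = random.Random(1)
background = [rng.gauss(0, 1) for _ in range(20)]  # hypothetical unmeasured cause, 20 units

draws = [assignment_imbalance(background, rng) for _ in range(10_000)]
average_imbalance = statistics.fmean(draws)                  # "ideal": over all randomizations
share_large = sum(abs(d) > 0.5 for d in draws) / len(draws)  # single "real" randomizations
print(round(average_imbalance, 3), round(share_large, 3))
```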

On the issue of the shortcomings of multiple regression analysis, no one sums it up better than eminent mathematical statistician David Freedman:

If the assumptions of a model are not derived from theory, and if predictions are not tested against reality, then deductions from the model must be quite shaky. However, without the model, the data cannot be used to answer the research question …

In my view, regression models are not a particularly good way of doing empirical work in the social sciences today, because the technique depends on knowledge that we do not have. Investigators who use the technique are not paying adequate attention to the connection – if any – between the models and the phenomena they are studying. Their conclusions may be valid for the computer code they have created, but the claims are hard to transfer from that microcosm to the larger world …

Regression models often seem to be used to compensate for problems in measurement, data collection, and study design. By the time the models are deployed, the scientific position is nearly hopeless. Reliance on models in such cases is Panglossian …

Given the limits to present knowledge, I doubt that models can be rescued by technical fixes. Arguments about the theoretical merit of regression or the asymptotic behavior of specification tests for picking one version of a model over another seem like the arguments about how to build desalination plants with cold fusion as the energy source. The concept may be admirable, the technical details may be fascinating, but thirsty people should look elsewhere …

Causal inference from observational data presents many difficulties, especially when underlying mechanisms are poorly understood. There is a natural desire to substitute intellectual capital for labor, and an equally natural preference for system and rigor over methods that seem more haphazard. These are possible explanations for the current popularity of statistical models.

Indeed, far-reaching claims have been made for the superiority of a quantitative template that depends on modeling — by those who manage to ignore the far-reaching assumptions behind the models. However, the assumptions often turn out to be unsupported by the data. If so, the rigor of advanced quantitative methods is a matter of appearance rather than substance.

## Keynes’ critique of econometrics — still valid after all these years

26 March, 2018 at 19:24 | Posted in Statistics & Econometrics

To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong assumptions. In a review of Tinbergen’s econometric work, published in *The Economic Journal* in 1939, John Maynard Keynes gave a comprehensive critique of Tinbergen’s work, focusing on the limiting and unreal character of the assumptions that econometric analyses build on:

**(1) Completeness**: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of *all* the relevant factors to avoid misspecification and spurious causal claims. Usually, this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes is, to put it mildly, unconvinced:

It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his *a priori*, that would make a difference to the outcome.

**(2) Homogeneity**: To make inductive inferences possible, and to be able to apply econometrics, the system we try to analyse has to have a large degree of ‘homogeneity.’ According to Keynes, most social and economic systems, especially from the perspective of real historical time, lack that ‘homogeneity.’ It is not always possible to take repeated samples from a fixed population when we are analysing real-world economies. In many cases, there simply are no reasons at all to assume the samples to be homogeneous.

**(3) Stability:** Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyze. But Keynes argued that it was not really possible to make inductive generalisations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.

**(4) Measurability:** Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions whether it is possible to adequately quantify and measure things like expectations and political and psychological factors. Above all, he questioned, on both epistemological and ontological grounds, whether it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, “only lead to error and delusion.”

**(5) Independence**: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models on such simplistic and unrealistic assumptions risks producing nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic evolving organic systems, such as economies.

Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.

**(6) Linearity:** To make his models tractable, Tinbergen assumes the relationships between the variables he studies to be linear. This is still standard procedure today, but as Keynes writes:

It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.

To Keynes, it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).

The unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.

J. M. Keynes “Ethics in Relation to Conduct” (1903)
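The property Keynes describes can be checked mechanically for particular functions. In the sketch below (function names are illustrative, not from the source), only the strictly proportional function satisfies f(x + y) = f(x) + f(y); a squared term fails, and so does an affine function with a constant term. For continuous functions, proportionality, f(v) = c·v, is in fact the only case in which the property holds everywhere, which is one way of seeing how special the assumption is.

```python
def additive_at(f, x, y, tol=1e-9):
    """True if f(x + y) equals f(x) + f(y) at this particular pair of inputs."""
    return abs(f(x + y) - (f(x) + f(y))) <= tol


def proportional(v):
    return 3.0 * v          # f(v) = c*v, the special case


def affine(v):
    return 3.0 * v + 1.0    # a constant term already breaks additivity


def squared(v):
    return v * v            # a simple nonlinearity


for name, f in [("proportional", proportional), ("affine", affine), ("squared", squared)]:
    print(name, f(2.0 + 3.0), f(2.0) + f(3.0), additive_at(f, 2.0, 3.0))
# proportional 15.0 15.0 True
# affine 16.0 17.0 False
# squared 25.0 13.0 False
```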

Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established are laws and relations about entities in models that presuppose causal mechanisms and variables, and the relationships between them, to be linear, additive, homogeneous, stable, invariant and atomistic. But when causal mechanisms operate in the real world, they only do so in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since statisticians and econometricians have not been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence and additivity as being ontologically isomorphic to real-world economic systems, Keynes’ critique is still valid.
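A minimal sketch of why parameter stability matters, under an assumed data-generating process with a single regime shift halfway through the sample (the slope falls from 2.0 to 0.5; all numbers are invented). An estimate from the first half says nothing reliable about the second, and the full-sample estimate, roughly 1.25 here, describes neither regime.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs, with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


rng = random.Random(7)
T = 400
BREAK = T // 2                      # hypothetical regime shift at t = 200
x = [rng.gauss(0, 1) for _ in range(T)]
beta = [2.0 if t < BREAK else 0.5 for t in range(T)]
y = [b * xt + rng.gauss(0, 0.5) for b, xt in zip(beta, x)]

before = ols_slope(x[:BREAK], y[:BREAK])  # near 2.0
after = ols_slope(x[BREAK:], y[BREAK:])   # near 0.5
pooled = ols_slope(x, y)                  # in between, true of neither regime
print(round(before, 2), round(after, 2), round(pooled, 2))
```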

In his critique of Tinbergen, Keynes points us to the fundamental logical, epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyse repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.

Econometric modelling should never be a substitute for thinking. From that perspective, it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today.

The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.

Keynes, letter to E.J. Broster, December 19, 1939

## How to interpret and use regression analysis

23 March, 2018 at 19:06 | Posted in Statistics & Econometrics

After having mastered all the technicalities of regression analysis and econometrics, students often feel as though they are the masters of the universe. I usually cool them down with a required reading of Christopher Achen’s modern classic, *Interpreting and Using Regression*.

It usually gets them back on track again, and they understand that

“no increase in methodological sophistication … alter the fundamental nature of the subject. It remains a wondrous mixture of rigorous theory, experienced judgment, and inspired guesswork. And that, finally, is its charm.”

And in case they get too excited about having mastered the intricacies of proper significance tests and p-values, I ask them to also ponder Achen’s warning:

Significance testing as a search for specification errors substitutes calculations for substantive thinking. Worse, it channels energy toward the hopeless search for functionally correct specifications and diverts attention from the real tasks, which are to formulate a manageable description of the data and to exclude competing ones.

## Keynes and econometrics

15 March, 2018 at 12:23 | Posted in Statistics & Econometrics

After the 1920s, the theoretical and methodological approach to economics deeply changed … A new generation of American and European economists developed Walras’ and Pareto’s mathematical economics. As a result of this trend, the Econometric Society was founded in 1930 …

In the late 1930s, John Maynard Keynes and other economists objected to this recent “mathematizing” approach … At the core of Keynes’ concern lay the question of methodology.

Keynes’ comprehensive critique of econometrics and the assumptions it is built around (completeness, measurability, independence, homogeneity, and linearity) is still valid today.

Most work in econometrics is done on the assumption that the researcher has a theoretical model that is ‘true.’ But to think that we are able to construct a model in which all relevant variables are included and the functional relationships between them are correctly specified is not only a belief without support, it is a belief *impossible* to support.

The theories we work with when building our econometric regression models are insufficient. No matter what we study, there are always some variables missing, and we don’t know the correct way to functionally specify the relationships between the variables.

*Every* econometric model constructed is misspecified. There is always an endless list of possible variables to include, and endless possible ways to specify the relationships between them. So every applied econometrician comes up with his own specification and ‘parameter’ estimates. The econometric Holy Grail of consistent and stable parameter values is nothing but a dream.

A rigorous application of econometric methods in economics really presupposes that the phenomena of our real world economies are ruled by stable causal relations between variables. Parameter-values estimated in specific spatio-temporal contexts are *presupposed* to be exportable to totally different contexts. To warrant this assumption one, however, has to convincingly establish that the targeted acting causes are stable and invariant so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.

The theoretical conditions that have to be fulfilled for econometrics to really work are nowhere even closely met in reality. Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science and economics. Although econometric methods have become the most used quantitative methods in economics today, it is still a fact that the inferences made from them are, as a rule, invalid.

Econometrics is basically a deductive method. Given the assumptions it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Conclusions can only be as certain as their premises — and that also applies to econometrics.

## Non-ergodicity and the poverty of kitchen sink modeling

13 March, 2018 at 11:21 | Posted in Statistics & Econometrics

When I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” This argument is not as persuasive as it may seem initially. First of all, if what you are doing is misspecified already, then adding or excluding other variables has no tendency to make things consistently better or worse … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables.

Still more importantly, big, mushy linear regression and probit equations seem to need a great many control variables precisely because they are jamming together all sorts of observations that do not belong together. Countries, wars, racial categories, religious preferences, education levels, and other variables that change people’s coefficients are “controlled” with dummy variables that are completely inadequate to modeling their effects. The result is a long list of independent variables, a jumbled bag of nearly unrelated observations, and often a hopelessly bad specification with meaningless (but statistically significant with several asterisks!) results.

A preferable approach is to separate the observations into meaningful subsets – internally compatible statistical regimes … If this can’t be done, then statistical analysis can’t be done. A researcher claiming that nothing else but the big, messy regression is possible because, after all, some results have to be produced, is like a jury that says, “Well, the evidence was weak, but somebody had to be convicted.”
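The point about dummy variables can be illustrated with two invented groups whose outcomes respond to `x` in opposite directions (slopes 2.0 and -1.0). A group dummy only shifts the intercept, so the pooled regression reports a single slope that is a weighted average of the two, a number that describes neither group; estimating each subset separately recovers the two regimes. The pooled coefficient is computed via the Frisch-Waugh-Lovell result: the coefficient on `x` in `y ~ x + group dummy` equals the slope obtained after demeaning `x` and `y` within each group. Group sizes, intercepts and noise levels are assumptions of the sketch.

```python
import random


def slope_with_group_dummies(groups):
    """Coefficient on x in y ~ x + group dummies, via within-group demeaning (FWL)."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def make_group(rng, n, intercept, beta):
    """Hypothetical subset: y = intercept + beta * x + noise."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [intercept + beta * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(3)
group_a = make_group(rng, 500, 1.0, 2.0)
group_b = make_group(rng, 500, -1.0, -1.0)

pooled = slope_with_group_dummies([group_a, group_b])  # one slope imposed on everyone
separate_a = slope_with_group_dummies([group_a])       # ordinary slope within group A
separate_b = slope_with_group_dummies([group_b])       # ordinary slope within group B
print(round(pooled, 2), round(separate_a, 2), round(separate_b, 2))
```

Because the pooled coefficient is a weighted average of the within-group slopes with positive weights, it always falls between them; with equal-sized groups like these it lands near 0.5.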

The empirical and theoretical evidence is clear. Predictions and forecasts are inherently difficult to make in a socio-economic domain where genuine uncertainty and unknown unknowns often rule the roost. The real processes that underlie the time series that economists use to make their predictions and forecasts do not conform with the assumptions made in the applied statistical and econometric models. Much less is *a fortiori* predictable than standardly, and uncritically, assumed. The forecasting models fail to a large extent because the kind of uncertainty that faces humans and societies actually makes the models, strictly seen, inapplicable. The future is inherently unknowable, and using statistics, econometrics, decision theory or game theory does not in the least overcome this ontological fact. The economic future is not something that we normally can predict in advance. Better then to accept that as a rule ‘we simply do not know.’

Ergodicity is a technical term used by statisticians to reflect the idea that we can learn something about the future by looking at the past. It is an idea that is essential to our use of probability models to forecast the future and it is the failure of economic systems to display this property that makes our forecasts so fragile.
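To make the definition concrete, here is a minimal sketch contrasting an ergodic and a non-ergodic process, both invented for illustration. In the ergodic case the average along one long history and the average across many histories agree, so one history's past tells you about the whole population. In the non-ergodic case each history is locked into its own level at the start: the ensemble mean across histories is near zero, yet every individual time average settles near +1 or -1, never near the ensemble value.

```python
import random
import statistics


def history(rng, steps, ergodic):
    """One history of a hypothetical process: a level of +1 or -1 plus N(0, 1) noise.

    ergodic=True:  the level is redrawn every period.
    ergodic=False: the level is drawn once and persists for the whole history.
    """
    level = rng.choice([-1.0, 1.0])
    values = []
    for _ in range(steps):
        if ergodic:
            level = rng.choice([-1.0, 1.0])
        values.append(level + rng.gauss(0, 1))
    return values


def averages(rng, ergodic, paths=500, steps=500):
    """Return (ensemble mean at the final period, list of per-history time means)."""
    runs = [history(rng, steps, ergodic) for _ in range(paths)]
    ensemble = statistics.fmean(run[-1] for run in runs)
    time_means = [statistics.fmean(run) for run in runs]
    return ensemble, time_means


rng = random.Random(11)
erg_ensemble, erg_time = averages(rng, ergodic=True)
non_ensemble, non_time = averages(rng, ergodic=False)
print(round(erg_ensemble, 2), round(max(abs(m) for m in erg_time), 2))  # both small
print(round(non_ensemble, 2), round(min(abs(m) for m in non_time), 2))  # small vs. near 1
```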

## Observation and experiment

10 March, 2018 at 19:56 | Posted in Statistics & Econometrics

Paul Rosenbaum’s latest book, *Observation and experiment: an introduction to causal inference*, is a well-written introduction to some of the most important and far-reaching ideas in modern statistics. With only a minimum of mathematics, the author manages to give a lively and interesting account of how statisticians try to use statistics to make causal inferences from observational studies and experiments. For non-graduate social science students with no or little ‘technical’ background, this is highly recommended reading. Especially for those who want to get their first grip on the nowadays so influential ‘potential outcomes’ paradigm, this is probably the most accessible presentation available. A must-read. That said, there are, of course, critiques that can be levelled against that paradigm. But I save that for another post.

## Econometric disillusionment

9 March, 2018 at 15:06 | Posted in Statistics & Econometrics

Because I was there when the economics department of my university got an IBM 360, I was very much caught up in the excitement of combining powerful computers with economic research. Unfortunately, I lost interest in econometrics almost as soon as I understood how it was done. My thinking went through four stages:

1. Holy shit! Do you see what you can do with a computer’s help.

2. Learning computer modeling puts you in a small class where only other members of the caste can truly understand you. This opens up huge avenues for fraud:

3. The main reason to learn stats is to prevent someone else from committing fraud against you.

4. More and more people will gain access to the power of statistical analysis. When that happens, the stratification of importance within the profession should be a matter of who asks the best questions.

Disillusionment began to set in. I began to suspect that all the really interesting economic questions were FAR beyond the ability to reduce them to mathematical formulas. Watching computers being applied to other pursuits than academic economic investigations over time only confirmed those suspicions.

1. Precision manufacture is an obvious application for computing. And for many applications, this worked magnificently. Any design that combined straight lines and circles could be easily described for computerized manufacture. Unfortunately, the really interesting design problems can NOT be reduced to formulas. A car’s fender, for example, cannot be described using formulas—it can only be described by specifying an assemblage of multiple points. If math formulas cannot describe something as common and uncomplicated as a car fender, how can they hope to describe human behavior?

2. When people started using computers for animation, it soon became apparent that human motion was almost impossible to model correctly. After a great deal of effort, the animators eventually put tracing balls on real humans and recorded that motion before transferring it to the animated character. Formulas failed to describe simple human behavior—like a toddler trying to walk.

Lately, I have discovered a Swedish economist who did NOT give up econometrics merely because it sounded so impossible. In fact, he still teaches the stuff. But for the rest of us, he systematically destroys the pretensions of those who think they can describe human behavior with some basic formulas.

## The limits of probabilistic reasoning

12 February, 2018 at 09:48 | Posted in Statistics & Econometrics

Probabilistic reasoning in science, especially Bayesianism, reduces questions of rationality to questions of internal consistency (coherence) of beliefs, but, even granted this questionable reductionism, it is not self-evident that rational agents really have to be probabilistically consistent. There is no strong warrant for believing so. Rather, there is strong evidence that we will encounter huge problems if we let probabilistic reasoning become the dominant method for doing research in social sciences on problems that involve risk and uncertainty.

In many of the situations that are relevant to economics, one could argue that there is simply not enough adequate and relevant information to ground beliefs of a probabilistic kind, and that in those situations it is not possible, in any relevant way, to represent an individual’s beliefs in a single probability measure.

Say you have come to learn (based on your own experience and tons of data) that the probability of you becoming unemployed in Sweden is 10%. Having moved to another country (where you have no experience of your own and no data), you have no information on unemployment and a fortiori nothing on which to base any probability estimate. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1 if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign probability 50% to becoming unemployed and 50% to becoming employed.

That feels intuitively wrong though, and I guess most people would agree. Bayesianism cannot distinguish between symmetry-based probabilities from information and symmetry-based probabilities from an absence of information. In these kinds of situations, most of us would rather say that it is simply irrational to be a Bayesian and better instead to admit that we “simply do not know” or that we feel ambiguous and undecided. Arbitrary and ungrounded probability claims are more irrational than being undecided in the face of genuine uncertainty, so if there is not sufficient information to ground a probability distribution, it is better to acknowledge that simpliciter, rather than pretending to possess a certitude that we simply do not possess.

I think this critique of Bayesianism is in accordance with the views of John Maynard Keynes’ *A Treatise on Probability* (1921) and *General Theory* (1937). According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but rational expectations. Sometimes we “simply do not know.” Keynes would not have accepted the view of Bayesian economists, according to whom expectations “tend to be distributed, for the same information set, about the prediction of the theory.” Keynes, rather, thinks that we base our expectations on the confidence or ‘weight’ we put on different events and alternatives. To Keynes expectations are a question of weighing probabilities by ‘degrees of belief,’ beliefs that have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents modelled by probabilistically reasoning Bayesian economists.
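Two features of symmetry-based probabilities can be shown in a few lines with exact fractions. First, a flat "indifference" assignment depends on how the outcomes happen to be listed: splitting "employed" into full-time and part-time moves the probability of unemployment from 1/2 to 1/3 without any new information. Second, the same point probability can stand for very different amounts of evidence, something in the spirit of the 'weight' mentioned above. To show this, the sketch uses a Beta prior with a binomial update, a standard Bayesian device chosen here for illustration (it is not from the post): a flat prior and a prior as concentrated as roughly a thousand earlier observations both give 1/2, but they react very differently to the same ten new observations. Outcome labels and counts are invented.

```python
from fractions import Fraction


def indifference(outcomes):
    """Symmetry rule: give every listed outcome the same probability."""
    return {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}


def predictive(a, b, hits=0, trials=0):
    """Beta(a, b) prior updated on `hits` out of `trials`: P(next case is a hit)."""
    return Fraction(a + hits, a + b + trials)


two_way = indifference(["unemployed", "employed"])
three_way = indifference(["unemployed", "employed full-time", "employed part-time"])
print(two_way["unemployed"], three_way["unemployed"])  # 1/2 1/3

flat = predictive(1, 1)          # no evidence behind the 1/2
weighty = predictive(500, 500)   # a great deal of (assumed) evidence behind the 1/2
print(flat, weighty)             # 1/2 1/2

# The same hypothetical new evidence: 3 unemployed among 10 observed cases
print(predictive(1, 1, 3, 10), predictive(500, 500, 3, 10))  # 1/3 503/1010
```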

We always have to remember that economics and statistics are two quite different things, and as long as economists cannot identify their statistical theories with real-world phenomena, there is no real warrant for taking their statistical inferences seriously.

Just as there is no such thing as a ‘free lunch,’ there is no such thing as a ‘free probability.’ To be able to talk about probabilities at all, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events – in statistics one refers to any process where you observe or measure as an experiment (rolling a die), and to the results obtained as the outcomes or events of the experiment (the number of points rolled with the die, e.g. 3 or 5) – then, strictly speaking, there is no event at all.

Probability is a relational element. It must always come with a specification of the model from which it is calculated. And then, to be of any empirical scientific value, it has to be shown to coincide with (or at least converge to) real data-generating processes or structures – something seldom or never done in economics.
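The relational point can be made concrete with a small sketch, using the die example above. Here a "chance set-up" is represented (our assumption) as an explicit mapping from outcomes to probabilities, and the probability of an event is computed only relative to such a model. The same event, "the die shows 3 or 5", gets different probabilities under different models, and none at all unless a model is supplied. The loaded die and the function name are hypothetical, invented purely for illustration.

```python
from fractions import Fraction

# A chance set-up modelled as an explicit outcome -> probability mapping.
FAIR_DIE = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical loaded die, invented for illustration only.
LOADED_DIE = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
              4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}


def probability(model: dict[int, Fraction], event: set[int]) -> Fraction:
    """Probability of an event, defined only relative to a given model."""
    if sum(model.values()) != 1:
        raise ValueError("model probabilities must sum to 1")
    unknown = event - set(model)
    if unknown:
        raise ValueError(f"outcomes {unknown} are not in the model's sample space")
    return sum((model[outcome] for outcome in event), Fraction(0))


event = {3, 5}  # "the die shows 3 or 5"
print(probability(FAIR_DIE, event))    # 1/3
print(probability(LOADED_DIE, event))  # 1/5
```

Note that the function cannot even be called without a model argument; whether either model describes any actual die is a separate, empirical question.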

And this is the basic problem!

If you have a fair roulette wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous ‘nomological machines’ for prices, gross domestic product, income distribution, etc.? Only by a leap of faith. And that does not suffice in science. You have to come up with some really good arguments if you want to persuade people to believe in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probability density distributions! Without doing that, you simply conflate statistical and economic inferences.

The present ‘machine learning’ and ‘big data’ hype shows that many social scientists falsely think that they can get away with analysing real-world phenomena without any (commitment to) theory. But data never speak for themselves. Without a prior statistical set-up, there actually are no data at all to process. And using a machine learning algorithm will only produce what you are looking for. Theory matters.

Causality in the social sciences, and in economics, can never solely be a question of statistical inference. Causality entails more than predictability, and explaining social phenomena in depth requires theory. Analysis of variation, the foundation of all econometrics, can never in itself reveal how these variations are brought about. Only when we are able to tie actions, processes or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation.
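A minimal simulation can illustrate why analysis of variation alone cannot settle how the variation is brought about. The two linear-Gaussian structures below are hypothetical and chosen (our assumption) so that both generate (X, Y) pairs with unit variances and a correlation of 0.5: in A, X causes Y; in B, a hidden common cause Z drives both and X has no effect on Y. Passively observed data look the same up to sampling noise; only the structure tells you what happens when X is set from outside. This is a deliberately simple sketch, not a general proof.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def model_a(rng, do_x=None):
    """Structure A (hypothetical): X causes Y directly."""
    x = rng.gauss(0, 1) if do_x is None else do_x
    y = 0.5 * x + rng.gauss(0, math.sqrt(0.75))
    return x, y


def model_b(rng, do_x=None):
    """Structure B (hypothetical): hidden Z drives X and Y; X has no effect on Y."""
    z = rng.gauss(0, 1)
    x = math.sqrt(0.5) * z + rng.gauss(0, math.sqrt(0.5)) if do_x is None else do_x
    y = math.sqrt(0.5) * z + rng.gauss(0, math.sqrt(0.5))
    return x, y


def observed_correlation(model, n, seed):
    """Correlation of X and Y in passively observed data from the model."""
    rng = random.Random(seed)
    xs, ys = zip(*(model(rng) for _ in range(n)))
    return pearson(xs, ys)


def mean_y_when_x_set(model, x_value, n, seed):
    """Average Y when X is set from outside (an intervention), not merely observed."""
    rng = random.Random(seed)
    return sum(model(rng, do_x=x_value)[1] for _ in range(n)) / n


N = 50_000
for name, model in [("A: X -> Y", model_a), ("B: X <- Z -> Y", model_b)]:
    r = observed_correlation(model, N, seed=1)
    effect = mean_y_when_x_set(model, 1.0, N, seed=2) - mean_y_when_x_set(model, 0.0, N, seed=3)
    print(f"{name}: corr(X, Y) = {r:.2f}, change in mean Y when X is set 0 -> 1 = {effect:.2f}")
```

Both structures show a correlation close to 0.5, while setting X shifts Y by about 0.5 in A and by roughly nothing in B.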

Most facts have many different possible alternative explanations, but we want to find the best of all contrastive explanations (contrastive, since all real explanation takes place relative to a set of alternatives). So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep, causal features and mechanisms that we have warranted and justified reasons to believe in. Statistical reasoning, especially the variety based on a Bayesian epistemology, generally has no room for these kinds of explanatory considerations. The only thing that matters is the probabilistic relation between evidence and hypothesis. That is also one of the main reasons I find abduction (inference to the best explanation) a better description and account of what constitutes actual scientific reasoning and inference.

Even worse, some economists using statistical methods think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like ‘faithfulness’ or ‘stability’ is to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived statistical theorems and the real world, well, then we haven’t really obtained the causation we are looking for.
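To see why ‘faithfulness’ is an assumption rather than something the data can certify, here is a standard textbook-style counterexample (our illustration, not the post's). The coefficients are hypothetical and chosen so that a direct effect of X on Y is exactly cancelled by an indirect effect through a mediator M. In the simulated data, X and Y come out uncorrelated up to sampling noise, so a procedure that assumes faithfulness would conclude there is no causal connection, although by construction X acts on Y directly. Recovering the direct coefficient requires already knowing to condition on M, that is, knowing the causal structure beforehand.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n, seed, a=1.0, b=1.0, c=-1.0):
    """Hypothetical linear model: X -> Y directly (a), X -> M (b), M -> Y (c).
    With a + b*c == 0 the two paths cancel in the X-Y correlation."""
    rng = random.Random(seed)
    xs, ms, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        m = b * x + rng.gauss(0, 1)
        y = a * x + c * m + rng.gauss(0, 1)
        xs.append(x)
        ms.append(m)
        ys.append(y)
    return xs, ms, ys


def ols_two_regressors(xs, ms, ys):
    """OLS coefficients of Y on X and M (with intercept), via centred normal equations."""
    n = len(xs)
    mx, mm, my = sum(xs) / n, sum(ms) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    smm = sum((m - mm) ** 2 for m in ms)
    sxm = sum((x - mx) * (m - mm) for x, m in zip(xs, ms))
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    smy = sum((m - mm) * (y - my) for m, y in zip(ms, ys))
    det = sxx * smm - sxm ** 2
    return (sxy * smm - sxm * smy) / det, (sxx * smy - sxm * sxy) / det


xs, ms, ys = simulate(50_000, seed=7)
print(f"corr(X, Y) = {pearson(xs, ys):.3f}")
beta_x, beta_m = ols_two_regressors(xs, ms, ys)
print(f"Y on X and M: coefficient on X = {beta_x:.2f}, on M = {beta_m:.2f}")
```

The regression recovers coefficients near 1 and -1 only because we, not the algorithm, supplied the knowledge that M is a mediator.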

## Hierarchical models and clustered residuals (student stuff)

10 February, 2018 at 16:05 | Posted in Statistics & Econometrics | Comments Off on Hierarchical models and clustered residuals (student stuff)
