## How to do econometrics properly

2 Jun, 2023 at 13:26 | Posted in Statistics & Econometrics

- Always, but *always*, plot your data.
- Remember that data *quality* is at least as important as data *quantity*.
- Always ask yourself, “Do these results make economic/common sense?”
- Check whether your “statistically significant” results are also “numerically/economically significant”.
- Be sure that you know exactly what assumptions are used/needed to obtain the results relating to the properties of any estimator or test that you use.
- Just because someone else has used a particular approach to analyse a problem that looks like yours, that doesn’t mean they were right!
- “Test, test, test!” (David Hendry). But don’t forget that “pre-testing” raises some important issues of its own.
- *Don’t* assume that the computer code that someone gives you is relevant for your application, or that it even produces correct results.
- Keep in mind that published results represent only a fraction of the results that the author obtained but is not publishing.
- Don’t forget that “peer-reviewed” does NOT mean “correct results”, or even “best practices were followed”.

Nowadays it has almost become a self-evident truism among economists that you cannot expect people to take your arguments seriously unless they are based on or backed up by advanced econometric modelling. So legions of mathematical-statistical theorems are proved — and heaps of fiction are produced, masquerading as science. The rigour of the econometric modelling and the far-reaching assumptions on which it is built are frequently not supported by data.

Modelling assumptions made in statistics and econometrics are more often than not made for reasons of mathematical tractability rather than verisimilitude. That is unfortunately also a reason why the methodological ‘rigour’ encountered in statistical and econometric research is to a large degree nothing but deceptive appearance. The models constructed may seem technically advanced and very ‘sophisticated,’ but that is usually only because the problems discussed here have been swept under the carpet. Assuming that our data are generated by ‘coin flips’ in an imaginary ‘superpopulation’ only means that we get answers to questions we are not asking.

## The limited epistemic value of ‘variation analysis’

23 May, 2023 at 07:20 | Posted in Statistics & Econometrics

While appeal to R squared is a common rhetorical device, its connection to any plausible explanatory virtues is very tenuous, for many reasons. Either it is meant to be merely a measure of predictability in a given data set or it is a measure of causal influence. In either case it does not tell us much about explanatory power. Taken as a measure of predictive power, it is limited in that it predicts variances only. But what we mostly want to predict is levels, about which it is silent. In fact, two models can have exactly the same R squared and yet describe regression lines with very different slopes, the natural predictive measure of levels. Furthermore, even in predicting variance, it is entirely dependent on the variance in the sample — if a covariate shows no variation, then it cannot predict anything. This leads to getting very different measures of explanatory power across samples for reasons not having any obvious connection to explanation.

Taken as a measure of causal explanatory power, R squared does not fare any better. The problem of explaining variances rather than levels shows up here as well—if it measures causal influence, it has to be influences on variances. But we often do not care about the causes of variance in economic variables but instead about the causes of levels of those variables about which it is silent. Similarly, because the size of R squared varies with variance in the sample, it can find a large effect in one sample and none in another for arbitrary, noncausal reasons. So while there may be some useful epistemic roles for R squared, measuring explanatory power is not one of them.
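Kincaid’s point that two regressions can share an R squared while having very different slopes is easy to check in a small simulation (a hypothetical sketch; the variable names and numbers are my own, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
eps = rng.normal(size=500)

y1 = x + eps          # true slope 1
y2 = 10 * (x + eps)   # true slope 10, identical signal-to-noise ratio

def fit(x, y):
    """Return (R^2, slope) of the OLS regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1 - resid.var() / y.var(), slope

r2_1, b1 = fit(x, y1)
r2_2, b2 = fit(x, y2)
# R^2 is the same in both models (y2 is just y1 rescaled), yet the
# slope -- the natural predictive measure of levels -- differs tenfold
```

Rescaling the dependent variable leaves R squared untouched while changing the slope arbitrarily, which is exactly why R squared is silent about levels.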

Although in a somewhat different context, Jon Elster makes basically the same observation as Kincaid:

Consider two elections, A and B. For each of them, identify the events that cause a given percentage of voters to turn out. Once we have thus explained the turnout in election A and the turnout in election B, the explanation of the difference (if any) follows automatically, as a by-product. As a bonus, we might be able to explain whether identical turnouts in A and B are accidental, that is, due to differences that exactly offset each other, or not. In practice, this procedure might be too demanding. The data or the available theories might not allow us to explain the phenomena “in and of themselves.” We should be aware, however, that if we do resort to explanation of variation, we are engaging in a second-best explanatory practice.

Modern econometrics is fundamentally based on assuming — usually without any explicit justification — that we can gain causal knowledge by considering independent variables that may have an impact on the *variation* of a dependent variable. As argued by both Kincaid and Elster, this is, however, far from self-evident. Often the *fundamental* causes are *constant* forces that are not amenable to the kind of analysis econometrics supplies us with. As Stanley Lieberson has it in *Making It Count*:

One can always say whether, in a given empirical context, a given variable or theory accounts for more variation than another. But it is almost certain that the variation observed is not universal over time and place. Hence the use of such a criterion first requires a conclusion about the variation over time and place in the dependent variable. If such an analysis is not forthcoming, the theoretical conclusion is undermined by the absence of information …

Moreover, it is questionable whether one can draw much of a conclusion about causal forces from simple analysis of the observed variation … To wit, it is vital that one have an understanding, or at least a working hypothesis, about what is causing the event per se; variation in the magnitude of the event will not provide the answer to that question.

Trygve Haavelmo was making a somewhat similar point back in 1941 when criticizing the treatment of the interest variable in Tinbergen’s regression analyses. The regression coefficient of the interest rate variable being zero was according to Haavelmo not sufficient for inferring that “variations in the rate of interest play only a minor role, or no role at all, in the changes in investment activity.” Interest rates may very well play a decisive indirect role by influencing other causally effective variables. And:

the rate of interest may not have varied much during the statistical testing period, and for this reason the rate of interest would not “explain” very much of the variation in net profit (and thereby the variation in investment) which has actually taken place during this period. But one cannot conclude that the rate of interest would be inefficient as an autonomous regulator, which is, after all, the important point.

This problem of ‘nonexcitation’ — when there is too little variation in a variable to say anything about its *potential* importance, and we can’t identify the reason for the *factual* influence of the variable being ‘negligible’ — strongly confirms that causality in economics and other social sciences can never be solely a question of statistical inference. Causality entails more than predictability, and really explaining social phenomena in depth requires theory.
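Haavelmo’s ‘nonexcitation’ problem can be sketched in a hypothetical simulation: the very same strong causal mechanism ‘explains’ almost nothing when the causal variable barely varies in the sample (all numbers and names here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

def sample_r2(r_sd):
    """R^2 of regressing y on r, where y = 5*r + noise and r has std dev r_sd."""
    r = rng.normal(0, r_sd, n)
    y = 5 * r + rng.normal(0, 1.5, n)   # the causal effect of r is identical in both cases
    slope, intercept = np.polyfit(r, y, 1)
    resid = y - (slope * r + intercept)
    return 1 - resid.var() / y.var()

r2_excited = sample_r2(1.0)    # r varies a lot: it 'explains' most of y
r2_dormant = sample_r2(0.01)   # r barely varies: same mechanism, near-zero R^2
```

The coefficient on a dormant variable tells us nothing about its potential importance as a regulator, which was precisely Haavelmo’s complaint against reading Tinbergen’s near-zero interest-rate coefficient causally.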

Analysis of variation — the foundation of all econometrics — can never in itself reveal *how* these variations are brought about. Only when we are able to tie actions, processes, or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation. Too much in love with axiomatic-deductive modelling, neoclassical economists especially tend to forget that accounting for causation — *how* causes bring about their effects — demands deep subject-matter knowledge and acquaintance with the intricate fabrics and contexts in which causes operate. As Keynes already argued in his *A Treatise on Probability*, statistics (and econometrics) should primarily be seen as means of describing patterns of associations and correlations, which we may use as *suggestions* of possible causal relations. Forgetting that, economists will continue to be stuck with a second-best explanatory practice.

## Adjusting for confounding (student stuff)

12 May, 2023 at 11:31 | Posted in Statistics & Econometrics

Simpson’s paradox is an interesting paradox in itself, but it also highlights a deficiency in the traditional econometric approach to causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of the men are admitted but only 30% of the women. Running a logistic regression to find the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments, we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against (say 800 males apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 females apply for economics studies (90 admitted) and 900 for physics studies (210 admitted) — giving department-level odds ratios of 0.63 and 0.37).
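The arithmetic of the example can be checked directly (a minimal sketch using exactly the counts given above):

```python
# Admission counts from the example: (admitted, applicants)
econ    = {"male": (680, 800), "female": (90, 100)}
physics = {"male": (20, 200),  "female": (210, 900)}

def odds(admitted, applicants):
    """Odds of admission = admitted / rejected."""
    return admitted / (applicants - admitted)

# Aggregated over departments, women look 'discriminated' against ...
male_odds = odds(680 + 20, 800 + 200)     # 700/300, about 2.33
female_odds = odds(90 + 210, 100 + 900)   # 300/700, about 0.43
or_overall = male_odds / female_odds      # about 5.44

# ... but within each department the male/female odds ratio flips below 1
or_econ = odds(*econ["male"]) / odds(*econ["female"])          # about 0.63
or_physics = odds(*physics["male"]) / odds(*physics["female"]) # about 0.37
```

The aggregate association and the department-level associations point in opposite directions, which is why no regression on the pooled data alone can settle the causal question.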

## What RCTs can and cannot tell us

5 May, 2023 at 09:54 | Posted in Statistics & Econometrics

Unfortunately, social sciences’ hope that we can control simultaneously for a range of factors like education, labor force attachment, discrimination, and others is simply more wishful thinking.

The problem is that the causal relations underlying such associations are so complex and so irregular that the mechanical process of regression analysis has no hope of unpacking them. One hope for quantitative researchers who recognize the problems I have discussed is the use of experimentation – with the preferred terminology these days being randomized controlled trials (RCTs). RCTs supposedly get around the issues faced by regression analysis through the use of careful physical, experimental controls instead of statistical ones. The idea is that doing so will let one look at the effect of an individual factor, such as whether a student attended a particular reading program.

In order to do this, one randomly assigns students to an experimental group and control group, which, in theory, will allow for firm attribution of cause and effect. Having done this, one hopes that the difference in achievement between the groups is a result of being in the reading program. Unfortunately, it may or may not be. You still have the problem that the social and pedagogical processes are so complex, with so many aspects for which to account, that, along some relevant dimensions, the control and experimental group will not be similar. That is, if you look closely at all potentially relevant factors, control groups almost always turn out systematically different from the experimental group, and the result is we no longer have the ability to make clear inferences.

Instead, we need to use some form of statistical analysis to control for differences between the two groups. However, the application of statistical controls becomes an ad hoc exercise, even worse than the causal modeling regression approach. In the latter, at least there is a pretence of developing a complete model of potentially intervening variables whereas with the former a few covariates are selected rather arbitrarily as controls. In the end, one does not know whether to attribute achievement differences to the reading program or other factors.

Klees’ interesting article highlights some of the fundamental problems with the present idolatry of ‘evidence-based’ policies and randomization designs in the field of education. Unfortunately, we face the same problems in economics.

The point of making a randomized experiment is often said to be that it ‘ensures’ that any correlation between a supposed cause and effect indicates a causal relation. This is believed to hold since randomization (allegedly) ensures that a supposed causal variable does not correlate with other variables that may influence the effect.

The problem with that simplistic view of randomization is that the claims made are exaggerated and sometimes even false:

• Even if you manage to make the assignment to treatment and control groups ideally random, the sample selection certainly is — except in extremely rare cases — not random. Even if we make a proper randomized assignment, if we apply the results to a biased sample there is always the risk that the experimental findings will not apply. What works ‘there’ does not work ‘here.’ Randomization hence does not ‘guarantee’ or ‘ensure’ making the right causal claim. Although randomization may help us rule out certain possible causal claims, randomization *per se* does not *guarantee* anything!

• Even if both sampling and assignment are made in an ideal random way, performing standard randomized experiments only gives you averages. The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer of the average causal effect being 0, those who are ‘treated’ may have causal effects equal to -100, and those ‘not treated’ may have causal effects equal to 100. Contemplating whether being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the average effect particularly enlightening.

• There is almost always a trade-off between bias and precision. In real-world settings, a little bias is often a price worth paying for greater precision. And — most importantly — in case we have a population with sizeable heterogeneity, the average treatment effect of the sample may differ substantially from the average treatment effect in the population. If so, the value of any extrapolating inferences made from trial samples to other populations is highly questionable.

• Since most real-world experiments and trials build on performing single randomization, what *would* happen *if* you kept on randomizing forever, does not help you to ‘ensure’ or ‘guarantee’ that you do not make false causal conclusions in the one particular randomized experiment you actually do perform. It is indeed difficult to see why thinking about what you know you will never do, would make you happy about what you actually do.

• And then there is also the problem that ‘Nature’ may not always supply us with the random experiments we are most interested in. If we are interested in X, why should we study Y only because design dictates that? Method should never be prioritized over substance!
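The heterogeneity point in the second bullet above — an average treatment effect near zero masking large individual effects — can be sketched in a small, purely hypothetical simulation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical individual causal effects: half the population is
# helped (+100), half is harmed (-100); the true average effect is 0
effect = np.where(rng.random(n) < 0.5, 100.0, -100.0)

treated = rng.random(n) < 0.5            # ideal random assignment
outcome = rng.normal(50, 5, n) + treated * effect

# Difference in group means: the standard RCT estimate of the
# average treatment effect
ate = outcome[treated].mean() - outcome[~treated].mean()
# ate is close to 0, yet no single individual has an effect near 0
```

The trial delivers the ‘right’ answer about the average while saying nothing about the underlying heterogeneity, which is what anyone deciding whether to be treated would actually want to know.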

Nowadays many mainstream economists maintain that ‘imaginative empirical methods’ — especially ‘as-if-random’ natural experiments and RCTs — can help us to answer questions concerning the external validity of economic models. In their view, they are, more or less, tests of ‘an underlying economic model’ and enable economists to make the right selection from the ever-expanding ‘collection of potentially applicable models.’

It is widely believed among mainstream economists that the scientific value of randomization — contrary to other methods — is more or less uncontroversial and that randomized experiments are free from bias. When looked at carefully, however, there are in fact few real reasons to share this optimism on the alleged ’experimental turn’ in economics. Strictly seen, randomization does not guarantee anything.

‘Ideally’ controlled experiments tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural, or quasi) experiments to different settings, populations, or target systems, is not easy. Causes deduced in an experimental setting still have to show that they come with an export warrant to the target population. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

The almost religious belief with which its propagators — including ‘Nobel prize’ winners like Duflo, Banerjee and Kremer — portray it cannot hide the fact that RCTs cannot be taken for granted to give generalizable results. That something works *somewhere* is no warrant for believing it will work for us *here*, or that it works *generally*.

Leaning on an interventionist approach often means that instead of posing interesting questions on a social level, the focus is on individuals. Instead of asking about structural socio-economic factors behind, e.g., gender or racial discrimination, the focus is on the choices individuals make. Esther Duflo is a typical example of the dangers of this limiting approach. Duflo *et consortes* want to give up on ‘big ideas’ like political economy and institutional reform and instead go for solving more manageable problems ‘the way plumbers do.’ Yours truly is far from sure that is the right way to move economics forward and make it a relevant and realist science. A plumber can fix minor leaks in your system, but if the whole system is rotten, something more than good old-fashioned plumbing is needed. The big social and economic problems we face today are not going to be solved by plumbers performing interventions or manipulations in the form of RCTs.

The present RCT idolatry is dangerous. Believing randomization is the only way to achieve scientific validity blinds people to searching for and using other methods that in many contexts are better. Insisting on using only *one* tool often means using the *wrong* tool.

Randomization is not a panacea. It is not the best method for all questions and circumstances. Proponents of randomization make claims about its ability to deliver causal knowledge that is simply wrong. There are good reasons to share Klees’ scepticism on the now popular — and ill-informed — view that randomization is the only valid and the best method on the market. It is not.

## The Deadly Sin of Statistical Reification

4 May, 2023 at 12:59 | Posted in Statistics & Econometrics

People sometimes speak as if random variables “behave” in a certain way, as if they have a life of their own. Thus “X is normally distributed”, “W follows a gamma”, “The underlying distribution behind y is binomial”, and so on. To behave is to act, to be caused, to react. Somehow, it is thought, these distributions are causes. This is the Deadly Sin of Reification, perhaps caused by the beauty of the mathematics where, due to some mental abstraction, the equations undergo biogenesis. The behavior of these “random” creatures is expressed in language about “distributions.” We hear, “Many things are normally (gamma, Weibull, etc. etc.) distributed”, “Height is normally distributed”, “Y is binomial”, “Independent, identically distributed random variables”.

There is no such thing as a “true” distribution in any ontological sense. Examples abound. The temptation here is magical thinking. Strictly and without qualification, to say a thing is “distributed as” is to assume murky causes are at work, pushing variables this way and that knowing they are “part of” some mathematician’s probability distribution.

To say a thing “has” a distribution is false. The only thing we are privileged to say is things like this: “Given this-and-such set of premises, the probability X takes this value equals that”, where “that” is calculated via a probability implied by the premises … Probability is a matter of ascribable or quantifiable uncertainty, a logical relation between accepted premises and some specified proposition, and nothing more.

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. Like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions about distributions and probabilities come into the picture.

The assumption of imaginary “super populations” is one of the many dubious assumptions used in modern econometrics.

As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting a domain of probability theory and sample space of infinite populations also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

And as if this wasn’t enough, one could — as we’ve seen — also seriously wonder what kind of “populations” these statistical and econometric models ultimately are based on. Why should we as social scientists — and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems — unquestioningly accept models based on concepts like the “infinite super populations” used in e.g. the potential outcome framework that has become so popular lately in social sciences?

Of course, one could treat observational or experimental data as random samples from real populations. I have no problem with that. But probabilistic econometrics does not content itself with populations of that kind. Instead it creates imaginary populations of “parallel universes” and assumes that our data are random samples from such “infinite super populations.”

But this is actually nothing else but hand-waving! And it is inadequate for real science. As David Freedman writes:

With this approach, the investigator does not explicitly define a population that could in principle be studied, with unlimited resources of time and money. The investigator merely *assumes* that such a population exists in some ill-defined sense. And there is a further assumption, that the data set being analyzed can be treated *as if* it were based on a random sample from the assumed population. These are convenient fictions … Nevertheless, reliance on imaginary populations is widespread. Indeed regression models are commonly used to analyze convenience samples … The rhetoric of imaginary populations is seductive because it seems to free the investigator from the necessity of understanding how data were generated.

In social sciences — including economics — it’s always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …

## Subjective probability — answering questions nobody asked

2 May, 2023 at 18:52 | Posted in Statistics & Econometrics

Solve for x — give a single, unique number — in the following equation: x + y = 3. Of course, it cannot be done: under no rules of mathematics can a unique x be discovered; there are one too many unknowns. Nevertheless, someone holding to the subjective interpretation of probability could tell us, say, “I feel x = 7.” Or he might say, “The following is my distribution for the possible values of x.” He’ll draw a picture, a curve of probability showing higher and lower chances for each possible x, maybe peaking somewhere near 3 and tailing off for very large and small numbers. He might say his curve is equivalent to one from the standard toolkit, such as the normal. Absurd?

It shouldn’t sound absurd. The situation is perfectly delineated. The open premise is that x = 3 − y, with a tacit premise that y must be something. The logical probability answer is that there is no probability: not enough information. (We don’t even know if y should be a real number!) But why not, a subjectivist might say, take a “maximal ignorance” position, which implies, he assumes, that y can be any number, with none being preferred over any other. This leads to something like a “uniform distribution” over the real line; that being so, x is easily solved for, once for each value of y. Even if we allow the subjectivist free rein, this decision of uniformity is unfortunate because it leads to well-known logical absurdities. There cannot be an equal probability for infinite alternatives because the sum of probabilities, no matter how small each of the infinite possibilities is, is always (in the limit) infinity; and indeed this particular uniform “distribution” is called “improper.” Giving the non-probability a label restores a level of comfort lost upon realizing the non-probability isn’t a probability, but it is a false comfort. Aiding the subjectivist is that the math using improper probabilities sometimes works out, and if the math works out, what’s to complain about?

To say we are “maximally ignorant” of y, or to say anything else about y (or x), is to add information or invent evidence which is not provided. Adding information that is not present or is not plausibly tacit is to change the problem. If we are allowed to arbitrarily change any problem so that it is more to our liking we shall, naturally, be able to solve these problems more easily. But we are not solving the stated problems. We are answering questions nobody asked.
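The impropriety Briggs mentions is easy to state precisely: no constant ‘density’ c ≥ 0 on the whole real line can integrate to one, so the ‘uniform distribution over the reals’ is not a probability distribution at all:

```latex
\int_{-\infty}^{\infty} c \,\mathrm{d}y =
\begin{cases}
  \infty & \text{if } c > 0,\\
  0      & \text{if } c = 0,
\end{cases}
\qquad\text{so}\qquad
\int_{-\infty}^{\infty} c \,\mathrm{d}y = 1 \ \text{has no solution } c \ge 0.
```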

Modern probabilistic econometrics relies on the notion of probability. To at all be amenable to econometric analysis, economic observations allegedly have to be conceived as random events. But is it really necessary to model the economic system as a system where randomness can only be analyzed and understood when based on an *a priori* notion of probability?

In probabilistic econometrics, events and observations are as a rule interpreted as random variables, *as if* generated by an underlying probability density function, and *a fortiori* — since probability density functions are only definable in a probability context — consistent with a probability. As Haavelmo (1944:iii) has it:

For no tool developed in the theory of statistics has any meaning – except, perhaps, for descriptive purposes – without being referred to some stochastic scheme.

When attempting to convince us of the necessity of founding empirical economic analysis on probability models, Haavelmo – building largely on the earlier Fisherian paradigm – actually forces econometrics to (implicitly) interpret events as random variables generated by an underlying probability density function.

This is at odds with reality. Randomness obviously is a fact of the real world. Probability, on the other hand, attaches to the world via intellectually constructed models, and *a fortiori* is only a fact of a probability-generating machine or a well-constructed experimental arrangement or “chance set-up”.

Just as there is no such thing as a “free lunch,” there is — as forcefully argued by Briggs — no such thing as a “free probability.” To be able to talk about probabilities at all, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events — in statistics, one refers to any process whose results you observe or measure as an experiment (rolling a die) and the results obtained as the *outcomes* or *events* (the number of points rolled with the die, being e.g. 3 or 5) of the experiment — there is, strictly seen, no event at all.

Probability is a relational element. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be *shown* to coincide with (or at least converge to) real data generating processes or structures – something seldom or never done!

And this is the basic problem with economic data. If you have a fair roulette wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socioeconomic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!


## The econometric dream-world

30 Apr, 2023 at 09:50 | Posted in Statistics & Econometrics

Trygve Haavelmo — with the completion (in 1958) of the twenty-fifth volume of *Econometrica* — assessed the role of econometrics in the advancement of economics, and although mainly positive about the “repair work” and “clearing-up work” done, he also found some grounds for despair:

We have found certain general principles which would seem to make good sense. Essentially, these principles are based on the reasonable idea that, if an economic model is in fact “correct” or “true,” we can say something a priori about the way in which the data emerging from it must behave. We can say something, a priori, about whether it is theoretically possible to estimate the parameters involved. And we can decide, a priori, what the proper estimation procedure should be … But the concrete results of these efforts have often been a seemingly lower degree of accuracy of the would-be economic laws (i.e., larger residuals), or coefficients that seem a priori less reasonable than those obtained by using cruder or clearly inconsistent methods. There is the possibility that the more stringent methods we have been striving to develop have actually opened our eyes to recognize a plain fact: viz., that the “laws” of economics are not very accurate in the sense of a close fit, and that we have been living in a dream-world of large but somewhat superficial or spurious correlations.

Another of the founding fathers of modern probabilistic econometrics, Ragnar Frisch, shared Haavelmo’s doubts on the applicability of econometrics:

I have personally always been skeptical of the possibility of making macroeconomic predictions about the development that will follow on the basis of given initial conditions … I have believed that the analytical work will give higher yields – now and in the near future – if they become applied in macroeconomic decision models where the line of thought is the following: “If this or that policy is made, and these conditions are met in the period under consideration, probably a tendency to go in this or that direction is created”.

Econometrics may be an informative tool for research. But if its practitioners do not investigate and make an effort to provide a justification for the credibility of the assumptions on which they erect their building, it will not fulfil its tasks. There is a gap between its aspirations and its accomplishments. Without more supportive evidence to substantiate its claims, critics will continue to consider its ultimate argument a mixture of rather unhelpful metaphors and metaphysics. Maintaining that economics should be a science in the ‘true knowledge’ business, yours truly remains a sceptic of the pretences and aspirations of econometrics. So far, I cannot really see that it has yielded very much in terms of relevant, interesting economic knowledge.

The marginal return on its ever-higher technical sophistication in no way makes up for the lack of serious under-labouring of its deeper philosophical and methodological foundations that Keynes already complained about. The rather one-sided emphasis on usefulness, and its concomitant instrumentalist justification, cannot hide the fact that the legions of probabilistic econometricians who consider it “fruitful to believe” in the possibility of treating unique economic data as the observable results of random drawings from an imaginary sampling of an imaginary population are skating on thin ice. After years of analyzing its ontological and epistemological foundations, I cannot but conclude that econometrics on the whole has delivered neither ‘truth’ nor robust forecasts.

## Causal assumptions in need of careful justification

24 Apr, 2023 at 15:22 | Posted in Statistics & Econometrics | Comments Off on Causal assumptions in need of careful justification

As is brilliantly attested by the work of Pearl, an extensive and fruitful theory of causality can be erected upon the foundation of a Pearlian DAG. So, when we can assume that a certain DAG is indeed a Pearlian DAG representation of a system, we can apply that theory to further our causal understanding of the system. But this leaves entirely untouched the vital questions: when is a Pearlian DAG representation of a system appropriate at all?; and, when it is, when can a specific DAG D be regarded as filling this rôle? As we have seen, Pearlian representability requires many strong relationships to hold between the behaviours of the system under various kinds of interventions.

Causal discovery algorithms … similarly rely on strong assumptions … The need for such assumptions chimes with Cartwright’s maxim “No causes in, no causes out”, and goes to refute the apparently widespread belief that we are in possession of a soundly-based technology for drawing causal conclusions from purely observational data, without further assumptions …

In my view, the strong assumptions needed even to get started with causal interpretation of a DAG are far from self-evident as a matter of course, and whenever such an interpretation is proposed in a real-world context these assumptions should be carefully considered and justified. Without such justification, why should we have any faith at all in, say, the application of Pearl’s causal theory, or in the output of causal discovery algorithms?

But what would count as justification? … It cannot be conducted entirely within a model, but must, as a matter of logic, involve consideration of the interpretation of the terms in the model in the real world.

## The importance of not equating science with statistical calculations

21 Apr, 2023 at 12:27 | Posted in Statistics & Econometrics | Comments Off on The importance of not equating science with statistical calculations

All science entails human judgment, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of our statistics is actually zero — even though we may be making valid statistical inferences! Statistical models are no substitutes for doing real science. Or as a famous German philosopher wrote 150 years ago:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

We should never forget that the underlying parameters we use when performing statistical tests are *model constructions*. And if the model is wrong, the value of our calculations is nil. As mathematical statistician and ‘shoe-leather researcher’ David Freedman wrote in *Statistical Models and Causal Inference*:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.

All of this, of course, does also apply when we use statistics in economics. Most work in econometrics and regression analysis is — still — made on the assumption that the researcher has a theoretical model that is ‘true.’ Based on this belief of having a correct specification for an econometric model or running a regression, one proceeds as if the only problem remaining to solve has to do with measurement and observation.

When things sound too good to be true, they usually aren’t. And that goes for econometrics too. The snag is that there is precious little to support the perfect specification assumption. Looking around in social science and economics, we don’t find a single regression or econometric model that lives up to the standards set by the ‘true’ theoretical model — and there is precious little that gives us reason to believe things will be different in the future.

To think that we are able to construct a model in which all relevant variables are included and the functional relationships between them are correctly specified is not only a belief without support but a belief *impossible* to support.

The theories we work with when building our econometric regression models are insufficient. No matter what we study, there are always some variables missing, and we don’t know the correct way to functionally specify the relationships between the variables.

*Every* regression model constructed is misspecified. There is always an endless list of possible variables to include and endless possible ways to specify the relationships between them. So every applied econometrician comes up with his own specification and ‘parameter’ estimates. The econometric Holy Grail of consistent and stable parameter values is nothing but a dream.

In order to draw inferences from data as described by econometric texts, it is necessary to make whimsical assumptions. The professional audience consequently and properly withholds belief until an inference is shown to be adequately insensitive to the choice of assumptions. The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable. If we are to make effective use of our scarce data resource, it is therefore important that we study fragility in a much more systematic way. If it turns out that almost all inferences from economic data are fragile, I suppose we shall have to revert to our old methods …
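Leamer’s fragility point is easy to demonstrate on synthetic data. In the sketch below (the data-generating process and all variable names are invented for illustration), the estimated ‘effect’ of x swings with the set of controls included, so the inference is only as credible as the specification choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z1 = rng.normal(size=n)                    # one potential control
z2 = 0.8 * z1 + rng.normal(size=n)         # a second, correlated control
x = 0.9 * z1 + rng.normal(size=n)          # regressor of interest
y = 1.0 * x + 1.5 * z1 - 1.0 * z2 + rng.normal(size=n)  # true effect of x: 1.0

def beta_x(controls):
    """OLS of y on x plus the given controls; return the coefficient on x."""
    X = np.column_stack([np.ones(n), x] + controls)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

betas = {"no controls": beta_x([]),
         "z1 only": beta_x([z1]),
         "z2 only": beta_x([z2]),
         "z1 and z2": beta_x([z1, z2])}
for spec, b in betas.items():
    print(f"{spec:11s}: beta_x = {b:+.2f}")
```

Each applied econometrician picks one of these specifications and reports ‘the’ parameter estimate — the spread across specifications is precisely the fragility Leamer warns about.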

A rigorous application of econometric methods in economics really presupposes that the phenomena of our real-world economies are ruled by stable causal relations between variables. Parameter values estimated in specific spatio-temporal contexts are *presupposed* to be exportable to totally different contexts. To warrant this assumption one, however, has to convincingly establish that the targeted acting causes are stable and invariant so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.

The theoretical conditions that have to be fulfilled for regression analysis and econometrics to really work are nowhere even closely met in reality. Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science and economics. Although regression analysis and econometrics have become the most used quantitative methods in social sciences and economics today, it’s still a fact that the inferences made from them are invalid.

Econometrics — and regression analysis — is basically a deductive method. Given the assumptions (such as manipulability, transitivity, separability, additivity, linearity, etc.), it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Conclusions can only be as certain as their premises — and that also applies to econometrics and regression analysis.

## Some pitfalls of using randomization and instrumental variables

17 Apr, 2023 at 11:08 | Posted in Statistics & Econometrics | Comments Off on Some pitfalls of using randomization and instrumental variables.

Great presentation, but I think Angrist should have also mentioned that although ‘ideally controlled experiments’ may tell us with certainty what causes what effects, this is so only when given the right ‘closures.’ Making appropriate extrapolations from — ideal, accidental, natural or quasi — experiments to different settings, populations or target systems is not easy. “It works there” is no evidence for “it will work here.” The causal background assumptions made have to be justified, and without licenses to export, the value of the ‘rigorous’ and ‘precise’ methods used when analyzing ‘natural experiments’ is often despairingly small.

Since the core assumptions on which IV analysis builds are NEVER directly testable, those of us who choose to use instrumental variables to find out about causality ALWAYS have to defend and argue for the validity of the assumptions the causal inferences build on. Especially when dealing with natural experiments, we should be very cautious when presented with causal conclusions without convincing arguments about the veracity of the assumptions made. If you are out to make causal inferences, you have to rely on a trustworthy theory of the data-generating process. The empirical results causal analysis supplies us with are only as good as the assumptions we make about the data-generating process.

It also needs to be pointed out that many economists who use instrumental variables analysis make the mistake of thinking that swapping the assumption that the residuals are uncorrelated with the independent variables for the assumption that the same residuals are uncorrelated with an instrument solves the endogeneity problem and improves the causal analysis.

The present interest in randomization, instrumental variables estimation, and natural experiments is an expression of a new trend in economics, with a growing interest in (ideal, quasi, natural) experiments and — not least — in how to design them so as to provide answers to questions about causality and policy effects. Economic research on e.g. discrimination nowadays often emphasizes the importance of a randomization design, for example when trying to determine to what extent discrimination can be causally attributed to differences in preferences or information, using so-called correspondence tests and field experiments.

A common starting point is the ‘counterfactual approach’ developed mainly by Neyman and Rubin, which is often presented and discussed based on examples of research designs like randomized control studies, natural experiments, difference in difference, matching, regression discontinuity, etc.

Mainstream economists generally view this development of the economics toolbox positively. Since yours truly — like, for example, Nancy Cartwright and Angus Deaton — is not entirely positive about the randomization approach, I will share with you some of my criticisms.

A notable limitation of counterfactual randomization designs is that they only give us answers on how ‘treatment groups’ differ on average from ‘control groups.’ Let me give an example to illustrate how limiting this fact can be:

Among school debaters and politicians in Sweden, it is claimed that so-called ‘independent schools’ (charter schools) are better than municipal schools. They are said to lead to better results. To find out if this is really the case, a number of students are randomly selected to take a test. The result could be: Test result = 20 + 5T, where T=1 if the student attends an independent school and T=0 if the student attends a municipal school. This would confirm the assumption that independent school students have on average 5 points higher results than students in municipal schools. Now, politicians (hopefully) are aware that this statistical result cannot be interpreted in causal terms because independent school students typically do not have the same background (socio-economic, educational, cultural, etc.) as those who attend municipal schools (the relationship between school type and result is confounded by selection bias).

To obtain a better measure of the causal effects of school type, politicians suggest that admission to an independent school be decided for 1000 students through a lottery — a classic example of a randomization design in natural experiments. The chance of winning is 10%, so 100 students are given this opportunity. Of these, 20 accept the offer to attend an independent school. Of the 900 lottery participants who do not ‘win,’ 100 choose to attend an independent school anyway. The lottery is often perceived by school researchers as an ‘instrumental variable,’ and when the analysis is carried out, the result is: Test result = 20 + 2T. This is standardly interpreted as a causal measure of how much better students would, on average, perform on the test if they chose to attend independent schools instead of municipal schools. But is it true? No!

Unless all students are affected in exactly the same way by attending an independent school (a rather far-fetched ‘homogeneity assumption’), the estimated average causal effect only applies to the students who choose to attend an independent school if they ‘win’ the lottery, but who would not otherwise choose to attend one (in statistical jargon, these are the ‘compliers’). It is difficult to see why this group of students would be particularly interesting in this example, given that the average causal effect estimated using the instrumental variable says nothing at all about the effect on the majority (the 100 out of 120 who choose to attend an independent school without ‘winning’ the lottery) of those who choose to attend an independent school.
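The arithmetic of the lottery example can be checked with a small simulation. The compliance shares and the individual effects below (5 points for those who attend regardless of the lottery, 2 points for compliers) are invented to mimic the example’s heterogeneity — the point is that the Wald/IV estimate recovers the average effect for compliers only:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Compliance types: 0 = always-takers (attend regardless of the lottery),
# 1 = compliers (attend only if they win), 2 = never-takers.
u = rng.random(n)
typ = np.where(u < 0.11, 0, np.where(u < 0.20, 1, 2))
effect = np.array([5.0, 2.0, 0.0])[typ]          # hypothetical individual effects

z = (rng.random(n) < 0.10).astype(int)           # lottery win
t = ((typ == 0) | ((typ == 1) & (z == 1))).astype(int)  # actual attendance
y = 20.0 + effect * t + rng.normal(0, 1, n)      # test result

# Wald/IV estimate: reduced-form effect divided by first-stage effect
wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())
print(f"IV (Wald) estimate: {wald:.2f}")  # close to 2, the complier effect, not 5
```

The always-takers’ much larger effect never shows up in the estimate, even though they form the majority of independent-school attendees.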

Conclusion: Researchers must be much more careful in interpreting ‘average estimates’ as causal. Reality exhibits a high degree of heterogeneity, and ‘average parameters’ often tell us very little!

To randomize ideally means that we achieve orthogonality (independence) in our models. But it does not mean that in real experiments when we randomize, we achieve this ideal. The ‘balance’ that randomization should ideally result in cannot be taken for granted when the ideal is translated into reality. Here, one must argue and verify that the ‘assignment mechanism’ is truly stochastic and that ‘balance’ has indeed been achieved!
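A quick synthetic sketch of the point (the covariate and the sample size are invented): even with a perfectly stochastic assignment mechanism, any single randomization of a modest sample can leave the groups clearly unbalanced on a covariate, which is why balance has to be checked rather than assumed.

```python
import random
import statistics

random.seed(7)
ability = [random.gauss(0, 1) for _ in range(50)]  # a pre-treatment covariate

# Draw many hypothetical random assignments of 25 of the 50 units to
# treatment and record the covariate gap between the two groups.
gaps = []
for _ in range(1000):
    treated = set(random.sample(range(50), 25))
    t_mean = statistics.mean(ability[i] for i in treated)
    c_mean = statistics.mean(ability[i] for i in range(50) if i not in treated)
    gaps.append(abs(t_mean - c_mean))

share_unbalanced = sum(g > 0.25 for g in gaps) / len(gaps)
print(f"share of assignments with covariate gap > 0.25: {share_unbalanced:.0%}")
```

Randomization guarantees balance only in expectation, over many hypothetical assignments — never in the single assignment we actually observe.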

Even if we accept the limitation of only being able to say something about average treatment effects, there is another theoretical problem. An ideal randomized experiment assumes that a number of individuals are first randomly selected from a population and then randomly assigned to a treatment group or a control group. Given that both selection and assignment are successfully carried out randomly, it can be shown that the expected outcome difference between the two groups is the average causal effect in the population. The snag is that the experiments conducted almost never involve participants selected at random from a population! In most cases, experiments are started because there is a problem of some kind in a given population (e.g., schoolchildren or job seekers in country X) that one wants to address. An ideal randomized experiment assumes that both selection and assignment are randomized — which means that virtually none of the empirical results that randomization advocates so eagerly tout hold up in a strict mathematical-statistical sense. The fact that only assignment is talked about when it comes to ‘as if’ randomization in natural experiments is hardly a coincidence. Moreover, when it comes to natural experiments, the sad but inevitable fact is that there can always be a dependency between the variables being studied and unobservable factors in the error term — a dependency that can never be tested!

Another significant problem is that researchers who use these randomization-based research strategies often, in order to achieve ‘exact’ and ‘precise’ results, set up problem formulations that are not at all the ones we really want answers to. Design becomes the main thing, and as long as one can get more or less clever experiments in place, they believe they can draw far-reaching conclusions about both causality and the ability to generalize experimental outcomes to larger populations. Unfortunately, this often means that this type of research is biased away from interesting and important problems towards those that happen to suit the chosen method. Design and research planning are important, but the credibility of research ultimately lies in being able to provide answers to relevant questions that both citizens and researchers want answers to.

Believing there is only one really good evidence-based method on the market — and that randomization is the only way to achieve scientific validity — blinds people to searching for and using other methods that in many contexts are better. Insisting on using only *one* tool often means using the *wrong* tool.

## Bayes theorem — what’s the big deal?

3 Apr, 2023 at 18:25 | Posted in Statistics & Econometrics | 2 Comments

There’s nothing magical about Bayes’ theorem. It boils down to the truism that your belief is only as valid as its evidence. If you have good evidence, Bayes’ theorem can yield good results. If your evidence is flimsy, Bayes’ theorem won’t be of much use. Garbage in, garbage out.

The potential for Bayes abuse begins with your initial estimate of the probability of your belief, often called the “prior” …

In many cases, estimating the prior is just guesswork, allowing subjective factors to creep into your calculations. You might be guessing the probability of something that — unlike cancer — does not even exist, such as strings, multiverses, inflation or God. You might then cite dubious evidence to support your dubious belief. In this way, Bayes’ theorem can promote pseudoscience and superstition as well as reason.

Embedded in Bayes’ theorem is a moral message: If you aren’t scrupulous in seeking alternative explanations for your evidence, the evidence will just confirm what you already believe. Scientists often fail to heed this dictum, which helps explain why so many scientific claims turn out to be erroneous. Bayesians claim that their methods can help scientists overcome confirmation bias and produce more reliable results, but I have my doubts.

And as I mentioned above, some string and multiverse enthusiasts are embracing Bayesian analysis. Why? Because the enthusiasts are tired of hearing that string and multiverse theories are unfalsifiable and hence unscientific, and Bayes’ theorem allows them to present the theories in a more favorable light. In this case, Bayes’ theorem, far from counteracting confirmation bias, enables it.

One of yours truly’s favourite ‘problem situating lecture arguments’ against Bayesianism goes something like this: Assume you’re a Bayesian turkey and hold a nonzero probability belief in the hypothesis H that “people are nice vegetarians that do not eat turkeys and that every day I see the sun rise confirms my belief.” For every day you survive, you update your belief according to Bayes’ Rule

P(H|e) = [P(e|H)P(H)]/P(e),

where evidence e stands for “not being eaten” and P(e|H) = 1. Given that there do exist other hypotheses than H, P(e) is less than 1 and *a fortiori* P(H|e) is greater than P(H). Every day you survive increases your probability belief that you will not be eaten. This is totally rational according to the Bayesian definition of rationality. Unfortunately — as Bertrand Russell famously noticed — for every day that goes by, the traditional Christmas dinner also gets closer and closer …
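The turkey’s updating is easy to reproduce numerically. In the sketch below, the prior and the likelihood of surviving under the rival hypothesis are made-up illustrative numbers; what matters is that the posterior rises monotonically every survived day, right up until the Christmas dinner:

```python
p_H = 0.5              # prior for H: "people are nice vegetarians"
p_e_given_H = 1.0      # under H, the turkey always survives the day
p_e_given_notH = 0.99  # under not-H, it still survives most days

history = [p_H]
for day in range(300):
    p_e = p_e_given_H * p_H + p_e_given_notH * (1 - p_H)
    p_H = p_e_given_H * p_H / p_e   # Bayes' rule, evidence e = "not eaten today"
    history.append(p_H)

print(f"P(H) after 300 days of survival: {p_H:.3f}")
```

The turkey’s confidence approaches certainty precisely as the decisive refutation approaches — perfectly coherent updating, catastrophically wrong belief.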

Neoclassical economics nowadays usually assumes that agents that have to make choices under conditions of uncertainty behave according to Bayesian rules (preferably the ones axiomatized by **Ramsey** (1931), **de Finetti** (1937) or **Savage** (1954)) – that is, they maximize expected utility with respect to some subjective probability measure that is continually updated according to Bayes’ theorem. If not, they are supposed to be irrational, and ultimately – via some “Dutch book” or “money pump” argument – susceptible to being ruined by some clever “bookie”.

Bayesianism reduces questions of rationality to questions of internal consistency (coherence) of beliefs, but – even granted this questionable reductionism – do rational agents really have to be Bayesian? As I have been arguing repeatedly over the years, there is no strong warrant for believing so.

In many of the situations that are relevant to economics, one could argue that there is simply not enough adequate and relevant information to ground beliefs of a probabilistic kind and that in those situations it is not really possible, in any relevant way, to represent an individual’s beliefs in a single probability measure.

Say you have come to learn (based on your own experience and tons of data) that the probability of you becoming unemployed in the US is 10%. Having moved to another country (where you have no experience of your own and no data), you have no information on unemployment and a fortiori nothing on which to base any probability estimate. A Bayesian would, however, argue that you would have to assign probabilities to the mutually exclusive alternative outcomes and that these have to add up to 1 if you are rational. That is, in this case – and based on symmetry – a rational individual would have to assign a probability of 50% to becoming unemployed and 50% to becoming employed.

That feels intuitively wrong, though, and I guess most people would agree. Bayesianism cannot distinguish symmetry-based probabilities that reflect genuine information from symmetry-based probabilities that merely reflect an absence of information. In these kinds of situations, most of us would rather say that it is simply irrational to be a Bayesian and better instead to admit that we “simply do not know” or that we feel ambiguous and undecided. Arbitrary and ungrounded probability claims are more irrational than being undecided in the face of genuine uncertainty, so if there is not sufficient information to ground a probability distribution it is better to acknowledge that simpliciter, rather than pretending to possess a certitude that we simply do not possess.

I think this critique of Bayesianism is in accordance with the views of **Keynes**’ *A Treatise on Probability* (1921) and *General Theory* (1937). According to Keynes, we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but rational expectations. Sometimes we “simply do not know.” Keynes would not have accepted the view of Bayesian economists, according to whom expectations “tend to be distributed, for the same information set, about the prediction of the theory.” Keynes, rather, thinks that we base our expectations on the confidence or “weight” we put on different events and alternatives. To Keynes, expectations are a question of weighing probabilities by “degrees of belief,” beliefs that have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents modeled by Bayesian economists.

The bias toward the superficial and the response to extraneous influences on research are both examples of real harm done in contemporary social science by a roughly Bayesian paradigm of statistical inference as the epitome of empirical argument. For instance, the dominant attitude toward the sources of black-white differential in United States unemployment rates (routinely the rates are in a two to one ratio) is “phenomenological.” The employment differences are traced to correlates in education, locale, occupational structure, and family background. The attitude toward further, underlying causes of those correlations is agnostic … Yet on reflection, common sense dictates that racist attitudes and institutional racism *must* play an important causal role. People do have beliefs that blacks are inferior in intelligence and morality, and they are surely influenced by these beliefs in hiring decisions … Thus, an overemphasis on Bayesian success in statistical inference discourages the elaboration of a type of account of racial disadvantages that almost certainly provides a large part of their explanation.

## ‘Severe tests’ of causal claims

26 Mar, 2023 at 21:29 | Posted in Statistics & Econometrics | Comments Off on ‘Severe tests’ of causal claims

For many questions in the social sciences, a research design guaranteeing the validity of causal inferences is difficult to obtain. When this is the case, researchers can attempt to defend hypothesized causal relationships by seeking data that subjects their theory to repeated falsification. Karl Popper famously argued that the degree to which we have confidence in a hypothesis is not necessarily a function of the number of tests it has withstood, but rather the severity of the tests to which the hypothesis has been subjected. A test of a hypothesis with a design susceptible to hidden bias is not particularly severe or determinative. If the implication is tested in many contexts, however, with different designs that have distinct sources of bias, and the hypothesis is still not rejected, then one may have more confidence that the causal relationship is genuine. Note that repeatedly testing a hypothesis with research designs suffering from similar types of bias does not constitute a severe test, since each repetition will merely replicate the biases of the original design. In cases where randomized experiments are infeasible or credible natural experiments are unavailable, the inferential difficulties facing researchers are large. In such circumstances, only creative and severe falsification tests can make the move from correlation to causation convincing.

## Assumption uncertainty

21 Mar, 2023 at 14:31 | Posted in Statistics & Econometrics | Comments Off on Assumption uncertainty

An ongoing concern is that excessive focus on formal modeling and statistics can lead to neglect of practical issues and to overconfidence in formal results … Analysis interpretation depends on contextual judgments about how reality is to be mapped onto the model, and how the formal analysis results are to be mapped back into reality. But overconfidence in formal outputs is only to be expected when much labor has gone into deductive reasoning. First, there is a need to feel the labor was justified, and one way to do so is to believe the formal deduction produced important conclusions. Second, there seems to be a pervasive human aversion to uncertainty, and one way to reduce feelings of uncertainty is to invest faith in deduction as a sufficient guide to truth. Unfortunately, such faith is as logically unjustified as any religious creed, since a deduction produces certainty about the real world only when its assumptions about the real world are certain …

Unfortunately, assumption uncertainty reduces the status of deductions and statistical computations to exercises in hypothetical reasoning – they provide best-case scenarios of what we could infer from specific data (which are assumed to have only specific, known problems). Even more unfortunate, however, is that this exercise is deceptive to the extent it ignores or misrepresents available information, and makes hidden assumptions that are unsupported by data …

Despite assumption uncertainties, modelers often express only the uncertainties derived within their modeling assumptions, sometimes to disastrous consequences. Econometrics supplies dramatic cautionary examples in which complex modeling has failed miserably in important applications …

Much time should be spent explaining the full details of what statistical models and algorithms actually assume, emphasizing the extremely hypothetical nature of their outputs relative to a complete (and thus nonidentified) causal model for the data-generating mechanisms. Teaching should especially emphasize how formal ‘‘causal inferences’’ are being driven by the assumptions of randomized (‘‘ignorable’’) system inputs and random observational selection that justify the ‘‘causal’’ label.

Yes, indeed, complex modeling when applying statistics theory fails miserably over and over again. One reason why it does — prominent in econometrics — is that the error term in the regression models standardly used is thought of as representing the effect of the variables that were omitted from the models. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model and necessary to include to ‘save’ the assumed deterministic relation between the other random variables included in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since they are unobservable, they are also impossible to empirically test. And without justification of the orthogonality assumption, there is as a rule nothing to ensure identifiability:
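The untestability point can be made concrete with a synthetic sketch (the data-generating process below is invented for illustration): an omitted variable swept into the error term and correlated with the regressor biases OLS, while the fitted residuals are orthogonal to the regressor by construction — so the orthogonality assumption cannot be checked from the fit itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
z = rng.normal(size=n)                       # omitted variable, swept into the error term
x = z + rng.normal(size=n)                   # regressor, correlated with z
y = 1.0 * x + 1.0 * z + rng.normal(size=n)   # true coefficient on x is 1.0

b = np.polyfit(x, y, 1)[0]                   # simple OLS slope of y on x
resid = (y - y.mean()) - b * (x - x.mean())  # fitted residuals

print(f"OLS slope: {b:.2f} (true value 1.0)")
print(f"corr(x, fitted residuals): {np.corrcoef(x, resid)[0, 1]:.4f}")
print(f"corr(x, omitted z):        {np.corrcoef(x, z)[0, 1]:.2f}")
```

The residuals look ‘clean’ no matter how badly the orthogonality assumption is violated, because OLS makes them orthogonal to x mechanically; the violation lives in the unobservable z, which the fitted model never sees.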

With enough math, an author can be confident that most readers will never figure out where a FWUTV (facts with unknown truth value) is buried. A discussant or referee cannot say that an identification assumption is not credible if they cannot figure out what it is and are too embarrassed to ask.

Distributional assumptions about error terms are a good place to bury things because hardly anyone pays attention to them. Moreover, if a critic does see that this is the identifying assumption, how can she win an argument about the true expected value of the level of aether? If the author can make up an imaginary variable, “because I say so” seems like a pretty convincing answer to any question about its properties.

## On the limited value of randomization

18 Mar, 2023 at 19:50 | Posted in Statistics & Econometrics | Comments Off on On the limited value of randomization

In *Social Science and Medicine* (December 2017), Angus Deaton & Nancy Cartwright argue that Randomized Controlled Trials (RCTs) do not have any warranted special status. They are, simply, far from being the ‘gold standard’ they are usually portrayed as:

Contrary to frequent claims in the applied literature, randomization does not equalize everything other than the treatment in the treatment and control groups, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) covariates. Finding out whether an estimate was generated by chance is more difficult than commonly believed. At best, an RCT yields an unbiased estimate, but this property is of limited practical value. Even then, estimates apply only to the sample selected for the trial, often no more than a convenience sample, and justification is required to extend the results to other groups, including any population to which the trial sample belongs, or to any individual, including an individual in the trial. Demanding ‘external validity’ is unhelpful because it expects too much of an RCT while undervaluing its potential contribution. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon, not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not ‘what works’, but ‘why things work’.

In a comment on Deaton & Cartwright, statistician Stephen Senn argues that on several issues concerning randomization Deaton & Cartwright “simply confuse the issue,” that their views are “simply misleading and unhelpful” and that they make “irrelevant” simulations:

My view is that randomisation should not be used as an excuse for ignoring what is known and observed but that it does deal validly with hidden confounders. It does not do this by delivering answers that are guaranteed to be correct; nothing can deliver that. It delivers answers about which valid probability statements can be made and, in an imperfect world, this has to be good enough. Another way I sometimes put it is like this: show me how you will analyse something and I will tell you what allocations are exchangeable. If you refuse to choose one at random I will say, “why? Do you have some magical thinking you’d like to share?”

Contrary to Senn, Andrew Gelman shares Deaton’s and Cartwright’s view that randomized trials often are overrated:

There is a strange form of reasoning we often see in science, which is the idea that a chain of reasoning is as strong as its strongest link. The social science and medical research literature is full of papers in which a randomized experiment is performed, a statistically significant comparison is found, and then story time begins, and continues, and continues—as if the rigor from the randomized experiment somehow suffuses through the entire analysis …

One way to get a sense of the limitations of controlled trials is to consider the conditions under which they can yield meaningful, repeatable inferences. The measurement needs to be relevant to the question being asked; missing data must be appropriately modeled; any relevant variables that differ between the sample and population must be included as potential treatment interactions; and the underlying effect should be large. It is difficult to expect these conditions to be satisfied without good substantive understanding. As Deaton and Cartwright put it, “when little prior knowledge is available, no method is likely to yield well-supported conclusions.” Much of the literature in statistics, econometrics, and epidemiology on causal identification misses this point, by focusing on the procedures of scientific investigation—in particular, tools such as randomization and p-values which are intended to enforce rigor—without recognizing that rigor is empty without something to be rigorous about.

Many social scientists nowadays maintain that ‘imaginative empirical methods’ — such as natural experiments, field experiments, lab experiments and RCTs — can help us answer questions concerning the external validity of models used in the social sciences. In their view, these methods are more or less tests of ‘an underlying model’ that enable them to make the right selection from the ever-expanding ‘collection of potentially applicable models.’ Yours truly’s view, however, is that when looked at carefully, there are in fact few real reasons to share this optimism.

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions, and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific labs/experiments/fields to the specific real-world situations/institutions/structures that we are interested in understanding or (causally) explaining. And then the population problem is more difficult to tackle.

In randomized trials the researchers try to find out the causal effects that different variables of interest may have by changing circumstances randomly — a procedure somewhat (‘on average’) equivalent to the usual *ceteris paribus* assumption.

Besides the fact that ‘on average’ is not always ‘good enough,’ it amounts to nothing but hand waving to simpliciter assume, without argumentation, that it is tenable to treat social agents and relations as homogeneous and interchangeable entities.

Randomization is used basically to allow the econometrician to treat the population as consisting of ‘exchangeable’ and homogeneous groups (‘treatment’ and ‘control’). The regression models one arrives at by using randomized trials tell us the average effect that variations in variable X have on the outcome variable Y, without having to explicitly control for the effects of other explanatory variables R, S, T, etc. Everything is assumed to be essentially equal except the values taken by variable X. But as noted by Jerome Cornfield, even if one of the main functions of randomization is to generate a sample space, there are

reasons for questioning the basic role of the sample space, i.e., of variations from sample to sample, in statistical theory. In practice, certain unusual samples would ordinarily be modified, adjusted or entirely discarded, if they in fact were obtained, even though they are part of the basic description of sampling variation. Savage reports that Fisher, when asked what he would do with a randomly selected Latin Square that turned out to be a Knut Vik Square, replied that “he thought he would draw again and that, ideally, a theory explicitly excluding regular squares should be developed.” But this option is not available in clinical trials and undesired baseline imbalances between treated and control groups can occur. There is often no alternative to reweighting or otherwise adjusting for these imbalances.
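Cornfield’s point about undesired baseline imbalances is easy to illustrate numerically. The sketch below — plain Python with made-up sample sizes and covariates, not any particular trial — re-randomizes a small study many times and counts how often at least one baseline covariate ends up noticeably imbalanced between the two groups:

```python
import random
import statistics

random.seed(42)

def max_imbalance(n=50, n_covariates=10):
    """Randomly allocate n subjects to two equal groups and return the largest
    absolute difference in baseline-covariate means between the groups."""
    # Each subject gets standard-normal scores on several baseline covariates.
    subjects = [[random.gauss(0, 1) for _ in range(n_covariates)]
                for _ in range(n)]
    random.shuffle(subjects)  # the random allocation
    treated, control = subjects[:n // 2], subjects[n // 2:]
    diffs = [abs(statistics.mean(s[j] for s in treated)
                 - statistics.mean(s[j] for s in control))
             for j in range(n_covariates)]
    return max(diffs)

# Over many re-randomizations, see how often at least one covariate is
# imbalanced by more than half a standard deviation at baseline.
trials = [max_imbalance() for _ in range(2000)]
share_imbalanced = sum(t > 0.5 for t in trials) / len(trials)
print(f"Share of randomizations with a >0.5 SD baseline imbalance: "
      f"{share_imbalanced:.2f}")
```

With these (hypothetical) numbers, a sizeable share of perfectly valid randomizations produces at least one marked imbalance — which is exactly why, as Cornfield notes, there is often no alternative to reweighting or otherwise adjusting for it.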

In a usual regression context, one would apply an ordinary least squares estimator (OLS) in trying to get an unbiased and consistent estimate:

Y = α + βX + ε,

where α is a constant intercept, β a constant ‘structural’ causal effect and ε an error term.

The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. We may get the right answer that the average causal effect is 0, even though those who are ‘treated’ (X = 1) have causal effects equal to −100 and those ‘not treated’ (X = 0) have causal effects equal to 100. Contemplating whether to be treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the OLS average effect particularly enlightening.
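A minimal simulation — with hypothetical numbers, in the spirit of the ±100 example above — shows how the difference in means that OLS recovers can sit at zero while every single individual effect is large:

```python
import random
import statistics

random.seed(1)

# A hypothetical population in which the average causal effect is exactly
# zero, but every individual effect is large: +100 for one half of the
# population, -100 for the other half.
n = 10_000
effects = [100] * (n // 2) + [-100] * (n // 2)
random.shuffle(effects)

# Randomly assign treatment; the observed outcome is baseline noise
# plus the individual causal effect if (and only if) treated.
data = []
for tau in effects:
    x = random.random() < 0.5                    # treatment indicator
    y = random.gauss(0, 1) + (tau if x else 0)   # observed outcome
    data.append((x, y))

treated = [y for x, y in data if x]
control = [y for x, y in data if not x]

# The difference in means -- what OLS of Y on X estimates here -- is close
# to zero, even though no individual has an effect anywhere near zero.
ate_hat = statistics.mean(treated) - statistics.mean(control)
print(f"Estimated average causal effect: {ate_hat:.2f}")
```

The regression ‘works’ in the sense of recovering the average, but the average is precisely the quantity that hides what any individual contemplating treatment would want to know.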

Limiting model assumptions in science always have to be closely examined. If we are to show that the mechanisms or causes we isolate and handle in our models are stable — in the sense that they do not change when we ‘export’ them to our ‘target systems’ — we have to be able to show that they do not hold only under *ceteris paribus* conditions, for then they are a fortiori of limited value for our understanding, explanation or prediction of real-world systems.

Most ‘randomistas’ underestimate the heterogeneity problem. It does not just turn up as an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal problem to the millions of regression estimates that are produced every year.

Just like econometrics, randomization promises more than it can deliver, basically because it requires assumptions that in practice are not possible to maintain. And just like econometrics, randomization is essentially a deductive method. Given the assumptions, these methods deliver deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. And although randomization may contribute to controlling for confounding, it does not guarantee it, since genuine randomness presupposes infinite experimentation and we know all real experimentation is finite. And even if randomization may help to establish average causal effects, it says nothing of individual effects unless homogeneity is added to the list of assumptions. Causal evidence generated by randomization procedures may be valid in ‘closed’ models, but what we usually are interested in is causal evidence in the real-world target system we happen to live in.

Some statisticians and data scientists think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like ‘faithfulness,’ ‘exchangeability,’ or ‘stability’ is not to give proof. It’s to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived theorems and the real world, well, then we haven’t really obtained the causation we are looking for.

As social scientists, we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting a domain of probability theory and sample space of infinite populations also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.
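The point about infinite repetitions can be made concrete with a toy simulation (invented parameters throughout): the ‘95%’ attached to a standard confidence interval is a property of an imaginary endless run of samplings — approximated below by ten thousand of them — and not of the single sample any actual study gets to observe.

```python
import random
import statistics

random.seed(0)

def ci_covers(true_mean=5.0, n=30):
    """Draw one sample, form a rough normal-approximation 95% interval for
    the mean, and report whether it covers the true value."""
    sample = [random.gauss(true_mean, 2.0) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return (m - 1.96 * se) <= true_mean <= (m + 1.96 * se)

# The '95% coverage' guarantee refers to this hypothetical sequence of
# repeated samplings from the same 'probability machine' -- something no
# real-world study ever carries out.
coverage = sum(ci_covers() for _ in range(10_000)) / 10_000
print(f"Long-run coverage over repetitions: {coverage:.3f}")
```

The simulation can verify the long-run frequency only because the ‘probability machine’ — the data-generating process — is stipulated inside the program; for real-world socio-economic data, that is exactly what we do not have.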

And as if this wasn’t enough, one could also seriously wonder what kind of ‘populations’ many statistical models ultimately are based on. Why should we as social scientists — and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems — unquestioningly accept models based on concepts like the ‘infinite super populations’ used in e.g. the ‘potential outcome’ framework that has become so popular lately in social sciences?

Modelling assumptions made in statistics are more often than not made for mathematical tractability reasons, rather than verisimilitude. That is unfortunately also a reason why the methodological ‘rigour’ encountered when taking part of statistical research often is deceptive. The models constructed may seem technically advanced and very ‘sophisticated,’ but that’s usually only because the problems here discussed have been swept under the carpet. Assuming that our data are generated by ‘coin flips’ in an imaginary ‘superpopulation’ only means that we get answers to questions that we are not asking. The inferences made based on imaginary ‘superpopulations,’ well, they too are nothing but imaginary.

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. ‘It works there’ is no evidence for ‘it will work here’. Causes deduced in an experimental setting still have to show that they come with an export warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is often despairingly small.

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science …

Indeed, far-reaching claims have been made for the superiority of a quantitative template that depends on modeling — by those who manage to ignore the far-reaching assumptions behind the models. However, the assumptions often turn out to be unsupported by data. If so, the rigor of advanced quantitative methods is a matter of appearance rather than substance …

David A. Freedman, *Statistical Models and Causal Inference*

## Does randomization control for ‘lack of balance’?

16 Mar, 2023 at 16:40 | Posted in Statistics & Econometrics | 2 Comments

Mike Clarke, the Director of the Cochrane Centre in the UK, for example, states on the Centre’s Web site: ‘In a randomized trial, the only difference between the two groups being compared is that of most interest: the intervention under investigation’.

This seems clearly to constitute a categorical assertion that by randomizing, all other factors — both known and unknown — are equalized between the experimental and control groups; hence the only remaining difference is exactly that one group has been given the treatment under test, while the other has been given either a placebo or conventional therapy; and hence any observed difference in outcome between the two groups in a randomized trial (but only in a randomized trial) must be the effect of the treatment under test.

Clarke’s claim is repeated many times elsewhere and is widely believed. It is admirably clear and sharp, but it is clearly unsustainable … Clearly the claim taken literally is quite trivially false: the experimental group contains Mrs Brown and not Mr Smith, whereas the control group contains Mr Smith and not Mrs Brown, etc. Some restriction on the range of differences being considered is obviously implicit here; and presumably the real claim is something like that the two groups have the same means and distributions of all the [causally?] relevant factors. Although this sounds like a meaningful claim, I am not sure whether it would remain so under analysis … And certainly, even with respect to a given (finite) list of potentially relevant factors, no one can really believe that it automatically holds in the case of any particular randomized division of the subjects involved in the study. Although many commentators often seem to make the claim … no one seriously thinking about the issues can hold that randomization is a *sufficient* condition for there to be no difference between the two groups that may turn out to be relevant …

In sum, despite what is often said and written, no one can seriously believe that having randomized is a sufficient condition for a trial result to be reasonably supposed to reflect the true effect of some treatment. Is randomizing a *necessary* condition for this? That is, is it true that we cannot have real evidence that a treatment is genuinely effective unless it has been validated in a properly randomized trial? Again, some people in medicine sometimes talk as if this were the case, but again no one can seriously believe it. Indeed, as pointed out earlier, modern medicine would be in a terrible state if it were true. As already noted, the overwhelming majority of all treatments regarded as unambiguously effective by modern medicine today — from aspirin for mild headache through diuretics in heart failure and on to many surgical procedures — were never (and now, let us hope, never will be) ‘validated’ in an RCT.

For more on the question of ‘balance’ in randomized experiments, this collection of papers in *Social Science & Medicine* gives many valuable insights.
