Science before statistics — causal inference

10 Sep, 2021 at 12:20 | Posted in Statistics & Econometrics | Leave a comment


Probability and rationality — trickier than most people think

26 Aug, 2021 at 18:42 | Posted in Statistics & Econometrics | 5 Comments

The Coin-tossing Problem

My friend Ben says that on the first day he got the following sequence of Heads and Tails when tossing a coin:

And on the second day he says that he got the following sequence:

Which report makes you suspicious?

Most people yours truly asks this question say that the first report looks suspicious.

But actually both reports are equally probable! Every time you toss a (fair) coin, the probability of getting H or T is the same (50%). Ben makes the same number of tosses on both days, and every specific sequence is equally probable!
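To see why, note that a fair coin assigns every specific length-n sequence exactly the same probability, 0.5^n. A minimal sketch (the sequence length 10 is hypothetical, since Ben's actual tosses are shown as images that are not reproduced here):

```python
from itertools import product

n = 10  # hypothetical sequence length
p = 0.5 ** n  # probability of any one specific sequence, "patterned" or not

# Enumerate all 2**n possible sequences: each is equally probable,
# and the probabilities sum to 1.
total = sum(p for _ in product("HT", repeat=n))
print(p, total)  # 0.0009765625 1.0
```

The "suspicious-looking" sequence and the "random-looking" one each have probability 1/1024; what differs is only how representative of randomness they look.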

The Linda Problem

Linda is 40 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which of the following two alternatives is more probable?

A. Linda is a bank teller.
B. Linda is a bank teller and active in the feminist movement.

‘Rationally,’ alternative B cannot be more likely than alternative A. Nonetheless Amos Tversky and Daniel Kahneman reported — ‘Judgments of and by representativeness.’ In D. Kahneman, P. Slovic & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge, UK: Cambridge University Press 1982 — that more than 80 per cent of respondents said that it was.

Why do we make such ‘irrational’ judgments in both these cases? Tversky and Kahneman argued that in making this kind of judgment we seek the closest resemblance between causes and effects (in The Linda Problem, between Linda’s personality and her behaviour), rather than calculating probability, and that this makes alternative B seem preferable. By using a heuristic called representativeness, statement B in The Linda Problem seems more ‘representative’ of Linda based on the description of her, although from a probabilistic point of view it is clearly less likely.
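The conjunction rule behind the Linda problem, P(A and B) = P(A)·P(B|A) ≤ P(A), can be checked with a toy simulation. The marginal probabilities below are made up purely for illustration; only the inequality matters:

```python
import random

random.seed(0)
trials = 100_000

teller = 0
teller_and_feminist = 0
for _ in range(trials):
    # Hypothetical, invented probabilities for the two traits.
    is_teller = random.random() < 0.05
    is_feminist = random.random() < 0.60
    teller += is_teller
    teller_and_feminist += is_teller and is_feminist

# The conjunction can never be observed more often than either conjunct.
print(teller_and_feminist <= teller)  # True
```

However the two probabilities are chosen, the count for "teller and feminist" can never exceed the count for "teller": every case satisfying the conjunction also satisfies the single conjunct.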

The Lady Tasting Tea

15 Aug, 2021 at 21:42 | Posted in Statistics & Econometrics | Comments Off on The Lady Tasting Tea

One of my absolute favourites on the statistics shelf is David Salsburg's insightful history of statistics, The Lady Tasting Tea. The book is full of deep and valuable reflections on the role of statistics in modern science. Salsburg, like Keynes before him, is sceptical of the way many social scientists — not least economists — often uncritically assume that the probability distributions of statistical theory can simply be applied to their own fields of inquiry. In the final chapter Salsburg writes:

Kolmogorov established the mathematical meaning of probability: Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

Wise words for econometricians and other 'number crunchers' to ponder!

Reverse causal reasoning and inference to the best explanation

12 Aug, 2021 at 18:16 | Posted in Statistics & Econometrics | 16 Comments


One of the few statisticians that yours truly has on his blogroll is Andrew Gelman. Although I do not share his Bayesian leanings, I find his thought-provoking and non-dogmatic statistical thinking highly recommendable. The plaidoyer infra for "reverse causal questioning" is typically Gelmanian:

When statistical and econometric methodologists write about causal inference, they generally focus on forward causal questions. We are taught to answer questions of the type "What if?", rather than "Why?" Following the work by Rubin (1977) causal questions are typically framed in terms of manipulations: if x were changed by one unit, how much would y be expected to change? But reverse causal questions are important too … In many ways, it is the reverse causal questions that motivate the research, including experiments and observational studies, that we use to answer the forward questions …

Reverse causal reasoning is different; it involves asking questions and searching for new variables that might not yet even be in our model. We can frame reverse causal questions as model checking. It goes like this: what we see is some pattern in the world that needs an explanation. What does it mean to “need an explanation”? It means that existing explanations — the existing model of the phenomenon — does not do the job …

By formalizing reverse causal reasoning within the process of data analysis, we hope to make a step toward connecting our statistical reasoning to the ways that we naturally think and talk about causality. This is consistent with views such as Cartwright (2007) that causal inference in reality is more complex than is captured in any theory of inference … What we are really suggesting is a way of talking about reverse causal questions in a way that is complementary to, rather than outside of, the mainstream formalisms of statistics and econometrics.

In a time when scientific relativism is expanding, it is important to keep up the claim for not reducing science to a pure discursive level. We have to maintain the Enlightenment tradition of thinking of reality as something more and beyond our theories and concepts of it — and of the main task of science as studying the structure of this reality.

Science is made possible by the fact that there exists a reality beyond our theories and concepts of it. It is this reality that our theories in some way deal with. Contrary to positivism, I would as a critical realist argue that the main task of science is not to detect event-regularities between observed facts. Rather, that task should be conceived as identifying the underlying structure and forces that produce the observed events.

In Gelman’s essay there is no explicit argument for abduction — inference to the best explanation — but I would still argue that it is de facto nothing but a very strong argument for why scientific realism and inference to the best explanation are the best alternatives for explaining what is going on in the world we live in. The focus on causality, model checking, anomalies and context-dependence is as close to abductive reasoning as we get in statistics and econometrics today.

Instrumental variables — in search for identification

21 Jul, 2021 at 17:52 | Posted in Statistics & Econometrics | Comments Off on Instrumental variables — in search for identification

We need relevance and validity. How realistic is validity, anyway? We ideally want our instrument to behave just like randomization in an experiment. But in the real world, how likely is that to actually happen? Or, if it’s an IV that requires control variables to be valid, how confident can we be that the controls really do everything we need them to?

In the long-ago times, researchers were happy to use instruments without thinking too hard about validity. If you go back to the 1970s or 1980s you can find people using things like parental education as an instrument for your own (surely your parents’ education can’t possibly affect your outcomes except through your own education!). It was the wild west out there…

But these days, go to any seminar where an instrumental variables paper is presented and you’ll hear no end of worries and arguments about whether the instrument is valid. And as time goes on, it seems like people have gotten more and more difficult to convince when it comes to validity. This focus on validity is good, but sometimes comes at the expense of thinking about other IV considerations, like monotonicity (we’ll get there) or even basic stuff like how good the data is.

There’s good reason to be concerned! Not only is it hard to justify that there exists a variable strongly related to treatment that somehow isn’t at all related to all the sources of hard-to-control-for back doors that the treatment had in the first place, we also have plenty of history of instruments that we thought sounded pretty good that turned out not to work so well.

Nick Huntington-Klein’s book is superbly accessible.

Highly recommended reading for anyone interested in causal inference in economics and social science!
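The validity worry in the excerpt can be made concrete with a small simulation. In the sketch below (all coefficients and variable names are invented for illustration), a simple Wald/IV estimator recovers the true effect when the instrument satisfies the exclusion restriction, and is biased by exactly the size of the excluded-path violation when it does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

u = rng.normal(size=n)                        # unobserved confounder (back door)
z = rng.normal(size=n)                        # candidate instrument
x = 1.0 * z + u + rng.normal(size=n)          # treatment: relevance holds
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # true causal effect of x on y is 2

# OLS is biased because x and y share the confounder u.
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Wald/IV estimator: cov(z, y) / cov(z, x). Valid here, since z
# affects y only through x and is independent of u.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# Now break validity: z gets a direct effect on y (exclusion violated).
y_bad = y + 1.0 * z
iv_bad = np.cov(z, y_bad)[0, 1] / np.cov(z, x)[0, 1]

print(round(ols, 1), round(iv, 1), round(iv_bad, 1))  # roughly 3.0 2.0 3.0
```

The invalid-instrument estimate is just as far from the truth as naive OLS, and nothing in the data flags the problem: validity has to be argued, not tested.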

Causal discovery and the faithfulness assumption (student stuff)

21 Jul, 2021 at 12:35 | Posted in Statistics & Econometrics | Comments Off on Causal discovery and the faithfulness assumption (student stuff)


For more on the (questionable) faithfulness assumption, cf. chapter six of Nancy Cartwright’s Hunting causes and using them.

Conditional probabilities (student stuff)

21 Jul, 2021 at 12:21 | Posted in Statistics & Econometrics | Comments Off on Conditional probabilities (student stuff)


Which causal inference books to read

19 Jul, 2021 at 11:57 | Posted in Statistics & Econometrics | Comments Off on Which causal inference books to read

Causal Inference Books Flowchart


All suggestions are highly readable, but for the general reader, yours truly would also like to recommend The Book of Why by Pearl & Mackenzie.

RCT — a questionable claim of establishing causality

16 Jul, 2021 at 11:33 | Posted in Statistics & Econometrics | 2 Comments

The ideal RCT is the special case in which the trial’s treatment status is also assigned randomly (in addition to drawing random samples from the two populations, one treated and one not) and the only error is due to sampling variability … In this special case, as the number of trials increases, the mean of the trial estimates tends to get closer to the true mean impact. This is the sense in which an ideal RCT is said to be unbiased, namely that the sampling error is driven to zero in expectation …

Prominent randomistas have sometimes left out the “in expectation” qualifier, or ignored its implications for the existence of experimental errors. These advocates of RCTs attribute any difference in mean outcomes between the treatment and control samples to the intervention … Many people in the development community now think that any measured difference between the treatment and control groups in an RCT is attributable to the treatment. It is not; even the ideal RCT has some unknown error.

A rare but instructive case is when there is no treatment. Absent any other effects of assignment (such as from monitoring), the impact is zero. Yet the random error in one trial can still yield a non-zero mean impact from an RCT. An example is an RCT in Denmark in which 860 elderly people were randomly and unknowingly divided into treatment and control groups prior to an 18-month period without any actual intervention (Vass, 2010). A statistically significant (prob. = 0.003) difference in mortality rates emerged at the end of the period.
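The no-treatment case can be reproduced with a quick simulation. The numbers below are hypothetical (a 10% baseline mortality is assumed for illustration; only the 860-subject split echoes the Danish trial), and the point is that a conventional two-sided z-test on a null effect still flags roughly 5% of trials as "significant":

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 10_000
per_arm = 430            # the 860 subjects split into two equal arms
p_mortality = 0.10       # hypothetical baseline mortality, identical in both arms

significant = 0
for _ in range(n_trials):
    # No intervention: both arms draw outcomes from the same distribution.
    treat = rng.binomial(1, p_mortality, per_arm)
    control = rng.binomial(1, p_mortality, per_arm)
    diff = treat.mean() - control.mean()
    se = np.sqrt(2 * p_mortality * (1 - p_mortality) / per_arm)
    if abs(diff) > 1.96 * se:
        significant += 1

print(significant / n_trials)  # close to 0.05 despite a true effect of zero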

Martin Ravallion

The role of RCTs in development

15 Jul, 2021 at 14:25 | Posted in Statistics & Econometrics | Comments Off on The role of RCTs in development


Econometrics — science based on unwarranted assumptions

15 Jul, 2021 at 11:52 | Posted in Statistics & Econometrics | 1 Comment

There is first of all the central question of methodology — the logic of applying the method of multiple correlation to unanalysed economic material, which we know to be non-homogeneous through time. If we are dealing with the action of numerically measurable, independent forces, adequately analysed so that we were dealing with independent atomic factors and between them completely comprehensive, acting with fluctuating relative strength on material constant and homogeneous through time, we might be able to use the method of multiple correlation with some confidence for disentangling the laws of their action … In fact we know that every one of these conditions is far from being satisfied by the economic material under investigation.

Letter from John Maynard Keynes to Royall Tyler (1938)

Mainstream economists often hold the view that criticisms of econometrics are the conclusions of sadly misinformed and misguided people who dislike and do not understand much of it. This is a gross misapprehension. To be careful and cautious is not equivalent to dislike.


The ordinary deductivist ‘textbook approach’ to econometrics views the modelling process as foremost an estimation problem, since one (at least implicitly) assumes that the model provided by economic theory is a well-specified and ‘true’ model. The more empiricist, general-to-specific methodology (often identified as the ‘LSE approach’), on the other hand, views models as theoretically and empirically adequate representations (approximations) of a data generating process (DGP). Diagnostic tests (mostly some variant of the F-test) are used to ensure that the models are ‘true’ — or at least ‘congruent’ — representations of the DGP. The modelling process is here seen more as a specification problem, where poor diagnostic results may indicate a possible misspecification requiring re-specification of the model. The standard objective is to identify models that are structurally stable and valid across a large time-space horizon. The DGP is not seen as something we already know, but rather as something we discover in the process of modelling it. Considerable effort is put into testing to what extent the models are structurally stable and generalizable over space and time.

Although yours truly has some sympathy for this approach in general, there are still some unsolved ‘problematics’ with its epistemological and ontological presuppositions. There is, e.g., an implicit assumption that the DGP fundamentally has an invariant property and that models that are structurally unstable just have not been able to get hold of that invariance. But one cannot just presuppose or take for granted that kind of invariance. It has to be argued and justified. Grounds have to be given for viewing reality as satisfying conditions of model-closure. It is as if the lack of closure that shows up in the form of structurally unstable models somehow could be solved by searching for more autonomous and invariable ‘atomic uniformity.’ But whether reality is ‘congruent’ with this analytical prerequisite has to be argued for, not simply taken for granted.

A great many models are compatible with what we know in economics — that is to say, do not violate any matters on which economists are agreed. Attractive as this view is, it fails to draw a necessary distinction between what is assumed and what is merely proposed as hypothesis. This distinction is forced upon us by an obvious but neglected fact of statistical theory: the matters ‘assumed’ are put wholly beyond test, and the entire edifice of conclusions (e.g., about identifiability, optimum properties of the estimates, their sampling distributions, etc.) depends absolutely on the validity of these assumptions. The great merit of modern statistical inference is that it makes exact and efficient use of what we know about reality to forge new tools of discovery, but it teaches us painfully little about the efficacy of these tools when their basis of assumptions is not satisfied. 

Millard Hastay

Even granted that closures come in degrees, we should not compromise on ontology. Some methods simply introduce improper closures, closures that make the disjuncture between models and real-world target systems inappropriately large. ‘Garbage in, garbage out.’

Underlying the search for these immutable ‘fundamentals’ is the implicit view of the world as consisting of entities with their own separate and invariable effects. These entities are thought of as separable and additive causes, making it possible to infer complex interactions from knowledge of individual constituents with limited independent variety. But, again, whether this is a justified analytical procedure cannot be decided without confronting it with the nature of the objects the models are supposed to describe, explain or predict. Keynes thought it generally inappropriate to apply the ‘atomic hypothesis’ to such an open and ‘organic entity’ as the real world. As far as I can see, these are still appropriate strictures that all econometric approaches have to face. Grounds for believing otherwise have to be provided by the econometricians.

Trygve Haavelmo, the father of modern probabilistic econometrics, wrote (in ‘Statistical testing of business-cycle theories’, The Review of Economics and Statistics, 1943) that he and other econometricians could not build a complete bridge between our models and reality by logical operations alone, but finally had to make “a non-logical jump” [1943:15]. A part of that jump consisted in that econometricians “like to believe … that the various a priori possible sequences would somehow cluster around some typical time shapes, which if we knew them, could be used for prediction” [1943:16]. But since we do not know the true distribution, one has to look for the mechanisms (processes) that “might rule the data” and that hopefully persist so that predictions may be made. Of possible hypotheses on different time sequences (“samples” in Haavelmo’s somewhat idiosyncratic vocabulary) most had to be ruled out a priori “by economic theory”, although “one shall always remain in doubt as to the possibility of some … outside hypothesis being the true one” [1943:18].

Continue Reading Econometrics — science based on unwarranted assumptions…

Causal inference in social sciences (student stuff)

8 Jul, 2021 at 11:32 | Posted in Statistics & Econometrics | Comments Off on Causal inference in social sciences (student stuff)


The main ideas behind bootstrapping (student stuff)

6 Jul, 2021 at 10:50 | Posted in Statistics & Econometrics | Comments Off on The main ideas behind bootstrapping (student stuff)


Propensity score matching vs. regression (student stuff)

5 Jul, 2021 at 11:37 | Posted in Statistics & Econometrics | Comments Off on Propensity score matching vs. regression (student stuff)


Questionable research practices

2 Jul, 2021 at 17:14 | Posted in Statistics & Econometrics | Comments Off on Questionable research practices

