## IV regression and the difficult art of mimicking randomization

27 May, 2022 at 11:14 | Posted in Statistics & Econometrics

We need relevance and validity. How realistic is validity, anyway? We ideally want our instrument to behave just like randomization in an experiment. But in the real world, how likely is that to actually happen? Or, if it’s an IV that requires control variables to be valid, how confident can we be that the controls really do everything we need them to?

In the long-ago times, researchers were happy to use instruments without thinking too hard about validity. If you go back to the 1970s or 1980s you can find people using things like parental education as an instrument for your own (surely your parents’ education can’t possibly affect your outcomes except through your own education!). It was the wild west out there…

But these days, go to any seminar where an instrumental variables paper is presented and you’ll hear no end of worries and arguments about whether the instrument is valid. And as time goes on, it seems like people have gotten more and more difficult to convince when it comes to validity. This focus on validity is good, but sometimes comes at the expense of thinking about other IV considerations, like monotonicity (we’ll get there) or even basic stuff like how good the data is.

There’s good reason to be concerned! Not only is it hard to justify that there exists a variable strongly related to treatment that somehow *isn’t at all* related to all the sources of hard-to-control-for back doors that the treatment had in the first place, we also have plenty of history of instruments that we thought sounded pretty good that turned out not to work so well.
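To make the validity worry concrete, here is a small simulation sketch. It is our own illustration, not taken from the book, and every coefficient in it is a made-up assumption: an unobserved confounder U drives both the treatment D and the outcome Y, and we compare OLS with the just-identified IV (Wald) estimator cov(Z, Y)/cov(Z, D), first for an instrument Z that satisfies the exclusion restriction and then for one that also affects Y directly, the way parental education plausibly might.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_iv(n=20_000, direct_effect=0.0, true_effect=1.0, seed=1):
    """Return (OLS slope, IV estimate) from one hypothetical simulated data set.

    Assumed data-generating process (all numbers are illustrative):
      U ~ N(0, 1)   unobserved confounder (a back door)
      Z ~ N(0, 1)   candidate instrument
      D = 0.8*Z + U + noise                              (relevance)
      Y = true_effect*D + direct_effect*Z + 2*U + noise
    Any direct_effect != 0 violates the exclusion restriction (validity).
    """
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        zi = rng.gauss(0, 1)
        di = 0.8 * zi + u + rng.gauss(0, 1)
        yi = true_effect * di + direct_effect * zi + 2 * u + rng.gauss(0, 1)
        z.append(zi)
        d.append(di)
        y.append(yi)
    ols = cov(d, y) / cov(d, d)  # biased: D is correlated with U
    iv = cov(z, y) / cov(z, d)   # just-identified IV (Wald) estimator
    return ols, iv


print("valid Z:   OLS=%.2f  IV=%.2f" % simulate_iv())
print("invalid Z: OLS=%.2f  IV=%.2f" % simulate_iv(direct_effect=0.5))
```

Under these assumptions the valid instrument should land near the true effect of 1, while OLS is pushed upward by U (its probability limit here is 4.64/2.64, about 1.76). Once Z has its own path to Y, the IV estimate converges to (0.8 + 0.5)/0.8, about 1.63, no matter how much data we collect: a strong first stage does not rescue an invalid instrument.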

Nick Huntington-Klein’s new book on how to use observational data to make causal inferences is superbly accessible. Highly recommended reading for anyone interested in causal inference in economics and social science!

## Problems with Propensity Score Matching (wonkish)

30 Apr, 2022 at 14:42 | Posted in Statistics & Econometrics

## Statistical inference and sampling assumptions

28 Apr, 2022 at 12:55 | Posted in Statistics & Econometrics

Real probability samples have two great benefits: (i) they allow unbiased extrapolation from the sample; (ii) with data internal to the sample, it is possible to estimate how much results are likely to change if another sample is taken. These benefits, of course, have a price: drawing probability samples is hard work. An investigator who assumes that a convenience sample is like a random sample seeks to obtain the benefits without the costs, just on the basis of assumptions. If scrutinized, few convenience samples would pass muster as the equivalent of probability samples. Indeed, probability sampling is a technique whose use is justified because it is so unlikely that social processes will generate representative samples. Decades of survey research have demonstrated that when a probability sample is desired, probability sampling must be done. Assumptions do not suffice. Hence, our first recommendation for research practice: whenever possible, use probability sampling.
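A toy simulation can make the contrast between the two kinds of samples concrete. Everything in it is a hypothetical assumption of ours, not part of the quoted text: a population in which each unit has an outcome y and an "accessibility" score correlated with y, with the "convenience sample" crudely modelled as the most accessible units.

```python
import math
import random
import statistics


def make_population(n=100_000, seed=2):
    """Hypothetical population: each unit has an outcome y and an
    'accessibility' score positively correlated with y (easier-to-reach
    units differ systematically from harder-to-reach ones)."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        y = rng.gauss(50, 10)
        access = 0.1 * y + rng.gauss(0, 1)
        pop.append((y, access))
    return pop


def probability_sample(pop, k, rng):
    """Simple random sample of size k: its mean plus an estimated standard
    error computed from the sample itself (benefit (ii) above)."""
    ys = [y for y, _ in rng.sample(pop, k)]
    return statistics.fmean(ys), statistics.stdev(ys) / math.sqrt(k)


def convenience_sample_mean(pop, k):
    """Mean of the k most accessible units: a crude stand-in for a
    convenience sample, selected by a process we did not control."""
    most_accessible = sorted(pop, key=lambda unit: unit[1], reverse=True)[:k]
    return statistics.fmean(y for y, _ in most_accessible)


pop = make_population()
true_mean = statistics.fmean(y for y, _ in pop)
srs_mean, srs_se = probability_sample(pop, 1000, random.Random(3))
print(f"population mean:         {true_mean:.1f}")
print(f"random sample mean:      {srs_mean:.1f} (estimated SE {srs_se:.2f})")
print(f"convenience sample mean: {convenience_sample_mean(pop, 1000):.1f}")
```

The random sample should land within a couple of estimated standard errors of the population mean, while the convenience sample is far off. Applying the same standard-error formula to the convenience sample would produce a similarly small number, which says nothing about its bias.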

If the data-generation mechanism is unexamined, statistical inference with convenience samples risks substantial error. Bias is to be expected and independence is problematic. When independence is lacking, the p-values produced by conventional formulas can be grossly misleading. In general, we think that reported p-values will be too small; in the social world, proximity seems to breed similarity. Thus, many research results are held to be statistically significant when they are the mere product of chance variation.
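The claim that dependence makes conventional p-values too small can be checked with a minimal simulation sketch. It is our own illustration under hypothetical assumptions: units come in clusters (schools, villages, ...) that share a common shock, treatment is assigned to whole clusters, the true effect is zero, and a naive two-sample test treats every unit as independent.

```python
import math
import random


def mean_and_var(xs):
    """Sample mean and (n-1)-denominator variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def naive_rejection_rate(n_clusters=20, cluster_size=25, cluster_sd=1.0,
                         n_sims=1000, alpha=0.05, seed=4):
    """Share of simulated studies in which a naive test rejects a TRUE null.

    Hypothetical setup: half of the clusters are 'treated', the treatment
    has no effect at all, and units in the same cluster share a common shock
    with standard deviation cluster_sd. The naive test ignores this.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        treated, control = [], []
        for c in range(n_clusters):
            shock = rng.gauss(0, cluster_sd)  # shared within the cluster
            units = [shock + rng.gauss(0, 1) for _ in range(cluster_size)]
            (treated if c < n_clusters // 2 else control).extend(units)
        m1, v1 = mean_and_var(treated)
        m0, v0 = mean_and_var(control)
        z = (m1 - m0) / math.sqrt(v1 / len(treated) + v0 / len(control))
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        if p < alpha:
            rejections += 1
    return rejections / n_sims


print("independent units:", naive_rejection_rate(cluster_sd=0.0))
print("clustered units:  ", naive_rejection_rate(cluster_sd=1.0))
```

With independent units the naive test rejects the true null about 5% of the time, as advertised. With a shared cluster shock as large as the individual noise, it rejects far more often: the reported p-values are much too small, exactly the "proximity breeds similarity" problem.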

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great, but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘super populations’ is one of many dubious assumptions used in modern econometrics and statistical analyses to handle uncertainty. As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts. Accepting a domain of probability theory and sample space of infinite populations also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

And as if this wasn’t enough, one could — as we’ve seen — also seriously wonder what kind of ‘populations’ these statistical and econometric models ultimately are based on. Why should we as social scientists — and not as pure mathematicians working with formal-axiomatic systems without the urge to confront our models with real target systems — unquestioningly accept models based on concepts like the ‘infinite super populations’ used in e.g. the ‘potential outcome’ framework that has become so popular lately in social sciences?

One could, of course, treat observational or experimental data as random samples from real populations. I have no problem with that (although it has to be noted that most ‘natural experiments’ are *not* based on random sampling from some underlying population, which, of course, means that the effect estimators, strictly speaking, are only unbiased for the specific groups studied). But probabilistic econometrics does not content itself with that kind of population. Instead, it creates imaginary populations of ‘parallel universes’ and assumes that our data are random samples from such ‘infinite super populations.’

But this is actually nothing but hand-waving! And it is inadequate for real science. As David Freedman writes:

With this approach, the investigator does not explicitly define a population that could in principle be studied, with unlimited resources of time and money. The investigator merely *assumes* that such a population exists in some ill-defined sense. And there is a further assumption, that the data set being analyzed can be treated *as if* it were based on a random sample from the assumed population. These are convenient fictions… Nevertheless, reliance on imaginary populations is widespread. Indeed regression models are commonly used to analyze convenience samples… The rhetoric of imaginary populations is seductive because it seems to free the investigator from the necessity of understanding how data were generated.

In social sciences — including economics — it’s always wise to ponder C. S. Peirce’s remark that universes are not as common as peanuts …

## Fisher’s exact test (student stuff)

27 Apr, 2022 at 08:44 | Posted in Statistics & Econometrics

## Significance testing and the real tasks of social science

21 Apr, 2022 at 11:36 | Posted in Statistics & Econometrics

After having mastered all the technicalities of regression analysis and econometrics, students often feel as though they are masters of the universe. I usually cool them down with the required reading of Christopher Achen’s modern classic *Interpreting and Using Regression*. It usually gets them back on track again, and they understand that

no increase in methodological sophistication … alter the fundamental nature of the subject. It remains a wondrous mixture of rigorous theory, experienced judgment, and inspired guesswork. And that, finally, is its charm.

And in case they get too excited about having learned to master the intricacies of proper significance tests and p-values, I ask them to also ponder on Achen’s apt warning:

Significance testing as a search for specification errors substitutes calculations for substantive thinking. Worse, it channels energy toward the hopeless search for functionally correct specifications and diverts attention from the real tasks, which are to formulate a manageable description of the data and to exclude competing ones.

## Death penalties and homicides — a D-I-D analysis

10 Apr, 2022 at 12:38 | Posted in Statistics & Econometrics

## John Snow and the birth of causal inference

9 Apr, 2022 at 14:05 | Posted in Statistics & Econometrics

If anything, Snow’s path-breaking research underlines how important it is not to equate science with statistical calculation. All science entails human judgment, and using statistical models doesn’t relieve us of that necessity. When we work with misspecified models, the scientific value of statistics is actually zero, even though we are making valid statistical inferences! Statistical models are no substitutes for doing real science. Or as a German philosopher once famously wrote:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

We should never forget that the underlying parameters we use when performing statistical tests are *model constructions*. And if the model is wrong, the value of our calculations is nil. As ‘shoe-leather researcher’ David Freedman wrote in *Statistical Models and Causal Inference*:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.

## Multilevel modeling (student stuff)

4 Apr, 2022 at 13:26 | Posted in Statistics & Econometrics

## Model validation and significance testing

1 Apr, 2022 at 07:12 | Posted in Statistics & Econometrics

In its standard form, a significance test is not the kind of ‘severe test’ that we are looking for when we try to confirm or disconfirm empirical scientific hypotheses. This is problematic for many reasons, one being that there is a strong tendency to accept the null hypothesis simply because it can’t be rejected at the standard 5% significance level. In their standard form, significance tests are biased against new hypotheses, since they make it hard to disconfirm the null hypothesis.
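How weak "cannot be rejected" can be as evidence for the null is easy to quantify with a textbook power calculation. The sketch below is our own illustration with hypothetical numbers: it assumes normally distributed outcomes with known unit standard deviation, a two-sided two-sample z-test at the 5% level, and a true effect of 0.3 standard deviations.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def power_two_sample_z(effect_size, n_per_group, z_crit=1.959964):
    """Approximate power of a two-sided two-sample z-test at the 5% level.

    Assumes normal outcomes with known unit standard deviation, so
    effect_size is the true mean difference in standard-deviation units.
    """
    shift = effect_size * math.sqrt(n_per_group / 2)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


for n in (30, 100, 200):
    power = power_two_sample_z(0.3, n)
    print(f"n={n:>3} per group: power={power:.2f}, "
          f"P(fail to reject a false null)={1 - power:.2f}")
```

With 30 units per group the power is only about 0.21, so roughly four out of five such studies would fail to reject a null hypothesis that is false. Reading those non-rejections as support for the null would be a mistake.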

And as shown over and over again when it is applied, people have a tendency to read “not disconfirmed” as “probably confirmed.” Standard scientific methodology tells us that when there is only, say, a 10% probability that pure sampling error could account for the observed difference between the data and the null hypothesis, it would be more “reasonable” to conclude that we have a case of disconfirmation. Especially if we perform many independent tests of our hypothesis and they all give about the same 10% result as our reported one, I guess most researchers would count the hypothesis as even more disconfirmed.
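One standard way to formalize the point about several independent tests that each give about 10% is Fisher's method for combining p-values. The choice of method is ours, since the post does not name one, and it assumes the tests are independent and each p-value is uniformly distributed under the null.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0.

    For even degrees of freedom 2k the chi-square survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)**i / i!, so no library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)**i / i!
        total += term
    return math.exp(-half_x) * total


print(fisher_combined_p([0.10]))      # a single test
print(fisher_combined_p([0.10] * 5))  # five independent tests, each p = 0.10
```

A single test with p = 0.10 stays at 0.10, but five independent tests that each give 0.10 combine to roughly 0.01, which is the sense in which the hypothesis becomes "even more disconfirmed."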

Most importantly — we should never forget that the underlying parameters we use when performing significance tests are *model constructions*. Our p-values mean next to nothing if the model is wrong. As eminent mathematical statistician David Freedman writes:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.

## Machine learning cross-validation (student stuff)

15 Mar, 2022 at 10:31 | Posted in Statistics & Econometrics

## Use of covariates in RCTs (wonkish)

1 Mar, 2022 at 15:45 | Posted in Statistics & Econometrics

## Bayes vs classical statistical p-testing

23 Feb, 2022 at 08:10 | Posted in Statistics & Econometrics

[For more on the RCT referred to in the video, take a look here. Mortality numbers are, of course, important, but so is the fact that among the 241 patients who received the drug, 52 developed severe illness, compared to 43 of 249 patients who did not take the drug … ]
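For readers who want the arithmetic behind the severe-illness figures, here is a small sketch that computes the two risks (about 21.6% with the drug versus 17.3% without) and a classical two-sided Fisher exact p-value for the 2x2 table. This is our own illustration using only the counts quoted above; it is not the trial's own analysis, and the hypergeometric implementation is a plain textbook version.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Conditioning on the row and column totals, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = {x: comb(col1, x) * comb(n - col1, row1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # small relative tolerance so floating-point ties are counted
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))


# Rows: took the drug / did not; columns: severe illness / no severe illness.
drug_severe, drug_n = 52, 241
ctrl_severe, ctrl_n = 43, 249
table = (drug_severe, drug_n - drug_severe, ctrl_severe, ctrl_n - ctrl_severe)

print(f"risk with drug:    {drug_severe / drug_n:.3f}")
print(f"risk without drug: {ctrl_severe / ctrl_n:.3f}")
print(f"two-sided Fisher exact p: {fisher_exact_two_sided(*table):.3f}")
```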

## What ‘controlling for something’ means in regression analysis

21 Feb, 2022 at 10:57 | Posted in Statistics & Econometrics

## Dynamic and static interpretations of regression coefficients

20 Feb, 2022 at 16:33 | Posted in Statistics & Econometrics

When econometric and statistical textbooks present simple (and multiple) regression analysis for cross-sectional data, they often do it with regressions like “regress test score (y) on study hours (x)” and get the result

y = constant + slope coefficient*x + error term.

When speaking of increases or decreases in x in these interpretations, we have to remember that we are dealing with cross-sectional data, so an ‘increase’ refers to a difference in the value of a variable from *one* unit in the population to *another* unit in the same population. Strictly speaking, it is only admissible to give slope coefficients a *dynamic* interpretation when we are dealing with time-series regression. For cross-sectional data, we should stick to *static* interpretations and look upon slope coefficients as giving information about what we can expect to happen to the value of the dependent variable when there is a change in the independent variable *from one unit to another*.

Although it is tempting to say that a change in the independent variable leads to a change in the dependent variable, we should resist that temptation. Students who put a lot of study hours into their daily routine on average achieve higher scores on their tests than *other* students who study for fewer hours. But these regressions do not analyse what happens to individual students as they increase or decrease their study hours.

Why is this important? It is important most of all because misinterpreting the regression coefficients may give a completely wrong causal picture of what is going on in your data. A positive relation between test scores and study hours in a cross-sectional regression does not mean that you as an individual student should expect to get higher test scores by increasing study time.
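A small simulation makes the static/dynamic distinction concrete. It is a hypothetical sketch, not a claim about real students: we assume that an unobserved "ability" raises both study hours and test scores, while the within-student effect of an extra hour (the made-up constant `WITHIN_EFFECT`) is zero. The cross-sectional slope, which compares different students, then comes out clearly positive, close to 30/13 (about 2.3) under these numbers, while the dynamic question of what happens to a given student who studies one more hour gets the answer zero by construction.

```python
import random

WITHIN_EFFECT = 0.0  # hypothetical effect of one extra hour on the SAME student


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def score(ability, hours, noise):
    """Hypothetical outcome model: ability drives the score; hours enter
    only through WITHIN_EFFECT."""
    return 50 + 10 * ability + WITHIN_EFFECT * hours + noise


rng = random.Random(5)
students = []
for _ in range(5000):
    ability = rng.gauss(0, 1)
    hours = 10 + 3 * ability + rng.gauss(0, 2)  # abler students study more
    students.append((ability, hours, rng.gauss(0, 5)))

hours = [h for _, h, _ in students]
scores = [score(a, h, e) for a, h, e in students]
slope = ols_slope(hours, scores)
print("cross-sectional slope:", round(slope, 2))

# Dynamic question: what happens to each student who studies one hour more?
changes = [score(a, h + 1, e) - score(a, h, e) for a, h, e in students]
print("average within-student change:", sum(changes) / len(changes))
```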

## Misunderstanding randomization

17 Feb, 2022 at 18:32 | Posted in Statistics & Econometrics

While raising worthwhile points, most discussions I see misunderstand randomization in both causal and statistical ways. Notably, randomization can be valuable but does not induce balance in the ordinary English sense of the word, nor does it deal with most problems of real experiments. Furthermore, the use of the word “balance” to describe what randomization actually does invites confusion with the ordinary English meaning of “balance” (as does use of ordinary words like “significance” and “confidence” to describe other technical concepts).

Causally, a controlled experiment is one in which the experimenter causally controls the causes of (inputs to) the treatment (studied cause) or the outcome (studied effect) – preferably both. A randomized experiment is one in which the causes of the treatment are fully determined by a known randomizing device (at least within levels of fully measured covariates), so that there is nothing unmeasured that causes both treatment and outcome. Provided the outcome is shielded from any effect of the randomizing device except that through (mediated by) the treatment, the random assignment variable becomes a perfect instrumental variable (IV), and statistical techniques based on such perfect IVs can be justified without recourse to dodgy empirical tests. A similar view can be found in Pearl’s book (Causality, 2nd ed. 2009).

Statistically, a frequentist can use the randomization distribution of the assignment variable A to construct a reference distribution for test statistics under various models (hypotheses) about the treatment effect (usually only a test of a no-effect model is described in this fashion, but most so-called confidence intervals are summaries of tests of models across which the treatment effect but nothing else is varied). This view can be seen in writings by Stephen Senn and James Robins. In parallel a Bayesian can use the distribution to provide prior distributions for counterfactual outcomes under the same variety of models (Cornfield, American Journal of Epidemiology 1976).
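The frequentist use of the randomization distribution can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from Senn, Robins or the comment above: we assume complete randomization (a fixed number of treated units), test the sharp null of no effect for any unit, use the absolute difference in means as the statistic, and approximate the randomization distribution of A by Monte Carlo re-draws rather than full enumeration. The data in the usage lines are invented.

```python
import random


def diff_in_means(outcomes, assignment):
    """Mean outcome among treated (A = 1) minus mean among controls (A = 0)."""
    t = [y for y, a in zip(outcomes, assignment) if a == 1]
    c = [y for y, a in zip(outcomes, assignment) if a == 0]
    return sum(t) / len(t) - sum(c) / len(c)


def randomization_p_value(outcomes, assignment, n_draws=10_000, seed=6):
    """Monte Carlo randomization test of the sharp null of no effect.

    Under that null every unit's outcome would be the same whatever its
    assignment, so re-running the (assumed) complete-randomization device on
    the fixed outcomes yields the reference distribution of the statistic.
    """
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, assignment))
    shuffled = list(assignment)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # a fresh draw from the randomizing device
        if abs(diff_in_means(outcomes, shuffled)) >= observed - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_draws + 1)  # count the observed allocation too


outcomes = [7.1, 6.4, 8.0, 7.7, 6.9, 5.8, 6.1, 5.2, 6.6, 5.5]  # hypothetical
assignment = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(randomization_p_value(outcomes, assignment))
```

Repeating the test for a range of hypothesized constant effects (subtracting the hypothesized effect from treated outcomes first) and keeping the values that are not rejected would give the kind of test-based interval the comment mentions; that extension is not shown here.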

Note that none of these descriptions use or need the term “balance”, nor need they make claims that randomization corresponds to no confounding (Greenland and Mansournia, European Journal of Epidemiology 2015). Proper randomization can be said to provide “balance in probability” but for frequentists this property is over a purely hypothetical long run while for Bayesians it is a subjective prior probability induced by knowing that allocation was random (Cornfield 1976 again). Neither use of “balance” applies to the actual state of observed trial populations, which both theories allow or concede may be arbitrarily out of balance on unmeasured covariates due to “bad luck of the draw” (“random confounding” as per Greenland & Mansournia 2015). By properly merging the randomization distribution (information on allocation) and models for treatment effect, frequentists can deal with this chance element via P-value functions (“confidence distributions”) while Bayesians can deal with it via posterior distributions. Again, neither need invoke “balance” – and I would argue that, to avoid the confusion seen in much literature, they shouldn’t.
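The gap between "balance in probability" and balance in an actual trial shows up in a quick simulation. It assumes a hypothetical trial of 20 units, 10 of which carry a prognostic trait, with exactly half randomized to treatment. Averaged over re-randomizations the imbalance is zero. In any single allocation, however, it can be large: by the hypergeometric distribution, roughly 18% of allocations put at least 7 of the 10 trait carriers in the same arm, a difference in prevalence of 40 percentage points or more.

```python
import random


def imbalance_draws(n_units=20, n_with_covariate=10, n_draws=20_000, seed=7):
    """Re-randomize a small hypothetical trial many times and record the
    difference in covariate prevalence (treated minus control) each time."""
    rng = random.Random(seed)
    covariate = [1] * n_with_covariate + [0] * (n_units - n_with_covariate)
    half = n_units // 2
    diffs = []
    for _ in range(n_draws):
        rng.shuffle(covariate)  # first half of the shuffled list = treated arm
        treated, control = covariate[:half], covariate[half:]
        diffs.append(sum(treated) / len(treated) - sum(control) / len(control))
    return diffs


diffs = imbalance_draws()
mean_diff = sum(diffs) / len(diffs)
# the tolerance guards against float results such as 0.7 - 0.3 < 0.4
share_large = sum(abs(d) >= 0.4 - 1e-9 for d in diffs) / len(diffs)
print("mean difference over draws:     ", round(mean_diff, 3))
print("share with |difference| >= 0.40:", round(share_large, 3))
```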

None of this should be taken as sanctifying or criticizing randomization; I am simply pointing out that randomization, if done and described precisely, does something valuable – but that is not balance of actual samples (as opposed to hypothetical infinite samples or repetitions, or bets about actual samples). Real randomized studies of humans and their groupings must deal with many other considerations such as selectivity of study subjects (hence lack of generalizability), blinding (masking) of subjects and evaluators, outcome measurement error, nonadherence, drop-out, competing risks, etc. Randomization can help deflect concerns about confounding in probability, but is no panacea, and can increase other concerns such as selectivity of participation.
