Discrimination and the use of ‘statistical controls’

14 Jun, 2021 at 12:27 | Posted in Statistics & Econometrics

The gender pay gap is a fact that, sad to say, to a non-negligible extent is the result of discrimination. And even though many women are not deliberately discriminated against, but rather self-select into lower-wage jobs, this in no way magically explains away the discrimination gap. As decades of socialization research have shown, women may be ‘structural’ victims of impersonal social mechanisms that in different ways aggrieve them. Wage discrimination is unacceptable. Wage discrimination is a shame.

You see it all the time in studies. “We controlled for…” And then the list starts … The more things you can control for, the stronger your study is — or, at least, the stronger your study seems. Controls give the feeling of specificity, of precision. But sometimes, you can control for too much. Sometimes you end up controlling for the thing you’re trying to measure …

An example is research around the gender wage gap, which tries to control for so many things that it ends up controlling for the thing it’s trying to measure …

Take hours worked, which is a standard control in some of the more sophisticated wage gap studies. Women tend to work fewer hours than men. If you control for hours worked, then some of the gender wage gap vanishes. As Yglesias wrote, it’s “silly to act like this is just some crazy coincidence. Women work shorter hours because as a society we hold women to a higher standard of housekeeping, and because they tend to be assigned the bulk of childcare responsibilities.”

Controlling for hours worked, in other words, is at least partly controlling for how gender works in our society. It’s controlling for the thing that you’re trying to isolate.

Ezra Klein
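
Klein’s point can be made concrete with a small simulation. What follows is only a sketch under stated assumptions: the numbers, the variable names (`female`, `hours`, `wage`) and the little `ols` helper are all hypothetical and are not taken from any wage study. In this made-up world gender affects the wage both directly and through hours worked. Regressing the wage on gender alone recovers the total gap, while adding hours as a control removes the part of the gap that runs through hours.

```python
import random


def ols(y, *xs):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    # Augmented system (X'X | X'y).
    a = [
        [sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(u * v for u, v in zip(cols[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(0)
n = 20000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
female = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]
# Assumed: social norms cut women's paid hours by 5 per week on average.
hours = [40.0 - 5.0 * f + random.gauss(0, 3) for f in female]
# Assumed: a direct gap of -2.0 plus 0.5 wage units per hour worked.
wage = [10.0 + 0.5 * h - 2.0 * f + random.gauss(0, 2)
        for h, f in zip(hours, female)]

# By construction the total gap is -2.0 + 0.5 * (-5.0) = -4.5.
total = ols(wage, female)[1]
# With hours 'held constant' only the direct part (-2.0) is left.
direct = ols(wage, female, hours)[1]

print(f"gap without controls:      {total:.2f}")
print(f"gap controlling for hours: {direct:.2f}")
```

The smaller ‘controlled’ gap is not evidence of less discrimination in this toy world. It only shows that part of the effect of gender has been absorbed by a variable that is itself shaped by gender.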

To reduce the risk of having established only ‘spurious relations’ when dealing with observational data, statisticians and econometricians standardly add control variables. The hope is that this will make causal inferences more reliable. But, as Keynes already showed in the 1930s when criticizing statistical-econometric applications of regression analysis, if you do not manage to get hold of all potential confounding factors, the model risks producing estimates for the variable of interest that are even worse than those from models without any control variables at all. Conclusion: think twice before you simply include ‘control variables’ in your models!
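
That an incomplete set of controls can do worse than no controls at all is easy to show in a constructed case. The sketch below is an illustration of that possibility, not a reconstruction of Keynes’s own argument, and everything in it is assumed: two hypothetical confounders `u1` and `u2` push the outcome in opposite directions, so their biases happen to cancel when neither is controlled for. Controlling for only one of them breaks the cancellation.

```python
import random


def ols(y, *xs):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    # Augmented system (X'X | X'y).
    a = [
        [sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(u * v for u, v in zip(cols[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(1)
n = 20000

# Two hypothetical confounders (illustrative assumptions only).
u1 = [random.gauss(0, 1) for _ in range(n)]
u2 = [random.gauss(0, 1) for _ in range(n)]
# Both confounders raise x ...
x = [a + b + random.gauss(0, 1) for a, b in zip(u1, u2)]
# ... but push y in opposite directions. The true effect of x on y is 1.0.
y = [1.0 * xi + a - b + random.gauss(0, 1) for xi, a, b in zip(x, u1, u2)]

no_controls = ols(y, x)[1]       # biases cancel: population value 1.0
partial = ols(y, x, u1)[1]       # only u1 controlled: population value 0.5
full = ols(y, x, u1, u2)[1]      # both controlled: population value 1.0

print(f"no controls:        {no_controls:.2f}")
print(f"controlling for u1: {partial:.2f}")
print(f"controlling for u1, u2: {full:.2f}")
```

The exact cancellation is of course an artifact of the chosen numbers. The general lesson is only that adding some of the relevant controls gives no guarantee of moving the estimate closer to the truth.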

When I present this argument … one or more scholars say, “But shouldn’t I control for everything I can in my regressions? If not, aren’t my coefficients biased due to excluded variables?” … The excluded variable argument only works if you are sure your specification is precisely correct with all variables included. But no one can know that with more than a handful of explanatory variables …

A preferable approach is to separate the observations into meaningful subsets—internally compatible statistical regimes … If this can’t be done, then statistical analysis can’t be done. A researcher claiming that nothing else but the big, messy regression is possible because, after all, some results have to be produced, is like a jury that says, “Well, the evidence was weak, but somebody had to be convicted.”

Christopher H. Achen

Kitchen sink econometric models are often the result of researchers trying to control for confounding. But what they usually haven’t understood is that the confounder problem requires a causal solution and not statistical ‘control.’ Controlling for everything opens up the risk that we control for ‘collider’ variables and thereby open up ‘back-door paths’ that give us confounding that wasn’t there to begin with.
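
The collider problem can be seen in a few lines of simulation. This is a minimal sketch of the simplest collider case, with hypothetical variables and a hand-rolled `ols` helper: `x` and `y` are generated independently, so the true effect of `x` on `y` is zero, and `c` is a common effect of both. Regressing `y` on `x` alone finds nothing, but once `c` is ‘controlled for’ a relationship appears out of thin air.

```python
import random


def ols(y, *xs):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    # Augmented system (X'X | X'y).
    a = [
        [sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
        + [sum(u * v for u, v in zip(cols[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


random.seed(2)
n = 20000

# x and y are independent by construction: the true effect of x on y is 0.
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
# c is a collider: a common effect of x and y (hypothetical variable).
c = [xi + yi + random.gauss(0, 1) for xi, yi in zip(x, y)]

naive = ols(y, x)[1]         # population value 0.0
adjusted = ols(y, x, c)[1]   # population value -0.5 in this setup

print(f"y on x:               {naive:.2f}")
print(f"y on x, 'controlling' for c: {adjusted:.2f}")
```

No amount of statistical technique tells us from the data alone whether `c` is a confounder or a collider. That is a causal question, and it has to be answered before deciding what to put in the regression.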
