The gender wage gap

15 May, 2021 at 19:40 | Posted in Economics

Uber has conducted a study of internal pay differentials between men and women, which they describe as “gender blind” … The study found a 7% pay gap in favor of men. They present their findings as proof that there are issues unrelated to gender that impact driver pay. They quantify the reasons for the gap as follows:

Where: 20% is due to where people choose to drive (routes/neighborhoods).

Experience: 30% is due to experience …

Speed: 50% was due to speed, they claim that men drive slightly faster, so complete more trips per hour …

The company’s reputation has been affected by its sexist and unprofessional corporate culture, and its continued lack of gender balance won’t help. Nor, I suspect, will its insistence, with research conducted by its own staff to prove it, that the pay gap is fair. This simply adds insult to obnoxiousness.

But then, why would we have expected any different? The Uber case study’s conclusions may actually be almost the opposite of what they were trying to prove. Rather than showing that the pay gap is a natural consequence of our gendered differences, they have actually shown that systems designed to insistently ignore differences tend to become normed to the preferences of those who create them.

Avivah Wittenberg-Cox

Spending a couple of hours going through a JEL survey of modern research on the gender wage gap, yours truly was struck almost immediately by how little that research really has accomplished in terms of explaining gender wage discrimination. With all the heavy regression and econometric alchemy used, wage discrimination is somehow more or less conjured away …

Trying to reduce the risk of having established only ‘spurious relations’ when dealing with observational data, statisticians and econometricians standardly add control variables. The hope is that one thereby will be able to make more reliable causal inferences. But if you do not manage to get hold of all potential confounding factors, the model risks producing estimates of the variable of interest that are even worse than models without any control variables at all. Conclusion: think twice before you simply include ‘control variables’ in your models!
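The point about bad controls can be made concrete with a small simulation. This is a toy model with invented numbers, not anything from the post: the candidate 'control variable' c is a collider, caused both by the variable of interest x and by an unobserved factor u, and conditioning on it manufactures an effect that does not exist.

```python
import numpy as np

# Toy illustration (all numbers invented): a 'control' that is a collider
# makes the estimate worse than using no control at all.
rng = np.random.default_rng(42)
n = 200_000

x = rng.normal(size=n)        # variable of interest; its true effect on y is zero
u = rng.normal(size=n)        # unobserved factor
c = x + u                     # candidate control variable -- a collider
y = u + rng.normal(size=n)    # y depends only on u, not on x

def ols_coef(y, *regressors):
    """OLS coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y)] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_raw = ols_coef(y, x)[1]      # near 0: the uncontrolled estimate is right
b_ctrl = ols_coef(y, x, c)[1]  # near -1: 'controlling' for c manufactures an effect
print(b_raw, b_ctrl)
```

Conditioning on c opens the path x → c ← u → y: within any stratum of c, higher x mechanically implies lower u, and hence lower y, even though x has no effect at all.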

That women work in different areas than men, have different educations than men, etc., etc., is not only the result of ‘free choices’ causing a gender wage gap, but is itself to a large degree the consequence of discrimination.

The gender pay gap is a fact that, sad to say, to a non-negligible extent is the result of discrimination. And even though many women are not deliberately discriminated against, but rather ‘self-select’ (sic!) into lower-wage jobs, this in no way magically explains away the discrimination gap. As decades of socialization research has shown, women may be ‘structural’ victims of impersonal social mechanisms that in different ways aggrieve them.

Looking at wage discrimination from a graph theoretical point of view one could arguably identify three paths between gender discrimination (D) and wages (W):

  1. D => W
  2. D => OCC => W
  3. D => OCC <= A => W

where occupation (OCC) is a mediator variable and unobserved ability (A) is a variable that affects both occupational choice and wages. The usual way to find out the effect of discrimination on wages is to perform a regression “controlling” for OCC to get what one considers a “meaningful” estimate of real gender wage discrimination:

W = a + bD + cOCC

The problem with this procedure is that conditioning on OCC not only closes the mediation path (2), but — since OCC is a “collider” — opens up the backdoor path (3) and creates a spurious and biased estimate. Forgetting that may even result in the gender discrimination effect being positively related to wages! So if we want to go down the standard path (controlling for OCC) we certainly also have to control for A if we want to have a chance of identifying the causal effect of gender discrimination on wages. And that may, of course, be tough going, since A often (as here) is unobserved and perhaps even unobservable …
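A quick way to convince oneself of this is to simulate the three paths directly. All coefficients below are made-up illustration values, not estimates of real discrimination; they are chosen so that conditioning on the collider actually flips the sign of the estimated effect.

```python
import numpy as np

# Simulation of the three paths with invented coefficients:
# D = discrimination indicator, A = unobserved ability,
# OCC = occupation, W = wage.
rng = np.random.default_rng(1)
n = 500_000

D = rng.binomial(1, 0.5, n).astype(float)   # discriminated group indicator
A = rng.normal(size=n)                      # unobserved ability
OCC = -1.0 * D + A + rng.normal(size=n)     # D => OCC <= A (OCC is a collider)
W = -0.5 * D + 0.5 * OCC + 2.0 * A + rng.normal(size=n)

def ols(y, *xs):
    """OLS coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y)] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_total = ols(W, D)[1]         # near -1.0: total effect (direct plus via OCC)
b_bad = ols(W, D, OCC)[1]      # near +0.5: conditioning on the collider flips the sign
b_full = ols(W, D, OCC, A)[1]  # near -0.5: also controlling for A recovers the direct effect
print(b_total, b_bad, b_full)
```

With these made-up numbers, the naive regression with the OCC control would report that discrimination is positively related to wages, exactly the pathology described above; only the (usually infeasible) regression that also controls for unobserved ability A recovers the direct effect.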

C’n’est pas votre argent qui f’ra mon bonheur

15 May, 2021 at 14:20 | Posted in Varia


The experimental dilemma

15 May, 2021 at 10:43 | Posted in Economics

We can either let theory guide us in our attempt to estimate causal relationships from data … or we don’t let theory guide us. If we let theory guide us, our causal inferences will be ‘incredible’ because our theoretical knowledge is itself not certain … If we do not let theory guide us, we have no good reason to believe that our causal conclusions are true either of the experimental population or of other populations because we have no understanding of the mechanisms that are responsible for a causal relationship to hold in the first place, and it is difficult to see how we could generalize an experimental result to other settings if this understanding doesn’t exist. Either way, then, causal inference seems to be a cul-de-sac.

Nowadays many mainstream economists maintain that ‘imaginative empirical methods’ — especially randomized experiments (RCTs) — can help us to answer questions concerning the external validity of economic models. In their view, they are, more or less, tests of ‘an underlying economic model’ and enable economists to make the right selection from the ever-expanding ‘collection of potentially applicable models.’

It is widely believed among economists that the scientific value of randomization — contrary to other methods — is totally uncontroversial and that randomized experiments are free from bias. When looked at carefully, however, there are in fact few real reasons to share this optimism on the alleged ’experimental turn’ in economics. Strictly seen, randomization does not guarantee anything.

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. ‘It works there’ is no evidence for ‘it will work here’. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

The almost religious belief with which its propagators — including ‘Nobel prize’ winners like Duflo, Banerjee and Kremer — portray it cannot hide the fact that RCTs cannot be taken for granted to give generalizable results. That something works somewhere is no warrant for believing it will work for us here, or that it works generally.

The present RCT idolatry is dangerous. Believing there is only one really good evidence-based method on the market — and that randomization is the only way to achieve scientific validity — blinds people to searching for and using other methods that in many contexts are better. RCTs are simply not the best method for all questions and in all circumstances. Insisting on using only one tool often means using the wrong tool.

The problems we face when using instrumental variables (student stuff)

14 May, 2021 at 23:26 | Posted in Statistics & Econometrics


Finding the CACE (student stuff)

14 May, 2021 at 13:56 | Posted in Economics


Instrumental variables and heterogeneity

13 May, 2021 at 19:25 | Posted in Statistics & Econometrics

Instrumental variables are nowadays used extensively by economists and other social scientists, not least when one wants to go beyond the ‘correlations’ of statistics and also say something about ‘causality.’

Unfortunately, the interpretation of the results obtained with the most common method used for this purpose, statistical regression analysis, often falls badly short.

An example from the school sector illustrates this well.

It is sometimes claimed by school debaters and politicians that independent schools are better than municipal schools. They are said to lead to better results. That is: if pupils from independent and municipal schools were to take a common test, the independent-school pupils would perform better (more correct answers, etc.).

For the sake of argument, assume that, to find out whether this really is the case in Malmö as well, we randomly select secondary-school pupils in Malmö and let them take a test. In standard regression-analytic form the result could then be

Test score = 20 + 5*T,

where T = 1 if the pupil attends an independent school and T = 0 if the pupil attends a municipal school. This would confirm the assumption: independent-school pupils score on average 5 points higher than pupils at municipal schools in Malmö.

Now, politicians are (hopefully) smart enough to be aware that this statistical result cannot be interpreted in causal terms, since pupils at independent schools typically do not have the same background (socio-economic, educational, cultural, etc.) as those at municipal schools (the school-type/score relation is ‘confounded’ via ‘selection bias’).

To get a better measure, if possible, of the causal effects of school type, Malmö’s politicians propose making it possible, via a lottery, for 1,000 secondary-school pupils to be admitted to an independent school. The ‘chance of winning’ is 10%, so 100 pupils get this opportunity. Of these, 20 accept the offer to attend an independent school. Of the 900 lottery participants who do not ‘win,’ 100 choose to attend an independent school anyway.

School researchers often regard the lottery as an ‘instrumental variable,’ and when the regression analysis is carried out with its help, the result turns out to be

Test score = 20 + 2*T.

The standard interpretation is that we now have a causal measure of how much better test results secondary-school pupils in Malmö would, on average, get if they attended independent schools instead of municipal ones.

But is that right? No!

Unless all of Malmö’s school pupils have exactly the same test score (which surely has to be considered a rather far-fetched ‘homogeneity assumption’), the stated average causal effect only applies to those pupils who would choose an independent school if they ‘win’ the lottery but otherwise would not (in statistics jargon we call these ‘compliers’). It is hard to see why this group of pupils should be of special interest in this example, given that the average causal effect estimated with the help of the instrumental variable says nothing at all about the effect for the majority (the 100 out of 120 who choose an independent school without having ‘won’ the lottery) of those who choose to attend an independent school.

Conclusion: researchers have to be much more careful about interpreting standard statistical regression analyses and their ‘average estimates’ as causal. Reality exhibits a high degree of heterogeneity, and then the constant ‘average parameters’ of regression analysis usually tell us nothing at all!
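A small simulation in the spirit of the lottery example makes the point concrete. The type shares and effect sizes below are invented for the sketch: the Wald/IV estimator recovers only the compliers' effect, not the effect for the always-takers who make up the majority of attendees.

```python
import numpy as np

# Toy version of the lottery example (shares and effects invented):
# Z = lottery win, T = attends an independent school.
rng = np.random.default_rng(7)
n = 1_000_000

# Compliance types: always-takers attend regardless of the lottery,
# compliers attend only if they win, never-takers never attend.
types = rng.choice(["always", "complier", "never"], size=n, p=[0.10, 0.10, 0.80])
Z = rng.binomial(1, 0.1, n)   # 10% win the lottery
T = ((types == "always") | ((types == "complier") & (Z == 1))).astype(float)

# Heterogeneous causal effects: 5 points for always-takers, 2 for compliers.
effect = np.where(types == "always", 5.0, np.where(types == "complier", 2.0, 0.0))
Y = 20 + effect * T + rng.normal(size=n)

# Wald/IV estimate: reduced form divided by first stage.
late = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (T[Z == 1].mean() - T[Z == 0].mean())
print(late)  # close to 2: the compliers' effect, nowhere near the always-takers' 5
```

The instrument is valid, yet the estimate says nothing about the always-takers, who in this toy setup have a causal effect more than twice as large.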

Introduction to instrumental variables

13 May, 2021 at 19:23 | Posted in Statistics & Econometrics


Great presentation, but maybe Angrist should also have pointed out the mistake many economists make when they use instrumental-variables analysis and think that their basic identification assumption is empirically testable. It is not. And just swapping an assumption of residuals being uncorrelated with the independent variables for the assumption that the same residuals are uncorrelated with an instrument doesn’t solve the endogeneity problem or improve our causal analysis.

The voice we never forget

13 May, 2021 at 17:13 | Posted in Varia




One of our truly great singers, Svante Thuresson, has passed away.

Music that oblivion cannot touch.


Are your instrumental variables — really — relevant, exclusive, and exogenous?

12 May, 2021 at 15:30 | Posted in Economics


From association to causation

11 May, 2021 at 18:15 | Posted in Theory of Science & Methodology


Evidence-based policy

10 May, 2021 at 13:15 | Posted in Theory of Science & Methodology

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right closures. Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with a transportability warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to transport, the value of “rigorous” and “precise” methods — and ‘on-average-knowledge’ — is despairingly small.

Like us, you want evidence that a policy will work here, where you are. Randomized controlled trials (RCTs) do not tell you that. They do not even tell you that a policy works. What they tell you is that a policy worked there, where the trial was carried out, in that population. Our argument is that the changes in tense – from “worked” to “work” – are not just a matter of grammatical detail. To move from one to the other requires hard intellectual and practical effort. The fact that it worked there is indeed fact. But for that fact to be evidence that it will work here, it needs to be relevant to that conclusion. To make RCTs relevant you need a lot more information and of a very different kind.

Nancy Cartwright & Jeremy Hardie, Evidence-Based Policy: A Practical Guide to Doing It Better

So, no, I find it hard to share the enthusiasm and optimism about the value of (quasi)natural experiments and all the statistical-econometric machinery that comes with them. Guess I’m still waiting for the transportability warrant …

Only the best is good enough

9 May, 2021 at 19:21 | Posted in Varia


Bille August’s and Ingmar Bergman’s masterpiece.

With breathtakingly beautiful music by Stefan Nilsson.

Via con me

8 May, 2021 at 19:11 | Posted in Varia

“Don’t miss, for anything in the world, the variety show of a man in love with you.”

The role of manipulation and intervention in theories of causality

8 May, 2021 at 11:52 | Posted in Theory of Science & Methodology

As X’s effect on some other variable in the system S depends on there being a possible intervention on X, and the possibility of an intervention in turn depends on the modularity of S, it is a necessary condition for something to be a cause that the system in which it is a cause is modular with respect to that factor. The requirement that all systems are modular with respect to their causes can, in a way, be regarded as an interventionist addition to the unmanipulable causes problem … This implication has also been criticized in particular by Nancy Cartwright. She has proposed that many causal systems are not modular … Pearl has responded to this in 2009 (sect. 11.4.7), where he proposes, on the one hand, that it is in general sufficient that a symbolic intervention can be performed on the causal model, for the determination of causal effects, and on the other hand that we nevertheless could isolate the individual causal contributions …

It is tempting—to philosophers at least—to equate claims in this literature, about the meaning of causal claims being given by claims about what would happen under a hypothetical intervention—or an explicit definition of causation to the same effect—with that same claim as it would be interpreted in a philosophical context. That is to say, such a claim would normally be understood there as giving the truth conditions of said causal claims. It is generally hard to know whether any such beliefs are involved in the scientific context. However, Pearl in particular has denied, in increasingly explicit terms, that this is what is intended … He has recently liked to describe a factor Y , that is causally dependent on another factor X, as “listening” to X and determining “its value in response to what it hears” … This formulation suggests to me that it is the fact that Y is “listening” to X that explains why and how Y changes under an intervention on X. That is, what a possible intervention does, is to isolate the influence that X has on Y , in virtue of Y ’s “listening” to X. Thus, Pearl’s theory does not imply an interventionist theory of causation, as we understand that concept in this monograph. This, moreover, suggests that the intervention that is always available, for any cause that is represented by a variable in a causal model, is a formal operation. I take this to be supported by the way he responds to Nancy Cartwright’s objection that modularity does not hold of all causal systems: it is sufficient that a symbolic intervention can be performed. Thus, the operation alluded to in Pearl’s operationalization of causation is a formal operation, always available, regardless of whether it corresponds to any possible intervention event or not.

An interesting dissertation, well worth reading for anyone interested in the ongoing debate on the reach of interventionist causal theories.

Framing all causal questions as questions of manipulation and intervention runs into many problems, especially when we open up for “hypothetical” and “symbolic” interventions. Humans have few barriers to imagining things, but that often also makes it difficult to assess the relevance of the proposed thought experiments. Performing “well-defined” interventions is one thing, but if we do not want to give up searching for answers to the questions we are interested in, and instead only search for answerable questions, interventionist studies are of limited applicability and value. Intervention effects in thought experiments are not self-evidently the causal effects we are looking for. Identifying causes (reverse causality) and measuring effects of causes (forward causality) are not the same. In social sciences like economics, we standardly first try to identify the problem and why it occurred, and only afterwards look at the effects of the causes.

Leaning on the interventionist approach often means that instead of posing interesting questions on a social level, focus is on individuals. Instead of asking about structural socio-economic factors behind, e.g., gender or racial discrimination, the focus is on the choices individuals make (which — as I maintain in my book Ekonomisk teori och metod — also tends to make the explanations presented inadequately ‘deep’). A typical example of the dangers of this limiting approach is ‘Nobel prize’ winner Esther Duflo, who thinks that economics should be based on evidence from randomised experiments and field studies. Duflo et consortes want to give up on ‘big ideas’ like political economy and institutional reform and instead go for solving more manageable problems the way plumbers do. Yours truly is far from sure that is the right way to move economics forward and make it a relevant and realist science. A plumber can fix minor leaks in your system, but if the whole system is rotten, something more than good old-fashioned plumbing is needed. The big social and economic problems we face today are not going to be solved by plumbers performing interventions or manipulations in the form of RCTs.

The RCT controversy

7 May, 2021 at 09:00 | Posted in Theory of Science & Methodology

In Social Science and Medicine (December 2017), Angus Deaton & Nancy Cartwright argue that RCTs do not have any warranted special status. They are, simply, far from being the ‘gold standard’ they are usually portrayed as:

Randomized Controlled Trials (RCTs) are increasingly popular in the social sciences, not only in medicine. We argue that the lay public, and sometimes researchers, put too much trust in RCTs over other methods of investigation. Contrary to frequent claims in the applied literature, randomization does not equalize everything other than the treatment in the treatment and control groups, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) covariates. Finding out whether an estimate was generated by chance is more difficult than commonly believed. At best, an RCT yields an unbiased estimate, but this property is of limited practical value. Even then, estimates apply only to the sample selected for the trial, often no more than a convenience sample, and justification is required to extend the results to other groups, including any population to which the trial sample belongs, or to any individual, including an individual in the trial. Demanding ‘external validity’ is unhelpful because it expects too much of an RCT while undervaluing its potential contribution. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon, not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not ‘what works’, but ‘why things work’.

In a comment on Deaton & Cartwright, statistician Stephen Senn argues that on several issues concerning randomization Deaton & Cartwright “simply confuse the issue,” that their views are “simply misleading and unhelpful” and that they make “irrelevant” simulations:

My view is that randomisation should not be used as an excuse for ignoring what is known and observed but that it does deal validly with hidden confounders. It does not do this by delivering answers that are guaranteed to be correct; nothing can deliver that. It delivers answers about which valid probability statements can be made and, in an imperfect world, this has to be good enough. Another way I sometimes put it is like this: show me how you will analyse something and I will tell you what allocations are exchangeable. If you refuse to choose one at random I will say, “why? Do you have some magical thinking you’d like to share?”

Contrary to Senn, Andrew Gelman shares Deaton’s and Cartwright’s view that randomized trials often are overrated:

There is a strange form of reasoning we often see in science, which is the idea that a chain of reasoning is as strong as its strongest link. The social science and medical research literature is full of papers in which a randomized experiment is performed, a statistically significant comparison is found, and then story time begins, and continues, and continues—as if the rigor from the randomized experiment somehow suffuses through the entire analysis …

One way to get a sense of the limitations of controlled trials is to consider the conditions under which they can yield meaningful, repeatable inferences. The measurement needs to be relevant to the question being asked; missing data must be appropriately modeled; any relevant variables that differ between the sample and population must be included as potential treatment interactions; and the underlying effect should be large. It is difficult to expect these conditions to be satisfied without good substantive understanding. As Deaton and Cartwright put it, “when little prior knowledge is available, no method is likely to yield well-supported conclusions.” Much of the literature in statistics, econometrics, and epidemiology on causal identification misses this point, by focusing on the procedures of scientific investigation—in particular, tools such as randomization and p-values which are intended to enforce rigor—without recognizing that rigor is empty without something to be rigorous about.

My own view is that nowadays many social scientists maintain that ‘imaginative empirical methods’ — such as natural experiments, field experiments, lab experiments, RCTs — can help us to answer questions concerning the external validity of models used in social sciences. In their view they are more or less tests of ‘an underlying model’ that enable them to make the right selection from the ever-expanding ‘collection of potentially applicable models.’ When looked at carefully, however, there are in fact few real reasons to share this optimism.

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific labs/experiments/fields to specific real-world situations/institutions/ structures that we are interested in understanding or (causally) to explain. And then the population problem is more difficult to tackle.

In randomized trials the researchers try to find out the causal effects that different variables of interest may have by changing circumstances randomly — a procedure somewhat (‘on average’) equivalent to the usual ceteris paribus assumption.

Besides the fact that ‘on average’ is not always ‘good enough,’ it amounts to nothing but hand waving to simpliciter assume, without argumentation, that it is tenable to treat social agents and relations as homogeneous and interchangeable entities.

Randomization is used basically to allow the econometrician to treat the population as consisting of interchangeable and homogeneous groups (‘treatment’ and ‘control’). The regression models one arrives at by using randomized trials tell us the average effect that variations in variable X have on the outcome variable Y, without having to explicitly control for effects of other explanatory variables R, S, T, etc., etc. Everything is assumed to be essentially equal except the values taken by variable X.

In a usual regression context one would apply an ordinary least squares estimator (OLS) in trying to get an unbiased and consistent estimate:

Y = α + βX + ε,

where α is a constant intercept, β a constant ‘structural’ causal effect and ε an error term.

The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Although we get the right answer (the average causal effect is 0), those who are ‘treated’ (X=1) may have causal effects equal to −100 and those ‘not treated’ (X=0) may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the OLS average effect particularly enlightening.
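A minimal sketch of this masking, with invented numbers echoing the ±100 example: individual causal effects are +100 for half the population and −100 for the other half, and the OLS regression dutifully reports an average effect near zero.

```python
import numpy as np

# Toy illustration (numbers invented): extreme individual heterogeneity
# hidden behind an average causal effect of zero.
rng = np.random.default_rng(3)
n = 100_000

tau = np.where(rng.random(n) < 0.5, 100.0, -100.0)  # individual causal effects
X = rng.binomial(1, 0.5, n).astype(float)           # randomized 'treatment'
Y = 50 + tau * X + rng.normal(size=n)

Xmat = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xmat, Y, rcond=None)[0]
print(beta[1])  # the OLS 'average causal effect' is near zero
```

The regression output is not wrong about the average; it is simply silent about the fact that nobody in this toy population actually experiences an effect anywhere near zero.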

Limiting model assumptions in science always have to be closely examined. If we are going to be able to show that the mechanisms or causes we isolate and handle in our models are stable, in the sense that they do not change when we ‘export’ them to our ‘target systems,’ we have to show that they hold under more than ceteris paribus conditions; otherwise they are a fortiori of limited value for our understanding, explanations, or predictions of real-world systems.

Most ‘randomistas’ underestimate the heterogeneity problem. It does not just turn up as an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal problem for the millions of regression estimates that are produced every year.

Like econometrics, randomization promises more than it can deliver, basically because it requires assumptions that in practice are not possible to maintain. And just like econometrics, randomization is basically a deductive method. Given the assumptions, these methods deliver deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. And although randomization may contribute to controlling for confounding, it does not guarantee it, since genuine randomness presupposes infinite experimentation and we know all real experimentation is finite. And even if randomization may help to establish average causal effects, it says nothing of individual effects unless homogeneity is added to the list of assumptions. Causal evidence generated by randomization procedures may be valid in ‘closed’ models, but what we usually are interested in is causal evidence in the real-world target system we happen to live in.
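The finite-experimentation point is easy to demonstrate with a toy setup (trial sizes and the 0.5-standard-deviation threshold are my own choices): randomization balances an unobserved covariate on average across many hypothetical trials, yet a nontrivial share of individual trials come out badly imbalanced.

```python
import numpy as np

# Sketch (setup invented): many small hypothetical RCTs with 50 subjects each,
# randomized 25/25. We record the treatment/control imbalance in an
# unobserved covariate for each single trial.
rng = np.random.default_rng(11)
n_trials, n_subjects = 10_000, 50

imbalances = []
for _ in range(n_trials):
    covariate = rng.normal(size=n_subjects)                   # unobserved confounder
    treated = rng.permutation(n_subjects) < n_subjects // 2   # random 25/25 split
    imbalances.append(covariate[treated].mean() - covariate[~treated].mean())

imbalances = np.abs(np.array(imbalances))
print(imbalances.mean())          # small on average ...
print((imbalances > 0.5).mean())  # ... yet a visible share of trials are off by > 0.5 sd
```

Averaged over trials the design is balanced, but any one finite trial, which is all a researcher ever has, can hide a sizeable confounder imbalance behind its randomized pedigree.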

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science …

Indeed, far-reaching claims have been made for the superiority of a quantitative template that depends on modeling — by those who manage to ignore the far-reaching assumptions behind the models. However, the assumptions often turn out to be unsupported by data. If so, the rigor of advanced quantitative methods is a matter of appearance rather than substance …

 David A. Freedman
