Using formal mathematical modelling, mainstream economists sure can guarantee that the conclusions hold given the assumptions. However, the validity we get in abstract model worlds does not warrant transfer to real-world economies. Validity may be good, but it is not enough.
Mainstream economists are proud of having an ever-growing smorgasbord of models to cherry-pick from (as long as, of course, the models do not question the standard modelling strategy) when performing their analyses. The ‘rigorous’ and ‘precise’ deductions made in these closed models, however, are not in any way matched by a similar stringency or precision when it comes to what ought to be the most important stage of any economic research — making statements and explaining things in real economies. Although almost every mainstream economist holds the view that thought-experimental modelling has to be followed by confronting the models with reality — which is what they indirectly want to predict/explain/understand using their models — they then all of a sudden become exceedingly vague and imprecise. It is as if all the intellectual force has been invested in the modelling stage and nothing is left for what really matters — what exactly these models teach us about real economies.
No matter how precise and rigorous the analysis, and no matter how hard one tries to cast the argument in modern mathematical form, it does not push economic science forward a single iota if it does not stand the acid test of relevance to the target. Proving things ‘rigorously’ in mathematical models is not a good recipe for doing interesting and relevant economic analysis. Forgetting to supply export warrants to the real world makes the analysis an empty exercise in formalism without real scientific value. In the realm of true science, it is of little or no value to simply make claims about a model and lose sight of reality.
To have valid evidence is not enough. What economics needs is sound evidence. The premises of a valid argument do not have to be true, but a sound argument, on the other hand, is not only valid but builds on premises that are true. Aiming only for validity, without soundness, is setting the economics aspiration level too low for developing a realist and relevant science.
A major, and notorious, problem with this approach, at least in the domain of science, concerns how to ascribe objective prior probabilities to hypotheses. What seems to be necessary is that we list all the possible hypotheses in some domain and distribute probabilities among them, perhaps ascribing the same probability to each, employing the principle of indifference. But where is such a list to come from? It might well be thought that the number of possible hypotheses in any domain is infinite, which would yield zero for the probability of each, and the Bayesian game cannot get started. All theories have zero probability and Popper wins the day. How is some finite list of hypotheses enabling some objective distribution of nonzero prior probabilities to be arrived at? My own view is that this problem is insuperable, and I also get the impression from the current literature that most Bayesians are themselves coming around to this point of view.
Chalmers is absolutely right here in his critique of ‘objective’ Bayesianism, but I think it could actually be extended to also encompass its ‘subjective’ variety.
A classic example — borrowed from Bertrand Russell — may perhaps be allowed to illustrate the main point of the critique:
Assume you’re a Bayesian turkey, holding a nonzero probability belief in the hypothesis H that “people are nice vegetarians who do not eat turkeys,” and that every day you see the sun rise confirms this belief of yours. For every day you survive, you update your belief according to Bayes’ theorem
P(H|e) = [P(e|H)P(H)]/P(e),
where evidence e stands for “not being eaten” and P(e|H) = 1. Given there do exist other hypotheses than H, P(e) is less than 1 and a fortiori P(H|e) is greater than P(H). Every day you survive increases your probability belief that you will not be eaten. This is totally rational according to the Bayesian definition of rationality. Unfortunately, for every day that goes by, the traditional Christmas dinner also gets closer and closer …
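To make the turkey’s arithmetic concrete, here is a minimal simulation sketch; the prior and the likelihood under the alternative hypothesis are numbers I have assumed purely for illustration:

```python
# Bayesian turkey: updating P(H) on each day of survival.
# H: "people are nice vegetarians who do not eat turkeys"; e: "I was not eaten today".
p_H = 0.5              # assumed prior belief in H
p_e_given_H = 1.0      # under H, survival is certain
p_e_given_notH = 0.9   # assumed: even under not-H, most days pass without a slaughter

for day in range(360):
    p_e = p_e_given_H * p_H + p_e_given_notH * (1 - p_H)  # total probability of surviving the day
    p_H = p_e_given_H * p_H / p_e                          # Bayes' theorem: P(H|e)

print(f"Belief in H after 360 days of survival: {p_H:.6f}")
# The posterior crawls towards 1, right up until Christmas dinner.
```

Every line of the update is licensed by Bayes’ theorem; the trouble lies entirely outside the calculation.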
The nodal point here is — of course — that although Bayes’ theorem is mathematically unquestionable, that doesn’t qualify it as indisputably applicable to scientific questions.
Bayesian probability calculus is far from the automatic inference engine that its protagonists maintain it is. Where do the priors come from? Wouldn’t it be better in science, when we are uncertain, to do some scientific experimentation and observation rather than start making calculations based on people’s often vague and subjective personal beliefs? Is it, from an epistemological point of view, really credible to think that the Bayesian probability calculus makes it possible to somehow fully assess people’s subjective beliefs? And are — as most Bayesians maintain — all scientific controversies and disagreements really possible to explain in terms of differences in prior probabilities? I’ll be dipped!
If we do not fully explain by adding more variables, how do we explain? Mechanisms explain because they embody an invariant property. The first mechanism, linking the gas pedal to the rotating drivetrain, is combustion; the second mechanism, linking the rotating drivetrain to acceleration, is the relationship of torque to force. Combustion is a high-energy-initiated, exothermic (heat-generating) chemical reaction between a compound such as a hydrocarbon and an oxidant such as oxygen. The heat generated by combustion increases pressure in a sealed cylinder and impels a piston. A similarly brief description could be given of the relationship between torque, force, and acceleration. The key point is this: combustion is not a variable. In the proper circumstances (the combination of specific compounds and oxidizing agents, with a high-energy initiation to overcome the stability of dioxygen molecules), combustion occurs with law-like regularity. That regularity can in turn be explained by more fine-grained physical processes at the subatomic level.
By saying that a mechanism like combustion is invariant, not a variable, I am stating that it cannot be directly manipulated; one cannot intervene to turn combustion off. We can block the mechanism of combustion from working by intervening on ancestral nodes. We can remove all fuel from the car or disable the electrical system; in either case, depressing the gas pedal will not cause the car to accelerate any longer because combustion cannot take place. But if gas is present and the electrical system is functioning, we cannot intervene to set combustion to the value “off”. If a fuel-air mixture is combined with a functioning ignition system, combustion occurs, and so combustion should not be represented by a node in the causal graph upon which we could intervene. Mechanisms embody invariant causal principles and, in particular instantiations, they generate observed correlations. Mechanisms are thus distinct from causal pathways; they explain the location of the directed edges (arrowheads). It is the invariant causal principle combustion that explains why (given a few more mechanical details) depressing the gas pedal makes the drivetrain rotate. Mechanisms explain the relationship between variables because they are not variables.
There are good reasons to think that moderating causes have an important general role in explaining development and growth. Why? The growth process is apparently strongly affected by what economists call complementarities. Complementarities exist when the action of an agent or the existence of a practice affects the marginal benefit to another agent of taking an action, or the marginal benefit of another practice. Education is again a good example. A well-trained workforce promotes high value-added production, and the existence of the latter provides incentives for educational attainment. The influence of either on growth depends on the value of the other. Arguably, there are complementarities across the board for the factors that matter for development. Other examples besides human capital include market size and the division of labor, and financial development and investment. Complementarities create the kind of contextual effects characteristic of what I have called complex causality.
I am not claiming that using explicit DAGs and explicitly testing them in development economics or elsewhere is a bad thing; quite the opposite. It is a significant improvement over the standard practice of uninterpreted regressions. Nor am I claiming the problems I am pointing to are completely unapproachable in the causal modeling framework. For example, samples can be divided along a moderator variable and separate DAGs tested, with differences being evidence for effect modification. My concern, however, is that the DAG formalism not become a hammer where everything is a nail.
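As a toy illustration of the moderator-splitting strategy just mentioned, here is a minimal sketch; the variable names and numbers are mine, invented for the example, and the data-generating process builds in exactly the kind of complementarity discussed above:

```python
# Toy example: the effect of x (say, human-capital investment) on y (say, growth)
# depends on a moderator m (say, financial development). Splitting the sample on m
# and estimating the x -> y slope separately reveals the effect modification that
# a single pooled estimate averages away.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

m = rng.integers(0, 2, size=n)                         # moderator: 0 = low, 1 = high
x = rng.normal(size=n)                                 # the putative cause
y = 1.0 + (0.2 + 1.3 * m) * x + rng.normal(size=n)     # complementarity: x pays off mainly when m = 1

for level in (0, 1):
    mask = m == level
    slope = np.polyfit(x[mask], y[mask], 1)[0]         # OLS slope of y on x within the stratum
    print(f"Estimated effect of x on y when m = {level}: {slope:.2f}")

print(f"Pooled estimate (averages over both contexts): {np.polyfit(x, y, 1)[0]:.2f}")
```

The pooled slope is a context-dependent average of two quite different effects, which is precisely the kind of number that invites overconfident extrapolation.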
Psychology professor Susan Fiske doesn’t like when people use social media to publish negative comments on published research. She’s implicitly following what I’ve sometimes called the research incumbency rule: that, once an article is published in some approved venue, it should be taken as truth. I’ve written elsewhere on my problems with this attitude — in short, (a) many published papers are clearly in error, which can often be seen just by internal examination of the claims and which becomes even clearer following unsuccessful replication, and (b) publication itself is such a crapshoot that it’s a statistical error to draw a bright line between published and unpublished work …
If you’d been deeply invested in the old system, it must be pretty upsetting to think about change. Fiske is in the position of someone who owns stock in a failing enterprise, so no wonder she wants to talk it up. The analogy’s not perfect, though, because there’s no one for her to sell her shares to. What Fiske should really do is cut her losses, admit that she and her colleagues were making a lot of mistakes, and move on … Short term, though, I guess it’s a lot more comfortable for her to rant about replication terrorists and all that …
And let me emphasize here that, yes, statisticians can play a useful role in this discussion. If Fiske etc. really hate statistics and research methods, that’s fine; they could try to design transparent experiments that work every time. But, no, they’re the ones justifying their claims using p-values extracted from noisy data, … they’re the ones who seem to believe just about anything (e.g., the claim that women were changing their vote preferences by 20 percentage points based on the time of the month) if it has a “p less than .05” attached to it. If that’s the game you want to play, then methods criticism is relevant, for sure …
I am posting this on our blog, where anyone has an opportunity to respond. That’s right, anyone. Susan Fiske can respond, and so can anyone else. Including lots of people who have an interest in psychological science but don’t have the opportunity to write non-peer-reviewed articles for the APS Observer, who aren’t tenured professors at major universities, etc. This is open discussion, it’s the opposite of terrorism. And I think it’s pretty ridiculous that I even have to say such a thing which is so obvious.
Methods for warranting causal claims fall into two broad categories. There are those that clinch the conclusion but are narrow in their range of application; and those that merely vouch for the conclusion but are broad in their range of application.
Derivation from theory falls into the first category, as do randomized clinical trials (RCTs), econometric methods and others. What is characteristic of methods in this category is that they are deductive: if they are correctly applied, then if the evidence claims are true, so too will the conclusions be true. That is a huge benefit. But there is an equally huge cost. These methods are concomitantly narrow in scope. The assumptions necessary for their successful application tend to be extremely restrictive and they can only take a very specialized type of evidence as input and special forms of conclusion as output.
Those in the second category … are more wide ranging but it cannot be proved that the conclusion is assured by the evidence, either because the method cannot be laid out in a way that lends itself to such a proof or because, by the lights of the method itself, the evidence is symptomatic of the conclusion but not sufficient for it. What then is it to vouch for? That is hard to say since the relation between evidence and conclusion in these cases is not deductive and I do not think there are any good ‘logics’ of non-deductive confirmation, especially ones that make sense for the great variety of methods we use to provide warrant …
My worry is that we want to use clinchers so that we can get a result from a small body of evidence rather than tackling the problems of how to handle a large amorphous body of evidence loosely connected with the hypothesis. This would be okay if only it were not for the down-side of these deductive methods – the conditions under which they can give conclusions at all are very strict.
Mainstream economics is at its core in the story-telling business whereby economic theorists create make-believe analogue models of the target system – usually conceived as the real economic system. This modelling activity is considered useful and essential. Since fully-fledged experiments on a societal scale as a rule are prohibitively expensive, ethically indefensible or unmanageable, economic theorists have to substitute experimenting with something else. To understand and explain relations between different entities in the real economy, the predominant strategy is to build models and make things happen in these “analogue-economy models” rather than engineering things happening in real economies.
Formalistic deductive “Glasperlenspiel” can be very impressive and seductive. But in the realm of science, it ought to be considered of little or no value to simply make claims about the model and lose sight of reality. As Julian Reiss writes:
There is a difference between having evidence for some hypothesis and having evidence for the hypothesis relevant for a given purpose. The difference is important because scientific methods tend to be good at addressing hypotheses of a certain kind and not others: scientific methods come with particular applications built into them … The advantage of mathematical modelling is that its method of deriving a result is that of mathematical proof: the conclusion is guaranteed to hold given the assumptions. However, the evidence generated in this way is valid only in abstract model worlds while we would like to evaluate hypotheses about what happens in economies in the real world … The upshot is that valid evidence does not seem to be enough. What we also need is to evaluate the relevance of the evidence in the context of a given purpose.
Proving things about thought-up worlds is not enough. To have valid evidence is not enough. A deductive argument is valid if it makes it impossible for the premises to be true and the conclusion to be false. Fine, but what we need in science is sound arguments — arguments that are both valid and whose premises are all actually true.
Theories and models being ‘coherent’ or ‘consistent’ with data do not make the theories and models success stories. We want models to somehow represent their real-world targets. This representation cannot be complete in most cases because of the complexity of the target systems. That kind of incompleteness is unavoidable. But it’s a totally different thing when models misrepresent real-world targets. Aiming only for validity, without soundness, is setting the economics aspiration level too low for developing a realist and relevant science.
Social scientists pursue a variety of different purposes such as predicting events of interest, explaining individual events or general phenomena, and controlling outcomes for policy. It is interesting to note that the language of “cause” is employed in all these contexts …
What kind of causal hypothesis should be investigated (and, in tandem, what kind of evidence should be sought) therefore is to be determined on the basis of the purpose pursued in the given context. For certain kinds of prediction, Granger causation is appropriate and thus probabilistic evidence. Explanation is itself a multifaceted concept, and different notions of explanation require counterfactual, regularity, or mechanistic concepts of cause and the associated kind of evidence. Some kinds of policy require a concept of cause as invariant under intervention and, again, evidence able to support this kind of relation …
Although there are different kinds of evidence for causal relationships, different kinds of evidence tend to support different types of causal claim, a fact that ties evidence and type of causal claim together very tightly. This is unfortunate as we pursue many different purposes and it would be nice if we could establish that X causes Y and thereby be helped in realizing all our purposes. For instance, it would be nice if we could base policies on probabilistic evidence, or if, having found a mechanism between X and Y, we could infer that X makes a difference to Y. As a general rule, this will not work. To be sure, the different kinds of causal claim are sometimes true of the same system, but whether that is so is an empirical question that has to be addressed, and answered supported by evidence, in its own right.
Preference-based discrimination arises when, for example, employers, customers, or colleagues dislike those who belong to a certain group. Such discrimination can lead to wage differences between discriminated and non-discriminated groups. However, competition can undermine these wage differences, as non-discriminatory employers will make greater profits and drive discriminatory employers out of the market. Since many markets are not characterized by perfect competition, there is still the possibility for this type of discrimination to persist.
In contrast, statistical discrimination describes situations where those who belong to different groups are affected by expectations about the group’s average characteristics. These differences in characteristics between different groups can be real or based on pure prejudice, but it is difficult to see why profit-maximizing employers would not realize that pure prejudices are just that. They can have a lot to gain by finding out how things actually are and adapting their behavior accordingly. However, even what starts out as a pure prejudice can become a self-fulfilling prophecy with actual consequences. If employers in general avoid investing in a group of employees because they are expected to prioritize family over career, the group in question may find it completely rational to prioritize family.
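A stylized sketch of that self-fulfilling dynamic; the functional forms and numbers are entirely my own and are chosen only to make the feedback loop visible:

```python
# Two groups of workers with identical underlying preferences. The employer starts
# with a lower belief about group B's career commitment, invests accordingly, and
# workers only find it worthwhile to commit to a career when enough is invested in them.
belief = {"A": 0.8, "B": 0.4}                     # employer's initial beliefs (assumed)

for _ in range(20):                               # iterate the belief-investment-behaviour loop
    investment = dict(belief)                     # training investment follows the belief
    commitment = {g: (0.9 if investment[g] > 0.5 else 0.3) for g in belief}
    belief = dict(commitment)                     # beliefs update towards observed behaviour

print(commitment)   # {'A': 0.9, 'B': 0.3}: the initial prejudice has confirmed itself
```

Although the two groups of workers are identical by construction, they end up in different equilibria, and in each of them the employer’s initial belief looks vindicated by the data.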
Priks and Vlachos’ The Tools of Economics is an expression of a new trend in economics, where there is a growing interest in experiments and — not least — how to design them to possibly provide answers to questions about causality and policy effects. Economic research on discrimination nowadays often emphasizes the importance of a randomization design, for example when trying to determine to what extent discrimination can be causally attributed to differences in preferences or information, using so-called correspondence tests and field experiments.
A common starting point is the ‘counterfactual approach’ developed mainly by Neyman and Rubin, which is often presented and discussed based on examples of randomized controlled trials, natural experiments, difference-in-differences, matching, regression discontinuity, etc.
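For readers less at home in this literature, a minimal sketch of the potential-outcomes notation these designs rest on may help; the definitions below are standard textbook ones, not taken from Priks and Vlachos:

```latex
% Each student i has two potential test results: Y_i(1) at an independent school
% and Y_i(0) at a municipal school. Only one of the two is ever observed.
\[
  \tau_i = Y_i(1) - Y_i(0), \qquad
  \mathrm{ATE} = \mathrm{E}\,[\,Y(1) - Y(0)\,].
\]
% Under ideal randomization of school type T, the simple difference in means
% identifies that average, and only that average:
\[
  \mathrm{E}\,[\,Y \mid T = 1\,] - \mathrm{E}\,[\,Y \mid T = 0\,] = \mathrm{ATE}.
\]
```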
Mainstream economists generally view this development of the economics toolbox positively. Since yours truly — like, for example, Nancy Cartwright and Angus Deaton — is not entirely positive about the randomization approach, it may perhaps be interesting for the reader to hear some of my criticisms.
A notable limitation of counterfactual randomization designs is that they only give us answers on how ‘treatment groups’ differ on average from ‘control groups.’ Let me give an example to illustrate how limiting this fact can be:
Among school debaters and politicians in Sweden, it is claimed that so-called ‘independent schools’ (charter schools) are better than municipal schools. They are said to lead to better results. To find out if this is really the case, a number of students are randomly selected to take a test. The result could be: Test result = 20 + 5T, where T = 1 if the student attends an independent school and T = 0 if the student attends a municipal school. This would confirm the assumption that independent-school students score, on average, 5 points higher than students in municipal schools. Now, politicians (hopefully) are aware that this statistical result cannot be interpreted in causal terms, because independent-school students typically do not have the same background (socio-economic, educational, cultural, etc.) as those who attend municipal schools (the relationship between school type and result is confounded by selection bias).

To obtain a better measure of the causal effect of school type, politicians suggest that 1000 students take part in a lottery for admission to an independent school — a classic example of a randomization design in natural experiments. The chance of winning is 10%, so 100 students are given this opportunity. Of these, 20 accept the offer to attend an independent school. Of the 900 lottery participants who do not ‘win,’ 100 choose to attend an independent school anyway. The lottery is often treated by school researchers as an ‘instrumental variable,’ and when the analysis is carried out, the result is: Test result = 20 + 2T. This is standardly interpreted as a causal measure of how much better students would, on average, perform on the test if they attended an independent school instead of a municipal school.

But is it true? No! Unless the causal effect is exactly the same for every student (a rather far-fetched ‘homogeneity assumption’), the estimated average causal effect only applies to the students who attend an independent school if they ‘win’ the lottery but who would not otherwise do so (in statistical jargon, the ‘compliers’). It is difficult to see why this group of students would be particularly interesting in this example, given that the average causal effect estimated using the instrumental variable says nothing at all about the effect on the majority (the 100 out of 120 who choose to attend an independent school without ‘winning’ the lottery) of those who attend an independent school.
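The point can be checked with a small simulation. In the sketch below the shares of student types roughly mirror the lottery example (about 11 per cent ‘always-takers’ and 9 per cent ‘compliers’), while the effect sizes are numbers I have simply assumed to make the contrast visible:

```python
# Simulation: the IV (Wald) estimator recovers the average effect for compliers only,
# not the effect for the always-takers who make up most of the independent-school students.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Latent student types: always-takers attend an independent school regardless of the
# lottery, compliers attend only if they win, never-takers never attend.
types = rng.choice(["always", "complier", "never"], size=n, p=[0.111, 0.089, 0.800])

z = rng.random(n) < 0.10                                   # lottery win (the instrument)
d = (types == "always") | ((types == "complier") & z)      # attends an independent school

# Heterogeneous causal effects by type (assumed for illustration).
effect = np.where(types == "always", 6.0, np.where(types == "complier", 2.0, 0.0))
y = 20 + rng.normal(0, 1, n) + effect * d                  # observed test result

wald = (y[z].mean() - y[~z].mean()) / (d[z].mean() - d[~z].mean())
print(f"IV (Wald) estimate:                 {wald:.2f}")             # about 2: the complier effect
print(f"Average effect among all attendees: {effect[d].mean():.2f}")  # dominated by always-takers
```

The Wald estimate lands near the assumed complier effect, while the average effect among those who actually attend independent schools is a very different number, which is exactly the problem with reading the instrumental-variable estimate as ‘the’ causal effect of school choice.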
Conclusion: Researchers must be much more careful in interpreting ‘average estimates’ as causal. Reality exhibits a high degree of heterogeneity, and ‘average parameters’ often tell us very little!
To randomize ideally means that we achieve orthogonality (independence) in our models. But it does not mean that, when we randomize in real experiments, we actually achieve this ideal. The ‘balance’ that randomization should ideally result in cannot be taken for granted when the ideal is translated into reality. Here, one must argue and verify that the ‘assignment mechanism’ is truly stochastic and that ‘balance’ has indeed been achieved!
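A quick way to see how fragile finite-sample ‘balance’ is: repeatedly randomize a modest sample and check how often a fixed pre-treatment covariate ends up noticeably unbalanced. The sample size and threshold below are my own illustrative choices:

```python
# How often does a single randomization of 50 subjects leave a pre-treatment
# covariate unbalanced between treatment and control?
import numpy as np

rng = np.random.default_rng(2)
n, n_draws = 50, 10_000
covariate = rng.normal(size=n)               # a fixed pre-treatment characteristic (in SD units)

gaps = []
for _ in range(n_draws):
    treated = rng.permutation(n) < n // 2    # assign exactly half to treatment at random
    gaps.append(covariate[treated].mean() - covariate[~treated].mean())

gaps = np.abs(np.array(gaps))
print(f"Share of randomizations with |mean gap| > 0.4 SD: {(gaps > 0.4).mean():.1%}")
# Randomization guarantees balance in expectation, not in the one sample you actually get.
```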
Even if we accept the limitation of only being able to say something about average treatment effects, there is another theoretical problem. An ideal randomized experiment assumes that a number of individuals are first randomly selected from a population and then randomly assigned to a treatment group or a control group. Given that both selection and assignment are successfully carried out randomly, it can be shown that the expected outcome difference between the two groups is the average causal effect in the population. The snag is that the experiments conducted almost never involve participants randomly selected from the population of interest! In most cases, experiments are started because there is a problem of some kind in a given population (e.g., schoolchildren or job seekers in country X) that one wants to address. An ideal randomized experiment assumes that both selection and assignment are randomized — which means that virtually none of the empirical results that randomization advocates so eagerly tout hold up in a strict mathematical-statistical sense. The fact that only assignment is talked about when it comes to ‘as if’ randomization in natural experiments is hardly a coincidence. Moreover, in such natural experiments the sad but inevitable fact is that there can always be a dependency between the variables being studied and unobservable factors in the error term, and this dependency can never be tested!
Another significant problem is that researchers who use these randomization-based research strategies often set up problem formulations that are not at all the ones we really want answers to, in order to achieve ‘exact’ and ‘precise’ results. Design becomes the main thing, and as long as researchers can get more or less clever experiments in place, they believe they can draw far-reaching conclusions about both causality and the ability to generalize experimental outcomes to larger populations. Unfortunately, this often means that this type of research is biased away from interesting and important problems and towards prioritizing method selection. Design and research planning are important, but the credibility of research ultimately lies in being able to provide answers to relevant questions that both citizens and researchers want answered.
Believing there is only one really good evidence-based method on the market — and that randomization is the only way to achieve scientific validity — blinds people to searching for and using other methods that in many contexts are better. Insisting on using only one tool often means using the wrong tool.
In mainstream economics, both logic and mathematics are used extensively. And most mainstream economists sure look upon themselves as “twice blessed.”
Is there any scientific ground for that blessedness? None whatsoever!
If scientific progress in economics lies in our ability to tell ‘better and better stories’ one would, of course, expect economics journals to be filled with articles supporting the stories with empirical evidence confirming the predictions. However, the journals still show a striking and embarrassing paucity of empirical studies that (try to) substantiate these predictive claims. Equally amazing is how little one has to say about the relationship between the model and real-world target systems. It is as though explicit discussion, argumentation, and justification on the subject aren’t considered to be required.
In mathematics and logic, the deductive-axiomatic method has worked just fine. But science is not mathematics or logic. Conflating those two domains of knowledge has been one of the most fundamental mistakes made in modern economics. Applying the deductive-axiomatic method to real-world open systems immediately proves it to be excessively narrow and hopelessly irrelevant. Both the confirmatory and the explanatory varieties of hypothetico-deductive reasoning fail, since there is no way you can relevantly analyze confirmation or explanation as a purely logical relation between hypothesis and evidence or between law-like rules and explananda. In science, we argue and try to substantiate our beliefs and hypotheses with reliable evidence. Propositional and predicate deductive logic, on the other hand, is not about reliability, but about the validity of the conclusions given that the premises are true.
That logic should have been thus successful is an advantage which it owes entirely to its limitations, whereby it is justified in abstracting — indeed, it is under obligation to do so — from all objects of knowledge and their differences, leaving the understanding nothing to deal with save itself and its form. But for reason to enter on the sure path of science is, of course, much more difficult, since it has to deal not with itself alone but also with objects. Logic, therefore, as a propaedeutic, forms, as it were, only the vestibule of the sciences; and when we are concerned with specific modes of knowledge, while logic is indeed presupposed in any critical estimate of them, yet for the actual acquiring of them we have to look to the sciences properly so called, that is, to the objective sciences.
What enables and yet constrains research? What is both medium and outcome of research? What do researchers reproduce without even knowing it? What is supposed to unite researchers but may divide them? What empowers researchers to speak but is never fully articulated? What is played out in the routine of research but can never be routinised? What is the responsibility of all researchers but for which none has a mandate?
The answer to all of these riddles is METHODOLOGY.
If you’d like to learn more on the issue, have a look at James Surowiecki’s The Wisdom of Crowds (Anchor Books, 2005) or Scott Page’s The Diversity Bonus (Princeton University Press, 2017).