An ongoing concern is that excessive focus on formal modeling and statistics can lead to neglect of practical issues and to overconfidence in formal results … Analysis interpretation depends on contextual judgments about how reality is to be mapped onto the model, and how the formal analysis results are to be mapped back into reality. But overconfidence in formal outputs is only to be expected when much labor has gone into deductive reasoning. First, there is a need to feel the labor was justified, and one way to do so is to believe the formal deduction produced important conclusions. Second, there seems to be a pervasive human aversion to uncertainty, and one way to reduce feelings of uncertainty is to invest faith in deduction as a sufficient guide to truth. Unfortunately, such faith is as logically unjustified as any religious creed, since a deduction produces certainty about the real world only when its assumptions about the real world are certain …
Unfortunately, assumption uncertainty reduces the status of deductions and statistical computations to exercises in hypothetical reasoning – they provide best-case scenarios of what we could infer from specific data (which are assumed to have only specific, known problems). Even more unfortunate, however, is that this exercise is deceptive to the extent it ignores or misrepresents available information, and makes hidden assumptions that are unsupported by data …
Despite assumption uncertainties, modelers often express only the uncertainties derived within their modeling assumptions, sometimes to disastrous consequences. Econometrics supplies dramatic cautionary examples in which complex modeling has failed miserably in important applications …
Yes, indeed, econometrics fails miserably over and over again. One reason it does is that the error terms in the regression models used are thought of as representing the effect of the variables that were omitted from the models. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model, necessary to include to ‘save’ the assumed deterministic relation between the other random variables included in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since the error terms are unobservable, this assumption is also impossible to test empirically. And without justification of the orthogonality assumption, there is as a rule nothing to ensure identifiability:
With enough math, an author can be confident that most readers will never figure out where a FWUTV (facts with unknown truth value) is buried. A discussant or referee cannot say that an identification assumption is not credible if they cannot figure out what it is and are too embarrassed to ask.
Distributional assumptions about error terms are a good place to bury things because hardly anyone pays attention to them. Moreover, if a critic does see that this is the identifying assumption, how can she win an argument about the true expected value the level of aether? If the author can make up an imaginary variable, “because I say so” seems like a pretty convincing answer to any question about its properties.
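The orthogonality point can be made concrete with a minimal Python sketch. Its set-up is entirely hypothetical and not taken from the text: a regressor x, an omitted variable z correlated with x, and a true effect of x on y equal to 1.0. OLS residuals are uncorrelated with x by construction. Inspecting them can therefore never reveal that the true error term, which contains z, is correlated with x. Meanwhile the estimated slope is pulled far from the true value.

```python
import random

def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def corr(u, v):
    """Pearson correlation of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

rng = random.Random(1)
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]                  # omitted variable (hypothetical)
x = [zi + rng.gauss(0, 1) for zi in z]                   # regressor correlated with z
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1)               # true effect of x on y is 1.0
     for xi, zi in zip(x, z)]

a, b = ols(x, y)
resid = [yi - a - b * xi for xi, yi in zip(x, y)]        # what the analyst can observe
true_error = [yi - 1.0 * xi for xi, yi in zip(x, y)]     # 2z + noise: never observable

print(f"estimated slope: {b:.3f} (true effect 1.0)")
print(f"corr(x, residuals):  {corr(x, resid):.2e}")      # zero by construction
print(f"corr(x, true error): {corr(x, true_error):.3f}") # clearly non-zero
```

In this set-up the slope settles near 2.0 rather than 1.0 as n grows. The correlation between the residuals and x stays at zero, up to rounding, whatever the truth is.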
It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis is true …
It is shocking that these errors seem so hard-wired into statisticians’ thinking, and this suggests that our profession really needs to look at how it teaches the interpretation of statistical inferences. The problem does not seem just to be technical misunderstandings; rather, statistical analysis is being asked to do something that it simply can’t do, to bring out a signal from any data, no matter how noisy. We suspect that, to make progress in pedagogy, statisticians will have to give up some of the claims we have implicitly been making about the effectiveness of our methods …
It would be nice if the statistics profession was offering a good solution to the significance testing problem and we just needed to convey it more clearly. But, no, … many statisticians misunderstand the core ideas too. It might be a good idea for other reasons to recommend that students take more statistics classes—but this won’t solve the problems if textbooks point in the wrong direction and instructors don’t understand what they are teaching. To put it another way, it’s not that we’re teaching the right thing poorly; unfortunately, we’ve been teaching the wrong thing all too well.
Teaching both statistics and economics, yours truly can’t but notice that the statements “give up some of the claims we have implicitly been making about the effectiveness of our methods” and “it’s not that we’re teaching the right thing poorly; unfortunately, we’ve been teaching the wrong thing all too well” obviously apply not only to statistics …
And the solution? Certainly not — as Gelman and Carlin also underline — to reform p-values. Instead we have to accept that we live in a world permeated by genuine uncertainty and that it takes a lot of variation to make good inductive inferences.
Sounds familiar? It definitely should!
The standard view in statistics – and the axiomatic probability theory underlying it – is to a large extent based on the rather simplistic idea that ‘more is better.’ But as Keynes argues in his seminal A Treatise on Probability (1921), ‘more of the same’ is not what is important when making inductive inferences. It’s rather a question of ‘more but different’ — i.e., variation.
Variation, not replication, is at the core of induction. Finding that p(x|y) = p(x|y & w) doesn’t make w ‘irrelevant.’ Knowing that the probability is unchanged when w is present gives p(x|y & w) another evidential weight (‘weight of argument’). Running 10 replicative experiments does not make you as ‘sure’ of your inductions as running 10 000 varied experiments, even if the probability values happen to be the same.
According to Keynes we live in a world permeated by unmeasurable uncertainty – not quantifiable stochastic risk – which often forces us to make decisions based on anything but ‘rational expectations.’ Keynes rather thinks that we base our expectations on the confidence or ‘weight’ we put on different events and alternatives. To Keynes, expectations are a question of weighing probabilities by ‘degrees of belief,’ beliefs that often have precious little to do with the kind of stochastic probabilistic calculations made by the rational agents modeled by “modern” social sciences. And often we ‘simply do not know.’
In many social sciences, p-values and null hypothesis significance testing (NHST) are often used to draw far-reaching scientific conclusions – despite the fact that they are as a rule poorly understood and that there exist alternatives that are easier to understand and more informative.
Not least, confidence intervals (CIs) and effect sizes are to be preferred to the Neyman-Pearson-Fisher mishmash approach so often practised by applied researchers.
Running a Monte Carlo simulation with 100 replications of a fictitious sample with N = 20, 95% confidence intervals, a normally distributed population with mean 10 and standard deviation 20, and two-tailed p-values on a zero null hypothesis, we get varying CIs (since they are based on varying sample standard deviations). But with a minimum of 3.2 and a maximum of 26.1, the CIs still give a clear picture of what would happen in an infinite limit sequence. The p-values, on the other hand, vary strongly from sample to sample, even though from a purely mathematical-statistical point of view they are more or less equivalent to CIs. Jumping around between a minimum of 0.007 and a maximum of 0.999, they don’t give you a clue about what will happen in an infinite limit sequence! So I can’t but agree with Geoff Cumming:
The problems are so severe we need to shift as much as possible from NHST … The first shift should be to estimation: report and interpret effect sizes and CIs … I suggest p should be given only a marginal role, its problem explained, and it should be interpreted primarily as an indicator of where the 95% CI falls in relation to a null hypothesised value.
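The Python sketch below is a rough illustration of Cumming’s suggestion. It reports an estimate, its 95% CI and the two-sided p-value side by side. It assumes a known population standard deviation, so it uses a z-interval rather than the t-interval of the Gretl example. That assumption is made purely so that the standard library’s `statistics.NormalDist` suffices. The sample means are invented. Because the interval and the test are built from the same standard error, p < 0.05 exactly when the 95% CI excludes the null value. The p-value therefore adds little beyond telling you where the interval falls.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()            # standard normal: mean 0, sd 1
Z_975 = STD_NORMAL.inv_cdf(0.975)    # two-sided 95% critical value, about 1.96

def estimate(sample_mean, sigma, n, null=0.0):
    """Return (CI lower, CI upper, two-sided p) for H0: mean == null, sigma assumed known."""
    se = sigma / n ** 0.5
    lower, upper = sample_mean - Z_975 * se, sample_mean + Z_975 * se
    z = (sample_mean - null) / se
    p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return lower, upper, p

results = {}
for m in (0.0, 3.0, 6.6, 10.0):      # hypothetical sample means; sigma = 15, n = 20
    lower, upper, p = estimate(m, sigma=15.0, n=20)
    results[m] = (lower, upper, p)
    excludes_null = lower > 0.0 or upper < 0.0
    print(f"mean {m:5.1f}  95% CI [{lower:6.2f}, {upper:6.2f}]  "
          f"p = {p:.3f}  CI excludes 0: {excludes_null}")
```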
In case you want to do your own Monte Carlo simulation, here’s an example I’ve made using Gretl:
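nulldata 20                             # empty dataset with N = 20 observations
loop 100 --progressive
    series y = normal(10,15)            # fictitious sample: mean 10, sd 15
    scalar ybar = mean(y)
    scalar df = $nobs - 1
    scalar ybarsd = sd(y)/sqrt($nobs)   # standard error of the sample mean
    scalar tstat = (ybar - 10)/ybarsd   # t statistic for H0: mean = 10
    scalar lowb = ybar - critical(t, df, 0.025)*ybarsd   # lower 95% CI bound
    scalar uppb = ybar + critical(t, df, 0.025)*ybarsd   # upper 95% CI bound
    scalar pval = 2*pvalue(t, df, abs(tstat))            # two-tailed p-value
    store E:\pvalcoeff.gdt lowb uppb pval                # save values from every replication
endloop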
Exploratory factor analysis exploits correlations to summarize data, and confirmatory factor analysis — stuff like testing that the right partial correlations vanish — is a prudent way of checking whether a model with latent variables could possibly be right. What the modern g-mongers do, however, is try to use exploratory factor analysis to uncover hidden causal structures. I am very, very interested in the latter pursuit, and if factor analysis was a solution I would embrace it gladly. But if factor analysis was a solution, when my students asked me (as they inevitably do) “so, how do we know how many factors we need?”, I would be able to do more than point them to rules of thumb based on squinting at “scree plots” like this and guessing where the slope begins. (There are ways of estimating the intrinsic dimension of noisily-sampled manifolds, but that’s not at all the same.) More broadly, factor analysis is part of a larger circle of ideas which all more or less boil down to some combination of least squares, linear regression and singular value decomposition, which are used in the overwhelming majority of work in quantitative social science, including, very much, work which tries to draw causal inferences without the benefit of experiments. A natural question — but one almost never asked by users of these tools — is whether they are reliable instruments of causal inference. The answer, unequivocally, is “no”.
I will push extra hard, once again, Clark Glymour’s paper on The Bell Curve, which patiently explains why these tools are just not up to the job of causal inference … The conclusions people reach with such methods may be right and may be wrong, but you basically can’t tell which from their reports, because their methods are unreliable.
This is why I said that using factor analysis to find causal structure is like telling time with a stopped clock. It is, occasionally, right. Maybe the clock stopped at 12, and looking at its face inspires you to look at the sun and see that it’s near its zenith, and look at shadows and see that they’re short, and confirm that it’s near noon. Maybe you’d not have thought to do those things otherwise; but the clock gives no evidence that it’s near noon, and becomes no more reliable when it’s too cloudy for you to look at the sun.
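The “how many factors?” decision is more mechanical than it looks. The Python sketch below computes the eigenvalues of a small correlation matrix, using power iteration with deflation and only the standard library. It then applies two common rules of thumb that the text does not name. The first is the Kaiser criterion, which keeps factors with an eigenvalue above 1. The second is a cumulative-variance cutoff, set here arbitrarily at 70%. The matrix is a made-up example in which every correlation is 0.5, so its eigenvalues are known to be 2.5, 0.5, 0.5 and 0.5. The two rules disagree, and neither says anything about causal structure.

```python
import random

def matvec(a, v):
    """Matrix-vector product for lists of lists."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def eigenvalues_sym(a, iters=500, seed=0):
    """Eigenvalues of a symmetric PSD matrix, largest first, via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(a)
    a = [row[:] for row in a]                      # work on a copy
    values = []
    for _ in range(n):
        v = [rng.gauss(0, 1) for _ in range(n)]    # random start avoids degenerate directions
        lam = 0.0
        for _ in range(iters):
            w = matvec(a, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:                       # what is left of the matrix is numerically zero
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = sum(vi * wi for vi, wi in zip(v, matvec(a, v)))   # Rayleigh quotient
        values.append(lam)
        # deflation: subtract lam * v v^T so the next pass finds the next eigenvalue
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return values

p = 4
corr = [[1.0 if i == j else 0.5 for j in range(p)] for i in range(p)]   # hypothetical matrix
eig = eigenvalues_sym(corr)

kaiser = sum(1 for e in eig if e > 1.0)            # Kaiser rule: eigenvalue > 1
cumulative, by_variance = 0.0, 0
for e in eig:
    cumulative += e / p                            # trace of a correlation matrix = p
    by_variance += 1
    if cumulative >= 0.70:                         # arbitrary 70% cutoff
        break

print("eigenvalues:", [round(e, 4) for e in eig])
print("Kaiser rule keeps", kaiser, "factor(s); 70% variance rule keeps", by_variance)
```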
The one place that preregistration is really needed … is if you want clean p-values. A p-value is very explicitly a statement about how you would’ve analyzed the data, had they come out differently. Sometimes when I’ve criticized published p-values on the grounds of forking paths, the original authors have fought back angrily, saying how unfair it is for me to first make an assumption about what they would’ve done under different conditions, and then make conclusions based on these assumptions. But they’re getting things backward: By stating a p-value at all, they’re the ones who are making a very strong assumption about their hypothetical behavior—an assumption that, in general, I have no reason to believe.
Preregistration is in fact the only way to ensure that p-values can be taken at their nominal values. In that way, preregistration is like random sampling which, strictly speaking, is the only way that sampling probabilities, estimates, standard errors, etc., can be taken at their nominal values …
Yes, you can do surveys and get estimates and standard errors without ever taking a random sample … but to do this we need to make assumptions.
And, yes, you can do causal inference from observational studies—indeed, in many settings this is absolutely necessary—but, again, assumptions are needed …
Just as a serious social science journal—or even Psychological Science or PPNAS—would never accept a paper on sampling without some discussion of the representativeness of the sample, and just as they would never accept a causal inference based on a simple regression with no identification strategy and no discussion of imbalance between treatment and control groups, so should they not take seriously a p-value without a careful assessment of the assumptions underlying it.
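A small Python simulation can make the forking-paths point concrete. It rests on deliberately simplified, hypothetical assumptions that are not from the text. Each study has five independent outcome measures, all pure noise, and each is analysed with a z-test with known standard deviation. If the analysis is fixed in advance, about 5% of studies reach p < 0.05, as advertised. If the best-looking of the five outcomes is reported instead, the rate rises to roughly 1 − 0.95⁵, about 23%. Real forking paths need not involve any such conscious fishing, but the nominal p-value is then off in the same direction.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_sided_p(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming a known sd of 1."""
    z = (sum(sample) / len(sample)) * len(sample) ** 0.5
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))

def false_positive_rates(n_sims=4000, n=20, k=5, alpha=0.05, seed=42):
    """Share of null studies declared 'significant' with a fixed vs a data-dependent analysis."""
    rng = random.Random(seed)
    fixed_hits = forked_hits = 0
    for _ in range(n_sims):
        # k hypothetical outcome measures per study; the null is true for every one of them
        pvals = [two_sided_p([rng.gauss(0, 1) for _ in range(n)]) for _ in range(k)]
        fixed_hits += pvals[0] < alpha          # analysis chosen before seeing the data
        forked_hits += min(pvals) < alpha       # best-looking analysis chosen afterwards
    return fixed_hits / n_sims, forked_hits / n_sims

fixed, forked = false_positive_rates()
print(f"pre-specified analysis: {fixed:.3f}")
print(f"pick-the-best analysis: {forked:.3f}")
```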
Causal modeling attempts to maintain this deductive focus within imperfect research by deriving models for observed associations from more elaborate causal (‘structural’) models with randomized inputs … But in the world of risk assessment … the causal-inference process cannot rely solely on deductions from models or other purely algorithmic approaches. Instead, when randomization is doubtful or simply false (as in typical applications), an honest analysis must consider sources of variation from uncontrolled causes with unknown, nonrandom interdependencies. Causal identification then requires nonstatistical information in addition to information encoded as data or their probability distributions …
This need raises questions of to what extent can inference be codified or automated (which is to say, formalized) in ways that do more good than harm. In this setting, formal models – whether labeled ‘‘causal’’ or ‘‘statistical’’ – serve a crucial but limited role in providing hypothetical scenarios that establish what would be the case if the assumptions made were true and the input data were both trustworthy and the only data available. Those input assumptions include all the model features and prior distributions used in the scenario, and supposedly encode all information being used beyond the raw data file (including information about the embedding context as well as the study design and execution).
Overconfident inferences follow when the hypothetical nature of these inputs is forgotten and the resulting outputs are touted as unconditionally sound scientific inferences instead of the tentative suggestions that they are (however well informed) …
The practical limits of formal models become especially apparent when attempting to integrate diverse information sources. Neither statistics nor medical science begins to capture the uncertainty attendant in this process, and in fact both encourage pernicious overconfidence by failing to make adequate allowance for unmodeled uncertainty sources. Instead of emphasizing the uncertainties attending field research, statistics and other quantitative methodologies tend to focus on mathematics and often fall prey to the satisfying – and false – sense of logical certainty that brings to population inferences. Meanwhile, medicine focuses on biochemistry and physiology, and the satisfying – and false – sense of mechanistic certainty about results those bring to individual events.
Wise words from a renowned epidemiologist.
As long as economists and statisticians cannot identify their statistical theories with real-world phenomena there is no real warrant for taking their statistical inferences seriously.
Just as there is no such thing as a ‘free lunch,’ there is no such thing as a ‘free probability.’ To be able to talk about probabilities at all, you have to specify a model. In statistics, any process in which you observe or measure something is referred to as an experiment (rolling a die), and the results obtained are referred to as the outcomes or events of the experiment (the number of points rolled, e.g. 3 or 5). If there is no chance set-up or model that generates such probabilistic outcomes or events, there is, strictly speaking, no event at all.
Probability is a relational element. It must always come with a specification of the model from which it is calculated. And then, to be of any empirical scientific value, it has to be shown to coincide with (or at least converge to) real data-generating processes or structures, something seldom or never done!
And this is the basic problem with economic data. If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous ‘nomological machines’ for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!
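The relational character of probability is easy to see in code. In the Python sketch below, a probability is only defined once a chance set-up is written down explicitly as a mapping from outcomes to relative weights. The die event ‘3 or 5’ comes from the text. The loaded die is a hypothetical model. The roulette wheels use the standard single-zero (37 pockets) and double-zero (38 pockets) layouts. The same event gets a different probability under each model, and nothing in the code says which model, if any, describes a real die or wheel.

```python
from fractions import Fraction

def probability(event, model):
    """P(event) under an explicit chance set-up; model maps outcome -> relative weight."""
    total = sum(model.values())
    return Fraction(sum(w for outcome, w in model.items() if event(outcome)), total)

def three_or_five(face):
    return face in (3, 5)

def is_red(pocket):
    return pocket == "red"

fair_die = {face: 1 for face in range(1, 7)}
loaded_die = {face: (2 if face == 6 else 1) for face in range(1, 7)}   # hypothetical loading

european_wheel = {"green": 1, "red": 18, "black": 18}   # 37 pockets (single zero)
american_wheel = {"green": 2, "red": 18, "black": 18}   # 38 pockets (0 and 00)

print(probability(three_or_five, fair_die))     # 1/3
print(probability(three_or_five, loaded_die))   # 2/7
print(probability(is_red, european_wheel))      # 18/37
print(probability(is_red, american_wheel))      # 9/19
```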
The tool of statistical inference becomes available as the result of a self-imposed limitation of the universe of discourse. It is assumed that the available observations have been generated by a probability law or stochastic process about which some incomplete knowledge is available a priori …
It should be kept in mind that the sharpness and power of these remarkable tools of inductive reasoning are bought by willingness to adopt a specification of the universe in a form suitable for mathematical analysis.
Yes indeed: to use statistics and econometrics to make inferences, you have to make lots of (mathematical) tractability assumptions. And especially since econometrics aspires to explain things in terms of causes and effects, it needs loads of assumptions, such as invariance, additivity and linearity.
Limiting model assumptions in economic science always have to be closely examined. If we are to show that the mechanisms or causes we isolate and handle in our models are stable, in the sense that they do not change when we ‘export’ them to our ‘target systems,’ we have to be able to show that they hold not only under ceteris paribus conditions. If not, they are of limited value to our explanations and predictions of real economic systems.
Unfortunately, real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established are laws and relations about entities in models that presuppose causal mechanisms being invariant, atomistic and additive. But when causal mechanisms operate in the real world, they mostly do so in ever-changing and unstable ways. If economic regularities obtain, they do so as a rule only because we engineered them for that purpose. Outside man-made ‘nomological machines’ they are rare, or even non-existent.
So — if we want to explain and understand real-world economies we should perhaps be a little bit more cautious with using universe specifications “suitable for mathematical analysis.”
It should be kept in mind, when we evaluate the application of statistics and econometrics, that the sharpness and power of these remarkable tools of inductive reasoning are bought by willingness to adopt a specification of the universe in a form suitable for mathematical analysis.
As Greenland emphasises, causality in the social sciences (and economics) can never be solely a question of statistical inference. Causality entails more than predictability, and really explaining social phenomena in depth requires theory. Analysis of variation, the foundation of all econometrics, can never in itself reveal how these variations are brought about. Only when we are able to tie actions, processes or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation.
Most facts have many different possible alternative explanations, but we want to find the best of all contrastive explanations (since all real explanation takes place relative to a set of alternatives). So which is the best explanation? Many scientists, influenced by statistical reasoning, think that the likeliest explanation is the best explanation. But the likelihood of x is not in itself a strong argument for thinking it explains y. I would rather argue that what makes one explanation better than another are things like aiming for and finding powerful, deep, causal features and mechanisms that we have warranted and justified reasons to believe in. Statistical reasoning, especially the variety based on a Bayesian epistemology, generally has no room for these kinds of explanatory considerations. The only thing that matters is the probabilistic relation between evidence and hypothesis.
Some statisticians and data scientists think that algorithmic formalisms somehow give them access to causality. That is, however, simply not true. Assuming ‘convenient’ things like faithfulness or stability is not to give proofs. It is to assume what has to be proven. Deductive-axiomatic methods used in statistics do not produce evidence for causal inferences. The real causality we are searching for is the one existing in the real world around us. If there is no warranted connection between axiomatically derived theorems and the real world, well, then we haven’t really obtained the causation we are looking for.
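Faithfulness is exactly the kind of assumption that can fail. The Python sketch below uses a hypothetical linear model in which X affects Y directly (coefficient +1) and also through a mediator M (X to M with +1, M to Y with −1). The two paths cancel, so X and Y come out uncorrelated even though X is a cause of Y by construction. A discovery procedure that assumes faithfulness would read the zero correlation as the absence of a causal link.

```python
import random

def corr(u, v):
    """Pearson correlation of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

rng = random.Random(7)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
m = [xi + rng.gauss(0, 1) for xi in x]                    # X -> M, coefficient +1
y = [xi - mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]   # X -> Y (+1) and M -> Y (-1)

print(f"corr(X, M) = {corr(x, m):.3f}")
print(f"corr(M, Y) = {corr(m, y):.3f}")
print(f"corr(X, Y) = {corr(x, y):.3f}   (X is a direct cause of Y by construction)")
```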
From a more theoretical perspective, Simpson’s paradox importantly shows that causality can never be reduced to a question of statistics or probabilities, unless you are — miraculously — able to keep constant all other factors that influence the probability of the outcome studied.
To understand causality we always have to relate it to a specific causal structure. Statistical correlations are never enough. No structure, no causality.
Simpson’s paradox is an interesting paradox in itself, but it can also highlight a deficiency in the traditional econometric approach towards causality. Say you have 1000 observations on men and an equal number of observations on women applying for admission to university studies, and that 70% of men are admitted, but only 30% of women. Running a logistic regression to find out the odds ratios (and probabilities) for men and women on admission, females seem to be in a less favourable position (‘discriminated’ against) compared to males (male odds are 2.33, female odds are 0.43, giving an odds ratio of 5.44). But once we find out that males and females apply to different departments, we may well get a Simpson’s paradox result where males turn out to be ‘discriminated’ against. Say 800 men apply for economics studies (680 admitted) and 200 for physics studies (20 admitted), while 100 women apply for economics studies (90 admitted) and 900 for physics studies (210 admitted). The male-to-female odds ratios within the departments are then 0.63 (economics) and 0.37 (physics).
Econometric patterns should never be seen as anything other than possible clues to follow. From a critical realist perspective it is obvious that behind observable data there are real structures and mechanisms operating. If we really want to understand, explain and (possibly) predict things in the real world, getting hold of these structures and mechanisms is more important than simply correlating and regressing observable variables.
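The odds ratios can be recomputed from the admission counts in the example with a few lines of Python, using exact fractions. The pooled ratio favours men, while the ratio within each department falls below one.

```python
from fractions import Fraction

# (applicants, admitted) by sex and department, from the example in the text
counts = {
    "men":   {"economics": (800, 680), "physics": (200, 20)},
    "women": {"economics": (100, 90),  "physics": (900, 210)},
}

def odds(applicants, admitted):
    """Odds of admission, admitted / rejected, as an exact fraction."""
    return Fraction(admitted, applicants - admitted)

def pooled(group):
    """Total applicants and admissions for a group, summed over departments."""
    return (sum(a for a, _ in counts[group].values()),
            sum(b for _, b in counts[group].values()))

overall = odds(*pooled("men")) / odds(*pooled("women"))
by_department = {
    dept: odds(*counts["men"][dept]) / odds(*counts["women"][dept])
    for dept in ("economics", "physics")
}

print(f"pooled men/women odds ratio: {float(overall):.2f}")   # 5.44
for dept, ratio in by_department.items():
    print(f"{dept}: men/women odds ratio {float(ratio):.2f}")
# economics: 0.63, physics: 0.37
```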
Math cannot establish the truth value of a fact. Never has. Never will.
Mainstream economists often hold the view that Keynes’ criticism of econometrics was the result of a sadly misinformed and misguided person who disliked and did not understand much of it.
This is, however, nothing but a gross misapprehension.
To be careful and cautious is not the same as to dislike. Keynes did not misunderstand the crucial issues at stake in the development of econometrics. Quite the contrary. He knew them all too well — and was not satisfied with the validity and philosophical underpinning of the assumptions made for applying its methods.
Keynes’ critique is still valid and unanswered in the sense that the problems he pointed at are still with us today and ‘unsolved.’ Ignoring them — the most common practice among applied econometricians — is not to solve them.
To apply statistical and mathematical methods to the real-world economy, the econometrician has to make some quite strong assumptions. In a review of Tinbergen’s econometric work — published in The Economic Journal in 1939 — Keynes gave a comprehensive critique of Tinbergen’s work, focusing on the limiting and unreal character of the assumptions that econometric analyses build on:
Completeness: Where Tinbergen attempts to specify and quantify which different factors influence the business cycle, Keynes maintains there has to be a complete list of all the relevant factors to avoid misspecification and spurious causal claims. Usually this problem is ‘solved’ by econometricians assuming that they somehow have a ‘correct’ model specification. Keynes is, to put it mildly, unconvinced:
It will be remembered that the seventy translators of the Septuagint were shut up in seventy separate rooms with the Hebrew text and brought out with them, when they emerged, seventy identical translations. Would the same miracle be vouchsafed if seventy multiple correlators were shut up with the same statistical material? And anyhow, I suppose, if each had a different economist perched on his a priori, that would make a difference to the outcome.
Homogeneity: To make inductive inferences possible, and to be able to apply econometrics, the system we try to analyse has to have a large degree of ‘homogeneity.’ According to Keynes most social and economic systems (especially from the perspective of real historical time) lack that ‘homogeneity.’ As he had argued already in Treatise on Probability (ch. 22), it wasn’t always possible to take repeated samples from a fixed population when we were analysing real-world economies. In many cases there simply are no reasons at all to assume the samples to be homogeneous. Lack of ‘homogeneity’ makes the principle of ‘limited independent variety’ non-applicable, and hence makes inductive inferences, strictly seen, impossible, since one of their fundamental logical premises is not satisfied. Without “much repetition and uniformity in our experience” there is no justification for placing “great confidence” in our inductions (TP ch. 8).
And then, of course, there is also the ‘reverse’ variability problem of non-excitation: factors that do not change significantly during the period analysed can still very well be extremely important causal factors.
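A minimal Python sketch of non-excitation assumes a hypothetical, noise-free process y = 1 + 2x + 5z. Here z is a genuine cause that happens to stay constant (z = 3) over the sample. Regression on x recovers the slope of 2 perfectly, but z’s contribution disappears into the intercept. Since z has zero variance in the sample, no method can estimate its effect from these data, and the fitted equation mispredicts as soon as z moves.

```python
def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [float(i) for i in range(10)]
z = [3.0] * len(x)                                   # genuine cause that never varies in-sample
y = [1.0 + 2.0 * xi + 5.0 * zi for xi, zi in zip(x, z)]

intercept, slope = ols(x, y)
mean_z = sum(z) / len(z)
var_z = sum((zi - mean_z) ** 2 for zi in z) / len(z)
print(f"fitted: y = {intercept:.1f} + {slope:.1f} x   (variance of z in sample: {var_z})")

# Out of sample, z shifts from 3 to 4 (hypothetical)
x_new, z_new = 5.0, 4.0
predicted = intercept + slope * x_new
actual = 1.0 + 2.0 * x_new + 5.0 * z_new
print(f"predicted {predicted:.1f}, actual {actual:.1f}")
```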
Stability: Tinbergen assumes there is a stable spatio-temporal relationship between the variables his econometric models analyze. But as Keynes had argued already in his Treatise on Probability it was not really possible to make inductive generalisations based on correlations in one sample. As later studies of ‘regime shifts’ and ‘structural breaks’ have shown us, it is exceedingly difficult to find and establish the existence of stable econometric parameters for anything but rather short time series.
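The difficulty is easy to reproduce with a hypothetical series whose slope shifts halfway through the sample, from 0.5 before the break to 2.0 after it. In the Python sketch below, a regression on the pooled sample returns 1.25, a ‘stable parameter’ that describes neither regime. Only splitting the sample at the break date, which is known here by construction, recovers the two actual slopes. In real data the break date is rarely known in advance.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

T, BREAK = 100, 50                                   # hypothetical sample length and break date
x = [float(t % 10) for t in range(T)]
y = [(0.5 if t < BREAK else 2.0) * x[t] for t in range(T)]

pooled = ols_slope(x, y)
before = ols_slope(x[:BREAK], y[:BREAK])
after = ols_slope(x[BREAK:], y[BREAK:])
print(f"pooled slope {pooled:.2f}; before break {before:.2f}; after break {after:.2f}")
```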
Measurability: Tinbergen’s model assumes that all relevant factors are measurable. Keynes questions if it is possible to adequately quantify and measure things like expectations and political and psychological factors. And more than anything, he questioned — both on epistemological and ontological grounds — that it was always and everywhere possible to measure real-world uncertainty with the help of probabilistic risk measures. Thinking otherwise can, as Keynes wrote, “only lead to error and delusion.”
Independence: Tinbergen assumes that the variables he treats are independent (still a standard assumption in econometrics). Keynes argues that in such a complex, organic and evolutionary system as an economy, independence is a deeply unrealistic assumption to make. Building econometric models on that kind of simplistic and unrealistic assumptions risks producing nothing but spurious correlations and causalities. Real-world economies are organic systems for which the statistical methods used in econometrics are ill-suited, or even, strictly seen, inapplicable. Mechanical probabilistic models have little leverage when applied to non-atomic evolving organic systems such as economies.
It is a great fault of symbolic pseudo-mathematical methods of formalising a system of economic analysis … that they expressly assume strict independence between the factors involved and lose all their cogency and authority if this hypothesis is disallowed; whereas, in ordinary discourse, where we are not blindly manipulating but know all the time what we are doing and what the words mean, we can keep “at the back of our heads” the necessary reserves and qualifications and the adjustments which we shall have to make later on, in a way in which we cannot keep complicated partial differentials “at the back” of several pages of algebra which assume that they all vanish.
Building econometric models can’t be a goal in itself. Good econometric models are means that make it possible for us to infer things about the real-world systems they ‘represent.’ If we can’t show that the mechanisms or causes that we isolate and handle in our econometric models are ‘exportable’ to the real-world, they are of limited value to our understanding, explanations or predictions of real-world economic systems.
The kind of fundamental assumption about the character of material laws, on which scientists appear commonly to act, seems to me to be much less simple than the bare principle of uniformity. They appear to assume something much more like what mathematicians call the principle of the superposition of small effects, or, as I prefer to call it, in this connection, the atomic character of natural law. The system of the material universe must consist, if this kind of assumption is warranted, of bodies which we may term (without any implication as to their size being conveyed thereby) legal atoms, such that each of them exercises its own separate, independent, and invariable effect, a change of the total state being compounded of a number of separate changes each of which is solely due to a separate portion of the preceding state …
The scientist wishes, in fact, to assume that the occurrence of a phenomenon which has appeared as part of a more complex phenomenon, may be some reason for expecting it to be associated on another occasion with part of the same complex. Yet if different wholes were subject to laws qua wholes and not simply on account of and in proportion to the differences of their parts, knowledge of a part could not lead, it would seem, even to presumptive or probable knowledge as to its association with other parts.
Linearity: To make his models tractable, Tinbergen assumes the relationships between the variables he studies to be linear. This is still standard procedure today, but as Keynes writes:
It is a very drastic and usually improbable postulate to suppose that all economic forces are of this character, producing independent changes in the phenomenon under investigation which are directly proportional to the changes in themselves; indeed, it is ridiculous.
To Keynes it was a ‘fallacy of reification’ to assume that all quantities are additive (an assumption closely linked to independence and linearity).
The unpopularity of the principle of organic unities shows very clearly how great is the danger of the assumption of unproved additive formulas. The fallacy, of which ignorance of organic unity is a particular instance, may perhaps be mathematically represented thus: suppose f(x) is the goodness of x and f(y) is the goodness of y. It is then assumed that the goodness of x and y together is f(x) + f(y) when it is clearly f(x + y) and only in special cases will it be true that f(x + y) = f(x) + f(y). It is plain that it is never legitimate to assume this property in the case of any given function without proof.
J. M. Keynes “Ethics in Relation to Conduct” (1903)
And as even Trygve Haavelmo, one of the founding fathers of modern econometrics, wrote:
What is the use of testing, say, the significance of regression coefficients, when maybe, the whole assumption of the linear regression equation is wrong?
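Haavelmo's worry is easy to reproduce. The Python sketch below makes a few assumptions chosen purely for illustration: the data-generating process, the seed and the sample size. The true relation is quadratic (y = x² plus noise), but we fit a straight line anyway. The slope comes out with an enormous t-statistic. The residuals, on the other hand, show that the linear form itself is wrong, and no significance test of the coefficient will tell you that.

```python
import random

def ols(xs, ys):
    """Simple OLS of y on x: returns (intercept, slope, slope t-statistic, residuals)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)      # residual variance estimate
    se_slope = (s2 / sxx) ** 0.5                  # standard error of the slope
    return intercept, slope, slope / se_slope, resid

rng = random.Random(42)                           # fixed seed so the run is reproducible
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [x ** 2 + rng.gauss(0, 1) for x in xs]       # true relation is quadratic, not linear

intercept, slope, t, resid = ols(xs, ys)
print(f"slope = {slope:.2f}, t = {t:.1f}")        # a hugely 'significant' slope ...

def mean_resid(lo, hi):
    """Average residual for observations with lo <= x < hi."""
    sel = [r for x, r in zip(xs, resid) if lo <= x < hi]
    return sum(sel) / len(sel)

# ... yet the residuals are systematically positive at the ends and negative
# in the middle: the assumed linear form is itself wrong.
print(f"mean residual, x in [0, 2):  {mean_resid(0, 2):.1f}")
print(f"mean residual, x in [4, 6):  {mean_resid(4, 6):.1f}")
print(f"mean residual, x in [8, 10): {mean_resid(8, 10):.1f}")
```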
Real-world social systems are usually not governed by stable causal mechanisms or capacities. The kinds of ‘laws’ and relations that econometrics has established are laws and relations about entities in models that presuppose causal mechanisms and variables (and the relationships between them) to be linear, additive, homogeneous, stable, invariant and atomistic. But when causal mechanisms operate in the real world, they only do so in ever-changing and unstable combinations where the whole is more than a mechanical sum of parts. Since statisticians and econometricians, as far as I can see, haven’t been able to convincingly warrant their assumptions of homogeneity, stability, invariance, independence and additivity as being ontologically isomorphic to real-world economic systems, Keynes’ critique is still valid. As long as (as Keynes writes in a letter to Frisch in 1935) “nothing emerges at the end which has not been introduced expressly or tacitly at the beginning,” I remain doubtful of the scientific aspirations of econometrics.
In his critique of Tinbergen, Keynes points us to the fundamental logical, epistemological and ontological problems of applying statistical methods to a basically unpredictable, uncertain, complex, unstable, interdependent, and ever-changing social reality. Methods designed to analyse repeated sampling in controlled experiments under fixed conditions are not easily extended to an organic and non-atomistic world where time and history play decisive roles.
Econometric modeling should never be a substitute for thinking. From that perspective it is really depressing to see how much of Keynes’ critique of the pioneering econometrics in the 1930s-1940s is still relevant today.
The general line you take is interesting and useful. It is, of course, not exactly comparable with mine. I was raising the logical difficulties. You say in effect that, if one was to take these seriously, one would give up the ghost in the first lap, but that the method, used judiciously as an aid to more theoretical enquiries and as a means of suggesting possibilities and probabilities rather than anything else, taken with enough grains of salt and applied with superlative common sense, won’t do much harm. I should quite agree with that. That is how the method ought to be used.
Keynes, letter to E.J. Broster, December 19, 1939
Which independent variables should be included in the equation? The goal is a “good fit” … How can a good fit be recognized? A popular measure for the satisfactoriness of a regression is the coefficient of determination, R². If this number is large, it is said, the regression gives a good fit …
Nothing about R² supports these claims. This statistic is best regarded as characterizing the geometric shape of the regression points and not much more.
The central difficulty with R2 for social scientists is that the independent variables are not subject to experimental manipulation. In some samples, they vary widely, producing large variance; in other cases, the observations are more tightly grouped and there is little dispersion. The variances are a function of the sample, not of the underlying relationship. Hence they cannot have any real connection to the “strength” of the relationship as social scientists ordinarily use the term, i. e., as a measure of how much effect a given change in independent variable has on the dependent variable …
Thus “maximizing R²” cannot be a reasonable procedure for arriving at a strong relationship. It neither measures causal power nor is comparable across samples … “Explaining variance” is not what social science is about.
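The dependence of R² on the spread of the sample is just as easy to demonstrate. Purely for illustration, the sketch assumes one and the same relationship, y = 2x plus standard normal noise, in two samples. In the first sample x is spread over [0, 10]; in the second it is bunched in [4, 6]. The estimated effect of x on y is about the same in both, but R² differs a lot. R² tracks the variance in the sample, not the strength of the relationship.

```python
import random

def slope_and_r2(xs, ys):
    """OLS slope of y on x and the coefficient of determination R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)   # in simple regression, R^2 is the squared correlation
    return slope, r2

rng = random.Random(1)            # fixed seed so the run is reproducible

def sample(lo, hi, n=500):
    """Draw x uniformly from [lo, hi]; y = 2x + N(0, 1) noise. Same 'law' in every sample."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

wide = slope_and_r2(*sample(0, 10))    # x spread widely
narrow = slope_and_r2(*sample(4, 6))   # x tightly grouped

print(f"wide sample:   slope = {wide[0]:.2f}, R^2 = {wide[1]:.2f}")
print(f"narrow sample: slope = {narrow[0]:.2f}, R^2 = {narrow[1]:.2f}")
```

If R² is used as a measure of how strong the relationship is, the same ‘law’ looks strong in one sample and weak in the other.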
Almost everything we do these days leaves some kind of data trace in some computer system somewhere. When such data is aggregated into huge databases it is called “Big Data”. It is claimed social science will be transformed by the application of computer processing and Big Data. The argument is that social science has, historically, been “theory rich” and “data poor” and now we will be able to apply the methods of “real science” to “social science” producing new validated and predictive theories which we can use to improve the world.
What’s wrong with this? … Firstly, what is this “data” we are talking about? In its broadest sense it is some representation usually in a symbolic form that is machine readable and processable. And how will this data be processed? Using some form of machine learning or statistical analysis. But what will we find? Regularities or patterns … What do such patterns mean? Well, that will depend on who is interpreting them …
Looking for “patterns or regularities” presupposes a definition of what a pattern is and that presupposes a hypothesis or model, i.e. a theory. Hence big data does not “get us away from theory” but rather requires theory before any project can commence.
What is the problem here? The problem is that a certain kind of approach is being propagated within the “big data” movement that claims to not be a priori committed to any theory or view of the world. The idea is that data is real and theory is not real. That theory should be induced from the data in a “scientific” way.
I think this is wrong and dangerous. Why? Because it is not clear or honest while appearing to be so. Any statistical test or machine learning algorithm expresses a view of what a pattern or regularity is and any data has been collected for a reason based on what is considered appropriate to measure. One algorithm will find one kind of pattern and another will find something else. One data set will evidence some patterns and not others. Selecting an appropriate test depends on what you are looking for. So the question posed by the thought experiment remains “what are you looking for, what is your question, what is your hypothesis?”
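What counts as a ‘pattern’ depends on the detector. Here is a minimal Python sketch with made-up data where y is completely determined by x (y = x² for the integers x = -5 to 5). Pearson correlation only looks for linear association, and here it finds nothing. The same statistic applied to |x| instead finds a very strong relation. The data are the same; the question differs, and so does the ‘finding’.

```python
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation: a detector for *linear* association only."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

xs = list(range(-5, 6))          # -5, -4, ..., 5
ys = [x * x for x in xs]         # y is completely determined by x

# Looking for a linear pattern in x finds nothing ...
print(round(pearson(xs, ys), 3))
# ... while looking for a linear pattern in |x| finds a very strong one.
print(round(pearson([abs(x) for x in xs], ys), 3))
```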
Ideas matter. Theory matters. Big data is not a theory-neutral way of circumventing the hard questions. In fact it brings these questions into sharp focus and it’s time we discuss them openly.