## The limited epistemic value of ‘variation analysis’

23 May, 2023 at 07:20 | Posted in Statistics & Econometrics | 8 Comments

While appeal to R squared is a common rhetorical device, it has a very tenuous connection to any plausible explanatory virtues, for many reasons. Either it is meant to be merely a measure of predictability in a given data set, or it is a measure of causal influence. In either case it does not tell us much about explanatory power. Taken as a measure of predictive power, it is limited in that it predicts variances only. But what we mostly want to predict is levels, about which it is silent. In fact, two models can have exactly the same R squared and yet describe regression lines with very different slopes, the natural predictive measure of levels. Furthermore, even in predicting variance, it is entirely dependent on the variance in the sample — if a covariate shows no variation, then it cannot predict anything. This leads to very different measures of explanatory power across samples, for reasons without any obvious connection to explanation.

Taken as a measure of causal explanatory power, R squared does not fare any better. The problem of explaining variances rather than levels shows up here as well — if it measures causal influence, it has to be influence on variances. But we often do not care about the causes of variance in economic variables, but rather about the causes of the levels of those variables, about which it is silent. Similarly, because the size of R squared varies with the variance in the sample, it can find a large effect in one sample and none in another for arbitrary, noncausal reasons. So while there may be some useful epistemic roles for R squared, measuring explanatory power is not one of them.

Harold Kincaid
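Kincaid's point that two models can share an R squared while implying very different level predictions is easy to verify numerically. Here is a minimal numpy sketch (the data and variable names are illustrative, not from the text): rescaling the outcome multiplies the regression slope, the natural predictive measure of levels, while leaving R squared untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y1 = 1.0 * x + rng.normal(size=1_000)  # regression slope about 1
y2 = 5.0 * y1                          # outcome rescaled: slope about 5

def slope_and_r2(x, y):
    """OLS slope and R-squared for a simple regression of y on x."""
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, r2

b1, r2_1 = slope_and_r2(x, y1)
b2, r2_2 = slope_and_r2(x, y2)
# r2_1 and r2_2 are identical, yet b2 is five times b1
```

Since correlation is invariant to rescaling while the slope is not, the two "explain" exactly the same share of variance while predicting very different levels.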

Although in a somewhat different context, Jon Elster makes basically the same observation as Kincaid:

Consider two elections, A and B. For each of them, identify the events that cause a given percentage of voters to turn out. Once we have thus explained the turnout in election A and the turnout in election B, the explanation of the difference (if any) follows automatically, as a by-product. As a bonus, we might be able to explain whether identical turnouts in A and B are accidental, that is, due to differences that exactly offset each other, or not. In practice, this procedure might be too demanding. The data or the available theories might not allow us to explain the phenomena “in and of themselves.” We should be aware, however, that if we do resort to explanation of variation, we are engaging in a second-best explanatory practice.

Modern econometrics is fundamentally based on assuming — usually without any explicit justification — that we can gain causal knowledge by considering independent variables that may have an impact on the variation of a dependent variable. As argued by both Kincaid and Elster, this is, however, far from self-evident. Often the fundamental causes are constant forces that are not amenable to the kind of analysis econometrics supplies us with. As Stanley Lieberson has it in Making It Count:

One can always say whether, in a given empirical context, a given variable or theory accounts for more variation than another. But it is almost certain that the variation observed is not universal over time and place. Hence the use of such a criterion first requires a conclusion about the variation over time and place in the dependent variable. If such an analysis is not forthcoming, the theoretical conclusion is undermined by the absence of information …

Moreover, it is questionable whether one can draw much of a conclusion about causal forces from simple analysis of the observed variation … To wit, it is vital that one have an understanding, or at least a working hypothesis, about what is causing the event per se; variation in the magnitude of the event will not provide the answer to that question.

Trygve Haavelmo was making a somewhat similar point back in 1941 when criticizing the treatment of the interest variable in Tinbergen’s regression analyses. The regression coefficient of the interest rate variable being zero was according to Haavelmo not sufficient for inferring that “variations in the rate of interest play only a minor role, or no role at all, in the changes in investment activity.” Interest rates may very well play a decisive indirect role by influencing other causally effective variables. And:

the rate of interest may not have varied much during the statistical testing period, and for this reason the rate of interest would not “explain” very much of the variation in net profit (and thereby the variation in investment) which has actually taken place during this period. But one cannot conclude that the rate of interest would be inefficient as an autonomous regulator, which is, after all, the important point.

This problem of ‘nonexcitation’ — when there is too little variation in a variable to say anything about its potential importance, and we cannot identify the reason why the factual influence of the variable is ‘negligible’ — strongly confirms that causality in economics and other social sciences can never be solely a question of statistical inference. Causality entails more than predictability, and really explaining social phenomena in depth requires theory.
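Haavelmo's 'nonexcitation' problem is easy to reproduce in a toy simulation (the numbers below are my own, chosen only for illustration): the causal coefficient linking x to y is identical in two samples, and only the sample variation in x differs, yet R squared swings from large to near zero.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0   # the same causal coefficient in both samples
n = 5_000

def r_squared(x_sd):
    """R-squared of y on x when only the sample variation in x changes."""
    x = rng.normal(scale=x_sd, size=n)
    y = beta * x + rng.normal(size=n)   # causal mechanism is identical
    return np.corrcoef(x, y)[0, 1] ** 2

r2_varied = r_squared(x_sd=1.0)    # x varies a lot: x "explains" much
r2_quiet  = r_squared(x_sd=0.01)   # x barely varies: x "explains" almost nothing
```

Nothing about the causal mechanism differs between the two runs; only the 'excitation' of x in the sample does, which is exactly why R squared cannot serve as a measure of causal importance.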

Analysis of variation — the foundation of all econometrics — can never in itself reveal how these variations are brought about. Only when we are able to tie actions, processes, or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation. Too much in love with axiomatic-deductive modelling, neoclassical economists especially tend to forget that accounting for causation — how causes bring about their effects — demands deep subject-matter knowledge and acquaintance with the intricate fabric of the contexts involved. As Keynes already argued in his A Treatise on Probability, statistics (and econometrics) should primarily be seen as a means to describe patterns of associations and correlations, which we may use as suggestions of possible causal relations. Forgetting that, economists will remain stuck with a second-best explanatory practice.

1. For several reasons I view proportional-variance measures like R-squared and correlation coefficients as fundamentally defective and often misleading for causal explanation, worthy of demotion in emphasis if not banishment from discussion of statistical results. Some of these defects are described above, but there are more. Furthermore, these defects extend to standardized regression coefficients which supposedly measure effects of levels.

One defect, described by Tukey as far back as 1954 and noted by Kincaid above, is that such measures confound scientifically irrelevant quirks of the study design and material with the effects of interest to the contextual theory or practical aims. As a result these measures can create spurious appearances of differences in some effects and mask real differences in other effects, even within the same study. We reviewed and illustrated these problems in
Greenland S, Schlesselman JJ, Criqui MH. The fallacy of employing standardized regression coefﬁcients and correlations as measures of effect. American Journal of Epidemiology 1986;123:203–208.
Greenland S, Maclure M, Schlesselman JJ, Poole C, Morgenstern H. Standardized regression coefﬁcients: a further critique and review of some alternatives. Epidemiology 1991;2:387–392.

Another defect is that variance is not a sufficient measure of variation outside of Gaussian (normal) linear models. Interchanging the very specific measure of variance with the very general concept of variation reflects a dangerous misidentification, one which developed from a time when it was computationally difficult to use anything other than such simple models. The unlimited error such misidentification can produce even with Gaussian variates is easily seen in the relation y = x^2 + u, with x observed, u unobserved, and x and u independent mean-zero Gaussian with unit variances: here the observable systematic variation in y is considerable and entirely explainable by variation in x, yet the correlation of y with x is zero.
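Greenland's Gaussian example can be checked directly; a short numpy sketch (sample size and seed are my choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)   # observed, mean-zero Gaussian, unit variance
u = rng.normal(size=n)   # unobserved, independent of x
y = x**2 + u             # all systematic variation in y comes from x

r = np.corrcoef(x, y)[0, 1]            # essentially zero: Cov(x, y) = E[x^3] = 0
explained = np.var(x**2) / np.var(y)   # yet x accounts for Var(x^2)/Var(y) = 2/3
```

The linear correlation is blind to the dependence because Cov(x, x^2 + u) = E[x^3] = 0 for a symmetric x, while the variance decomposition Var(y) = Var(x^2) + Var(u) = 2 + 1 attributes two-thirds of the variance of y to x.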

Finally, R-squared and correlation measures can suffer from severe yet overlooked range restrictions when applied to non-Gaussian variables and nonlinear transforms of multivariate-Gaussian variables, as illustrated in
Greenland S. A lower bound for the correlation of exponentiated bivariate normal pairs. The American Statistician 1996;50:163–164,
which cites previous examples of bizarre yet routinely unnoticed behavior of such measures.
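A flavor of this range restriction can be seen with a construction of my own (simpler than, and not taken from, the bounds derived in the 1996 note): exponentiating a perfectly negatively correlated normal pair pins the correlation far away from -1.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0
x = rng.normal(scale=sigma, size=1_000_000)

# (x, -x) have correlation exactly -1 ...
a, b = np.exp(x), np.exp(-x)

# ... but the correlation of the exponentiated pair is bounded away from -1:
r_sim = np.corrcoef(a, b)[0, 1]

# Closed form for this construction: corr(e^x, e^-x) = -exp(-sigma^2),
# which shrinks toward 0 as sigma grows, however strong the dependence.
r_exact = -np.exp(-sigma**2)
```

With sigma = 1 the attainable correlation is only about -0.37, and for larger sigma it is even closer to zero, despite the underlying variables being deterministic functions of one another.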

All in all, I regard R-squared and correlation measures as narrow mathematical quantities which, through an unfortunate historical overemphasis on Gaussian linear models, became misidentified with supremely important concepts of systematic relation, causal explanation, and contextual relevance. Thus I maintain that, in health, medical, and social sciences, discussions of contextual meanings of statistical results ought to be purged of these measures, just as surely as they ought to be purged of misleading terms like “statistical significance” and its allied misuse of “confidence” for what are mere measures of compatibility of data with models, as argued in these free-to-read articles:
Greenland S, Mansournia M, Joffe M. To curb research misreporting, replace significance and confidence by compatibility. Preventive Medicine 2022;164.
Rafi Z, Greenland S. Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology 2020;20:244.
Amrhein V, Greenland S. Discuss practical importance of results based on interval estimates and p-value functions, not only on point estimates and null p-values. Journal of Information Technology 2022;37:316–320.

• Sander, thank you for these, as always, learned and insightful comments and links! And I certainly do agree that the defects “extend to standardized regression coefficients” as well.

2. The value of R-squared can also be read in the negative. If R-squared is 0.25, it implies that at most 25% of the variation in one variable of a pair might be caused by the other variable; the causal share could be anything from 0% up to 25%. But the negative inference is often more important: 75% of the variation in one variable cannot be caused by the other.
– John Lounsbury

3. Funny, because this was in some ways Lucas’ point

• Please elaborate. In what way?

• “causes are constant forces that are not amenable to the kind of analysis econometrics supplies us with” … The Lucas critique was that, without the identification of an invariant, the R-squared was just statistical noise.

• Alright, I see the connection even though I find it rather vague.

• It’s not vague at all! The only ambiguity is that Lucas was not interested in statistical modelling per se! His insight was: in a world with data-generating processes that are noisy but relatively consistent (business cycles), what is the causal link? The rational-expectations individual …