Factor analysis — like telling time with a stopped clock

14 March, 2017 at 17:33 | Posted in Statistics & Econometrics | 1 Comment

Exploratory factor analysis exploits correlations to summarize data, and confirmatory factor analysis (stuff like testing that the right partial correlations vanish) is a prudent way of checking whether a model with latent variables could possibly be right. What the modern g-mongers do, however, is try to use exploratory factor analysis to uncover hidden causal structures. I am very, very interested in the latter pursuit, and if factor analysis was a solution I would embrace it gladly. But if factor analysis was a solution, when my students asked me (as they inevitably do) “so, how do we know how many factors we need?”, I would be able to do more than point them to rules of thumb based on squinting at “scree plots” like this and guessing where the slope begins. (There are ways of estimating the intrinsic dimension of noisily-sampled manifolds, but that’s not at all the same.) More broadly, factor analysis is part of a larger circle of ideas which all more or less boil down to some combination of least squares, linear regression and singular value decomposition, which are used in the overwhelming majority of work in quantitative social science, including, very much, work which tries to draw causal inferences without the benefit of experiments. A natural question, but one almost never asked by users of these tools, is whether they are reliable instruments of causal inference. The answer, unequivocally, is “no”.

I will push extra hard, once again, Clark Glymour’s paper on The Bell Curve, which patiently explains why these tools are just not up to the job of causal inference … The conclusions people reach with such methods may be right and may be wrong, but you basically can’t tell which from their reports, because their methods are unreliable.

This is why I said that using factor analysis to find causal structure is like telling time with a stopped clock. It is, occasionally, right. Maybe the clock stopped at 12, and looking at its face inspires you to look at the sun and see that it’s near its zenith, and look at shadows and see that they’re short, and confirm that it’s near noon. Maybe you’d not have thought to do those things otherwise; but the clock gives no evidence that it’s near noon, and becomes no more reliable when it’s too cloudy for you to look at the sun.

Cosma Shalizi
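
A minimal sketch of the scree-plot situation described in the quoted passage, assuming simulated data with two latent factors. The function names, loadings, noise level and sample size are illustrative choices and do not come from the quoted text; the sketch only computes the eigenvalues that a scree plot would display.

```python
import numpy as np


def simulate_two_factor_data(n=2000, seed=0):
    """Simulate 8 observed variables driven by 2 independent latent factors.

    Assumed setup (illustrative only): variables 0-3 load 0.8 on factor 1,
    variables 4-7 load 0.8 on factor 2, plus independent noise with sd 0.6,
    so every observed variable has unit population variance.
    """
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n, 2))
    loadings = np.zeros((2, 8))
    loadings[0, :4] = 0.8
    loadings[1, 4:] = 0.8
    noise = 0.6 * rng.standard_normal((n, 8))
    return factors @ loadings + noise


def scree_eigenvalues(X):
    """Eigenvalues of the sample correlation matrix, largest first."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(R)[::-1]


if __name__ == "__main__":
    eig = scree_eigenvalues(simulate_two_factor_data())
    # Population eigenvalues for this design are 2.92 (twice) and 0.36
    # (six times); the sample eigenvalues scatter around those.
    for rank, value in enumerate(eig, start=1):
        print(f"{rank}: {value:.2f}")
```

The gap after the second eigenvalue is easy to see here only because the data were built that way. With real data the decline is often gradual, which is where the squinting comes in, and in either case the eigenvalues summarize correlations and say nothing by themselves about causal structure.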


1 Comment »

  1. A stopped clock is right twice a day. Just add more data: analyse thousands of stopped clocks and you’ll always be right. But your understanding of how a clock works has not increased. With enough data and enough variables, you are eventually guaranteed to find very nice linear relations in your data.
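
The commenter's last point can be illustrated with a small simulation, offered as a sketch under stated assumptions rather than a proof. It assumes 50 observations of independent standard-normal variables, so any correlation found is noise; the function names and the sizes 10, 100 and 500 are hypothetical choices for illustration.

```python
import numpy as np


def max_abs_correlation(X):
    """Largest absolute off-diagonal entry of the sample correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)  # ignore each variable's correlation with itself
    return float(np.abs(R).max())


def noise_correlation_scan(n_obs=50, sizes=(10, 100, 500), seed=0):
    """Strongest pairwise correlation among the first p pure-noise variables."""
    rng = np.random.default_rng(seed)
    # All columns are independent by construction, so true correlations are 0.
    X = rng.standard_normal((n_obs, max(sizes)))
    return {p: max_abs_correlation(X[:, :p]) for p in sizes}


if __name__ == "__main__":
    for p, r in noise_correlation_scan().items():
        print(f"{p} variables: max |r| = {r:.2f}")
```

Because each larger set contains the smaller ones, the strongest correlation found can only grow as variables are added, even though no variable is related to any other.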
