Transitivity — just another questionable assumption
29 February, 2016 at 10:33 | Posted in Economics
My doctor once recommended I take niacin for the sake of my heart. Yours probably has too, unless you’re a teenager or a marathon runner or a member of some other metabolically privileged caste. Here’s the argument: Consumption of niacin is correlated with higher levels of HDL, or “good cholesterol,” and high HDL is correlated with lower risk of “cardiovascular events.” If you’re not a native speaker of medicalese, that means people with plenty of good cholesterol are less likely on average to clutch their hearts and keel over dead.
But a large-scale trial carried out by the National Heart, Lung, and Blood Institute was halted in 2011, a year and a half before the scheduled finish, because the results were so weak it didn’t seem worth it to continue. Patients who got niacin did indeed have higher HDL levels, but they had just as many heart attacks and strokes as everybody else.
How can this be? Because correlation isn’t transitive. That is: Just because niacin is correlated with HDL, and high HDL is correlated with low risk of heart disease, you can’t conclude that niacin is correlated with low risk of heart disease.
Transitive relations are ones like “weighs more than.” If I weigh more than my son and my son weighs more than my daughter, it’s an absolute certainty that I weigh more than my daughter. “Lives in the same city as” is transitive, too—if I live in the same city as Bill, who lives in the same city as Bob, then I live in the same city as Bob.
But many of the most interesting relations we find in the world of data aren’t transitive. Correlation, for instance, is more like “blood relation.” I’m related to my son, who’s related to my wife, but my wife and I aren’t blood relatives. In fact, it’s not a terrible idea to think of correlated variables as “sharing part of their DNA.”
Suppose I run a boutique money management firm with just three investors, Laura, Sara, and Tim. Their stock positions are pretty simple: Laura’s fund is split 50–50 between Facebook and Google, Tim’s is one-half General Motors and one-half Honda, and Sara, poised between old economy and new, goes one-half Honda, one-half Facebook. It’s pretty obvious that Laura’s returns will be positively correlated with Sara’s; they have half their portfolio in common. And the correlation between Sara’s returns and Tim’s will be equally strong. But there’s no reason (except insofar as the whole stock market tends to move in concert) to think Tim’s performance has to be correlated with Laura’s. Those two funds are like the parents, each contributing one-half of their “genetic material” to form Sara’s hybrid fund.
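The three-fund example can be checked with a small simulation. What follows is only a sketch under assumptions that are not in the text: each stock's return is drawn as an independent standard normal number, so the "whole stock market moves in concert" effect is deliberately switched off, and the names `PORTFOLIOS`, `pearson`, and `simulate_returns` are made up for illustration. Under those assumptions, theory says Laura–Sara and Sara–Tim should each come out near 0.5, while Laura–Tim should come out near 0.

```python
import math
import random

# Portfolio weights from the example in the text.
PORTFOLIOS = {
    "Laura": {"Facebook": 0.5, "Google": 0.5},
    "Sara": {"Honda": 0.5, "Facebook": 0.5},
    "Tim": {"General Motors": 0.5, "Honda": 0.5},
}
STOCKS = ["Facebook", "Google", "Honda", "General Motors"]


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_returns(n_periods=20000, seed=0):
    """Return a dict mapping each investor to a list of simulated fund returns."""
    rng = random.Random(seed)
    returns = {name: [] for name in PORTFOLIOS}
    for _ in range(n_periods):
        # Assumption (not from the text): stock returns are independent
        # standard normal draws, i.e. there is no common market factor.
        stock = {s: rng.gauss(0.0, 1.0) for s in STOCKS}
        for name, weights in PORTFOLIOS.items():
            fund_return = sum(w * stock[s] for s, w in weights.items())
            returns[name].append(fund_return)
    return returns


returns = simulate_returns()
corr_laura_sara = pearson(returns["Laura"], returns["Sara"])
corr_sara_tim = pearson(returns["Sara"], returns["Tim"])
corr_laura_tim = pearson(returns["Laura"], returns["Tim"])

print(f"Laura vs Sara: {corr_laura_sara:.2f}")
print(f"Sara vs Tim:   {corr_sara_tim:.2f}")
print(f"Laura vs Tim:  {corr_laura_tim:.2f}")
```

Adding a shared market component to every stock's return would push the Laura–Tim figure above zero, which is exactly the caveat in the parenthesis above.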