RCTs – not so much of a “gold standard” after all
14 February, 2013 at 08:51 | Posted in Statistics & Econometrics

Controlled experiments are the gold standard in science for proving causality. The FDA, for example, requires controlled experiments (randomized clinical trials) for approving drugs. In the software world, online controlled experiments are being used heavily to make data-driven decisions, especially in areas where the forefront of knowledge is being pushed …
The statistical theory of controlled experiments is well understood, but the devil is in the details and the difference between theory and practice is greater in practice than in theory. We have shared five puzzling experiment outcomes, which we were able to analyze deeply and explain …
Generalizing from these puzzles, we see two themes. One is that instrumentation is not as precise as we would like it to be, interacting in subtle ways with experiments … A second theme is that lessons from offline experiments don’t always map well online …
Anyone can run online controlled experiments and generate numbers with six digits after the decimal point. It’s easy to generate p-values and beautiful 3D graphs of trends over time. But the real challenge is in understanding when the results are invalid, not at the sixth decimal place, but before the decimal point, or even at the plus/minus for the percent effect; that’s what these analyses did to the initial results. We hope we’ve managed to shed light on puzzling outcomes and we encourage others to drill deep and share other similar results. Generating numbers is easy; generating numbers you should trust is hard!
h/t Andrew Gelman
1 Comment
They are treating a time series as if it were a simple random variable. This gives them the illusion that their inferences become more and more accurate over time, which is obviously false.
It is as if I took all the daily temperatures for 2012 in Stockholm to estimate today's temperature, used 1/sqrt(365) as the factor for the 95% CI, and then published the “puzzling” result that today's temperature lies outside that CI… hahaha… oh my God.
I'll grant you one thing: the article is “puzzling”.
Comment by Francisco Urbano García, 14 February, 2013
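The commenter's objection concerns the 1/sqrt(n) factor. That factor assumes independent draws. When consecutive observations are correlated, a confidence interval built with it is too narrow. The Python sketch below illustrates this under stated assumptions.

- The daily values are simulated as a stationary AR(1) process with phi = 0.9. This is a hypothetical choice, not fitted to Stockholm data.
- All function names are illustrative.
- The correction uses the common AR(1) approximation for effective sample size, n_eff = n(1 − r)/(1 + r). This is one standard remedy, not something proposed in the post.

```python
import math
import random
import statistics


def simulate_ar1(n, phi, sigma, rng):
    """Simulate a stationary AR(1) series x_t = phi * x_{t-1} + e_t, e_t ~ N(0, sigma^2).

    Hypothetical stand-in for autocorrelated daily data (e.g. temperature anomalies).
    """
    # Start from the stationary distribution so there is no burn-in drift.
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi ** 2))
    series = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return series


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a series."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


def naive_se(xs):
    """Standard error of the mean assuming i.i.d. draws: s / sqrt(n)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def ar1_adjusted_se(xs):
    """SE of the mean using the AR(1) effective sample size n_eff = n * (1 - r) / (1 + r)."""
    r = lag1_autocorr(xs)
    n_eff = len(xs) * (1.0 - r) / (1.0 + r)
    return statistics.stdev(xs) / math.sqrt(n_eff)


def coverage(se_fn, n=365, phi=0.9, sigma=1.0, trials=1000, seed=2013):
    """Fraction of simulated 'years' whose 95% CI for the mean contains the true mean (0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = simulate_ar1(n, phi, sigma, rng)
        half_width = 1.96 * se_fn(xs)
        if abs(statistics.fmean(xs)) <= half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("naive 1/sqrt(n) CI coverage:", coverage(naive_se))
    print("AR(1)-adjusted CI coverage:", coverage(ar1_adjusted_se))
```

In this setup the naive interval covers the true mean far less often than the nominal 95%. The effective-sample-size correction brings coverage much closer to 95%. The sketch deals only with autocorrelation. It does not address the commenter's second point: judging a single day's temperature calls for an interval on the scale of s, not s/sqrt(n).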