The way forward — discard 90% of the data!21 June, 2015 at 20:02 | Posted in Statistics & Econometrics | 1 Comment
Could it be better to discard 90% of the reported research? Surprisingly, the answer is yes to this statistical paradox. This paper has shown how publication selection can greatly distort the research record and its conventional summary statistics. Using both Monte Carlo simulations and actual research examples, we show how a simple estimator, which uses only 10 percent of the reported research reduces publication bias and improves efficiency over conventional summary statistics that use all the reported research.
The average of the most precise 10 percent, ‘Top10,’ of the reported estimates of a given empirical phenomenon is often better than conventional summary estimators because of its heavy reliance on the reported estimate’s precision (i.e., the inverse of the estimate’s standard error). When estimates are chosen, in part, for their statistical significance, studies cursed with imprecise estimates have to engage in more intense selection from among alternative statistical techniques, models, data sets, and measures to produce the larger estimate that statistical significance demands. Thus, imprecise estimates will contain larger biases.
Studies that have access to more data will tend to be more precise, and hence less biased. At the level of the original empirical research, the statistician’s motto, “the more data the better,” holds because more data typically produce more precise estimates. It is only at the meta-level of integrating, summarizing, and interpreting an entire area of empirical research (meta-analysis), where the removal of 90% of the data might actually improve our empirical knowledge. Even when the authors of these larger and more precise studies actively select for statistical significance in the desired direction, smaller significant estimates will tend to be reported. Thus, precise studies will, on average, be less biased and thereby possess greater scientific quality, ceteris paribus.
We hope that the statistical paradox identified in this paper refocuses the empirical sciences upon precision. Precision should be universally adopted as one criterion of research quality, regardless of other statistical outcomes.