Bayesianism and noninformative priors – lost causes (wonkish)14 July, 2013 at 12:00 | Posted in Statistics & Econometrics, Theory of Science & Methodology | Leave a comment
I like to say that noninformative priors are the perpetual motion machines of statistics. Everyone wants one but they don’t exist.
By definition, a prior represents information. So it should come as no surprise that a prior cannot represent lack of information.
The first “noninformative prior” was of course the flat prior. The major flaw with this prior is lack of invariance: if it is flat in one parameterization it will not be flat in most other parameterizations. Flat prior have lots of other problems too. See my earlier post here.
The most famous noninformative prior (I’ll stop putting quotes around the phrase from now on) is Jeffreys prior which is proportional to the square root of the determinant of the Fisher information matrix. While this prior is invariant, it can still have undesirable properties …
A more fundamental question is: what does it mean for a prior to be noninformative? Of course, people have argued about this for many, many years. One definition, which has the virtue of being somewhat precise, is that a prior is noninformative if the posterior regions have frequentist coverage equal (approximately) to . These are sometimes called “matching priors.”
In general, it is hard to construct matching priors especially in high-dimensional complex models. But matching priors raise a fundamental question: if your goal is to match frequentist coverage, why bother with Bayes at all? Just use a frequentist confidence interval.
These days I think that most people agree that the virtue of Bayesian methods is that it gives you a way to include prior information in a systematic way. There is no reason to formulate a “noninformative prior.”
On the other hand, in practice, we often deal with very complex, high-dimensional models. Can we really formulate a meaningful informative prior in such problems? And if we do, will anyone care about our inferences?
In 1996, I wrote a review paper with Rob Kass on noninformative priors (Kass, Robert E and Wasserman, Larry. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343-1370) … One of our conclusions was:
“We conclude that the problems raised by the research on priors chosen by formal rules are serious and may not be dismissed lightly: When sample sizes are small (relative the number of parameters being estimated), it is dangerous to put faith in any default solution; but when asymptotics take over, Jeffreys’s rules and their variants remain reasonable choices.”
Looking at this almost twenty years later, the one thing that has changed is the “the number of parameters being estimated” which these days is often very, very large.
My conclusion: noninformative priors are a lost cause.