The Bernstein–Von Mises Theorem
The Bernstein–Von Mises Theorem is a fundamental result in Bayesian statistics, the branch of statistics that interprets probability as a degree of belief rather than a long-run frequency. The theorem connects Bayesian and frequentist inference: under certain conditions, and for large samples, the two approaches give nearly the same answers, which provides a frequentist justification for Bayesian methods. Named after the Russian mathematician Sergei Natanovich Bernstein and the Austrian mathematician Richard von Mises, the theorem is sometimes referred to as the Bayesian Central Limit Theorem.
Understanding the Bernstein–Von Mises Theorem
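The Bernstein–Von Mises Theorem states that, under some regularity conditions, the posterior distribution of a finite-dimensional parameter given the observed data becomes approximately normal as the sample size approaches infinity. The approximating normal distribution is centered at an efficient estimator, such as the maximum likelihood estimator, and its covariance is the inverse Fisher information divided by the sample size. The approximation is usually stated in total variation distance, which is stronger than weak convergence, and it holds in probability under the true parameter value. Because the estimator converges to the true parameter value and the posterior variance shrinks at a rate proportional to the inverse of the sample size, the posterior distribution becomes increasingly concentrated around the true parameter value.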
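One common way to make this statement precise is sketched below. The notation is an assumption introduced here, not taken from the text: $\theta_0$ is the true parameter value, $\hat\theta_n$ is the maximum likelihood estimator based on $n$ observations $X_1, \dots, X_n$, $I(\theta_0)$ is the Fisher information of a single observation, and $\Pi(\cdot \mid X_1, \dots, X_n)$ is the posterior distribution. Textbook versions differ in their exact assumptions, so this should be read as a representative form rather than the only one.

```latex
\left\| \, \Pi\left(\cdot \mid X_1, \dots, X_n\right)
  - \mathcal{N}\!\left(\hat\theta_n, \; \tfrac{1}{n}\, I(\theta_0)^{-1}\right) \right\|_{\mathrm{TV}}
  \;\xrightarrow{\;P_{\theta_0}\;}\; 0
  \qquad \text{as } n \to \infty .
```

Here $\|\cdot\|_{\mathrm{TV}}$ denotes total variation distance and the convergence is in probability under the distribution generated by $\theta_0$.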
The theorem has important implications for Bayesian inference because it implies that, for large sample sizes, the Bayesian posterior distribution is approximately normal regardless of the prior distribution, provided that the prior assigns positive probability to neighborhoods of the true parameter. This result allows Bayesian methods to yield estimates and credible intervals that are similar to those obtained from classical frequentist methods, such as confidence intervals derived from maximum likelihood estimation.
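The normal approximation can be checked numerically in a simple case. The following is a minimal sketch, assuming a Bernoulli model with a Beta(2, 2) prior and a fixed observed success proportion of 0.3; the model, the prior, the grid size, and all function names are illustrative choices, not part of the theorem. It estimates the total variation distance between the exact Beta posterior and a normal distribution centered at the sample proportion with variance p(1 - p)/n.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), via log-gamma for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def tv_posterior_vs_normal(n, k, a=2.0, b=2.0, grid=20000):
    """Approximate total variation distance between the Beta(a + k, b + n - k)
    posterior and the normal approximation N(k/n, (k/n)(1 - k/n)/n)."""
    p_hat = k / n
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    h = 1.0 / grid
    l1_inside = 0.0
    for i in range(grid):
        x = (i + 0.5) * h  # midpoint rule on (0, 1)
        l1_inside += abs(beta_pdf(x, a + k, b + n - k) - normal_pdf(x, p_hat, sd)) * h
    # The normal distribution also puts some mass outside (0, 1), where the posterior has none.
    normal_outside = 1.0 - (normal_cdf(1.0, p_hat, sd) - normal_cdf(0.0, p_hat, sd))
    return 0.5 * (l1_inside + normal_outside)


for n in (10, 100, 1000, 10000):
    k = round(0.3 * n)  # hypothetical data: 30% successes
    print(n, tv_posterior_vs_normal(n, k))
```

In this setup the printed distance should decrease as n grows, which is the behavior the theorem describes; how fast it decreases depends on the prior and on the true proportion.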
Conditions for the Bernstein–Von Mises Theorem
The Bernstein–Von Mises Theorem holds under certain conditions that ensure the regularity and identifiability of the statistical model. These conditions typically include:
- Smoothness: The likelihood function should be sufficiently smooth as a function of the parameter.
- Identifiability: The true parameter value should be identifiable; that is, different parameter values should lead to different probability distributions of the data.
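- Prior support: The prior distribution should not rule out the truth. It should assign positive probability to all open sets containing the true parameter value, and is usually assumed to have a density that is continuous and positive near that value.
- Asymptotic Normality: The likelihood function should satisfy conditions that guarantee the asymptotic normality of the maximum likelihood estimator. In particular, the Fisher information should be finite and nonsingular, and the true parameter value should lie in the interior of the parameter space.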
When these conditions are met, the Bernstein–Von Mises Theorem assures that the posterior distribution behaves asymptotically like the sampling distribution of the maximum likelihood estimator, thus providing a bridge between Bayesian and frequentist approaches to statistical inference.
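To see why the regularity conditions matter, it helps to look at a model that violates them. The sketch below assumes observations drawn from Uniform(0, theta), a standard non-regular example because the support of the data depends on the parameter. With a Pareto prior (a conjugate choice assumed here for convenience), the posterior is again Pareto, with shape increased by n and scale equal to the larger of the prior scale and the sample maximum. The prior parameters and the use of the expected sample maximum in place of real data are hypothetical simplifications.

```python
import math


def pareto_sd(shape, scale):
    """Standard deviation of a Pareto(shape, scale) distribution (requires shape > 2)."""
    return scale * math.sqrt(shape) / ((shape - 1) * math.sqrt(shape - 2))


def pareto_skewness(shape):
    """Skewness of a Pareto distribution (requires shape > 3); it does not depend on scale."""
    return 2 * (1 + shape) / (shape - 3) * math.sqrt((shape - 2) / shape)


def uniform_model_posterior(n, theta0=1.0, prior_shape=1.0, prior_scale=0.1):
    """Posterior (shape, scale) for Uniform(0, theta) data under a Pareto prior.

    The sample maximum is replaced by its expected value theta0 * n / (n + 1),
    a deterministic stand-in for simulated data.
    """
    sample_max = theta0 * n / (n + 1)
    return prior_shape + n, max(prior_scale, sample_max)


for n in (10, 100, 1000, 10000):
    shape, scale = uniform_model_posterior(n)
    sd = pareto_sd(shape, scale)
    # In a regular model sqrt(n) * sd would stabilize; here n * sd does instead.
    print(n, math.sqrt(n) * sd, n * sd, pareto_skewness(shape))
```

In this example the posterior standard deviation shrinks like 1/n rather than 1/sqrt(n), and the posterior skewness approaches 2 instead of 0, so the posterior does not become normal however large the sample is.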
Implications and Limitations
The Bernstein–Von Mises Theorem has several important implications:
- It justifies the use of Bayesian credible intervals as approximations to frequentist confidence intervals for large sample sizes.
- It provides a rationale for the use of noninformative or vague priors in Bayesian analysis, as the influence of the prior diminishes with increasing sample size.
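- It supports the use of Bayesian methods in settings where frequentist properties are desired: under mild additional conditions, standard Bayesian point estimators such as the posterior mean are asymptotically equivalent to the maximum likelihood estimator and therefore asymptotically efficient.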
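The agreement between credible intervals and confidence intervals can be illustrated in a case where both have closed forms. The sketch below assumes normally distributed data with known standard deviation and a normal prior on the mean; the observed sample mean of 2.0, the prior N(0, 1), and the function names are hypothetical choices made for illustration. It reports the largest gap between corresponding interval endpoints, relative to the half-width of the confidence interval.

```python
import math

Z_95 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def confidence_interval(xbar, sigma, n):
    """Classical 95% confidence interval for a normal mean with known sigma."""
    half = Z_95 * sigma / math.sqrt(n)
    return xbar - half, xbar + half


def credible_interval(xbar, sigma, n, prior_mean=0.0, prior_sd=1.0):
    """Central 95% credible interval under a conjugate N(prior_mean, prior_sd^2) prior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * xbar)
    half = Z_95 * math.sqrt(post_var)
    return post_mean - half, post_mean + half


def relative_endpoint_gap(xbar, sigma, n):
    """Largest endpoint difference, divided by the confidence interval half-width."""
    ci_lo, ci_hi = confidence_interval(xbar, sigma, n)
    cr_lo, cr_hi = credible_interval(xbar, sigma, n)
    half = (ci_hi - ci_lo) / 2
    return max(abs(ci_lo - cr_lo), abs(ci_hi - cr_hi)) / half


for n in (5, 50, 500, 5000):
    print(n, relative_endpoint_gap(xbar=2.0, sigma=1.0, n=n))
```

The relative gap shrinks as n grows, so for large samples the two intervals are nearly interchangeable in this model. The normal model is a convenient special case; in other regular models the same comparison would typically require numerical quantiles.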
However, the theorem also has limitations:
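- The theorem does not apply to all Bayesian models. It can fail for non-regular likelihoods, for parameters on the boundary of the parameter space, and for nonparametric (infinite-dimensional) models. Improper priors are covered only when they lead to a proper posterior distribution.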
- In finite samples, the choice of prior can still have a significant impact on the posterior distribution, and the Bernstein–Von Mises Theorem does not provide guidance on prior selection in such cases.
- The theorem does not address the computational challenges of Bayesian inference, such as the difficulty of calculating posterior distributions in complex models.
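The finite-sample role of the prior can be made concrete with a small calculation. The sketch below assumes a Bernoulli model and compares the posteriors obtained from a flat Beta(1, 1) prior and from a strongly informative Beta(20, 5) prior; both priors and the fixed success proportion of 0.3 are hypothetical. It measures the difference between the two posterior means in units of the posterior standard deviation.

```python
import math


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def prior_gap_in_sd_units(n, k, prior_1=(1.0, 1.0), prior_2=(20.0, 5.0)):
    """Difference between the posterior means under two Beta priors,
    divided by the posterior standard deviation under the first prior."""
    mean_1, sd_1 = beta_mean_sd(prior_1[0] + k, prior_1[1] + n - k)
    mean_2, _ = beta_mean_sd(prior_2[0] + k, prior_2[1] + n - k)
    return abs(mean_1 - mean_2) / sd_1


for n in (10, 100, 1000, 10000, 1000000):
    k = round(0.3 * n)  # hypothetical data: 30% successes
    print(n, prior_gap_in_sd_units(n, k))
```

With these choices the two posteriors disagree by more than two posterior standard deviations at n = 10, and the disagreement fades only slowly, roughly like 1/sqrt(n). The prior is eventually overwhelmed, as the theorem predicts, but a strong prior can still matter at sample sizes that look large.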
Conclusion
The Bernstein–Von Mises Theorem is a cornerstone of Bayesian statistics, providing a theoretical foundation for the convergence of Bayesian posterior distributions to normal distributions in large samples. By establishing the asymptotic equivalence of Bayesian and frequentist inference under certain conditions, the theorem offers a compelling argument for the use of Bayesian methods in statistical practice. However, statisticians must be aware of its conditions and limitations, especially when dealing with finite samples or non-regular models.