The Bernstein–von Mises Theorem

What is the Bernstein–von Mises Theorem?

The Bernstein–von Mises theorem states that the more data available for sampling, the less influence the prior probability has on the prediction model, at least for common Bayesian inference models that satisfy a specific set of regularity conditions. As the data pool expands, the posterior distribution becomes increasingly independent of the prior assumption, and the posterior's curve comes to look just like (is asymptotic to) a normal distribution centered at the maximum likelihood estimate, with the same spread as the sampling distribution of that estimator.
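
Stated a little more formally (the notation here is introduced for clarity and follows the standard textbook form of the result): for independent, identically distributed observations from a sufficiently regular parametric model with true parameter theta_0, the total-variation distance between the posterior and a normal distribution centered at the maximum likelihood estimate vanishes as the sample size n grows:

    \left\lVert \Pi(\,\cdot \mid X_1,\dots,X_n) - \mathcal{N}\!\Big(\hat{\theta}_n,\; \tfrac{1}{n}\, I(\theta_0)^{-1}\Big) \right\rVert_{\mathrm{TV}} \longrightarrow 0 \quad \text{in } P_{\theta_0}\text{-probability},

where \Pi(\cdot \mid X_1,\dots,X_n) is the posterior distribution, \hat{\theta}_n is the maximum likelihood estimate, and I(\theta_0) is the Fisher information at the true parameter.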

While this is a straightforward theorem that appears to eliminate the need to agonize over the choice of prior, the list of specific conditions required for the result to apply is not always easy to satisfy (a numerical sketch follows the list):

  • The maximum likelihood estimator must be consistent: it has to converge to the true parameter value as the sample size grows.
  • The model needs a finite and fixed number of parameters. Infinite-dimensional (non-parametric) parameters, such as unknown functions, will not work.
  • Cromwell’s rule must apply to the prior, so that no admissible parameter value is assigned a probability of exactly 0 or 1 up front, and the true parameter value must lie in the interior of the parameter space.
  • The prior density must also be non-zero (and continuous) in a neighborhood of the true parameter value.
  • The log-likelihood must be sufficiently smooth in the parameter, so that a well-defined Fisher information exists.
  • All samples in the database must be independent and identically distributed, free of any selection bias and each gathered under the same experiment design. In general, this is harder to achieve the larger the database grows, since most “big data” libraries are collections of smaller databases gathered at different times with different selection criteria.
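
To make the washing-out of the prior concrete, here is a minimal simulation sketch in Python (using NumPy and SciPy; the Bernoulli model, the two Beta priors, the true parameter 0.3, and the sample sizes are illustrative choices made here, not part of the theorem). It compares a flat prior with a strongly opinionated one and checks both posteriors against the normal approximation centered at the MLE:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    theta_true = 0.3  # true Bernoulli success probability (illustrative)

    # Two very different Beta priors: a flat one and a strongly opinionated one.
    priors = {"Beta(1, 1) flat": (1, 1), "Beta(20, 2) strong": (20, 2)}

    for n in (10, 100, 10_000):
        x = rng.binomial(1, theta_true, size=n)
        k = int(x.sum())
        if k in (0, n):  # MLE on the boundary; skip to keep the sketch simple
            continue
        theta_mle = k / n
        # Bernstein-von Mises normal approximation: N(MLE, I(MLE)^{-1} / n),
        # with Fisher information I(theta) = 1 / (theta * (1 - theta)) for a Bernoulli.
        bvm_sd = np.sqrt(theta_mle * (1 - theta_mle) / n)
        print(f"n = {n}: MLE = {theta_mle:.3f}, BvM sd = {bvm_sd:.4f}")
        for name, (a, b) in priors.items():
            post = stats.beta(a + k, b + n - k)  # conjugate Beta posterior
            print(f"  {name}: posterior mean = {post.mean():.3f}, sd = {post.std():.4f}")

With n = 10 the strong Beta(20, 2) prior pulls the posterior mean well away from the MLE; by n = 10,000 the two posterior means differ by only a small fraction of a posterior standard deviation, and both posteriors closely match the normal approximation.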