Anchored Bayesian Gaussian Mixture Models
Finite Gaussian mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When a mixture model uses an exchangeable prior on the component features, the component labels are arbitrary and indistinguishable in posterior analysis. This makes it impossible to attach any meaningful interpretation to the marginal posterior distributions of the component features. We present an alternative to the exchangeable prior: by assuming that a small number of latent class labels are known a priori, we can make inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well-specified probability model rather than a post hoc manipulation. We provide practical guidelines for model selection that are motivated by maximizing prior information about the class labels, and we demonstrate our method on real and simulated data.
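To make the anchoring idea concrete, the sketch below (not the authors' code) shows a minimal Gibbs sampler for a two-component univariate Gaussian mixture in which the latent labels of a few "anchor" observations are held fixed across iterations, breaking the label symmetry of an otherwise exchangeable prior. The equal mixture weights, unit component variances, and the particular choice of anchor points are simplifying assumptions made only for illustration.

    # Minimal sketch of label anchoring in a two-component Gaussian mixture
    # (assumptions: equal weights 1/2, unit component variances, normal prior on means)
    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated data from two well-separated components
    y = np.concatenate([rng.normal(-2, 1, 60), rng.normal(3, 1, 40)])
    n = len(y)

    # Anchors: observation 0 pinned to component 0, observation n-1 to component 1
    anchors = {0: 0, n - 1: 1}

    mu0, tau0 = 0.0, 10.0       # N(mu0, tau0^2) prior on each component mean
    mu = np.array([-1.0, 1.0])  # initial component means
    z = rng.integers(0, 2, n)   # initial labels
    for i, k in anchors.items():
        z[i] = k

    draws = []
    for it in range(2000):
        # Sample labels (equal weights, unit variances), then re-impose the anchors
        log_p = -0.5 * (y[:, None] - mu[None, :]) ** 2
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = (rng.random(n) < p[:, 1]).astype(int)
        for i, k in anchors.items():
            z[i] = k  # anchored labels never switch

        # Conjugate normal update for each component mean
        for k in range(2):
            yk = y[z == k]
            prec = 1 / tau0**2 + len(yk)
            mean = (mu0 / tau0**2 + yk.sum()) / prec
            mu[k] = rng.normal(mean, np.sqrt(1 / prec))
        draws.append(mu.copy())

    draws = np.array(draws[500:])
    print("posterior means:", draws.mean(axis=0))

Because the anchored labels cannot switch, the marginal posterior draws of mu[0] and mu[1] retain a fixed interpretation and can be summarized directly, without a relabeling step.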