Fast Learning of Clusters and Topics via Sparse Posteriors

09/23/2016
by Michael C. Hughes, et al.

Mixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs that scale with the total number of clusters, even though typically only a few clusters have significant posterior mass for any data point. We propose a constrained family of sparse variational distributions that allow at most L non-zero entries, where the tunable threshold L trades off speed for accuracy. Previous sparse approximations have used hard assignments (L=1), but we find that moderate values of L>1 provide superior performance. Our approach easily integrates with stochastic or incremental optimization algorithms to scale to millions of examples. Experiments training mixture models of image patches and topic models for news articles show that our approach produces better-quality models in far less time than baseline methods.
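The core idea of the abstract — a variational posterior over clusters constrained to at most L non-zero entries — can be sketched as a simple top-L projection of the dense responsibilities. The function below is an illustrative assumption of how such a constraint might look, not the authors' released implementation:

```python
import numpy as np

def sparse_responsibilities(log_weights, L):
    """Project dense log posterior weights onto a top-L sparse simplex.

    Keeps only the L clusters with the largest posterior mass and
    renormalizes over them; all other entries are exactly zero, so
    downstream storage and summary statistics cost O(L) per data point
    instead of O(K). Illustrative sketch only.
    """
    K = log_weights.shape[0]
    L = min(L, K)
    # Indices of the L largest entries (order among them is irrelevant).
    top = np.argpartition(log_weights, K - L)[K - L:]
    # Softmax restricted to the selected entries (shift for stability).
    shifted = log_weights[top] - log_weights[top].max()
    probs = np.exp(shifted)
    probs /= probs.sum()
    r = np.zeros(K)
    r[top] = probs
    return r

# Example: K=5 clusters, keep L=2 non-zero responsibilities.
r = sparse_responsibilities(np.log(np.array([0.05, 0.6, 0.05, 0.25, 0.05])), 2)
```

With L=1 this reduces to the hard assignments used by earlier sparse approximations; the abstract's claim is that moderate L>1 retains most of the accuracy of the dense posterior while keeping the per-example cost small.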


