Sparse Variational Inference: Bayesian Coresets from Scratch
The proliferation of automated inference algorithms in Bayesian statistics has provided practitioners newfound access to fast, reproducible data analysis and powerful statistical models. Designing automated methods that are also both computationally scalable and theoretically sound, however, remains a significant challenge. Recent work on Bayesian coresets takes the approach of compressing the dataset before running a standard inference algorithm, providing both scalability and guarantees on posterior approximation error. But the automation of past coreset methods is limited because they depend on the availability of a reasonable coarse posterior approximation, which is difficult to specify in practice. In the present work we remove this requirement by formulating coreset construction as sparsity-constrained variational inference within an exponential family. This perspective leads to a novel construction via greedy optimization, and also provides a unifying information-geometric view of present and past methods. The proposed Riemannian coreset construction algorithm is fully automated, requiring no inputs aside from the dataset, probabilistic model, desired coreset size, and sample size used for Monte Carlo estimates. In addition to being easier to use than past methods, experiments demonstrate that the proposed algorithm achieves state-of-the-art Bayesian dataset summarization.
READ FULL TEXT