Distribution Compression in Near-linear Time

by Abhishek Shetty, et al.

In distribution compression, one aims to accurately summarize a probability distribution ℙ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling n points from a Markov chain and identifying √(n) points with 𝒪(1/√(n)) discrepancy to ℙ. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size n. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of 4 in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers √(n) points with 𝒪(√(log n/n)) integration error and better-than-Monte-Carlo maximum mean discrepancy in 𝒪(n log^3 n) time and 𝒪(√(n) log^2 n) space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.






