Approximate Bayesian computation with the Wasserstein distance
A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation (ABC) has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within ABC to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and propose a new distance based on the Hilbert space-filling curve. We provide a theoretical study of the proposed method, describing consistency as the threshold goes to zero while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queueing model, and a Lévy-driven stochastic volatility model.
READ FULL TEXT