# The Sketched Wasserstein Distance for mixture distributions

The Sketched Wasserstein Distance (W^S) is a new probability distance specifically tailored to finite mixture distributions. Given any metric d defined on a set 𝒜 of probability distributions, W^S is defined to be the most discriminative convex extension of this metric to the space 𝒮 = conv(𝒜) of mixtures of elements of 𝒜. Our representation theorem shows that the space (𝒮, W^S) constructed in this way is isomorphic to a Wasserstein space over 𝒳 = (𝒜, d). This result establishes a universality property for the Wasserstein distances, revealing them to be uniquely characterized by their discriminative power for finite mixtures. We exploit this representation theorem to propose an estimation methodology based on Kantorovich–Rubinstein duality, and prove a general theorem showing that its estimation error can be bounded by the sum of the errors of estimating the mixture weights and the mixture components, for any estimators of these quantities. We derive sharp statistical properties for the estimated W^S in the case of p-dimensional discrete K-mixtures, which we show can be estimated at a rate proportional to √(K/N), up to logarithmic factors. We complement these bounds with a minimax lower bound on the risk of estimating the Wasserstein distance between distributions on a K-point metric space, which matches our upper bound up to logarithmic factors. This result is the first nearly tight minimax lower bound for estimating the Wasserstein distance between discrete distributions. Furthermore, we construct √N-asymptotically normal estimators of the mixture weights and, as a consequence, derive a √N distributional limit of our estimator of W^S. Simulation studies and a data analysis provide strong support for the applicability of the new Sketched Wasserstein Distance.
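
The representation theorem can be made concrete with a small sketch. Assume two mixtures r = Σ_k α_k A_k and s = Σ_k β_k A_k share the same components A_1, …, A_K, and that each mixture has a unique weight vector (for example, because the components are linearly independent). Under that assumption, W^S(r, s) is read here as the 1-Wasserstein distance between α and β on the K-point metric space whose costs are d(A_k, A_l). The function names `transport_cost` and `sketched_wasserstein`, the choice of total variation for d, and the toy components are all hypothetical. The sketch also solves the primal transport problem by min-cost flow (successive shortest paths) instead of the Kantorovich–Rubinstein dual used in the estimation methodology, so it is an illustration rather than the authors' implementation.

```python
from fractions import Fraction


def transport_cost(alpha, beta, D):
    """Exact 1-Wasserstein distance between weight vectors alpha and beta on a
    K-point metric space with distance matrix D, computed as a min-cost flow
    by successive shortest paths (Bellman-Ford on the residual graph)."""
    K = len(alpha)
    alpha = [Fraction(a) for a in alpha]
    beta = [Fraction(b) for b in beta]
    total = sum(alpha)
    if total != sum(beta):
        raise ValueError("alpha and beta must have the same total mass")
    # Nodes: 0 = source, 1..K = supply points, K+1..2K = demand points, 2K+1 = sink.
    n, s, t = 2 * K + 2, 0, 2 * K + 1
    adj = [[] for _ in range(n)]
    head, cap, cost = [], [], []

    def add_edge(u, v, c, w):
        # Edge e and its residual twin e ^ 1 are stored next to each other.
        for a, b, cc, ww in ((u, v, c, w), (v, u, Fraction(0), -w)):
            adj[a].append(len(head))
            head.append(b)
            cap.append(cc)
            cost.append(ww)

    for i in range(K):
        add_edge(s, 1 + i, alpha[i], Fraction(0))
        add_edge(1 + K + i, t, beta[i], Fraction(0))
        for j in range(K):
            # Capacity `total` never binds, so transport edges are effectively unbounded.
            add_edge(1 + i, 1 + K + j, total, Fraction(D[i][j]))

    flow, total_cost = Fraction(0), Fraction(0)
    while flow < total:
        # Bellman-Ford: residual costs may be negative but contain no negative cycles.
        dist = [None] * n
        via = [None] * n  # residual edge used to reach each node
        dist[s] = Fraction(0)
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] is None:
                    continue
                for e in adj[u]:
                    v = head[e]
                    if cap[e] > 0 and (dist[v] is None or dist[u] + cost[e] < dist[v]):
                        dist[v], via[v] = dist[u] + cost[e], e
                        changed = True
            if not changed:
                break
        # Push as much mass as the shortest source-to-sink path allows.
        push, v = total - flow, t
        while v != s:
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        v = t
        while v != s:
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        flow += push
        total_cost += push * dist[t]
    return total_cost


def total_variation(p, q):
    # Hypothetical component metric d: total variation between probability vectors.
    return sum(abs(Fraction(a) - Fraction(b)) for a, b in zip(p, q)) / 2


def sketched_wasserstein(alpha, beta, components, d=total_variation):
    """W^S between sum_k alpha_k A_k and sum_k beta_k A_k, assuming shared,
    identifiable components: W_1 between the weights on the space (A, d)."""
    D = [[d(a, b) for b in components] for a in components]
    return transport_cost(alpha, beta, D)


# Toy p = 4, K = 3 example with hypothetical components.
half = Fraction(1, 2)
A = [(half, half, 0, 0), (0, half, half, 0), (0, 0, half, half)]
print(sketched_wasserstein([half, half, 0], [0, half, half], A))  # 1/2
print(sketched_wasserstein([1, 0, 0], [0, 0, 1], A))  # 1
```

Exact `Fraction` arithmetic keeps tiny examples checkable by hand: here d(A_1, A_2) = d(A_2, A_3) = 1/2 and d(A_1, A_3) = 1, so moving all weight from A_1 to A_3 costs 1. For realistic K, a dedicated linear-programming or optimal-transport solver would be the more practical choice.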
