Depth-based pseudo-metrics between probability distributions

03/23/2021
by   Guillaume Staerman, et al.

Data depth is a nonparametric statistical tool that measures the centrality of any element x∈ℝ^d with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Consequently, its upper level sets, the depth-trimmed regions, give rise to a definition of multivariate quantiles. In this work, we propose two new pseudo-metrics between continuous probability measures based on data depth and its associated central regions. The first one is constructed as the L_p-distance between the data depth functions w.r.t. each distribution, while the second one relies on the Hausdorff distance between their quantile regions. This construction can further be seen as an original way to extend the one-dimensional formulae of the Wasserstein distance, which involve quantiles and cdfs, to the multivariate space. After discussing the properties of these pseudo-metrics and providing conditions under which they define a distance, we highlight similarities with the Wasserstein distance. Interestingly, the derived non-asymptotic bounds show that, in contrast to the Wasserstein distance, the proposed pseudo-metrics do not suffer from the curse of dimensionality. Moreover, based on the support function of a convex body, we propose an efficient approximation whose time complexity is linear in both the size of the data set and its dimension. The quality of this approximation as well as the performance of the proposed approach are illustrated in experiments. Furthermore, by construction, the regions-based pseudo-metric appears to be robust w.r.t. both outliers and heavy tails, a behavior witnessed in the numerical experiments.
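To make the regions-based idea more concrete, the following is a minimal sketch, not the authors' implementation. It relies on one standard fact: the Hausdorff distance between two compact convex sets equals the supremum, over unit directions u, of the absolute difference of their support functions. Everything else is an assumption made for illustration: the function name `region_pseudo_metric`, the grid of levels `alphas`, the number of random directions `n_dirs`, and in particular the choice to approximate the support function of the α-level region in direction u by the (1 − α)-quantile of the projected sample, which is in general only an upper bound for halfspace-type regions.

```python
import numpy as np


def region_pseudo_metric(X, Y, n_dirs=200, alphas=None, p=2, seed=0):
    """Illustrative Monte Carlo sketch of a regions-based pseudo-metric.

    X, Y: arrays of shape (n, d) and (m, d) holding the two samples.
    All parameter names and defaults are hypothetical choices.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    if alphas is None:
        # Hypothetical grid of depth levels (assumption, not from the source).
        alphas = np.linspace(0.05, 0.5, 10)

    # Draw random unit directions on the sphere S^{d-1}.
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)

    # Project both samples onto every direction: shapes (n, n_dirs), (m, n_dirs).
    PX, PY = X @ U.T, Y @ U.T

    # Assumption: approximate the support function of each alpha-region
    # by the (1 - alpha)-quantile of the projections.
    HX = np.quantile(PX, 1.0 - alphas, axis=0)  # shape (n_alphas, n_dirs)
    HY = np.quantile(PY, 1.0 - alphas, axis=0)

    # Hausdorff distance between convex bodies: sup over directions of
    # |h_A(u) - h_B(u)|, approximated here by a max over sampled directions.
    hausdorff = np.abs(HX - HY).max(axis=1)

    # Aggregate over levels with an L_p-type average.
    return float(np.mean(hausdorff ** p) ** (1.0 / p))
```

For a fixed number of directions and levels, the cost of this sketch is dominated by the projections, which scale with the product of sample size and dimension; the quantile step as written may add a logarithmic factor. When one sample is a translate of the other, the returned value should be close to, and no larger than, the Euclidean norm of the shift.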


