High Probability Lower Bounds for the Total Variation Distance

05/12/2020
by Loris Michel, et al.

The statistics and machine learning communities have recently seen a growing interest in classification-based approaches to two-sample testing (e.g. Kim et al. [2016]; Rosenblatt et al. [2016]; Lopez-Paz and Oquab [2017]; Hediger et al. [2019]). The outcome of a classification-based two-sample test remains a rejection decision, which is not always informative since the null hypothesis is seldom strictly true. Therefore, when a test rejects, it would be beneficial to provide an additional quantity serving as a refined measure of distributional difference. In this work, we introduce a framework for the construction of high-probability lower bounds on the total variation distance. These bounds are based on a one-dimensional projection, such as a classification or regression method, and can be interpreted as the minimal fraction of samples pointing towards a distributional difference. We further derive asymptotic power and detection rates of two proposed estimators and discuss potential uses through an application to a reanalysis climate dataset.
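As a rough illustration of what a high-probability lower bound on the total variation distance from a one-dimensional projection can look like, the sketch below uses a generic Hoeffding-style argument. It is not the estimators proposed in the paper. It relies only on the fact that, for any fixed set A, TV(P, Q) >= P(A) - Q(A), and takes A = {x : rho(x) > t} for a projection rho and threshold t. The function name `tv_lower_bound`, the parameter names, and the choice of Hoeffding's inequality are assumptions made for this example; the projection and threshold are assumed to be chosen independently of the samples passed in (for instance, fitted on a separate training split).

```python
import math
import random


def tv_lower_bound(proj_p, proj_q, threshold, alpha=0.05):
    """Hoeffding-style lower bound on TV(P, Q), valid with prob. >= 1 - alpha.

    proj_p, proj_q: projected held-out samples rho(x) from P and from Q.
    threshold: cut-off t, chosen independently of these samples (assumption).
    Illustrative only; not the estimators from the paper.
    """
    n_p, n_q = len(proj_p), len(proj_q)
    # Empirical mass of the set A = {rho(x) > t} under each distribution.
    p_hat = sum(v > threshold for v in proj_p) / n_p
    q_hat = sum(v > threshold for v in proj_q) / n_q
    # One-sided Hoeffding deviations, each at level alpha / 2 (union bound).
    eps_p = math.sqrt(math.log(2 / alpha) / (2 * n_p))
    eps_q = math.sqrt(math.log(2 / alpha) / (2 * n_q))
    # TV >= P(A) - Q(A) >= (p_hat - eps_p) - (q_hat + eps_q) on the good event.
    return max(0.0, p_hat - q_hat - eps_p - eps_q)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: P = N(1, 1), Q = N(0, 1); the projection is the identity.
    sample_p = [rng.gauss(1.0, 1.0) for _ in range(2000)]
    sample_q = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(tv_lower_bound(sample_p, sample_q, threshold=0.5))
```

A bound of this kind is conservative: it returns 0 when the two samples are indistinguishable at the chosen threshold, and it can only approach the true total variation distance when the projection and threshold separate the two distributions well.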

research
12/12/2022

Lower Bounds for the Total Variation Distance Given Means and Variances of Distributions

For arbitrary two probability measures on real d-space with given means ...
research
09/02/2021

Lower Bounds on the Total Variation Distance Between Mixtures of Two Gaussians

Mixtures of high dimensional Gaussian distributions have been studied ex...
research
06/29/2018

Guaranteed Deterministic Bounds on the Total Variation Distance between Univariate Mixtures

The total variation distance is a core statistical distance between prob...
research
06/19/2023

Minimax optimal testing by classification

This paper considers an ML inspired approach to hypothesis testing known...
research
05/08/2019

Bounding distributional errors via density ratios

We present some new and explicit error bounds for the approximation of d...
research
02/25/2023

Data-Copying in Generative Models: A Formal Framework

There has been some recent interest in detecting and addressing memoriza...
research
09/07/2023

Total Variation Floodgate for Variable Importance Inference in Classification

Inferring variable importance is the key problem of many scientific stud...
