Interpretable Distribution Features with Maximum Testing Power

05/22/2016
by Wittawat Jitkrittum, et al.

Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The features are chosen to maximize the distinguishability of the distributions, by optimizing a lower bound on the power of a statistical test that uses these features. The result is a parsimonious and interpretable indication of how and where two distributions differ locally. An empirical estimate of the test power criterion converges with increasing sample size, ensuring the quality of the returned features. In real-world benchmarks on high-dimensional text and image data, linear-time tests using the proposed semimetrics achieve performance comparable to the state-of-the-art quadratic-time maximum mean discrepancy test, while returning human-interpretable features that explain the test results.
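For concreteness, the sketch below illustrates a mean-embedding-style statistic of the kind the abstract describes: each observation is mapped to its kernel evaluations at J test locations, and a Hotelling-type statistic on the mean difference of these features is computed in time linear in the sample size. This is only a minimal sketch under stated assumptions; the Gaussian kernel choice, the function and variable names, and the fixed random locations are illustrative, whereas the paper's procedure additionally optimizes the locations and kernel parameters by maximizing a lower bound on test power on a held-out split.

```python
# Minimal sketch of a mean-embedding (ME) style two-sample statistic with
# J spatial test locations. Names and the Gaussian-kernel choice are
# illustrative assumptions, not the authors' exact implementation.
import numpy as np
from scipy import stats

def gauss_kernel(X, V, sigma):
    """k(x, v) = exp(-||x - v||^2 / (2 sigma^2)) for all pairs (x in X, v in V)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2.0 * X @ V.T
    return np.exp(-sq / (2.0 * sigma**2))

def me_statistic(X, Y, V, sigma, reg=1e-5):
    """Hotelling-type statistic n * zbar^T (S + reg*I)^{-1} zbar, where
    z_i[j] = k(x_i, v_j) - k(y_i, v_j). Cost is linear in the sample size n."""
    n = min(len(X), len(Y))
    Z = gauss_kernel(X[:n], V, sigma) - gauss_kernel(Y[:n], V, sigma)  # shape (n, J)
    zbar = Z.mean(axis=0)
    S = np.cov(Z, rowvar=False) + reg * np.eye(V.shape[0])
    stat = n * zbar @ np.linalg.solve(S, zbar)
    # For fixed locations, the statistic is asymptotically chi-squared with J dof under H0.
    pval = stats.chi2.sf(stat, df=V.shape[0])
    return stat, pval

# Usage: X, Y are (n, d) samples; V holds J candidate test locations (features).
# In the paper these locations are chosen to maximize a test-power lower bound
# on held-out data; that optimization step is omitted here.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y = rng.normal(0.5, 1.0, size=(500, 2))
V = rng.normal(0.0, 1.0, size=(3, 2))   # J = 3 random locations
print(me_statistic(X, Y, V, sigma=1.0))
```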


Related research:
- Comparing distributions: ℓ_1 geometry improves kernel two-sample testing (09/19/2019)
- An Adaptive Test of Independence with Analytic Kernel Embeddings (10/15/2016)
- Fast Two-Sample Testing with Analytic Representations of Probability Measures (06/15/2015)
- Informative Features for Model Comparison (10/27/2018)
- A Linear-Time Kernel Goodness-of-Fit Test (05/22/2017)
- Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy (11/14/2016)
- Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator (02/17/2018)
