Testing Identity of Multidimensional Histograms

04/10/2018
by Ilias Diakonikolas, et al.

We investigate the problem of identity testing for multidimensional histogram distributions. A distribution p: D → R_+, where D ⊆ R^d, is called a k-histogram if there exists a partition of the domain into k axis-aligned rectangles such that p is constant within each such rectangle. Histograms are one of the most fundamental non-parametric families of distributions and have been extensively studied in computer science and statistics. We give the first identity tester for this problem with sub-learning sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. More specifically, let q be an unknown d-dimensional k-histogram and p be an explicitly given k-histogram. We want to correctly distinguish, with probability at least 2/3, between the case that p = q and the case that ‖p − q‖_1 ≥ ϵ. We design a computationally efficient algorithm for this hypothesis testing problem with sample complexity O((√k/ϵ^2) log^{O(d)}(k/ϵ)). Our algorithm is robust to model misspecification, i.e., it succeeds even if q is only promised to be close to a k-histogram. Moreover, for k = 2^{Ω(d)}, we show a nearly-matching sample complexity lower bound of Ω((√k/ϵ^2) (log(k/ϵ)/d)^{Ω(d)}) when d ≥ 2. Prior to our work, the sample complexity of the d = 1 case was well-understood, but no algorithm with sub-learning sample complexity was known, even for d = 2. Our new upper and lower bounds have interesting conceptual implications regarding the relation between learning and testing in this setting.
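
To make the definitions above concrete, the following is a minimal Python sketch of a k-histogram and of the L1 distance that separates the two hypotheses. It is an illustration only and not the tester from the paper: the class name `Histogram`, the function `l1_distance`, and the example distributions are hypothetical, and the rectangles are assumed to be disjoint half-open boxes that cover the same domain for both distributions. The distance is computed exactly by evaluating both densities on the grid induced by all rectangle boundaries, since both histograms are constant on each grid cell.

```python
import itertools
import numpy as np


class Histogram:
    """A k-histogram: k disjoint axis-aligned boxes [lo, hi) with constant density on each."""

    def __init__(self, rects, masses):
        # rects: list of (lo, hi) pairs, each a length-d sequence of coordinates.
        # masses: probability mass assigned to each rectangle (should sum to 1).
        self.lo = np.array([r[0] for r in rects], dtype=float)  # shape (k, d)
        self.hi = np.array([r[1] for r in rects], dtype=float)  # shape (k, d)
        self.masses = np.array(masses, dtype=float)
        volumes = np.prod(self.hi - self.lo, axis=1)
        self.dens = self.masses / volumes  # constant density per rectangle

    def density(self, x):
        x = np.asarray(x, dtype=float)
        inside = np.all((self.lo <= x) & (x < self.hi), axis=1)
        # Rectangles are assumed disjoint, so at most one entry is selected.
        return float(self.dens[inside].sum())


def l1_distance(p, q):
    """Exact L1 distance between two histograms via their common grid refinement."""
    d = p.lo.shape[1]
    # Sorted, de-duplicated breakpoints of both histograms along each coordinate.
    cuts = [
        np.unique(np.concatenate([p.lo[:, j], p.hi[:, j], q.lo[:, j], q.hi[:, j]]))
        for j in range(d)
    ]
    total = 0.0
    for idx in itertools.product(*[range(len(c) - 1) for c in cuts]):
        lo = np.array([cuts[j][i] for j, i in enumerate(idx)])
        hi = np.array([cuts[j][i + 1] for j, i in enumerate(idx)])
        mid = (lo + hi) / 2  # both densities are constant on this cell
        total += abs(p.density(mid) - q.density(mid)) * float(np.prod(hi - lo))
    return total


# Example in d = 2: p is uniform on the unit square (a 1-histogram),
# q puts mass 0.7 on the left half and 0.3 on the right half (a 2-histogram).
p = Histogram([((0, 0), (1, 1))], [1.0])
q = Histogram([((0, 0), (0.5, 1)), ((0.5, 0), (1, 1))], [0.7, 0.3])
dist = l1_distance(p, q)  # approximately 0.4
```

In this toy example the two distributions are 0.4 apart in L1, so a tester run with ϵ ≤ 0.4 would be required to reject the hypothesis p = q. The grid refinement can have many more cells than either histogram has rectangles, so this brute-force computation is only meant for small examples where both distributions are known explicitly; in the testing problem q is accessible only through samples.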

research
07/14/2022

Near-Optimal Bounds for Testing Histogram Distributions

We investigate the problem of testing whether a discrete probability dis...
research
12/31/2020

The Sample Complexity of Robust Covariance Testing

We study the problem of testing the covariance matrix of a high-dimensio...
research
02/23/2018

Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

We study the problem of robustly learning multi-dimensional histograms. ...
research
06/01/2019

Graph-based Discriminators: Sample Complexity and Expressiveness

A basic question in learning theory is to identify if two distributions ...
research
04/27/2020

Testing Data Binnings

Motivated by the question of data quantization and "binning," we revisit...
research
12/29/2020

Testing Product Distributions: A Closer Look

We study the problems of identity and closeness testing of n-dimensional...
research
02/24/2019

Testing Preferential Domains Using Sampling

A preferential domain is a collection of sets of preferences which are l...
