
Testing Identity of Multidimensional Histograms

by Ilias Diakonikolas, et al.
University of California, San Diego

We investigate the problem of identity testing for multidimensional histogram distributions. A distribution p: D → ℝ₊, where D ⊆ ℝ^d, is called a k-histogram if there exists a partition of the domain into k axis-aligned rectangles such that p is constant within each such rectangle. Histograms are one of the most fundamental non-parametric families of distributions and have been extensively studied in computer science and statistics. We give the first identity tester for this problem with sub-learning sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound. More specifically, let q be an unknown d-dimensional k-histogram and p be an explicitly given k-histogram. We want to correctly distinguish, with probability at least 2/3, between the case that p = q versus ‖p − q‖_1 ≥ ϵ. We design a computationally efficient algorithm for this hypothesis testing problem with sample complexity O((√k/ϵ²) · log^O(d)(k/ϵ)). Our algorithm is robust to model misspecification, i.e., it succeeds even if q is only promised to be close to a k-histogram. Moreover, for k = 2^Ω(d), we show a nearly-matching sample complexity lower bound of Ω((√k/ϵ²) · (log(k/ϵ)/d)^Ω(d)) when d ≥ 2. Prior to our work, the sample complexity of the d = 1 case was well-understood, but no algorithm with sub-learning sample complexity was known, even for d = 2. Our new upper and lower bounds have interesting conceptual implications regarding the relation between learning and testing in this setting.
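To make the objects in the abstract concrete, here is a minimal sketch of a k-histogram as a density that is piecewise constant on axis-aligned rectangles, together with the exact ‖p − q‖_1 distance in the easy special case where both histograms share the same partition. The names (Rect, KHistogram) and the shared-partition restriction are our own illustrative choices, not constructions from the paper, and this computes the distance from explicit descriptions rather than performing the sample-based test the paper studies.

```python
# Illustrative sketch of the abstract's definitions; not the paper's algorithm.
from dataclasses import dataclass
from math import prod

@dataclass(frozen=True)
class Rect:
    lows: tuple   # lower corner, one coordinate per dimension
    highs: tuple  # upper corner (rectangle is the product of [low, high) intervals)

    def volume(self):
        return prod(h - l for l, h in zip(self.lows, self.highs))

    def contains(self, x):
        return all(l <= xi < h for l, xi, h in zip(self.lows, x, self.highs))

class KHistogram:
    """A density on D ⊆ R^d that is constant on each of k axis-aligned rectangles."""
    def __init__(self, pieces):
        self.pieces = pieces  # list of (Rect, density) pairs partitioning the domain

    def pdf(self, x):
        for rect, density in self.pieces:
            if rect.contains(x):
                return density
        return 0.0

def l1_distance_shared_partition(p, q):
    """Exact ||p - q||_1 when p and q are constant on the same k rectangles."""
    return sum(rect.volume() * abs(dp - dq)
               for (rect, dp), (_, dq) in zip(p.pieces, q.pieces))

# A 2-histogram on [0,1]^2: left half and right half of the unit square.
left = Rect((0.0, 0.0), (0.5, 1.0))
right = Rect((0.5, 0.0), (1.0, 1.0))
p = KHistogram([(left, 1.2), (right, 0.8)])   # integrates to 1
q = KHistogram([(left, 0.8), (right, 1.2)])   # same partition, densities swapped
print(l1_distance_shared_partition(p, q))     # ≈ 0.4, so an ϵ = 0.4 tester must reject
```

For histograms with different partitions, one would first take the common refinement of the two sets of rectangles (still a histogram, with more pieces) and apply the same formula; the point of the paper is that this distance can be tested from samples of q alone, without learning q.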
