Density estimation for shift-invariant multidimensional distributions

11/09/2018
by Anindya De, et al.

We study density estimation for classes of shift-invariant distributions over R^d. A multidimensional distribution is "shift-invariant" if, roughly speaking, it is close in total variation distance to a small shift of itself in any direction. Shift-invariance relaxes the smoothness assumptions commonly used in non-parametric density estimation so as to allow jump discontinuities. The different classes of distributions that we consider correspond to different rates of tail decay. For each such class we give an efficient algorithm that learns any distribution in the class from independent samples with respect to total variation distance. As a special case of our general result, we show that d-dimensional shift-invariant distributions which satisfy an exponential tail bound can be learned to total variation distance error ϵ using Õ_d(1/ϵ^{d+2}) examples and Õ_d(1/ϵ^{2d+2}) time. This implies that, for constant d, multivariate log-concave distributions can be learned in Õ_d(1/ϵ^{2d+2}) time using Õ_d(1/ϵ^{d+2}) samples, answering a question of [Diakonikolas, Kane and Stewart, 2016]. All of our results extend to a model of noise-tolerant density estimation based on Huber's contamination model, in which the target distribution to be learned is a (1-ϵ, ϵ) mixture of some unknown distribution in the class with some other arbitrary and unknown distribution, and the learning algorithm must output a hypothesis distribution with total variation distance error O(ϵ) from the target. We show that our general results are close to best possible by proving a simple Ω(1/ϵ^d) information-theoretic lower bound on sample complexity that holds even for learning bounded distributions that are shift-invariant.
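The abstract's informal notion of shift-invariance can be made precise. Below is one natural formalization consistent with the description above; the parameterization by κ is an assumption here, and the paper's exact definition may differ.

```latex
% Hypothetical formalization of shift-invariance (the parameter
% \kappa is an assumption; the paper's exact definition may differ):
% p is \kappa-shift-invariant if translating it by \delta along any
% unit direction v moves it by at most \kappa\delta in total variation.
\[
  d_{\mathrm{TV}}\!\left(p,\ p_{\delta v}\right) \le \kappa\,\delta
  \quad \text{for all unit } v \in \mathbb{R}^d \text{ and } \delta \in [0,1],
  \qquad \text{where } p_{\delta v}(x) = p(x - \delta v).
\]
```

To see where a sample complexity of roughly (1/ϵ)^d enters, the following is a toy sketch, not the paper's algorithm: a histogram estimator over [0,1]^d whose bin width is tied to ϵ, on the heuristic that shift-invariance keeps the discretization error at O(ϵ) in total variation. The bounded-support assumption, the bin-width choice, and the function name histogram_density_estimate are all illustrative.

```python
import numpy as np

def histogram_density_estimate(samples, eps):
    """Toy histogram density estimator on [0,1]^d (illustrative only).

    Heuristic: if the target is shift-invariant, binning at side
    length ~eps should add only O(eps) total-variation error,
    leaving ~(1/eps)^d cell probabilities to estimate empirically.
    """
    n, d = samples.shape
    bins_per_axis = int(np.ceil(1.0 / eps))
    edges = [np.linspace(0.0, 1.0, bins_per_axis + 1) for _ in range(d)]
    counts, _ = np.histogramdd(samples, bins=edges)
    cell_volume = (1.0 / bins_per_axis) ** d
    # Normalize counts into a piecewise-constant density on [0,1]^d.
    return counts / (n * cell_volume)

# Usage: estimate a 2-dimensional density at accuracy eps = 0.1.
rng = np.random.default_rng(0)
data = rng.uniform(size=(10_000, 2))   # stand-in for unknown samples
density = histogram_density_estimate(data, eps=0.1)
print(density.shape)                   # (10, 10) grid of cell densities
```

Since such a gridding already produces ~(1/ϵ)^d cells, the Ω(1/ϵ^d) lower bound stated above is natural; the gap to the paper's Õ_d(1/ϵ^{d+2}) upper bound is only an Õ(1/ϵ^2) factor.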


