Efficiently Learning Structured Distributions from Untrusted Batches

11/05/2019
by   Sitan Chen, et al.

We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution p over {1, ..., n}. Each user sends a batch of k i.i.d. samples from this distribution; however, an ϵ-fraction of users are untrustworthy and can send adversarially chosen responses. The goal is then to learn p in total variation distance. When k = 1, this is the standard robust univariate density estimation setting, and it is well understood that Ω(ϵ) error is unavoidable. Surprisingly, Qiao and Valiant gave an estimator which improves upon this rate when k is large. Unfortunately, their algorithms run in time exponential in either n or k. We first give a sequence of polynomial-time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions. It turns out that this abstraction is quite powerful and can be generalized to incorporate additional problem-specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in n for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g., those arising from VC theory), which may be of independent technical interest.
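To make the setup concrete, the following is a minimal simulation sketch of the untrusted-batches model, not the sum-of-squares algorithm from the paper. All names (`draw_batch`, `simulate_batches`, `naive_estimate`, `bad_dist`) and the parameter values are illustrative assumptions. The adversary here is deliberately simple: every corrupted batch is drawn from a fixed point mass, whereas the model allows arbitrary adversarial batches. The sketch only shows that plain averaging of batch frequencies can incur total variation error on the order of ϵ against such corruption.

```python
import random

def draw_batch(dist, k, rng):
    """Empirical frequencies of k i.i.d. samples from `dist` over {0, ..., n-1}."""
    n = len(dist)
    freq = [0.0] * n
    for x in rng.choices(range(n), weights=dist, k=k):
        freq[x] += 1.0 / k
    return freq

def simulate_batches(p, m, k, eps, bad_dist, rng):
    """Return m batches; the first floor(eps * m) come from `bad_dist`.

    Assumption: the adversary samples from a fixed distribution. A real
    adversary may choose its batches arbitrarily.
    """
    n_bad = int(eps * m)
    bad = [draw_batch(bad_dist, k, rng) for _ in range(n_bad)]
    good = [draw_batch(p, k, rng) for _ in range(m - n_bad)]
    return bad + good

def tv_distance(p, q):
    """Total variation distance between two distributions on the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def naive_estimate(batches):
    """Non-robust baseline: average the per-batch empirical frequencies."""
    m = len(batches)
    return [sum(col) / m for col in zip(*batches)]

rng = random.Random(0)
n, m, k, eps = 10, 1000, 50, 0.1
p = [1.0 / n] * n                    # true distribution: uniform
bad_dist = [1.0] + [0.0] * (n - 1)   # hypothetical adversary: point mass on 0
batches = simulate_batches(p, m, k, eps, bad_dist, rng)
err = tv_distance(p, naive_estimate(batches))
print(f"TV error of naive averaging: {err:.3f}")
```

A robust estimator for this setting would aim to drive the error below what this baseline achieves as k grows; the baseline is included only as a point of comparison.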
