Estimating location parameters in entangled single-sample distributions

07/06/2019
by Ankit Pensia, et al.

We consider the problem of estimating the common mean of independently sampled data, where samples are drawn in a possibly non-identical manner from symmetric, unimodal distributions with a common mean. This generalizes the setting of Gaussian mixture modeling, since the number of distinct mixture components may diverge with the number of observations. We propose an estimator that adapts to the level of heterogeneity in the data, achieving near-optimality in both the i.i.d. setting and some heterogeneous settings, where the fraction of "low-noise" points is as small as (log n)/n. Our estimator is a hybrid of the modal interval, shorth, and median estimators from classical statistics; however, the key technical contributions rely on novel empirical process theory results that we derive for independent but non-i.i.d. data. In the multivariate setting, we generalize our theory to mean estimation for mixtures of radially symmetric distributions, and derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we describe an extension of our estimators applicable to linear regression. In the multivariate mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.
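For readers unfamiliar with the three classical estimators named in the abstract, the following is a minimal Python sketch of each one in the univariate case. It is not the paper's hybrid estimator, whose construction is not given on this page. The function names, the default window size `k`, and the `width` parameter are illustrative assumptions, and the shorth is taken here as the midpoint of the shortest interval containing `k` points (some texts instead use the mean of the points in that interval).

```python
import random
import statistics


def shorth_midpoint(xs, k=None):
    """Midpoint of the shortest interval that contains k of the points.

    k defaults to ceil(n / 2), i.e. the shortest "half" of the sample.
    """
    s = sorted(xs)
    n = len(s)
    if k is None:
        k = (n + 1) // 2
    # Slide a window over k consecutive order statistics and keep the
    # window with the smallest length.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return 0.5 * (s[best] + s[best + k - 1])


def modal_interval_center(xs, width):
    """Center of a closed interval of length `width` holding the most points.

    It suffices to try intervals whose left endpoint is a data point.
    """
    s = sorted(xs)
    best_count, best_left = 0, s[0]
    j = 0
    for i in range(len(s)):
        # Advance j to the first point that lies beyond s[i] + width.
        while j < len(s) and s[j] <= s[i] + width:
            j += 1
        if j - i > best_count:
            best_count, best_left = j - i, s[i]
    return best_left + width / 2.0


if __name__ == "__main__":
    # Hypothetical heterogeneous sample: a few low-noise points and many
    # high-noise points, all centered at the same location mu.
    rng = random.Random(0)
    mu = 3.0
    sample = [rng.gauss(mu, 0.01) for _ in range(10)]
    sample += [rng.gauss(mu, 100.0) for _ in range(990)]
    print("median        :", statistics.median(sample))
    print("shorth (k=10) :", shorth_midpoint(sample, k=10))
    print("modal interval:", modal_interval_center(sample, width=0.05))
```

The median uses every point equally, while the shorth and modal interval look for a tight cluster, which is why estimators of this kind can benefit from a small group of low-noise points; how the paper combines them is described in the full text.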

research
07/26/2019

Robust multivariate mean estimation: the optimality of trimmed mean

We consider the problem of estimating the mean of a random vector based ...
research
04/20/2020

Learning Entangled Single-Sample Distributions via Iterative Trimming

In the setting of entangled single-sample distributions, the goal is to ...
research
06/10/2019

Mean estimation and regression under heavy-tailed distributions--a survey

We survey some of the recent advances in mean estimation and regression ...
research
09/17/2023

L^1 Estimation: On the Optimality of Linear Estimators

Consider the problem of estimating a random variable X from noisy observ...
research
07/05/2016

Efficient Estimation in the Tails of Gaussian Copulas

We consider the question of efficient estimation in the tails of Gaussia...
research
07/28/2023

Mean Estimation with User-level Privacy under Data Heterogeneity

A key challenge in many modern data analysis tasks is that user data are...
research
10/24/2022

Subspace Recovery from Heterogeneous Data with Non-isotropic Noise

Recovering linear subspaces from data is a fundamental and important tas...
