Optimal Mean Estimation without a Variance

11/24/2020 ∙ by Yeshwanth Cherapanamjeri, et al.

We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist. Concretely, given a sample $\mathbf{X} = \{X_i\}_{i=1}^n$ from a distribution $\mathcal{D}$ over $\mathbb{R}^d$ with mean $\mu$ which satisfies the following weak-moment assumption for some $\alpha \in [0, 1]$: $\forall \|v\| = 1: \mathbb{E}_{X \sim \mathcal{D}}\left[|\langle X - \mu, v \rangle|^{1 + \alpha}\right] \le 1$, and given a target failure probability $\delta$, our goal is to design an estimator which attains the smallest possible confidence interval as a function of $n, d, \delta$. For the specific case of $\alpha = 1$, foundational work of Lugosi and Mendelson exhibits an estimator achieving subgaussian confidence intervals, and subsequent work has led to computationally efficient versions of this estimator. Here, we study the case of general $\alpha$, and establish the following information-theoretic lower bound on the optimal attainable confidence interval: $\Omega\left(\sqrt{d/n} + (d/n)^{\alpha/(1 + \alpha)} + \left(\log(1/\delta)/n\right)^{\alpha/(1 + \alpha)}\right)$. Moreover, we devise a computationally efficient estimator which achieves this lower bound.
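
To see how the three terms of the lower bound trade off in $n$, $d$, $\delta$ and $\alpha$, the short Python sketch that follows simply evaluates the rate expression. It is not the authors' estimator. The function name `confidence_rate` is hypothetical, and the constants hidden by $\Omega(\cdot)$ are set to 1, an assumption, so only the scaling of the output is meaningful.

```python
import math


def confidence_rate(n: int, d: int, delta: float, alpha: float) -> float:
    """Evaluate sqrt(d/n) + (d/n)^(a/(1+a)) + (log(1/delta)/n)^(a/(1+a)).

    Assumption: the constants hidden by the Omega(.) bound are taken to be 1,
    so the value indicates scaling only, not an exact confidence width.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive")

    # Exponent shared by the two heavy-tail terms.
    p = alpha / (1.0 + alpha)
    dimension_term = (d / n) ** p
    confidence_term = (math.log(1.0 / delta) / n) ** p
    return math.sqrt(d / n) + dimension_term + confidence_term


if __name__ == "__main__":
    # Compare a finite-variance setting (alpha = 1) with a heavier tail.
    for a in (1.0, 0.5, 0.25):
        print(a, confidence_rate(n=10_000, d=100, delta=0.01, alpha=a))
```

With $\alpha = 1$ the shared exponent is $1/2$, so the expression reduces to $2\sqrt{d/n} + \sqrt{\log(1/\delta)/n}$, the subgaussian shape of the Lugosi and Mendelson case up to constants. As $\alpha$ decreases, the exponent $\alpha/(1+\alpha)$ shrinks and the heavy-tail terms decay more slowly in $n$; at $\alpha = 0$ they do not decay at all.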

