Extrapolating the profile of a finite population

by Soham Jana, et al.

We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of k individuals, each belonging to one of k types (some types can be empty). Without any structural restrictions, it is impossible to learn the composition of the full population having observed only a small (random) subsample of size m = o(k). Nevertheless, we show that in the sublinear regime of m = ω(k/log k), it is possible to consistently estimate in total variation the profile of the population, defined as the empirical distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove that in the linear regime of m = ck, for any constant c, the optimal rate is Θ(1/log k). Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size k. We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality. The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.
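To make the definition of the profile concrete, here is a minimal Python sketch. It is an illustration only and not the paper's estimator: it computes the profile of a labelled population, draws a subsample, and measures the total variation distance between two profiles. The function names (`profile`, `subsample`, `total_variation`), the normalization over all k types (empty types counted as size 0), and sampling without replacement are assumptions made for this example; the abstract does not fix these details.

```python
import random
from collections import Counter


def profile(labels, k):
    """Empirical distribution of type sizes over k types.

    Assumption: the profile is normalized by the number of types k,
    and types that do not appear are counted as having size 0.
    """
    type_sizes = Counter(labels)                 # type -> number of individuals
    size_counts = Counter(type_sizes.values())   # size -> number of types of that size
    size_counts[0] = k - len(type_sizes)         # empty types
    return {s: c / k for s, c in sorted(size_counts.items()) if c > 0}


def subsample(labels, m, rng):
    """Draw m individuals uniformly at random without replacement (assumed sampling model)."""
    return rng.sample(labels, m)


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


if __name__ == "__main__":
    rng = random.Random(0)
    k = 1000
    # Hypothetical population: k individuals, each assigned one of k types at random.
    population = [rng.randrange(k) for _ in range(k)]
    sample = subsample(population, k // 10, rng)
    print("population profile:", profile(population, k))
    print("sample profile:    ", profile(sample, k))
    print("TV distance:       ", total_variation(profile(population, k), profile(sample, k)))
```

The profile of the raw subsample is generally a poor proxy for the population profile, since most types are unobserved when m is small. That gap is what the minimum distance estimator described in the abstract is designed to close; it is not implemented here.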



Sample complexity of population recovery

The problem of population recovery refers to estimating a distribution b...

Quantitative Group Testing in the Sublinear Regime

The quantitative group testing (QGT) problem deals with efficiently iden...

Fast Learning Rate of lp-MKL and its Minimax Optimality

In this paper, we give a new sharp generalization bound of lp-MKL which ...

The Log-Concave Maximum Likelihood Estimator is Optimal in High Dimensions

We study the problem of learning a d-dimensional log-concave distributio...

Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Discrete Distributions

The profile of a sample is the multiset of its symbol frequencies. We sh...

The exponential distribution analog of the Grubbs--Weaver method

Grubbs and Weaver (JASA 42 (1947) 224--241) suggest a minimum-variance u...

Cutoff profile of the Metropolis biased card shuffling

We consider the Metropolis biased card shuffling (also called the multi-...