Missing Mass Estimation from Sticky Channels

02/06/2022
by   Prafulla Chandra, et al.

Distribution estimation under error-prone or non-ideal sampling, modelled as "sticky" channels, has been studied recently, motivated by applications such as DNA computing. Missing mass, the sum of the probabilities of the letters that do not appear in the sample, is an important quantity that plays a crucial role in distribution estimation, particularly in the large-alphabet regime. In this work, we consider the problem of estimating missing mass, which has been well studied under independent and identically distributed (i.i.d.) sampling, in the case when sampling is "sticky". Precisely, we consider the scenario where each sample from an unknown distribution is repeated a geometrically distributed number of times. We characterise the minimax rate of the Mean Squared Error (MSE) of estimating missing mass from such sticky sampling channels. An upper bound on the minimax rate is obtained by bounding the risk of a modified Good-Turing estimator. We derive a matching lower bound on the minimax rate by extending the Le Cam method.
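As a rough illustration of the setting described above, the following Python sketch simulates a sticky channel and computes the true missing mass of the resulting sample. Several details are assumptions made only for this example and are not taken from the paper: the geometric repetition count is supported on {1, 2, ...} with a hypothetical parameter `stay_prob`, `n` counts the underlying i.i.d. draws rather than the channel output length, and all function names are invented. The estimator shown is the classical Good-Turing estimate (fraction of letters seen exactly once), included only as a baseline; it is not the modified Good-Turing estimator analysed in the paper, whose form the abstract does not give.

```python
import random
from collections import Counter


def sticky_samples(probs, n, stay_prob, rng):
    """Draw n i.i.d. letters from `probs`, then repeat each one a
    geometric number of times (assumed support: 1, 2, ...)."""
    letters = rng.choices(range(len(probs)), weights=probs, k=n)
    out = []
    for x in letters:
        repeats = 1
        # Keep repeating with probability stay_prob (assumed channel model).
        while rng.random() < stay_prob:
            repeats += 1
        out.extend([x] * repeats)
    return out


def missing_mass(probs, observed):
    """Sum of the probabilities of letters that never appear in `observed`."""
    seen = set(observed)
    return sum(p for i, p in enumerate(probs) if i not in seen)


def good_turing(observed):
    """Classical Good-Turing estimate: (# letters seen exactly once) / (sample length)."""
    counts = Counter(observed)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observed)


if __name__ == "__main__":
    rng = random.Random(0)
    k = 1000
    probs = [1.0 / k] * k  # uniform distribution over a large alphabet
    sample = sticky_samples(probs, n=500, stay_prob=0.5, rng=rng)
    print("channel output length:", len(sample))
    print("true missing mass:", missing_mass(probs, sample))
    print("naive Good-Turing on raw output:", good_turing(sample))
```

Because repetitions inflate the counts, a letter drawn once from the distribution shows up as a singleton in the raw output only when it is not repeated. The naive estimate therefore tends to understate the missing mass in this simulation, which is the kind of distortion that motivates modifying the estimator for sticky sampling.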

06/25/2018

On consistent estimation of the missing mass

Given n samples from a population of individuals belonging to different ...
02/03/2021

Missing Mass of Rank-2 Markov Chains

Estimation of missing mass with the popular Good-Turing (GT) estimator i...
06/04/2022

Concentration of the missing mass in metric spaces

We study the estimation of the probability to observe data further than ...
03/31/2022

Adaptive Estimation of Random Vectors with Bandit Feedback

We consider the problem of sequentially learning to estimate, in the mea...
02/27/2019

A Good-Turing estimator for feature allocation models

Feature allocation models generalize species sampling models by allowing...
12/08/2020

Adaptive Sampling for Estimating Distributions: A Bayesian Upper Confidence Bound Approach

The problem of adaptive sampling for estimating probability mass functio...
12/05/2017

Estimating linear functionals of a sparse family of Poisson means

Assume that we observe a sample of size n composed of p-dimensional sign...