Optimal estimation of high-order missing masses, and the rare-type match problem

06/26/2023 ∙ by Stefano Favaro, et al.

Consider a random sample $(X_1,\ldots,X_n)$ from an unknown discrete distribution $P=\sum_{j\geq 1}p_j\delta_{s_j}$ on a countable alphabet $\mathbb{S}$, and let $(Y_{n,j})_{j\geq 1}$ be the empirical frequencies of the distinct symbols $s_j$ in the sample. We consider the problem of estimating the $r$-order missing mass, which is a discrete functional of $P$ defined as $\theta_r(P;\mathbf{X}_n)=\sum_{j\geq 1}p_j^r I(Y_{n,j}=0)$. This is a generalization of the missing mass, whose estimation is a classical problem in statistics and the subject of numerous studies in both theory and methods. First, we introduce a nonparametric estimator of $\theta_r(P;\mathbf{X}_n)$ and a corresponding non-asymptotic confidence interval, obtained through concentration properties of $\theta_r(P;\mathbf{X}_n)$. Then, we investigate minimax estimation of $\theta_r(P;\mathbf{X}_n)$, which is the main contribution of our work. We show that minimax estimation is not feasible over the class of all discrete distributions on $\mathbb{S}$, and not even for distributions with regularly varying tails, which only guarantee that our estimator is consistent for $\theta_r(P;\mathbf{X}_n)$. This leads us to introduce the stronger assumption of second-order regular variation for the tail behaviour of $P$, which is proved to be sufficient for minimax estimation of $\theta_r(P;\mathbf{X}_n)$, making the proposed estimator an optimal minimax estimator of $\theta_r(P;\mathbf{X}_n)$. Our interest in the $r$-order missing mass arises from forensic statistics, where the estimation of the 2-order missing mass appears in connection with the estimation of the likelihood ratio $T(P,\mathbf{X}_n)=\theta_1(P;\mathbf{X}_n)/\theta_2(P;\mathbf{X}_n)$, known as the "fundamental problem of forensic mathematics". We present theoretical guarantees for nonparametric estimation of $T(P,\mathbf{X}_n)$.
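To make the definition concrete, the sketch below evaluates $\theta_r(P;\mathbf{X}_n)$ directly from its formula for a distribution that is known in advance, which is only possible in simulation. The distribution (a truncated power law on 1000 symbols), the sample size, the seed, and the function names are all illustrative assumptions and do not come from the paper. The `good_turing` helper is the classical Good-Turing estimator of the ordinary ($r=1$) missing mass, included only as a familiar point of comparison; it is not the estimator proposed by the authors.

```python
import random
from collections import Counter


def r_order_missing_mass(probs, sample, r):
    """theta_r(P; X_n) = sum_j p_j^r * I(Y_{n,j} = 0).

    probs maps each symbol s_j to its probability p_j (assumed known here).
    """
    counts = Counter(sample)  # Y_{n,j}; symbols not in the sample count as 0
    return sum(p ** r for s, p in probs.items() if counts[s] == 0)


def good_turing(sample):
    """Classical Good-Turing estimate of the (1-order) missing mass:
    the number of symbols seen exactly once, divided by n."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


# Hypothetical P: p_j proportional to j^(-2) on symbols 1..1000 (an assumption).
K = 1000
weights = [1.0 / j ** 2 for j in range(1, K + 1)]
total = sum(weights)
probs = {j: w / total for j, w in zip(range(1, K + 1), weights)}

rng = random.Random(0)  # fixed seed so the run is reproducible
sample = rng.choices(list(probs), weights=list(probs.values()), k=200)

theta_1 = r_order_missing_mass(probs, sample, 1)
theta_2 = r_order_missing_mass(probs, sample, 2)

print("theta_1 (true missing mass):", theta_1)
print("theta_2 (true 2-order missing mass):", theta_2)
print("Good-Turing estimate of theta_1:", good_turing(sample))
if theta_2 > 0:
    # Ratio T(P, X_n) = theta_1 / theta_2 as written in the abstract.
    print("T = theta_1 / theta_2:", theta_1 / theta_2)
```

Because every $p_j \leq 1$, the 2-order missing mass can never exceed the ordinary missing mass, so `theta_2 <= theta_1` holds for any sample.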

research
∙ 06/25/2018

On consistent estimation of the missing mass

Given n samples from a population of individuals belonging to different ...
research
∙ 10/05/2021

Estimation and Concentration of Missing Mass of Functions of Discrete Probability Distributions

Given a positive function g from [0,1] to the reals, the function's miss...
research
∙ 06/04/2022

Concentration of the missing mass in metric spaces

We study the estimation of the probability to observe data further than ...
research
∙ 02/21/2022

Asymptotic properties of the normalized discrete associated-kernel estimator for probability mass function

Discrete kernel smoothing is now gaining importance in nonparametric sta...
research
∙ 04/07/2021

Near-optimal estimation of the unseen under regularly varying tail populations

Given n samples from a population of individuals belonging to different ...
research
∙ 02/06/2022

Missing Mass Estimation from Sticky Channels

Distribution estimation under error-prone or non-ideal sampling modelled...
research
∙ 05/13/2023

On Semi-Supervised Estimation of Distributions

We study the problem of estimating the joint probability mass function (...
