Novel Deviation Bounds for Mixture of Independent Bernoulli Variables with Application to the Missing Mass

In this paper, we are concerned with obtaining distribution-free concentration inequalities for mixtures of independent Bernoulli variables that incorporate a notion of variance. The missing mass is the total probability mass of the outcomes that have not been observed in a given sample; it is an important quantity that connects density estimates obtained from a sample to the population for discrete distributions. We are therefore specifically motivated to apply our method to study, in a novel way, the concentration of the missing mass, which can be expressed as a mixture of Bernoulli variables. We not only derive, for the first time, Bernstein-like large deviation bounds for the missing mass whose exponents behave almost linearly with respect to deviation size, but also sharpen the bounds of McAllester and Ortiz (2003) and Berend and Kontorovich (2013) for large sample sizes in the case of small deviations, which is the most interesting case in learning theory. In addition, our approach shows that the heterogeneity issue introduced in McAllester and Ortiz (2003) is resolvable in the case of the missing mass, in the sense that one can use standard inequalities, although doing so may not lead to strong results. Thus, we postulate that our results are general and can be applied to provide potentially sharp Bernstein-like bounds under some constraints.
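
To make the definition of the missing mass concrete, the following is a minimal Python sketch and is not taken from the paper. It assumes a known discrete distribution given as a list of probabilities `probs` and an i.i.d. sample of outcome indices; the function names `missing_mass`, `expected_missing_mass`, and `simulate_mean_missing_mass` are hypothetical. It computes the missing mass as a weighted sum of "not seen" indicators, which is the Bernoulli representation the abstract refers to, and compares a Monte Carlo average with the standard closed form for the expectation, the sum over outcomes of p_i (1 - p_i)^n.

```python
import random
from typing import List, Sequence


def missing_mass(probs: Sequence[float], sample: Sequence[int]) -> float:
    """Total probability of the outcomes that never appear in `sample`.

    Equivalent to sum_i probs[i] * Y_i, where Y_i = 1 if outcome i is unseen.
    """
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)


def expected_missing_mass(probs: Sequence[float], n: int) -> float:
    """Closed-form expectation for n i.i.d. draws: sum_i p_i * (1 - p_i)^n."""
    return sum(p * (1.0 - p) ** n for p in probs)


def simulate_mean_missing_mass(
    probs: Sequence[float], n: int, trials: int, seed: int = 0
) -> float:
    """Monte Carlo average of the missing mass over `trials` samples of size n."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    outcomes: List[int] = list(range(len(probs)))
    total = 0.0
    for _ in range(trials):
        sample = rng.choices(outcomes, weights=probs, k=n)
        total += missing_mass(probs, sample)
    return total / trials


if __name__ == "__main__":
    # Illustrative distribution chosen for this sketch only.
    probs = [0.5, 0.3, 0.1, 0.1]
    n = 5
    print("closed-form expectation:", expected_missing_mass(probs, n))
    print("Monte Carlo average:    ", simulate_mean_missing_mass(probs, n, 20000))
```

The concentration inequalities discussed in the abstract bound how far a single realization of `missing_mass` can deviate from this expectation; the sketch only illustrates the quantity itself, not the bounds.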

research
03/10/2015

Novel Bernstein-like Concentration Inequalities for the Missing Mass

We are concerned with obtaining novel concentration inequalities for the...
research
03/20/2015

A Bennett Inequality for the Missing Mass

Novel concentration inequalities are obtained for the missing mass, i.e....
research
05/19/2020

Revisiting Concentration of Missing Mass

We revisit the problem of missing mass concentration, deriving Bernstein...
research
10/05/2021

Estimation and Concentration of Missing Mass of Functions of Discrete Probability Distributions

Given a positive function g from [0,1] to the reals, the function's miss...
research
10/12/2021

Uniform concentration bounds for frequencies of rare events

New Vapnik and Chervonenkis type concentration inequalities are derived fo...
research
08/02/2021

Generalization bounds for nonparametric regression with β-mixing samples

In this paper we present a series of results that permit to extend in a ...
research
03/12/2015

On the Impossibility of Learning the Missing Mass

This paper shows that one cannot learn the probability of rare events wi...