Confidence Intervals for Unobserved Events

11/06/2022
by Amichai Painsky, et al.

Consider a finite sample from an unknown distribution over a countable alphabet. Unobserved events are alphabet symbols that do not appear in the sample. Estimating the probabilities of unobserved events is a basic problem in statistics and related fields, and it has been studied extensively in the context of point estimation. In this work we introduce a novel interval estimation scheme for unobserved events. Our proposed framework applies selective inference, as we construct confidence intervals (CIs) only for the desired set of parameters, namely the probabilities of the symbols that turn out to be unobserved. Interestingly, we show that the obtained CIs are dimension-free, as they do not grow with the alphabet size. Further, we show that these CIs are (almost) tight, in the sense that they cannot be further improved without violating the prescribed coverage rate. We demonstrate the performance of our proposed scheme in synthetic and real-world experiments, showing a significant improvement over the alternatives. Finally, we apply our proposed scheme to large alphabet modeling. We introduce a novel simultaneous CI scheme for large alphabet distributions which outperforms currently known methods while maintaining the prescribed coverage rate.
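To make the setup concrete, the following minimal Python sketch identifies the unobserved symbols in a toy sample and computes the classical Good-Turing point estimate of their total probability (the fraction of the sample made up of symbols seen exactly once). This illustrates the point-estimation setting the abstract refers to; it is not the interval scheme proposed in the paper. The function names, the toy sample, and the finite alphabet are assumptions made purely for illustration.

```python
from collections import Counter


def unobserved_symbols(sample, alphabet):
    """Return the alphabet symbols that never appear in the sample."""
    seen = set(sample)
    return [s for s in alphabet if s not in seen]


def good_turing_missing_mass(sample):
    """Classical Good-Turing point estimate of the total probability
    of all unobserved symbols: (number of singletons) / (sample size).
    Illustrative only; this is not the paper's CI construction."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# Hypothetical toy data: a sample of size 7 over a 6-symbol alphabet.
sample = ["a", "b", "a", "c", "a", "b", "d"]
alphabet = ["a", "b", "c", "d", "e", "f"]

missing = unobserved_symbols(sample, alphabet)
estimate = good_turing_missing_mass(sample)
print("unobserved symbols:", missing)
print("Good-Turing missing-mass estimate:", estimate)
```

A point estimate like this carries no coverage guarantee, which is the gap an interval estimation scheme is meant to address.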

research
03/01/2023

Joint Coverage Regions: Simultaneous Confidence and Prediction Sets

We introduce Joint Coverage Regions (JCRs), which unify confidence inter...
research
08/31/2023

Optimal confidence interval for the difference of proportions

Estimating the probability of the binomial distribution is a basic probl...
research
07/17/2023

Tight Distribution-Free Confidence Intervals for Local Quantile Regression

It is well known that it is impossible to construct useful confidence in...
research
10/16/2022

Inference on Extreme Quantiles of Unobserved Individual Heterogeneity

We develop a methodology for conducting inference on extreme quantiles o...
research
06/18/2020

A Framework for Sample Efficient Interval Estimation with Control Variates

We consider the problem of estimating confidence intervals for the mean ...
research
12/13/2021

Scalable subsampling: computation, aggregation and inference

Subsampling is a general statistical method developed in the 1990s aimed...
research
12/08/2020

Split: Inferring Unobserved Event Probabilities for Disentangling Brand-Customer Interactions

Often, data contains only composite events composed of multiple events, ...
