Non-parametric targeted Bayesian estimation of class proportions in unlabeled data

11/22/2019
by Iván Díaz, et al.

We introduce a novel Bayesian estimator for the class proportion in an unlabeled dataset, based on the targeted learning framework. Our procedure requires the specification of a prior (and outputs a posterior) only for the target of inference, instead of the prior (and posterior) on the full-data distribution employed by classical non-parametric Bayesian methods. When the scientific question can be characterized by a low-dimensional parameter functional, focusing on a prior and posterior distribution for that parameter is more aligned with Bayesian subjectivism than focusing on entire data distributions. We prove a Bernstein-von Mises-type result for our proposed Bayesian procedure, which guarantees that the posterior distribution converges to the distribution of an efficient, asymptotically linear estimator. In particular, the posterior is Gaussian, doubly robust, and efficient in the limit, under the sole assumption that certain nuisance parameters are estimated at slow rates. We perform numerical studies illustrating the frequentist properties of the method. We also illustrate its use in a motivating application: estimating the proportion of embolic strokes of undetermined source that arise from occult cardiac sources or large-artery atherosclerotic lesions. Though we focus on the motivating example of the proportion of cases in an unlabeled dataset, the procedure is general and can be adapted to estimate any pathwise differentiable parameter in a non-parametric model.
