Improved Estimation of Class Prior Probabilities through Unlabeled Data

10/06/2015
by Norman Matloff, et al.

Work in the classification literature has shown that, in computing a classification function, one need not know the class membership of all observations in the training set; the unlabeled observations still provide information on the marginal distribution of the feature set, and can thus contribute to increased classification accuracy for future observations. This paper shows that the same scheme can also be used to estimate class prior probabilities, which is useful in applications where determining class membership is difficult or expensive. Both parametric and nonparametric estimators are developed. Asymptotic distributions of the estimators are derived, and it is proven that using the unlabeled observations reduces asymptotic variance. The methodology is also extended to the estimation of subclass probabilities.
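
The abstract does not spell out the estimators, so the following is only a minimal sketch of the general idea, not the paper's method. It relies on the identity P(Y = 1) = E[P(Y = 1 | X)]: a model for P(Y = 1 | X) is fitted on the labeled observations, and its predictions are then averaged over all feature values, labeled and unlabeled alike. The logistic model, the single feature, the simulated data, and every function name below are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, steps=1000, lr=0.5):
    """Fit P(Y=1 | X=x) = sigmoid(a + b*x) by gradient ascent
    on the mean log-likelihood of the labeled data."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(a + b * x)
            grad_a += resid
            grad_b += resid * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def prior_labeled_only(ys):
    """Baseline: sample proportion of class 1 among labeled data."""
    return sum(ys) / len(ys)


def prior_with_unlabeled(x_lab, y_lab, x_unlab):
    """Illustrative semi-supervised estimate: average the fitted
    P(Y=1 | X) over labeled and unlabeled feature values."""
    a, b = fit_logistic(x_lab, y_lab)
    all_x = list(x_lab) + list(x_unlab)
    return sum(sigmoid(a + b * x) for x in all_x) / len(all_x)


def simulate(seed=0, n_lab=500, n_unlab=5000, true_prior=0.3):
    """Hypothetical data: X | Y=1 ~ N(1, 1) and X | Y=0 ~ N(-1, 1),
    for which the logistic model above is correctly specified."""
    rng = random.Random(seed)

    def draw(n):
        xs, ys = [], []
        for _ in range(n):
            y = 1 if rng.random() < true_prior else 0
            xs.append(rng.gauss(1.0 if y else -1.0, 1.0))
            ys.append(y)
        return xs, ys

    x_lab, y_lab = draw(n_lab)
    x_unlab, _ = draw(n_unlab)  # labels of this sample are discarded
    return (prior_labeled_only(y_lab),
            prior_with_unlabeled(x_lab, y_lab, x_unlab))
```

Calling `simulate()` returns the labeled-only estimate and the estimate that also uses the unlabeled features, so the two can be compared against the simulated prior of 0.3 over repeated seeds.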

research
09/15/2018

Alternate Estimation of a Classifier and the Class-Prior from Positive and Unlabeled Data

We consider a problem of learning a binary classifier only from positive...
research
04/18/2019

Asymptotic normality of generalized maximum spacing estimators for multivariate observations

In this paper, the maximum spacing method is considered for multivariate...
research
05/08/2022

Bayesian Estimation of Multinomial Cell Probabilities Incorporating Information from Aggregated Observations

In this note, we consider the problem of estimating multinomial cell pro...
research
01/30/2018

Mixture Proportion Estimation for Positive--Unlabeled Learning via Classifier Dimension Reduction

Positive--unlabeled (PU) learning considers two samples, a positive set ...
research
10/15/2017

Estimation of Squared-Loss Mutual Information from Positive and Unlabeled Data

Capturing input-output dependency is an important task in statistical da...
research
01/08/2016

Nonparametric semi-supervised learning of class proportions

The problem of developing binary classifiers from positive and unlabeled...
research
07/08/2021

Moment-based density and risk estimation from grouped summary statistics

Data on a continuous variable are often summarized by means of histogram...
