DEDPUL: Method for Mixture Proportion Estimation and Positive-Unlabeled Classification based on Density Estimation
This paper studies Positive-Unlabeled Classification, the problem of semi-supervised binary classification in which the Negative (N) class in the training set is contaminated with instances of the Positive (P) class. We develop a novel method, DEDPUL, that simultaneously solves two problems concerning the contaminated Unlabeled (U) sample: it estimates the proportions of the mixing components (P and N) in U, and it classifies U. In experiments on synthetic and real-world data, DEDPUL compares favorably with current state-of-the-art methods for both problems. We also introduce an automatic procedure for DEDPUL hyperparameter optimization. Additionally, we improve two methods from the literature and achieve DEDPUL-level performance with one of them.
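To make the two problems concrete, the following minimal sketch illustrates the standard mixture view of an Unlabeled sample, f_u(x) = alpha * f_p(x) + (1 - alpha) * f_n(x), where alpha is the proportion of P in U. It is not the DEDPUL procedure itself, whose details are in the full text. It assumes the densities are known in closed form (in practice they would have to be estimated), and the Gaussian parameters, the value of `TRUE_ALPHA`, and all function names are hypothetical choices made for illustration only.

```python
import math

# Hypothetical setup: one-dimensional Gaussian class densities and a known
# proportion of positives in the Unlabeled sample. All values are assumptions.
TRUE_ALPHA = 0.3


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def f_p(x):
    """Density of the Positive class (assumed known here)."""
    return normal_pdf(x, 2.0, 1.0)


def f_n(x):
    """Density of the Negative class (used only to build the mixture)."""
    return normal_pdf(x, -2.0, 1.0)


def f_u(x):
    """Density of the Unlabeled sample: a mixture of P and N."""
    return TRUE_ALPHA * f_p(x) + (1.0 - TRUE_ALPHA) * f_n(x)


def alpha_upper_bound(grid):
    """Problem 1: since f_u >= alpha * f_p everywhere, alpha cannot exceed
    the smallest value of f_u(x) / f_p(x) over the grid."""
    return min(f_u(x) / f_p(x) for x in grid)


def posterior_positive(x, alpha):
    """Problem 2: probability that an Unlabeled point x is Positive,
    by Bayes' rule, clipped to 1 for numerical safety."""
    return min(1.0, alpha * f_p(x) / f_u(x))


grid = [i / 10.0 for i in range(-60, 61)]
alpha_hat = alpha_upper_bound(grid)

print(f"estimated upper bound on alpha: {alpha_hat:.4f}")
print(f"P(positive | x =  2): {posterior_positive(2.0, alpha_hat):.4f}")
print(f"P(positive | x = -2): {posterior_positive(-2.0, alpha_hat):.4f}")
```

In this toy setting the classes are well separated, so the bound is essentially attained and the posterior is close to 1 near the Positive mode and close to 0 near the Negative mode. With estimated densities and overlapping classes, the ratio is noisy and the bound may be loose.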