An Apparent Paradox: A Classifier Trained from a Partially Classified Sample May Have Smaller Expected Error Rate Than That If the Sample Were Completely Classified

10/21/2019
by Daniel Ahfock, et al.

There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that in a classified feature with known class label. Hence, if the labels of the unclassified features are missing at random, or their missing-label mechanism is simply ignored, the expected error rate of a classifier formed from a partially classified sample is greater than that if the sample were completely classified. We propose to treat the labels of the unclassified features as missing data and to introduce a framework for their missingness in situations where these labels are not missing at random. An examination of several partially classified data sets in the literature suggests that the unclassified features do not occur at random but rather tend to be concentrated in regions of relatively high entropy in the feature space. Here, in the context of two normal classes with a common covariance matrix, we consider the situation where the missingness of the labels of the unclassified features can be modelled by a logistic model in which the probability of a missing label for a feature depends on its entropy. Rather paradoxically, we show that the classifier so formed from the partially classified sample may have smaller expected error rate than that if the sample were completely classified.
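The missingness mechanism described in the abstract can be sketched in a minimal simulation. The sketch below assumes two univariate normal classes with common variance and illustrative coefficients (`mu0`, `mu1`, `sigma`, `xi0`, `xi1` are hypothetical values, not taken from the paper): the probability that a feature's label is missing follows a logistic model in the log entropy of its class posterior, so features near the decision boundary, where entropy is high, are more likely to be unlabelled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two univariate normal classes with common variance (illustrative values).
mu0, mu1, sigma, pi1 = 0.0, 1.5, 1.0, 0.5

n = 1000
z = rng.random(n) < pi1                       # true class labels
x = rng.normal(np.where(z, mu1, mu0), sigma)  # observed features

# Posterior probability of class 1 via the linear discriminant,
# then the Shannon entropy of the class label given the feature.
d = (x - 0.5 * (mu0 + mu1)) * (mu1 - mu0) / sigma**2
p1 = 1.0 / (1.0 + np.exp(-d) * (1 - pi1) / pi1)
p1 = np.clip(p1, 1e-12, 1 - 1e-12)
entropy = -(p1 * np.log(p1) + (1 - p1) * np.log(1 - p1))

# Logistic missingness model in log entropy: labels near the decision
# boundary (high entropy) are more likely to be unobserved.
# xi0, xi1 are hypothetical coefficients, not from the paper.
xi0, xi1 = 2.0, 3.0
p_missing = 1.0 / (1.0 + np.exp(-(xi0 + xi1 * np.log(entropy))))
missing = rng.random(n) < p_missing

print(f"fraction unlabelled: {missing.mean():.2f}")
print(f"mean entropy (unlabelled): {entropy[missing].mean():.3f}")
print(f"mean entropy (labelled):   {entropy[~missing].mean():.3f}")
```

Running the sketch shows the unlabelled features concentrating in the high-entropy region near the decision boundary, matching the pattern the paper reports in real partially classified data sets.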


