On missing label patterns in semi-supervised learning

04/05/2019
by Daniel Ahfock, et al.
We investigate model-based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse semi-supervised learning as a missing data problem and identify situations where the missing label pattern is non-ignorable for the purposes of maximum likelihood estimation. In particular, we find that a relationship between classification difficulty and the missing label pattern implies a non-ignorable missingness mechanism. We examine a number of real datasets and conclude that the pattern of missing labels is related to the difficulty of classification. We propose a joint modelling strategy involving the observed data and the missing label mechanism to account for the systematically missing labels. Full likelihood inference that includes the missing label mechanism can improve the efficiency of parameter estimation and increase classification accuracy.
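
To make the idea of a joint likelihood concrete, here is a minimal sketch and not the authors' implementation. It assumes a one-dimensional Gaussian mixture and, as a purely illustrative choice, a logistic missingness model driven by the entropy of the posterior class probabilities (a stand-in for "classification difficulty"). All function names and the parameters `xi0` and `xi1` are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(x, weights, means, sds):
    """Return (posterior class probabilities, marginal mixture density) at x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint], total


def entropy(probs):
    """Shannon entropy of a probability vector; large when classes are hard to tell apart."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def missing_prob(x, weights, means, sds, xi0, xi1):
    """Assumed (hypothetical) mechanism: P(label missing | x) is logistic in the entropy."""
    probs, _ = posterior(x, weights, means, sds)
    return 1.0 / (1.0 + math.exp(-(xi0 + xi1 * entropy(probs))))


def full_loglik(xs, labels, weights, means, sds, xi0, xi1):
    """Joint log-likelihood of the features, the observed labels, and the missingness pattern.

    labels[i] is a class index, or None when the label is missing.
    """
    total = 0.0
    for x, label in zip(xs, labels):
        q = missing_prob(x, weights, means, sds, xi0, xi1)
        if label is None:
            # Unlabelled point: missingness term plus the marginal mixture density.
            _, marginal = posterior(x, weights, means, sds)
            total += math.log(q) + math.log(marginal)
        else:
            # Labelled point: non-missingness term plus the class-specific joint density.
            density = weights[label] * normal_pdf(x, means[label], sds[label])
            total += math.log(1.0 - q) + math.log(density)
    return total


# Toy usage: points near the class boundary (x = 0) are the ones left unlabelled.
weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
xs = [-2.1, -0.1, 0.2, 1.8]
labels = [0, None, None, 1]
value = full_loglik(xs, labels, weights, means, sds, xi0=-2.0, xi1=5.0)
```

Under this assumed mechanism, the missingness probability depends on the mixture parameters through the posterior entropy, so the terms `log(q)` and `log(1 - q)` carry information about those parameters. Dropping them would correspond to treating the missing labels as ignorable.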
