Semi-Supervised Learning with Multiple Imputations on Non-Random Missing Labels

08/15/2023
by Jason Lu, et al.

Semi-Supervised Learning (SSL) trains algorithms on both labeled and unlabeled data. It is a very common application of machine learning because obtaining a fully labeled dataset is often unrealistic. Researchers have studied three main missing-label settings: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). The MNAR problem is the most challenging of the three because one cannot safely assume that all class distributions are equal. Existing methods, including Class-Aware Imputation (CAI) and Class-Aware Propensity (CAP), mostly overlook the non-randomness in the unlabeled data. This paper proposes two new methods that combine multiple imputation models to achieve higher accuracy and lower bias. 1) We use multiple imputation models, construct confidence intervals, and apply a threshold to discard pseudo-labels with low confidence. 2) Our new method, SSL with De-biased Imputations (SSL-DI), aims to reduce bias by filtering out inaccurate data and finding a subset that is accurate and reliable. This subset of the larger dataset can then be fed into another SSL model, which will be less biased. The proposed models are shown to be effective in both MCAR and MNAR situations, and experimental results show that our methodology outperforms existing methods in classification accuracy and bias reduction.
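
The first method in the abstract (several imputation models, a confidence interval, a threshold on pseudo-labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_pseudo_labels`, the normal-approximation interval on the mean class probability, the rule of thresholding the interval's lower bound, and the default values of `threshold` and `z` are all assumptions made for illustration.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.9, z=1.96):
    """Hypothetical helper: keep pseudo-labels whose confidence interval
    lower bound clears a threshold.

    probs: array of shape (n_models, n_samples, n_classes) holding the class
           probabilities predicted by each imputation model.
    Returns (labels, lower, keep), each of length n_samples.
    """
    probs = np.asarray(probs, dtype=float)
    n_models, n_samples, _ = probs.shape
    if n_models < 2:
        raise ValueError("need at least two imputation models for an interval")

    # Average the models and take the most probable class as the pseudo-label.
    mean = probs.mean(axis=0)
    labels = mean.argmax(axis=1)

    # Probability each model assigns to the chosen class: (n_models, n_samples).
    idx = np.arange(n_samples)
    top = probs[:, idx, labels]

    # Assumed interval: normal approximation around the mean across models.
    half_width = z * top.std(axis=0, ddof=1) / np.sqrt(n_models)
    lower = top.mean(axis=0) - half_width

    # Assumed rule: discard a pseudo-label if the lower bound is below threshold.
    keep = lower >= threshold
    return labels, lower, keep


# Toy input: three models, three unlabeled samples, two classes.
example_probs = np.array([
    [[0.97, 0.03], [0.60, 0.40], [0.05, 0.95]],
    [[0.95, 0.05], [0.40, 0.60], [0.04, 0.96]],
    [[0.96, 0.04], [0.55, 0.45], [0.06, 0.94]],
])
labels, lower, keep = select_pseudo_labels(example_probs)
```

In the toy input the models agree closely on the first and third samples and disagree on the second, so only the second pseudo-label is discarded. Thresholding the lower bound rather than the mean penalizes disagreement between models, which is one plausible reading of "ignore pseudo-labels with low confidence".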

research
06/29/2022

On Non-Random Missing Labels in Semi-Supervised Learning

Semi-Supervised Learning (SSL) is fundamentally a missing label problem,...
research
06/27/2023

Biclustering random matrix partitions with an application to classification of forensic body fluids

Classification of unlabeled data is usually achieved by supervised learn...
research
03/28/2018

Semi-supervised learning for structured regression on partially observed attributed graphs

Conditional probabilistic graphical models provide a powerful framework ...
research
08/15/2023

Boosting Semi-Supervised Learning by bridging high and low-confidence predictions

Pseudo-labeling is a crucial technique in semi-supervised learning (SSL)...
research
03/23/2022

Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

Only parts of unlabeled data are selected to train models for most semi-...
research
05/15/2022

FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning

Pseudo labeling and consistency regularization approaches with confidenc...
research
07/18/2021

Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning

Floods wreak havoc throughout the world, causing billions of dollars in ...
