A Robust Classifier Under Missing-Not-At-Random Sample Selection Bias

05/25/2023
by Huy Mai, et al.

The shift between training and testing distributions is commonly due to sample selection bias, a type of bias caused by the non-random sampling of examples included in the training set. Although many approaches have been proposed to learn a classifier under sample selection bias, few address the case where a subset of labels in the training set is missing-not-at-random (MNAR) as a result of the selection process. In statistics, Greene's method formulates this type of sample selection with logistic regression as the prediction model. However, we find that simply integrating this method into a robust classification framework is not effective in this bias setting. In this paper, we propose BiasCorr, an algorithm that improves on Greene's method by modifying the original training set so that a classifier can learn under MNAR sample selection bias. We provide a theoretical guarantee for the improvement of BiasCorr over Greene's method by analyzing its bias. Experimental results on real-world datasets demonstrate that BiasCorr produces robust classifiers and can be extended to outperform state-of-the-art classifiers that have been proposed to train under sample selection bias.
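To make the MNAR setting concrete, here is a minimal Python sketch. It assumes a Heckman-style data-generating process that is not taken from the paper. A label y depends on a feature x plus noise eps, and a selection indicator s depends on a feature z plus noise u. The label is observed only when s = 1. When corr(eps, u) = rho is nonzero, whether a label is missing depends on the label's own noise, so the labels are missing-not-at-random. The names `simulate_mnar_selection` and `label_rates`, and all parameter values, are hypothetical. This sketch illustrates the bias only; it is not BiasCorr or Greene's method.

```python
import math
import random


def simulate_mnar_selection(n, rho, seed=0):
    """Hypothetical Heckman-style generator (an assumption, not BiasCorr).

    Label:     y = 1[x + eps > 0]
    Selection: s = 1[z + u > 0], with corr(eps, u) = rho
    y is observed only when s == 1. With rho != 0, whether y is missing
    depends on y's own noise term, i.e. the labels are MNAR.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # prediction feature
        z = rng.gauss(0.0, 1.0)  # selection feature
        u = rng.gauss(0.0, 1.0)  # selection noise
        # Label noise with unit variance and correlation rho to u.
        eps = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        y = 1 if x + eps > 0 else 0
        s = 1 if z + u > 0 else 0
        y_obs = y if s == 1 else None  # label hidden for unselected rows
        rows.append((x, z, s, y, y_obs))
    return rows


def label_rates(rows):
    """Positive-label rate in the full population vs. the selected rows."""
    population = [y for (_, _, _, y, _) in rows]
    selected = [y_obs for (_, _, s, _, y_obs) in rows if s == 1]
    return sum(population) / len(population), sum(selected) / len(selected)


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        pop, sel = label_rates(simulate_mnar_selection(20000, rho, seed=1))
        print(f"rho={rho}: population P(y=1)={pop:.3f}, selected P(y=1)={sel:.3f}")
```

With rho = 0, the selected rows have about the same positive rate as the population, about 0.5. With rho = 0.8, the selected rows are skewed toward positive labels; in expectation the rate is roughly 0.63 versus 0.5. A classifier trained only on the labeled rows would therefore see a shifted distribution.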

research
09/14/2023

On Prediction Feature Assignment in the Heckman Selection Model

Under missing-not-at-random (MNAR) sample selection bias, the performanc...
research
07/20/2018

TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time

Academic research on machine learning-based malware classification appea...
research
09/23/2019

Detection of Classifier Inconsistencies in Image Steganalysis

In this paper, a methodology to detect inconsistencies in classification...
research
06/29/2020

Decorrelated Clustering with Data Selection Bias

Most of existing clustering algorithms are proposed without considering ...
research
04/28/2018

Detect, Quantify, and Incorporate Dataset Bias: A Neuroimaging Analysis on 12,207 Individuals

Neuroimaging datasets keep growing in size to address increasingly compl...
research
10/08/2021

Fair Regression under Sample Selection Bias

Recent research on fair regression focused on developing new fairness no...
research
03/22/2021

Detecting Racial Bias in Jury Selection

To support the 2019 U.S. Supreme Court case "Flowers v. Mississippi", AP...