Semi-supervised Wrapper Feature Selection with Imperfect Labels

11/12/2019
by Vasilii Feofanov, et al.

In this paper, we propose a new wrapper approach for semi-supervised feature selection. A common strategy in semi-supervised learning is to augment the training set with pseudo-labeled unlabeled examples. However, the pseudo-labeling procedure is prone to error and carries a high risk of disrupting the learning algorithm with additional noisy training data. To overcome this, we propose to explicitly model the mislabeling error during the learning phase, with the overall aim of selecting the most relevant features. We derive a C-bound for Bayes classifiers trained over partially labeled training sets that takes the mislabeling errors into account. This risk bound is then used as an objective function, minimized over the space of possible feature subsets using a genetic algorithm. In order to produce solutions that are both sparse and accurate, we propose a modification of the genetic algorithm with a crossover based on feature weights and recursive elimination of irrelevant features. Empirical results on different data sets show the effectiveness of our framework compared to several state-of-the-art semi-supervised feature selection approaches.
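To make the wrapper setup concrete, below is a minimal sketch of a genetic-algorithm wrapper of the kind the abstract describes. It is not the authors' implementation: the fitness function c_bound_objective is a hypothetical placeholder standing in for the paper's mislabeling-aware C-bound, and it uses plain uniform crossover rather than the weight-based crossover and recursive feature elimination proposed in the paper. All names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def c_bound_objective(mask, X_lab, y_lab, X_unlab):
    """Placeholder fitness standing in for the paper's C-bound.

    A real implementation would train a Bayes classifier on the labeled
    plus pseudo-labeled data restricted to the selected features and
    return the derived risk bound (lower is better)."""
    if not mask.any():
        return np.inf  # an empty feature subset is invalid
    # Toy surrogate: random score plus a mild sparsity penalty.
    return rng.random() + 0.01 * mask.sum()

def ga_feature_selection(X_lab, y_lab, X_unlab,
                         pop_size=20, n_gen=30, p_mut=0.05):
    """Wrapper feature selection: evolve binary feature masks that
    minimize the (surrogate) risk-bound objective."""
    n_feat = X_lab.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5  # random initial masks

    def scores(p):
        return np.array([c_bound_objective(m, X_lab, y_lab, X_unlab)
                         for m in p])

    for _ in range(n_gen):
        order = np.argsort(scores(pop))        # lower bound = fitter
        parents = pop[order[: pop_size // 2]]  # keep the best half
        children = []
        for _ in range(pop_size - len(parents)):
            i, j = rng.integers(len(parents), size=2)
            cross = rng.random(n_feat) < 0.5   # uniform crossover mask
            child = np.where(cross, parents[i], parents[j])
            child ^= rng.random(n_feat) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    return pop[np.argmin(scores(pop))]  # best feature mask found

# Toy usage with random data (10 labeled, 50 unlabeled, 8 features).
X_lab, y_lab = rng.random((10, 8)), rng.integers(2, size=10)
X_unlab = rng.random((50, 8))
print(ga_feature_selection(X_lab, y_lab, X_unlab))
```

Since a wrapper method re-evaluates a model for every candidate subset, the population size and number of generations directly trade off search quality against compute.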

Related research

09/29/2021 · Multi-class Probabilistic Bounds for Self-learning
Self-learning is a classical approach for learning with both labeled and...

11/17/2022 · Contrastive Credibility Propagation for Reliable Semi-Supervised Learning
Inferencing unlabeled data from labeled data is an error-prone process. ...

01/24/2020 · Sparse Semi-supervised Heterogeneous Interbattery Bayesian Analysis
The Bayesian approach to feature extraction, known as factor analysis (F...

07/19/2022 · A-SFS: Semi-supervised Feature Selection based on Multi-task Self-supervision
Feature selection is an important process in machine learning. It builds...

11/29/2021 · Self-Training of Halfspaces with Generalization Guarantees under Massart Mislabeling Noise Model
We investigate the generalization properties of a self-training algorith...

12/10/2021 · Building Autocorrelation-Aware Representations for Fine-Scale Spatiotemporal Prediction
Many scientific prediction problems have spatiotemporal data- and modeli...
