Learning from Incomplete Data by Simultaneous Training of Neural Networks and Sparse Coding

11/28/2020
by   Cesar F. Caiafa, et al.

Correctly handling incomplete datasets is a fundamental and classical challenge in machine learning. This paper addresses the problem of training a classifier on a dataset with missing features and applying it to a complete or incomplete test dataset. A supervised learning method is developed to train a general classifier, such as a logistic regression or a deep neural network, using only a limited number of features per sample, under the assumption that the data vectors admit sparse representations on an unknown dictionary. The pattern of missing features may differ for each input data instance and can be either random or structured. The proposed method simultaneously learns the classifier, the dictionary, and the sparse representation of each input data sample. A theoretical analysis compares this method with the standard imputation approach, which first performs data completion and then trains the classifier on the resulting reconstructions. Sufficient conditions are identified under which, if a classifier can be trained on incomplete observations so that their reconstructions are well separated by a hyperplane, the same classifier also correctly separates the original (unobserved) data samples. Extensive simulation results on synthetic and well-known reference datasets validate the theoretical findings and demonstrate the effectiveness of the proposed method compared with traditional data imputation approaches and one state-of-the-art algorithm.
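To make the idea of simultaneous training concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a logistic-regression classifier applied to the reconstructions D s_i and an objective made of three terms: classification loss, squared reconstruction error on the observed entries only, and an L1 penalty on the sparse codes. The names train_joint, lam_rec and lam_l1, the step sizes, the unit-ball constraint on the dictionary atoms and the synthetic data are all illustrative assumptions; the paper's exact objective and optimizer may differ.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1 (drives small code entries to zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X_obs, M, y, D, S, w, b, lam_rec, lam_l1):
    """Assumed joint loss: logistic loss on the reconstructions
    + masked reconstruction error + L1 penalty on the codes."""
    R = S @ D.T                                    # reconstructions, shape (n, d)
    z = R @ w + b                                  # classifier logits
    cls = np.mean(np.logaddexp(0.0, z) - y * z)    # labels y in {0, 1}
    rec = np.mean(np.sum((M * (X_obs - R)) ** 2, axis=1))
    l1 = np.mean(np.sum(np.abs(S), axis=1))
    return cls + lam_rec * rec + lam_l1 * l1

def train_joint(X, M, y, n_atoms=6, lam_rec=1.0, lam_l1=0.05,
                lr=0.05, n_iter=300, seed=0):
    """Jointly update dictionary D, codes S and classifier (w, b).
    All hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    M = np.asarray(M, dtype=float)                 # 1 = observed, 0 = missing
    y = np.asarray(y, dtype=float)
    X_obs = np.where(M > 0, X, 0.0)                # hidden values (even NaN) are never used
    n, d = X_obs.shape
    D = rng.normal(size=(d, n_atoms))
    D /= np.linalg.norm(D, axis=0)                 # start with unit-norm atoms
    S = np.zeros((n, n_atoms))
    w, b = np.zeros(d), 0.0
    history = [objective(X_obs, M, y, D, S, w, b, lam_rec, lam_l1)]
    for _ in range(n_iter):
        R = S @ D.T
        p = 0.5 * (1.0 + np.tanh(0.5 * (R @ w + b)))   # numerically stable sigmoid
        g = (p - y) / n                                # d(classification loss)/d(logits)
        # Gradient of the smooth part of the loss with respect to R.
        G_R = g[:, None] * w[None, :] - (2.0 * lam_rec / n) * M * (X_obs - R)
        grad_S, grad_D = G_R @ D, G_R.T @ S
        grad_w, grad_b = R.T @ g, g.sum()
        # Each code s_i appears only in its own term, so its step is rescaled by n;
        # the L1 penalty is handled by a proximal (soft-thresholding) step.
        S = soft_threshold(S - lr * n * grad_S, lr * lam_l1)
        D = D - lr * grad_D
        D /= np.maximum(1.0, np.linalg.norm(D, axis=0))  # keep atoms in the unit ball
        w, b = w - lr * grad_w, b - lr * grad_b
        history.append(objective(X_obs, M, y, D, S, w, b, lam_rec, lam_l1))
    return D, S, w, b, history

# Synthetic example (assumed setup): sparse codes on a random dictionary,
# labels from a random hyperplane, roughly 30% of the features hidden per sample.
rng = np.random.default_rng(1)
n, d, k = 200, 10, 6
D_true = rng.normal(size=(d, k))
D_true /= np.linalg.norm(D_true, axis=0)
S_true = rng.normal(size=(n, k)) * (rng.random((n, k)) < 0.3)
X = S_true @ D_true.T
y = (X @ rng.normal(size=d) > 0).astype(float)
M = rng.random((n, d)) < 0.7

D, S, w, b, history = train_joint(X, M, y)
print("objective before / after:", history[0], history[-1])
```

In this sketch the classifier only ever sees the reconstructions S @ D.T, and the mask M keeps unobserved entries out of the reconstruction term, so the values stored at missing positions have no effect on training. Classifying a new incomplete sample would additionally require estimating its sparse code with D held fixed, which is left out here.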

research
07/25/2018

Propensity score estimation using classification and regression trees in the presence of missing covariate data

Data mining and machine learning techniques such as classification and r...
research
12/06/2018

MIWAE: Deep Generative Modelling and Imputation of Incomplete Data

We consider the problem of handling missing data with deep latent variab...
research
12/06/2018

missIWAE: Deep Generative Modelling and Imputation of Incomplete Data

We present a simple technique to train deep latent variable models (DLVM...
research
06/16/2022

Classification of datasets with imputed missing values: does imputation quality matter?

Classifying samples in incomplete datasets is a common aim for machine l...
research
07/01/2021

Neural Network Training with Highly Incomplete Datasets

Neural network training and validation rely on the availability of large...
research
08/28/2022

Leachable Component Clustering

Clustering attempts to partition data instances into several distinctive...
research
08/01/2018

Forest Learning Universal Coding

This paper considers structure learning from data with n samples of p va...
