Gaussian Processes for Missing Value Imputation

04/10/2022
by Bahram Jafrasteh, et al.

Missing values are common in many real-life datasets. However, most current machine learning methods cannot handle missing values, so these must be imputed beforehand. Gaussian Processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large data sets. Sparse GPs can be used to compute a predictive distribution for missing data. Here, we present a hierarchical composition of sparse GPs that predicts the missing values in each dimension using the variables from all the other dimensions. We call the approach missing GP (MGP). MGP can be trained to impute all missing values simultaneously. Specifically, it outputs a predictive distribution for each missing value, which is then used in the imputation of the other missing values. We evaluate MGP on one private clinical data set and four UCI datasets with different percentages of missing values. We compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. The results show that MGP performs significantly better.
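To make the idea of predicting each dimension from the others concrete, the following is a minimal sketch and not the MGP method itself: it uses exact GP regression with a fixed RBF kernel in plain NumPy, with no sparse approximation, no variational inference, and no joint training, and it passes only the predictive means (not the full predictive distributions) on to later dimensions. The function names (`rbf_kernel`, `gp_predict`, `chained_gp_impute`) and the kernel and noise settings are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # Exact GP regression: predictive mean and variance at X_test.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = rbf_kernel(X_test, X_test).diagonal() - np.sum(v**2, axis=0) + noise
    return mean, var

def chained_gp_impute(X, n_sweeps=3):
    # Impute each column from all the other columns, one column at a time.
    # Values imputed for earlier columns serve as inputs for later ones.
    # Assumes every column has at least one observed entry.
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # mean-fill to start
    variances = np.zeros_like(filled)
    for _ in range(n_sweeps):
        for d in range(X.shape[1]):
            if not missing[:, d].any():
                continue
            others = np.delete(filled, d, axis=1)
            obs = ~missing[:, d]
            y_mean = filled[obs, d].mean()  # center targets for a zero-mean GP
            mean, var = gp_predict(others[obs], filled[obs, d] - y_mean, others[~obs])
            filled[~obs, d] = mean + y_mean
            variances[~obs, d] = var
    return filled, variances
```

Calling `chained_gp_impute(X)` on an array with `np.nan` entries returns the completed array together with a per-entry predictive variance (zero for observed entries). In this sketch the variance is only reported, whereas the abstract describes using the predictive distribution of each missing value when imputing the others.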


research
07/15/2021

Input Dependent Sparse Gaussian Processes

Gaussian Processes (GPs) are Bayesian models that provide uncertainty es...
research
08/12/2019

Mixture-based Multiple Imputation Models for Clinical Data with a Temporal Dimension

The problem of missing values in multivariable time series is a key chal...
research
09/03/2015

Semi-described and semi-supervised learning with Gaussian processes

Propagating input uncertainty through non-linear Gaussian process (GP) m...
research
01/13/2022

Multi-task longitudinal forecasting with missing values on Alzheimer's Disease

Machine learning techniques typically applied to dementia forecasting la...
research
01/28/2020

Multi-class Gaussian Process Classification with Noisy Inputs

It is a common practice in the supervised machine learning community to ...
research
11/05/2019

Scalable Variational Gaussian Processes for Crowdsourcing: Glitch Detection in LIGO

In the last years, crowdsourcing is transforming the way classification ...
research
06/11/2015

Mondrian Forests for Large-Scale Regression when Uncertainty Matters

Many real-world regression problems demand a measure of the uncertainty ...
