Imputation of mixed data with multilevel singular value decomposition

04/30/2018
by   François Husson, et al.

Statistical analysis of large data sets offers new opportunities to better understand many processes. Yet data accumulation often implies relaxing acquisition procedures or compounding diverse sources. As a consequence, such data sets often contain mixed data, i.e. both quantitative and qualitative variables, and many missing values. Furthermore, aggregated data present a natural multilevel structure, where individuals or samples are nested within different sites, such as countries or hospitals. Imputation of multilevel data has therefore drawn some attention recently, but current solutions are not designed to handle mixed data and suffer from important drawbacks such as their computational cost. In this article, we propose a single imputation method for multilevel data, which can be used to complete quantitative, categorical or mixed data. The method is based on multilevel singular value decomposition (SVD), which consists in decomposing the variability of the data into two components, the between-group and within-group variability, and performing SVD on both parts. We show in a simulation study that, in comparison to competitors, the method has the advantages of handling data sets of various sizes and being computationally faster. Furthermore, it is the first so far to handle mixed data. We apply the method to impute a medical data set resulting from the aggregation of several data sets coming from different hospitals. This application falls within the framework of a larger project on trauma patients. To overcome obstacles associated with the aggregation of medical data, we turn to distributed computation. The method is implemented in an R package.
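The between/within decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' R implementation: it assumes purely quantitative data, uses a plain (unregularized) truncated SVD, and the function names `truncated_svd` and `multilevel_svd_impute`, the rank parameters, and the stopping rule are all hypothetical choices made for illustration. Missing cells are initialized with column means, then repeatedly replaced by the sum of the overall mean, a low-rank fit of the group-mean (between) part, and a low-rank fit of the deviation-from-group-mean (within) part.

```python
import numpy as np


def truncated_svd(M, rank):
    """Best rank-`rank` approximation of M in the least-squares sense."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def multilevel_svd_impute(X, groups, rank_between=1, rank_within=1,
                          n_iter=100, tol=1e-8):
    """Illustrative iterative imputation for quantitative multilevel data.

    X      : (n, p) array with np.nan marking missing cells
             (assumes every column has at least one observed value)
    groups : length-n array of group labels (e.g. hospital identifiers)
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    missing = np.isnan(X)

    # Start from column-mean imputation.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        overall = filled.mean(axis=0)

        # Each row of group_means holds the mean of that row's group.
        group_means = np.empty_like(filled)
        for g in np.unique(groups):
            rows = groups == g
            group_means[rows] = filled[rows].mean(axis=0)

        between = group_means - overall   # variability between groups
        within = filled - group_means     # variability within groups

        # Low-rank fit of each part, then recombine.
        fitted = (overall
                  + truncated_svd(between, rank_between)
                  + truncated_svd(within, rank_within))

        # Only the missing cells are updated; observed values are kept.
        new_filled = np.where(missing, fitted, X)
        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break

    return filled
```

A call such as `multilevel_svd_impute(X, hospital_ids, rank_between=2, rank_within=2)` returns a completed copy of `X`. Handling categorical or mixed variables, as the abstract describes, would require an additional coding step for the categories that this sketch does not attempt.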


research
05/04/2011

MissForest - nonparametric missing value imputation for mixed-type data

Modern data acquisition based on high-throughput technology is often fac...
research
01/12/2023

Multiple imputation of incomplete multilevel data using Heckman selection models

Missing data is a common problem in medical research, and is commonly ad...
research
05/10/2022

Explainable Data Imputation using Constraints

Data values in a dataset can be missing or anomalous due to mishandling ...
research
10/28/2019

Missing Value Imputation for Mixed Data Through Gaussian Copula

Missing data imputation forms the first critical step of many data analy...
research
02/17/2019

Separating common (global and local) and distinct variation in multiple mixed types data sets

Multiple sets of measurements on the same objects obtained from differen...
research
05/06/2020

Group Heterogeneity Assessment for Multilevel Models

Many data sets contain an inherent multilevel structure, for example, be...
research
05/30/2014

Online and Adaptive Pseudoinverse Solutions for ELM Weights

The ELM method has become widely used for classification and regressions...