Determining the Number of Components in PLS Regression on Incomplete Data

10/18/2018
by Titin Agustin Nengsih, et al.

Partial least squares (PLS) regression is a multivariate method in which models are estimated using either the SIMPLS or the NIPALS algorithm. PLS regression has been extensively used in applied research because of its effectiveness in analysing relationships between an outcome and one or several components. Note that the NIPALS algorithm is able to provide estimates on incomplete data. Selection of the number of components used to build a representative model in PLS regression is an important problem. However, how to deal with missing data when using PLS regression remains a matter of debate. Several approaches to selecting the number of components have been proposed in the literature, including the Q^2 criterion and the AIC and BIC criteria. Here we study the behavior of the NIPALS algorithm when used to fit a PLS regression for various proportions of missing data and for different types of missingness. We compare criteria for selecting the number of components for a PLS regression on incomplete data and on imputed datasets using three imputation methods: multiple imputation by chained equations, k-nearest neighbor imputation, and singular value decomposition imputation. Various criteria were tested with different proportions of missing data (ranging from 5% to 50%) under different missingness assumptions. Q^2 leave-one-out component selection methods gave more reliable results than AIC- and BIC-based ones.
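To illustrate how NIPALS can produce estimates on incomplete data, here is a minimal sketch in plain Python for the single-response case (PLS1). It is not the authors' implementation: the function names `center_observed` and `nipals_pls1`, the use of `None` to mark missing entries, and the toy data are all assumptions made for illustration. The idea it shows is that every inner product in the algorithm is replaced by a sum over the observed entries only, so no imputation is needed. The sketch assumes a fully observed response, at least one observed value in every row and column of X, and a response that is not already perfectly fitted before the last component.

```python
import math

def center_observed(X):
    """Center each column on the mean of its observed (non-None) entries."""
    n, p = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(p):
        obs = [X[i][j] for i in range(n) if X[i][j] is not None]
        mean = sum(obs) / len(obs)
        for i in range(n):
            if out[i][j] is not None:
                out[i][j] -= mean
    return out

def nipals_pls1(X, y, n_components):
    """PLS1 by NIPALS; missing entries of X (None) are skipped in every sum.

    X and y are assumed to be centered already. Returns the scores T,
    weights W, loadings P, y-coefficients C and the residual response.
    """
    n, p = len(X), len(X[0])
    E = [row[:] for row in X]   # working copy of X, deflated after each component
    f = list(y)                 # working copy of y, deflated after each component
    T, W, P, C = [], [], [], []
    for _ in range(n_components):
        # Weights: slope of each column on f, using rows where the column is observed.
        w = []
        for j in range(p):
            rows = [i for i in range(n) if E[i][j] is not None]
            w.append(sum(E[i][j] * f[i] for i in rows) / sum(f[i] ** 2 for i in rows))
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores: slope of each row on w, using the columns observed in that row.
        t = []
        for i in range(n):
            cols = [j for j in range(p) if E[i][j] is not None]
            t.append(sum(E[i][j] * w[j] for j in cols) / sum(w[j] ** 2 for j in cols))
        # Loadings: slope of each column on t, again over observed rows only.
        load = []
        for j in range(p):
            rows = [i for i in range(n) if E[i][j] is not None]
            load.append(sum(E[i][j] * t[i] for i in rows) / sum(t[i] ** 2 for i in rows))
        # Regression coefficient of the (fully observed) response on the scores.
        c = sum(f[i] * t[i] for i in range(n)) / sum(v * v for v in t)
        # Deflate X on its observed cells and y on all rows.
        for i in range(n):
            for j in range(p):
                if E[i][j] is not None:
                    E[i][j] -= t[i] * load[j]
            f[i] -= c * t[i]
        T.append(t); W.append(w); P.append(load); C.append(c)
    return T, W, P, C, f

# Toy data (made up): five observations, two predictors, one missing cell.
X = [[-2, -1], [-1, None], [0, -1], [1, 2], [2, -1]]
y = [-4, 1, -2, 5, 0]   # already centered
T, W, P, C, resid = nipals_pls1(center_observed(X), y, n_components=1)
```

On complete data this reduces to ordinary NIPALS PLS1. Choosing the number of components would then wrap such a fit in a criterion like leave-one-out Q^2, AIC or BIC, which this sketch does not attempt.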

research
05/15/2022

Inference with Imputed Data: The Allure of Making Stuff Up

Incomplete observability of data generates an identification problem. Th...
research
01/12/2018

Multiple Imputation: A Review of Practical and Theoretical Findings

Multiple imputation is a straightforward method for handling missing dat...
research
01/20/2022

Evaluation of data imputation strategies in complex, deeply-phenotyped data sets: the case of the EU-AIMS Longitudinal European Autism Project

An increasing number of large-scale multi-modal research initiatives has...
research
05/18/2018

Processing of missing data by neural networks

We propose a general, theoretically justified mechanism for processing m...
research
11/27/2020

Clustering with missing data: which equivalent for Rubin's rules?

Multiple imputation (MI) is a popular method for dealing with missing va...
research
11/04/2014

Iterated geometric harmonics for data imputation and reconstruction of missing data

The method of geometric harmonics is adapted to the situation of incompl...
research
06/22/2010

Large gaps imputation in remote sensed imagery of the environment

Imputation of missing data in large regions of satellite imagery is nece...
