A Unified Parallel Algorithm for Regularized Group PLS Scalable to Big Data

Partial Least Squares (PLS) methods are widely used to analyse the association between two blocks of data. These approaches can be applied to data sets in which the number of variables exceeds the number of observations, and in the presence of high collinearity between variables. Several sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modelling is a key factor in obtaining better estimators and in identifying associations between multiple data sets. The cornerstone of the sparse versions of PLS is the link between the SVD of a matrix (constructed from deflated versions of the original data matrices) and least squares minimisation in linear regression. We present here an accurate description of the most popular PLS methods, alongside their mathematical proofs. A unified algorithm is proposed to perform all four types of PLS, including their regularised versions. Various approaches to decrease the computation time are offered, and we show how the whole procedure scales to big data sets.
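To make the link between the SVD and penalised least squares concrete, the following is a minimal sketch of one common way to obtain a first pair of sparse PLS weight vectors: alternate soft-thresholded rank-one updates on the cross-product matrix X^T Y. It is an illustration under stated assumptions, not the unified algorithm of the paper; the function names `soft_threshold` and `sparse_pls_weights` and the penalty parameters `lam_u` and `lam_v` are hypothetical, and deflation for further components is omitted.

```python
import numpy as np


def soft_threshold(v, lam):
    """Lasso soft-thresholding: shrink each entry of v towards zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_pls_weights(X, Y, lam_u=0.0, lam_v=0.0, n_iter=100, tol=1e-8):
    """First pair of (possibly sparse) PLS weight vectors (illustrative sketch).

    Assumes the columns of X and Y are already centred. With
    lam_u = lam_v = 0 the result is the leading singular vector pair of
    M = X^T Y, i.e. the unpenalised first PLS weights.
    """
    M = X.T @ Y
    # Initialise from the leading singular vectors of M.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        # Penalised least squares update of u for fixed v.
        u_new = soft_threshold(M @ v, lam_u)
        norm_u = np.linalg.norm(u_new)
        if norm_u == 0.0:  # penalty removed every variable; keep last iterate
            break
        u_new /= norm_u
        # Penalised least squares update of v for fixed u.
        v_new = soft_threshold(M.T @ u_new, lam_v)
        norm_v = np.linalg.norm(v_new)
        if norm_v == 0.0:
            break
        v_new /= norm_v
        converged = np.linalg.norm(u_new - u) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v
```

With both penalties set to zero the loop simply reproduces the leading singular vectors of X^T Y; increasing `lam_u` or `lam_v` sets small weights exactly to zero, which is how variable selection enters in this formulation.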


