
Partial Least Square Regression via Three-factor SVD-type Manifold Optimization for EEG Decoding

08/09/2022
by   Wanguang Yin, et al.
Southern University of Science & Technology

Partial least squares regression (PLSR) is a widely used statistical model for revealing the linear relationships between latent factors that come from the independent and dependent variables. However, traditional methods for solving PLSR models are usually based on Euclidean space and easily get stuck in a local minimum. To this end, we propose a new method for solving partial least squares regression, named PLSR via optimization on the bi-Grassmann manifold (PLSRbiGr). Specifically, we first leverage a three-factor SVD-type decomposition of the cross-covariance matrix defined on the bi-Grassmann manifold, converting the orthogonality-constrained optimization problem into an unconstrained optimization problem on the bi-Grassmann manifold, and then incorporate Riemannian preconditioning via matrix scaling to regulate the Riemannian metric in each iteration. PLSRbiGr is validated with a variety of experiments for decoding EEG signals in motor imagery (MI) and steady-state visual evoked potential (SSVEP) tasks. Experimental results demonstrate that PLSRbiGr outperforms competing algorithms in multiple EEG decoding tasks, which will greatly facilitate small-sample data learning.


1 Introduction

Extracting latent factors is an essential procedure for discovering the latent space of high-dimensional data, and many regression models have therefore been proposed for latent semantic or latent variable analysis. Among them, partial least squares regression (PLSR) is a well-established model for learning latent variables: it learns the latent sub-spaces in a sequential way while maximally maintaining the correlations between the latent variables of the independent and dependent variables. Specifically, it can be described as the projection of the two sets of variables onto lower dimensional sub-spaces. Applications can be found in various fields, including chemometrics [brereton2018partial, hasegawa2000rational], chemical process control [dong2018regression, zheng2018semisupervised], and neuroscience [chu2020decoding, hoagey2019joint].

As a result, many partial least squares regression models have emerged, but most of them solve for the latent factors in Euclidean space and identify the latent factors column by column in each iteration. A drawback of this approach is that it easily converges to a spurious local minimum and hardly ever obtains a globally optimal solution. N-way PLSR extends PLSR to N-way tensors with a rank-one decomposition, which improves the intuitive interpretability of the model [bro1996multiway]. However, N-way PLSR suffers from high computational complexity and slow convergence when dealing with complex data.

To address the problems above, we propose a three-factor SVD-type decomposition of PLSR via optimization on the bi-Grassmann manifold (PLSRbiGr). The sub-spaces associated with the independent and dependent variables can then be solved for with a nonlinear manifold optimization method. Moreover, we leverage Riemannian preconditioning via matrix scaling to regulate the Riemannian metric in each iteration and adapt to the changing subspace [kasai2016low]. To validate the performance of PLSRbiGr, we conduct two EEG classification tasks on motor imagery (MI) and steady-state visual evoked potential (SSVEP) datasets. The results demonstrate that PLSRbiGr is superior to the statistically inspired modification of PLSR (SIMPLSR) [de1993simpls], SIMPLSR with the generalized Grassmann manifold (PLSRGGr), SIMPLSR with a product manifold (PLSRGStO), as well as sparse SIMPLSR via optimization on the generalized Stiefel manifold (SPLSRGSt) [chen2018solving]. Our main contributions can be summarized as follows:

  • We propose a novel method for solving partial least squares regression (PLSR), named PLSR via optimization on the bi-Grassmann manifold (PLSRbiGr), which decomposes the cross-covariance matrix into interpretable sub-spaces and simultaneously learns the latent factors via optimization on the bi-Grassmann manifold (Sec 2.2).

  • We present a Riemannian metric equipped with Riemannian preconditioning that adapts to the changing subspace. Riemannian preconditioning largely improves algorithmic performance (i.e., convergence speed and classification accuracy) (Sec 3.2).

  • The results of EEG decoding demonstrate that PLSRbiGr outperforms conventional Euclidean-based and Riemannian-based methods in small-sample data learning (Sec 3.1).

2 Method

2.1 Review of Partial Least Squares Regression

Partial least squares regression (PLSR) is a broad class of models for learning the linear relationship between independent and dependent variables by means of latent variables. A schematic illustration of PLSR is presented in Fig. 1.

To predict the dependent variables Y from the independent variables X, PLSR finds a set of latent variables (also called latent vectors, score vectors, or components) by projecting both X and Y onto a lower dimensional subspace while at the same time maximizing the pairwise covariance between the latent variables T and U, as presented in Eq. (1) and Eq. (2).

Figure 1: The PLSR decomposes the independent variables (EEG) and dependent variables (sample labels) as a sum of rank-one matrices.

X = TP^T + E    (1)
Y = UQ^T + F    (2)

where T is the matrix of latent variables extracted from X and U contains the latent variables extracted from Y, their columns having the maximum correlations with each other. In addition, P and Q are the loading matrices, and E and F are the residuals with respect to X and Y.

However, most of the current methods for solving PLSR operate in Euclidean space, jointly approximating the independent variables X and dependent variables Y by the sum of a minimum number of rank-one terms. No existing method solves PLSR with bi-Grassmann manifold optimization. To this end, we propose a novel method for solving PLSR via optimization on the bi-Grassmann manifold.
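For context, the deflation-based Euclidean scheme that PLSRbiGr departs from can be sketched in a few lines of NumPy. This is a simplified illustration under standard SIMPLS-style notation; the names (X, Y, n_components) are placeholders rather than the paper's implementation.

```python
import numpy as np

def pls_deflation(X, Y, n_components):
    """Euclidean PLSR sketch: extract one latent variable per iteration
    from the dominant singular pair of the cross-covariance, then deflate
    the residuals (cf. X = T P^T + E, Y = U Q^T + F)."""
    E, F = X.copy(), Y.copy()
    T, P, Q = [], [], []
    for _ in range(n_components):
        # leading left singular vector of the current cross-covariance E^T F
        w = np.linalg.svd(E.T @ F)[0][:, 0]
        t = E @ w                        # X-side latent variable (score)
        t = t / np.linalg.norm(t)
        p, q = E.T @ t, F.T @ t          # X and Y loadings
        E = E - np.outer(t, p)           # column-by-column deflation
        F = F - np.outer(t, q)
        T.append(t); P.append(p); Q.append(q)
    return np.column_stack(T), np.column_stack(P), np.column_stack(Q)
```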

2.1.1 Regression model

To predict the dependent variable for a new sample x, the regression model is given by:

ŷ = xB    (3)

where ŷ is the predicted output and B is the regression coefficient calculated from the training data. Following [chen2018solving], we obtain the learned sub-spaces and their corresponding latent factors by solving the SVD-type model via optimization on the bi-Grassmann manifold.
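A minimal sketch of the prediction step in Eq. (3) follows, under the assumption that the regression coefficient is assembled from the fitted latent factors of the previous subsection; the helper regression_coefficient and its formula are illustrative, not taken from the paper.

```python
import numpy as np

def regression_coefficient(X, T, Q):
    """Illustrative assembly of the regression coefficient B so that
    X_new @ B reproduces the latent-space fit Y ≈ T Q^T (assumed form)."""
    W = np.linalg.pinv(X) @ T      # weights with T ≈ X W on the training data
    return W @ Q.T

def plsr_predict(X_new, B):
    """Prediction for new samples, cf. Eq. (3): y_hat = x B."""
    return X_new @ B
```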

2.2 PLSRbiGr: PLSR via Optimization on Bi-Grassmann Manifold

In the sequel, we present the SVD-type decomposition via optimization on the bi-Grassmann manifold. As shown in Fig. 2, we treat the cross-product matrix (or tensor) of the independent and dependent variables as the input data, and then perform the three-factor decomposition via optimization on the bi-Grassmann manifold.

Figure 2: SVD-type decomposition via optimization on bi-Grassmann manifold.
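For matrix-valued data, forming this input can be as simple as the following sketch; the mean-centering step is an assumption, as the extract does not state the exact preprocessing.

```python
import numpy as np

def cross_product_matrix(X, Y):
    """Cross-product (cross-covariance) matrix of the independent and
    dependent variables, used as input to the three-factor decomposition."""
    Xc = X - X.mean(axis=0)    # center each variable (assumed preprocessing)
    Yc = Y - Y.mean(axis=0)
    return Xc.T @ Yc
```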

2.2.1 Three-factor SVD-type decomposition

The SVD-type decomposition of the cross-product matrix can be formulated as the following tensor-matrix product:

(4)

where the subscript denotes the iteration index. To apply the Riemannian version of conjugate gradient descent or the trust-region method, we need the partial derivatives of the objective with respect to the three factors, which are given by

(5)
(6)
(7)

Then, we project the Euclidean gradients (i.e., Eq. (5), Eq. (6), and Eq. (7)) onto the Riemannian manifold space.
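The exact objective is not reproduced in this extract; assuming the squared reconstruction error f(U, M, V) = ||C − U M V^T||_F^2 over the three factors (the names C, U, M, V are illustrative), the Euclidean partial derivatives corresponding to Eqs. (5)-(7) take the form sketched below.

```python
import numpy as np

def euclidean_gradients(C, U, M, V):
    """Euclidean partial derivatives of the assumed objective
    f = ||C - U M V^T||_F^2 with respect to the three factors."""
    R = C - U @ M @ V.T            # reconstruction residual
    grad_U = -2.0 * R @ V @ M.T
    grad_M = -2.0 * U.T @ R @ V
    grad_V = -2.0 * R.T @ U @ M
    return grad_U, grad_M, grad_V
```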

2.2.2 SVD-type decomposition via optimization on bi-Grassmann manifold

In this subsection, we derive the bi-Grassmann manifold optimization for the SVD-type decomposition, where the search space is a manifold equipped with a Riemannian metric. Recall that the Stiefel manifold is the set of matrices whose columns are orthonormal, denoted by

St(n, p) = {Z ∈ ℝ^(n×p) : Z^T Z = I_p}    (8)

For a Stiefel manifold St(n, p), its related Grassmann manifold can be formulated as the quotient space of St(n, p) under the equivalence relation defined by the orthogonal group,

Gr(n, p) = St(n, p) / O(p)    (9)

here, O(p) is the orthogonal group defined by

O(p) = {R ∈ ℝ^(p×p) : R^T R = R R^T = I_p}    (10)

Moreover, for the SVD-type decomposition, the optimization sub-spaces can be expressed as the following bi-Grassmann manifold:

Gr(n1, p1) × Gr(n2, p2)    (11)

which is equipped with the following Riemannian metric:

(12)

In practice, the computational (total) space is first decomposed into two orthogonal complementary sub-spaces, the normal space and the tangent space; the tangent space is then further decomposed into another pair of orthogonal complementary sub-spaces, the horizontal space and the vertical space; and eventually we project the Euclidean gradient onto the horizontal space defined by the equivalence relation of the orthogonal group [absil2009optimization].
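For a Grassmann factor represented by a matrix with orthonormal columns, this projection chain reduces to removing the component of the Euclidean gradient that lies in the current subspace. The sketch below uses the standard embedded geometry (not the preconditioned metric of Eq. (12)); U, G, and xi are illustrative names.

```python
import numpy as np

def project_to_horizontal(U, G):
    """Project a Euclidean gradient G onto the horizontal space of the
    Grassmann manifold at U (orthonormal columns): G - U U^T G."""
    return G - U @ (U.T @ G)

def qr_retraction(U, xi):
    """Map a horizontal direction xi back onto the manifold with the
    QR-based retraction."""
    Q, R = np.linalg.qr(U + xi)
    # fix column signs so the retraction is well defined
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)
```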

2.2.3 Riemannian Gradient

To obtain the Riemannian gradient, we project the Euclidean gradient onto the Riemannian manifold space [absil2009optimization], that is

(13)
(14)

where the scaling factors arise from the Riemannian preconditioning, and the projection operator involves two steps: a mapping from the ambient space onto the tangent space, and a mapping from the tangent space onto the horizontal space. Accordingly, the computational space of the Stiefel manifold is decomposed into the tangent space and the normal space, and the tangent space is further decomposed into two orthogonal complementary sub-spaces (i.e., the horizontal space and the vertical space) [kasai2016low]. Once the expressions for the Riemannian gradient and Riemannian Hessian are obtained, we can carry out the Riemannian manifold optimization using the Manopt toolbox [absil2009optimization].
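Putting the pieces together, one heavily simplified Riemannian gradient step for a single Grassmann factor might look like the sketch below. Scaling the horizontal gradient by (M M^T)^{-1} stands in for the preconditioning of Eqs. (13)-(14) in the spirit of [kasai2016low] and is an assumption, as is the objective; a full implementation would instead rely on a toolbox such as Manopt.

```python
import numpy as np

def preconditioned_step_U(C, U, M, V, step=1e-2, eps=1e-8):
    """One illustrative preconditioned Riemannian gradient step for U
    under the assumed objective f = ||C - U M V^T||_F^2."""
    R = C - U @ M @ V.T
    egrad = -2.0 * R @ V @ M.T                        # Euclidean gradient
    hgrad = egrad - U @ (U.T @ egrad)                 # horizontal projection
    scale = np.linalg.inv(M @ M.T + eps * np.eye(M.shape[0]))
    rgrad = hgrad @ scale                             # preconditioned gradient
    Q, Rq = np.linalg.qr(U - step * rgrad)            # retract back to manifold
    return Q * np.sign(np.sign(np.diag(Rq)) + 0.5)
```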

3 Experiments and Results

To test the performance of our proposed algorithm, we conduct experiments on a range of EEG signal decoding tasks and compare its performance with several well-known algorithms, including the statistically inspired modification of PLSR (SIMPLSR), SIMPLSR with the generalized Grassmann manifold (PLSRGGr), sparse SIMPLSR via optimization on the generalized Stiefel manifold (SPLSRGSt) [chen2018solving, de1993simpls], and higher-order partial least squares regression (HOPLSR) [zhao2012higher].

3.1 EEG Decoding

In this subsection, we test the efficiency and accuracy of our proposed algorithm (PLSRbiGr) on the public PhysioNet MI dataset [schalk2004bci2000]. We compare PLSRbiGr with existing algorithms, including the statistically inspired modification of PLSR (SIMPLSR), SIMPLSR with the generalized Grassmann manifold (PLSRGGr), sparse SIMPLSR via optimization on the generalized Stiefel manifold (SPLSRGSt) [chen2018solving, de1993simpls], and higher-order partial least squares (HOPLSR) [zhao2012higher]. We use classification accuracy (Acc) as the evaluation metric.

Figure 3: Accuracy of 2-class MI classification task on PhysioNet MI dataset

In training, we use 4-fold cross-validation and report the averaged classification accuracy on the test samples. As shown in Fig. 3, PLSRbiGr generally achieves the best performance in comparison to the existing methods. The PhysioNet EEG MI dataset consists of 2-class MI tasks (i.e., runs 3, 4, 7, 8, 11, and 12, with imagined movements of the left fist or right fist) [schalk2004bci2000], recorded from 109 subjects with 64-channel EEG signals (sampling rate of 160 Hz) during MI tasks. We randomly select 10 subjects from the PhysioNet MI dataset for our experiments. The EEG signals are filtered with a band-pass filter and a spatial filter (i.e., Xdawn with 16 filters) to obtain the reduced data representation used as input.
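As a rough illustration of this evaluation protocol (band-pass filtering, feature extraction, and 4-fold cross-validation), the sketch below uses SciPy and scikit-learn; the cutoff frequencies, the plain logistic-regression classifier, and the flattened features are placeholders rather than the paper's actual pipeline, which uses Xdawn spatial filtering and PLSR-based decoding.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def bandpass(epochs, low, high, fs, order=4):
    """Zero-phase band-pass filter along the time axis.
    epochs: array of shape (n_trials, n_channels, n_times)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, epochs, axis=-1)

def four_fold_accuracy(epochs, labels, fs=160.0):
    """Average 4-fold cross-validated accuracy with placeholder settings
    (8-30 Hz band and classifier chosen for illustration only)."""
    filtered = bandpass(epochs, low=8.0, high=30.0, fs=fs)
    features = filtered.reshape(len(labels), -1)   # flatten each trial
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, features, labels, cv=4).mean()
```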

Figure 4: Accuracy of 4-class SSVEP classification task on Macau SSVEP dataset

The Macau SSVEP dataset contains 128-channel EEG recordings from 7 subjects sampled at 1000 Hz, recorded at the University of Macau with ethical approval. There are four types of visual stimuli with flashing frequencies of 10 Hz, 12 Hz, 15 Hz, and 20 Hz. To increase the signal-to-noise ratio (SNR) and reduce the dimension of the raw EEG signals, the EEG signals are filtered with a band-pass filter and a spatial filter (i.e., Xdawn with 16 filters). Epochs of 1-second EEG signals preceding each stimulus time point of the SSVEP data were extracted and down-sampled to 200 Hz to form the input data. Fig. 4 presents the classification accuracy of all comparison algorithms on the SSVEP dataset.

3.2 Effects of Riemannian Preconditioning

Furthermore, we test the effects of Riemannian preconditioning. In PLSRbiGr, the scaling factors of the Riemannian preconditioning provide an effective strategy for accelerating convergence. As shown in Table 1, the classification accuracy of PLSRbiGr equipped with Riemannian preconditioning is clearly better, at comparable running time, than that of the non-preconditioned variant, as well as of methods such as PLSRGGr, PLSRGStO, and SPLSRGSt that do not take Riemannian preconditioning into account.

ID      Accuracy (preconditioned)   Accuracy (non-preconditioned)   Running time (s, precond.)   Running time (s, non-precond.)
S001    0.8487±0.0148               0.3063±0.0394                   0.0571                       0.0570
S002    0.7036±0.0123               0.1211±0.0241                   0.0624                       0.0544
S003    0.5903±0.0183               0.1854±0.0278                   0.0576                       0.0551
S004    0.7360±0.0231               0.1056±0.0258                   0.0960                       0.0591
S005    0.8026±0.0182               0.3187±0.0483                   0.0692                       0.0560
S006    0.7987±0.0075               0.1522±0.0285                   0.0578                       0.0574
S007    0.8375±0.0124               0.2135±0.0375                   0.0593                       0.0557
mean    0.7596±0.0152               0.2004±0.0330                   0.0656                       0.0563

Table 1: The effects of Riemannian preconditioning on the classification accuracy and running time. The test experiments were conducted on Macau SSVEP dataset by using PLSRbiGr.

4 Discussion and Conclusion

In this paper, we propose a novel method, named partial least squares regression via optimization on the bi-Grassmann manifold (PLSRbiGr), for EEG signal decoding. Its key feature is finding the solution of the objective via optimization on the bi-Grassmann manifold. Specifically, to relax the orthogonality constraints of the objective function, PLSRbiGr converts the constrained optimization problem in Euclidean space into an optimization problem defined on the bi-Grassmann manifold, so that the corresponding subspaces can be learned by Riemannian manifold optimization instead of the traditional approach of deflating the residuals in each iteration. In practice, PLSRbiGr can also be used for image classification and many other prediction tasks, since it imposes no task-specific limitations. More importantly, extensive experiments on the MI and SSVEP datasets suggest that PLSRbiGr robustly outperforms other methods optimized in Euclidean space and converges faster than other algorithms solved in Riemannian manifold space (Table 1), and that the scaling factors of Riemannian preconditioning provide good generalization across subjects and robustness to variance (Table 1).

A limitation of our method is that the number of columns of the cross-product matrix equals the number of classes of the training samples, so the rank exploited by the low-rank structure of PLSRbiGr is fixed to a certain value. To address this problem, several directions can be pursued in future work. For example, we only consider the case of a cross-product matrix; when the cross-covariance is a multi-way array (tensor), product manifold optimization needs to be considered. In particular, higher-order partial least squares (HOPLSR) can also be defined on the Riemannian manifold space [zhao2012higher] and directly optimized over the entire product of manifolds, so that the rank of the cross-product tensor can be automatically inferred from the optimization procedure. Another issue that needs further investigation is how to scale the computation of Riemannian preconditioning to higher-order partial least squares (HOPLSR).

References