1 Introduction
1 Introduction
Extracting latent factors is an essential procedure for discovering the latent space of high-dimensional data, and many regression models have accordingly been proposed for latent semantic or latent variable analysis. Among them, partial least squares regression (PLSR) is a well-established model for learning latent variables: it sequentially learns the latent subspaces while maximally maintaining the correlations between the latent variables of the independent variables
and the dependent variables. Specifically, it can be described as the projection of the two sets of variables, X and Y, onto lower-dimensional subspaces. Applications can be found in a variety of fields, including chemometrics [brereton2018partial, hasegawa2000rational], chemical process control [dong2018regression, zheng2018semisupervised], and neuroscience [chu2020decoding, hoagey2019joint]. As a result, many partial least squares regression models have emerged, most of which solve for the latent factors in Euclidean space and identify them column by column in each iteration. A drawback of this approach is that it easily converges to a spurious local minimum
and rarely attains a globally optimal solution. An extension of PLSR to N-way tensors with rank-one decomposition improves the intuitive interpretability of the model
[bro1996multiway]. However, N-way PLSR suffers from high computational complexity and slow convergence when dealing with complex data. To address the problems above, we propose a three-factor SVD-type decomposition of PLSR via optimization on the bi-Grassmann manifold (PLSR-biGr). Hence, the subspaces associated with the independent and dependent variables can be solved using a nonlinear manifold optimization method. Moreover, we leverage Riemannian preconditioning via matrix scaling to regulate the Riemannian metric in each iteration and self-adapt to the changing subspace [kasai2016low]. To validate the performance of PLSR-biGr, we conduct two EEG classification tasks on motor imagery (MI) and steady-state visual evoked potential (SSVEP) datasets. The results demonstrate that PLSR-biGr is superior to the statistically inspired modification of PLSR (SIMPLSR) [de1993simpls], SIMPLSR with the generalized Grassmann manifold (PLSR-GGr), SIMPLSR with a product manifold (PLSR-GStO), as well as sparse SIMPLSR via optimization on the generalized Stiefel manifold (SPLSR-GSt) [chen2018solving]. Our main contributions can be summarized as follows:

We propose a novel method for solving partial least squares regression (PLSR), named PLSR via optimization on the bi-Grassmann manifold (PLSR-biGr), which decomposes the cross-covariance matrix into interpretable subspaces and simultaneously learns the latent factors via optimization on the bi-Grassmann manifold (Sec 2.2).

We present a Riemannian metric equipped with Riemannian preconditioning to self-adapt to the changing subspace. Riemannian preconditioning largely improves algorithmic performance (i.e., convergence speed and classification accuracy) (Sec 3.2).

The results of EEG decoding demonstrate that PLSR-biGr outperforms conventional Euclidean-based methods and Riemannian-based methods for small-sample data learning (Sec 3.1).
2 Method
2.1 Review of Partial Least Squares Regression
Partial least squares regression (PLSR) is a wide class of models for learning the linear relationship between independent variables and dependent variables by means of latent variables. We present a schematic illustration of PLSR in Fig. 1.
To predict the dependent variables Y from the independent variables X, PLSR finds a set of latent variables (also called latent vectors, score vectors, or components) by projecting both X and Y onto a lower-dimensional subspace while at the same time maximizing the pairwise covariance between the latent variables of X and Y, as presented in Eq. (1) and Eq. (2):

(1) X = T P^T + E

(2) Y = U Q^T + F

where T is the matrix of latent variables extracted from X, and U contains the latent variables from Y; their columns have the maximum correlations with each other. In addition, P and Q are the loading matrices, and E and F are the residuals with respect to X and Y.
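As a concrete illustration of Eq. (1) and Eq. (2), the following sketch fits the latent variables with the classical one-component-at-a-time NIPALS procedure, i.e. the deflation-based Euclidean approach that this paper contrasts with; the function and variable names here are illustrative, not from the paper:

```python
import numpy as np

def pls_nipals(X, Y, n_components, n_iter=500, tol=1e-10):
    """Deflation-based PLSR (NIPALS-style sketch).

    Returns scores T, U and loadings P, Q such that X ~ T P^T and Y ~ U Q^T,
    extracting one component per outer iteration and deflating the residuals.
    """
    E, F = X.copy(), Y.copy()
    T, U, P, Q = [], [], [], []
    for _ in range(n_components):
        u = F[:, [0]]                          # initialize score with a column of Y
        for _ in range(n_iter):
            w = E.T @ u; w /= np.linalg.norm(w)  # X-weight vector
            t = E @ w                            # X-score
            q = F.T @ t; q /= np.linalg.norm(q)  # Y-weight vector
            u_new = F @ q                        # Y-score
            if np.linalg.norm(u_new - u) < tol:
                u = u_new; break
            u = u_new
        p = E.T @ t / (t.T @ t)                  # X-loading
        E = E - t @ p.T                          # deflate X-residual
        F = F - t @ (t.T @ F) / (t.T @ t)        # deflate Y-residual
        T.append(t); U.append(u); P.append(p); Q.append(q)
    return np.hstack(T), np.hstack(U), np.hstack(P), np.hstack(Q)
```

Each deflation step removes the part of the residuals explained by the current score, which is exactly the column-by-column Euclidean scheme whose local-minimum issues motivate the manifold formulation below.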
However, most current methods for solving PLSR operate in Euclidean space, performing a sum of a minimum number of rank-one decompositions to jointly approximate the independent variables X and dependent variables Y. No existing method solves PLSR with bi-Grassmann manifold optimization. To this end, we propose a novel method for solving PLSR via optimization on the bi-Grassmann manifold.
2.1.1 Regression model
To predict the dependent variable given a new sample X_new, the regression model is given by:

(3) Y_hat = X_new C

where Y_hat is the predicted output, and C is the regression coefficient calculated from the training data. Following [chen2018solving], we can obtain the learned subspaces U and V, and their corresponding latent factors, by solving the SVD-type model via optimization on the bi-Grassmann manifold.
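A hedged sketch of how a regression coefficient of the form in Eq. (3) can be built: take the top singular subspace of the cross-product matrix X^T Y as the projection, then obtain C by least squares on the latent scores. This is a simplified SIMPLS-flavoured construction for illustration; the paper's actual coefficient comes from the bi-Grassmann solution.

```python
import numpy as np

def pls_coefficients(X, Y, k):
    """Regression coefficient C such that Y_hat = X_new @ C.

    Sketch: project X onto the top-k left singular subspace of X^T Y,
    then regress Y on the resulting latent scores by ordinary least squares.
    """
    Uc, _, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
    W = Uc[:, :k]                         # projection directions for X
    T = X @ W                             # latent scores
    beta, *_ = np.linalg.lstsq(T, Y, rcond=None)
    return W @ beta                       # combined coefficient C

# prediction for a new sample: y_hat = x_new @ C
```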
2.2 PLSR-biGr: PLSR via Optimization on Bi-Grassmann Manifold
In the sequel, we present the SVD-type decomposition via optimization on the bi-Grassmann manifold. As shown in Fig. 2, we treat the cross-product matrix (or tensor) of the independent variables and dependent variables as the input data, and then perform the three-factor decomposition via optimization on the bi-Grassmann manifold.
2.2.1 Three-factor SVD-type decomposition
The SVD-type decomposition of the cross-product matrix (i.e., C = X^T Y) can be formulated as the following tensor-matrix product:

(4) C^(k) ≈ S ×_1 U ×_2 V = U S V^T

where (k) denotes the iteration count. To apply the Riemannian version of conjugate gradient descent or the trust-region method, we need the partial derivatives of the objective f = ||C − U S V^T||_F^2 with respect to U, S, and V, which are given by

(5) ∂f/∂U = −2 (C − U S V^T) V S^T

(6) ∂f/∂S = −2 U^T (C − U S V^T) V

(7) ∂f/∂V = −2 (C − U S V^T)^T U S
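Under the standard matrix form of the objective, f(U, S, V) = ||C − U S V^T||_F^2, the partial derivatives of Eq. (5) and Eq. (6) can be sketched and verified numerically as follows (the paper's tensor formulation may add mode products; the symbols here follow our matrix-case reconstruction):

```python
import numpy as np

def cost_and_grads(C, U, S, V):
    """Cost f = ||C - U S V^T||_F^2 and its Euclidean partial derivatives."""
    R = C - U @ S @ V.T                # residual matrix
    f = np.sum(R ** 2)
    gU = -2.0 * R @ V @ S.T            # partial derivative w.r.t. U
    gS = -2.0 * U.T @ R @ V            # partial derivative w.r.t. S
    gV = -2.0 * R.T @ U @ S            # partial derivative w.r.t. V
    return f, gU, gS, gV
```

Because f is quadratic in each factor, a central finite difference reproduces the directional derivative <gU, D> essentially exactly, which is a convenient sanity check before handing these gradients to a Riemannian solver.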
2.2.2 SVD-type decomposition via optimization on bi-Grassmann manifold
In this subsection, we derive the bi-Grassmann manifold optimization for the SVD-type decomposition; the bi-Grassmann manifold is equipped with a Riemannian metric. Recall that the Stiefel manifold St(p, n) is the set of n × p matrices with orthonormal columns, denoted by

(8) St(p, n) = { X ∈ R^{n×p} : X^T X = I_p }
For a Stiefel manifold St(p, n), its related Grassmann manifold Gr(p, n) can be formulated as the quotient space of St(p, n) under the equivalence relation defined by the orthogonal group,

(9) Gr(p, n) = St(p, n) / O(p)
where O(p) is the orthogonal group defined by

(10) O(p) = { Q ∈ R^{p×p} : Q^T Q = Q Q^T = I_p }
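The definitions above can be exercised directly in code: a Stiefel point is any matrix with orthonormal columns, and two Stiefel points represent the same Grassmann point precisely when they span the same subspace (i.e., differ by right-multiplication with an element of the orthogonal group). This is a plain-numpy sketch with illustrative names:

```python
import numpy as np

def random_stiefel(n, p, rng):
    """A random point on St(p, n): an n x p matrix with orthonormal columns."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
    return Q

def same_grassmann_point(X1, X2, tol=1e-8):
    """Two Stiefel points are the same Grassmann point iff X2 = X1 Q for some
    Q in O(p); equivalently, their column-space projectors coincide."""
    P1 = X1 @ X1.T                      # projector onto span(X1)
    P2 = X2 @ X2.T                      # projector onto span(X2)
    return np.linalg.norm(P1 - P2) < tol
```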
Moreover, for the SVD-type decomposition, the optimization subspaces can be expressed as the following bi-Grassmann manifold:

(11) M := Gr(p, m) × Gr(p, n)

where U ∈ Gr(p, m) and V ∈ Gr(p, n) for the cross-product matrix C ∈ R^{m×n},
which is equipped with the following Riemannian metric:

(12) g(ξ, η) = tr((S S^T) ξ_U^T η_U) + tr(ξ_S^T η_S) + tr((S^T S) ξ_V^T η_V)

where ξ = (ξ_U, ξ_S, ξ_V) and η = (η_U, η_S, η_V) are tangent vectors.
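A minimal sketch of the scaled (preconditioned) inner product, assuming the Kasai–Mishra style metric with weights S S^T and S^T S on the two subspace factors; the function name and argument layout are ours:

```python
import numpy as np

def precond_metric(S, xiU, etaU, xiS, etaS, xiV, etaV):
    """Preconditioned inner product on the product space:
    g = tr((S S^T) xiU^T etaU) + tr(xiS^T etaS) + tr((S^T S) xiV^T etaV).
    """
    return (np.trace((S @ S.T) @ (xiU.T @ etaU))
            + np.trace(xiS.T @ etaS)
            + np.trace((S.T @ S) @ (xiV.T @ etaV)))
```

When S has full rank, S S^T and S^T S are positive definite, so this is a valid Riemannian metric; the scaling adapts the geometry to the current core matrix at each iteration.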
In practice, the computational space can first be decomposed into two orthogonal complementary subspaces (the normal space and the tangent space); the tangent space can then be further decomposed into another two orthogonal complementary subspaces (the horizontal space and the vertical space); eventually, we project the Euclidean gradient onto the horizontal space defined by the equivalence relation of the orthogonal group [absil2009optimization].
2.2.3 Riemannian Gradient
To obtain the Riemannian gradient, the Euclidean gradient needs to be projected onto the Riemannian manifold [absil2009optimization], that is

(13) grad_U f = Π_U( ∂f/∂U (S S^T)^{-1} )

(14) grad_V f = Π_V( ∂f/∂V (S^T S)^{-1} )

where (S S^T)^{-1} and (S^T S)^{-1} are the scaling factors, and Π is the projection operator, which involves two steps: a mapping from the ambient space onto the tangent space, and a mapping from the tangent space onto the horizontal space. Accordingly, the computational space of the Stiefel manifold is decomposed into the tangent space and the normal space, and the tangent space is further decomposed into two orthogonal complementary subspaces (i.e., the horizontal space and the vertical space) [kasai2016low]. Once the expressions for the Riemannian gradient and the Riemannian Hessian are obtained, we can conduct Riemannian manifold optimization using the Manopt toolbox [absil2009optimization].
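The two operations of Eq. (13) and Eq. (14), scaling by the preconditioner and projecting onto the horizontal space, can be sketched as follows, together with a QR-based retraction to map an update step back onto the manifold. This is a sketch under the assumption that the horizontal space at U is {ξ : U^T ξ = 0}; the paper's exact operators may differ:

```python
import numpy as np

def riem_grad_U(U, egrad_U, S):
    """Preconditioned Riemannian gradient on the Grassmann factor:
    scale the Euclidean gradient by (S S^T)^{-1}, then project onto the
    horizontal space {xi : U^T xi = 0}."""
    scaled = egrad_U @ np.linalg.inv(S @ S.T)     # Riemannian preconditioning
    return scaled - U @ (U.T @ scaled)            # horizontal projection

def retract(U, xi):
    """QR-based retraction: map U + xi back onto the Stiefel manifold."""
    Q, _ = np.linalg.qr(U + xi)
    return Q
```

A descent step then looks like U <- retract(U, -step * riem_grad_U(U, gU, S)), with the analogous update for V using (S^T S)^{-1}.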
3 Experiments and Results
To test the performance of the proposed algorithm, we conduct experiments on EEG signal decoding tasks and compare against several well-known algorithms, including the statistically inspired modification of PLSR (SIMPLSR), SIMPLSR with the generalized Grassmann manifold (PLSR-GGr), sparse SIMPLSR via optimization on the generalized Stiefel manifold (SPLSR-GSt) [chen2018solving, de1993simpls], and higher order partial least squares regression (HOPLSR) [zhao2012higher].
3.1 EEG Decoding
In this subsection, we test the efficiency and accuracy of the proposed algorithm (PLSR-biGr) on the public PhysioNet MI dataset [schalk2004bci2000]. We compare PLSR-biGr with existing algorithms, including the statistically inspired modification of PLSR (SIMPLSR), SIMPLSR with the generalized Grassmann manifold (PLSR-GGr), sparse SIMPLSR via optimization on the generalized Stiefel manifold (SPLSR-GSt) [chen2018solving, de1993simpls], and higher order partial least squares (HOPLSR) [zhao2012higher]. We use accuracy (Acc) as the evaluation metric to quantify the performance of the comparison algorithms.
In training, we use 4-fold cross-validation and report the classification accuracy averaged over testing samples. As shown in Fig. 3, PLSR-biGr generally achieves the best performance in comparison with the existing methods. The PhysioNet EEG MI dataset consists of 2-class MI tasks (i.e., runs 3, 4, 7, 8, 11, and 12, with imagined movements of the left fist or the right fist) [schalk2004bci2000], recorded from 109 subjects with 64-channel EEG signals (sampling rate of 160 Hz) during MI tasks. We randomly select 10 subjects from the PhysioNet MI dataset for our experiments. The EEG signals are filtered with a band-pass filter and a spatial filter (i.e., Xdawn with 16 filters).
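The 4-fold cross-validation protocol described above can be sketched as follows; `fit` and `predict` are placeholders for any decoder (e.g., a PLSR-based classifier), and all names here are illustrative rather than taken from the paper:

```python
import numpy as np

def cross_val_accuracy(X, y, fit, predict, n_folds=4, seed=0):
    """Average classification accuracy over held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                  # shuffle trial indices
    folds = np.array_split(idx, n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = fit(X[train], y[train])            # train on the other folds
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(accs))                    # averaged accuracy (Acc)
```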
The Macau SSVEP dataset contains 128-channel EEG recordings from 7 subjects sampled at 1000 Hz, recorded at the University of Macau with their ethical approval. There are four types of visual stimuli, with flashing frequencies of 10 Hz, 12 Hz, 15 Hz, and 20 Hz. To increase the signal-to-noise ratio (SNR) and reduce the dimensionality of the raw EEG signals, the signals are filtered with a band-pass filter and a spatial filter (i.e., Xdawn with 16 filters). Epochs of 1-second EEG signals before each time point of the SSVEP data were extracted and downsampled to 200 Hz. Fig. 4 presents the classification accuracy of all comparison algorithms on the SSVEP dataset.
3.2 Effects of Riemannian Preconditioning
Furthermore, we test the effects of Riemannian preconditioning. In PLSR-biGr, the scaling factors of Riemannian preconditioning provide an effective strategy for accelerating convergence. As shown in Table 1, the classification accuracy and running time of PLSR-biGr equipped with Riemannian preconditioning are consistently better than the results of other methods, such as PLSR-GGr, PLSR-GStO, and SPLSR-GSt, which do not take Riemannian preconditioning into account.
4 Discussion and Conclusion
In this paper, we propose a novel method, named partial least squares regression via optimization on the bi-Grassmann manifold (PLSR-biGr), for EEG signal decoding. Its key feature is finding the solution via optimization on the bi-Grassmann manifold. Specifically, to relax the orthogonality constraints of the objective function, PLSR-biGr converts the constrained optimization problem in Euclidean space into an optimization problem defined on the bi-Grassmann manifold, so that the corresponding subspaces U and V can be learned using Riemannian manifold optimization instead of traditional methods that deflate the residuals in each iteration. In practice, PLSR-biGr can also be applied to image classification and many other prediction tasks. More importantly, extensive experiments on the MI and SSVEP datasets suggest that PLSR-biGr robustly outperforms other methods optimized in Euclidean space and converges faster than other algorithms solved in Riemannian manifold space (Table 1), and that the scaling factors of Riemannian preconditioning provide good generalization across subjects and robustness to variance (Table 1).
A limitation of our method is that the number of columns of the cross-product matrix equals the number of classes of the training samples, so the rank used by PLSR-biGr is fixed to a certain value. To address this problem, several directions can be pursued in future work. For example, we only consider the case of a cross-product matrix; when the cross-covariance is a multiway array (tensor), product manifold optimization needs to be further considered. For instance, higher order partial least squares (HOPLSR) can also be defined on a Riemannian manifold [zhao2012higher] and directly optimized over the entire product of manifolds, so that the rank of the cross-product tensor can be automatically inferred from the optimization procedure. Another issue that needs further investigation is how to scale the computation of Riemannian preconditioning to HOPLSR.