1 Introduction
Partial least squares regression (PLSR) is a statistical method for modeling a linear relationship between two data sets, which may be two different descriptions of the same object. Instead of finding hyperplanes of maximum variance in the original data sets, it finds the maximum degree of linear association between two latent components, which are the projections of the two original data sets onto a new space, and then, based on those latent components, regresses the loading matrices of the two original data sets, respectively. Compared with multiple linear regression (MLR) [Aiken, West, and Pitts2003] and principal component regression (PCR) [Kendall1957, Jolliffe1982], PLSR has proved not only useful for high-dimensional data [Huang et al.2005, Boulesteix and Strimmer2007], but also a good alternative because it is more robust and adaptable [Wold et al.1984]. Robust means that the model parameters do not change very much when new training samples are taken from the same total population. Thus PLSR has found wide application in several areas of scientific research since the 1960s [Liton et al.2015, Hao, Thelen, and Gao2016, Worsley1997, Hulland1999, Lobaugh, West, and McIntosh2001].
There exist many forms of PLSR, such as NIPALS (nonlinear iterative partial least squares) [Wold1975]; PLS1 (one of the data sets consists of a single variable) [Höskuldsson1988] and PLS2 (both data sets are multidimensional), where a linear inner relation between the projection vectors exists; PLS-SB [Wegelin2000, Rosipal and Kramer2006], where the extracted projection matrices are in general not mutually orthogonal; the statistically inspired modification of PLS (SIMPLS) [Jong1993], which calculates the PLSR factors directly as linear combinations of the original data sets; kernel PLSR [Rosipal2003], applied in a reproducing kernel Hilbert space; and sparse PLSR [Chun and Keles2010], which achieves factor selection by producing sparse linear combinations of the original data sets.
However, it is difficult to solve directly for projection matrices with orthogonality as a whole in Euclidean spaces. To the best of our knowledge, all existing algorithms proceed greedily through a sequence of low-dimensional subspaces: the first dimension is chosen to optimize the PLSR objective, e.g., maximizing the covariance between the projected data sets, and subsequent dimensions are then chosen to optimize the objective on residual or reduced data sets. This can be fruitful but is limited, often resulting in ad hoc or suboptimal solutions. To overcome this shortcoming, we propose several novel models and algorithms that solve PLSR problems within the framework of Riemannian manifold optimization [Absil, Mahony, and Sepulchre2008]. For the optimization problems arising from PLSR, the orthogonality constraint can be easily eliminated on Stiefel/Grassmann manifolds, which opens the possibility of solving for the PLSR factors as a whole and of converging steadily to a global optimum.
In general, Riemannian optimization is based directly on the curved manifold geometry, such as that of Stiefel/Grassmann manifolds, and benefits from lower complexity and better numerical properties. The geometrical framework of Stiefel and Grassmann manifolds was proposed in [Edelman, Arias, and Smith1998]. The Stiefel manifold has been successfully applied in neural networks [Nishimori and Akaho2005] and linear dimensionality reduction [Cunningham and Ghahramani2014]. Meanwhile, the Grassmann manifold has been studied in two major fields: data analysis, such as video stream analysis [He, Balzano, and Szlam2012] and clustering subspaces into classes of subspaces [Wang et al.2014, Wang et al.2016]; and parameter analysis, such as a unifying view on subspace-based learning methods [Hamm and Lee2008] and optimization over the Grassmann manifold [Mishra and Sepulchre2014, Mishra et al.2014]. According to [Edelman, Arias, and Smith1998, Absil, Mahony, and Sepulchre2004], the generalized Stiefel manifold is endowed with a scaled metric by making it a Riemannian submanifold based on the Stiefel manifold, which is more flexible with respect to the constraints of the optimization problems arising from generalized PLSR. The generalized Grassmann manifold is generated by the generalized Stiefel manifold, and each point on this manifold is a collection of “scaled” vector subspaces of a fixed dimension embedded in the ambient Euclidean space. Another important matrix manifold is the oblique manifold, which is a product of spheres. Absil and Gallivan [Absil and Gallivan2006] investigate the geometry of this manifold and show how independent component analysis can be cast on this manifold as non-orthogonal joint diagonalization.
Conceptual algorithms based on ideas from Riemannian manifolds, together with their convergence analysis and efficient numerical implementation [Absil, Mahony, and Sepulchre2008], have been developed recently. This has paved the way for investigating overall algorithms that solve PLSR problems based on optimization algorithms on Riemannian manifolds. In particular, Boumal et al. [Boumal et al.2014] have developed a useful MATLAB toolbox, Manopt (Manifold Optimization, http://www.manopt.org/), which can be readily adopted in this research to test the algorithms to be developed.
The contributions of this paper are:

We establish several novel PLSR models on Riemannian manifolds and give matrix representations of the related optimization ingredients;

We give new algorithms for the proposed PLSR models on Riemannian manifolds, which are able to calculate all the factors as a whole so as to obtain optimal solutions.
2 Notations and Preliminaries
This section will briefly describe some notations and concepts that will be used throughout the paper.
2.1 Notations
We denote matrices by boldface capital letters, vectors by boldface lowercase letters, and scalars by plain letters. The superscript T denotes the transpose of a vector/matrix. denotes the diagonal matrix with elements from the diagonal of . means that is a positive definite matrix. The singular value decomposition (SVD) of a matrix is denoted by , while the eigendecomposition of a diagonalizable square matrix is denoted by .
The set of all order orthogonal matrices is denoted by
also called the orthogonal group of order . The Stiefel manifold is the set of all the matrices whose columns are orthonormal, denoted by
(1) 
Given a Stiefel manifold , the related Grassmann manifold can be formed as the quotient space of under the equivalence relation defined by the orthogonal group , i.e.
(2) 
Two Stiefel points are equivalent to each other if there exists an such that . We use to denote the equivalence class for a given , and is called a representation of the Grassmann point . More intuitively, the Grassmann manifold is the set of all -dimensional subspaces in .
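The two definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy and the standard conventions that a Stiefel point is a matrix X with X^T X = I and that two Stiefel points represent the same Grassmann point exactly when they span the same subspace. The helper names are illustrative and not taken from the paper.

```python
import numpy as np

def random_stiefel_point(n, p, rng):
    """Return an n-by-p matrix with orthonormal columns (a Stiefel point)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, p)))
    return q

def is_stiefel(x, tol=1e-10):
    """Check the orthonormality condition X^T X = I_p."""
    p = x.shape[1]
    return np.allclose(x.T @ x, np.eye(p), atol=tol)

def same_grassmann_point(x1, x2, tol=1e-10):
    """Two Stiefel points span the same subspace iff their projectors agree."""
    return np.allclose(x1 @ x1.T, x2 @ x2.T, atol=tol)

rng = np.random.default_rng(0)
x = random_stiefel_point(6, 2, rng)
q = random_stiefel_point(2, 2, rng)      # a 2-by-2 orthogonal matrix
x_rotated = x @ q                        # another representative of the same class
other = random_stiefel_point(6, 2, rng)  # a different subspace (almost surely)
```

Comparing the projectors X X^T avoids searching for the orthogonal matrix that links two representatives, since the projector is the same for every member of an equivalence class.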
In this paper, we are also interested in the so-called generalized Stiefel manifold, which is defined under the orthogonality
(3) 
where is a given positive definite matrix. Similarly, the generalized Grassmann manifold is defined by
(4) 
If we relax the orthogonality constraints but retain the unit-norm constraint, we have the so-called oblique manifold, which consists of all the matrices whose columns are unit vectors. That is,
(5) 
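As an illustration of the last two constraint sets, the sketch below builds one point on each. It assumes NumPy, writes the given positive definite matrix as `b`, and takes the generalized Stiefel condition to be X^T B X = I and the oblique condition to be unit-norm columns. The constructions (a Cholesky factor and column normalization) are convenient choices for the example, not the paper's.

```python
import numpy as np

def generalized_stiefel_point(b, p, rng):
    """Return X with X^T B X = I_p for a symmetric positive definite B."""
    n = b.shape[0]
    q, _ = np.linalg.qr(rng.standard_normal((n, p)))  # q^T q = I
    chol = np.linalg.cholesky(b)                      # b = chol @ chol.T
    return np.linalg.solve(chol.T, q)                 # x = chol^{-T} q

def oblique_point(n, p, rng):
    """Return an n-by-p matrix whose columns all have unit Euclidean norm."""
    y = rng.standard_normal((n, p))
    return y / np.linalg.norm(y, axis=0)

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 5))
b = a @ a.T + 5.0 * np.eye(5)   # a positive definite test matrix
x = generalized_stiefel_point(b, 2, rng)
y = oblique_point(5, 2, rng)
```

With `b` equal to the identity, the first construction reduces to an ordinary Stiefel point.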
2.2 Partial Least Squares Regression (PLSR)
Let , be observation samples and , be response data. Then , .
Suppose there exists a linear regression relation
(6) 
where is the regression coefficient and is the residual matrix. PLSR is usually an effective approach when classical linear regression fails because the covariance matrix is singular.
In order to obtain , PLSR generally decomposes the data sets and ( and are preprocessed to be zero-mean data) into the following form
(7) 
where and are vectors giving the latent components for the observations, and represent loading vectors. and are residual matrices.
PLSR searches the latent components and such that the squared covariance between them is maximized, where the projection vectors and satisfy the constraints and , respectively. The solution is given by
(8) 
It can be shown that the projection vector corresponds to the first eigenvector of [Höskuldsson1988, Rosipal and Kramer2006], and the optimal solution of
(9) 
is also the first eigenvector of . Thus both objectives (8) and (9) have the same solution on .
We can also obtain by swapping the positions of and . After obtaining the projection vectors and , the latent vectors and are also acquired.
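The eigenvector characterization above can be verified on synthetic data. This is a minimal sketch, assuming NumPy, column-centered data matrices named `x` and `y`, and unit-norm projection vectors named `w` and `c`. All of these names are assumptions made for the example. It uses the fact that the first eigenvector of X^T Y Y^T X is the leading left singular vector of X^T Y.

```python
import numpy as np

def first_pls_weights(x, y):
    """Return unit vectors (w, c) maximizing (w^T X^T Y c)^2 and the top singular value."""
    u, s, vt = np.linalg.svd(x.T @ y, full_matrices=False)
    return u[:, 0], vt[0, :], s[0]

rng = np.random.default_rng(1)
x = rng.standard_normal((30, 5))
x -= x.mean(axis=0)                 # zero-mean columns
y = rng.standard_normal((30, 3))
y -= y.mean(axis=0)

w, c, sigma = first_pls_weights(x, y)
m = x.T @ y @ y.T @ x               # symmetric positive semidefinite matrix
top_eigenvalue = np.linalg.eigvalsh(m)[-1]   # eigvalsh sorts in ascending order
t = x @ w                           # first latent component of x
```

The vector `c` is obtained from the same decomposition, which corresponds to swapping the roles of the two data sets.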
The essence of (8) is to maximize the degree of linear association between and . Suppose that a linear relation between the latent vectors and exists, i.e., , where is a constant and is an error term, and and can be absorbed by and , respectively. Based on this relation, (7) can be cast as the following formula
(10) 
Thus and can be obtained by the least squares method. Then and can be updated
(11) 
This procedure is repeated times, and we obtain the projection matrix , latent components , and loading matrices and . Equation (10) can then be recast as
(12) 
According to , and regression coefficient .
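The greedy, one-factor-at-a-time procedure described in this subsection can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact update: it uses the classical PLS2 scheme with rank-one deflation of both data sets, and the commonly used closed form B = W (P^T W)^{-1} C^T for the regression coefficient. The names `w_mat`, `p_mat` and `c_mat` stand for the projection matrix and the two loading matrices and are chosen for the example.

```python
import numpy as np

def pls_regression(x, y, n_components):
    """Classical PLS2 with rank-one deflation; returns B with y ~ x @ B.

    Both x and y are assumed to be column-centered.
    """
    x_res, y_res = x.copy(), y.copy()
    w_list, p_list, c_list = [], [], []
    for _ in range(n_components):
        # Weight vector: leading left singular vector of X^T Y,
        # i.e. the first eigenvector of X^T Y Y^T X.
        u, _, _ = np.linalg.svd(x_res.T @ y_res, full_matrices=False)
        w = u[:, 0]
        t = x_res @ w                    # latent component
        tt = t @ t
        p = x_res.T @ t / tt             # loading of x (least squares)
        c = y_res.T @ t / tt             # loading of y (least squares)
        x_res = x_res - np.outer(t, p)   # deflate x
        y_res = y_res - np.outer(t, c)   # deflate y
        w_list.append(w)
        p_list.append(p)
        c_list.append(c)
    w_mat = np.column_stack(w_list)
    p_mat = np.column_stack(p_list)
    c_mat = np.column_stack(c_list)
    # Regression coefficient B = W (P^T W)^{-1} C^T.
    return w_mat @ np.linalg.solve(p_mat.T @ w_mat, c_mat.T)

rng = np.random.default_rng(4)
x = rng.standard_normal((40, 4))
x -= x.mean(axis=0)
b_true = rng.standard_normal((4, 2))
y = x @ b_true                           # noise-free responses
b_hat = pls_regression(x, y, n_components=4)
```

When the number of components equals the column rank of `x` and the data are noise-free, this sketch recovers the generating coefficients, which is a convenient sanity check. With fewer components it returns a reduced-rank approximation.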
3 The PLSR on Riemannian Manifolds
The core of PLSR is to optimize the squared covariance between the latent components and the data , see (9). Boulesteix and Strimmer [Boulesteix and Strimmer2007] summarized several model modifications for optimizing the projection matrix in Euclidean spaces. However, all of these algorithms take a greedy strategy that calculates the factors one by one, and thus often result in suboptimal solutions. To overcome this shortcoming, this paper treats those models as optimization problems on Riemannian manifolds and proposes algorithms that solve for the projection matrix, and thus the latent component matrix, as a whole on Riemannian manifolds.
3.1 SIMPLSR on the Generalized Grassmann Manifolds
We can transform model (9) into the following optimization problem
(13) 
where and is the identity matrix.
Because of the orthogonality constraint, this constrained optimization problem can be treated as an unconstrained optimization problem on the Stiefel manifold
(14) 
To represent the data sets and from (12), it is more reasonable to constrain latent components in an orthogonal space. Thus model (14) can be rewritten as
(15) 
Similar to model (13), we can first convert problem (15) to an unconstrained problem on the generalized Stiefel manifold with , i.e.,
(16) 
Let be defined on the generalized Stiefel manifold . For any matrix , we have . This means that the maximizer of is unidentifiable on the generalized Stiefel manifold in the sense that if is a solution to (16), then so is for any . This may cause trouble for numerical algorithms that solve (16).
If we contract all the generalized Stiefel points in the same equivalence class together, it is straightforward to convert the optimization (16) on the generalized Stiefel manifold to the generalized Grassmann manifold [Edelman, Arias, and Smith1998] as follows
(17) 
Model (17) is called the statistically inspired modification of PLSR (SIMPLSR) on generalized Grassmann manifolds.
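The invariance that motivates the move to the quotient manifold can be checked numerically. The sketch below assumes NumPy and takes the objective to be f(W) = trace(W^T X^T Y Y^T X W) with the constraint W^T (X^T X) W = I. Both forms are assumptions based on the SIMPLS formulation, and the names `x`, `y`, `w` and `q` are illustrative.

```python
import numpy as np

def simpls_objective(w, x, y):
    """f(W) = trace(W^T X^T Y Y^T X W), the summed squared covariances."""
    m = y.T @ x @ w
    return float(np.sum(m * m))          # squared Frobenius norm of Y^T X W

rng = np.random.default_rng(2)
x = rng.standard_normal((30, 5))
x -= x.mean(axis=0)
y = rng.standard_normal((30, 3))
y -= y.mean(axis=0)

b = x.T @ x                              # scaling matrix of the constraint
chol = np.linalg.cholesky(b)             # b = chol @ chol.T
q0, _ = np.linalg.qr(rng.standard_normal((5, 2)))
w = np.linalg.solve(chol.T, q0)          # feasible point: w^T b w = I

q, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # orthogonal 2-by-2 matrix
w_rot = w @ q                            # another point in the same class
```

Both `w` and `w_rot` are feasible and give the same objective value, so a solver working on the generalized Stiefel manifold sees a continuum of equivalent maximizers. Working on the quotient removes this ambiguity.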
We will use the metric on the generalized Grassmann manifold. The matrix representation of the tangent space of the generalized Grassmann manifold is identified with a subspace of the tangent space of the total space that does not produce a displacement along the equivalence classes. This subspace is called the horizontal space [Mishra and Sepulchre2014]. The horizontal space . The other related ingredients, such as the projection, retraction, and transport operators needed to implement an off-the-shelf nonlinear conjugate-gradient algorithm [Tan et al.2014] for (17), are listed in Algorithm 1, which is the optimization algorithm of PLSR on the generalized Grassmann manifold.
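Two of the ingredients mentioned above, the projection onto the horizontal space and a retraction, can be sketched as follows. This is one standard choice and not necessarily the one used in Algorithm 1. It assumes NumPy, a positive definite scaling matrix `b`, the horizontal space {Z : X^T B Z = 0}, and a polar-type retraction that rescales X + Z back onto the constraint set.

```python
import numpy as np

def project_horizontal(x, z, b):
    """Remove from z the component that moves x within its equivalence class."""
    return z - x @ (x.T @ b @ z)

def retract(x, h, b):
    """Map x + h back to the set Y^T B Y = I by a polar-type rescaling."""
    y = x + h
    gram = y.T @ b @ y                        # symmetric positive definite
    vals, vecs = np.linalg.eigh(gram)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return y @ inv_sqrt

rng = np.random.default_rng(3)
n, p = 6, 2
a = rng.standard_normal((n, n))
b = a @ a.T + n * np.eye(n)                   # positive definite test matrix
chol = np.linalg.cholesky(b)
q, _ = np.linalg.qr(rng.standard_normal((n, p)))
x = np.linalg.solve(chol.T, q)                # x^T b x = I

h = project_horizontal(x, rng.standard_normal((n, p)), b)
x_new = retract(x, 0.1 * h, b)                # a small step along h
```

A conjugate-gradient iteration would combine these two maps with a Riemannian gradient and a vector transport. Those parts are omitted here.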
3.2 SIMPLSR on Product Manifolds
Another equivalent expression for SIMPLSR [Boulesteix and Strimmer2007], which often appears in the literature, is as follows
(18)  
The feasible domain of and can be considered as a product manifold of a generalized Stiefel manifold with (see (3)) and Oblique manifold (see (5)), respectively. The product manifold is denoted as
(19) 
So model (18) can be modified as
(20) 
We call this model the equivalent statistically inspired modification of PLSR (ESIMPLSR) on product manifolds.
To induce the geometry of the product manifold, we use the metric and the tangent space on the generalized Stiefel manifold, and the metric and the tangent space on the oblique manifold. We optimize model (20) on the product manifold by the alternating direction method (ADM) [Boyd et al.2011] and the nonlinear Riemannian conjugate gradient method (NRCG), as summarized in Algorithm 2. It is the optimization algorithm of PLSR on the generalized Stiefel manifold.
4 Experimental Results and Analysis
In this section, we conduct several experiments on face recognition and object classification on several public databases to assess the proposed algorithms. These experiments are designed to compare the feature extraction performance of the proposed algorithms with that of existing algorithms, including principal component regression (PCR) [Næs and Martens1988] and SIMPLSR [Jong1993] (the PCR and SIMPLS codes are from http://cn.mathworks.com/help/stats/examples.html). All algorithms are coded in Matlab (R2014a) and run on a PC with a 64-bit operating system, an Intel(R) Core(TM) i7 CPU (3.4 GHz, single-thread mode), and 28 GB of memory.
In our experiments, the face dataset has samples from classes. The th class includes samples. The response data (labels) can be set as a binary matrix,
PLSR is used to estimate the regression coefficient matrix
by exploiting the training data sets and . Then the response matrices can be predicted for testing data . We get the predicted response matrix (predicted labels) by setting the largest value to 1 and the others to 0 in each row of for classification.
4.1 Face Recognition
Data Preparation
Face data are from the following two publicly available databases:

The AR face dataset (http://rvl1.ecn.purdue.edu/aleix/aleixfaceDB.html)

The Yale face dataset (http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html)
The AR face database consists of over 3,200 frontal color images of 126 people (70 men and 56 women). Each individual has 26 images, which were collected in two different sessions separated by two weeks, with 13 images from each session. In our experiments, we select data from 100 randomly chosen individuals. The thirteen images from the first session of each individual are used for training and the thirteen from the second session for testing. Each image is cropped and resized to 60 × 43 pixels, then vectorized as a 2580-dimensional vector.
The Yale face database contains 165 images from 15 individuals. Each individual provides 11 different images. In the experiment, 6 images from each individual are randomly selected as training samples while the remaining images are used for testing. Each image is scaled to a resolution of 64 × 64 pixels, then vectorized as a 4096-dimensional vector.
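Once training and test splits such as these are fixed, the label encoding and the decoding rule used for classification (one binary row per sample, and the largest predicted value in each row set to 1) can be sketched as follows. The sketch assumes NumPy and integer class labels starting at 0. The function names are illustrative.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer labels as a binary response matrix, one row per sample."""
    labels = np.asarray(labels)
    y = np.zeros((labels.size, n_classes))
    y[np.arange(labels.size), labels] = 1.0
    return y

def binarize_rows(scores):
    """Set the largest entry of each row to 1 and all other entries to 0."""
    pred = np.zeros_like(scores, dtype=float)
    pred[np.arange(scores.shape[0]), np.argmax(scores, axis=1)] = 1.0
    return pred

def error_rate(scores, labels):
    """Fraction of rows whose largest score is not at the true class."""
    return float(np.mean(np.argmax(scores, axis=1) != np.asarray(labels)))

# Predicted responses for three test samples and three classes.
scores = np.array([[0.2, 0.7, 0.1],
                   [0.9, 0.05, 0.05],
                   [0.3, 0.3, 0.4]])
predicted = binarize_rows(scores)
```

In a full pipeline, `scores` would be the product of the centered test data with the estimated regression coefficient matrix.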
Recognition Performance
We compare the recognition performance of PCR, SIMPLSR, PLSRGGr and PLSRGStO on both the AR and Yale face datasets.
Figure 1 reports the experimental results on the AR face database. It shows that the recognition performance of our proposed algorithms, PLSRGGr and PLSRGStO, exceeds that of the other methods by more than 4 percentage points when the reduced dimension is greater than 60. PLSRGGr performs well across all reduced dimensions. This demonstrates that our proposed optimization models and algorithms for PLSR on Riemannian manifolds significantly enhance accuracy. We attribute this to the fact that calculating the PLSR factors as a whole on Riemannian manifolds can obtain the optimal solution.
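Table 1: Recognition error rates (%, mean ± standard deviation) on the Yale face database.
Reduced dimension | PCR | SIMPLSR | PLSRGGr | PLSRGStO
12 | 33.27 ± 3.10 | 26.00 ± 4.59 | 20.07 ± 0.30 |
13 | 30.87 ± 4.66 | 21.73.004.06 | 15.93 ± 0.30 |
14 | 29.27 ± 4.79 | 19.00 ± 3.82 | 8.13 ± 0.60 |
15 | 25.87 ± 4.61 | 15.80 ± 3.13 | 10.67 ± 0 |

Table 2: Classification error rates (%) on the ETH80 database.
Reduced dimension | GDA | DCC | LSRM | PCR | SIMPLSR | PLSRGGr | PLSRGStO
5 |  |  |  | 23.75 | 26.25 |  |
6 |  |  |  | 22.50 | 15.00 |  |
7 |  |  |  | 21.25 | 7.50 |  |
8 | 2.50 | 11.20 | 5.00 | 20.00 | 3.75 |  |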
Another experiment was conducted on the Yale face database. In this experiment, the compared algorithms are PCR, SIMPLSR, PLSRGGr and PLSRGStO, and every algorithm is run 20 times. Table 1 lists the recognition error rates, including their mean and standard deviation values, with reduced dimensions from 12 to 15. From the table we can observe that the mean recognition error rates of PLSRGGr and PLSRGStO are lower than those of the other methods by a margin of 5 to 14 percentage points, and the standard deviation is also smaller. This demonstrates that our proposed methods are more robust. The bold figures in the table highlight the best results for comparison.
4.2 Object Classification
Data Preparation
For the object classification tasks, we use the following two publicly available databases for testing:

COIL20 dataset (http://www.cs.columbia.edu/CAVE/software/softlib/coil20.php);
The Columbia Object Image Library (COIL20) contains 1,440 grayscale images of 20 objects, with 72 images per object. For each object, 36 images were selected by equal-interval sampling for training, while the remaining images are used for testing.
The ETH80 database [Leibe and Schiele2003] consists of 8 categories of objects. Each category contains 10 objects with 41 views per object, spaced equally over the viewing hemisphere, for a total of 3280 images. Images are converted to grayscale, resized to 32 × 32 pixels, and vectorized as 1024-dimensional vectors. For each category and each object, we model the pose variations by a subspace of the size , spanned by the 7 largest eigenvectors from SVD. In our experiments, the Grassmann distance measure between two points is defined as which is the F-norm of principal angles [Wolf and Shashua2003],
denotes the singular value of
. We follow the experimental protocol from [Hamm and Lee2008], which is ten-fold cross validation for image-set matching.
Classification Performance
Figure 2 shows the classification errors of the four algorithms on the COIL20 database. The classification errors are recorded for the different reduced dimensions . From the results, it can be seen that the proposed methods, PLSRGGr and PLSRGStO, outperform the compared non-manifold methods by a margin of 2 to 10 percentage points when the reduced dimension is greater than 10.
To demonstrate the effectiveness of our regression algorithms on the ETH80 data set, we compare them with several other methods. Table 2 reports the experimental results with reduced dimensions from 5 to 8. The results of GDA (Grassmann discriminant analysis) [Hamm and Lee2008], DCC (discriminant canonical correlation) [Kim, Kittler, and Cipolla2007], and LSRM (least squares regression on manifold) [Lui2016] in the last line of Table 2 are taken from [Lui2016]. Compared with these state-of-the-art algorithms, our proposed methods, PLSRGGr and PLSRGStO, both outperform all of them.
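The subspace model and the Grassmann distance used for ETH80 image-set matching can be sketched as follows. The sketch assumes NumPy, stores one vectorized image per column, and takes the distance to be the 2-norm of the vector of principal angles, with the angles obtained as the arccosines of the singular values of X1^T X2. Function names and the toy data are assumptions made for the example.

```python
import numpy as np

def subspace_basis(images, k):
    """Orthonormal basis of the k leading left singular vectors.

    `images` holds one vectorized image per column.
    """
    u, _, _ = np.linalg.svd(images, full_matrices=False)
    return u[:, :k]

def grassmann_distance(x1, x2):
    """Norm of the principal angles between the spans of x1 and x2.

    Both inputs must have orthonormal columns.
    """
    s = np.linalg.svd(x1.T @ x2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))   # clip guards against round-off
    return float(np.linalg.norm(theta))

# Toy check with two orthogonal planes in a 4-dimensional space.
e = np.eye(4)
plane_a = e[:, :2]
plane_b = e[:, 2:]
angle = 0.3
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle), np.cos(angle)]])
plane_a_rotated = plane_a @ rot                # same subspace, different basis

# A subspace model of an image set: 41 views of 1024 pixels, 7 basis vectors.
rng = np.random.default_rng(5)
basis = subspace_basis(rng.standard_normal((1024, 41)), 7)
```

The distance depends only on the subspaces, so changing the basis of either argument leaves it unchanged up to round-off.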
5 Conclusions
In this paper, we developed PLSR optimization models on two Riemannian manifolds, i.e., the generalized Grassmann manifold and the product manifold, and gave an optimization algorithm for each of them. Each of the new models transforms the corresponding original constrained optimization problem into an unconstrained optimization problem on a Riemannian manifold. This makes it possible to calculate all the PLSR factors as a whole to obtain the optimal solution. The experimental results show that our proposed PLSRGGr and PLSRGStO outperform other methods on several public datasets.
References
 [Absil and Gallivan2006] Absil, P., and Gallivan, K. 2006. Joint diagonalization on the oblique manifold for independent component analysis. In Acoustics, Speech, and Signal Processing (ICASSP), volume 5.
 [Absil, Mahony, and Sepulchre2004] Absil, P. A.; Mahony, R.; and Sepulchre, R. 2004. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80(2):199–220.
 [Absil, Mahony, and Sepulchre2008] Absil, P. A.; Mahony, R.; and Sepulchre, R. 2008. Optimization Algorithms on Matrix Manifolds. Princeton University Press.
 [Aiken, West, and Pitts2003] Aiken, L. S.; West, S. G.; and Pitts, S. C. 2003. Multiple linear regression. Handbook of Psychology. 4(19):481–507.
 [Boulesteix and Strimmer2007] Boulesteix, A., and Strimmer, K. 2007. Partial least squares: A versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 8(1):32–44.
 [Boumal et al.2014] Boumal, N.; Mishra, B.; Absil, P. A.; and Sepulchre, R. 2014. Manopt, a Matlab toolbox for optimization on manifolds. The Journal of Machine Learning Research 15(1):1455–1459.
 [Boyd et al.2011] Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; and Eckstein, J. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3(1):1–122.
 [Chun and Keles2010] Chun, H., and Keles, S. 2010. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72(1):3–25.
 [Cunningham and Ghahramani2014] Cunningham, J. P., and Ghahramani, Z. 2014. Unifying linear dimensionality reduction. arXiv:1406.0873.
 [Edelman, Arias, and Smith1998] Edelman, A.; Arias, T.; and Smith, S. 1998. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications 20(2):303–353.
 [Hamm and Lee2008] Hamm, J., and Lee, D. 2008. Grassmann discriminant analysis: A unifying view on subspace-based learning. In International Conference on Machine Learning, 376–383.
 [Hao, Thelen, and Gao2016] Hao, X.; Thelen, K.; and Gao, J. 2016. Spatial variability in biomass yield of switchgrass, native prairie, and corn at field scale. Biofuels 108(2):548–558.
 [He, Balzano, and Szlam2012] He, J.; Balzano, L.; and Szlam, A. 2012. Incremental gradient on the grassmannian for online foreground and background separation in subsampled video. In Computer Vision and Pattern Recognition.
 [Höskuldsson1988] Höskuldsson, A. 1988. PLS regression methods. Journal of Chemometrics 2(3):211–228.
 [Huang et al.2005] Huang, X.; Pan, W.; Grindle, S.; and Han, X. 2005. A comparative study of discriminating human heart failure etiology using gene expression profiles. BMC Bioinformatics 6:205.
 [Hulland1999] Hulland, J. 1999. Use of partial least squares (PLS) in strategic management research: A review of four recent studies. Strategic Management Journal 20(2):195–204.
 [Jolliffe1982] Jolliffe, I. 1982. A note on the use of principal components in regression. Applied Statistics 31(3):300–303.
 [Jong1993] Jong, S. D. 1993. SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems 18(3):251–263.
 [Kendall1957] Kendall, M. G. 1957. A Course in Multivariate Analysis. Griffin, London.
 [Kim, Kittler, and Cipolla2007] Kim, T. K.; Kittler, J.; and Cipolla, R. 2007. Discriminative learning and recognition of image set classes using canonical correlations. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):1005–1018.
 [Leibe and Schiele2003] Leibe, B., and Schiele, B. 2003. Analyzing appearance and contour based methods for object categorization. In Computer Vision and Pattern Recognition.
 [Liton et al.2015] Liton, M. A. K.; Helen, S.; Das, M.; Islam, D.; and Karim, M. 2015. Prediction of accurate pKa values of some α-substituted carboxylic acids with low cost of computational methods. Journal of Physical & Theoretical Chemistry 12(3):243–255.
 [Lobaugh, West, and McIntosh2001] Lobaugh, N.; West, R.; and McIntosh, A. 2001. Spatiotemporal analysis of experimental differences in event-related potential data with partial least squares. Psychophysiology 38(3):517–530.
 [Lui2016] Lui, Y. 2016. A general least squares regression framework on matrix manifolds for computer vision. In Riemannian Computing in Computer Vision. Springer International Publishing. 303–323.
 [Mishra and Sepulchre2014] Mishra, B., and Sepulchre, R. 2014. R3MC: A Riemannian three-factor algorithm for low-rank matrix completion. In Conference on Decision and Control, 1137–1142.
 [Mishra et al.2014] Mishra, B.; Meyer, G.; Bonnabel, S.; and Sepulchre, R. 2014. Fixed-rank matrix factorizations and Riemannian low-rank optimization. Computational Statistics 29(3):591–621.
 [Næs and Martens1988] Næs, T., and Martens, H. 1988. Principal component regression in NIR analysis: Viewpoints, background details and selection of components. Journal of Chemometrics 2(2):155–167.
 [Nishimori and Akaho2005] Nishimori, Y., and Akaho, S. 2005. Learning algorithms utilizing quasigeodesic flows on the Stiefel manifold. Neurocomputing 67:106–135.
 [Rosipal and Kramer2006] Rosipal, R., and Kramer, N. 2006. Overview and recent advances in partial least squares. Lecture Notes in Computer Science 3940:34–51.
 [Rosipal2003] Rosipal, R. 2003. Kernel partial least squares for nonlinear regression and discrimination. Neural Network World 13(3):291–300.
 [Tan et al.2014] Tan, M.; Tsang, I.; Wang, L.; Vandereycken, B.; and Pan, S. 2014. Riemannian pursuit for big matrix recovery. In International Conference on Machine Learning, volume 32, 1539–1547.
 [Wang et al.2014] Wang, B.; Hu, Y.; Gao, J.; Sun, Y.; and Yin, B. 2014. Low rank representation on grassmann manifolds. In Asian Conference on Computer Vision.
 [Wang et al.2016] Wang, B.; Hu, Y.; Gao, J.; Sun, Y.; and Yin, B. 2016. Product Grassmann manifold representation and its LRR models. In American Association for Artificial Intelligence.
 [Wegelin2000] Wegelin, J. 2000. A survey of partial least squares (PLS) methods, with emphasis on the two-block case. Technical report, Department of Statistics, University of Washington, Seattle.
 [Wold et al.1984] Wold, S.; Ruhe, H.; Wold, H.; and Dunn, W. 1984. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM Journal of Scientific and Statistical Computations 5(3):735–743.
 [Wold1975] Wold, H. 1975. Path Models with Latent Variables: The NIPALS Approach. Academic Press.
 [Wolf and Shashua2003] Wolf, L., and Shashua, A. 2003. Learning over sets using kernel principal angles. Journal of Machine Learning Research 4:913–931.
 [Worsley1997] Worsley, K. 1997. An overview and some new developments in the statistical analysis of PET and fMRI data. Human Brain Mapping 5(4):254–258.