1 Introduction
Two-Dimensional Principal Component Analysis (2DPCA) yzfy04 and its variations (e.g., zhzh05 ; lpy10 ; whwj13 ; wangj16 ; gxcdgl19 ) are playing an increasingly important role in recently proposed deep learning frameworks such as 2DPCANet yuwu17 ; lwk19 . 2DPCA is expected to extract the spatial information and the best features of 2D samples, and thereby to improve the performance of dimensionality reduction. From the viewpoint of numerical linear algebra, the principle of 2DPCA is to find a subspace (spanned by the so-called eigenfaces or features) on which the projected samples have the largest variance. The reconstruction from such a lower-dimensional projection is in fact the optimal low-rank approximation of the original sample. When applying 2DPCA to face recognition, we compute the eigenfaces or features from the training set and use them to compress both the training and the testing samples before classification. An implicit assumption is that the projected samples from the testing set still have large variance on the computed subspace. Whether this holds depends on the generalization ability of 2DPCA. In this paper, we review 2DPCA and its variations, and present a new relaxed 2DPCA (R2DPCA) with improvements in three aspects: extracting the features of matrix samples in both row and column directions, being equipped with generalization ability, and weighting the main components by the corresponding eigenvalues. In particular, R2DPCA utilizes the label information of the training data rather than aiming only to enlarge the variance of the projections of training samples.
Principal component analysis (PCA) Jolliffe04 ; TP91 has become one of the most powerful approaches to face recognition siki87 ; kisi90 ; tupe91 ; zhya99 ; pent00 . Recently, many robust PCA (RPCA) algorithms have been proposed that replace the quadratic formulation, which renders PCA vulnerable to noise, with the $L_1$-norm in the objective function, e.g., $L_1$-PCA keka05 , $R_1$-PCA dzhz06 , and PCA-$L_1$ kwak08 . Meanwhile, sparsity has also been introduced into PCA algorithms, resulting in a series of sparse PCA (SPCA) algorithms zht06 ; agjl07 ; shhu08 ; wth09 . A more recent robust SPCA (RSPCA) mzx12 further applies the $L_1$-norm in both the objective and the constraint functions of PCA, inheriting the merits of robustness and sparsity. Since these norms are all special cases of the $L_p$-norm, it is natural to impose the $L_p$-norm on the objective and/or constraint functions directly; see PCA-$L_p$ kwak14 and generalized PCA (GPCA) lxzzl13 for instance.
To preserve the spatial structure of face images, two-dimensional PCA (2DPCA), proposed by Yang et al. yzfy04 , represents face images with two-dimensional matrices rather than one-dimensional vectors. The computational problems based on 2DPCA are of much smaller scale than those based on PCA, and the difficulties caused by rank deficiency are also avoided in general. This image-as-matrix method offers insights for improving the above RSPCA, PCA-$L_p$, GPCA, etc. As typical examples, the $L_1$-norm-based 2DPCA (2DPCA-$L_1$) lpy10 and 2DPCA-$L_1$ with sparsity (2DPCA-$L_1$S) whwj13 are improvements of PCA-$L_1$ and RSPCA, respectively, and the generalized 2DPCA (G2DPCA) wangj16 imposes the $L_p$-norm on both the objective and the constraint functions of 2DPCA. Recently, the quaternion 2DPCA was proposed in jlz17 and applied to color face recognition, where the red, green and blue channels of a color image are encoded as the three imaginary parts of a pure quaternion matrix. To equip the quaternion 2DPCA with generalization ability, Zhao, Jia and Gong zjg17 proposed the sample-relaxed quaternion 2DPCA, which applies the label information (if known) of the training samples. Structure-preserving algorithms for the quaternion eigenvalue decomposition and singular value decomposition can be found in jwl13 ; mjb18 ; jmz17 ; jwzc18 ; jns18b ; jns18a .
Linear Discriminant Analysis (LDA) is another powerful feature extraction algorithm in pattern recognition and computer vision. Since LDA often suffers from the small sample size (3S) problem, some effective approaches have been proposed, such as PCA + LDA bhk97 , orthogonal LDA ye05 , LDA/GSVD hopa04 , and LDA/QR yeli05 . Because of their advantages regarding the singularity problem and the computational cost, 2DLDA and its variants have recently attracted much attention from researchers (e.g., lls08 ; kora05 ; xsa05 ; yyfz03 ; liyu05 ; cckkl06 ). By applying the label information, the LDA-like methods are intended to compute the discriminant vectors that maximize the ratio of the between-class distance to the within-class distance.
PCA, 2DPCA and their variations are unsupervised methods, which do not apply the potential or known label information of samples. Their features are calculated from the training set and thus maximize the scatter of the projected training samples. The scatter of the projected testing samples is not guaranteed to be optimal, and neither is that of the whole set of (training and testing) projected samples. Inspired by this observation, we present a new relaxed 2DPCA (R2DPCA). This approach is a generalization of G2DPCA wangj16 , and reduces to G2DPCA if the label information is unknown or unused. Note that the projection of R2DPCA does not aim to maximize the variance of training samples as 2DPCA does, but intends to avoid overfitting and to enhance the generalization ability. R2DPCA utilizes the labels (if known) of training samples, and can enhance the total scatter of the whole set of projected samples (see Example 4.3 for an illustration). In contrast to the idea of LDA, R2DPCA applies the label information to generate a weighting vector and to construct a weighted covariance matrix in the newly proposed approach to face recognition. Thus R2DPCA never suffers from the small sample size (3S) problem.
Our contributions are in three aspects. (1) We present a new ridge regression model for 2DPCA and its variations by the $L_p$-norm. This model is general and extracts features of face images from both row and column directions. With this model, 2DPCA and its variations are combined with additional regularization on the solution, which gives great flexibility to fit various real-world applications. (2) A novel relaxed 2DPCA (R2DPCA) is proposed with a new ridge regression model. R2DPCA has stronger generalization ability than 2DPCA, 2DPCA-$L_1$, 2DPCA-$L_1$S and G2DPCA. To the best of our knowledge, we are the first to introduce the label information into the 2DPCA-based algorithms. We also weight the selected principal components by the corresponding eigenvalues to enhance the role of the main components. (3) The R2DPCA-based approaches are presented for face recognition and image reconstruction, and their effectiveness is verified by applying them to practical face image databases. In the numerical examples, they perform better than deep learning methods such as DNNs, DBNs and CNNs.
The rest of this paper is organized as follows. In Section 2, we recall 2DPCA, 2DPCA-$L_1$, 2DPCA-$L_1$S and G2DPCA, and present a ridge regression model that gathers them together. Their improved versions are also proposed. In Section 3, we present a new relaxed two-dimensional principal component analysis (R2DPCA) and the corresponding optimization algorithms. We also present the R2DPCA-based approaches for face recognition and image reconstruction. In Section 4, we compare R2DPCA with state-of-the-art approaches and demonstrate its efficiency. In Section 5, we sum up the contributions of this paper.
2 The General Ridge Regression Model of 2DPCA and Variations
Two-dimensional principal component analysis (2DPCA) has become one of the most popular and powerful methods in data science, especially in image recognition. Several deep learning frameworks that rely heavily on the properties of 2DPCA have achieved high performance in data analysis. This motivates us to develop a general model of 2DPCA and its variations, providing a feasible way to embed them into artificial intelligence algorithms. In this section, we first present a ridge regression model of the improved 2DPCA, and then analyze the relationship among the state-of-the-art variations of 2DPCA.
2.1 A New Ridge Regression Model of Improved 2DPCA
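The objective of 2DPCA is to find left and/or right orthonormal basis vectors so that the projected matrix samples have the largest scatter after projection. Suppose that there are $n$ training matrix samples $X_1, X_2, \ldots, X_n \in \mathbb{R}^{h\times w}$, where $h$ and $w$ denote the height and width of images, respectively. Their mean value is $\Psi = \frac{1}{n}\sum_{i=1}^{n} X_i$. Let $U \in \mathbb{R}^{h\times k_1}$ and $V \in \mathbb{R}^{w\times k_2}$ gather the left and right optimal basis vectors as columns, respectively. Then the $i$th projected matrix sample is defined by $P_i = U^T (X_i - \Psi) V$. The improved 2DPCA seeks optimal $U$ and $V$ that maximize the scatter of the projected matrix samples. This scatter is characterized as
(1) $f(U,V) = \mathrm{Tr}(U^T G_c U) + \mathrm{Tr}(V^T G_r V),$
where
(2) $G_c = \frac{1}{n}\sum_{i=1}^{n} (X_i - \Psi)(X_i - \Psi)^T, \qquad G_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \Psi)^T (X_i - \Psi)$
denote the covariance matrices of input samples on column and row directions, respectively. Both $G_c$ and $G_r$ are symmetric and positive semidefinite matrices. Since $U$ and $V$ are of full column rank, $\mathrm{Tr}(U^T G_c U)$ and $\mathrm{Tr}(V^T G_r V)$ are nonnegative. Here, $\mathrm{Tr}(\cdot)$ represents the trace of a matrix. Thus, a new ridge regression model for the improved 2DPCA is proposed as
(3a) $\max_{U,V} \; \mathrm{Tr}(U^T G_c U) + \mathrm{Tr}(V^T G_r V)$
(3b) $\text{s.t.} \; U^T U = I_{k_1}, \; V^T V = I_{k_2}.$
To solve the optimization problem (3), we need to solve the eigenvalue problems of $G_c$ and $G_r$. See Algorithm 2.1 for the details.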
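The eigenvalue computation behind model (3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB code: the function name `improved_2dpca`, the variable names `G_c` and `G_r`, and the random test data are assumptions made here, and the covariance matrices are taken to be the averaged products of mean-centered samples in the column and row directions.

```python
import numpy as np

def improved_2dpca(samples, k1, k2):
    """Return a left basis U (h x k1) and a right basis V (w x k2).

    Assumption: the column- and row-direction covariance matrices are the
    averages of C C^T and C^T C over the mean-centered samples C.
    """
    X = np.asarray(samples, dtype=float)      # shape (n, h, w)
    centered = X - X.mean(axis=0)             # subtract the mean image
    n = X.shape[0]
    G_c = np.einsum("nij,nkj->ik", centered, centered) / n   # (h, h)
    G_r = np.einsum("nji,njk->ik", centered, centered) / n   # (w, w)
    # eigh returns eigenvalues in ascending order; reverse the columns
    # so that the eigenvectors of the largest eigenvalues come first.
    _, vec_c = np.linalg.eigh(G_c)
    _, vec_r = np.linalg.eigh(G_r)
    U = vec_c[:, ::-1][:, :k1]
    V = vec_r[:, ::-1][:, :k2]
    return U, V

# Hypothetical data: 20 random "images" of size 6 x 5.
rng = np.random.default_rng(0)
samples = rng.standard_normal((20, 6, 5))
U, V = improved_2dpca(samples, k1=3, k2=2)
features = U.T @ (samples[0] - samples.mean(axis=0)) @ V   # 3 x 2 feature matrix
```

Because the two traces in the objective involve separate variables, the left and right bases are obtained from two independent symmetric eigenvalue problems, which is why a single call to `np.linalg.eigh` per direction suffices in this sketch.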
Algorithm 2.1 (Improved 2DPCA).
Input: the matrix samples and two dimensions $k_1$ and $k_2$ (the numbers of left and right basis vectors).
Output: the left and right optimal bases $U$ and $V$.
(1) Compute the covariance matrices of the training samples on the column and row directions, $G_c$ and $G_r$, as in (2).
(2) Compute the $k_1$ largest eigenvalues of $G_c$ and the corresponding eigenvectors, denoted as $u_1, \ldots, u_{k_1}$. Let $U = [u_1, \ldots, u_{k_1}]$.
(3) Compute the $k_2$ largest eigenvalues of $G_r$ and the corresponding eigenvectors, denoted as $v_1, \ldots, v_{k_2}$. Let $V = [v_1, \ldots, v_{k_2}]$.
2.2 Improved Variations of 2DPCA and Optimal Algorithms
Based on the idea in Section 2.1, we present the improved versions of 2DPCA yzfy04 , 2DPCA-$L_1$ lpy10 , 2DPCA-$L_1$S whwj13 , and G2DPCA wangj16 , whose ridge regression models are stated in terms of computing the first projection vector.
Without loss of generality, we assume that the training samples are mean-centered, i.e., their mean is the zero matrix; otherwise, each sample is replaced by its deviation from the mean. After the first several left and right projection vectors have been obtained, the next left and right projection vectors can be calculated similarly on the deflated samples:
(4) 
where . The ridge regression models of improved 2DPCA and variations find the first left and right projection vectors and by solving the optimization problem with equality constraints as follows.

The improved 2DPCA:
(5) 
The improved 2DPCA-$L_1$:
(6) 
The improved 2DPCA-$L_1$S:
(7) where is a positive constant.

The improved G2DPCA:
(8) where and .
Since the two independent variables in models (5)-(8) are separated, it is appropriate to solve for the left and right projection vectors separately by optimization algorithms. Taking the improved G2DPCA (8) for instance, the first projection vector w (either the left or the right one) is computed by solving the optimization problem with equality constraints:
(9) 
where or , and . Depending on the value , the projection vector w can be updated in two different ways. If ,
(10a)  
(10b) 
where satisfies , denotes the Hadamard product, i.e., the elementwise product between two vectors. If ,
(11a)  
(11b) 
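For intuition about such iterative updates, the special case of an $L_1$-norm objective with an $L_2$-norm constraint (the setting of 2DPCA-$L_1$) admits a well-known sign-based fixed-point iteration. The sketch below illustrates that special case only and is not claimed to reproduce updates (10) and (11); the function names, the stopping rule and the random data are assumptions made here.

```python
import numpy as np

def l1_objective(samples, w):
    """Sum over samples of the L1-norm of X_i w."""
    return sum(np.abs(X @ w).sum() for X in samples)

def first_l1_direction(samples, n_iter=100, seed=0):
    """Sign-based fixed-point iteration for max sum_i ||X_i w||_1, ||w||_2 = 1."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(samples[0].shape[1])
    w /= np.linalg.norm(w)
    history = [l1_objective(samples, w)]
    for _ in range(n_iter):
        # Ascent direction: sum_i X_i^T sign(X_i w), then renormalize.
        v = sum(X.T @ np.sign(X @ w) for X in samples)
        w_new = v / np.linalg.norm(v)
        history.append(l1_objective(samples, w_new))
        converged = np.allclose(w_new, w)
        w = w_new
        if converged:
            break
    return w, history

# Hypothetical mean-centered data: 10 samples of size 8 x 6.
rng = np.random.default_rng(1)
samples = rng.standard_normal((10, 8, 6))
samples = samples - samples.mean(axis=0)
w, history = first_l1_direction(list(samples))
```

Each step of this iteration cannot decrease the objective, so `history` is a non-decreasing sequence; the limit is in general a local maximizer that depends on the random starting vector.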
3 Relaxed TwoDimensional Principal Component Analysis
2DPCA is an unsupervised method and overlooks the potential or known label information of samples. The extracted features maximize the scatter of the projected training samples, and are implicitly expected (without guarantee) to maximize the scatter of the projected testing samples as well. In this section, we present a new relaxed two-dimensional principal component analysis (R2DPCA) method by the $L_p$-norm to avoid overfitting and to enhance the generalization ability. R2DPCA utilizes the labels (if known) of training samples, and in a large number of experiments it enhances the total scatter of the whole set of projected samples. Interestingly, unlike supervised methods such as LDA, R2DPCA never suffers from the small sample size (3S) problem. We now introduce R2DPCA in two parts: the weighting vector and the relaxation of the objective function.
3.1 Weighting vector
Suppose that training samples can be partitioned into classes and each class contains samples:
(12) 
where the samples are indexed by class and by their position within the class. For each class, define the mean of its training samples and the corresponding within-class covariance matrix of the training set. Each within-class covariance matrix is symmetric and positive semidefinite. Its maximal eigenvalue represents the variance of the training samples of that class along the principal component. The larger this eigenvalue is, the better scattered the training samples of the class are. If it is zero, then all training samples from the class are identical, and the contribution of that class to the covariance matrix of the training set should be reduced. To this aim, we define a weighting vector of training classes,
(13) 
where is a weighting factor of the th class with a function, . The computation of the weighting vector is proposed in Algorithm 3.2.
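The construction of the weighting vector can be illustrated with a short NumPy sketch. Two choices here are assumptions rather than the paper's definitions: the within-class covariance is taken in the row direction (the average of $D^T D$ over the centered samples $D$ of a class), and the weighting function simply normalizes the maximal eigenvalues so that they sum to one. The function names and the toy data are hypothetical.

```python
import numpy as np

def max_within_class_eigenvalues(samples, labels):
    """Largest eigenvalue of each within-class covariance matrix.

    Assumption: row-direction covariance, i.e. the mean of D^T D over the
    centered samples D of each class.
    """
    X = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    lams = []
    for c in np.unique(labels):
        Xc = X[labels == c]
        D = Xc - Xc.mean(axis=0)
        C = np.einsum("nji,njk->ik", D, D) / len(Xc)
        # eigvalsh sorts ascending; clip tiny negative round-off at zero.
        lams.append(max(np.linalg.eigvalsh(C)[-1], 0.0))
    return np.array(lams)

def weighting_vector(lams):
    """Hypothetical weighting function: eigenvalues normalized to sum to one."""
    total = lams.sum()
    if total == 0:
        return np.full(len(lams), 1.0 / len(lams))
    return lams / total

# Toy data: class 0 is scattered, class 1 consists of identical samples.
rng = np.random.default_rng(0)
spread = rng.standard_normal((5, 4, 3))
constant = np.ones((5, 4, 3))
samples = np.concatenate([spread, constant])
labels = np.array([0] * 5 + [1] * 5)
lams = max_within_class_eigenvalues(samples, labels)
weights = weighting_vector(lams)
```

In this toy example the class of identical samples has a zero maximal eigenvalue and therefore receives zero weight, which matches the stated intention of reducing the contribution of such a class.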
3.2 Objective function relaxation
With the computed weighting vector in hand, we define a relaxed criterion as
(14) 
where is a relaxation parameter, and are unit vectors under norm,
(15a)  
(15b) 
The R2DPCA finds its first projection vectors and by solving the optimization problem with equality constraints:
(16) 
where the criterion is defined as in (14). Notice that the relaxed criterion (16) reduces to (8) if , and thus, the first projection vectors of R2DPCA and G2DPCA are the same. If first projection vectors and have been obtained, the th projection vectors and can be calculated similarly on the deflated samples, defined as in (4). At each iterative step, we also obtain the maximal value of the objective function,
Exactly, the first optimal projection vectors of R2DPCA solve the optimal problem with equality constraints:
(17) 
Algorithm 3.3 is presented to compute first optimal left and right projection vectors.
Now we present the relationships among the improved 2DPCA, 2DPCA-$L_1$, 2DPCA-$L_1$S, G2DPCA, and R2DPCA. It is obvious that 2DPCA and 2DPCA-$L_1$ are two special cases of G2DPCA. 2DPCA-$L_1$S originates from G2DPCA with $s=1$ and $p=1$, which leads to a projection vector with only one nonzero element. The $L_2$-norm constraint is then employed to fix this problem, resulting in 2DPCA-$L_1$S. On the other hand, G2DPCA with $s=1$ and $1<p<2$ behaves like 2DPCA-$L_1$S, since the $L_p$-norm constraint in G2DPCA behaves like the mixed-norm constraint in 2DPCA-$L_1$S. R2DPCA is a generalization of G2DPCA.
Algorithm 3.3 (Relaxed Two-Dimensional Principal Component Analysis (R2DPCA)).
Input: the training samples as in (12) and the parameters of the relaxed criterion (14).
(1) Compute the weighting vector by Algorithm 3.2.
(2) Compute the covariance matrix and the relaxed covariance matrix as in (15).
(3) Compute the left features, the right features, and the variances according to the relaxed criterion (14):
for each feature index do
Initialize the iteration counter and an arbitrary projection vector with unit norm; evaluate the objective according to (14).
while the iteration has not converged do
Update the left projection vector according to one of four cases (Case 1 to Case 4); the right projection vector is computed in a similar way.
Re-evaluate the objective according to (14).
end while
end for
3.3 Face recognition
Suppose we have computed the optimal projections, and , and the diagonal matrix by R2DPCA. The R2DPCA approach for color face recognition is proposed in Algorithm 3.4.
Algorithm 3.4 (R2DPCA approach for face recognition).
Input: the training set, the optimal left and right projections, and the set of face images to be recognized.
Output: the identity vector of the face images to be recognized.
Compute the features of the training face images under the optimal left and right projections as
3.4 Image reconstruction
The original image can be optimally approximated by a low-rank reconstruction from its features. Let the computed left and right projection matrices be extended to unitary matrices by their unitary complements. For a given number of retained features, the reconstruction is defined as
(18) 
with . Here, the mean value of samples is assumed to be zero for simplicity. The image reconstruction rate of is defined as follows
(19) 
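Note that the reconstruction is always a good approximation of the original image. If all left and right features are retained, the two projection matrices are unitary, and hence the reconstruction coincides with the original image, which means that the reconstruction ratio equals one.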
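The behaviour described above can be checked numerically. The sketch below assumes a reconstruction of the form $U U^T X V V^T$ for a zero-mean sample and, as a stand-in for the ratio in (19), uses one minus the relative Frobenius error; both the function names and this particular ratio are assumptions made for illustration.

```python
import numpy as np

def reconstruct(X, U, V):
    """Project X onto the column spaces of U (left) and V (right)."""
    return U @ (U.T @ X @ V) @ V.T

def reconstruction_ratio(X, X_hat):
    """Hypothetical ratio: 1 - ||X - X_hat||_F / ||X||_F."""
    return 1.0 - np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Hypothetical data: one 6 x 5 sample and random orthonormal bases from QR.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5))
Q_left, _ = np.linalg.qr(rng.standard_normal((6, 6)))
Q_right, _ = np.linalg.qr(rng.standard_normal((5, 5)))

# All features retained: the bases are orthogonal and X is recovered.
full = reconstruction_ratio(X, reconstruct(X, Q_left, Q_right))
# Only 3 left and 2 right features retained: a low-rank approximation.
partial = reconstruction_ratio(X, reconstruct(X, Q_left[:, :3], Q_right[:, :2]))
```

With this choice of ratio, retaining all features gives a value of one up to round-off, and any truncation gives a value between zero and one because the reconstruction is an orthogonal projection of the sample.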
4 Experiments
In this section, we present numerical experiments to compare the advanced variations of 2DPCA, including the relaxed two-dimensional principal component analysis (R2DPCA), with state-of-the-art algorithms. The numerical experiments are performed with MATLAB R2016 on a personal computer with an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.4GHz (dual processor) and 32GB of RAM.
Example 4.1.
In this experiment, we compare R2DPCA with 2DPCA, 2DPCA-$L_1$, 2DPCA-$L_1$S, and G2DPCA on face recognition using three well-known databases:

Faces95 database (1440 images from 72 subjects, twenty images per subject); source: Collection of Facial Images: Faces95, http://cswww.essex.ac.uk/mv/allfaces/faces95.html ,

Color FERET database (3025 images from 275 subjects, eleven images per subject); source: the color Face Recognition Technology (FERET) database, https://www.nist.gov/itl/iad/image-group/color-feret-database ,

Grey FERET database (1400 images from 200 subjects, seven images per subject); here we use the widely used cropped version of the FERET database.
All face images are cropped and resized to $80\times 80$ pixels. In the basic setting, a fixed number of face images of each person from the Faces95 and the (color or grey) FERET face databases are randomly selected as training samples, and the remaining ones are left for testing.
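A per-subject random split of this kind can be sketched as follows. The function name `split_per_subject`, the seed, and the choice of five training images per subject are hypothetical; only the database size (72 subjects with twenty images each, as in Faces95) is taken from the text.

```python
import numpy as np

def split_per_subject(labels, n_train, seed=0):
    """Randomly pick n_train images of every subject for training."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for subject in np.unique(labels):
        # Shuffle the indices of this subject's images.
        idx = rng.permutation(np.flatnonzero(labels == subject))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

# Faces95-sized label vector: 72 subjects, 20 images per subject.
labels = np.repeat(np.arange(72), 20)
# n_train=5 is an illustrative value, not the setting used in the paper.
train_idx, test_idx = split_per_subject(labels, n_train=5)
```

Splitting within each subject, rather than over the whole database at once, guarantees that every identity appears in both the training and the testing sets.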
We test the effect of the number of chosen features on the recognition accuracy, with the norm parameters fixed at the optimal values in Table 1. The recognition accuracies of 2DPCA, 2DPCA-$L_1$, 2DPCA-$L_1$S, G2DPCA and R2DPCA with different numbers of features on the grey FERET database are shown in Figure 1. The recognition accuracies of G2DPCA and R2DPCA on the Faces95 and color FERET databases are shown in Figure 2.
From the numerical results in Table 1 and Figures 1 and 2, we can conclude that the classification accuracies of R2DPCA are higher and more stable than those of 2DPCA, 2DPCA-$L_1$, 2DPCA-$L_1$S and G2DPCA when the number of chosen features is large. The recognition accuracies of G2DPCA and R2DPCA coincide under the parameter setting in which neither relaxation nor weighting is needed.
Algorithms  |  Faces95  |  Color FERET
2DPCA  |    |  
2DPCA-$L_1$  |    |  
2DPCA-$L_1$S  |    |  
G2DPCA  |    |  
R2DPCA  |  0.9493  |  0.7085
Example 4.2.
In this experiment, we compare R2DPCA with three prominent deep learning models: Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs) and Deep Neural Networks (DNNs). These methods are applied to a part of the MNIST database of handwritten digits, divided into a training set and a test set. Each image has $28\times 28$ pixels. The codes of CNNs, DBNs and DNNs follow palm12 and nielsen13 , and the settings are as follows.
Deep Neural Networks (DNNs) are implemented by stacking layers of neural networks along the depth and width of smaller architectures. A four-layer neural network is used in our tests. The sizes of the input and output layers are determined by the image size and the number of classes, respectively. The number of neurons in the first hidden layer is varied over a range, while the number of neurons in the second hidden layer is fixed. All weights and biases are initialized randomly and updated by the error back-propagation algorithm. The iteration stops once the convergence condition is met; in our tests, this happens when the accuracy on the test samples has fallen below that of the previous iteration more than three times.
Deep Belief Networks (DBNs) consist of a number of layers of Restricted Boltzmann Machines (RBMs), which are trained in a greedy layer-wise fashion. The bottom layer is the same as the input layer in DNNs, and the top layer serves as the hidden layer. In our experiment, a four-layer network consisting of two RBMs is constructed. The number of hidden neurons in the first RBM is varied over a range with a fixed step, and the number of hidden neurons in the second RBM is fixed. Each RBM is trained with contrastive divergence. All weights and biases are initialized to zero. Each RBM is trained on the full training set for one epoch, i.e., one full sweep of the data, using mini-batches and a fixed learning rate. After the first RBM has been trained, the entire training set is transformed through it, resulting in a new data set on which the second RBM is trained. The trained weights and biases are then used to initialize a feed-forward neural network whose last 10 neurons are the output label units. The feed-forward network is trained for one epoch with the sigmoid activation function using back-propagation, again with mini-batches and a fixed learning rate. Finally, the test samples are passed through the feed-forward network, and the output unit with the maximum value gives the label of each sample.
Convolutional Neural Networks (CNNs) are feed-forward neural networks trained by back-propagation, with a special architecture inspired by the visual system that alternates convolution layers and subsampling layers. Unlike the two models above, CNNs work on two-dimensional data directly. In our experiment, we use two convolution layers and two subsampling layers. The first layer has k feature maps, where k is varied over a range with a fixed step, connected to the single input layer through convolution kernels. The second layer is a mean-pooling layer. The third layer has a fixed number of feature maps, each connected to all k mean-pooling maps below through convolution kernels. The fourth layer is again a mean-pooling layer. The resulting feature maps are concatenated into a feature vector that feeds into the final layer, whose output neurons correspond to the class labels. The CNNs are trained with stochastic gradient descent on the training set for one epoch, using mini-batches and a fixed learning rate. The recognition rate is obtained by feeding the test samples into the trained networks and comparing the outputs with their true labels.
The numerical results are shown in Figure 3. R2DPCA achieves higher recognition accuracy than CNNs, DBNs and DNNs. Note that the recognition rates of CNNs, DBNs and DNNs do not reach high levels when the training set is small, but increase as the number of training samples grows.
Example 4.3.
In this experiment, we illustrate the generalization ability of R2DPCA. Randomly generated points are divided equally into two classes, each denoted by its own marker. From each class, a fixed number of points are chosen as training samples (plotted in magenta) and the rest serve as testing samples (plotted in blue). The principal component of the training points is computed by 2DPCA and by R2DPCA. In three random cases, the principal components computed by the two methods are plotted as black lines, each case with its own R2DPCA weighting vector. The variances (the larger the better) of the training set, the testing set and the whole set of points under the projections of 2DPCA and R2DPCA are shown in Table 2.
Variance of training points  Variance of testing points  Variance of the whole points  
2DPCA  R2DPCA  2DPCA  R2DPCA  2DPCA  R2DPCA  
5 Conclusion
This paper surveys the recent development of 2DPCA. We present a general ridge regression model for 2DPCA and its variations by the $L_p$-norm, with improved feature extraction from both row and column directions. To enhance the generalization ability, the relaxed 2DPCA (R2DPCA) is proposed with a general ridge regression model. R2DPCA is a generalization of 2DPCA, 2DPCA-$L_1$ and G2DPCA, and has higher generalization ability. Since it utilizes the label information, R2DPCA can be seen as a new supervised projection method, but it is totally different from two-dimensional linear discriminant analysis (2DLDA) ye05 ; lls08 . The R2DPCA-based approaches for face recognition and image reconstruction are also proposed, and the selected principal components are weighted to enhance the role of the main components. The properties and the effectiveness of the proposed methods are verified on practical face image databases. In the numerical experiments, R2DPCA performs better than 2DPCA, 2DPCA-$L_1$, G2DPCA, CNNs, DBNs, and DNNs.
Acknowledgments
This paper is supported in part by the National Natural Science Foundation of China under grant 11771188 and by a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
References
 (1) J. Yang, D. Zhang, A. F. Frangi, J. Y. Yang (2004) Two-dimensional PCA: A new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell., 26 (1), pp. 131-137.
 (2) D. Zhang and Z.-H. Zhou (2005) (2D)$^2$PCA: Two-directional two-dimensional PCA for efficient face representation and recognition, Neurocomputing, 69 (1-3), pp. 224-231.
 (3) X. Li, Y. Pang, and Y. Yuan (2010) $L_1$-norm-based 2DPCA, IEEE Trans. Syst., Man, Cybern. B, Cybern., 40 (4), pp. 1170-1175.
 (4) H. Wang and J. Wang (2013) 2DPCA with $L_1$-norm for simultaneously robust and sparse modelling, Neural Netw., 46, pp. 190-198.
 (5) J. Wang (2016) Generalized 2-D Principal Component Analysis by $L_p$-Norm for Image Analysis, IEEE Trans. Cybern., 46 (3), pp. 792-803.
 (6) Q. Gao et al. (2019) $R_1$-2DPCA and face recognition, IEEE Trans. Cybern., 49 (4), pp. 1212-1223.
 (7) D. Yu, X.-J. Wu (2017) 2DPCANet: a deep leaning network for face recognition, Multimed. Tools Appl., 4, pp. 1-16.
 (8) Y.-K. Li, X.-J. Wu, and J. Kittler (2019) L1-2D2PCANet: a deep learning network for face recognition, J. of Electronic Imaging, 28 (2), pp. 023016 (20 March 2019). https://doi.org/10.1117/1.JEI.28.2.023016
 (9) I. Jolliffe (2004) Principal Component Analysis, New York, NY, USA: Springer.
 (10) M. Turk, A. Pentland (1991) Eigenfaces for recognition, J. Cogn. Neurosci., 3 (1), pp. 71-86.
 (11) L. Sirovich, M. Kirby (1987) Low-dimensional procedure for characterization of human faces, J. Optical Soc. Am. 4, pp. 519-524.
 (12) M. Kirby, L. Sirovich (1990) Application of the Karhunen-Loeve procedure for the characterization of human faces, IEEE Trans. Pattern Anal. Mach. Intell., 12 (1), pp. 103-108.
 (13) M. Turk, A. Pentland (1991) Eigenfaces for recognition, J. Cognitive Neurosci., 3 (1), pp. 71-76.
 (14) L. Zhao, Y. Yang (1999) Theoretical analysis of illumination in PCA-based vision systems, Pattern Recogn., 32 (4), pp. 547-564.
 (15) A. Pentland (2000) Looking at people: sensing for ubiquitous and wearable computing, IEEE Trans. Pattern Anal. Mach. Intell., 22 (1), pp. 107-119.
 (16) Q. Ke and T. Kanade (2005) Robust $L_1$ norm factorization in the presence of outliers and missing data by alternative convex programming, Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1, San Diego, CA, USA, pp. 739-746.
 (17) C. Ding, D. Zhou, X. He, and H. Zha (2006) $R_1$-PCA: Rotational invariant $L_1$-norm principal component analysis for robust subspace factorization, Proc. 23rd Int. Conf. Mach. Learn., Pittsburgh, PA, USA, pp. 281-288.
 (18) N. Kwak (2008) Principal component analysis based on $L_1$-norm maximization, IEEE Trans. Pattern Anal. Mach. Intell., 30 (9), pp. 1672-1680.
 (19) H. Zou, T. Hastie, and R. Tibshirani (2006) Sparse principal component analysis, J. Comput. Graph. Stat., 15 (2), pp. 265-286.
 (20) A. d’Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet (2007) A direct formulation for sparse PCA using semidefinite programming, SIAM Rev., 49 (3), pp. 434-448.
 (21) H. Shen and J. Z. Huang (2008) Sparse principal component analysis via regularized low rank matrix approximation, J. Multivar. Anal., 99 (6), pp. 1015-1034.
 (22) D. M. Witten, R. Tibshirani, and T. Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics, 10 (3), pp. 515-534.
 (23) D. Meng, Q. Zhao, and Z. Xu (2012) Improve robustness of sparse PCA by $L_1$-norm maximization, Pattern Recognit., 45 (1), pp. 487-497.
 (24) N. Kwak (2014) Principal component analysis by $L_p$-norm maximization, IEEE Trans. Cybern., 44 (5), pp. 594-609.
 (25) Z. Liang, S. Xia, Y. Zhou, L. Zhang, and Y. Li (2013) Feature extraction based on $L_p$-norm generalized principal component analysis, Pattern Recognit. Lett., 34 (9), pp. 1037-1045.
 (26) Z. Jia, S. Ling, M. Zhao (2017) Color two-dimensional principal component analysis for face recognition based on quaternion model, LNCS, vol. 10361, pp. 177-189.
 (27) M. Zhao, Z. Jia, D. Gong (2018) Sample-relaxed two-dimensional color principal component analysis for face recognition and image reconstruction, arXiv:1803.03837v1, 10 Mar 2018.
 (28) Z. Jia, M. Wei, S. Ling (2013) A new structure-preserving method for quaternion Hermitian eigenvalue problems, J. Comput. Appl. Math. 239, pp. 12-24.
 (29) R. Ma, Z. Jia, Z. Bai (2018) A structure-preserving Jacobi algorithm for quaternion Hermitian eigenvalue problems, Comput. Math. Appl., 75 (3), pp. 809-820.
 (30) Z. Jia, R. Ma, M. Zhao (2017) A New Structure-Preserving Method for Recognition of Color Face Images, Computer Science and Artificial Intelligence, pp. 427-432.
 (31) Z. Jia, M. Wei, M. Zhao, Y. Chen (2018) A new real structure-preserving quaternion QR algorithm, J. Comput. Appl. Math. 343, pp. 26-48.
 (32) Z. Jia, M.K. Ng, G. Song (2018) Lanczos method for large-scale quaternion singular value decomposition, Numer. Algorithms, 08 November 2018. https://doi.org/10.1007/s11075-018-0621-0
 (33) Z. Jia, M.K. Ng, and G. Song (2019) Robust Quaternion Matrix Completion with Applications to Image Inpainting, Numer. Linear Algebra Appl., DOI:10.1002/nla.2245. http://www.math.hkbu.edu.hk/~mng/quaternion.html
 (34) L. Mackey (2008) Deflation methods for sparse PCA, Proc. Adv. Neural Inf. Process. Syst., 21, Whistler, BC, Canada, pp. 1017-1024.
 (35) P.N. Belhumeur, J. Hespanha, D. Kriegman (1997) Eigenfaces vs Fisherfaces: Recognition using class specific linear projection, IEEE Trans. Pattern Anal. Machine Intell., 19 (7), pp. 711-720.
 (36) J. Ye (2005) Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems, J. Machine Learning Res., 6, pp. 483-502.
 (37) P. Howland, H. Park (2004) Generalized discriminant analysis using the generalized singular value decomposition, IEEE Trans. Pattern Anal. Machine Intell., 8, pp. 995-1006.
 (38) J. Ye, Q. Li (2005) A two-stage discriminant analysis via QR decomposition, IEEE Trans. Pattern Anal. Machine Intell., 27 (6), pp. 929-941.
 (39) Z.Z. Liang, Y.F. Li, P.F. Shi (2008) A note on two-dimensional linear discriminant analysis, Pattern Recognit. Lett., 29, pp. 2122-2128.
 (40) S. Kongsontana, Y. Rangsanseri (2005) Face recognition using 2DLDA algorithm, in: Proc. 8th Internat. Symp. Signal Process. Appl., pp. 675-678.
 (41) H. Xiong, M.N.S. Swamy, M.O. Ahmad (2005) Two-dimensional FLD for face recognition, Pattern Recognition, 38 (7), pp. 1121-1124.
 (42) J. Yang, J.Y. Yang, A.F. Frangi, D. Zhang (2003) Uncorrelated projection discriminant analysis and its application to face image feature extraction, Internat. J. Pattern Recognition Artificial Intell., 17 (8), pp. 1325-1347.
 (43) M. Li, B. Yuan (2005) 2D-LDA: A novel statistical linear discriminant analysis for image matrix, Pattern Recognition Lett., 26 (5), pp. 527-532.
 (44) D. Cho, U. Chang, K. Kim, B. Kim, S. Lee (2006) (2D)$^2$DLDA for efficient face recognition, LNCS 4319, pp. 314-321.
 (45) R.B. Palm (2012) Prediction as a candidate for learning deep hierarchical models of data, Technical University of Denmark.
 (46) M. Nielsen (2013) Neural Networks and Deep Learning [online]. http://neuralnetworksanddeeplearning.com