1 Introduction
Advances in sensing technologies have led to the continuous generation of massive multidimensional data, used in a wide range of applications. Their successful exploitation, however, depends directly on the effectiveness of the pattern recognition methods employed for their analysis. Despite their high dimensionality, such data are often characterized by large amounts of redundancy, occupying a subspace of the input space [1]. In this context, feature extraction for subspace learning plays a crucial role in the mapping of high-dimensional data to a low-dimensional space
[2, 3, 4, 5]. However, feature extraction is often a challenging task due to the complex distribution of input data [6], especially in cases of limited training samples [7, 8]. The goal of feature extraction is to extract information regarding the underlying nature of the data. Unsupervised feature extraction methods in particular aim at capturing the principal statistical relations within the data and representing them in lower-dimensional spaces. Methods of this type are referred to as unsupervised subspace learning and include techniques such as 2D Principal Component Analysis (2DPCA) [9], Generalized Low Rank Approximation of Matrices (GLRAM) [10], Concurrent Subspace Analysis [11] and Multilinear Principal Component Analysis (MPCA) [2]. Such methods can also decrease the computational cost of pattern recognition algorithms (e.g. for classification or regression) through the reduction of data dimensionality.
The main objective of pattern recognition, however, is the extraction of features capable of discriminating between different classes. Although unsupervised subspace learning can provide a valuable tool for data analysis, the features it extracts are not necessarily the salient features required to discriminate among pattern classes, since the problem of finding discriminative features is conceptually and fundamentally different from that of mapping data to a lower-dimensional space. Different sets of features should be used for different classes, which means that the feature extraction process should be conducted in a supervised manner (supervised subspace learning).
In this paper, we propose a supervised subspace learning method motivated by the Common Spatial Patterns (CSP) algorithm [12, 13]. The CSP algorithm is based on a modification of the Karhunen-Loeve expansion [14], aiming at extracting features that increase inter-class separability. The application of CSP, however, is restricted to 2D data. Motivated by this fact, we extend the CSP algorithm to tensor objects of arbitrary order. In particular, we extract the common patterns corresponding to each mode of the tensor objects (hence the name Common Mode Patterns, CMP) that increase the separability between two classes.
2 Preliminaries
In this section we present some tensor algebra definitions and operations that will be used throughout this work. Tensor objects are denoted in calligraphic uppercase letters, matrices in bold uppercase letters, vectors in bold lowercase letters and scalars in lowercase letters.
Tensor matricization. Mode-$n$ matricization maps a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ into a matrix $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times \prod_{i \neq n} I_i}$, by arranging the mode-$n$ fibers to be the columns of the resulting matrix.
Mode-$n$ product. The mode-$n$ product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$, denoted as $\mathcal{X} \times_n \mathbf{U}$, is a tensor in $\mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ with entries $(\mathcal{X} \times_n \mathbf{U})_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 \cdots i_N}\, u_{j i_n}$.
Scalar product. The scalar product of two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is denoted as $\langle \mathcal{A}, \mathcal{B} \rangle$ and is equal to $\sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} a_{i_1 \cdots i_N}\, b_{i_1 \cdots i_N}$.
Tensor norm. The Frobenius norm of a tensor $\mathcal{X}$ is defined as $\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.
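The three operations above can be sketched in NumPy; the helper names (`unfold`, `fold`, `mode_n_product`) are illustrative and not part of the paper, and a row-major (C-order) arrangement of the mode-$n$ fibers is assumed.

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: arrange the mode-n fibers as columns."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold, recovering a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape([shape[n]] + rest), 0, n)

def mode_n_product(X, U, n):
    """Mode-n product X x_n U, for U of shape (J, I_n)."""
    new_shape = list(X.shape)
    new_shape[n] = U.shape[0]
    return fold(U @ unfold(X, n), n, new_shape)
```

Since matricization merely rearranges the entries of a tensor, the Frobenius norm of a tensor equals the Frobenius norm of any of its mode-$n$ matricizations; this identity is used later in the proof of Theorem 1.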
Average total scatter. The average total scatter of a set of tensors $\{\mathcal{X}_m \in \mathbb{R}^{I_1 \times \cdots \times I_N}\}_{m=1}^{M}$ is defined as

$$\Psi = \frac{1}{M} \sum_{m=1}^{M} \left\| \mathcal{X}_m - \bar{\mathcal{X}} \right\|_F^2 \qquad (1)$$

where $\bar{\mathcal{X}} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{X}_m$.
Average mode-$n$ scatter matrix. The average mode-$n$ scatter matrix of a set of tensors $\{\mathcal{X}_m\}_{m=1}^{M}$ is defined as

$$\mathbf{S}^{(n)} = \frac{1}{M} \sum_{m=1}^{M} \left(\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)}\right) \left(\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)}\right)^T \qquad (2)$$

where $\bar{\mathcal{X}} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{X}_m$, and $\mathbf{X}_{m(n)}$ is the mode-$n$ matricization of $\mathcal{X}_m$.
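As a sanity check on these two definitions, the trace of the average mode-$n$ scatter matrix equals the average total scatter for every mode $n$, since matricization preserves the Frobenius norm. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: arrange the mode-n fibers as columns."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def average_total_scatter(samples):
    """Eq. (1): mean squared Frobenius distance from the sample mean."""
    mean = np.mean(samples, axis=0)
    return np.mean([np.linalg.norm(X - mean) ** 2 for X in samples])

def mode_n_scatter(samples, n):
    """Eq. (2): average mode-n scatter matrix of the samples."""
    mean = np.mean(samples, axis=0)
    return np.mean([unfold(X - mean, n) @ unfold(X - mean, n).T
                    for X in samples], axis=0)
```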
3 Problem Formulation
We consider a binary classification problem, where the samples are tensor objects. Let $\mathcal{X}_m^{(c)} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, $m = 1, \dots, M_c$, be the set of samples that belong to the $c$th class, $c \in \{1, 2\}$, and $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ a set of matrices, where $\mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times J_n}$ with $J_n \leq I_n$. The projection of any $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ onto the subspace defined by the matrices $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ is defined as

$$\mathcal{Y} = \mathcal{X} \times_1 \mathbf{U}^{(1)T} \times_2 \mathbf{U}^{(2)T} \times_3 \cdots \times_N \mathbf{U}^{(N)T}. \qquad (3)$$
A matrix representation of this projection can be obtained through the mode-$n$ matricization of $\mathcal{Y}$ and $\mathcal{X}$ as

$$\mathbf{Y}_{(n)} = \mathbf{U}^{(n)T} \mathbf{X}_{(n)} \mathbf{U}_{\Phi}^{(n)} \qquad (4)$$

with

$$\mathbf{U}_{\Phi}^{(n)} = \mathbf{U}^{(n+1)} \otimes \cdots \otimes \mathbf{U}^{(N)} \otimes \mathbf{U}^{(1)} \otimes \cdots \otimes \mathbf{U}^{(n-1)}. \qquad (5)$$

In relation (5), the operator $\otimes$ denotes the Kronecker product.
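The equivalence between the multilinear projection and its matricized form can be checked numerically on a third-order example. Note that the ordering of the Kronecker factors depends on the matricization convention: with the row-major unfolding used in this sketch, the remaining factors appear in forward mode order, whereas Eq. (5) orders them to match the paper's own matricization convention. All names below are illustrative.

```python
import numpy as np

def unfold(T, n):
    """Row-major mode-n matricization."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

rng = np.random.default_rng(0)
I, J = (4, 5, 6), (2, 3, 4)
X = rng.standard_normal(I)
Us = [rng.standard_normal((I[n], J[n])) for n in range(3)]

# Eq. (3): Y = X x_1 U1^T x_2 U2^T x_3 U3^T, written as one contraction.
Y = np.einsum('abc,ai,bj,ck->ijk', X, *Us)

# Eq. (4) for n = 0: Y_(n) = U^(n)T X_(n) U_phi, where U_phi is the
# Kronecker product of the remaining factors (forward order for this
# unfolding convention).
U_phi = np.kron(Us[1], Us[2])
Y_n = Us[0].T @ unfold(X, 0) @ U_phi
assert np.allclose(Y_n, unfold(Y, 0))
```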
The objective of this work is to project the tensor samples
onto a subspace where the explained variance of each class is maximized and the distinct properties of each class are emphasized. The projected samples thus minimize information loss and can discriminate between the two pattern classes. By assuming that the variance of a class can be measured by the average total scatter of the tensor samples belonging to this class [2], we can formally define the problem that needs to be solved for achieving the aforementioned objective.

Problem 1. Find the set of projection matrices $\{\mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times J_n}\}_{n=1}^{N}$ that maximizes the average total scatter of the projected samples of each class, i.e.

$$\{\mathbf{U}^{(n)}\}_{n=1}^{N} = \arg\max \Psi_c \quad \text{for } c = 1, 2, \qquad (6)$$

while the resulting projection emphasizes a different set of features for each class.
In Problem 1, defined above, $\mathcal{Y}_m^{(c)}$ is the projection of $\mathcal{X}_m^{(c)}$ onto a subspace using the projection matrices $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$, while

$$\Psi_c = \frac{1}{M_c} \sum_{m=1}^{M_c} \left\| \mathcal{Y}_m^{(c)} - \bar{\mathcal{Y}}^{(c)} \right\|_F^2 \qquad (7)$$

and

$$\bar{\mathcal{Y}}^{(c)} = \frac{1}{M_c} \sum_{m=1}^{M_c} \mathcal{Y}_m^{(c)}. \qquad (8)$$
Remark 1. Suppose that $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ is a set of projection matrices that satisfies (6) either for $c=1$ or for $c=2$. Then, as is shown in [2], each matrix $\mathbf{U}^{(n)}$, $n = 1, \dots, N$, consists of the eigenvectors corresponding to the $J_n$ largest eigenvalues of the matrix

$$\mathbf{\Phi}_c^{(n)} = \frac{1}{M_c} \sum_{m=1}^{M_c} \left(\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)}\right) \mathbf{U}_{\Phi}^{(n)} \mathbf{U}_{\Phi}^{(n)T} \left(\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)}\right)^T \qquad (9)$$

where $\mathbf{U}_{\Phi}^{(n)}$ is as in (5). However, in Problem 1, the set of projection matrices should satisfy (6) both for $c=1$ and for $c=2$, and, at the same time, the resulting projection should emphasize a different set of features for each class.
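The computation in Remark 1 can be sketched as follows: build the mode-$n$ matrix of Eq. (9) from the class samples and the other modes' projection matrices, then keep its leading eigenvectors. Helper names are illustrative, and the Kronecker factors are taken in forward order to match the row-major unfolding used in this sketch.

```python
import numpy as np

def unfold(X, n):
    """Row-major mode-n matricization."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def phi_matrix(samples, Us, n):
    """Eq. (9): mode-n scatter weighted by the other modes' projections."""
    mean = np.mean(samples, axis=0)
    U_phi = np.eye(1)                       # 1x1 seed for the Kronecker chain
    for i, U in enumerate(Us):
        if i != n:
            U_phi = np.kron(U_phi, U)
    W = U_phi @ U_phi.T
    return np.mean([unfold(X - mean, n) @ W @ unfold(X - mean, n).T
                    for X in samples], axis=0)

def top_eigvecs(S, k):
    """Eigenvectors of a symmetric S for its k largest eigenvalues."""
    w, V = np.linalg.eigh(S)                # eigh returns ascending order
    return V[:, ::-1][:, :k]
```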
4 Common Mode Patterns
4.1 Normalization Process
For the CMP algorithm to extract the important features required for separating two pattern classes, a preprocessing step, in the form of a normalization process, is necessary. We present this normalization process below.
Suppose that we have at our disposal a set of raw tensor measurements (samples) $\tilde{\mathcal{X}}_m^{(c)} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, $m = 1, \dots, M_c$, that belong to the $c$th class. Based on these samples we can define the matrix

$$\mathbf{R}_c^{(n)} = \frac{1}{M_c} \sum_{m=1}^{M_c} \tilde{\mathbf{X}}_{m(n)}^{(c)} \tilde{\mathbf{X}}_{m(n)}^{(c)T}. \qquad (10)$$

For every $n$ and $c$ the matrix $\tilde{\mathbf{X}}_{m(n)}^{(c)} \tilde{\mathbf{X}}_{m(n)}^{(c)T}$ is symmetric. Hence, matrix $\mathbf{R}_c^{(n)}$ is also symmetric, since it is the weighted sum of symmetric matrices.
Let us define the symmetric matrix $\mathbf{R}^{(n)} = \mathbf{R}_1^{(n)} + \mathbf{R}_2^{(n)}$. Since $\mathbf{R}^{(n)}$ is symmetric, there exists the transformation matrix

$$\mathbf{P}^{(n)} = \mathbf{\Lambda}^{(n)-1/2} \mathbf{V}^{(n)T} \qquad (11)$$

such that

$$\mathbf{P}^{(n)} \mathbf{R}^{(n)} \mathbf{P}^{(n)T} = \mathbf{I}. \qquad (12)$$

In (11), $\mathbf{\Lambda}^{(n)}$ stands for the diagonal matrix of eigenvalues of $\mathbf{R}^{(n)}$, while $\mathbf{V}^{(n)}$ for the matrix of eigenvectors of $\mathbf{R}^{(n)}$.
Following the above normalization process, we define the mode-$n$ matricization of the tensor objects $\mathcal{X}_m^{(c)}$, which need to be projected onto a subspace, as

$$\mathbf{X}_{m(n)}^{(c)} = \mathbf{P}^{(n)} \tilde{\mathbf{X}}_{m(n)}^{(c)}. \qquad (13)$$
The normalization process takes place before the projection and actually corresponds to a linear transformation applied to the tensor objects.
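The whitening step of Eqs. (10)-(12) can be sketched as follows; the function name is illustrative, and the per-class matrices are built from random data purely for the check.

```python
import numpy as np

def whitening_transform(R):
    """Eq. (11): P = Lambda^{-1/2} V^T for a symmetric positive definite R."""
    w, V = np.linalg.eigh(R)
    return np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((6, 40)), rng.standard_normal((6, 40))
R1 = A1 @ A1.T / 40    # stand-in for a per-class matrix of Eq. (10)
R2 = A2 @ A2.T / 40
P = whitening_transform(R1 + R2)
# Eq. (12): the composite matrix is normalized to the identity,
assert np.allclose(P @ (R1 + R2) @ P.T, np.eye(6))
# so the two normalized class matrices sum to the identity.
assert np.allclose(P @ R1 @ P.T + P @ R2 @ P.T, np.eye(6))
```

The second identity is what makes the eigenvalues of the two normalized class matrices complementary, which the proof of Theorem 1 exploits.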
4.2 The CMP Algorithm
This section presents the CMP algorithm, which constitutes the core contribution of this paper. The CMP algorithm is based on Theorem 1 below.
Theorem 1.
Let $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ be the solution to Problem 1. Then, given all other projection matrices $\mathbf{U}^{(1)}, \dots, \mathbf{U}^{(n-1)}, \mathbf{U}^{(n+1)}, \dots, \mathbf{U}^{(N)}$, matrix $\mathbf{U}^{(n)}$ consists of the eigenvectors corresponding to the $J_n/2$ largest eigenvalues of the matrix $\mathbf{\Phi}_1^{(n)}$ and the eigenvectors corresponding to the $J_n/2$ largest eigenvalues of the matrix $\mathbf{\Phi}_2^{(n)}$.
Proof.
From the definition of the Frobenius norm for a tensor and that for a matrix, $\|\mathcal{Y}\|_F = \|\mathbf{Y}_{(n)}\|_F$, and from Eq. (4), it holds that

$$\Psi_c = \frac{1}{M_c} \sum_{m=1}^{M_c} \left\| \mathbf{U}^{(n)T} \left(\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)}\right) \mathbf{U}_{\Phi}^{(n)} \right\|_F^2. \qquad (14)$$

Moreover, from Eq. (9), $\Psi_c$ can be written as

$$\Psi_c = \mathrm{tr}\left(\mathbf{U}^{(n)T} \mathbf{\Phi}_c^{(n)} \mathbf{U}^{(n)}\right). \qquad (15)$$

The maximum of $\mathrm{tr}(\mathbf{U}^{(n)T} \mathbf{\Phi}_c^{(n)} \mathbf{U}^{(n)})$ is obtained if $\mathbf{U}^{(n)}$ consists of the eigenvectors of matrix $\mathbf{\Phi}_c^{(n)}$ corresponding to the largest eigenvalues. Since we want to maximize $\Psi_c$ simultaneously for $c=1$ and $c=2$, matrix $\mathbf{U}^{(n)}$ will consist of the eigenvectors corresponding to the $J_n/2$ largest eigenvalues of matrix $\mathbf{\Phi}_1^{(n)}$ and the eigenvectors corresponding to the $J_n/2$ largest eigenvalues of matrix $\mathbf{\Phi}_2^{(n)}$.
Let us denote by $\mathbf{\Sigma}_c^{(n)}$ the matrix

$$\mathbf{\Sigma}_c^{(n)} = \frac{1}{M_c} \sum_{m=1}^{M_c} \mathbf{X}_{m(n)}^{(c)} \mathbf{X}_{m(n)}^{(c)T}. \qquad (16)$$

After the normalization process [see Eq. (12)]

$$\mathbf{\Sigma}_1^{(n)} + \mathbf{\Sigma}_2^{(n)} = \mathbf{I}. \qquad (17)$$

The eigenvalues and eigenvectors of $\mathbf{\Sigma}_1^{(n)}$ are given by

$$\mathbf{\Sigma}_1^{(n)} \mathbf{v}_j = \lambda_j \mathbf{v}_j \qquad (18)$$

with $\lambda_j \in [0, 1]$. From Eqs. (17) and (18) we have that

$$\mathbf{\Sigma}_2^{(n)} \mathbf{v}_j = (1 - \lambda_j) \mathbf{v}_j. \qquad (19)$$

The same holds for the matrices $\mathbf{\Phi}_c^{(n)}$ in relation (15), since they are similar to (i.e., have the same eigenvalues as) the matrices in relation (16). For this to become clearer, note that $\mathbf{U}_{\Phi}^{(n)} \mathbf{U}_{\Phi}^{(n)T} = \mathbf{I}$, since $\mathbf{U}_{\Phi}^{(n)}$ is the Kronecker product of orthogonal matrices, and thus it is also orthogonal.
From Eq. (19), we have that the important features for the first class are the least important features for the second class, and vice versa. This means that after the projection, the two classes cannot share common important features. ∎
The CMP algorithm is presented in Algorithm 1. Note that during the estimation of the set $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ only the matrices $\mathbf{\Phi}_1^{(n)}$ are used. The matrices $\mathbf{\Phi}_2^{(n)}$ are not employed in the algorithm, since $\mathbf{\Phi}_1^{(n)}$ and $\mathbf{\Phi}_2^{(n)}$ have the same eigenvectors and reversely ordered eigenvalues.
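A simplified, single-mode sketch of the idea behind the algorithm (not the full Algorithm 1: the other modes' projections are treated as the identity, and all names are illustrative). It whitens the mode-$n$ second-moment matrices and then pairs the top eigenvectors of the normalized class-1 matrix with its bottom ones, which by the complementary-eigenvalue relation of Eq. (19) are the top eigenvectors for class 2.

```python
import numpy as np

def unfold(X, n):
    """Row-major mode-n matricization."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def cmp_mode_matrix(class1, class2, n, k):
    """Return 2k mode-n patterns: k for class 1 and k for class 2."""
    second_moment = lambda S: np.mean(
        [unfold(X, n) @ unfold(X, n).T for X in S], axis=0)
    R1, R2 = second_moment(class1), second_moment(class2)
    w, V = np.linalg.eigh(R1 + R2)
    P = np.diag(1.0 / np.sqrt(w)) @ V.T      # whitening, Eq. (11)
    _, E = np.linalg.eigh(P @ R1 @ P.T)      # ascending eigenvalues
    # Last k columns: class-1 patterns; first k columns: class-2 patterns,
    # since the normalized class matrices have reversely ordered eigenvalues.
    return np.concatenate([E[:, -k:], E[:, :k]], axis=1)
```

The returned matrix operates on normalized (whitened) mode-$n$ matricizations, in line with Eq. (13).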
5 Experimental Results
In this study we validated the CMP methodology using the widely known and publicly available Pavia University hyperspectral imaging dataset, which comprises 103 spectral bands (see Fig. 1). The ground truth contains 9 classes, while pixels in white color are not annotated.
The CMP method was developed for binary classification problems. Thus, we grouped together pixels that depict man-made objects and discriminated them from the rest of the pixels. For this dataset, pixels that depict man-made objects are labeled as asphalt, metal sheets, bricks and bitumen. Then, the tagged parts of the dataset were split into two sets, i.e. training and testing data. The training set was created by selecting 200 samples from each class.
In order to classify a pixel on the image plane, we followed the approach presented in [16], according to which the image is split, along its spatial dimensions, into overlapping patches of size $s \times s \times B$, where $B$ is the number of spectral bands. Then, it is assumed that the label of a pixel is the same as the label of the patch centered at that pixel's location.

During experimental validation we compared the proposed CMP method against MPCA [2], using three different classifiers: Rank-1 Tensor Regression (Rank-1 TR) [17], a CNN [16, 18], and Rank-1 FNN [7, 8]. The efficiency of CMP and MPCA was quantified in terms of the classification accuracy of these classifiers on the testing set. In our experiments, we set the parameter $s$ equal to 7, and required both MPCA and CMP to reduce the spatial dimensions of the samples to $7 \times 7$ elements. Then, two sets of experiments were conducted. In the first one, the spectral dimensionality of the samples was reduced by selecting: the 26 principal components using MPCA (MPCA-26); the 26 principal components for each pattern class using CMP (CMP-26); and the 13 principal components for each pattern class using CMP (CMP-13), so that the dimensionality along the spectral dimension is 26. In the second one, the spectral dimensionality of the samples was reduced by selecting: the 10 principal components using MPCA (MPCA-10); the 10 principal components for each pattern class using CMP (CMP-10); and the 5 principal components for each pattern class using CMP (CMP-5). In the first experiment the size of the dataset was reduced 4 times, while in the second one 10 times.
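The patch-based labeling described above can be sketched as follows; zero padding at the image border is an assumption of this sketch (not stated in the paper), and the function name is illustrative.

```python
import numpy as np

def extract_patches(img, s):
    """Overlapping s x s x B patches, one centered at each pixel.
    Each patch inherits the label of its center pixel; the border is
    zero-padded (an assumption of this sketch)."""
    h, w, b = img.shape
    r = s // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)))
    patches = np.stack([padded[i:i + s, j:j + s, :]
                        for i in range(h) for j in range(w)])
    return patches.reshape(h, w, s, s, b)
```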
Table 1: Classification accuracy (%) of each classifier on the testing set.

            CNN      Rank-1 FNN   Rank-1 TR
MPCA-26     85.08    86.80        77.56
CMP-26      90.41    91.25        77.96
CMP-13      88.57    88.67        76.90
MPCA-10     83.49    84.39        77.59
CMP-10      88.23    88.31        77.52
CMP-5       86.76    86.27        76.08
The comparison between MPCA and CMP is presented in Table 1. The CMP method is more efficient than MPCA at reducing the dimensionality of tensor objects, regardless of the classification model used, owing to the fact that it can exploit label information. In other words, CMP is a supervised subspace learning technique, while MPCA is an unsupervised one. For Rank-1 TR the classification accuracy is almost the same whether MPCA or CMP is used. This is justified by the fact that Rank-1 TR is a linear classifier and, due to its low capacity, cannot perform any better on this dataset.
6 Conclusion
In this work, we presented the CMP method, a supervised tensor subspace learning technique, which ensures that tensor objects that belong to different classes will not share common important features after dimensionality reduction. The CMP method was compared against MPCA, and experimental results indicate that it can reduce the dimensionality of tensor objects in a more efficient way. However, the main limitation of CMP is that it is designed for binary classification problems. Therefore, the main focus of our future work is, first, to extend this approach to multiclass classification problems. Another priority of our future work includes the evaluation of CMP efficiency on more datasets with comparisons against other supervised tensor subspace learning methods.
References

[1] Gregory Shakhnarovich and Baback Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19–49. Springer, 2011.
[2] Haiping Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos, "MPCA: Multilinear principal component analysis of tensor objects," IEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 18–39, 2008.
[3] Feiping Nie, Shiming Xiang, Yangqiu Song, and Changshui Zhang, "Extracting the optimal dimensionality for local tensor discriminant analysis," Pattern Recognition, vol. 42, no. 1, pp. 105–114, 2009.
[4] Zhihui Lai, Yong Xu, Jian Yang, Jinhui Tang, and David Zhang, "Sparse tensor discriminant analysis," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3904–3915, 2013.
[5] Weiming Hu, Xi Li, Xiaoqin Zhang, Xinchu Shi, Stephen Maybank, and Zhongfei Zhang, "Incremental tensor subspace learning and its applications to foreground segmentation and tracking," International Journal of Computer Vision, vol. 91, no. 3, pp. 303–327, 2011.
[6] Haiping Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos, "A survey of multilinear subspace learning for tensor data," Pattern Recognition, vol. 44, no. 7, pp. 1540–1551, 2011.
[7] Konstantinos Makantasis, Anastasios Doulamis, Nikolaos Doulamis, and Antonis Nikitakis, "Tensor-based classifiers for hyperspectral data analysis," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[8] Konstantinos Makantasis, Anastasios Doulamis, Nikolaos Doulamis, Antonis Nikitakis, and Athanasios Voulodimos, "Tensor-based nonlinear classifier for high-order data analysis," arXiv preprint arXiv:1802.05981, 2018.
[9] Jian Yang, David Zhang, Alejandro F. Frangi, and Jing-yu Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
[10] Jieping Ye, "Generalized low rank approximations of matrices," Machine Learning, vol. 61, no. 1-3, pp. 167–191, 2005.
[11] Dong Xu, Shuicheng Yan, Lei Zhang, Stephen Lin, Hong-Jiang Zhang, and Thomas S. Huang, "Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 1, pp. 36–47, 2008.
[12] Herbert Ramoser, Johannes Muller-Gerking, and Gert Pfurtscheller, "Optimal spatial filtering of single trial EEG during imagined hand movement," IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
[13] Benjamin Blankertz, Ryota Tomioka, Steven Lemm, Motoaki Kawanabe, and Klaus-Robert Muller, "Optimizing spatial filters for robust EEG single-trial analysis," IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
[14] Keinosuke Fukunaga and Warren L. G. Koontz, "Application of the Karhunen-Loeve expansion to feature selection and ordering," IEEE Transactions on Computers, vol. C-19, no. 4, pp. 311–318, 1970.
[15] Mauro Dalla Mura, Alberto Villa, Jon Atli Benediktsson, Jocelyn Chanussot, and Lorenzo Bruzzone, "Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 3, pp. 542–546, 2011.
[16] Konstantinos Makantasis, Konstantinos Karantzalos, Anastasios Doulamis, and Nikolaos Doulamis, "Deep supervised learning for hyperspectral data classification through convolutional neural networks," in Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International. IEEE, 2015, pp. 4959–4962.
[17] Hua Zhou, Lexin Li, and Hongtu Zhu, "Tensor regression with applications in neuroimaging data analysis," Journal of the American Statistical Association, vol. 108, no. 502, pp. 540–552, 2013.
[18] Konstantinos Makantasis, Konstantinos Karantzalos, Anastasios Doulamis, and Konstantinos Loupos, "Deep learning-based man-made object detection from hyperspectral data," in International Symposium on Visual Computing. Springer, 2015, pp. 717–727.