Common Mode Patterns for Supervised Tensor Subspace Learning

02/06/2019 ∙ by Konstantinos Makantasis, et al.

In this work we propose a method for reducing the dimensionality of tensor objects in a binary classification framework. The proposed Common Mode Patterns (CMP) method takes the label information into consideration and ensures that tensor objects belonging to different classes do not share common features after the reduction of their dimensionality. We experimentally validate the proposed supervised subspace learning technique and compare it against Multilinear Principal Component Analysis (MPCA) using a publicly available hyperspectral imaging dataset. Experimental results indicate that the proposed CMP method can efficiently reduce the dimensionality of tensor objects, while, at the same time, increasing inter-class separability.




1 Introduction

Advances in sensing technologies have led to the continuous generation of massive multidimensional data, used in a wide range of applications. Their successful exploitation, however, is directly linked to the effectiveness of the pattern recognition methods employed for their analysis. Despite its high dimensionality, this kind of data is often characterized by large amounts of redundancy, occupying a subspace of the input space [1]. In this context, feature extraction for subspace learning plays a crucial role in mapping high-dimensional data to a low-dimensional space [2, 3, 4, 5]. However, feature extraction is often a challenging task due to the complex distribution of the input data [6], especially in cases of limited training samples [7, 8].

The goal of feature extraction is to extract information regarding the underlying nature of the data. Unsupervised feature extraction methods in particular aim at capturing the principal statistical relations within the data and representing them in lower-dimensional spaces. Methods of this type are referred to as unsupervised subspace learning and include techniques such as 2D Principal Component Analysis (2D-PCA) [9], Generalized Low Rank Approximation of Matrices (GLRAM) [10], Concurrent Subspace Analysis [11] and Multilinear Principal Component Analysis (MPCA) [2]. Such methods can also decrease the computational cost of pattern recognition algorithms (e.g. for classification or regression) through the reduction of data dimensionality.

The main objective of pattern recognition, however, is the extraction of features capable of discriminating between different classes. Although unsupervised subspace learning can provide a valuable tool for data analysis, the extracted features are not necessarily the salient features required to discriminate among pattern classes, since finding discriminative features is conceptually and fundamentally different from mapping data to a lower-dimensional space. Different sets of features should be used for different classes, which means that the feature extraction process should be conducted in a supervised manner (supervised subspace learning).

In this paper, we propose a supervised subspace learning method motivated by the Common Spatial Patterns (CSP) [12, 13] algorithm. The CSP algorithm is based on a modification of the Karhunen-Loeve expansion [14], aiming at extracting features that increase inter-class separability. The application of CSP, however, is restricted to 2D data. Motivated by this fact, we extend the CSP algorithm to tensor objects of arbitrary order. In particular, we extract the common patterns corresponding to each mode of the tensor objects -- hence the name Common Mode Patterns (CMP) -- that increase the separability between two classes.

2 Preliminaries

In this section we present some tensor algebra definitions and operations that will be used throughout this work. Tensor objects are denoted by calligraphic uppercase letters, matrices by bold uppercase letters, vectors by bold lowercase letters and scalars by lowercase letters.

Tensor matricization. The mode-$n$ matricization maps a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ into a matrix $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times \prod_{k \neq n} I_k}$, by arranging the mode-$n$ fibers to be the columns of the resulting matrix.

$n$-mode product. The $n$-mode product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$, denoted as $\mathcal{X} \times_n \mathbf{U}$, is a tensor in $\mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ with entries $(\mathcal{X} \times_n \mathbf{U})_{i_1 \cdots i_{n-1} \, j \, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 \cdots i_N} \, u_{j i_n}$.

Scalar product. The scalar product of two tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is denoted as $\langle \mathcal{X}, \mathcal{Y} \rangle$ and is equal to $\sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1 \cdots i_N} \, y_{i_1 \cdots i_N}$.

Tensor norm. The Frobenius norm of a tensor $\mathcal{X}$ is defined as $\| \mathcal{X} \|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.

Average total scatter. The average total scatter of a set of tensors $\{\mathcal{X}_m \in \mathbb{R}^{I_1 \times \cdots \times I_N}\}_{m=1}^{M}$ is defined as

$\Psi_{\mathcal{X}} = \frac{1}{M} \sum_{m=1}^{M} \| \mathcal{X}_m - \bar{\mathcal{X}} \|_F^2$   (1)

where $\bar{\mathcal{X}} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{X}_m$.

Average mode-$n$ scatter matrix. The average mode-$n$ scatter matrix of a set of tensors $\{\mathcal{X}_m\}_{m=1}^{M}$ is defined as

$\mathbf{S}^{(n)} = \frac{1}{M} \sum_{m=1}^{M} (\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)}) (\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)})^T$   (2)

where $\bar{\mathcal{X}} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{X}_m$, and $\bar{\mathbf{X}}_{(n)}$ is the mode-$n$ matricization of $\bar{\mathcal{X}}$.
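These operations can be made concrete with a short NumPy sketch (the helper names and the C-order column convention of the matricization are our choices, not notation from the paper):

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: mode-n fibers become the columns of the result."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(M, n, shape):
    """Inverse of unfold: rebuild the tensor with the given shape."""
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(full), 0, n)

def mode_n_product(X, U, n):
    """n-mode product X x_n U: contract mode n of X with the rows of U."""
    shape = list(X.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(X, n), n, shape)

X = np.random.rand(3, 4, 5)
U = np.random.rand(2, 4)
Y = mode_n_product(X, U, 1)   # replaces mode 1 (size 4) with size 2
print(Y.shape)                # (3, 2, 5)
```

The Frobenius norm of a tensor is then simply `np.sqrt((X ** 2).sum())`, matching the scalar-product definition above.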

3 Problem Formulation

We consider a binary classification problem, where the samples are tensor objects. Let $\mathcal{X}_m^{(c)} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, $m = 1, \ldots, M_c$, be the set of samples that belong to the $c$-th class, $c \in \{1, 2\}$, and let

$\{ \mathbf{U}^{(n)} \in \mathbb{R}^{I_n \times P_n}, \; n = 1, \ldots, N \}$   (3)

be a set of projection matrices, where $P_n \leq I_n$ for $n = 1, \ldots, N$. The projection of any $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ onto the subspace $\mathbb{R}^{P_1 \times \cdots \times P_N}$ is defined as

$\mathcal{Y} = \mathcal{X} \times_1 \mathbf{U}^{(1)T} \times_2 \mathbf{U}^{(2)T} \times_3 \cdots \times_N \mathbf{U}^{(N)T}.$   (4)

A matrix representation of this projection can be obtained through the mode-$n$ matricization of $\mathcal{X}$ and $\mathcal{Y}$ as

$\mathbf{Y}_{(n)} = \mathbf{U}^{(n)T} \mathbf{X}_{(n)} \mathbf{U}_{\Phi^{(n)}}, \quad \mathbf{U}_{\Phi^{(n)}} = \mathbf{U}^{(n+1)} \otimes \cdots \otimes \mathbf{U}^{(N)} \otimes \mathbf{U}^{(1)} \otimes \cdots \otimes \mathbf{U}^{(n-1)}.$   (5)

In relation (5), the operator $\otimes$ denotes the Kronecker product.
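As a consistency check, the multilinear projection of relation (4) can be compared numerically against a matricized Kronecker form. Note that with the C-order unfolding used here the remaining factors enter the Kronecker product in increasing mode order; the ordering in relation (5) follows the paper's own matricization convention (all helper names below are ours):

```python
import numpy as np
from functools import reduce

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    shape = list(X.shape)
    shape[n] = U.shape[0]
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis((U @ unfold(X, n)).reshape(full), 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
Us = [rng.standard_normal((I, P)) for I, P in zip(X.shape, (2, 3, 2))]

# projection: Y = X x_1 U1^T x_2 U2^T x_3 U3^T, as in relation (4)
Y = X
for n, U in enumerate(Us):
    Y = mode_n_product(Y, U.T, n)

# matricized form: unfold(Y, n) = Un^T  unfold(X, n)  (kron of remaining factors)
n = 1
K = reduce(np.kron, [U for k, U in enumerate(Us) if k != n])
assert np.allclose(unfold(Y, n), Us[n].T @ unfold(X, n) @ K)
```

The assertion passing confirms that the two formulations of the projection agree under this unfolding convention.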

The objective of this work is to project the tensor samples $\mathcal{X}_m^{(c)}$ onto a subspace where the explained variance of each class is maximized and the distinct properties of each class are emphasized. The projected samples thus minimize information loss and can discriminate between the two pattern classes. By assuming that the variance of a class can be measured by the average total scatter of the tensor samples belonging to that class [2], we can formally define the problem that needs to be solved to achieve this objective.

Problem 1.

Estimate a single set of projection matrices $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ [see (3)] that satisfies

$\{\mathbf{U}^{(n)}\}_{n=1}^{N} = \arg\max \Psi_{\mathcal{Y}^{(c)}}$   (6)

for $c \in \{1, 2\}$, such that the projected samples that belong to different classes will not share common important features.

In Problem 1, defined above, $\mathcal{Y}_m^{(c)}$ is the projection of $\mathcal{X}_m^{(c)}$ onto the subspace $\mathbb{R}^{P_1 \times \cdots \times P_N}$ using the projection matrices $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$, while

$\Psi_{\mathcal{Y}^{(c)}} = \frac{1}{M_c} \sum_{m=1}^{M_c} \| \mathcal{Y}_m^{(c)} - \bar{\mathcal{Y}}^{(c)} \|_F^2$   (7)

with

$\bar{\mathcal{Y}}^{(c)} = \frac{1}{M_c} \sum_{m=1}^{M_c} \mathcal{Y}_m^{(c)}.$   (8)
Remark 1. Suppose that $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ is a set of projection matrices that satisfies (6) either for $c = 1$ or for $c = 2$. Then, as is shown in [2], each matrix $\mathbf{U}^{(n)}$, $n = 1, \ldots, N$, consists of the eigenvectors corresponding to the $P_n$ largest eigenvalues of the matrix

$\mathbf{\Phi}_c^{(n)} = \frac{1}{M_c} \sum_{m=1}^{M_c} (\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)}) \, \mathbf{U}_{\Phi^{(n)}} \mathbf{U}_{\Phi^{(n)}}^T \, (\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)})^T$   (9)

where $\mathbf{U}_{\Phi^{(n)}}$ is as in (5). However, in Problem 1 the set of projection matrices should satisfy (6) both for $c = 1$ and $c = 2$, and, at the same time, the resulting projection should emphasize different sets of features for each class.

4 Common Mode Patterns

4.1 Normalization Process

For the CMP algorithm to extract the important features required to separate two pattern classes, a preprocessing step, in the form of a normalization process, is necessary. We present this normalization process below.

Suppose that we have at our disposal a set of raw tensor measurements (samples) $\{\mathcal{X}_m^{(c)} \in \mathbb{R}^{I_1 \times \cdots \times I_N}\}_{m=1}^{M_c}$ that belong to the $c$-th class. Based on these samples we can define the matrix

$\mathbf{R}_c^{(n)} = \frac{1}{M_c} \sum_{m=1}^{M_c} (\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)}) (\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)})^T.$   (10)

For every $m$ and $c$ the matrix $(\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)}) (\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)})^T$ is symmetric. Hence, the matrix $\mathbf{R}_c^{(n)}$ is also symmetric, since it is a weighted sum of symmetric matrices.

Let us define the symmetric matrix $\mathbf{R}^{(n)} = \mathbf{R}_1^{(n)} + \mathbf{R}_2^{(n)}$. Since $\mathbf{R}^{(n)}$ is symmetric, there exists the transformation matrix

$\mathbf{W}^{(n)} = (\mathbf{\Lambda}^{(n)})^{-1/2} \mathbf{V}^{(n)T}$   (11)

such that

$\mathbf{W}^{(n)} \mathbf{R}^{(n)} \mathbf{W}^{(n)T} = \mathbf{I}.$   (12)

In (11), $\mathbf{\Lambda}^{(n)}$ stands for the diagonal matrix of eigenvalues of $\mathbf{R}^{(n)}$, while $\mathbf{V}^{(n)}$ stands for the matrix of eigenvectors of $\mathbf{R}^{(n)}$.

Following the above normalization process, we define the mode-$n$ matricization of the tensor objects that need to be projected onto a subspace as

$\tilde{\mathbf{X}}_{m(n)}^{(c)} = \mathbf{W}^{(n)} (\mathbf{X}_{m(n)}^{(c)} - \bar{\mathbf{X}}_{(n)}^{(c)}).$   (13)

The normalization takes place before the projection and corresponds to a whitening transformation applied to the centred tensor objects.
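The whitening step can be illustrated with a minimal numerical sketch (toy data and helper names are our assumptions; the construction follows relations (10)-(12)):

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: mode-n fibers become columns."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_scatter(samples, n):
    """Average mode-n scatter matrix of one class, as in relation (10)."""
    mats = [unfold(x, n) for x in samples]
    mean = sum(mats) / len(mats)
    return sum((m - mean) @ (m - mean).T for m in mats) / len(mats)

rng = np.random.default_rng(1)
X1 = rng.standard_normal((20, 8, 5, 6))   # toy class-1 samples
X2 = rng.standard_normal((30, 8, 5, 6))   # toy class-2 samples
n = 0                                     # work on the first mode

R1, R2 = mode_scatter(X1, n), mode_scatter(X2, n)
lam, V = np.linalg.eigh(R1 + R2)          # composite matrix is symmetric PSD
W = np.diag(lam ** -0.5) @ V.T            # transformation of relation (11)

# the whitening property of relation (12): W (R1 + R2) W^T = I
assert np.allclose(W @ (R1 + R2) @ W.T, np.eye(8))
```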

4.2 The CMP Algorithm

This section presents the CMP algorithm, which constitutes the core contribution of this paper. The CMP algorithm is based on Theorem 1 below.

Theorem 1.

Let $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ be the solution to Problem 1. Then, given all other projection matrices $\mathbf{U}^{(1)}, \ldots, \mathbf{U}^{(n-1)}, \mathbf{U}^{(n+1)}, \ldots, \mathbf{U}^{(N)}$, the matrix $\mathbf{U}^{(n)}$ consists of the eigenvectors corresponding to the largest eigenvalues of the matrix $\mathbf{\Phi}_1^{(n)}$ and the eigenvectors corresponding to the largest eigenvalues of the matrix $\mathbf{\Phi}_2^{(n)}$.


Proof. From the definition of the Frobenius norm of a tensor and that of a matrix, $\|\mathcal{Y}\|_F = \|\mathbf{Y}_{(n)}\|_F$ and $\|\mathbf{Y}_{(n)}\|_F^2 = \mathrm{tr}(\mathbf{Y}_{(n)} \mathbf{Y}_{(n)}^T)$, and from Eq. (4), it holds that

$\Psi_{\mathcal{Y}^{(c)}} = \mathrm{tr}\big( \mathbf{U}^{(n)T} \mathbf{\Phi}_c^{(n)} \mathbf{U}^{(n)} \big).$   (14)

Moreover, from Eq. (9), $\mathbf{\Phi}_c^{(n)}$ can be written for the normalized samples as

$\mathbf{\Phi}_c^{(n)} = \frac{1}{M_c} \sum_{m=1}^{M_c} \tilde{\mathbf{X}}_{m(n)}^{(c)} \, \mathbf{U}_{\Phi^{(n)}} \mathbf{U}_{\Phi^{(n)}}^T \, \tilde{\mathbf{X}}_{m(n)}^{(c)T}.$   (15)

The maximum of the trace in (14) is obtained if $\mathbf{U}^{(n)}$ consists of the eigenvectors of the matrix $\mathbf{\Phi}_c^{(n)}$ corresponding to its largest eigenvalues. Since we want to maximize (14) simultaneously for $c = 1$ and $c = 2$, the matrix $\mathbf{U}^{(n)}$ will consist of the eigenvectors corresponding to the largest eigenvalues of the matrix $\mathbf{\Phi}_1^{(n)}$ and the eigenvectors corresponding to the largest eigenvalues of the matrix $\mathbf{\Phi}_2^{(n)}$.

Let us denote by $\mathbf{S}_c^{(n)}$ the matrix

$\mathbf{S}_c^{(n)} = \frac{1}{M_c} \sum_{m=1}^{M_c} \tilde{\mathbf{X}}_{m(n)}^{(c)} \tilde{\mathbf{X}}_{m(n)}^{(c)T}.$   (16)

After the normalization process [see Eq. (13)],

$\mathbf{S}_1^{(n)} + \mathbf{S}_2^{(n)} = \mathbf{W}^{(n)} \big( \mathbf{R}_1^{(n)} + \mathbf{R}_2^{(n)} \big) \mathbf{W}^{(n)T} = \mathbf{I}.$   (17)

Since, by (17), $\mathbf{S}_2^{(n)} = \mathbf{I} - \mathbf{S}_1^{(n)}$, the two matrices share a common matrix of eigenvectors $\mathbf{V}_S^{(n)}$, and their eigenvalues and eigenvectors are given by

$\mathbf{S}_c^{(n)} = \mathbf{V}_S^{(n)} \mathbf{\Lambda}_c^{(n)} \mathbf{V}_S^{(n)T},$   (18)

with $c \in \{1, 2\}$. From Eqs. (17) and (18) we have that

$\mathbf{\Lambda}_1^{(n)} + \mathbf{\Lambda}_2^{(n)} = \mathbf{I}.$   (19)

The same holds for the matrices in relation (15), since they are similar to (i.e., have the same eigenvalues as) the matrices in relation (16). For this to become clearer, note that $\mathbf{U}_{\Phi^{(n)}} \mathbf{U}_{\Phi^{(n)}}^T = \mathbf{I}$, since $\mathbf{U}_{\Phi^{(n)}}$ is the Kronecker product of orthogonal matrices and is thus itself orthogonal.

From Eq. (19), we have that the important features for the first class are the least important features for the second class, and vice versa. This means that after the projection the two classes cannot share common important features. ∎
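The complementary-eigenvalue argument can be checked numerically. The sketch below builds two toy symmetric matrices standing in for the per-mode class matrices, whitens their sum, and verifies that the whitened matrices share eigenvectors while their eigenvalues sum to one (all names and data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 40))
B = rng.standard_normal((6, 40))
R1, R2 = A @ A.T / 40, B @ B.T / 40       # stand-ins for the two class matrices

# whiten with respect to the composite matrix R1 + R2
lam, V = np.linalg.eigh(R1 + R2)
W = np.diag(lam ** -0.5) @ V.T
S1, S2 = W @ R1 @ W.T, W @ R2 @ W.T
assert np.allclose(S1 + S2, np.eye(6))    # the whitened matrices sum to I

# hence S2 = I - S1: shared eigenvectors, eigenvalues summing to one, so the
# leading directions of one class are the trailing directions of the other
l1, V1 = np.linalg.eigh(S1)
D2 = V1.T @ S2 @ V1                       # S2 is diagonal in S1's eigenbasis
assert np.allclose(D2, np.diag(np.diag(D2)))
assert np.allclose(np.sort(np.diag(D2)), np.sort(1 - l1))
```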

The CMP algorithm is presented in Algorithm 1. Please note that during the estimation of the set $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$ only the matrices $\mathbf{\Phi}_1^{(n)}$ are used. The matrices $\mathbf{\Phi}_2^{(n)}$ are not employed in the algorithm, since the matrices $\mathbf{\Phi}_1^{(n)}$ and $\mathbf{\Phi}_2^{(n)}$ have the same eigenvectors and reversely ordered eigenvalues.

1. Initialize $\mathbf{U}^{(n)}$, for $n = 1, \ldots, N$
2. Calculate $\mathbf{R}^{(n)}$ and $\mathbf{W}^{(n)}$ using relations (10) and (11), for $n = 1, \ldots, N$
3. Normalize the tensor samples using relation (13)
4. repeat
       for $n = 1, \ldots, N$ do
             4.1 Calculate the matrix $\mathbf{U}_{\Phi^{(n)}}$ of relation (5)
             4.2 Calculate the matrix $\mathbf{\Phi}_1^{(n)}$ of relation (9)
             4.3 Calculate the eigenvectors of $\mathbf{\Phi}_1^{(n)}$
             4.4 Set the columns of $\mathbf{U}^{(n)}$ equal to the eigenvectors of $\mathbf{\Phi}_1^{(n)}$
       end for
until termination criteria are met;
5. For each $\mathbf{U}^{(n)}$ keep the eigenvectors with the largest eigenvalues and the eigenvectors with the smallest eigenvalues.
Algorithm 1: Estimation of the projection matrices $\{\mathbf{U}^{(n)}\}_{n=1}^{N}$
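The steps of Algorithm 1 can be sketched as follows (a minimal NumPy sketch for tensors of order at least two; the identity initialization, the C-order unfolding convention, the fixed iteration count, and all function names are our assumptions, not the authors' reference implementation):

```python
import numpy as np
from functools import reduce

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    shape = list(X.shape)
    shape[n] = U.shape[0]
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis((U @ unfold(X, n)).reshape(full), 0, n)

def cmp_projections(class1, class2, keep, n_iter=2):
    """class1, class2: (M_c, I_1, ..., I_N) sample arrays; keep[n] is the
    number of eigenvectors retained from EACH end of the mode-n spectrum."""
    N = class1.ndim - 1
    X1 = class1 - class1.mean(axis=0)            # centre each class separately
    X2 = class2 - class2.mean(axis=0)
    # steps 2-3: per-mode whitening of the composite matrix R1 + R2
    X = X1                                       # only class 1 is needed later
    for n in range(N):
        R = sum(unfold(x, n) @ unfold(x, n).T for x in X1) / len(X1) \
          + sum(unfold(x, n) @ unfold(x, n).T for x in X2) / len(X2)
        lam, V = np.linalg.eigh(R)
        W = np.diag(lam ** -0.5) @ V.T           # relations (10)-(11)
        X = np.stack([mode_n_product(x, W, n) for x in X])
    # step 4: full eigenvector matrices, recomputed given the other modes
    Us = [np.eye(I) for I in class1.shape[1:]]
    for _ in range(n_iter):
        for n in range(N):
            K = reduce(np.kron, [U for m, U in enumerate(Us) if m != n])
            mats = [unfold(x, n) @ K for x in X]
            Phi = sum(m @ m.T for m in mats) / len(mats)
            _, Us[n] = np.linalg.eigh(Phi)       # ascending eigenvalue order
    # step 5: keep eigenvectors of the largest and smallest eigenvalues
    return [np.hstack([U[:, -k:], U[:, :k]]) for U, k in zip(Us, keep)]

rng = np.random.default_rng(0)
c1 = rng.standard_normal((12, 6, 5, 4))
c2 = rng.standard_normal((14, 6, 5, 4)) + 0.3
Us = cmp_projections(c1, c2, keep=(1, 1, 1))
print([U.shape for U in Us])  # [(6, 2), (5, 2), (4, 2)]
```

Keeping the per-mode matrices square during the iteration mirrors step 5 of the listing, which truncates only at the end; the final matrices stack the leading (class-1) and trailing (class-2) eigenvectors.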

5 Experimental Results

In this study we validated the CMP methodology using the widely known and publicly available Pavia University hyperspectral imaging dataset, which comprises 103 spectral bands (see Fig. 1). The ground truth contains 9 classes, while pixels in white are not annotated.

Figure 1: Pavia University dataset (figure taken from [15]).

The CMP method was developed for binary classification problems. Thus, we grouped together the pixels that depict man-made objects and discriminated them from the rest of the pixels. For this dataset, pixels that depict man-made objects are those labeled as asphalt, metal sheets, bricks and bitumen. The tagged part of the dataset was then split into two sets, i.e. training and testing data; the training set was created by selecting 200 samples from each class.

In order to classify a pixel at location $(x, y)$ on the image plane, we followed the approach presented in [16], according to which the image is split, along its spatial dimensions, into overlapping patches of size $s \times s \times B$, where $B$ is the number of spectral bands. It is then assumed that the label of the pixel located at $(x, y)$ is the same as the label of the patch centred at that location.

During the experimental validation we compared the proposed CMP method against MPCA [2], using three different classifiers: Rank-1 Tensor Regression (Rank-1 TR) [17], a CNN [16, 18], and a Rank-1 FNN [7, 8]. The efficiency of CMP and MPCA was quantified in terms of the classification accuracy of these classifiers on the testing set. In our experiments, we set the parameter $s$ equal to 7 and required both MPCA and CMP to reduce the spatial dimensions of the samples. Two sets of experiments were then conducted. In the first, the spectral dimensionality of the samples was reduced by selecting: the 26 principal components using MPCA (MPCA-26); the 26 principal components for each pattern class using CMP (CMP-26); and the 13 principal components for each pattern class using CMP (CMP-13), so that the total dimensionality along the spectral dimension is 26. In the second, the spectral dimensionality was reduced by selecting: the 10 principal components using MPCA (MPCA-10); the 10 principal components for each pattern class using CMP (CMP-10); and the 5 principal components for each pattern class using CMP (CMP-5). In the first experiment the size of the dataset was reduced by a factor of about 4, and in the second by a factor of about 10.
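The patch-based labeling scheme described above can be sketched as follows (a toy stand-in: the `reflect` border handling and all names are our assumptions, since [16] does not fix these details here):

```python
import numpy as np

def extract_patches(cube, labels, s):
    """For every annotated pixel, return the s x s x B patch centred on it.
    cube: (H, W, B) hyperspectral image; labels: (H, W) with 0 = unannotated."""
    r = s // 2
    pad = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    X, y = [], []
    for i, j in zip(*np.nonzero(labels)):
        X.append(pad[i:i + s, j:j + s, :])   # centred at original pixel (i, j)
        y.append(labels[i, j])
    return np.stack(X), np.array(y)

cube = np.random.rand(12, 10, 7)                  # toy stand-in, 7 "bands"
labels = np.random.randint(0, 3, size=(12, 10))   # 0 marks unannotated pixels
X, y = extract_patches(cube, labels, s=7)
print(X.shape[1:])  # (7, 7, 7)
```

Each labeled pixel thus yields one third-order tensor sample, which is exactly the input format the subspace learning methods above operate on.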

           CNN     Rank-1 FNN   Rank-1 TR
MPCA-26   85.08      86.80        77.56
CMP-26    90.41      91.25        77.96
CMP-13    88.57      88.67        76.90
MPCA-10   83.49      84.39        77.59
CMP-10    88.23      88.31        77.52
CMP-5     86.76      86.27        76.08
Table 1: Overall classification accuracy results (%).

The comparison between MPCA and CMP is presented in Table 1. The CMP method is more efficient than MPCA at reducing the dimensionality of tensor objects, regardless of the classification model used, due to the fact that it can exploit label information; in other words, CMP is a supervised subspace learning technique, while MPCA is an unsupervised one. For Rank-1 TR the classification accuracy is almost the same whether MPCA or CMP is used. This is justified by the fact that Rank-1 TR is a linear classifier and, due to its low capacity, cannot perform any better on this dataset.

6 Conclusion

In this work, we presented the CMP method, a supervised tensor subspace learning technique that ensures that tensor objects belonging to different classes will not share common important features after dimensionality reduction. The CMP method was compared against MPCA, and experimental results indicate that it can reduce the dimensionality of tensor objects in a more efficient way. The main limitation of CMP, however, is that it is designed for binary classification problems. Therefore, the main focus of our future work is to extend this approach to multi-class classification problems. Another priority is the evaluation of CMP on more datasets, with comparisons against other supervised tensor subspace learning methods.


  • [1] Gregory Shakhnarovich and Baback Moghaddam, "Face recognition in subspaces," in Handbook of Face Recognition, pp. 19–49. Springer, 2011.
  • [2] Haiping Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos, "MPCA: Multilinear principal component analysis of tensor objects," IEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 18–39, 2008.
  • [3] Feiping Nie, Shiming Xiang, Yangqiu Song, and Changshui Zhang, "Extracting the optimal dimensionality for local tensor discriminant analysis," Pattern Recognition, vol. 42, no. 1, pp. 105–114, 2009.
  • [4] Zhihui Lai, Yong Xu, Jian Yang, Jinhui Tang, and David Zhang, "Sparse tensor discriminant analysis," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3904–3915, 2013.
  • [5] Weiming Hu, Xi Li, Xiaoqin Zhang, Xinchu Shi, Stephen Maybank, and Zhongfei Zhang, "Incremental tensor subspace learning and its applications to foreground segmentation and tracking," International Journal of Computer Vision, vol. 91, no. 3, pp. 303–327, 2011.
  • [6] Haiping Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos, "A survey of multilinear subspace learning for tensor data," Pattern Recognition, vol. 44, no. 7, pp. 1540–1551, 2011.
  • [7] Konstantinos Makantasis, Anastasios Doulamis, Nikolaos Doulamis, and Antonis Nikitakis, "Tensor-based classifiers for hyperspectral data analysis," in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
  • [8] Konstantinos Makantasis, Anastasios Doulamis, Nikolaos Doulamis, Antonis Nikitakis, and Athanasios Voulodimos, "Tensor-based nonlinear classifier for high-order data analysis," arXiv preprint arXiv:1802.05981, 2018.
  • [9] Jian Yang, David Zhang, Alejandro F. Frangi, and Jing-yu Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
  • [10] Jieping Ye, "Generalized low rank approximations of matrices," Machine Learning, vol. 61, no. 1-3, pp. 167–191, 2005.
  • [11] Dong Xu, Shuicheng Yan, Lei Zhang, Stephen Lin, Hong-Jiang Zhang, and Thomas S. Huang, "Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 1, pp. 36–47, 2008.
  • [12] Herbert Ramoser, Johannes Muller-Gerking, and Gert Pfurtscheller, "Optimal spatial filtering of single trial EEG during imagined hand movement," IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
  • [13] Benjamin Blankertz, Ryota Tomioka, Steven Lemm, Motoaki Kawanabe, and K.-R. Muller, "Optimizing spatial filters for robust EEG single-trial analysis," IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
  • [14] Keinosuke Fukunaga and Warren L. G. Koontz, "Application of the Karhunen-Loeve expansion to feature selection and ordering," IEEE Transactions on Computers, vol. C-19, no. 4, pp. 311–318, 1970.
  • [15] Mauro Dalla Mura, Alberto Villa, Jon Atli Benediktsson, Jocelyn Chanussot, and Lorenzo Bruzzone, "Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 3, pp. 542–546, 2011.
  • [16] Konstantinos Makantasis, Konstantinos Karantzalos, Anastasios Doulamis, and Nikolaos Doulamis, "Deep supervised learning for hyperspectral data classification through convolutional neural networks," in Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International. IEEE, 2015, pp. 4959–4962.
  • [17] Hua Zhou, Lexin Li, and Hongtu Zhu, "Tensor regression with applications in neuroimaging data analysis," Journal of the American Statistical Association, vol. 108, no. 502, pp. 540–552, 2013.
  • [18] Konstantinos Makantasis, Konstantinos Karantzalos, Anastasios Doulamis, and Konstantinos Loupos, "Deep learning-based man-made object detection from hyperspectral data," in International Symposium on Visual Computing. Springer, 2015, pp. 717–727.