Riemannian joint dimensionality reduction and dictionary learning on symmetric positive definite manifold

Hiroyuki Kasai (University of Electro-Communications) and Bamdev Mishra (Microsoft)

Dictionary learning (DL) and dimensionality reduction (DR) are powerful tools to analyze high-dimensional noisy signals. This paper proposes a novel Riemannian joint dimensionality reduction and dictionary learning (R-JDRDL) scheme on symmetric positive definite (SPD) manifolds for classification tasks. The joint learning considers the interaction between the dimensionality reduction and dictionary learning procedures by connecting them into a unified framework. We exploit a Riemannian optimization framework for solving the DL and DR problems jointly. Finally, we demonstrate that the proposed R-JDRDL outperforms existing state-of-the-art algorithms when used for image classification tasks.


1 Introduction

Dictionary learning (DL) combined with sparse representation (SR) has become popular for many computer vision tasks. Many DL algorithms, e.g., K-SVD [2], were originally applied to unsupervised learning tasks. Recently, some supervised DL algorithms have been proposed for classification tasks which exploit the class label information in the training samples; they include D-KSVD [3] and LC-KSVD [4], to name a few. However, DL for high-dimensional data is computationally expensive. To circumvent this issue, dimensionality reduction (DR) techniques are used, which reduce the computational cost and highlight the low-dimensional discriminative features of the data.

In general, DR is applied first to the data samples, and the dimensionality-reduced data are then used for DL. A separately pre-learned DR projection matrix, however, does not fully exploit the latent structure of the data or preserve the features best suited for DL [5]. To address this issue, Feng et al. [6] propose integrating DL and DR to improve discriminative classification performance, where a constraint similar to Fisher linear discriminant analysis is imposed on the coefficient matrix. Similarly, Yang et al. [7] propose learning the projection matrix and a class-specific dictionary jointly. Liu et al. [8] report an integrated learning method with a non-negative projection matrix. Foroughi et al. [9] discuss specific constraints on the coefficient matrix and on the projection matrix.

In many computer vision tasks, the data of interest often reside on a manifold, which is a generalization of the Euclidean space. A particular manifold of interest is the manifold of symmetric positive definite (SPD) matrices, which has been widely used in many applications. For example, region covariance matrices (RCMs), which are symmetric positive definite, give good performance in texture classification and face recognition tasks [10, 11]. The diagonal elements of an RCM represent the variances of the component features, and the off-diagonal elements indicate the respective correlations among them. Therefore, the RCM can represent multiple features in a natural way. It should be noted that the SPD matrices form a Riemannian manifold, whose geometry is well understood [12]. Cherian and Sra [13] exploit this manifold structure to propose a Riemannian DL and sparse coding (SC) algorithm. Separately, Riemannian DR techniques have been proposed in several works [14, 15, 16, 17].

In this paper, our main contribution is to learn DL and DR jointly in the Riemannian framework. We propose R-JDRDL, an algorithm for jointly learning the projection matrix for DR and the discriminative dictionary on the SPD matrices for classification tasks. The joint learning considers the interaction between DR and DL procedures by connecting them into a unified framework. The model is formulated as an objective function over a sparse coefficient matrix and a Cartesian product manifold that consists of the Stiefel manifold and multiple SPD manifolds. Optimization on the Cartesian product manifold is cast as an optimization problem on Riemannian manifolds [18]. Optimization on the sparse coefficient matrix, on the other hand, is a convex program.

This paper is organized as follows. Section 2 briefly introduces the SPD manifold and Riemannian DL. Section 3 details the proposed R-JDRDL algorithm. Our initial results on the MNIST image classification task in Section 4 show that R-JDRDL outperforms state-of-the-art algorithms in the domain.

2 SPD manifold and Riemannian DL

This section briefly explains the geometry of the SPD manifold and then introduces the Riemannian DL. Hereinafter, we denote scalars with lower-case letters (e.g., $a$), vectors with bold lower-case letters (e.g., $\mathbf{a}$), and matrices with bold-face capitals (e.g., $\mathbf{A}$). We refer to a multidimensional or multi-order array as a tensor and denote it with a calligraphic letter (e.g., $\mathcal{A}$).

2.1 Geometry of SPD manifold [12]

A manifold $\mathcal{M}$ is a topological space that locally resembles a Euclidean space in a neighborhood of each point $X \in \mathcal{M}$. All the tangent vectors at $X$ form a vector space, called the tangent space of $\mathcal{M}$ at $X$ and denoted $T_X\mathcal{M}$. When endowed with a smoothly varying metric, i.e., an inner product between vectors in the tangent space at each $X$, the manifold is called a Riemannian manifold. The space of $d \times d$ SPD matrices, denoted $\mathcal{S}^{d}_{++}$, is a Riemannian manifold, called the SPD manifold, when endowed with an appropriate Riemannian metric. The tangent space at any point of $\mathcal{S}^{d}_{++}$ is identifiable with the set of $d \times d$ symmetric matrices.

One particular choice of the Riemannian metric on the SPD manifold is the affine-invariant Riemannian metric (AIRM) [19, 12]. If $P$ is an element of $\mathcal{S}^{d}_{++}$, the AIRM is defined as

$$\langle \xi, \eta \rangle_P := \mathrm{tr}(P^{-1}\xi P^{-1}\eta),$$

where $\xi, \eta \in T_P\mathcal{S}^{d}_{++}$. The metric is invariant under the affine action of any invertible matrix $M$, which means that $\langle M\xi M^\top, M\eta M^\top \rangle_{MPM^\top} = \langle \xi, \eta \rangle_P$ on $\mathcal{S}^{d}_{++}$ for every $P$. The Riemannian metric provides a way to compute the distance between two points on the manifold. Because the SPD manifold with the AIRM has a unique shortest path, called the geodesic, between every two points [12, Section 6], the geodesic distance is given as

$$\delta_R(X, Y) = \big\| \mathrm{Log}\big(X^{-1/2}\, Y\, X^{-1/2}\big) \big\|_F,$$

where $X, Y \in \mathcal{S}^{d}_{++}$, $\|\cdot\|_F$ denotes the Frobenius norm, and $\mathrm{Log}(\cdot)$ denotes the matrix logarithm.
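For concreteness, a minimal NumPy sketch of this geodesic distance (ours, not the authors' Matlab implementation; function names are illustrative) evaluates $\delta_R$ through an eigendecomposition:

```python
import numpy as np

def spd_inv_sqrt(X):
    """Inverse square root X^{-1/2} of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V / np.sqrt(w)) @ V.T

def airm_distance(X, Y):
    """Geodesic (AIRM) distance: || Log(X^{-1/2} Y X^{-1/2}) ||_F."""
    S = spd_inv_sqrt(X)
    M = S @ Y @ S                    # congruence transform keeps M symmetric positive definite
    lam = np.linalg.eigvalsh(M)      # eigenvalues of M are strictly positive
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

As a sanity check, `airm_distance(P, P)` is zero, and the value is unchanged when both arguments are congruence-transformed by the same invertible matrix.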

2.2 Riemannian DL (R-DL)

Let $\mathcal{X} = \{X_1, \ldots, X_N\}$ be the input training sample set, where $X_i \in \mathcal{S}^{d}_{++}$ denotes the $i$-th sample, which forms a $d \times d$ SPD matrix. The dictionary to be learned is denoted as $\mathcal{B} = \{B_1, \ldots, B_n\}$, where $B_j \in \mathcal{S}^{d}_{++}$ is an atom of the dictionary. It should be noted that $\mathcal{X}$ and $\mathcal{B}$ are third-order tensors. We also denote a sparse coefficient vector as $\alpha_i \in \mathbb{R}^{n}$, which forms a coefficient matrix $A = [\alpha_1, \ldots, \alpha_N]$, to represent a query SPD matrix using the dictionary $\mathcal{B}$. It should also be emphasized that $\alpha_i$ is required to be non-negative to ensure that the resultant combination with the dictionary is positive definite. Therefore, we specifically represent a sparse conic combination of the dictionary and the coefficient vector as $\mathcal{B}\alpha_i := \sum_{j=1}^{n} \alpha_{ij} B_j$ with $\alpha_{ij} \ge 0$. Finally, the problem formulation is defined as

$$\min_{\mathcal{B},\; A \ge 0} \;\; \frac{1}{2}\sum_{i=1}^{N} \delta_R^2\big(X_i,\, \mathcal{B}\alpha_i\big) + \Omega(A) + \Theta(\mathcal{B}),$$

where $\Omega(\cdot)$ and $\Theta(\cdot)$ respectively represent the regularizers on the coefficient vectors and the dictionary [13]. To optimize this non-convex problem, an alternating minimization algorithm is used for the DL and the SC sub-problems.
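As a rough illustration of the data-fitting part of this objective (a sketch under our reconstructed notation, reusing `airm_distance` from above; this is not the solver of [13]):

```python
import numpy as np

def conic_combination(atoms, alpha):
    """B * alpha = sum_j alpha_j B_j with alpha_j >= 0; SPD whenever at least one alpha_j > 0."""
    assert np.all(np.asarray(alpha) >= 0.0)
    return sum(a * B for a, B in zip(alpha, atoms))

def rdl_data_term(samples, atoms, A):
    """0.5 * sum_i d_R^2(X_i, B * alpha_i): the reconstruction part of the R-DL objective."""
    return 0.5 * sum(airm_distance(X, conic_combination(atoms, A[:, i])) ** 2
                     for i, X in enumerate(samples))
```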

3 R-JDRDL on SPD manifolds

3.1 Problem formulation of R-JDRDL

Let $\mathcal{X}$ be the set of SPD matrices of size $d \times d$ accompanied with class labels, i.e., $\mathcal{X} = \{\mathcal{X}_1, \ldots, \mathcal{X}_C\}$, where $\mathcal{X}_c$ denotes the $c$-th class training samples. $\mathcal{X}_c$ is further composed of individual samples as $\mathcal{X}_c = \{X_{c,1}, \ldots, X_{c,N_c}\}$, where $X_{c,i} \in \mathcal{S}^{d}_{++}$ and $N_c$ is the number of samples of the $c$-th class in the training set, i.e., $N = \sum_{c=1}^{C} N_c$. Both $\mathcal{X}$ and $\mathcal{X}_c$ are third-order tensors. The dictionary is denoted as $\mathcal{B} = \{\mathcal{B}_1, \ldots, \mathcal{B}_C\}$, where $\mathcal{B}_c$ is the class-specific sub-dictionary associated with the $c$-th class. $\mathcal{B}_c$ is also composed as $\mathcal{B}_c = \{B_{c,1}, \ldots, B_{c,n_c}\}$, where $n_c$ is the number of atoms of the $c$-th class sub-dictionary, and $n = \sum_{c=1}^{C} n_c$.

As described earlier, the proposed R-JDRDL algorithm learns not only the dictionary $\mathcal{B}$, but also the projection matrix $U \in \mathbb{R}^{d \times m}$ ($m < d$), which projects the $d$-dimensional data onto an $m$-dimensional data space. More specifically, $X \in \mathcal{S}^{d}_{++}$ is mapped into $U^\top X U \in \mathcal{S}^{m}_{++}$. Here, we need only full-rankness of $U$ to guarantee that $U^\top X U$ is an SPD matrix. Equivalently, we could enforce an orthonormality constraint on $U$, i.e., $U^\top U = I_m$. The space of such matrices is called the Stiefel manifold $\mathrm{St}(m, d)$.
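A minimal sketch of this mapping (illustrative names; orthonormalizing a Gaussian matrix is just one convenient way to obtain a point on the Stiefel manifold):

```python
import numpy as np

def project_spd(X, U):
    """Map a d x d SPD matrix to the m x m SPD matrix U^T X U (U has full column rank)."""
    return U.T @ X @ U

def random_stiefel(d, m, seed=0):
    """A point on St(m, d): orthonormalize a Gaussian matrix, so that U^T U = I_m."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((d, m)))
    return U
```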

Considering that the model parameters are $(U, \mathcal{B}) \in \mathcal{M}_p$ and $A$, where $\mathcal{M}_p := \mathrm{St}(m, d) \times \mathcal{S}^{m}_{++} \times \cdots \times \mathcal{S}^{m}_{++}$ denotes the space of the product manifold, our proposed formulation is

$$\min_{(U, \mathcal{B}) \in \mathcal{M}_p,\; A \ge 0} \;\; F(U, \mathcal{B}, A) \;+\; \lambda_1 G(A) \;+\; \lambda_2 H(U) \;+\; \lambda_3 \|A\|_1, \qquad (1)$$

where $F$ is the discriminative reconstruction error, $G$ and $H$ represent the graph-based constraints on the coefficient and the projection matrices, respectively, and $\|A\|_1$ imposes sparsity on $A$. The $\lambda$s are non-negative regularization parameters. $F$, $G$, and $H$ are described below.

Discriminative reconstruction error term $F$: The dictionary $\mathcal{B}$ is expected to approximate the dimensionality-reduced samples from all classes, the error of which is represented as $\delta_R^2(U^\top X_{c,i} U,\, \mathcal{B}\alpha_{c,i})$, where $\delta_R$ is the Riemannian geodesic distance on the SPD manifold. In addition, to impose more discriminative power on $\mathcal{B}$, the $c$-th sub-dictionary $\mathcal{B}_c$ is expected to approximate the dimensionality-reduced training samples associated with the $c$-th class. Here, let $\alpha_{c,i}^{(l)}$ be the sub-vector of $\alpha_{c,i}$ that corresponds to the $l$-th sub-dictionary $\mathcal{B}_l$, so that $\alpha_{c,i} = [\alpha_{c,i}^{(1)}; \ldots; \alpha_{c,i}^{(C)}]$. The class-specific error, $\delta_R^2(U^\top X_{c,i} U,\, \mathcal{B}_c\alpha_{c,i}^{(c)})$, should be small. The sub-vectors corresponding to the other classes should be nearly zero, such that $\sum_{l \ne c}\|\alpha_{c,i}^{(l)}\|_2^2$ is small. Consequently, we obtain the cost function for $F$ as

$$F(U, \mathcal{B}, A) = \sum_{c=1}^{C}\sum_{i=1}^{N_c}\Big[\, \delta_R^2\big(U^\top X_{c,i} U,\, \mathcal{B}\alpha_{c,i}\big) + \delta_R^2\big(U^\top X_{c,i} U,\, \mathcal{B}_c\alpha_{c,i}^{(c)}\big) + \tau \sum_{l \ne c}\big\|\alpha_{c,i}^{(l)}\big\|_2^2 \,\Big], \qquad (2)$$

where $\tau$ is the regularization parameter.

Graph-based coefficient term $G$: We enforce $A$ to be more discriminative, and therefore, we seek to constrain the intra-class coefficients to be mutually similar and the inter-class ones to be highly dissimilar. To this end, we first construct a geometry-aware intrinsic graph $G_w$ for intra-class compactness and a penalty graph $G_b$ for inter-class discrimination, defined for two points $X_i$ and $X_j$ as

$$G_w(i, j) = \begin{cases} 1 & \text{if } X_j \in N_w(X_i), \\ 0 & \text{otherwise}, \end{cases} \qquad G_b(i, j) = \begin{cases} 1 & \text{if } X_j \in N_b(X_i), \\ 0 & \text{otherwise}, \end{cases}$$

where $N_w(X_i)$ is the set of nearest intra-class neighbors of $X_i$ in terms of geodesic distance. Similarly, $N_b(X_i)$ is the set of nearest inter-class neighbors of $X_i$. Considering the distance between pairs of coding coefficient vectors $\alpha_i$ and $\alpha_j$ as an indicator of discrimination capability, the final graph-based coefficient term is defined as

$$G(A) = \sum_{i, j} \|\alpha_i - \alpha_j\|_2^2\, W_{ij},$$

where $W = G_w - G_b$ [14]. This term enforces minimization of the difference between two coding coefficients if they are from the same class, whereas the difference is maximized if they are from different classes.
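One way to build these graphs from pairwise geodesic distances is sketched below (reusing `airm_distance` from above; the neighborhood sizes `k_w`, `k_b` and the 0/1 weights are our assumptions about the construction):

```python
import numpy as np

def build_graphs(samples, labels, k_w=3, k_b=3):
    """Intrinsic (intra-class) and penalty (inter-class) k-NN graphs under the AIRM distance."""
    N = len(samples)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            D[i, j] = D[j, i] = airm_distance(samples[i], samples[j])
    Gw, Gb = np.zeros((N, N)), np.zeros((N, N))
    for i in range(N):
        same = [j for j in range(N) if j != i and labels[j] == labels[i]]
        diff = [j for j in range(N) if labels[j] != labels[i]]
        for j in sorted(same, key=lambda j: D[i, j])[:k_w]:
            Gw[i, j] = 1.0      # j is one of the k_w nearest intra-class neighbors of i
        for j in sorted(diff, key=lambda j: D[i, j])[:k_b]:
            Gb[i, j] = 1.0      # j is one of the k_b nearest inter-class neighbors of i
    return Gw, Gb
```

The coefficient term then sums $\|\alpha_i - \alpha_j\|_2^2$ weighted by $W = G_w - G_b$.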

Graph-based projection term $H$: We also learn a projection matrix $U$ that can preserve class information and which can map the training samples to a low-dimensional discriminative space. Consequently, $H$ is defined as

$$H(U) = \sum_{i, j} \delta_R^2\big(U^\top X_i U,\; U^\top X_j U\big)\, W_{ij},$$

where the affinity matrix $W$ allows us to assign different weights to the Riemannian distances between different points, e.g., the distance between $U^\top X_i U$ and $U^\top X_j U$ is assigned the weight $W_{ij}$.
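Under the reconstructed form of $H$ above, a direct (if slow) evaluation looks as follows; it reuses `airm_distance` and is only meant to make the weighting explicit:

```python
import numpy as np

def graph_projection_term(samples, U, W):
    """H(U) = sum_{i,j} W_ij * d_R^2(U^T X_i U, U^T X_j U)."""
    val = 0.0
    for i, Xi in enumerate(samples):
        for j, Xj in enumerate(samples):
            if W[i, j] != 0.0:
                val += W[i, j] * airm_distance(U.T @ Xi @ U, U.T @ Xj @ U) ** 2
    return val
```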

3.2 Optimization of R-JDRDL

The objective function of (1) is divided into two sub-problems, which are solved in an alternating fashion. We discuss both sub-problems below.

DL sub-problem on the product manifold: We consider the DL sub-problem of (1) by optimizing the projection matrix $U$ and the tensor-formed dictionary $\mathcal{B}$, keeping $A$ fixed to its current estimate $\tilde{A}$. Consequently, the problem can be reformulated as

$$\min_{(U, \mathcal{B}) \in \mathcal{M}_p} \;\; F(U, \mathcal{B}, \tilde{A}) + \lambda_2 H(U),$$

since the terms that involve only $A$ are constant. We exploit the Riemannian optimization framework on the Cartesian product manifold (consisting of the Stiefel manifold and multiple SPD manifolds). In particular, we use the Riemannian conjugate gradient (RCG) method for solving the DL sub-problem. The Riemannian algorithm is theoretically guaranteed to converge to a stationary point; the convergence analysis follows from [20, 21]. To this end, we require the expression for the Riemannian gradient. According to [13], the Riemannian gradient with respect to an SPD variable $P$ is obtained from the definition of the AIRM as $\mathrm{grad} f(P) = P\, \mathrm{sym}\big(\nabla f(P)\big)\, P$, where $\nabla f(P)$ is the Euclidean gradient of $f$ with respect to $P$ and $\mathrm{sym}(\cdot)$ returns the symmetric part of its argument.
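The paper relies on Manopt's RCG solver; as a rough, self-contained illustration of the geometric ingredients involved (not the actual solver, and with our own names), the sketch below shows a QR retraction on the Stiefel factor, the AIRM Riemannian gradient on an SPD factor, and a second-order retraction on the SPD manifold:

```python
import numpy as np

def stiefel_retract(U, xi, t):
    """qf retraction on St(m, d): orthonormalize U + t * xi via QR (diagonal of R made positive)."""
    Q, R = np.linalg.qr(U + t * xi)
    return Q * np.sign(np.diag(R))

def spd_riemannian_grad(P, egrad):
    """Riemannian gradient under the AIRM: grad f(P) = P sym(egrad) P."""
    sym = 0.5 * (egrad + egrad.T)
    return P @ sym @ P

def spd_retract(P, xi, t):
    """Second-order retraction on the SPD manifold: P + t*xi + (t^2 / 2) xi P^{-1} xi."""
    return P + t * xi + 0.5 * t * t * (xi @ np.linalg.solve(P, xi))
```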

SC sub-problem: We consider the SC sub-problem of (1) for solving $A$, keeping $U$ and $\mathcal{B}$ fixed to $\tilde{U}$ and $\tilde{\mathcal{B}}$, respectively. The problem, therefore, can be reformulated as

$$\min_{A \ge 0} \;\; F(\tilde{U}, \tilde{\mathcal{B}}, A) + \lambda_1 G(A) + \lambda_3 \|A\|_1,$$

where the dimensionality-reduced sample $\tilde{U}^\top X_{c,i} \tilde{U}$ appearing in $F$ is denoted as $\tilde{X}_{c,i}$ for simplicity. Here, we calculate each column of $A$, i.e., $\alpha_{c,i}$, sequentially by fixing the other coefficients.

It should be emphasized that the above problem is a convex problem and is solved with a gradient projection algorithm. Specifically, we use the spectral projected gradient (SPG) solver, as is done in [13].
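A bare-bones projected-gradient sketch for this sub-problem is given below; SPG additionally uses Barzilai-Borwein step sizes and a non-monotone line search, which we omit, and `grad_fn` stands for the gradient of the smooth part of the objective (names are ours):

```python
import numpy as np

def project_nonnegative(alpha):
    """Projection onto the feasible set of the SC sub-problem (the non-negative orthant)."""
    return np.maximum(alpha, 0.0)

def projected_gradient(alpha0, grad_fn, step=1e-2, n_iter=200):
    """Fixed-step projected gradient descent as a simplified stand-in for the SPG solver."""
    alpha = project_nonnegative(np.asarray(alpha0, dtype=float))
    for _ in range(n_iter):
        alpha = project_nonnegative(alpha - step * grad_fn(alpha))
    return alpha
```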

Classification scheme: We apply the learned projection matrix $U$ and the dictionary $\mathcal{B}$ to a query test sample $X_{\mathrm{test}}$ to estimate its class label. For this purpose, the test sample is first projected into the low-dimensional space as $U^\top X_{\mathrm{test}} U$. Subsequently, it is coded over $\mathcal{B}$ by solving the following problem:

$$\hat{\alpha} = \arg\min_{\alpha \ge 0} \;\; \delta_R^2\big(U^\top X_{\mathrm{test}} U,\; \mathcal{B}\alpha\big) + \lambda \|\alpha\|_1,$$

where $\hat{\alpha}^{(c)}$ is the sub-vector of $\hat{\alpha}$ corresponding to the sub-dictionary $\mathcal{B}_c$. The residual for the $c$-th class is calculated as

$$e_c = \delta_R^2\big(U^\top X_{\mathrm{test}} U,\; \mathcal{B}_c \hat{\alpha}^{(c)}\big) + w\, \|\hat{\alpha} - m_c\|_2^2,$$

where $w$ is a weight to balance these two terms and $m_c$ is the mean vector of the learned coding coefficients of the $c$-th class, i.e., $m_c = \frac{1}{N_c}\sum_{i=1}^{N_c}\alpha_{c,i}$. We adopt the distance between $\hat{\alpha}$ and the mean vector of the learned coding coefficients of the corresponding $c$-th class because it gives better classification results, as shown in [22]. Finally, the identity of the test sample is determined by selecting the class label with the minimum residual $e_c$.
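Putting the rule together in the same sketch style (here `class_atoms[c]` holds the atoms of sub-dictionary $c$, `atom_idx[c]` their positions in `alpha_hat`, and `class_means[c]` the mean coefficient vector $m_c$; all names are ours, and `conic_combination`/`airm_distance` are reused from the earlier sketches):

```python
import numpy as np

def classify(alpha_hat, X_test_reduced, class_atoms, atom_idx, class_means, w=0.5):
    """Assign the class with the smallest residual e_c, as described above."""
    residuals = []
    for atoms_c, idx_c, m_c in zip(class_atoms, atom_idx, class_means):
        recon = conic_combination(atoms_c, alpha_hat[idx_c])        # B_c * alpha^(c)
        e_c = (airm_distance(X_test_reduced, recon) ** 2
               + w * np.sum((alpha_hat - m_c) ** 2))
        residuals.append(e_c)
    return int(np.argmin(residuals))
```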

4 Numerical experiments

In this section, we show the effectiveness of the proposed R-JDRDL algorithm against state-of-the-art classification algorithms on SPD matrices.

The comparison methods are the following: NN-AIRM is the AIRM-based nearest neighbor (NN) classifier, and NN-Stein is the Stein metric-based NN classifier. The Stein metric is a symmetrized type of Bregman divergence and is defined as

$$S(A, B) := \log\det\Big(\frac{A + B}{2}\Big) - \frac{1}{2}\log\det(AB),$$

where $A, B \in \mathcal{S}^{d}_{++}$ [23]. DR-NN-AIRM is the AIRM-based NN classifier applied to the dimensionality-reduced training samples, which are obtained by R-DR [14]. DR-NN-Stein is the same algorithm, but the distance metric is the Stein metric. R-SRC-AIRM and R-SRC-Stein are the sparse representation classifiers (SRCs) based on the AIRM and Stein metrics, respectively. R-KSRC stands for the kernel-based SRC with the Stein metric. R-DL is the DL with the SRC classifier [13]. R-DR-DL-AIRM and R-DR-DL-Stein are the DL with the SRC classifier applied after the R-DR algorithm.
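The Stein divergence itself is straightforward to evaluate with log-determinants (a small sketch of our own, not taken from the compared implementations):

```python
import numpy as np

def stein_divergence(A, B):
    """S(A, B) = log det((A + B) / 2) - 0.5 * log det(A B), for SPD matrices A and B."""
    _, ld_mid = np.linalg.slogdet(0.5 * (A + B))
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return ld_mid - 0.5 * (ld_a + ld_b)
```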

We implement our proposed algorithm in Matlab. The DL sub-problem on the product manifold makes use of the Matlab toolbox Manopt [24]. The Matlab codes for R-DL, R-DR, and R-KSRC are downloaded from the respective authors' homepages.

We use the MNIST dataset (http://yann.lecun.com/exdb/mnist/), which consists of handwritten digits 0–9. It has 60,000 images for training and 10,000 images for testing. For this dataset, we generate RCMs [10], computed at each pixel location from a feature vector composed of the pixel intensity and its spatial image derivatives. Then, three RCMs, one from the entire image, one from the left half, and one from the right half, are concatenated diagonally, which produces one block-diagonal RCM for each image. We execute 10 runs with randomly selected test samples and with 5 and 10 training samples, respectively; the dictionary size is equal to the number of training samples, so the case of 5 represents an extreme situation. The regularization parameters of the proposed algorithm, the neighborhood sizes of the intrinsic and penalty graphs, the classifier weight, and the reduced dimension are chosen by cross-validation. We initialize U from the DR method [14] using a single sample per class.
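The exact per-pixel features used in the paper are not reproduced here; the sketch below builds a block-diagonal MNIST descriptor from an assumed feature vector (intensity plus first- and second-order derivatives), following the whole/left-half/right-half construction described above:

```python
import numpy as np
from scipy.linalg import block_diag

def region_covariance(img, eps=1e-6):
    """Region covariance matrix of an image region; the feature choice here is illustrative."""
    img = img.astype(float)
    gy, gx = np.gradient(img)
    gyy, _ = np.gradient(gy)
    _, gxx = np.gradient(gx)
    feats = np.stack([img.ravel(), np.abs(gx).ravel(), np.abs(gy).ravel(),
                      np.abs(gxx).ravel(), np.abs(gyy).ravel()])
    C = np.cov(feats)                      # rows of `feats` are the feature channels
    return C + eps * np.eye(C.shape[0])    # small ridge keeps the matrix positive definite

def mnist_descriptor(img):
    """Concatenate RCMs of the whole image and of its left and right halves block-diagonally."""
    h, w = img.shape
    return block_diag(region_covariance(img),
                      region_covariance(img[:, :w // 2]),
                      region_covariance(img[:, w // 2:]))
```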

The classification accuracy results are presented in Table 1. The table shows the superior performance of the proposed R-JDRDL against state-of-the-art algorithms. It should be noted that R-DR-DL (with both the Stein and AIRM metrics) gives poor performance, implying that a separately pre-learned DR projection matrix might not be optimal for the subsequent DL.

Algorithm              Dictionary size 5    Dictionary size 10
NN-AIRM
NN-Stein
DR-NN-AIRM
DR-NN-Stein
R-SRC-AIRM
R-SRC-Stein
R-KSRC
R-DL
R-DR-DL-AIRM
R-DR-DL-Stein
R-JDRDL (Proposed)
Table 1: Classification accuracy results (average ± standard deviation)

5 Conclusions

We have presented a Riemannian joint framework, R-JDRDL, for performing dimensionality reduction along with discriminative dictionary learning on the set of SPD matrices for classification tasks. We formulate the joint learning as an objective function with a reconstruction error term and with constraints on the projection matrix, the dictionary, and the sparse coefficient codes. Our numerical experiments demonstrate the benefit of jointly performing DL and DR. In particular, R-JDRDL outperforms existing state-of-the-art algorithms for the MNIST image classification task.

Extending the framework to learning with other metrics on the SPD manifold (e.g., the Stein metric or the log-Euclidean metric) is a topic of future research, as is developing a competitive numerical implementation with extensive evaluations on other real-world datasets.

Acknowledgements

H. Kasai was partially supported by JSPS KAKENHI Grant Numbers JP16K00031 and JP17H01732.

References

  • [1] H. Kasai and B. Mishra. Riemannian joint dimensionality reduction and dictionary learning on symmetric positive definite manifold. In EUSIPCO, 2018.
  • [2] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Sig. Proc., 54(11):4311–4322, 2006.
  • [3] Q. Zhang and B. Li. Discriminative k-svd for dictionary learning in face recognition. In CVPR, 2010.
  • [4] Z. Jiang, Z. Lin, and L.S. Davis. Learning a discriminative dictionary for sparse coding via label consistent K-SVD. IEEE Trans. Pattern Anal. Mach. Intell., 35(11):2651–2664, 2013.
  • [5] H. V. Nguyen, V. M. Patel, N. M. Nasrabadi, and R. Chellappa. Sparse embedding: A framework for sparsity promoting dimensionality reduction. In ECCV, pages 414–427, 2012.
  • [6] Z. Feng, L. Yang, M. Zhang, Y. Liu, and D. Zhang. Joint discriminative dimensionality reduction and dictionary learning for face recognition. Pattern Recognition, 46(8):2134–2143, 2013.
  • [7] B. Q. Yang, C.-C. Gu, K.-J. Wu, T. Zhang, and X.-P. Guan. Simultaneous dimensionality reduction and dictionary learning for sparse representation based classification. Multimedia Tools and Applications, 76(6):8969–8990, 2016.
  • [8] W. Liu, Z. Yu, Y. Wen, R. Lin, and M. Yang. Jointly learning non-negative projection and dictionary with discriminative graph constraints for classification. In BMVC, 2016.
  • [9] H. Foroughi, N. Ray, and H. Zhang. Object classification with joint projection and low-rank dictionary learning. IEEE Trans. on Image Process., 27(2):806–821, 2018.
  • [10] Y. Pang, Y. Yuan, and X. Li. Gabor-based region covariance matrices for face recognition. IEEE Trans. Circuits Syst. Video Technol., 18(7):989–993, 2008.
  • [11] O. Tuzel, F. Porikli, and P. Meer. Region covariance: a fast descriptor for detection and classification. In ECCV, 2006.
  • [12] R. Bhatia. Positive definite matrices. Princeton series in applied mathematics. Princeton University Press, 2007.
  • [13] A. Cherian and S. Sra. Riemannian dictionary learning and sparse coding for positive definite matrices. IEEE Trans. Neural Netw. Learn. Syst., 2016.
  • [14] M. Harandi, M. Salzmann, and H. Richard. Dimensionality reduction on spd manifolds: The emergence of geometry-aware methods. IEEE Trans. Pattern Anal. Mach. Intell., 2017.
  • [15] Z. Huang and L. V. Gool. A riemannian network for spd matrix learning. In AAAI, 2017.
  • [16] Z. Huang, R. Wang, S. Shan, X. Li, and X. Chen. Log-euclidean metric learning on symmetric positive definite manifold with application to image set classification. In ICML, 2015.
  • [17] Z. Huang, R. Wang, X. Li, W. Liu, S. Shan, L. V. Gool, and X. Chen. Geometry-aware similarity learning on spd manifolds for visual recognition. IEEE Trans. Circuits Syst. Video Technol., 2017.
  • [18] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
  • [19] X. Pennec, P. Fillard, and N. Ayache. A Riemannian framework for tensor computing. Int. Journal of Computer Vision, 66(1):41–66, 2006.
  • [20] H. Sato and T. Iwai. A new, globally convergent Riemannian conjugate gradient method. Optimization, 64(4):1011–1031, 2015.
  • [21] W. Ring and B. Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optim., 22(2):596–627, 2012.
  • [22] M. Yang, L. Zhang, X. Feng, and D. Zhang. Fisher discrimination dictionary learning for sparse representation. In ICCV, 2011.
  • [23] S. Sra. A new metric on the manifold of kernel matrices with application to matrix geometric means. In NIPS, 2012.
  • [24] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt: a Matlab toolbox for optimization on manifolds. JMLR, 15(1):1455–1459, 2014.