Riemannian Metric Learning for Symmetric Positive Definite Matrices

01/10/2015
by Raviteja Vemulapalli, et al.

Over the past few years, symmetric positive definite (SPD) matrices have been receiving considerable attention from the computer vision community. Though various distance measures have been proposed in the past for comparing SPD matrices, the two most widely-used measures are the affine-invariant distance and the log-Euclidean distance. This is because these two measures are true geodesic distances induced by Riemannian geometry. In this work, we focus on the log-Euclidean Riemannian geometry and propose a data-driven approach for learning Riemannian metrics/geodesic distances for SPD matrices. We show that the geodesic distance learned using the proposed approach performs better than various existing distance measures when evaluated on face matching and clustering tasks.


Notations

  • I denotes the identity matrix of appropriate size.

  • ⟨·,·⟩ denotes an inner product.

  • Sym_n denotes the set of n × n symmetric matrices.

  • Sym_n^+ denotes the set of n × n symmetric positive definite matrices.

  • T_P M denotes the tangent space to the manifold M at the point P.

  • ‖·‖_F denotes the matrix Frobenius norm.

  • Chol(P) denotes the lower triangular matrix obtained from the Cholesky decomposition of a matrix P.

  • exp(·) and log(·) denote the matrix exponential and logarithm respectively.

  • ∂ and ∂² denote first and second order partial derivatives respectively.

1 Introduction

Many computer vision applications involve features that obey specific constraints. Such features often lie in non-Euclidean spaces, where the underlying distance metric is not the usual Euclidean norm. For instance, popular features like shapes, rotation matrices, linear subspaces, symmetric positive definite (SPD) matrices, etc. are known to lie on Riemannian manifolds. In such cases, one needs to develop inference techniques that make use of the underlying manifold structure.

Over the past few years, manifolds have been receiving considerable attention from the computer vision community. In this work, we focus our attention on the set of SPD matrices. Examples of SPD matrices in computer vision include diffusion tensors [1], structure tensors [2] and covariance region descriptors [3]. Diffusion tensors arise naturally in medical imaging [1]. In diffusion tensor magnetic resonance imaging (DT-MRI), water diffusion in tissues is represented by a diffusion tensor characterizing the anisotropy within the tissue. In optical flow estimation and motion segmentation, structure tensors are often employed to encode important image features, such as texture and motion [2]. Covariance region descriptors are used in texture classification [3], object detection [4], object tracking, action recognition and face recognition [5]. Covariance matrices offer several advantages as region descriptors. They provide a natural way of fusing multiple features that might be correlated. The diagonal entries of a covariance matrix represent the variances of the individual features and the off-diagonal entries represent their cross-correlations. Noise corrupting individual samples is largely filtered out by the averaging inherent in covariance computation. Covariance matrices are low dimensional compared to joint feature histograms, and they discard information about the ordering and the number of points, which confers a certain level of scale and rotation invariance across regions in different images.

Various distance measures have been proposed in the literature for the comparison of SPD matrices. Among them, the two most widely-used distance measures are the affine-invariant distance [1] and the log-Frobenius distance [6] (also referred to as log-Euclidean distance in the literature). The main reason for their popularity is that they are geodesic distances induced by Riemannian metrics.

The log-Euclidean framework [6] proposed by Arsigny et al. defines a class of Riemannian metrics, rather than a single metric, called log-Euclidean Riemannian metrics. According to this framework, any inner product defined on Sym_n and extended to Sym_n^+ by left or right multiplication is a bi-invariant Riemannian metric. Equipped with this bi-invariant metric, the space of SPD matrices is a flat Riemannian space, and the geodesic distance corresponding to this bi-invariant Riemannian metric is equal to the distance induced by the inner product in Sym_n, evaluated between matrix logarithms. Surprisingly, this remarkable result has not been used by the computer vision community. Since Sym_n is a vector space, this result allows us to learn log-Euclidean Riemannian metrics and the corresponding log-Euclidean geodesic distances from data by using Mahalanobis distance learning techniques like information-theoretic metric learning (ITML) [7] and large margin nearest neighbor distance learning [8] in Sym_n. In this work, we explore this idea of data-driven Riemannian metrics/geodesic distances for the set of SPD matrices. For learning Mahalanobis distances in Sym_n, we use the ITML technique.

Organization: In section 2, we provide a brief overview of various distance measures used in the literature to compare SPD matrices. We briefly explain the ITML technique in section 3 and present our approach for learning log-Euclidean Riemannian metrics/log-Euclidean geodesic distances from the data in section 4. We provide some experimental results in section 5 and conclude the paper in section 6.

2 Distances to compare SPD matrices

Various distance measures have been used in the literature to compare SPD matrices. Each distance has been derived from different geometrical, statistical or information-theoretic considerations. Though many of these distances try to capture the non-linearity of SPD matrices, not all of them are geodesic distances induced by Riemannian metrics. Tables 1 and 2 summarize these distances and their properties. Among them, the log-Frobenius distance [6] and the affine-invariant distance [1] are the most popular ones.
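For concreteness, the following is a minimal NumPy sketch of four of these distances (the Frobenius, Cholesky-Frobenius, log-Frobenius and affine-invariant distances); the helper spd_logm and the function names are ours, and the J-divergence and Jensen-Bregman LogDet divergence are omitted since their scaling conventions vary across papers.

```python
import numpy as np

def spd_logm(P):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def frobenius_dist(P1, P2):
    return np.linalg.norm(P1 - P2, 'fro')

def cholesky_frobenius_dist(P1, P2):
    # Distance between the lower triangular Cholesky factors [13].
    return np.linalg.norm(np.linalg.cholesky(P1) - np.linalg.cholesky(P2), 'fro')

def log_frobenius_dist(P1, P2):
    # Log-Euclidean / log-Frobenius distance [6].
    return np.linalg.norm(spd_logm(P1) - spd_logm(P2), 'fro')

def affine_invariant_dist(P1, P2):
    # ||log(P1^{-1/2} P2 P1^{-1/2})||_F [1].
    w, V = np.linalg.eigh(P1)
    P1_inv_sqrt = (V * (1.0 / np.sqrt(w))) @ V.T
    return np.linalg.norm(spd_logm(P1_inv_sqrt @ P2 @ P1_inv_sqrt), 'fro')
```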

3 Mahalanobis distance learning using ITML

Information-theoretic metric learning [7] is a technique for learning Mahalanobis distance functions from data based on similarity and dissimilarity constraints. Let {x_1, ..., x_N} be a set of points in R^d. Given a set S of pairs of similar points and a set D of pairs of dissimilar points, the aim of ITML is to learn an SPD matrix A such that the Mahalanobis distance d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j) parametrized by A is below a given threshold u for similar pairs of points and above a given threshold l for dissimilar pairs of points.
Let D_ld(·,·) denote the LogDet divergence between SPD matrices, defined as

D_ld(A, A_0) = tr(A A_0^{-1}) - log det(A A_0^{-1}) - d.    (1)

ITML formulates the Mahalanobis matrix learning as the following optimization problem:

minimize over A ⪰ 0, ξ:   D_ld(A, A_0) + γ · D_ld(diag(ξ), diag(ξ_0))
subject to:               d_A(x_i, x_j) ≤ ξ_{c(i,j)} for (i, j) ∈ S,
                          d_A(x_i, x_j) ≥ ξ_{c(i,j)} for (i, j) ∈ D,    (2)

where c(i, j) denotes the index of the (i, j)-th constraint, ξ is the vector of slack variables ξ_{c(i,j)}, ξ_0 is a vector whose components equal u for similarity constraints and l for dissimilarity constraints, A_0 is an SPD matrix that captures the prior knowledge about A, and γ is a parameter controlling the tradeoff between satisfying the constraints and minimizing D_ld(A, A_0). This optimization problem can be solved efficiently using Bregman iterations. In this work, we use the publicly available ITML code provided by the authors of [7].
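For illustration, here is a small NumPy sketch of the two quantities the objective (1)-(2) is built from, the LogDet divergence and the squared Mahalanobis distance; this is only a sketch of the ingredients, not the Bregman-projection solver of [7], and the function names are ours.

```python
import numpy as np

def logdet_divergence(A, A0):
    """LogDet divergence D_ld(A, A0) = tr(A A0^{-1}) - log det(A A0^{-1}) - d,
    as in Eq. (1)."""
    d = A.shape[0]
    M = A @ np.linalg.inv(A0)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - d

def mahalanobis_sq(x, y, A):
    """Squared Mahalanobis distance d_A(x, y) = (x - y)^T A (x - y), the quantity
    constrained in Eq. (2)."""
    diff = x - y
    return float(diff @ A @ diff)
```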

ITML parameters: We need to specify the constraint thresholds u and l, the prior matrix A_0 and the tradeoff parameter γ while using ITML. We choose the constraint thresholds u and l as lower and upper percentiles of the observed distribution of distances between pairs of points within the training dataset. Hence, the remaining parameters for the ITML algorithm are γ and A_0.
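A minimal sketch of this percentile-based threshold selection is given below; the particular percentile values are illustrative defaults and not taken from the paper.

```python
import numpy as np

def choose_thresholds(X, lower_pct=5, upper_pct=95):
    """Pick ITML constraint thresholds (u, l) from percentiles of the pairwise
    distances within the training set X (one point per row). The percentile
    values here are assumed defaults, not the paper's settings."""
    n = X.shape[0]
    dists = [np.linalg.norm(X[i] - X[j]) for i in range(n) for j in range(i + 1, n)]
    u = np.percentile(dists, lower_pct)   # similar pairs should fall below u
    l = np.percentile(dists, upper_pct)   # dissimilar pairs should fall above l
    return u, l
```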

Distance                                  Symmetric   Triangle inequality   Geodesic
Frobenius                                 Yes         Yes                   No
Cholesky-Frobenius [13]                   Yes         Yes                   No
J-divergence [12]                         Yes         No                    No
Jensen-Bregman LogDet divergence [11]     Yes         No                    No
Affine-invariant [1]                      Yes         Yes                   Yes
Log-Frobenius [6]                         Yes         Yes                   Yes
Table 1: SPD matrix distances and their properties

Distance                                  Distance to singular matrices   Affine invariance   Scale invariance   Rotation invariance   Inversion invariance
Frobenius                                 Finite                          No                  No                 Yes                   No
Cholesky-Frobenius [13]                   Finite                          No                  No                 No                    No
J-divergence [12]                         Infinite                        Yes                 Yes                Yes                   Yes
Jensen-Bregman LogDet divergence [11]     Infinite                        Yes                 Yes                Yes                   Yes
Affine-invariant [1]                      Infinite                        Yes                 Yes                Yes                   Yes
Log-Frobenius [6]                         Infinite                        No                  Yes                Yes                   Yes
Table 2: Invariance properties of SPD matrix distances

4 Log-Euclidean Riemannian metric learning

The log-Euclidean framework [6] proposed by Arsigny et al. defines a class of Riemannian metrics called log-Euclidean metrics. The geodesic distances associated with log-Euclidean metrics are called log-Euclidean distances. Let ⊙ be an operation on SPD matrices defined as P_1 ⊙ P_2 = exp(log(P_1) + log(P_2)). We have the following result based on the log-Euclidean framework introduced in [6]:
Result 4.1: Any inner product ⟨·,·⟩ defined on Sym_n and extended to the Lie group (Sym_n^+, ⊙) by left or right multiplication is a bi-invariant Riemannian metric. The corresponding geodesic distance between P_1 and P_2 is given by

d(P_1, P_2) = ‖log(P_1) - log(P_2)‖,    (3)

where ‖·‖ is the norm induced by ⟨·,·⟩. Note that log here is the inverse exponential map at the identity matrix, which is equal to the usual matrix logarithm in this case.
The set of all n × n symmetric matrices forms a vector space of dimension m = n(n+1)/2. Let vec(S) denote the column vector formed from the upper triangular part of a matrix S ∈ Sym_n. This operation provides an m-dimensional vector representation for Sym_n. Let ⟨·,·⟩_G be an inner product defined on the vector space Sym_n, and let G be the corresponding matrix of inner products between the basis vectors of the vec representation. Note that ⟨·,·⟩_G is uniquely characterized by G. The distance between two matrices S_1 and S_2 induced by this inner product is given by

d_G(S_1, S_2) = sqrt( (vec(S_1) - vec(S_2))^T G (vec(S_1) - vec(S_2)) ).    (4)

Result 4.2: Let G ∈ Sym_m^+, where m = n(n+1)/2. Then, G defines a unique inner product, denoted by ⟨·,·⟩_G, on Sym_n. This inner product also defines a log-Euclidean Riemannian metric, which can be obtained by simply extending ⟨·,·⟩_G to the Lie group (Sym_n^+, ⊙) by left or right multiplication. The corresponding log-Euclidean geodesic distance between P_1 and P_2 is given by

d_G(P_1, P_2) = sqrt( (vec(log(P_1)) - vec(log(P_2)))^T G (vec(log(P_1)) - vec(log(P_2))) ).    (5)

The above result follows directly from Result 4.1. Result 4.2 says that any Mahalanobis distance defined in the vector space Sym_n is a geodesic distance on Sym_n^+, and the corresponding Riemannian metric is uniquely defined by the Mahalanobis matrix G. Hence, we can learn Riemannian metrics/geodesic distances for Sym_n^+ from data by learning Mahalanobis distance functions in the vector space Sym_n. Table 3 summarizes our approach for learning geodesic distances on Sym_n^+. In this work, we use the ITML technique for Mahalanobis distance learning.
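The learned geodesic distance of Eq. (5) is straightforward to compute once G is available; the sketch below assumes NumPy, and the helper names spd_logm and vec_sym are ours. With G equal to the identity it is already a valid log-Euclidean distance (it matches the log-Frobenius distance up to the weighting of the off-diagonal entries).

```python
import numpy as np

def spd_logm(P):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def vec_sym(S):
    """Column-vector form of the upper triangular part of a symmetric matrix."""
    return S[np.triu_indices(S.shape[0])]

def learned_geodesic_dist(P1, P2, G):
    """Log-Euclidean geodesic distance of Eq. (5), parametrized by an SPD
    Mahalanobis matrix G of size m x m, where m = n(n+1)/2."""
    diff = vec_sym(spd_logm(P1)) - vec_sym(spd_logm(P2))
    return float(np.sqrt(diff @ G @ diff))
```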

Input: SPD matrices P_1, ..., P_N with pairwise similarity/dissimilarity constraints
for i = 1 to N
       y_i = vec(log(P_i))
end
Learn a Mahalanobis distance function using y_1, ..., y_N and the given constraints. This gives a Mahalanobis matrix G ∈ Sym_m^+, where m = n(n+1)/2.
Output: Geodesic distance between P_1 and P_2:
d_G(P_1, P_2) = sqrt( (vec(log(P_1)) - vec(log(P_2)))^T G (vec(log(P_1)) - vec(log(P_2))) ).
Table 3: Algorithm for learning geodesic distances on Sym_n^+
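A sketch of the full pipeline of Table 3 follows, with the Mahalanobis learner left as a pluggable hook: the paper uses the ITML code of [7], but any learner that returns an SPD matrix G from the vectorized data and the constraints would fit here. The helpers are the same as in the previous sketch.

```python
import numpy as np

def spd_logm(P):
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def vec_sym(S):
    return S[np.triu_indices(S.shape[0])]

def learn_geodesic_distance(spd_mats, sim_pairs, dis_pairs, learn_mahalanobis):
    """Table 3 pipeline: map each SPD matrix to vec(log(P)) and hand the vectors
    plus constraints to a Mahalanobis learner. `learn_mahalanobis` is a
    placeholder hook (ITML in the paper); it must return an SPD matrix G."""
    Y = np.stack([vec_sym(spd_logm(P)) for P in spd_mats])
    G = learn_mahalanobis(Y, sim_pairs, dis_pairs)

    def geodesic_dist(P1, P2):
        diff = vec_sym(spd_logm(P1)) - vec_sym(spd_logm(P2))
        return float(np.sqrt(diff @ G @ diff))

    return geodesic_dist
```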

5 Experiments

In this section, we evaluate the performance of the proposed Riemannian metric/geodesic distance learning approach on two applications: (i) face matching using the Labeled Faces in the Wild (LFW) dataset and (ii) semi-supervised clustering using the ETH80 dataset.

5.1 Face matching using LFW face dataset

In this experiment, our aim is to predict whether a given pair of face images corresponds to the same person or not.
Dataset: The LFW dataset [9] is a collection of face photographs designed for studying the problem of unconstrained face recognition. It consists of 13233 labeled face images of 1680 subjects collected from the web, organized into two subsets:

  • Development subset: The development subset consists of 2200 training image pairs, where 1100 are similar pairs and 1100 are dissimilar pairs, and 1000 test image pairs, where 500 are similar pairs and 500 are dissimilar pairs. An image pair is said to be similar if both the images correspond to the same person and dissimilar if they correspond to different persons.

  • Evaluation subset: The evaluation subset consists of 3000 similar image pairs and 3000 dissimilar image pairs. It is further divided into 10 subsets each of which consists of 300 similar pairs and 300 dissimilar pairs.

All the image pairs were generated by randomly selecting images from the 13233 images in the dataset. The development subset is meant for model and parameter selection. The evaluation subset should be used only once for final training and testing. To avoid overfitting, the image pairs in the development subset were chosen to be different from the image pairs in the evaluation subset.

5.1.1 Feature extraction

We crop the face region in each image and resize it to a fixed size. Following [3], we convert each pixel (x, y) in an image into a 9-dimensional feature vector given by

F(x, y) = [ x, y, R(x, y), G(x, y), B(x, y), |∂I/∂x|, |∂I/∂y|, |∂²I/∂x²|, |∂²I/∂y²| ],

where x and y are the column and row coordinates respectively, R, G and B are the color coordinates, and I is the grayscale image. We use the covariance matrix of the feature vectors to represent the image.
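A possible NumPy implementation of this covariance region descriptor is sketched below; the derivative filters (np.gradient) and the exact feature ordering are our assumptions in the spirit of [3], not details taken from the paper.

```python
import numpy as np

def covariance_descriptor(rgb, gray):
    """9 x 9 covariance descriptor of per-pixel features: pixel coordinates,
    color values, and magnitudes of first and second grayscale derivatives."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]                  # row (y) and column (x) coordinates
    Ix = np.gradient(gray, axis=1)               # first derivatives
    Iy = np.gradient(gray, axis=0)
    Ixx = np.gradient(Ix, axis=1)                # second derivatives
    Iyy = np.gradient(Iy, axis=0)
    feats = np.stack([xs, ys,
                      rgb[..., 0], rgb[..., 1], rgb[..., 2],
                      np.abs(Ix), np.abs(Iy), np.abs(Ixx), np.abs(Iyy)], axis=-1)
    feats = feats.reshape(-1, 9).astype(np.float64)
    return np.cov(feats, rowvar=False)           # 9 x 9 covariance matrix
```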

5.1.2 Experimental protocol

Following the standard experimental protocol for this dataset, we use the development set for selecting the parameters of ITML and then use the evaluation set only once for final training and testing. The following steps summarize our experimental procedure:

  • Parameter selection: We train the ITML algorithm using the 2200 training pairs of the development subset and then test it on the 1000 test pairs of the development subset. We select the ITML parameters that give the best test accuracy.

  • Final training and testing: The evaluation set consists of 10 splits and we perform 10-fold cross-validation. In each fold, we use 9 splits (2700 similar pairs and 2700 dissimilar pairs) for training ITML and 1 split (300 similar pairs and 300 dissimilar pairs) for testing. For training ITML, we use the parameters selected in the previous step. Since our task is face matching, we need to threshold the learned distance function. In each fold, we find the threshold that gives the best training accuracy and use the same threshold for the test image pairs (see the sketch after this list).
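The threshold selection step can be implemented by scanning the distances observed on the training pairs; the sketch below is a minimal version of this idea and assumes labels of 1 for similar pairs and 0 for dissimilar pairs.

```python
import numpy as np

def best_threshold(train_dists, train_labels):
    """Return the decision threshold that maximizes training accuracy for the
    matching task: pairs with distance below the threshold are declared 'same'."""
    d = np.asarray(train_dists, dtype=float)
    y = np.asarray(train_labels)
    best_t, best_acc = None, -1.0
    for t in np.sort(d):                         # candidate thresholds at observed distances
        acc = np.mean((d <= t) == (y == 1))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```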

5.1.3 Comparative methods

We compare the performance of the proposed log-Euclidean metric learning approach with the following approaches:

  • Directly use any of the following distances for matching:

    • Frobenius, Cholesky-Frobenius, J-divergence, Jensen-Bregman LogDet divergence, Affine-invariant and Log-Frobenius.

  • Use ITML directly with the covariance matrices by treating them as elements of the Euclidean space of symmetric matrices.

  • Use ITML with the lower triangular matrix obtained by Cholesky decomposition.

In all these methods the distance threshold is obtained in each fold independently based on the training data.

5.1.4 Parameters

The ITML parameter values (the constraint thresholds, the prior matrix A_0 and the tradeoff parameter γ) were selected using the development subset of the dataset.

Frobenius   Cholesky-Frobenius   Log-Frobenius   J-divergence   Jensen-Bregman LogDet divergence   Affine-invariant
  53.77          56.62               60.43           60.92                  61.62                       61.15
Table 4: Prediction accuracy (%) on the LFW dataset using various SPD matrix distances

                          Frobenius   ITML    ITML gain
Covariance matrices         53.77     57.58     3.81
Cholesky decompositions     56.62     63.53     6.91
Log-Euclidean               60.43     69.37     8.94
Table 5: Prediction accuracy (%) on the LFW dataset using distance learning

5.1.5 Results

Tables 4 and 5 summarize the prediction results for various approaches on the LFW data set. We can draw the following conclusions from these results:

  • The proposed Riemannian metric/geodesic distance learning approach outperforms the other approaches for comparing covariance matrices.

  • The log-Euclidean geodesic distance learned from the data performs much better than the standard log-Frobenius distance.

  • Distance learning with original covariance matrices or Cholesky decompositions performs poorly compared to distance learning in the logarithm domain.

5.2 Semi-supervised clustering using ETH80 object dataset

In this experiment, we are interested in clustering the images in the ETH80 dataset into different object categories.

5.2.1 Dataset

The ETH80 object dataset [10] consists of images of 8 object categories with each category including 10 different object instances. Each object instance has 41 images captured under different views. So, each object category has 410 images resulting in a total of 3280 images.

5.2.2 Feature extraction

We convert each pixel (x, y) in an image into a 9-dimensional feature vector given by

F(x, y) = [ x, y, R(x, y), G(x, y), B(x, y), |∂I/∂x|, |∂I/∂y|, |∂²I/∂x²|, |∂²I/∂y²| ],

where x and y are the column and row coordinates respectively, R, G and B are the color coordinates, and I is the grayscale image. We compute the covariance matrix of the feature vectors over the entire image and use it to represent the image.

5.2.3 Experimental protocol and parameters

For every object category, we randomly select 4 images from each instance for training. Hence, we use 40 samples from each object category for training, resulting in a total of 320 training images. From each pair of training images, we generate either a similarity constraint or a dissimilarity constraint based on their category labels (a sketch of this step follows below). We use all such constraints in learning the Mahalanobis distance function. Once we learn the Mahalanobis distance function, we use it for clustering the entire dataset of 3280 images.
We repeat the above procedure 5 times and report the average clustering accuracy. In each run, we select the value of the ITML tradeoff parameter γ using two-fold cross-validation on the training data. The remaining ITML parameters are kept fixed across all 5 runs.
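A minimal sketch of the constraint-generation step, assuming integer category labels for the training images; the function name is ours.

```python
from itertools import combinations

def pairwise_constraints(labels):
    """Turn category labels of the training samples into similarity and
    dissimilarity pairs: every pair of training samples yields one constraint."""
    sim_pairs, dis_pairs = [], []
    for i, j in combinations(range(len(labels)), 2):
        (sim_pairs if labels[i] == labels[j] else dis_pairs).append((i, j))
    return sim_pairs, dis_pairs
```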

We use the K-means algorithm for clustering. To handle the local-optimum issue, we run K-means with 20 different random initializations and select the clustering result with the minimum K-means cost value (see the sketch below).
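Since the learned distance d_G is a Mahalanobis distance in the vec(log(P)) representation, K-means under d_G reduces to ordinary Euclidean K-means after a linear transform of the features with the Cholesky factor of G. The sketch below (our own minimal K-means, not the paper's code) uses this fact and keeps the best of several random initializations.

```python
import numpy as np

def kmeans_with_learned_metric(Y, G, k, n_restarts=20, n_iters=100, seed=0):
    """K-means under the learned distance d_G. With G = L L^T we have
    d_G(y1, y2) = ||L^T y1 - L^T y2||, so we transform the vec(log(P)) features
    once and run plain Euclidean K-means, keeping the lowest-cost run."""
    L = np.linalg.cholesky(G)
    Z = Y @ L                                    # Euclidean distance in Z equals d_G in Y
    rng = np.random.default_rng(seed)
    best_cost, best_assign = np.inf, None
    for _ in range(n_restarts):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iters):
            assign = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
            centers = np.stack([Z[assign == c].mean(axis=0) if np.any(assign == c)
                                else centers[c] for c in range(k)])
        cost = ((Z - centers[assign]) ** 2).sum()
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign
```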

5.2.4 Comparative methods

We compare the performance of the proposed log-Euclidean metric learning approach with the following approaches:

  • Unsupervised: Directly perform K-means clustering using any of the following distances: Frobenius, Cholesky-Frobenius and Log-Frobenius.

  • Use ITML directly with the covariance matrices by treating them as elements of the Euclidean space of symmetric matrices.

  • Use ITML with the lower triangular matrix obtained by Cholesky decomposition.

The mean does not have a closed-form solution under the J-divergence, the Jensen-Bregman LogDet divergence or the affine-invariant distance, so an iterative optimization procedure is required to compute the cluster means. This makes the K-means algorithm computationally expensive. Hence, we do not use these distances for comparison in this work.

5.2.5 Results

Table 6 summarizes the clustering results for various approaches on the ETH80 dataset. We can draw the following conclusions from these results:

  • The proposed Riemannian metric/geodesic distance learning approach performs better than other approaches for clustering SPD matrices.

  • The log-Euclidean geodesic distance learned from the data performs much better than the standard log-Frobenius distance.

  • Distance learning with original covariance matrices or Cholesky decompositions performs poorly compared to distance learning in the logarithm domain.

6 Conclusion

In this work, we have explored the idea of data-driven Riemannian metrics and geodesic distances. Based on the log-Euclidean framework [6], we have shown how geodesic distance functions can be learned for SPD matrices by simply learning Mahalanobis distance functions in the logarithm domain. We have conducted experiments using face and object datasets. The face matching and semi-supervised object categorization results clearly show that the learned log-Euclidean geodesic distance performs much better than other distances.

                          Frobenius   ITML    ITML gain
Covariance matrices         35.58     70.50    34.92
Cholesky decompositions     51.13     70.36    19.24
Log-Euclidean               55.70     73.79    18.09
Table 6: Clustering accuracy (%) on the ETH80 dataset

References

  1. X. Pennec, P. Fillard, and N. Ayache, “A Riemannian Framework for Tensor Computing”, IJCV, 2006.

  2. A. Goh and R. Vidal, “Clustering and Dimensionality Reduction on Riemannian Manifolds”, In CVPR, 2008.

  3. O. Tuzel, F. Porikli, and P. Meer, “Region Covariance: A Fast Descriptor for Detection and Classification”, In ECCV, 2006.

  4. O. Tuzel, F. Porikli, and P. Meer, “Pedestrian Detection via Classification on Riemannian Manifolds”, PAMI, 2008.

  5. M. Harandi, C. Sanderson, R. Hartley, and B. Lovell, “Sparse Coding and Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach”, In ECCV, 2012.

  6. V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, “Log-Euclidean Metrics for Fast and Simple Calculus on Diffusion Tensors”, Magnetic Resonance in Medicine, 2006.

  7. J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Information-Theoretic Metric Learning”, In ICML, 2007.

  8. K. Q. Weinberger and L. K. Saul, “Distance Metric Learning for Large Margin Nearest Neighbor Classification”, JMLR, 2009.

  9. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments”, University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.

  10. B. Leibe and B. Schiele, “Analyzing Appearance and Contour Based Methods for Object Categorization”, In CVPR, 2003.

  11. A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos, “Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence”, In ICCV, 2011.

  12. Z. Wang and B. C. Vemuri, “An Affine Invariant Tensor Dissimilarity Measure and its Applications to Tensor-valued Image Segmentation”, In CVPR, 2004.

  13. I. L. Dryden, A. Koloydenko, and D. Zhou, “Non-Euclidean Statistics for Covariance Matrices, with Applications to Diffusion Tensor Imaging”, The Annals of Applied Statistics, 2009.