Symmetric Positive Semi-definite Riemannian Geometry with Application to Domain Adaptation

07/28/2020 ∙ by Or Yair, et al. ∙ Technion

In this paper, we present new results on the Riemannian geometry of symmetric positive semi-definite (SPSD) matrices. First, based on an existing approximation of the geodesic path, we introduce approximations of the logarithmic and exponential maps. Second, we present a closed-form expression for Parallel Transport (PT). Third, we derive a canonical representation for a set of SPSD matrices. Based on these results, we propose an algorithm for Domain Adaptation (DA) and demonstrate its performance in two applications: fusion of hyper-spectral images and motion identification.


1 Introduction

Recent technological advances have given rise to the collection and storage of massive, complex datasets. These datasets are often high-dimensional and multimodal, calling for the development of informative representations. Since such complex data typically do not live in a Euclidean space, standard linear analysis techniques applied directly to the data are inappropriate. Consequently, analysis techniques based on Riemannian geometry have attracted significant research attention. While the basic mathematical operations used for data analysis, e.g., addition, subtraction, and comparison, are straightforward in Euclidean space, they are often non-trivial or intractable in particular Riemannian spaces.

A considerable amount of literature has been published on the Riemannian geometry of Symmetric Positive Definite (SPD) matrices, where it was shown to be useful for various applications, e.g., in computer vision, medical data analysis, and machine learning [pennec2006riemannian, barachant2013classification, jayasumana2015kernel, huang2015log, bacak2016second, rodrigues2017dimensionality, rodrigues2018riemannian, bergmann2018priors]. For example, in [pennec2006riemannian], Pennec et al. introduced the use of the affine-invariant metric, which facilitates closed-form expressions of the exponential and logarithmic maps, in medical imaging. In [barachant2013classification], Barachant et al. proposed an algorithm for brain-computer interfaces (BCI) based on the Riemannian distance and the estimation of the Riemannian mean [moakher2005differential] of SPD matrices. The PT on the SPD manifold, which has a closed-form expression, was used in [yair2019parallel] for DA. Similar geometric operations have been developed in other Riemannian spaces as well, e.g., on the Grassmann manifold [absil2004riemannian] and on the Stiefel manifold [edelman1998geometry], and were shown to be beneficial for a wide variety of data analysis tasks, e.g., [shrivastava2014unsupervised].

In this paper, we consider the Riemannian geometry of symmetric positive semi-definite (SPSD) matrices with a fixed rank. Based on the eigenvalue decomposition, it can be shown that any such SPSD matrix can be represented by a pair consisting of a matrix with orthonormal columns, representing a point on the Grassmann manifold, and an SPD matrix. This geometry extends the Riemannian geometry of SPD matrices. In addition, it facilitates the analysis of a larger pool of data features. For example, the SPD geometry supports the analysis of only full-rank covariance matrices; however, it is well known that in many real-world problems this is not the case. Often, high-dimensional data such as gene expression data [kapur2016gene] and hyper-spectral imaging data [HyperspectralLowRank, niu2016hyperspectral] have an intrinsic low-rank structure, and therefore, the associated covariance matrices are not full-rank. In addition to supporting low-rank covariance matrices, in contrast to the SPD geometry, the SPSD geometry applies to a wide variety of kernels, graph Laplacians, and similarity matrices, which are common data features in contemporary data analysis.
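To make the low-rank point concrete, here is a minimal NumPy sketch (ours, not from the paper) showing that a sample covariance estimated from fewer samples than dimensions is necessarily rank-deficient:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 4                      # ambient dimension d, only n < d samples
X = rng.standard_normal((n, d))   # n samples in R^d

# After mean removal, the sample covariance has rank at most n - 1.
C = np.cov(X, rowvar=False)       # d x d covariance estimate
rank = np.linalg.matrix_rank(C)
```

For generic data the rank is exactly n - 1, so the full-rank SPD geometry does not apply to such covariance estimates.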

Despite the high relevance to many data analysis techniques, the usage of the Riemannian geometry of SPSD matrices has thus far been limited, since it lacks several pivotal components. First, there is no available explicit expression for the geodesic path connecting two SPSD matrices. As a consequence, there is no definitive expression for the Riemannian distance between two SPSD matrices, which is typically defined as the arc length of the geodesic path. In addition, basic operations such as the logarithmic and exponential maps, which are derived from the geodesic path, are undefined. Second, the representation of an SPSD matrix by such a pair is not unique, posing challenges when jointly analyzing multiple SPSD matrices. These missing components led to an avenue of research where full-rank structure is imposed by adding a scalar matrix to each of the given low-rank matrices [wang2012covariance, fang2018new]. Essentially, this approach “artificially” transforms the SPSD geometry into the SPD geometry by introducing a component that does not stem from the data.

Instead, here we propose to extend the Riemannian geometry of SPSD matrices head-on. Our developments largely rely on the work of Bonnabel and Sepulchre [bonnabel2010riemannian], where an approximation of the geodesic path on the SPSD manifold was presented, giving rise to a meaningful measure of proximity between two SPSD matrices, and on the work of Bonnabel et al. [bonnabel2013rank], where a rank-preserving mean of a set of fixed-rank SPSD matrices was defined. First, based on the approximation of the geodesic path in [bonnabel2010riemannian], we introduce approximations of the logarithmic and exponential maps. Second, we present a closed-form expression for the PT on the SPSD manifold. Finally, using the mean of SPSD matrices proposed in [bonnabel2013rank], we derive a canonical representation for a set of SPSD matrices.

Based on the developed mathematical infrastructure for the analysis of SPSD matrices with a fixed rank, we address the problem of DA. Due to the inherent heterogeneity of many types of datasets, useful representations usually cannot be achieved simply by considering the union of multiple datasets. We present an algorithm for DA, which is based on the PT on the SPSD manifold and facilitates an informative representation of multiple heterogeneous datasets. We showcase the performance of our algorithm in two applications. First, we demonstrate the fusion of hyper-spectral images collected by airborne sensors, which allows high-quality categorization of land-covers in one image by training a classifier on another image. Second, we show accurate motion identification based on recordings of motions, which is actor-independent, i.e., independent of the actor executing these motions.

The remainder of the paper is organized as follows. In Section 2, we present preliminaries on the Riemannian manifolds relevant to our work: the manifold of SPD matrices, the Grassmann manifold, and the manifold of SPSD matrices with a fixed rank. In Section 3, we describe a particular transportation map, derived from PT, on the SPD and Grassmann manifolds. Section 4 presents our approximations of the logarithmic and exponential maps on the SPSD manifold, the PT-driven transportation map, and a canonical representation for a set of SPSD matrices. Next, we propose a new DA algorithm in Section 5. Section 6 consists of two applications of the proposed DA algorithm to real data. Finally, Section 7 concludes the paper.

2 Preliminaries

In this section, we briefly describe several known properties of the manifold of SPD matrices, the Grassmann manifold, and the manifold of SPSD matrices, which will be used extensively throughout the paper. First, we formally denote the following sets:

  • – The set of SPD matrices.

  • – The set of SPSD matrices with rank .

  • – The set of -dimensional subspaces of .

  • – The set of matrices with orthonormal columns: for .

  • – The set of orthogonal matrices .

In addition, given a manifold with its Riemannian geodesic distance , we denote the Fréchet (Karcher) mean of the set by:

2.1 The manifold of SPD matrices

A matrix is an SPD matrix if it is symmetric and all of its eigenvalues are strictly positive. Denote the set of all SPD matrices by:

The set can be embedded in a dimensional space, that is:

The tangent space at any point is the set of all symmetric matrices:

The affine invariant metric (inner product) in the tangent space is given by:

(1)

for any , where is the standard Euclidean inner product given by .

The set equipped with the affine invariant metric Eq. 1 gives rise to a Riemannian manifold. Below, we outline the main properties of this manifold. For more details, we refer the readers to [bhatia2009positive].

  • The geodesic path from to can be parametrized by:

    (2)
  • The arc length of the geodesic path defines an affine invariant distance and is explicitly given by:

    where is the th eigenvalue of the matrix , and is the Frobenius norm.

  • The exponential map from the point at the direction is given by:

    (3)
  • The logarithmic map, which is the inverse of the exponential map, is given by:

    (4)

    for any .

  • The PT

    of the tangent vector

    to , is given by:

  • Given a set of SPD matrices , a useful Euclidean vector approximation in the tangent space , where , is given by:

    where

Given a set of SPD matrices , Algorithm 1 can be used to obtain the Riemannian SPD mean .

Input: A set of SPD matrices

Output: The Riemannian mean

  1. set

  2. do

    1. using Eq. 4

    2. using Eq. 3

    while

Algorithm 1 SPD Mean
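As an illustration of Algorithm 1, the following NumPy sketch (our own; the names `spd_log`, `spd_exp`, and `spd_mean` are hypothetical) implements the affine-invariant logarithmic and exponential maps of Eq. 3 and Eq. 4 via eigendecompositions, and the resulting fixed-point iteration for the Riemannian mean:

```python
import numpy as np

def _apply(P, f):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * f(w)) @ V.T

def spd_log(P, X):
    """Logarithmic map at P (Eq. 4), affine-invariant geometry."""
    Ph, Pih = _apply(P, np.sqrt), _apply(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ _apply(Pih @ X @ Pih, np.log) @ Ph

def spd_exp(P, S):
    """Exponential map at P (Eq. 3), the inverse of spd_log."""
    Ph, Pih = _apply(P, np.sqrt), _apply(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ _apply(Pih @ S @ Pih, np.exp) @ Ph

def spd_mean(mats, tol=1e-10, max_iter=100):
    """Riemannian (Karcher) mean via the fixed-point iteration of Algorithm 1."""
    M = np.mean(mats, axis=0)  # initialize at the Euclidean mean
    for _ in range(max_iter):
        S = np.mean([spd_log(M, X) for X in mats], axis=0)
        M = spd_exp(M, S)
        if np.linalg.norm(S) < tol:
            break
    return M

# small demo with three random, well-conditioned SPD matrices
rng = np.random.default_rng(0)
mats = []
for _ in range(3):
    A = rng.standard_normal((4, 4))
    mats.append(A @ A.T + 4 * np.eye(4))
M = spd_mean(mats)
```

At the returned mean, the average of the logarithmic maps (the Riemannian "gradient") vanishes, which is exactly the stopping criterion of Algorithm 1.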

2.2 The Grassmann manifold

Let

(5)

be the equivalence class of all orthogonal matrices such that their leftmost columns span the same subspace. If , that is, the leftmost columns have the same span, we denote the equivalence relation by:

(6)

For convenience, when considering only the leftmost columns of an orthogonal matrix, we sometimes use the thin representation instead, and we state the dimensions explicitly when necessary. This “thin representation” gives rise to economical implementations of most of the operations detailed below.

Let be the set of all -dimensional subspaces of , where represents any unique -dimensional span as in Eq. 5. It can also be viewed as the quotient space

Following [edelman1998geometry], for computational purposes, we usually consider a single matrix to represent the entire equivalence class. Throughout the paper, when considering multiple points (subspaces) on the Grassmann manifold, we assume that the principal angles between those subspaces are strictly smaller than π/2.

The set can be embedded in a dimensional space, that is:

The tangent space at

, represented by the orthogonal matrix

, is given by:

where for any . For simplicity, the tangent space can be equivalently written as

where , , and is the orthogonal complement of . The inner product in is given by:

(7)

where .

The set and the inner product Eq. 7 form the Grassmann manifold. Below, we outline its main properties. For more details, we refer the readers to [edelman1998geometry].

  • The exponential map from the point , which represents the point , at the direction is given by:

    (8)

    where . For small values of , the curve is a geodesic. Similarly, the exponential map from the point , which represents the point , at the direction is given by:

    (9)

    where is a compact SVD.

  • Given two points , representing the two points , the logarithmic map, which is the inverse of the exponential map, is given by:

    (10)

    where

    is a compact SVD decomposition. Let and . The tangent vector in Eq. 10 can be recast as

    (11)

    where and .

  • Given , the geodesic between the two points can be computed by:

    (12)

    Note that, in general, the endpoint of the geodesic spans the same subspace as the target point but does not necessarily coincide with its given representative. We note that the expression in Eq. 12 is well defined if all the principal angles between the two subspaces are strictly smaller than π/2.

  • The arc length of the geodesic path between the points and is given by:

    where is an SVD decomposition, , , and are known as the principal angles between the two subspaces and .

  • The PT of the tangent vector along the geodesic where is given by:

    (13)

    Specifically, if we have:
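The arc-length distance above can be sketched directly from the principal angles; in this NumPy snippet (ours, with the hypothetical name `grassmann_dist`), the singular values of the product of the two orthonormal bases give the cosines of the principal angles:

```python
import numpy as np

def grassmann_dist(U1, U2):
    """Arc-length distance on the Grassmann manifold: the singular values
    of U1^T U2 are the cosines of the principal angles between the spans."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))  # principal angles
    return np.linalg.norm(theta)

# two 2-dimensional subspaces of R^3 sharing one direction
U1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # span{e1, e2}
U2 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])  # span{e1, e3}
d = grassmann_dist(U1, U2)  # principal angles are (0, pi/2)
```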

Given a set of matrices, where each represents a point, Algorithm 2 can be used to obtain the Riemannian mean on the Grassmann manifold.

Let and represent two points on , and let and be their leftmost columns, respectively. When considering the Stiefel manifold , the closest point in to is given by:

(14)

where the logarithmic map is computed using Eq. 10 and Eq. 11, and the exponential map is computed using Eq. 8. See Fig. 1 for illustration. Using the compact representations and , Eq. 14 can be simply recast as

(15)

where is an SVD decomposition. This result will be used heavily in the remainder of the paper.

Figure 1: Illustration of the computation of the closest point in to in Eq. 14.
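Eq. 14 and Eq. 15 amount to an orthogonal Procrustes problem; a possible NumPy sketch (the function name `align` is ours) is:

```python
import numpy as np

def align(U1, U2):
    """Closest representative to U1 within the equivalence class of U2
    (right-multiplications by an orthogonal matrix): an orthogonal
    Procrustes problem with the closed-form SVD solution of Eq. 15."""
    W, _, Vt = np.linalg.svd(U2.T @ U1)
    return U2 @ (W @ Vt)

# demo: same subspace, rotated basis -- alignment recovers U1 exactly
rng = np.random.default_rng(0)
U1, _ = np.linalg.qr(rng.standard_normal((6, 3)))
O, _ = np.linalg.qr(rng.standard_normal((3, 3)))
U2 = U1 @ O                  # spans the same subspace as U1
U2_star = align(U1, U2)
```

When the two bases span the same subspace, the aligned representative coincides with U1; in general it is the closest basis within the equivalence class.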

Input: A set of matrices , each represents a point

Output: The Grassmann mean

  1. set

  2. do

    1. using Eq. 10

    2. using Eq. 9

    while

Algorithm 2 Grassmann Mean
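For concreteness, the following NumPy sketch (ours; it follows the closed-form maps of Edelman et al. used in Eq. 8 and Eq. 10, with hypothetical names `gr_exp`, `gr_log`, `gr_mean`) implements the Grassmann exponential and logarithmic maps and the mean iteration of Algorithm 2:

```python
import numpy as np

def gr_exp(U, D):
    """Grassmann exponential map: U has orthonormal columns and D is a
    tangent vector satisfying U.T @ D = 0 (Eq. 8/9 style, via thin SVD)."""
    W, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U @ Vt.T @ np.diag(np.cos(s)) @ Vt + W @ np.diag(np.sin(s)) @ Vt

def gr_log(U1, U2):
    """Grassmann logarithmic map from span(U1) toward span(U2) (Eq. 10 style)."""
    M = (U2 - U1 @ (U1.T @ U2)) @ np.linalg.inv(U1.T @ U2)
    W, s, Vt = np.linalg.svd(M, full_matrices=False)
    return W @ np.diag(np.arctan(s)) @ Vt

def gr_mean(Us, tol=1e-10, max_iter=100):
    """Fréchet mean on the Grassmann manifold, iterating as in Algorithm 2."""
    M = Us[0]
    for _ in range(max_iter):
        D = np.mean([gr_log(M, U) for U in Us], axis=0)
        M = gr_exp(M, D)
        if np.linalg.norm(D) < tol:
            break
    return M

# demo: three nearby 2-dimensional subspaces of R^5
rng = np.random.default_rng(0)
B = np.linalg.qr(rng.standard_normal((5, 2)))[0]
Us = [np.linalg.qr(B + 0.2 * rng.standard_normal((5, 2)))[0] for _ in range(3)]
M = gr_mean(Us)
```

Note that the exponential of the logarithm recovers the target subspace, though not necessarily the same representative basis.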

2.3 The manifold of SPSD matrices

The set of all SPSD matrices with a fixed rank is given by

Since any can be represented by:

where and , Bonnabel and Sepulchre [bonnabel2010riemannian] proposed the following structure space representation

We note that this representation is not unique since

and therefore

for any . In other words, the set can be written as the quotient manifold

Using the structure space, the set can be embedded in dimensional space, that is:

The tangent space in the structure space is given by:

The inner product in the tangent space is given by:

(16)

where .

There is no definitive expression for the geodesic path between two points on the manifold. Bonnabel and Sepulchre [bonnabel2010riemannian] proposed the following approximation in the structure space. Let and be two points on such that . Then, the approximate geodesic path between and is given by:

(17)

where and as in Eq. 2 and Eq. 12. In addition, the length of the curve is given by:

(18)

We note that this is not a distance on , since it does not satisfy the triangle inequality. For more details on the SPSD manifold, we refer the readers to [bonnabel2010riemannian].
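A possible numerical reading of this proximity measure is sketched below (ours; the use of the affine-invariant distance on the aligned SPD parts and the weighting parameter `k` are assumptions, not necessarily the paper's exact Eq. 18):

```python
import numpy as np

def spsd_proximity(G1, P1, G2, P2, k=1.0):
    """Sketch of a proximity measure between (G1, P1) and (G2, P2):
    align the structure-space representations via an SVD of G1^T G2,
    then combine the Grassmann part (principal angles) with the
    affine-invariant SPD distance of the aligned r x r factors,
    weighted by the assumed parameter k."""
    O1, s, O2t = np.linalg.svd(G1.T @ G2)
    theta = np.arccos(np.clip(s, -1.0, 1.0))   # principal angles
    R1 = O1.T @ P1 @ O1                        # aligned SPD parts
    R2 = O2t @ P2 @ O2t.T
    w, V = np.linalg.eigh(R1)
    Rih = (V / np.sqrt(w)) @ V.T               # R1^{-1/2}
    d_spd = np.linalg.norm(np.log(np.linalg.eigvalsh(Rih @ R2 @ Rih)))
    return np.sqrt(np.linalg.norm(theta) ** 2 + k * d_spd ** 2)

# demo with two random rank-2 structure-space representations in R^6
rng = np.random.default_rng(0)
G1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
G2, _ = np.linalg.qr(rng.standard_normal((6, 2)))
A1 = rng.standard_normal((2, 2)); P1 = A1 @ A1.T + 2 * np.eye(2)
A2 = rng.standard_normal((2, 2)); P2 = A2 @ A2.T + 2 * np.eye(2)
d12 = spsd_proximity(G1, P1, G2, P2)
d21 = spsd_proximity(G2, P2, G1, P1)
```

The measure is symmetric and vanishes for identical representations, but, as noted above, it does not satisfy the triangle inequality, so it is a proximity measure rather than a distance.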

Given a set of SPSD matrices, an algorithm to compute a point which admits the desirable properties of the geometric mean was proposed in [bonnabel2013rank]. The algorithm is summarized in Algorithm 3.

We remark that the lack of a definitive expression for the geodesic path entails that there is also no definitive expression for the logarithmic and exponential maps.

Input: A set of SPSD matrices

Output: The proposed mean

  1. Obtain the Grassmann mean:

    1. Obtain , the range of (e.g., by using SVD)

    2. Compute the Grassmann mean of the obtained ranges using Algorithm 2

  2. Obtain the SPD mean :

    1. Compute and using SVD:

    2. Set

    3. Compute , the SPD mean of using Algorithm 1

  3. Set

Algorithm 3 Riemannian Mean of SPSD matrices as proposed in [bonnabel2013rank]
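The pipeline of Algorithm 3 can be sketched end-to-end as follows (our NumPy reading; the Procrustes alignment of each basis to the Grassmann mean before extracting the SPD parts is an assumption about the intermediate step):

```python
import numpy as np

def _apply(P, f):
    w, V = np.linalg.eigh(P)
    return (V * f(w)) @ V.T

def spd_log(P, X):
    Ph, Pih = _apply(P, np.sqrt), _apply(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ _apply(Pih @ X @ Pih, np.log) @ Ph

def spd_exp(P, S):
    Ph, Pih = _apply(P, np.sqrt), _apply(P, lambda w: 1.0 / np.sqrt(w))
    return Ph @ _apply(Pih @ S @ Pih, np.exp) @ Ph

def spd_mean(mats, tol=1e-10, max_iter=100):
    M = np.mean(mats, axis=0)
    for _ in range(max_iter):
        S = np.mean([spd_log(M, X) for X in mats], axis=0)
        M = spd_exp(M, S)
        if np.linalg.norm(S) < tol:
            break
    return M

def gr_log(U1, U2):
    M = (U2 - U1 @ (U1.T @ U2)) @ np.linalg.inv(U1.T @ U2)
    W, s, Vt = np.linalg.svd(M, full_matrices=False)
    return W @ np.diag(np.arctan(s)) @ Vt

def gr_exp(U, D):
    W, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U @ Vt.T @ np.diag(np.cos(s)) @ Vt + W @ np.diag(np.sin(s)) @ Vt

def gr_mean(Us, tol=1e-10, max_iter=100):
    M = Us[0]
    for _ in range(max_iter):
        D = np.mean([gr_log(M, U) for U in Us], axis=0)
        M = gr_exp(M, D)
        if np.linalg.norm(D) < tol:
            break
    return M

def spsd_mean(Qs, r):
    """Sketch of Algorithm 3: Grassmann mean of the ranges, then SPD mean
    of the aligned r x r parts, recombined in the mean subspace."""
    # 1. ranges of the inputs and their Grassmann mean
    Gs = [np.linalg.eigh(Q)[1][:, -r:] for Q in Qs]
    G = gr_mean(Gs)
    # 2. align each basis to G (orthogonal Procrustes, as in Eq. 15) and
    #    express each SPD part in the aligned frame
    Rs = []
    for Gi, Qi in zip(Gs, Qs):
        W, _, Vt = np.linalg.svd(Gi.T @ G)
        Wi = Gi @ (W @ Vt)
        Rs.append(Wi.T @ Qi @ Wi)
    # 3. recombine the two means
    return G @ spd_mean(Rs) @ G.T

# sanity demo: the mean of identical SPSD matrices recovers the matrix
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 2))
Q = H @ H.T
Qbar = spsd_mean([Q, Q, Q], r=2)
```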

3 Transportation on a Riemannian manifold

In this section, we study a transport map of a set of points with respect to two reference points on a Riemannian manifold. This transportation forms the foundation of the proposed DA algorithm, as we show in the sequel. We begin with a general definition.

Definition 1

Consider a set of points on a Riemannian manifold . Let be the Riemannian mean of the set, and let be a target mean.

We call a transport map an isometric transport of from to , if it satisfies the following two properties.

  1. preserves pairwise distances: .

  2. The mean of the transported set is , that is .

By the above definition, given such a transport map, any composition of a “rotation” about the target mean (in the Riemannian sense) applied to the transported set also satisfies Definition 1.

In order to resolve this degree of freedom, we focus on transport maps defined as follows. Let be the PT from to on the manifold . Based on this PT, we define the transport map as follows

for any . Namely, the map is a composition of three steps:

  1. Applying the logarithmic map to and obtaining the corresponding vector :

  2. Applying PT to from to :

  3. Applying the exponential map to and obtaining the point :

This transport is derived from PT with one important distinction: while PT maps tangent vectors from one tangent space to another, this transport maps points on the manifold to points on the manifold.

The extra degree of freedom associated with Definition 1 is resolved by considering the map , because the parallel transports we consider on the specific manifolds of interest are with respect to the Levi-Civita connection. The Levi-Civita connection is the unique torsion-free metric connection. As a result, such parallel transports along a curve preserve the inner products on the various tangent spaces and introduce no torsion, circumventing any “screw” around the curve.

In the following, we will show that the specifications of to the manifolds and satisfy Definition 1. In addition, we will provide compact and closed-form expressions for these transports.

3.1 on

Let be a set of points on with mean , and let be a target mean. In [yair2019parallel], it was shown that this transport map can be written in a compact (linear) form:

(19)

where

(20)

Direct computation yields that this map admits the properties of Definition 1. First, it is isometric, i.e., it preserves the pairwise distances for any pair of points, where the latter transition holds because the distance is affine invariant. Second, the mean of the transported set coincides with the target mean, that is

(21)

where in we use the congruence invariance property of the geometric mean (see [bhatia2009positive]), and in we plug in Eq. 20.
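The two properties can be checked numerically with a small sketch (ours; `spd_transport` computes the congruence factor via a symmetric factorization to stay in real arithmetic):

```python
import numpy as np

def _apply(P, f):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * f(w)) @ V.T

def spd_dist(X, Y):
    """Affine-invariant distance on the SPD manifold."""
    Xih = _apply(X, lambda w: 1.0 / np.sqrt(w))
    return np.linalg.norm(np.log(np.linalg.eigvalsh(Xih @ Y @ Xih)))

def spd_transport(X, A, B):
    """Map X from around mean A to around mean B via the congruence
    Gamma(X) = E X E^T with E = (B A^{-1})^{1/2}, computed as
    A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{-1/2}."""
    Ah, Aih = _apply(A, np.sqrt), _apply(A, lambda w: 1.0 / np.sqrt(w))
    E = Ah @ _apply(Aih @ B @ Aih, np.sqrt) @ Aih
    return E @ X @ E.T

# demo: two reference means A, B and two transported points X, Y
rng = np.random.default_rng(0)
def rand_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)
A = rand_spd(4); B = rand_spd(4); X = rand_spd(4); Y = rand_spd(4)
```

The transport maps A itself to B, and, since the affine-invariant distance is congruence invariant, it preserves all pairwise distances.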

Suppose that the target mean is the Riemannian mean of another set. In addition to satisfying Definition 1, the transported set coincides with this other set under the conditions specified in the following statement.

Proposition 1

Let be a set of points on with the Riemannian mean . Consider the map defined by

where . Let be the Riemannian mean of the resulting set . The following holds

if and only if is of the form where either or .

See the proof in the Supplementary Material (SM).

Note that on , one can overload the PT operator with the manifold transportation , that is

(22)

3.2 on

Let be a set of points on with mean , and let be a target mean. On the Grassmann manifold, we have a result equivalent to Eq. 22, giving rise to a closed-form expression of .

Proposition 2

Let and be two points in , such that . Define by

(23)

Then