# Expanding the Family of Grassmannian Kernels: An Embedding Perspective

Modeling videos and image-sets as linear subspaces has proven beneficial for many visual recognition tasks. However, it also incurs challenges arising from the fact that linear subspaces do not obey Euclidean geometry, but lie on a special type of Riemannian manifold known as the Grassmannian. To leverage the techniques developed for Euclidean spaces (e.g., support vector machines) with subspaces, several recent studies have proposed to embed the Grassmannian into a Hilbert space by making use of a positive definite kernel. Unfortunately, only two Grassmannian kernels are known, neither of which, as we will show, is universal, which limits their ability to approximate a target function arbitrarily well. Here, we introduce several positive definite Grassmannian kernels, including universal ones, and demonstrate their superiority over previously-known kernels in various tasks, such as classification, clustering, sparse coding and hashing.


## 1 Introduction

This paper introduces a set of positive definite kernels to embed Grassmannians (i.e., manifolds of linear subspaces that have a nonlinear Riemannian structure) into Hilbert spaces, which have a more familiar Euclidean structure. Nowadays, linear subspaces are a core representation of many visual recognition techniques. For example, several state-of-the-art video, or image-set, matching methods model the visual data as subspaces [6, 8, 26, 27, 12]. Linear subspaces have also proven a powerful representation for many other computer vision applications, such as chromatic noise filtering [25] and domain adaptation [5].

Despite their success, linear subspaces suffer from the drawback that they cannot be analyzed using Euclidean geometry. Indeed, subspaces lie on a special type of Riemannian manifolds, the Grassmann manifold, which has a nonlinear structure. As a consequence, popular techniques developed for Euclidean spaces do not apply. Recently, this problem has been addressed by embedding the Grassmannian into a Hilbert space. This can be achieved either by tangent space approximation of the manifold, or by exploiting a positive definite kernel function to embed the manifold into a reproducing kernel Hilbert space (RKHS). In either case, any existing Euclidean technique can then be applied to the embedded data, since Hilbert spaces obey Euclidean geometry. Recent studies, however, report superior results with RKHS embedding over flattening the manifold using its tangent spaces [6, 27, 11]. Intuitively, this can be attributed to the fact that a tangent space is a first order approximation to the true geometry of the manifold, whereas, being higher-dimensional, an RKHS has the capacity of better capturing the nonlinearity of the manifold.

While RKHS embeddings therefore seem preferable, their applicability is limited by the fact that only very few positive definite Grassmannian kernels are known. Indeed, in the literature, only two kernels have been introduced to embed Grassmannians into RKHS: the Binet-Cauchy kernel [28] and the projection kernel [6]. The former is a homogeneous second order polynomial kernel, while the latter is a linear kernel. As simple (low-order) polynomial kernels, they are limited in their ability to closely approximate arbitrary functions. In contrast, universal kernels provide much better generalization power [23, 17].

In this paper, we introduce a set of new positive definite Grassmannian kernels, which, among others, includes universal Grassmannian kernels. To this end, we start from the perspective of the two embeddings from which the Binet-Cauchy and the projection kernels are derived: the Plücker embedding and the projection embedding. These two embeddings yield two distance functions. We then exploit the properties of these distances, in conjunction with several theorems analyzing the positive definiteness of kernels, to derive the ten new Grassmannian kernels summarized in Table 1.

Our experimental evaluation demonstrates the benefits of our Grassmannian kernels for classification, clustering, sparse coding and hashing. Our results show that our kernels outperform the Binet-Cauchy and projection ones for gender and gesture recognition, pose categorization and mouse behavior analysis.

## 2 Background Theory

In this section, we first review some notions of geometry of Grassmannians and then briefly discuss existing positive definite kernels and their properties. Throughout the paper, we use bold capital letters to denote matrices (e.g., X) and bold lower-case letters to denote column vectors (e.g., x). Iₙ is the n×n identity matrix. ∥·∥F indicates the Frobenius norm, with tr(·) the matrix trace.

### 2.1 Grassmannian Geometry

The space of p-dimensional linear subspaces of ℝⁿ for 0 < p < n is not a Euclidean space, but a Riemannian manifold known as the Grassmannian G(p,n) [1]. We note that in the special case of p = 1, the Grassmann manifold becomes the projective space ℝPⁿ⁻¹, which consists of all lines passing through the origin. A point on the Grassmann manifold G(p,n) may be specified by an arbitrary n×p matrix X with orthonormal columns, i.e., XᵀX = I_p. (Strictly speaking, a point on the Grassmannian is a subspace spanned by the columns of a full-rank n×p matrix X and should therefore be denoted by span(X). With a slight abuse of notation, here we call X a Grassmannian point whenever it represents a basis for a subspace.)

On a Riemannian manifold, points are connected via smooth curves. The distance between two points is defined as the length of the shortest curve connecting them on the manifold. The shortest curve and its length are called geodesic and geodesic distance, respectively. For the Grassmannian, the geodesic distance between two points X and Y is given by

 δg(X,Y)=∥Θ∥2, (1)

where Θ = [θ₁, θ₂, ⋯, θ_p] is the vector of principal angles between X and Y.

###### Definition 1 (Principal Angles)

Let X and Y be two matrices of size n×p with orthonormal columns. The principal angles between the two subspaces span(X) and span(Y) are defined recursively by

 cos(θᵢ) = max_{uᵢ ∈ span(X)} max_{vᵢ ∈ span(Y)} uᵢᵀvᵢ (2)
 s.t. ∥uᵢ∥₂ = ∥vᵢ∥₂ = 1
 uᵢᵀuⱼ = 0, j = 1, 2, ⋯, i−1
 vᵢᵀvⱼ = 0, j = 1, 2, ⋯, i−1

In other words, the first principal angle θ₁ is the smallest angle between any two unit vectors in the first and the second subspaces. The cosines of the principal angles correspond to the singular values of XᵀY [1]. In addition to the geodesic distance, several other metrics can be employed to measure the similarity between Grassmannian points [6]. In Section 3, we will discuss two other metrics on the Grassmannian.
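In practice, the principal angles, and from them the geodesic distance of Eq. (1), can be computed directly from the SVD mentioned above. A minimal NumPy sketch (the helper names are ours, not from the paper):

```python
import numpy as np

def orthonormal_basis(A):
    # Orthonormalize the columns of A via QR to obtain a valid Grassmannian point.
    Q, _ = np.linalg.qr(A)
    return Q

def principal_angles(X, Y):
    # The cosines of the principal angles are the singular values of X^T Y;
    # clipping guards against round-off pushing values slightly above 1.
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, 0.0, 1.0))

rng = np.random.default_rng(0)
X = orthonormal_basis(rng.standard_normal((10, 3)))
Y = orthonormal_basis(rng.standard_normal((10, 3)))

theta = principal_angles(X, Y)
delta_g = np.linalg.norm(theta)  # geodesic distance, Eq. (1)
```

For identical subspaces all principal angles vanish, so the geodesic distance is zero, as expected.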

### 2.2 Positive Definite Kernels and Grassmannians

As mentioned earlier, a popular way to analyze problems defined on a Grassmannian is to embed the manifold into a Hilbert space using a valid Grassmannian kernel. Let us now formally define Grassmannian kernels:

###### Definition 2 (Real-valued Positive Definite Kernels)

Let 𝒳 be a nonempty set. A symmetric function f: 𝒳 × 𝒳 → ℝ is a positive definite (pd) kernel on 𝒳 if and only if Σᵢⱼ cᵢcⱼ f(xᵢ, xⱼ) ≥ 0 for any n ∈ ℕ, xᵢ ∈ 𝒳 and cᵢ ∈ ℝ.

###### Definition 3 (Grassmannian Kernel)

A function k: G(p,n) × G(p,n) → ℝ is a Grassmannian kernel if it is well-defined and pd. In our context, a function is well-defined if it is invariant to the choice of basis, i.e., k(XR₁, YR₂) = k(X, Y), for all X, Y ∈ G(p,n) and R₁, R₂ ∈ SO(p), where SO(p) denotes the special orthogonal group.

The most widely used kernel is arguably the Gaussian or radial basis function (RBF) kernel. It is therefore tempting to define a radial basis Grassmannian kernel by replacing the Euclidean distance with the geodesic distance. Unfortunately, although symmetric and well-defined, the function kg(X,Y) = exp(−β δ²g(X,Y)) is not pd. This can be verified by a counter-example using the following points on G(2,3) (each value rounded to its 4 most significant digits):

 X1 = [1 0; 0 1; 0 0],
 X2 = [−0.0996 −0.3085; −0.4967 −0.8084; −0.8622 0.5014],
 X3 = [−0.9868 0.1259; −0.1221 −0.9916; −0.1065 −0.0293],
 X4 = [0.1736 0.0835; 0.7116 0.6782; 0.6808 −0.7301].

The kernel matrix of kg computed on these four points has a negative eigenvalue, which shows that kg is not pd.
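This counter-example can be checked numerically. A sketch (the value of β is our choice, since the paper does not fix one; the smallest eigenvalue of the resulting Gram matrix can be inspected directly):

```python
import numpy as np

def geodesic_dist(X, Y):
    # delta_g(X, Y) = l2 norm of the principal angle vector.
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.linalg.norm(np.arccos(np.clip(s, -1.0, 1.0)))

# The four points on G(2,3) from the counter-example (values as printed,
# rounded to 4 significant digits, hence only approximately orthonormal).
Xs = [
    np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]),
    np.array([[-0.0996, -0.3085], [-0.4967, -0.8084], [-0.8622, 0.5014]]),
    np.array([[-0.9868, 0.1259], [-0.1221, -0.9916], [-0.1065, -0.0293]]),
    np.array([[0.1736, 0.0835], [0.7116, 0.6782], [0.6808, -0.7301]]),
]

beta = 1.0  # illustrative choice
K = np.array([[np.exp(-beta * geodesic_dist(Xi, Xj) ** 2) for Xj in Xs]
              for Xi in Xs])
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig)
```

A pd kernel would require the printed value to be non-negative for every choice of points and β.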

Nevertheless, two Grassmannian kernels, i.e., the Binet-Cauchy kernel [28] and the projection kernel [6], have been proposed to embed Grassmann manifolds into RKHS. The Binet-Cauchy and projection kernels are defined as

 kbc(X,Y) = det(XᵀYYᵀX), (3)
 kp(X,Y) = ∥XᵀY∥²F. (4)
##### Property 1 (Relation to Principal Angles).

Both kbc and kp are closely related to the principal angles between two subspaces. Let θᵢ be the ith principal angle between span(X) and span(Y), i.e., by SVD, XᵀY = U cos(Θ) Vᵀ, with cos(Θ) a diagonal matrix with elements cos(θᵢ). Then

 kp(X,Y) = ∥XᵀY∥²F = Σᵖᵢ₌₁ cos²(θᵢ).

Similarly, one can show that kbc(X,Y) = Πᵖᵢ₌₁ cos²(θᵢ).
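Both identities are easy to verify numerically, since the cosines of the principal angles are the singular values of XᵀY. A short sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
X, _ = np.linalg.qr(rng.standard_normal((8, 3)))
Y, _ = np.linalg.qr(rng.standard_normal((8, 3)))

# Cosines of the principal angles: singular values of X^T Y.
cos_theta = np.linalg.svd(X.T @ Y, compute_uv=False)

k_bc = np.linalg.det(X.T @ Y @ Y.T @ X)    # Binet-Cauchy kernel, Eq. (3)
k_p = np.linalg.norm(X.T @ Y, 'fro') ** 2  # projection kernel, Eq. (4)
```

Here k_p equals the sum of the squared cosines and k_bc their product, in agreement with Property 1.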

## 3 Embedding Grassmannians to Hilbert Spaces

While kbc and kp have been successfully employed to transform problems on Grassmannians to Hilbert spaces [6, 8, 27], the resulting Hilbert spaces themselves have received comparatively little attention. In this section, we aim to bridge this gap and study these two spaces, which can be explicitly computed. To this end, we discuss the two embeddings that define these Hilbert spaces, namely the Plücker embedding and the projection embedding. These embeddings, and their respective properties, will in turn help us devise our set of new Grassmannian kernels.

### 3.1 Plücker Embedding

To study the Plücker embedding, we first need to review some concepts of exterior algebra.

###### Definition 4 (Alternating Multilinear Map)

Let V and W be two vector spaces. A map g: V × ⋯ × V → W is multilinear if it is linear in each slot, that is if

 g(v₁, ⋯, λvᵢ + λ′v′ᵢ, ⋯, v_k) = λ g(v₁, ⋯, vᵢ, ⋯, v_k) + λ′ g(v₁, ⋯, v′ᵢ, ⋯, v_k).

Furthermore, the map g is alternating if, whenever two of the inputs to g are the same vector, the output is 0. That is, g(v₁, ⋯, v, ⋯, v, ⋯, v_k) = 0.

###### Definition 5 (kth Exterior Product)

Let V be a vector space. The kth exterior product of V, denoted by ⋀ᵏV, is a vector space equipped with an alternating multilinear map of the form g: V × ⋯ × V → ⋀ᵏV, (v₁, ⋯, v_k) ↦ v₁ ∧ ⋯ ∧ v_k, with ∧ the wedge product.

The wedge product is supercommutative and can be thought of as a generalization of the cross product in ℝ³ to arbitrary dimensions. Importantly, note that the kth exterior product is a vector space, that is

 ⋀ᵏV = span({v₁ ∧ v₂ ∧ ⋯ ∧ v_k : vᵢ ∈ V}).

The Grassmannian G(p,n) can be embedded into the projective space P(⋀ᵖℝⁿ) as follows. Let X be a point on G(p,n) described by the basis {x₁, ⋯, x_p}, i.e., X = [x₁ x₂ ⋯ x_p]. The Plücker map of X is given by:

###### Definition 6 (Plücker Embedding)

The Plücker embedding P: G(p,n) → P(⋀ᵖℝⁿ) is defined as

 P(X) = [x₁ ∧ x₂ ∧ ⋯ ∧ x_p], (5)

where {x₁, ⋯, x_p} is a basis of the subspace span(X).

###### Example 1

Consider the space of two-dimensional planes in ℝ⁴, i.e., G(2,4). In this space, an arbitrary subspace is described by its basis B = [b₁ b₂]. Let eᵢ be the unit vector along the ith axis. We can write bⱼ = Σ⁴ᵢ₌₁ aⱼ,ᵢ eᵢ. Then

 P(B) = (Σ⁴ᵢ₌₁ a₁,ᵢ eᵢ) ∧ (Σ⁴ⱼ₌₁ a₂,ⱼ eⱼ)
  = (a₁,₁a₂,₂ − a₁,₂a₂,₁)(e₁∧e₂) + (a₁,₁a₂,₃ − a₁,₃a₂,₁)(e₁∧e₃)
  + (a₁,₁a₂,₄ − a₁,₄a₂,₁)(e₁∧e₄) + (a₁,₂a₂,₃ − a₁,₃a₂,₂)(e₂∧e₃)
  + (a₁,₂a₂,₄ − a₁,₄a₂,₂)(e₂∧e₄) + (a₁,₃a₂,₄ − a₁,₄a₂,₃)(e₃∧e₄).

Hence, the Plücker embedding of G(2,4) lies in a 6-dimensional space spanned by e₁∧e₂, e₁∧e₃, e₁∧e₄, e₂∧e₃, e₂∧e₄ and e₃∧e₄. A closer look at the coordinates of the embedded subspace reveals that they are indeed the minors of all possible 2×2 submatrices of B. This can be shown to hold for any n and p.

###### Proposition 1

The Plücker coordinates of X ∈ G(p,n) are the p×p minors of X, i.e., the determinants of the submatrices obtained by taking p rows out of the n possible ones.

###### Remark 1

The space induced by the Plücker map of G(p,n) is (n choose p)-dimensional.

To be able to exploit the Plücker embedding to design new kernels, we need to define an inner product over P(⋀ᵖℝⁿ). Importantly, to be meaningful, this inner product needs to be invariant to the specific realization of a point on G(p,n) (recall that, e.g., swapping two columns of a specific realization X still corresponds to the same point on G(p,n)). Furthermore, we would also like this inner product to be efficient to evaluate, thus avoiding the need to explicitly compute the high-dimensional embedding. Note in particular that, for vision applications, the dimensionality of P(⋀ᵖℝⁿ) becomes overwhelming and hence explicitly computing the embedding is impractical. To achieve these goals, we rely on the following definition and theorem:

###### Definition 7 (Compound Matrices)

Given a matrix A, the matrix whose elements are the minors of A of order q, arranged in lexicographic order, is called the qth compound of A and is denoted by C_q(A).

###### Theorem 3.1 (Binet-Cauchy Theorem)

Let A and B be two rectangular matrices of size m×k and m×p, respectively. Then, C_q(AᵀB) = C_q(A)ᵀ C_q(B).

Therefore, for X, Y ∈ G(p,n), we have det(XᵀY) = C_p(X)ᵀ C_p(Y).
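This identity, i.e., the classical Cauchy-Binet expansion of det(XᵀY) as an inner product of Plücker coordinate vectors, can be checked directly. A brute-force sketch (the helper name is ours):

```python
import numpy as np
from itertools import combinations

def compound_vector(X, p):
    # Plücker coordinates of X: all p x p minors, rows in lexicographic order.
    n = X.shape[0]
    return np.array([np.linalg.det(X[list(rows), :])
                     for rows in combinations(range(n), p)])

rng = np.random.default_rng(2)
n, p = 5, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

lhs = np.linalg.det(X.T @ Y)                          # det of the p x p product
rhs = compound_vector(X, p) @ compound_vector(Y, p)   # C_p(X)^T C_p(Y)
```

Note that for an orthonormal X the Plücker coordinate vector has unit norm, since ∥C_p(X)∥² = det(XᵀX) = 1; this is why the embedded points lie on a unit sphere.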

Since, for X ∈ G(p,n), C_p(X) stores all p×p minors and hence conveys the Plücker coordinates of X, this would suggest defining the inner product for the Plücker embedding as ⟨P(X), P(Y)⟩ = C_p(X)ᵀC_p(Y) = det(XᵀY). This is indeed what was proposed in [6, 28], where det(XᵀY) was used as a linear kernel. However, while det(XᵀY) is invariant to the action of SO(p), it is not invariant to the specific realization of a subspace. This can be simply verified by permuting the columns of X, which does not change the subspace, but may change the sign of det(XᵀY). Note that this sign issue was also observed by Wolf et al. [28]. However, this problem was circumvented by considering the second-order polynomial kernel kbc(X,Y) = det(XᵀY)² = det(XᵀYYᵀX).

In contrast, here, we focus on designing a valid inner product that satisfies this invariance condition. To this end, we define the inner product in P(⋀ᵖℝⁿ) as ⟨P(X), P(Y)⟩ = |det(XᵀY)|. This inner product induces the distance

 δ²bc(X,Y) = ∥P(X) − P(Y)∥² = 2 − 2|det(XᵀY)|. (6)

Clearly, if {θᵢ} is the set of principal angles between two Grassmannian points X and Y, then δ²bc(X,Y) = 2 − 2 Πᵖᵢ₌₁ cos(θᵢ), which is invariant to the specific realization of a subspace since cos(θᵢ) ≥ 0.
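The invariance can be checked directly: replacing X by XR with R orthogonal leaves δbc unchanged, since |det((XR)ᵀY)| = |det(R)||det(XᵀY)| = |det(XᵀY)|. A small sketch:

```python
import numpy as np

def delta_bc_sq(X, Y):
    # Squared Binet-Cauchy distance, Eq. (6).
    return 2.0 - 2.0 * abs(np.linalg.det(X.T @ Y))

rng = np.random.default_rng(3)
X, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Y, _ = np.linalg.qr(rng.standard_normal((6, 2)))

# An orthogonal change of basis: swapping the two columns (det(R) = -1),
# which flips the sign of det(X^T Y) but not its absolute value.
R = np.array([[0.0, 1.0], [1.0, 0.0]])
```

Without the absolute value, the column swap would negate the inner product, which is exactly the sign issue discussed above.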

In the following, we show that the Plücker embedding has the nice property of being closely related to the true geometry of the corresponding Grassmannian:

###### Theorem 3.2 (Curve Length Equivalence)

The length of any given curve is the same under δbc and δg up to a scale of √2.

###### Proof

Given in appendix.∎

### 3.2 Projection Embedding

We now turn to the case of the projection embedding. Note that this embedding has been better studied than the Plücker one [10].

###### Definition 8 (Projection Embedding)

The projection embedding is defined as

 Π(X)=XXT. (7)

The projection embedding is a diffeomorphism from a Grassmann manifold onto the idempotent symmetric matrices of rank p, i.e., it is a one-to-one, continuous, differentiable mapping with a continuous, differentiable inverse [3]. The space induced by this embedding is a smooth, compact submanifold of the space of n×n symmetric matrices, of dimension p(n−p). Since Π(X) is a symmetric matrix, a natural choice of inner product is ⟨Π(X), Π(Y)⟩ = tr(Π(X)Π(Y)). This inner product can be shown to be invariant to the specific realization of a subspace, and induces the distance

 δ²p(X,Y) = ∥Π(X) − Π(Y)∥²F = 2p − 2∥XᵀY∥²F.
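Both the inner product and this closed-form distance avoid ever forming the n×n matrices explicitly, since tr(XXᵀYYᵀ) = ∥XᵀY∥²F. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 7, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))

# Inner product between the projection representations.
inner = np.trace((X @ X.T) @ (Y @ Y.T))

# Distance computed directly in the embedding space vs. the closed form.
d2_direct = np.linalg.norm(X @ X.T - Y @ Y.T, 'fro') ** 2
d2_closed = 2 * p - 2 * np.linalg.norm(X.T @ Y, 'fro') ** 2
```

The p×p product XᵀY is all that is needed, which is what makes the projection kernel cheap to evaluate in practice.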

Due to space limitation, we do not discuss the properties of the projection embedding, such as isometry [3] and length of curves [7]. We refer the reader to [10] for a more thorough discussion of the projection embedding.

## 4 Grassmannian Kernels

From the discussion in Section 3, kbc and kp, defined in Eq. 3 and Eq. 4, can be seen to correspond to a homogeneous second order polynomial kernel in the space induced by the Plücker embedding and to a linear kernel in the space induced by the projection embedding, respectively. In this section, we show that the inner products that we defined in Section 3 for the Plücker and projection embeddings can actually be exploited to derive many new Grassmannian kernels, including universal kernels and conditionally positive definite kernels. In the following, we denote by k·,bc kernels derived from the Plücker embedding (Binet-Cauchy kernels) and by k·,p kernels derived from the projection embedding.

### 4.1 Polynomial Kernels

Given an inner product, which itself defines a valid linear kernel, the most straightforward way to create new kernels is to consider higher degree polynomials. Such polynomial kernels are known to be pd. Therefore, we can readily define polynomial kernels on the Grassmannian as

 kp,bc(X,Y) = (β + |det(XᵀY)|)^α,  β > 0, (8)
 kp,p(X,Y) = (β + ∥XᵀY∥²F)^α,  β > 0. (9)

Note that the kernel used in [28] is indeed the homogeneous second order polynomial kernel kp,bc with α = 2 and β = 0.
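The two polynomial families of Eqs. (8) and (9) are straightforward to implement on top of the inner products of Section 3. A minimal sketch (parameter values are illustrative):

```python
import numpy as np

def k_poly_bc(X, Y, beta=1.0, alpha=2):
    # Plücker-based polynomial kernel, Eq. (8).
    return (beta + abs(np.linalg.det(X.T @ Y))) ** alpha

def k_poly_p(X, Y, beta=1.0, alpha=2):
    # Projection-based polynomial kernel, Eq. (9).
    return (beta + np.linalg.norm(X.T @ Y, 'fro') ** 2) ** alpha

rng = np.random.default_rng(5)
X, _ = np.linalg.qr(rng.standard_normal((6, 2)))
```

With α = 2 and β = 0, k_poly_bc reduces to the kernel of [28]; evaluated at (X, X) it returns det(XᵀX)² = 1, while k_poly_p(X, X) with α = 1, β = 0 returns ∥I_p∥²F = p.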

### 4.2 Universal Grassmannian Kernels

Although often used in practice, polynomial kernels are known not to be universal [23]. This can have a crucial impact on their representation power for a specific task. Indeed, from the Representer Theorem [20], we have that, for a given set of training data {xⱼ}ⁿⱼ₌₁ and a pd kernel k, the function learned by any algorithm can be expressed as

 f̂(x*) = Σⁿⱼ₌₁ cⱼ k(x*, xⱼ). (10)

Importantly, only universal kernels have the property of being able to approximate any target function arbitrarily well given sufficiently many training samples. Therefore, kp,bc and kp,p may not generalize sufficiently well for certain problems. In the following, we develop several universal Grassmannian kernels. To this end, we make use of negative definite kernels and of their relation to pd ones. Let us first formally define negative definite kernels.

###### Definition 9 (Real-valued Negative Definite Kernels)

Let 𝒳 be a nonempty set. A symmetric function f: 𝒳 × 𝒳 → ℝ is a negative definite (nd) kernel on 𝒳 if and only if Σᵢⱼ cᵢcⱼ f(xᵢ, xⱼ) ≤ 0 for any n ∈ ℕ, xᵢ ∈ 𝒳 and cᵢ ∈ ℝ with Σᵢ cᵢ = 0.

Note that, in contrast to positive definite kernels, the additional constraint Σᵢ cᵢ = 0 is required in the negative definite case.

The most important example of nd kernels is the distance function defined on a Hilbert space. More specifically:

###### Theorem 4.1 ([11])

Let 𝒳 be a nonempty set, ℋ be an inner product space, and ψ: 𝒳 → ℋ be a function. Then f: 𝒳 × 𝒳 → ℝ defined by f(x, y) = ∥ψ(x) − ψ(y)∥² is negative definite.

Therefore, being squared distances in Hilbert spaces, both δ²bc and δ²p are nd kernels. We now state an important theorem which establishes the relation between pd and nd kernels.

###### Theorem 4.2 (Theorem 2.3 in Chapter 3 of [2])

Let μ be a probability measure on the half line ℝ₊ = [0, ∞) and let f: 𝒳 × 𝒳 → ℝ. Let ℒμ be the Laplace transform of μ, i.e., ℒμ(s) = ∫₀^∞ exp(−γs) dμ(γ). Then, ℒμ(t f) is positive definite for all t > 0 if and only if f is negative definite.

The problem of designing a pd kernel on the Grassmannian can now be cast as that of finding an appropriate probability measure μ. Below, we show that this lets us reformulate popular kernels in Euclidean space as Grassmannian kernels.

#### 4.2.1 RBF Kernels.

Grassmannian RBF kernels can be obtained by choosing dμ(γ) = δ(γ − β) dγ in Theorem 4.2, with δ(·) the Dirac delta function and β > 0. Applied to the nd kernels δ²bc and δ²p, this choice yields the Grassmannian RBF kernels (after discarding scalar constants)

 kr,bc(X,Y) = exp(β|det(XᵀY)|),  β > 0, (11)
 kr,p(X,Y) = exp(β∥XᵀY∥²F),  β > 0. (12)

Note that the RBF kernel obtained from the projection embedding, i.e., kr,p, was also used by Vemulapalli et al. [27]. However, the positive definiteness of this kernel was neither proven nor discussed.
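The step of "discarding scalar constants" is harmless because exp(−β δ²bc) differs from Eq. (11) only by the positive factor exp(−2β) and a rescaling of β, which does not affect positive definiteness. A sketch making this explicit (β values are our choices):

```python
import numpy as np

def k_rbf_bc(X, Y, beta=1.0):
    # Plücker-based Grassmannian RBF kernel, Eq. (11).
    return np.exp(beta * abs(np.linalg.det(X.T @ Y)))

def k_rbf_p(X, Y, beta=1.0):
    # Projection-based Grassmannian RBF kernel, Eq. (12).
    return np.exp(beta * np.linalg.norm(X.T @ Y, 'fro') ** 2)

rng = np.random.default_rng(5)
X, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Y, _ = np.linalg.qr(rng.standard_normal((6, 2)))

beta = 0.7
# exp(-beta * delta_bc^2) = exp(-2 beta) * k_rbf_bc with parameter 2 beta.
d2_bc = 2.0 - 2.0 * abs(np.linalg.det(X.T @ Y))
```

At X = Y the kernels take the values exp(β) and exp(βp), respectively, since |det(XᵀX)| = 1 and ∥XᵀX∥²F = p.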

#### 4.2.2 Laplace Kernels.

The Laplace kernel is another widely used Euclidean kernel, defined as k(x, y) = exp(−β∥x − y∥₂). To obtain Laplace kernels on the Grassmannian, we make use of the following theorem for nd kernels.

###### Theorem 4.3 (Corollary 2.10 in Chapter 3 of [2])

If f is negative definite and satisfies f(x, x) ≥ 0, then so is fᵅ for 0 < α < 1.

As a result, both δbc and δp are nd, which follows from choosing α = 1/2 in Theorem 4.3. By employing either δbc or δp along with the Dirac measure in Theorem 4.2, we obtain the Grassmannian Laplace kernels

 kl,bc(X,Y) = exp(−β√(1 − |det(XᵀY)|)),  β > 0, (13)
 kl,p(X,Y) = exp(−β√(p − ∥XᵀY∥²F)),  β > 0. (14)

As shown in [23], the RBF and Laplace kernels are universal on ℝᵈ. Since the Plücker and projection embeddings map to Euclidean spaces, this property clearly extends to the Grassmannian RBF and Laplace kernels.

#### 4.2.3 Binomial Kernels.

By choosing dμ(γ) = u(γ) dγ, where u(·) is the unit (or Heaviside) step function, i.e., u(γ) = 1 for γ ≥ 0 and u(γ) = 0 otherwise, we obtain the Grassmannian binomial kernels

 kb,bc(X,Y) = (β − |det(XᵀY)|)⁻¹,  β > 1, (15)
 kb,p(X,Y) = (β − ∥XᵀY∥²F)⁻¹,  β > p. (16)

Note that the resulting Laplace-transform integral converges only when its argument remains strictly positive. This translates into the constraints on β given in Eq. 15 and Eq. 16.

A more general form of binomial kernels can be derived by noting that, if f is nd and strictly positive, then f⁻ᵅ is pd for all α > 0 (see Proposition 2.7 in Chapter 3 of [2]). This lets us define the Grassmannian kernels

 kbi,bc(X,Y) = (β − |det(XᵀY)|)⁻ᵅ,  β > 1, α > 0, (17)
 kbi,p(X,Y) = (β − ∥XᵀY∥²F)⁻ᵅ,  β > p, α > 0. (18)

To show that the binomial kernels are universal, we note that

 (1 − t)⁻ᵅ = Σ^∞ⱼ₌₀ C(−α, j)(−1)ʲ tʲ,  with  C(α, j) = Πʲᵢ₌₁ (α − i + 1)/i.

It can be seen that C(−α, j)(−1)ʲ > 0 for α > 0, which implies that both kbi,bc and kbi,p have non-negative and full Taylor series. This, as was shown in Corollary 4.57 of [23], is a necessary and sufficient condition for such a kernel to be universal.
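The generating-measure view of Theorem 4.2 can also be checked numerically for the binomial case: with the Heaviside measure, the Laplace-transform integral of exp(−γf) reproduces 1/f. A sketch with illustrative values (the integration grid and cutoff are our choices):

```python
import numpy as np

def k_binom_bc(X, Y, beta=2.0):
    # Plücker-based binomial kernel, Eq. (15); requires beta > 1.
    return 1.0 / (beta - abs(np.linalg.det(X.T @ Y)))

rng = np.random.default_rng(6)
X, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Y, _ = np.linalg.qr(rng.standard_normal((6, 2)))

beta = 2.0
f = beta - abs(np.linalg.det(X.T @ Y))  # f >= beta - 1 = 1 here

# Truncated trapezoidal approximation of  ∫_0^∞ exp(-γ f) dγ = 1/f.
gamma = np.linspace(0.0, 60.0, 600001)
vals = np.exp(-gamma * f)
integral = float(np.sum((vals[:-1] + vals[1:]) * np.diff(gamma)) / 2.0)
```

The truncation at γ = 60 is negligible because the integrand decays at least as fast as exp(−γ) under the constraint β > 1.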

### 4.3 Conditionally Positive Kernels

Another important class of kernels is the so-called conditionally positive definite kernels [2]. Formally:

###### Definition 10 (Conditionally Positive Definite Kernels)

Let 𝒳 be a nonempty set. A symmetric function f: 𝒳 × 𝒳 → ℝ is a conditionally positive definite (cpd) kernel on 𝒳 if and only if Σᵢⱼ cᵢcⱼ f(xᵢ, xⱼ) ≥ 0 for any n ∈ ℕ, xᵢ ∈ 𝒳 and cᵢ ∈ ℝ with Σᵢ cᵢ = 0.

The relations between cpd kernels and pd ones were studied by Berg et al[2] and Schölkopf [19] among others. Before introducing cpd kernels on the Grassmannian, we state an important property of cpd kernels.

###### Proposition 2 ([19])

For a kernel algorithm that is translation invariant, one can equally use cpd kernels instead of pd ones.

This property relaxes the requirement of having pd kernels for certain types of kernel algorithms. A kernel algorithm is translation invariant if it is independent of the position of the origin. For example, in SVMs, maximizing the margin of the separating hyperplane between two classes is independent of the position of the origin. As a result, one can seamlessly use a cpd kernel instead of a pd kernel in SVMs. To introduce cpd kernels on Grassmannians, we rely on the following proposition:

###### Proposition 3 ([2])

If f is nd with f ≥ 0, then −log(1 + f) is cpd.

This lets us derive the Grassmannian cpd kernels

 klog,bc(X,Y) = −log(2 − |det(XᵀY)|), (19)
 klog,p(X,Y) = −log(p + 1 − ∥XᵀY∥²F). (20)
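Both kernels are instances of −log(1 + δ²/2): for the Plücker case, 1 + δ²bc/2 = 2 − |det(XᵀY)|, and for the projection case, 1 + δ²p/2 = p + 1 − ∥XᵀY∥²F. A minimal sketch:

```python
import numpy as np

def k_log_bc(X, Y):
    # Plücker-based cpd kernel, Eq. (19): -log(1 + delta_bc^2 / 2).
    return -np.log(2.0 - abs(np.linalg.det(X.T @ Y)))

def k_log_p(X, Y):
    # Projection-based cpd kernel, Eq. (20): -log(1 + delta_p^2 / 2).
    p = X.shape[1]
    return -np.log(p + 1.0 - np.linalg.norm(X.T @ Y, 'fro') ** 2)

rng = np.random.default_rng(7)
X, _ = np.linalg.qr(rng.standard_normal((7, 3)))
Y, _ = np.linalg.qr(rng.standard_normal((7, 3)))
```

Both kernels vanish at X = Y and are non-positive elsewhere, since their arguments lie in [1, 2] and [1, p + 1], respectively.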

The ten new kernels derived in this section are summarized in Table 1. Note that given the linear Plücker and projection kernels, i.e., |det(XᵀY)| and ∥XᵀY∥²F, it is possible to obtain the polynomial and Gaussian extensions via standard kernel construction rules [21]. However, our approach lets us derive many other kernels in a principled manner by, e.g., exploiting different measures in Theorem 4.2. Nonetheless, here, we confined ourselves to deriving kernels corresponding to the most popular ones in Euclidean space, and leave the study of additional kernels as future work.

## 5 Experimental Evaluation

In this section, we compare our new kernels with the baseline kernels kbc and kp using three different kernel-based algorithms on Grassmannians: kernel SVM, kernel k-means and kernelized Locality-Sensitive Hashing (kLSH). In our experiments, unless stated otherwise, we obtained the kernel parameters (i.e., β for all kernels except the logarithm ones, and additionally α for the polynomial and binomial cases) by cross-validation.

### 5.1 Gender Recognition from Gait

We first demonstrate the benefits of our kernels on a binary classification problem on the Grassmannian using SVM and the Grassmannian Graph-embedding Discriminant Analysis (GGDA) proposed in [8]. To this end, we consider the task of gender recognition from gait (i.e., videos of people walking). We used Dataset-B of the CASIA gait database [30], which comprises 124 individuals (93 males and 31 females). The gait of each subject was captured from 11 viewpoints. Every video is represented by a gait energy image (GEI) (see Fig. 1), which has proven effective for gender recognition [29].

In our experiment, we used the videos captured with normal clothes and created a subspace of order 3 from the 11 GEIs corresponding to the different viewpoints. This resulted in 731 points on the Grassmannian. We then randomly selected 20 individuals (10 male, 10 female) as training set and used the remaining individuals for testing. In Table 2, we report the average accuracies over 10 random partitions. Note that for the SVM classifier, all new kernels derived from the Plücker embedding outperform kbc, with the highest accuracy obtained with the binomial kernel. Similarly, all new projection kernels outperform kp, and the polynomial kernel achieves the overall highest accuracy. For GGDA, with a single exception, all new kernels also outperform previously-known ones.

### 5.2 Pose Categorization

As a second experiment, we evaluate the performance of our kernels on the task of clustering on the Grassmannian using kernel k-means. To this end, we used the CMU-PIE face dataset [22], which contains images of 67 subjects with 13 different poses and 21 different illuminations (see Fig. 3 for examples). From each image, we computed a spatial pyramid of LBP [18] histograms and concatenated them to form the final descriptor. For each subject, we collected the images acquired with the same pose, but different illuminations, in an image set, which we then represented as a linear subspace of order 3. This resulted in one Grassmannian point per subject and pose. We used 10 samples from each pose to compute the kernel parameters.

The goal here is to cluster together image sets representing the same pose. To evaluate the quality of the clusters, we report both the clustering accuracy and the Normalized Mutual Information (NMI) [24], which measures the amount of statistical information shared by random variables representing the cluster distribution and the underlying class distribution of the data points. From the results given in Table 3, we can see that, with a single exception, the new kernels in each embedding outperform their respective baseline, kbc or kp. For the Binet-Cauchy kernels, the maximum accuracy (and NMI score) is reached by the RBF kernel. The overall maximum accuracy is achieved by the projection-based binomial kernel.

We also evaluated the intrinsic k-means algorithm of [26]. On an i7 machine using Matlab, intrinsic k-means required considerably more runtime to perform clustering than kernel k-means with the two kernels that achieve the highest accuracies in Table 3. This clearly demonstrates the benefits of RKHS embedding to tackle clustering problems on the Grassmannian.

### 5.3 Mouse Behavior Analysis

Finally, we utilized kernelized Locality-Sensitive Hashing (kLSH) [14] to perform recognition on the 2000 videos of the mice behavior dataset [13]. The basic idea of kLSH is to search for a projection from an RKHS to a low-dimensional Hamming space, where each sample is encoded with a b-bit vector called the hash key. The approximate nearest neighbor to a query can then be found efficiently in time sublinear in the number of training samples.

The mice dataset [13] contains 8 behaviors (i.e., drinking, eating, grooming, hanging, rearing, walking, resting and micro-movement of the head) of several mice with different coating colors, sizes and genders (see Fig. 3 for examples). In each video, we estimated the background to extract the region containing the mouse in each frame. These regions were then resized to a common size, and each video was represented with an order 6 subspace. We randomly chose 1000 videos for training and used the remaining 1000 videos for testing. We report the average recognition accuracy over 10 random partitions.

Fig. 4 depicts the recognition accuracies of the new and baseline kernels as a function of the number of bits b. For the Plücker embedding kernels, the gap between our RBF kernel and kbc is largest for a hash key of size 30. For the same hash key size, the projection-based heat kernel clearly outperforms kp, and thus reaches the overall highest accuracy.

### 5.4 Kernel Sparse Coding

We performed an experiment on body-gesture recognition using the UMD Keck dataset [15]. To this end, we consider the problem of kernel sparse coding on the Grassmannian which can be formulated as

 min_y ∥ϕ(X) − Σᴺⱼ₌₁ yⱼ ϕ(Dⱼ)∥² + λ∥y∥₁, (21)

where Dⱼ is a dictionary atom, X is the query and y ∈ ℝᴺ is the vector of sparse codes. In practice, we used each training sample as an atom in the dictionary. Note that, as shown in [4], (21) only depends on the kernel values computed between the dictionary atoms, as well as between the query point and the dictionary. Classification is then performed by assigning the label of the dictionary element with strongest response to the query.
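Since (21) depends on the data only through kernel values, it can be handled by any kernelized sparse solver. A minimal sketch using ISTA (proximal gradient) on the expanded objective yᵀK_DD y − 2k_xDᵀy + λ∥y∥₁, with the projection RBF kernel; the solver, step size and parameter values are our choices, not specified in the paper:

```python
import numpy as np

def k_rbf_p(X, Y, beta=1.0):
    # Projection-based Grassmannian RBF kernel, Eq. (12).
    return np.exp(beta * np.linalg.norm(X.T @ Y, 'fro') ** 2)

def kernel_sparse_codes(K_DD, k_xD, lam=0.01, n_iter=500):
    # Minimize  y^T K_DD y - 2 k_xD^T y + lam * ||y||_1  via ISTA.
    y = np.zeros(len(k_xD))
    step = 0.9 / (2 * np.linalg.norm(K_DD, 2))  # below 1/Lipschitz of the gradient
    for _ in range(n_iter):
        grad = 2 * (K_DD @ y - k_xD)
        z = y - step * grad
        y = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return y

rng = np.random.default_rng(8)
atoms = [np.linalg.qr(rng.standard_normal((8, 2)))[0] for _ in range(5)]
query = atoms[0]  # query identical to the first atom, for illustration

K_DD = np.array([[k_rbf_p(a, b) for b in atoms] for a in atoms])
k_xD = np.array([k_rbf_p(query, a) for a in atoms])
codes = kernel_sparse_codes(K_DD, k_xD)
```

With the query equal to an atom and a small λ, the largest code should fall on that atom, which is exactly the behavior the classification rule above exploits.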

The UMD Keck dataset [15] comprises 14 body gestures with static and dynamic backgrounds (see examples in Figure 5). The dataset contains 126 videos from static scenes and 168 ones from dynamic environments. Following the experimental protocol used in [16], we first extracted the region of interest around each gesture and resized it to a fixed number of pixels. We then represented each video by a subspace of order 6.

Table 4 compares the performance of our kernels with that of kbc and kp. Note that our kernels outperform the baselines in both the static and dynamic settings, with the maximum accuracy in each setting obtained by one of the new kernels. For the same experiments, we also report the accuracy of the state-of-the-art solution using product manifolds [16].

## 6 Conclusions and Future Work

We have introduced a set of new positive definite kernels to embed Grassmannians into Hilbert spaces, which have a more familiar Euclidean structure. This set includes, among others, universal Grassmannian kernels, which have the ability to approximate general functions. Our experiments have demonstrated the superiority of such kernels over previously-known Grassmannian kernels, i.e., the Binet-Cauchy kernel [28] and the projection kernel [6]. It is important to keep in mind, however, that choosing the right kernel for the data at hand remains an open problem. In the future, we intend to study whether searching for the best probability measure in Theorem 4.2 could give a partial answer to this question.

## Appendix 0.A Proof of Length Equivalence

Here, we prove Theorem 3.2 from Section 3, i.e., the equivalence, up to a scale of √2, of the length of any given curve under the Binet-Cauchy distance δbc derived from the Plücker embedding and the geodesic distance δg. The proof of this theorem follows several steps. We start with the definition of curve length and intrinsic metric. Without any assumption on differentiability, let (𝒳, d) be a metric space. A curve γ in 𝒳 is a continuous function γ: [0,1] → 𝒳 and joins the starting point γ(0) = x to the end point γ(1) = y.

###### Definition 11

The length of a curve γ is the supremum of Σᵢ d(γ(tᵢ₋₁), γ(tᵢ)) over all possible partitions 0 = t₀ < t₁ < ⋯ < tₙ = 1 of the unit interval.

###### Definition 12

The intrinsic metric δ̂(x, y) on 𝒳 is defined as the infimum of the lengths of all paths from x to y.

###### Theorem 0.A.1 ( [9])

If the intrinsic metrics induced by two metrics d₁ and d₂ are identical up to a scale λ, then the length of any given curve is the same under both metrics up to λ.

###### Theorem 0.A.2 ( [9])

If d₁ and d₂ are two metrics defined on a space 𝒳 such that

 lim_{d₁(x,y)→0} d₂(x,y)/d₁(x,y) = 1 (22)

uniformly (with respect to x and y), then their intrinsic metrics are identical.

Therefore, here, we need to study the behavior of

$$\lim_{\delta_g(X,Y) \to 0} \frac{\delta^2_{bc}(X,Y)}{\delta^2_g(X,Y)}$$

to prove our theorem on curve length equivalence.

###### Proof

Since $\sin\theta_i \approx \theta_i$ for $\theta_i \to 0$, we can see that

$$\lim_{\delta_g(X,Y) \to 0} \frac{\delta^2_{bc}(X,Y)}{\delta^2_g(X,Y)} = \lim_{\theta_i \to 0} \frac{2 - 2\prod_{i=1}^{p}\left(1 - \sin^2\theta_i\right)}{\sum_{i=1}^{p}\theta_i^2} = \lim_{\theta_i \to 0} \frac{2 - 2\prod_{i=1}^{p}\left(1 - \theta_i^2\right)}{\sum_{i=1}^{p}\theta_i^2} = \lim_{\theta_i \to 0} \frac{2 - 2\left(1 - \sum_{i=1}^{p}\theta_i^2\right)}{\sum_{i=1}^{p}\theta_i^2} = 2.$$

This, in conjunction with Theorem 0.A.1, concludes the proof.∎
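As a sanity check (not part of the original experiments), the limit above can be verified numerically. The sketch below uses SciPy's `subspace_angles` to compute the principal angles $\theta_i$ between two nearby subspaces, and evaluates $\delta^2_g = \sum_i \theta_i^2$ and $\delta^2_{bc} = 2 - 2\prod_i \cos^2\theta_i$, matching the expressions in the proof; the perturbation scheme is an illustrative choice.

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
n, p = 10, 3

# Orthonormal basis X of a random p-dimensional subspace of R^n.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))

for eps in [1e-1, 1e-2, 1e-3]:
    # A small perturbation of X spans a nearby subspace Y.
    Y, _ = np.linalg.qr(X + eps * rng.standard_normal((n, p)))
    theta = subspace_angles(X, Y)                 # principal angles
    d_g2 = np.sum(theta ** 2)                     # squared geodesic distance
    d_bc2 = 2 - 2 * np.prod(np.cos(theta) ** 2)   # squared Binet-Cauchy distance
    print(eps, d_bc2 / d_g2)                      # ratio approaches 2
```

As the perturbation shrinks, the printed ratio tends to 2, consistent with the scale factor derived in the proof.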

## References

• [1] Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, USA (2008)
• [2] Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups. Springer (1984)
• [3] Chikuse, Y.: Statistics on Special Manifolds. Springer (2003)
• [4] Gao, S., Tsang, I.W.H., Chia, L.T.: Kernel sparse representation for image classification and face recognition. In: Proc. European Conference on Computer Vision (ECCV). pp. 1–14 (2010)
• [5] Gopalan, R., Li, R., Chellappa, R.: Unsupervised adaptation across domain shifts by generating intermediate data representations. IEEE Trans. Pattern Analysis and Machine Intelligence (2014)
• [6] Hamm, J., Lee, D.D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proc. Int. Conference on Machine Learning (ICML). pp. 376–383 (2008)
• [7] Harandi, M., Sanderson, C., Shen, C., Lovell, B.C.: Dictionary learning and sparse coding on Grassmann manifolds: An extrinsic solution. In: Proc. Int. Conference on Computer Vision (ICCV) (December 2013)
• [8] Harandi, M.T., Sanderson, C., Shirazi, S., Lovell, B.C.: Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2705–2712 (2011)
• [9] Hartley, R., Trumpf, J., Dai, Y., Li, H.: Rotation averaging. Int. Journal of Computer Vision 103(3), 267–305 (2013)
• [10] Helmke, U., Hüper, K., Trumpf, J.: Newton's method on Grassmann manifolds. Preprint: [arXiv:0709.2205] (2007)
• [11] Jayasumana, S., Hartley, R., Salzmann, M., Li, H., Harandi, M.: Kernel methods on the Riemannian manifold of symmetric positive definite matrices. In: CVPR. pp. 73–80 (2013)
• [12] Jayasumana, S., Hartley, R., Salzmann, M., Li, H., Harandi, M.: Optimizing over radial kernels on compact manifolds. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2014)
• [13] Jhuang, H., Garrote, E., Yu, X., Khilnani, V., Poggio, T., Steele, A.D., Serre, T.: Automated home-cage behavioural phenotyping of mice. Nature communications 1,  68 (2010)
• [14] Kulis, B., Grauman, K.: Kernelized locality-sensitive hashing. IEEE Trans. Pattern Analysis and Machine Intelligence 34(6), 1092–1104 (2012)
• [15] Lin, Z., Jiang, Z., Davis, L.S.: Recognizing actions by shape-motion prototype trees. In: Proc. Int. Conference on Computer Vision (ICCV). pp. 444–451. IEEE (2009)
• [16] Lui, Y.M.: Human gesture recognition on product manifolds. Journal of Machine Learning Research 13(1), 3297–3321 (2012)
• [17] Micchelli, C.A., Xu, Y., Zhang, H.: Universal kernels. Journal of Machine Learning Research 7, 2651–2667 (2006)
• [18] Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Analysis and Machine Intelligence 24(7), 971–987 (2002)
• [19] Schölkopf, B.: The kernel trick for distances. In: Proc. Advances in Neural Information Processing Systems (NIPS). pp. 301–307 (2001)
• [20] Schölkopf, B., Herbrich, R., Smola, A.J.: A generalized representer theorem. In: Computational Learning Theory. pp. 416–426. Springer (2001)
• [21] Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004)
• [22] Sim, T., Baker, S., Bsat, M.: The CMU pose, illumination, and expression database. IEEE Trans. Pattern Analysis and Machine Intelligence 25(12), 1615–1618 (2003)
• [23] Steinwart, I., Christmann, A.: Support vector machines. Springer (2008)
• [24] Strehl, A., Ghosh, J., Mooney, R.: Impact of similarity measures on web-page clustering. In: AAAI Workshop on Artificial Intelligence for Web Search. pp. 58–64 (2000)
• [25] Subbarao, R., Meer, P.: Nonlinear mean shift over Riemannian manifolds. Int. Journal of Computer Vision 84(1), 1–20 (2009)
• [26] Turaga, P., Veeraraghavan, A., Srivastava, A., Chellappa, R.: Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. IEEE Trans. Pattern Analysis and Machine Intelligence 33(11), 2273–2286 (2011)
• [27] Vemulapalli, R., Pillai, J.K., Chellappa, R.: Kernel learning for extrinsic classification of manifold features. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1782–1789 (2013)
• [28] Wolf, L., Shashua, A.: Learning over sets using kernel principal angles. Journal of Machine Learning Research 4, 913–931 (2003)
• [29] Yu, S., Tan, T., Huang, K., Jia, K., Wu, X.: A study on gait-based gender classification. IEEE Trans. Image Processing (TIP) 18(8), 1905–1910 (2009)
• [30] Zheng, S., Zhang, J., Huang, K., He, R., Tan, T.: Robust view transformation model for gait recognition. In: International Conference on Image Processing (ICIP). pp. 2073–2076 (2011)