Fast Optimization Algorithm on Riemannian Manifolds and Its Application in Low-Rank Representation

12/07/2015, by Haoran Chen et al., Charles Sturt University and Beijing University of Technology

The paper addresses the problem of optimizing a class of composite functions on Riemannian manifolds and proposes a new first-order optimization algorithm (FOA) with a fast convergence rate. Through theoretical analysis, FOA is proved to converge at the rate O(1/k^2) in terms of objective values. Experiments on the matrix completion task show that FOA has better performance than other first-order optimization methods on Riemannian manifolds. A fast subspace pursuit method based on FOA, termed SP-RPRG(ALM), is further proposed to solve the low-rank representation model via the augmented Lagrange method on the variety of low-rank matrices. Experimental results on synthetic and real data sets demonstrate that both FOA and SP-RPRG(ALM) achieve superior performance in terms of faster convergence and higher accuracy.


1 Introduction

Recent years have witnessed growing attention to optimization on Riemannian manifolds. Since Riemannian optimization works directly on curved manifolds, one can eliminate constraints such as orthogonality and obtain an unconstrained optimization problem that, by construction, only uses feasible points. This allows one to incorporate Riemannian geometry into the resulting optimization problems, thus producing far more accurate numerical results. Riemannian optimization has been successfully applied in machine learning, computer vision and data mining tasks, including fixed low-rank optimization BoumalAbsil2011 ; Vandereycken2013 , Riemannian dictionary learning HarandiHartleyShenLovellSanderson2014 , computer vision SuSrivastavaSouzaSarkar2014 ; WangShanChenGao2008 ; TuragaVeeraraghavanSrivastavaChellappa2011 , and tensor clustering SunGaoHongMishraYin .

A constrained optimization problem on a Euclidean space often has a much larger dimension than the same problem posed on the manifold implicitly defined by the constraints. An optimization algorithm on a manifold therefore has lower complexity and better numerical properties. Methods for solving minimization problems on Riemannian manifolds have been extensively researched Udriste1994 ; AbsilMahonySepulchre2008 . In fact, traditional optimization methods in Euclidean space, such as the steepest descent method, the conjugate gradient method and Newton's method, can be readily adapted to optimization on Riemannian manifolds; a solid conceptual framework has been built over the last decades, and generic algorithm implementations are openly available, see http://www.manopt.org.

As one of the fundamental optimization algorithms, the steepest descent method was first proposed for optimization on Riemannian manifolds in Gabay1982 . The algorithm is computationally simple, but its convergence is very slow, particularly for the large-scale, complicated optimization problems arising in modern machine learning. In contrast, Newton's method Gabay1982 and the BFGS quasi-Newton scheme (BFGS rank-2 update) RingWirth2012 have higher convergence rates; however, in practical applications, using the full second-order Hessian information is computationally prohibitive.

In order to obtain a method with a high convergence rate that avoids computing the inverse Hessian matrix, Absil et al. AbsilBakerGallivan2007 proposed the trust-region method on Riemannian manifolds. For example, the trust-region method has been applied to an optimization problem on the Grassmann manifold for matrix completion BoumalAbsil2011 . However, each iteration of the trust-region method involves solving the Riemannian Newton equation Smith1993 , which increases the complexity of the algorithm. Huang et al. HuangAbsilGallivan2015 generalized the symmetric rank-one trust-region method to vector-variable optimization problems on finite-dimensional Riemannian manifolds, where an approximate Hessian is generated by the symmetric rank-one update without solving the Riemannian Newton equation. This method has superlinear convergence. However, the symmetric rank-one update matrices are generated from vectors rather than matrices, so the algorithm is not applicable when the variable is a matrix. This is a significant barrier in practical applications.

Generally, optimization methods that use second-order function information converge faster than those using only first-order information, but at a greatly increased computational cost per iteration. The Fletcher-Reeves conjugate gradient method on Riemannian manifolds RingWirth2012 is a typical method that needs only first-order function information; its convergence is superlinear but falls short of second order.

For the gradient method, it is known that the sequence of function values converges to the optimal function value at a rate no worse than O(1/k), i.e., a sublinear rate. For optimization on Euclidean spaces, researchers have sought acceleration strategies to speed up optimization algorithms that require only first-order function information, e.g., the accelerated scheme of Nesterov1983 . More strategies have been proposed by exploiting special structure of the objective, such as composite objective functions BeckTeboulle2009 ; TohYun2010 ; Nesterov2013 . With these strategies, the algorithms need only first-order function information yet enjoy a guaranteed convergence rate of O(1/k^2). In this paper we extend this acceleration idea to Algorithm 1 for the general model (13) on Riemannian manifolds and establish improved complexity and convergence results, see Theorem 3.1 in Section 3.

As a practical application of the proposed fast gradient-like first-order algorithm, we further consider learning problems on low-rank manifolds. Low-rank learning seeks the lowest-rank representation among all candidates that can preserve most of the information in the data samples. Low-rank learning has been widely applied in low-rank matrix recovery (MR)/low-rank matrix completion MishraApuroopSepulchre2012 ; MishraMeyerBachSepulchre2013 ; MishraMeyerBonnabelSepulchre2014 and in low-rank representation (LRR) for subspace clustering LiuLinYu2010 ; LinLiuSu2011 ; LiuLinYanSunYuMa2013 . One well-known matrix completion example is the Netflix problem BennettElkanLiuSmythTikk2007 , in which one would like to recover a low-rank matrix from sparsely sampled matrix entries.

However, lowest-rank optimization problems are NP-hard and generally extremely hard to solve (and also hard to approximate RechtFazelParrilo2010 ). Typical approaches either transform the lowest-rank problem into nuclear norm minimization via convex relaxation, or parametrize the fixed-rank matrix variables through a matrix decomposition.

As the set of rank-r matrices is a smooth Riemannian manifold under an appropriately chosen metric AbsilMahonySepulchre2008 ; MishraApuroopSepulchre2012 ; Vandereycken2013 ; MishraMeyerBonnabelSepulchre2014 , one can convert a rank-constrained optimization problem into an unconstrained optimization problem on the low-rank matrix manifold. Vandereycken Vandereycken2013 considered this problem as an optimization problem on the fixed-rank matrix Riemannian manifold. Considering different types of matrix decomposition, Mishra et al. MishraApuroopSepulchre2012 ; MishraMeyerBonnabelSepulchre2014 proposed Riemannian quotient geometries for low-rank matrix completion. Ngo and Saad NgoSaad2012 addressed this problem based on a scaled metric on the Grassmann manifold.

The aforementioned works transform/simplify a low-rank matrix completion problem into a fixed-rank optimization. However, low-rank matrix completion is not equivalent to fixed-rank optimization. In order to exploit fixed-rank optimization for low-rank optimization, Mishra et al. MishraMeyerBachSepulchre2013 proposed a pursuit algorithm that alternates between fixed-rank optimization and rank-one updates to search for the best rank value. Tan et al. TanTsangWangVandereyckenPan2014 proposed a Riemannian pursuit approach that converts the low-rank problem into a series of fixed-rank problems, and further showed that the low-rank problem can be treated as an optimization whose search space is the variety of low-rank matrices TanShiGaoHengelXuXiao2015 , see (7), using a subspace pursuit approach on the Riemannian manifold to find the desired rank value. Since the variety of matrices of rank at most r is the closure of the fixed-rank Riemannian submanifold, see (4), an iterative algorithm over the variety is more safe-guarded than one on the fixed-rank open manifold; we slightly abuse the term "Riemannian" for the variety for simplicity. In this paper, we consider low-rank learning as optimization problems on varieties of low-rank matrices.

The contributions of this paper are:

  1. We develop a fast optimization algorithm (FOA) for the model (13) on general Riemannian manifolds. The algorithm uses only first-order function information, yet has the convergence rate O(1/k^2).

  2. We provide a theoretical analysis of the proposed FOA and prove its O(1/k^2) convergence rate.

  3. We propose a fast subspace pursuit approach (SP-RPRG(ALM)) on the low-rank matrix variety, based on the augmented Lagrangian method, for the model (27).

The rest of this paper is organized as follows. Section 2 introduces the necessary notation and concepts. In Section 3, we propose the fast optimization algorithm on Riemannian manifolds and discuss its convergence in detail. In Section 4, we extend the algorithm to low-rank representation with noise on the variety of low-rank matrices; moreover, we propose the fast subspace pursuit method (SP-RPRG(ALM)), which combines the augmented Lagrange approach with FOA on the variety, and apply it to this model. Experimental results evaluating the performance of our algorithms are presented in Section 5. Finally, conclusions are summarized in Section 6.

2 Notations and Preliminaries

This section briefly describes the notation and concepts used throughout this paper.

2.1 Notations

We denote matrices by boldface capital letters, e.g., X, vectors by boldface lowercase letters, e.g., x, and scalars by plain letters, e.g., a. The l2-norm of a vector v is denoted by ||v||_2, and ||v||_0 is the number of non-zero elements in v. ||V||_{2,1} denotes the sum of the l2-norms of the columns of V. The operator max(u, v) gives the vector resulting from the component-wise maximum of u and v. The inner product of two matrices A and B of the same size is defined as <A, B> = tr(A^T B). The superscript T denotes the transpose of a vector/matrix. Diag(v) is the diagonal matrix whose diagonal elements are the components of the vector v. The SVD of X is denoted as X = U Sigma V^T with Sigma = Diag(sigma_i(X)), where sigma_i(X) is the i-th singular value of X. The nuclear norm of X is defined as ||X||_* = sum_i sigma_i(X). For any convex function g, its subdifferential at X is denoted by ∂g(X).
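As a minimal illustration of this notation, the following NumPy sketch evaluates the quantities defined above; the norm definitions are the standard ones, and any detail that differs from the authors' exact choices should be treated as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

# SVD: X = U @ np.diag(sigma) @ Vt, singular values in descending order.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

nuclear_norm = sigma.sum()                  # ||X||_* = sum of singular values
l21_norm = np.linalg.norm(X, axis=0).sum()  # sum of the l2-norms of the columns
l0 = np.count_nonzero(X[:, 0])              # number of non-zero entries of a vector
inner = np.trace(X.T @ X)                   # <X, X> = tr(X^T X) = ||X||_F^2
```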

2.2 Riemannian Manifolds

A manifold M Lee2003 is a topological space that locally resembles a Euclidean space in a neighbourhood of each of its points. For example, lines and circles are manifolds, and surfaces, such as a plane, a sphere, and a torus, are manifolds. Geometrically, a tangent vector is a vector that is tangent to a manifold at a given point. Abstractly, a tangent vector is a functional, defined on the set of all smooth functions over the manifold M, which satisfies the Leibniz differential rule. All the possible tangent vectors at a point x constitute a Euclidean space, called the tangent space of M at x and denoted by T_x M. If a smoothly varying metric (inner product) is defined on the tangent space at every point, then M is called a Riemannian manifold.

The tangent bundle is defined as the disjoint union of all tangent spaces

(1)

For a Riemannian manifold, the Riemannian gradient of a smooth function f at x, denoted grad f(x), is defined as the unique tangent vector in T_x M whose inner product with any tangent vector equals the corresponding (Euclidean) directional derivative of f. If M is embedded in a Euclidean space, then grad f(x) is given by the projection of the Euclidean gradient of f onto the tangent space T_x M.

With a globally defined differential structure, a manifold becomes a differentiable manifold. A geodesic is a smooth curve whose tangent vector field has vanishing covariant derivative; in particular, the Riemannian distance between two points is the length of the shortest smooth path connecting them on the manifold, that is, the infimum of the lengths of all paths joining the two points.

There are predominantly two operations for computations on a Riemannian manifold, namely (1) the exponential map at a point x, denoted by Exp_x, and (2) the logarithmic map at x, denoted by Log_x. The former maps a tangent vector in the tangent space onto the manifold; the latter does the reverse. Locally, both mappings are diffeomorphisms. Note that these maps depend on the point at which the tangent space is taken.

Given two points that are close to each other, their distance can be calculated through the following formula as the norm of the corresponding tangent vector.

(2)

The squared distance function is smooth for nearby pairs of points. In this setting, it is natural to define a retraction and a lifting along the following lines. Given a point x and a tangent vector ξ at x, the retraction R_x(ξ) is computed by (1) moving along ξ to the point x + ξ in the linear embedding space, and (2) projecting this point back onto the manifold M. The lifting operator Lift_x(y) is computed by projecting the difference y − x onto the tangent space T_x M. From another point of view, since R_x(Lift_x(y)) ≈ y, the tangent vector Lift_x(y) can be viewed as the direction from x to y on the Riemannian manifold.
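The retraction/lifting pair is easiest to visualise on a toy manifold. The sketch below uses the unit sphere (not one of the manifolds used later in the paper) with the metric-projection retraction; it illustrates the construction described above and is not the authors' implementation.

```python
import numpy as np

def retract_sphere(x, xi):
    """Move along tangent vector xi at x, then project back to the unit sphere."""
    y = x + xi                      # step in the embedding space
    return y / np.linalg.norm(y)    # metric projection onto the sphere

def lift_sphere(x, y):
    """Approximate inverse retraction: project y - x onto the tangent space at x."""
    d = y - x
    return d - np.dot(x, d) * x     # remove the component along x (the normal direction)

x = np.array([1.0, 0.0, 0.0])
xi = np.array([0.0, 0.3, -0.2])     # xi is tangent at x (orthogonal to x)
y = retract_sphere(x, xi)
eta = lift_sphere(x, y)             # roughly recovers the direction from x to y
```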

Finally, we need to define vector transport on M.

(3)

2.3 The Riemannian Manifold of Fixed-rank Matrices

Let

(4)

It can be proved that the set of fixed-rank matrices is a Riemannian manifold of dimension r(m + n − r) Vandereycken2013 . It has the following equivalent expression via the SVD decomposition:

where the factors belong to the Stiefel manifold of real matrices with orthonormal columns, and the diagonal entries of the middle factor are arranged in descending order. Furthermore, the tangent space at a point of the manifold is given by

(5)

Since the fixed-rank manifold is embedded in the space of matrices, the Riemannian gradient of a smooth function is given by the orthogonal projection of its Euclidean gradient onto the tangent space. Here, we denote the orthogonal projection of any matrix Z onto the tangent space at X as

(6)

where P_U = U U^T, P_U^⊥ = I − U U^T, and P_V, P_V^⊥ are defined analogously for V.
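For the fixed-rank manifold, the tangent-space projection (6) has a well-known closed form from Vandereycken (2013). The NumPy sketch below implements that standard expression; the variable names and the assumption that U and Vt store orthonormal factors of X are ours.

```python
import numpy as np

def project_tangent_fixed_rank(U, Vt, Z):
    """Orthogonal projection of Z onto the tangent space of the rank-r manifold
    at X = U @ diag(s) @ Vt, with U (m x r) and Vt (r x n) orthonormal:
    P(Z) = U U^T Z + Z V V^T - U U^T Z V V^T  (Vandereycken, 2013)."""
    ZV = Z @ Vt.T                 # m x r
    UtZ = U.T @ Z                 # r x n
    return U @ UtZ + ZV @ Vt - U @ (U.T @ ZV) @ Vt

# Tiny check: the projection is idempotent (applying it twice changes nothing).
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((8, 2)))
V, _ = np.linalg.qr(rng.standard_normal((6, 2)))
Z = rng.standard_normal((8, 6))
P1 = project_tangent_fixed_rank(U, V.T, Z)
P2 = project_tangent_fixed_rank(U, V.T, P1)
assert np.allclose(P1, P2)
```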

2.4 Varieties of Low-rank Matrices

Define

(7)

then the set defined in (7) is the closure of the set of matrices of constant rank k. At a singular point X with rank(X) < k, search directions must be taken in the tangent cone (instead of the tangent space). The tangent cones of the variety are explicitly known SchneiderUschmajew2015 :

(8)

where the additional factors in (8) are orthogonal to the column and row spaces of X and have rank at most k − rank(X).

The projection operator, which projects a matrix Z onto the tangent cone at X, can be calculated as

(9)

where the second term is the best rank-(k − rank(X)) approximation of the part of Z orthogonal to the row and column spaces of X. Moreover, this projection can be computed with essentially the same complexity as the tangent-space projection on the fixed-rank manifold TanShiGaoHengelXuXiao2015 .
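A hedged NumPy sketch of the tangent-cone projection (9), following Schneider and Uschmajew (2015): the tangent-space part of the fixed-rank manifold plus the best rank-(k − rank(X)) approximation of the doubly orthogonal remainder. The factorization convention (orthonormal U and Vt) is an assumption.

```python
import numpy as np

def truncate_rank(Z, r):
    """Best rank-r approximation of Z via a truncated SVD."""
    if r <= 0:
        return np.zeros_like(Z)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def project_tangent_cone(U, Vt, Z, k):
    """Project Z onto the tangent cone of the rank-<=k variety at X = U diag(.) Vt,
    where rank(X) = U.shape[1] <= k (Schneider & Uschmajew, 2015): tangent-space
    part of the fixed-rank manifold plus the best rank-(k - rank(X)) approximation
    of the remainder (I - U U^T) Z (I - V V^T)."""
    tangent_part = U @ (U.T @ Z) + (Z @ Vt.T) @ Vt - U @ (U.T @ Z @ Vt.T) @ Vt
    remainder = Z - tangent_part          # equals (I - U U^T) Z (I - V V^T)
    return tangent_part + truncate_rank(remainder, k - U.shape[1])
```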

The Riemannian gradient of a smooth function on the variety can be calculated by the following formula

(10)

where the right-hand side is the projection of the Euclidean gradient of the function onto the tangent cone.

Another ingredient we need is the so-called retraction operator, which maps a tangent vector in the tangent cone at a given point back onto the variety; it is defined as

(11)

Next, we introduce the (approximate) inverse of the retraction operator, called the lifting operator, which maps a point on the variety back to the tangent cone. When the two points are close to each other, the lifting can be computed approximately by

(12)

As before, the lifted tangent vector can be viewed as the direction from one point to the other on the variety.

3 The Fast Optimization Algorithm on Riemannian manifold

We consider the following general model (13) on a Riemannian manifold, which naturally extends the problem formulation in BeckTeboulle2009 :

(13)

where M is a Riemannian manifold and the composite objective is F = f + g. We make the following assumptions throughout this section:

  1. g is a continuous convex function which is possibly non-smooth.

  2. f is a smooth convex function, and there exists a finite real number L bounding the largest singular value of the Hessian matrix of f.

  3. f satisfies the following inequality,

    (14)

    where L is the constant from assumption 2 and Lift is the lifting operator defined in Section 2.2 (a plausible explicit form of (14) is sketched after this list).
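Because the displayed inequality (14) did not survive extraction in this copy, the following is a plausible explicit form, modelled on the Euclidean descent lemma with the lifting replacing the difference X − Y; it should be checked against the original paper.

```latex
% Hedged reconstruction of the descent inequality (14), assuming the standard
% Lipschitz-type bound with the lifting playing the role of X - Y:
\[
  f(X) \;\le\; f(Y) + \big\langle \operatorname{grad} f(Y),\, \xi \big\rangle_{Y}
       + \frac{L}{2}\, \| \xi \|_{Y}^{2},
  \qquad \xi = \mathrm{Lift}_{Y}(X),
\]
% where Lift_Y is the lifting operator of Section 2.2 and L is the constant
% bounding the Hessian of f in assumption 2.
```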

It is usually difficult to solve (13) directly, particularly in applications such as matrix completion and low-rank representation, and introducing auxiliary variables often requires expensive matrix inversions. Here we adopt the following proximal Riemannian gradient (PRG) method TanShiGaoHengelXuXiao2015 to update the variable. For any L > 0, we consider the following quadratic approximation of F at a given point Y:

(15)

Note that the above local model is different from its counterpart on vector spaces TohYun2010 ; Nesterov2013 . We assume that the local model admits a unique minimizer

(16)

Simple manipulation gives that

(17)

Taking the previous iterate as the expansion point Y in (17), we define the local optimal point of (17) as the next iterate,

(18)

where 1/L plays the role of a step size. Moreover, L needs to satisfy the inequality

(19)

Note that the sequence of function values produced by (18) is nonincreasing. Indeed, for every

(20)
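Since the displayed formulas (15)-(18) are missing in this copy, the following hedged LaTeX sketch records the form such a proximal Riemannian gradient model typically takes, with the lifting replacing the Euclidean difference; the exact expressions should be taken from the original paper.

```latex
% Hedged sketch of the proximal quadratic model (15) and the update (16)-(18).
\[
  Q_{L}(X, Y) \;=\; f(Y)
      + \big\langle \operatorname{grad} f(Y),\, \mathrm{Lift}_{Y}(X) \big\rangle_{Y}
      + \frac{L}{2}\,\big\| \mathrm{Lift}_{Y}(X) \big\|_{Y}^{2}
      + g(X),
\]
\[
  p_{L}(Y) \;=\; \arg\min_{X \in \mathcal{M}} Q_{L}(X, Y),
  \qquad X_{k} \;=\; p_{L_{k}}(X_{k-1}),
\]
% with 1/L_k acting as the step size; L_k is chosen by backtracking so that
% F(X_k) <= Q_{L_k}(X_k, X_{k-1}), which yields the nonincreasing objective
% values noted in (20).
```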

Convergence speed is a major concern for any optimization algorithm. Line-search methods on Riemannian manifolds have linear convergence; this has been shown in AbsilMahonySepulchre2008 . In order to obtain a line-search type scheme with the O(1/k^2) rate on objective values, we propose a fast optimization algorithm (FOA) that extends the acceleration methods of BeckTeboulle2009 ; JiYe2009 ; TohYun2010 to Riemannian manifolds. The acceleration methods in Euclidean space employ a specific linear combination of the previous two iterates. However, this linear operation is a barrier on a Riemannian manifold, since a Riemannian manifold is generally not a linear space, so we need the tools introduced above to overcome this difficulty.

According to (19) and (20), we obtain a convergent sequence of iterates whose function values decrease monotonically. The lifted vector from the current iterate towards the previous one (see Section 2.2) is therefore an ascent direction, and its negative is a descent direction. We start at the current iterate, move a specific step along this descent direction in the tangent space, and retract the result onto the manifold to obtain a new auxiliary point. This is the acceleration step of the fast optimization algorithm. The next iterate is then generated from this auxiliary point, i.e., by replacing the expansion point with the auxiliary point in (18) and (19); this is the difference between FOA and the plain line-search algorithm. In this way, we generate an iterative sequence with the convergence rate O(1/k^2).

Stopping Conditions I. The iteration with backtracking can be terminated if one of the following conditions is satisfied:

  • Condition 1:  ;

  • Condition 2:  ;

  • Condition 3:  Number of iterations .

where the tolerances in Conditions 1 and 2 are small positive values and the maximum number of iterations in Condition 3 is a given positive integer.

The FOA is summarized in Algorithm 1.

1:  Initialization: choose an initial point X_0 on M and constants L_0 > 0, η > 1; set Y_1 = X_0 and t_1 = 1.
2:  Find the smallest nonnegative integer such that, with the backtracked constant,
(21)
Set and compute
(22)
3:  Terminate if stopping conditions I are achieved; otherwise return to step 2.
Algorithm 1 FOA on Riemannian manifold for model (13)
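For concreteness, the sketch below outlines one way Algorithm 1 can be organised in code. It assumes user-supplied retraction, lifting and proximal-model routines, and follows the FISTA-style momentum update t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2 of Beck and Teboulle (2009); it is an illustration of the scheme under these assumptions, not the authors' implementation.

```python
import math

def foa(F, prox_model, retract, lift, X0, L0=1.0, eta=2.0,
        max_iter=500, tol=1e-6):
    """FISTA-style fast optimization on a manifold (illustrative sketch).

    F(X)             -> objective value F = f + g
    prox_model(Y, L) -> (X, Q): minimiser of the quadratic model Q_L(., Y) and its value
    retract(X, xi)   -> point obtained by moving from X along tangent vector xi
    lift(X, Y)       -> tangent vector at X approximately pointing towards Y
    """
    Xk, Yk, Lk, tk = X0, X0, L0, 1.0
    F_prev = F(X0)
    for _ in range(max_iter):
        # Backtracking: increase L until the model majorises F at the candidate.
        while True:
            X_new, Q_val = prox_model(Yk, Lk)
            if F(X_new) <= Q_val:
                break
            Lk *= eta
        # Momentum (Nesterov/FISTA) step, transported to the manifold:
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * tk * tk)) / 2.0
        xi = lift(X_new, Xk)                              # direction towards the old iterate
        Yk = retract(X_new, -((tk - 1.0) / t_new) * xi)   # extrapolate past the new iterate
        Xk, tk = X_new, t_new
        F_val = F(Xk)
        if abs(F_prev - F_val) < tol:                     # simple surrogate for conditions I
            break
        F_prev = F_val
    return Xk
```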

Now we are in a position to present the major conclusion of the paper.

Lemma 1 (Optimality Condition)

A point is a local minimizer of (13) if and only if there exists such that AbsilMahonySepulchre2008

where .

Lemma 2

Let the current point be given and let the chosen direction be a descent direction. Then there exists a step size that satisfies the condition in (21).

Proof

By the smoothness of f, when the trial point is sufficiently close to the current point the quadratic model dominates the objective locally, so inequality (21) holds for a sufficiently small step size.

The following Theorem 3.1 establishes the O(1/k^2) convergence rate of Algorithm 1.

Theorem 3.1

Let the sequence of iterates be generated by Algorithm 1. Then, for any iteration index k, the gap between the objective value at the k-th iterate and the optimal objective value is bounded by a constant multiple of 1/k^2.

The proof of Theorem 3.1 is given in Appendix A.

When the data contain noise, the model (13) can be extended to the following form:

(23)

where the functions f and g have the same properties as in (13), and the additional variable E models the noise.

In order to solve problem (23), following LinLiuSu2011 , we optimize the variables X and E by an alternating direction approach. This approach decomposes the minimization into two subproblems that minimize the objective w.r.t. X and E, respectively. The iterations of the two subproblems are as follows,

(24)
(25)

When solving (24), the variable E is fixed. We consider the following robust proximal Riemannian gradient (RPRG) quadratic model of the objective at a given point on the manifold.

Define

(26)

where the auxiliary point is generated from the previous iterates in the same way as in Algorithm 1.

In most cases, subproblem (25) has a closed-form solution.

Similar to Algorithm 1, we propose the following stopping conditions II:

  • Condition 1: and ;

  • Condition 2:  Number of iterations .

  • Condition 3: The number of alternating direction iterations .

where the first threshold controls the change in the solutions, the second controls the violation of the error constraint, and the iteration limits are given positive integers.

Algorithm 2 summarizes the FOA for model (23).

1:  Initialization: choose initial values for X and E, and constants L_0 > 0, η > 1; set the auxiliary point equal to the initial X and t_1 = 1.
2:  for   do
3:     for  do
4:        Find the smallest nonnegative integer such that, with the backtracked constant,
Set and compute
5:        Terminate if stopping conditions I are achieved.
6:     end for
7:     
8:     
9:     Terminate if stopping conditions II are achieved.
10:  end for
Algorithm 2 FOA on Riemannian manifold for model (23)

By Theorem 3.1, it is easy to infer the following theorem to guarantee the convergence of Algorithm 2.

Theorem 3.2

Let an infinite sequence of iterates be generated by Algorithm 2; then the sequence converges to a limit point.

4 The application of FOA to low-rank representation on low-rank matrix varieties

The low-rank representation model has been widely studied in recent years. In this section, we apply Algorithm 2 to the low-rank representation model on the variety of low-rank matrices. The model is defined as follows:

(27)

where the first term is the nuclear norm of the representation matrix, λ is a regularization parameter, the data matrix appears in the constraint, the noise term is penalized by a regularizer (see Section 2.1), and the representation is constrained to the variety of matrices of rank at most k.

In order to solve model (27), we generalize the augmented Lagrangian method (ALM) from Euclidean space to optimization on the variety of low-rank matrices. The augmented Lagrangian function is as follows

(28)

where Y is the Lagrange multiplier, ⟨·, ·⟩ is the matrix inner product, and the remaining scalars are positive penalty parameters. The augmented Lagrangian function consists of two parts (a hedged explicit form is given after the list below):

  • The first part is a smooth and differentiable function;

  • The second part is a continuous, non-smooth function.
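For concreteness, a hedged reconstruction of (27)-(28) is given below, assuming the usual LRR formulation with data matrix D, representation X, noise E and an l_{2,1} regularizer; these notational choices are assumptions (the paper mentions penalty parameters in the plural, so the original may include an additional auxiliary split).

```latex
% Hedged reconstruction of model (27) and its augmented Lagrangian (28).
% Assumed notation: data D, representation X of rank at most k, noise E,
% multiplier Y, penalty mu > 0, regularization parameter lambda > 0.
\[
  \min_{X,\, E}\; \|X\|_{*} + \lambda \|E\|_{2,1}
  \quad \text{s.t.} \quad D = DX + E,\ \ \operatorname{rank}(X) \le k,
\]
\[
  \mathcal{L}(X, E, Y) \;=\; \|X\|_{*} + \lambda \|E\|_{2,1}
      + \langle Y,\, D - DX - E \rangle
      + \frac{\mu}{2}\,\| D - DX - E \|_{F}^{2} .
\]
% The inner-product and quadratic penalty terms form the smooth part; the norm
% terms form the non-smooth part mentioned in the two bullets above.
```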

Hence, model (27) is equivalent to the following optimization problem

(29)

Since the variety is the closure of the fixed-rank Riemannian submanifold, it can be guaranteed that (29) has an optimal solution. Despite the non-smoothness of the variety, it has been shown in SchneiderUschmajew2015 that its tangent cone at singular points of lower rank has a rather simple characterization and can be calculated easily (see Section 2.4). Hence, model (29) can be optimized directly on the variety UschmajewVandereycken2014 , as in Section 3.

The subspace pursuit approach is an elegant optimization strategy on Riemannian manifolds. In order to better optimize model (29), we propose a fast subspace pursuit method, denoted SP-RPRG(ALM), based on FOA. In this method, after reaching a local optimum of problem (29) on the variety with the current rank bound, we use that local optimum as the initialization to warm-start the subsequent problem on the variety with an increased rank bound. Some related ingredients on low-rank matrix varieties are as follows (see Section 2.4):

  • Projection operator onto the tangent cone, defined by (9);

  • Gradient of the smooth part of the objective on the low-rank matrix variety, obtained by projecting its Euclidean gradient as in (10);

  • Retraction operator defined by (11), which truncates the updated matrix back to the allowed rank;

  • Lifting operator and the corresponding tangent vector, given by (12).

For the concrete function in (29), we solve the corresponding subproblem for the representation variable.

According to (3), we can obtain

(30)

where the auxiliary quantities are given in TanShiGaoHengelXuXiao2015 , and the iterate can be computed efficiently in the sense that the required matrix products can be formed cheaply without expensive SVDs Vandereycken2013 .

Next, we directly update the noise variable in (29) by fixing the other variables; we have

collecting the terms involving the noise variable into a single matrix, the closed-form solution is

(31)

where the solution is obtained column by column from that matrix TanShiGaoHengelXuXiao2015 .
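Assuming the regularizer on the noise term is the l_{2,1} norm, as the column-wise form of (31) suggests, the update is the standard column-wise shrinkage; a minimal NumPy sketch follows (the residual matrix Q and threshold tau are assumed names).

```python
import numpy as np

def l21_shrinkage(Q, tau):
    """Column-wise soft-thresholding: the closed-form minimiser of
    tau * ||E||_{2,1} + 0.5 * ||E - Q||_F^2 (assumed form of update (31))."""
    E = np.zeros_like(Q)
    norms = np.linalg.norm(Q, axis=0)             # l2-norm of each column q_i
    keep = norms > tau                            # columns that survive shrinkage
    E[:, keep] = Q[:, keep] * (1.0 - tau / norms[keep])
    return E
```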

The ALM with fast subspace pursuit on the variety is described in detail in Algorithm 3. Stopping conditions III terminate Algorithm 3 once the outer-loop criterion is met.

0:  Parameters: regularization and penalty parameters, tolerances, and positive integer iteration limits; data matrices.
1:  Initialization: the multiplier and noise matrices are set to zero; the current rank bound is set to its smallest value.
2:  for  do
3:     for  do
4:        for  do
5:           Fast optimization algorithm
6:           Terminate if stopping conditions I are achieved.
7:           
8:        end for
9:        Compute by (31).
10:        Compute
11:        Compute .
12:        Terminate if stopping conditions II are achieved.
13:     end for
14:  end for
14:  .
Algorithm 3 Fast subspace pursuit based on ALM on for model (27)

5 Experimental Results and Analysis

In this section, we conduct several experiments to assess the proposed algorithms. We focus on the low-rank matrix completion problem on synthetic data and on clustering with several public data sets, evaluating the performance of FOA in terms of convergence rate and error rate. All algorithms are coded in Matlab (R2014a) and run on a PC with a 64-bit operating system, an Intel(R) Core(TM) i7 CPU (3.4 GHz, single-thread mode) and 4 GB memory.

5.1 Matrix Completion

Let A be a matrix whose elements are known only on a subset Omega of the complete set of indices. The low-rank matrix completion problem consists of finding the matrix of lowest rank that agrees with A on Omega. This problem can be considered as an optimization problem with the objective function f(X) = (1/2) ||P_Omega(X − A)||_F^2, where P_Omega denotes the projection onto Omega: (P_Omega(X))_{ij} = X_{ij} if (i, j) is in Omega, and 0 otherwise.

Following Vandereycken2013 , we generate ground-truth low-rank matrices of rank r as A = L R^T, where L and R are randomly generated factor matrices with r columns. The number of observed entries is determined by the oversampling factor OS relative to the r(m + n − r) degrees of freedom of a rank-r matrix.
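A minimal NumPy sketch of this synthetic setup (random rank-r factors and a uniformly random observation set whose size is controlled by the oversampling factor); the exact sampling scheme is an assumption.

```python
import numpy as np

def make_completion_problem(m, n, r, oversampling=3.0, seed=0):
    """Ground-truth A = L @ R^T of rank r and a random observation mask Omega."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((m, r))
    R = rng.standard_normal((n, r))
    A = L @ R.T
    dof = r * (m + n - r)                       # degrees of freedom of a rank-r matrix
    num_obs = min(m * n, int(oversampling * dof))
    idx = rng.choice(m * n, size=num_obs, replace=False)
    mask = np.zeros(m * n, dtype=bool)
    mask[idx] = True
    return A, mask.reshape(m, n)

def completion_objective(X, A, mask):
    """f(X) = 0.5 * || P_Omega(X - A) ||_F^2."""
    diff = np.where(mask, X - A, 0.0)
    return 0.5 * np.sum(diff * diff)
```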

Figure 1: Objective function value w.r.t. iteration number
Figure 2: Objective function value w.r.t. time cost (second)
Figure 3: Objective function value w.r.t. iteration number
Figure 4: Objective function value w.r.t. time cost(second)

We test FOA on the matrix completion task and compare its optimization performance against qGeomMC MishraMeyerBachSepulchre2013 , LRGeomCG Vandereycken2013 , and LRGeomSD (steepest descent) on the fixed-rank Riemannian manifold.111qGeomMC and LRGeomCG are from http://bamdevmishra.com/codes/fixedranklist/ and http://www.unige.ch/math/vandereycken/matrix_completion.html, respectively. Fig. 1 and Fig. 3 give the number of iterations needed until the stopping condition is satisfied, for different matrix sizes; Fig. 2 and Fig. 4 give the corresponding time costs. We can see from these figures that qGeomMC needs the fewest iterations and ours the second fewest, while our algorithm is the fastest in running time. However, qGeomMC employs the trust-region method, which uses second-order function information. Hence the proposed FOA has an advantage in convergence speed over existing first-order optimization algorithms on Riemannian manifolds.

Figure 5: Some sample images from the Extended Yale B face databases
Figure 6: Some sample images from the COIL-20 databases

5.2 Clustering on Extended Yale B and COIL-20 Database

In this subsection, we apply FOA to model (29) and evaluate its performance in computing time and clustering error, compared with existing optimization methods on Riemannian manifolds. The compared algorithms include LRR LiuLinYanSunYuMa2013 (solving model (27) in Euclidean space), SP-RPRG TanShiGaoHengelXuXiao2015 (solving model (27) on low-rank matrix varieties) and SP-RPRG(ALM) (solving model (27) with the augmented Lagrange method on low-rank matrix varieties). For SP-RPRG and SP-RPRG(ALM), we compute the optimal solutions by the conjugate gradient method and by the FOA method, respectively, and compare their time costs. The regularization and penalty parameters of model (27) are fixed across all methods.

Two public databases are used for evaluation. One is the Extended Yale B face database, which contains 2,414 frontal face images of 38 individuals. The images were captured under different poses and illumination conditions. Fig. 5 shows some sample images. To reduce the computational cost and memory requirements, all face images are resized from 192 x 168 pixels to 48 x 42 pixels and then vectorized as 2016-dimensional vectors.

The other database is the Columbia Object Image Library (COIL-20), which includes 1,440 gray-scale images of 20 objects (72 images per object). The objects have a wide variety of geometric and reflectance characteristics. Each image is clipped out of the black background using a rectangular bounding box and resized from 128 x 128 pixels to 32 x 32 pixels, then reshaped into a 1024-dimensional gray-level intensity feature vector. Fig. 6 shows some sample images.

In the clustering experiments on the Extended Yale B database, we initialize the parameters of SP-RPRG(ALM), of LRR 222LRR code is from http://www.cis.pku.edu.cn/faculty/vision/zlin/zlin.htm LiuLinYanSunYuMa2013 , and of SP-RPRG TanShiGaoHengelXuXiao2015 to fixed values. The first k classes (k = 2, 3, 5, 8, 10, as in Table 1) are selected for the experiments, and each class contains 64 images.

Figure 7: Some video frame examples from Video 2 with 0, 20%, 50% and 80% Gaussian noise, respectively.

We compare the face clustering error rates of LRR, SP-RPRG and SP-RPRG(ALM), and the time costs of our FOA and the conjugate gradient (CG) method for SP-RPRG and SP-RPRG(ALM), for different numbers of classes. Table 1 gives the experimental results. We observe that the proposed SP-RPRG(ALM) has the lowest error rate for each number of classes, and that FOA applied to SP-RPRG and SP-RPRG(ALM) spends less time than the conjugate gradient method to achieve the same accuracy.

Class LRR SP-RPRG SP-RPRG(ALM)
error error time(CG) time(FOA) error time(CG) time(FOA)
2 2.34 0 71.3 14.3 9.0 3.9
3 3.65 3.13 185.4 56.2 42.8 30.2
5 4.69 2.50 581.5 187.5 163.0 160.5
8 13.40 5.66 2339.2 943.2 1027.1 916.3
10 24.02 7.50 14826.9 2064.3 870 844.0
Table 1: Face clustering error (%) and time(second) cost on Extended Yale B database.
Class LRR SP-RPRG SP-RPRG(ALM)
ave err std ave err std time(CG) time(FOA) ave err std time(CG) time(FOA)
2 4.08 11.49 2.75 8.11 0.8 0.4 1.94 6.91 0.8 0.5
3 6.89 12.81 4.57 8.00 3.1 1.3 2.65 7.61 3.6 1.1
4 10.92 13.07 8.46 9.95 16.8 4.0 4.79 7.23 18.6 7.6
5 13.10 13.58 9.69 9.57 31.9 10.9 6.09 6.98 31.8 12.3
6 19.67 13.12 11.98 9.40 35.1 26.2 6.26 7.03 62.4 39.7
7 24.44 12.25 14.11 9.53 63.5 47.3 8.40 8.19 73.1 63.7
8 25.02 11.57 14.87 7.43 91.6 73.4 9.15 6.68 106.7 92.1
9 30.88 9.40 15.86 5.79 125.0 111.0 11.74 6.15 132.6 126.2
10 31.44 9.93 20.96 7.43 171.5 165.2 13.57 7.18 185.0 149.2
11 33.69 8.56 20.60 6.27 260.1 250.8 15.23 6.76 270.0 245.9
Table 2: The object clustering average error (%), standard deviation, and average time (second) cost on COIL-20.

In the clustering experiments on the COIL-20 database, we set the initialization parameters of SP-RPRG(ALM), LRR and SP-RPRG to fixed values. In each experiment, we randomly select a number of classes (ranging from 2 to 11) from the 20 classes and 36 samples for each class from the whole COIL-20 database. For each number of classes, we run 50 trials using different randomly chosen classes and samples. The results are shown in Table 2.

From this table we can see that our proposed SP-RPRG(ALM) has the lowest average clustering error, and that FOA applied to SP-RPRG and SP-RPRG(ALM) spends less time than the conjugate gradient method to achieve the same accuracy. These results demonstrate that SP-RPRG(ALM) significantly enhances the accuracy and that FOA reduces the time cost during the affinity matrix construction.

5.3 Clustering on Video Scene Database

This experiment tests the performance of FOA based on model (29) on a video scene database. The compared optimization methods and performance measures are the same as in the previous experiments.

The aim of this experiment is to segment individual scenes from a video sequence. The video sequences are drawn from two short animations freely available from the Internet Archive 333http://archive.org/, the same data as used in TierneyGaoGuo2014 . Each sequence is around 10 seconds in length (approximately 300 frames) and contains three scenes. Scenes are segmented according to significant translation and morphing of objects and, in some cases, camera or perspective changes; as a result, there are 19 and 24 clips for Video 1 and Video 2 respectively, and each clip includes 3 scenes. Scene changes (or keyframes) were collected and labeled by hand to form the ground truth. The pre-processing of a sequence consists of converting the colour video to grayscale and downsampling it to a low resolution. The video clips were corrupted with various magnitudes of Gaussian noise to evaluate the clustering performance of the different methods. Fig. 7 shows some frames from Video 2.

Each frame of a clip is vectorized and concatenated with consecutive frames to form the samples for segmentation. The model parameters are set to fixed values for SP-RPRG(ALM), LRR and SP-RPRG.

Gaussian noise Video 1
LRR SP-RPRG SP-RPRG(ALM)
error error time(CG) time(FOA) error time(CG) time(FOA)
0% min 0.00(7) 0.00(11) 0.00(12)
max 48.28 29.33 25.59
med 22.73 0.00 0.00
mean 18.82 9.11 157.1 113.8 5.49 49.4 21.9
std 17.63 13.40 9.42
20% min 0.00(6) 0.00(11) 0.00(12)
max 58.33 37.93 31.03
med 27.16 0.00 0.00
mean 20.18 9.38 155.8 89.9 7.55 50.5 18.7
std 17.36 13.51 11.18
50% min 0.00 (6) 0.00(12) 0.00(18)
max 41.98 25.86 40.69
med 13.63 4.85 0.00
mean 16.29 12.77 116.2 82.2 6.73 51.9 16.4
std 15.84 15.34 9.98
80% min 0.00(5) 0.00(9) 0.00(10)
max 52.64 53.41 26.67
med 20.36 2.20 0.00
mean 20.28 16.39 107.6 91.7 7.83 56.8 18.6
std 18.80 20.00 10.34
Table 3: Average time(second) cost and clustering error rates (%) for the corrupted video 1 with 0, 20%, 50% and 80% Gaussian noise, lower is better. Numbers in brackets indicate how many times clustering was perfect, i.e. zero error.
Gaussian noise Video 2
LRR SP-RPRG SP-RPRG(ALM)
error error time(CG) time(FOA) error time(CG) time(FOA)
0% min 0.00(22) 0.00(22) 0.00(22)
max 28.29 16.33 3.98
med 0.00 0.00 0.00
mean 1.43 1.20 56.3 41.6 0.19 17.7 17.8
std 5.85 4.10 0.82
20% min 0.00(14) 0.00(22) 0.00(22)
max 57.74 0.59 0.59
med 0.00 0.00 0.00
mean 12.07 0.04 51.5 39.4 0.04 17.6 17.3
std 16.42 0.14 0.14
50% min 1.78 0(21) 0(22)
max 0.59 0.59 46.33
med 0.00 0.00 0.00
mean 4.99 0.06 45.9 37.2 0.04 17.7 13.3
std 11.80 0.16 0.14
80% min 0.00(15) 0.00(18) 0.00(22)
max 54.83 22.58 0.59
med 0.00 0.00 0.00
mean 9.62 3.16 40.1 34.8 0.004 17.8 10.6
std 16.94 7.07 0.14
Table 4: Average time(second) cost and clustering error rates (%) for the corrupted video 2 with 0, 20%, 50% and 80% Gaussian noise, lower is better. Numbers in brackets indicate how many times clustering was perfect, i.e. zero error.

Tables 3 and 4 list the segmentation results and the average time costs of our FOA and the conjugate gradient (CG) method for SP-RPRG and SP-RPRG(ALM). These results are averaged over all clips of each video. We can see that SP-RPRG(ALM) has consistently low average error rates and standard deviations under various magnitudes of Gaussian noise. In particular, SP-RPRG(ALM) is barely affected by the Gaussian noise on Video 2. In most cases, FOA converges faster than CG for both SP-RPRG and SP-RPRG(ALM).

6 Conclusions

This paper proposes a fast optimization algorithm (FOA) on Riemannian manifolds for a class of composite functions and proves its convergence. Unlike most optimization methods on Riemannian manifolds, our algorithm uses only first-order function information, yet achieves the convergence rate O(1/k^2). Experiments on several data sets show that its optimization performance largely outperforms existing methods in terms of convergence rate and accuracy. In addition, we transform the low-rank representation model into an optimization problem on the variety of low-rank matrices based on the augmented Lagrange approach, and apply the fast subspace pursuit method based on FOA to solve it. Extensive experimental results demonstrate the superiority of the proposed ALM with fast subspace pursuit.

Acknowledgements.
The research project is supported by the Australian Research Council (ARC) through the grant DP140102270 and also partially supported by National Natural Science Foundation of China under Grant No. 61390510, 61133003, 61370119, 61171169 and 61227004.

References

  • (1) Absil, P.A., Baker, C., Gallivan, K.A., Trust-region methods on riemannian manifolds, Foundations of Computational Mathematics, 7(3), 303-330 (2007)
  • (2) Absil, P.A., Mahony, R., Sepulchre, R., Optimization Algorithm on Matrix Manifolds, Princeton University Press, (2008)
  • (3) Beck, A., Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2(1), 183-202 (2009)
  • (4) Bennett, J., Elkan, C., Liu, B., Smyth, P., Tikk, D., Kdd cup and workshop 2007, ACM SIGKDD Explorations Newsletter, 9(2), 51-52 (2007)
  • (5) Boumal, N., Absil, P.A., RTRMC: A riemannian trust-region method for low-rank matrix completion, Advances in Neural Information Processing Systems, 406-414 (2011)
  • (6) Gabay, D., Minimizing a differentiable function over a differential manifold, Journal of Optimization Theory and Applications, 37(2), 177-219 (1982)
  • (7) Harandi, M., Hartley, R., Shen, C., Lovell, B., Sanderson, C., Extrinsic methods for coding and dictionary learning on Grassmann manifolds, arXiv:1401.8126, 1-41 (2014)
  • (8) Huang, W., Absil, P.A., Gallivan, K.A., A riemannian symmetric rank-one trust-region method, Mathematical Programming, 150(2), 179-216 (2015)
  • (9) Ji, S., Ye, J., An accelerated gradient method for trace norm minimization, Proceedings of the 26th Annual International Conference on Machine Learning, 457-464 (2009)
  • (10) Lee, J.M., Introduction to smooth manifolds, Springer New York (2003)
  • (11) Lin, Z., Liu, R., Su, Z., Linearized alternating direction method with adaptive penalty for low-rank representation, Advances in Neural Information Processing Systems, 612-620 (2011)
  • (12) Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y., Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 171-184 (2013)
  • (13) Liu, G., Lin, Z., Yu, Y., Robust subspace segmentation by low-rank representation, Proceedings of the 27th International Conference on Machine Learning, 663-670 (2010)
  • (14) Mishra, B., Apuroop, K., Sepulchre, R., A riemannian geometry for low-rank matrix completion, arXiv preprint arXiv:1211.1550 (2012)
  • (15) Mishra, B., Meyer, G., Bach, F., Sepulchre, R., Low-rank optimization with trace norm penalty, SIAM Journal on Optimization, 23(4), 2124-2149 (2013)
  • (16) Mishra, B., Meyer, G., Bonnabel, S., Sepulchre, R., Fixed-rank matrix factorizations and riemannian low-rank optimization, Computational Statistics, 29(3), 591-621 (2014)
  • (17) Nesterov, Y., A method for solving the convex programming problem with convergence rate O(1/k2), Soviet Mathematics Doklady, vol. 27, 372-376 (1983)
  • (18) Nesterov, Y., Gradient methods for minimizing composite functions, Mathematical Programming, 140(1), 125-161 (2013)
  • (19) Ngo, T., Saad, Y., Scaled gradients on grassmann manifolds for matrix completion, Advances in Neural Information Processing Systems, 1412-1420 (2012)
  • (20) Recht, B., Fazel, M., Parrilo, P.A., Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52(3), 471-501 (2010)
  • (21) Ring, W., Wirth, B., Optimization methods on riemannian manifolds and their application to shape space, SIAM Journal on Optimization, 22(2), 596-627 (2012)
  • (22) Schneider, R., Uschmajew, A., Convergence results for projected line-search methods on varieties of low-rank matrices via lojasiewicz inequality, SIAM Journal on Optimization, 25(1), 622-646 (2015)
  • (23) Smith, S.T., Geometric optimization methods for adaptive filtering, Division of applied sciences, Harvard University, Cambridge, MA (1993)
  • (24) Su, J., Srivastava, A., de Souza, F., Sarkar, S., Rate-invariant analysis of trajectories on riemannian manifolds with application in visual speech recognition, IEEE Conference on Computer Vision and Pattern Recognition, 620-627 (2014)

  • (25) Sun, Y., Gao, J., Hong, X., Mishra, B., Yin, B., Heterogeneous tensor decomposition for clustering via manifold optimization, IEEE Transactions on Pattern Analysis and Machine Intelligence, Accepted
  • (26) Tan, M., Shi, J., Gao, J., Hengel, A., Xu, D., Xiao, S., Scalable nuclear-norm minimization by subspace pursuit proximal riemannian gradient, arXiv preprint arXiv:1503.02828 (2015)
  • (27) Tan, M., Tsang, I., Wang, L., Vandereycken, B., Pan, S., Riemannian pursuit for big matrix recovery, Proceedings of the 31st International Conference on Machine Learning, 1539-1547 (2014)
  • (28) Tierney, S., Gao, J., Guo, Y., Subspace clustering for sequential data, IEEE Conference on Computer Vision and Pattern Recognition, 1019-1026 (2014)
  • (29) Toh, K., Yun, S., An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pacific Journal of Optimization 6, 615-640 (2010)
  • (30) Turaga, P., Veeraraghavan, A., Srivastava, A., Chellappa, R., Statistical computations on grassmann and stiefel manifolds for image and video-based recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11), 2273-2286 (2011)
  • (31) Udriste, C., Convex functions and optimization methods on Riemannian manifolds, Springer Science & Business Media (1994)
  • (32) Uschmajew, A., Vandereycken,B., Line-search methods and rank increase on low-rank matrix varieties, Proceedings of the 2014 International Symposium on Nonlinear Theory and its Applications, 52-55 (2014)
  • (33) Vandereycken, B., Low-rank matrix completion by riemannian optimization, SIAM Journal on Optimization, 23(2), 1214-1236 (2013)
  • (34) Wang, R., Shan, S., Chen, X., Gao, W., Manifold-manifold distance with application to face recognition based on image set, IEEE Conference on Computer Vision and Pattern Recognition, 1-8 (2008)

Appendix A

The requirement that the vector transport is isometric means that it preserves the Riemannian inner product of transported tangent vectors. To prove Theorem 3.1, we first present the following lemmas.

Lemma 3

Let be positive real sequences satisfying

(32)

with . Then for every .

Proof

Let , we can obtain from (32)

i.e. , so is a decreasing sequence. Then

Since is positive sequences, we can conclude

Lemma 4

Let positive sequence generated in Algorithm 1 via (22) with . Then for all .

Proof

1) When , .
2) Suppose that when , the conclusion is correct. Then, while