1 Introduction
Recent years have witnessed growing attention to optimization on Riemannian manifolds. Since Riemannian optimization works directly on curved manifolds, one can eliminate constraints such as orthogonality and obtain an unconstrained optimization problem that, by construction, uses only feasible points. This allows one to incorporate Riemannian geometry into the resulting optimization problems, producing more accurate numerical results. Riemannian optimization has been successfully applied to machine learning, computer vision and data mining tasks, including fixed low-rank optimization BoumalAbsil2011 ; Vandereycken2013 , Riemannian dictionary learning HarandiHartleyShenLovellSanderson2014 , computer vision SuSrivastavaSouzaSarkar2014 ; WangShanChenGao2008 ; TuragaVeeraraghavanSrivastavaChellappa2011 , and tensor clustering SunGaoHongMishraYin . A constrained optimization problem on a Euclidean space often has a much larger dimension than its counterpart on the manifold implicitly defined by the constraints. An optimization algorithm on a manifold therefore has lower complexity and better numerical properties. Methods for solving minimization problems on Riemannian manifolds have been extensively researched Udriste1994 ; AbsilMahonySepulchre2008 . In fact, traditional optimization methods in Euclidean space, such as the steepest descent method, the conjugate gradient method and Newton's method, can be readily adapted to optimization on Riemannian manifolds; a solid conceptual framework was built over the last decades, and generic algorithm implementations are openly available, see http://www.manopt.org.
As one of the fundamental optimization algorithms, the steepest descent method was first proposed for optimization on Riemannian manifolds in Gabay1982 . The algorithm is computationally simple; however, its convergence is very slow, particularly for the large-scale, complicated optimization problems in modern machine learning. In contrast, Newton's method Gabay1982 and the BFGS quasi-Newton scheme (BFGS rank-2 update) RingWirth2012 have higher convergence rates; however, in practical applications, using the full second-order Hessian information is computationally prohibitive.
In order to obtain a method with a high convergence rate that avoids computing the inverse Hessian, Absil et al. AbsilBakerGallivan2007 proposed the trust-region method on Riemannian manifolds. For example, the trust-region method has been applied to an optimization problem on the Grassmann manifold for matrix completion BoumalAbsil2011 . Each iteration of the trust-region method involves solving the Riemannian Newton equation Smith1993 , which increases the complexity of the algorithm. Huang et al. HuangAbsilGallivan2015 generalized the symmetric rank-one trust-region method to vector-variable optimization problems on a Riemannian manifold, where an approximate Hessian is generated by the symmetric rank-one update without solving the Riemannian Newton equation. This method converges superlinearly. However, the symmetric rank-one updates are generated only from vectors rather than matrices, so the algorithm is not applicable when the variable is a matrix, which is a significant barrier in practical applications. Generally, optimization methods that use second-order function information converge faster than those using only first-order information, but at a greatly increased computational cost. The Fletcher-Reeves conjugate gradient method on Riemannian manifolds RingWirth2012 is a typical method that needs only first-order function information; its convergence is superlinear but below the desired second-order speed.
For the gradient method, it is known that the sequence of function values converges to the optimal function value at a rate no worse than O(1/k), which is a sublinear rate. For optimization on Euclidean spaces, researchers have sought acceleration strategies to speed up optimization algorithms that require only first-order function information, e.g. based on line search Nesterov1983 . More strategies have been proposed by exploiting special structures of the objective, such as composite objective functions BeckTeboulle2009 ; TohYun2010 ; Nesterov2013 . In these strategies, the algorithms need only first-order function information while guaranteeing the improved O(1/k^2) convergence rate. In this paper we extend this acceleration to Algorithm 1 for the general model (13) on Riemannian manifolds and establish improved complexity and convergence results, see Theorem 3.1 in Section 3.
As a practical application of the proposed fast gradient-like first-order algorithm, we further consider learning problems on low-rank manifolds. Low-rank learning seeks the lowest-rank representation among all candidates that retains most of the information in the data samples. Low-rank learning has been widely applied in low-rank matrix recovery (MR)/low-rank matrix completion MishraApuroopSepulchre2012 ; MishraMeyerBachSepulchre2013 ; MishraMeyerBonnabelSepulchre2014 and low-rank representation (LRR) for subspace clustering LiuLinYu2010 ; LinLiuSu2011 ; LiuLinYanSunYuMa2013 . One example application of low-rank matrix completion is the Netflix problem BennettElkanLiuSmythTikk2007 , in which one would like to recover a low-rank matrix from a sparse sample of its entries.
However, lowest-rank optimization problems are NP-hard and generally extremely hard to solve (and also hard to approximate RechtFazelParrilo2010 ). Typical approaches either transform the lowest-rank problem into nuclear norm minimization via an effective convex relaxation, or parametrize the fixed-rank matrix variables through a matrix decomposition.
As the set of fixed-rank matrices is a smooth Riemannian manifold under an appropriately chosen metric AbsilMahonySepulchre2008 ; MishraApuroopSepulchre2012 ; Vandereycken2013 ; MishraMeyerBonnabelSepulchre2014 , one can convert a rank-constrained optimization problem into an unconstrained optimization problem on the low-rank matrix manifold. Vandereycken Vandereycken2013 treated this problem as an optimization problem on the fixed-rank matrix Riemannian manifold. Considering different types of matrix decomposition, Mishra et al. MishraApuroopSepulchre2012 ; MishraMeyerBonnabelSepulchre2014 proposed Riemannian quotient manifolds for low-rank matrix completion. Ngo et al. NgoSaad2012 addressed this problem based on a scaled metric on the Grassmann manifold.
The aforementioned works transform/simplify a low-rank matrix completion problem into a fixed-rank optimization. However, low-rank optimization for matrix completion is not equivalent to fixed-rank optimization. In order to exploit fixed-rank optimization for low-rank optimization, Mishra et al. MishraMeyerBachSepulchre2013 proposed a pursuit algorithm that alternates between fixed-rank optimization and rank-one updates to search for the best rank value. Tan et al. TanTsangWangVandereyckenPan2014 proposed a Riemannian pursuit approach that converts the low-rank problem into a series of fixed-rank problems, and further showed that the low-rank problem can be treated as optimization over varieties of low-rank matrices TanShiGaoHengelXuXiao2015 , see (7), using a subspace pursuit approach on the Riemannian manifold to search for the desired rank value. Since the variety is the closure of the fixed-rank Riemannian submanifold, see (4), an iterative algorithm over the variety is more safeguarded than one on the fixed-rank open manifold. We abuse the term "Riemannian" for simplicity. In this paper, we consider low-rank learning as optimization problems on varieties of low-rank matrices.
The contributions of this paper are:

We develop a fast optimization algorithm (FOA) for the model (13) on general Riemannian manifolds. This algorithm uses only first-order function information, yet achieves the convergence rate O(1/k^2).

We provide a theoretical analysis of the proposed FOA and prove its O(1/k^2) convergence rate.

We propose a fast subspace pursuit approach (SPRPRG(ALM)) on low-rank matrix varieties, based on the augmented Lagrangian method, for the model (27).
The rest of this paper is organized as follows. Section 2 introduces the necessary notation and concepts. In Section 3, we propose the fast optimization algorithm on Riemannian manifolds and discuss its convergence in detail. In Section 4, we extend the algorithm to low-rank representation with noise on low-rank matrix varieties; moreover, we propose the fast subspace pursuit method (SPRPRG(ALM)) based on the augmented Lagrangian approach and FOA on low-rank matrix varieties, and apply SPRPRG(ALM) to this model. Experimental results evaluating the performance of our algorithms are presented in Section 5. Finally, conclusions are summarized in Section 6.
2 Notations and Preliminaries
This section will briefly describe some notations and concepts that will be used throughout this paper.
2.1 Notations
We denote matrices by boldface capital letters, vectors by boldface lowercase letters, and scalars by plain letters. The norm of a vector is denoted by . is defined as the number of nonzero elements in the vector v. denotes the sum of the norms of its columns. The operator means the vector resulting from the componentwise maximum of u and v. The inner product of two matrices A and B of the same size is defined as . The superscript T denotes the transpose of a vector/matrix. is the diagonal matrix whose diagonal elements are the components of the vector . The SVD of is denoted as , where is the th singular value of X. The nuclear norm of X is defined as . For any convex function , its subdifferential at X is denoted by .
2.2 Riemannian Manifolds
A manifold of dimension Lee2003 is a topological space that locally resembles a Euclidean space in a neighbourhood of each point. For example, lines and circles are manifolds, as are surfaces such as a plane, a sphere, and a torus. Geometrically, a tangent vector is a vector tangent to the manifold at a given point. Abstractly, a tangent vector is a functional, defined on the set of all smooth functions over the manifold, that satisfies the Leibniz differential rule. All possible tangent vectors at a point constitute a Euclidean space, called the tangent space of the manifold at that point. If we have a smoothly defined metric (inner product) across the tangent spaces at every point, then the manifold is called a Riemannian manifold.
The tangent bundle is defined as the disjoint union of all tangent spaces
(1) 
For a Riemannian manifold, the Riemannian gradient of a smooth function at is defined as the unique tangent vector in such that for all , where denotes the (Euclidean) directional derivative. If is embedded in a Euclidean space, then is given by the projection of onto the tangent space .
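As a concrete illustration of this projection view, the following sketch (a hypothetical example with our own variable names, not part of the paper) computes the Riemannian gradient on the unit sphere, where the tangent space at x consists of all vectors orthogonal to x:

```python
import numpy as np

# On the unit sphere S^{n-1} embedded in R^n, the tangent space at x is
# T_x = {v : x^T v = 0}. The Riemannian gradient of f at x is therefore
# the orthogonal projection (I - x x^T) of the Euclidean gradient onto T_x.

def riemannian_grad_sphere(x, egrad):
    """Project the Euclidean gradient egrad onto the tangent space at x."""
    return egrad - np.dot(x, egrad) * x

x = np.array([1.0, 0.0, 0.0])          # a point on the sphere
egrad = np.array([0.5, 1.0, -2.0])     # Euclidean gradient of some f at x
rgrad = riemannian_grad_sphere(x, egrad)
# rgrad is tangent at x: x^T rgrad = 0
```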
With a globally defined differential structure, a manifold becomes a differentiable manifold. A geodesic is a smooth curve with a vanishing covariant derivative of its tangent vector field; in particular, the Riemannian distance between two points is the length of the shortest smooth path connecting them on the manifold, that is, the infimum of the lengths of all paths joining them.
There are predominantly two operations for computations on a Riemannian manifold, namely (1) the exponential map at point , denoted by , and (2) the logarithmic map, at point , . The former projects a tangent vector in the tangent space onto the manifold, the latter does the reverse. Locally both mappings are diffeomorphic. Note that these maps depend on the manifold point at which the tangent spaces are computed.
Given two points that are close to each other, the distance between them can be calculated through the following formula, as a norm in the tangent space.
(2) 
The squared distance function is a smooth function. In this setting, we define a retraction and a lifting along the following lines. Given a point and a tangent vector, the retraction operator is computed by (1) moving along the tangent vector to obtain a point in the linear embedding space, and (2) projecting that point back onto the manifold. The lifting operator is computed by projecting a point of the manifold onto the tangent space. From another point of view, the lifting can be perceived as a direction from one point to another on the Riemannian manifold.
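The move-then-project construction can be sketched on the unit sphere. This is an illustrative example under the assumption that the manifold is the sphere embedded in R^n; the function names are our own:

```python
import numpy as np

def retract_sphere(x, v):
    """Retraction: move along v in the embedding space, then project
    back onto the sphere by normalizing."""
    y = x + v
    return y / np.linalg.norm(y)

def lift_sphere(x, y):
    """Approximate lifting: project y - x onto the tangent space at x."""
    d = y - x
    return d - np.dot(x, d) * x

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.6, 0.8])   # a tangent vector at x (x^T v = 0)
y = retract_sphere(x, v)        # lands back on the sphere
w = lift_sphere(x, y)           # a direction from x toward y in T_x
```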
Finally, we need to define vector transport on .
(3) 
2.3 The Riemannian Manifold of Fixedrank Matrices
Let
(4) 
It can be proved that is a Riemannian manifold of dimension Vandereycken2013 . By the SVD, it has the following equivalent expression:
where is the Stiefel manifold of real, orthonormal matrices, and the entries in are arranged in descending order. Furthermore, the tangent space at is given by
(5) 
Since is embedded in , the Riemannian gradient of is given as the orthogonal projection of the Euclidean gradient of onto the tangent space. Here, we denote the orthogonal projection of any onto the tangent space at as
(6) 
where and , for any .
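As a numerical sanity check of this projection, the following sketch implements a standard tangent-space projection at a fixed-rank point X = U S V^T, assuming the common formula P(Z) = P_U Z + Z P_V - P_U Z P_V with P_U = U U^T and P_V = V V^T (consistent in spirit with (6); the symbols here are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 6, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # a rank-r point
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, Vt = U[:, :r], Vt[:r, :]

def proj_tangent(Z):
    """Orthogonal projection of Z onto the tangent space at X = U S V^T."""
    PU, PV = U @ U.T, Vt.T @ Vt
    return PU @ Z + Z @ PV - PU @ Z @ PV

Z = rng.standard_normal((m, n))
PZ = proj_tangent(Z)
# the map is a projection (idempotent), and X lies in its own tangent space
```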
2.4 Varieties of Lowrank Matrices
Define
(7) 
then is the closure of the set of matrices of constant rank . At a singular point, where the rank drops below the bound, we must use search directions in the tangent cone (instead of the tangent space). The tangent cones of are explicitly known SchneiderUschmajew2015 ,
(8) 
where and , , .
Projection operator , projecting an element onto the tangent cone , can be calculated as
(9) 
where
Note that is the best low-rank approximation of . Moreover, the projection can be computed efficiently, with the same complexity as the corresponding projection on the fixed-rank manifold TanShiGaoHengelXuXiao2015 .
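The best low-rank approximation used here is the truncated SVD (by the Eckart-Young theorem). A minimal sketch, with all variable names our own:

```python
import numpy as np

def truncated_svd(Y, k):
    """Best rank-k approximation of Y in the Frobenius norm (Eckart-Young):
    keep the k largest singular values and corresponding singular vectors."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
Y = rng.standard_normal((10, 7))
Y2 = truncated_svd(Y, 2)
# the approximation error equals the norm of the discarded singular values
```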
The gradient of function can be calculated by the following formula
(10) 
where denotes the Euclidean gradient of .
Another ingredient we need is the so-called retraction operator, which maps a tangent vector in the tangent cone at a given point back onto , and is defined as
(11) 
Next, we introduce the inverse of the retraction operator, called the lifting operator, which maps a point on the manifold back to the tangent cone. When the two points are close to each other, we can compute the lifting approximately by
(12) 
Also, the lifting can be viewed as a direction from one point to another on the variety.
3 The Fast Optimization Algorithm on Riemannian manifold
We consider the following general model (13) on a Riemannian manifold, which naturally extends the problem formulation in BeckTeboulle2009
(13) 
where is a Riemannian manifold. We make the following assumptions throughout the section:

is a continuous convex function which is possibly nonsmooth.

is a smooth convex function of the type ; there exists a finite real number such that , where is the largest singular value of the Hessian matrix of the function .
It is usually difficult to solve (13) directly, particularly in applications such as matrix completion and low-rank representation, and introducing auxiliary variables often requires expensive matrix inversions. Here we adopt the following proximal Riemannian gradient (PRG) method TanShiGaoHengelXuXiao2015 to update the iterate. For any , we consider the following quadratic approximation of at a given point ,
(15) 
Note the above local model is different from that on vector spaces TohYun2010 ; Nesterov2013 . We assume that the local model admits a unique minimizer
(16) 
Simple manipulation gives that
(17) 
Taking in (17), we define the local optimal point of (17) as ,
(18) 
where plays the role of a step size. Moreover, needs to satisfy the inequality
(19) 
Note that the sequence of function values produced by (18) is nonincreasing. Indeed, for every
(20) 
Convergence speed is a major concern for any optimization algorithm. Line search on a Riemannian manifold has linear convergence; this was shown in AbsilMahonySepulchre2008 . In order to obtain a line-search method with the faster O(1/k^2) rate, we propose a fast optimization algorithm (FOA) that extends the acceleration methods in BeckTeboulle2009 ; JiYe2009 ; TohYun2010 to Riemannian manifolds. The acceleration methods in Euclidean space employ a specific linear combination of the previous two points. However, this linear operation is an obstacle on a Riemannian manifold, since a Riemannian manifold need not be a linear space, so we must use appropriate tools on the Riemannian manifold to overcome this difficulty.
According to (19) and (20), we obtain a convergent sequence whose corresponding function value sequence is monotonically decreasing. The lifting vector (see Section 2.2) is an ascent direction, and its negative is a descent direction. We start at the current point and move a specific step size along the descent direction in the tangent space, then retract the result onto the manifold to obtain the new auxiliary variable. This is the acceleration part of the fast optimization algorithm. Next, the new iterate is generated from this auxiliary variable, i.e., by replacing the previous iterate with the auxiliary variable in (18) and (19); this is the difference between FOA and the line-search algorithm. In this way, we generate an iterative sequence with the convergence rate O(1/k^2).
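In the Euclidean special case, where the retraction and lifting reduce to the identity, the acceleration step described above coincides with the FISTA scheme of BeckTeboulle2009 . The following sketch illustrates that flat-space special case on a small l1-regularized least-squares problem; the problem instance and all parameter values are illustrative assumptions:

```python
import numpy as np

# Euclidean special case of the accelerated scheme, applied to
#   F(x) = 0.5 * ||A x - b||^2 + lam * ||x||_1.

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista(A, b, lam, n_iter=200):
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)  # proximal gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum combination
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true                          # noiseless observations
x_hat = fista(A, b, lam=0.01)
```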
Stopping Conditions I. The iteration with backtracking can be terminated if one of the following conditions is satisfied:

Condition 1: ;

Condition 2: ;

Condition 3: Number of iterations .
where and denote tolerance values, and denotes a given positive integer.
The FOA is summarized in Algorithm 1.
(21) 
(22) 
Now we are in a position to present the major conclusion of the paper.
Lemma 1 (Optimality Condition)
A point is a local minimizer of (13) if and only if there exists such that AbsilMahonySepulchre2008
where .
Lemma 2
Let and be a descent direction. Then there exists an that satisfies the condition in (21).
Proof
By the smoothness of the function, when the trial point is sufficiently close to the current point, we obtain the condition in (21); i.e., (21) holds when the step size is sufficiently small.
Theorem 3.1
Let be generated by Algorithm 1, then for any
The proof of Theorem 3.1 is given in Appendix A.
When noise exists in the data, the model (13) can be modified into the following form:
(23) 
where the properties of the functions are the same as in (13).
In order to solve problem (23), following LinLiuSu2011 , we optimize the two variables by an alternating direction approach. This approach decomposes the minimization into two subproblems that minimize with respect to each variable in turn. The iterations of the subproblems are as follows,
(24) 
(25) 
While solving (24), the other variable is held fixed. We consider the following robust proximal Riemannian gradient (RPRG) method at a given point on the manifold.
Define
(26) 
where the point is generated from the previous point in the same way as in Algorithm 1.
In most cases, subproblem (25) has a closed-form solution.
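For instance, when the subproblem takes the form min_E tau*||E||_* + 0.5*||E - M||_F^2 (one common instance; whether (25) has exactly this form depends on the chosen regularizer), the closed-form solution is singular value thresholding:

```python
import numpy as np

def svt(M, tau):
    """Proximal operator of tau*||.||_*: soft-threshold the singular
    values of M and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# rank-one example: M has the single singular value 5, so thresholding
# at tau = 1 rescales M by (5 - 1)/5 = 0.8
M = np.outer(np.array([3.0, 4.0]), np.array([1.0, 0.0, 0.0]))
E = svt(M, 1.0)
```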
Similar to Algorithm 1, we propose the following stopping conditions II:

Condition 1: and ;

Condition 2: Number of iterations .

Condition 3: The number of alternating direction iterations .
where is a threshold for the change in the solutions, is a threshold for the error constraint, and and are positive integers.
By Theorem 3.1, it is easy to derive the following theorem, which guarantees the convergence of Algorithm 2.
Theorem 3.2
Let , and be an infinite sequence generated by Algorithm 2, then it follows that , converges to a limit point .
4 The application of FOA to low-rank representation on low-rank matrix varieties
The low-rank representation model has been widely studied in recent years. In this section, we apply Algorithm 2 to the low-rank representation model on varieties. The model is defined as follows:
(27) 
where denotes the nuclear norm of the matrix, is a regularization parameter, is the data matrix, is a regularizer (see Section 2.1), and denotes the low-rank matrix variety of rank at most .
In order to solve model (27), we generalize the augmented Lagrangian method (ALM) from Euclidean space to optimization on low-rank matrix varieties. The augmented Lagrangian function is as follows
(28) 
where is the Lagrange multiplier, is the inner product, and , are the penalty parameters. The augmented Lagrangian function consists of two parts:

The first part is a smooth and differentiable function

The second part is a continuous nonsmooth function .
Hence, model (27) is equivalent to the following optimization problem
(29) 
Since the variety is the closure of the fixed-rank Riemannian submanifold, it is guaranteed that (29) has an optimal solution. Although the set is not smooth, it has been shown in SchneiderUschmajew2015 that its tangent cone at singular points has a rather simple characterization and can be calculated easily (see Section 2.4). Hence the model (29) can be optimized directly on the variety UschmajewVandereycken2014 as in Section 3.
Subspace pursuit is an elegant optimization approach on Riemannian manifolds. In order to better optimize model (29), we propose a fast subspace pursuit method, denoted SPRPRG(ALM), based on FOA. In the fast subspace pursuit method, after reaching a local optimum of problem (29) on the current variety, we use that local optimum as the initial point to warm-start the subsequent problem on the next low-rank matrix variety. Some related ingredients on low-rank matrix varieties are considered as follows (see Section 2.4):
For the concrete function in (29), we solve the problem
According to (3), we can obtain
(30) 
where (see TanShiGaoHengelXuXiao2015 ), and can be efficiently computed in the sense that it can be obtained cheaply without expensive SVDs Vandereycken2013 .
Next, we directly update the multiplier for (29) by fixing the other variables; we have
The ALM with fast subspace pursuit is described in detail in Algorithm 3. Stopping condition III: in theory, we stop Algorithm 3 if .
5 Experimental Results and Analysis
In this section, we conduct several experiments to assess the proposed algorithms. We focus on the low-rank matrix completion problem on synthetic data and on clustering on some public data sets, evaluating the performance of FOA in terms of convergence rate and error rate. All algorithms are coded in Matlab (R2014a) and run on a PC with a 64-bit operating system, an Intel(R) Core(TM) i7 CPU (3.4GHz, single-thread mode) and 4GB memory.
5.1 Matrix Completion
Let be a matrix whose entries are known only on a subset of the complete set of indices . The low-rank matrix completion problem consists of finding the matrix of lowest rank that agrees with on . This problem can be considered as optimization of the objective function , where denotes the projection operator onto : if , and otherwise.
Following Vandereycken2013 , we generate ground-truth low-rank matrices of rank , where is a rank-r matrix generated randomly. The size of the sampling set is OS , with oversampling factor OS .
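A sketch of this synthetic setup (the variable names, seed, and concrete sizes are our own assumptions), together with the masked objective and its Euclidean gradient:

```python
import numpy as np

# Random rank-r ground truth A = L R^T and a sampling set Omega whose size
# is OS times the dimension r*(m + n - r) of the rank-r manifold.
rng = np.random.default_rng(3)
m, n, r, OS = 100, 100, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
k = OS * r * (m + n - r)                  # number of observed entries
idx = rng.choice(m * n, size=k, replace=False)
mask = np.zeros(m * n, dtype=bool)
mask[idx] = True
mask = mask.reshape(m, n)                 # indicator of Omega

def f(X):
    """Objective: half the squared error on the observed entries only."""
    return 0.5 * np.linalg.norm(mask * (X - A)) ** 2

def egrad(X):
    """Euclidean gradient of f: the residual restricted to Omega."""
    return mask * (X - A)
```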
We test FOA on the matrix completion task and compare its optimization performance against qGeomMC MishraMeyerBachSepulchre2013 , LRGeomCG Vandereycken2013 , and LRGeomSD (steepest descent) on the fixed-rank Riemannian manifold.^{1}^{1}1qGeomMC and LRGeomCG are from http://bamdevmishra.com/codes/fixedranklist/ and http://www.unige.ch/math/vandereycken/matrix_completion.html, respectively. Fig. 1 and Fig. 3 give the number of iterations needed to satisfy the stopping condition for different matrix sizes; Fig. 2 and Fig. 4 give the corresponding time costs. We can see from these figures that qGeomMC needs the fewest iterations, with our method second, while our algorithm is the fastest in time. Note, however, that qGeomMC employs the trust-region method, which uses second-order function information. The proposed FOA therefore has an advantage in convergence speed over existing first-order optimization algorithms on Riemannian manifolds.
5.2 Clustering on Extended Yale B and COIL20 Database
In this subsection, we apply FOA to model (29) to evaluate its performance in computing time and clustering error, compared with existing optimization methods on Riemannian manifolds. The compared algorithms include LRR LiuLinYanSunYuMa2013 (solving model (27) in Euclidean space), SPRPRG TanShiGaoHengelXuXiao2015 (solving model (27) on low-rank matrix varieties) and SPRPRG(ALM) (solving model (27) via the augmented Lagrangian method on low-rank matrix varieties). For SPRPRG and SPRPRG(ALM), we obtain the optimal solutions by the conjugate gradient method and by the FOA method, respectively, and compare their time costs. We set , , for model (27).
Two public databases are used for evaluation. One is the extended Yale B face database, which contains 2,414 frontal face images of 38 individuals. The images were captured under different poses and illumination conditions. Fig. 5 shows some sample images. To reduce the computational cost and the memory requirements, all face images are resized and then vectorized as 2016-dimensional vectors.
The other database is the Columbia Object Image Library (COIL20), which includes 1,440 grayscale images of 20 objects (72 images per object). The objects have a wide variety of geometric and reflectance characteristics. Each image is clipped out from the black background using a rectangular bounding box, resized, and then shaped as a 1024-dimensional gray-level intensity feature. Fig. 6 shows some sample images.
In the clustering experiments on the extended Yale B database, we initialize the parameters , for SPRPRG(ALM), for LRR ^{2}^{2}2The LRR code is from http://www.cis.pku.edu.cn/faculty/vision/zlin/zlin.htm LiuLinYanSunYuMa2013 , and , for SPRPRG TanShiGaoHengelXuXiao2015 . The first classes are selected for the experiments; each class contains 64 images.
We compare the face clustering error rates of LRR, SPRPRG and SPRPRG(ALM), and the time costs of our FOA and the conjugate gradient (CG) method for SPRPRG and SPRPRG(ALM), for different numbers of classes. Table 1 gives the experimental results. We observe that the proposed SPRPRG(ALM) has the lowest error rate in each case, and that FOA applied to SPRPRG and SPRPRG(ALM) takes less time than the conjugate gradient method to achieve the same accuracy.
Class  LRR  SPRPRG  SPRPRG(ALM)  
error  error  time(CG)  time(FOA)  error  time(CG)  time(FOA)  
2  2.34  0  71.3  14.3  9.0  3.9  
3  3.65  3.13  185.4  56.2  42.8  30.2  
5  4.69  2.50  581.5  187.5  163.0  160.5  
8  13.40  5.66  2339.2  943.2  1027.1  916.3  
10  24.02  7.50  14826.9  2064.3  870  844.0 
Class  LRR  SPRPRG  SPRPRG(ALM)  

ave err  std  ave err  std  time  time  ave err  std  time  time  
(CG)  (FOA)  (CG)  (FOA)  
2  4.08  11.49  2.75  8.11  0.8  0.4  1.94  6.91  0.8  0.5 
3  6.89  12.81  4.57  8.00  3.1  1.3  2.65  7.61  3.6  1.1 
4  10.92  13.07  8.46  9.95  16.8  4.0  4.79  7.23  18.6  7.6 
5  13.10  13.58  9.69  9.57  31.9  10.9  6.09  6.98  31.8  12.3 
6  19.67  13.12  11.98  9.40  35.1  26.2  6.26  7.03  62.4  39.7 
7  24.44  12.25  14.11  9.53  63.5  47.3  8.40  8.19  73.1  63.7 
8  25.02  11.57  14.87  7.43  91.6  73.4  9.15  6.68  106.7  92.1 
9  30.88  9.40  15.86  5.79  125.0  111.0  11.74  6.15  132.6  126.2 
10  31.44  9.93  20.96  7.43  171.5  165.2  13.57  7.18  185.0  149.2 
11  33.69  8.56  20.60  6.27  260.1  250.8  15.23  6.76  270.0  245.9 
The object clustering average error (%), standard deviation, and average time cost (seconds) on COIL20.
In the clustering experiments on the COIL20 database, we set the initialization parameters , for SPRPRG(ALM), for LRR, and , for SPRPRG. In each experiment, we randomly select between 2 and 11 classes from the 20 classes, with 36 samples per class, from the whole COIL20 database. For each number of classes, we run 50 trials with different randomly chosen classes and samples. The results are shown in Table 2. From this table we can see that our proposed SPRPRG(ALM) has the lowest average clustering error, and that FOA applied to SPRPRG and SPRPRG(ALM) takes less time than the conjugate gradient method to achieve the same accuracy. These results demonstrate that SPRPRG(ALM) significantly enhances the accuracy and FOA reduces the time cost during the affinity matrix construction.
5.3 Clustering on Video Scene Database
This experiment tests the performance of FOA based on model (29) on a video scene database. The compared optimization methods and performance measures are the same as in the previous experiments.
The aim of the experiment is to segment individual scenes from a video sequence. The video sequences are drawn from two short animations freely available from the Internet Archive ^{3}^{3}3http://archive.org/; these are the same data as used in TierneyGaoGuo2014 . Each sequence is around 10 seconds in length (approximately 300 frames) and contains three scenes. Scenes were segmented according to significant translation and morphing of objects, and sometimes camera or perspective changes; as a result, there are 19 and 24 clips for Video 1 and Video 2 respectively, each clip containing 3 scenes. Scene changes (or keyframes) were collected and labeled by hand to form the ground truth data. Preprocessing of a sequence consists of converting the colour video to grayscale and downsampling it. The video clips were corrupted with various magnitudes of Gaussian noise to evaluate the clustering performance of the different methods. Fig. 7 shows some frames from Video 2.
Each frame of a clip is vectorized, and consecutive frames are concatenated to form the samples for segmentation. The model parameters are set as and in SPRPRG(ALM), for LRR, and for SPRPRG.
Gaussian noise  Video 1  
LRR  SPRPRG  SPRPRG(ALM)  
error  error  time  time  error  time  time  
(CG)  (FOA)  (CG)  (FOA)  
0%  min  0.00(7)  0.00(11)  0.00(12)  
max  48.28  29.33  25.59  
med  22.73  0.00  0.00  
mean  18.82  9.11  157.1  113.8  5.49  49.4  21.9  
std  17.63  13.40  9.42  
20%  min  0.00(6)  0.00(11)  0.00(12)  
max  58.33  37.93  31.03  
med  27.16  0.00  0.00  
mean  20.18  9.38  155.8  89.9  7.55  50.5  18.7  
std  17.36  13.51  11.18  
50%  min  0.00 (6)  0.00(12)  0.00(18)  
max  41.98  25.86  40.69  
med  13.63  4.85  0.00  
mean  16.29  12.77  116.2  82.2  6.73  51.9  16.4  
std  15.84  15.34  9.98  
80%  min  0.00(5)  0.00(9)  0.00(10)  
max  52.64  53.41  26.67  
med  20.36  2.20  0.00  
mean  20.28  16.39  107.6  91.7  7.83  56.8  18.6  
std  18.80  20.00  10.34 
Gaussian noise  Video 2  
LRR  SPRPRG  SPRPRG(ALM)  
error  error  time  time  error  time  time  
(CG)  (FOA)  (CG)  (FOA)  
0%  min  0.00(22)  0.00(22)  0.00(22)  
max  28.29  16.33  3.98  
med  0.00  0.00  0 .00  
mean  1.43  1.20  56.3  41.6  0.19  17.7  17.8  
std  5.85  4.10  0.82  
20%  min  0.00(14)  0.00(22)  0.00(22)  
max  57.74  0.59  0.59  
med  0.00  0.00  0.00  
mean  12.07  0.04  51.5  39.4  0.04  17.6  17.3  
std  16.42  0.14  0.14  
50%  min  1.78  0(21)  0(22)  
max  0.59  0.59  46.33  
med  0.00  0.00  0.00  
mean  4.99  0.06  45.9  37.2  0.04  17.7  13.3  
std  11.80  0.16  0.14  
80%  min  0.00(15)  0.00(18)  0.00(22)  
max  54.83  22.58  0.59  
med  0.00  0.00  0.00  
mean  9.62  3.16  40.1  34.8  0.004  17.8  10.6  
std  16.94  7.07  0.14 
Tables 3 and 4 list the segmentation results and the average time costs of our FOA and the conjugate gradient (CG) method for SPRPRG and SPRPRG(ALM). These results are the averages over all clips of each video. We can see that SPRPRG(ALM) has consistently low average error rates and standard deviations under various magnitudes of Gaussian noise. In particular, SPRPRG(ALM) is not influenced by the Gaussian noise in Video 2. In most cases, FOA converges faster than CG for SPRPRG and SPRPRG(ALM).
6 Conclusions
This paper proposes a fast optimization algorithm (FOA) on Riemannian manifolds for a class of composite functions, and proves its convergence. Unlike most optimization methods on Riemannian manifolds, our algorithm uses only first-order function information, yet has the convergence rate O(1/k^2). Experiments on several data sets show that its optimization performance largely outperforms existing methods in terms of convergence rate and accuracy. In addition, we transform the low-rank representation model into an optimization problem on varieties via the augmented Lagrangian approach, and apply the fast subspace pursuit method based on FOA to solve it. Extensive experimental results demonstrate the superiority of the proposed ALM with fast subspace pursuit.
Acknowledgements.
The research project is supported by the Australian Research Council (ARC) through the grant DP140102270 and also partially supported by National Natural Science Foundation of China under Grant No. 61390510, 61133003, 61370119, 61171169 and 61227004.
References
 (1) Absil, P.-A., Baker, C., Gallivan, K.A., Trust-region methods on Riemannian manifolds, Foundations of Computational Mathematics, 7(3), 303–330 (2007)
 (2) Absil, P.-A., Mahony, R., Sepulchre, R., Optimization Algorithms on Matrix Manifolds, Princeton University Press (2008)
 (3) Beck, A., Teboulle, M., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2(1), 183–202 (2009)
 (4) Bennett, J., Elkan, C., Liu, B., Smyth, P., Tikk, D., KDD Cup and workshop 2007, ACM SIGKDD Explorations Newsletter, 9(2), 51–52 (2007)
 (5) Boumal, N., Absil, P.-A., RTRMC: A Riemannian trust-region method for low-rank matrix completion, Advances in Neural Information Processing Systems, 406–414 (2011)
 (6) Gabay, D., Minimizing a differentiable function over a differential manifold, Journal of Optimization Theory and Applications, 37(2), 177–219 (1982)
 (7) Harandi, M., Hartley, R., Shen, C., Lovell, B., Sanderson, C., Extrinsic methods for coding and dictionary learning on Grassmann manifolds, arXiv:1401.8126, 1–41 (2014)
 (8) Huang, W., Absil, P.-A., Gallivan, K.A., A Riemannian symmetric rank-one trust-region method, Mathematical Programming, 150(2), 179–216 (2015)
 (9) Ji, S., Ye, J., An accelerated gradient method for trace norm minimization, Proceedings of the 26th Annual International Conference on Machine Learning, 457–464 (2009)
 (10) Lee, J.M., Introduction to Smooth Manifolds, Springer, New York (2003)
 (11) Lin, Z., Liu, R., Su, Z., Linearized alternating direction method with adaptive penalty for low-rank representation, Advances in Neural Information Processing Systems, 612–620 (2011)
 (12) Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y., Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 171–184 (2013)
 (13) Liu, G., Lin, Z., Yu, Y., Robust subspace segmentation by low-rank representation, Proceedings of the 27th International Conference on Machine Learning, 663–670 (2010)
 (14) Mishra, B., Apuroop, K., Sepulchre, R., A Riemannian geometry for low-rank matrix completion, arXiv preprint arXiv:1211.1550 (2012)
 (15) Mishra, B., Meyer, G., Bach, F., Sepulchre, R., Low-rank optimization with trace norm penalty, SIAM Journal on Optimization, 23(4), 2124–2149 (2013)
 (16) Mishra, B., Meyer, G., Bonnabel, S., Sepulchre, R., Fixed-rank matrix factorizations and Riemannian low-rank optimization, Computational Statistics, 29(3), 591–621 (2014)
 (17) Nesterov, Y., A method for solving the convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady, 27, 372–376 (1983)
 (18) Nesterov, Y., Gradient methods for minimizing composite functions, Mathematical Programming, 140(1), 125–161 (2013)
 (19) Ngo, T., Saad, Y., Scaled gradients on Grassmann manifolds for matrix completion, Advances in Neural Information Processing Systems, 1412–1420 (2012)
 (20) Recht, B., Fazel, M., Parrilo, P.A., Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52(3), 471–501 (2010)
 (21) Ring, W., Wirth, B., Optimization methods on Riemannian manifolds and their application to shape space, SIAM Journal on Optimization, 22(2), 596–627 (2012)
 (22) Schneider, R., Uschmajew, A., Convergence results for projected line-search methods on varieties of low-rank matrices via Łojasiewicz inequality, SIAM Journal on Optimization, 25(1), 622–646 (2015)
 (23) Smith, S.T., Geometric optimization methods for adaptive filtering, Division of Applied Sciences, Harvard University, Cambridge, MA (1993)
 (24) Su, J., Srivastava, A., de Souza, F., Sarkar, S., Rate-invariant analysis of trajectories on Riemannian manifolds with application in visual speech recognition, IEEE Conference on Computer Vision and Pattern Recognition, 620–627 (2014)
 (25) Sun, Y., Gao, J., Hong, X., Mishra, B., Yin, B., Heterogeneous tensor decomposition for clustering via manifold optimization, IEEE Transactions on Pattern Analysis and Machine Intelligence, accepted
 (26) Tan, M., Shi, J., Gao, J., Hengel, A., Xu, D., Xiao, S., Scalable nuclear-norm minimization by subspace pursuit proximal Riemannian gradient, arXiv preprint arXiv:1503.02828 (2015)
 (27) Tan, M., Tsang, I., Wang, L., Vandereycken, B., Pan, S., Riemannian pursuit for big matrix recovery, Proceedings of the 31st International Conference on Machine Learning, 1539–1547 (2014)
 (28) Tierney, S., Gao, J., Guo, Y., Subspace clustering for sequential data, IEEE Conference on Computer Vision and Pattern Recognition, 1019–1026 (2014)
 (29) Toh, K., Yun, S., An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems, Pacific Journal of Optimization, 6, 615–640 (2010)
 (30) Turaga, P., Veeraraghavan, A., Srivastava, A., Chellappa, R., Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11), 2273–2286 (2011)
 (31) Udriste, C., Convex Functions and Optimization Methods on Riemannian Manifolds, Springer Science & Business Media (1994)
 (32) Uschmajew, A., Vandereycken, B., Line-search methods and rank increase on low-rank matrix varieties, Proceedings of the 2014 International Symposium on Nonlinear Theory and its Applications, 52–55 (2014)
 (33) Vandereycken, B., Low-rank matrix completion by Riemannian optimization, SIAM Journal on Optimization, 23(2), 1214–1236 (2013)
 (34) Wang, R., Shan, S., Chen, X., Gao, W., Manifold-manifold distance with application to face recognition based on image set, IEEE Conference on Computer Vision and Pattern Recognition, 1–8 (2008)
Appendix A
The requirement that the vector transport $\mathcal{T}$ is isometric means that, for all $x \in \mathcal{M}$, $\eta \in T_x\mathcal{M}$ and all $\xi, \zeta \in T_x\mathcal{M}$, the equation $\langle \mathcal{T}_\eta \xi, \mathcal{T}_\eta \zeta \rangle_{R_x(\eta)} = \langle \xi, \zeta \rangle_x$ holds. To prove Theorem 3.1, we give some lemmas in the following.
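As a concrete illustration of the isometry requirement, the following sketch verifies it numerically for parallel transport on the unit sphere. The sphere, the rotation-based transport formula, and the function names are standard textbook choices used here for illustration, not necessarily the transport employed in the paper's algorithm.

```python
import numpy as np

def transport_sphere(x, eta, xi):
    """Parallel transport of xi in T_x S^{n-1} along the geodesic
    leaving x in direction eta (standard closed-form formula)."""
    t = np.linalg.norm(eta)
    if t < 1e-16:
        return xi.copy()
    u = eta / t
    # The component of xi along u rotates in the (x, u) plane;
    # the component orthogonal to u and x is transported unchanged.
    a = xi @ u
    return xi + a * ((np.cos(t) - 1.0) * u - np.sin(t) * x)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)                      # point on the sphere S^4
proj = lambda v: v - (v @ x) * x            # projector onto T_x S^4
eta, xi, zeta = (proj(rng.standard_normal(5)) for _ in range(3))

Txi = transport_sphere(x, eta, xi)
Tzeta = transport_sphere(x, eta, zeta)
# Isometry: inner products of tangent vectors are preserved.
print(abs(Txi @ Tzeta - xi @ zeta) < 1e-12)   # True
```

The same check applies to any candidate vector transport: if inner products drift under transport, the convergence analysis that relies on isometry no longer applies.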
Lemma 3
Let $\{a_k\}$ and $\{b_k\}$ be positive real sequences satisfying
(32)  $a_k - a_{k+1} \ge b_{k+1} - b_k, \quad \forall k \ge 1,$
with $a_1 + b_1 \le c$. Then $a_k \le c$ for every $k \ge 1$.
Proof
Let $c_k = a_k + b_k$; we can obtain from (32)
$c_{k+1} = a_{k+1} + b_{k+1} \le a_k + b_k = c_k,$
i.e. $c_{k+1} \le c_k$, so $\{c_k\}$ is a decreasing sequence. Then
$c_k \le c_1 = a_1 + b_1 \le c.$
Since $\{b_k\}$ is a positive sequence, we can conclude $a_k \le a_k + b_k = c_k \le c$.
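The mechanism of Lemma 3 can be checked numerically: whenever (32) holds, the sums $c_k = a_k + b_k$ are non-increasing, which caps every $a_k$. A minimal sketch with hypothetical example sequences (chosen only to satisfy the hypotheses):

```python
# Hypothetical sequences satisfying a_k - a_{k+1} >= b_{k+1} - b_k:
# a_k = 1 + 1/k is decreasing while b_k is constant, so (32) holds.
K = 1000
a = [1.0 + 1.0 / k for k in range(1, K + 1)]
b = [0.5] * K
c_bound = a[0] + b[0]                       # c = a_1 + b_1 = 2.5

# c_k = a_k + b_k is non-increasing, exactly as in the proof.
c = [ak + bk for ak, bk in zip(a, b)]
assert all(c[k + 1] <= c[k] for k in range(K - 1))
# Hence every a_k is bounded by c.
assert all(ak <= c_bound for ak in a)
print(max(a), "<=", c_bound)                # prints: 2.0 <= 2.5
```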
Lemma 4
Let $\{t_k\}$ be the positive sequence generated in Algorithm 1 via (22) with $t_1 = 1$. Then $t_k \ge (k+1)/2$ for all $k \ge 1$.
Proof
We proceed by induction.
1) When $k = 1$, $t_1 = 1 = (1+1)/2$.
2) Suppose that the conclusion holds for $k$, i.e. $t_k \ge (k+1)/2$. Then, since (22) gives $t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}$, we have
$t_{k+1} \ge \frac{1 + \sqrt{1 + (k+1)^2}}{2} \ge \frac{1 + (k+1)}{2} = \frac{(k+1) + 1}{2},$
which completes the induction.
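The bound of Lemma 4 is easy to confirm numerically. The sketch below assumes that (22) is the standard accelerated-gradient update $t_{k+1} = (1 + \sqrt{1 + 4t_k^2})/2$ with $t_1 = 1$, as in FISTA (Beck and Teboulle); if the paper's (22) differs, the recursion line must be adapted.

```python
import math

# Assumed form of update (22): t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2,
# with t_1 = 1, as in the FISTA momentum sequence.
t = 1.0
for k in range(1, 200):
    assert t >= (k + 1) / 2.0, (k, t)   # Lemma 4's bound: t_k >= (k+1)/2
    t = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
print("bound t_k >= (k+1)/2 holds for k = 1..199")
```

Since the estimation error in the convergence analysis scales like $1/t_k^2$, this linear growth of $t_k$ is precisely what yields the $O(1/k^2)$ rate.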