A Group Norm Regularized LRR Factorization Model for Spectral Clustering

01/08/2020 ∙ by Xishun Wang, et al.

Spectral clustering is an important and classic graph clustering method whose results depend heavily on the affinity matrix built from the data. Solving the Low-Rank Representation (LRR) problem is an effective way to obtain such an affinity matrix. This paper proposes an LRR factorization model based on group norm regularization and solves it with an Augmented Lagrangian Method (ALM). The group norm regularization makes the columns of the factor matrix sparse, thereby achieving low rank; since no Singular Value Decomposition (SVD) is required, the computational complexity of each step is greatly reduced. We obtain affinity matrices from different LRR models and then perform clustering tests on synthetic noisy data and real data (Hopkins155 and Extended Yale B). Compared with traditional models and algorithms, ours solves for the affinity matrix faster, is more robust to noise, and yields better clustering results. Remarkably, the numerical results show that our algorithm converges very fast: the convergence condition is satisfied in only about ten steps. The group norm regularized LRR factorization model, together with the algorithm designed for it, is an effective and fast way to obtain a better affinity matrix.




1 Introduction

With the advent of the age of big data, we are confronted with many kinds of data every day. Unsupervised learning considers problems in big data based on samples of unknown category. Clustering is a classic and important unsupervised learning technique that has been extensively applied in data mining, image segmentation, computer vision, pattern recognition, finance, and other fields [1, 2, 3, 4]. Classic clustering algorithms include k-means [5], spectral clustering [6, 7], density clustering [8], fuzzy clustering [9], etc. Spectral clustering has several advantages: the algorithm is efficient, the data can be of any shape, the method is not sensitive to abnormal data, and it can be applied to high-dimensional problems. However, spectral clustering requires an affinity matrix as input, which has a great influence on the clustering results.

In 2010, the Low-Rank Representation (LRR) problem was proposed by Liu et al. [10]. The affinity matrix is obtained by solving the LRR problem, and the data are then clustered with a spectral clustering method such as Normalized Cuts [6]. They assume that the data samples are drawn from a union of multiple subspaces, and the purpose of the algorithm is to denoise the data and recover the samples on the subspaces to which they belong. They proved that LRR can exactly recover each true subspace for clean data; for noisy data, LRR can approximately recover the subspaces of the original data with theoretical guarantees. In [10], when the number of classes is specified, spectral clustering with the affinity matrix obtained by LRR is more accurate and more robust than traditional methods.

When solving the LRR problem, the traditional approach minimizes the nuclear norm as a surrogate for the rank in the objective function. This is a convex approximation that guarantees the convergence of the designed algorithms. However, a singular value decomposition (SVD) is required at each step of the solution process; SVD is time-consuming, with O(n^3) complexity for an n x n affinity matrix. Classic SVD-based algorithms for LRR include APG [11], ADM [12], LADM and LADMAP [13]. Among them, APG solves an approximation of the LRR problem, and its clustering results are not good. LADMAP, which combines LADM with adaptive adjustment of the penalty parameter, performs best among these algorithms, but its calculation is still slow, especially for high-dimensional data. Along this line, Lin et al. proposed the accelerated LADMAP, using skinny SVD to reduce the complexity to O(rn^2), where r is the rank of the affinity matrix. However, its rate of convergence is sub-linear, requiring more iterations, and the rank depends on the selection of hyperparameters. Lu et al. introduced a smooth objective function with regularization terms and used the Iteratively Reweighted Least Squares (IRLS) method to solve it [14]. This method needs no SVD, but the complexity of its matrix multiplications is still O(n^3); their numerical experiments show that the convergence is linear, so it is faster than LADMAP in some cases.

To avoid SVD altogether, Chen et al. offered a matrix factorization LRR model and the hidden matrix factors augmented Lagrangian method (HMFALM) [15]. They decompose the affinity matrix Z into UV^T, where U, V are n x r factor matrices, and then use the Augmented Lagrangian Method (ALM) to solve the model. To choose the rank, they traverse candidate values: pick a proper interval d, run the algorithm at ranks 1, d+1, 2d+1, ..., kd+1, ..., and stop when the results begin to worsen, searching through the options one by one to find the optimum rank. Although the original problem becomes non-convex, the algorithm does not require SVD; only multiplication of the factor matrices is required, with complexity O(rmn), where m is the dimension of the data. The numerical results show that HMFALM needs far fewer iteration steps than IRLS and converges faster. However, HMFALM needs an outer loop to find the rank r while the inner loop iterates to meet the stopping criterion, and the rank it finds depends heavily on the given hyperparameter d.

We introduce group norm regularization to design a matrix factorization model for LRR that finds the rank adaptively. We first let Z = UV^T with U, V of size n x K, where K is a large overestimate of the rank. The group norm regularization drives some columns of the factor matrix U to zero, so that the rank of the affinity matrix is automatically reduced; this achieves the purpose of adjusting the rank adaptively. Although we decrease the rank from a large number, the numerical results show that zero columns appear very quickly (and can be deleted to speed up the computation): the rank drops to a low value within a few steps, and the iteration converges in about ten steps. The solver is an ALM method similar to [15], with per-iteration complexity of the same order as HMFALM. Numerical results on synthetic noisy data and real data (Hopkins155 and Extended Yale B) show that our model is faster, more accurate, and more robust to noise than the algorithms above.

The structure of the paper is as follows: Section 2 introduces the LRR problem, its convex approximation model [10], and the matrix factorization model [15]. Section 3 introduces our model, gives the ALM for it, proposes acceleration techniques for the ALM, and explains how to use the solution of LRR for spectral clustering. Numerical experimental results are reported in Section 4. Finally, Section 5 concludes the paper.

2 LRR problem and two types of models

First, let us recall the following LRR problem:

min_Z rank(Z),  s.t.  X = AZ,   (2.1)

where X in R^{m x n} is the data matrix, m is the dimension of the data vectors, n is the number of data vectors, and A is a dictionary (one often takes A = X). We call the optimal solution Z* of the above problem the "lowest-rank representation" of the data X with respect to the dictionary A. This is an NP-hard problem, because the rank function is the l0 norm of the singular values and the solution is not unique. As in the classic approach to low-rank problems, Liu et al. [10] take advantage of the nuclear norm approximation of the rank and obtain the following convex optimization problem:

min_Z ||Z||_*,  s.t.  X = AZ.   (2.2)
Liu et al. proved in [16] that, under some conditions, the solution of (2.2) is unique and is one of the solutions to (2.1), and this solution can be transformed to obtain an affinity matrix of the data X, which can be used for spectral clustering. The uniqueness of the solution of (2.2) was given by Wei and Lin [17]:

Theorem 2.1

Suppose the skinny SVD of X is U Sigma V^T. Then the minimizer of problem (2.2) with A = X is uniquely defined by

Z* = V V^T.   (2.3)
This formula naturally implies that Z* exactly recovers the shape interaction (affinity) matrix V V^T of Costeira and Kanade [18].
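As a quick numerical illustration of Theorem 2.1 (a sketch with illustrative sizes: two independent 3-dimensional subspaces in R^20), the minimizer Z* = V V^T built from the skinny SVD of clean data is feasible for (2.2) with A = X and has the expected low rank:

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean data: two independent 3-dimensional subspaces in R^20, 15 samples each.
blocks = []
for _ in range(2):
    basis = np.linalg.qr(rng.standard_normal((20, 3)))[0]   # orthonormal basis
    blocks.append(basis @ rng.standard_normal((3, 15)))
X = np.hstack(blocks)                                       # 20 x 30 data matrix

# Skinny SVD X = U S V^T, then Z* = V V^T as in Theorem 2.1.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))                           # numerical rank of X
V = Vt[:r].T
Z = V @ V.T

assert np.allclose(X @ Z, X)            # Z* is feasible: X = X Z*
assert np.linalg.matrix_rank(Z) == r    # rank(Z*) = rank(X) = 3 + 3 = 6
```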

As for the fact that the solution of (2.2) is one of the solutions to (2.1), we refer the reader to Corollary 4.1 in [16]. To make the model robust to noise, Liu et al. [16] proposed the following noisy LRR nuclear norm model:

min_{Z,E} ||Z||_* + lambda ||E||_{2,1},  s.t.  X = XZ + E,   (2.4)

where ||E||_{2,1} = sum_{j=1}^{n} ||E(:,j)||_2 is the l_{2,1} norm of the noise matrix E and lambda > 0 balances the two terms.

Several algorithms have been designed to solve (2.4), but they all need to compute SVDs and lack speed. Based on this, Chen et al. [15] put Z into a low-rank factorization and proposed the following matrix factorization model:

min_{Z,E} ||E||_{2,1},  s.t.  X = XZ + E,  rank(Z) <= r,   (2.5)

where r is a prescribed rank. Writing Z = UV^T with U, V in R^{n x r}, the problem is expressed as follows:

min_{U,V,E} ||E||_{2,1},  s.t.  X = XUV^T + E.   (2.6)

However, the rank r of this model must be specified in advance. Chen et al. [15] gave a method to find the optimal rank:
1. Choose a search interval d.
2. Solve problem (2.6) for r = 1, d+1, 2d+1, ... and stop when the results begin to worsen, searching through the options one by one to find the optimum rank.

Assume the optimal rank found is r*; in this case, the solution obtained from (2.6) is (U*, V*, E*), and by the theorems in [10, 16] we can recover the optimal representation Z* = U*(V*)^T from the factors. The obtained rank depends heavily on the hyperparameter d, and a lot of additional iterative calculation must be done before the optimal rank is found. In order to reduce the number of iteration steps and find the rank adaptively, we design a new model in Section 3, which adds a group norm regularization term to model (2.6).
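The rank-traversal strategy above can be sketched as follows; `solve_at_rank` and `score` are hypothetical stand-ins for the inner ALM solve and a clustering-quality measure, not functions from [15]:

```python
def search_rank(solve_at_rank, score, d, r_max):
    """Outer rank search described above: try ranks 1, d+1, 2d+1, ...
    and stop as soon as the result begins to worsen.  `solve_at_rank`
    and `score` are hypothetical stand-ins for the inner ALM solve and
    a clustering-quality measure."""
    best_rank, best_score = None, float("-inf")
    for r in range(1, r_max + 1, d):
        s = score(solve_at_rank(r))
        if s <= best_score:          # results begin to worsen: stop searching
            break
        best_rank, best_score = r, s
    return best_rank
```

For example, with a quality curve that peaks at rank 7 and interval d = 3, the search visits ranks 1, 4, 7, 10 and returns 7.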

3 Group norm regularized LRR factorization model and algorithm

The matrix factorization model is superior to the nuclear norm approximation in calculation speed; however, it is difficult for the former to estimate the rank of the recovered matrix. So we want an adaptive method of estimating the rank for different types of data. As is well known, the rank of a product matrix is bounded by the number of columns of its factor matrices, and the rank is reduced if some columns are zero. So we first take an oversized factor matrix and make some of its columns zero by introducing group norm regularization, thereby adjusting the rank adaptively.

3.1 Group norm regularized LRR factorization model

Assume that X in R^{m x n} is a matrix of data samples, where m is the dimension of the data and n is the number of data vectors, and that some of the data contain noise. We hope to remove the noise and represent the clean data at low rank to obtain an affinity matrix. We obtain the group norm regularized LRR factorization model (GNRLRRFM) by adding a group norm regularization term to (2.6):

min_{U,V,E} ||U||_{2,1} + lambda ||E||_{2,1},  s.t.  X = XUV^T + E,   (3.1)

where U in R^{n x K}, V in R^{n x K}, E in R^{m x n}, and K is a large number. Here ||U||_{2,1} = sum_{j=1}^{K} ||U(:,j)||_2 is the group norm of U. The true rank of Z = UV^T is usually unknown, and K is an initial guess chosen large (for example, K = n). Owing to the group norm regularization, some columns of U become zero under proper parameters. If K - r columns of U are driven to zero by the group norm, then rank(UV^T) <= r. So we reach the goal of adjusting the rank of Z adaptively purely by introducing the group norm regularization. The parameter lambda is also very important, because ||U||_{2,1} and ||E||_{2,1} play a role of balance and mutual restraint in GNRLRRFM.
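Concretely (a NumPy sketch; the zero tolerance is an illustrative choice), the group norm sums the Euclidean norms of the columns, and the number of surviving non-zero columns bounds the rank of UV^T:

```python
import numpy as np

def group_norm(U):
    """l_{2,1} group norm: the sum of the Euclidean norms of U's columns."""
    return float(np.linalg.norm(U, axis=0).sum())

def effective_rank(U, tol=1e-12):
    """Number of non-zero columns of U, which bounds rank(U @ V.T)."""
    return int(np.sum(np.linalg.norm(U, axis=0) > tol))

U = np.array([[3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0]])     # the second column has been zeroed out
assert group_norm(U) == 6.0         # column norms 5 + 0 + 1
assert effective_rank(U) == 2       # so rank(U V^T) <= 2 for any V
```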

In summary, the GNRLRRFM model can adaptively estimate the rank under the constraint for different types of data, without the need to additionally design a rank-updating strategy, and the regularization term makes the model more resistant to noise. Of course, we have introduced two extra hyperparameters, lambda and K, but the numerical results show that our model is less sensitive to its hyperparameters than the other models considered.

3.2 Augmented Lagrangian Method

In this section, we introduce the ALM method to solve (3.1). For such bi-convex problems, i.e., convex in U for fixed V and convex in V for fixed U, Sun [19], Shen [20], Xu [21], and Chen [15] all used similar ALM methods and obtained relatively good numerical results. The augmented Lagrangian function of (3.1) is as follows:

L(U, V, E, Y, mu) = ||U||_{2,1} + lambda ||E||_{2,1} + <Y, X - XUV^T - E> + (mu/2) ||X - XUV^T - E||_F^2,   (3.2)

where mu > 0 is a penalty parameter, Y is the Lagrange multiplier corresponding to the constraint X = XUV^T + E, and <., .> is the usual matrix inner product.

It is well known that, starting from (U^0, V^0, E^0, Y^0), the classic augmented Lagrangian method solves

(U^{k+1}, V^{k+1}, E^{k+1}) = argmin_{U,V,E} L(U, V, E, Y^k, mu_k)   (3.3)

at the k-th iteration and then updates Y^{k+1} = Y^k + mu_k (X - X U^{k+1} (V^{k+1})^T - E^{k+1}). Similar to the classical ALM, we can update (U, V) and E at the k-th iteration separately:

(U^{k+1}, V^{k+1}) = argmin_{U,V} L(U, V, E^k, Y^k, mu_k),   (3.4a)
E^{k+1} = argmin_E L(U^{k+1}, V^{k+1}, E, Y^k, mu_k).   (3.4b)
It is difficult to solve (3.4a) directly because U and V are coupled, so we propose an inner iteration technique to obtain an approximate solution:

U^{k,j+1} = argmin_U L(U, V^{k,j}, E^k, Y^k, mu_k),   (3.5)
V^{k,j+1} = argmin_V L(U^{k,j+1}, V, E^k, Y^k, mu_k),   (3.6)

where j = 0, ..., J - 1 and J is the number of inner iteration steps. The subproblem (3.6) is quadratic in V and can be solved by the least squares method: with A = X U^{k,j+1} and B = X - E^k + Y^k / mu_k,

V^{k,j+1} = B^T A (A^T A)^+.   (3.7)
Since the U-subproblem (3.5) is difficult to solve in closed form, inspired by [13] we linearize the quadratic term of (3.5) at U^{k,j} and add a proximal term:

U^{k,j+1} = argmin_U ||U||_{2,1} + (eta mu_k / 2) ||U - U^{k,j} + grad q(U^{k,j}) / (eta mu_k)||_F^2,   (3.8)

where q(U) = (mu_k/2) ||X - X U (V^{k,j})^T - E^k + Y^k/mu_k||_F^2 is the quadratic term of (3.5) and eta is chosen as proposed in [13] (any eta >= ||X||_2^2 ||V^{k,j}||_2^2 suffices).

We can get the closed-form solution of (3.8) by column-wise soft-threshold shrinkage:

U^{k,j+1}(:, i) = max(0, 1 - 1/(eta mu_k ||W(:, i)||_2)) W(:, i),   (3.9)

where W = U^{k,j} - grad q(U^{k,j}) / (eta mu_k) and W(:, i) means the i-th column of W.

Owing to the soft-thresholding rule, some columns of U^{k+1} become exactly zero, so we obtain a low-rank solution. Similarly, we can get the explicit expression of E^{k+1}:

E^{k+1}(:, i) = max(0, 1 - lambda/(mu_k ||Q(:, i)||_2)) Q(:, i),   (3.10)

where Q = X - X U^{k+1} (V^{k+1})^T + Y^k / mu_k.
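Both (3.9) and (3.10) apply the same column-wise soft-thresholding operator, the proximal operator of the l_{2,1} norm; a minimal NumPy sketch:

```python
import numpy as np

def col_shrink(W, tau):
    """Column-wise soft-thresholding: each column w of W is mapped to
    max(0, 1 - tau/||w||) * w, so columns with norm below tau become
    exactly zero.  This is the operator in both (3.9) (tau = 1/(eta*mu))
    and (3.10) (tau = lambda/mu)."""
    norms = np.maximum(np.linalg.norm(W, axis=0), 1e-32)  # avoid divide-by-zero
    return W * np.maximum(0.0, 1.0 - tau / norms)

W = np.array([[3.0, 0.3],
              [4.0, 0.4]])
S = col_shrink(W, 1.0)
# the first column (norm 5) shrinks to norm 4; the second (norm 0.5) is zeroed
assert np.allclose(np.linalg.norm(S, axis=0), [4.0, 0.0])
```

The zeroed columns are exactly the mechanism by which the rank of UV^T drops during the iterations.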
To avoid the ALM converging to an infeasible point, we adopt the strategy proposed by Lu and Zhang [22] to update mu_k in the third step of Algorithm 1. At this point, we have given explicit formulas to update all variables of (3.3) at the k-th iteration. With the above update formulas, we state Algorithm 1 for solving problem (3.1).


Algorithm 1: ALM for GNRLRRFM
Input: data X, a rank overestimate K, hyperparameters lambda, mu_0, rho > 1, a sufficiently large constant mu_max, and tolerance epsilon;
Initialize: U^0, V^0 from the skinny SVD of X, E^0 = 0, Y^0 = 0, k = 0;
  while not convergent do
    1. With E^k and Y^k fixed, compute (U^{k+1}, V^{k+1}) by iterating (3.7) and (3.9) to find an approximate solution of (3.4a); then update E^{k+1} according to (3.10).
    2. Set Y^{k+1} = Y^k + mu_k (X - X U^{k+1} (V^{k+1})^T - E^{k+1}).
    3. If the constraint violation has decreased sufficiently, set mu_{k+1} = mu_k; otherwise, set mu_{k+1} = min(rho mu_k, mu_max).
    4. Set k = k + 1.
  end while

3.3 Convergence issue

For non-convex problems of this matrix factorization type such as (3.1), although many books and articles (Boyd [23], Sun [19], Shen [20], Xu [21], Chen [15]) numerically demonstrate strong convergence behavior, with results faster and better than SVD-based solvers of the original convex problem, proving convergence of ALM for non-convex problems remains very difficult at present. The last three articles can only show convergence to a KKT point under some strong conditions that are hard to verify theoretically, and this topic deserves further research.

Here we summarize the conditions and results of these articles. The models studied by Shen [20] and Xu [21] have no regularization terms in comparison with our model; they need to assume that the variables are bounded and convergent, and then the ADMM algorithms in those articles converge to a KKT point. Chen's algorithm is the same as ours, and our model is within the general framework they proposed; Chen (2017) gives a convergence analysis of ALM for the general form

min_x f(x) + g(x),  s.t.  c(x) = 0,   (3.15)

where f is a lower semi-continuous function and g and c are continuously differentiable functions. The Relaxed Constant Positive Linear Dependence (RCPLD) condition is needed for a local minimizer of problem (3.15) to be a KKT point; RCPLD is introduced in [15] as follows:

Definition 3.1

For the above problem (3.15), let F be the feasible region and x* in F. Roughly speaking, RCPLD holds for the constraint system at x* if a basis of the constraint gradients keeps the same rank in a neighborhood of x*; we refer the reader to [15] for the precise statement.

For the specific proof, we refer readers to [15].

3.4 Accelerated ALM for GNRLRRFM

In this section, we propose two techniques to accelerate ALM for GNRLRRFM. The techniques aim to reduce the computational complexity of each iteration and the number of iterations. In Section 4, we compare the accelerated and unaccelerated ALM on synthetic data.

The computational complexity comes mainly from the matrix multiplications at each iteration. In the present case, some columns of the matrix U^k are zero owing to the group norm regularization. This fact inspires the first technique: we delete the zero columns of U^k and the corresponding columns of V^k before performing the matrix multiplications. In the numerical experiments, we found that r_k is much smaller than K, where r_k is the number of non-zero columns of U^k at the k-th iteration. Therefore, the first technique does not affect the convergence and speeds up the calculation.

The second technique is to run only one inner iteration step for U and V, that is:

U^{k+1} = argmin_U L(U, V^k, E^k, Y^k, mu_k),   (3.16)
V^{k+1} = argmin_V L(U^{k+1}, V, E^k, Y^k, mu_k),   (3.17)

where the specific update steps are given in Algorithm 2. Although we thus solve (3.4a) for (U, V) only approximately, with a single inner step, the numerical results show that Algorithm 2 converges in about ten steps. Applying the above acceleration techniques, we arrive at Algorithm 2 below.


Algorithm 2: Accelerated ALM (AALM) for GNRLRRFM
Input: data X, a rank overestimate K, hyperparameters lambda, mu_0, rho > 1, mu_max, and tolerance epsilon;
Initialize: U^0, V^0 from the skinny SVD of X, E^0 = 0, Y^0 = 0, k = 0;
  while not convergent do
    1. With E^k and Y^k fixed, compute (U^{k+1}, V^{k+1}) according to (3.16) and (3.17) to find an approximate solution of (3.4a); delete the zero columns of U^{k+1} and the corresponding columns of V^{k+1}. Then update E^{k+1} according to (3.10).
    2. Set Y^{k+1} = Y^k + mu_k (X - X U^{k+1} (V^{k+1})^T - E^{k+1}).
    3. If the constraint violation has decreased sufficiently, set mu_{k+1} = mu_k; otherwise, set mu_{k+1} = min(rho mu_k, mu_max).
    4. Set k = k + 1.
  end while
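The two acceleration techniques can be sketched together in NumPy. This is a reconstruction under the model form min ||U||_{2,1} + lambda ||E||_{2,1} s.t. X = XUV^T + E assumed in this text; the defaults mu_0, rho, and the Lipschitz bound eta are illustrative choices, not the paper's settings:

```python
import numpy as np

def col_shrink(W, tau):
    """Proximal operator of tau*||.||_{2,1}: column-wise soft-thresholding."""
    norms = np.maximum(np.linalg.norm(W, axis=0), 1e-32)
    return W * np.maximum(0.0, 1.0 - tau / norms)

def aalm_gnrlrrfm(X, K=None, lam=1.0, mu=1.0, rho=1.6, mu_max=1e8,
                  max_iter=60, tol=1e-6):
    """One-inner-step accelerated ALM sketch for the reconstructed model
    min ||U||_{2,1} + lam*||E||_{2,1}  s.t.  X = X U V^T + E.
    All parameter defaults are illustrative assumptions."""
    d, n = X.shape
    K = n if K is None else K
    # initialize U, V from the skinny SVD of X, as in Algorithm 2
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r0 = min(K, len(s))
    U, V = Vt[:r0].T.copy(), Vt[:r0].T.copy()
    E, Y = np.zeros_like(X), np.zeros_like(X)
    normX = np.linalg.norm(X)
    for _ in range(max_iter):
        B = X - E + Y / mu
        # U-step: one linearized proximal (column soft-threshold) step (3.9)
        R = B - X @ U @ V.T
        eta = np.linalg.norm(X, 2) ** 2 * max(np.linalg.norm(V, 2) ** 2, 1.0)
        U = col_shrink(U + (X.T @ R @ V) / eta, 1.0 / (eta * mu))
        # technique 1: prune zero columns of U and matching columns of V
        keep = np.linalg.norm(U, axis=0) > 0
        U, V = U[:, keep], V[:, keep]
        # V-step: exact least squares with A = X U fixed, as in (3.7)
        A = X @ U
        V = np.linalg.lstsq(A, B, rcond=None)[0].T
        # E-step: column shrinkage with threshold lam/mu, as in (3.10)
        XZ = X @ U @ V.T
        E = col_shrink(X - XZ + Y / mu, lam / mu)
        # multiplier and penalty updates
        res = X - XZ - E
        Y = Y + mu * res
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(res) / normX < tol:   # relative feasibility residual
            break
    return U, V, E
```

On clean low-rank data the relative residual drops below the tolerance within a few iterations; zero columns of U, when they appear, are removed on the fly.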

3.5 Subspace Segmentation (Clustering)

Following Liu et al. [16], we design the following algorithm to perform subspace segmentation (clustering) based on the (U*, V*) obtained by solving (3.1).


Algorithm 3: Subspace Segmentation (Clustering)
Input: data X, a rank overestimate K, hyperparameters lambda, mu_0, rho, and the number k of subspaces;
  1. Obtain the minimizer (U*, V*) by Algorithm 2.
  2. Compute Z* = U*(V*)^T.
  3. Compute the skinny SVD Z* = U~ Sigma~ V~^T.
  4. Get an affinity matrix W with W_ij = ([M M^T]_ij)^2, where M = U~ Sigma~^{1/2}.
  5. Use W to perform NCut and segment the data samples into k clusters.

When the affinity matrix is formed, each entry is squared to ensure that the elements of the affinity matrix are nonnegative. In summary, Algorithm 3 describes how to use the solution obtained by GNRLRRFM for clustering.
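A self-contained sketch of the clustering stage (the entrywise squaring follows the post-processing described above; the tiny Lloyd k-means with farthest-point seeding is a stand-in for the NCut step, which in the experiments is the standard Normalized Cuts routine):

```python
import numpy as np

def lrr_affinity(Z):
    """Steps 2-4 of Algorithm 3: skinny SVD of Z, scale by Sigma^{1/2},
    then square entrywise so all affinities are nonnegative."""
    Uz, sz, _ = np.linalg.svd(Z)
    r = int(np.sum(sz > 1e-10 * max(sz[0], 1e-32)))
    M = Uz[:, :r] * np.sqrt(sz[:r])
    return (M @ M.T) ** 2

def _kmeans(F, k, iters=100):
    # deterministic farthest-point seeding followed by Lloyd iterations
    C = [F[0]]
    for _ in range(1, k):
        d2 = np.min([((F - c) ** 2).sum(1) for c in C], axis=0)
        C.append(F[int(np.argmax(d2))])
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(((F[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([F[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

def spectral_cluster(W, k):
    """Spectral relaxation of NCut on a symmetric affinity matrix W."""
    deg = W.sum(axis=1)
    dinv = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(W)) - (dinv[:, None] * W) * dinv[None, :]  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)
    F = vecs[:, :k]                                           # k smallest eigenvectors
    F = F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)
    return _kmeans(F, k)
```

For clean data from independent subspaces, Z* = V V^T is block diagonal, so this pipeline recovers the subspace labels exactly.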

4 Numerical experiments

In this section, we test the efficiency of our algorithm and compare it with some other algorithms. We implemented our algorithm on a PC with a 3.2 GHz AMD Ryzen 7 2700 processor and 16 GB of memory. All computations are done in Matlab 2016b, with a few routines written in C++. We compare our algorithm with three methods: LADMAP(A) [13], IRLS [14], and HMFALM [15]. The first method is based on model (2.4); it is faster than other SVD-based algorithms because it uses adaptive penalty adjustment to accelerate convergence and skinny SVD instead of full SVD, reducing the complexity from O(n^3) to O(rn^2), where r is the predicted rank of Z. IRLS smooths the objective function by introducing regularization terms and then solves for the variables alternately by weighted least squares. Although no singular value decomposition is required, the complexity of its matrix products is still O(n^3). During the solution process, the Matlab command lyap is used to solve a Sylvester equation (sometimes the solution of the equation is not unique, and the program terminates), but on some problems its number of iteration steps is smaller than that of LADMAP(A). HMFALM is based on the matrix factorization model (2.6), which needs no SVD and only matrix multiplications, so its complexity is O(rmn), where m is the dimension of the data. Its outer loop traverses r starting from 1 and increasing by step d; for each r, the inner loop iterates until the stopping condition is met, and this continues until the best rank interval is found and the optimal r is located one by one. HMFALM is faster than the first two algorithms, but it is very sensitive to the hyperparameter d, and its noise resistance is poor without a regularization term.

Our model adds the group norm regularization term to the matrix factorization model (2.6) and exploits the property of this regularizer that the factor matrix acquires zero columns, which adaptively reduces the rank. Although the rank starts to decrease from a large number K, it takes only a few steps to drop from a large rank to a small one. The numerical results show that our algorithm AALM converges in about ten iteration steps on (3.1). The stopping criterion in our numerical experiments is defined as

||X - X U^k (V^k)^T - E^k||_F / ||X||_F <= epsilon,

where epsilon is a moderately small number.

4.1 Experiments on synthetic data

We first compare ALM and AALM (before and after acceleration) on the synthetic data. For the inner iteration of ALM, we tried two stopping criteria: (1) the inner iteration stops after a fixed 5 steps; (2) the inner iteration stops when its own convergence condition is met.

The construction of the noisy synthetic data is the same as in [13], [10], [24], [15]. The specific procedure is as follows. First, we denote the number of subspaces by s, the number of basis vectors in each subspace by r, and the dimensionality of the data by d. For the first subspace we construct a basis B_1, a random orthogonal matrix of dimension d x r, while the basis of each subsequent subspace is obtained by applying a random rotation matrix T_i to the previous basis. This ensures that the subspaces are independent of each other and that the basis vectors within each subspace are linearly independent. Then, in the i-th subspace, we use the basis to generate p samples X_i = B_i C_i, where the entries of C_i in R^{r x p} are independent and identically distributed, obeying the standard normal distribution N(0, 1). Then we randomly select 20% of all data vectors to be contaminated: if the data vector x is drawn, we add noise according to the formula

x <- x + delta ||x||_2 n,

where n is a zero-mean unit-variance Gaussian noise vector and delta controls the noise intensity. Finally, we assemble the (partly contaminated) sample vectors into the data matrix X = [X_1, ..., X_s].
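The construction can be sketched as follows; the noise scaling delta*||x||*n is an assumption, since the exact formula is elided in the text above:

```python
import numpy as np

def make_synthetic(s, p, d, r, noise_frac=0.2, delta=0.3, seed=0):
    """Noisy multi-subspace data as described above: s subspaces of
    dimension r in R^d with p samples each; each later basis is a random
    rotation of the previous one.  A noise_frac fraction of the samples
    is corrupted as x <- x + delta*||x||*n; delta is an assumption, since
    the exact scaling is elided in the text."""
    rng = np.random.default_rng(seed)
    B = np.linalg.qr(rng.standard_normal((d, r)))[0]          # basis of subspace 1
    cols, labels = [], []
    for i in range(s):
        if i > 0:
            T = np.linalg.qr(rng.standard_normal((d, d)))[0]  # random rotation
            B = T @ B
        cols.append(B @ rng.standard_normal((r, p)))          # i.i.d. N(0,1) coefficients
        labels += [i] * p
    X = np.hstack(cols)
    n = s * p
    for j in rng.choice(n, size=int(noise_frac * n), replace=False):
        X[:, j] += delta * np.linalg.norm(X[:, j]) * rng.standard_normal(d)
    return X, np.array(labels)
```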


We choose various settings of (s, p, d, r) and generate synthetic data as described above. In Figure 1, ALM and AALM are compared, and the figure shows that the accelerated variant outperforms the unaccelerated one. The horizontal axis is a transformation of time; the vertical axis is the relative error. The purple line is ALM with the second inner criterion (each outer step iterates until the inner iteration converges). The green line is the inner iteration with a fixed five steps. The red line is the inner iteration with a fixed single step, and the blue line is the inner iteration with a single step plus deletion of the zero columns of each U^k. Comparing the blue line with the red line, we observe that deleting the zero columns validates our previous analysis: it has no effect on the convergence result, while it saves memory and speeds up the calculation. From Figure 1 as a whole, we can see that the inner iteration does not need to converge; even a single step suffices, which greatly reduces the computation time.

Figure 1: Comparison between ALM and accelerated ALM on synthetic data

From Table 1 to Table 3, we use LADMAP(A), HMFALM, and AALM separately to obtain the corresponding affinity matrices on the noisy synthetic data, and then adopt Algorithm 3 to perform clustering. We want to compare the noise resistance and the hyperparameter sensitivity of AALM and the competing algorithms. We test several noise intensities; within each table, the three groups of columns correspond to different choices of the hyperparameter lambda for LADMAP(A), IRLS, and HMFALM, and to corresponding choices of lambda and K for our algorithm. For the remaining parameters of IRLS and LADMAP(A), we use the optimal settings reported in the corresponding articles, and for HMFALM we use its search interval d with exact search. For each synthetic dataset, we run each algorithm three times and report the average.

(s,p,d,r) Method Time(s) Ite Acc(%) Time(s) Ite Acc(%) Time(s) Ite Acc(%)
(10,20,200,5) HMFALM 0.0687 106 53.50 0.2040 206 100.00 0.3253 245 98.50
LADM 0.4223 53 98.50 2.7070 258 100.00 13.986 1246 92.67
AALM 0.0493 10 100.00 0.0407 9 100.00 0.0373 9 99.67
(15,20,200,5) HMFALM 0.1073 73 44.33 0.4520 184 100.00 1.0693 259 87.67
LADM 0.9667 56 99.44 6.9257 311 100.00 34.369 1359 87.78
AALM 0.1147 10 100.00 0.0967 9 100.00 0.0900 9 99.89
(20,25,500,5) HMFALM 0.6620 102 87.07 1.8550 188 100.00 4.8747 282 81.00
LADM 3.4753 80 99.93 24.856 409 99.40 72.036 902 82.40
AALM 0.3883 10 100.00 0.3413 9 100.00 0.3087 9 100.00
(30,30,900,5) HMFALM 3.1897 126 99.52 9.9670 199 100.00 24.252 267 80.33
LADM 16.577 83 100.00 115.86 472 88.44 902.24 2598 82.74
AALM 1.8543 11 100.00 1.5953 10 100.00 1.3303 9 100.00
(35,40,1400,5) HMFALM 12.482 145 100.00 50.680 236 86.90 55.900 241 80.62
LADM 119.13 198 99.17 259.32 349 81.05 20961 17467 63.50
AALM 6.4487 13 99.81 5.0647 11 99.98 4.2353 9 100.00
(40,50,2000,5) HMFALM 36.148 146 100.00 124.23 231 80.55 120.70 230 80.60
LADM 462.56 314 86.95 498.80 292 82.50 3987.7 1633 80.52
AALM 17.832 16 99.55 15.046 12 99.97 11.490 9 100.00
Table 1: Numerical results on synthetic data ()
(s,p,d,r) Method Time(s) Ite Acc(%) Time(s) Ite Acc(%) Time(s) Ite Acc(%)
(10,20,200,5) HMFALM 0.0410 57 22.00 0.1907 192 100.00 0.5063 290 81.67
LADM 0.4153 55 97.17 2.4267 215 100.00 14.404 944 86.00
AALM 0.0513 10 100.00 0.0397 9 100.00 0.0410 9 99.83
(15,20,200,5) HMFALM 0.0860 56 20.33 0.4123 158 97.67 1.1663 264 80.22
LADM 0.8593 58 98.56 5.4410 229 97.56 38.563 957 84.56
AALM 0.1227 9 99.89 0.1130 9 100.00 0.1130 9 99.89
(20,25,500,5) HMFALM 0.4450 74 33.73 3.0117 204 95.73 4.7363 271 81.07
LADM 3.0857 69 99.80 32.630 509 89.00 178.75 1672 81.33
AALM 0.4253 10 99.40 0.3997 9 99.93 0.3607 9 100.00
(30,30,900,5) HMFALM 2.5263 108 90.30 23.785 255 80.81 23.144 255 80.56
LADM 24.673 106 97.63 162.50 542 81.89 499.30 1187 80.15
AALM 2.0857 13 98.52 1.7333 9 99.96 1.5700 9 100.00
(35,40,1400,5) HMFALM 22.388 184 85.21 53.265 227 80.48 53.220 229 80.57
LADM 237.34 316 89.95 3378.0 3267 81.38 1388.8 1206 80.76
AALM 7.0320 16 99.31 5.8153 12 99.88 4.2393 8 100.00
(40,50,2000,5) HMFALM 114.30 216 80.63 117.18 216 80.38 117.57 216 80.57
LADM 707.86 419 80.92 8509.0 3627 80.72 2686.7 1053 80.63
AALM 17.972 16 99.92 16.859 14 99.85 12.191 9 100.00
Table 2: Numerical results on synthetic data ()
(s,p,d,r) Method Time(s) Ite Acc(%) Time(s) Ite Acc(%) Time(s) Ite Acc(%)
(10,20,200,5) HMFALM 0.0483 67 25.17 0.2960 224 66.67 0.4907 279 81.67
LADM 0.7967 83 84.17 4.2673 383 88.50 36.212 1657 83.17
Ours 0.0507 9 95.67 0.0517 9 97.67 0.0610 9 97.83
(15,20,200,5) HMFALM 0.1247 79 36.00 0.8313 222 68.44 1.1647 253 80.89
LADM 2.7097 138 67.89 12.076 496 90.89 146.66 3193 82.00
Ours 0.1250 9 91.56 0.1230 9 95.56 0.1187 8 97.11
(20,25,500,5) HMFALM 1.2063 142 69.20 4.5427 260 81.20 4.6800 261 79.87
LADM 20.271 293 30.20 85.304 1017 84.67 295.25 2684 79.93
Ours 0.4310 10 91.53 0.3973 9 94.73 0.3627 8 95.60
(30,30,900,5) HMFALM 10.736 184 6.11 22.449 245 80.74 22.574 246 79.00
LADM 117.06 438 52.19 636.09 1536 83.44 1220.9 2795 80.59
Ours 2.1647 13 83.85 1.5993 9 91.96 1.4360 8 97.15
(35,40,1400,5) HMFALM 51.562 220 80.52 51.515 221 80.62 50.354 218 80.60
LADM 555.73 632 83.40 2605.8 2218 80.60 2830.0 2416 80.55
Ours 7.3887 16 89.86 5.8400 12 83.90 3.9590 8 96.45
(40,50,2000,5) HMFALM 114.17 208 79.98 113.02 207 80.45 111.18 207 80.38
LADM 7750.0 3451 81.95 9314.1 3654 80.55 7959.7 3065 79.72
Ours 18.338 16 95.75 15.982 14 85.65 10.763 8 92.70
Table 3: Numerical results on synthetic data ()

From Table 1 to Table 3, we can observe that our AALM algorithm achieves better computation speed and clustering accuracy than HMFALM and LADM on the synthetic data, requiring only about ten iteration steps. Furthermore, compared with the other two algorithms, our clustering results are essentially unchanged as the hyperparameters vary. As a result, although our model (3.1) has one more hyperparameter than (2.6), it is not sensitive to the hyperparameters, while the clustering results of the other two models are greatly affected by them. In addition, at high noise levels our clustering accuracy is the best, in some cases close to 20 percentage points higher than the other algorithms, so the GNRLRRFM model with the group norm regularization term has good noise immunity and is robust.

4.2 Experiments on real data

In this section, we test the clustering effectiveness of our algorithm in the Hopkins155 dataset [25] and Extended Yale B dataset [26].

The Hopkins155 dataset contains 156 data sequences; each sequence contains from 39 to 550 data vectors (from two or three motion modes), and the dimension of each data vector is 72 (24 frames x 3). We specify the number of classes (two or three) for each sequence and apply HMFALM, LADM, IRLS, and AALM to all 156 sequences to solve for the affinity matrix and cluster. In Table 4, we report the total accuracy, average iteration steps, and average time for the two-motion, three-motion, and all-motion cases. For HMFALM, LADM, and IRLS we select the optimal parameters tested by their authors in the corresponding articles; for the AALM algorithm we use our default parameters.

Problem Two Motions Three Motions All Motions
Time(s) Iter. Acc(%) Time(s) Iter. Acc(%) Time(s) Iter. Acc(%)
HMFALM 0.0509 123.54 96.62 0.0734 137.03 95.04 0.0561 126.65 96.14
LADM 74.160 28724 96.39 119.07 37564 95.72 84.525 30764 96.19
IRLS 36.054 189.48 97.15 74.160 181.89 95.90 43.335 187.72 96.77
AALM 0.0190 13.667 97.75 0.0231 13.889 96.62 0.0199 13.718 97.41
Table 4: Comparison of motion segmentation by LRR by using different solvers on the Hopkins155 dataset

As can be seen from table 4, our algorithm is faster than the other three algorithms, with the least number of iteration steps and the highest clustering accuracy.

Figure 2: Example face image from the Extended Yale B dataset

The Extended Yale B dataset contains 38 subjects (people), each with 64 face images. Figure 2 shows thirty images of one subject; the data contain lighting noise, so some faces cannot be seen clearly or are even dark. For instance, the fourth picture cannot be recognized even by a person. Similar to [14], we conduct two experiments by forming the first 5 subjects and the first 10 subjects into a dataset X. First, we resize all pictures to 32 x 32. Second, to reduce noise, we project the data onto a 30-dimensional subspace for the 5-subject clustering problem and a 60-dimensional subspace for the 10-subject problem by principal component analysis (PCA). Third, we apply HMFALM, LADM, IRLS, and AALM to solve the low-rank representation problem, obtaining different affinity matrices. Finally, we compare the spectral clustering results of Algorithm 3 with the different affinity matrices:

Problem 10 subjects 5 subjects
Time(s) Iter. Acc(%) Time(s) Iter. Acc(%)
HMFALM 0.4280 396 81.87 0.1060 336 88.44
LADM 98.7680 8430 81.56 15.8250 4324 88.44
IRLS 97.9000 107 81.87 16.3400 102 88.44
Ours 0.0720 16 81.87 0.0200 16 88.44
Table 5: Comparison of face clustering by LRR by using different solvers on the Extended Yale B

It can easily be seen that the clustering accuracy of the four algorithms is the same for the 5 subjects, but our algorithm AALM is the fastest. AALM, IRLS, and HMFALM achieve the same accuracy for the 10 subjects, and our algorithm is still the fastest. In summary, our algorithm achieves the best accuracy with the fastest computing speed on the real problems, namely Hopkins155 motion clustering and Extended Yale B face clustering.

5 Conclusion

In this paper, we propose a group norm regularized LRR factorization model based on the low-rank representation factor model, design an accelerated ALM (AALM) algorithm to obtain an affinity matrix, and then cluster the data by spectral clustering. For noisy synthetic data, our algorithm and model yield more accurate clustering results than both the traditional nuclear-norm-based LRR model and the low-rank factorization model without regularization; moreover, compared with the selected classic algorithms, our model is more robust, insensitive to parameters, and gives better clustering results. On the real data, Hopkins155 motion clustering and Extended Yale B face clustering, our algorithm achieves optimal clustering accuracy at a faster rate than the alternative algorithms. In a word, this paper proposes a group norm regularized LRR factorization model to solve for affinity matrices; compared with previous LRR models, the numerical experiments illustrate that the affinity matrix is obtained quickly and the clustering results are good.


  • [1] Xindong Wu, Vipin Kumar, J Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J McLachlan, Angus Ng, Bing Liu, S Yu Philip, et al. Top 10 algorithms in data mining. Knowledge and information systems, 14(1):1–37, 2008.
  • [2] Chi-hau Chen. Handbook of pattern recognition and computer vision. World Scientific, 2015.
  • [3] Newton Da Costa Jr, Jefferson Cunha, and Sergio Da Silva. Stock selection based on cluster analysis. Economics Bulletin, 13(1):1–9, 2005.
  • [4] Amit Saxena, Mukesh Prasad, Akshansh Gupta, Neha Bharill, Om Prakash Patel, Aruna Tiwari, Meng Joo Er, Weiping Ding, and Chin-Teng Lin. A review of clustering techniques and developments. Neurocomputing, 267:664–681, 2017.
  • [5] James MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, pages 281–297. Oakland, CA, USA, 1967.
  • [6] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. Departmental Papers (CIS), page 107, 2000.
  • [7] Cuimei Guo, Sheng Zheng, Yaocheng Xie, and Wei Hao. A survey on spectral clustering. In World Automation Congress 2012, pages 53–56. IEEE, 2012.
  • [8] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, volume 96, pages 226–231, 1996.
  • [9] James C Bezdek, Robert Ehrlich, and William Full. Fcm: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2-3):191–203, 1984.
  • [10] Guangcan Liu, Zhouchen Lin, and Yong Yu. Robust subspace segmentation by low-rank representation. In ICML, volume 1, page 8, 2010.
  • [11] Kim-Chuan Toh and Sangwoon Yun. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of optimization, 6(615-640):15, 2010.
  • [12] Zhouchen Lin, Minming Chen, and Yi Ma. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055, 2010.
  • [13] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In Advances in neural information processing systems, pages 612–620, 2011.
  • [14] Canyi Lu, Zhouchen Lin, and Shuicheng Yan. Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization. IEEE Transactions on Image Processing, 24(2):646–654, 2014.
  • [15] Baiyu Chen, Zi Yang, and Zhouwang Yang. An algorithm for low-rank matrix factorization and its applications. Neurocomputing, 275:1012–1020, 2018.
  • [16] Guangcan Liu, Zhouchen Lin, Shuicheng Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of subspace structures by low-rank representation. IEEE transactions on pattern analysis and machine intelligence, 35(1):171–184, 2012.
  • [17] Wei Siming and Lin Zhouchen. Analysis and improvement of low rank representation for subspace segmentation. arXiv preprint arXiv:1107.1561, 2011.
  • [18] João Paulo Costeira and Takeo Kanade. A multibody factorization method for independently moving objects. International Journal of Computer Vision, 29(3):159–179, 1998.
  • [19] Dennis L Sun and Cedric Fevotte. Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 6201–6205. IEEE, 2014.
  • [20] Yuan Shen, Zaiwen Wen, and Yin Zhang. Augmented lagrangian alternating direction method for matrix separation based on low-rank factorization. Optimization Methods and Software, 29(2):239–263, 2014.
  • [21] Yangyang Xu, Wotao Yin, Zaiwen Wen, and Yin Zhang. An alternating direction algorithm for matrix completion with nonnegative factors. Frontiers of Mathematics in China, 7(2):365–384, 2012.
  • [22] Zhaosong Lu and Yong Zhang. An augmented lagrangian approach for sparse principal component analysis. Mathematical Programming, 135(1-2):149–193, 2012.
  • [23] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
  • [24] Shijie Xiao, Wen Li, Dong Xu, and Dacheng Tao. Falrr: A fast low rank representation solver. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4612–4620, 2015.
  • [25] Roberto Tron and René Vidal. A benchmark for the comparison of 3-d motion segmentation algorithms. In 2007 IEEE conference on computer vision and pattern recognition, pages 1–8. IEEE, 2007.
  • [26] Athinodoros S Georghiades, Peter N Belhumeur, and David J Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6):643–660, 2001.