Robust Subspace Clustering via Smoothed Rank Approximation

08/18/2015 ∙ by Zhao Kang, et al. ∙ Southern Illinois University

Matrix rank minimization subject to affine constraints arises in many application areas, ranging from signal processing to machine learning. The nuclear norm is a convex relaxation for this problem which can recover the rank exactly under some restricted and theoretically interesting conditions. However, for many real-world applications, the nuclear norm approximation to the rank function can only produce a result far from the optimum. To seek a solution of higher accuracy than the nuclear norm, in this paper, we propose a rank approximation based on the Logarithm-Determinant. We consider using this rank approximation for the subspace clustering application. Our framework can model different kinds of errors and noise. An effective optimization strategy is developed with a theoretical guarantee of convergence to a stationary point. The proposed method gives promising results on face clustering and motion segmentation tasks compared to the state-of-the-art subspace clustering algorithms.


I Introduction

Recently there has been a surge of interest in finding a minimum-rank matrix within an affine constraint set [1, 2]. The problem is as follows,

$$\min_{X} \; \mathrm{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \tag{1}$$

where $X \in \mathbb{R}^{m \times n}$ is the unknown matrix, $\mathcal{A}$ is a linear mapping, and $b$ denotes the observations. Unfortunately, minimizing the rank of a matrix is known to be NP-hard and is a very challenging problem.

Consequently, a widely-used convex relaxation approach is to replace the rank function with the nuclear norm $\|X\|_* = \sum_{i=1}^{n} \sigma_i(X)$, where $\sigma_i(X)$ is the $i$-th singular value of $X$ (suppose $m \ge n$). The nuclear norm technique has been shown to be effective in encouraging a low-rank solution [3, 4]. Nevertheless, in many interesting circumstances there is no guarantee that the minimum nuclear norm solution coincides with that of minimal rank; whether it does depends heavily on the singular values of matrices in the nullspace of $\mathcal{A}$. Some variations of the nuclear norm have demonstrated promising results, e.g., singular value thresholding [5] and truncated nuclear norm [6].

Another popular alternative approach is to compute

$$\min_{X} \; F(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \tag{2}$$

where $F(X)$ is usually a nonconvex and nonsmooth function. It has been observed that the nonconvex approach can succeed in a broader range of scenarios [7]. However, nonconvex optimization is often challenging.

To overcome the above-mentioned difficulties, in this paper, we propose to use a particular log-determinant (LogDet) function to approximate the rank function. The formulation we consider is:

$$F(X) = \mathrm{logdet}(I + X^T X) = \sum_{i=1}^{n} \log\left(1 + \sigma_i^2(X)\right), \tag{3}$$

where $I \in \mathbb{R}^{n \times n}$ is an identity matrix. For large nonzero singular values, the LogDet function value will be much smaller than the nuclear norm. It is easy to show that $\mathrm{logdet}(I + X^T X) \le \|X\|_*$, since $\log(1 + \sigma^2) \le \sigma$ for every $\sigma \ge 0$. Therefore, LogDet is a tighter rank approximation function than the nuclear norm. Although a similar function has been proposed and iterative linearization has been used to find a local minimum [8], its regularization constant $\delta$ is required to be small, which leads to a significantly biased approximation for small singular values and thus limited applications. The smoothed Schatten-$p$ function, $\mathrm{Tr}(X^T X + \gamma I)^{p/2}$, has been well studied in matrix completion [9]; nonetheless, the resulting algorithm is rather sensitive to the parameter $\gamma$.
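As a rough numerical illustration of the comparison above, the following sketch evaluates the rank, the LogDet value $\sum_i \log(1 + \sigma_i^2)$ and the nuclear norm of a low-rank matrix. The function name `rank_surrogates` and the random test matrix are assumptions made for illustration; they do not come from the paper or its released code.

```python
import numpy as np

def rank_surrogates(X):
    """Return (rank, LogDet value, nuclear norm) of a matrix X."""
    s = np.linalg.svd(X, compute_uv=False)       # singular values of X
    logdet = float(np.sum(np.log1p(s ** 2)))     # logdet(I + X^T X) = sum_i log(1 + sigma_i^2)
    nuclear = float(np.sum(s))                   # nuclear norm = sum_i sigma_i
    rank = int(np.linalg.matrix_rank(X))
    return rank, logdet, nuclear

# Hypothetical test matrix of rank 3 with fairly large singular values.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
rank, logdet, nuclear = rank_surrogates(X)
print(rank, logdet, nuclear)
```

For such a matrix the LogDet value grows only logarithmically with the singular values, so it should stay much closer to the rank than the nuclear norm does.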

The main contributions of this work are as follows: (1) an efficient algorithm is devised to optimize the nonconvex objective function associated with LogDet; (2) our method improves the accuracy of subspace clustering over state-of-the-art methods on the benchmarks considered.

II Problem statement of subspace clustering

An important application of the proposed LogDet function is the low-rank representation based subspace clustering problem. There has been significant research effort on this subject over the past several years due to its promising applications in computer vision and machine learning [10]. Subspace clustering aims at finding a low-dimensional subspace for each group of points, which is based on the widely-used assumption that high-dimensional data actually reside in a union of multiple low-dimensional subspaces. Under such an assumption the data could be separated in a projected subspace. Consequently, subspace clustering mainly involves two tasks: first, projecting the data into a latent subspace to describe the affinities of points, and subsequently, grouping the data in that subspace. Some spectral clustering methods such as normalized cuts (NCuts) [11] are usually used in the second task to find the cluster membership. Besides this spectral clustering-based subspace clustering method, iterative, algebraic, and statistical methods are also available in the literature [10], but they are usually sensitive to initialization, noise or outliers.

Typical spectral clustering-based subspace clustering methods are Local Subspace Affinity (LSA) [12], Sparse Subspace Clustering (SSC) [13], Low Rank Representation (LRR) [2, 14] and its more robust variant LRSC [15, 16]. Among them, SSC and LRR give promising results even in the presence of large outliers or corruption [17, 18]. They both suppose that each data point can be written as a linear combination of other points in the dataset. SSC tries to find the sparsest representation of data points through the $\ell_1$-norm. Even when the subspaces overlap, SSC can successfully reveal the subspace structure [19]. However, SSC's solution is sometimes too sparse to form a fully connected affinity graph for data in a single subspace [20]. LRR uses the lowest-rank representation to depict the similarity among data points. It is theoretically guaranteed to succeed when the subspaces are independent.

Let $X = [x_1, \ldots, x_n] \in \mathbb{R}^{m \times n}$ store a set of $n$ $m$-dimensional samples drawn from a union of subspaces. LRR considers the following regularized nuclear norm rank minimization problem:

$$\min_{Z, E} \; \|Z\|_* + \lambda \|E\|_l \quad \text{s.t.} \quad X = XZ + E, \tag{4}$$

where $\lambda > 0$ is a parameter, $E$ represents unknown corruption, and $\|\cdot\|_l$ can be the $\ell_1$-norm, the $\ell_{2,1}$-norm, or the squared Frobenius norm. Specifically, if random corruption is assumed in the data, $\|E\|_1$ is usually adopted; $\|E\|_{2,1}$ is more suitable to characterize sample-specific corruptions and outliers; $\|E\|_F^2$ often describes Gaussian noise. LRR produces competitive performance on subspace clustering in the current literature. However, its solution might not be unique due to the nuclear norm [21]; furthermore, the nuclear norm surrogate can deviate far from the true rank function.

To better approximate the rank while possessing the desired robustness similar to LRR, in this paper, we propose to use the above-mentioned LogDet function and solve the following problem:

$$\min_{Z, E} \; \mathrm{logdet}(I + Z^T Z) + \lambda \|E\|_l \quad \text{s.t.} \quad X = XZ + E. \tag{5}$$

The objective function of (5) is nonconvex. We design an effective optimization strategy based on an augmented Lagrangian multiplier (ALM) method, which is potentially applicable to large-scale data because of the decomposability of ALM and its amenability to parallel algorithms. For our optimization method, we provide a theoretical analysis of its convergence, which mathematically guarantees that our algorithm produces a convergent subsequence whose limit is a stationary point of (5).

III Proposed method: CLAR

In this section, we present the proposed robust subspace clustering algorithm CLAR: Clustering with Log-determinant Approximation to Rank. The basic theorems and optimization algorithm will be presented below.

III-A Smoothed rank minimization

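To make the objective function in (5) separable, we first convert it to the following equivalent problem by introducing an auxiliary variable $J$:

$$\min_{E, J, Z} \; \mathrm{logdet}(I + J^T J) + \lambda \|E\|_l \quad \text{s.t.} \quad X = XZ + E, \; Z = J. \tag{6}$$

We can solve problem (6) using a type of ALM method. The corresponding augmented Lagrangian function is

$$L(E, J, Y_1, Y_2, Z, \mu) = \mathrm{logdet}(I + J^T J) + \lambda \|E\|_l + \mathrm{Tr}\left(Y_1^T (X - XZ - E)\right) + \mathrm{Tr}\left(Y_2^T (J - Z)\right) + \frac{\mu}{2}\left(\|X - XZ - E\|_F^2 + \|J - Z\|_F^2\right), \tag{7}$$

where $Y_1$ and $Y_2$ are Lagrange multipliers, and $\mu > 0$ is a penalty parameter. Then we can apply the alternating minimization idea to update one of the variables with the others fixed.

Given the current point $E^t$, $J^t$, $Z^t$, $Y_1^t$, $Y_2^t$, the updating scheme is:

$$Z^{t+1} = \arg\min_{Z} \; L(E^t, J^t, Y_1^t, Y_2^t, Z, \mu^t),$$

$$J^{t+1} = \arg\min_{J} \; L(E^t, J, Y_1^t, Y_2^t, Z^{t+1}, \mu^t),$$

$$E^{t+1} = \arg\min_{E} \; L(E, J^{t+1}, Y_1^t, Y_2^t, Z^{t+1}, \mu^t).$$

The first equation above has a closed-form solution:

$$Z^{t+1} = (I + X^T X)^{-1}\left[X^T (X - E^t) + J^t + \frac{X^T Y_1^t + Y_2^t}{\mu^t}\right]. \tag{8}$$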
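The closed-form update of $Z$ amounts to a single linear solve. The sketch below assumes the multiplier terms enter the augmented Lagrangian as $\mathrm{Tr}(Y_1^T(X - XZ - E)) + \mathrm{Tr}(Y_2^T(J - Z))$; with the opposite sign convention the signs of `Y1` and `Y2` would flip. The function name `update_Z` is a placeholder, not taken from the authors' code.

```python
import numpy as np

def update_Z(X, E, J, Y1, Y2, mu):
    """Minimize the augmented Lagrangian over Z with E, J, Y1, Y2 fixed.

    Solves (I + X^T X) Z = X^T (X - E) + J + (X^T Y1 + Y2) / mu,
    which is the stationarity condition under the assumed sign convention.
    """
    n = X.shape[1]
    rhs = X.T @ (X - E) + J + (X.T @ Y1 + Y2) / mu
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) + X.T @ X, rhs)
```

Since $I + X^T X$ does not change across iterations, a factorization of it could be computed once and reused, which is a design choice rather than something the text prescribes.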

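For updating $J$, it can be converted to scalar minimization problems due to the following theorem [22], which is also proved in the supplementary material.

Theorem 1.

If $F(Z)$ is a unitarily invariant function and the SVD of $A$ is $A = U \Sigma_A V^T$, then the optimal solution to the following problem

$$\min_{Z} \; F(Z) + \frac{\beta}{2}\|Z - A\|_F^2 \tag{9}$$

is $Z^*$ with SVD $Z^* = U \Sigma_Z^* V^T$, where $\Sigma_Z^* = \mathrm{diag}(\sigma^*)$; moreover, if $F(Z) = f \circ \sigma(Z)$, where $\sigma(Z)$ is the vector of nonincreasing singular values of $Z$, then $\sigma^*$ is obtained by using the Moreau-Yosida proximity operator $\sigma^* = \mathrm{prox}_{f,\beta}(\sigma_A)$, where $\sigma_A := \mathrm{diag}(\Sigma_A)$, and

$$\mathrm{prox}_{f,\beta}(\sigma_A) := \arg\min_{\sigma \ge 0} \; f(\sigma) + \frac{\beta}{2}\|\sigma - \sigma_A\|_2^2. \tag{10}$$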

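According to the first-order optimality condition, the gradient of the objective function of (10) with respect to each singular value should vanish. Thus we have

$$\frac{2\sigma_i^*}{1 + (\sigma_i^*)^2} + \beta\left(\sigma_i^* - \sigma_{i,A}\right) = 0, \quad \sigma_i^* \ge 0. \tag{11}$$

After clearing the denominator, the above equation is cubic and gives three roots. If $\sigma_{i,A} = 0$, the minimizer will be 0; otherwise, it can be shown that there is a unique minimizer if $\beta > 1/4$. To ensure this requirement is satisfied, we adopt in our experiments. Finally, we obtain the update of variable $J$ with

$$J^{t+1} = U \, \mathrm{diag}(\sigma_1^*, \ldots, \sigma_n^*) \, V^T, \tag{12}$$

where $U$ and $V$ come from the SVD of $A = Z^{t+1} - Y_2^t / \mu^t$ and $\beta = \mu^t$.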
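The scalar problem behind the $J$ update can be sketched as follows. Multiplying the stationarity condition $2\sigma/(1+\sigma^2) + \beta(\sigma - \sigma_A) = 0$ by $1 + \sigma^2$ gives the cubic $\beta\sigma^3 - \beta\sigma_A\sigma^2 + (\beta + 2)\sigma - \beta\sigma_A = 0$. The second derivative of the scalar objective is $2(1-\sigma^2)/(1+\sigma^2)^2 + \beta \ge \beta - 1/4$, which is why $\beta > 1/4$ yields a unique minimizer. The names `prox_logdet_scalar` and `update_J` are hypothetical, and the shift $Z - Y_2/\mu$ assumes the multiplier term $\mathrm{Tr}(Y_2^T(J - Z))$.

```python
import numpy as np

def prox_logdet_scalar(sigma_a, beta):
    """argmin over s >= 0 of log(1 + s^2) + beta/2 * (s - sigma_a)^2."""
    if sigma_a <= 0.0:
        return 0.0
    # Roots of beta*s^3 - beta*sigma_a*s^2 + (beta + 2)*s - beta*sigma_a = 0.
    roots = np.roots([beta, -beta * sigma_a, beta + 2.0, -beta * sigma_a])
    # Candidates: the boundary s = 0 and every real, nonnegative root.
    cands = [0.0] + [float(r.real) for r in roots
                     if abs(r.imag) < 1e-8 and r.real >= 0.0]

    def objective(s):
        return np.log1p(s * s) + 0.5 * beta * (s - sigma_a) ** 2

    return min(cands, key=objective)

def update_J(Z, Y2, mu):
    """J update: apply the scalar prox to each singular value of Z - Y2/mu."""
    U, s, Vt = np.linalg.svd(Z - Y2 / mu, full_matrices=False)
    s_new = np.array([prox_logdet_scalar(si, mu) for si in s])
    return (U * s_new) @ Vt   # U diag(s_new) V^T
```

Comparing objective values over all candidates, rather than trusting a single root, keeps the sketch usable even when $\beta \le 1/4$ and several real roots may exist.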

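Depending on different regularization strategies, we have different closed-form solutions for $E$. For the squared Frobenius norm,

$$E^{t+1} = \frac{Y_1^t + \mu^t (X - XZ^{t+1})}{\mu^t + 2\lambda}. \tag{13}$$

For the $\ell_1$-norm, according to [23], if we define $Q = X - XZ^{t+1} + \frac{Y_1^t}{\mu^t}$, then $E$ can be updated element-wise as:

$$E_{ij}^{t+1} = \mathrm{sign}(Q_{ij}) \max\left(|Q_{ij}| - \frac{\lambda}{\mu^t}, 0\right). \tag{14}$$

For the $\ell_{2,1}$-norm, by [24], we have

$$[E^{t+1}]_{:,i} = \begin{cases} \dfrac{\|Q_{:,i}\|_2 - \lambda/\mu^t}{\|Q_{:,i}\|_2} \, Q_{:,i}, & \text{if } \|Q_{:,i}\|_2 > \lambda/\mu^t, \\ 0, & \text{otherwise.} \end{cases} \tag{15}$$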
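The three error updates are standard proximal operators and can be sketched in a few lines. Here `Q` is assumed to be $X - XZ + Y_1/\mu$ (which follows from the multiplier term $\mathrm{Tr}(Y_1^T(X - XZ - E))$), and the function name `update_E` and the string labels for the norms are illustrative choices.

```python
import numpy as np

def update_E(Q, lam, mu, norm="l21"):
    """Closed-form minimizer of lam * ||E||_l + mu/2 * ||E - Q||_F^2."""
    if norm == "fro":
        # Squared Frobenius norm: a uniform rescaling of Q.
        return mu * Q / (mu + 2.0 * lam)
    if norm == "l1":
        # Element-wise soft thresholding with threshold lam / mu.
        return np.sign(Q) * np.maximum(np.abs(Q) - lam / mu, 0.0)
    if norm == "l21":
        # Column-wise shrinkage: columns with norm below lam / mu become zero.
        col = np.linalg.norm(Q, axis=0)
        scale = np.maximum(col - lam / mu, 0.0) / np.maximum(col, 1e-12)
        return Q * scale
    raise ValueError("norm must be 'fro', 'l1' or 'l21'")

# Small worked example with lam = mu = 1.
Q = np.array([[3.0, 0.1],
              [4.0, 0.1]])
print(update_E(Q, 1.0, 1.0, "l1"))   # entries shrink toward zero by 1
print(update_E(Q, 1.0, 1.0, "l21"))  # first column scaled by 0.8, second zeroed
```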

Input: data matrix $X$, parameters $\lambda$, $\mu^0$, $\rho$.
Initialize: , , .
REPEAT

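1:   Obtain $Z^{t+1}$ through (8).
2:   Update $J^{t+1}$ as (12).
3:   Solve $E^{t+1}$ by either (13), (14), or (15) according to $l$.
4:   Update the multipliers: $Y_1^{t+1} = Y_1^t + \mu^t (X - XZ^{t+1} - E^{t+1})$, $Y_2^{t+1} = Y_2^t + \mu^t (J^{t+1} - Z^{t+1})$.
5:   Update the parameter by $\mu^{t+1} = \rho \mu^t$.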

UNTIL stopping criterion is met.

Algorithm 1 Smoothed Rank Minimization

The complete procedure for solving problem (5) is summarized in Algorithm 1. Since our objective function is nonconvex, it is difficult to give a rigorous mathematical proof of convergence to a (local) optimum. As we show in the supplementary material, our algorithm converges to an accumulation point and this accumulation point is a stationary point. Our experiments confirm the convergence of the proposed method. The experimental results are promising, even though the solution obtained by the proposed optimization method may be only a local optimum.
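The overall loop of Algorithm 1 might look roughly like the sketch below, written for the $\ell_{2,1}$ error term. Several things here are assumptions rather than the authors' settings: the zero initialization, the default values of `lam`, `mu` and `rho`, the stopping rule based on the relative change of $Z$, the sign convention of the multiplier terms, and the use of bisection (valid when $\mu > 1/4$, where the scalar derivative is increasing) in place of solving the cubic.

```python
import numpy as np

def prox_logdet(a, beta, iters=60):
    """Vectorized bisection on [0, a] for 2s/(1+s^2) + beta*(s - a) = 0.

    Assumes beta > 1/4, so the left-hand side is increasing in s.
    """
    lo, hi = np.zeros_like(a), a.copy()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = 2.0 * mid / (1.0 + mid ** 2) + beta * (mid - a)
        lo = np.where(g < 0, mid, lo)   # root lies to the right of mid
        hi = np.where(g < 0, hi, mid)   # root lies to the left of mid
    return 0.5 * (lo + hi)

def clar_alm(X, lam=1.0, mu=0.5, rho=1.1, max_iter=100, tol=1e-6):
    """ALM sketch for min logdet(I + Z^T Z) + lam*||E||_{2,1} s.t. X = XZ + E."""
    n = X.shape[1]
    Z = np.zeros((n, n)); J = np.zeros((n, n)); Y2 = np.zeros((n, n))
    E = np.zeros_like(X); Y1 = np.zeros_like(X)
    A_mat = np.eye(n) + X.T @ X
    for _ in range(max_iter):
        Z_old = Z
        # Step 1: closed-form Z update (linear solve).
        Z = np.linalg.solve(A_mat, X.T @ (X - E) + J + (X.T @ Y1 + Y2) / mu)
        # Step 2: J update through the scalar prox on singular values.
        U, s, Vt = np.linalg.svd(Z - Y2 / mu, full_matrices=False)
        J = (U * prox_logdet(s, mu)) @ Vt
        # Step 3: E update by column-wise shrinkage.
        Q = X - X @ Z + Y1 / mu
        col = np.linalg.norm(Q, axis=0)
        E = Q * (np.maximum(col - lam / mu, 0.0) / np.maximum(col, 1e-12))
        # Steps 4 and 5: multipliers and penalty parameter.
        Y1 = Y1 + mu * (X - X @ Z - E)
        Y2 = Y2 + mu * (J - Z)
        mu *= rho
        if np.linalg.norm(Z - Z_old) <= tol * max(np.linalg.norm(Z_old), 1.0):
            break
    return Z, E

# Hypothetical data: 40 points drawn from two 2-dimensional subspaces of R^30.
rng = np.random.default_rng(0)
X = np.hstack([rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
               for _ in range(2)])
Z, E = clar_alm(X)
print(np.linalg.norm(X - X @ Z - E) / np.linalg.norm(X))
```

The printed quantity is the relative constraint violation, which should shrink as the penalty parameter grows.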

III-B Subspace segmentation

After we obtain the coefficient matrix $Z$, we consider constructing a similarity graph matrix $W$ from it, since postprocessing of the coefficient matrix often improves the clustering performance [13]. Using the angular information based technique in [14], we define $\tilde{U} = U \Sigma^{1/2}$, where $U$ and $\Sigma$ are from the skinny SVD of $Z = U \Sigma V^T$. Inspired by [25], we define $W$ as:

$$W_{ij} = \left(\frac{\tilde{u}_i^T \tilde{u}_j}{\|\tilde{u}_i\|_2 \, \|\tilde{u}_j\|_2}\right)^{2\alpha}, \tag{16}$$

where $\tilde{u}_i$ and $\tilde{u}_j$ stand for the $i$-th and $j$-th columns of $\tilde{U}^T$, and $\alpha$ tunes the sharpness of the affinity between two points. However, an excessively large $\alpha$ would break affinities between points of the same group. $\alpha = 2$ is used in our experiments, and thus we have the same post-processing procedure as LRR (see footnote 1). After obtaining $W$, we directly utilize NCuts to cluster the samples.

Footnote 1: As we confirmed with an author of [14], the power 2 of equation (12) in [14] is a typo, which should be 4.
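A possible implementation of this post-processing step is sketched below. It assumes the affinity is the normalized inner product between rows of $U\Sigma^{1/2}$ raised to the power $2\alpha$, with $\alpha = 2$ matching the fourth power mentioned in the footnote; the function name `affinity_from_Z` and the rank tolerance are illustrative assumptions.

```python
import numpy as np

def affinity_from_Z(Z, alpha=2, tol=1e-8):
    """Build a similarity matrix W from the coefficient matrix Z."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))              # skinny SVD: drop numerically zero singular values
    U_tilde = U[:, :r] * np.sqrt(s[:r])          # U * Sigma^{1/2}
    norms = np.linalg.norm(U_tilde, axis=1, keepdims=True)
    U_tilde = U_tilde / np.maximum(norms, 1e-12) # one unit-norm row per sample
    cosines = U_tilde @ U_tilde.T                # pairwise normalized inner products
    return np.abs(cosines) ** (2 * alpha)

# Toy block-diagonal Z: two groups of sizes 3 and 2.
Z = np.zeros((5, 5))
Z[:3, :3] = 1.0
Z[3:, 3:] = 1.0
W = affinity_from_Z(Z)
print(np.round(W, 3))
```

On the toy input the resulting $W$ is again block diagonal, which is the structure a spectral method such as NCuts would then use to recover the groups.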

Fig. 1: Sample face images in Extended Yale B.

IV Experiment

In this section, we apply CLAR to subspace clustering on two benchmark databases: the Extended Yale B database (EYaleB) [26] and the Hopkins 155 motion database [27]. CLAR is compared with the state-of-the-art subspace clustering algorithms: SSC, LSA, LRR, and LRSC. The segmentation error rate is used to evaluate the subspace clustering performance, which is defined to be the percentage of erroneously clustered samples out of the total number of samples in the data set being considered. The parameters are tuned to achieve the best performance. In general, when the corruptions or noise are slight, the value of $\lambda$ should be relatively large. For our two experiments, and are used. $\rho$ influences the convergence speed, and we adopt as often done in literature. For fair comparison, we follow the experimental settings in [13]. We stop the program when a maximum of 100 iterations or a relative difference of is reached. The experiments are implemented on an Intel Core i5 2.3GHz MacBook Pro 2011 with 4 GB of memory. The code is available at: https://github.com/sckangz/logdet.
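The segmentation error rate can be computed as sketched below. Because cluster labels are arbitrary, the sketch assumes the usual convention of scoring under the best one-to-one matching between predicted and true labels; the text does not spell this out, so it is an assumption, as is the function name `segmentation_error`.

```python
from itertools import permutations
import numpy as np

def segmentation_error(true_labels, pred_labels):
    """Percentage of misclustered samples under the best label matching."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    k = max(len(true_ids), len(pred_ids))
    # counts[i, j] = number of samples with predicted label i and true label j.
    counts = np.zeros((k, k), dtype=int)
    for i, p in enumerate(pred_ids):
        for j, t in enumerate(true_ids):
            counts[i, j] = np.sum((pred_labels == p) & (true_labels == t))
    # Brute-force search over label permutations (fine for a handful of clusters).
    best = max(sum(counts[i, perm[i]] for i in range(k))
               for perm in permutations(range(k)))
    return 100.0 * (1.0 - best / len(true_labels))

print(segmentation_error([0, 0, 1, 1], [1, 1, 0, 0]))  # labels swapped only
print(segmentation_error([0, 0, 1, 1], [0, 1, 1, 1]))  # one sample misplaced
```

The permutation search grows factorially with the number of clusters, so for ten subjects an assignment solver such as `scipy.optimize.linear_sum_assignment` would be a more practical choice.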

IV-A Face clustering

EYaleB consists of 2,414 frontal face images of 38 individuals under 64 lighting conditions. The task is to cluster these images into their individual subspaces by identity. EYaleB is challenging for subspace clustering due to large noise or corruptions, which can be seen from the sample images in Figure 1. As in [13], we model noise with $\|E\|_1$. Each image is resized to a 2016-dimensional vector. We divide the 38 subjects into four groups, i.e., 1 to 10, 11 to 20, 21 to 30, and 31 to 38, and consider all choices of $n \in \{2, 3, 5, 8\}$ subjects for each group and all choices of $n = 10$ subjects in the first three groups. There will be $\{163, 416, 812, 136, 3\}$ datasets for $n = 2, 3, 5, 8, 10$, respectively.
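The number of datasets per setting follows from counting subsets within each group. The short check below assumes that $n$ ranges over the subject counts reported in Table I (2, 3, 5, 8, 10) and that every subset of $n$ subjects inside a group forms one dataset.

```python
from math import comb

group_sizes = [10, 10, 10, 8]  # subjects 1-10, 11-20, 21-30, 31-38

def num_datasets(n):
    """Number of ways to pick n subjects inside a single group, summed over groups."""
    return sum(comb(size, n) for size in group_sizes if size >= n)

counts = {n: num_datasets(n) for n in (2, 3, 5, 8, 10)}
print(counts)  # {2: 163, 3: 416, 5: 812, 8: 136, 10: 3}
```

For example, $n = 2$ gives $3 \times \binom{10}{2} + \binom{8}{2} = 135 + 28 = 163$, and $n = 10$ is only possible in the three groups of ten subjects.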


Fig. 2: Examples of recovery results of face images. The three columns from left to right are the original image ($X$), the error matrix ($E$) and the recovered image ($XZ$), respectively.

Mean and median error rates for the datasets corresponding to each $n$ are reported in Table I. It can be seen that CLAR achieves the lowest mean error rate in every setting, often by a large margin. As more subjects are involved, the error rate of CLAR remains at a low level, while those of other methods increase drastically. In particular, in the most challenging case of 10 subjects, the mean clustering error rate of CLAR is 3.85%, which improves by 7.09 percentage points over the best competing result, provided by SSC (10.94%). This implies that our method is robust to in-sample outliers. In Table I, we also observe that the clustering error rates of LSA are much larger than those of other methods. This is potentially because LSA is based on MSE, which is heavily influenced by outliers. In addition, the advantage of our method is much more significant with respect to other low-rank representation based algorithms such as LRR and LRSC; for example, there is an improvement of about 11 and 19 percentage points in mean error over LRR in the cases of 8 and 10 subjects, respectively. This verifies the importance of good rank approximation.

Figure 2 shows some recovery results from the 10-subject clustering scenario. As we can see, the error term is indeed sparse and it helps remove the shadows.

Subjects   Statistic   LRR     SSC     LSA     LRSC    CLAR
2          Mean        2.54    1.86    32.80   5.32    1.27
2          Median      0.78    0.00    47.66   4.69    0.78
3          Mean        4.21    3.10    52.29   8.47    1.92
3          Median      2.60    1.04    50.00   7.81    1.56
5          Mean        6.90    4.31    58.02   12.24   2.64
5          Median      5.63    2.50    56.87   11.25   2.19
8          Mean        14.34   5.85    59.19   23.72   3.36
8          Median      10.06   4.49    58.59   28.03   3.03
10         Mean        22.92   10.94   60.42   30.36   3.85
10         Median      23.59   5.63    57.50   28.75   3.44

TABLE I: Clustering error rate (%) on the EYaleB dataset.

Fig. 3: Sample images in Hopkins 155 database. Trackers are denoted by different colors.

Iv-B Motion segmentation

In this subsection, we evaluate the robustness of CLAR on the motion segmentation problem, which is an important step in video sequence analysis. Given multiple image frames of a dynamic scene, motion segmentation is to cluster the points in those views according to the different motions undertaken by the moving objects. The Hopkins 155 motion database contains 155 video sequences along with features extracted and tracked in all frames for each sequence. Since the trajectories associated with each motion reside in a distinct affine subspace of dimension at most 3, every motion corresponds to a subspace. Figure 3 gives some sample images. is applied to model the noise.

Motions   Statistic          LRR    SSC    LSA     LRSC   CLAR
2         Mean               2.13   1.52   4.23    3.69   1.32
2         Median             0.00   0.00   0.56    0.29   0.00
3         Mean               4.03   4.40   7.02    7.69   2.60
3         Median             1.43   0.56   1.45    3.80   0.51
All       Mean               2.56   2.18   4.86    4.59   1.61
All       Median             0.00   0.00   0.89    0.60   0.00
All       Average time (s)   6.44   5.09   17.17   0.70   3.80

TABLE II: Segmentation error rate (%) and mean computational time (s) on the Hopkins 155 dataset.

Fig. 4: The influence of the parameter of CLAR on all 155 sequences of Hopkins 155.

Table II shows the clustering results on the Hopkins 155 dataset. CLAR achieves the lowest mean error rate in all cases and the lowest (or tied lowest) median. Specifically, the average clustering error rate is 1.32% for two motions and 2.60% for three motions. We also show the computational time in Table II. As we can see, our computational time is less than that of LRR, SSC and LSA, though more than that of LRSC. Figure 4 demonstrates the sensitivity of our algorithm to . It shows that the performance of CLAR is quite stable while varies in a pretty large range. We also test with values 1.05 and 1.2 which do not give much difference in error rate. Since our problem is nonconvex, we repeat the experiments using different random initializations and we can still get similar results after tuning the parameters. Thus, CLAR appears quite insensitive to initializations.

V Conclusion

In this paper, we study the matrix rank minimization problem with a log-determinant approximation. This surrogate approximates the rank function more tightly than the nuclear norm. As an application, we study its use for the robust subspace clustering problem. A minimization algorithm, based on a type of augmented Lagrangian multipliers method, is developed to optimize the associated nonconvex objective function. Experiments on face clustering and motion segmentation demonstrate the effectiveness and robustness of the proposed method, which shows superior performance when compared to the state-of-the-art subspace clustering methods.

Acknowledgment

This work is supported by US National Science Foundation grant IIS 1218712.

References

  • [1] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational mathematics, vol. 9, no. 6, pp. 717–772, 2009.
  • [2] G. Liu, Z. Lin, and Y. Yu, “Robust subspace segmentation by low-rank representation,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 663–670.
  • [3] M. Fazel, “Matrix rank minimization with applications,” Ph.D. dissertation, Stanford University, 2002.
  • [4] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM review, vol. 52, no. 3, pp. 471–501, 2010.
  • [5] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
  • [6] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He, “Fast and accurate matrix completion via truncated nuclear norm regularization,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 9, pp. 2117–2130, 2013.
  • [7] C. Lu, J. Tang, S. Y. Yan, and Z. Lin, “Generalized nonconvex nonsmooth low-rank minimization,” in IEEE International Conference on Computer Vision and Pattern Recognition.   IEEE, 2014.
  • [8] M. Fazel, H. Hindi, and S. P. Boyd, “Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices,” in American Control Conference, 2003. Proceedings of the 2003, vol. 3.   IEEE, 2003, pp. 2156–2162.
  • [9] K. Mohan and M. Fazel, “Iterative reweighted algorithms for matrix rank minimization,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 3441–3473, 2012.
  • [10] R. Vidal, “A tutorial on subspace clustering,” IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52–68, 2010.
  • [11] J. Shi and J. Malik, “Normalized cuts and image segmentation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 8, pp. 888–905, 2000.
  • [12] J. Yan and M. Pollefeys, “A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate,” in Computer Vision–ECCV 2006.   Springer, 2006, pp. 94–106.
  • [13] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 11, pp. 2765–2781, 2013.
  • [14] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 1, pp. 171–184, 2013.
  • [15] P. Favaro, R. Vidal, and A. Ravichandran, “A closed form solution to robust subspace estimation and clustering,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on.   IEEE, 2011, pp. 1801–1807.
  • [16] R. Vidal and P. Favaro, “Low rank subspace clustering (lrsc),” Pattern Recognition Letters, vol. 43, pp. 47–61, 2014.
  • [17] G. Liu, H. Xu, and S. Yan, “Exact subspace segmentation and outlier detection by low-rank representation,” in International Conference on Artificial Intelligence and Statistics, 2012, pp. 703–711.
  • [18] Y.-X. Wang and H. Xu, “Noisy sparse subspace clustering,” in Proceedings of The 30th International Conference on Machine Learning, 2013, pp. 89–97.
  • [19] M. Soltanolkotabi, E. J. Candes et al., “A geometric analysis of subspace clustering with outliers,” The Annals of Statistics, vol. 40, no. 4, pp. 2195–2238, 2012.
  • [20] B. Nasihatkon and R. Hartley, “Graph connectivity in sparse subspace clustering,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on.   IEEE, 2011, pp. 2137–2144.
  • [21] H. Zhang, Z. Lin, and C. Zhang, “A counterexample for the validity of using nuclear norm as a convex surrogate of rank,” in Machine Learning and Knowledge Discovery in Databases.   Springer, 2013, pp. 226–241.
  • [22] Z. Kang, C. Peng, J. Cheng, and Q. Cheng, “Logdet rank minimization with application to subspace clustering,” Computational Intelligence and Neuroscience, vol. 2015, 2015.
  • [23] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
  • [24] J. Yang, W. Yin, Y. Zhang, and Y. Wang, “A fast algorithm for edge-preserving variational multichannel image restoration,” SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 569–592, 2009.
  • [25] F. Lauer and C. Schnorr, “Spectral clustering of linear subspaces for motion segmentation,” in Computer Vision, 2009 IEEE 12th International Conference on.   IEEE, 2009, pp. 678–685.
  • [26] K.-C. Lee, J. Ho, and D. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 5, pp. 684–698, 2005.
  • [27] R. Tron and R. Vidal, “A benchmark for the comparison of 3-d motion segmentation algorithms,” in Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on.   IEEE, 2007, pp. 1–8.