Multiframe motion segmentation is an important yet challenging problem in computer vision. Given multiple image frames of a dynamic scene taken by a (possibly moving) camera, the task is to segment the point correspondences in those views into the different motions undertaken by the moving objects. A more formal definition of the problem appears below.
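Consider a dynamic scene consisting of $K$ rigid-body motions undertaken by $K$ objects relative to a moving camera. Suppose that $F$ frames of images have been taken by the camera, and that $N$ feature points $y_1, \ldots, y_N \in \mathbb{R}^3$ are detected on the objects. Let $z_{ij} \in \mathbb{R}^2$ be the coordinates of the feature point $y_j$ in the $i$-th image frame for every $1 \le i \le F$ and $1 \le j \le N$, and form $N$ trajectory vectors: $z_j = [z_{1j}^\top \; z_{2j}^\top \; \cdots \; z_{Fj}^\top]^\top \in \mathbb{R}^{2F}$. The task is to separate these trajectories $z_1, \ldots, z_N$ into independent motions undertaken by those objects.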
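As a concrete illustration of how the trajectory vectors are assembled, the following NumPy sketch stacks the per-frame image coordinates of each tracked point into one vector whose length is twice the number of frames. The input layout `(frames, points, 2)` and the function name `build_trajectories` are assumptions made for this example; they are not the storage format of any particular dataset.

```python
import numpy as np

def build_trajectories(tracks):
    """Stack 2-D image coordinates into one trajectory vector per feature point.

    tracks: array of shape (F, N, 2); tracks[i, j] holds the image coordinates
            of feature point j in frame i (assumed layout).
    Returns an array of shape (N, 2F) whose j-th row is the trajectory of point j.
    """
    tracks = np.asarray(tracks, dtype=float)
    F, N, _ = tracks.shape
    # Put the point axis first, then flatten (frame, coordinate) into 2F entries.
    return tracks.transpose(1, 0, 2).reshape(N, 2 * F)

# Toy input: F = 3 frames, N = 2 points, coordinates filled with 0..11.
tracks = np.arange(12, dtype=float).reshape(3, 2, 2)
Z = build_trajectories(tracks)
```

Each row of `Z` is then one data point for the subspace clustering step.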
There has been significant research on this subject over the past few years (see [19, 16] for a comprehensive literature review). Depending on the assumed camera model, these algorithms fall into two categories, affine methods and perspective methods; the affine methods are the ones considered here.
Affine methods [5, 15, 14, 17, 20, 9, 19, 3] assume an affine projection model, so that the trajectories associated with each motion live in an affine subspace of dimension at most three (or a linear subspace of dimension at most four containing the affine subspace). Thus, the motion segmentation problem is equivalent to a subspace clustering problem. State-of-the-art affine algorithms that have been applied to this problem include Random Sample Consensus (RANSAC) [5, 15], Multi-Stage Learning (MSL) [14], Generalized Principal Component Analysis (GPCA) [17, 9, 19], Local Subspace Affinity (LSA) [20], and Agglomerative Lossy Compression (ALC) [8, 11].
An extensive benchmark for comparing the performance of these algorithms is the Hopkins 155 Database [16]. It contains 155 video sequences, along with features extracted and tracked in all frames of each sequence; 120 of the sequences have two motions and the remaining 35 have three motions.
In this paper we examine the performance of a recent affine method, Spectral Curvature Clustering (SCC) [3, 2], on the Hopkins 155 database and compare it with other affine algorithms that are mentioned above (their results have been reported in [16, 11] and also partly online at http://www.vision.jhu.edu/data/hopkins155/).
Our experiments show that SCC outperforms all the above-mentioned affine algorithms on this benchmark dataset, with an average classification error of 1.41% for two motions and 4.85% for three motions. In contrast, the smallest average misclassification rate among all other affine methods is 2.40% for sequences containing two motions and 6.26% for sequences with three motions, both achieved by ALC [11].
2 Review of the SCC algorithm
The SCC algorithm [3, Algorithm 2] takes as input a data set $X = \{x_1, \ldots, x_N\}$, which is sampled from a mixture of affine subspaces in the Euclidean space $\mathbb{R}^D$ and possibly corrupted with noise and outliers. The number of the subspaces, $K$, and the maximum of their dimensions, $d$, should also be provided by the user. (By using only the maximal dimension we treat all the subspaces as $d$-dimensional. This strategy works quite well in many cases, as demonstrated in [3].) The output of the algorithm is a partition of the data into $K$ (disjoint) clusters, $C_1, \ldots, C_K$, representing the affine subspaces.
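The initial step of the SCC algorithm is to randomly select from the data $c$ subsets of (distinct) points with a fixed size $d+1$. Based on these $c$ $(d+1)$-tuples, an affinity matrix $A \in \mathbb{R}^{N \times c}$ is formed in the following way. Let $J_1, \ldots, J_c$ be the index sets of the $c$ subsets. Then for each $1 \le r \le c$ and $1 \le i \le N$, if $i \in J_r$, we set $A(i, r) = 0$ by default; otherwise, we form the corresponding union $I = \{i\} \cup J_r$ and define
$$A(i, r) = e^{-c_p^2(I) / (2\sigma^2)}, \qquad (1)$$
in which $\sigma > 0$ is a fixed constant whose automatic choice is explained later, and $c_p^2(I)$ is the (squared) polar curvature [3] of the corresponding $d+2$ points, $\{x_j\}_{j \in I}$. That is,
$$c_p^2(I) = \operatorname{diam}^2(I) \cdot \frac{1}{d+2} \sum_{j \in I} \frac{\left((d+1)!\right)^2 \operatorname{vol}_{d+1}^2(I)}{\prod_{l \in I,\, l \ne j} \|x_l - x_j\|^2}, \qquad (2)$$
where $\operatorname{diam}(I)$ and $\operatorname{vol}_{d+1}(I)$ denote the diameter and the $(d+1)$-dimensional volume of the simplex with vertices $\{x_j\}_{j \in I}$.
Note that the numerator is, up to a factor, the (squared) volume of the $(d+1)$-simplex formed by the $d+2$ points $\{x_j\}_{j \in I}$. Therefore, the polar curvature can be thought of as being the volume of the simplex, normalized at each vertex, averaged over the vertices, and then scaled by the diameter of the simplex. When the $d+2$ points are sampled from the same subspace, we expect the polar curvature to be close to zero and consequently the affinity close to one. On the other hand, when they are sampled from mixed subspaces, the polar curvature is expected to be large and the affinity close to zero.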
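The verbal description of the polar curvature can be turned into a short NumPy sketch. The exact constants are an assumption here: the code takes the squared curvature to be the squared diameter times the average, over the vertices, of the squared simplex volume (scaled by the factorial of the simplex dimension) divided by the product of squared edge lengths at that vertex, and it uses a Gaussian-type affinity with scale `sigma`. The function names are hypothetical.

```python
import math
import numpy as np

def squared_polar_curvature(points):
    """Squared polar curvature of the rows of `points` (assumed form, see lead-in)."""
    P = np.asarray(points, dtype=float)
    m = P.shape[0]                                   # number of vertices
    diffs = P[:, None, :] - P[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)             # pairwise squared distances
    # The Gram determinant of the edge vectors from vertex 0 equals
    # ((m-1)! * volume)^2 for the simplex spanned by the m points.
    E = P[1:] - P[0]
    gram_det = max(float(np.linalg.det(E @ E.T)), 0.0)
    total = 0.0
    for j in range(m):
        edge_prod = np.prod(np.delete(sq_dist[j], j))
        if edge_prod == 0.0:                         # repeated point: skip this vertex
            continue
        total += gram_det / edge_prod
    return float(sq_dist.max() * total / m)

def affinity_matrix(X, tuples, sigma):
    """A[i, r] = exp(-curvature / (2 sigma^2)) for point i joined with tuple r."""
    X = np.asarray(X, dtype=float)
    A = np.zeros((X.shape[0], len(tuples)))
    for r, J in enumerate(tuples):
        members = set(J)
        for i in range(X.shape[0]):
            if i in members:
                continue                             # entry stays 0 by default
            curv = squared_polar_curvature(X[[i, *J]])
            A[i, r] = math.exp(-curv / (2.0 * sigma ** 2))
    return A

# Three collinear points plus one point off the line; one sampled pair (lines, so pairs).
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
A = affinity_matrix(X, [(0, 1)], sigma=1.0)
```

In this toy case the collinear point gets affinity one with the sampled pair, while the off-line point forms a right triangle with it and gets a strictly smaller affinity.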
The SCC algorithm next forms pairwise weights $W$ from the above multi-way affinities:
$$W = A A^\top, \qquad (3)$$
and applies spectral clustering [10] to find $K$ clusters $C_1, \ldots, C_K$.
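A minimal sketch of this step is given below, assuming the pairwise weights are the product of the affinity matrix with its transpose and assuming the commonly used normalized-eigenvector recipe for spectral clustering (degree normalization, top eigenvectors, row normalization, then k-means). The exact normalization used inside SCC may differ, and the small deterministic k-means here is only a stand-in for a library implementation.

```python
import numpy as np

def spectral_embedding(A, K):
    """Embed N points into R^K from an (N, c) multi-way affinity matrix A."""
    A = np.asarray(A, dtype=float)
    W = A @ A.T                                        # pairwise weights
    degrees = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degrees, 1e-12))
    Z = inv_sqrt[:, None] * W * inv_sqrt[None, :]      # degree-normalized weights
    _, eigvecs = np.linalg.eigh(Z)                     # eigenvalues in ascending order
    U = eigvecs[:, -K:]                                # top-K eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)                # unit-length rows

def kmeans(U, K, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialization (deterministic)."""
    centers = [U[0]]
    for _ in range(1, K):
        d2 = np.min([np.sum((U - ctr) ** 2, axis=1) for ctr in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels

# Toy affinities: points 0-2 agree with tuples 0-1, points 3-5 with tuples 2-3.
A_demo = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3, dtype=float)
labels = kmeans(spectral_embedding(A_demo, K=2), K=2)
```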
In order to refine the clusters, SCC then re-samples $c/K$ $(d+1)$-tuples from each of the clusters $C_k$, $1 \le k \le K$, and re-applies the rest of the steps. This procedure is repeated until convergence to obtain the best segmentation, and is referred to as iterative sampling (see [3, Sect. 3.1.1]). Its convergence is measured by the total orthogonal least squares (OLS) error of $d$-dimensional affine subspace approximations $F_1, \ldots, F_K$ to the clusters $C_1, \ldots, C_K$:
$$e_{\mathrm{OLS}} = \sqrt{\sum_{k=1}^{K} \sum_{x \in C_k} \operatorname{dist}^2(x, F_k)}. \qquad (4)$$
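The total OLS error can be computed from singular values: for each cluster, the squared distance to its best-fit affine subspace of a given dimension is the energy of the centered data outside the leading principal directions. The sketch below assumes the error is reported as the square root of the summed squared distances; the function name `ols_error` is hypothetical.

```python
import numpy as np

def ols_error(X, labels, d):
    """Total orthogonal least squares error of d-dimensional affine fits to the clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        C = X[labels == k]
        centered = C - C.mean(axis=0)                 # best-fit affine subspace passes through the mean
        s = np.linalg.svd(centered, compute_uv=False)
        total += float(np.sum(s[d:] ** 2))            # energy outside the top-d directions
    return float(np.sqrt(total))

# Two lines in the plane: a correct labeling has zero error, a mixed one does not.
X_lines = [[0, 0], [1, 0], [2, 0], [0, 1], [0, 2], [0, 3]]
err_good = ols_error(X_lines, [0, 0, 0, 1, 1, 1], d=1)
err_bad = ols_error(X_lines, [0, 1, 0, 1, 0, 1], d=1)
```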
In situations where the ground truth labels of the data points are known, we also compute the misclassification rate:
$$e_\% = \frac{\#\text{ of misclassified points}}{N} \cdot 100\%. \qquad (5)$$
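Because cluster labels returned by a segmentation algorithm are arbitrary, a practical implementation of the misclassification rate has to match predicted clusters to ground-truth clusters first. The sketch below assumes the rate is taken as the minimum over all such matchings, which is feasible by brute force for two or three motions; this matching rule and the function name are assumptions of the example.

```python
from itertools import permutations

def misclassification_rate(true_labels, pred_labels):
    """Percentage of misclassified points, minimized over relabelings of the predicted clusters."""
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    if len(pred_ids) > len(true_ids):
        raise ValueError("more predicted clusters than ground-truth clusters")
    n = len(true_labels)
    best_errors = n
    # Try every injective assignment of predicted cluster ids to true cluster ids.
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        errors = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] != t)
        best_errors = min(best_errors, errors)
    return 100.0 * best_errors / n

rate_swapped = misclassification_rate([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
rate_one_off = misclassification_rate([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 1])
```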
The parameter $\sigma$ of Eq. (1) is automatically selected by SCC at each iteration in the following way. Let $\mathbf{c}$ denote the vector of all the $(N-d-1) \cdot c$ squared polar curvatures computed in an arbitrarily fixed iteration, sorted in increasing order. The algorithm applies the following set of candidate values, which represent several scales of the curvatures:
$$\{\mathbf{c}(N \cdot c / K^q) \mid q = 1, \ldots, d+1\}, \qquad (6)$$
and chooses the one for which the error of Eq. (4) is minimized. A quantitative derivation of the above selection criterion for $\sigma$ appears in [3, Section 3.1.2]. It is also demonstrated in [3] that SCC will often fail with arbitrary choices of $\sigma$.
We present (a simplified version of) the SCC algorithm below (in Algorithm 1). We note that the storage requirement of the algorithm is $O(N \cdot (D + c))$, and the total running time is $O(n_s \cdot (d+1)^2 \cdot D \cdot N \cdot c)$, where $n_s$ is the number of sampling iterations performed (till convergence, typically $O(d)$).
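The overall control flow of iterative sampling can be sketched as an outer loop with pluggable steps. This is not the authors' Algorithm 1: the segmentation step and the error measure are passed in as callables, the stopping rule (stop as soon as the error no longer decreases) is an assumption, and the names `resample_tuples` and `scc_outer_loop` are hypothetical.

```python
import numpy as np

def resample_tuples(labels, K, d, c, rng):
    """Draw about c/K index tuples of size d+1 from within each current cluster."""
    labels = np.asarray(labels)
    per_cluster = max(c // K, 1)
    tuples = []
    for k in range(K):
        members = np.flatnonzero(labels == k)
        if members.size < d + 1:
            continue                                   # cluster too small to supply a tuple
        for _ in range(per_cluster):
            tuples.append(tuple(rng.choice(members, size=d + 1, replace=False)))
    return tuples

def scc_outer_loop(X, K, d, c, segment, error, max_iter=10, seed=0):
    """Sample tuples, segment, re-sample within clusters, stop when the error stalls.

    segment(X, tuples) -> labels and error(X, labels) -> float are supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    N = len(X)
    # Initial step: tuples are drawn from the whole data set.
    tuples = [tuple(rng.choice(N, size=d + 1, replace=False)) for _ in range(c)]
    best_labels, best_err = None, np.inf
    for _ in range(max_iter):
        labels = segment(X, tuples)
        err = error(X, labels)
        if err >= best_err:
            break                                      # no improvement: treat as converged
        best_labels, best_err = labels, err
        tuples = resample_tuples(labels, K, d, c, rng)
        if not tuples:
            break
    return best_labels, best_err

# Stub steps, only to exercise the control flow.
fixed = np.array([0, 0, 0, 1, 1, 1])
demo_tuples = resample_tuples(fixed, K=2, d=1, c=4, rng=np.random.default_rng(0))
best_labels, best_err = scc_outer_loop(
    np.zeros((6, 2)), K=2, d=1, c=4,
    segment=lambda X, tuples: fixed,
    error=lambda X, labels: 1.0,
)
```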
We compare the SCC algorithm with other state-of-the-art affine methods, namely ALC [8, 11], GPCA [17, 9, 19], LSA [20], MSL [14], and RANSAC [5, 15], using the Hopkins 155 benchmark [16]. We also compare the performance of affine methods with an oracle, the Reference algorithm (REF) [16], which fits subspaces using the ground truth clusters and re-assigns each point to its nearest subspace. Though it cannot be used in practice, REF verifies the validity of the affine camera model and provides a basis for comparison among practical algorithms. The results of the latter six methods (including REF) are already published in [19, 11], so we simply copy them from there.
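The idea behind the reference oracle can be sketched in a few lines: fit one subspace per ground-truth cluster, then assign every point to the closest fitted subspace. The sketch below fits affine subspaces by SVD of the centered cluster; whether the published REF results use affine or linear fits, and of which dimension, is not stated here, so those choices and the function names are assumptions.

```python
import numpy as np

def fit_affine_subspace(C, d):
    """Return the mean and an orthonormal basis (d rows) of the best-fit affine subspace."""
    mean = C.mean(axis=0)
    _, _, Vt = np.linalg.svd(C - mean, full_matrices=False)
    return mean, Vt[:d]

def distance_to_subspace(X, mean, basis):
    """Orthogonal distance of each row of X to the affine subspace mean + span(basis)."""
    centered = X - mean
    residual = centered - (centered @ basis.T) @ basis
    return np.linalg.norm(residual, axis=1)

def reference_labels(X, true_labels, d):
    """Fit subspaces to the ground-truth clusters, then re-assign points to the nearest one."""
    X = np.asarray(X, dtype=float)
    true_labels = np.asarray(true_labels)
    ids = np.unique(true_labels)
    dists = np.column_stack([
        distance_to_subspace(X, *fit_affine_subspace(X[true_labels == k], d))
        for k in ids
    ])
    return ids[dists.argmin(axis=1)]

# Two lines in the plane; the oracle reproduces the ground-truth labels.
X_ref = [[1, 0], [2, 0], [3, 0], [0, 1], [0, 2], [0, 3]]
ref = reference_labels(X_ref, [0, 0, 0, 1, 1, 1], d=1)
```

On real data the re-assignment can disagree with the ground truth wherever the fitted subspaces come close to each other, which is why such an oracle can have a nonzero error.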
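[Table 1: summary information for each category of the Hopkins 155 database (number of sequences, average number of tracked features, and average number of frames), given separately for sequences with 2 motions and with 3 motions.]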
The Hopkins 155 database contains sequences with two and three motions, and consists of three categories of motions (see Figure 1 for a sample image in each category and Table 1 for some summary information of each category, e.g., number of sequences, average number of tracked features, and average number of frames):
Checkerboard: this category consists of 104 sequences of indoor scenes taken with a handheld camera under controlled conditions.
Traffic: this category consists of 38 sequences of outdoor traffic scenes taken by a moving handheld camera.
Other (Articulated/Non-rigid): this category contains 13 sequences displaying motions constrained by joints, head and face motions, people walking, etc.
It is proved (e.g., in ) that the trajectory vectors associated with each motion live in a distinct affine subspace of dimension $d \le 3$ (or a linear subspace of dimension $d \le 4$ containing the affine subspace). Also, it is possible to cluster the trajectories either in the full space $\mathbb{R}^{2F}$ ($F$ is the number of frames) or in some projected space (after dimensionality reduction by PCA), e.g., $\mathbb{R}^{4K}$ ($K$ is the number of motions) or $\mathbb{R}^{d+1}$. Thus, we will apply the SCC algorithm (Algorithm 1) to each of the 155 motion sequences to segment $d$-dimensional subspaces in $\mathbb{R}^D$ in six ways: $(d, D) = (3, 4), (3, 4K), (3, 2F), (4, 5), (4, 4K), (4, 2F)$. Each case is correspondingly represented by the shorthand SCC $(d, D)$.
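The dimensionality reduction mentioned above can be sketched with an SVD-based PCA. The sketch centers the data before projecting, which preserves affine subspace structure; whether the experiments center the trajectories is an assumption, as is the function name `pca_project`.

```python
import numpy as np

def pca_project(Z, D):
    """Project the rows of Z onto their top-D principal directions."""
    Z = np.asarray(Z, dtype=float)
    centered = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:D].T

# Toy trajectories of rank 2 embedded in 8 dimensions, projected to 4 dimensions.
rng = np.random.default_rng(0)
Z_full = rng.standard_normal((10, 2)) @ rng.standard_normal((2, 8))
Y = pca_project(Z_full, D=4)
```

Since the toy data has rank two, projecting to four dimensions loses nothing and pairwise distances are preserved; for real trajectories the projection discards the low-variance directions.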
We use the default value of the sampling parameter $c$ for all SCC $(d, D)$ variants when applied to the 155 sequences. Also, in order to mitigate the randomness due to initial sampling, we repeat the experiment 100 times and record only the average misclassification rate. For each SCC $(d, D)$, we report in Table 2 the mean and median of the averaged errors for sequences with two motions, and in Table 3 the results for three motions. Figure 2 shows histograms of the misclassification rates, giving the percentage of sequences in which each algorithm achieved a certain error. The corresponding histograms for other methods are shown in [19, Figure 3].
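The reporting protocol amounts to two levels of aggregation: first average the misclassification rate over the repeated runs of each sequence, then take the mean and median of those per-sequence averages. A minimal sketch follows; the numbers are toy values chosen for the example, not results from the benchmark, and the function name is hypothetical.

```python
import numpy as np

def summarize_errors(errors):
    """errors[s, t] is the misclassification rate (%) of run t on sequence s.

    Returns the mean and the median, over sequences, of the run-averaged errors.
    """
    per_sequence = np.asarray(errors, dtype=float).mean(axis=1)
    return float(per_sequence.mean()), float(np.median(per_sequence))

# Toy input: 3 sequences, 2 runs each.
mean_err, median_err = summarize_errors([[0.0, 2.0], [1.0, 1.0], [10.0, 30.0]])
```

The gap between the mean and the median in this toy case illustrates why both are reported: a single hard sequence can dominate the mean.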
Looking at Tables 2 and 3, we conclude that the SCC algorithm (with all six pairs $(d, D)$) outperforms all competing methods (in terms of the mean error) and is very close to the reference algorithm (REF). In the checkerboard category, it even performs better than REF. In addition, SCC has the following two strengths in comparison with most other affine methods. First, as we observed in experiments, the performance of SCC is not very sensitive to its free parameter $c$. In contrast, the ALC algorithm is very sensitive to its distortion parameter $\varepsilon$ and often gives an incorrect number of clusters, which requires running it for many choices of $\varepsilon$ without any theoretical guarantee. Second, SCC can be directly applied to the original trajectory vectors (which are very high dimensional), so preprocessing of the trajectories, i.e., dimensionality reduction, is not necessary (unlike GPCA and LSA). Finally, we remark that SCC also outperforms some perspective methods, e.g., Local Linear Manifold Clustering (LLMC) [6] (their misclassification rates are also available at http://www.vision.jhu.edu/data/hopkins155/).
The histograms (in Figure 2) show that the SCC algorithm obtains a perfect segmentation for 80% of two-motion sequences and for over 50% of three-motion sequences. Under this criterion, SCC is at least comparable to the best algorithms (ALC, LSA, MSL) and the reference algorithm (REF); see [11, Figure 4] and [19, Figure 3]. Moreover, SCC has the shortest tails; its worst-case segmentation error (about 35%) is much smaller than those of other methods, some of which are as large as 50%.
Regarding running time, the SCC algorithm generally takes 1 to 2 seconds to process one sequence on a compute server with two dual-core AMD Opteron 64-bit 280 processors (2.4 GHz) and 8 GB of RAM. It is much faster than the best competitors such as ALC, LSA, and MSL (see their computation times in [11, Table 6] and [16, Tables 3 & 5], noting also that those experiments were all performed on faster machines).
At the time of finalizing this version we found out about the very recent affine method of Sparse Subspace Clustering (SSC) [4], which reportedly has superb results on the Hopkins 155 database and outperforms the results reported here for both SCC and REF. It will be interesting to test its sensitivity to its tuning parameter in future work.
We thank the anonymous reviewers for their helpful comments and for pointing out reference  to us. Special thanks go to Rene Vidal for encouraging comments when this manuscript was still at an early stage and for referring us to the workshop. Thanks to the Institute for Mathematics and its Applications (IMA), in particular Doug Arnold and Fadil Santosa, for an effective hot-topics workshop on multi-manifold modeling that we participated in. The research described in this paper was partially supported by NSF grants #0612608 and #0915064.
- [1] G. Chen, S. Atev, and G. Lerman. Kernelized spectral curvature clustering (KSCC). In ICCV Workshop on Dynamical Vision, 2009.
- [2] G. Chen and G. Lerman. Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Found. Comput. Math., 2009. DOI 10.1007/s10208-009-9043-7.
- [3] G. Chen and G. Lerman. Spectral curvature clustering (SCC). Int. J. Comput. Vision, 81(3):317–330, 2009.
- [4] E. Elhamifar and R. Vidal. Sparse subspace clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
- [5] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24(6):381–395, June 1981.
- [6] A. Goh and R. Vidal. Segmenting motions of different types by unsupervised manifold clustering. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–6, June 2007.
- [7] R. Hartley and R. Vidal. The multibody trifocal tensor: motion segmentation from 3 perspective views. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 769–775, 2004.
- [8] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy coding and compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9):1546–1562, September 2007.
- [9] Y. Ma, A. Y. Yang, H. Derksen, and R. Fossum. Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Review, 50(3):413–458, 2008.
- [10] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, pages 849–856, 2001.
- [11] S. Rao, R. Tron, R. Vidal, and Y. Ma. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
- [12] S. Rao, A. Yang, S. Sastry, and Y. Ma. Robust algebraic segmentation of mixed rigid-body and planar motions. Submitted to International Journal of Computer Vision, 2008.
- [13] K. Schindler, J. U, and H. Wang. Perspective n-view multibody structure-and-motion through model selection. In Proc. of the 9th European Conference on Computer Vision, volume 1, pages 606–619, 2006.
- [14] Y. Sugaya and K. Kanatani. Geometric structure of degeneracy for multi-body motion segmentation. In Lecture Notes in Computer Science, volume 3247/2004, chapter Statistical Methods in Video Processing, pages 13–25. Springer Berlin / Heidelberg, 2004.
- [15] P. H. S. Torr. Geometric motion segmentation and model selection. Phil. Trans. R. Soc. Lond. A, 356:1321–1340, 1998.
- [16] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
- [17] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12), 2005.
- [18] R. Vidal, Y. Ma, S. Soatto, and S. Sastry. Two-view multibody structure from motion. Int. J. Comput. Vis., 68(1):7–25, June 2006.
- [19] R. Vidal, R. Tron, and R. Hartley. Multiframe motion segmentation with missing data using powerfactorization and GPCA. Int. J. Comput. Vis., 79:85–105, 2008.
- [20] J. Yan and M. Pollefeys. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and nondegenerate. In ECCV, volume 4, pages 94–106, 2006.