Motion Segmentation by SCC on the Hopkins 155 Database

09/09/2009
by G. Chen, et al.
University of Minnesota

We apply the Spectral Curvature Clustering (SCC) algorithm to a benchmark database of 155 motion sequences, and show that it outperforms all other state-of-the-art methods. The average misclassification rate by SCC is 1.41% for sequences having two motions and 4.85% for sequences having three motions.


1 Introduction

Multiframe motion segmentation is a very important yet challenging problem in computer vision. Given multiple image frames of a dynamic scene taken by a (possibly moving) camera, the task is to segment the point correspondences in those views into the different motions undertaken by the moving objects. A more formal definition of the problem appears below.

Problem 1.

Consider a dynamic scene consisting of $K$ rigid-body motions undertaken by $K$ objects relative to a moving camera. Suppose that $F$ frames of images have been taken by the camera, and that $N$ feature points are detected on the objects. Let $(x_{ij}, y_{ij})$ be the coordinates of the $i$-th feature point in the $j$-th image frame, for every $1 \le i \le N$ and $1 \le j \le F$, and form $N$ trajectory vectors:

$\mathbf{z}_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, \ldots, x_{iF}, y_{iF})^\top \in \mathbb{R}^{2F}, \quad 1 \le i \le N.$

The task is to separate these $N$ trajectories into $K$ groups corresponding to the independent motions undertaken by those objects.
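The stacking of tracked coordinates into trajectory vectors can be sketched in a few lines (Python/NumPy; the function name and array layout are illustrative, not from the paper):

```python
import numpy as np

def trajectory_vectors(x, y):
    """Stack tracked image coordinates into trajectory vectors.

    x, y: arrays of shape (N, F) -- x[i, j], y[i, j] are the coordinates of
    feature point i in frame j.  Returns an array of shape (N, 2F) whose i-th
    row interleaves (x_i1, y_i1, ..., x_iF, y_iF).
    """
    N, F = x.shape
    Z = np.empty((N, 2 * F))
    Z[:, 0::2] = x  # x-coordinates in the even slots
    Z[:, 1::2] = y  # y-coordinates in the odd slots
    return Z
```

Each row of the returned matrix is one trajectory vector in $\mathbb{R}^{2F}$, ready to be clustered.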

There has been significant research on this subject over the past few years (see [19, 16] for a comprehensive literature review). According to the assumption on the camera model, those algorithms can be divided into the following two categories:

  1. Affine methods [5, 15, 14, 17, 20, 9, 19, 3] assume an affine projection model, so that the trajectories associated with each motion live in an affine subspace of dimension at most three (or a linear subspace of dimension at most four containing the affine subspace). Thus, the motion segmentation problem is equivalent to a subspace clustering problem. State-of-the-art affine algorithms that have been applied to this problem include Random Sample Consensus (RANSAC) [5, 15], Multi-Stage Learning (MSL) [14], Generalized Principal Component Analysis (GPCA) [17, 9, 19], Local Subspace Affinity (LSA) [20], and Agglomerative Lossy Compression (ALC) [8, 11].

  2. Perspective methods [7, 13, 18, 6, 12, 1] assume a perspective projection model under which point trajectories associated with each moving object lie on a multilinear variety. However, clustering multilinear varieties is a challenging task and very limited research has been done in this direction.

An extensive benchmark for comparing the performance of these algorithms is the Hopkins 155 Database [16]. It contains 155 video sequences, along with features extracted and tracked in all frames of each sequence; 120 of the sequences have two motions and the remaining 35 sequences have three motions.

In this paper we examine the performance of a recent affine method, Spectral Curvature Clustering (SCC) [3, 2], on the Hopkins 155 database and compare it with other affine algorithms that are mentioned above (their results have been reported in [16, 11] and also partly online at http://www.vision.jhu.edu/data/hopkins155/).

Our experiments show that SCC outperforms all the above-mentioned affine algorithms on this benchmark dataset with an average classification error of 1.41% for two motions and 4.85% for three motions. In contrast, the smallest average misclassification rate among all other affine methods is 2.40% for sequences containing two motions and 6.26% for sequences with three motions, both achieved by ALC [11].

The rest of the paper is organized as follows. We first briefly review the SCC algorithm in Section 2, and then test in Section 3 the SCC algorithm against other common affine methods on the Hopkins 155 database. Finally, Section 4 concludes with a brief discussion.

2 Review of the SCC algorithm

The SCC algorithm [3, Algorithm 2] takes as input a data set $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$, which is sampled from a mixture of $K$ affine subspaces in the Euclidean space $\mathbb{R}^D$ and possibly corrupted with noise and outliers. The number $K$ of the subspaces and the maximum $d$ of their dimensions should also be provided by the user (by using only the maximal dimension we treat all the subspaces as $d$-dimensional; this strategy works quite well in many cases, as demonstrated in [3]). The output of the algorithm is a partition of the data into $K$ disjoint clusters, $C_1, \ldots, C_K$, representing the $K$ affine subspaces.

The initial step of the SCC algorithm is to randomly select from the data $c$ subsets of (distinct) points with a fixed size $d+1$. Based on these $(d+1)$-tuples, an affinity matrix $A \in \mathbb{R}^{N \times c}$ is formed in the following way. Let $J_1, \ldots, J_c$ be the index sets of the $c$ subsets. Then for each point index $1 \le j \le N$ and subset index $1 \le i \le c$, if $j \in J_i$, we set $A(j, i) = 0$ by default; otherwise, we form the corresponding union $\{\mathbf{x}_j\} \cup \{\mathbf{x}_k : k \in J_i\}$ of $d+2$ points and define

$A(j, i) = \exp\!\left(-\frac{c_p^2(\mathbf{x}_j; J_i)}{2\sigma^2}\right),$  (1)

in which $\sigma$ is a fixed constant whose automatic choice is explained later, and $c_p^2(\mathbf{x}_j; J_i)$ is the (squared) polar curvature [3] of the corresponding $d+2$ points, $\mathbf{z}_1, \ldots, \mathbf{z}_{d+2}$. That is,

$c_p^2(\mathbf{z}_1, \ldots, \mathbf{z}_{d+2}) = \operatorname{diam}^2(\{\mathbf{z}_1, \ldots, \mathbf{z}_{d+2}\}) \cdot \frac{1}{d+2} \sum_{i=1}^{d+2} \frac{\left((d+1)!\, V_{d+1}\right)^2}{\prod_{j \ne i} \|\mathbf{z}_j - \mathbf{z}_i\|^2},$  (2)

where $V_{d+1}$ denotes the $(d+1)$-dimensional volume of the simplex spanned by the $d+2$ points. Note that the numerator is, up to a factor, the (squared) volume of the $(d+1)$-simplex formed by the points $\mathbf{z}_1, \ldots, \mathbf{z}_{d+2}$. Therefore, the polar curvature can be thought of as being the volume of the simplex, normalized at each vertex, averaged over the vertices, and then scaled by the diameter of the simplex. When the $d+2$ points are sampled from the same subspace, we expect the polar curvature to be close to zero and consequently the affinity close to one. On the other hand, when they are sampled from mixed subspaces, the polar curvature is expected to be large and the affinity close to zero.
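Based on the verbal description above (simplex volume normalized at each vertex, averaged over the vertices, and scaled by the diameter), one plausible NumPy sketch of the squared polar curvature is:

```python
import math
import numpy as np

def squared_polar_curvature(Z):
    """Squared polar curvature of the d+2 rows of Z (each a point in R^D):
    simplex volume normalized at each vertex, averaged over the vertices,
    and scaled by the squared diameter.  A sketch of our reading of the
    formula, not the authors' implementation."""
    m = Z.shape[0]            # m = d + 2 points spanning a (d+1)-simplex
    d = m - 2
    # (d+1)-dimensional volume of the simplex via the Gram determinant
    E = Z[1:] - Z[0]          # edge vectors from vertex 0
    vol = math.sqrt(max(np.linalg.det(E @ E.T), 0.0)) / math.factorial(d + 1)
    # pairwise distances and diameter
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    diam = dist.max()
    # squared "polar sine" at each vertex: (d+1)! * vol over the product
    # of the edge lengths meeting at that vertex
    psin2 = []
    for i in range(m):
        prod = np.prod(dist[i][np.arange(m) != i])
        psin2.append(((math.factorial(d + 1) * vol / prod) ** 2)
                     if prod > 0 else 0.0)
    return diam ** 2 * float(np.mean(psin2))
```

As the text predicts, the value is (numerically) zero when the $d+2$ points lie on a common $d$-dimensional affine subspace and positive otherwise.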

The SCC algorithm next forms pairwise weights from the above multi-way affinities:

$W = A \cdot A^\top \in \mathbb{R}^{N \times N},$  (3)

where $A$ is the affinity matrix above, and applies spectral clustering [10] to the weights $W$ to find $K$ clusters $C_1, \ldots, C_K$.
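Assuming the pairwise weights of Eq. (3) are formed as the product of the affinity matrix with its transpose, the spectral clustering step of [10] can be sketched in plain NumPy (with a simple deterministic k-means in the embedded space; all names are illustrative):

```python
import numpy as np

def spectral_clusters(W, K, n_iter=50):
    """Cluster N points into K groups from an N x N weight matrix W, along
    the lines of Ng-Jordan-Weiss: top-K eigenvectors of the normalized
    matrix D^{-1/2} W D^{-1/2}, row-normalized, then k-means."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)               # eigenvalues ascending
    U = vecs[:, -K:]                          # top-K eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # deterministic farthest-point seeding, then Lloyd iterations
    centers = [U[0]]
    for _ in range(1, K):
        dists = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels
```

On a weight matrix with clear block structure, the returned labels recover the blocks (up to a relabeling).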

In order to refine the clusters, SCC then re-samples $(d+1)$-tuples from each of the found clusters $C_1, \ldots, C_K$, and re-applies the rest of the steps. This procedure is repeated until convergence to a best segmentation, and is referred to as iterative sampling (see [3, Sect. 3.1.1]). Its convergence is measured by the total orthogonal least squares (OLS) error of the $d$-dimensional affine subspace approximations $L_1, \ldots, L_K$ to the clusters $C_1, \ldots, C_K$:

$e_{\mathrm{OLS}} = \sqrt{\sum_{k=1}^{K} \sum_{\mathbf{x} \in C_k} \operatorname{dist}^2(\mathbf{x}, L_k)}.$  (4)
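Reading the OLS error of Eq. (4) as the root of the summed squared orthogonal distances of each cluster's points to its best-fit $d$-dimensional affine subspace (an assumption consistent with the description above), the computation can be sketched as:

```python
import numpy as np

def ols_error(clusters, d):
    """Total OLS error of d-dimensional affine approximations to the clusters.

    clusters: list of arrays, each of shape (n_k, D).  For each cluster, the
    best-fit d-dimensional affine subspace passes through the cluster mean
    and is spanned by the top-d principal directions; the residual is the
    energy in the remaining directions."""
    total = 0.0
    for C in clusters:
        centered = C - C.mean(axis=0)
        # singular values of the centered cluster: the top d directions give
        # the OLS affine fit, the rest is orthogonal residual energy
        s = np.linalg.svd(centered, compute_uv=False)
        total += np.sum(s[d:] ** 2)
    return np.sqrt(total)
```

The error vanishes exactly when every cluster lies on a $d$-dimensional affine subspace, which is what drives the iterative sampling toward a consistent segmentation.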

In situations where the ground truth labels of the data points are known, we also compute the misclassification rate:

$e_{\%} = \frac{\#\text{ of misclassified points}}{\#\text{ of all points}} \times 100\%.$  (5)
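A sketch of the misclassification rate of Eq. (5), minimizing over relabelings of the predicted clusters (the permutation convention is our assumption; the benchmark's own evaluation code may differ):

```python
from itertools import permutations

import numpy as np

def misclassification_rate(pred, truth):
    """Percentage of misclassified points, minimized over all relabelings of
    the predicted clusters (cluster labels are arbitrary up to permutation)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    K = int(max(pred.max(), truth.max())) + 1
    best = len(pred)
    for perm in permutations(range(K)):
        relabeled = np.array([perm[p] for p in pred])
        best = min(best, int((relabeled != truth).sum()))
    return 100.0 * best / len(pred)
```

Exhaustive search over permutations is fine here since the number of motions is at most three.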

The parameter $\sigma$ of Eq. (1) is automatically selected by SCC at each iteration in the following way. Let $\bar{\mathbf{c}}$ denote the vector of all the squared polar curvatures computed in an arbitrarily fixed iteration. The algorithm tries a set of candidate values for $\sigma$ which represent several scales of the sorted curvatures in $\bar{\mathbf{c}}$, and chooses the one for which the error $e_{\mathrm{OLS}}$ of Eq. (4) is minimized. A quantitative derivation of the above selection criterion for $\sigma$ appears in [3, Section 3.1.2]. It is also demonstrated in [3] that SCC will often fail with arbitrary choices of $\sigma$.

We present (a simplified version of) the SCC algorithm below (in Algorithm 1). We note that the storage requirement and the total running time of the algorithm are given in [3]; the running time scales linearly with the number $n_s$ of sampling iterations performed till convergence, which is typically small.

0:  Data set $X$, maximal intrinsic dimension $d$, and number of subspaces $K$ (required); number of sampled subsets $c$ (optional; a default value is used)
0:  $K$ disjoint clusters $C_1, \ldots, C_K$. Steps:
1:  Sample randomly $c$ subsets of $X$ (with index sets $J_1, \ldots, J_c$), each containing $d+1$ distinct points.
2:  For each sampled subset, compute the squared polar curvature of it together with each of the remaining points in $X$ by Eq. (2). Sort increasingly these squared curvatures into a vector $\bar{\mathbf{c}}$.
3:  for each candidate value of $\sigma$ do: form the matrix $A$ by setting $\sigma$ in Eq. (1), and estimate the weights $W$ via Eq. (3); apply spectral clustering [10] to these weights and find a partition of the data into $K$ clusters; end for. Record the partition that has the smallest total OLS error, i.e., $e_{\mathrm{OLS}}$ of Eq. (4), for the corresponding $d$-dimensional affine subspaces.
4:  Sample subsets of points (of size $d+1$) from each cluster $C_k$ found above and repeat Steps 2 and 3 to find $K$ newer clusters. Iterate until convergence to obtain a best segmentation.
Algorithm 1 Spectral Curvature Clustering (SCC)
Figure 1: A sample image from each of the three categories in the Hopkins 155 database.

3 Results

We compare the SCC algorithm with other state-of-the-art affine methods, namely ALC [8, 11], GPCA [17, 9, 19], LSA [20], MSL [14], and RANSAC [5, 15], using the Hopkins 155 benchmark [16]. We also compare the performance of the affine methods with an oracle, the Reference algorithm (REF) [16], which fits subspaces using the ground-truth clusters and then reassigns each point to its nearest subspace. Though it cannot be used in practice, REF verifies the validity of the affine camera model and provides a basis for comparison among the practical algorithms. The results of the latter six methods (including REF) have already been published in [19, 11], so we simply copy them from there.

               2 motions                3 motions
           # Seq.  Points  Frames   # Seq.  Points  Frames
Checker.     78     291      28       26     437      28
Traffic      31     241      30        7     332      31
Other        11     155      40        2     122      31
All         120     266      30       35     398      29
Table 1: Summary information of the Hopkins 155 database: number of sequences (# Seq.), average number of feature points (Points), and average number of frames (Frames) in each category, for two motions and three motions separately.

The Hopkins 155 database contains sequences with two and three motions, and consists of three categories of motions (see Figure 1 for a sample image in each category and Table 1 for some summary information of each category, e.g., number of sequences, average number of tracked features, and average number of frames):

  • Checkerboard: this category consists of 104 sequences of indoor scenes taken with a handheld camera under controlled conditions.

  • Traffic: this category consists of 38 sequences of outdoor traffic scenes taken by a moving handheld camera.

  • Other (Articulated/Non-rigid): this category contains 13 sequences displaying motions constrained by joints, head and face motions, people walking, etc.

It is proved (e.g., in [9]) that the trajectory vectors associated with each motion live in a distinct affine subspace of dimension at most three (or a linear subspace of dimension at most four containing the affine subspace). Also, it is possible to cluster the trajectories either in the full space $\mathbb{R}^{2F}$ ($F$ is the number of frames) or in some projected space (after dimensionality reduction by PCA), whose dimension may depend on the number of motions. Thus, we will apply the SCC algorithm (Algorithm 1) to each of the 155 motion sequences to segment $d$-dimensional subspaces in $\mathbb{R}^D$ for six choices of the pair $(d, D)$. Each case is correspondingly represented by the shorthand SCC$(d, D)$.
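The PCA dimensionality reduction used before clustering can be sketched as follows (illustrative helper, not the authors' code):

```python
import numpy as np

def pca_project(Z, D):
    """Project trajectory vectors (the rows of Z, each in R^{2F}) onto the
    top-D principal directions, as a preprocessing option before clustering."""
    centered = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:D].T      # shape (N, D)
```

When the centered trajectories already lie in a $D$-dimensional subspace, the projection is lossless, which is why clustering in the projected space can match clustering in the full space.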

            Checkerboard      Traffic           Other             All
            mean    median    mean    median    mean    median    mean    median
ALC 5       2.66%   0.00%     2.58%   0.25%     6.90%   0.88%     3.03%   0.00%
ALC sp      1.55%   0.29%     1.59%   1.17%    10.70%   0.95%     2.40%   0.43%
GPCA        6.09%   1.03%     1.41%   0.00%     2.88%   0.00%     4.59%   0.38%
LSA 5       8.84%   3.43%     2.15%   1.00%     4.66%   1.28%     6.73%   1.99%
LSA         2.57%   0.27%     5.43%   1.48%     4.10%   1.22%     3.45%   0.59%
MSL         4.46%   0.00%     2.23%   0.00%     7.23%   0.00%     4.14%   0.00%
RANSAC      6.52%   1.75%     2.55%   0.21%     7.25%   2.64%     5.56%   1.18%
REF         2.76%   0.49%     0.30%   0.00%     1.71%   0.00%     2.03%   0.00%
SCC         2.99%   0.39%     1.20%   0.32%     7.71%   3.67%     2.96%   0.42%
SCC         1.76%   0.01%     0.46%   0.16%     4.06%   1.69%     1.63%   0.06%
SCC         1.77%   0.00%     0.63%   0.14%     4.02%   2.13%     1.68%   0.07%
SCC         2.31%   0.25%     0.71%   0.26%     5.05%   1.08%     2.15%   0.27%
SCC         1.30%   0.04%     1.07%   0.44%     3.68%   0.67%     1.46%   0.16%
SCC         1.31%   0.06%     1.02%   0.26%     3.21%   0.76%     1.41%   0.10%
Table 2: Misclassification rates for sequences with two motions. ALC 5 and ALC sp respectively represent ALC with projection dimension 5 and a sparsity-preserving projection dimension, LSA means applying LSA in the projected space (after dimensionality reduction), and REF refers to the reference algorithm.
            Checkerboard      Traffic           Other             All
            mean    median    mean    median    mean    median    mean    median
ALC 5       7.05%   1.02%     3.52%   1.15%     7.25%   7.25%     6.26%   1.02%
ALC sp      5.20%   0.67%     7.75%   0.49%    21.08%  21.08%     6.69%   0.67%
GPCA       31.95%  32.93%    19.83%  19.55%    16.85%  16.85%    28.66%  28.26%
LSA 5      30.37%  31.98%    27.02%  34.01%    23.11%  23.11%    29.28%  31.63%
LSA         5.80%   1.77%    25.07%  23.79%     7.25%   7.25%     9.73%   2.33%
MSL        10.38%   4.61%     1.80%   0.00%     2.71%   2.71%     8.23%   1.76%
RANSAC     25.78%  26.01%    12.83%  11.45%    21.38%  21.38%    22.94%  22.03%
REF         6.28%   5.06%     1.30%   0.00%     2.66%   2.66%     5.08%   2.40%
SCC         7.72%   3.21%     0.52%   0.28%     8.90%   8.90%     6.34%   2.36%
SCC         6.00%   2.22%     1.78%   0.42%     5.65%   5.65%     5.14%   1.67%
SCC         6.23%   1.70%     1.11%   1.40%     5.41%   5.41%     5.16%   1.58%
SCC         5.56%   2.03%     1.01%   0.47%     8.97%   8.97%     4.85%   2.01%
SCC         5.68%   2.96%     2.35%   2.07%    10.94%  10.94%     5.31%   2.40%
SCC         6.31%   1.97%     3.31%   3.31%     9.58%   9.58%     5.90%   1.99%
Table 3: Misclassification rates for sequences with three motions. ALC 5 and ALC sp respectively represent ALC with projection dimension 5 and a sparsity-preserving projection dimension, LSA means applying LSA in the projected space (after dimensionality reduction), and REF refers to the reference algorithm.

We use the default parameter value for all SCC variants when applied to the 155 sequences. Also, in order to mitigate the effect of randomness due to the initial sampling, we repeat the experiment 100 times and record the average misclassification rate. For each SCC variant, we report in Table 2 the mean and median of the averaged errors for the sequences with two motions, and in Table 3 the corresponding results for three motions. Figure 2 shows histograms of the misclassification rates, i.e., the percentage of sequences in which each algorithm achieved a given error. The corresponding histograms for the other methods are shown in [19, Figure 3].

4 Discussion

(a) two motions
(b) three motions
Figure 2: Histograms of misclassification errors obtained by SCC.

Looking at Tables 2 and 3, we conclude that the SCC algorithm (with all six parameter pairs $(d, D)$) outperforms all competing methods (in terms of the mean error) and is very close to the reference algorithm (REF). In the checkerboard category, it even performs better than REF. In addition, SCC has the following two strengths in comparison with most other affine methods. First, as we observed in our experiments, the performance of SCC is not very sensitive to its free parameter. In contrast, the ALC algorithm is very sensitive to its distortion parameter and often outputs an incorrect number of clusters, so that one must run it for many choices of that parameter while having no theoretical guarantee. Second, SCC can be directly applied to the original trajectory vectors (which are very high dimensional); thus preprocessing of the trajectories, i.e., dimensionality reduction, is not necessary (unlike for GPCA and LSA). Finally, we remark that SCC also outperforms some perspective methods, e.g., Local Linear Manifold Clustering (LLMC) [6] (their misclassification rates are also available at http://www.vision.jhu.edu/data/hopkins155/).

The histograms (in Figure 2) show that the SCC algorithm obtains a perfect segmentation for 80% of the two-motion sequences and for over 50% of the three-motion sequences. Under this criterion, SCC is at least comparable to the best competing algorithms (ALC, LSA, MSL) and the reference algorithm (REF); see [11, Figure 4] and [19, Figure 3]. Moreover, SCC has the shortest tails; its worst-case segmentation error (about 35%) is much smaller than those of the other methods, some of which are as large as 50%.

Regarding running time, the SCC algorithm generally takes 1 to 2 seconds to process one sequence on a compute server with two dual-core AMD Opteron 64-bit 280 processors (2.4 GHz) and 8 GB of RAM. It is much faster than its best competitors, such as ALC, LSA, and MSL (see their computation times in [11, Table 6] and [16, Tables 3 & 5], while also noting that these were all performed on faster machines).

At the time of finalizing this version we learned of the very recent affine method of Sparse Subspace Clustering (SSC) [4], which reportedly has superb results on the Hopkins 155 database and outperforms the results reported here for both SCC and REF. It will be interesting to test its sensitivity to its tuning parameter in future work.

Acknowledgements

We thank the anonymous reviewers for their helpful comments and for pointing out reference [4] to us. Special thanks go to Rene Vidal for encouraging comments when this manuscript was still at an early stage and for referring us to the workshop. Thanks to the Institute for Mathematics and its Applications (IMA), in particular Doug Arnold and Fadil Santosa, for an effective hot-topics workshop on multi-manifold modeling that we participated in. The research described in this paper was partially supported by NSF grants #0612608 and #0915064.

References

  • [1] G. Chen, S. Atev, and G. Lerman. Kernelized spectral curvature clustering (KSCC). In ICCV Workshop on Dynamical Vision, 2009.
  • [2] G. Chen and G. Lerman. Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Found. Comput. Math., 2009. DOI 10.1007/s10208-009-9043-7.
  • [3] G. Chen and G. Lerman. Spectral curvature clustering (SCC). Int. J. Comput. Vision, 81(3):317–330, 2009.
  • [4] E. Elhamifar and R. Vidal. Sparse subspace clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • [5] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24(6):381–395, June 1981.
  • [6] A. Goh and R. Vidal. Segmenting motions of different types by unsupervised manifold clustering. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–6, June 2007.
  • [7] R. Hartley and R. Vidal. The multibody trifocal tensor: motion segmentation from 3 perspective views. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 769–775, 2004.
  • [8] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy coding and compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9):1546–1562, September 2007.
  • [9] Y. Ma, A. Y. Yang, H. Derksen, and R. Fossum. Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Review, 50(3):413–458, 2008.
  • [10] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, pages 849–856, 2001.
  • [11] S. Rao, R. Tron, R. Vidal, and Y. Ma. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
  • [12] S. Rao, A. Yang, S. Sastry, and Y. Ma. Robust algebraic segmentation of mixed rigid-body and planar motions. Submitted to International Journal of Computer Vision, 2008.
  • [13] K. Schindler, J. U, and H. Wang. Perspective n-view multibody structure-and-motion through model selection. In Proc. of the 9th European Conference on Computer Vision, volume 1, pages 606–619, 2006.
  • [14] Y. Sugaya and K. Kanatani. Geometric structure of degeneracy for multi-body motion segmentation. In Lecture Notes in Computer Science, volume 3247/2004, chapter Statistical Methods in Video Processing, pages 13–25. Springer Berlin / Heidelberg, 2004.
  • [15] P. H. S. Torr. Geometric motion segmentation and model selection. Phil. Trans. R. Soc. Lond. A, 356:1321–1340, 1998.
  • [16] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
  • [17] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12), 2005.
  • [18] R. Vidal, Y. Ma, S. Soatto, and S. Sastry. Two-view multibody structure from motion. Int. J. Comput. Vis., 68(1):7–25, June 2006.
  • [19] R. Vidal, R. Tron, and R. Hartley. Multiframe motion segmentation with missing data using powerfactorization and GPCA. Int. J. Comput. Vis., 79:85–105, 2008.
  • [20] J. Yan and M. Pollefeys. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and nondegenerate. In ECCV, volume 4, pages 94–106, 2006.