1 Introduction
Feature tracking is a prerequisite for many computer vision tasks, such as visual SLAM and action recognition. Among all the feature tracking methods, the Kanade-Lucas-Tomasi (KLT) tracker [18, 23, 21], although developed 30 years ago, still remains one of the most widely used techniques. One of the reasons for this popularity is its computational efficiency; the KLT tracker is local, in the sense that it treats each local region independently of the others, which makes it highly parallelizable. This locality, however, comes at a cost in tracking robustness: the tracking of each feature cannot benefit from intrinsic scene constraints, and thus often suffers from drift.
Real-world scenes, however, are often strongly constrained. For example, in autonomous driving, most of the moving objects (cars, vehicles, pedestrians) are rigid, or quasi-rigid if seen from afar. Several methods have therefore been proposed to exploit this scene rigidity to improve feature tracking [24, 3, 20]. Unfortunately, these methods all assume an affine camera model and are thus ill-suited to handle strong perspective effects. More importantly, they work either as a post-processing step on an entire sequence [24], which is sensitive to initial tracking results and does not apply to online feature tracking, or within a temporal sliding window [3, 20], which is sensitive to initialization in the first few frames.
By contrast, in this paper, we introduce a novel feature tracker that takes advantage of multi-body scene rigidity to improve tracking robustness under a general perspective camera model. A conventional approach to addressing this problem would consist of alternating between two subtasks: motion segmentation and feature tracking under rigidity constraints for each segment. This, however, suffers from the following drawbacks: First, it requires knowing the number of observed motions; and, second, it relies on assigning points to individual motions, which is very sensitive to the initial motion estimates.
Here, we introduce a segmentation-free multi-body feature tracker that overcomes these drawbacks. Specifically, our approach bypasses the motion assignment step by making use of subspace constraints derived directly from the epipolar constraints of multiple motions. As a result, our algorithm does not require prior knowledge of the number of motions. Furthermore, this allows us to formulate tracking as an optimization problem whose subproblems all have closed-form solutions.
We demonstrate the effectiveness of our method on both feature point tracking and frame-by-frame motion segmentation on real-world sequences. Our experiments show that, by incorporating multi-motion constraints, our tracker yields better accuracies and is more robust to noise than the standard KLT tracker and the state-of-the-art tracking algorithm of [20].
2 Related Work
The KLT tracker [23, 21] was derived from the Lucas-Kanade algorithm for image alignment [18]. Feature tracking was achieved by optimizing the sum of squared differences between a template patch and an image patch with the Gauss-Newton method. It was later extended to handle relatively large displacements by the use of image pyramids [1].
Global rigidity constraints have been incorporated in feature point tracking to improve robustness. For instance, Torresani and Bregler [24] proposed to regularize tracking with a global low-rank constraint on the trajectory matrix of the whole sequence. They relied on the original KLT tracker to get a set of reliable tracks, and explicitly factorized the reliable trajectory matrix into two low-rank matrices with the rank given a priori. One of the low-rank matrices, called the motion parameter matrix, was then used to rectify the unreliable tracks. In short, this method can be viewed as a post-processing step on the results of the KLT tracker, and is therefore not suitable for online frame-to-frame tracking.
Instead of using the whole sequence, low-rank constraints [3] and similar subspace priors [20] were applied within a temporal sliding window. Specifically, Buchanan and Fitzgibbon [3] exploited the low-rank constraints within a Bayesian tracking framework, making predictions of the new location of a particular point using a low-rank approximation obtained from the previous frames. Recently, Poling et al. [20] proposed a better feature tracker by adding soft subspace constraints to the original KLT tracker and jointly solving for the displacement vectors of all feature points. These methods, however, assume an affine camera model within a temporal window, and are therefore ill-suited to handle strong perspective effects. Moreover, since the low-rank constraints are enforced in a temporal sliding window, these methods are sensitive to initialization in the first few frames.
By contrast, [19] exploits perspective projection by making use of epipolar constraints to track edgels in two consecutive frames. This method, however, was specifically designed to model a single motion, and thus does not easily extend to the multibody case.
In the closely related optical flow literature, several methods have been devoted to improving robustness via rigidity constraints. For instance, Valgaerts et al. [26] introduced a variational model to jointly recover the fundamental matrix and the optical flow; Wedel et al. [30, 29] leveraged the fundamental matrix as an additional weak prior within a variational framework. These methods, however, assume that the scene is mostly stationary (and thus a single fundamental matrix is estimated), and treat the dynamic parts as outliers [29]. Garg et al. [7, 8] proposed to make use of subspace constraints to regularize the multi-frame optical flow within a variational approach. This approach, however, assumes an affine camera model and works over entire sequences.

While, to the best of our knowledge, explicitly modeling multi-body motion has not been investigated in the context of feature tracking and optical flow estimation, a large body of work [5, 27, 31, 15, 16, 28, 6, 12, 14, 13] has been devoted to multi-body motion segmentation given good point trajectories in relatively long sequences. Typically, these tracks are first obtained with the KLT tracker, and then manually cleaned up, as, e.g., in the Hopkins155 dataset [25]. In a sense, the lack of better tracking algorithms that can incorporate the intrinsic constraints of dynamic scenes prevents the practical use of these motion segmentation algorithms.
In this paper, we seek to track feature points in dynamic scenes where multiple motions are present. In this scenario, a single fundamental matrix is not sufficient to express the epipolar constraints any more. While one could think of alternating between estimating multiple fundamental matrices, motion assignments and displacement vectors, the resulting algorithm would typically be very sensitive to initialization, since the motion assignments strongly depend on the motion estimates. By contrast, we introduce a segmentation-free approach that bypasses the motion assignment problem by exploiting subspace constraints derived from epipolar geometry. This yields a robust multi-body tracking algorithm that, as demonstrated by our experiments, opens up the possibility to perform motion segmentation in realistic scenarios.
3 Multi-body Feature Tracker
We now introduce our approach to multi-body feature tracking. Formally, let I denote the current image, T the previous image (or template image), and x an image point in the patch Ω_j of the template image. Our goal is to estimate the displacement vector u_j for each tracked feature point. To this end, we rely on the standard brightness constancy assumption [22], which lets us derive the data term
E_D(U) = Σ_j Σ_{x ∈ Ω_j} ρ( I(x + u_j) − T(x) ),   (1)
where, typically, ρ(a) = a² or ρ(a) = |a|. In particular, we use the ℓ1 norm, which provides robustness to outliers.
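To make this concrete, the following sketch (Python/NumPy rather than the paper's Matlab; the toy images, patch and integer displacement are illustrative assumptions) evaluates the brightness-constancy cost of a single patch:

```python
import numpy as np

def data_term(I, T, patch_coords, u, p=1):
    """Brightness-constancy cost of one patch: sum over pixels x in the
    patch of |I(x + u) - T(x)|^p (integer displacement, no interpolation)."""
    du = np.round(u).astype(int)
    err = 0.0
    for (r, c) in patch_coords:
        err += abs(float(I[r + du[0], c + du[1]]) - float(T[r, c])) ** p
    return err

# Toy template T and current image I obtained by shifting T one pixel right,
# so the true displacement (0, 1) drives the data term to zero.
T = np.arange(25, dtype=float).reshape(5, 5)
I = np.zeros_like(T)
I[:, 1:] = T[:, :-1]
patch = [(r, c) for r in range(1, 4) for c in range(1, 4)]
```

With u = (0, 1) the cost vanishes, whereas u = (0, 0) leaves a residual; p = 1 corresponds to the robust ℓ1 choice adopted here.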
Estimating the displacements from this data term only is typically sensitive to noise and may be subject to drift. A general approach to making the process more robust consists of introducing a regularizer to form an energy function of the form
E(U) = E_D(U) + E_R(U).   (2)
As mentioned above, several attempts at designing such a regularizer have been proposed. For example, under an affine camera model, E_R can encode a low-rank prior [24, 20]; with a general projective camera model, E_R can represent epipolar constraints (i.e., a fundamental matrix prior) [30, 26, 19]. In the latter case, the fundamental matrix can be either precomputed via an existing feature matching method [19], or recomputed iteratively.
When multiple motions are present, however, a single epipolar constraint is not sufficient. Instead, multiple fundamental matrices should be estimated so as to respect the assignments of the tracked points to individual motions. A straightforward way to address this problem consists of adding a motion segmentation step to the tracking algorithm, so that the fundamental matrices can be iteratively re-estimated. This leads to the simple segmentation-based approach to multi-body feature tracking described below.
3.1 A First Attempt: Segmentation-based Tracking
To derive a segmentation-based approach, we rely on epipolar constraints. Recall that, in epipolar geometry [10], the homogeneous coordinates x̃ and x̃′ of two corresponding image points in two frames are related by a fundamental matrix F, such that
x̃′ᵀ F x̃ = 0.   (3)
It is therefore natural to exploit these constraints to regularize tracking according to the motion assignments of the different points.
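As a concrete illustration of Eq. 3, the following Python/NumPy sketch (the pure-translation motion and the point coordinates are assumptions chosen for simplicity) checks whether a candidate correspondence is consistent with a given fundamental matrix:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# For a calibrated camera translating along the x-axis (R = I, t = e1), the
# fundamental (here, essential) matrix is F = [t]_x, and the epipolar
# constraint x'^T F x = 0 forces corresponding points onto the same row.
F = skew([1.0, 0.0, 0.0])
x  = np.array([2.0, 3.0, 1.0])   # homogeneous point in the template image
xp = np.array([5.0, 3.0, 1.0])   # candidate correspondence on the same row
residual = float(xp @ F @ x)     # 0 => consistent with this motion
```

A correspondence on a different row yields a nonzero residual, which is exactly the quantity a segmentation-based tracker would use to assign points to motions.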
More specifically, in the segmentation-based approach, three types of variables must be estimated: the displacement vectors u_j, the fundamental matrices F_1, …, F_n (where n is the number of motions), and the motion label l_j of each tracked point. Let us denote by x̃_j the homogeneous coordinate of feature point j (i.e., the center of the patch Ω_j), assigned to motion l_j = k. We can define a multi-body regularization term as
E_R(U, {F_k}, {l_j}) = Σ_k Σ_{j : l_j = k} ρ( x̃′_jᵀ F_k x̃_j ),   (4)
where x̃′_j = [ (x_j + u_j)ᵀ, 1 ]ᵀ is the homogeneous coordinate of the corresponding tracked point in the current image.
The energy function can then be approximately minimized by iterating over the following three steps:

1. Update the displacements U by first-order gradient descent [20];

2. Estimate F_k for each motion given the current point assignments;

3. Re-assign the motion label of each feature point to the motion whose F_k yields the smallest epipolar residual.
This segmentation-based approach suffers from several drawbacks. First, the number of motions needs to be known a priori, which is typically hard for general-purpose tracking. Second, and more importantly, the quality of the solution obtained with this approach strongly depends on the initialization of the fundamental matrices and of the motion labels. This, in a sense, is a chicken-and-egg problem, since good initialization for these variables could only be obtained from good motion estimates. Instead, in the remainder of this section, we introduce a new segmentation-free approach that bypasses the need to explicitly compute the fundamental matrices and the motion assignments.
3.2 Our Segmentation-free Approach
In this section, we introduce our segmentation-free multi-body feature tracker, which is the key contribution of this paper. We first show how the epipolar constraints can be converted to subspace constraints, and incorporated into our tracking formalism. We then derive the solution to the resulting optimization problem by decomposing it into several convex subproblems, all with closed-form solutions.
3.2.1 Epipolar Subspace Constraints
As in the segmentation-based approach, we seek to rely on epipolar geometry. To this end, we make use of the constraint expressed in Eq. 3. We first note that this constraint can be rewritten as
( x̃′ ⊗ x̃ )ᵀ f = 0,   (5)
where f = vec(F) is the vectorized fundamental matrix (obtained by stacking the rows of F), and
x̃′ ⊗ x̃ = [ x′x, x′y, x′, y′x, y′y, y′, x, y, 1 ]ᵀ,  with x̃ = (x, y, 1)ᵀ, x̃′ = (x′, y′, 1)ᵀ.   (6)
Let us define w = x̃′ ⊗ x̃. Then, w lies in the orthogonal complement of f, which is a subspace of dimension up to eight (in practice, this dimension is typically smaller than 8, since, in real scenes, the motion of objects, such as cars or people, is not arbitrary, and thus corresponds to degenerate, i.e., low-rank, motion [16]), and which we call the epipolar subspace. Since image points undergoing the same motion share the same fundamental matrix, all the vectors w corresponding to points belonging to the same rigid motion lie in the same subspace [16].
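This orthogonality is easy to verify numerically. The sketch below (Python/NumPy; the pure-translation fundamental matrix and the point pairs are illustrative assumptions) builds w = x̃′ ⊗ x̃ for several correspondences of one rigid motion and checks that they are all orthogonal to the vectorized F:

```python
import numpy as np

# Pure-translation fundamental matrix (motion along the x-axis): the epipolar
# constraint x'^T F x = 0 reduces to y' = y, i.e., same-row correspondences.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
f = F.flatten()                    # row-major vectorization of F

# Several correspondences consistent with this motion (same row, any column).
pairs = [((2.0, 3.0), (5.0, 3.0)),
         ((1.0, 1.0), (4.0, 1.0)),
         ((0.0, 2.0), (2.5, 2.0))]
W = np.stack([np.kron([xp, yp, 1.0], [x, y, 1.0])   # w = kron(x', x)
              for (x, y), (xp, yp) in pairs], axis=1)

# Columns of W lie in the orthogonal complement of f: the epipolar subspace.
residuals = W.T @ f
```

The key identity is kron(x′, x) · vec(F) = x′ᵀ F x (with row-major vec), so each residual equals the epipolar error of the corresponding pair.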
Therefore, in our multi-body feature tracking scenario, if the feature points are correctly tracked, the data vectors w_j(u_j) defined as
w_j(u_j) = x̃′_j(u_j) ⊗ x̃_j,  j = 1, …, N,   (7)
should lie in a union of linear subspaces. This subspace constraint can be characterized by the self-expressiveness property [6, 12], i.e., a data point drawn from one subspace in a union of subspaces can be represented as a linear combination of the points lying in the same subspace.
In our case, this self-expressiveness property can be expressed as
W_U = W_U C,   (8)
where W_U = [ w_1(u_1), …, w_N(u_N) ] stacks the data vectors (the subscript U indicates that W depends on the variable U; for compactness, and without causing confusion, we drop this explicit dependency in Section 3.2.3), and C is the coefficient matrix encoding the linear combinations. On its own, this term has a trivial solution for C (i.e., the identity matrix). To avoid this solution, C needs to be regularized. In the subspace clustering literature, C is encouraged to be either sparse [6], low-rank [17], or dense block-diagonal [12] by minimizing its Frobenius norm. Here, we choose the Frobenius norm, which has proven effective and is easy to optimize. Furthermore, we explicitly model noise and outliers, which are inevitable in real-world sequences. More specifically, we write our regularization term for multi-body tracking as
E_R(U, C, E) = α ‖C‖_F² + β ‖E‖_1,  s.t.  W_U = W_U C + E,   (9)
where E accounts for noise and outliers, and is thus encouraged to be sparse via the ℓ1 norm. Note that, for a given displacement U, and ignoring noise, the optimal value of this regularizer depends on the intrinsic dimension of the motion [12]. Since here we also optimize over U, this regularizer therefore tends to favor degenerate rigid motions over purely arbitrary rigid motions. This actually reflects reality, since, in real scenes, cars, people and other objects typically move in a well-constrained manner.
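To see why the Frobenius-norm regularizer favors within-motion combinations, consider the simpler noise-free problem min_C ‖W − WC‖_F² + α‖C‖_F², which admits a closed-form solution. The Python/NumPy sketch below (toy data from two orthogonal subspaces; the dimensions and α are illustrative, and the penalized form stands in for the constrained one used in the paper) shows that the coefficients linking points from different subspaces vanish:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two orthogonal 2-D subspaces of R^5; four points drawn from each.
B1 = np.eye(5)[:, :2]
B2 = np.eye(5)[:, 3:]
W = np.hstack([B1 @ rng.standard_normal((2, 4)),
               B2 @ rng.standard_normal((2, 4))])   # 5 x 8 data matrix

alpha = 0.1
# Closed form of  min_C ||W - W C||_F^2 + alpha ||C||_F^2 :
#   (W^T W + alpha I) C = W^T W
C = np.linalg.solve(W.T @ W + alpha * np.eye(8), W.T @ W)

cross = np.abs(C[:4, 4:]).max()   # coefficients linking the two motions
```

Because the Gram matrix WᵀW is block-diagonal for independent subspaces, so is C: each point is expressed only through points of its own motion, which is exactly what makes the approach segmentation-free.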
Importantly, this regularization term requires explicitly computing neither the fundamental matrices, nor the motion assignments. As such, it yields a segmentation-free approach.
Altogether, the energy function of our multibody tracking framework can be written as
min_{U, C, E}  E_D(U) + α ‖C‖_F² + β ‖E‖_1,  s.t.  W_U = W_U C + E.   (10)
Our goal is to minimize this energy w.r.t. U, C and E. We next show how to solve this optimization problem.
3.2.2 Approximation and Problem Reformulation
To optimize Eq. 10, we first approximate the data term in the same manner as the original KLT. In other words, given an initial displacement u_j⁰ for patch Ω_j, we approximate the intensity values with their first-order Taylor expansion at u_j⁰. This can be written as
I(x + u_j) ≈ I(x + u_j⁰) + ∇I(x + u_j⁰)ᵀ ( u_j − u_j⁰ ).   (11)
For notational convenience, let A_j be the matrix whose rows are the gradients ∇I(x + u_j⁰)ᵀ for all x ∈ Ω_j, and b_j the vector with corresponding entries T(x) − I(x + u_j⁰) + ∇I(x + u_j⁰)ᵀ u_j⁰. Then, the data term can be expressed as
E_D(U) = Σ_j ‖ A_j u_j − b_j ‖_1.   (12)
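In one dimension, the interplay between the Taylor approximation of Eq. 11 and the resulting least-squares data term can be sketched as follows (Python/NumPy; the sinusoidal signals and analytic gradient are illustrative stand-ins for image patches, and the ℓ2 norm replaces the ℓ1 norm for simplicity):

```python
import numpy as np

shift = 0.2
T_fun = np.sin                          # template "image"
I_fun = lambda x: np.sin(x + shift)     # current "image": shifted template
dI    = lambda x: np.cos(x + shift)     # analytic gradient of I

xs = np.linspace(0.2, 1.2, 100)         # patch coordinates
u = 0.0
for _ in range(15):
    g = dI(xs + u)                      # per-pixel gradient at x + u  (rows of A)
    r = T_fun(xs) - I_fun(xs + u)       # per-pixel residual           (entries of b - A u)
    u += float(g @ r) / float(g @ g)    # closed-form least-squares increment
# u converges to -shift, since I(x - shift) = T(x)
```

Each pass through the loop is one Taylor re-linearization followed by a closed-form least-squares solve, i.e., exactly the Gauss-Newton structure of the original KLT.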
By combining this data term with our regularizer, we get the optimization problem
min_{U, C, E}  Σ_j ‖ A_j u_j − b_j ‖_1 + α ‖C‖_F² + β ‖E‖_1,  s.t.  W_U = W_U C + E,   (13)
where U concatenates the displacement vectors of all the feature points.
For convenience of optimization, we introduce auxiliary variables Q_j = A_j u_j − b_j. Then, (13) can be equivalently written as

min_{U, Q, C, E}  Σ_j ‖Q_j‖_1 + α ‖C‖_F² + β ‖E‖_1,  s.t.  Q_j = A_j u_j − b_j ∀j,  W_U = W_U C + E.   (14)
The main hurdle in optimizing (14) now lies in the term involving W_U, due to its seemingly complicated dependency on U. However, we show below that this term can be simplified by a few matrix manipulations.
First, note that, by definition, we have

w_j(u_j) = x̃′_j ⊗ x̃_j = ( I₃ ⊗ x̃_j ) x̃′_j = w_j⁰ + G_j u_j,   (15)

where w_j⁰ = ( I₃ ⊗ x̃_j ) [ x_jᵀ, 1 ]ᵀ, G_j is obtained by removing every third column of I₃ ⊗ x̃_j, I₃ is the 3-by-3 identity matrix and ⊗ denotes the Kronecker product. In other words, each w_j is an affine function of u_j. Let us then introduce an auxiliary variable W whose columns w_j are tied to the displacements via these affine relations. Our optimization problem then becomes

min_{U, Q, W, C, E}  Σ_j ‖Q_j‖_1 + α ‖C‖_F² + β ‖E‖_1,
s.t.  Q_j = A_j u_j − b_j,  w_j = w_j⁰ + G_j u_j  ∀j,  W = WC + E,   (16)

where now W is an explicit optimization variable, coupled to U only through linear equality constraints.
The above optimization problem involves a large number of variables. We propose to solve it via the Alternating Direction Method of Multipliers (ADMM) [2], which decomposes a big optimization problem into several small subproblems. Below, we show how this can be achieved for our problem.
3.2.3 ADMM Solution
To apply the ADMM, we first need to derive the augmented Lagrangian of (16), which can be expressed as

L_μ = Σ_j ‖Q_j‖_1 + α ‖C‖_F² + β ‖E‖_1
  + Σ_j ( ⟨ y1_j, Q_j − A_j u_j + b_j ⟩ + (μ/2) ‖ Q_j − A_j u_j + b_j ‖² )
  + Σ_j ( ⟨ y2_j, w_j − w_j⁰ − G_j u_j ⟩ + (μ/2) ‖ w_j − w_j⁰ − G_j u_j ‖² )
  + ⟨ Y3, W − WC − E ⟩ + (μ/2) ‖ W − WC − E ‖_F²,   (17)

where ⟨·, ·⟩ denotes the (matrix) inner product, y1_j, y2_j, Y3 are Lagrange multipliers, and μ is the penalty parameter. The ADMM then works by alternately minimizing L_μ w.r.t. one of the five variables Q, E, C, U, W, while keeping the remaining four fixed.
As shown in the appendix, the five subproblems derived from the augmented Lagrangian are all convex problems that can be solved efficiently in closed form. These closed-form solutions can be written as
Q_j = S_{1/μ}( A_j u_j − b_j − y1_j/μ ),   (18)

E = S_{β/μ}( W − WC + Y3/μ ),   (19)

C = ( WᵀW + (2α/μ) I )⁻¹ Wᵀ( W − E + Y3/μ ),   (20)

u_j = ( A_jᵀA_j + G_jᵀG_j )⁻¹ ( A_jᵀ( Q_j + b_j + y1_j/μ ) + G_jᵀ( w_j − w_j⁰ + y2_j/μ ) ),   (21)

W = ( Ŵ − Y2/μ + ( E − Y3/μ )( I − C )ᵀ ) ( I + ( I − C )( I − C )ᵀ )⁻¹,   (22)

where S_τ(a) = sign(a) max(|a| − τ, 0) is the soft-thresholding operator (applied elementwise), Ŵ = [ w_1⁰ + G_1 u_1, …, w_N⁰ + G_N u_N ], Y2 = [ y2_1, …, y2_N ], and the derivations of these solutions are given in the appendix.
Finally, the Lagrange multipliers and penalty parameter can be updated as

y1_j = y1_j + μ ( Q_j − A_j u_j + b_j ),   (23)

y2_j = y2_j + μ ( w_j − w_j⁰ − G_j u_j ),   (24)

Y3 = Y3 + μ ( W − WC − E ),   (25)

μ = min( γμ, μ_max ),   (26)

where γ > 1 is a constant, and μ_max is the predefined maximum of μ.
Our approach to solving (16) is outlined in Algorithm 1. Note that the problem we are trying to solve is non-convex in that i) the intensity function I is non-convex w.r.t. U; ii) the optimization problem (16) involves a bilinear term in an equality constraint. While the ADMM does not guarantee convergence to the global optimum, it has proven effective in practice [11].
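To give a feel for the alternating scheme, here is ADMM applied to a deliberately tiny problem, min 0.5‖x − b‖² + λ‖z‖_1 s.t. x = z, whose optimum is the soft-thresholding of b (Python/NumPy; a toy stand-in for the five-block solver above, with illustrative values of b, λ and the penalty):

```python
import numpy as np

def soft(a, tau):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

b, lam, rho = np.array([3.0, -0.5, 0.2, -2.0]), 1.0, 1.0
x = z = y = np.zeros_like(b)
for _ in range(200):
    x = (b + rho * (z - y)) / (1.0 + rho)   # smooth subproblem: least squares
    z = soft(x + y, lam / rho)              # l1 subproblem: soft-thresholding
    y = y + x - z                           # dual (multiplier) update
```

The same pattern appears above: closed-form least-squares updates for the smooth blocks, soft-thresholding for the ℓ1 blocks, and a gradient-ascent step on the multipliers.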
3.2.4 Our Complete Multi-body Feature Tracker
In the same spirit as [1], we make use of an image pyramid to handle large displacements and avoid local optima. The results obtained at a coarser level of the pyramid are used as initialization for the next (i.e., finer) level. Within each pyramid level, the initial displacement u_j⁰, at which the first-order Taylor approximation is performed, is updated with the displacement vector of the previous iteration. We iterate over successive Taylor approximations until the displacement vector does not change significantly. Our complete segmentation-free multi-body feature tracker is outlined in Algorithm 2.
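A minimal sketch of this coarse-to-fine strategy (Python/NumPy; the 2x2 averaging pyramid and the displacement-doubling rule are the standard pyramidal-KLT ingredients, the numbers are illustrative):

```python
import numpy as np

def pyramid(img, levels):
    """Image pyramid by 2x2 block averaging (finest level first)."""
    pyr = [img]
    for _ in range(levels - 1):
        a = pyr[-1]
        r, c = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        a = a[:r, :c]
        pyr.append(0.25 * (a[0::2, 0::2] + a[1::2, 0::2] +
                           a[0::2, 1::2] + a[1::2, 1::2]))
    return pyr

# A displacement estimated at a coarse level is doubled when moving to the
# next finer level and used there as the initial displacement u0.
u_coarse = np.array([1.5, -2.0])    # hypothetical coarse-level estimate
u_init_finer = 2.0 * u_coarse
```

Tracking then runs from the coarsest to the finest level, re-linearizing at each level as described above.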
4 Experiments
[Figure: qualitative tracking results of KLT, L1-KLT, BFT, and our method.]
To show the benefits of our multibody feature tracker, we performed extensive experiments on different sequences. In the remainder of this section, we present both qualitative and quantitative results.
In these experiments, we compare our approach with the following baselines: the original KLT tracker (KLT), the L1-norm KLT tracker (L1-KLT), and the more recent Better Feature Tracker through Subspace Constraints (BFT) [20]. For the original KLT, we used the Matlab built-in vision toolbox (vision.PointTracker); we implemented the L1-norm KLT tracker within the same framework as our method by simply disabling the regularization term; and for BFT, we used the code released by the authors.
Due to the lack of benchmark datasets for feature tracking, we make use of motion segmentation datasets where both the ground-truth tracks and the original videos are available. Since those videos are typically only provided for illustration purposes, they are generally highly compressed and not ideal for reliable feature tracking. This, however, is not really a problem when one seeks to evaluate feature tracking methods, since (i) it essentially represents a challenging scenario; and (ii) all algorithms are evaluated on the same data. In particular, here, we employed 10 checkerboard (indoor) sequences and 12 cars-and-people (outdoor) sequences from the well-known Hopkins155 dataset [25]. Moreover, we used another 8 outdoor sequences from the more recent MTPV dataset [16]. To test the robustness of the different methods, we added different levels of Gaussian noise (with variance 0.01, 0.02, 0.03, or 0.04) to the images (note that the image intensities are normalized to [0, 1], so Gaussian noise with variance 0.04 is already severe, and stronger noise is unlikely to occur in practice). Altogether, this results in 150 evaluation sequences. The values of the parameters (α and β) were tuned on a separate validation set and kept unchanged for all our experiments.

To compare the algorithms, we measure the number of tracking errors, i.e., the number of points that drift from the ground-truth by more than a certain error tolerance. Note that, in the sequences that we use, the ground-truth was obtained by the standard KLT tracker and then manually cleaned up, so the ground-truth itself contains some noise whose level depends on the scene itself. In particular, we observed that the ground-truth of the indoor checkerboard sequences generally has more noise than that of the outdoor sequences. Therefore, we set a larger error tolerance for the checkerboard sequences than for the outdoor ones. For every sequence, we compute the average number of incorrectly tracked feature points over all the frames, and then average this number over the sequences.
4.1 Hopkins Checkerboard Sequences





Table 1: Average number of tracking errors on the Hopkins checkerboard sequences for different noise variances σ².

σ²      KLT     L1-KLT   BFT     Ours
0       47.63   34.69    39.68   27.77
0.01    46.92   30.86    39.30   27.32
0.02    45.95   29.69    38.84   27.13
0.03    46.59   30.16    39.16   28.18
0.04    47.19   31.16    39.35   27.21



We first evaluated our method and the baselines on the Hopkins checkerboard sequences, which depict controlled indoor scenes with multiple rigidly moving objects. The average number of tracks in this dataset is 202.9. Generally, the repetitive texture in these sequences makes feature tracking more ambiguous and thus harder. However, in this experiment, we show that our multi-body feature tracker is more robust to this ambiguity. To provide a fair comparison, we used the same patch size and the same number of image pyramid levels (4) for all the methods. Furthermore, we initialized all the tracking methods with the ground-truth locations of the feature points in the first frame.
From Table 1, we can see that the L1-KLT tracker consistently achieves better results than the original KLT tracker and than BFT. Our algorithm, however, consistently outperforms L1-KLT, which clearly evidences the benefits of incorporating our multi-body prior. We observed that BFT generally fails to track moving objects, as illustrated in Fig. 1. This is mainly because BFT heavily relies on a good estimate of the global motion, obtained by registering the entire current image to the previous one. For scenes with multiple motions, however, global motion estimation becomes unreliable, thus causing BFT to fail to track the moving objects. Note that the performance of all the trackers remains relatively unaffected as the noise level increases. This is mainly due to the fact that the corners in the checkerboard, while resembling each other, are very strong features that are robust to noise.
4.2 Hopkins Car-and-People Sequences





Table 2: Average number of tracking errors on the Hopkins car-and-people sequences for different noise variances σ².

σ²      KLT     L1-KLT   BFT     Ours
0       21.71   24.28    49.13   16.14
0.01    34.59   29.31    51.69   18.82
0.02    54.95   36.32    54.63   26.56
0.03    76.02   46.49    57.57   33.80
0.04    95.17   56.92    58.36   42.43



We then evaluated the algorithms on the Hopkins Car-and-People sequences, depicting real-world outdoor scenes with multiple rigid motions. The number of tracks provided by the ground-truth ranges from 147 to 548, with an average of 369. Here, for all the methods, we used the same patch size and image pyramid levels as in the previous experiment, and initialized the feature points with their ground-truth locations in the first frame. The average number of tracking errors for the different methods under different image noise levels is reported in Table 2. Again, our multi-body feature tracker achieves the lowest tracking error compared to the baselines, which confirms the robustness of our method.
4.3 MTPV Sequences
We further tested our method on the MTPV sequences, which provide images of higher quality and resolution than the Hopkins dataset (although still highly compressed and not well-suited for tracking, as pointed out in the readme file of the dataset), and which contain sequences with strong perspective effects. By contrast, however, this dataset contains some outliers and missing data. For evaluation purposes, i.e., to create a complete and accurate ground-truth, we discarded the outliers and missing data. Since the image resolution is higher in this dataset, we used a larger patch size for all the methods. The results of all the algorithms are provided in Table 3. Note that we still outperform all the baselines for most noise levels, with the exception of BFT at the highest noise level. We believe that the slightly less impressive gap between our approach and the baselines, in particular BFT, is due to the fact that the feature points in this dataset are often dominated by the background. See Fig. 2 for typical examples of this dataset.





Table 3: Average number of tracking errors on the MTPV sequences for different noise variances σ².

σ²      KLT     L1-KLT   BFT     Ours
0       3.07    13.34    6.83    2.34
0.01    17.76   22.12    8.84    3.87
0.02    28.39   27.26    11.17   6.94
0.03    40.61   35.53    11.26   9.92
0.04    47.69   38.93    12.34   13.22



4.4 KITTI Sequences
[Figure 3: qualitative comparison of KLT, L1-KLT, BFT, and our method on the KITTI sequences.]
To evaluate the algorithms on realistic, high-quality images, we employed four sequences from KITTI [9] (namely 2011_09_26_drive_0018, 2011_09_26_drive_0051, 2011_09_26_drive_0056, and 2011_09_28_drive_0016), depicting street/traffic scenes with multiple motions. Since no ground-truth trajectories are provided with this data, to obtain quantitative results, we took 10 consecutive frames from each sequence, applied the KLT tracker to them, and manually cleaned up the results to get ground-truth trajectories with an average of 177 points per sequence. The results of this experiment for different levels of noise added to the input are reported in Table 4, and Fig. 3 shows a qualitative comparison of the algorithms. Note that our method also outperforms the baselines on this data.
Table 4: Average number of tracking errors on the KITTI sequences for increasing noise levels (one row per level, from low to high).

Methods  KLT     L1-KLT   BFT     Ours
         21.43   22.05    27.48   14.18
         24.35   22.85    27.80   16.70
         31.15   26.88    27.85   17.70
         34.43   29.23    27.75   20.33
4.5 Frame-by-Frame Motion Segmentation





Table 5: Average motion segmentation error on the 22 Hopkins sequences for different noise variances σ².

σ²      KLT+SSC   KLT+EDSC   L1+SSC   L1+EDSC   Ours
0       19.76     20.57      18.71    19.11     8.97
0.01    19.76     20.61      19.61    20.41     9.35
0.02    19.21     20.99      21.02    21.92     9.33
0.03    20.63     20.69      22.21    20.48     9.89
0.04    20.38     19.82      21.35    20.80     11.26
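As discussed in this section, the coefficient matrix obtained by our tracker can be symmetrized into an affinity matrix and fed to spectral clustering. The Python/NumPy sketch below (with a toy block-diagonal coefficient matrix; for two motions the sign of the Fiedler vector suffices, whereas a full pipeline would run k-means on several eigenvectors) illustrates this step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy coefficient matrix for two motions of three points each: strong
# within-motion coefficients, tiny spurious cross-motion ones.
C = np.zeros((6, 6))
C[:3, :3] = 0.5 + 0.1 * rng.random((3, 3))
C[3:, 3:] = 0.5 + 0.1 * rng.random((3, 3))
C += 0.01 * rng.random((6, 6))            # small cross-motion noise

A = np.abs(C) + np.abs(C).T               # symmetric affinity matrix
L = np.diag(A.sum(axis=1)) - A            # unnormalized graph Laplacian
_, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)  # sign of the Fiedler vector
```

The two weakly connected blocks of the affinity graph are separated by the second-smallest eigenvector of the Laplacian, recovering the two motions.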



In our formulation, we optimize our energy function w.r.t. two variables: the displacement vectors U and the self-expressiveness coefficients C. While U provides the tracking results, the matrix C, as in the subspace clustering literature, can be used to build an affinity matrix for spectral clustering, and thus, if we assume that the number of motions is known a priori, lets us perform motion segmentation. In other words, our method can also be interpreted as simultaneous feature tracking and frame-by-frame motion segmentation. In this experiment, we therefore aim to evaluate the frame-by-frame motion segmentation accuracy of our method. Since, to the best of our knowledge, no existing motion segmentation methods perform feature tracking and frame-by-frame motion segmentation jointly, we compare our results with the following two-step baselines: first, we find the tracks with KLT or L1-KLT and form the epipolar subspaces as in Eq. 7; second, we apply a subspace clustering method, i.e., Sparse Subspace Clustering (SSC) or Efficient Dense Subspace Clustering (EDSC), to perform motion segmentation. This results in four baselines denoted by KLT+SSC, KLT+EDSC, L1+SSC [16] and L1+EDSC. The results of motion segmentation on the 22 Hopkins sequences used previously are shown in Table 5. These results clearly evidence that our method significantly outperforms the baselines in terms of motion segmentation.

5 Conclusion and Future Work
In this paper, we have introduced a novel feature tracker that incorporates a multi-body rigidity prior into feature tracking. To this end, we have derived epipolar subspace constraints that spare us from having to compute fundamental matrices and motion assignments explicitly. Our formulation only involves a series of convex subproblems, all of which have closed-form solutions. We have demonstrated the effectiveness of our method via extensive experiments on indoor and outdoor sequences.
While adding global rigidity constraints (be it low-rank or epipolar subspace constraints) to the local KLT tracker improves robustness, it comes with some computational overhead. The current Matlab implementation of our method runs at about 1 frame per second for 200 points on a single-core CPU (3.4 GHz), which is on par with BFT [20], but slower than the original KLT tracker. In the future, we will therefore study how to speed up our approach, for instance by exploiting the GPU. Furthermore, our current model assumes that each patch undergoes only a translation between consecutive frames. We therefore plan to investigate the use of more accurate models, such as affine transformations.
Appendix: ADMM Derivations
Given the augmented Lagrangian in Eq. 17, the ADMM subproblems can be derived as follows:
(1) Computing Q_j can be expressed as the convex program

min_{Q_j}  ‖Q_j‖_1 + (μ/2) ‖ Q_j − ( A_j u_j − b_j − y1_j/μ ) ‖²,   (27)

which can be solved in closed form by elementwise thresholding [4], which directly yields Eq. 18.
(2) Similarly, computing E translates to

min_E  β ‖E‖_1 + (μ/2) ‖ E − ( W − WC + Y3/μ ) ‖_F²,   (28)

which again can be solved by elementwise thresholding, thus yielding Eq. 19.
(3) To compute C, we have the least-squares problem

min_C  α ‖C‖_F² + (μ/2) ‖ W − WC − E + Y3/μ ‖_F²,   (29)

which can easily be solved in closed form as in Eq. 20.
(4) Computing the displacements requires solving the problem

min_u  (μ/2) ‖ M u − p ‖²,   (30)

where u = [ u_1ᵀ, …, u_Nᵀ ]ᵀ, p is a column vector defined as p = [ p_1ᵀ, …, p_Nᵀ ]ᵀ with p_j = [ ( Q_j + b_j + y1_j/μ )ᵀ, ( w_j − w_j⁰ + y2_j/μ )ᵀ ]ᵀ, and M is a sparse block-diagonal matrix expressed as M = blkdiag( M_1, …, M_N ) with M_j = [ A_jᵀ, G_jᵀ ]ᵀ. This subproblem has again a closed-form solution given by Eq. 21. Note that M and MᵀM are sparse matrices, so u can be computed efficiently by sparse matrix techniques.
(5) While solving for W may not seem straightforward, we show below that it is nothing but a least-squares problem. The subproblem w.r.t. W can be written as

min_W  (μ/2) ‖ W − Ŵ + Y2/μ ‖_F² + (μ/2) ‖ W − WC − E + Y3/μ ‖_F².   (31)

Setting the gradient w.r.t. W to zero, (31) can be equivalently written as

W ( I + ( I − C )( I − C )ᵀ ) = Ŵ − Y2/μ + ( E − Y3/μ )( I − C )ᵀ.   (32)

This again leads to a closed-form solution for W given by Eq. 22, where Ŵ = [ w_1⁰ + G_1 u_1, …, w_N⁰ + G_N u_N ] and Y2 = [ y2_1, …, y2_N ].
References
 [1] J.-Y. Bouguet. Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm. Technical Report, Intel Microprocessor Research Labs, 2001.

 [2] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
 [3] A. Buchanan and A. Fitzgibbon. Combining local and global motion models for feature point tracking. In CVPR, 2007.

 [4] J.-F. Cai, E. J. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
 [5] J. P. Costeira and T. Kanade. A multibody factorization method for independently moving objects. IJCV, 29(3):159–179, 1998.
 [6] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. PAMI, 35(11):2765–2781, 2013.
 [7] R. Garg, L. Pizarro, D. Rueckert, and L. Agapito. Dense multiframe optic flow for nonrigid objects using subspace constraints. In ACCV, 2010.
 [8] R. Garg, A. Roussos, and L. Agapito. A variational approach to video registration with subspace constraints. IJCV, 104(3):286–314, 2013.
 [9] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In CVPR, 2012.
 [10] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521540518, second edition, 2004.
 [11] P. Ji, H. Li, M. Salzmann, and Y. Dai. Robust motion segmentation with unknown correspondences. In ECCV. 2014.
 [12] P. Ji, M. Salzmann, and H. Li. Efficient dense subspace clustering. In WACV, 2014.
 [13] P. Ji, M. Salzmann, and H. Li. Shape interaction matrix revisited and robustified: Efficient subspace clustering with corrupted and incomplete data. In ICCV, 2015.
 [14] P. Ji, Y. Zhong, H. Li, and M. Salzmann. Null space clustering with applications to motion segmentation and face clustering. In ICIP, 2014.

 [15] H. Li. Two-view motion segmentation from linear programming relaxation. In CVPR, 2007.
 [16] Z. Li, J. Guo, L.-F. Cheong, and S. Z. Zhou. Perspective motion segmentation via collaborative clustering. In ICCV, 2013.
 [17] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by lowrank representation. PAMI, 35(1):171–184, 2013.
 [18] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI, volume 81, pages 674–679, 1981.
 [19] T. Piccini, M. Persson, K. Nordberg, M. Felsberg, and R. Mester. Good edgels to track: Beating the aperture problem with epipolar geometry. In ECCV Workshops, 2014.
 [20] B. Poling, G. Lerman, and A. Szlam. Better feature tracking through subspace constraints. In CVPR, 2014.
 [21] J. Shi and C. Tomasi. Good features to track. In CVPR, 1994.
 [22] R. Szeliski. Computer vision: algorithms and applications. Springer Science & Business Media, 2010.
 [23] C. Tomasi and T. Kanade. Detection and tracking of point features. School of Computer Science, Carnegie Mellon Univ. Pittsburgh, 1991.
 [24] L. Torresani and C. Bregler. Space-time tracking. In ECCV, 2002.
 [25] R. Tron and R. Vidal. A benchmark for the comparison of 3d motion segmentation algorithms. In CVPR, 2007.
 [26] L. Valgaerts, A. Bruhn, and J. Weickert. A variational model for the joint recovery of the fundamental matrix and the optical flow. In Pattern Recognition, pages 314–324. 2008.

 [27] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). PAMI, 27(12):1945–1959, 2005.
 [28] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. Segmentation of dynamic scenes from the multibody fundamental matrix. In CVPR, 2001.
 [29] A. Wedel, D. Cremers, T. Pock, and H. Bischof. Structure- and motion-adaptive regularization for high accuracy optic flow. In ICCV, 2009.
 [30] A. Wedel, T. Pock, J. Braun, U. Franke, and D. Cremers. Duality TV-L1 flow with fundamental matrix prior. In IVCNZ, 2008.
 [31] J. Yan and M. Pollefeys. A general framework for motion segmentation: Independent, articulated, rigid, nonrigid, degenerate and nondegenerate. In ECCV, 2006.