RANSAC Algorithms for Subspace Recovery and Subspace Clustering

Ery Arias-Castro et al. ∙ November 30, 2017

We consider the RANSAC algorithm in the context of subspace recovery and subspace clustering. We derive some theory and perform some numerical experiments. We also draw some correspondences with the methods of Hardt and Moitra (2013) and Chen and Lerman (2009b).


1 Introduction

The Random Sample Consensus (RANSAC) algorithm of Fischler and Bolles (1981), and its many variants and adaptations, are well known in computer vision for their robustness in the presence of gross errors (outliers). In this paper we focus on the closely related problems of subspace recovery and subspace clustering in the presence of outliers, where RANSAC-type methods are believed to be optimal, yet too costly in terms of computation when the fraction of inliers is small. Although this is a well-understood limitation of RANSAC, we nevertheless establish it rigorously in the present context. In particular, we derive the performance and computational complexity of RANSAC for these two problems, and perform some numerical experiments corroborating our theory and comparing RANSAC with other methods proposed in the literature.

1.1 The problem of subspace recovery

Consider a setting where the data consist of n points in dimension D, denoted x_1, …, x_n. It is assumed that n_1 of these points lie on a d-dimensional linear subspace L and that the points are otherwise in general position, which means that the following assumption is in place:

Assumption 1.

A k-tuple of data points (with k ≤ D) is linearly independent unless it includes at least d + 1 points from L.

(We say that points are linearly dependent if they are so when seen as vectors.)

The points on L are called inliers and all the other points are called outliers; we write n_0 = n − n_1 for the number of outliers. This is the setting of subspace recovery without noise. When there is noise, the points are not exactly on the underlying subspace but rather in its vicinity. In any case, the goal is to recover L and/or identify the inlier points or, said differently, to distinguish the inliers from the outliers. See Figure 1(a) for an illustration in a low-dimensional setting.

This problem is intimately related to the problem of robust covariance estimation, which dates back decades (Maronna, 1976; Huber and Ronchetti, 2009; Tyler, 1987) but has attracted some recent attention. We refer the reader to the introduction of (Zhang and Lerman, 2014) for a comprehensive review of the literature, old and new. Subspace recovery in the presence of outliers, as we consider the problem here, is sometimes referred to as robust principal component analysis, although there are other meanings in the literature more closely related to matrix factorization with a low-rank component (Wright et al., 2009; Candès et al., 2011).

(a) Subspace recovery problem.
(b) Subspace clustering problem.
Figure 1: An illustration of the two settings considered in the paper.

1.2 The problem of subspace clustering

Consider a setting where the data consist of n points in dimension D, denoted x_1, …, x_n. It is assumed that, for each k = 1, …, K, n_k of these points lie on a d_k-dimensional linear subspace L_k (so that there are K subspaces in total). The remaining n_0 points are in general position:

Assumption 2.

A tuple of at most D data points is linearly independent unless it includes at least d_k + 1 points from L_k for some k.

In this setting all the points on one of the subspaces are inliers, and all the other points are outliers. This is the setting of subspace clustering without noise. When there is noise, the inliers are not exactly on the subspaces but in their vicinity. See Figure 1(b) for an illustration in a setting with three low-dimensional subspaces (of two different dimensions). The goal is to cluster the points according to their corresponding subspace L_k, for all k.

The problem of subspace clustering has applications in computer vision, in particular motion segmentation (Vidal, 2011; Vidal et al., 2005).

1.3 Contents

In Section 2, we consider the problem of subspace recovery. In Section 3, we consider the problem of subspace clustering. In both cases, we study a ‘canonical’ RANSAC algorithm, deriving some theory and comparing it with other methods in numerical experiments. We briefly discuss our results in Section 4.

Remark 1 (linear vs affine).

Throughout, we consider the case where the subspaces are linear, although some applications may call for affine subspaces. (This is for convenience.) Because of this, we are able to identify a point with the corresponding vector.

2 Subspace recovery

We consider the setting of Section 1.1 and use the notation defined there. In particular, we work under Assumption 1. We consider the noiseless setting for simplicity.

2.1 RANSAC for subspace recovery

We propose a simple RANSAC algorithm for robust subspace recovery. In the present setting, in particular under Assumption 1, the underlying linear subspace L (which we assumed is of dimension d) is determined by any (d+1)-tuple of data points that is linearly dependent, since such a tuple must consist entirely of points from L and therefore spans it. The algorithm starts by randomly selecting a (d+1)-tuple and checking whether this tuple spans a linear subspace of dimension d, that is, whether it is linearly dependent. If so, the subspace is recovered and the algorithm stops. Otherwise, the algorithm continues, repeatedly sampling a (d+1)-tuple at random until the subspace is discovered. (Optionally, the algorithm can be made to stop when a maximum number of tuples has been sampled.) In this formulation, detailed in Algorithm 1, d is known.

Input : data points x_1, …, x_n; dimension d
Output : a linear subspace of dimension d containing at least d + 1 data points
1 repeat
2      randomly select a (d+1)-tuple of data points
3 until the tuple is linearly dependent;
return the subspace spanned by the tuple
Algorithm 1 RANSAC (Subspace Recovery)
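
To make the procedure concrete, here is a minimal Python sketch of Algorithm 1 in the noiseless setting. The function name, the use of numpy, and the rank test via numpy.linalg.matrix_rank are illustrative choices rather than a description of the implementation used for the experiments below.

```python
import numpy as np

def ransac_subspace_recovery(X, d, max_iter=None, rng=None):
    """Sketch of Algorithm 1: noiseless RANSAC subspace recovery.

    X is an (n, D) array whose rows are the data points; d is the dimension of
    the sought subspace. Returns a (D, d) orthonormal basis, or None on failure.
    """
    rng = np.random.default_rng(rng)
    n, D = X.shape
    it = 0
    while max_iter is None or it < max_iter:
        it += 1
        idx = rng.choice(n, size=d + 1, replace=False)   # random (d+1)-tuple
        T = X[idx]
        if np.linalg.matrix_rank(T) <= d:                # tuple is linearly dependent
            # under Assumption 1 the tuple then spans L; return an orthonormal basis
            U, _, _ = np.linalg.svd(T.T, full_matrices=False)
            return U[:, :d]
    return None
```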

By design, the procedure is exact. (Again, we are in the noiseless setting. In a noisy setting, the method can be shown to be essentially optimal.) However, researchers have shied away from a RANSAC approach because of its time complexity. We formalize this piece of folklore in the following result.

Proposition 1.

Algorithm 1 is exact and its number of iterations has the geometric distribution¹ with success probability $p = \binom{n_1}{d+1}/\binom{n}{d+1}$. Thus the expected number of iterations is 1/p, which is of order $(n/n_1)^{d+1}$ when d is held fixed.

¹ Here we consider the variant of the geometric distribution that is supported on the positive integers.

Note that each iteration requires on the order of $d^2 D$ operations, as it requires computing the rank of a (d+1)-by-D matrix.

Proof.

The algorithm samples a (d+1)-tuple independently and uniformly at random until the tuple is linearly dependent. Because of Assumption 1, a (d+1)-tuple is linearly dependent if and only if all the points in the tuple are from L. While there are $\binom{n}{d+1}$ tuples in total, only $\binom{n_1}{d+1}$ fit the bill, so that the probability of drawing a suitable tuple is $p = \binom{n_1}{d+1}/\binom{n}{d+1}$. Because the draws are independent, the total number of draws until the algorithm stops has the geometric distribution with success probability p.

We know that the mean of this distribution is 1/p, and when d is assumed fixed while n_1 and n are large, we have

$1/p = \binom{n}{d+1}\Big/\binom{n_1}{d+1} \sim (n/n_1)^{d+1}.$

(The reader is invited to verify that this still holds true as long as $d = o(\sqrt{n_1})$.) ∎

In applications where the number of outliers is a non-negligible fraction of the sample (meaning that n_1/n is not close to 1), RANSAC's number of iterations grows exponentially in the dimension d of the subspace. This confirms the folklore, at least in such a setting.
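
As a quick numerical illustration of this exponential dependence, the exact expectation in Proposition 1 and its approximation can be evaluated directly; the parameter values below are placeholders chosen only for illustration.

```python
from math import comb

def expected_iterations(n, n1, d):
    """Expected number of iterations of Algorithm 1: C(n, d+1) / C(n1, d+1)."""
    return comb(n, d + 1) / comb(n1, d + 1)

# With half the points being inliers (n / n1 = 2), the expectation is roughly
# 2 ** (d + 1): about 64 iterations for d = 5, but about 2 million for d = 20.
print(expected_iterations(1000, 500, 5), (1000 / 500) ** 6)
```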

Remark 2.

For simplicity, we analyzed the variant of the algorithm where the tuples are drawn with replacement, so that the worst-case number of iterations is infinite. However, in practice one should draw the tuples without replacement (which is equally easy to do in the present setting), as recommended in (Schattschneider and Green, 2012). For this variant, the worst-case number of iterations is $\binom{n}{d+1}$. Moreover, Proposition 1 still applies if understood as an upper bound. (The number of iterations has a so-called negative hypergeometric distribution in this case.)

Remark 3.

If the dimension d is unknown, a possible strategy is to start with d = 1, run the algorithm for a maximum number of iterations, and if no pair of points is found to be aligned with the origin, move to d = 2, and continue in that fashion, increasing the dimension. If no satisfactory tuple is found, the algorithm starts again at d = 1. The algorithm will succeed eventually.
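
A sketch of this strategy, reusing the hypothetical ransac_subspace_recovery function from the previous sketch and an arbitrary iteration budget per dimension:

```python
def ransac_unknown_dimension(X, max_dim, budget=10_000, rng=None):
    """Try d = 1, 2, ..., max_dim in turn, cycling back to d = 1 if nothing is found."""
    while True:
        for d in range(1, max_dim + 1):
            basis = ransac_subspace_recovery(X, d, max_iter=budget, rng=rng)
            if basis is not None:
                return d, basis
```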

2.2 The algorithm of Hardt and Moitra for subspace recovery

As we said above, researchers have avoided RANSAC procedures because of their running time, which as we saw can be prohibitive. Recently, however, Hardt and Moitra (2013) have proposed a RANSAC-type algorithm that strikes an interesting compromise between running time and precision.

Their algorithm is designed for the case where the sample size is larger than the ambient dimension, namely n > D. It can be described as follows. It repeatedly draws a D-tuple at random until the tuple is found to be linearly dependent. When such a tuple is found, the algorithm returns the span of the linearly dependent points in the tuple. See the description in Algorithm 2. A virtue of this procedure is that it does not require knowledge of the dimension d of the underlying subspace.

Input : data points x_1, …, x_n
Output : a linear subspace
1 repeat
2      randomly select a D-tuple of data points
3 until the tuple is linearly dependent;
return the subspace spanned by the linearly dependent points in the tuple
Algorithm 2 Hardt-Moitra (Subspace Recovery)
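
Here is a minimal Python sketch of Algorithm 2, again in the noiseless setting. Extracting the linearly dependent points through the null space of the sampled matrix is one possible implementation choice, not necessarily the one used by Hardt and Moitra.

```python
import numpy as np

def hardt_moitra_recovery(X, tol=1e-10, rng=None):
    """Sketch of Algorithm 2: sample D-tuples until one is linearly dependent,
    then return a basis for the span of the points taking part in a dependency
    (under Assumption 1 those points lie on L and span it)."""
    rng = np.random.default_rng(rng)
    n, D = X.shape
    assert n > D, "the algorithm assumes more points than the ambient dimension"
    while True:
        idx = rng.choice(n, size=D, replace=False)    # random D-tuple
        A = X[idx].T                                  # columns are the sampled points
        _, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > tol * s[0]))               # numerical rank of the tuple
        if r < D:                                     # tuple is linearly dependent
            null_basis = Vt[r:]                       # rows c satisfy A @ c = 0
            involved = np.linalg.norm(null_basis, axis=0) > tol
            span = X[idx][involved].T                 # columns: the dependent points
            U, sv, _ = np.linalg.svd(span, full_matrices=False)
            d_hat = int(np.sum(sv > tol * sv[0]))     # dimension of the recovered subspace
            return U[:, :d_hat]
```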
Proposition 2.

When n > D, Algorithm 2 is exact and its number of iterations has the geometric distribution with success probability q = P(Y ≥ d + 1), where Y denotes the number of inliers in a D-tuple drawn uniformly at random (a hypergeometric random variable). Thus the expected number of iterations is 1/q.

Note that each iteration requires on the order of $D^3$ operations, as it requires computing the rank of a D-by-D matrix.

Proof.

With Assumption 1 in place, a D-tuple is linearly dependent if and only if it contains at least d + 1 points from the subspace L. Thus the repeat loop above stops exactly when it has found a D-tuple containing at least d + 1 such points. Moreover, also because of Assumption 1, the points within that tuple that are linearly dependent (on the others) must belong to L, and there are at least d + 1 of them, so that they span L. Therefore, the algorithm returns L, and is thus exact.

We now turn to the number of iterations. The number of iterations is obviously geometric, and the success probability is the probability that a D-tuple drawn uniformly at random contains at least d + 1 points from L. This is exactly P(Y ≥ d + 1) as defined in the statement: indeed, it is the probability that, when drawing D balls without replacement from an urn with n_1 red balls out of n in total, the sample contains at least d + 1 red balls. In the present context, the balls are of course the points and the red balls are the points on the linear subspace L. ∎
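
The success probability in Proposition 2 is a hypergeometric tail probability, so it is easy to evaluate numerically; below is a small sketch using scipy, with parameter values chosen only for illustration.

```python
from scipy.stats import hypergeom

def hm_success_probability(n, n1, D, d):
    """P(Y >= d + 1), with Y the number of inliers in a random D-tuple
    (hypergeometric: population n, n1 'success' states, D draws)."""
    return hypergeom(n, n1, D).sf(d)    # sf(d) = P(Y > d) = P(Y >= d + 1)

q = hm_success_probability(n=1000, n1=300, D=50, d=5)
print(q, 1 / q)    # 1 / q is the expected number of iterations of Algorithm 2
```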

Hardt and Moitra (2013) analyze their algorithm in a slightly different setting, with the goal of finding the maximum fraction of outliers that can be tolerated before the algorithm breaks down, in the sense that it no longer runs in polynomial time. In particular, they show that, roughly speaking, as long as the fraction of inliers n_1/n exceeds d/D, the expected number of iterations is polynomial in the size of the problem. In fact, it can be better than that. The following is a consequence of Proposition 2.

Corollary 1.

If, in addition to n > D, the fraction of inliers n_1/n is bounded from below by (1 + η) d/D for some fixed η > 0, together with mild conditions on how the parameters grow, then the success probability is bounded from below by a positive quantity that depends only on η. Consequently, Algorithm 2 has an expected number of iterations of order 1.

Proof.

Let Y denote a random variable with the hypergeometric distribution described above, namely the number of red balls when drawing D balls without replacement from an urn with n_1 red balls out of n in total. The success probability of each iteration is P(Y ≥ d + 1), which depends on (n, n_1, D, d). We show that it is bounded from below by a positive quantity irrespective of these parameters as long as the conditions of the corollary are met. Noting that P(Y ≥ d + 1) is increasing in n_1 and D, and decreasing in n and d, it suffices to consider how it varies along a sequence of parameters that saturates the conditions, as this makes the expected number of iterations largest.

Along such a sequence, the conditions imply that d + 1 remains below the mean of Y by at least a constant multiple of its standard deviation, while the standardized variable (Y − E[Y])/sd(Y) is standard normal in the limit. It follows that P(Y ≥ d + 1) is bounded away from zero, using Slutsky's theorem in the last step. ∎

2.3 Numerical experiments

We performed some small-scale numerical experiments comparing RANSAC (in the form of Algorithm 1), the Hardt-Moitra (HM) procedure (Algorithm 2), and the Geometric Median Subspace (GMS) of (Zhang and Lerman, 2014), which appears to be one of the best methods on the market. (We used the code available on Teng Zhang’s website.)

Each inlier is uniformly distributed on the intersection of the unit sphere with the underlying subspace. Each outlier is simply uniformly distributed on the unit sphere. The result of each algorithm is averaged over 1000 repeats. Performance is measured by the (first principal) angle between the returned subspace and the true subspace. (This is to be fair to GMS, as the other two algorithms are exact.) The results are reported in Table 1.
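
For concreteness, here is a sketch of the data-generating mechanism used in these experiments and of the angle-based performance measure; the function names are ours and the parameter values are placeholders.

```python
import numpy as np
from scipy.linalg import subspace_angles

def sample_recovery_data(n1, n0, d, D, rng=None):
    """n1 inliers uniform on the unit sphere of a random d-dimensional subspace
    of R^D, plus n0 outliers uniform on the unit sphere of R^D."""
    rng = np.random.default_rng(rng)
    basis = np.linalg.qr(rng.standard_normal((D, d)))[0]   # orthonormal basis of L
    inl = rng.standard_normal((n1, d))
    inl = (inl / np.linalg.norm(inl, axis=1, keepdims=True)) @ basis.T
    out = rng.standard_normal((n0, D))
    out /= np.linalg.norm(out, axis=1, keepdims=True)
    return np.vstack([inl, out]), basis

X, true_basis = sample_recovery_data(n1=100, n0=100, d=3, D=10, rng=0)
est_basis = ransac_subspace_recovery(X, 3, rng=0)          # from the earlier sketch
print(np.max(subspace_angles(true_basis, est_basis)))      # angle between subspaces
```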

We performed another set of experiments to corroborate the theory established in Proposition 1 for the complexity of RANSAC. The results are shown in Figure 2, where each setting has been repeated 1000 times. As expected, as the dimensionality of the problem increases, RANSAC's complexity quickly becomes impractical.

parameters average system time difference in angle
RANSAC HM GMS RANSAC HM GMS
.0051 .0009 .0947 0 0 .0341
.0011 .0006 .1912 0 0 0
.0064 .0002 .5008 0 0 .0184
.0001 .0006 .1918 0 0 0
.0076 .0093 .0117 0 0 .0411
.6123 .0013 .0160 0 0 .2285
Table 1: Numerical experiments comparing RANSAC, HM, and GMS for the problem of subspace recovery. As in the text, d is the dimension of the subspace, D is the ambient dimension, n_1 is the number of inliers, and n_0 is the number of outliers (so that n = n_1 + n_0 is the sample size).
Figure 2: Average number of iterations for RANSAC (in the form of Algorithm 1) as a function of the subspace dimension d and the ratio n/n_1 of the sample size to the number of inliers. The dashed lines are the averages from our simulations while the solid lines are derived from theory (Proposition 1).

3 Subspace clustering

We consider the setting of Section 1.2 and use the notation defined there. In particular, we work under Assumption 2. We consider the noiseless setting for simplicity. We also assume that all the subspaces are of the same dimension, denoted d (so that d_k = d for all k).

3.1 RANSAC for subspace clustering

We propose a simple RANSAC algorithm for subspace clustering. As before, under Assumption 2 any one of the linear subspaces is determined by any linearly dependent (d+1)-tuple of data points, since such a tuple must consist entirely of points from a single subspace and therefore spans it. The algorithm starts by randomly selecting a (d+1)-tuple and checking whether this tuple spans a linear subspace of dimension d, that is, whether it is linearly dependent. If so, one of the subspaces is recovered and all the points on that subspace are extracted from the data. Otherwise, the algorithm continues, repeatedly sampling a (d+1)-tuple at random until that condition is met. The algorithm continues in this fashion until all K subspaces have been recovered. In this formulation, detailed in Algorithm 3, both d and K are assumed known.

Input : data points x_1, …, x_n; dimension d; number of subspaces K
Output :  K linear subspaces of dimension d, each containing at least d + 1 data points
1 for k = 1, …, K do
2       repeat
3            randomly select a (d+1)-tuple of the remaining data points
4      until the tuple is linearly dependent;
5      output the subspace spanned by the tuple
6       remove the points on that subspace from the data
7 end for
Algorithm 3 RANSAC (Subspace Clustering)
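
A minimal sketch of Algorithm 3, reusing the hypothetical ransac_subspace_recovery function from Section 2.1 and assigning to each recovered subspace the points lying on it:

```python
import numpy as np

def ransac_subspace_clustering(X, d, K, tol=1e-10, rng=None):
    """Sketch of Algorithm 3 (noiseless): recover K subspaces of dimension d one
    at a time, removing the points of each recovered subspace before continuing."""
    rng = np.random.default_rng(rng)
    labels = -np.ones(len(X), dtype=int)       # -1 marks points never assigned (outliers)
    remaining = np.arange(len(X))
    bases = []
    for k in range(K):
        basis = ransac_subspace_recovery(X[remaining], d, rng=rng)
        # points of the remaining data lying on the recovered subspace
        resid = X[remaining] - X[remaining] @ basis @ basis.T
        on_subspace = np.linalg.norm(resid, axis=1) < tol
        labels[remaining[on_subspace]] = k
        remaining = remaining[~on_subspace]
        bases.append(basis)
    return bases, labels
```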

Again, the procedure is exact by design, since we are in the noiseless setting. Here too, researchers have not embraced RANSAC approaches because of their running time. We confirm this piece of folklore in the following, where we assume for simplicity that all the subspaces contain the same number of points (so that n_k = n_1 for all k).

Proposition 3.

Algorithm 3 is exact and its number of iterations has the distribution of $N_1 + \cdots + N_K$, where the $N_k$'s are independent and $N_k$ has the geometric distribution with success probability $(K - k + 1)\binom{n_1}{d+1}\big/\binom{(K-k+1) n_1 + n_0}{d+1}$. This is stochastically bounded by the negative binomial distribution with parameters K and $\binom{n_1}{d+1}/\binom{n}{d+1}$. Thus the expected number of iterations is bounded by $K \binom{n}{d+1}/\binom{n_1}{d+1}$, which is of order $K (n/n_1)^{d+1}$ when d and K are held fixed.

The proof is very similar to that of Proposition 1 and is omitted.

Remark 4.

When the dimensions of the subspaces are unknown, a strategy analogous to that described in Remark 3 is of course possible. When the number of subspaces is unknown, a stopping rule can help decide whether there remains a subspace to be discovered. Details are omitted as such an approach, although natural, could prove complicated.

3.2 Adapting the algorithm of Hardt and Moitra for subspace clustering

Algorithm 3 consists in applying Algorithm 1 until a subspace is recovered, removing the points on that subspace, and then continuing until all the subspaces are recovered. An algorithm for subspace clustering can instead be based on the algorithm of Hardt and Moitra (2013) (Algorithm 2). The resulting algorithm is suited for the case where the number of remaining points exceeds the ambient dimension D at every stage. Since Algorithm 2 is applied K times, a bound on its expected number of iterations translates, up to a factor of K, into a bound on the expected number of iterations of the resulting algorithm for subspace clustering. See Algorithm 4, where we assume that the number K of subspaces is known, but do not assume that the dimensions of the subspaces are known (and they need not be the same).

Input : data points x_1, …, x_n; number of subspaces K
Output :  K linear subspaces, each containing a number of points exceeding its dimension
1 for k = 1, …, K do
2       repeat
3            randomly select a D-tuple of data points
4      until the tuple is linearly dependent;
5      repeat
6             find a smallest set of linearly dependent points in the tuple
7             output the subspace spanned by these points
8             remove the points on that subspace from the data
9      until there are no more linearly dependent points in the tuple;
10 end for
Algorithm 4 Subspace Clustering based on the Hardt-Moitra Algorithm

The reason why we extract a smallest set of linearly dependent points at each step is to avoid a situation where a D-tuple contains, say, d_j + 1 points from L_j and d_k + 1 points from L_k (with j ≠ k), in which case, assuming the subspaces are distinct, these points together are linearly dependent but do not span one of the subspaces. This particular step is, however, computationally challenging, as it amounts to finding the sparsest nonzero solution to a linear system involving the D sampled points, a problem known to be challenging (Tropp and Wright, 2010, Eq 1). One possibility is to replace it with finding the solution with minimum ℓ1 norm (Tropp and Wright, 2010, Eq 8), as sketched below. The use of the ℓ1 norm is central to the method proposed by Elhamifar and Vidal (2009).
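
A sketch of that ℓ1 relaxation, here used to express one point of the tuple as a combination of the others via linear programming; this is only one possible way to implement the step, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_solution(A, b):
    """Minimize ||c||_1 subject to A @ c = b, via the standard LP reformulation."""
    D, m = A.shape
    # variables z = [c, t]; minimize sum(t) subject to -t <= c <= t and A c = b
    obj = np.concatenate([np.zeros(m), np.ones(m)])
    eye = np.eye(m)
    A_ub = np.block([[eye, -eye], [-eye, -eye]])    # c - t <= 0 and -c - t <= 0
    A_eq = np.hstack([A, np.zeros((D, m))])
    res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(2 * m), A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * m + [(0, None)] * m, method="highs")
    return res.x[:m]

# Example use: express the first point of a D-tuple T (rows) through the others;
# a solution supported on few points suggests that those points lie on one subspace.
# c = min_l1_solution(T[1:].T, T[0])
```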

3.3 The algorithm of Chen and Lerman

The Spectral Curvature Clustering (SCC) algorithm of Chen and Lerman (2009b) is in fact of RANSAC type. The method was designed for the noisy setting and is therefore more sophisticated.² It is based on an affinity function that quantifies how close a (d+1)-tuple is to spanning a subspace of dimension d or less. It is equal to 1 when this is the case and is strictly less than 1 otherwise. The algorithm draws a number, m, of d-tuples at random, where the t-th tuple is denoted T_t, and computes the n-by-n matrix W, where W_{ij} aggregates the affinities of the (d+1)-tuples obtained by adding x_i, respectively x_j, to each sampled tuple. It then applies to W a form of spectral graph partitioning closely related to the method of Ng et al. (2002). (The method assumes all subspaces are of the same dimension d, and both d and K are assumed known.)

² Chen and Lerman (2009b) consider the case where the subspaces are affine, but we adapt their method to the case where they are linear.

In the noiseless setting, one could take the affinity to be 1 if the tuple is linearly dependent and 0 otherwise. In that case, W_{ij} is simply the number of tuples, among the m that were drawn, such that both x_i and x_j are linearly dependent together with the tuple. Chen and Lerman (2009a) analyze their method in a setting that reduces to this situation and show that the method is exact in this case.

3.4 Numerical experiments

We performed some numerical experiments to compare various methods for subspace clustering, specifically, RANSAC, Sparse Subspace Clustering (SSC) (Elhamifar and Vidal, 2009), Spectral Curvature Clustering (SCC) (Chen and Lerman, 2009b), and Thresholding-based Subspace Clustering (TSC) (Heckel and Bölcskei, 2015).

Each inlier is uniformly distributed on the intersection of the unit sphere with its corresponding subspace. Each outlier is simply uniformly distributed on the unit sphere. The result of each algorithm is averaged over 500 repeats. Performance is measured by the Rand index. The results are reported in Table 2.
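
For reference, the Rand index used here as a performance measure can be computed by simple pair counting; the labels below are placeholders.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two clusterings agree
    (both place the pair in the same cluster, or both separate it)."""
    pairs = list(combinations(zip(labels_a, labels_b), 2))
    agree = sum((a1 == a2) == (b1 == b2) for (a1, b1), (a2, b2) in pairs)
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # same partition up to relabeling: 1.0
```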

parameters average system time Rand index
RANSAC SSC SCC TSC RANSAC SSC SCC TSC
.0622 .3188 .3731 .1767 1 .8407 .9997 .7237
.8810 .2813 .8257 .1815 1 .6490 .8322 .5829
.2108 .3974 .4563 1 .7619 .9689
.3425 .6163 .9395 .3282 1 .9206 .9548 .7578
17.1827 .3185 1.4141 .2019 1 .6148 .6904 .5688
Table 2: Numerical experiments comparing RANSAC, SSC, SCC, and TSC for the problem of subspace clustering. As in the text, d is the dimension of the subspaces (assumed to be the same), D is the ambient dimension, K is the number of subspaces, n_1 is the number of inliers per subspace (assumed to be the same), and n_0 is the number of outliers (so that n = K n_1 + n_0 is the sample size).

4 Discussion and conclusion

In our small-scale experiments, RANSAC is seen to be competitive with other methods, at least when the intrinsic dimensionality is not too large and when there are not too many outliers (or too many underlying subspaces) present in the data. This was observed both in the context of subspace recovery and in the context of subspace clustering.

References

  • Candès et al. (2011) Candès, E. J., X. Li, Y. Ma, and J. Wright (2011). Robust principal component analysis? Journal of the ACM (JACM) 58(3), 11.
  • Chen and Lerman (2009a) Chen, G. and G. Lerman (2009a). Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Foundations of Computational Mathematics 9(5), 517–558.
  • Chen and Lerman (2009b) Chen, G. and G. Lerman (2009b). Spectral curvature clustering (SCC). International Journal of Computer Vision 81(3), 317–330.
  • Elhamifar and Vidal (2009) Elhamifar, E. and R. Vidal (2009). Sparse subspace clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2790–2797. IEEE.
  • Fischler and Bolles (1981) Fischler, M. A. and R. C. Bolles (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6), 381–395.
  • Hardt and Moitra (2013) Hardt, M. and A. Moitra (2013). Algorithms and hardness for robust subspace recovery. In Conference on Learning Theory (COLT), Volume 30, pp. 354–375.
  • Huber and Ronchetti (2009) Huber, P. J. and E. M. Ronchetti (2009). Robust Statistics (2nd edition). Wiley.
  • Maronna (1976) Maronna, R. A. (1976). Robust M-estimators of multivariate location and scatter. The Annals of Statistics 4(1), 51–67.
  • Ng et al. (2002) Ng, A. Y., M. I. Jordan, and Y. Weiss (2002). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pp. 849–856.
  • Heckel and Bölcskei (2015) Heckel, R. and H. Bölcskei (2015). Robust subspace clustering via thresholding. IEEE Transactions on Information Theory 61.
  • Schattschneider and Green (2012) Schattschneider, R. and R. Green (2012). Enhanced RANSAC sampling based on non-repeating combinations. In Proceedings of the 27th Conference on Image and Vision Computing New Zealand, pp. 198–203. ACM.
  • Tropp and Wright (2010) Tropp, J. A. and S. J. Wright (2010). Computational methods for sparse solution of linear inverse problems. Proceedings of the IEEE 98(6), 948–958.
  • Tyler (1987) Tyler, D. E. (1987). A distribution-free M-estimator of multivariate scatter. The Annals of Statistics 15(1), 234–251.
  • Vidal (2011) Vidal, R. (2011). Subspace clustering. IEEE Signal Processing Magazine 28(2), 52–68.
  • Vidal et al. (2005) Vidal, R., Y. Ma, and S. Sastry (2005). Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence 27(12), 1945–1959.
  • Wright et al. (2009) Wright, J., A. Ganesh, S. Rao, Y. Peng, and Y. Ma (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in neural information processing systems, pp. 2080–2088.
  • Zhang and Lerman (2014) Zhang, T. and G. Lerman (2014). A novel M-estimator for robust PCA. Journal of Machine Learning Research 15(1), 749–808.