Latent RANSAC

02/20/2018 ∙ by Simon Korman, et al. ∙ General Motors

We present a method that can evaluate a RANSAC hypothesis in constant time, i.e. independent of the size of the data. A key observation here is that correct hypotheses are tightly clustered together in the latent parameter domain. In a manner similar to the generalized Hough transform we seek to find this cluster, only that we need as few as two votes for a successful detection. Rapidly locating such pairs of similar hypotheses is made possible by adapting the recent "Random Grids" range-search technique. We only perform the usual (costly) hypothesis verification stage upon the discovery of a close pair of hypotheses. We show that this event rarely happens for incorrect hypotheses, enabling a significant speedup of the RANSAC pipeline. The suggested approach is applied and tested on three robust estimation problems: camera localization, 3D rigid alignment and 2D-homography estimation. We perform rigorous testing on both synthetic and real datasets, demonstrating an improvement in efficiency without a compromise in accuracy. Furthermore, we achieve state-of-the-art 3D alignment results on the challenging "Redwood" loop-closure challenge.

1 Introduction

Despite the recent success of (deep-) learning based methods in computer vision, numerous applications still use “old-fashioned” robust estimation methods for model fitting, such as RANSAC [19]. This is especially true for problems of a strong geometric nature such as image alignment, camera localization and 3D reconstruction. Robust estimation methods of these types largely follow the “hypothesize and test” paradigm, which has strong roots in statistics, and are highly attractive due to their ability to fit a model to data that is highly corrupted with outliers. Additionally, they have been successfully applied to many problems in computer vision and robotics, achieving real-time performance.

As an example, in the field of image (or shape) alignment, novel features and descriptors, including learned ones, have been introduced to facilitate matching. However, once these features are matched, robust estimation methods like RANSAC are used to fit a parametric model while coping with a corrupted set of putative matches.

Geometric models that are commonly amenable to such a robust estimation process include: 2D-homography, camera localization, the essential and fundamental matrices that describe epipolar constraints between images, rigid 3D motion and more.

Figure 1: 3D alignment result of two methods on a pair of fragments from the “Redwood” dataset [11]. Our method (right) produces a correct alignment even though the putative matches contain a very small fraction of inliers, which is very challenging. On the left, we see a failure case of the method from [36], even though it manages to substantially increase the inlier rate. Runtimes for this example are on the millisecond scale for both methods.

1.1 Background and prior art

Consensus maximization has proven a useful robust estimation approach to solving a wide variety of fitting and alignment problems in computer vision.

Research in this field can be broadly divided into global and local optimization methods. Global methods [27, 35, 10, 8] use different strategies to explore the entire solution space and enjoy the advantage of being deterministic. Our method, however, belongs to the family of local methods, which are typically extremely fast randomized algorithms, potentially equipped with probabilistic success guarantees.

While the proposed method is presented in the context of RANSAC, it is closely related to and inspired by other works in the field, such as Hough voting. We cover these topics briefly.

Figure 2: A flow chart of RANSAC compared to the suggested method. We propose an alternative flow (in blue), in which a generated hypothesis first undergoes a hashing procedure and is only verified if a (valid) collision is detected. The stopping criterion has to be modified as well, to ensure a second good hypothesis is drawn with high probability.

RANdom SAmple Consensus (RANSAC) [19] is one of the de facto gold standards for solving robust estimation problems in a practical manner. Under this paradigm, the space of solutions is explored by repeatedly selecting random minimal subsets of a set of given measurements (e.g. putative matches), to which a model hypothesis is fitted. These hypotheses are verified by counting the measurements that agree with them up to a predefined tolerance. This process is repeated until a desired probability of drawing at least one pure set of inliers is achieved. A minimal sketch of this loop is given below.
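To make the paradigm concrete, the following is an illustrative Python sketch of such a loop (not the paper's implementation; `fit_minimal` and `residual` are placeholders for a model-specific minimal solver and error function):

```python
import math
import random

def ransac(matches, fit_minimal, residual, s, eps, p_star=0.99, max_iters=100000):
    """'Hypothesize and test': repeatedly fit a model to a random minimal
    sample and keep the hypothesis with the largest consensus set."""
    best_model, best_count = None, 0
    k, i = max_iters, 0
    while i < k:
        i += 1
        model = fit_minimal(random.sample(matches, s))   # hypothesis generation
        if model is None:                                # degenerate sample
            continue
        count = sum(residual(model, m) <= eps for m in matches)  # verification: O(n)
        if count > best_count:
            best_model, best_count = model, count
            rs = (best_count / len(matches)) ** s        # prob. of an all-inlier sample
            if 0 < rs < 1:                               # adaptive stopping, cf. Eq. (4)
                k = min(k, math.ceil(math.log(1 - p_star) / math.log(1 - rs)))
    return best_model
```

Note that the verification line is the costly step: it touches every match for every generated hypothesis, which is exactly what the proposed method avoids.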

Several techniques that extend the ‘naive’ RANSAC procedure described above are covered in a comprehensive survey by Raguram et al. [28]. This survey also suggests USAC – a framework that combines several RANSAC extensions, yielding excellent results on a variety of problems in terms of accuracy, efficiency and stability. Two notable extensions are PROSAC [12], which incorporates prior knowledge on the measurements by prioritizing their sampling accordingly, and LO-RANSAC [14], which performs efficient local optimization on a model that has been verified.

While the previous extensions can be seamlessly applied along with the suggested method, the following extension is similar in nature to ours in that it aims to speed up the verification step, but it does so in a very different manner. The Sequential Probability Ratio Test (SPRT) [13] extension, adopted by USAC, is based on Wald’s sequential test [32]. It attempts to reject a “bad” model with high probability, after inspecting only a small fraction of the measurements. While the test is theoretically solid, it relies on two parameters that are assumed to be known a priori, and in practice need to be adaptively estimated at runtime. It is reported to have achieved an improvement of 20% in evaluation time compared to the simpler bail-out test [9].

Generalized Hough transform (GHT) originated from an algorithm for line detection in images [21], which was later generalized to handle arbitrary shapes [18, 7]. The key idea behind this method is that partial observations of the model are cast as votes into a (quantized) solution space, in which the object can be detected as a mode (the location with the most votes). In practice, GHT has not been shown to scale well to solution spaces of high dimensionality, and typically requires numerous votes for a mode to be accurately detected.

Between RANSAC and GHT.

Some works bear resemblance to both of the mentioned approaches. Our work can be seen as one of these: while it fits naturally into the RANSAC pipeline, it has some similarities to GHT in the sense that it seeks to find the mode in the parameter domain, only that it needs as few as two votes to detect it.

A method by den Hollander et al. [15] also lies somewhere between RANSAC and GHT: to increase the probability of obtaining a pure set of inlier matches, a sub-minimal set is drawn. The remaining degrees of freedom (DoF) are resolved using a voting scheme in a low-dimensional setting. As with all Hough-like methods, an adequate parameterization of the remaining DoF is required. The authors of [15] provide such a parameterization for the problem of fundamental matrix estimation.

Our method bears a strong resemblance to the “Randomized Hough Transform” (RHT) [33] of Xu et al. in that a vote, generated from a randomly selected minimal set, is cast into a single cell in the solution domain. However, unlike [33], we deal with a hypothesis in constant time and space, rather than logarithmic, thanks to the Random Grids hashing mechanism that we adapt. In addition, while RHT deals with robust curve fitting (of up to 3 dimensions), we successfully apply our method on a variety of problem domains of higher dimensionality (up to 8 dimensions).

1.2 Contributions

The main novelty of the presented method is its ability to handle RANSAC hypotheses in constant time, regardless of the number of measurements (e.g. matches). We show that it is beneficial to handle hypotheses in the latent space, due to an efficient parametrization and hashing scheme that we devise, which can quickly filter candidate hypotheses until a pair of correct ones is drawn. While this approach comes at the expense of a small increase in the number of hypotheses to be examined, it allows for a significant speedup of the RANSAC pipeline.

The newly proposed modifications to RANSAC are accompanied by a rigorous analysis, resulting in an updated stopping criterion and a well-understood overall probability of success. Finally, we validate our method on challenging data for the problems of 2D-homography estimation, 2D-3D based camera localization and rigid 3D alignment, showing state-of-the-art results.

2 Method

The ‘vanilla’ RANSAC pipeline can be divided into three main components: hypothesis generation, hypothesis verification and the adaptive stopping mechanism. The proposed Latent-RANSAC hypothesis handling fits naturally into this pipeline, as can be seen in Figure 2, highlighted in blue. The additional modules we propose act as a ‘filter’ that avoids the need to verify the vast majority of generated hypotheses: instead of verifying each hypothesis by applying it to all of the matches (a costly process that takes time linear in the number of matches), we check in constant time whether a previously generated hypothesis ‘collides’ with the current one, i.e. whether the two are close enough (in a sense that will be clarified below). Only the very few hypotheses that pass this filtering stage progress to the verification stage for further processing. As a result of the proposed change, the RANSAC stopping criterion needs to be modified to guarantee a probability of encountering a second good hypothesis rather than just one. A schematic sketch of this flow is given below.
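The following is an illustrative Python sketch of the modified flow (all names are ours, not the paper's: `embed` is a latent parametrization as in Section 2.1, `table` a Random Grids structure as in Section 2.2, and `verify` the usual consensus count):

```python
import random

def latent_ransac(matches, fit_minimal, embed, table, verify, s, k_stop):
    """Latent-RANSAC flow (Figure 2, blue path): a hypothesis is verified
    only when its latent vector collides with a previously hashed one."""
    best_model, best_score = None, -1
    for _ in range(k_stop):                   # k_stop from the criterion of Section 2.3
        model = fit_minimal(random.sample(matches, s))
        if model is None:                     # degenerate minimal sample
            continue
        v = embed(model)                      # latent d-tuple
        if table.insert_and_check(v):         # O(1) collision filter
            score = verify(model)             # costly O(n) step, now rare
            if score > best_score:
                best_model, best_score = model, score
    return best_model
```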

Outline

We begin by covering the key components of our method: parametrization of the solution space (Section 2.1), Random Grids hashing (Section 2.2) and the modified stopping criterion (Section 2.3). We conclude this part of the paper in Section 2.4, with an analysis of our Random Grids hashing process.

Preliminary definitions

In our setup, the goal is to robustly fit a geometric model (transform) $T$ to a set $M$ of matches (correspondences), w.l.o.g. in Euclidean space, where a match $m = (p, q)$ is an ordered pair of points $p$ and $q$. For a geometric transform $T$ and match $m = (p, q)$, the residual error of the match m with respect to $T$ is the Euclidean distance given by:

$\mathrm{err}(m, T) = \| T(p) - q \|_2$   (1)

Given a set of matches $M$ and a tolerance $\epsilon$, the inlier rate achieved by a transform $T$ is defined as the fraction of matches for which $\mathrm{err}(m, T) \leq \epsilon$. We denote the maximal inlier rate for a match-set $M$ by $r^*$.
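As a small illustration of these definitions (a sketch of ours, assuming a transform is given as a callable on points):

```python
import numpy as np

def inlier_rate(T, matches, eps):
    """Fraction of matches m = (p, q) with err(m, T) = ||T(p) - q||_2 <= eps, Eq. (1)."""
    errs = np.array([np.linalg.norm(T(p) - q) for p, q in matches])
    return float(np.mean(errs <= eps))
```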

2.1 Parametrization of the solution domain

In the RANSAC pipeline, matches are used both for the generation of hypothesis candidates and for their screening. Since our approach performs the majority of the screening according to ‘similarity’ in the space of transformations, we seek a parametrization of the transformation space in which distances between transformations can be defined explicitly. More formally, we define such a parametrization by an embedding $f$ of hypotheses into some $d$-dimensional space, which we call the latent space (the latent space dimension $d$ typically being the number of degrees of freedom of the transformation space). We consider the distance between transformations to be given by the $\ell_\infty$ metric between the embedded (or latent) vectors ($d$-tuples). Our goal is to use an embedding in which the distance between any pair of hypotheses $T_1$ and $T_2$ is tightly related to the difference in the way these hypotheses act on matches in the source domain, i.e. to the difference in magnitudes of their residual errors on the matches. Ideally, for any set of matches $M$,

$\max_{m \in M} \big| \mathrm{err}(m, T_1) - \mathrm{err}(m, T_2) \big| \approx \| f(T_1) - f(T_2) \|_\infty$   (2)

2D homography.

We describe here the parameterization we use for the space of 2D homographies, which are given by $3 \times 3$ projective matrices. Following previous works (e.g. [24, 16]), we use the 4pt parameterization [6], which represents a 2D-homography $H$ by an 8-tuple $(x_1, y_1, \ldots, x_4, y_4)$, defined by the coordinates in the target image that are the result of applying $H$ to the four corners of the source image, as illustrated in Figure 3.

Figure 3: Illustration of the 4pt homography parametrization [6]. A homography $H$ is represented in the latent space by mapping the locations of the four corners of the source image onto the target image, resulting in the 8-tuple $(x_1, y_1, \ldots, x_4, y_4)$.

As was noted in [24], this parametrization has the key property that the difference between match errors of two well-behaved homographies is bounded by the distance between their 4pt representations.
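A sketch of this embedding (our own illustration, assuming the source image spans $[0, w] \times [0, h]$; the corner ordering is an arbitrary choice):

```python
import numpy as np

def embed_homography_4pt(H, w, h):
    """Embed a 3x3 homography H as the 8-tuple of target-image locations of
    the four source-image corners (the 4pt parameterization, Figure 3)."""
    corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)
    pts = np.c_[corners, np.ones(4)] @ H.T     # corners in homogeneous coordinates
    pts = pts[:, :2] / pts[:, 2:3]             # perspective divide
    return pts.ravel()                         # (x1, y1, ..., x4, y4)
```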

The special Euclidean group SE(3), used to describe rigid motion in $\mathbb{R}^3$, will be used here to solve the problems of Perspective-n-Point (PnP) estimation and rigid 3D alignment. We follow a parametrization that was suggested and used in a line of works by Li et al. [34, 8]. The group can be described as the product of two sub-groups, $\mathbb{R}^3 \times SO(3)$, namely 3D translations and the special orthogonal group $SO(3)$ of 3D rotations. Each of these 3-dimensional sub-groups is parameterized as a 3-tuple, resulting in a 6-tuple representation defined as follows. The axis-angle vector (3-tuple) $\omega$ represents the 3D rotation matrix given by $R = \exp([\omega]_\times)$, where $\exp$ is the matrix exponential and $[\omega]_\times$ denotes the skew-symmetric matrix representation of $\omega$. Such vectors reside in the radius-$\pi$ ball that is contained in the 3D cube $[-\pi, \pi]^3$. The translation 3-tuple is a vector in the cube $[-\tau, \tau]^3$ that contains the relevant bounded range of translations, for a large enough $\tau$.

Similar to the case of the 2D-homography parametrization, it is proved in [34] that the difference between match errors of two rigid motions is bounded by the distance between their parametrizations.
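A sketch of this 6-tuple embedding (our own illustration; the translation rescaling shown is one possible choice of the translation-to-angle ratio, which the paper tunes per problem):

```python
import numpy as np

def embed_rigid(R, t, tau):
    """Embed a rigid motion (R, t) in SE(3) as a 6-tuple: an axis-angle
    3-tuple in the radius-pi ball, and a translation 3-tuple rescaled
    from [-tau, tau]^3 to a comparable range."""
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if angle < 1e-8:
        omega = np.zeros(3)                 # near-identity rotation
    else:
        # log map of SO(3); a robust version would special-case angle ~ pi
        axis = np.array([R[2, 1] - R[1, 2],
                         R[0, 2] - R[2, 0],
                         R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
        omega = angle * axis
    return np.concatenate([omega, np.asarray(t, dtype=float) * (np.pi / tau)])
```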

2.2 Random Grids hashing

Given such an embedding of a generated hypothesis, the heart of our method boils down to a nearest-neighbor search of the current vector among all vectors representing previously generated hypotheses. More precisely, the task that needs to be performed is a range-search query for vectors that are at a distance of up to a certain tolerance $\epsilon$.

A recent work by Aiger et al. [2] turns out to be extremely suitable for this task. They propose Random Grids – a randomized hashing method based on the very simple idea of imposing randomly shifted ‘grids’ over the vector space and checking for vectors that ‘collide’ in a common cell. The Random Grids algorithm is very fast and simple to implement, even in comparison with the closely related LSH-based algorithms [4], since the grid is axis-aligned and uniform (it consists of cells with equal side length). Most importantly, and essential for the speed of our method, the range search is done in constant time (i.e. it does not depend on the number of vectors searched against), as opposed to the RANSAC hypothesis validation, which requires applying the model and measuring errors on (typically hundreds of) point matches, or even the logarithmic-time solution proposed in RHT [33].

Hashing scheme.

We are given a representation of the transform $T$ as a vector $\mathbf{v} \in \mathbb{R}^d$ (for a $d$-dimensional parameterization). In the Random Grids [2] setting, we hash v into $L$ hash tables $h_1, \ldots, h_L$, each associated with an independent random grid, which is defined by a uniform random shift $\mathbf{s}_i \in [0, c)^d$, where $c$ is the cell side length and $d$ is the dimension of the latent vector v. The cell index for v in the $i$-th table is obtained by concatenating the integer vector $\lfloor (\mathbf{v} + \mathbf{s}_i)/c \rfloor$ into a single scalar (where $\lfloor \cdot \rfloor$ denotes the floor operation). The entire hashing process - initialization, insertion and collision checking - is given in detail in Algorithm 1.

input: (incremental) a candidate transform $T$ (matrix)
parameters: number of tables $L$; tolerance $\epsilon$; cell dim. $c$; parametrization dim. $d$
initialization:
foreach $i \in \{1, \ldots, L\}$ do
       1. Initialize an empty hash table $h_i$.
       2. Randomize offset $\mathbf{s}_i \sim U([0, c)^d)$
end foreach
insertion and collision check for hypothesis $T$:
foreach $i \in \{1, \ldots, L\}$ do
       1. Let v be the embedding of $T$
       2. The hash index for v in $h_i$ is: $\lfloor (\mathbf{v} + \mathbf{s}_i)/c \rfloor$, concatenated into a single scalar
       3. If the cell is occupied by a vector u, report a collision of (v, u) if $\| \mathbf{v} - \mathbf{u} \|_\infty \leq \epsilon$
       4. Store v in $h_i$
end foreach
Algorithm 1: Latent-RANSAC hypothesis handling.
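A compact Python sketch of Algorithm 1 (ours, for illustration; a dictionary keyed by the integer cell tuple plays the role of the scalar cell index used in the fixed-size tables of the actual implementation):

```python
import numpy as np

class RandomGrids:
    """Random Grids [2] hashing for Latent-RANSAC (a sketch of Algorithm 1):
    L independent, randomly shifted uniform grids with cell side c; a query
    is an insertion plus an l_inf collision check, both in constant time."""

    def __init__(self, d, L, c, eps, seed=0):
        rng = np.random.default_rng(seed)
        self.c, self.eps = c, eps
        self.shifts = rng.uniform(0, c, size=(L, d))   # one random offset per grid
        self.tables = [dict() for _ in range(L)]       # cell index -> latent vector

    def insert_and_check(self, v):
        """Insert latent vector v into all L tables; report a valid collision
        if some occupied cell holds u with ||v - u||_inf <= eps (steps 2-4)."""
        v = np.asarray(v, dtype=float)
        collided = False
        for shift, table in zip(self.shifts, self.tables):
            key = tuple(np.floor((v + shift) / self.c).astype(int))  # cell index
            u = table.get(key)
            if u is not None and np.max(np.abs(v - u)) <= self.eps:
                collided = True                        # candidate for verification
            table[key] = v                             # store (overwrite) v
        return collided
```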

2.3 Latent-RANSAC stopping criterion

The classical analysis of RANSAC provides a simple formula for the number of iterations required to reach a certain success probability (e.g. 0.99). It is based on the assumption that it is sufficient to have a single ‘good’ iteration, in which a pure set of inliers is drawn. Note that this assumption is made for simplicity of the analysis and is only theoretical, since it ignores, e.g., the presence of inlier noise and several possible degeneracies in the data.

Formally, let $X_k$ be the random variable that counts the number of such good iterations out of $k$ attempts. For a minimal set of size $s$ and data with an inlier rate of $r$, it holds that

$P(X_k \geq 1) = 1 - (1 - p)^k$   (3)

where $p = r^s$. The number of iterations $k^*$ required to guarantee a desired success probability $P^*$ is therefore:

$k^* = \log(1 - P^*) \, / \, \log(1 - p)$   (4)

A similar simplified analysis can be applied to the Latent-RANSAC scheme. Ignoring the presence of inlier noise, the existence of (at least) two ‘good’ iterations is needed for a collision to be detected and the algorithm to succeed. Therefore, by the binomial distribution we have that

$P(X_k \geq 2) = 1 - (1 - p)^k - kp(1 - p)^{k-1}$   (5)

Based on equations (4) and (5), we plot in Figure 4 the ratio between the number of required iterations in the case of Latent-RANSAC versus the case of RANSAC. The ratio is given as a function of the inlier rate $r$, at 3 different success rates (color coded), for the two different cases $s = 3$ (e.g. in rigid 3D motion estimation) and $s = 4$ (e.g. in homography estimation). Interestingly, the ratio attains a small value (less than 2) at low inlier rates, and converges to a small constant as the inlier rate decreases. The very high inlier rates for which the ratio is large are of no concern, since the absolute number of iterations is extremely low in this range.

Figure 4: The ratio between the stopping criteria of Latent-RANSAC (5) and of RANSAC (4). Ratios are shown as a function of the inlier rate for several success probabilities (color coded) and for minimal sample sizes $s = 3$ (dashed) and $s = 4$ (solid). See text for details.
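The two criteria are easy to compare numerically; the following sketch (ours) reproduces the quantity plotted in Figure 4:

```python
import math

def k_ransac(r, s, p_star=0.99):
    """Iterations for >= 1 all-inlier sample with probability p_star, Eq. (4)."""
    p = r ** s
    return math.log(1 - p_star) / math.log(1 - p)

def k_latent(r, s, p_star=0.99):
    """Smallest k with P(X_k >= 2) >= p_star, Eq. (5), by direct search."""
    p = r ** s
    k = 2
    while 1 - (1 - p) ** k - k * p * (1 - p) ** (k - 1) < p_star:
        k += 1
    return k

# e.g. r = 0.1, s = 4: roughly 46,000 vs. 66,000 iterations, a ratio of ~1.4;
# each Latent-RANSAC iteration is far cheaper, so the trade-off pays off.
r, s = 0.1, 4
print(k_ransac(r, s), k_latent(r, s), k_latent(r, s) / k_ransac(r, s))
```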

In the next section, as part of an analysis of the Random Grid hashing, we derive a more realistic stopping criterion that depends also on the success probability of the Random Grid based collision detection, which clearly depends on the inlier noise level.

2.4 Random Grid analysis

We cover two aspects of Random Grids. First, we extend the stopping criterion from Section 2.3 to account for the probability that a colliding pair of good hypotheses will be detected. Next, we discuss causes of false collision detection, which can have an effect on the algorithm's runtime.

Stopping criterion.

Let $A_j$ be the event that the random grid component succeeds (detects a collision), given $j$ good iterations out of a total of $k$. We can now update Equation (5), taking this success probability into account:

$P(\mathrm{success}) = \sum_{j \geq 2} P(X_k = j) \, P(A_j) \geq P(X_k \geq 2) \, P(A_2)$   (6)

where the inequality holds due to the fact that $P(A_j)$ monotonically increases with $j$.

A final lower bound on $P(\mathrm{success})$ (from which the stopping criterion is determined) can be obtained by substituting the expression for $P(X_k \geq 2)$ from (5) into (6), together with a lower bound on $P(A_2)$, which we provide next.

Recall that $A_2$ is the event that the random grid hashing succeeds given that two successful hypotheses were generated. We will, more explicitly, denote this event by $A_2^L$, for a random grid that uses $L$ hash tables.

The analysis in [2] is rather involved, since it deals with the Euclidean distance. Using $\ell_\infty$ distances, we are able to derive the success probability of finding a true collision in our setup, as a function of the random grid parameters, in a simpler manner. Assuming a tolerance $\epsilon$ in the latent space, determined by the (inlier) noise level of the data, using a random grid with cell dimension $c$ and a single table results in

$P(A_2^1) \geq (1 - \epsilon/c)^d$   (7)

since a pair of pure-inlier latent vectors (which differ by at most $\epsilon$ in each coordinate) must share the same independently offset bin indices in each of the $d$ dimensions.

Finally, using $L$ hash tables, randomly and independently generated, we obtain:

$P(A_2^L) \geq 1 - \big(1 - (1 - \epsilon/c)^d\big)^L$   (8)
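These bounds are directly computable; the following sketch (ours, with illustrative rather than tuned parameter values) shows the classical cell-size trade-off:

```python
def p_collision(eps_over_c, d, L):
    """Lower bound on the probability that two eps-close (in l_inf) latent
    vectors collide in at least one of L random grids, Eqs. (7)-(8)."""
    p1 = (1 - eps_over_c) ** d      # single-table bound, Eq. (7)
    return 1 - (1 - p1) ** L        # L independent tables, Eq. (8)

# Illustrative values: for d = 6 (rigid 3D) and L = 4 tables, a cell side of
# c = 2*eps gives a bound of only ~0.06, while c = 4*eps gives ~0.54; larger
# cells raise the true-collision probability but admit more false collisions.
print(p_collision(0.5, 6, 4), p_collision(0.25, 6, 4))
```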

False collisions.

We now discuss the expected number of false collisions found by the hashing scheme. It is important to understand why false collisions might happen, as they have an effect on the overall runtime of our pipeline.

Recall that $k$ is the overall number of iterations of the pipeline, and hence it is also the total number of samples inserted into each hash table. There are two kinds of false collisions to consider. The first kind happens due to the fact that the random grid cell size $c$ might be larger than the tolerance $\epsilon$. Following the recommendation in [1], we set the cell size $c$ to be not much larger than the tolerance $\epsilon$, resulting in a small number of such false collisions. In any case, this kind of collision has a small impact on the runtime, since it will be filtered by the tolerance test (step 3 in Algorithm 1) at constant cost.

The second kind of false collision is one that passes the tolerance test (step 3). Since it does not correspond to the true model, it is associated with some inlier rate lower than $r^*$, and hence the probability of such a collision appearing before we have reached the stopping criterion is negligible. Empirically, we observe very few (typically a handful of) collisions that pass the tolerance test before the algorithm stops. These are the only kind of collisions that incur a non-negligible penalty (in runtime only), since they invoke the verification process that every “vanilla” RANSAC hypothesis goes through.

3 Results

In order to evaluate our method, we performed extensive tests on both real and synthetic data. The Latent-RANSAC algorithm is applied to the problems of 2D-homography estimation (Section 3.1), Perspective-n-Point (PnP) estimation (Section 3.2) and rigid 3D alignment (Section 3.3). It is compared with ordinary RANSAC, with and without the well-known SPRT [13] extension, which is a very different technique for accelerating RANSAC’s model verification phase.

Implementation details.

Our method naturally extends the standard RANSAC pipeline, according to the changes highlighted in Figure 2. Our implementation (for which we use the shorthand LR) extends the excellent C++ RANSAC implementation USAC [28], with the noted changes in the specific modules. This enables easy comparison with a state-of-the-art RANSAC implementation, and allows our method to enjoy the same extensions used by USAC (such as its local optimization component LO-RANSAC [14]). In addition, their implementation includes the SPRT [13] extension, the most commonly used acceleration of RANSAC’s model verification phase. We use the shorthand SPRT to refer to RANSAC using this extension.

In addition, we use the OpenGV library [22] for PnP model fitting (Kneip’s P3P algorithm [23]) and for rigid 3D model fitting (Arun’s algorithm [5]). We make our modifications to the code available at anonymized.

Parameters were tuned by a simple coordinate descent on a subset of the data and kept fixed throughout. Parameters common to all settings: the probability of success $P^*$; the number of hash tables $L$; the Random Grid cell size $c$, where the tolerance $\epsilon$ (and the RANSAC threshold used) are specified separately for each experiment; the maximal number of iterations; and the hash table size, which determines the number of bits of the addressing indices (see the supplementary materials for hash table implementation details).

Figure 5: Inlier rate cumulative distribution (CDF) of the two real data sets we use: Redwood [11] and ZuBuD [30]. The dashed curve was taken only over Redwood pairs with provided ground truth.

3.1 2D-homography estimation

Figure 6: Runtime breakdown per pipeline module, comparing LR to SPRT and RANSAC. These are average per-instance runtimes (in seconds) for 2D-homography estimation (left) and PnP estimation (right), taken over the entire data-sets used for the evaluation.

We create a large body of 2D-homography estimation instances using the Zurich buildings data-set [30]. The data-set consists of sequences of 5 snapshots, taken from different street-level viewpoints, for 201 buildings or houses. The images are typically dominated by planar facade structures, and hence each pair of images in a sequence is related by a 2D homography (or perhaps more than one, in the case of several planes).

We computed SIFT [25] features for each image and created sets of corresponding features for each of the 10 (ordered) pairs of images in a sequence using the VLFeat library [31]. Then, we ran both RANSAC and LR on each pair 10 times and saved the highest detected inlier rate of any of the fitted homographies as the ‘optimal’ inlier rate for the image pair. A small set of image pairs (168 out of 2010) with a very low ‘optimal’ inlier rate was manually removed from the evaluation, since the inlier feature locations did not reside on an actual single plane in the scene and were very noisy.

The 1842 resulting matching instances are challenging: many pairs have low inlier rates, resulting from (i) the planar area of interest typically covering only part of each image; (ii) large viewpoint changes; (iii) a large presence of repetitive patterns (e.g. windows or pillars). See Figure 5 for the distribution of inlier rates for this data-set.

We ran LR with a tolerance (in pixels) in the latent domain, and RANSAC and SPRT with a threshold (in pixels). These parameters were chosen to give the best results, which are summarized in Table 1. We arrange the image pairs into four groups according to their ‘optimal’ inlier rate (defined above); the size of each group is shown at the bottom of the table. For each group we report the average and 95th-percentile of runtimes for each method. We also report each method’s success rate (averaged over all pairs in the group), which is the ratio between the detected inlier rate and the ‘optimal’ inlier rate (i.e. the highest one from any method).

                               inlier rate range (in %)
measure              method    0-10            10-20      20-40     40-100
runtime avg. (95%)   RANSAC    1,536 (6,627)   56 (153)   8 (16)    8 (15)
                     SPRT      1,157 (4,994)   34 (95)    6 (11)    9 (16)
                     LR        1,102 (4,559)   38 (92)    8 (13)    12 (18)
success              RANSAC    98.01%          98.20%     98.75%    99.32%
                     SPRT      97.60%          98.39%     98.78%    99.50%
                     LR        97.50%          98.47%     98.90%    99.65%
# of pairs                     210             390        650       592

Table 1: 2D homography fitting on Zurich Buildings [30]. See text for further details.

As can be seen, SPRT and LR (modestly) accelerate RANSAC in the harder inlier-rate ranges, where the overall runtime is longer. LR achieves the best acceleration for the lowest range, while SPRT does better in the 20-40 range. In terms of accuracy, both SPRT and LR are on par with RANSAC, achieving high success rates for the different inlier-rate groups.

The detailed runtime breakdowns shown in Figure 6 (left) reveal two important points. First, in homography estimation, methods that accelerate the RANSAC evaluation stage (i.e. LR and SPRT) have a relatively small potential improvement gap, since the runtime of RANSAC-based homography estimation is dominated by the model fitting stage (this is not the case for the other problems we deal with, as will be seen later). Second, the improved acceleration of LR in the lower ranges is significant when considering the overall time taken to fit the entire data-set, since the majority of time is spent on these difficult cases, which are surprisingly not rare (11% and 21% of all pairs are in the 0-10 and 10-20 ranges, respectively).

3.2 Perspective-n-Point (PnP) estimation

In order to compare methods on the PnP problem (camera localization from 2D-3D correspondences), we follow the benchmark proposed in [22], using a large bank of synthetic instances with controlled values of inlier rate and inlier noise. Specifically, a synthetic inlier set is created by first picking a random camera position in 3D, followed by projecting random 3D locations (within the camera view) to 2D angular coordinates, and then adding noise. The outlier matches are generated similarly, except that the 2D locations are distributed uniformly in the image. Each instance contains a fixed number of such 2D-3D correspondences, generated in this manner.

We chose to use Kneip’s algorithm [23] for minimal-sample P3P model fitting (for all methods) due to its good accuracy-efficiency trade-off compared to other alternatives. Since only 3 matches are needed for the minimal-sample fitting problem, RANSAC generally needs very few iterations to robustly reveal the true underlying model in typical cases. We therefore add noise at several levels and consider challenging inlier rates in the range between 3% and 20%.

We ran RANSAC and SPRT with a threshold (in radians), and LR with a tolerance (in radians) in the latent domain and a fixed translation-to-angle ratio for the embedding. The results, summarized in Table 2, show that LR achieves more than an order of magnitude acceleration compared to RANSAC, at only a minor loss in accuracy, in all configurations. SPRT achieves a similar acceleration factor at the higher inlier rates (10% and 20%). It is not as efficient in the lower range (3% and 5%), and much less accurate at the lowest inlier rate of 3%.

For the PnP case, Figure 6 (right) reveals that the huge acceleration is possible because the vast majority (over 95%) of RANSAC's time is spent on model verification, which is practically eliminated by LR. This is enabled mainly by the existence of fast fitting algorithms (e.g. [23]) for the P3P problem.

                                      inlier rate (%)
measure   noise level   method   3               5             10        20
runtime   level 1       RANSAC   34,213±3,800    7,109±1,560   887±106   114±20
                        SPRT     60,030±1,639    874±235       71±14     10±1
                        LR       2,404±733       439±289       60±12     13±3
          level 2       RANSAC   34,185±4,479    7,160±1,218   894±196   115±28
                        SPRT     61,194±20,602   954±323       75±18     10±2
                        LR       2,374±452       588±208       63±18     12±2
          level 3       RANSAC   30,880±4,434    7,302±1,039   898±104   118±28
                        SPRT     57,461±24,507   1,002±188     81±11     12±3
                        LR       2,635±886       554±176       74±19     14±5
success                 RANSAC   101.11%         99.03%        98.12%    97.73%
                        SPRT     28.94%          100.00%       98.17%    97.62%
                        LR       93.06%          98.17%        97.63%    96.95%

Table 2: Results for PnP estimation on synthetic data. Runtimes are given in median ± std format, taken over the random instances used for each noise-level/inlier-rate combination. Similarly, average success rates are reported (for each instance we count the number of detected inliers as a percentage of the true number of inliers).

3.3 Rigid 3D alignment

To evaluate the rigid 3D alignment application of Latent-RANSAC, we use the registration challenge of the recent “Redwood” benchmark proposed by Choi et al. [11]. This dataset was generated from four synthetic 3D scenes, each divided into point-cloud fragments. While synthetic in origin, these fragments contain high-frequency noise and low-frequency distortion that simulate scans created by consumer depth cameras.

The challenge is to perform global 3D registration between every pair of fragments of a given scene, in order to provide candidate pairs for trajectory loop closure. A correctly ‘detected’ pair is one for which the point clouds overlap sufficiently and the reported transformation is sufficiently accurate (see [11] for details). The main goal in this benchmark, as stated by [11], is to achieve high recall, while relying on a post-process to later remove false matches.

Aside from the benchmark, Choi et al. [11] present a simple extension (CZK) of the Point-Cloud-Library (PCL) [20] implementation of [29]. CZK showed state-of-the-art performance compared to previous methods like OpenCV [17], 4PCS [3] and its extension Super4PCS [26]. Fast Global Registration (FGR) [36] is a recent optimization-based method by Zhou et al., which achieves an order of magnitude runtime acceleration on this dataset, at competitive recall-precision performance. They perform the costly nearest-neighbor (NN) search only once (unlike previous methods, which use it in their inner loop), while introducing several fast and simple methods to filter false matches.

We chose to follow [11, 20, 36] and feed our framework with putative matches based on FPFH features [29], using a tolerance (in cm) in the latent space and a fixed translation-to-angle ratio for the embedding. Inspired by FGR, we perform the NN search only once. We then apply only one single filter (also used in [11, 36]): an approximate congruency validation on each minimal sample drawn.

Figure 7 shows a comparison of Latent RANSAC to the other results reported in [36]. Our method clearly achieves the highest recall value (the main goal), at a precision slightly below that of CZK. Furthermore, we are able to dominate all previous results in precision and recall simultaneously by using a slightly stricter setting, in which we report only pairs with a higher estimated overlap.

Figure 7: Performance (precision vs. recall) on the “Redwood” benchmark [11], comparing OpenCV [17], 4PCS [3], Super4PCS [26], PCL [20], FGR [36], CZK [11], LR and LR (strict). Our method achieves state-of-the-art recall in the standard setting (marked by a red ‘x’), while using a stricter threshold (marked by a red ‘+’) dominates all previous results in both precision and recall. See the description in the text for further details.

We attribute our high performance mainly to the fact that we perform almost no filtering of the putative matches, such as the bidirectional search and tuple filtering done in [36], the normal-agreement test in [11, 20], or the drawing of a non-minimal set of 4 matches in [11]. Using this “naive” nearest-neighbor FPFH feature matching avoids filtering out true correspondences (enabling higher recall), but this comes at the cost of some very low inlier rates, as can be seen in Figure 5. Our algorithm is able to deal with such inlier rates successfully (and efficiently), as was shown in the other experiments of this section and in Figure 1.

Another attractive property of our method in this benchmark is its runtime, presented in Table 3. Our runtime is close to that of FGR, which we outperform significantly in terms of recall. Note, however, that our method is actually faster than FGR whenever the inlier rate is sufficiently high, as the number of iterations given by (5) is then very low. Verification consumes a considerable part of the runtime of methods like [11, 20], while we perform the costly overlap verification only upon the detection of a collision (a handful of times per run, on average). Additionally, we perform the overlap calculation only once, as done in FGR.

method            PCL [20]   CZK [11]   FGR [36]   LR
avg. time (sec)   -          -          -          0.21

Table 3: Average runtimes on the “Redwood” dataset, excluding normal and FPFH [29] calculation time. A breakdown of our method’s timing comprises feature matching, the latent RANSAC pipeline, and overlap calculation (all measured in ms).

4 Future work

In this work we presented Latent-RANSAC: a novel speed-up of the hypothesis handling stage of the RANSAC pipeline. We have shown its advantages on challenging matching problems that include very low inlier rates, in the domains of homography estimation, camera localization and rigid 3D motion estimation.

Latent-RANSAC has the potential to be extended to additional domains. Of particular interest is finding an appropriate parametrization of the more challenging fundamental matrix domain, which is classically tackled using RANSAC.

The good results that Latent-RANSAC achieves on the “Redwood” benchmark demonstrate the advantage of being able to handle highly corrupted “raw” data (over 60% of the fragment pairs have under a 10% inlier rate), since the alternative of filtering the data to reduce the rate of outliers comes at the risk of losing informative data. The challenge, however, remains to do so efficiently, especially for search spaces of high dimensionality.

References

  • [1] D. Aiger, H. Kaplan, and M. Sharir. Reporting neighbors in high-dimensional euclidean space. SIAM Journal on Computing, 43(4):1363–1395, 2014.
  • [2] D. Aiger, E. Kokiopoulou, and E. Rivlin. Random grids: Fast approximate nearest neighbors and range searching for image search. In Proceedings of the IEEE International Conference on Computer Vision, pages 3471–3478, 2013.
  • [3] D. Aiger, N. J. Mitra, and D. Cohen-Or. 4-points congruent sets for robust pairwise surface registration. ACM Transactions on Graphics (TOG), 27(3):85, 2008.
  • [4] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on, pages 459–468. IEEE, 2006.
  • [5] K. S. Arun, T. S. Huang, and S. D. Blostein. Least-squares fitting of two 3-d point sets. IEEE Transactions on pattern analysis and machine intelligence, (5):698–700, 1987.
  • [6] S. Baker, A. Datta, and T. Kanade. Parameterizing homographies. Technical Report CMU-RI-TR-06-11, 2006.
  • [7] D. H. Ballard. Generalizing the hough transform to detect arbitrary shapes. Pattern recognition, 13(2):111–122, 1981.
  • [8] D. Campbell, L. Petersson, L. Kneip, and H. Li. Globally-optimal inlier set maximisation for simultaneous camera pose and feature correspondence. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [9] D. P. Capel. An effective bail-out test for ransac consensus scoring. In BMVC, 2005.
  • [10] T.-J. Chin, P. Purkait, A. Eriksson, and D. Suter. Efficient globally optimal consensus maximisation with tree search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2413–2421, 2015.
  • [11] S. Choi, Q.-Y. Zhou, and V. Koltun. Robust reconstruction of indoor scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
  • [12] O. Chum and J. Matas. Matching with prosac-progressive sample consensus. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 220–226. IEEE, 2005.
  • [13] O. Chum and J. Matas. Optimal randomized ransac. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(8):1472–1482, 2008.
  • [14] O. Chum, J. Matas, and J. Kittler. Locally optimized ransac. In Pattern Recognition, pages 236–243. Springer, 2003.
  • [15] R. J. den Hollander and A. Hanjalic. A combined ransac-hough transform algorithm for fundamental matrix estimation. In 18th British Machine Vision Conference (BMVC), 2007.
  • [16] D. DeTone, T. Malisiewicz, and A. Rabinovich. Deep image homography estimation. arXiv preprint arXiv:1606.03798, 2016.
  • [17] B. Drost, M. Ulrich, N. Navab, and S. Ilic. Model globally, match locally: Efficient and robust 3d object recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 998–1005. IEEE, 2010.
  • [18] R. O. Duda and P. E. Hart. Use of the hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1):11–15, 1972.
  • [19] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
  • [20] D. Holz, A. E. Ichim, F. Tombari, R. B. Rusu, and S. Behnke. Registration with the point cloud library: A modular framework for aligning in 3-d. IEEE Robotics & Automation Magazine, 22(4):110–124, 2015.
  • [21] P. V. Hough. Machine analysis of bubble chamber pictures. In International conference on high energy accelerators and instrumentation, volume 73, page 2, 1959.
  • [22] L. Kneip and P. Furgale. Opengv: A unified and generalized approach to real-time calibrated geometric vision. In Robotics and Automation (ICRA), 2014 IEEE International Conference on, pages 1–8. IEEE, 2014.
  • [23] L. Kneip, D. Scaramuzza, and R. Siegwart. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2969–2976. IEEE, 2011.
  • [24] R. Litman, S. Korman, A. Bronstein, and S. Avidan. Inverting ransac: Global model detection via inlier rate estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5243–5251, 2015.
  • [25] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.
  • [26] N. Mellado, D. Aiger, and N. J. Mitra. Super 4pcs fast global pointcloud registration via smart indexing. In Computer Graphics Forum, volume 33, pages 205–215. Wiley Online Library, 2014.
  • [27] C. Olsson, O. Enqvist, and F. Kahl. A polynomial-time bound for matching and registration with outliers. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
  • [28] R. Raguram, O. Chum, M. Pollefeys, J. Matas, and J.-M. Frahm. Usac: a universal framework for random sample consensus. IEEE transactions on pattern analysis and machine intelligence, 35(8):2022–2038, 2013.
  • [29] R. B. Rusu, N. Blodow, and M. Beetz. Fast point feature histograms (fpfh) for 3d registration. In Robotics and Automation, 2009. ICRA’09. IEEE International Conference on, pages 3212–3217. IEEE, 2009.
  • [30] H. Shao, T. Svoboda, and L. Van Gool. Zubud - zurich buildings database for image based recognition. Computer Vision Lab, Swiss Federal Institute of Technology, Switzerland, Tech. Rep, 260:20, 2003.
  • [31] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms, 2008.
  • [32] A. Wald. Sequential analysis. Courier Corporation, 1973.
  • [33] L. Xu, E. Oja, and P. Kultanen. A new curve detection method: randomized hough transform (rht). Pattern recognition letters, 11(5):331–338, 1990.
  • [34] J. Yang, H. Li, and Y. Jia. Go-icp: Solving 3d registration efficiently and globally optimally. In Proceedings of the IEEE International Conference on Computer Vision, pages 1457–1464, 2013.
  • [35] Y. Zheng, S. Sugimoto, and M. Okutomi. Deterministically maximizing feasible subsystem for robust model fitting with unit norm constraint. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1825–1832. IEEE, 2011.
  • [36] Q.-Y. Zhou, J. Park, and V. Koltun. Fast global registration. In European Conference on Computer Vision, pages 766–782. Springer, 2016.