Latent RANSAC implementation, based on USAC
We present a method that can evaluate a RANSAC hypothesis in constant time, i.e. independent of the size of the data. A key observation here is that correct hypotheses are tightly clustered together in the latent parameter domain. In a manner similar to the generalized Hough transform we seek to find this cluster, only that we need as few as two votes for a successful detection. Rapidly locating such pairs of similar hypotheses is made possible by adapting the recent "Random Grids" range-search technique. We only perform the usual (costly) hypothesis verification stage upon the discovery of a close pair of hypotheses. We show that this event rarely happens for incorrect hypotheses, enabling a significant speedup of the RANSAC pipeline. The suggested approach is applied and tested on three robust estimation problems: camera localization, 3D rigid alignment and 2D-homography estimation. We perform rigorous testing on both synthetic and real datasets, demonstrating an improvement in efficiency without a compromise in accuracy. Furthermore, we achieve state-of-the-art 3D alignment results on the challenging "Redwood" loop-closure challenge.
Robust estimation methods largely follow the "hypothesize and test" paradigm, which has strong roots in statistics. These methods are highly attractive due to their ability to fit a model to data that is highly corrupted with outliers, and they have been successfully applied to many problems in computer vision and robotics, such as image alignment and 3D reconstruction, achieving real-time performance.
As an example, in the field of image (or shape) alignment, novel features and descriptors have been introduced to facilitate matching, including learned ones. However, once these features are matched, robust estimation methods like RANSAC are used to fit a parametric model while coping with corrupted sets of putative matches.
Geometric models that are commonly amenable to such a robust estimation process include: 2D-homography, camera localization, the essential and fundamental matrices that describe epipolar constraints between images, rigid 3D motion and more.
Consensus maximization has proven a useful robust estimation approach to solving a wide variety of fitting and alignment problems in computer vision.
Research in this field can be broadly divided into global and local optimization methods. Global methods [27, 35, 10, 8] use different strategies to explore the entire solution space and enjoy the advantage of having a deterministic nature. Our method, however, belongs to the family of local methods, which are typically extremely fast randomized algorithms, potentially equipped with probabilistic success guarantees.
While the proposed method is presented in the context of RANSAC, it is closely related to and inspired by other works in the field, such as Hough voting. We cover these topics briefly.
RANSAC is one of the de-facto gold standards for solving robust estimation problems in a practical manner. Under this paradigm, the space of solutions is explored by repeatedly selecting random minimal subsets of a set of given measurements (e.g. putative matches), to which a model hypothesis is fitted. These hypotheses are verified by counting the measurements that agree with them up to a predefined tolerance. This process is repeated until a desired probability of having drawn at least one pure set of inliers is achieved.
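The loop just described can be sketched in a few lines (an illustrative sketch only, not the actual implementation; the line-fitting callbacks are hypothetical stand-ins for a real geometric model):

```python
import math
import random

def vanilla_ransac(data, fit, residual, sample_size, tol, p=0.99, max_iters=10000):
    """Minimal 'hypothesize and test' loop: sample a minimal set, fit a model,
    count agreeing measurements, and adapt the stopping criterion."""
    best_model, best_inliers = None, 0
    k = max_iters
    i = 0
    while i < k:
        model = fit(random.sample(data, sample_size))
        if model is not None:
            inliers = sum(1 for m in data if residual(model, m) <= tol)
            if inliers > best_inliers:
                best_model, best_inliers = model, inliers
                w = inliers / len(data)          # current inlier-rate estimate
                if 0 < w < 1:
                    # iterations for P(at least one pure minimal sample) >= p
                    k = min(k, math.log(1 - p) / math.log(1 - w ** sample_size))
        i += 1
    return best_model, best_inliers

# Toy usage: fit a 2D line y = a*x + b from minimal samples of two points.
def fit_line(pts):
    (x1, y1), (x2, y2) = pts
    if abs(x1 - x2) < 1e-12:
        return None                              # degenerate sample
    a = (y2 - y1) / (x2 - x1)
    return (a, y1 - a * x1)

def line_residual(model, p):
    a, b = model
    return abs(a * p[0] + b - p[1])
```

Note that each verification pass touches every measurement; this linear cost per hypothesis is exactly what the proposed method avoids.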
Several techniques that extend the 'naive' RANSAC procedure described above are covered in a recent comprehensive survey by Raguram et al. This survey also suggests USAC, a framework that combines several RANSAC extensions, yielding excellent results on a variety of problems in terms of accuracy, efficiency, and stability. Two notable extensions are PROSAC, which incorporates prior knowledge on the measurements by prioritizing their sampling accordingly, and LO-RANSAC, which performs efficient local optimization on a model that has been verified.
While the previous extensions can be seamlessly applied along with the suggested method, the following extension is similar in nature to ours in that it aims to speed up the verification step, though it does so in a very different manner. The Sequential Probability Ratio Test (SPRT) extension, adopted by USAC, is based on Wald's sequential test. It attempts to reject a "bad" model with high probability after inspecting only a small fraction of the measurements. While the test is theoretically solid, it relies on two parameters that are assumed to be known a priori and in practice need to be adaptively estimated at runtime. It is reported to have achieved an improvement of 20% in evaluation time compared to the simpler bail-out test.
The Generalized Hough Transform (GHT) originated from an algorithm for line detection in images, which was later generalized to handle arbitrary shapes [18, 7]. The key idea behind this method is that partial observations of the model are cast as votes into a (quantized) solution space, in which the object can be detected as a mode (the location with the most votes). In practice, GHT has not been shown to scale well to solution spaces of high dimensionality, and it typically requires numerous votes for a mode to be accurately detected.
Some works bear resemblance to both of the mentioned approaches. Our work can be seen as one of these: while it fits naturally into the RANSAC pipeline, it has some similarities to GHT in the sense that it seeks to find the mode in the parameter domain, only that it needs as few as two votes to detect it.
A method by Den-Hollander et al. also lies somewhere between RANSAC and GHT: to increase the probability of obtaining a pure set of inlier matches, a sub-minimal set is drawn. The remaining degrees of freedom (DoF) are resolved using a voting scheme in a low-dimensional setting. As with all Hough-like methods, an adequate parameterization of the remaining DoF is required; the authors provide such a parameterization for the problem of fundamental matrix estimation.
Our method bears a strong resemblance to the "Randomized Hough Transform" (RHT) of Xu et al. in that a vote, generated from a randomly selected minimal set, is cast into a single cell in the solution domain. However, unlike RHT, we deal with a hypothesis in constant time and space, rather than logarithmic, thanks to the Random Grid hashing mechanism that we adapt. In addition, while RHT deals with robust curve fitting (of up to 3 dimensions), we successfully apply our method on a variety of problem domains of higher dimensionality (up to 8 dimensions).
The main novelty of the presented method is its ability to handle RANSAC hypotheses in constant time, regardless of the number of measurements (e.g. matches). We show that it is beneficial to handle hypotheses in the latent space, due to an efficient parametrization and hashing scheme that we devise, which can quickly filter candidate hypotheses until a pair of correct ones is drawn. While this approach comes at the expense of a small increase in the number of hypotheses to be examined, it allows for a significant speedup of the RANSAC pipeline.
The new proposed modifications to RANSAC are accompanied by a rigorous analysis which results in an updated stopping criterion and a well understood overall probability of success. Finally, we validate our method using challenging data in the problems of 2D-homography estimation, 2D-3D based camera localization and rigid-3D alignment, showing state-of-the-art results.
The 'vanilla' RANSAC pipeline can be divided into three main components: hypothesis generation, hypothesis verification and the adaptive stopping mechanism. The proposed Latent-RANSAC hypothesis handling fits naturally into the aforementioned pipeline, as can be seen in Figure 2, highlighted in blue. The additional modules we propose act as a 'filter' that avoids the need to verify the vast majority of generated hypotheses: instead of verifying each hypothesis by applying it on all of the matches (a costly process that takes time linear in the number of matches), we check in constant time whether a previously generated hypothesis 'collides' with the current one, i.e. whether they are close enough (in a sense that will be clarified below). Only the very few hypotheses that pass this filtering stage progress to the verification stage for further processing. As a result of the proposed change, the RANSAC stopping criterion needs to be modified to guarantee a probability of encountering a second good hypothesis rather than just one.
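The modified control flow can be sketched as follows (hypothetical helper callbacks; the naive pairwise scan here stands in for the constant-time Random Grids collision test):

```python
import random

def latent_ransac(matches, fit_minimal, embed, verify, sample_size,
                  latent_tol, max_iters=1000):
    """Hypotheses are only verified after 'colliding' with a previous one,
    i.e. lying within latent_tol (l_inf) of it in the latent space."""
    seen = []                                # latent vectors generated so far
    for _ in range(max_iters):
        model = fit_minimal(random.sample(matches, sample_size))
        v = embed(model)
        # Collision filter (naive O(n) scan; Random Grids makes this O(1)):
        collided = any(max(abs(a - b) for a, b in zip(v, u)) <= latent_tol
                       for u in seen)
        seen.append(v)
        if collided:                         # rare: run costly verification
            result = verify(model, matches)
            if result is not None:
                return result
    return None
```

The key design point is that the per-hypothesis cost is independent of the number of matches; the expensive `verify` call is invoked only on collisions.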
In our setup, the goal is to robustly fit a geometric model (transform) T to a set of matches (correspondences), w.l.o.g. in Euclidean space, where a match m = (p, q) is an ordered pair of points. For a geometric transform T and match m, the residual error of the match m with respect to T is the Euclidean distance between the transformed source point and the target point: err(T, m) = ||T(p) − q||.
Given a set of matches and a tolerance ε, the inlier rate achieved by a transform is defined as the fraction of matches whose residual error is at most ε. We denote the maximal inlier rate achievable on a match-set by w*.
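As a concrete illustration, the residual and inlier-rate definitions can be written as follows (a toy sketch using a simple 2D translation as the transform; all names are illustrative):

```python
import math

def residual(transform, match):
    """Euclidean error of a match (p, q): distance between transform(p) and q."""
    (x, y), (xq, yq) = match
    xt, yt = transform(x, y)
    return math.hypot(xt - xq, yt - yq)

def inlier_rate(matches, transform, tol):
    """Fraction of matches whose residual error is within the tolerance."""
    return sum(1 for m in matches if residual(transform, m) <= tol) / len(matches)
```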
In the RANSAC pipeline, matches are used both for the generation of hypothesis candidates and for their screening. Since our approach performs the majority of the screening according to a notion of 'similarity' in the space of transformations, we seek a parametrization of the transformation space in which distances between transformations can be defined explicitly. More formally, we define such a parametrization by an embedding of hypotheses into some d-dimensional space, which we call the latent space (its dimension d typically being the number of degrees of freedom of the transformation space). We consider the distance between transformations to be given by the ℓ∞ metric between the embedded (or latent) vectors (d-tuples). Our goal is to use an embedding in which the distance between any pair of hypotheses is tightly related to the difference in the way these hypotheses act on matches in the source domain, i.e. to the difference in the magnitudes of their residual errors on the matches: ideally, for any set of matches, the difference between the residual errors of two hypotheses is bounded by the latent distance between them.
We describe here the parameterization we use for the space of 2D homographies, which are given by 3×3 projective matrices. Following previous works (e.g. [24, 16]), we use the 4pt parameterization, which represents a 2D-homography by an 8-tuple defined by the coordinates in the target image that result from applying the homography to the four corners of the source image, as illustrated in Figure 3.
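The 4pt embedding can be sketched directly from this definition (assuming the homography is given as a nested 3×3 list; the corner ordering is illustrative):

```python
def apply_homography(H, x, y):
    """Project a 2D point through a 3x3 homography (nested-list matrix)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def four_pt_embedding(H, width, height):
    """8-tuple of target-image coordinates of the four source-image corners."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    v = []
    for x, y in corners:
        v.extend(apply_homography(H, x, y))
    return tuple(v)
```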
As was noted in previous work, this parametrization has the key property that the difference between the match errors of two well-behaved homographies is bounded by the distance between their 4pt representations.
The special Euclidean group SE(3), used to describe rigid motion in 3D, will be used here to solve the problems of Perspective-n-Point (PnP) estimation and rigid 3D alignment. We follow a parametrization that was suggested and used in a line of works by Li et al. [34, 8]. The group can be described as the product of two sub-groups, namely 3D translations and the special orthogonal group SO(3) of 3D rotations. Each of these 3-dimensional sub-groups is parameterized as a 3-tuple, resulting in a 6-tuple representation defined as follows: the axis-angle vector (3-tuple) w represents the 3D rotation matrix given by exp([w]×), where exp is the matrix exponential and [w]× denotes the skew-symmetric matrix representation. Such vectors reside in the radius-π ball contained in the 3D cube [−π, π]³. The translation 3-tuple is a vector in the cube [−B, B]³ that contains the relevant bounded range of translations, for a large enough B.
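The axis-angle representation can be converted to a rotation matrix via the closed form of this matrix exponential, the Rodrigues formula (a minimal sketch):

```python
import math

def rotation_from_axis_angle(w):
    """Rotation matrix exp([w]_x) via the Rodrigues formula; w's direction is
    the rotation axis and its Euclidean norm is the rotation angle (<= pi)."""
    theta = math.sqrt(sum(x * x for x in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (x / theta for x in w)      # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return [
        [c + kx * kx * C,      kx * ky * C - kz * s, kx * kz * C + ky * s],
        [ky * kx * C + kz * s, c + ky * ky * C,      ky * kz * C - kx * s],
        [kz * kx * C - ky * s, kz * ky * C + kx * s, c + kz * kz * C],
    ]
```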
Similar to the case of the 2D-homography parametrization, it has been proved that the difference between the match errors of two rigid motions is bounded by the distance between their parametrizations.
Given such an embedding of each generated hypothesis, the heart of our method boils down to a nearest-neighbor search of the current vector among all vectors representing previously generated hypotheses. More precisely, the task to be performed is a range-search query for vectors that are at a distance of up to a certain tolerance t.
A recent work of Aiger et al. turns out to be extremely suitable for this task. They propose Random Grids, a randomized hashing method based on the very simple idea of imposing randomly shifted 'grids' over the vector space and checking for vectors that 'collide' in a common cell. The Random Grids algorithm is very fast and simple to implement, even in comparison with the closely related LSH-based algorithms, since the grid is axis-aligned and uniform (it consists of cells of equal side length). Most importantly, and essential for the speed of our method, the range search is done in constant time (i.e. it does not depend on the number of vectors searched against), as opposed to the RANSAC hypothesis validation, which requires applying the model and measuring errors on (typically hundreds of) point matches, or even the logarithmic-time solution proposed in RHT.
We are given a representation of the transform as a d-dimensional vector v (for a d-dimensional parameterization). In the Random Grids setting, we hash v into N hash tables, each associated with an independent random grid, which is defined by a uniform random shift s ∈ [0, c)^d, where c is the cell side length and d is the dimension of the latent vector v. The cell index for v in a table is obtained by concatenating the integer vector ⌊(v + s)/c⌋ into a single scalar (where ⌊·⌋ denotes the "floor" operation). The entire hashing process (initialization, insertion and collision checking) is given in detail in Algorithm 1.
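A sketch of this hashing scheme (a simplified stand-in for Algorithm 1; buckets are Python dicts rather than fixed-size tables, and all names are illustrative):

```python
import random

class RandomGrids:
    """N independent, randomly shifted uniform grids with cell side c; two
    vectors 'collide' if they share a cell in any grid, after which an exact
    l_inf tolerance test is applied."""

    def __init__(self, dim, cell, n_tables, tol, seed=None):
        rng = random.Random(seed)
        self.cell, self.tol = cell, tol
        self.shifts = [[rng.uniform(0, cell) for _ in range(dim)]
                       for _ in range(n_tables)]
        self.tables = [{} for _ in range(n_tables)]

    def insert_and_query(self, v):
        """Insert v; return a previously inserted vector within tol (l_inf),
        or None if no such collision is found."""
        hit = None
        for shift, table in zip(self.shifts, self.tables):
            # Cell index: per-dimension floor of the shifted, scaled coordinate.
            key = tuple(int((x + s) // self.cell) for x, s in zip(v, shift))
            for u in table.get(key, []):
                if max(abs(a - b) for a, b in zip(u, v)) <= self.tol:
                    hit = u
            table.setdefault(key, []).append(v)
        return hit
```

Using several independently shifted grids makes it very unlikely that a close pair is separated by a cell boundary in every table simultaneously.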
The classical analysis of RANSAC provides a simple formula for the number of iterations required to reach a certain success probability (e.g. 0.99). It is based on the assumption that it is sufficient to have a single ‘good’ iteration in which a pure set of inliers is drawn. Note that this assumption is made for the simplicity of the analysis and is only theoretical, since it ignores e.g. the presence of inlier noise and several possible degeneracies in the data.
Let X be the random variable that counts the number of such good iterations out of k attempts. For a minimal set of size s and data with an inlier rate of w, it holds that P(X ≥ 1) = 1 − (1 − q)^k, where q = w^s is the probability that a single minimal sample is pure. The number of iterations required to guarantee a desired success probability p is therefore: k ≥ log(1 − p) / log(1 − q). (4)
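Numerically, this criterion reads as follows (using q = w^s, with w the inlier rate and s the minimal sample size):

```python
import math

def ransac_iterations(w, s, p=0.99):
    """Iterations needed so that P(at least one all-inlier minimal sample) >= p,
    for inlier rate w and minimal sample size s: k = log(1-p) / log(1 - w**s)."""
    q = w ** s                       # probability a single sample is all-inlier
    return math.ceil(math.log(1 - p) / math.log(1 - q))
```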
A similar simplified analysis can be applied to the Latent-RANSAC scheme. Ignoring the presence of inlier noise, the existence of (at least) two 'good' iterations is needed for a collision to be detected and the algorithm to succeed. Therefore, by the binomial distribution, we have that P(X ≥ 2) = 1 − (1 − q)^k − kq(1 − q)^(k−1), (5) where q denotes the probability that a single minimal sample is pure (all inliers).
Based on equations (4) and (5), we plot in Figure 4 the ratio between the number of required iterations in the case of Latent-RANSAC versus the case of RANSAC. The ratio is given as a function of the inlier rate w, at 3 different success rates (color coded), for the two different minimal sample sizes s = 3 (e.g. in rigid 3D motion estimation) and s = 4 (e.g. in homography estimation). Interestingly, the ratio attains a small value (less than 2) for all but the highest inlier rates, and converges to small constant values as the inlier rate decreases. The very high inlier rates for which the ratio is large are of no concern, since the absolute number of iterations is extremely low in this range.
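The two-hit criterion has no convenient closed form for k, but the smallest sufficient k is easy to find numerically (a small sketch; q = w^s is the probability of drawing a pure minimal sample):

```python
def latent_ransac_iterations(w, s, p=0.99):
    """Smallest k with P(at least two all-inlier samples in k draws) >= p,
    using the binomial tail P(X>=2) = 1 - (1-q)**k - k*q*(1-q)**(k-1)."""
    q = w ** s
    k = 2
    while 1 - (1 - q) ** k - k * q * (1 - q) ** (k - 1) < p:
        k += 1
    return k
```

For example, at w = 0.5 and s = 2 the two-hit requirement raises the iteration count only moderately (24 instead of 17 at p = 0.99, a ratio well below 2).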
In the next section, as part of an analysis of the Random Grid hashing, we derive a more realistic stopping criterion that depends also on the success probability of the Random Grid based collision detection, which clearly depends on the inlier noise level.
We cover two aspects of Random Grids. First, we extend the stopping criterion from Section 2.3 to consider the probability that a colliding pair of good hypotheses will be detected. Next, we discuss causes of false collision detection, which can have an effect on the algorithm's runtime.
Let A be the event that the random-grid component succeeds (detects a collision), given the good iterations out of a total of k. We can now update Equation (5), taking this success probability into account: P(success) ≥ P(A) · (1 − (1 − q)^k − kq(1 − q)^(k−1)), where the inequality holds due to the fact that the probability of detecting a collision monotonically increases with the number of good iterations.
Recall that A is the event that the random-grid hashing succeeds given that two successful hypotheses were generated. We will, more explicitly, denote this event by A_N, for a random grid that uses N hash tables.
The analysis in the original Random Grids work is rather involved, since it deals with the Euclidean distance. Using ℓ∞ distances, we are able to derive the success probability of finding a true collision in our setup, as a function of the random-grid parameters, in a simpler manner. Assuming a tolerance t in the latent space, determined by the (inlier) noise level of the data, using a random grid with cell side c and a single table results in P(A_1) ≥ (1 − t/c)^d, since a pair of pure-inlier transformations (which differ by at most t) must share the same independently offset bin indices in each of the d dimensions.
Finally, using N hash tables, randomly and independently generated, we obtain: P(A_N) ≥ 1 − (1 − (1 − t/c)^d)^N.
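This collision-detection bound is simple to evaluate, for a latent tolerance t, cell side c, dimension d and N tables (a small sketch):

```python
def collision_probability(t, c, d, n_tables):
    """Lower bound on detecting a true collision: two latent vectors within t
    (l_inf) share a cell of side c in one shifted grid with probability at
    least (1 - t/c)**d; with n independent grids, the probability of missing
    in all of them is at most (1 - p1)**n."""
    p1 = max(0.0, 1.0 - t / c) ** d      # single-table success (lower bound)
    return 1.0 - (1.0 - p1) ** n_tables
```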
We now discuss the expected number of false collisions found by the hashing scheme. It is important to understand why false collisions might happen, as they have an effect on the overall runtime of our pipeline.
Recall that k is the overall number of iterations of the pipeline, and hence it is also the total number of samples inserted into each hash table. There are two kinds of false collisions to consider. The first kind happens due to the fact that the random-grid cell size c might be larger than the tolerance t. Following the recommendation in the Random Grids work, we set the cell size to be not much larger than the tolerance, resulting in a small number of such false collisions. In any case, this kind of collision has a small impact on the runtime, since it will be filtered by the tolerance test (step 3 in Algorithm 1) at constant time cost.
The second kind of false collision is one that passes the tolerance test (step 3). Since it does not correspond to the true model, it is associated with some inlier rate lower than the optimal one; when this rate is substantially lower, the probability of such a collision appearing before we have reached the stopping criterion is negligible. Empirically, we observe very few (typically a handful of) collisions that pass the tolerance test before the algorithm stops. These are the only kind of collisions that incur a non-negligible penalty (in runtime only), since they invoke the verification process that every "vanilla" RANSAC hypothesis goes through.
In order to evaluate our method, we performed extensive tests on both real and synthetic data. The Latent-RANSAC algorithm is applied to the problems of 2D-homography estimation (Section 3.1), Perspective-n-Point (PnP) estimation (Section 3.2) and rigid 3D alignment (Section 3.3). It is compared with ordinary RANSAC, with or without the well-known SPRT extension, which is a very different technique for accelerating RANSAC's model verification phase.
Our method naturally extends the standard RANSAC pipeline, according to the changes highlighted in Figure 2. Our implementation (for which we use the shorthand LR) extends the excellent C++ RANSAC implementation USAC, with the noted changes in the specific modules. This enables an easy comparison with a state-of-the-art RANSAC implementation, and allows our method to enjoy the same extensions used by USAC (such as its local optimization component LO-RANSAC). In addition, their implementation includes the SPRT extension, the most commonly used acceleration of RANSAC's model verification phase. We use the shorthand SPRT to refer to RANSAC using this extension.
In addition, we use the OpenGV library for PnP model fitting (Kneip's P3P algorithm) and for rigid 3D model fitting (Arun's algorithm). We make our modifications to the code available at anonymized.
Parameters were tuned by a simple coordinate descent on a subset of the data and kept fixed throughout. The parameters common to all settings are the probability of success, the number of hash tables N, and the Random Grid cell size c, where the tolerance t (and the RANSAC threshold used) are specified separately for each experiment, as well as the maximal number of iterations and the hash-table size, which determines the number of bits in the addressing indices (see the supplementary materials for hash-table implementation details).
We create a large body of 2D-homography estimation instances using the Zurich buildings data-set. The data-set consists of sequences of 5 snapshots taken from different street-level viewpoints for 201 buildings or houses. The images are typically dominated by planar facade structures, and hence each pair of images in a sequence is related by a 2D homography (or perhaps more than one, in the case of several planes).
We computed SIFT features for each image and created sets of corresponding features for each of the 10 (ordered) pairs of images in a sequence using the VLFeat library. Then, we ran both RANSAC and LR on each pair 10 times and saved the highest detected inlier rate of any of the fitted homographies as the 'optimal' inlier rate for the image pair. A small set of image pairs (168 out of 2010) with a very low 'optimal' inlier rate was manually removed from the evaluation, since the inlier feature locations did not reside on an actual single plane in the scene and were very noisy.
The 1842 resulting matching instances are challenging: many pairs have low inlier rates that result from (i) the planar area of interest typically covering only part of each image; (ii) large viewpoint changes; and (iii) a large presence of repetitive patterns (e.g. windows or pillars). See Figure 5 for the distribution of inlier rates for this data-set.
We ran LR with a tolerance (in pixels) in the latent domain, and RANSAC and SPRT with a pixel threshold; these parameters were chosen to give the best results, which are summarized in Table 1. We arrange the image pairs into four groups according to their 'optimal' inlier rate (defined above), and the size of each group is shown at the bottom of the table. For each group we report the average and 95th-percentile runtimes for each method. We also report each method's success rate (averaged over all pairs in the group), which is the ratio between the detected inlier rate and the 'optimal' inlier rate (i.e. the highest one from any method).
| inlier rate range (%) | 0-10 | 10-20 | 20-40 | >40 |
|---|---|---|---|---|
| runtime avg. (95%): RANSAC | 1,536 (6,627) | 56 (153) | 8 (16) | 8 (15) |
| runtime avg. (95%): SPRT | 1,157 (4,994) | 34 (95) | 6 (11) | 9 (16) |
| runtime avg. (95%): LR | 1,102 (4,559) | 38 (92) | 8 (13) | 12 (18) |
| # of pairs | 210 | 390 | 650 | 592 |
As can be seen, SPRT and LR (modestly) accelerate RANSAC at the harder inlier-rate ranges, where the overall runtime is longer. LR achieves the best acceleration in the lowest range, while SPRT does better in the 20-40 range. In terms of accuracy, both SPRT and LR are on par with RANSAC, achieving high success rates across the different inlier-rate groups.
The detailed runtime breakdown shown in Figure 6 (left) reveals two important points. First, in homography estimation, methods that accelerate the RANSAC evaluation stage (i.e. LR and SPRT) have a relatively small potential improvement gap, since the runtime of RANSAC-based homography estimation is dominated by the model fitting stage (this is not the case for the other problems we deal with, as will be seen later). Second, the improved acceleration of LR in the lower ranges is significant when considering the overall time taken to fit the entire data-set, since the majority of time is spent on these difficult cases, which are surprisingly not rare (11% and 21% of all pairs are in the 0-10 and 10-20 ranges, respectively).
In order to compare methods on the PnP problem (camera localization from 2D-3D correspondences), we follow a previously proposed benchmark, using a large bank of synthetic instances with controlled values of inlier rate and inlier noise. Specifically, a synthetic inlier set is created by first picking a random camera position in 3D, then projecting random 3D locations (within the camera view) to 2D angular coordinates, and finally adding noise. The outlier matches are generated similarly, except that the 2D locations are distributed uniformly in the image. Each instance contains a fixed number of 2D-3D correspondences generated in this manner.
We chose to use Kneip's algorithm for minimal-sample P3P model fitting (for all methods) due to its good accuracy-efficiency trade-off compared to other alternatives. Since only 3 matches are needed for the minimal-sample fitting problem, RANSAC generally needs very few iterations to robustly reveal the true underlying model in typical cases. We therefore add noise at several levels and consider challenging (low) inlier rates.
We ran RANSAC and SPRT with a threshold, and LR with a tolerance (both in radians, the latter in the latent domain), with a fixed translation-to-angle ratio for the embedding. The results, summarized in Table 2, show that LR achieves more than an order of magnitude acceleration compared to RANSAC, at only a minor loss in accuracy in all configurations. SPRT achieves a similar acceleration factor at the higher inlier rates. It is not as efficient in the lower range, and is much less accurate at the lowest inlier rate.
For the PnP case, Figure 6 (right) reveals that the huge acceleration happens because the vast majority (over 95%) of RANSAC's time is spent on model verification, which is practically eliminated by LR. This is possible mainly due to the existence of fast fitting algorithms for the P3P problem.
To evaluate the rigid 3D alignment application of Latent-RANSAC, we use the registration challenge of the recent "Redwood" benchmark proposed by Choi et al. This dataset was generated from four synthetic 3D scenes, each divided into a number of point-cloud fragments. While of synthetic origin, these fragments contain high-frequency noise and low-frequency distortion that simulate scans created by consumer depth cameras.
The challenge is to perform global 3D registration between every pair of fragments of a given scene, in order to provide candidate pairs for trajectory loop closure. A correctly 'detected' pair is one for which the point clouds have sufficient overlap and the reported transformation is sufficiently accurate (see the benchmark for details). The main goal in this benchmark, as stated by its authors, is to achieve high recall while relying on a post-process to later remove false matches.
Aside from the benchmark, Choi et al. present a simple extension (CZK) of the registration pipeline implemented in the Point-Cloud-Library (PCL). The method of CZK showed state-of-the-art performance when compared to previous methods like OpenCV, 4PCS and its extension Super4PCS. Fast Global Registration (FGR) is a recent optimization process presented by Zhou et al., which achieves an order of magnitude runtime acceleration on this dataset, with competitive recall-precision performance. They perform the costly nearest-neighbor (NN) search only once (unlike previous methods, which use it in their inner loop), while introducing several fast and simple methods to filter false matches.
We chose to follow [11, 20, 36] and feed our framework with putative matches based on FPFH features, using a tolerance (in cm) in the latent space and a fixed translation-to-angle ratio for the embedding. Inspired by FGR, we perform the NN search only once. We then apply only a single filter (also used in [11, 36]): an approximate congruency validation on each minimal sample drawn.
Our method clearly achieves the highest recall value (the main goal), at a precision slightly below that of CZK. Furthermore, we are able to dominate all previous results in precision and recall simultaneously by using a slightly stricter setting, reporting only pairs with a higher overlap threshold.
We attribute our high performance mainly to the fact that we perform almost no filtering of the putative matches, such as the bidirectional search and tuple filtering, the normal-agreement filtering in [11, 20], or the drawing of a non-minimal set of 4 matches. Using this "naive" nearest-neighbor FPFH feature matching avoids filtering out true correspondences (enabling higher recall), but this comes at the cost of some very low inlier rates, as can be seen in Figure 5. Our algorithm is able to deal with such inlier rates successfully (and efficiently), as was shown in the other experiments of this section and in Figure 1.
Another attractive property of our method in this benchmark is its runtime, presented in Table 3. Our runtime is close to that of FGR, which we outperform significantly in terms of recall. Note, moreover, that our method is actually faster than FGR whenever the inlier rate is sufficiently high, as the number of iterations given by (5) is then very low. Verification consumes a considerable part of the runtime of methods like [11, 20], while we perform the costly overlap verification only upon the detection of a collision (a handful of times per run on average). Additionally, we perform the overlap calculation only once, as done in FGR.
| method | PCL | CZK | FGR | LR |
|---|---|---|---|---|
| avg. time (sec) | 0.21 | … | … | … |
In this work we presented Latent-RANSAC: a novel speed-up of the hypothesis handling stage of the RANSAC pipeline. We have shown its advantages on challenging matching problems, that include very low inlier rates, in the domains of homography estimation, camera localization and rigid 3D motion estimation.
Latent-RANSAC has the potential to be extended to additional domains. Of particular interest is finding an appropriate parametrization of the more challenging fundamental matrix domain, which is classically tackled using RANSAC.
The good results that Latent-RANSAC achieves on the "Redwood" benchmark demonstrate the advantage of being able to handle highly corrupted "raw" data (over 60% of the fragment pairs have under 10% inlier rate). This is because the alternative of filtering the data to reduce the rate of outliers comes at the risk of losing informative data. The challenge, however, remains to do so efficiently, especially for search spaces of high dimensionality.