TEASER++
A fast and robust point cloud registration library
We propose a robust approach for the registration of two sets of 3D points in the presence of a large amount of outliers. Our first contribution is to reformulate the registration problem using a Truncated Least Squares (TLS) cost that makes the estimation insensitive to a large fraction of spurious point-to-point correspondences. The second contribution is a general framework to decouple rotation, translation, and scale estimation, which allows solving in cascade for the three transformations. Since each subproblem (scale, rotation, and translation estimation) is still non-convex and combinatorial in nature, our third contribution is to show that (i) TLS scale and (component-wise) translation estimation can be solved exactly and in polynomial time via an adaptive voting scheme, and (ii) TLS rotation estimation can be relaxed to a semidefinite program, and the relaxation is tight in practice, even in the presence of an extreme amount of outliers. We validate the proposed algorithm, named TEASER (Truncated least squares Estimation And SEmidefinite Relaxation), on standard registration benchmarks, showing that the algorithm outperforms RANSAC and robust local optimization techniques, and compares favorably with Branch-and-Bound methods, while being a polynomial-time algorithm. TEASER can tolerate up to 99% outliers and returns highly-accurate solutions.
Point cloud registration is a fundamental problem in robotics and computer vision. It consists in finding the best transformation (rotation, translation, and potentially scale) that aligns two point clouds, and finds applications in motion estimation and 3D reconstruction
[30, 6, 21, 17, 60], object recognition and localization [20, 56, 58, 43], panorama stitching [3], and medical imaging [2, 53], to name a few. When the ground-truth correspondences between the point clouds are known and the noise follows a zero-mean Gaussian distribution, the registration problem can be readily solved, since elegant closed-form solutions
[31, 1] exist for the case of isotropic noise, and recently proposed convex relaxations [9] are empirically tight even in the presence of large anisotropic noise. In practice, however, the correspondences are either unknown, or contain a high ratio of outliers. Large outlier rates are typical of 3D keypoint detection and matching techniques [55, 49, 61]. Therefore, it is common to use the aforementioned methods within a RANSAC scheme [23]. While RANSAC is a popular approach for several robust vision and robotics problems, its runtime grows exponentially with the outlier ratio [11] and it can perform poorly with extreme outlier rates. The capability of tolerating a large amount of outliers is of paramount importance in applications where the correspondences are unknown and when operating in clutter (e.g., object pose estimation in the wild). Moreover, even when the correspondences are known but uncertain, it is desirable to develop registration techniques that can afford stronger performance guarantees compared to RANSAC.
This paper is motivated by the goal of designing an approach that (i) can solve registration globally (without relying on an initial guess), (ii) can tolerate extreme amounts of outliers (e.g., when 99% of the measurements are outliers), (iii) runs in polynomial time, and (iv) provides formal performance guarantees. The related literature, reviewed in Section II, fails to simultaneously address all these aspects: it includes techniques that are robust to moderate amounts of outliers but lack optimality guarantees (e.g., FGR [62]), or techniques that are globally optimal but run in exponential time in the worst case, such as branch-and-bound (BnB) methods (e.g., Go-ICP [57]).
Contribution. Our first contribution (presented in sec:TLSregistration) is to reformulate the registration problem using a Truncated Least Squares (TLS) cost that is insensitive to a large fraction of spurious data. We name the resulting problem the Truncated Least Squares Registration (TR) problem.
The second contribution (sec:decoupling) is a general framework to decouple scale, rotation, and translation estimation. The idea of decoupling rotation and translation has appeared in related work, e.g., [42] and more recently in [40, 11]. The novelty of our proposal is threefold: (i) we develop invariant measurements to estimate the scale (previous work [40, 11] assume the scale is given), (ii) we make the decoupling formal within the framework of unknown-but-bounded noise [45], and (iii) we provide a general graph-theoretic framework to derive these invariant measurements.
The decoupling allows solving in cascade for scale, rotation, and translation. However, each subproblem is still combinatorial in nature. Our third contribution is to show that (i) in the scalar case TLS estimation can be solved exactly in polynomial time using an adaptive voting scheme, and this enables efficient estimation of the scale and the (component-wise) translation; (ii) we can prune a large amount of outliers by finding a maximal clique of the graph defined by the invariant measurements; (iii) we can formulate a tight semidefinite programming (SDP) relaxation to estimate the rotation, (iv) we can provide per-instance bounds on the performance of the SDP relaxation. To the best of our knowledge, this is the first polynomial-time algorithm for outlier-robust registration with computable performance guarantees.
We validate the proposed algorithm, named Truncated least squares Estimation And SEmidefinite Relaxation (TEASER), on standard registration benchmarks as well as robotics datasets, showing that the algorithm outperforms RANSAC and robust local optimization techniques, and compares favorably with Branch-and-Bound methods, while being a polynomial-time algorithm. TEASER can tolerate up to 99% outliers (Fig. 1) and returns highly-accurate solutions.
There are two established paradigms for the registration of 3D point clouds: correspondence-based and Simultaneous Pose and Correspondence methods.
Correspondence-based methods first detect and match 3D keypoints between point clouds using local [55, 26, 49, 50] or global [20, 36] descriptors to establish putative correspondences, and then either use closed-form solutions [31, 1] in a RANSAC [23] scheme, or apply robust optimization methods [62, 11] to gain robustness against outliers. 3D keypoint matching is known to be less accurate compared to 2D counterparts like SIFT and ORB, thus causing much higher outlier rates, e.g., having 95% spurious correspondences is considered common [11]. Therefore, a robust backend that can deal with extreme outlier rates is highly desirable.
Registration without outliers. Horn [31] and Arun [1] show that optimal solutions (in the maximum likelihood sense) for scale, rotation, and translation can be computed in closed form when the points are affected by isotropic zero-mean Gaussian noise. Olsson et al. [47] propose a method based on BnB that is globally optimal and allows point-to-point, point-to-line, and point-to-plane correspondences. Recently, Briales and Gonzalez-Jimenez [9] propose a semidefinite relaxation that can deal with anisotropic Gaussian noise, and has per-instance optimality guarantees. All these methods assume that all the correspondences are known and correct.
Robust registration. Probably the most widely used robust registration approach is based on RANSAC [23, 15], which has enabled several early applications in vision and robotics [27, 44]. Despite its efficiency in the low-noise and low-outlier regime, RANSAC exhibits slow convergence and low accuracy with large outlier rates [11], where it becomes harder to sample a "good" consensus set. Other approaches resort to M-estimation, which replaces the least squares objective function with robust costs that are less sensitive to outliers [54, 5, 38]. Zhou et al. [62] propose Fast Global Registration (FGR), which uses the Geman-McClure robust cost function and leverages graduated non-convexity to solve the resulting non-convex optimization. Since graduated non-convexity has to be solved in discrete steps, this method does not guarantee global optimality in general [11]. Indeed, FGR tends to fail when the outlier ratio is high (>80%), as we show in sec:experiments. Izatt et al. [32] develop an approach based on mixed-integer programming that computes globally-optimal solutions, and can also be used for offline verification of optimality. Bustos and Chin [11] propose a Guaranteed Outlier REmoval (GORE) technique that uses geometric operations to significantly reduce the amount of outlier correspondences before passing them to the optimization backend. GORE has been shown to be robust to 95% spurious correspondences [11]. However, GORE does not estimate the scale of the registration and has exponential worst-case time complexity due to the possible usage of BnB (see Algorithm 2 in [11]).
Simultaneous Pose and Correspondence (SPC) methods alternate between finding the correspondences and computing the best transformation given the correspondences.
Local methods. The Iterative Closest Point (ICP) algorithm [4] is considered a milestone in point cloud registration and remains one of the most widely used approaches. However, ICP is known for its vulnerability to local minima and it only performs well given a good initial guess of the transformation. Multiple variants of ICP [24, 51, 59, 41, 16, 35, 7] have been proposed, showing that the use of robust cost functions improves convergence. Probabilistic interpretations have also been proposed to improve ICP
convergence, for instance interpreting the registration problem as a minimization of the Kullback-Leibler divergence between two Gaussian Mixture Models
[33, 46, 34, 13]. All these methods rely on iterative local search, do not provide global optimality guarantees, and typically fail without a good initial guess.

Global methods. Global SPC approaches compute a globally optimal solution without initial guesses, and are usually based on BnB, which at each iteration divides the parameter space into multiple sub-domains (branch) and computes bounds on the objective function in each sub-domain (bound). A series of geometric techniques have been proposed to improve the bounding tightness [29, 8, 57, 14, 12] and increase the search speed [57, 39]. However, the runtime of BnB increases exponentially with the size of the point cloud, and it can be made worse by the explosion of the number of local minima resulting from high outlier ratios [11].
In the robust registration problem, we are given two 3D point sets A = {a_i}_{i=1}^N and B = {b_i}_{i=1}^N, with a_i, b_i ∈ ℝ³, such that:

b_i = s R a_i + t + o_i + ε_i,   i = 1, …, N    (1)

where s > 0, R ∈ SO(3), and t ∈ ℝ³ are an unknown scale, rotation, and translation, ε_i models measurement noise, and o_i is a vector of zeros for inliers, or a vector of arbitrary numbers for outliers. In words, if the i-th correspondence is an inlier, b_i corresponds to a 3D transformation of a_i (plus noise), while if i is an outlier correspondence, b_i is just an arbitrary vector. SO(3) ≐ {R ∈ ℝ^{3×3} : RᵀR = I₃, det(R) = +1} is the set of proper rotation matrices (where I₃ is the identity matrix of size 3). We consider a correspondence-based setup, where we need to compute (s, R, t) given N putative correspondences (a_i, b_i), i = 1, …, N.

Registration without outliers. When ε_i is zero-mean Gaussian noise with isotropic covariance, and all the correspondences are correct (i.e., o_i = 0 for all i), the Maximum Likelihood estimator of (s, R, t) can be computed in closed form by decoupling the estimation of the scale, translation, and rotation, using Horn's [31] or Arun's method [1].
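For concreteness, the outlier-free closed-form solution can be sketched as follows. This is a standard SVD-based construction in the spirit of Horn's and Arun's methods; the function name and interface are ours, not the paper's:

```python
import numpy as np

def horn_registration(a, b):
    """Closed-form similarity registration (scale, rotation, translation):
    a, b are (N, 3) arrays of matched, outlier-free points with
    b_i ~ s R a_i + t. SVD-based sketch, not the paper's implementation."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    A, B = a - mu_a, b - mu_b                   # centered coordinates
    U, S, Vt = np.linalg.svd(B.T @ A)           # cross-covariance SVD
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                              # proper rotation (det = +1)
    s = np.trace(np.diag(S) @ D) / (A ** 2).sum()  # least-squares scale
    t = mu_b - s * R @ mu_a
    return s, R, t
```

On noiseless correspondences, this recovers the ground-truth transformation exactly.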
Robust registration. In practice, a large fraction of the correspondences are outliers, due to incorrect keypoint matching. Despite the elegance of the closed-form solutions [31, 1], they are not robust to outliers, and a single “bad” outlier can compromise the correctness of the resulting estimate. Therefore, one typically “wraps” these methods within a RANSAC scheme. While RANSAC is a popular method for registration, it is not able to deal with extreme amounts of outliers, as shown in Section VI and in related work [11]. Hence, we propose a truncated least squares registration formulation that can tolerate extreme amounts of spurious data.
Truncated Least Squares Registration. We depart from the Gaussian noise model and assume that the noise is unknown but bounded [45]. Formally, we assume that the noise ε_i in (1) is such that ‖ε_i‖ ≤ β_i, where β_i is a given bound.
Then we adopt the following Truncated Least Squares (TLS) estimator:

min_{s>0, R∈SO(3), t∈ℝ³}  Σ_{i=1}^N min( ‖b_i − s R a_i − t‖² / β_i² , c̄² )    (2)

which computes a least squares solution for measurements with small residuals (‖b_i − s R a_i − t‖ ≤ c̄ β_i), while discarding measurements with large residuals (when ‖b_i − s R a_i − t‖ > c̄ β_i, the i-th summand becomes a constant c̄² and does not influence the optimization). The constant c̄ is typically chosen to be 1, while one may use a different c̄ to be stricter or more lenient towards potential outliers.
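The TLS cost in (2) is easy to evaluate for a candidate solution; a minimal sketch (names are ours):

```python
import numpy as np

def tls_cost(residuals, beta, c_bar=1.0):
    """Truncated least squares cost as in eq. (2): each residual contributes
    its normalized square, capped at c_bar**2 once it exceeds the bound."""
    r2 = (residuals / beta) ** 2
    return np.minimum(r2, c_bar ** 2).sum()
```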
We conclude this section by rewriting (2) using an equivalent formulation that absorbs the sign of the scale into the rotation. Formally, we write s R = |s| (sign(s) R) and note that sign(s) R ∈ O(3), where sign(·) denotes the sign of a scalar and O(3) ≐ {R ∈ ℝ^{3×3} : RᵀR = I₃} is the set of orthogonal matrices, containing rotations and reflections. Hence we reformulate (2) to optimize over a positive scale s > 0 and an orthogonal matrix R ∈ O(3):

min_{s>0, R∈O(3), t∈ℝ³}  Σ_{i=1}^N min( ‖b_i − s R a_i − t‖² / β_i² , c̄² )    (3)

Problem (3) is our robust formulation, named Truncated Least Squares Registration (TR). In the following section, we discuss how to decouple the estimation of scale, rotation, and translation using invariant measurements.
We propose a polynomial-time algorithm that decouples the estimation of scale, translation, and rotation in problem (3). The key insight is that we can reformulate the measurements (1) to obtain quantities that are invariant to a subset of the transformations (scaling, rotation, translation).
While the absolute positions of the points in B depend on the translation t, the relative positions are invariant to t. Mathematically, given two points b_i and b_j from (1), their relative position is:

b_j − b_i = s R (a_j − a_i) + (o_j − o_i) + (ε_j − ε_i)    (4)

where the translation t cancels out in the subtraction. Therefore, we can obtain a Translation Invariant Measurement (TIM) by computing ā_ij ≐ a_j − a_i and b̄_ij ≐ b_j − b_i, and the TIM satisfies the following generative model:

b̄_ij = s R ā_ij + o_ij + ε_ij    (TIM)

where o_ij is zero if both the i-th and the j-th measurements are inliers (or it is an arbitrary vector otherwise), while ε_ij ≐ ε_j − ε_i is the measurement noise. It is easy to see that if ‖ε_i‖ ≤ β_i and ‖ε_j‖ ≤ β_j, then ‖ε_ij‖ ≤ β_i + β_j ≐ δ_ij.
The advantage of the TIMs in eq. (TIM) is that their generative model only depends on two unknowns, s and R.
A general way to create TIMs is given by thm:TIM.
Define the vectors a ≐ [a_1ᵀ ⋯ a_Nᵀ]ᵀ (resp. b), obtained by concatenating all vectors a_i (resp. b_i) in a single column vector. Moreover, define an arbitrary graph G with nodes {1, …, N} and an arbitrary set of edges E. Then, the vectors ā ≐ (A ⊗ I₃) a and b̄ ≐ (A ⊗ I₃) b are TIMs, where A ∈ ℝ^{|E|×N} is the incidence matrix of G, and ⊗ denotes the Kronecker product.
A proof of the theorem is given in the Supplementary Material. The number of TIMs clearly depends on the topology of the graph in thm:TIM and is upper-bounded by N(N−1)/2 (complete graph). Three potential topologies (complete, star, spanning tree) are shown in Fig. 2, which also gives an intuitive visualization of the TIMs.
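The incidence-matrix construction of the theorem can be sketched as follows (a minimal illustration with our own function name; the paper's implementation may differ):

```python
import numpy as np

def tims(points, edges):
    """Translation Invariant Measurements for points (N, 3) and graph edges
    [(i, j), ...], via the incidence-matrix construction of the theorem:
    the TIM of edge (i, j) is points[j] - points[i]."""
    N = len(points)
    A = np.zeros((len(edges), N))          # incidence matrix, one row per edge
    for k, (i, j) in enumerate(edges):
        A[k, i], A[k, j] = -1.0, 1.0
    # apply (A kron I_3) to the stacked point vector
    return (np.kron(A, np.eye(3)) @ points.reshape(-1)).reshape(-1, 3)
```

Shifting every point by the same translation leaves the TIMs unchanged, which is the invariance the theorem asserts.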
While the relative locations of pairs of points (TIMs) still depend on the rotation R, their distances are invariant to both R and t. Therefore, to build rotation invariant measurements, we compute the norm of each TIM vector:

‖b̄_ij‖ = ‖s R ā_ij + o_ij + ε_ij‖    (5)

We now note that for the inliers (o_ij = 0) it holds (using ‖R ā_ij‖ = ‖ā_ij‖ and the triangle inequality):

s ‖ā_ij‖ − δ_ij ≤ ‖s R ā_ij + ε_ij‖ ≤ s ‖ā_ij‖ + δ_ij    (6)

hence we can write (5) equivalently as:

‖b̄_ij‖ = s ‖ā_ij‖ + o_ij^s + ε_ij^s    (7)

with |ε_ij^s| ≤ δ_ij, and where o_ij^s is zero if both the i-th and the j-th measurements are inliers, or is an arbitrary scalar otherwise. Recalling that the norm is rotation invariant and that ‖ā_ij‖ > 0, and dividing both members of (7) by ‖ā_ij‖, we obtain new measurements s_ij ≐ ‖b̄_ij‖ / ‖ā_ij‖:

s_ij = s + o_ij^s + ε_ij^s    (TRIM)

where, with slight abuse of notation, o_ij^s and ε_ij^s now denote the corresponding quantities in (7) divided by ‖ā_ij‖. It is easy to see that |ε_ij^s| ≤ α_ij, since we define α_ij ≐ δ_ij / ‖ā_ij‖.
Eq. (TRIM) describes a translation and rotation invariant measurement (TRIM) whose generative model is only a function of the unknown scale s. A remark on the novelty of creating TIMs and TRIMs is presented in the Supplementary Material.
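A sketch of how TRIMs could be computed from TIMs, following eq. (TRIM) (the interface is our assumption):

```python
import numpy as np

def trims(a_tims, b_tims, delta):
    """Translation-and-Rotation Invariant Measurements: for inlier pairs,
    the ratio of TIM norms equals the scale s plus bounded noise, with
    per-measurement noise bound alpha = delta / ||a_tim||."""
    na = np.linalg.norm(a_tims, axis=1)
    s_ij = np.linalg.norm(b_tims, axis=1) / na   # scale measurements
    alpha = delta / na                            # noise bounds (TRIM)
    return s_ij, alpha
```

Applying any rotation and translation to the second cloud leaves s_ij unchanged, so only the scale remains to be estimated.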
We propose a decoupled approach to solve in cascade for the scale, the rotation, and the translation in (3). The approach, named Truncated least squares Estimation And SEmidefinite Relaxation (TEASER), works as follows:
(i) we use the TRIMs to estimate the scale ŝ;
(ii) we use ŝ and the TIMs to estimate the rotation R̂;
(iii) we use ŝ and R̂ to estimate the translation t̂ from the original TLS problem (3).
The pseudocode is also summarized in Algorithm 1.
The generative model (TRIM) describes linear measurements of the unknown scale s, affected by bounded noise (|ε_ij^s| ≤ α_ij) including potential outliers (when o_ij^s ≠ 0). Again, we estimate the scale given the measurements s_ij and the bounds α_ij using a TLS estimator:

ŝ = argmin_s  Σ_{k=1}^K min( (s − s_k)² / α_k² , c̄² )    (8)

where for simplicity we numbered the measurements from 1 to K and adopted the notation s_k (resp. α_k) instead of s_ij (resp. α_ij).
The following theorem shows that one can solve (8) in polynomial time by a simple enumeration.
thm:scalarTLS, whose proof is given in the Supplementary Material, is based on the insight that the consensus set can only change at the boundaries of the intervals [s_k − c̄ α_k, s_k + c̄ α_k] (Fig. 3(a)), and there are at most 2K such boundaries. The theorem also suggests a straightforward adaptive voting algorithm to solve (8), with pseudocode given in Algorithm 2. The algorithm first builds the boundaries of the intervals shown in Fig. 3(a). Then, for each interval, it evaluates the consensus set (Fig. 3(b)); since the consensus set does not change within an interval, it can be computed at the interval centers. Finally, the cost of each consensus set is computed, and the estimate attaining the smallest cost is returned as the optimal solution.
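A compact sketch of the adaptive voting idea (not the paper's exact pseudocode; for instance, here the least squares update on each consensus set is simply a weighted mean):

```python
import numpy as np

def scalar_tls(s_meas, alpha, c_bar=1.0):
    """Adaptive voting sketch for scalar TLS (in the spirit of Algorithm 2).
    Measurement k is an inlier for estimates within c_bar * alpha[k] of
    s_meas[k]; the consensus set only changes at interval boundaries,
    so it suffices to probe the midpoint of each interval."""
    bounds = np.sort(np.concatenate([s_meas - c_bar * alpha,
                                     s_meas + c_bar * alpha]))
    centers = (bounds[:-1] + bounds[1:]) / 2.0    # one probe per interval
    w = 1.0 / alpha ** 2                           # TLS weights
    best_est, best_cost = None, np.inf
    for s in centers:
        inliers = np.abs(s - s_meas) <= c_bar * alpha
        if not inliers.any():
            continue
        est = np.sum(w[inliers] * s_meas[inliers]) / np.sum(w[inliers])
        cost = np.minimum(((est - s_meas) / alpha) ** 2, c_bar ** 2).sum()
        if cost < best_cost:
            best_est, best_cost = est, cost
    return best_est, best_cost
```

With a handful of consistent measurements and one gross outlier, the solver returns the consensus of the inliers while the outlier contributes only the capped cost.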
The interested reader can find a discussion on the relation between TLS and consensus maximization
(a popular approach for outlier detection
[52, 40]) in the Supplementary Material.

Maximal clique inlier selection (MCIS). The graph-theoretic interpretation of thm:TIM offers further opportunities to prune outliers. Considering the TRIMs as edges of the graph G (where the vertices are the 3D points, and the edge set E induces the TIMs and TRIMs per thm:TIM), we can use the scale estimate ŝ from Algorithm 2 to prune the edges of G whose associated TRIM s_ij is inconsistent with ŝ (i.e., |s_ij − ŝ| > c̄ α_ij). This allows us to obtain a pruned graph with a subset of the original edges, where gross outliers are discarded. The following result ensures that the inliers form a clique in the pruned graph, enabling an even more substantial rejection of outliers.
Edges corresponding to inlier TIMs form a clique in the pruned graph, and there is at least one maximal clique in the pruned graph that contains all the inliers.
A proof of Theorem 3 is presented in the Supplementary Material. Theorem 3 allows us to select the inliers by finding the maximal cliques of the pruned graph. Although finding the maximal cliques of a graph takes exponential time in general, there exist efficient approximation algorithms [10, 48]. In addition, under high outlier rates, the graph is sparse and the maximal clique problem can be solved quickly in practice [22]. In this paper, we choose the maximal clique with the largest size as the inlier set to pass to rotation estimation. sec:separateSolver shows that this method drastically reduces the number of outliers.
In summary, the function in Algorithm 1 first calls Algorithm 2, and then computes the largest maximal clique in the resulting graph. All measurements that do not belong to the clique are rejected as outliers.
What if the scale is known? In some registration problems the scale is known, e.g., the two point clouds have the same scale. In such a case, we can skip Algorithm 2 and set ŝ to the known scale. Moreover, we can still use the MCIS method to largely reduce the number of outliers.
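A minimal sketch of MCIS under an estimated scale. The paper relies on the maximal-clique algorithm in [22]; this illustration instead uses a plain Bron-Kerbosch enumeration, and all names are ours:

```python
def max_clique_inliers(s_meas, alpha, edges, n_nodes, s_hat, c_bar=1.0):
    """MCIS sketch: keep edges whose TRIM is consistent with the scale
    estimate s_hat, then return the largest maximal clique of the pruned
    graph (Bron-Kerbosch enumeration, adequate for small sparse graphs)."""
    adj = {v: set() for v in range(n_nodes)}
    for k, (i, j) in enumerate(edges):
        if abs(s_meas[k] - s_hat) <= c_bar * alpha[k]:  # scale-consistent edge
            adj[i].add(j)
            adj[j].add(i)
    best = set()

    def bron_kerbosch(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    bron_kerbosch(set(), set(adj), set())
    return best
```

Points whose pairwise TRIMs all agree with the estimated scale survive as a clique; points connected only by inconsistent edges drop out.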
The generative model (TIM) describes measurements affected by bounded noise (‖ε_ij‖ ≤ δ_ij) including potential outliers (when o_ij ≠ 0). Again, we estimate the rotation from the estimated scale ŝ, the measurements, and the bounds δ_ij using a TLS estimator:

R̂ = argmin_{R∈O(3)}  Σ_{k=1}^K min( ‖b̄_k − ŝ R ā_k‖² / δ_k² , c̄² )    (10)

where for simplicity we numbered the measurements from 1 to K and adopted the notation b̄_k, ā_k, δ_k instead of b̄_ij, ā_ij, δ_ij. For simplicity of notation, in the following we drop ŝ and assume the ā_k have been corrected by the scale (ā_k ← ŝ ā_k).
A fundamental contribution of this paper is to develop a tight convex relaxation for (10). The relaxation is tight even in the presence of a large number (90%) of outliers and provides per-instance suboptimality guarantees. Before presenting the relaxation, we introduce a binary formulation that is instrumental to develop the proposed relaxation.
Binary formulation and binary cloning. The first insight behind our convex relaxation is the fact that we can write the TLS cost (10) in additive form using auxiliary binary variables θ_k ∈ {−1, +1} (a property recently leveraged in a different context by [38]):

min_{R∈O(3), θ_k∈{−1,+1}}  Σ_{k=1}^K  (1+θ_k)/2 · ‖b̄_k − R ā_k‖²/δ_k²  +  (1−θ_k)/2 · c̄²    (11)

The equivalence can be easily understood from the fact that min(x, y) = min_{θ∈{−1,+1}} (1+θ)/2 · x + (1−θ)/2 · y.
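The binary rewriting rests on this simple identity, which can be checked directly (the function name is ours):

```python
def truncated(a, b):
    """min(a, b) written with a binary variable theta in {-1, +1},
    as in the additive TLS rewriting of eq. (11)."""
    return min(((1 + th) / 2) * a + ((1 - th) / 2) * b for th in (-1, 1))
```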
We conveniently rewrite (11) by replacing the binary variables θ_k with suitable (orthogonal) matrices.
Problem (11) is equivalent to the following optimization problem:

min_{R∈O(3), R_k∈{+R,−R}}  Σ_{k=1}^K  (1+θ_k)/2 · ‖b̄_k − R ā_k‖²/δ_k²  +  (1−θ_k)/2 · c̄²,   with θ_k ≐ tr(RᵀR_k)/3    (12)

where we introduced a matrix R_k ≐ θ_k R (a "clone" of R) for each measurement k.

A formal proof of prop:cloning is given in the Supplementary Material. We name the re-parametrization in prop:cloning binary cloning, since we now have K clones of R (namely R_1, …, R_K) that are in charge of rejecting outliers: when R_k = +R, the k-th term in the objective becomes ‖b̄_k − R ā_k‖²/δ_k² (i.e., the k-th measurement is treated as an inlier, similarly to choosing θ_k = +1), while when R_k = −R, the k-th term is equal to c̄² (i.e., the measurement is treated as an outlier, similarly to choosing θ_k = −1). This reparametrization enables our relaxation.
Convex relaxation. The proposed relaxation is presented in prop:TLSrotationRelax. The main goal of this paragraph is to provide the intuition behind our relaxation, while the interested reader can find a formal derivation in the Supplementary Material.
Let us define the matrix x ≐ [I₃  R  R₁ ⋯ R_K] (of size 3 × 3(K+2)), stacking all unknown variables in (12), and the matrix Z ≐ xᵀx. We observe that Z contains all linear and quadratic terms in R and R_k:

Z = xᵀx =
⎡ I₃      R       R₁      ⋯   R_K     ⎤
⎢ Rᵀ      I₃      RᵀR₁    ⋯   RᵀR_K   ⎥
⎢ R₁ᵀ     R₁ᵀR    I₃      ⋯   R₁ᵀR_K  ⎥
⎢ ⋮       ⋮       ⋮       ⋱   ⋮       ⎥
⎣ R_Kᵀ    R_KᵀR   R_KᵀR₁  ⋯   I₃      ⎦    (13)

Now it is easy to see that the cost function in (12) can be rewritten as a function of Z, by noting that the terms tr(RᵀR_k) (appearing in the cost) are entries of Z, see (13). The constraints in (12) can be similarly written as a function of Z. For instance, the constraints RᵀR = I₃ and R_kᵀR_k = I₃ simply enforce that the block diagonal entries of Z are identity matrices. Similarly, R_k ∈ {+R, −R} can be rewritten as a (non-convex) constraint involving off-diagonal entries of Z. Finally, the fact that Z = xᵀx implies that Z is positive semidefinite and has rank 3 (the number of rows in x, see (13)).
According to the discussion so far, we can reparametrize problem (12) using the matrix Z, and we can then develop a convex relaxation by relaxing all the resulting non-convex constraints. This is formalized in the following proposition.
The following convex program is a relaxation of (12):

min_Z  tr(Q Z)
subject to  Z ⪰ 0,  block diagonal entries of Z equal to I₃,  and relaxed (linear) counterparts of the off-diagonal constraints in (12)    (14)

where Q is a known symmetric matrix (expression given in the Supplementary Material), and [Z]_{R R_k} denotes the block of Z whose row indices correspond to the location of R in x (cf. the indices at the top of the matrix in eq. (13)) and whose column indices correspond to the location of R_k in x (similarly for [Z]_{I₃ R}, [Z]_{R R}, [Z]_{R_k R_l}, etc.).
The convex program (14) can be solved in polynomial time using off-the-shelf convex solvers, such as cvx [25]. It is a relaxation, in the sense that the set of feasible solutions of (14) includes the set of feasible solutions of (12). Moreover, it enjoys the typical per-instance guarantees of convex relaxations.
Empirically, we found that our relaxation is tight (i.e., produces a rank 3 solution) even when 90% of the TIMs are outliers. In general, even when the relaxation is not tight, one can still project to a feasible solution of (12) and obtain an upper-bound on how suboptimal the resulting solution is.
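When the relaxation is tight (rank-3 Z), the rotation can be read off a block of Z. A common rounding sketch, not necessarily the paper's exact recipe, projects that block to the nearest orthogonal matrix via SVD:

```python
import numpy as np

def round_rotation(Z):
    """Rounding sketch for a (near) rank-3 solution of the relaxation:
    read the linear-in-R block of Z = x^T x (rows of I3, columns of R)
    and project it to the nearest orthogonal matrix via SVD. A common
    rounding heuristic; the paper's exact recipe may differ."""
    R_block = Z[0:3, 3:6]           # block [Z]_{I3 R}, equal to R when Z is exact
    U, _, Vt = np.linalg.svd(R_block)
    return U @ Vt                    # nearest matrix in O(3)
```

When Z is built exactly from a rotation, the rounding recovers it; on a slightly perturbed Z it returns the closest orthogonal matrix.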
Since we already presented a polynomial-time solution for scalar TLS in sec:scaleEstimation, we propose to solve for the translation component-wise, i.e., we compute the entries of t independently (see the Supplementary Material for details):

t̂_j = argmin_{t_j}  Σ_{i=1}^N min( (t_j − [b_i − ŝ R̂ a_i]_j)² / β_i² , c̄² ),   j = 1, 2, 3    (15)

where [·]_j denotes the j-th entry of a vector.
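Component-wise translation estimation as in (15) reduces to the same scalar TLS machinery; a self-contained sketch (our names, reusing the adaptive-voting idea):

```python
import numpy as np

def tls_translation(a, b, s_hat, R_hat, beta, c_bar=1.0):
    """Component-wise TLS translation sketch as in eq. (15): each coordinate
    of t is a scalar TLS problem over the residuals b_i - s R a_i, solved
    here by adaptive voting over interval midpoints."""
    res = b - s_hat * a @ R_hat.T          # candidate translations, one per point
    t = np.zeros(3)
    for j in range(3):                     # solve each coordinate independently
        m = res[:, j]
        bnds = np.sort(np.concatenate([m - c_bar * beta, m + c_bar * beta]))
        best_est, best_cost = 0.0, np.inf
        for tj in (bnds[:-1] + bnds[1:]) / 2.0:
            inl = np.abs(tj - m) <= c_bar * beta
            if not inl.any():
                continue
            w = 1.0 / beta[inl] ** 2
            est = np.sum(w * m[inl]) / np.sum(w)
            cost = np.minimum(((est - m) / beta) ** 2, c_bar ** 2).sum()
            if cost < best_cost:
                best_est, best_cost = est, cost
        t[j] = best_est
    return t
```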
The goal of this section is to (i) test the performance of our scale, rotation, and translation solvers and the MCIS pruning (sec:separateSolver), (ii) evaluate TEASER against related techniques on benchmarking datasets (sec:benchmark), (iii) evaluate TEASER with extreme outlier rates (sec:benchmarkExtreme), and (iv) show an application of TEASER to object localization in an RGB-D robotics dataset (sec:roboticsApplication). In all tests we set c̄ = 1.
Implementation details. We implemented TEASER in MATLAB and used cvx to solve the convex relaxation (14). Moreover, we used the algorithm in [22] to find all the maximal cliques in the pruned TIM graph (see thm:maxClique).
Testing setup. We use the Bunny dataset from the Stanford 3D Scanning Repository [19] and resize the corresponding point cloud to fit within the unit cube. The Bunny is first downsampled to a fixed number of points, and then a random transformation is applied according to eq. (1). To generate the bounded noise ε_i, we sample zero-mean Gaussian vectors until the resulting vector satisfies ‖ε_i‖ ≤ β_i. The bound β_i is chosen so that the probability of a Gaussian sample exceeding it is negligible (this choice stems from the fact that, for Gaussian ε_i with isotropic variance σ², ‖ε_i‖²/σ² follows a Chi-square distribution with 3 degrees of freedom). To generate outliers, we replace a fraction of the b_i with vectors uniformly sampled in a sphere of radius 5. We test increasing outlier ratios, and all statistics are computed over 40 Monte Carlo runs.

Scale solver. Given the two point clouds, we first create TIMs corresponding to a complete graph and then use Algorithm 2 to solve for the scale. We compute both maximum consensus [52] and TLS estimates of the scale and compare their performance. Fig. 4(a) shows box plots of the scale error for increasing outlier ratios. It can be observed that the TLS solver is robust against 80% outliers, while, in that regime, maximum consensus failed three times.
Rotation Solver. To test the rotation solver, we apply a random rotation to the Bunny, while fixing the scale and translation. Rotation error and stable rank are shown in Fig. 4(b): the rank of the matrix Z computed by the relaxation is numerically close to 3 (prop:TLSrotationRelax) even with 90% of outliers, and the rotation error remains below 2 degrees.
Translation Solver. To test the translation solver, we apply a random translation to the Bunny, while fixing the scale and rotation. We test component-wise translation estimation using both maximum consensus and TLS. Fig. 4(c) shows that both techniques are robust against 80% outliers.
Maximal Clique Inlier Selection. For this test, we downsample the Bunny and fix the scale when applying the random transformation. We first prune the outlier TIMs/TRIMs (edges) that are not consistent with the scale, while keeping all the points (nodes). Then we compute the maximal clique with the largest size using the algorithm in [22], and remove all edges and nodes outside that clique, obtaining a pruned graph. Fig. 4(d) shows the outlier ratio before pruning (label: "Before MCIS") and after (label: "After MCIS"). The MCIS procedure effectively reduces the amount of outliers to below 10%, further facilitating rotation and translation estimation, which, in isolation, can already tolerate more than 90% outliers.
Testing setup. We benchmark TEASER against two state-of-the-art robust registration techniques: Fast Global Registration (FGR) [62] and Guaranteed Outlier REmoval (GORE) [11]. In addition, we test two RANSAC variants: a fast version where we terminate RANSAC after a maximum of 1,000 iterations (RANSAC (1K)) and a slow version where we terminate RANSAC after 60s (RANSAC). Four datasets, Bunny, Armadillo, Dragon and Buddha, from the Stanford 3D Scanning Repository are selected and downsampled. The tests below follow the same protocol as sec:separateSolver.
Known Scale. We first evaluate the compared techniques with known scale. Fig. 5(a) shows the rotation and translation error at increasing outlier ratios for the Bunny dataset. TEASER, GORE and RANSAC are robust against up to 90% outliers, although TEASER tends to produce more accurate estimates than GORE, and RANSAC typically requires a large number of iterations to converge at 90% outlier rate. FGR can only resist 70% outliers, and RANSAC (1K) starts breaking at 60% outlier rate. These conclusions are confirmed by the results on the other three datasets (Armadillo, Dragon, Buddha), which are shown in the Supplementary Material due to space constraints.
Unknown Scale. GORE is unable to solve for the scale, hence we only benchmark TEASER against FGR (although the original algorithm in [62] did not solve for the scale, we extend it by using Horn's method to compute the scale at each iteration), RANSAC (1K) and RANSAC. Fig. 5(b) plots the scale, rotation and translation errors for increasing outlier ratios on the Bunny dataset. All the compared techniques perform well when the outlier ratio is below 60%. FGR has the lowest breakdown point and fails at 80%. RANSAC (1K) and TEASER only fail at 90% outlier ratio when the scale is unknown. Although RANSAC with a 60s timeout outperforms the other methods at 90% outlier rate, it typically requires a very large number of iterations to converge, which is not practical for real-time applications.
Motivated by the fact that over 95% of the correspondences generated by 3D keypoint matching methods can be outliers, we further benchmark the performance of TEASER under extreme outlier rates from 95% to 99%, with known scale and correspondences, on the Bunny. We also replace RANSAC (1K) with RANSAC (10K), since RANSAC (1K) already performs poorly at 90% outlier ratio (sec:benchmark, Fig. 5(a)). Fig. 6 shows the boxplots of the rotation and translation errors under extreme outlier rates. Both TEASER and GORE are robust against up to 99% outliers, while RANSAC with a 60s timeout can resist 98% outliers, albeit requiring a very large number of iterations. RANSAC (10K) and FGR perform poorly under extreme outlier ratios. While GORE and TEASER are both robust against 99% outliers, TEASER produces slightly lower estimation errors.
We use the large-scale point cloud datasets from [37] to test TEASER in object pose estimation and localization applications. Seven scenes containing a cereal box and one scene containing a cap are selected. We first use the ground truth object labels to extract the cereal box/cap out of the scene and treat it as the object, then apply a random transformation to the scene, to get an object-scene pair. To register the object-scene pair, we first use FPFH feature descriptors [50] to establish putative feature correspondences. Then, TEASER is used to find the relative pose. Fig. 5(c) shows the noisy FPFH correspondences, the inlier correspondences obtained by TEASER, and successful localization and pose estimation of the cereal box. The interested reader can refer to the Supplementary Material for the localization results on the other seven scenes.
We propose a Truncated Least Squares approach to compute the relative transformation (scale, rotation, translation) that aligns two point clouds in the presence of extreme outlier rates. Then, we present a general graph-theoretic framework to decouple rotation, translation, and scale estimation. We provide a polynomial-time solution for each subproblem: scale and (component-wise) translation estimation can be solved exactly via an adaptive voting scheme, while rotation estimation can be relaxed to a semidefinite program. The resulting approach, named TEASER (Truncated least squares Estimation And SEmidefinite Relaxation), outperforms RANSAC and robust local optimization techniques, and compares favorably with Branch-and-Bound methods, while being a polynomial-time algorithm. TEASER can tolerate up to 99% outliers and returns highly-accurate solutions.
Using the vector notation a and b already introduced in the statement of the theorem, we can write the generative model (1) compactly as:

b = s (I_N ⊗ R) a + 1_N ⊗ t + o + ε    (A16)

where o and ε stack the vectors o_i and ε_i, and 1_N is a column vector of ones of size N. Denote by K the cardinality of E, such that the incidence matrix satisfies A ∈ ℝ^{K×N}. Let us now multiply both members by (A ⊗ I₃):

(A ⊗ I₃) b = s (A ⊗ I₃)(I_N ⊗ R) a + (A ⊗ I₃)(1_N ⊗ t) + (A ⊗ I₃) o + (A ⊗ I₃) ε    (A17)

Using the properties of the Kronecker product we simplify:

(i) (A ⊗ I₃)(I_N ⊗ R) = A ⊗ R = (I_K ⊗ R)(A ⊗ I₃),   (ii) (A ⊗ I₃)(1_N ⊗ t) = (A 1_N) ⊗ t = 0    (A18)

where we used the fact that 1_N is in the null space of the incidence matrix A [18]. Using (i) and (ii), eq. (A17) becomes:

b̄ = s (I_K ⊗ R) ā + ō + ε̄    (A19)

where b̄ ≐ (A ⊗ I₃) b, ā ≐ (A ⊗ I₃) a, ō ≐ (A ⊗ I₃) o, and ε̄ ≐ (A ⊗ I₃) ε, which is invariant to the translation t, concluding the proof.
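The two Kronecker identities used in the proof can be verified numerically on a small example:

```python
import numpy as np

# Check the identities used in the proof:
# (i)  (A kron I3)(I_N kron R) = (I_K kron R)(A kron I3)
# (ii) (A kron I3)(1_N kron t) = (A 1_N) kron t = 0 for an incidence matrix A.
rng = np.random.default_rng(0)
A = np.array([[-1.0, 1.0, 0.0],      # incidence matrix of a 3-node path
              [0.0, -1.0, 1.0]])     # (rows sum to zero)
R = np.linalg.qr(rng.standard_normal((3, 3)))[0]  # an orthogonal matrix
t = rng.standard_normal(3)
K, N = A.shape
lhs = np.kron(A, np.eye(3)) @ np.kron(np.eye(N), R)
rhs = np.kron(np.eye(K), R) @ np.kron(A, np.eye(3))
assert np.allclose(lhs, rhs)                                   # identity (i)
assert np.allclose(np.kron(A, np.eye(3)) @ np.kron(np.ones(N), t), 0.0)  # (ii)
```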
We remark that the idea of using translation invariant measurements and rotation invariant measurements has been proposed in recent work [39, 11, 40]; however, (i) the graph-theoretic interpretation of thm:TIM is novel and generalizes previously proposed methods, and (ii) the notion of translation and rotation invariant measurements (TRIMs) is completely new. We also remark that while related work uses invariant measurements to filter out outliers [11] or to speed up BnB [39, 40], we show that they also allow computing a polynomial-time robust registration solution.
Let us first prove that there are at most 2K−1 different non-empty consensus sets. We attach a confidence interval [s_k − c̄ α_k, s_k + c̄ α_k] to each measurement s_k, k = 1, …, K. For a given scalar s, a measurement s_k is in the consensus set of s if |s − s_k| ≤ c̄ α_k (i.e., s falls within the k-th interval), see Fig. 3(a). Therefore, the only points on the real line where the consensus set may change are the boundaries (shown in red in Fig. 3(a)) of the intervals. Since there are at most 2K such boundaries, there are at most 2K−1 non-empty consensus sets (Fig. 3(b)), concluding the first part of the proof. The second part follows from the fact that the consensus set of the optimal solution is necessarily one of these possible consensus sets, and problem (8) simply computes the least squares estimate of the measurements in the consensus set of the solution and chooses the estimate that induces the lowest cost. Therefore, we can enumerate every possible consensus set and compute a least squares estimate for each of them, to find the solution that induces the smallest cost.
TLS (and Algorithm 2) are related to consensus maximization, a popular approach for outlier detection in vision [52, 40]. Consensus maximization looks for an estimate that maximizes the number of inliers, i.e., max_s |{k : (s − s_k)²/α_k² ≤ c̄²}|, where |·| denotes the cardinality of a set. While consensus maximization is intractable in general, by following the same lines of Theorem 2, it is easy to show that in the scalar case it can also be solved in polynomial time by adaptive voting. While we expect the TLS solution to maximize the set of inliers, consensus maximization and TLS will not return the same solution in general, since TLS may prefer to discard measurements that induce a large bias in the estimate, as shown by the simple example below.
Consider a simple scalar estimation problem, where we are given three measurements, two of which agree closely while the third lies farther away (but close enough that some estimate keeps all three within their confidence intervals). Then an estimate exists whose consensus set includes all three measurements, while the TLS estimate discards the third measurement: accepting it would bias the estimate and increase the overall cost, so the TLS solution attains a lower cost with a smaller consensus set.
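A numeric illustration with hypothetical values (the example's original numbers are omitted in this version, so these are our own, not the paper's):

```python
import numpy as np

# Hypothetical instance: three scalar measurements with equal bounds.
# The estimate x = 1.0 keeps all three in its consensus set, but TLS
# attains a lower cost by discarding the far measurement.
s = np.array([0.5, 1.0, 1.5])
alpha = np.array([0.5, 0.5, 0.5])
c_bar = 1.0

def tls_cost(x):
    return np.minimum(((x - s) / alpha) ** 2, c_bar ** 2).sum()

def consensus(x):
    return int(np.sum(np.abs(x - s) <= c_bar * alpha))
```

Here consensus(1.0) counts all three measurements as inliers, yet tls_cost(0.75), which treats the third measurement as an outlier, is strictly smaller than tls_cost(1.0).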
Consider the graph whose edges were selected as inliers during scale estimation. An edge (and the corresponding TIM) is an inlier only if both of its endpoints are correct correspondences (see discussion before thm:TIM). Therefore, the pruned graph contains edges connecting all points for which we have inlier correspondences; these points are thus vertices of a clique, and the edges (or equivalently the TIMs) connecting them form a clique in the graph. We conclude the proof by observing that the clique formed by the inliers has to belong to at least one maximal clique of the graph.
Here we prove the equivalence between Problem (11) and problem (12). Towards this goal, we show that the latter is simply a reparametrization of the former.
Let us rewrite (11) by making the orthogonality constraint explicit (recall R ∈ O(3) ⟺ RᵀR = I₃):

min_{R, θ_k∈{−1,+1}}  Σ_{k=1}^K  (1+θ_k)/2 · ‖b̄_k − R ā_k‖²/δ_k²  +  (1−θ_k)/2 · c̄²
subject to  RᵀR = I₃    (A20)

Noting that (1+θ_k)/2 = ((1+θ_k)/2)² for θ_k ∈ {−1, +1}, we can safely move the term inside the squared norm and rewrite eq. (A20) as:

min_{R, θ_k∈{−1,+1}}  Σ_{k=1}^K  (1/δ_k²) ‖ (b̄_k + θ_k b̄_k)/2 − R (ā_k + θ_k ā_k)/2 ‖²  +  (1−θ_k)/2 · c̄²
subject to  RᵀR = I₃    (A21)
Now, we reparametrize the problem by introducing matrices R_k ≐ θ_k R, k = 1, …, K. In particular, we note that these matrices satisfy:
(i) R_kᵀR_k = I₃ (i.e., R_k ∈ O(3));
(ii) RᵀR_k = θ_k I₃, which also implies R_k ∈ {+R, −R};
(iii) θ_k = tr(RᵀR_k)/3, where tr(·) denotes the trace.
These three properties allow writing (A21) as: