1 Introduction
The availability of consumer depth cameras has allowed us to acquire reliable 3D scans of a scene or an object [1, 2, 3]. To build a geometrically consistent 3D reconstruction, we need to solve the key problem of aligning or registering all scans in a global frame of reference [4, 5, 6, 7]. The resulting solution can be used in a diverse range of contexts including humancomputer interaction, archiving of cultural objects, industrial inspection etc. There is a wide variety of solutions to the 3D registration problem in the literature [8, 9, 10, 11, 12]. All of these methods involve a tradeoff between speed and accuracy. Recently, [12] has presented a method for fast global registration (henceforth denoted FGR) of 3D scans based on the robustness of a line process. This method has been shown to outperform existing methods in terms of both speed and accuracy.
As in [12], given a set of 3D feature correspondences, we pose the registration problem as one of solving for the rigid Euclidean motion that minimizes a robust cost function. However, unlike their approach, our solution systematically utilises the rich geometric structure of the underlying Lie group representation for 3D motion, i.e., the Special Euclidean group . In this context, we achieve robustness via the iteratively reweighted least squares (IRLS) method. The key observation of our paper is that our specific combination of geometry of rigid Euclidean motion and the robustness of IRLS allows our method to significantly outperform the fast global registration method of [12] without having to take recourse to tuning of parameters. Additionally, we show that the solution proposed in [12] is a special case of our more general (and more accurate and faster) solution. In the process, we demonstrate that we can gain both theoretical insight as well as improved performance by utilizing the rich geometric structure of the underlying geometric representation of rigid Euclidean motions.
Furthermore, our work also addresses two important considerations. Firstly, in order to achieve robustness, some loss functions used for optimization have parameters that need to be either provided
a priori or estimated in situ. This is the case with [12] which uses the GemanMcClure loss function. We argue that for the problem of 3D registration the statistical efficiency (i.e. accuracy) tradeoff inherent to all robust estimators can be easily optimized using loss functions that are parameter free. This obviates the need to estimate any parameter during the course of optimization. Secondly, we argue that while fast and accurate 3D registration can be achieved using 3D point feature correspondences, this approach has certain limitations. While such feature correspondences can be reliably obtained when the camera motion is small (equivalently there is significant overlap between scans), there are certain scenarios where the feature correspondences break down. For such cases, we demonstrate that accurate joint registration of multiple 3D scans can be achieved by incorporating our robust motion estimation method into local methods such as the ICP.2 Literature Survey
Although the literature of 3D registration is extensive, in the following we only focus on aspects that are directly relevant to our method. A large number of methods for registering 3D scans using point correspondences have two key aspects, (a) a method for establishing point correspondences between two 3D scans and (b) a method for estimating motion given an adequate number of such correspondences. We may further classify methods according to whether they use a fixed set of point correspondences
[11, 12, 13, 14] or update them [15, 16, 17, 7, 18]. In the former instance, we use invariant feature representations to find point correspondences across a pair of scans which are then used for motion estimation. In the latter approach, we alternately update point correspondences using nearest neighbors and motion estimation till convergence, such as in the classical approach of ICP and its variants (see [15, 18, 19, 20] and references therein). Independent of the method for establishing correspondences, we need a method for robust motion estimation method given a set of correspondences. This problem of robust motion estimation is the main focus of this paper.The least squares solution to the motion estimation problem is the classical method of Umeyama [21]
. However, the least squares solution breaks down in the presence of outliers. The requisite robustness is often achieved using variants of RANSAC
[16, 22, 23, 24, 7, 10] or motion clustering [17, 7, 25]. Other classes of approaches are based on the branchandbound framework [8, 26, 27, 28, 29]. However, all of these approaches often require expensive iterations and are slow to converge. Other solutions based on expectationmaximization using Gaussian mixture model representations of the scans
[30, 31, 32] are similarly slow. Another approach that is relatively efficient is to perform robust averaging of pairwise motions between scans [33, 34, 35], or perform some variants of posegraph optimization [11], to produce tightly registered scans. Yet another approach is to use the wellknown IRLS method [36] to efficiently optimize robust cost functions [37]. The recent fast global registration method of [12] made use of the duality between robust estimators and line processes [38] to develop a fast approach for global registration. This method has been shown to produce the best results till date in terms of speed as well as accuracy.3 Lie Group Structure of Euclidean Motions
Our method utilizes the geometric structure of Lie groups, specifically that of the Special Euclidean group . In this section, we provide a brief description of the group structure and relevant Lie group properties. The Euclidean motion between two 3D scans consists of a rotation and a translation. While rotations can be represented in a variety of ways, we represent them as orthonormal matrices , i.e., and (here and for the rest of this paper, we use to denote the identity matrix). A rigid Euclidean motion can then be compactly represented by a matrix
(1) 
The matrices and satisfy the properties of a matrix group and also form smooth, differential manifolds, i.e., they are Lie groups. Thus, and where and are the Special Orthogonal and the Special Euclidean groups respectively. We also note that and have and degrees of freedom respectively.
Lie Groups: The Lie group structure of plays an important role in our formulation. Detailed expositions on the geometric properties of this representation are presented in [39, 40]
. For our purposes, it will suffice to note that for finite dimensional Lie groups (matrix groups) the product and inverse operations are differentiable mappings. Every point in a Lie group has a local neighborhood (tangent space) called its Lie algebra which has the properties of a vector space. For
and , the corresponding Lie algebra are denoted as and respectively. Hereis the skewsymmetric form of the axisangle rotation representation
. In this representation, represents a rotation by angle about the axis . Further, we can move from a Lie group to its Lie algebra and viceversa using the logarithm and the exponential mappings respectively. Thus, and . Similarly, we have and with the forms(2) 
Further, we note that the and mappings for and have closed form expressions that can be efficiently implemented. Specifically,
(3) 
(4) 
where . We also note that the special properties of the group structure of are discussed in [41].
4 Robust Motion Estimation
Given pairs of point correspondences between a pair of scans, we can solve for the motion required for aligning the pair of scans. In the case of global approaches, where the motion can potentially be large, such point correspondences are obtained by matching geometric features such as FPFH [23] whereas in iterative schemes like ICP, the correspondence of points is obtained by finding the nearest neighbor match on the second scan for a point on the first one. Due to a variety of sources of error, the correspondence pairs are not in perfect alignment or could be grossly incorrect, i.e., , where denotes the norm of the discrepancy in registration for the th correspondence pair
. When we assume that the individual errors have a zeromean, iid Gaussian distribution, the optimal estimate for
is obtained by a least squares minimization and has a closed form [21]. However, since correspondences could be highly incorrect in practice, instead of a least squares formulation, we pose motion estimation as the optimization of a robust cost function(5) 
where is a robust loss function and we also note that the individual error terms are a function of the motion . The use of robust estimators is well studied in statistics as an Mestimation problem and has been successfully used in a variety of vision problems [42]. However, in addition to robustness, in Eqn. 5 we require our solution to satisfy the geometric constraints for . These requirements of robust estimation under geometric constraints are satisfied by our solution.
4.1 Proposed Solution for Pairwise Registration
We propose to minimize the cost function in Eqn. 5 in an iterative fashion. Let the estimate for at the th iteration be denoted . In the th iteration, let us update the motion matrix by , i.e., . Noting that here the matrix multiplication is not commutative, our formulation uses a leftinvariant metric on [41, 43]. Using a firstorder approximation for the motion update matrix , we have
(6)  
(7) 
(8) 
Here we note that the Lie algebra matrix encodes the parameters that we need to estimate for the update . We can obtain the vector representation for these parameters using the ‘vee’ operator, i.e., . Since we use a firstorder approximation in Eqn. 6, the cost is linear in . We equivalently note that it is also linear in . Thus, we rewrite the individual error terms as
(9) 
where and are the appropriate matrices. The derivation of the explicit forms of and are given in the appendix. To obtain the update in the th iteration, we now optimize the cost w.r.t. , and get,
(10) 
for each summand indexed by , where is the influence function of the robust loss . We may further denote , which is the relative weight accorded to the th equation in Eqn. 10. Collecting all such relationships obtained for each pair of correspondences into a single system of equations we have
(11) 
where , , and .
Eqn. 11 is a weighted linear system of equations with the solution . However, it should be noted that each individual weight is a function of the error which, in turn, is dependent on , since . Thus, in Eqn. 11, the equivalent relationship is . The solution for this system of equations is the wellknown iteratively reweighted least squares (IRLS) method [36, 44]. In the IRLS method, in each iteration the weights are estimated based on the current estimate of . Given these weights, is reestimated. This process is repeated till convergence.
Given a solution for , we can estimate where the ‘hat’ operator converts the estimated Lie algebra parameters into its equivalent matrix representation . We emphasize here that although in Eqn. 6, we assumed a firstorder approximation, we map the estimated into an intrinsic estimate of the motion update, i.e., . In other words, a firstorder approximation in an intermediate step does not mean that the actual cost function is approximated. The mapping ensures that the estimated motion is always a valid member of the group. We may now state our solution for robust motion estimation as given in Algorithm 1.
Algorithm 1 is an iterative algorithm with a nested iteration. The outer loop is defined by the while statement and we denote its number of iterations as . The inner loop consists of the IRLS step of line 3 in Algorithm 1 since IRLS is itself an iterative method. We denote the number of iterations of the IRLS step as .
4.2 Extension to Joint Multiview Registration
In Algorithm 1 we presented the registration solution for two scans. This approach can be extended to the simultaneous or joint multiview registration of a set of scans. Towards this end, we define a viewgraph where represents the Euclidean motion of the th scan (equivalently camera) and an edge signifies that the relative Euclidean motion between the th and th scan can be determined from available matches. Further, we denote the number of scans as . We may now define the cost function to be optimized for joint multiview registration as follows
(12) 
where denotes the set of absolute motions of each of the scans w.r.t. to a global frame of reference and is the number of correspondences between the th and th scan. We again use an iterative approach to minimize the cost function in Eqn. 12 w.r.t. . In the th iteration, we update each motion matrix by , i.e., . Using a firstorder approximation for each update matrix , we have
(13)  
(14) 
where collates the corresponding vector representations of each of the Lie algebra matrices , and and are constructed analogous to Eqn. 9. The subsequent update in the th iteration is then analogously obtained from the relation
(15) 
where , and correspondingly collate all , and . As earlier, the solution for the system of equations in Eqn. 15 is the IRLS method, where we estimate the weights and the collated vector representation in alternating iterations till convergence.
Given a solution for , we can estimate, for each , and thereby update each member of as . It should also be noted that in multiview registration, the choice of the global frame of reference for the set is arbitrary. For our implementation, we fix it to the first scan without loss of generality, i.e., we set and do not update throughout course of optimization. The detailed algorithm is presented in the appendix.
5 Results
In recent literature, the fast global registration (FGR) method of [12] has been shown to outperform other global registration and local refinement algorithms in terms of speed and registration accuracy. Therefore, owing to space constraints, we confine the comparison of our results mostly to those of FGR as this will suffice to show where our performance stands in terms of the other registration algorithms. Furthermore, we test our algorithm on 3 different choices of the loss function , namely : , : , and scaled GemanMcClure: , where is the scale factor. The FGR approach of [12] uses only the scaled GemanMcClure loss function and anneals in fixed steps per iteration. To enable comparison, in our tests with the scaled GemanMcClure loss function we anneal in an identical manner. We present results on both pairwise and multiview registration tests. For these pairwise and multiview tests we terminate using and respectively. All our reported running times are measured on a single thread on an Intel Xeon(R) E52650 v3 processor clocked at 2.30 GHz.
5.1 Pairwise Registration
We present the performance of our proposed pairwise registration on the synthetic range data provided by [12] and on the 4 indoor sequences in the Augmented ICLNUIM dataset provided by Choi et al. [11]. In all our pairwise tests, we use . We compare the registration errors given by the 3 versions of our method with the following prior methods: Super4PCS [7], GoICP [29], Choi et al. [11], FGR [12] and DARE [45]
(using the hyperparameters suggested by the authors). Registration errors comprise of statistical measures on the distances between the groundtruth point correspondences between pairs of scans post alignment. We also report the trajectory errors, which include statistical measures on both the rotation errors and the translation norm errors between the pairs of cameras (corresponding to the given scans) w.r.t. the groundtruth camera pair, for all the methods.
Synthetic range dataset: We perform the set of controlled experiments described in [12] on each of their 5 synthetic datasets, at the given Gaussian noise levels and (for each model, is in units of the surface diameter). Note that adding Gaussian noise to a pair of scans introduces outliers in the point correspondences that are computed between that pair. Since depth cameras produce noisy scans in practice, this is a realistic way of increasing the outlier percentage in the point correspondences for synthetic scans. Table 1 lists for each method and for each noise level, the mean and maximal RMSE on the aligned ground truth correspondences. For both noise levels, our method with the loss function attains the lowest registration error together with FGR. Table 2 reports the mean running time of the motion step of each method for each of the 5 models in the dataset. The of our method is more faster than all prior methods on the average, and the version is more than faster.
Method  

Md.RAE  Md.TNE  Mn.RMSE  Mx.RMSE  Md.RAE  Md.TNE  Mn.RMSE  Mx.RMSE  
Super4PCS [7]  0.864  0.008  0.014  0.029  1.468  0.012  0.017  0.095 
GoICP [29]  1.207  0.011  0.032  0.133  1.736  0.019  0.037  0.127 
Choi et al. [11]  0.778  0.006  0.008  0.022  1.533  0.015  0.035  0.274 
FGR [12]  0.749  0.005  0.004  0.011  1.146  0.008  0.006  0.017 
DARE [45]  1.851  0.013  0.035  0.176  2.005  0.025  0.054  0.312 
Our  0.545  0.004  0.004  0.011  0.959  0.008  0.006  0.017 
Our  0.566  0.004  0.004  0.011  1.516  0.011  0.007  0.017 
Our GM  0.725  0.005  0.004  0.011  1.146  0.008  0.006  0.017 
Dataset  Mean # points  Super 4PCS [7]  GoICP [29]  Choi et al. [11]  FGR [12]  DARE [45]  Our  Our

Our GM 
Bimba  9,416  16,230  1,550  650  11.9  920  3.9  2.5  5.5 
Child’n  11,148  18,410  1,620  890  16.8  960  4.8  3.3  8.0 
Dragon  11,232  20,520  1,840  970  17.6  1,090  5.0  3.4  8.2 
Angel  12,072  29,640  3,000  1,090  19.1  1,770  5.3  3.8  8.9 
Bunny  13,357  38,470  5,530  1,170  21.6  3,310  7.4  5.1  9.9 
Mean  11,445  24,650  2,710  960  17.4  1,610  5.3  3.6  8.1 
Augmented ICLNUIM dataset: Each of the 4 sequences in the Augmented ICLNUIM dataset provided by Choi et al. [11] consist of 2350 to 2870 scans of an indoor scene. Moreover, the given scans are provided in a smooth temporal sequence, i.e., pairs of scans with proximity in timestamp also have sufficient view overlap with each other. This, in turn, leads to reliable FPFH feature matches between such pairs. We therefore tested the performance of all the methods for all pairs of scans in each sequence such that . Table 3 lists the results on each dataset for the various methods under consideration. For each dataset and corresponding method, we list the median rotation angle error and the median translation norm error of the recovered pairwise camera motions, as well as the mean computation time on the pairs. The version of our method performs the best in terms of the trajectory error statistics. It can also be seen to be significantly faster than the FGR method of [12]. More such pairwise results on other datasets are given in the appendix.
Method  livingroom 1  livingroom 2  office 1  office 2  
Md.RAE  Md.TNE  Mn.Time  Md.RAE  Md.TNE  Md.Time  Md.RAE  Md.TNE  Mn.Time  Md.RAE  Md.TNE  Mn.Time  
Super4PCS [7]  1.104  0.039  368,030  0.616  0.033  344,720  0.932  0.038  367,980  0.844  0.027  345,460 
GoICP [29]  1.336  0.071  35,110  0.992  0.058  33,420  1.365  0.066  34,450  1.104  0.047  32,530 
Choi et al. [11]  0.941  0.041  14,740  0.551  0.031  13,850  0.811  0.036  14,720  0.765  0.029  13,990 
FGR [12]  0.793  0.029  272  0.482  0.021  181  0.707  0.020  272  0.669  0.016  177 
DARE [45]  1.305  0.044  21,500  1.172  0.059  20,320  1.716  0.037  21,110  1.286  0.068  20,920 
Our  0.595  0.023  61  0.380  0.017  50  0.474  0.014  59  0.437  0.011  45 
Our  0.964  0.025  33  0.419  0.019  27  0.569  0.017  33  0.524  0.013  25 
Our GM  0.793  0.029  118  0.482  0.021  87  0.707  0.020  118  0.669  0.016  89 
5.2 Joint Multiview Registration
We present the performance of our joint multiview registration algorithm on the 4 sequences in the Augmented ICLNUIM dataset, specifically, on the 47 to 57 scene fragments that were provided for each sequence by Choi et al. [11]. We use for multiview registration, as the joint optimization variable and its corresponding search space are both large ( in Eqn. 15 for cameras). First, we compute pairwise motion estimates between fragments followed by a robust motion averaging step on that is similar to that used for rotation averaging in [46]. The output of this twostage approach is used to initialize the joint multiview optimization in Eqn. 12. The main drawback of only using the twostep approach for global registration is that it is not a true global method. It only considers local point correspondences, and then averages out the errors made by the pairwise motion estimates. Conversely, the joint multiview approach deals with point correspondences in the global setting and solves for the global cost function. The relative improvement in reconstruction error gained by using the joint multiview approach on top of the twostage approach is shown in a table in the appendix.
We compare our results with those of Choi et al. [11] and FGR [12]. While we are aware of other approaches to multiview registration including closed form approaches, we omit them from our comparisons because they failed to provide a solution in the largescale dataset for multiview registration we have used. For example, Bartoli et al. [47] and Bergström et al. [48], among others, compute transformations using a closed form SVD solution that do not scale to large scale data. Other alternates, such as that of Fitzgibbon et al. [49], use an LMbased approach, which are slow and do not exploit the geometry of the problem.
Table 4 lists the mean registration error from the ground truth surface achieved by each compared method on each sequence as well as the time taken to complete execution. For estimating the mean registration error, we use the CloudCompare software available at http://www.cloudcompare.org.
Once again, the version of our method performs the best overall in terms of registration error and is significantly faster than the FGR method of [12]. Also our method is the fastest amongst all methods with a slight drop in accuracy compared to our method. Finally, Figure 1 shows a complete reconstruction of the sequence livingroom 2 from the Augmented ICLNUIM dataset produced by the version of our multiview method. Reconstructions of other sequences are given in the appendix.
Method  livingroom 1  livingroom 2  office 1  office 2  
MRE  Time  MRE  Time  MRE  Time  MRE  Time  
Choi et al. [11]  0.04  8,940  0.07  3,360  0.03  4,500  0.04  4,080 
FGR [12]  0.05  131  0.06  81  0.03  69  0.05  48 
Our  0.04  71  0.05  49  0.03  42  0.04  32 
Our  0.07  62  0.09  40  0.04  36  0.06  28 
Our GM  0.05  88  0.06  70  0.03  55  0.05  41 
6 Discussion
As we demonstrated in Section 5, our motion estimation method in Algorithm 1 is both fast and accurate. More specifically, our method outperforms the stateoftheart FGR method of [12] in terms of both speed and accuracy. Given their strong similarities, in Section 6.1 we examine the relationship of the method of [12] and our approach in Algorithm 1. Following that, we discuss the limitations of using FPFH for feature matching and our approach to overcome those in Section 6.2.
6.1 Comparison with FGR [12]
In [12], the cost function to be minimized is the same as that of Eqn. 5. However, for minimizing this cost function, they use a line process optimization. Originally developed in [38] for modelling discontinuities, a line process optimization can be shown to be equivalent to optimizing a robust estimator. Recalling that , we define a cost function
(16) 
where is the collection of line processes for each correspondence pair and is a penalty term for each line process. Here is a monotonically decreasing function designed such that when , is a fixed nonnegative constant and when , . Thus, varying the line process in the interval allows us to move between a leastsquares and a robust regime. In [38] it has been shown that for every choice of loss function , there is an equivalent such that minimization in Eqn. 16 yields the same solution as that of Eqn. 5. The FGR method of [12] utilizes the robustness of the line process to estimate the desired . Using a firstorder approximation of the motion update, [12] arrives at a GaussNewton solution (Eqn. 8 of [12]). It can be easily shown that this solution is identical to solving the system of equations in Eqn. 11 in our notation. In other words, while solving for the update step , the FGR method of [12] implicitly carries out a single iteration of our IRLS step in line 3 of Algorithm 1. In contrast, in our approach we carry out iterations of the IRLS step to achieve better convergence.
Although the difference between our method and that of [12] is only in the number of iterations of the inner IRLS step, its implication is both subtle and profound. If we were solving a single IRLS problem in a vector space setting, this difference would have been immaterial. However, we note that in both of our approaches, the linear solution for the updates are interleaved with the nonlinear motion updates , resulting in significantly different trajectories of the solution on the group. Specifically, in our case, by iterating the IRLS step to convergence we obtain the best possible estimate of in each intermediate step which, in turn, results in the best improvement of the estimate (equivalently the most reduction in the cost function in the th step).
Another noteworthy difference between the two methods is the choice of the parametrization of the motion representation. We use the geometrically correct form of in Eqn. 6, i.e., for our update step. However, for their update step, the authors of [12] use an extrinsic form of motion parametrization, i.e., . While our parametrization is geometrically consistent with Lie group theory, we can recognize from Eqn. 4 that the choice in [12] is approximately close to our representation for small motion, i.e., if and only if . Conversely, for sufficiently large , the approximate representation is notably different from the exact representation of the form. Therefore the improvement per iteration in the cost function for the method of [12] is lower compared to our single iteration IRLS form. The result is that the line processbased method of [12] has slower convergence.
We highlight both these differences for a pair of scans from the Augmented ICLNUIM dataset in Figure 3. For the purpose of illustration, we consider the loss function for the line process optimization routine proposed in [12] as well as for our optimization routine. The reason we do not consider the GemanMcClure loss function is that it has a scale factor, which, in practice, is initialized at a large value and has to be progressively annealed during optimization. This annealing introduces artifacts that unnecessarily clutter the illustration of the underlying differences. In other words, we use because it does not alter the fundamental properties of the two optimization routines, and at the same time lends itself to a clean illustration of our argument. We also note that in Figure 3, we represent the cost as a function of the number of iterations , and that it is depicted on a scale. For ease of visualization, we only show the plot for the iteration range . Alongside the figure, Table 3 reports the number of iterations taken by each method to reach the different convergence criteria specified by .
Firstly we observe from Table 3 that even though the line process is conceptually equivalent to our procedure with a single IRLS step, the proposed optimization routine of [12] takes more iterations than our actual procedure with a single IRLS step to converge to the same convergence criterion. This is because of the extrinsic (approximate) parametrization used in [12] as opposed to the correct Lie algebraic representation . Secondly, it is clear from both Figure 3 and Table 3 that the cost function converges in progressively fewer iterations as we increase the number of IRLS steps. However, it should be noted that increasing the number of IRLS steps makes each iteration of our optimization routine slower as well. Therefore, a balance has to be struck between the speed of each iteration and the number of iterations required for convergence. In practice, we have found that using 2 IRLS steps per iteration for pairwise registration and 3 IRLS steps per iteration for joint multiview registration yield our desired results. In any event, the key observation is that the single iteration of the FGR is insufficient and yields poorer convergence properties compared to our formulation.
Choice of Loss Function : The choice of the loss function to be used is a critical factor in our estimation procedure. All loss functions achieve robustness to outliers by a tradeoff with statistical efficiency (i.e. accuracy). In practice, the accuracy achievable by a chosen loss function depends on the empirical nature of the error distribution. As discussed earlier, in [12] the authors use the GemanMcClure loss function . Here the performance critically depends on chosing a good value for that reflects the outlier distribution inherent to the data used. In [12] the authors start with a large and progressively reduce it in fixed amounts with every iteration. However, if the nature of the data varies significantly, such a fixed annealing procedure may not produce the best possible results. To overcome this limitation of the GemanMcClure method, as we have shown in Section 5, we have also tested for and and demonstrated that the latter provides the best performance. Apart from improved performance, an important desirable property of using is that it is entirely parameter free, hence we do not need to follow any additional annealing procedure. We could conceivably further improve performance by determing the optimal exponent in for . However, for a broad range of possible error distributions, we find that is adequate.
6.2 Limitation of FPFH FeatureMatching Based Registration
In Section 5, we have demonstrated the potency of the FPFH featurematching based registration paradigm. However, we note that these experiments, derived from those presented in [12], have special data characteristics. Specifically, in these examples, either the datasets are small or the motions between adjacent scans are small in magnitude and exhibit temporal smoothness. However, we note here that registration based on feature correspondences cannot work in all possible contexts. For example, when scans in the input dataset have sufficiently large depth differences or large motions between them in both rotation and translation, FPFH featurebased matches become few and unreliable. Consequently, the reconstructions given by both FGR [12] and our proposed algorithms become incorrect. In such a scenario, we need to take recourse to using a robust ICPbased multiview registration method (albeit with a greater computational cost) which converges to the correct solution.
In this alternative approach, following [33], we again consider the camera viewgraph . For each edge available in the set , we estimate the pairwise motion using a robust version of ICP. Specifically, for the motion estimation step in our robust ICP, we use the motion obtained using the loss optimization for pairwise registration method as described in Algorithm 1. After all pairwise motions represented by the edges are estimated, we estimate the absolute motion of each camera w.r.t. a global frame of reference using robust motion averaging. Here, our solution for robust motion averaging for rigid Euclidean motions is similar to the robust rotation averaging method of [46]. We find that typically, we achieve the desired convergence in 3 full iterations of this procedure. To illustrate our argument, we show the qualities of reconstructions achieved for a lifesize statue of Mahatma Gandhi, where the input set of scans is small and mostly have low overlap. Moreover, overlapping scans have significant depth differences between them, leading to significantly different FPFH features and consequently, a high percentage of incorrect matches. A visual representation of this scenario is given in the appendix for better understanding. As we can see in Figure 4, the joint multiview registration using FPFH featuresbased matches fails to correctly align some of the scans, whereas the robust multiviewbased ICP routine successfully produces a correct reconstruction.
7 Conclusion
We have presented a fast and efficient 3D point registration method that optimizes on the group and generalizes the line process based optimization method. In addition, we have shown that in scenarios where feature correspondences cannot be reliably established, our robust motion estimation can make a multiview ICP method robust and effective.
References
 [1] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, “KinectFusion: RealTime Dense Surface Mapping and Tracking,” in Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality, 2011.
 [2] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A Benchmark for the Evaluation of RGBD SLAM Systems,” in International Conference on Intelligent Robots and Systems (IROS), IEEE/RSJ, pp. 573–580, IEEE, 2012.
 [3] Z. Zhang, “Microsoft Kinect Sensor and its Effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.
 [4] J. Salvi, C. Matabosch, D. Fofi, and J. Forest, “A Review of Recent Range Image Registration Methods with Accuracy Evaluation,” Image and Vision Computing, vol. 25, no. 5, pp. 578–596, 2007.
 [5] G. K. Tam, Z.Q. Cheng, Y.K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.F. Sun, and P. L. Rosin, “Registration of 3D Point Clouds and Meshes: A Survey from Rigid to Nonrigid,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 7, pp. 1199–1217, 2013.
 [6] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan, “3D Object Recognition in Cluttered Scenes with Local Surface Features: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2270–2287, 2014.
 [7] N. Mellado, D. Aiger, and N. J. Mitra, “Super 4PCS Fast Global Pointcloud Registration via Smart Indexing,” in Computer Graphics Forum, vol. 33 (5), pp. 205–215, Wiley Online Library, 2014.
 [8] N. Gelfand, N. J. Mitra, L. J. Guibas, and H. Pottmann, “Robust Global Registration,” in Symposium on Geometry Processing, vol. 2 (3), p. 5, 2005.

[9]
A. Makadia, A. Patterson, and K. Danilidis, “Fully Automatic Registration
of 3D point Clouds,” in
IEEE Computer Society Conference on Computer Vision and Pattern Recognition
, vol. 1, pp. 1297–1304, IEEE, 2006.  [10] D. Holz, A. E. Ichim, F. Tombari, R. B. Rusu, and S. Behnke, “Registration with the Point Cloud Library: A Modular Framework for Aligning in 3D,” IEEE Robotics & Automation Magazine, vol. 22, no. 4, pp. 110–124, 2015.
 [11] S. Choi, Q.Y. Zhou, and V. Koltun, “Robust Reconstruction of Indoor Scenes,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 5556–5565, IEEE, 2015.
 [12] Q.Y. Zhou, J. Park, and V. Koltun, “Fast Global Registration,” in European Conference on Computer Vision, pp. 766–782, Springer, 2016.
 [13] F. Tombari, S. Salti, and L. Di Stefano, “Performance Evaluation of 3D Keypoint Detectors,” International Journal of Computer Vision, vol. 102, no. 13, pp. 198–220, 2013.
 [14] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and N. M. Kwok, “A Comprehensive Performance Evaluation of 3D Local Feature Descriptors,” International Journal of Computer Vision, vol. 116, no. 1, pp. 66–89, 2016.
 [15] S. Rusinkiewicz and M. Levoy, “Efficient Variants of the ICP Algorithm,” in Proceedings of the Third International Conference on 3D Digital Imaging and Modeling, pp. 145–152, IEEE, 2001.
 [16] D. Aiger, N. J. Mitra, and D. CohenOr, “4Points Congruent Sets for Robust Pairwise Surface Registration,” in ACM Transactions on Graphics (ToG), vol. 27 (3), p. 85, ACM, 2008.
 [17] B. Drost, M. Ulrich, N. Navab, and S. Ilic, “Model Globally, Match Locally: Efficient and Robust 3D Object Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 998–1005, IEEE, 2010.
 [18] S. Bouaziz, A. Tagliasacchi, and M. Pauly, “Sparse Iterative Closest Point,” in Proceedings of the Eleventh Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 113–123, Eurographics Association, 2013.
 [19] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, “The Trimmed Iterative Closest Point Algorithm,” in Proceedings of the 16th International Conference on Pattern Recognition, vol. 3, pp. 545–548, IEEE, 2002.
 [20] J. Yang, H. Li, and Y. Jia, “GoICP: Solving 3D Registration Efficiently and Globally Optimally,” in IEEE International Conference on Computer Vision, pp. 1457–1464, IEEE, 2013.
 [21] S. Umeyama, “LeastSquares Estimation of Transformation Parameters Between Two Point Patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 376–380, 1991.
 [22] R. Raguram, J.M. Frahm, and M. Pollefeys, “A Comparative Analysis of RANSAC Techniques Leading to Adaptive RealTime Random Sample Consensus,” in European Conference on Computer Vision, pp. 500–513, Springer, 2008.
 [23] R. B. Rusu, N. Blodow, and M. Beetz, “Fast Point Feature Histograms (FPFH) for 3D Registration,” in IEEE International Conference on Robotics and Automation, pp. 3212–3217, IEEE, 2009.
 [24] C. Papazov, S. Haddadin, S. Parusel, K. Krieger, and D. Burschka, “Rigid 3D Geometry Matching for Grasping of Known Objects in Cluttered Scenes,” The International Journal of Robotics Research, vol. 31, no. 4, pp. 538–553, 2012.
 [25] R. F. SalasMoreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison, “SLAM++: Simultaneous Localisation and Mapping at the Level of Objects,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1352–1359, IEEE, 2013.
 [26] R. I. Hartley and F. Kahl, “Global Optimization Through Searching Rotation Space and Optimal Estimation of the Essential Matrix,” in IEEE 11th International Conference on Computer Vision, pp. 1–8, IEEE, 2007.
 [27] H. Li and R. Hartley, “The 3D3D Registration Problem Revisited,” in IEEE 11th International Conference on Computer Vision, pp. 1–8, IEEE, 2007.
 [28] O. Enqvist, K. Josephson, and F. Kahl, “Optimal Correspondences from Pairwise Constraints,” in IEEE 12th International Conference on Computer Vision, pp. 1295–1302, IEEE, 2009.
 [29] J. Yang, H. Li, D. Campbell, and Y. Jia, “GoICP: a Globally Optimal Solution to 3D ICP PointSet Registration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241–2254, 2016.
 [30] B. Jian and B. C. Vemuri, “Robust Point Set Registration Using Gaussian Mixture Models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 1633–1645, Aug 2011.
 [31] G. D. Evangelidis and R. Horaud, “Joint Alignment of Multiple Point Sets with Batch and Incremental ExpectationMaximization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 1397–1410, June 2018.
 [32] B. Eckart, K. Kim, and J. Kautz, “HGMR: Hierarchical Gaussian Mixtures for Adaptive 3D Registration,” in The European Conference on Computer Vision (ECCV), September 2018.
 [33] V. M. Govindu and A. Pooja, “On Averaging Multiview Relations for 3D Scan Registration,” IEEE Transactions on Image Processing, vol. 23, pp. 1289–1302, March 2014.
 [34] A. Torsello and A. Albarelli, “MultiView Registration via Graph Diffusion of Dual Quaternions,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2441–2448, 2011.
 [35] F. Arrigoni, B. Rossi, and A. Fusiello, “Global Registration of 3D Point Sets via LRS Decomposition,” in European Conference on Computer Vision, pp. 489–504, 2016.
 [36] P. W. Holland and R. E. Welsch, “Robust Regression Using Iteratively Reweighted LeastSquares,” Communications in Statistics  Theory and Methods, vol. 6, no. 9, pp. 813–827, 1977.
 [37] K. Aftab and R. Hartley, “Convergence of Iteratively Reweighted Least Squares to Robust Mestimators,” in Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pp. 480–487, IEEE, 2015.
 [38] M. J. Black and A. Rangarajan, “On the Unification of Line Processes, Outlier Rejection, and Robust Statistics with Applications in Early Vision,” International Journal of Computer Vision, vol. 19, no. 1, pp. 57–91, 1996.
 [39] G. Chirikjian, Stochastic Models, Information Theory, and Lie Groups, Volume 2: Analytic Methods and Modern Applications. Birkhauser Boston, 2011.
 [40] J. Selig, Geometrical Methods in Robotics. Springer New York, 2013.
 [41] F. C. Park, “Distance Metrics on the RigidBody Motions with Applications to Mechanism Design,” Journal of Mechanical Design, vol. 117, no. 1, pp. 48–54, 1995.
 [42] P. Meer, D. Mintz, A. Rosenfeld, and D. Y. Kim, “Robust Regression Methods for Computer Vision: A Review,” International Journal of Computer Vision, vol. 6, no. 1, pp. 59–70, 1991.
 [43] E. Zacur, M. Bossa, and S. Olmos, “LeftInvariant Riemannian Geodesics on Spatial Transformation Groups,” SIAM Journal on Imaging Sciences, vol. 7, no. 3, pp. 1503–1557, 2014.
 [44] R. Szeliski, Computer Vision: Algorithms and Applications. SpringerVerlag New York, Inc., 2010.
 [45] F. J. Lawin, M. Danelljan, F. Khan, P.E. Forssén, and M. Felsberg, “Density Adaptive Point Set Registration,” in IEEE Conference on Computer Vision and Pattern Recognition, Computer Vision Foundation, June 2018.
 [46] A. Chatterjee and V. M. Govindu, “Robust Relative Rotation Averaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 958–972, April 2018.
 [47] A. Bartoli, D. Pizarro, and M. Loog, “Stratified Generalized Procrustes Analysis,” International journal of computer vision, vol. 101, no. 2, pp. 227–253, 2013.
 [48] P. Bergström and O. Edlund, “Robust Registration of Point Sets Using Iteratively Reweighted Least Squares,” Computational Optimization and Applications, vol. 58, no. 3, pp. 543–561, 2014.
 [49] A. W. Fitzgibbon, “Robust Registration of 2D and 3D Point Sets,” Image and Vision Computing, vol. 21, no. 1314, pp. 1145–1153, 2003.
 [50] A. S. Mian, M. Bennamoun, and R. Owens, “ThreeDimensional ModelBased Object Recognition and Segmentation in Cluttered Scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1584–1601, 2006.
Appendix A Linear Form of the Error Term in Eqn. 9
Appendix B Algorithm for Joint Multiview Registration
Similar to our algorithm for robust motion estimation between a pair of 3D scans, we state our solution for the robust motion estimation of a set of 3D scans as given in Algorithm 2.
Appendix C More Results
UWA dataset: Table 5 reports the performace of the motion step of the Fast Global Registration (FGR) method of [12] as well as all the 3 versions of our method on the UWA dataset [50]. This dataset consists of 5 objects and 50 scenes, with each scene consisting of a combination of these objects. The task is to align individual objects to the scenes, given that each scene contains substantial clutter and occlusion. A total of 188 such alignment tasks are provided in the dataset. As we can see from Table 5, the version of our method produces the lowest median rotation angle and median translation norm errors. It is also significantly faster than FGR [12].
Method  Median RAE  Median TNE  Mean Time 

FGR [12]  1.276  0.152  32.7 
Our  0.319  0.034  10.1 
Our  0.824  0.108  7.0 
Our GM  1.276  0.152  14.5 
Relative improvement of joint multiview approach over twostage motion averaging approach: As described in Section 5.2, we show in Table 6 that the reconstruction error of the scenes in the Augmented ICLNUIM dataset [11] decreases when we use our joint multiview estimation procedure on top of the twostage motion averaged approach. We show the improvement achieved using the loss, which is our bestperforming version.
livingroom 1  livingroom 2  office 1  office 2  

Before MV  0.07  0.07  0.06  0.07 
After MV  0.04  0.05  0.03  0.04 
Appendix D An Illustration of the Limitation of FPFH FeatureMatching Based Registration
As discussed in Section 6.2, we have presented a scenario where the FPFH featurematching based registration technique breaks down due to unreliability of the feature matches themselves. In this particular scenario, we have 23 scans of a lifesize statue of Mahatma Gandhi collected with a standard commercial depth camera. Figure 8 shows the plan view of a schematic of the cameras (represented as balls) around the statue, as recovered by our ICPbased multiview approach. Recall that these cameras are the nodes of the viewgraph . We also display a schematic of the edges in the viewgraph (using sticks). The thickness of each edge is proportional to the number of FPFH feature matches found between the correponding camera (or equivalently scan) pair.
We can observe from Figure 8 that the thinnest edges are found between pairs of cameras at different depths, implying that there are extremely few FPFH feature matches between these cameras. Compounding this observation with the fact that FPFH features are noisy to begin with, the resultant motions between these cameras, even with our robust cost function, are grossly incorrect. In contrast, our ICPbased multiview approach can, albeit at a higher computational cost, align these cameras correctly and produce the desired reconstruction.
Comments
There are no comments yet.