Point cloud reconstruction of outdoor scenes has many important applications, such as 3D architectural modeling, terrestrial surveying, and Simultaneous Localization and Mapping (SLAM) for autonomous vehicles. Compared to images, point clouds from 3D scanners exhibit less variation under different weather or lighting conditions, e.g., summer and winter (Fig. 1), or day and night (Fig. 5). Furthermore, the depths of point clouds from 3D scanners are more accurate than image-based reconstructions. Consequently, point clouds from 3D scanners are preferred for large-scale outdoor 3D reconstructions.

Most existing methods for 3D reconstruction follow a two-step approach: a front-end data association step and a back-end optimization step. More specifically, data association is used to establish feature matches in point cloud fragments for registration, and loop-closures between point cloud fragments for pose-graph optimization. Unfortunately, no existing algorithm for feature matching and loop-closure detection guarantees complete elimination of outliers. Although outlier feature matches are usually handled with RANSAC-based geometric verification [16, 30], such pairwise checks do not consider global consistency. In addition, the numerous efforts on improving the accuracy of loop-closure detection [6, 8, 20, 26] are not completely free from false positives. Many back-end optimization algorithms [13, 14, 21] are based on non-linear least-squares that lack the robustness to cope with outliers. A small number of outliers can consequently lead to catastrophic failures in the 3D reconstructions. Several prior works focus on disabling outlier loop-closures in the back-end optimization [5, 15, 25]. However, these methods do not consider the effect of outlier feature matches, with the exception of the work that solves global geometric registration in a very small-scale problem setting.
The main contribution of this paper is a probabilistic approach for robust back-end optimization to handle outliers from a weak front-end data association in large-scale point cloud based reconstructions. Our approach simultaneously suppresses outlier feature matches and loop-closures. To this end, we model our robust point cloud reconstruction problem as a Bayesian network. The global poses of the point cloud fragments are the unknown parameters, and odometry and loop-closure constraints are the observed variables. A binary latent variable is assigned to each loop-closure constraint; it determines whether a loop-closure constraint is an inlier or outlier. We model feature matches in the odometry constraints with a long-tail Cauchy distribution to gain robustness to outlier matches. Additionally, we use a Cauchy-Uniform mixture model for loop-closure constraints. The uniform and Cauchy distributions model outlier loop-closures and the feature matches in inlier loop-closures, respectively. In contrast to many existing back-end optimizers that use rigid transformations as the odometry and loop-closure constraints [5, 14, 15, 21, 25], we use the distances between feature matches to exert direct influence on these matches.
We use the Expectation-Maximization (EM) algorithm [3, 15] to find the globally consistent poses of the point cloud fragments (Sec. 4). The EM algorithm iterates between the Expectation and Maximization steps. In the Expectation step, the posterior of a loop-closure constraint being an inlier is updated. In the Maximization step, a locally optimal solution for the global poses is found by maximizing the expected complete data log-likelihood over the posterior from the Expectation step. We also generalize our approach to solve reconstruction problems with an easier setting (Sec. 5). In particular, a strong assumption is imposed: odometry and inlier loop-closure constraints are free from outlier feature matches. We show that by using a Gaussian-Uniform mixture model, our approach degenerates to the formulation of a state-of-the-art approach for robust indoor reconstruction. Fig. 1 shows an example of the reconstruction result of our method compared to other methods in the presence of outliers.
2 Related Work
Reconstruction of outdoor scenes has been studied in [22, 23]. Schöps et al. propose a set of filtering steps to detect and discard unreliable depth measurements acquired from an RGB-D camera. However, loop-closures are not detected, which can lead to reconstruction failures. Relying on very accurate GPS/INS, Pollefeys et al. propose a 3D reconstruction system from RGB images. However, GPS/INS signals may be unavailable or unreliable, especially on cloudy days or in urban canyons. Our work relies on neither GPS/INS nor RGB images. Instead, we focus on reconstruction from point cloud data acquired with 3D scanners, which is less sensitive to weather or lighting changes. There are also many works on indoor scene reconstruction. Since the seminal KinectFusion, several follow-up algorithms have been proposed [4, 19, 27]. Unfortunately, these methods do not detect loop-closures. Nonetheless, there are many RGB-D reconstruction methods with loop-closure detection [5, 7, 10, 11, 24, 28, 31, 32, 33].
Choi et al. achieve the state-of-the-art performance for indoor reconstruction with robust loop-closures. However, they assume no outlier feature matches in the odometry and inlier loop-closure constraints. We relax this assumption to achieve robust feature matching. More specifically, their method estimates a switch variable for each loop-closure constraint using line processes. Outlier loop-closures are disabled by setting the respective switch variables to zero. Additional switch prior terms are imposed and chosen empirically to prevent the trivial solution of removing all loop-closure constraints. In comparison, our approach does not require the additional prior terms. We estimate the posterior of a loop-closure being an inlier constraint in the Expectation step, as shown in Sec. 4. The EM approach is also used by Lee et al. However, they solve a robust pose-graph optimization problem without coping with the feature matches for reconstruction.
3 Overview

In this section, we provide an overview of our reconstruction pipeline, which consists of four main components: point cloud fragment construction, point cloud registration, loop-closure detection, and robust reconstruction with EM.
Point cloud fragment construction.
A single scan from a 3D scanner, e.g., LiDAR, contains a limited number of points. We integrate multiple consecutive scans with odometry readings obtained from dead reckoning, e.g., an Inertial Navigation System (INS), to form local point cloud fragments. A set of 3D features is then extracted from each point cloud fragment.
Point cloud registration.
It is inefficient to perform an exhaustive pairwise registration for large-scale outdoor scenes with many point cloud fragments. Hence, we perform point cloud based place-recognition to identify a set of candidate loop-closures. We retain the top potential loop-closures for each fragment and remove the duplicates. For each loop-closure between two fragments, we keep the set of top feature matches. We define this set as a loop-closure constraint, which can either be an inlier or an outlier. Similar to the odometry constraints, an inlier loop-closure can also contain outlier feature matches.
Robust reconstruction with EM.
The constraints from point cloud registration and loop-closure detection can contain outliers. In particular, both odometry and loop-closure constraints can contain outlier feature matches. Moreover, many detected loop-closures are false positives. In the next section, we describe our probabilistic modeling approach to simultaneously suppress outlier feature matches and false loop-closures. The EM algorithm is used to solve for the globally consistent fragment poses. Optional refinement using ICP can be applied to further improve the global point cloud registration.
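The optional ICP refinement mentioned above can be sketched as follows. This is a minimal point-to-point ICP in NumPy for illustration only, not the implementation used in this work; the function name, the brute-force nearest-neighbour search, and the Kabsch-based alignment step are our own simplifications.

```python
import numpy as np

def icp_refine(src, dst, iters=20):
    """Minimal point-to-point ICP sketch: iteratively refine a rigid
    transform (R, t) aligning src to dst."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = src @ R.T + t
        # brute-force nearest-neighbour correspondences
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        nn = dst[d2.argmin(axis=1)]
        # closed-form rigid alignment of moved -> nn (Kabsch / SVD)
        mu_s, mu_d = moved.mean(0), nn.mean(0)
        H = (moved - mu_s).T @ (nn - mu_d)
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        dR = Vt.T @ S @ U.T
        # compose the incremental update into the running transform
        R, t = dR @ R, dR @ t + (mu_d - dR @ mu_s)
    return R, t
```

In practice the refinement starts from the poses produced by the robust optimization, so only a small correction remains for ICP to recover.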
4 Robust Reconstruction with EM
We model the robust reconstruction problem as a Bayesian network, shown in Fig. 2. The fragment poses are the unknown parameters; the odometry constraints are obtained from point cloud registration, and the loop-closure constraints are obtained from loop-closure detection. We explicitly assign the loop-closure constraints into two clusters that represent the inliers and outliers. For each loop-closure constraint, we introduce a corresponding assignment variable.
Each assignment variable is a one-hot vector whose two components assign the corresponding constraint as an inlier or an outlier loop-closure, respectively. The fragment poses are the unknown parameters, the assignment variables are the latent variables, and the odometry and loop-closure constraints are the observed variables.
Robust reconstruction can be solved by finding the Maximum a Posteriori (MAP) solution. However, the MAP solution involves an intractable marginalization over the latent variables. We circumvent this problem by using the EM algorithm, which maximizes the expected complete data log-likelihood over the posterior of the latent variables. The EM algorithm iterates between the Expectation and Maximization steps. In the Expectation step, we use the fragment poses solved in the previous iteration to find the posterior distribution of the latent variables,
in which the odometry constraints do not appear, since they are conditionally independent of the assignment variables given the fragment poses, according to the Bayesian network in Fig. 2.
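As a sketch (not the paper's exact equations), the Expectation step for a Cauchy-Uniform mixture over scalar error terms can be written down directly. The function name `e_step`, the 1-D Cauchy density standing in for the multivariate one, and the parameter names are our own illustrative assumptions.

```python
import numpy as np

def e_step(errors, p_inlier, u_const, gamma=1.0):
    """Posterior probability that each loop-closure constraint is an
    inlier under a Cauchy-Uniform mixture. `errors` are the constraints'
    (Mahalanobis-style) error terms, `u_const` is the uniform density
    assigned to outliers, and `gamma` is the Cauchy scale."""
    cauchy = 1.0 / (np.pi * gamma * (1.0 + (errors / gamma) ** 2))
    num = p_inlier * cauchy
    return num / (num + (1.0 - p_inlier) * u_const)
```

Constraints with small errors receive posteriors near one; constraints with large errors are pushed toward zero and thus contribute little in the Maximization step.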
In the Maximization step, the posterior distribution (Eq. (1)) is used to update the fragment poses by maximizing the expectation of the complete data log-likelihood, denoted by
We define shorthand notations for the term with the odometry constraints and the term with the loop-closure constraints.
The unknown parameters, i.e., global poses of the fragments, are initialized with the relative poses computed from odometry constraints using ICP. Other dead reckoning methods such as wheel odometry and/or INS readings can also be used.
4.1 Modeling Odometry Constraints
Odometry constraints are obtained from point cloud registration between two consecutive point cloud fragments. Recall that an odometry constraint is a set of feature matches between two fragments, which can contain outlier matches. To gain robustness, we model each feature match with a long-tail multivariate Cauchy distribution. Assuming these feature matches are independent and identically distributed (i.i.d.), we take the geometric mean over their product to get
where we assume an isotropic covariance with a scale parameter, and the error term denotes the Mahalanobis distance such that
The value of the scale parameter is set based on the density of the extracted features; a fixed value in meters is used for the outdoor dataset.
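This odometry term can be sketched under the stated i.i.d. and isotropic-covariance assumptions. The 3-D Cauchy exponent, the omission of the normalising constant, and all names are our own illustrative choices.

```python
import numpy as np

def odometry_log_likelihood(src_feats, dst_feats, R, t, sigma=1.0):
    """Geometric mean of multivariate Cauchy densities over the feature
    matches of one odometry constraint (cf. Eq. (3)), with isotropic
    covariance of scale sigma. Computed in the log domain, up to an
    additive constant."""
    resid = src_feats @ R.T + t - dst_feats           # 3-D match residuals
    m2 = (resid ** 2).sum(axis=1) / sigma ** 2        # squared Mahalanobis distance
    # log multivariate Cauchy in d = 3 dims: log f ∝ -((d+1)/2) log(1 + m2)
    log_f = -2.0 * np.log1p(m2)
    return log_f.mean()                               # geometric mean = mean of logs
```

The `log1p` growth is logarithmic in the residual, which is exactly the long-tail property that keeps a few gross outlier matches from dominating the constraint.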
4.2 Modeling Loop-Closure Constraints
A loop-closure constraint is the set of feature matches between two fragments. We propose to use a Cauchy-Uniform mixture model to cope with (1) the outlier loop-closure constraints and (2) the outlier feature matches in the inlier loop-closure constraints.
To distinguish between inlier and outlier loop-closures, we model the distribution of the assignment variables,
Next, we use a Cauchy distribution and a uniform distribution to model the inlier and outlier loop-closure constraints, respectively.
Cauchy distribution – inlier loop-closure constraints.
The inlier loop-closure constraints can contain outlier feature matches. We use the same multivariate Cauchy distribution as in Eq. (3) and further reorganize the terms, introducing a shorthand for brevity, such that
where the normalizing count denotes the number of feature matches in the constraint.
Uniform distribution – outlier loop-closure constraints.
We model the outlier loop-closure constraints with a uniform distribution defined by a constant probability ,
4.3 Expectation Step
Two distribution parameters are involved: the probability of a constraint being an inlier loop-closure, and the constant probability of uniformly sampling a random loop-closure. Both are difficult to set manually across different datasets. Hence, we propose to estimate them from the input data. More specifically, we learn them from the odometry constraints, since all odometry constraints are effectively inlier loop-closure constraints.
The learning process is as follows. First, for each odometry constraint, we compute its corresponding error term (analogous to Eq. (10)), where
Next, we compute the median of these error terms. Since we regard all odometry constraints as inlier loop-closure constraints, let
where we set the posterior close to one, meaning that a loop-closure with an error no larger than the median is very likely to be an inlier. Finally, we solve for the mixing probability using Eq. (13).
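This parameter-learning step can be sketched as follows. The 1-D Cauchy stand-in, the closed-form inversion of the posterior, and all names are illustrative assumptions rather than the paper's exact Eq. (13).

```python
import numpy as np

def learn_p_inlier(odom_errors, u_const, gamma=1.0, target_post=0.9):
    """Treat all odometry constraints as inliers, take the median of their
    error terms, and solve for the mixing probability p such that a
    loop-closure with the median error has posterior `target_post` of
    being an inlier. The formula follows from inverting
    r = p f / (p f + (1 - p) u)  for p."""
    e_med = np.median(odom_errors)
    f_med = 1.0 / (np.pi * gamma * (1.0 + (e_med / gamma) ** 2))
    # p / (1 - p) = r u / ((1 - r) f)  at the median error
    ratio = target_post * u_const / ((1.0 - target_post) * f_med)
    return ratio / (1.0 + ratio)
```

Because the median is used, the estimate stays stable even if a minority of odometry constraints have unusually large errors.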
4.4 Maximization Step
In the Maximization step, we solve for the fragment poses that maximize the expected complete data log-likelihood, whose two terms (the shorthand notations defined in Eq. (2)) are evaluated independently and then optimized jointly.
Assuming the odometry constraints are i.i.d., the joint probability of all odometry constraints is given by
Substituting the joint probability of the feature matches within each odometry constraint (Eq. (3)), we can rewrite the odometry term as

Similarly, the loop-closure term can be rewritten as
The maximization of can be reformulated into a non-linear least-squares problem with the following objective function
which can be optimized using the sparse Cholesky solver in Google Ceres. The computational complexity is cubic in the total number of feature matches.
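A toy version of this non-linear least-squares problem is sketched below, with 2-D poses and SciPy's `least_squares` standing in for Ceres' sparse Cholesky solver; the constraint encoding, the gauge fixing, and all names are our own assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def solve_poses(constraints, init):
    """Toy M-step: optimize 2-D fragment poses (x, y, theta) so that the
    weighted distances between feature matches are minimized. Pose 0 is
    held fixed to remove the gauge freedom. Each constraint is a tuple
    (i, j, feats_i, feats_j, weight); the weight plays the role of the
    inlier posterior from the E-step."""
    init = np.asarray(init, dtype=float)

    def transform(pose, pts):
        x, y, th = pose
        R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
        return pts @ R.T + np.array([x, y])

    def residuals(flat):
        poses = np.vstack([init[:1], flat.reshape(-1, 3)])
        res = []
        for i, j, fi, fj, w in constraints:
            # distance between matched features in the global frame
            diff = transform(poses[i], fi) - transform(poses[j], fj)
            res.append(np.sqrt(w) * diff.ravel())
        return np.concatenate(res)

    sol = least_squares(residuals, init[1:].ravel())
    return np.vstack([init[:1], sol.x.reshape(-1, 3)])
```

Down-weighted constraints (small posteriors) contribute almost nothing to the residual vector, which is how suppressed outlier loop-closures drop out of the optimization.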
5 Generalization using EM
In the previous section, we solved the problem in which constraints are contaminated with outlier feature matches. In this section, we study a problem with an easier setting where correct loop-closure constraints contain no outlier feature matches. Recall that the long-tail multivariate Cauchy distribution is used to gain robustness against outlier feature matches. For the easier problem without outlier feature matches, we replace the multivariate Cauchy distribution with a multivariate Gaussian distribution, and show that our EM formulation degenerates to the formulation of a state-of-the-art approach for robust indoor reconstruction. To avoid repetition, we only highlight the major differences from the previous section. Each analogous term is augmented with a superscript that stands for "Gaussian".
Replacing the multivariate Cauchy distribution in Eq. (3) with a multivariate Gaussian distribution, we have
while the remaining terms are unchanged.
where the normalizer is the number of feature matches. We note that the Gaussian error term is a sum of squared errors that can lead to arithmetic overflow in the exponential term of the posterior of the latent variable (analogous to Eq. (10)). In contrast, there is no arithmetic overflow in the corresponding term of Eq. (10), since the Cauchy error term from Eq. (8) is a sum of log errors. We propose to alleviate the arithmetic overflow problem by using a Pareto distribution that approximates the Gaussian as
with a scale parameter. For outlier loop-closures, the uniform distribution in Eq. (9) still holds.
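The numerical issue can be illustrated directly: exponentiating a large sum-of-squares error underflows and destroys the information needed for the posterior, whereas working in the log domain stays finite. The log-sum-exp shift below is a generic remedy shown for illustration, alongside the Pareto approximation described above; names and constants are our own.

```python
import numpy as np

def posterior_stable(log_f_in, log_u, p=0.5):
    """Inlier posterior computed in the log domain via the log-sum-exp
    shift, so that huge negative exponents are never exponentiated
    directly."""
    a = np.log(p) + log_f_in          # log of p * f_inlier
    b = np.log(1.0 - p) + log_u       # log of (1 - p) * f_outlier
    m = max(a, b)                     # shift by the max before exponentiating
    return np.exp(a - m) / (np.exp(a - m) + np.exp(b - m))

# Direct exponentiation of a Gaussian sum-of-squares error underflows:
# np.exp(-2000.0) == 0.0, so a naive posterior evaluates to 0/0.
```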
|                          | Living room 1 | Living room 2 | Office 1 | Office 2 | Average |
|--------------------------|---------------|---------------|----------|----------|---------|
| Choi et al., Recall (%)  | 57.6          | 49.7          | 63.3     | 60.7     | 57.8    |
| Ours (Sec. 5), Recall (%)| 58.7          | 48.4          | 63.9     | 61.5     | 58.1    |
|               | Living room 1 | Living room 2 | Office 1 | Office 2 | Average |
|---------------|---------------|---------------|----------|----------|---------|
| Whelan et al. | 0.22          | 0.14          | 0.13     | 0.13     | 0.16    |
| Kerl et al.   | 0.21          | 0.06          | 0.11     | 0.10     | 0.12    |
| Choi et al.   | 0.04          | 0.07          | 0.03     | 0.04     | 0.05    |
| Ours (Sec. 5) | 0.06          | 0.09          | 0.05     | 0.04     | 0.06    |
It becomes apparent that the arithmetic overflow problem is alleviated by this replacement. In the previous section, the mixing probability in Eq. (13) is learned from the median error of all the error terms in the odometry constraints. Unfortunately, this median error becomes uninformative here because we assume no outlier feature matches. Despite the absence of outlier feature matches, the error is still upper bounded by some threshold. Hence, the mean error term can be directly estimated from Eq. (23). Subsequently, let
where we set the posterior close to one and solve for the mixing probability. The threshold is set to a fixed value in meters for our experiments on the indoor dataset (see next section), based on the typical magnitude of the sensor noise.
Finally, we reformulate the maximization problem as a non-linear least-squares problem with the following objective function
which is similar to the formulation of Choi et al. with two minor differences. First, we average the squared errors over the number of feature matches, whereas they do not. Second, we estimate the posterior by iterating between the Expectation and Maximization steps, whereas they estimate it using line processes. It is important to note that Eq. (28) is derived from the original Gaussian formulation in Eq. (22) instead of the Pareto approximation in Eq. (24).
6 Evaluation

We use the experimental results from two datasets for the comparison between our approach and the state-of-the-art approach. The first dataset is from small-scale indoor scenes with no outlier feature matches in the odometry and inlier loop-closure constraints, and the second dataset is from large-scale outdoor scenes with outlier feature matches. Our Gaussian-Uniform EM (Sec. 5) and Cauchy-Uniform EM (Sec. 4) are evaluated on the small-scale indoor and large-scale outdoor datasets, respectively.
6.1 Small-Scale Indoor Scenes
The “Augmented ICL-NUIM Dataset” is used as the small-scale indoor dataset. This dataset is generated from synthetic indoor environments and includes two models: a living room and an office. There are two RGB-D image sequences for each model, resulting in a total of four test cases. To ensure a fair comparison, we follow the same evaluation criteria and experimental settings as the state-of-the-art approach.
We report the loop-closures (1) before pruning, (2) after pruning by the state-of-the-art method, and (3) after pruning by our method. Here, “before pruning” refers to the loop-closures from the loop-closure detection, and “after pruning” refers to the inlier loop-closures after robust optimization. It can be seen that the average precision and recall of our method are comparable to the state-of-the-art. This is an expected result, since we showed in Sec. 5 that our method degenerates to the state-of-the-art method with minor differences in the absence of outlier feature matches. We further evaluate the reconstruction accuracy of the final model using the mean distance of the reconstructed surfaces to the ground truth surfaces. Tab. 2 compares the reconstruction accuracy of our method to other existing approaches. In addition, the reconstruction accuracy of the model obtained from fusing the input depth images with the ground truth trajectory (denoted as GT Trajectory in Tab. 2) is reported for reference. As expected, our method shows comparable results to the state-of-the-art on the indoor dataset.
6.2 Large-Scale Outdoor Scenes
The large-scale outdoor dataset is based on the “Oxford RobotCar Dataset”. It consists of 3D point clouds captured with a LiDAR sensor mounted on a car that repeatedly drives through Oxford, UK, at different times over a year. We select two different driving routes from the dataset: a short route (about 1 km) and a long route (city-scale). Furthermore, we take two traversals at different times for each route, resulting in four traversals in total. Unlike the synthetic indoor dataset, there is no ground truth of the surface geometry. We therefore evaluate the trajectory accuracy against the GPS/INS readings as an indirect measurement of reconstruction accuracy. We prepare the dataset as follows:
Point cloud fragments. We integrate the push-broom 2D LiDAR scans and their corresponding INS readings into 3D point clouds. We segment the data into fragments with a 30 m radius at every 10 m interval. Each fragment is then downsampled using a VoxelGrid filter with a grid size of 0.2 m. 242 and 1770 fragments are constructed for the 1 km route and the city-scale route, respectively.
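The VoxelGrid downsampling step can be sketched in NumPy as a centroid-per-voxel filter; `voxel_downsample` is our own illustrative implementation, not the filter used in the experiments.

```python
import numpy as np

def voxel_downsample(points, grid=0.2):
    """VoxelGrid-style downsampling: bucket points into cubic voxels of
    side `grid` (0.2 m above) and replace each occupied voxel by the
    centroid of its points."""
    keys = np.floor(points / grid).astype(np.int64)   # integer voxel index per point
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    n_vox = inv.max() + 1
    sums = np.zeros((n_vox, 3))
    np.add.at(sums, inv, points)                      # accumulate per-voxel sums
    counts = np.bincount(inv, minlength=n_vox).astype(float)
    return sums / counts[:, None]                     # per-voxel centroids
```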
Odometry trajectory. The odometry trajectory is disconnected due to discontinuous INS data, since we are combining two traversals. We simulate the odometry trajectory via geometric registrations between consecutive point cloud fragments, and manually identify one linkage transformation between the two traversals. We also check the entire odometry trajectory to ensure that there are no remaining erroneous transformations. The resulting odometry trajectory is used to initialize the fragment poses.
Odometry constraints. For every two consecutive frames along the odometry trajectory, we perform point cloud registration as described in Sec. 3. Specifically, we extract 1024 features for each fragment, and collect the top 200 feature matches to form an odometry constraint. Note that the feature matches are selected without additional geometric verification, and they can contain outliers. 241 and 1769 odometry constraints are constructed for the 1 km route and the city-scale route, respectively.
Loop-closure constraints. We perform loop-closure detection as described in Sec. 3. We take every 5th fragment along the trajectory as a keyframe fragment; loop-closures are detected among the selected keyframe fragments. We find the top 5 and top 10 loop-closures for each keyframe fragment on the 1 km route and the city-scale route, respectively, and then remove the duplicates. 171 and 1438 loop-closure constraints are constructed for the 1 km and city-scale routes, respectively. The outlier loop-closure ratio is more than 80% for both routes.
|                                     | 1 km route | City-scale route |
|-------------------------------------|------------|------------------|
| Choi et al. (identity covariance)   | 123.24     | 207.93           |
| Choi et al. (informative covariance)| 1.97       | 50.92            |
| Ours (Sec. 4)                       | 1.34       | 2.45             |
We compare the effectiveness of our approach with two baseline methods based on the approach of Choi et al.: a stronger and a weaker baseline. The stronger baseline encodes uncertainty information of the feature matches between two fragments in a covariance matrix; the feature matches used to construct the covariance matrix are those within 1 m of each other after geometric registration. The covariance matrix of the weaker baseline is set to identity, i.e., no uncertainty information on the feature matches. The relative poses between the point cloud fragments computed from ICP are used as the odometry and loop-closure constraints in the baseline methods.
Tab. 3 summarizes the mean distances of the estimated poses to the GPS/INS trajectory as an indirect measure of the reconstruction accuracy on the 1 km and city-scale outdoor datasets. Fig. 3 and 4 show the plots of the trajectories. We align the first five fragment poses with the GPS/INS trajectory; error measurements start after the 5th fragment pose. The results show that the accuracy increases when more information about the feature matches is considered in the optimization process. We can see from Tab. 3 and Fig. 3 and 4 that the weaker baseline (uninformative identity covariance), which carries no information about the feature matches, gives the worst performance. The stronger baseline (informative covariance matrix), which encodes information about the feature matches in the covariance matrix, performs better. Our method, which directly takes feature matches as the odometry and loop-closure constraints, outperforms both baselines. Furthermore, Fig. 1 and 5 show reconstruction results for qualitative evaluation. It can be seen from the bottom left and right plots in Fig. 5 that our method produces the sharpest reconstructions of the 3D point clouds.
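The evaluation protocol above (align the first five poses to the GPS/INS trajectory, then measure the mean distance of the rest) can be sketched as follows; the Kabsch-based alignment and all names are our own illustrative choices.

```python
import numpy as np

def align_and_error(est, gps, k=5):
    """Rigidly align the first k estimated fragment positions to the
    GPS/INS trajectory (closed-form Kabsch/SVD fit), then report the mean
    distance of the remaining poses as the trajectory error."""
    mu_e, mu_g = est[:k].mean(0), gps[:k].mean(0)
    H = (est[:k] - mu_e).T @ (gps[:k] - mu_g)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                      # rotation mapping est -> gps
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return np.linalg.norm(aligned[k:] - gps[k:], axis=1).mean()
```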
7 Conclusion

In this paper, we proposed a probabilistic approach for robust point cloud reconstruction of large-scale outdoor scenes. Our approach leverages a Cauchy-Uniform mixture model to simultaneously suppress outlier feature matches and loop-closures. Moreover, we showed that by using a Gaussian-Uniform mixture model, our approach degenerates to the formulation of a state-of-the-art approach for robust indoor reconstruction. We verified our proposed methods on both indoor and outdoor benchmark datasets.
This work is supported in part by a Singapore MOE Tier 1 grant R-252-000-A65-114.
-  S. Agarwal, K. Mierle, and Others. Ceres solver. http://ceres-solver.org.
-  M. J. Black and A. Rangarajan. On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. In IJCV, 1996.
-  G. Celeux and G. Govaert. A classification EM algorithm for clustering and two stochastic versions. In CSDA, 1992.
-  J. Chen, D. Bautembach, and S. Izadi. Scalable real-time volumetric surface reconstruction. In TOG, 2013.
-  S. Choi, Q.-Y. Zhou, and V. Koltun. Robust reconstruction of indoor scenes. In ICCV, 2015.
-  M. Cummins and P. Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. In IJRR, 2011.
-  F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard. 3-d mapping with an rgb-d camera. In T-RO, 2014.
-  D. Galvez-Lopez and J. D. Tardos. Real-time loop detection with bags of binary words. In IROS, 2011.
-  A. Handa, T. Whelan, J. McDonald, and A. J. Davison. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In ICRA, 2014.
-  P. Henry, D. Fox, A. Bhowmik, and R. Mongia. Patch volumes: Segmentation-based consistent mapping with rgb-d cameras. In 3DV, 2013.
-  P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. Rgb-d mapping: Using kinect-style depth cameras for dense 3d modeling of indoor environments. In IJRR, 2012.
-  C. Kerl, J. Sturm, and D. Cremers. Dense visual slam for rgb-d cameras. In IROS, 2013.
-  F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. In T-IT, 2001.
-  R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A general framework for graph optimization. In ICRA, 2011.
-  G. H. Lee, F. Fraundorfer, and M. Pollefeys. Robust pose-graph loop-closures with expectation-maximization. In IROS, 2013.
-  G. H. Lee and M. Pollefeys. Unsupervised learning of threshold for geometric verification in visual-based loop-closure. In ICRA, 2014.
-  W. Maddern, G. Pascoe, C. Linegar, and P. Newman. 1 year, 1000 km: The oxford robotcar dataset. In IJRR, 2017.
-  R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, 2011.
-  M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger. Real-time 3d reconstruction at scale using voxel hashing. In TOG, 2013.
-  D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In CVPR, 2006.
-  E. Olson, J. Leonard, and S. Teller. Fast iterative alignment of pose graphs with poor initial estimates. In ICRA, 2006.
-  M. Pollefeys, D. Nistér, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.-J. Kim, P. Merrell, et al. Detailed real-time urban 3d reconstruction from video. In IJCV, 2008.
-  T. Schöps, T. Sattler, C. Häne, and M. Pollefeys. Large-scale outdoor 3d reconstruction on a mobile device. In CVIU, 2017.
-  F. Steinbrucker, C. Kerl, and D. Cremers. Large-scale multi-resolution surface reconstruction from rgb-d sequences. In ICCV, 2013.
-  N. Sünderhauf and P. Protzel. Switchable constraints for robust pose graph slam. In IROS, 2012.
-  M. A. Uy and G. H. Lee. PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition. In CVPR, 2018.
-  T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald. Robust real-time visual odometry for dense rgb-d mapping. In ICRA, 2013.
-  T. Whelan, M. Kaess, J. J. Leonard, and J. McDonald. Deformation-based loop closure for large scale dense rgb-d slam. In IROS, 2013.
-  J. Xiao, A. Owens, and A. Torralba. Sun3d: A database of big spaces reconstructed using sfm and object labels. In ICCV, 2013.
-  Z. J. Yew and G. H. Lee. 3DFeat-Net: Weakly Supervised Local 3D Features for Point Cloud Registration. In ECCV, 2018.
-  Q.-Y. Zhou and V. Koltun. Dense scene reconstruction with points of interest. In TOG, 2013.
-  Q.-Y. Zhou and V. Koltun. Simultaneous localization and calibration: Self-calibration of consumer depth cameras. In CVPR, 2014.
-  Q.-Y. Zhou, S. Miller, and V. Koltun. Elastic fragments for dense scene reconstruction. In ICCV, 2013.
-  Q.-Y. Zhou, J. Park, and V. Koltun. Fast global registration. In ECCV, 2016.