Direction-Aware Semi-Dense SLAM

09/18/2017 ∙ by Julian Straub, et al. ∙ 0

To aid simultaneous localization and mapping (SLAM), future perception systems will incorporate forms of scene understanding. In a step towards fully integrated probabilistic geometric scene understanding, localization and mapping, we propose the first direction-aware semi-dense SLAM system. It jointly infers the directional Stata Center World (SCW) segmentation and a surfel-based semi-dense map while performing real-time camera tracking. The joint SCW map model connects a scene-wide Bayesian nonparametric Dirichlet process von-Mises-Fisher mixture model (DP-vMF) prior on surfel orientations with the local surfel locations via a conditional random field (CRF). Camera tracking leverages the SCW segmentation to improve efficiency via guided observation selection. Results demonstrate improved SLAM accuracy and tracking efficiency at state-of-the-art performance.




1 Related Work

Among the wealth of recent 3D SLAM systems [13, 25, 33, 32, 49, 50, 29, 15, 12], only a few jointly reason about 3D structure, geometric segmentation, and camera trajectory. The most common geometric prior is planarity of the environment. The system of Castle et al. [10] is among the visual SLAM systems that incorporate planar geometry. They augment their visual SLAM system with the ability to detect known planar patches and use them to improve SLAM. Salas-Moreno et al. [39] integrate plane segmentation into the tracking and reconstruction pipeline of a dense surfel-based reconstruction system. Their results show that utilizing a plane segmentation of the environment leads to improved tracking accuracy. Kaess [20] explores a plane-based SLAM formulation wherein the map directly consists of infinite planes which are jointly optimized with the camera pose in a smoothing and mapping (SAM) framework. Ma et al. [28] demonstrate joint inference over a key-frame-based map and a plane segmentation of the environment. The joint formulation with soft plane assignments reduces drift of the SLAM system. In comparison to the plane-based approaches, the proposed system does not need to explicitly extract planes and imposes scene-wide constraints as opposed to local constraints. Furthermore, sampling-based inference allows soft associations to directions that can be refined and corrected, whereas all but the EM-based CPA-SLAM [28] make hard assignments to specific planes that are not revisited [10, 39, 20]. More related to the proposed SCW-based approach is the system by Peasley et al. [37], who use the Manhattan World (MW) assumption to impose global constraints on the 2D trajectory of a robot. They show that this yields drift-free SLAM, eliminating the need for loop closures, given that the MW assumption holds. Bosse et al. [6] essentially use the SCW assumption in the image space via vanishing point (VP) detection. They incorporate VP tracking into a SLAM system to jointly estimate a robot’s trajectory and the sparse 3D location of lines in the environment.

Beyond the aforementioned planar, MW and SCW assumptions, several approaches have been proposed that incorporate human-annotated semantic labels and shapes into SLAM. Bao et al. [3, 2] jointly estimate object and region segmentation of a sparse point cloud in the batch structure from motion framework. Similarly, Fioraio et al. [16] jointly perform incremental object detection, mapping and camera pose estimation in what they call semantic bundle adjustment. Xiao et al. [51] show how enforcing semantic label consistency in a 3D reconstruction system leads to better 3D reconstruction by decreasing drift and correcting loop closures. Kundu et al. [26] jointly use dense image segmentation and the raw RGB image captured from a single camera to infer the camera trajectory and, using a conditional random field (CRF) defined over an occupancy grid, a semantic 3D reconstruction. Working in the realm of RGB-D cameras, Kim et al. [24] use a voxel-based world representation and, for a given RGB-D image, infer the 3D occupancy (i.e. the 3D structure) and the segmentation of the environment into semantic classes. Salas-Moreno et al. [40] are the first to demonstrate a SLAM system that utilizes dense 3D object models as beacons for camera tracking and map representation. Closer in spirit to our approach, Cabezas et al. [8] use a mixture model over scene features (appearance, surface normals and semantic observations) as a prior-probability model to discover and encourage scene-wide structure. They show that the learned scene-specific priors improve the 3D reconstruction. In comparison, our method not only discovers scene-wide structure but also connects the scene-wide model to the local 3D reconstruction.

2 Direction-Aware Semi-Dense SLAM

We define direction-aware SLAM as reasoning about the joint distribution of a world map $m$, the trajectory of the perception system $T$ and the directional segmentation $z$ given observations $x$. Concretely, we represent the map as a set of surfels [21, 48, 17]. Surfels are localized planes with position $p_i$, orientation $n_i$, color $c_i$ and radius $r_i$. For notational clarity let $s_i$ collect all properties of surfel $i$. The SCW segmentation is expressed via surfel labels $z_i$. The world is observed via an RGB-D camera at poses $T_t$, where $t$ indexes the pose at the reception of the $t$th camera frame. From the RGB-D image, we obtain point observations $x^p_t$, surface normal observations $x^n_t$, and surface color $x^c_t$. We collect all observations of the $t$th frame in the variable $x_t$. Hence, the direction-aware SLAM problem amounts to inference over the posterior:

$$p(m, T, z \mid x). \tag{1}$$

We perform inference on this direction-aware SLAM posterior by interleaving inference about the three subproblems of localization, mapping and directional segmentation:

$$T_t = \arg\max_{T} \; p(x_t \mid T, m, z), \tag{2}$$
$$m \sim p(m \mid T, z, x), \tag{3}$$
$$z \sim p(z \mid m, T, x). \tag{4}$$

To accommodate operation at camera frame-rate, the inference is split into two main parts: (1) real-time maximum likelihood camera pose estimation (Eq. (2)) and (2) sampling-based joint inference on segmentation and map (Eq. (3) and (4)), which runs in the background. An overview of the direction-aware SLAM system is depicted in Fig. LABEL:fig:sparseFusionArch.

Instead of aiming to represent all surfaces in the environment densely, as in related work [21, 48, 17], we sample the surfaces of the environment sparsely with a bias towards high intensity gradient areas for two reasons: (1) a sparse sampling of environment surface captures the majority of surfaces and scene structure, (2) a bias towards high intensity gradient areas captures visually salient regions for camera tracking [15]. To substantiate the first point we show the percentage of inlier scene-points to a randomly sampled set of planes as a function of the number of planes in Fig. LABEL:fig:planeSparsity across all 1449 scenes of the NYU v2 dataset [30]. As little as planes are enough to describe an average of of the scene.

3 Direction-aware Camera Pose Estimation

For each observed RGB-D frame we run the iterative closest point (ICP) algorithm to find a local optimum of the camera pose as well as data association between the observations and the global map surfels. The optimization of the camera pose given a projective data association amounts to minimizing the negative log likelihood of the camera pose:


where $\pi(\cdot)$ projects a 3D point into the image space. Note that while we have used the familiar notation for surfel properties, in practice sample-based estimates, computed as described in Sec. 6, are used.

This cost function combines a point-to-plane (p2pl) and a photometric (photo) cost as employed by [50]. The probabilistic interpretation was developed in [41], and an extension to include a photometric term is straightforward [23]. The common strategy to obtain a camera pose estimate given data association is to Taylor-expand the error terms around the current transformation estimate $\hat{T}$:

$$f(\hat{T} \oplus \xi) \approx \| J \xi + b \|_2^2,$$

where we have collected the individual derivatives and error terms into the rows of $J$ and $b$ respectively. The variable $\xi \in \mathbb{R}^6$ is a small perturbation of the transformation in the tangent space to $SE(3)$ at the current estimate of the transformation $\hat{T}$. The least-squares solution for the (small) motion along the manifold is obtained via the standard pseudo inverse $\xi^\star = -(J^\top J)^{-1} J^\top b$. As noted previously by Kerl et al. [22], the term $J^\top J$ is the Fisher information matrix of the estimator. The variance of the estimate can be lower-bounded by the inverse of the Fisher information matrix using the Cramér-Rao bound, $\Sigma_\xi \succeq (J^\top J)^{-1}$ [11]. Therefore, for a Gaussian estimate, the entropy is lower-bounded via

$$H(\xi) \geq \tfrac{1}{2} \ln\!\left( (2\pi e)^6 \left| (J^\top J)^{-1} \right| \right).$$

The task of the perception system is to improve the lower-bound on the true variance and entropy to enable more certain estimates.

Our variant of ICP, outlined in Algorithm 1, incrementally adds planes to the cost function until low enough entropy is reached. Planes are chosen in a round-robin style from each of the Stata Center World segments in order of decreasing surfel texture gradient strength. Intuitively, a diverse set of observed plane orientations better constrains the point-to-plane cost function (at least three differently oriented planes have to be observed to constrain the system fully). Preference for high gradient image regions is important for the photometric part of the ICP cost function. Similar to the approach by Dellaert et al. [14], the proposed ICP variant selectively integrates informative observations, which decreases the number of necessary observations in practice and thus speeds up camera tracking.

get observable surfels by rendering the map
while ICP not converged do
     while uncertainty too large do
         pick surfel with next lower gradient strength in next dir. seg.
         if plane passes occlusion reasoning then
              add surfel to ICP system; update entropy
         end if
     end while
     compute transformation update
end while
Algorithm 1 Direction-aware incremental ICP.
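
The inner loop of Algorithm 1 can be illustrated with a small sketch. It is a simplified stand-in, not the system's implementation: it constrains only the three translational degrees of freedom, uses only the point-to-plane normals, skips occlusion reasoning, and the names `select_surfels`, `entropy_bound`, `sigma` and `max_entropy` are hypothetical. Each selected surfel adds `n n^T / sigma^2` to an information matrix, and selection stops once the Gaussian entropy implied by that matrix drops below a threshold.

```python
import math
from itertools import zip_longest


def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def entropy_bound(info):
    """Entropy of a 3D Gaussian whose covariance is the inverse of `info`."""
    d = det3(info)
    if d <= 1e-12:
        return math.inf  # some direction is still unconstrained
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 3 / d)


def select_surfels(segments, sigma, max_entropy):
    """segments maps a segment label to a list of (gradient, normal) pairs."""
    # Within each segment, visit surfels by decreasing gradient strength.
    queues = [sorted(s, key=lambda item: -item[0]) for s in segments.values()]
    info = [[0.0] * 3 for _ in range(3)]
    chosen = []
    # zip_longest yields one candidate per segment per round (round-robin).
    for candidates in zip_longest(*queues):
        for item in candidates:
            if item is None:  # this segment is exhausted
                continue
            normal = item[1]
            for r in range(3):
                for c in range(3):
                    info[r][c] += normal[r] * normal[c] / sigma ** 2
            chosen.append(item)
            if entropy_bound(info) < max_entropy:
                return chosen, entropy_bound(info)
    return chosen, entropy_bound(info)
```

With three segments whose normals span all three axes, the sketch stops after one surfel per segment; with a single planar segment the entropy stays infinite, which mirrors the remark that at least three differently oriented planes are needed.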

4 Directional Segmentation

Under the Stata Center World model we make the assumption that the surface normal distribution of surfels has characteristic, low-entropy patterns, as leveraged in related work by Straub et al. [45, 43, 44]. Similar to [43], we capture the notion of the Stata Center World model, that the surfel surface normal distribution consists of some variable, unknown number of clusters, by a Dirichlet process von-Mises-Fisher mixture model. Following the proposal of [36], we impose spatial smoothness of the Stata Center World segmentation by assuming a Markov random field (MRF) over the segmentation that encourages uniform labeling inside a set of neighboring surfels $\mathcal{N}_i$ of surfel $i$.

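From a generative standpoint, this model first samples a countably infinite set of cluster weights $\pi_k$, von-Mises-Fisher means $\mu_k$, and concentrations $\tau_k$ from a Dirichlet process with concentration $\alpha$ and base measure $G_0$:

$$\{\pi_k, \mu_k, \tau_k\}_{k=1}^{\infty} \sim \mathrm{DP}(\alpha, G_0).$$

To define the base measure, we utilize the conjugate prior for the von-Mises-Fisher distribution, which in general is only known up to proportionality. For unit vectors in $D = 3$ dimensions it reads

$$p(\mu, \tau; \mu_0, a, b) \propto \left(\frac{\tau}{\sinh \tau}\right)^{a} \exp\left(b\, \tau\, \mu^\top \mu_0\right),$$

where $0 < b < a$. The parameters of the prior are the directional mode $\mu_0$ and $a$ and $b$, where $a$ can be understood as pseudo-counts and $b$ as the concentration mode. Second, given the cluster weights $\pi$ and the local neighborhood $\mathcal{N}_i$, a label $z_i$ is sampled to assign each surfel to a von-Mises-Fisher distribution $\mathrm{vMF}(\mu_{z_i}, \tau_{z_i})$:

$$p(z_i = k \mid \pi, z_{\mathcal{N}_i}) \propto \pi_k \, \psi(z_i = k, z_{\mathcal{N}_i}), \qquad n_i \mid z_i \sim \mathrm{vMF}(\mu_{z_i}, \tau_{z_i}).$$

The MRF smoothness component in practice helps speed up inference and leads to more uniform segmentations in the face of noise. It takes the form:

$$\psi(z_i, z_{\mathcal{N}_i}) = \exp\Big(\lambda \sum_{j \in \mathcal{N}_i} \mathbb{1}_{[z_i = z_j]}\Big),$$

where $\lambda$ is the weight of the MRF contribution and $\mathbb{1}_{[a = b]}$ is $1$ if $a = b$ and $0$ otherwise.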

5 Direction-aware Mapping

We use another Markov random field over neighboring surfels to express a local planarity assumption over points in the same directional segment. The MRF connects the scene-wide directional segmentation with local spatial properties. The MRF potential that encapsulates local planarity is obtained by symmetrizing the well-known point-to-plane distance function used in implementations of ICP [38]. For neighboring surfels $i$ and $j$ in the same directional segment it reads

$$\psi(p_i, n_i, p_j, n_j) = \exp\left(-\frac{\big(n_i^\top (p_j - p_i)\big)^2 + \big(n_j^\top (p_i - p_j)\big)^2}{2\sigma_{pl}^2}\right). \tag{14}$$

While the point-to-plane cost function penalizes the out-of-plane deviation of a point, the MRF potential employed herein can be seen as the product of two Gaussians with variance $\sigma_{pl}^2$ over the out-of-plane deviation of the respective other surfel location. This geometry is shown in Fig. LABEL:fig:p2pl.
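
As a minimal sketch of the symmetrized point-to-plane idea, the following computes the negative log of such a potential for two surfels. The exact functional form, the function name `neg_log_planarity` and the parameter `sigma` (standard deviation of the out-of-plane deviation) are assumptions made for illustration: each surfel's location is measured against the other surfel's plane, and both squared deviations are summed.

```python
def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def neg_log_planarity(p_i, n_i, p_j, n_j, sigma):
    """Negative log of a symmetrized point-to-plane potential (up to a constant)."""
    d_ij = dot(n_i, sub(p_j, p_i))  # deviation of p_j from the plane of surfel i
    d_ji = dot(n_j, sub(p_i, p_j))  # deviation of p_i from the plane of surfel j
    return (d_ij ** 2 + d_ji ** 2) / (2.0 * sigma ** 2)


# Two surfels on the same plane cost nothing; a 2 cm offset along the
# shared normal is penalized by both terms.
coplanar = neg_log_planarity((0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1), 0.01)
offset = neg_log_planarity((0, 0, 0), (0, 0, 1), (1, 0, 0.02), (0, 0, 1), 0.01)
```

The value is symmetric in the two surfels, which is what makes it usable as an undirected MRF potential.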

5.1 Observation Models for Mapping

Surfel locations and orientations are observed via the camera located at the estimated pose $T_t$. Associations between RGB-D observations and map surfels are established using projective data association [32]. Back-facing and occluded surfels are pruned. Occlusion is detected if a surfel observation has low probability. Capturing the camera-frame times at which observations of surfel $i$ were taken in the set $\mathcal{O}_i$, we assume an iid Gaussian observation model for locations $p_i$:

$$x^p_{it} \sim \mathcal{N}\big(T_t^{-1} p_i, \Sigma_{it}\big), \quad t \in \mathcal{O}_i. \tag{15}$$

The observation covariances are computed according to a realistic depth camera noise model [34] and incorporate the linearized camera pose uncertainty:


The surfel orientation observations are assumed to be iid von-Mises-Fisher distributed:


where we have used the inferred camera rotations . Surface normals are extracted using the fast yet robust unconstrained scatter-matrix approach by Badino et al. [1]. It is unclear how camera pose noise and depth image noise influences the surface normal concentration. Hence, we use a conservative observation concentration of which makes the realistic assumption that of the observed surface normals lie within a solid angle of about around the true surface normal. A more detailed model could be obtained with a controlled experiment similar to [34].
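
The link between a vMF concentration and the share of normals that fall inside a cone around the true normal can be computed in closed form on the 2-sphere. The sketch below is illustrative only: the function names and the example concentration are assumptions, and "angle" is interpreted as the half-angle of the cone around the mode. It uses the fact that for a 3D vMF the cosine of the angle to the mode has a density proportional to $\exp(\tau w)$ on $[-1, 1]$.

```python
import math


def vmf_mass_within(tau, angle_deg):
    """Probability that a 3D vMF sample lies within angle_deg of the mode."""
    c = math.cos(math.radians(angle_deg))
    # (1 - exp(-tau (1 - c))) / (1 - exp(-2 tau)), written with expm1 for accuracy
    return math.expm1(-tau * (1.0 - c)) / math.expm1(-2.0 * tau)


def angle_for_mass(tau, mass):
    """Cone half-angle in degrees that contains the given probability mass."""
    c = 1.0 + math.log1p(mass * math.expm1(-2.0 * tau)) / tau
    c = max(-1.0, min(1.0, c))  # guard against rounding outside acos' domain
    return math.degrees(math.acos(c))


# Example with an arbitrary concentration (not a value taken from the text):
example_angle = angle_for_mass(100.0, 0.99)
```

Running `angle_for_mass` for a candidate concentration gives a quick plausibility check of how conservative that concentration is.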

6 Sampling-based Inference over SCW Map

We now turn to describing how to perform posterior inference on the joint SCW map model given observations from inferred camera poses $T$. Because the directional segmentation involves a Bayesian nonparametric Dirichlet process prior, we rely on Gibbs sampling inference, which in the limit of sampling guarantees samples from the true posterior distribution. The Gibbs sampler iterates sampling from the different conditional distributions of each random variable in the joint SCW map model. In the following we provide details on sampling from each conditional distribution before detailing how samples are used to inform camera tracking.

Sampling Normals

Via Bayes’ law, the conditional distribution of surfel direction , , is proportional to


where we have abbreviated with and used that only one of the two out-of-plane Gaussians in the MRF depends on

. The first factor stems from the directional Stata Center World mixture model, and the second from the surface normal observation model. To sample from this distribution we derive a close approximation to the out-of-plane Gaussian that has the form of a vMF distribution. This makes the posterior over surface normals von-Mises-Fisher distributed which can be sampled efficiently. The Gaussian distribution on out-of-plane deviations of neighboring points can be re-arranged as


where . This distribution has the form of a Bingham distribution [4]. To keep in the realm of the von-Mises-Fisher distribution, we approximate this Bingham with a vMF distribution using the eigen decomposition of

with eigenvalues

and associated eigenvectors



which is proportional to a vMF distribution with mode and concentration . Figure LABEL:fig:bing2vMF_singleMode

shows that the vMF approximation is close to the Bingham distribution for several realistic standard deviations of planar and slightly curved surfaces. In practice, since

incorporates only neighbors in the same directional segment (which are therefore likely to lie roughly in the same plane), we find the approximation to work well.

Under this approximation the posterior over surface normal is indeed proportional to a von-Mises-Fisher:


where have been rotated into the world camera frame using the appropriate and . An efficient method for sampling from a von-Mises-Fisher distribution is outlined in [47].
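
For unit vectors in three dimensions, a vMF sample can be drawn without rejection by inverting the CDF of the cosine to the mode. The sketch below uses this 3D shortcut rather than the general-dimension method of [47]; the function name `sample_vmf3` and its arguments are hypothetical.

```python
import math
import random


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def sample_vmf3(mu, kappa, rng):
    """Draw one sample from vMF(mu, kappa) on the 2-sphere; mu must be unit length."""
    u = 1.0 - rng.random()  # in (0, 1], keeps the logarithm finite
    # Inverse CDF of w = cos(angle to mu), whose density is proportional to exp(kappa w).
    w = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    phi = 2.0 * math.pi * rng.random()  # uniform direction around mu
    helper = (1.0, 0.0, 0.0) if abs(mu[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(mu, helper))  # orthonormal basis of the tangent plane
    e2 = cross(mu, e1)
    s = math.sqrt(max(0.0, 1.0 - w * w))
    return tuple(s * (math.cos(phi) * a + math.sin(phi) * b) + w * m
                 for a, b, m in zip(e1, e2, mu))


rng = random.Random(0)
samples = [sample_vmf3((0.0, 0.0, 1.0), 50.0, rng) for _ in range(2000)]
```

The expected cosine to the mode is $\coth(\kappa) - 1/\kappa$, which gives a simple sanity check on the drawn samples.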

Sampling Directional Segmentation Labels

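We use the Chinese restaurant process (CRP) representation of the Dirichlet process [5, 31] since it lends itself to straightforward sampling-based inference. The posterior for the directional segmentation label of surfel $i$ is:

$$p(z_i = k \mid z_{\setminus i}, n_i, \mu, \tau) \propto \begin{cases} N_k^{\setminus i}\, \mathrm{vMF}(n_i; \mu_k, \tau_k)\, \exp\big(\lambda \sum_{j \in \mathcal{N}_i} \mathbb{1}_{[z_j = k]}\big) & \text{for an existing cluster } k, \\ \alpha\, p(n_i; \mu_0, a, b) & \text{for a new cluster}, \end{cases}$$

where $N_k^{\setminus i}$ is the number of surfels associated to cluster $k$ excepting the $i$th surfel, $\alpha$ is the Dirichlet process concentration and $\lambda$ is the weight of the MRF contribution. The marginal distribution of surface normal $n_i$ under the prior on the vMF component distribution, $p(n_i; \mu_0, a, b)$, can be derived in closed form for the vMF prior parameters and in dimensions (see Sec. 2.6.3 [42])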
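
A compact sketch of one such label update is given below. It assumes the standard CRP form combined with an MRF agreement term, and it takes the per-cluster likelihoods and the new-cluster marginal as precomputed inputs; the function names and arguments (`label_weights`, `sample_label`, `lam`, `new_marginal`) are hypothetical.

```python
import math
import random


def label_weights(counts, likelihoods, neighbor_labels, alpha, new_marginal, lam):
    """Unnormalized label weights; the last entry stands for opening a new cluster.

    counts[k]      -- surfels in cluster k, excluding the surfel being resampled
    likelihoods[k] -- likelihood of the surfel normal under cluster k
    """
    weights = []
    for k, (n_k, lik) in enumerate(zip(counts, likelihoods)):
        agree = sum(1 for z in neighbor_labels if z == k)  # neighbors labeled k
        weights.append(n_k * lik * math.exp(lam * agree))
    weights.append(alpha * new_marginal)
    return weights


def sample_label(weights, rng):
    """Draw an index with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r < acc:
            return k
    return len(weights) - 1


w_plain = label_weights([10, 5], [0.2, 0.4], [1, 1, 0], 1.0, 0.08, 0.0)
w_mrf = label_weights([10, 5], [0.2, 0.4], [1, 1, 0], 1.0, 0.08, math.log(2.0))
label = sample_label(w_mrf, random.Random(1))
```

Without the MRF term the two existing clusters tie; with it, the cluster favored by two of the three neighbors becomes twice as likely as the other.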


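Sampling vMF Parameters $\mu_k$ and $\tau_k$

Given sampled normals $n$ assigned to von-Mises-Fisher clusters via labels $z$, the posterior over the $k$th vMF mixture component mode $\mu_k$ and concentration $\tau_k$ is:

$$p(\mu_k, \tau_k \mid n, z; \mu_0, a, b) \propto p(\mu_k, \tau_k; \mu_0, a, b) \prod_{i \in \mathcal{I}_k} \mathrm{vMF}(n_i; \mu_k, \tau_k) \propto p(\mu_k, \tau_k; \mu_N, a_N, b_N),$$

where $\mathcal{I}_k$ collects all surfels associated to cluster $k$. With $\vartheta = b \mu_0 + \sum_{i \in \mathcal{I}_k} n_i$, the posterior parameters $a_N$, $b_N$ and $\mu_N$ are computed as

$$a_N = a + |\mathcal{I}_k|, \qquad b_N = \|\vartheta\|_2, \qquad \mu_N = \vartheta / \|\vartheta\|_2.$$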

Sampling Locations

Conditioned on point observations , and a surfel’s neighborhood , a surfel’s position is distributed as:


where the observation model is Gaussian as defined in Eq. (15). The MRF potential from Eq. (14) is proportional to:


where is the information matrix of a degenerate Gaussian in information form and is its scaled mean. Since the individual distributions are all Gaussian the posterior over surfel location is also Gaussian [7] with the following mean and variance:


where . Note that there is always at least one observation (i.e. ) and therefore the inversion to compute the variance is always determined.
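
The fusion of Gaussian observations with degenerate (rank-deficient) planarity terms is easiest to see in information form, where information matrices and scaled means simply add. The sketch below is a simplified illustration under stated assumptions: observations are already expressed in world coordinates, observation noise is isotropic with variance `obs_var`, and each neighbor contributes the two rank-one terms `n n^T / mrf_sigma^2` of a symmetrized point-to-plane potential. All names are hypothetical.

```python
def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    ca, cb, cc = e * i - f * h, f * g - d * i, d * h - e * g
    det = a * ca + b * cb + c * cc
    return [[ca / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [cb / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [cc / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def location_posterior(observations, obs_var, n_i, neighbors, mrf_sigma):
    """Posterior mean and covariance of a surfel location.

    observations -- world-frame point observations of the surfel
    neighbors    -- list of (p_j, n_j) pairs for neighboring surfels
    """
    info = [[0.0] * 3 for _ in range(3)]
    eta = [0.0] * 3  # scaled mean (information vector)
    for x in observations:
        for r in range(3):
            info[r][r] += 1.0 / obs_var
            eta[r] += x[r] / obs_var
    for p_j, n_j in neighbors:
        for n in (n_i, n_j):  # both planes constrain the location
            for r in range(3):
                for c in range(3):
                    lam = n[r] * n[c] / mrf_sigma ** 2
                    info[r][c] += lam
                    eta[r] += lam * p_j[c]
    cov = inv3(info)
    mean = [sum(cov[r][c] * eta[c] for c in range(3)) for r in range(3)]
    return mean, cov


mean, cov = location_posterior([(0.0, 0.0, 1.0)], 0.01, (0.0, 0.0, 1.0),
                               [((1.0, 0.0, 0.9), (0.0, 0.0, 1.0))], 0.1)
```

Because at least one full-rank observation term is always present, the summed information matrix is invertible even though each planarity term alone is singular.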

6.1 Estimates Computed from the Samples

We use the Gibbs-sampler samples to approximately compute means and variances of surfel locations and orientations. Via the law of large numbers and by the construction of the Gibbs sampler, this approach will in the limit converge to the true means and variances [9]. In practice, since the marginal distributions of surfel locations and orientations are mostly concentrated about a single mode, the estimates converge quickly. In our experiments, on the order of ten samples were sufficient to get usable estimates for real-time camera tracking as described in Sec. 3.

Given a set of $K$ samples $\{p_i^{(k)}\}_{k=1}^{K}$ from the distribution of surfel locations $p_i$, we estimate the mean and variance of the surfel location using the accumulated statistics $S_1 = \sum_k p_i^{(k)}$ and $S_2 = \sum_k p_i^{(k)} p_i^{(k)\top}$:

$$\hat{\mu}_{p_i} = \tfrac{1}{K} S_1, \qquad \hat{\Sigma}_{p_i} = \tfrac{1}{K} S_2 - \hat{\mu}_{p_i} \hat{\mu}_{p_i}^\top.$$

Note that the samples are not samples from a Gaussian distribution, but the maximum entropy distribution with the aforementioned mean and variance is a Gaussian. The entropy of this Gaussian is an upper bound on the true entropy of the surfel location distribution and can serve as a scalar indicator of the uncertainty.

From the surfel normal samples $n_i^{(k)}$ we compute the mode of a vMF distribution for camera tracking using the accumulated statistics $S_n = \sum_k n_i^{(k)}$:

$$\hat{\mu}_{n_i} = S_n / \|S_n\|_2.$$

To compute the most likely directional segment of surfel $i$ we would ideally keep a count of the number of times the surfel is assigned to each directional cluster via label $z_i$. Since the number of clusters keeps growing and we aim for this estimation to be efficient for large numbers of surfels, we only keep track of the most likely cluster assignments incrementally.
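
The accumulated statistics described in this section can be maintained incrementally, one Gibbs sample at a time. The class below is a minimal sketch with hypothetical names; it assumes the plain (biased) sample covariance and takes the vMF mode as the normalized sum of the sampled normals.

```python
import math


class SurfelStats:
    """Running statistics over Gibbs samples of one surfel."""

    def __init__(self):
        self.k = 0                                # number of samples seen
        self.s1 = [0.0] * 3                       # sum of locations
        self.s2 = [[0.0] * 3 for _ in range(3)]   # sum of outer products
        self.sn = [0.0] * 3                       # sum of normals

    def add(self, p, n):
        """Fold one sampled location p and normal n into the statistics."""
        self.k += 1
        for r in range(3):
            self.s1[r] += p[r]
            self.sn[r] += n[r]
            for c in range(3):
                self.s2[r][c] += p[r] * p[c]

    def mean(self):
        return [s / self.k for s in self.s1]

    def covariance(self):
        m = self.mean()
        return [[self.s2[r][c] / self.k - m[r] * m[c] for c in range(3)]
                for r in range(3)]

    def normal_mode(self):
        norm = math.sqrt(sum(s * s for s in self.sn))
        return [s / norm for s in self.sn]


stats = SurfelStats()
stats.add((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
stats.add((0.0, 0.0, 3.0), (0.0, 1.0, 0.0))
```

Keeping only sums means the memory per surfel stays constant no matter how many samples the background threads produce.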

7 Implementation

In practice, to use the proposed approach we architect a multi-threaded system as depicted in Fig. LABEL:fig:sparseFusionArch. The main five threads are (1) a real-time data acquisition, camera tracking and observation extraction thread, (2) a nearest neighborhood graph builder thread and (3-5) three Gibbs sampler threads. Camera tracking utilizes RGBD frames and the current most likely estimate of the segmentation and surfel map to infer the current camera pose . To be able to deal with fast motions we perform photometric rotational pre-alignment [27] from image pyramid level down to . For the same reason, we run direction-aware ICP from scale pyramid levels down to . The observation extraction algorithm adds new surfels by uniformly sampling so-far unobserved surfaces with a bias towards high gradient surface areas similar to [15]. The graph builder thread uses the initial locations of all surfels to maintain a k-nearest-neighbor graph over surfels (here ) using the negative log MRF potential from Eq. (14) as the distance function. Valid neighbors have to be within a Euclidean radius of  m. This is an approximation to the directed graph that could be obtained by connecting all surfels within some distance. Retaining only the top closest (under the potential) surfels improves algorithm efficiency without notable differences in the reconstruction results. To deal with deleted and newly added surfels, the thread additionally randomly revisits and potentially updates the nearest neighbors of already incorporated surfels. We split the Gibbs sampler into three threads each sampling (at its own speed) from the respective posterior given samples from the other threads. There exists only preliminary research on parallel Gibbs sampling under the name Hogwild Gibbs sampling [19] and it is unclear if there are theoretical guarantees. In practice breaking the samplers into parallel threads seems to make no difference.

8 Evaluation and Results

In the following we evaluate the proposed direction-aware 3D reconstruction system on various challenging datasets quantitatively as well as qualitatively. All experiments are performed on a machine with an Intel Xeon CPU with 16 cores at 2.4 GHz and an Nvidia GTX-1080 graphics card. As described in Sec. 7, the algorithm utilizes a total of five CPU cores for the main inference tasks, one per thread. Surface normals are computed only sparsely on CPU wherever needed. The GPU is used for the full-frame operations of pre-alignment and data preprocessing.

Qualitative Reconstruction Results

In Figures LABEL:fig:roundStairs, LABEL:fig:fr2xyz, and LABEL:fig:32D458 we show the RGB-colored and the SCW segmented 3D reconstructions of different scenes. As can be seen, the maximum a-posteriori estimate of the Stata Center World segmentation sensibly partitions the environment according to the surfel directions. The inference extracts the main peaks of the distribution which correspond to planar regions in the scene. Additionally, low concentration clusters are inferred that capture noisy, non-planar regions (green in top and yellow in bottom row of Fig. LABEL:fig:fr2xyz, yellow in Fig. LABEL:fig:32D458).

Algorithm Operation and Properties

To explore the properties of the algorithm, we discuss timings, surfel and sampling statistics collected during the reconstruction of the fr2_xyz dataset [46] displayed in Fig. LABEL:fig:fr2xyz. Figure LABEL:fig:sparseFusionStats (left) shows that the main camera tracking thread mostly runs in less than ms per frame. Runtime increases when the camera moves far away from the scene and ICP processes more points for confident camera tracking (see Fig. LABEL:fig:sparseFusionStats middle). The runtime of the surfel parameter sampling threads scales with the size of the map (compare Fig. LABEL:fig:sparseFusionStats middle). As can be seen in Fig. LABEL:fig:sparseFusionStats (middle), the number of surfels utilized for camera tracking is usually less than surfels even if a magnitude more surfels are in view. This is enabled by the direction and gradient-aware selection of surfel observations. The statistics in Fig. LABEL:fig:sparseFusionStats (right) show that while the number of surfels in the map keeps growing, the sampling threads yield sufficient samples per surfel.

Camera Tracking Accuracy Comparison

We use the TUM indoor dataset [46] and the synthetic dataset by Handa et al. [18] to evaluate the camera tracking accuracy against ground truth via the absolute trajectory error (ATE) [46] and compare our system to related 3D SLAM systems in Fig. LABEL:fig:ateJoint. Fig. LABEL:fig:ateJoint demonstrates that the proposed directional SLAM system is on par with or better than related algorithms in terms of camera trajectory estimation for datasets without the need for loop closures. The Dir. SLAM Random system uses direct surfel fusion and randomly selects ICP observations. As can be seen, disregarding the directional segmentation decreases tracking accuracy, especially on the real datasets fr2_xyz and fr2_desk.
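
For orientation, the core of an ATE computation is a root-mean-square error over corresponding camera positions. The sketch below is deliberately reduced: it assumes the two trajectories are already time-associated and rigidly aligned, steps a full evaluation has to perform first, and the function name `ate_rmse` is hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational differences between two aligned trajectories."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [sum((a - b) ** 2 for a, b in zip(e, g))
               for e, g in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)]
error = ate_rmse(est, gt)
```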

9 Conclusion

We have introduced the first direction-aware semi-dense SLAM system which performs joint inference over directional segmentation, surfel-based map and camera pose. Its direction-awareness manifests in that it can utilize the directional segmentation for its other tasks. The use of Gibbs-sampling-based inference on the complex Bayesian nonparametric segmentation and map model in a real-time reconstruction system has not been demonstrated before. Due to the flexibility of Gibbs-sampling this opens up exciting possibilities for inference on more complex and detailed environment models. Having access to samples from the posterior also allows reasoning about uncertainty which is not possible with the commonly employed mode-seeking inference methods.


  • [1] H. Badino, D. Huber, Y. Park, and T. Kanade. Fast and accurate computation of surface normals from range images. In ICRA, pages 3084–3091. 2011.
  • [2] S. Y. Bao, M. Bagra, Y.-W. Chao, and S. Savarese. Semantic structure from motion with points, regions, and objects. In CVPR, pages 2703–2710. 2012.
  • [3] S. Y. Bao and S. Savarese. Semantic structure from motion. In CVPR, pages 2025–2032. 2011.
  • [4] C. Bingham. An antipodally symmetric distribution on the sphere. The Annals of Statistics, 2(6):1201–1225, 1974.
  • [5] D. Blackwell and J. B. MacQueen. Ferguson distributions via Pólya urn schemes. The Annals of Statistics, pages 353–355, 1973.
  • [6] M. Bosse, R. Rikoski, J. Leonard, and S. Teller. Vanishing points and three-dimensional lines from omni-directional video. The Visual Computer, 19(6):417–430, 2003.
  • [7] P. Bromiley. Products and convolutions of Gaussian probability density functions. Technical Report Tina Memo No. 2003-003, University of Manchester.
  • [8] R. Cabezas, J. Straub, and J. W. Fisher III. Semantically-Aware Aerial Reconstruction from Multi-Modal Data. In ICCV, 2015.
  • [9] G. Casella and E. George. Explaining the Gibbs sampler. The American Statistician, 46(3):167–174, 1992.
  • [10] R. O. Castle, D. Gawley, G. Klein, and D. W. Murray. Towards simultaneous recognition, localization and mapping for hand-held and wearable cameras. In ICRA, 2007.
  • [11] H. Cramér. Mathematical Methods of Statistics, volume 9. Princeton University Press, 2016.
  • [12] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt. BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration. TOG, 2017.
  • [13] A. J. Davison. Real-time simultaneous localisation and mapping with a single camera. In ICCV, 2003.
  • [14] F. Dellaert and R. Collins. Fast image-based tracking by selective pixel integration. In Proceedings of the ICCV Workshop on Frame-Rate Vision, 1999.
  • [15] J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In ECCV, 2014.
  • [16] N. Fioraio and L. Di Stefano. Joint detection, tracking and mapping by semantic bundle adjustment. In CVPR, 2013.
  • [17] M. Habbecke and L. Kobbelt. A surface-growing approach to multi-view stereo reconstruction. In CVPR, 2007.
  • [18] A. Handa, T. Whelan, J. McDonald, and A. J. Davison. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In ICRA, 2014.
  • [19] M. Johnson, J. Saunderson, and A. Willsky. Analyzing hogwild parallel Gaussian Gibbs sampling. In NIPS, 2013.
  • [20] M. Kaess. Simultaneous localization and mapping with infinite planes. In ICRA, 2015.
  • [21] M. Keller, D. Lefloch, M. Lambers, S. Izadi, T. Weyrich, and A. Kolb. Real-time 3D reconstruction in dynamic scenes using point-based fusion. In International Conference on 3DTV-Conference, 2013.
  • [22] C. Kerl, J. Sturm, and D. Cremers. Dense visual SLAM for RGB-D cameras. In IROS, 2013.
  • [23] C. Kerl, J. Sturm, and D. Cremers. Robust odometry estimation for RGB-D cameras. In ICRA, 2013.
  • [24] B.-s. Kim, P. Kohli, and S. Savarese. 3D scene understanding by voxel-CRF. In ICCV, 2013.
  • [25] G. Klein and D. Murray. Parallel tracking and mapping on a camera phone. In ISMAR, 2009.
  • [26] A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg. Joint semantic segmentation and 3D reconstruction from monocular video. In ECCV, 2014.
  • [27] S. Lovegrove and A. J. Davison. Real-time spherical mosaicing using whole image alignment. In ECCV, 2010.
  • [28] L. Ma, C. Kerl, J. Stueckler, and D. Cremers. CPA-SLAM: Consistent plane-model alignment for direct RGB-D SLAM. In ICRA, 2016.
  • [29] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos. ORB-SLAM: a versatile and accurate monocular SLAM system. Transactions on Robotics, 31(5):1147–1163, 2015.
  • [30] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.
  • [31] R. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of computational and graphical statistics, 9(2):249–265, 2000.
  • [32] R. A. Newcombe, A. J. Davison, S. Izadi, P. Kohli, O. Hilliges, J. Shotton, D. Molyneaux, S. Hodges, D. Kim, and A. Fitzgibbon. Kinectfusion: Real-time dense surface mapping and tracking. In ISMAR, 2011.
  • [33] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In ICCV, 2011.
  • [34] C. V. Nguyen, S. Izadi, and D. Lovell. Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In 3DIMPVT, 2012.
  • [35] G. Nunez-Antonio and E. Gutiérrez-Pena. A Bayesian analysis of directional data using the von Mises-Fisher distribution. Communications in Statistics: Simulation and Computation, 34(4):989–999, 2005.
  • [36] P. Orbanz and J. Buhmann. Smooth image segmentation by nonparametric Bayesian inference. In ECCV, 2006.
  • [37] B. Peasley, S. Birchfield, A. Cunningham, and F. Dellaert. Accurate on-line 3D occupancy grids using Manhattan world constraints. In IROS, 2012.
  • [38] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In International Conference on 3-D Digital Imaging and Modeling, 2001.
  • [39] R. F. Salas-Moreno, B. Glocker, P. H. Kelly, and A. J. Davison. Dense planar SLAM. In ISMAR, 2014.
  • [40] R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison. SLAM++: Simultaneous localisation and mapping at the level of objects. In CVPR, 2013.
  • [41] A. Segal, D. Haehnel, and S. Thrun. Generalized-ICP. In RSS, 2009.
  • [42] J. Straub. Nonparametric Directional Perception. PhD thesis, Massachusetts Institute of Technology, 2017.
  • [43] J. Straub, T. Campbell, J. P. How, and J. W. Fisher III. Small-variance nonparametric clustering on the hypersphere. In CVPR, 2015.
  • [44] J. Straub, J. Chang, O. Freifeld, and J. W. Fisher III. A Dirichlet process mixture model for spherical data. In AISTATS, 2015.
  • [45] J. Straub, G. Rosman, O. Freifeld, J. J. Leonard, and J. W. Fisher III. The Manhattan frame model – Manhattan world inference in the space of surface normals. In TPAMI, 2017.
  • [46] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In IROS, 2012.
  • [47] G. Ulrich. Computer generation of distributions on the m-sphere. Applied Statistics, pages 158–163, 1984.
  • [48] T. Weise, T. Wismer, B. Leibe, and L. Van Gool. In-hand scanning with online loop closure. In ICCV Workshops, 2009.
  • [49] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended kinectfusion. 2012.
  • [50] T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger. Elasticfusion: Real-time dense SLAM and light source estimation. IJRR, pages 1697–1716, 2016.
  • [51] J. Xiao, A. Owens, and A. Torralba. Sun3D: A database of big spaces reconstructed using SFM and object labels. In ICCV, 2013.