1 Related Work
Among the wealth of recent 3D SLAM systems [13, 25, 33, 32, 49, 50, 29, 15, 12], only a few jointly reason about 3D structure, geometric segmentation, and camera trajectory. The most common geometric prior is planarity of the environment. Castle et al. [10] are among the visual SLAM systems that incorporate planar geometry: they augment their visual SLAM system with the ability to detect known planar patches and use them to improve SLAM. Salas-Moreno et al. [39] integrate plane segmentation into the tracking and reconstruction pipeline of a dense surfel-based reconstruction system. Their results show that utilizing a plane segmentation of the environment leads to improved tracking accuracy. Kaess [20] explores a plane-based SLAM formulation wherein the map directly consists of infinite planes that are jointly optimized with the camera pose in a smoothing and mapping (SAM) framework. Ma et al. [28] demonstrate joint inference over a keyframe-based map and a plane segmentation of the environment. The joint formulation with soft plane assignments reduces the drift of the SLAM system. In comparison to the plane-based approaches, the proposed system does not need to explicitly extract planes and imposes scene-wide constraints as opposed to local constraints. Furthermore, sampling-based inference allows soft associations to directions that can be refined and corrected, whereas all but the EM-based CPA-SLAM [28] make hard assignments to specific planes that are not revisited [10, 39, 20]. More closely related to the proposed Stata Center World (SCW)-based approach is the system by Peasley et al. [37], who use the Manhattan World (MW) assumption to impose global constraints on the 2D trajectory of a robot. They show that this yields drift-free SLAM, eliminating the need for loop closures, provided that the MW assumption holds. Bosse et al. [6] essentially use the SCW assumption in the image space via vanishing point (VP) detection. They incorporate VP tracking into a SLAM system to jointly estimate a robot’s trajectory and the sparse 3D location of lines in the environment.
Beyond the aforementioned planar, MW and SCW assumptions, several approaches have been proposed that incorporate human-annotated semantic labels and shapes into SLAM. Bao et al. [3, 2] jointly estimate object and region segmentation of a sparse point cloud in the batch structure-from-motion framework. Similarly, Fioraio et al. [16] jointly perform incremental object detection, mapping and camera pose estimation in what they call semantic bundle adjustment. Xiao et al. [51] show how enforcing semantic label consistency in a 3D reconstruction system leads to better 3D reconstruction by decreasing drift and correcting loop closures. Kundu et al. [26] jointly use dense image segmentation and the raw RGB image captured from a single camera to infer the camera trajectory and, using a conditional random field (CRF) defined over an occupancy grid, a semantic 3D reconstruction. Working with RGB-D cameras instead, Kim et al. [24] use a voxel-based world representation and, for a given RGB-D image, infer the 3D occupancy (i.e. the 3D structure) and the segmentation of the environment into semantic classes. Salas-Moreno et al. [40] are the first to demonstrate a SLAM system that utilizes dense 3D object models as beacons for camera tracking and map representation. Closer in spirit to our approach, Cabezas et al. [8] use a mixture model over scene features (appearance, surface normals and semantic observations) as a prior probability model to discover and encourage scene-wide structure. They show that the learned scene-specific priors improve the 3D reconstruction. In comparison, our method not only discovers scene-wide structure but also connects the scene-wide model to the local 3D reconstruction.
2 Direction-Aware Semi-Dense SLAM
We define direction-aware SLAM as reasoning about the joint distribution of a world map , the trajectory of the perception system and the directional segmentation given observations . Concretely, we represent the map as a set of surfels [21, 48, 17]. Surfels are localized planes with position , orientation , color and radius . For notational clarity let collect all properties of surfel . The SCW segmentation is expressed via surfel labels . The world is observed via an RGB-D camera at poses where indexes the pose at the reception of the th camera frame. From the RGB-D image, we obtain point observations , surface normal observations , and surface color . We collect all observations of the th frame in the variable . Hence, the direction-aware SLAM problem amounts to inference over the posterior:
(1)
We perform inference on this direction-aware SLAM posterior by interleaving inference about the three subproblems of localization, mapping and directional segmentation:
(2)  
(3)  
(4) 
To accommodate operation at camera frame rate, the inference is split into two main parts: (1) real-time maximum likelihood camera pose estimation (Eq. (2)) and (2) sampling-based joint inference on segmentation and map (Eq. (3) and (4)), which runs in the background. An overview of the direction-aware SLAM system is depicted in Fig. LABEL:fig:sparseFusionArch.
Instead of aiming to represent all surfaces in the environment densely, as in related work [21, 48, 17], we sample the surfaces of the environment sparsely with a bias towards areas of high intensity gradient, for two reasons: (1) a sparse sampling of environment surfaces captures the majority of surfaces and scene structure, and (2) a bias towards areas of high intensity gradient captures visually salient regions for camera tracking [15]. To substantiate the first point, Fig. LABEL:fig:planeSparsity shows the percentage of inlier scene points to a randomly sampled set of planes as a function of the number of planes, across all 1449 scenes of the NYU v2 dataset [30]. As few as planes are enough to describe an average of of the scene.
3 Direction-Aware Camera Pose Estimation
For each observed RGB-D frame we run the iterative closest point (ICP) algorithm to find a local optimum of the camera pose as well as the data association between the observations and the global map surfels. The optimization of the camera pose given a projective data association amounts to minimizing the negative log-likelihood of the camera pose:
(5)  
(6)  
(7) 
where projects a 3D point into the image space. Note that while we have used the familiar notation for surfel properties (here , ), in practice sample-based estimates, computed as described in Sec. 6, are used.
This cost function combines a point-to-plane (p2pl) and a photometric (photo) cost as employed by [50]. The probabilistic interpretation was developed in [41] and an extension to include a photometric term is straightforward [23]. The common strategy to obtain a camera pose estimate given data association is to Taylor-expand the error terms around the current transformation estimate:
(8) 
where we have collected the individual derivatives and error terms into the rows of and respectively. The variable is a small perturbation of the transformation in the tangent space to at the current estimate of the transformation . The least-squares solution for the (small) motion along the manifold is obtained via the standard pseudo-inverse . As noted previously by Kerl et al. [22], the term is the Fisher information matrix of the estimator. By the Cramér-Rao bound [11], the variance of the estimate is lower-bounded by the inverse of the Fisher information matrix. Therefore, the entropy of the estimate is lower-bounded via
(9)
The task of the perception system is to decrease this lower bound on the true variance and entropy to enable more certain estimates.
Our variant of ICP, outlined in Algorithm 1, incrementally adds planes to the cost function until a low enough entropy is reached. Planes are chosen in a round-robin style from each of the Stata Center World segments in order of decreasing surfel texture gradient strength. Intuitively, a diverse set of observed plane orientations better constrains the point-to-plane cost function (at least three differently oriented planes have to be observed to constrain the system fully). Preference for high-gradient image regions is important for the photometric part of the ICP cost function. Similar to the approach by Dellaert et al. [14], the proposed ICP variant selectively integrates informative observations, which decreases the number of necessary observations in practice and thus speeds up camera tracking.
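The selection loop described above can be sketched as follows. This is a minimal illustration and not a reproduction of Algorithm 1: the function names, the representation of each observation as a single Jacobian row, and the stopping threshold `max_entropy` are assumptions made for the example. The entropy is that of a Gaussian whose covariance is the inverse of the accumulated information matrix, which is the lower bound discussed in the text.

```python
import math

def logdet_spd(M):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return -math.inf  # singular: the pose is not fully constrained yet
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))

def entropy_lower_bound(info):
    """Entropy of a Gaussian whose covariance is the inverse of `info`."""
    d = len(info)
    return 0.5 * (d * math.log(2.0 * math.pi * math.e) - logdet_spd(info))

def select_observations(segments, max_entropy):
    """Round-robin selection of observations across directional segments.

    segments: one non-empty list of Jacobian rows per directional segment,
              each list pre-sorted by decreasing texture gradient strength.
    Returns the chosen (segment index, row) pairs and the final entropy bound.
    """
    d = len(segments[0][0])
    info = [[0.0] * d for _ in range(d)]   # running J^T J
    chosen = []
    cursors = [0] * len(segments)
    while any(c < len(s) for c, s in zip(cursors, segments)):
        for k, seg in enumerate(segments):
            if cursors[k] >= len(seg):
                continue                    # this segment is exhausted
            row = seg[cursors[k]]
            cursors[k] += 1
            chosen.append((k, row))
            for a in range(d):
                for b in range(d):
                    info[a][b] += row[a] * row[b]
            if entropy_lower_bound(info) <= max_entropy:
                return chosen, entropy_lower_bound(info)
    return chosen, entropy_lower_bound(info)
```

With three segments whose Jacobian rows point along different axes, the loop stops as soon as one observation per segment has been added, which mirrors the remark that at least three differently oriented planes are needed before the bound becomes finite.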
4 Directional Segmentation
Under the Stata Center World model we assume that the surface normal distribution of surfels has characteristic, low-entropy patterns, as leveraged in related work by Straub et al. [45, 43, 44]. Similar to [43], we capture the notion of the Stata Center World model, namely that the surfel surface normal distribution consists of some variable, unknown number of clusters, with a Dirichlet process von Mises-Fisher (vMF) mixture model. Following the proposal of [36], we impose spatial smoothness of the Stata Center World segmentation by assuming a Markov random field (MRF) over the segmentation that encourages uniform labeling inside a set of neighboring surfels of surfel .
From a generative standpoint, this model first samples a countably infinite set of cluster weights , von Mises-Fisher means , and concentrations from a Dirichlet process with concentration and base measure :
(10) 
To define the base measure, we utilize the conjugate prior for the von Mises-Fisher distribution, which in general is only known up to proportionality [35]:
(11)
where . The parameters of the prior are the directional mode and and where can be understood as pseudo-counts and as the concentration mode. Second, given the cluster weights and the local neighborhood , a label is sampled to assign each surfel to a von Mises-Fisher distribution :
(12) 
The MRF smoothness component in practice helps speed up inference and leads to more uniform segmentations in the face of noise. It takes the form:
(13) 
where is the weight of the MRF contribution and is if and otherwise.
5 Direction-Aware Mapping
We use another Markov random field over neighboring surfels to express a local planarity assumption over points in the same directional segment. The MRF connects the scene-wide directional segmentation with local spatial properties. The MRF potential that encapsulates local planarity is obtained by symmetrizing the well-known point-to-plane distance function used in implementations of ICP [38]:
(14) 
While the point-to-plane cost function penalizes the out-of-plane deviation of a point, the MRF potential employed herein can be seen as the product of two Gaussians with variance over the out-of-plane deviation of the respective other surfel location. This geometry is shown in Fig. LABEL:fig:p2pl.
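The symmetrized potential can be illustrated with a short sketch. It assumes the reading given in the paragraph above, i.e. a product of two zero-mean Gaussians over the out-of-plane deviation of each surfel's location from the other surfel's plane; the function name and the parameter `sigma` (the standard deviation of the out-of-plane deviation) are hypothetical, and normalization constants are dropped.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def planarity_log_potential(p_i, n_i, p_j, n_j, sigma):
    """Unnormalized log of the symmetrized point-to-plane potential.

    p_i, p_j: surfel locations; n_i, n_j: unit surface normals.
    """
    # Deviation of p_j from the plane through p_i with normal n_i.
    d_ij = dot(n_i, [b - a for a, b in zip(p_i, p_j)])
    # Deviation of p_i from the plane through p_j with normal n_j.
    d_ji = dot(n_j, [a - b for a, b in zip(p_i, p_j)])
    return -(d_ij ** 2 + d_ji ** 2) / (2.0 * sigma ** 2)
```

Two coplanar surfels yield a log-potential of zero (the maximum), and the value is unchanged when the two surfels swap roles, which is the point of the symmetrization.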
5.1 Observation Models for Mapping
Surfel locations and orientations are observed via the camera located at the estimated pose . Associations between RGB-D observations and map surfels are established using projective data association [32]. Back-facing and occluded surfels are pruned. Occlusion is detected if a surfel observation has low probability. Capturing the camera-frame times at which observations of surfel were taken in the set , we assume an i.i.d. Gaussian observation model for locations :
(15) 
The observation covariances are computed according to a realistic depth camera noise model [34] and incorporate the linearized camera pose uncertainty:
(16) 
The surfel orientation observations are assumed to be i.i.d. von Mises-Fisher distributed:
(17) 
where we have used the inferred camera rotations . Surface normals are extracted using the fast yet robust unconstrained scatter-matrix approach by Badino et al. [1]. It is unclear how camera pose noise and depth image noise influence the surface normal concentration. Hence, we use a conservative observation concentration of , which corresponds to the assumption that of the observed surface normals lie within a solid angle of about around the true surface normal. A more detailed model could be obtained with a controlled experiment similar to [34].
6 Sampling-Based Inference over SCW Map
We now turn to describing how to perform posterior inference on the joint SCW map model given observations from inferred camera poses . Because the directional segmentation involves a Bayesian nonparametric Dirichlet process prior, we rely on Gibbs sampling inference, which in the limit of sampling guarantees samples from the true posterior distribution. The Gibbs sampler iterates over sampling from the conditional distribution of each random variable in the joint SCW map model. In the following we provide details on sampling from each conditional distribution before detailing how samples are used to inform camera tracking.
Sampling Normals
Via Bayes’ law, the conditional distribution of surfel direction , , is proportional to
(18) 
where we have abbreviated with and used that only one of the two out-of-plane Gaussians in the MRF depends on . The first factor stems from the directional Stata Center World mixture model, and the second from the surface normal observation model. To sample from this distribution we derive a close approximation to the out-of-plane Gaussian that has the form of a vMF distribution. This makes the posterior over surface normals von Mises-Fisher distributed, which can be sampled efficiently. The Gaussian distribution on out-of-plane deviations of neighboring points can be rearranged as
(19) 
where . This distribution has the form of a Bingham distribution [4]. To stay in the realm of the von Mises-Fisher distribution, we approximate this Bingham with a vMF distribution using the eigendecomposition of with eigenvalues and associated eigenvectors :
(20)
which is proportional to a vMF distribution with mode and concentration . Figure LABEL:fig:bing2vMF_singleMode shows that the vMF approximation is close to the Bingham distribution for several realistic standard deviations of planar and slightly curved surfaces. In practice, since incorporates only neighbors in the same directional segment (which are therefore likely to lie roughly in the same plane), we find the approximation to work well.
Under this approximation the posterior over the surface normal is indeed proportional to a von Mises-Fisher distribution:
(21) 
where have been rotated into the world camera frame using the appropriate and . An efficient method for sampling from a von Mises-Fisher distribution is outlined in [47].
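For the three-dimensional case relevant to surface normals, a vMF sample can be drawn without rejection because the cosine of the angle to the mode has a closed-form inverse CDF. The sketch below uses that closed form rather than the general-dimension scheme of [47]; the function names are hypothetical, `mu` is assumed to be a unit vector and `kappa` strictly positive.

```python
import math
import random

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sample_vmf_3d(mu, kappa, rng=random):
    """Draw one unit vector from a vMF distribution on the 2-sphere."""
    # Cosine w of the angle to the mode: density proportional to exp(kappa * w)
    # on [-1, 1], inverted in closed form. u lies in (0, 1], so the log is finite.
    u = 1.0 - rng.random()
    w = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    w = max(-1.0, min(1.0, w))
    # Uniform direction in the plane orthogonal to mu.
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(1.0 - w * w)
    # Orthonormal basis (e1, e2) of the tangent plane at mu.
    helper = (1.0, 0.0, 0.0) if abs(mu[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = cross(mu, helper)
    norm = math.sqrt(sum(c * c for c in e1))
    e1 = tuple(c / norm for c in e1)
    e2 = cross(mu, e1)
    return tuple(r * math.cos(phi) * e1[i] + r * math.sin(phi) * e2[i] + w * mu[i]
                 for i in range(3))
```

For large concentrations the samples cluster tightly around `mu`; the expected cosine to the mode is `coth(kappa) - 1/kappa`.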
Sampling Directional Segmentation Labels
We use the Chinese restaurant process (CRP) representation of the Dirichlet process [5, 31] since it lends itself to straightforward sampling-based inference. The posterior for the directional segmentation label of surfel is:
(22) 
where is the number of surfels associated with cluster excluding the th surfel, is the Dirichlet process concentration and is the weight of the MRF contribution. The marginal distribution of surface normal under the prior on the vMF component distribution, , can be derived in closed form for the vMF prior parameters and in dimensions (see Sec. 2.6.3 of [42]):
(23) 
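The structure of this label update can be sketched as follows. The sketch assumes the usual CRP form (existing clusters weighted by their surfel counts, a new cluster weighted by the concentration) and a Potts-style MRF term that counts neighbors sharing the candidate label; the exact expression of Eq. (22) is not reproduced here. The per-cluster vMF log-likelihoods and the log-marginal under the prior are taken as inputs, and all function and parameter names are hypothetical.

```python
import math
import random

def label_log_weights(counts, log_lik, log_marginal, neighbor_labels, alpha, lam):
    """Unnormalized log-probabilities for the label of one surfel.

    counts[k]:       surfels in cluster k, excluding the surfel being resampled
    log_lik[k]:      log vMF likelihood of the surfel normal under cluster k
    log_marginal:    log marginal likelihood of the normal under the prior
    neighbor_labels: current labels of the surfel's MRF neighbors
    alpha, lam:      DP concentration and MRF weight
    Index len(counts) in the result stands for opening a new cluster.
    """
    logw = []
    for k, n_k in enumerate(counts):
        if n_k == 0:
            logw.append(-math.inf)  # empty clusters cannot be re-selected
            continue
        agree = sum(1 for z in neighbor_labels if z == k)
        logw.append(math.log(n_k) + log_lik[k] + lam * agree)
    logw.append(math.log(alpha) + log_marginal)
    return logw

def sample_label(logw, rng=random):
    """Sample an index proportionally to exp(logw), stabilized by the maximum."""
    m = max(logw)
    weights = [math.exp(v - m) for v in logw]
    return rng.choices(range(len(weights)), weights=weights)[0]
```

Working in the log domain and subtracting the maximum before exponentiating avoids underflow when the vMF likelihoods are sharply peaked.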
Sampling vMF Parameters and
Given sampled normals assigned to von Mises-Fisher clusters via labels , the posterior over the th vMF mixture component mode and concentration is:
(24) 
where collects all surfels associated to cluster . With , the posterior parameters and are computed as
(25) 
Sampling Locations
Conditioned on point observations , and a surfel’s neighborhood , a surfel’s position is distributed as:
(26) 
where the observation model is Gaussian as defined in Eq. (15). The MRF potential from Eq. (14) is proportional to:
(27) 
where is the information matrix of a degenerate Gaussian in information form and is its scaled mean. Since the individual distributions are all Gaussian the posterior over surfel location is also Gaussian [7] with the following mean and variance:
(28)  
(29)  
(30) 
where . Note that there is always at least one observation (i.e. ) and therefore the inversion to compute the variance is always well-defined.
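The Gaussian fusion behind this posterior can be sketched in information form. The sketch assumes that observations are already expressed in the world frame and that each neighbor contributes the degenerate information matrix obtained from the symmetrized point-to-plane potential, namely the sum of the two normal outer products divided by the out-of-plane variance; the helper names and the argument layout are hypothetical.

```python
def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (fine for well-conditioned inputs)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def location_posterior(observations, neighbors, n_i, sigma):
    """Posterior mean and covariance of one surfel location.

    observations: list of (x_t, cov_t) pairs, world-frame point and 3x3 covariance
    neighbors:    list of (p_j, n_j) pairs for the surfel's MRF neighbors
    n_i:          current unit normal of the surfel; sigma: out-of-plane std. dev.
    """
    info = [[0.0] * 3 for _ in range(3)]   # accumulated information matrix
    eta = [0.0] * 3                        # accumulated information vector

    def accumulate(weight, point):
        wp = mat_vec(weight, point)
        for r in range(3):
            eta[r] += wp[r]
            for c in range(3):
                info[r][c] += weight[r][c]

    for x_t, cov_t in observations:
        accumulate(inv3(cov_t), x_t)
    for p_j, n_j in neighbors:
        # Degenerate information matrix of the two out-of-plane Gaussians.
        a_ij = [[(n_i[r] * n_i[c] + n_j[r] * n_j[c]) / sigma ** 2
                 for c in range(3)] for r in range(3)]
        accumulate(a_ij, p_j)
    cov = inv3(info)
    return mat_vec(cov, eta), cov
```

Each neighbor term has rank at most two on its own, so the accumulated information matrix is invertible only because at least one full-rank observation is present, in line with the remark above.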
6.1 Estimates Computed from the Samples
We use the samples from the Gibbs sampler to approximately compute means and variances of surfel locations and orientations. Via the law of large numbers and by the construction of the Gibbs sampler, this approach will in the limit converge to the true means and variances [9]. In practice, since the marginal distributions or are mostly concentrated about a single mode, the estimates converge quickly. In our experiments, on the order of ten samples were sufficient to get usable estimates for real-time camera tracking as described in Sec. 3.
Given a set of samples from the distribution of surfel locations , we estimate the mean and variance of the surfel location using the accumulated statistics and :
(31) 
Note that samples are not samples from a Gaussian distribution but the maximum entropy distribution of is a Gaussian with the aforementioned mean and variance. The entropy of this Gaussian is an upper bound on the true entropy of the surfel location distribution and can serve as a scalar indicator of the uncertainty.
From the surfel normal samples we compute the mode of a vMF distribution for camera tracking using the accumulated statistics :
(32) 
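The sample-based estimates can be maintained incrementally. The sketch below assumes that the accumulated statistics are the running sums of the location samples, of their outer products and of the normal samples, and that the vMF mode is taken as the normalized sum of normals; the class and function names are hypothetical.

```python
import math

class SurfelSampleStats:
    """Accumulates Gibbs samples of one surfel's location and normal."""

    def __init__(self):
        self.count = 0
        self.sum_p = [0.0] * 3                        # sum of location samples
        self.sum_pp = [[0.0] * 3 for _ in range(3)]   # sum of outer products p p^T
        self.sum_n = [0.0] * 3                        # sum of unit-normal samples

    def add(self, p, n):
        self.count += 1
        for a in range(3):
            self.sum_p[a] += p[a]
            self.sum_n[a] += n[a]
            for b in range(3):
                self.sum_pp[a][b] += p[a] * p[b]

    def location_mean(self):
        return [s / self.count for s in self.sum_p]

    def location_cov(self):
        # Biased (divide-by-N) estimate: E[p p^T] - mean mean^T.
        m = self.location_mean()
        return [[self.sum_pp[a][b] / self.count - m[a] * m[b] for b in range(3)]
                for a in range(3)]

    def normal_mode(self):
        # Normalized resultant vector of the normal samples.
        norm = math.sqrt(sum(x * x for x in self.sum_n))
        return [x / norm for x in self.sum_n]

def gaussian_entropy_3d(cov):
    """Entropy of a 3D Gaussian; returns -inf for a degenerate covariance."""
    (a, b, c), (d, e, f), (g, h, i) = cov
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det <= 0.0:
        return -math.inf
    return 0.5 * (3.0 * math.log(2.0 * math.pi * math.e) + math.log(det))
```

Because only sums are stored, the memory per surfel is constant regardless of how many samples the background threads produce, and the Gaussian entropy gives the scalar uncertainty indicator mentioned above.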
To compute the most likely directional segment of surfel we would ideally keep a count of the number of times the surfel is assigned to each directional cluster via label . Since the number of clusters keeps growing and we aim for this estimation to be efficient for large numbers of surfels, we only keep track of the most likely cluster assignments incrementally.
7 Implementation
In practice, to use the proposed approach we architect a multi-threaded system as depicted in Fig. LABEL:fig:sparseFusionArch. The five main threads are (1) a real-time data acquisition, camera tracking and observation extraction thread, (2) a nearest-neighbor graph builder thread and (3-5) three Gibbs sampler threads.
Camera tracking utilizes RGB-D frames and the current most likely estimate of the segmentation and surfel map to infer the current camera pose . To be able to deal with fast motions we perform photometric rotational pre-alignment [27] from image pyramid level down to . For the same reason, we run direction-aware ICP from scale pyramid levels down to . The observation extraction algorithm adds new surfels by uniformly sampling so-far unobserved surfaces with a bias towards high-gradient surface areas, similar to [15].
The graph builder thread uses the initial locations of all surfels to maintain a k-nearest-neighbor graph over surfels (here ) using the negative log MRF potential from Eq. (14) as the distance function. Valid neighbors have to be within a Euclidean radius of m. This is an approximation to the directed graph that could be obtained by connecting all surfels within some distance. Retaining only the top closest (under the potential) surfels improves algorithm efficiency without notable differences in the reconstruction results. To deal with deleted and newly added surfels, the thread additionally randomly revisits and potentially updates the nearest neighbors of already incorporated surfels.
We split the Gibbs sampler into three threads, each sampling (at its own speed) from the respective posterior given samples from the other threads. There exists only preliminary research on parallel Gibbs sampling under the name Hogwild Gibbs sampling [19] and it is unclear if there are theoretical guarantees. In practice, breaking the sampler into parallel threads seems to make no difference.
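The neighbor selection performed by the graph builder can be illustrated with a brute-force sketch. It assumes the negative log of the symmetrized point-to-plane potential (up to constants) as the distance and a Euclidean radius gate; the function name and parameters are hypothetical, and the quadratic loop is only for illustration, not a statement about how the actual thread searches for candidates.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def knn_graph(positions, normals, k, radius, sigma):
    """Directed k-nearest-neighbor graph under the planarity cost.

    Returns, for every surfel, the indices of up to k neighbors within
    `radius` that have the lowest negative log MRF potential.
    """
    graph = []
    for i, (p_i, n_i) in enumerate(zip(positions, normals)):
        candidates = []
        for j, (p_j, n_j) in enumerate(zip(positions, normals)):
            if i == j:
                continue
            diff = [b - a for a, b in zip(p_i, p_j)]
            if dot(diff, diff) > radius ** 2:
                continue  # outside the Euclidean gate
            # Sum of squared out-of-plane deviations in both directions.
            cost = (dot(n_i, diff) ** 2 + dot(n_j, diff) ** 2) / (2.0 * sigma ** 2)
            candidates.append((cost, j))
        candidates.sort()
        graph.append([j for _, j in candidates[:k]])
    return graph
```

Under this cost, nearby surfels that lie in the same plane are preferred over surfels that are closer in Euclidean distance but offset along the normal.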
8 Evaluation and Results
In the following we evaluate the proposed direction-aware 3D reconstruction system on various challenging datasets, both quantitatively and qualitatively. All experiments are performed on a machine with an Intel Xeon CPU with 16 cores at 2.4 GHz and an Nvidia GTX 1080 graphics card. As described in Sec. 7, the algorithm utilizes a total of CPU cores for the main inference tasks. Surface normals are computed only sparsely on the CPU wherever needed. The GPU is used for the full-frame operations of pre-alignment and data preprocessing.
Qualitative Reconstruction Results
In Figures LABEL:fig:roundStairs, LABEL:fig:fr2xyz, and LABEL:fig:32D458 we show the RGB-colored and the SCW-segmented 3D reconstructions of different scenes. As can be seen, the maximum a posteriori estimate of the Stata Center World segmentation sensibly partitions the environment according to the surfel directions. The inference extracts the main peaks of the distribution, which correspond to planar regions in the scene. Additionally, low-concentration clusters are inferred that capture noisy, non-planar regions (green in the top and yellow in the bottom row of Fig. LABEL:fig:fr2xyz, yellow in Fig. LABEL:fig:32D458).
Algorithm Operation and Properties
To explore the properties of the algorithm, we discuss timings as well as surfel and sampling statistics collected during the reconstruction of the fr2_xyz dataset [46] displayed in Fig. LABEL:fig:fr2xyz. Figure LABEL:fig:sparseFusionStats (left) shows that the main camera tracking thread mostly runs in less than ms per frame. Runtime increases when the camera moves far away from the scene and ICP processes more points for confident camera tracking (see Fig. LABEL:fig:sparseFusionStats middle). The runtime of the surfel parameter sampling threads scales with the size of the map (compare Fig. LABEL:fig:sparseFusionStats middle). As can be seen in Fig. LABEL:fig:sparseFusionStats (middle), the number of surfels utilized for camera tracking is usually less than surfels, even if an order of magnitude more surfels are in view. This is enabled by the direction- and gradient-aware selection of surfel observations. The statistics in Fig. LABEL:fig:sparseFusionStats (right) show that while the number of surfels in the map keeps growing, the sampling threads yield sufficient samples per surfel.
Camera Tracking Accuracy Comparison
We use the TUM indoor dataset [46] and the synthetic dataset by Handa et al. [18] to evaluate the camera tracking accuracy against ground truth via the absolute trajectory error (ATE) [46] and compare our system to related 3D SLAM systems in Fig. LABEL:fig:ateJoint. Fig. LABEL:fig:ateJoint demonstrates that the proposed directional SLAM system is on par with or better than related algorithms in terms of camera trajectory estimation for datasets without the need for loop closures. The Dir. SLAM Random system uses direct surfel fusion and randomly selects ICP observations. As can be seen, disregarding the directional segmentation decreases tracking accuracy, especially on the real datasets fr2_xyz and fr2_desk.
9 Conclusion
We have introduced the first direction-aware semi-dense SLAM system, which performs joint inference over the directional segmentation, the surfel-based map and the camera pose. Its direction-awareness manifests in that it can utilize the directional segmentation for its other tasks. The use of Gibbs-sampling-based inference on the complex Bayesian nonparametric segmentation and map model in a real-time reconstruction system has not been demonstrated before. Due to the flexibility of Gibbs sampling, this opens up exciting possibilities for inference on more complex and detailed environment models. Having access to samples from the posterior also allows reasoning about uncertainty, which is not possible with the commonly employed mode-seeking inference methods.
References
 [1] H. Badino, D. Huber, Y. Park, and T. Kanade. Fast and accurate computation of surface normals from range images. In ICRA, pages 3084–3091. 2011.
 [2] S. Y. Bao, M. Bagra, Y.W. Chao, and S. Savarese. Semantic structure from motion with points, regions, and objects. In CVPR, pages 2703–2710. 2012.
 [3] S. Y. Bao and S. Savarese. Semantic structure from motion. In CVPR, pages 2025–2032. 2011.
 [4] C. Bingham. An antipodally symmetric distribution on the sphere. The Annals of Statistics, 2(6):1201–1225, 1974.
 [5] D. Blackwell and J. B. MacQueen. Ferguson distributions via pólya urn schemes. The Annals of Statistics, pages 353–355, 1973.
 [6] M. Bosse, R. Rikoski, J. Leonard, and S. Teller. Vanishing points and threedimensional lines from omnidirectional video. The Visual Computer, 19(6):417–430, 2003.

 [7] P. Bromiley. Products and convolutions of Gaussian probability density functions. Technical Report Tina Memo No. 2003-003, University of Manchester.
 [8] R. Cabezas, J. Straub, and J. W. Fisher III. Semantically-Aware Aerial Reconstruction from Multi-Modal Data. In ICCV, 2015.
 [9] G. Casella and E. George. Explaining the gibbs sampler. The American Statistician, 46(3):167–174, 1992.
 [10] R. O. Castle, D. Gawley, G. Klein, and D. W. Murray. Towards simultaneous recognition, localization and mapping for handheld and wearable cameras. In ICRA, 2007.
 [11] H. Cramér. Mathematical Methods of Statistics, volume 9. Princeton University Press, 2016.
 [12] A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt. BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration. TOG, 2017.
 [13] A. J. Davison. Realtime simultaneous localisation and mapping with a single camera. In ICCV, 2003.
 [14] F. Dellaert and R. Collins. Fast imagebased tracking by selective pixel integration. In Proceedings of the ICCV Workshop on FrameRate Vision, 1999.
 [15] J. Engel, T. Schöps, and D. Cremers. LSDSLAM: Largescale direct monocular SLAM. In ECCV, 2014.
 [16] N. Fioraio and L. Di Stefano. Joint detection, tracking and mapping by semantic bundle adjustment. In CVPR, 2013.
 [17] M. Habbecke and L. Kobbelt. A surfacegrowing approach to multiview stereo reconstruction. In CVPR, 2007.
 [18] A. Handa, T. Whelan, J. McDonald, and A. J. Davison. A benchmark for RGBD visual odometry, 3D reconstruction and SLAM. In ICRA, 2014.
 [19] M. Johnson, J. Saunderson, and A. Willsky. Analyzing hogwild parallel Gaussian Gibbs sampling. In NIPS, 2013.
 [20] M. Kaess. Simultaneous localization and mapping with infinite planes. In ICRA, 2015.
 [21] M. Keller, D. Lefloch, M. Lambers, S. Izadi, T. Weyrich, and A. Kolb. Realtime 3D reconstruction in dynamic scenes using pointbased fusion. In International Conference on 3DTVConference, 2013.
 [22] C. Kerl, J. Sturm, and D. Cremers. Dense visual SLAM for RGBD cameras. In IROS, 2013.
 [23] C. Kerl, J. Sturm, and D. Cremers. Robust odometry estimation for RGBD cameras. In ICRA, 2013.
 [24] B.s. Kim, P. Kohli, and S. Savarese. 3D scene understanding by voxelCRF. In ICCV, 2013.
 [25] G. Klein and D. Murray. Parallel tracking and mapping on a camera phone. In ISMAR, 2009.
 [26] A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg. Joint semantic segmentation and 3D reconstruction from monocular video. In ECCV, 2014.
 [27] S. Lovegrove and A. J. Davison. Realtime spherical mosaicing using whole image alignment. In ECCV, 2010.
 [28] L. Ma, C. Kerl, J. Stueckler, and D. Cremers. CPASLAM: Consistent planemodel alignment for direct RGBD SLAM. In ICRA, 2016.
 [29] R. MurArtal, J. M. M. Montiel, and J. D. Tardos. ORBSLAM: a versatile and accurate monocular SLAM system. Transactions on Robotics, 31(5):1147–1163, 2015.
 [30] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.
 [31] R. Neal. Markov chain sampling methods for Dirichlet process mixture models. Journal of computational and graphical statistics, 9(2):249–265, 2000.
 [32] R. A. Newcombe, A. J. Davison, S. Izadi, P. Kohli, O. Hilliges, J. Shotton, D. Molyneaux, S. Hodges, D. Kim, and A. Fitzgibbon. Kinectfusion: Realtime dense surface mapping and tracking. In ISMAR, 2011.
 [33] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in realtime. In ICCV, 2011.
 [34] C. V. Nguyen, S. Izadi, and D. Lovell. Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In 3DIMPVT, 2012.
 [35] G. NunezAntonio and E. GutiérrezPena. A Bayesian analysis of directional data using the von MisesFisher distribution. Communications in Statistics—Simulation and Computation®, 34(4):989–999, 2005.

 [36] P. Orbanz and J. Buhmann. Smooth image segmentation by nonparametric Bayesian inference. In ECCV, 2006.
 [37] B. Peasley, S. Birchfield, A. Cunningham, and F. Dellaert. Accurate online 3D occupancy grids using Manhattan world constraints. In IROS, 2012.
 [38] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In International Conference on 3D Digital Imaging and Modeling, 2001.
 [39] R. F. Salas-Moreno, B. Glocker, P. H. Kelly, and A. J. Davison. Dense planar SLAM. In ISMAR, 2014.
 [40] R. F. SalasMoreno, R. A. Newcombe, H. Strasdat, P. H. Kelly, and A. J. Davison. SLAM++: Simultaneous localisation and mapping at the level of objects. In CVPR, 2013.
 [41] A. Segal, D. Haehnel, and S. Thrun. GeneralizedICP. In RSS, 2009.
 [42] J. Straub. Nonparametric Directional Perception. PhD thesis, Massachusetts Institute of Technology, 2017.
 [43] J. Straub, T. Campbell, J. P. How, and J. W. Fisher III. Smallvariance nonparametric clustering on the hypersphere. In CVPR, 2015.
 [44] J. Straub, J. Chang, O. Freifeld, and J. W. Fisher III. A dirichlet process mixture model for spherical data. In AISTATS, 2015.
 [45] J. Straub, G. Rosman, O. Freifeld, J. J. Leonard, and J. W. Fisher III. The Manhattan frame model – Manhattan world inference in the space of surface normals. In TPAMI, 2017.
 [46] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGBD SLAM systems. In IROS, 2012.
 [47] G. Ulrich. Computer generation of distributions on the msphere. Applied Statistics, pages 158–163, 1984.
 [48] T. Weise, T. Wismer, B. Leibe, and L. Van Gool. Inhand scanning with online loop closure. In ICCV Workshops, 2009.
 [49] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended kinectfusion. 2012.
 [50] T. Whelan, R. F. SalasMoreno, B. Glocker, A. J. Davison, and S. Leutenegger. Elasticfusion: Realtime dense SLAM and light source estimation. IJRR, pages 1697–1716, 2016.
 [51] J. Xiao, A. Owens, and A. Torralba. Sun3D: A database of big spaces reconstructed using SFM and object labels. In ICCV, 2013.