Simultaneous Localization and Layout Model Selection in Manhattan Worlds

09/11/2018 ∙ by Armon Shariati, et al.

In this paper, we will demonstrate how Manhattan structure can be exploited to transform the Simultaneous Localization and Mapping (SLAM) problem, which is typically solved by a nonlinear optimization over feature positions, into a model selection problem solved by a convex optimization over higher order layout structures, namely walls, floors, and ceilings. Furthermore, we show how our novel formulation leads to an optimization procedure that automatically performs data association and loop closure and which ultimately produces the simplest model of the environment that is consistent with the available measurements. We verify our method on real world data sets collected with various sensing modalities.




I Introduction

This paper describes a novel approach to Simultaneous Localization and Mapping (SLAM) that leverages the Manhattan structure of the environment and reformulates the reconstruction task in terms of a model selection problem. Importantly, the key subproblem of establishing long range correspondences and loop closures is incorporated into the reconstruction procedure and solved automatically as part of the optimization procedure.

While there exist several SLAM approaches that try to incorporate the rectilinear structure of indoor man-made environments within their models by tracking more semantically meaningful features such as lines and planes [1, 2], to our knowledge, we are the first to frame the mapping aspect of the problem entirely as one of model selection. Our ideal model is one that would resemble an architect’s blueprint which outlines the location of all large static layout structures, namely walls, floors, and ceilings. Such a generative model, even partially complete, could not only enable a robot to track its position and orientation within the environment with greater precision, but could also serve as a strong prior for occupancy inference, as well as object detection and completion. We demonstrate on real data that our novel formulation, based on the principle of Occam’s razor (i.e. select the simplest model of those that best describe the data), jointly produces an optimized trajectory and a compact map representation.

II Related Work

This work aims to simultaneously address several problems that collectively intersect a few broader research topics in robotics and computer vision. At the heart of our system is an optimization over a series of robot poses and landmarks, which places it squarely in the SLAM domain. On the other hand, our contributions to automated layout-model selection extend its relevance to loop closure and automated floor plan generation.

II-A Simultaneous Localization and Mapping

The SLAM problem has been at the forefront of robotics research for well over two decades. While space constraints prohibit a comprehensive review, the dichotomy highlighted in [3] between non-structural and structural SLAM systems is quite useful when considering where the contributions of this paper best fit.

The more popular and mature non-structural SLAM systems such as [4, 5, 2] prioritize generalizability across environments at the cost of accuracy as they refrain from introducing any hard constraints between landmarks and poses, and among landmarks themselves. Meanwhile, the recently popular structural SLAM systems such as [6, 1, 7] leverage structural cues in order to provide geometric constraints, which can be used to improve accuracy at the cost of universality. Our present work falls in the latter category.

The work most closely related to our own among this body of literature would be [8]. Although nearly identical in terms of an initial approach and output, their emphasis on the computational limitations of previous SLAM approaches leads them to a Bayesian filtering based technique. Furthermore, their algorithm does not address the issues surrounding data association and automatic model selection. While [1] also touches upon several of the issues we seek to address, and approaches them in a similar fashion using similar tools, they too overlook the challenge of compact map synthesis and loop closure.

Semantic SLAM [9] is a relatively new body of work which shares many of our own objectives in a more general context. However, a key component of our approach is the assumption of underlying rectilinear structure which we explicitly exploit to improve the quality of our reconstruction.

II-B Loop Closure

As the results in this paper are also relevant to visual loop closure, it is worth reviewing a few existing approaches to the problem. A more in-depth review of the subject may be found in [10].

What distinguishes our work here from many of the other key-frame based SLAM systems that also perform loop detection and closure, such as [5], is that our system is able to solve the loop detection problem and the belief-update problem in a single optimization framework instead of relying on separate modules. While this combinatorial optimization problem is NP-hard, we are able to reformulate the search as a convex optimization problem, which makes this approach feasible.

Latif et al. [11] also observed that sparsity can be leveraged in the context of loop closure. However, while their approach is effectively a search through a set of correspondences to find one sparse set of basis vectors for reconstruction, we invert the problem by trying to instead maximize the total number of correspondences, which allows us to solve the more general problem of data association across temporal frames in addition to loop closure.

II-C Automated Floor Plan Generation

Constructing a meaningful representation of an indoor environment is at the heart of automated floor plan generation [12]. While we may share many of the motivations behind the works within this body of literature, our problem domain is that of robotic exploration. As a result, many of the models used in these approaches are more complex than is necessary for our purposes and, furthermore, entirely overlook the localization aspect of the SLAM problem.

Among the more complex and expressive models are those described in [13] and [14], both of which lift the Manhattan World assumption in order to output a set of watertight polyhedra representing the boundaries of rooms in the environment. These techniques also focus on volumetric labeling of the space.

While both [15] and [16] present solutions for use on mobile platforms, Liu et al. take a learning approach to the problem, whereas the authors of [16] leverage the potential of user intervention.

Another approach, relying on image panoramas alone, can be found in [17], which formulates the floor plan reconstruction as a shortest path problem.

III Technical Approach

In this section we describe the main features of our structural analysis procedure. To ground our subsequent discussion we begin with a brief description of the input data and the hardware systems used to acquire it in our subsequent experiments. Figure 1 shows an example of one of our sensor rigs. Despite slight differences among sensor configurations, every rig features a stereo pair of cameras, hardware synchronized to an inertial measurement unit (IMU), as well as a depth sensor that captures low-resolution depth images up to a range of about 6 meters. The data from the stereo cameras and IMU are used to drive a stereo based MSCKF Visual-Inertial Odometry (VIO) system [18]. Further details surrounding each sensor are provided in Section IV.

The end result is that our sensor suites provide the analysis algorithm with a set of depth maps along with initial estimates for the relative motion of the sensor rig over time and an estimate for the direction of the gravity vector in each frame.

Fig. 1: Sensor rig used to acquire data. Annotated in yellow are the PMD Monstar depth sensor and a custom stereo pair. The camera to the right of the Monstar and the center stereo camera are not used.

We note that similar datasets could be acquired using other means. One could acquire depth maps using a LIDAR sensor like the Velodyne puck or from a passive stereo system. Similarly pose information could be derived from monocular Visual-Inertial Odometry or from wheel encoders on a moving platform. The proposed analysis would still be applicable in all of these cases.

The first stage in the analysis involves processing each frame in the depth map separately to extract salient axis-aligned planar fragments of layout structure, called layout segments. The first step in this process involves projecting the depth points into the plane defined by the measured gravity vector and then rotating the resulting 2D point set in one degree increments to find a yaw orientation that minimizes the entropy of the resulting point distribution. This algorithm is described in a number of previous works including [19, 20], and [21] where it is referred to as an entropy compass. Upon completion, this procedure recovers the orientation of the frame with respect to the prevailing Manhattan structure. Once this has been done, the system labels pixels in the depth map according to the axis alignment of the surrounding patch, and then groups them together using a connected components procedure as shown in Figure 2. Finally, using an inverse perspective projection, each cluster of pixels is projected onto the axis-aligned plane in the scene whose location is defined by the centroid of the 3D points in the component. A layout segment is defined by the extent of the projected 3D points in a cluster and the location of the plane on which they reside.
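The yaw search described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `entropy_compass`, the 5 cm histogram bin width, and the choice to score a rotation by the sum of the Shannon entropies of the two coordinate histograms are all assumptions made here. Because a Manhattan layout repeats every 90 degrees, the search only covers yaw angles in [0, 90).

```python
import numpy as np

def entropy_compass(points_2d, bin_size=0.05, step_deg=1.0):
    """Return the yaw in degrees, within [0, 90), that minimizes histogram entropy.

    points_2d: (N, 2) array of depth points projected into the plane
    perpendicular to gravity. bin_size (meters) is an assumed parameter.
    """
    def axis_entropy(values):
        # Histogram one coordinate with fixed-width bins, then take Shannon entropy.
        lo, hi = values.min(), values.max()
        n_bins = max(1, int(np.ceil((hi - lo) / bin_size)))
        counts, _ = np.histogram(values, bins=n_bins)
        p = counts[counts > 0] / counts.sum()
        return float(-(p * np.log(p)).sum())

    best_yaw, best_score = 0.0, np.inf
    for yaw in np.arange(0.0, 90.0, step_deg):
        c, s = np.cos(np.radians(yaw)), np.sin(np.radians(yaw))
        rot = np.array([[c, -s], [s, c]])
        rotated = points_2d @ rot.T
        # Axis-aligned walls collapse into a few bins, which lowers the entropy.
        score = axis_entropy(rotated[:, 0]) + axis_entropy(rotated[:, 1])
        if score < best_score:
            best_yaw, best_score = float(yaw), score
    return best_yaw
```

In a full pipeline the search range would be narrowed around the VIO yaw estimate rather than swept over the whole quadrant.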

Fig. 2: Depth map broken into salient surfaces. Red, green, and blue pixels represent $x$-, $y$-, and $z$-axis alignment. A pixel $p$ is assigned to the major axis which maximizes the number of pixels in its neighborhood that would reside on the plane centered at $p$ with the given major axis orientation. If no axis can be assigned with sufficient confidence, or no depth information is recorded at $p$, the pixel is colored black or white, respectively.

In addition to the entropy compass procedure which is applied to each frame individually, the system has an estimate for the relative orientation between each frame derived from the visual-inertial odometry system. These two sources of information are fused to provide a final estimate of the orientation of each frame in the sequence. The relative yaw estimates from the VIO system are used to constrain the range of angles considered in the entropy compass phase and to provide orientation estimates during periods where no axis-aligned surfaces are visible.

The end result of the procedure is described in Figure 3, which shows a top down view of a set of camera frames. Each frame is associated with two coordinate frames of reference, one which indicates the actual orientation of the sensor head and the other indicating an axis-aligned frame derived from the entropy analysis. Each layout measurement, denoting an estimate for the minimum distance between the camera frame and the corresponding layout segment observed at that frame, is depicted by a red or green dotted line.

Fig. 3: A 2-dimensional geometric representation of our model which illustrates a sensor moving through a Manhattan environment making periodic range measurements to various layout structures. Solid lines correspond to layout structures, while dotted lines correspond to measurements. Each distance measurement to a particular layout structure corresponds to the distance computed to the visible layout segment within the depth map captured at that frame.

Fig. 4: A functional representation of our model as a factor-graph. Circles, triangles, and rectangles correspond to nodes encapsulating parameters we wish to estimate: sensor and axis-aligned layout segment positions. Solid lines correspond to measurement factors, which arise from the VIO, entropy analysis, and depth map processing. We extend the traditional factor-graph formulation by including binary correspondence edges, represented by dotted lines. Initially generated by a temporal analysis, the set of hypothetical correspondence edges is also augmented by a user defined heuristic. Our sparse optimization procedure ultimately determines which of these constraints to enforce and which to discard.

This system of measurements can be abstracted into the factor-graph [22] shown in Figure 4. Here the circular nodes on top correspond to axis-aligned frame positions while the triangular and rectangular nodes on the bottom correspond to layout segments. The links between frames correspond to the estimates for interframe motion while the links between the frames and the layout segments correspond to the distance measurements described in Figure 3.

In the sequel we will use the following notation to describe the elements of the model shown in Figure 4.

Let $p_i = (p_i^x, p_i^y, p_i^z)$ denote the position of frame $i$ with respect to an axis-aligned global world frame, while $R_i$ denotes the orientation of the axis-aligned frame with respect to the corresponding sensor frame. Each of the layout segments that we observe will ultimately be associated with a structural supporting layout plane, which is modeled as an axis-aligned surface with infinite extent. Each such layout plane will be modeled with a single parameter. More specifically, we will let $x_j$ denote the coordinate of an infinite layout plane with index $j$ that is perpendicular to the $x$-axis of the model; similarly, $y_k$ denotes the coordinate of a $y$-aligned layout plane with index $k$, and $z_l$ denotes the coordinate of a $z$-aligned layout plane with index $l$.

Correspondences between layout segments are denoted by the dotted lines in Figure 4. These correspondences amount to asserting that two extracted segments lie on the same axis-aligned layout plane. Note that these correspondences would typically link layout segments extracted in different frames but could also link two segments extracted in the same frame. At this stage of the analysis procedure a simple temporal analysis procedure is used to establish correspondences between segments seen in one frame and segments seen in the subsequent frame that have sufficient overlap. This initial set of correspondences will be augmented with longer range correspondences that are automatically discovered in a subsequent step of the process.

We will let the vector $t_i$ denote the estimate for the translation between subsequent axis-aligned frames in the sequence that is derived from the visual odometry system and the orientation estimation procedure. That is, $t_i$ denotes an estimate for the quantity $p_{i+1} - p_i$.

We will let $\theta$ denote a vector formed by stacking the free parameters of our model, that is $p_i$, $x_j$, $y_k$, and $z_l$, for all $i$, $j$, $k$, and $l$. Note that we assume that the camera orientations that align the frames with the Manhattan model, $R_i$, have been estimated using the entropy compass procedure described previously.

In this case the measurement system takes on a particularly simple linear form. Namely, for each measurement from a frame $i$ to an $x$-aligned layout segment we have an equation of the form

$$x_j - p_i^x = d^x_{ij} \tag{1}$$

where $x_j$ denotes the $x$ coordinate associated with the $x$-aligned layout plane associated with the layout segment, $p_i^x$ denotes the $x$ coordinate of the position of frame $i$, and $d^x_{ij}$ denotes the measured offset between the layout segment and the camera as depicted in Figure 3. Note that $d^x_{ij}$ can be signed depending upon where the frame is relative to the layout segment.

For layout segments aligned with the $y$ axis and $z$ axis we would have exactly analogous equations

$$y_k - p_i^y = d^y_{ik} \tag{2}$$

$$z_l - p_i^z = d^z_{il} \tag{3}$$

As previously discussed, the measurements of interframe motion derived from the VIO system and entropy analysis can be modeled as follows

$$p_{i+1} - p_i = t_i \tag{4}$$

Given this system of measurements, the task of finding the optimal estimate for the structure of the scene and the trajectory of the sensor based on the factor-graph simply amounts to solving a sparse linear system in a least squares sense.

$$\min_{\theta} \; \lVert A\theta - b \rVert_2^2 \tag{5}$$

This is simply the system formed by stacking the measurement equations, namely Equations 1, 2, 3, and 4, into a single sparse system. The vector $b$ aggregates the right hand sides of the equations including the distance measurements, $d^x_{ij}$, $d^y_{ik}$, $d^z_{il}$, and the translation estimates $t_i$. This sparse system can be solved extremely efficiently even for relatively large systems of measurements.
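As a concrete illustration of stacking the measurement equations, the following sketch builds and solves the least-squares system for a single axis. It is a toy under stated assumptions, not the authors' code: the function name and argument layout are hypothetical, the first frame is pinned to the origin to remove the global translation ambiguity (a gauge choice the text does not discuss), and a dense NumPy solver stands in for the sparse solver that a realistic problem size would call for.

```python
import numpy as np

def solve_layout_least_squares(n_frames, n_planes, odometry, ranges):
    """Solve a one-axis version of the stacked measurement system.

    odometry: list of (i, t) meaning p[i+1] - p[i] is measured as t.
    ranges:   list of (i, j, d) meaning plane[j] - p[i] is measured as d.
    Returns (frame positions, plane coordinates, A, b).
    """
    n = n_frames + n_planes
    rows, rhs = [], []

    # Assumed gauge constraint: pin the first frame at the origin.
    row = np.zeros(n)
    row[0] = 1.0
    rows.append(row)
    rhs.append(0.0)

    # Interframe translation estimates.
    for i, t in odometry:
        row = np.zeros(n)
        row[i + 1], row[i] = 1.0, -1.0
        rows.append(row)
        rhs.append(t)

    # Signed frame-to-plane offset measurements.
    for i, j, d in ranges:
        row = np.zeros(n)
        row[n_frames + j], row[i] = 1.0, -1.0
        rows.append(row)
        rhs.append(d)

    A, b = np.vstack(rows), np.array(rhs)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta[:n_frames], theta[n_frames:], A, b
```

For example, three frames moving one meter at a time toward a single wall, with range readings of 3, 2, and 1 meters, can be passed as `solve_layout_least_squares(3, 1, [(0, 1.0), (1, 1.0)], [(0, 0, 3.0), (1, 0, 2.0), (2, 0, 1.0)])`.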

Fig. 5: An illustration of how our convex solution (below) can improve reconstruction by eliminating the drift still present in the least-squares solution (above). Notice the reduction in the total number of layout segments.

Running this procedure yields the result shown in the second column of Figure 6. Each entry shows a result that captures the overall structure of the hallway but also exhibits the kind of drift typically associated with SLAM solutions; an artifact further highlighted in Figure 5. These reconstruction errors stem from the fact that the initial set of correspondences derived from the stream of depth frames is necessarily incomplete. While correspondences derived from frame to frame analysis are typically correct they fail to capture salient long term matches. For example, when one enters then exits a room it is important to encode the fact that walls in the hallway were in fact previously seen and are not new features in the map. Similarly it is entirely possible to encounter a structural wall, then an opening, and then an entirely new section of the same wall. This problem of establishing long range correspondences is exacerbated by the fact that layout structures, unlike visual point feature landmarks, are extended structures and are rarely visually distinctive. Different sections of the same structure can have different appearances in different locations which can frustrate simple techniques that attempt to establish these long range correspondences.

We note that this problem of establishing long range correspondences subsumes the problem of loop closure which also revolves around the issue of deciding that one structure, a wall in this case, corresponds to one that was observed previously.

We propose a novel method that solves this problem by recasting it as one of model selection, where our goal is to derive the simplest model that is consistent with our observations.

We begin by noting that solutions to Equation 5 suffer from having too many wall surfaces. This is because when a layout structure is encountered again after an intervening break it will be entered again in the map as a new structural layout plane. Our goal then is to discover which of the segments in our overly large model could actually be coincident. Identifying two or more layout structures with each other effectively reduces the number of parameters associated with the model since all of the displacement parameters associated with that set are collapsed to a single value. In this way we effectively compress the model leading to a simpler solution.

We begin by encoding all of the possible or suspected equivalences between layout segments in a set of equations of the following form

$$x_j - x_{j'} = 0 \tag{6}$$

As you would expect, Equation 6 encodes the idea that the $x$-aligned layout plane with index $j$ and the one with index $j'$ are in fact the same. Analogous equations are defined for $y$- and $z$-aligned layout planes.

These possible equivalences can be readily accumulated into a single sparse linear system $C\theta = 0$, where $C$ is a sparse matrix encoding the relationships and $\theta$ is the vector of model parameters described earlier and used in Equation 5.

One possible approach to generating equivalence hypotheses is to simply enumerate all possible equivalences between segments which face the same direction (north, east, south, west). However, this approach leads to an unnecessarily large matrix that contains numerous spurious hypotheses, the effects of which we discuss more thoroughly in Section IV. For now, we adopt the heuristic of enumerating all possible equivalences between segments facing in the same direction that are within 1.5 meters of each other.
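The heuristic can be sketched in a few lines. The representation below is an assumption made for illustration: each segment is a hypothetical `(facing, offset)` pair, and "within 1.5 meters" is read as the difference between the plane offsets of two segments along their shared normal direction. Each returned pair corresponds to one row of the equivalence system, with +1 and -1 in the columns of the two plane parameters.

```python
from itertools import combinations

def candidate_equivalences(segments, max_gap=1.5):
    """List index pairs of segments that might lie on the same layout plane.

    segments: list of (facing, offset) tuples, e.g. ("north", 2.3), where
    offset is the plane coordinate in meters (assumed representation).
    """
    pairs = []
    for (a, (face_a, off_a)), (b, (face_b, off_b)) in combinations(enumerate(segments), 2):
        # Only segments facing the same way and close together are hypothesized equal.
        if face_a == face_b and abs(off_a - off_b) <= max_gap:
            pairs.append((a, b))
    return pairs

def equivalence_rows(pairs, n_planes):
    """Build one row per hypothesis: +1 and -1 on the two plane parameters."""
    rows = []
    for a, b in pairs:
        row = [0.0] * n_planes
        row[a], row[b] = 1.0, -1.0
        rows.append(row)
    return rows
```

The pairwise enumeration is quadratic in the number of segments, which is consistent with the observation that long corridors generate very large hypothesis sets.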

At this point we are not sure which of the equivalences are correct and which are false. This leads to a model selection problem. If there are $n$ possible equivalence relations then there are in principle $2^n$ possible models depending on which of the equivalence relations are enforced, modulo independence issues related to transitive closures among the equivalence relations.

How then can we go about selecting which relations are correct from this exponentially large set of possibilities?

We begin by using the original reconstruction problem as a system that defines a set of possible solutions. We do this by considering the set of values $\theta$ that satisfy:

$$\lVert A\theta - b \rVert_2 \le \delta$$

where $\delta$ bounds the discrepancy between a proposed solution and the available measurements. One way to choose $\delta$ is to base it on the residual obtained at the optimal solution of Equation 5. Alternatively, one can relate $\delta$ to the error that one expects in the measurements based on the sensor model. We could also imagine replacing the $\ell_2$ norm with the $\ell_1$ or $\ell_\infty$ norms.

In either case the result is a convex set of parameters that are sufficiently consistent with the original set of correspondences.

We then view our problem as finding the point in the set that maximizes the number of equivalence relations we can satisfy. Note that maximizing the number of equivalences is equivalent to minimizing the number of parameters in the final model, so our goal is to effectively apply the principle of Occam’s razor to find the simplest model that explains our data.

Formally we can state our goal as follows

$$\min_{\theta} \; \lVert C\theta \rVert_0 \quad \text{subject to} \quad \lVert A\theta - b \rVert_2 \le \delta$$

In this expression, the $\ell_0$ norm of a vector simply counts the number of non-zero entries in its input. This problem formulation is reminiscent of the kinds of problems one encounters in compressed sensing.

While this formulation is what we would ideally like to tackle, the discontinuous nature of the $\ell_0$ norm makes it intractable, so we resort instead to the $\ell_1$ norm, which we can view as a convex relaxation of our original problem.

Our new goal then can be stated as follows

$$\min_{\theta} \; \lVert C\theta \rVert_1 \quad \text{subject to} \quad \lVert A\theta - b \rVert_2 \le \delta \tag{9}$$

Many may notice the similarity between our formulation and the LASSO procedure [23]. LASSO performs subset selection over model coefficients by forcing as many of them to zero as possible, which it achieves by bounding the sum of the absolute values of the regression coefficients. Our approach to model simplification is different, as our procedure reduces model complexity by enforcing equivalence relations encoded in the $C$ matrix.

At this point we note that the optimization problem stated in Equation 9 involves minimizing a convex function subject to a convex constraint which places us squarely in the domain of convex optimization. The resulting problem can be reformulated as solving for the optimal value of a linear objective function subject to a set of linear and convex quadratic constraints. We note that we can solve problems involving hundreds of variables in a matter of seconds due to the sparseness of the underlying systems. In our current implementation we formulate and solve this problem in Matlab using CVX.
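One standard way to arrive at a linear objective with linear and convex quadratic constraints is the epigraph form sketched below. The symbols are assumptions chosen for this illustration: $\theta$ is the stacked parameter vector, $A$ and $b$ define the measurement system, $C$ is the matrix of candidate equivalences, $\delta$ is the error bound, and $s$ is an auxiliary vector with one entry per candidate equivalence.

```latex
\begin{aligned}
\min_{\theta,\, s} \quad & \mathbf{1}^{\top} s \\
\text{subject to} \quad & -s \le C\theta \le s, \\
& \lVert A\theta - b \rVert_2^2 \le \delta^2 .
\end{aligned}
```

At the optimum each $s_k$ equals $\lvert (C\theta)_k \rvert$, so the objective equals $\lVert C\theta \rVert_1$. A modeling tool such as CVX typically performs this kind of transformation automatically.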

Once the problem has been solved we examine the resulting vector $C\theta$ and apply a threshold to decide which of the equivalences should be enforced. We then re-solve the optimization problem enforcing these equivalences

$$\min_{\theta} \; \lVert A\theta - b \rVert_2^2 \quad \text{subject to} \quad C'\theta = 0 \tag{10}$$

where $C'$ denotes the reduced set of enforced equivalences. The extent of each new layout structure is determined by computing the boundary around the individual corresponding layout segments residing on the same plane.
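The thresholding step, together with the transitive closure of the accepted equivalences, can be sketched with a small union-find structure. This is an illustrative sketch only: the function name, the use of union-find, and the inputs (optimized plane offsets plus the list of candidate index pairs) are assumptions made here, not details given in the text.

```python
def merge_equivalent_planes(offsets, pairs, threshold):
    """Group planes whose optimized offsets differ by at most `threshold`.

    offsets: optimized plane coordinates in meters, one per layout segment.
    pairs:   candidate equivalences as (a, b) index pairs.
    Returns a sorted list of index groups, one per merged layout structure.
    """
    parent = list(range(len(offsets)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in pairs:
        # An equivalence is accepted when its residual falls under the threshold.
        if abs(offsets[a] - offsets[b]) <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for i in range(len(offsets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each resulting group shares a single plane parameter in the re-solved problem, which is where the reduction in model size comes from.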

We also introduce an additional hard constraint on the trajectory of the sensor relative to the layout segments. For instance, a positive measurement $d^x_{ij}$ to layout segment $x_j$ also introduces the following linear inequality constraint

$$p_i^x \le x_j$$

This is done in order to ensure that the uncovered model is topologically consistent with what is observed. Accumulating these inequalities yields an additional convex constraint $D\theta \le 0$, which is added to Equations 9 and 10.

IV Experimental Results

| Area ID | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Number of Equivalences Considered | 149 | 3826 | 4889 | 414 | 5568 | 824 |
| Number of Equivalences Accepted | 139 | 2474 | 3430 | 275 | 5564 | 622 |
| Number of Initial Layout Segments in Model | 56 | 291 | 363 | 96 | 192 | 119 |
| Number of Layout Structures After Analysis | 14 | 58 | 74 | 34 | 30 | 26 |
| Complexity Reduction (%) | 75.0 | 80.1 | 79.6 | 64.6 | 84.4 | 78.2 |
| Optimization Time (s) | 0.17 | 1.85 | 10.22 | 0.74 | 16.21 | 2.26 |
| Path Length (m) | 53 | 200 | 249 | 69 | 113 | 67 |

TABLE I: Optimization results in each of the mapped environments.

Each of our 6 experiments entailed generating a map of a different indoor area across the University of Pennsylvania’s campus using our proposed method. We use three different sensor rigs in order to demonstrate robustness. All of them have in common a custom stereo monochrome camera running at 20 Hz with a field of view (FOV) [24]. Together with a hardware synchronized inertial measurement unit (IMU), they provide data to drive the stereo VIO algorithm. The primary difference between rigs is the choice of depth sensor and its frame rate. Rig A (used for Areas 1, 2, and 3) hosts a PMD Monstar time-of-flight depth sensor, which captures resolution frames at 10 Hz and has a FOV of . Rig B (used for Areas 4 and 5) features the same depth sensor, but running at 5 Hz. Lastly, Rig C (used for Area 6) replaces the Monstar with an Orbbec Astra structured light camera, which runs at 30 Hz and has a resolution of and a FOV of .

Agnostic to sensor choice or environment, all computations were carried out using the same set of optimization parameters – another advantageous aspect of our approach. The entropy compass explored a radius of around the estimate of angular displacement provided by the VIO system. The value of used to determine was empirically set to . Finally, the threshold value of used to conclude equivalence between two segments was set to centimeters. All of our computations were carried out using an HP Omen PC 880-160se, which is equipped with 32GB of RAM, an Intel i7-8700K CPU, and an NVIDIA GeForce GTX 1080Ti graphics card.

Figure 6 illustrates a composite of the reconstruction results in each environment using our convex approach, which we compare to reconstructions based on registering layout segments to the axis-aligned frame position they were observed in, as well as the results of the least-squares optimization described in Equation 5. Even though all of these environments can be roughly described as Manhattan, each presented its own unique set of challenges.

For instance, Area 2 contains a few instances of walls oriented at angles, which at times confounded our entropy analysis and, in turn, layout segment detection. Moreover, this caused a “smear” of phantom layout segments to be generated along the face of the offending wall, which also led to an undesirable chain of correspondences between what are two separate walls.

Area 3 is rather large and contains many classrooms along its eastern wing. As a result, redundant candidate layout structures were generated as we encountered the same walls multiple times upon entering and exiting each classroom, which our optimization then had to consider.

While we expected Area 5 to be trivial to model due to its relatively small area and simple topology, we actually observed a much slower time to convergence. Ironically, this was precisely due to its simple corridor structure, as the layout segments generated from observing the dominant walls on either side of the robot formed two large fully connected components in the subgraph of our factor-graph containing only correspondence edges. This inevitably led to an explosion in the number of rows in our $C$ matrix.

Area 6 posed an interesting set of challenges in that it features a modern architecture with many glass surfaces (embedded even in doors), large open areas, and exposed structural I-beams oriented at various angles. As a result, the entropy analysis and layout segment detection were confounded not only by the actual layout itself, but also by missing and corrupted depth measurements.

Table I shows the relationship between the number of equivalences considered and those accepted, as well as that between the number of layout structures found and the original set of layout segments. Also provided is the effect each of these statistics has on the time taken by the optimization to converge. Note that the system ends up selecting a relatively small subset of layout structures from among the total set of segments.
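The Complexity Reduction row of Table I appears to be the relative drop from the number of initial layout segments to the number of layout structures after analysis. This reading is an inference rather than a stated definition, but it reproduces the tabulated percentages, as the short check below shows (the variable and function names are chosen here for illustration).

```python
# Rows copied from Table I, one entry per area (Areas 1 through 6).
initial_segments = [56, 291, 363, 96, 192, 119]
final_structures = [14, 58, 74, 34, 30, 26]

def complexity_reduction(initial, final):
    """Percentage of initial layout segments eliminated by merging."""
    return 100.0 * (initial - final) / initial

reductions = [
    round(complexity_reduction(i, f), 1)
    for i, f in zip(initial_segments, final_structures)
]
print(reductions)
```

For Area 1, for example, (56 - 14) / 56 = 0.75, which matches the 75.0% entry in the table.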

Importantly, almost all of the examples involve situations where the robot needs to perform loop closures to account for situations where the same surface is encountered again after a significant interval of time. These loop closures are automatically detected and factored into the reconstruction as part of our procedure.

It is worth mentioning that in earlier experiments we simply enumerated all possible equivalences. While this was clearly naive, the system still produced a correct reconstruction due to the bound on the allowable reconstruction error. This suggests that our system was able to effectively choose the right set of equivalences to enforce among a sea of spurious ones, which means that one can suggest many possible correspondences secure in the knowledge that the system is capable of weeding out the incorrect ones automatically.

Another noteworthy aspect of our approach is that, unlike most other optimization-based SLAM solutions, our system suffers no significant difference in computation time based on the size of the environment to be mapped. Instead, performance depends on the number of potential correspondences generated.

V Conclusion

In conclusion, we have demonstrated an approach for generating compact reconstructions of Manhattan environments. In locations where a reasonable estimate for one’s rotation can be inferred from the visible Manhattan structure, we can solve the full SLAM problem using convex optimization. Furthermore, our sparse objective enables us to explore the vast combinatorial space of potential data associations and loop closures, which results in a drift-free trajectory alongside a compact representation of the map. We validate our mapping procedure in a set of reasonably representative Manhattan-like indoor environments with multiple sensing modalities.

What is still to be explored however is a way in which to better determine a layout structure’s extent. While we have uncovered an efficient way of estimating the position of pervasive structures within the world, estimating where these structures truly begin and end falls to the less satisfying heuristic of simply merging boundaries of individual layout segments. This is primarily due to the difficulty in modeling the interaction between orthogonal layout structures in a way that is characteristic of most Manhattan environments. This is a challenge we wish to address in our future work.


  • [1] M. Hsiao, E. Westman, and M. Kaess, “Dense Planar-Inertial SLAM with Structural Constraints,” in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 6521–6528.
  • [2] A. Pumarola, A. Vakhitov, A. Agudo, A. Sanfeliu, and F. Moreno-Noguer, “PL-SLAM: Real-time monocular visual SLAM with points and lines,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
  • [3] H. Li, J. Yao, J.-c. Bazin, X. Lu, Y. Xing, and K. Liu, “A Monocular SLAM System Leveraging Structural Regularity in Manhattan World,” in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 2518–2525.
  • [4] H. Li, J. Yao, X. Lu, and J. Wu, “Combining Points and Lines for Camera Pose Estimation and Optimization in Monocular Visual Odometry *,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 1289–1296.
  • [5] R. Mur-Artal, J. M. Montiel, and J. D. Tardos, “ORB-SLAM: A Versatile and Accurate Monocular SLAM System,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
  • [6] F. Camposeco and M. Pollefeys, “Using vanishing points to improve visual-inertial odometry,” in IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 5219–5225.
  • [7] H. Zhou, D. Zou, L. Pei, R. Ying, P. Liu, and W. Yu, “StructSLAM: Visual SLAM with building structure lines,” IEEE Transactions on Vehicular Technology, vol. 64, no. 4, pp. 1364–1375, 2015.
  • [8] P. Kim, B. Coltin, and H. J. Kim, “Linear RGB-D SLAM for Planar Environments,” in European Conference on Computer Vision (ECCV), 2018, pp. 1–16.
  • [9] S. L. Bowman, N. Atanasov, K. Daniilidis, and G. J. Pappas, “Probabilistic data association for semantic SLAM,” in IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 1722–1729.
  • [10] S. Lowry, N. Sunderhauf, P. Newman, J. J. Leonard, D. Cox, P. Corke, and M. J. Milford, “Visual Place Recognition: A Survey,” IEEE Transactions on Robotics, vol. 32, no. 1, pp. 1–19, 2016.
  • [11] Y. Latif, G. Huang, J. Leonard, and J. Neira, “Sparse optimization for robust and efficient loop closing,” Robotics and Autonomous Systems, vol. 93, pp. 13–26, 2017.
  • [12] B. Okorn, X. Xiong, B. Akinci, and D. Huber, “Toward Automated Modeling of Floor Plans,” in Symposium on 3D Data Processing, Visualization and Transmission, 2010.
  • [13] C. Mura, O. Mattausch, and R. Pajarola, “Piecewise-planar Reconstruction of Multi-room Interiors with Arbitrary Wall Arrangements,” Computer Graphics Forum, vol. 35, no. 7, pp. 179–188, Oct. 2016.
  • [14] S. Ochmann, R. Vock, R. Wessel, and R. Klein, “Automatic reconstruction of parametric building models from indoor point clouds,” Computers and Graphics (Pergamon), vol. 54, pp. 94–103, Feb. 2016.
  • [15] C. Liu, J. Wu, and Y. Furukawa, “Floornet: A unified framework for floorplan reconstruction from 3d scans,” CoRR, vol. abs/1804.00090, 2018.
  • [16] V. Angladon, S. Gasparini, and V. Charvillat, “Room floor plan generation on a project tango device,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10705 LNCS, 2018, pp. 226–238.
  • [17] R. Cabral and Y. Furukawa, “Piecewise planar and compact floorplan reconstruction from images,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
  • [18] K. Sun, K. Mohta, B. Pfrommer, M. Watterson, S. Liu, Y. Mulgaonkar, C. J. Taylor, and V. Kumar, “Robust Stereo Visual Inertial Odometry for Fast Autonomous Flight,” IEEE Robotics and Automation Letters (RA-L), 2017.
  • [19] J. C. Bazin, Y. Seo, and M. Pollefeys, “Globally optimal consensus set maximization through rotation search,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7725 LNCS, no. PART 2. Springer, Berlin, Heidelberg, 2013, pp. 539–551.
  • [20] A. Taneja, L. Ballan, and M. Pollefeys, “Never get lost again: Vision based navigation using streetview images,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, Cham, 2015, vol. 9007, pp. 99–114.
  • [21] A. Cowley, C. J. Taylor, and B. Southall, “Rapid multi-robot exploration with topometric maps,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, May 2011, pp. 1044–1049.
  • [22] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Transactions on Information Theory, 2001.
  • [23] R. Tibshirani, “Regression shrinkage and selection via the lasso: A retrospective,” Journal of the Royal Statistical Society. Series B: Statistical Methodology, 2011.
  • [24] “Open vision computer,”