Simultaneous localization and mapping (SLAM) has been shown as an excellent solution when a robot needs to move in environments that are unknown and GPS is unavailable. We are interested in the problem of reconstructing navigable space from a cloud of 3D map points that are produced from SLAM methods. Particularly, we investigate the map points that are sparse and noisy, and are generated and converted from visual features detected by a vision system (either monocular or stereo vision) carried by a robot. The map points represent obstacles in the environment, however, they are not immediately useful for robot navigation because these visual feature based 3D map points can be very inaccurate for describing obstacle’s position information, especially if we compare them with the point clouds generated from other high-definition ranging sensors such as LIDAR. The objective of this work is to extract navigable space from sparse map points, the result of which can be used for future collision-free navigation and motion planning.
To maximize the constructed free space from the noisy and irregular map points, we opt to use a convex region growing strategy  which iteratively looks for a sequence of polyhedra with maximum inscribed ellipsoids using quadratic programming and semidefinite programming. We then develop a framework by incrementally seeding and creating local convex regions along a robot’s trajectory. To build a complete navigable volume (tunnel) for the entire map, we reconstruct a regulated version of point cloud which looks similar to the outputs of high-definition ranging/depth sensors. The regulation is built on a projection process on the generated convex hulls. This is done by projecting the original noisy points around the polyhedra onto the surfaces of those polyhedra, after which those points falling inside the convex hulls are culled. The regulated point cloud also allows human users to quickly recognize and abstract the environmental information.
We have validated our proposed framework using both KITTI dataset and a real environmental structure. Our results reveal that the reconstructed navigable free space has small volume loss (error) comparing with the ground truth, and the method is highly efficient, allowing real-time computation and online planning.
An illustration of our proposed method applied to a real environment. (a) The original map point cloud generated by a sparse visual SLAM method. (b) After filtering out outlier (distant) points, the free space is computed. The free space consists of a series of overlapping convex hulls which capture the shape of the free space while excluding any points inside. (c) For better visualization and environmental abstraction, we regulate the original noisy map points by projecting them onto the facets of the convex hulls. (d) The projected point cloud is further refined to be denser and smoother.
To the best of our knowledge, this is the first time to extract free space from 3D sparse and noisy map points transformed from visual features using point regulation operation.
Ii Related Work
Point cloud processing methods have been well studied in past years. Existing point cloud methods can be categorized as surface reconstruction, volume reconstruction, model fitting, and kernel-based regression frameworks. Specifically, a large set of early work aims to build surfaces from point clouds. For instance, moving least squares (MLS) based methods [2, 3] were developed to reconstruct point clouds output from laser scans; projection-based greedy strategies  were used to achieve incremental surface growing and building. Signed distance function  and Bayesian method  have also been investigated for surface reconstruction. Some other works, like , , and , adopted volume carving mechanism to obtain free space given a set of points. Typically these methods first decompose the space into cells using 3D triangulation techniques, and then the visibility constraints are used to carve out those cells passed by the visibility line. Different from the surface-based and volume-based reconstruction schemes, RANSAC based model fitting methods [10, 11]have been used to capture the spatial structure of the given set of points. Online kernel-based learning methods 
have also been proposed to implement terrain estimation from the point cloud of a LIDAR scanner.
Related work also includes various mapping approaches as our work utilizes the 3D map points generated from existing mapping methods. Existing map forms for robots navigation include, for example, occupancy grid map , 3D OctoMap , signed distance map [15, 16], topological map [17, 18], and convex region growing map [1, 19, 20], etc. In this work we are particularly interested in the map points generated from sparse visual features such as those from ORB-SLAM  and SVO .
assume that the points are captured by ranging sensors such as Lidars, sonars or depth cameras, and therefore the points are dense and evenly (uniformly) distributed like a mesh surface. We cannot use those approaches, because we are not able to get well estimated normal and curvature of point set from those techniques. Volume reconstruction methods[7, 8, 9] need to implement 3D triangulation for all the points. This however is not necessary if building a navigable space is the final goal: we want to build a map with only a minimal set of points defining the free space, instead of all of the (possibly noisy) points. In addition, the computation requirement for the 3D triangulation and the post-processing can be prohibitive. Plane fitting methods [10, 11] also fail in our case due to the high ambiguity of points structures.
Iii Method Overview
An autonomous robot needs to know clearly the obstacle space and free space before taking actions. We desire the map that describes the environment to include precise 3D information of obstacles just like the map point cloud constructed from a 3D LIDAR sensor. However, this is challenging or even unrealistic due to the sparse and noisy visual features detected. One possible way to “relax” this problem is to build an approximate navigable space with conservative surfaces to “bound” (represent) obstacles. In other words, we aim at obtaining a conservative but safe approximation of a free, navigable space instead of an exact representation of the obstacle surfaces. The input of our work is the 3D map points (point cloud) generated from existing sparse visual odometry or mapping methods.
Different from the points generated from ranging sensors which capture dense 3D position information with high accuracy, the map points constructed by visual odometry or SLAM methods are much irregular, resulting in great difficulty for accurately describing 3D object information. Specifically, there are a couple of challenges:
High variation on the density of map points. The map points are typically converted from visual features by visual odometry or SLAM methods. In general the visual features are not evenly or uniformly distributed in the field of view, regardless of the feature detectors/descriptors employed. Consequently, the converted 3D map points are distributed unevenly in 3D space and are visually nowhere surface-like. See Fig. 2.
Large noise on point’s position. Ideally, the map points of a smooth surface like a wall should be located on a plane, but in reality the computed map points are rather scattered randomly in a 3D volume around the surface area, leading to great difficulty for reconstructing a surface.
Misleading information. Different from conventional data processing methods where a small number of outliers are viewed as noise points, in our context the seeming outliers could be the majority and cannot be simply discarded. Also highly dense batches of points do not necessarily mean the area is valuable/important but perhaps contain much misleading information and thus require extra effort to remove, thin, or regulate them.
Typical existing surface and volume reconstruction methods (e.g., [3, 4, 5, 9]) do not work as they usually focus on triangulation for all the points but do not provide estimation of normal and curvature information from sparse and noisy points.
It is also worth mentioning that, although each input map point may contain multiple pieces of information (for example, in the ORB-SLAM, a map point not only contains the 3D position information for an extracted feature key point, but also records information of the original ORB feature descriptor as well as the view direction from a corresponding keyframe optical center), in this work, we only take advantage of the 3D position information for each map point.
Iii-B Free Space Extraction and Construction
Instead of reconstructing the point cloud (obstacles) as in most existing work [3, 4, 10], we are interested in building the free space directly through discarding the “obstacle space”. An additional advantage of doing this is that we immediately obtain the obstacle-free area in which we can model planning modules directly.
Specifically, we opt to grow a convex hull under the saptial constraints of the map points. We adopt to use the Iterative Regional Inflation by Semidefinite (IRIS) method 
. IRIS is a convex optimization process to find a large and convex region given a set of constraints in 3D space. The method alternates between searching for a set of hyperplanes that separate a convex space from the obstacles using quadratic programming and searching for the maximum ellipsoid which inscribe the convex space previously found using semidefinite programming.
Formally, an ellipsoid is represented as an image of a unit ball:
where is the 3D points, represents the points inside a unit ball. C and d are the coefficients of the affine transformation.
The polyhedron (3D polytope) is represented by a set of linear inequalities:
where denotes a series of constraints and
are a coefficient matrix and a vector, respectively.
The notation of the ellipsoid takes a form of affine transformation, which has a straightforward meaning when we try to maximize the volume of the ellipsoid: the volume is proportional to the determinant detC. Let be the vertices of the obstacles and be the set of obstacles. The problem could be formulated as follows:
where is a vector representing the row of A, is the element of b, is the number of obstacles and is the number of vertices on obstacle . The first constraint requires all vertices of obstacles to be on one side of the obtained hyperplanes while the second constraint requires all points on the ellipsoid to be on the other side of those obtained hyperplanes.
Iii-C Seeding and Building Local Convex Regions
A hurdle that prevents us from directly utilizing IRIS lies in that, the obstacles in our case are not convex polyhedrons, but a cloud of sparse, noisy, and cluttered map points.
To address the problem, first, we sample points as seed points along the robot’s trajectory with a certain interval. Then the evident outlier (noise) map points are filtered and eliminated. An outlier map point can be identified if the map point is far away from both the sampling point on robot’s trajectory and the centroid of a patch of point cloud around that sampling point. Since the map points in 3D space are viewed as obstacles, instead of finding the closest point on obstacles to the ellipsoid as mentioned in the standard IRIS process, we opt to look for the closest point to the ellipsoid among the nearest neighboring points. Therefore, we employ the -nearest neighbors (-NN) method for each sampling point in the filtered point cloud.
After that, the convex region growing process starts from each seed point and terminates when the ellipsoid touches some points and its growth rate is below some predefined threshold. Because such growing process only occurs inside the free space, the final extracted result is always guaranteed to be obstacle-free (and collision-free if the kinodynamic motion is planned inside this space).
In greater detail, let be the set of seed points sampled from the trajectory, and be the set of nearest neighbors of , where is defined as , indexing the sampled points along the trajectory. Each map point from the point cloud is represented as where denotes the point index in set . The set of convex hulls is represented by . From each convex hull , we can obtain a set of vertices and a set of meshes . To create the free space for a complete map, we apply IRIS on each seed and its related . The free space is thus represented with a sequence of convex hulls that are pair-wise overlapped (connected). An illustration of a convex hull computed around one seed point is shown in Fig. 4.
It is also worth noting that, the piece-wise convexity allows convex constraints to be naturally added while generating robot trajectories, after which the convex optimization problem can be easily formulated and solved.
Iii-D Point Cloud Regulation
We are also interested in generating the form of point cloud that is similar to high-definition 3D ranging sensors because a dense and clean form of point cloud can help human quickly recognize and abstract environmental information. Such “cleaned” map point cloud is particularly useful for the human-in-the-loop system. To achieve this, we propose a method that regulates the original noisy map points by projecting all points on the corresponding convex hull surfaces.
The high-level flowchart can be seen in Fig. 6. The input for the whole reconstruction process consists of the seed point set sampled from the trajectory, their related nearest points, and the generated convex hulls using IRIS. For each seed point and its nearest points as well as its associated convex hull, we first make a projection of the nearest points onto the adjacent surfaces of the convex hull. Next, since the convex hulls are pairwise overlapped, the projection of nearest points on hull surfaces may cause some points to fall inside other convex hulls, and thus those points have to be removed. After that, the points will be down-sampled using the Voxel Grid Filter approach. This will make the points uniformly distributed. The down-sampled points will then be triangulated using projection-based incremental triangulation method 
to form a mesh and by doing so, we create mesh edges between points. After building the mesh, we perform an interpolation and add more points on the edges of the built mesh, resulting a denser point cloud. Finally, we use Moving Least Squares (MLS) to smooth the points, and the output is the reconstructed point cloud. The blocks in green refine the points and can be achieved using existing tools (e.g., using suites from PCL ). Here we focus on describing the point regulation components in red blocks, as briefly illustrated in Fig. 5.
Iii-D1 Points projection
For each convex hull , we project the points in on the meshes of and obtain the projected points . Let be the points that fall inside other convex hulls in and let be the points that do not fall inside other convex hulls in . Obviously, . Such point projection process ensures that the regulated point cloud captures the general structure of the convex hull while respecting the distribution of the original map points.
Iii-D2 Points culling
The method for points culling is based on the convexity property of the convex hull, i.e., a segment connecting any two points inside the convex hull has no intersection with any boundary of the convex hull. In our case, if any projected point on a convex hull A falls inside another hull B, the segment connecting the seed point of B with the projected point will be inside B and has no intersection with the boundary of B. If the segment has intersection with the boundary of B, that means the projected points is outside the hull B. We then remove those points.
An illustration in 2D scenario is shown in Fig. 7. We take two convex hulls: and . The red dots and are the two adjacent seed points. The projected points on are those yellow points whereas the projected points on are those blue points. The points in are notated with and , and the points in are marked with , and . In Fig. 7, all segment lines connecting the seed point and the points in are within . While all the segment lines connecting the seed point and the points in are within . In contrast, in Fig. 7, any segment line connecting the points in and has an intersection with the border of , whereas any segment line connecting the points in and causes an intersection with the border of . For example, see the purple star in Fig. 7.
The culling process allows us to harvest those “safe” map points. We can observe that the obtained map points are typically located on the outer surface of the convex hulls, but they might not be satisfactorily smooth and dense. Techniques shown in green blocks of Fig. 6 are then employed to further refine the result.
We validate our proposed method using both KITTI dataset and a real built structure/environment. The sparse map points are generated through running the ORB_SLAM2 . For the real environment experiment, we use TurtleBot3 Waffle which is equipped with Intel RealSense R200 camera carried by the robot for capturing the environment data. We use the monocular mode of ORB_SLAM2 so that only 2D images are used for extracting visual features.
Iv-a Experiment with KITTI Dataset
As mentioned in Sect. III, outlier points from the original point cloud need to be eliminated first. To do so, the Statistical Outlier Removal Filter tool provided by PCL  is adopted. A comparison between before and after filtering can be seen in Fig. 8. The filtered result provides a point cloud input to our framework. The created free space using our proposed method is shown in Fig. 9. In general, the created convex regions can cover the free space inside the point cloud while satisfying the obstacle-free property—no point will be included inside the convex regions.
Based on the constructed free space, we implement the procedure shown in Fig. 6 and obtained the results as shown in Fig. 10. Fig. 10 reveals the regulated point cloud after projection on convex hull, from which we can observe that the cloud are cleaner than the original noisy map points. We then further refine the result and obtain a denser and smoother point cloud as shown in Fig. 10. This improved point cloud well shapes the free space while keeping the general distribution of the original point cloud (no point appears on the top side as no feature is detected from the sky).
Iv-B Experiment in Real Environment
Although KITTI dataset provides great convenience for conducting simulation, it is extremely difficult to evaluate the performance as the outdoor environment captured in the video is complex and cannot be accessed. To evaluate a set of basic performances, we build an environment with a regular shape whose geometry can be exactly measured and calculated. Such ground truth greatly facilitates comparison and analysis for our reconstruction results. The built environment and its corresponding dimension information are shown in Fig. 11.
To avoid environmental symmetry which could easily provide features of great similarity and result in larger chances of loop closure errors, we design the testing environment in a simple and asymmetric form. Additionally, random pictures (textures) and small objects are placed in the environment to guarantee sufficient features can be detected and used. Our reconstruction result can be seen in Fig. 12.
To evaluate the experiment result, we compare the boundary of the created free space, that is, the boundary of the convex hulls with the real boundary of the environment. The error is defined as the distance between these two versions of boundaries. Since calculation of the difference between two 3D hull-surfaces is not easy, we thus make some approximations. First, we project the vertices of the convex hulls onto the plane. Then we separate the boundary of the environment into segments which form areas as shown in Fig. 13. For each area, we use some cells to discretize it, for example, see the yellow cells in area in Fig. 13. Some of the previously projected vertices will fall into the cells, and for each cell we pick the point that is closest to the boundary of the environment as the boundary point of our created free space. For those cells which do not contain any projected points, we take the middle point of the most inner edge of the cell as the boundary point of free space, and we name those middle points as false points. The places which have false points typically have large reconstruction errors, which will be seen later. Finally, we obtain a set of approximate boundary points of the created free space. Connecting those points will form an approximate boundary line. The distance from an approximate boundary point to the corresponding boundary edge represents the reconstruction error for the corresponding cell. The reconstruction error for each edge could be calculated through the sum of squared differences (SSD). The approximate boundary lines in our experiment result is shown in green lines in Fig. 14.
To facilitate evaluation of the reconstruction error, we divide the areas into two types: (1) outer areas including areas and (2) inner areas including areas , as illustrated in Fig. 13. Based on the error metric defined above, the reconstruction error is shown in Fig. 15. In our experiment, we measure the unit proposed by ORB_SLAM2 as follows: cm unit. The squared sum of and average boundary errors are given in Fig. 16.
From Fig. 15, we can see that both outer boundary errors and inner boundary errors tend to grow when the robot moves to the corner of the built environment. This is due to the intrinsic limitation of visual SLAM methods where visual feature points are harder to be detected while the camera is rotating, causing the number of feature points to decrease and the estimation error to increase around corners. Comparing between the outer boundary errors and inner boundary errors, we can also observe that the values of the former are large than the later, which also could be seen in Fig. 16. We can see that both two errors are very small when the robot moves in a straight path along a corridor. Such varying property of errors is consistent with the characteristics of visual SLAM methods which work particularly well for straightline movements but can be less robust for rotations.
The time costs of different parts in our proposed framework are presented in Table 18, where the input point clouds information is given in Table 17. As seen in Fig. 18, the total time for creating the free space for the whole map in our real environment experiment is only around 1.5 seconds whereas the KITTI 05 sequence requires 16.8403 seconds. However, if we examine the time consumed per seed point (a seed is sampled around every one second), they are in the same level, i.e., the times consumed per seed in real environment and with KITTI 05 sequence are 0.0412 second and 0.0723 second, respectively. Such limited time cost completely satisfies the requirement for real-time planning and navigation within finite horizons.
For the part of dense point cloud reconstruction, it is relatively time-consuming as revealed in Fig. 18. However, the regulated and refined dense point cloud is not required for robot’s navigation and planning, instead the convex hulls can be a necessary and critical module which can be easily achieved in real-time computation. The dense point cloud facilitates human recognition and abstraction of the environment, but this component can be executed only periodically or based on the human’s infrequent requests instead of real-time updates.
In this paper, we have presented a framework to create navigable space from sparse and noisy map point cloud generated by sparse visual SLAM methods. Our method first builds local convex regions, from which we further regulate and refine the original noisy and sparse map points to obtain a denser and smoother point cloud that well describes the environment. We have validated our proposed framework using both a public dataset and a real environmental structure, and our proposed method is validated to be robust to highly noisy map points and efficient for real-time planning and navigation.
-  R. Deits and R. Tedrake, “Computing large convex regions of obstacle-free space through semidefinite programming,” in Algorithmic foundations of robotics XI. Springer, 2015, pp. 109–124.
-  C. E. Scheidegger, S. Fleishman, and C. T. Silva, “Triangulating point set surfaces with bounded error.” in Symposium on Geometry Processing. Citeseer, 2005, pp. 63–72.
-  S. Fleishman, D. Cohen-Or, and C. T. Silva, “Robust moving least-squares fitting with sharp features,” in ACM transactions on graphics (TOG), vol. 24, no. 3. ACM, 2005, pp. 544–552.
-  Z. C. Marton, R. B. Rusu, and M. Beetz, “On fast surface reconstruction methods for large and noisy point clouds,” in 2009 IEEE international conference on robotics and automation. IEEE, 2009, pp. 3218–3223.
-  H. Hoppe, T. DeRose, T. Duchamp, J. McDonald, and W. Stuetzle, “Surface reconstruction from unorganized points.”
J. R. Diebel, S. Thrun, and M. Brünig, “A bayesian method for probable surface reconstruction and decimation,”ACM Transactions on Graphics (TOG), vol. 25, no. 1, pp. 39–59, 2006.
-  D. Lovi, N. Birkbeck, D. Cobzas, and M. Jagersand, “Incremental free-space carving for real-time 3d reconstruction,” in Fifth international symposium on 3D data processing visualization and transmission (3DPVT), 2010.
-  A. Romanoni and M. Matteucci, “Incremental reconstruction of urban environments by edge-points delaunay triangulation,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015, pp. 4473–4479.
-  Y. Ling and S. Shen, “Building maps for autonomous navigation using sparse visual slam features,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 1374–1381.
L. Nan and P. Wonka, “Polyfit: Polygonal surface reconstruction from point
Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2353–2361.
-  H. Isack and Y. Boykov, “Energy-based geometric multi-model fitting,” International journal of computer vision, vol. 97, no. 2, pp. 123–147, 2012.
-  R. Hadsell, J. A. Bagnell, D. F. Huber, and M. Hebert, “Accurate rough terrain estimation with space-carving kernels.” in Robotics: Science and Systems, vol. 2009, 2009.
-  A. Elfes, “Using occupancy grids for mobile robot perception and navigation,” Computer, vol. 22, no. 6, pp. 46–57, 1989.
-  A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “Octomap: An efficient probabilistic 3d mapping framework based on octrees,” Autonomous robots, vol. 34, no. 3, pp. 189–206, 2013.
H. Oleynikova, A. Millane, Z. Taylor, E. Galceran, J. Nieto, and R. Siegwart,
“Signed distance fields: A natural representation for both mapping and
RSS 2016 Workshop: Geometry and Beyond-Representations, Physics, and Scene Understanding for Robotics. University of Michigan, 2016.
-  H. Oleynikova, Z. Taylor, M. Fehr, J. Nieto, and R. Siegwart, “Voxblox: Building 3d signed distance fields for planning,” arXiv, pp. arXiv–1611, 2016.
-  F. Blochliger, M. Fehr, M. Dymczyk, T. Schneider, and R. Siegwart, “Topomap: Topological mapping and navigation based on visual slam maps,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1–9.
-  H. Oleynikova, Z. Taylor, R. Siegwart, and J. Nieto, “Sparse 3d topological graphs for micro-aerial vehicle planning,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 1–9.
-  R. Deits and R. Tedrake, “Efficient mixed-integer planning for uavs in cluttered environments,” in 2015 IEEE international conference on robotics and automation (ICRA). IEEE, 2015, pp. 42–49.
-  S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor, and V. Kumar, “Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-d complex environments,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1688–1695, 2017.
-  R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “Orb-slam: a versatile and accurate monocular slam system,” IEEE transactions on robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
-  C. Forster, M. Pizzoli, and D. Scaramuzza, “Svo: Fast semi-direct monocular visual odometry,” in 2014 IEEE international conference on robotics and automation (ICRA). IEEE, 2014, pp. 15–22.
-  P. URL, Point Cloud Library, [Accessed Feb 1, 2019], http://pointclouds.org/.
-  R. Mur-Artal and J. D. Tardós, “Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.