A life-long SLAM approach using adaptable local maps based on rasterized LIDAR images

07/15/2021
by Waqas Ali, et al.
Shanghai Jiao Tong University

Most real-time autonomous robot applications require a robot to traverse a dynamic space for a long time, and in some cases the robot must work in the same environment repeatedly. Such applications give rise to the problem of life-long SLAM. Life-long SLAM presents two main challenges: tracking must not fail in a dynamic environment, and the mapping strategy must be robust and efficient. The system should update maps with new information while also keeping track of older observations, but mapping over a long period can demand high computational resources. In this paper, we propose a solution to the problem of life-long SLAM. We represent the global map as a set of rasterized images of local maps, along with a map management system responsible for updating local maps and keeping track of older values. We also present an efficient approach that uses the bag of visual words method for loop closure detection and relocalization. We evaluate the performance of our system on the KITTI dataset and an indoor dataset. Our loop closure system reports recall and precision above 90 percent, and the computational cost of our system is much lower than that of state-of-the-art methods, even for long-term operation.


1 Introduction

The SLAM [31] algorithm is an integral part of an autonomous navigation system. Most applications involving autonomous robots require long-term operation in a dynamic environment. One of the main challenges in such scenarios is mapping efficiency: when a robot moves for a long time, or makes multiple runs through the same area, the map keeps growing and the computational requirements grow with it. Several methods [15], [20] have proposed efficient mapping strategies to enable long-term operation, but the problem of life-long mapping remains unsolved, especially in the case of laser SLAM. We present a method that uses rasterized images for map representation. The idea is to represent the global map as a collection of rasterized images of local maps. A map management system ensures that online mapping remains lightweight while keeping track of all changes in the environment. It consists of a graph of keyframes and local maps that tracks all the local maps. In addition, the system performs map updates and culling to add new information and limit online memory usage.

For long-term SLAM operation, the ability to re-localize and detect loop closures is also vital. If loop closure detection is not robust and the robot has to move in the same area for a long time, the SLAM system will fail. For laser SLAM, interest points [30, 29] or global descriptors [24, 17] have been utilized for loop closure detection. We build on the idea of using the bag of visual words (BOW) for loop closure detection [11]. BOW has proven to be an efficient method for content-based image retrieval from large databases, and several vision-based SLAM algorithms [14, 2, 21] use bag of words based methods for loop closure detection. In this paper, we apply the BOW approach to loop closure using rasterized laser images. One of the drawbacks of the BOW approach is the computational complexity of building the vocabulary. Since we represent the global map as a set of local maps in the form of rasterized images, only a limited number of local map images are used to build the BOW database, making the system more efficient. We use KITTI and an indoor dataset to evaluate the performance of our system, with several state-of-the-art methods for comparison. Our system reports higher recall and precision rates with much lower computational requirements. We present the following contributions in this paper:

  1. A novel mapping strategy for laser-based SLAM systems: we represent the global map as a collection of rasterized images built from local maps.

  2. A map management system based on a graph structure of local maps and keyframe poses, with an efficient update and culling algorithm.

  3. A novel loop closure detection and re-localization method for laser SLAM based on the BOW approach.

  4. A lightweight SLAM algorithm with much lower computational requirements for long-term applications.

2 Related Work

One of the important aspects of a laser SLAM system for long-term operation is an efficient and robust loop closure and re-localization method. Hess et al. [13] presented a method of loop closure detection using scan-to-submap search: mapping is divided into sub-maps, and as each submap is finished it qualifies for the loop closure search. Appearance-based loop closure detection is proposed in [17], where features are detected based on an NDT surface representation and loop closure is found by matching these features. Steder et al. [30] proposed applying the bag of words method for place recognition using laser range images. Their method reports a good recall rate, but its time requirements for interest point detection, validation, and histogram calculation are high. We propose an efficient implementation of BOW loop closure using a small, online-built vocabulary based on local maps. Behley and Stachniss [4] proposed a SLAM system using surfel-based maps, where loop closure is detected by matching the new scan against the rendered map.

In our earlier work [1], loop closure detection is performed in two steps: first finding the closest keyframes, then using feature matching to confirm the loop closure candidate. In this paper, we apply the bag of words approach to loop detection and make it more efficient by searching through a database built from local map images. Our method reports high precision and recall rates in real time.

Most real-time robot applications require long-term operation in a dynamic environment. The earliest solution to navigation in dynamic environments was presented by Yamauchi and Beer [34]. They proposed reactive behavior and an adaptive place network to deal with dynamic scenes, but their system cannot self-localize and provides no solution for exploring unknown scenes. Stachniss and Burgard [27] presented a method for modeling configurations of semi-static environments; their method clusters local maps to estimate the possible configurations of the environment. Bosse et al. [7] presented the Atlas framework for large-scale real-time operation using sub-maps. It uses a collection of sub-maps with their local frames, and forms a graph whose vertices are sub-maps and whose edges are the transformations between these local maps.

In [5, 6], a dynamic map representation is presented that consists of local maps built at different times and continuously updated online. The maps that best suit the current observation are selected for navigation. This operation requires a large amount of memory to update the maps online; our method uses a rasterized image representation for 3D laser data, and the map updates require a much lower computational cost. In [16, 28], a method is presented for limiting the computational complexity of long-term operation using information-theoretic compression of the pose graph. Observations that provide little information are discarded, and in this way nodes are marginalized from the pose graph; new nodes are only added when the robot explores previously unknown scenes. Our method provides an efficient way to store the maps while also keeping older information for long-term operation.

McDonald et al. [18] proposed a system for multi-session mapping that incorporates multiple maps in a single coordinate frame and uses appearance-based loop closure detection. Fentanes et al. [9] proposed a method to predict changes in an environment based on periodic events, which allows for long-term robot operation. Wolf and Sukhatme [33] proposed a mapping method that deals with static and dynamic objects separately. These environment models limit the application of such systems to certain scenarios, while we build a system that works in different conditions and is not limited to specific environments. Konolige and Bowman [15] proposed a method for mapping in dynamic environments for long-term operation; criteria for system deployment and practical life-long mapping are defined in their work. Robust visual place recognition is used for loop closure detection, and their system can recover maps distorted by localization failure and stitch different sequences together. Banerjee et al. [3] proposed a method to limit computational cost while maintaining efficiency and consistency by pruning redundant local maps. Our method is both computationally efficient and able to track changes in a dynamic environment: we save the local maps as rasterized images, which require very little memory, and life-long mapping is done online.

Figure 1: Complete architecture of our SLAM system

3 Tracking

3.1 Visual Odometry

The purpose of including visual odometry (VO) in our system is to ensure that tracking does not fail in unstructured or feature-less environments. Several state-of-the-art visual odometry methods exist in the literature; in this paper, we use the approach presented in [19]. Odometry calculation has two main parts, i.e., pose estimation from feature tracking and local BA. The VO thread starts by taking in images and extracting ORB features [25]. The ORB feature detector is stable, fast to compute, and works in a variety of environments. We set the scale level for FAST corners at 8, and each scale is further divided into grid cells to get an even distribution of features. We ensure that enough features are extracted from each image by adapting the threshold parameters of the FAST corner detector. An initial pose is estimated using a constant motion model. Next, we perform a search for map points. The pose estimate is first optimized from the map-point correspondences and then refined using motion BA. The final pose is then used for fusion with the laser pose.
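For illustration, the following is a minimal Python sketch of this grid-based, threshold-adaptive ORB extraction, assuming OpenCV; the grid size, per-cell feature budget, and the pair of FAST thresholds are illustrative values of our own, not the system's exact settings.

import cv2
import numpy as np

def extract_orb_grid(image, rows=4, cols=4, per_cell=50, fast_thresholds=(20, 7)):
    """Detect ORB features cell by cell; if the default FAST threshold
    yields too few corners in a cell, retry with a lower one."""
    h, w = image.shape[:2]
    ch, cw = h // rows, w // cols
    keypoints, descriptors = [], []
    for r in range(rows):
        for c in range(cols):
            cell = image[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            for th in fast_thresholds:  # adapt the FAST threshold if needed
                orb = cv2.ORB_create(nfeatures=per_cell, nlevels=8, fastThreshold=th)
                kps, des = orb.detectAndCompute(cell, None)
                if kps and len(kps) >= per_cell // 2:
                    break
            if not kps:
                continue
            for kp in kps:  # shift cell coordinates back to image coordinates
                kp.pt = (kp.pt[0] + c * cw, kp.pt[1] + r * ch)
            keypoints.extend(kps)
            descriptors.append(des)
    return keypoints, np.vstack(descriptors) if descriptors else None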

3.2 Laser Pose Estimation

Pose estimation from the laser point cloud is based on our earlier work [1]. This thread has three main components, i.e., point cloud rasterization, feature extraction and matching, and pose estimation. Once the raw point cloud is received, it is processed to remove outliers. In a 3D point cloud collected by a multi-line lidar, circular rings are formed on the ground plane, and such rings can cause outliers when projected to the image. We therefore apply RANSAC [10] based plane detection to remove the ground plane from the raw 3D point cloud. The remaining point cloud is used for rasterization. We use a pinhole camera model to give a geometric relationship between 3D points and their 2D projections. Figure 2 illustrates the process of rasterization.

Figure 2: The raw point cloud is first processed to remove outliers and then rasterized to form a greyscale image.

The 3D points received are in lidar coordinates. They are first transformed to camera coordinates using the extrinsic parameters, defined by a rotation matrix $R$ and a translation vector $t$. Next, we use the intrinsic parameters $K$ to project the points to image coordinates. The relationship for projecting a 3D point $P$ in lidar coordinates to a 2D pixel $(u, v)$ in image coordinates is given in equation 1. We set the z-axis as the optical axis during the process of rasterization. The pixel intensity at each projected point is set equal to the elevation of its corresponding 3D point; in this way, we get a greyscale image in which the elevation information of the 3D points is preserved.

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left( R\,P + t \right) \qquad (1) $$
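A minimal sketch of this rasterization step under the stated pinhole model is given below; ground removal here uses Open3D's RANSAC plane segmentation, and the image size, intensity scaling, choice of height axis, and parameter values are our own illustrative assumptions.

import numpy as np
import open3d as o3d

def rasterize_scan(points_lidar, K, R, t, width=640, height=480):
    """Project a ground-free lidar point cloud to a greyscale image whose
    pixel intensities encode point elevation (eq. 1)."""
    # remove the ground plane with RANSAC
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_lidar)
    _, ground = pcd.segment_plane(distance_threshold=0.2, ransac_n=3,
                                  num_iterations=100)
    pts = np.delete(points_lidar, ground, axis=0)

    # lidar -> camera coordinates, then project with the intrinsics
    cam = (R @ pts.T + t.reshape(3, 1)).T
    cam = cam[cam[:, 2] > 0]                 # keep points in front of the camera
    uvw = (K @ cam.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)

    # write elevation into the image as greyscale intensity
    img = np.zeros((height, width), dtype=np.uint8)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    elev = cam[:, 1]                         # assumed height axis in camera frame
    inten = np.interp(elev, (elev.min(), elev.max()), (0, 255))
    img[v[ok], u[ok]] = inten[ok].astype(np.uint8)
    return img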

Next, we apply the ORB feature [25] detector on the rasterized images. At least 1000 features are detected from each image. We match features with the last scan and remove outliers using RANSAC [10]. Once feature correspondences are found, the features are projected back to laser coordinates. For two sets of corresponding feature points $\{p_i\}$ and $\{q_i\}$ from two scans, we use eq. 2 to estimate the transformation, in the form of a rotation matrix $R$ and translation vector $t$, between the two scans.

$$ (R^{*}, t^{*}) = \operatorname*{arg\,min}_{R,\,t} \sum_{i=1}^{N} \left\| p_i - (R\, q_i + t) \right\|^{2} \qquad (2) $$

We use the ICP algorithm to estimate motion from these points. Because the number of points is small and the correspondences are known, ICP gives an accurate estimate. We use the motion estimate to calculate the lidar pose. The tracking thread is also responsible for keyframe selection. We use a simple keyframe selection criterion, i.e., at least five frames have passed and there are at least 100 points in common with the last frame. Finally, we build a factor graph of keyframe nodes, where each node represents the pose estimated from the laser point cloud. We then add visual odometry factors to the factor graph and optimize it to calculate the final keyframe pose. Each keyframe stores the 6-DOF pose and the 3D values of all the features observed at that position. The keyframe information is used by the mapping and loop closure threads.
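With known correspondences, a single ICP step reduces to the closed-form, SVD-based solution of eq. 2; the following minimal sketch illustrates that alignment (the function name is our own).

import numpy as np

def align_correspondences(p, q):
    """Return R, t minimizing sum ||p_i - (R q_i + t)||^2 for Nx3 arrays
    with row-wise correspondences (the closed-form solution of eq. 2)."""
    p_c, q_c = p - p.mean(axis=0), q - q.mean(axis=0)   # center both sets
    H = q_c.T @ p_c                                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = p.mean(axis=0) - R @ q.mean(axis=0)
    return R, t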

3.3 Re-localization

One of the main contributions of this paper is the design of the relocalization approach for laser-based SLAM, which is also vital for the long-term performance of robot localization. The principle of relocalization is similar to loop closure detection. When not enough features are detected from the laser rasterized images and VO does not provide an accurate pose estimate, tracking is considered lost. We then convert the laser rasterized image of the current frame into a BOW vector and perform a database query. After getting a match from the database, we calculate the robot pose by matching the features of the current frame against the frame returned by the query. The process of building the BOW database from local maps is discussed in detail in the next section.

Figure 3: Mapping process: the points of a local map are rasterized to form a local map image

4 Mapping

The mapping thread takes the keyframe information from the tracking thread to register the local map and perform local BA. It starts by taking the map points observed at the first keyframe to initialize the local map. The local map is built incrementally as information from new keyframes is received. New map points are added to the local map until it meets the size threshold, at which point a new local map is initialized. We ensure a smooth transition by keeping enough common points between consecutive local maps. When a local map is finished, we perform local BA to optimize the map points and keyframe poses. The cost function for local BA is defined as

$$ e_{ij} = z_{ij} - h(x_i, p_j) \qquad (3) $$

In equation 3, $x_i$ denotes the pose of keyframe $i$, $p_j$ the position of map point $j$, and $h(x_i, p_j)$ the predicted observation using $x_i$ and $p_j$. The actual observation is $z_{ij}$, giving a cost equal to the difference of $z_{ij}$ and $h(x_i, p_j)$. We minimize the re-projection error by solving eq. 4.

$$ \{x^{*}, p^{*}\} = \operatorname*{arg\,min}_{x,\,p} \sum_{i,j} \left\| e_{ij} \right\|^{2} \qquad (4) $$
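To make eqs. 3-4 concrete, the following toy sketch solves a two-keyframe local BA with scipy's least_squares; poses are simplified to 3-DOF (x, y, yaw), observations to 2D points expressed in the keyframe frame, and the synthetic data and all names are illustrative assumptions.

import numpy as np
from scipy.optimize import least_squares

def residuals(params, observations, n_kf):
    """Stack e_ij = z_ij - h(x_i, p_j) over all observations."""
    poses = params[:3 * n_kf].reshape(n_kf, 3)   # (x, y, yaw) per keyframe
    points = params[3 * n_kf:].reshape(-1, 2)    # 2D map points
    res = []
    for i, j, z in observations:                 # (keyframe, point, observation)
        x, y, th = poses[i]
        c, s = np.cos(th), np.sin(th)
        Rw = np.array([[c, -s], [s, c]])
        pred = Rw.T @ (points[j] - np.array([x, y]))   # h(x_i, p_j)
        res.extend(np.asarray(z) - pred)
    return np.array(res)

# two keyframes observing three points; data consistent with keyframe 1
# sitting 1 m to the right of keyframe 0 (the global gauge is left free)
obs = [(0, 0, (1, 0)), (0, 1, (2, 1)), (0, 2, (0, 2)),
       (1, 0, (0, 0)), (1, 1, (1, 1)), (1, 2, (-1, 2))]
x0 = np.zeros(3 * 2 + 2 * 3)                     # initial poses and points
sol = least_squares(residuals, x0, args=(obs, 2))
print(np.round(sol.x[:6], 2))                    # optimized keyframe poses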

After the local BA is finished, the optimized map points within the local map are rasterized to form an image using eq. 1. Figure 3 shows this mapping process: the optimized 3D map points of a local map are rasterized to form the local map image. In this way, the global map is represented as a collection of local map images. Once a local map is rasterized to an image, it is forwarded to the BOW database and the map management system.

Figure 4: A set of visual words extracted from a rasterized image of a local map

4.1 BOW Database

The mapping thread is also responsible for maintaining the BOW database. The rasterized local map images are used to build a database based on the bag of words approach [11]. BOW is most commonly used in vision-based SLAM algorithms to detect loop closure; here we adapt the approach to laser-based rasterized images. Instead of building the database from all the keyframes, we only use the local maps. This reduces the dimensions of the database and makes the whole system more efficient.

For every local map image, we extract ORB features as in the tracking thread: we compute FAST corners and assign a BRIEF descriptor to each corner. The next step is to build a vocabulary using DBOW2 [11] based on these features. The DBOW2 library discretizes the binary descriptors, applies k-median clustering, and builds a vocabulary tree from these clusters. Figure 4 shows examples of visual words extracted from a rasterized local map image. Three types of features dominate the visual words, i.e., building structures such as corners or planes, trees or shrubs, and dynamic objects such as cars. Using our approach, we can save the structural information of the scene, so even if there are some changes in the scene, we can still detect loop closure efficiently.
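The sketch below illustrates the idea of a small database built only from local map images. For simplicity it replaces DBOW2's k-median vocabulary tree with flat k-means over unpacked ORB descriptor bits and cosine-scored word histograms; the class name, parameter values, and scoring are our own assumptions, not DBOW2's implementation.

import cv2
import numpy as np
from sklearn.cluster import KMeans

class LocalMapBowDatabase:
    def __init__(self, n_words=200):
        self.n_words = n_words
        self.kmeans = None
        self.entries = []                      # (local map id, bow vector)

    def _descriptors(self, image):
        orb = cv2.ORB_create(nfeatures=1000)
        _, des = orb.detectAndCompute(image, None)
        return np.unpackbits(des, axis=1).astype(np.float32)

    def build(self, images):
        """Cluster descriptors of all local map images into visual words."""
        all_des = np.vstack([self._descriptors(im) for im in images])
        self.kmeans = KMeans(n_clusters=self.n_words, n_init=3).fit(all_des)
        for i, im in enumerate(images):
            self.entries.append((i, self._bow(im)))

    def _bow(self, image):
        """Normalized histogram of visual word occurrences."""
        words = self.kmeans.predict(self._descriptors(image))
        v = np.bincount(words, minlength=self.n_words).astype(np.float32)
        return v / (np.linalg.norm(v) + 1e-9)

    def query(self, image, min_score=0.3):
        """Return (local map id, score) matches above a score threshold."""
        q = self._bow(image)
        scores = [(i, float(v @ q)) for i, v in self.entries]
        return [m for m in sorted(scores, key=lambda s: -s[1])
                if m[1] >= min_score]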

Input: Keyframe $K_i$, Features $F_i$
Result: Perform mapping for long-term SLAM operation
initialization;
while receive keyframes do
       if local map $M_j$ finished then
              perform local BA;
              rasterize map points to local map image $I_j$;
              add time stamp and keyframe labels to $I_j$;
              if local map graph not present then
                     initialize local map graph;
              else
                     add $I_j$ to graph;
              end if
              if loop closure detected then
                     update local map graph;
                     perform culling;
              end if
              if distance travelled greater than threshold $d_{th}$ then
                     perform culling;
              end if
              initialize new local map $M_{j+1}$;
       else
              add keyframe $K_i$ to local map $M_j$;
       end if
end while
Algorithm 1: Life-long Mapping

4.2 Map Management

4.2.1 Initialization

The first step of our map management scheme is to initialize a graph structure based on the local map images and their relation to the keyframes. We know the covisibility information of the landmarks with respect to each keyframe $x_i$, and the relation of 3D map points to image pixels is given by equation (1). Figure 3 shows the relationship between keyframes, map points, and local map images. Our idea is to use these relations to build a graph that only includes keyframe poses and local map images. In long-term operation, we can use this graph to track all the maps; it is also useful for performing updates and removing redundant information. When a local map is initialized into the graph, it is tagged with time information, which becomes useful later when the robot revisits an area. So in the graph structure, each local map has two labels, i.e., the indices of the keyframes to which it is connected and the time stamp.

While the online local map graph is kept, we also maintain an offline database that stores the older local maps, so older information is not lost but saved into the offline database.

4.2.2 Map Update

Updates to the local maps are vital for the long-term operation of our system. When a robot revisits an area or loop closure is detected, we want to update the information fused in the local maps without losing older observations. When loop closure is detected, a new local map is initialized at that position using the current information provided by the laser scans. It is tagged with time information, and the older maps at that position are removed from the online graph and saved into the offline database.

4.2.3 Culling

To keep the mapping efficient, we design a robust culling scheme for local maps. The culling mechanism has two triggers, i.e., loop closure detection and distance travelled. As discussed in the last section, when a loop is detected we add a new local map to the graph; the first part of the culling method is applied here to remove the older local map from the graph. For the second part, we set a distance threshold: if the distance travelled by the robot meets the threshold, older local maps are removed from the online graph. The goal is to minimize the online information while maintaining system accuracy. All marginalized local maps are saved into an offline database.

The culling of older maps does not affect global operation, because we perform re-localization and loop closure detection based on frame queries to the BOW database. Using the proposed strategy, the online operation remains efficient and robust to changes in the environment.
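A minimal sketch of the map management scheme of this section is given below: an online graph of local map images tagged with keyframe indices and time stamps, with loop-closure updates and distance-based culling that move old maps to an offline store. The class and field names, and the culling policy of archiving all but the newest map, are illustrative assumptions.

import time
from dataclasses import dataclass, field

@dataclass
class LocalMapNode:
    map_id: int
    image: object              # rasterized local map image
    keyframe_ids: list         # indices of keyframes this local map is built from
    stamp: float = field(default_factory=time.time)

class LocalMapGraph:
    def __init__(self, cull_distance=100.0):
        self.online = {}               # map_id -> LocalMapNode
        self.offline = {}              # archived nodes, never discarded
        self.cull_distance = cull_distance

    def add(self, node):
        self.online[node.map_id] = node

    def update_on_loop_closure(self, old_id, new_node):
        """Replace an older local map with a freshly built one; the old
        observation is archived offline rather than lost."""
        if old_id in self.online:
            self.offline[old_id] = self.online.pop(old_id)
        self.add(new_node)

    def cull(self, distance_travelled):
        """Archive the oldest online maps once the robot has travelled far
        enough, keeping online memory bounded."""
        if distance_travelled < self.cull_distance:
            return
        for mid in sorted(self.online, key=lambda m: self.online[m].stamp)[:-1]:
            self.offline[mid] = self.online.pop(mid)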

5 Loop Closure

Loop closure detection works in three steps, i.e., database query, feature matching, and transformation estimation. Figure 5 shows the algorithm flow for loop closure detection. A BOW database is maintained by the mapping thread based on the local maps. Our method minimizes the computational time required for loop candidate detection, because the search space is reduced to the local maps. The method starts by taking the descriptors of a new frame, converting them to a BOW vector, and performing a database query using the DBOW2 [11] library. The query returns a vector of matches with a normalized score for each match.

Figure 5: Control flow for loop closure detection, where each new frame is used to query the BOW database

From the list of matches, we apply a score threshold and select the matches between the current keyframe and local maps that pass it as initial candidates. Then, we match features from the current keyframe to each keyframe within the candidate local map. Here the local map graph becomes useful, as each local map carries its keyframe labels. We use the RANSAC algorithm to reject outliers in the feature matches. If the current keyframe reports enough correspondences with a keyframe of the candidate local map, this candidate is accepted as the final loop closure. For global optimization, we build a pose graph in which landmarks are marginalized. To add the loop constraint to the pose graph, we estimate the 6-DOF transformation between the current frame and the matched frame from the already-estimated correspondences between the two frames. Finally, we optimize the trajectory after adding the loop closure constraint.
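A minimal sketch of this last step, assuming the GTSAM Python bindings: a toy square trajectory receives one loop constraint and is optimized. The noise values, the identity loop transform, and the toy data are illustrative assumptions, not the paper's settings.

import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.1))
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.05))

# anchor the first keyframe at the origin
graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), prior_noise))

# toy square trajectory: four odometry factors, 1 m forward + 90 deg turn
step = gtsam.Pose3(gtsam.Rot3.Yaw(np.pi / 2), gtsam.Point3(1, 0, 0))
for i in range(4):
    graph.add(gtsam.BetweenFactorPose3(i, i + 1, step, odom_noise))

# slightly perturbed initial guesses for the five keyframe poses
guesses = [(0, 0, 0.0), (1.05, -0.1, np.pi / 2), (0.9, 1.1, np.pi),
           (-0.1, 1.0, 3 * np.pi / 2), (0.15, 0.1, 0.0)]
for k, (x, y, yaw) in enumerate(guesses):
    initial.insert(k, gtsam.Pose3(gtsam.Rot3.Yaw(yaw), gtsam.Point3(x, y, 0)))

# loop constraint: frame 4 revisits frame 0; the 6-DOF transform would come
# from the RANSAC-verified feature correspondences (identity in this toy)
graph.add(gtsam.BetweenFactorPose3(4, 0, gtsam.Pose3(), loop_noise))

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose3(4).translation())   # ~ (0, 0, 0) after optimization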

6 Experiments

We evaluate our method using the KITTI [12] odometry dataset and an indoor dataset. The KITTI dataset provides 22 sequences with 3D lidar data and stereo images; sequences 00 to 10 serve as training sequences, since ground truth is provided for them. For the KITTI dataset, a Velodyne HDL-64 LIDAR is used to record the laser data, in three types of environments, i.e., urban, country, and highway. In addition, we record an indoor dataset with a VLP-16 laser scanner. This indoor dataset is recorded over multiple runs through our lab, and is useful for showing long-term SLAM performance in a dynamic environment.

We evaluate in two steps, i.e., first, we use the KITTI dataset to show the performance of the SLAM system in terms of localization accuracy, loop closure detection, and computational requirements. Next, we use the indoor dataset to evaluate the performance of our system in a long-term operation.

6.1 KITTI dataset

6.1.1 Localization accuracy

The localization performance of the system is assessed using relative translation and rotation errors, based on the method presented in [12]. The relative rotation and translation errors are computed using equations 5 and 6.

$$ E_{rot}(\mathcal{F}) = \frac{1}{|\mathcal{F}|} \sum_{(i,j) \in \mathcal{F}} \angle\!\left[ (\hat{p}_j \ominus \hat{p}_i) \ominus (p_j \ominus p_i) \right] \qquad (5) $$

$$ E_{trans}(\mathcal{F}) = \frac{1}{|\mathcal{F}|} \sum_{(i,j) \in \mathcal{F}} \left\| (\hat{p}_j \ominus \hat{p}_i) \ominus (p_j \ominus p_i) \right\|_2 \qquad (6) $$

In equations 5 & 6, $\mathcal{F}$ is a set of frame pairs $(i, j)$, $p$ and $\hat{p}$ are the ground truth and estimated poses respectively, $\angle[\cdot]$ denotes the rotation angle, and $\ominus$ is the inverse compositional operator.
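A compact sketch of these metrics for 4x4 homogeneous pose matrices follows; the pairing of frames into $\mathcal{F}$ is assumed given.

import numpy as np

def relative_errors(gt, est, pairs):
    """Average rotation (rad) and translation (m) error over frame pairs
    (i, j), for lists of 4x4 homogeneous pose matrices."""
    rot_err = trans_err = 0.0
    for i, j in pairs:
        rel_gt = np.linalg.inv(gt[i]) @ gt[j]    # p_j "ominus" p_i
        rel_est = np.linalg.inv(est[i]) @ est[j]
        err = np.linalg.inv(rel_gt) @ rel_est    # compositional difference
        c = np.clip((np.trace(err[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        rot_err += np.arccos(c)                  # rotation angle of the error
        trans_err += np.linalg.norm(err[:3, 3])
    return rot_err / len(pairs), trans_err / len(pairs)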

The localization performance is compared with LOAM [35], GICP [26], and VINS [23]. LOAM is the best-performing laser-based method on the KITTI dataset, and VINS is one of the best-performing visual odometry methods. We run the open-source code of the three systems on the eleven training sequences provided by the KITTI dataset. Table 1 shows the relative translation and rotation errors for the 11 sequences. Our system consistently performs accurately on all training sequences for both translation and rotation, and surpasses the other methods on some sequences. The errors of our method remain low for almost all sequences, while the results of the remaining three methods vary across the 11 sequences. These results show that our method produces results on par with state-of-the-art methods.

Sequence | Our Method (Trans / Rot) | LOAM (Trans / Rot) | VINS (Trans / Rot) | GICP (Trans / Rot)
00 | 1.2275 / 0.0061 | 1.1 / 0.0053 | 1.3517 / 0.012 | 1.29 / 0.64
01 | 1.4493 / 0.0032 | 2.79 / 0.0055 | 2.2273 / 0.0076 | 4.39 / 0.91
02 | 1.0808 / 0.0052 | 1.54 / 0.0055 | 1.3989 / 0.009 | 2.53 / 0.77
03 | 0.8401 / 0.0034 | 1.13 / 0.0065 | 1.2116 / 0.0108 | 1.68 / 1.08
04 | 0.5469 / 0.0044 | 1.45 / 0.005 | 1.4229 / 0.0118 | 3.76 / 1.07
05 | 1.0494 / 0.0045 | 0.75 / 0.0038 | 1.4272 / 0.012 | 1.02 / 0.54
06 | 1.2673 / 0.0042 | 0.72 / 0.0039 | 1.2926 / 0.011 | 0.92 / 0.46
07 | 1.4091 / 0.0078 | 0.69 / 0.005 | 1.3308 / 0.013 | 0.64 / 0.45
08 | 1.0798 / 0.0053 | 1.18 / 0.0044 | 1.8352 / 0.012 | 1.58 / 0.75
09 | 1.495 / 0.005 | 1.2 / 0.0048 | 1.6345 / 0.0085 | 1.97 / 0.77
10 | 0.8396 / 0.0053 | 1.51 / 0.0057 | 3.1007 / 0.0185 | 1.31 / 0.62
Table 1: Relative translation and rotation errors for our method, LOAM, VINS, and GICP

6.1.2 Loop Closure

In this section, we evaluate the performance of the loop closure detection method presented in this paper, one of its main contributions. For comparison, we implemented two further loop closure techniques: searching through keyframes [1] and searching through local maps. In both, the search for loop closure is performed by feature matching. In the first method, we search for the nearest keyframes based on a distance threshold and then match features to give the final loop candidate; the second method is similar but searches through local maps instead of keyframes.

Sequence | Total frames | Keyframes | Local maps | BOW (msec) | Local map search (msec) | Keyframe search (msec)
00 | 4540 | 1125 | 225 | 3.034 | 130.49 | 537.1
02 | 4660 | 1185 | 237 | 6.64 | 402 | 631
05 | 2760 | 870 | 174 | 2.1 | 103.70 | 384.60
06 | 1100 | 280 | 56 | 1.29 | 52.83 | 28.90
07 | 1100 | 265 | 53 | 1.102 | 21 | 26
09 | 1590 | 395 | 79 | 1.338 | 23 | 12.75
Table 2: Detailed values (including loop candidate detection times) for each sequence used in the loop closure evaluation
Figure 6: Loop candidate detection time comparison for BOW, local map and keyframe search loop detection methods

We use six sequences with loop closures from the KITTI training sequences. Table 2 shows the statistics for the six sequences, i.e., the total number of frames and the number of keyframes and local maps selected by our system, along with the average time each method takes for loop closure candidate detection. The approach of using local maps greatly reduces the space over which we search for loop closure. Take sequences 00 and 02, the longest sequences with loop closures: they contain 4540 and 4660 frames, which are represented by only 225 and 237 local maps respectively, giving an efficient representation of the complete map. Figure 6 plots the loop detection time requirements of each method.

Sequence | Recall BOW (%) | Recall local maps (%) | Recall keyframes (%) | Precision BOW (%) | Precision local maps (%) | Precision keyframes (%)
00 | 94.78 | 98.2 | 72.25 | 98.5 | 100 | 100
02 | 84.28 | 89.28 | 59.2 | 80.5 | 94 | 78.13
05 | 88.58 | 97.59 | 80.18 | 92.8 | 100 | 100
06 | 84.5 | 98.89 | 51 | 100 | 100 | 100
07 | 69.23 | 84.6 | 61.53 | 100 | 100 | 100
09 | 65.21 | 82.6 | 39.13 | 100 | 100 | 100
Table 3: Recall and precision of loop closure detection using BOW, local map search, and keyframe search
Figure 7: A comparison of loop closure detection recall rate for the three methods

The BOW-based method presented in this paper differs markedly in runtime. The average time taken by searching through local maps and keyframes is 130.49 ms and 537.1 ms respectively, while the BOW-based method takes only 3.034 ms. Bag of words has proven to be a very efficient approach in vision-based systems; here we adapt it to a laser-based SLAM system and make it time-efficient by building the database from local maps only. Table 3 shows the recall and precision of the three loop detection methods, plotted in figure 7. All three methods deliver high precision. The keyframe-based method has a markedly lower recall rate because it is slower and can miss some frames. The BOW method has a slightly lower recall rate than feature-based search through local maps, but keeps high precision. These results validate the efficacy of the method presented in this paper: BOW-based loop closure detection for laser SLAM gives a precision of more than 90% and a recall rate above 80% on most sequences, and needs only a few milliseconds to detect a loop candidate.

6.1.3 Computational Complexity

In this paper, one of our main contributions is a lightweight SLAM algorithm. We select the six longest sequences from the KITTI odometry dataset, i.e., 00, 02, 08, 13, 19, and 21, to evaluate the computational requirements of our system. Of these, sequences 00, 02, 13, and 19 contain loop closures, which also gives us an idea of the computational requirements during loop closure detection. We select five SLAM methods for comparison in this section: ORBSLAM2 [19], Cartographer [13], ISCLOAM [32], MULLS [22], and SUMA++ [8]. All of these methods include loop closure detection. ORBSLAM2 is included because it also uses DBOW2 [11] for loop closure detection and relocalization, and we want to compare the performance of our system with both laser and vision-based systems. Cartographer requires IMU data, and the KITTI dataset only provides IMU data for sequences 00 to 10, so we test Cartographer only on sequences 00, 02, and 08.

Figure 8: CPU usage of our method in comparison with ORBSLAM2, Cartographer, ISCLOAM, MULLS, and SUMA++ on the KITTI dataset

We select two metrics to measure the computational cost of each method, i.e., percentage CPU and memory usage. First, we look at CPU usage; figure 8 shows the values for all six methods. The requirements of our system consistently remain below 10% for all sequences. The reason is that our SLAM pipeline is simple: pose estimation is done by feature extraction and matching, mapping performs only local BA and saves maps as rasterized images, and loop closure detection is performed by searching a small BOW database. For the other systems, CPU usage is much higher, with the exception of SUMA++, which has slightly lower requirements because it also uses a laser image representation.

Next, we examine the memory requirements of online SLAM operation. Figure 9 shows the percentage memory usage of the six methods. This comparison shows the true impact of our local map representation and loop detection approach: we limit memory usage by using the local map representation and keeping only recent local maps online, and we limit computation with the BOW-based loop closure and re-localization. The memory usage of our system remains below 5% for all sequences. The remaining five systems report memory usage of 20% and more; ISCLOAM has the highest values, around 60%-80%. Their system is a typical SLAM approach in which new map points are added as the robot moves, and all frames are kept online for loop closure detection, which explains the high memory usage. SUMA++ reports lower CPU usage, but its memory requirements are much higher. ORBSLAM2 also uses the BOW approach via the DBOW2 library, but its computational requirements are much higher than ours; the main difference is the size of the vocabulary, as we use a small BOW database built only from local map images. These results demonstrate the computational efficiency of our system.

Figure 9: A plot of the memory requirements of the six methods for KITTI dataset sequences.

6.2 Indoor Dataset

We collected data inside our lab, with multiple runs through the area. Our lab presents a dynamic environment with constant changes in the scene. This kind of data is similar to the real-life operation of service robots, which must operate for a long time with people moving and other changes in the scene. In this section, we investigate performance in two steps, i.e., the computational cost of the SLAM system during long-term operation and the map quality. We use ORBSLAM2, ISCLOAM, and MULLS for evaluation alongside our system; SUMA++ is excluded, as it only works well with the KITTI dataset.

Figure 10: Percentage CPU and memory consumption of our method, ORBSLAM2, ISCLOAM, and MULLS on the indoor dataset: (a) CPU usage; (b) memory usage

One of the challenges of life-long SLAM is to limit the computational requirements of online operation. As the robot moves, new frames and map points keep being added to the system, and for most methods this information is kept online for loop closure detection and global optimization. Figure 10 shows the percentage CPU and memory usage of the four systems in this evaluation. ORBSLAM2 reports the highest values of the four, because of its large vocabulary for loop closure detection and relocalization and because the information it keeps online grows with the number of frames.

ISCLOAM keeps all frames and map points online to search for loop closures, so its computational requirements rise as the robot moves through the environment; the results show both CPU and memory requirements increasing with the number of frames. MULLS behaves differently in terms of CPU load: its value is slightly higher than our system's but remains almost constant, while its memory usage rises as the number of frames increases. Lastly, the CPU and memory plots for our method remain flat, with CPU load below 10% and memory requirements below 8%. These results show the effect of the proposed strategy for long-term operation: the mapping approach based on local map rasterized images keeps only limited information online while still producing accurate results, and the loop closure detection is fast and efficient. For an indoor dataset with thousands of frames, the loop closure thread takes only 1% of memory. These results show the efficacy of the proposed method in addressing the computational complexity of long-term SLAM operation.

7 Conclusion

In this paper, we presented a lightweight SLAM algorithm suited to long-term operation. We divide the global map into a set of local map rasterized images, which reduces the computational and memory requirements of the SLAM system's mapping. A map management system keeps the online operation lightweight, even for longer runs. Using the bag of words approach, we also presented an efficient loop closure and re-localization method. We provided a thorough evaluation of our method on the KITTI dataset and an indoor dataset. On KITTI, our method reported recall and precision above 90 percent and much lower CPU and memory usage; the indoor dataset demonstrated performance during longer operation, where the computational requirements of our system remain constant.

References

  • [1] W. Ali, P. Liu, R. Ying, and Z. Gong (2021) 6-dof feature based lidar slam using orb features from rasterized images of 3d lidar point cloud. arXiv preprint arXiv:2103.10678. Cited by: §2, §3.2, §6.1.2.
  • [2] A. Angeli, S. Doncieux, J. Meyer, and D. Filliat (2008) Real-time visual loop-closure detection. In 2008 IEEE international conference on robotics and automation, pp. 1842–1847. Cited by: §1.
  • [3] N. Banerjee, D. Lisin, J. Briggs, M. Llofriu, and M. E. Munich (2019) Lifelong mapping using adaptive local maps. In 2019 European Conference on Mobile Robots (ECMR), pp. 1–8. Cited by: §2.
  • [4] J. Behley and C. Stachniss (2018-06) Efficient surfel-based slam using 3d laser range data in urban environments. In Robotics: Science and Systems. Cited by: §2.
  • [5] P. Biber, T. Duckett, et al. (2005) Dynamic maps for long-term operation of mobile service robots. In Robotics: science and systems, pp. 17–24. Cited by: §2.
  • [6] P. Biber and T. Duckett (2009) Experimental analysis of sample-based maps for long-term slam. The International Journal of Robotics Research 28 (1), pp. 20–33. Cited by: §2.
  • [7] M. Bosse, P. Newman, J. Leonard, and S. Teller (2004) Simultaneous localization and map building in large-scale cyclic environments using the atlas framework. The International Journal of Robotics Research 23 (12), pp. 1113–1139. Cited by: §2.
  • [8] X. Chen, A. Milioto, E. Palazzolo, P. Giguère, J. Behley, and C. Stachniss (2019-11) SuMa++: efficient lidar-based semantic slam. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4530–4537. External Links: Document Cited by: §6.1.3.
  • [9] J. P. Fentanes, B. Lacerda, T. Krajník, N. Hawes, and M. Hanheide (2015) Now or later? predicting and maximising success of navigation actions from long-term experience. In 2015 IEEE international conference on robotics and automation (ICRA), pp. 1112–1117. Cited by: §2.
  • [10] M. A. Fischler and R. C. Bolles (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (6), pp. 381–395. Cited by: §3.2, §3.2.
  • [11] D. Gálvez-López and J. D. Tardós (2012-10) Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics 28 (5), pp. 1188–1197. External Links: Document, ISSN 1552-3098 Cited by: §1, §4.1, §4.1, §5, §6.1.3.
  • [12] A. Geiger, P. Lenz, and R. Urtasun (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §6.1.1, §6.
  • [13] W. Hess, D. Kohler, H. Rapp, and D. Andor (2016) Real-time loop closure in 2d lidar slam. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1271–1278. Cited by: §2, §6.1.3.
  • [14] N. Kejriwal, S. Kumar, and T. Shibata (2016) High performance loop closure detection using bag of word pairs. Robotics and Autonomous Systems 77, pp. 55–65. External Links: ISSN 0921-8890, Document Cited by: §1.
  • [15] K. Konolige and J. Bowman (2009) Towards lifelong visual maps. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1156–1163. Cited by: §1, §2.
  • [16] H. Kretzschmar, C. Stachniss, and G. Grisetti (2011) Efficient information-theoretic graph pruning for graph-based slam with laser range finders. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 865–871. Cited by: §2.
  • [17] M. Magnusson, H. Andreasson, A. Nuchter, and A. J. Lilienthal (2009) Appearance-based loop detection from 3d laser data using the normal distributions transform. In 2009 IEEE International Conference on Robotics and Automation, pp. 23–28. Cited by: §1, §2.
  • [18] J. McDonald, M. Kaess, C. Cadena, J. Neira, and J. J. Leonard (2011-09) 6-dof multi-session visual slam using anchor nodes. In Proceedings of European Conference on Mobile Robots (ECMR ’11), pp. 69 – 76. Cited by: §2.
  • [19] R. Mur-Artal and J. D. Tardós (2017) Orb-slam2: an open-source slam system for monocular, stereo, and rgb-d cameras. IEEE Transactions on Robotics 33 (5), pp. 1255–1262. Cited by: §3.1, §6.1.3.
  • [20] A. Naima (2011) Long-term robot mapping in dynamic environments. Ph.D. Thesis, MIT. Cited by: §1.
  • [21] T. Nicosevici and R. Garcia (2012) Automatic visual bag-of-words for online robot navigation and mapping. IEEE Transactions on Robotics 28 (4), pp. 886–898. Cited by: §1.
  • [22] Y. Pan, P. Xiao, Y. He, Z. Shao, and Z. Li (2021) MULLS: versatile lidar SLAM via multi-metric linear least square. CoRR abs/2102.03771. External Links: Link Cited by: §6.1.3.
  • [23] T. Qin, J. Pan, S. Cao, and S. Shen (2019) A general optimization-based framework for local odometry estimation with multiple sensors. External Links: arXiv:1901.03638 Cited by: §6.1.1.
  • [24] T. Röhling, J. Mack, and D. Schulz (2015) A fast histogram-based similarity measure for detecting loop closures in 3-d lidar data. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 736–741. Cited by: §1.
  • [25] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski (2011) ORB: an efficient alternative to sift or surf. In 2011 International conference on computer vision, pp. 2564–2571. Cited by: §3.1, §3.2.
  • [26] A. Segal, D. Haehnel, and S. Thrun (2009) Generalized-icp.. In Robotics: science and systems, Vol. 2, pp. 435. Cited by: §6.1.1.
  • [27] C. Stachniss and W. Burgard (2005) Mobile robot mapping and localization in non-static environments. In aaai, pp. 1324–1329. Cited by: §2.
  • [28] C. Stachniss and H. Kretzschmar (2017) Pose graph compression for laser-based slam. In Robotics Research, pp. 271–287. Cited by: §2.
  • [29] B. Steder, G. Grisetti, and W. Burgard (2010) Robust place recognition for 3d range data based on point features. In 2010 IEEE International Conference on Robotics and Automation, pp. 1400–1405. Cited by: §1.
  • [30] B. Steder, M. Ruhnke, S. Grzonka, and W. Burgard (2011) Place recognition in 3d scans using a combination of bag of words and point feature based relative pose estimation. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1249–1255. Cited by: §1, §2.
  • [31] S. Thrun (2002) Probabilistic robotics. Communications of the ACM 45 (3), pp. 52–57. Cited by: §1.
  • [32] H. Wang, C. Wang, and L. Xie (2020-05) Intensity scan context: coding intensity and geometry relations for loop closure detection. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 2095–2101. External Links: Document Cited by: §6.1.3.
  • [33] D. F. Wolf and G. S. Sukhatme (2005) Mobile robot simultaneous localization and mapping in dynamic environments. Autonomous Robots 19 (1), pp. 53–65. Cited by: §2.
  • [34] B. Yamauchi and R. Beer (1996) Spatial learning for navigation in dynamic environments. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 26 (3), pp. 496–505. Cited by: §2.
  • [35] J. Zhang and S. Singh (2014) LOAM: lidar odometry and mapping in real-time.. In Robotics: Science and Systems, Vol. 2, pp. 9. Cited by: §6.1.1.