Greedy-Based Feature Selection for Efficient LiDAR SLAM

03/24/2021 ∙ by Jianhao Jiao, et al.

Modern LiDAR-SLAM (L-SLAM) systems have shown excellent results in large-scale, real-world scenarios. However, they commonly exhibit high latency due to expensive data association and nonlinear optimization. This paper demonstrates that actively selecting a subset of features significantly improves both the accuracy and the efficiency of an L-SLAM system. We formulate feature selection as a combinatorial optimization problem under a cardinality constraint to preserve the information matrix's spectral attributes. The stochastic-greedy algorithm is applied to approximate the optimal result in real time. To avoid ill-conditioned estimation, we also propose a general strategy to evaluate the environment's degeneracy and modify the feature number online. The proposed feature selector is integrated into a multi-LiDAR SLAM system. We validate this enhanced system with extensive experiments covering various scenarios on two sensor setups and computation platforms. Our approach exhibits lower localization error and a speedup compared to state-of-the-art L-SLAM systems. To benefit the community, we have released the source code: https://ram-lab.com/file/site/m-loam.


I Introduction

I-A Motivation

State estimation is a classic and fundamental problem in robotics [cadena2016past]. Over the past decades, LiDARs have attracted much attention from the simultaneous localization and mapping (SLAM) community due to their accuracy and reliability in range measurements. Recent work [zhang2014loam, shan2018lego, behley2018efficient, lin2020decentralized, liu2020balm] has pushed LiDAR-SLAM (L-SLAM) systems toward high accuracy and robustness. However, L-SLAM systems commonly exhibit high latency on the variety of on-board processors with limited computation resources. This issue becomes critical as the scale of SLAM grows, or when modules such as high-level decision making are integrated. Thus, towards real-time SLAM for diverse applications, such systems must exhibit low latency (the time delay between input and output) while preserving their accuracy and robustness.

An L-SLAM system comprises two major computational tasks: data association and optimization. Data association matches features in the current frame to those in the reference frame, while optimization solves for the pose parameters by maximizing a likelihood function given a set of constraints. Compared to visual features such as SIFT [lowe2004distinctive] and ORB [rublee2011orb], matching 3D features is known to be less accurate [yang2019polynomial], thus producing much higher outlier rates. To enforce accuracy, most L-SLAM systems exploit thousands of features to solve a large nonlinear least-squares (NLS) problem. However, this scheme presents significant drawbacks. The data association has to perform numerous nearest-neighbor queries to match correspondences, which is commonly time-consuming. Given plentiful measurements, the computational complexity of the gradient-based optimization also grows quadratically.

A prevalent solution to bound the complexity is data sampling. For instance, many LiDAR-based object detectors [qi2017pointnet, qi2017pointnet++, shi2018pointrcnn] leverage farthest point sampling (FPS) [eldar1997farthest] to process input points. However, classic sampling methods do not account for the downstream task. Taking FPS as an example, it selects a subset of points with the objective of achieving maximal coverage of the input set [lang2020samplenet]. For SLAM, however, it would be better if the point selection conformed to the optimization objective. Ideally, exploiting the set of selected features in optimization should lead to low latency and performance improvements.
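As a concrete point of reference, FPS can be sketched in a few lines: it repeatedly picks the point farthest from the already-selected set, maximizing coverage without regard to any downstream objective. The function below is an illustrative sketch (names are ours, not from any cited codebase):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedily pick k points maximizing coverage: each round selects
    the point farthest from the already-selected set."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(points)))]   # random starting point
    # distance from every point to its nearest selected point
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))                # farthest point so far
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.asarray(selected)
```

Note that nothing in this loop depends on how the points will later be used by the estimator, which is precisely the limitation discussed above.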

I-B Contributions

This paper proposes a general and straightforward feature selection algorithm for L-SLAM systems. Our approach rests on a crucial observation: not all geometric constraints contribute equally to the localization accuracy. Intuitively, well-conditioned constraints should be distributed in different directions, constraining the pose from different angles [zhang2016degeneracy]. For instance, orthogonal constraints commonly outweigh their parallel counterparts. The selected features that are the most valuable and informative for pose estimation are defined as good features [zhao2020good], and both the data association and the state optimization utilize only them.

This paper extends our previous work on multi-LiDAR SLAM [jiao2020robust]. Multiple LiDARs enable a robot to maximize its perceptual awareness of environments and obtain sufficient measurements, but inevitably increase the processing time. In this paper, we investigate the latency issue. From the traditional perspective, there is a trade-off between latency and accuracy [bodin2018slambench2]. But in Section VI, we demonstrate that the proposed feature selection method boosts both the accuracy and the efficiency of an L-SLAM system. Furthermore, by evaluating the environment's degeneracy and adaptively setting the number of good features, our method also works well in non-ideal cases. Overall, our work presents the following contributions:

  1. We transform the good feature selection in LiDAR-based pose estimation into a problem of preserving the spectral property of information matrices.

  2. We propose and integrate the feature selection method into a multi-LiDAR SLAM system. We also propose to evaluate the environment’s degeneracy and adaptively change the number of good features online.

  3. We evaluate our approach in extensive experiments with two sensor setups and computation platforms in terms of accuracy, robustness, and latency.

II Related Work

Scholarly works on SLAM are extensive. Since we focus on optimal feature selection to improve L-SLAM's efficiency, we review related works on two topics: feature extraction and feature selection.

II-A Feature Extraction

Feature extraction is the process of building an informative, compact, and interpretable representation of raw measurements [guyon2008feature]. It plays a crucial role in the front end of many L-SLAM systems to facilitate subsequent tasks. SuMa and its variants [behley2018efficient, chen2019suma++, chen2020overlapnet] convert point clouds into range images and generate surfel-based maps. In contrast, LOAM [zhang2014loam] extracts features on both edge lines and planar surfaces, and LEGO-LOAM [shan2018lego] leverages ground features to constrain poses in the vertical direction. Subsequent approaches apply visual detection [chen2020sloam] or directly use dense scanners [bosse2012zebedee, lin2020decentralized] to enhance feature extraction in noisy or structureless environments. To decrease the number of features, Zhou et al. [zhou2020roi] presented a probabilistic framework to extract important regions of interest by calculating features' densities and distributions. This solution enables the removal of dynamic objects and redundant points in busy urban environments.

All of these methods extract features based on local geometric structures, but they do not consider the explicit relationship with pose estimation when selecting the most useful features. Our system builds on LOAM's framework for feature extraction. As a complement to the above appearance-based approaches, our solution identifies a set of good features by utilizing motion information.

II-B Feature Selection

Motion-based feature selection methods have been widely applied in visual SLAM [choudhary2015information, carlone2018attention, zhao2020good, fontan2020information], and many of them are based on the covariance or information matrices that capture the uncertainties of poses [kaess2009covariance]. These methods formulate the feature selection as a submatrix selection problem and aim to find a subset of features with the objective of preserving the information matrix's spectral attributes. Recent works by Carlone et al. [carlone2018attention] and Zhao et al. [zhao2020good] have investigated greedy-based algorithms [mirzasoleiman2015lazier] to solve this NP-hard feature selection problem in polynomial time.

A common limitation of Carlone's and Zhao's works is that they assume sufficient features are available. Under this assumption, the pose optimization with a set of good features remains well-conditioned. However, robots sometimes need to work in degraded environments, such as textureless regions for cameras and narrow corridors for LiDARs. There, utilizing only a fixed-size set of good features becomes degenerate. Based on Zhao's feature selection approach, our method additionally evaluates environments' degeneracy online, which enables us to adaptively change the number of good features and avoid the risk of ill-conditioning.

III Nonlinear Least-Squares Pose Estimation

We formulate the pose estimation of an L-SLAM system as a maximum likelihood estimation (MLE) [barfoot2017state]:

$$\hat{\mathbf{x}}_k = \arg\max_{\mathbf{x}_k} p(\mathcal{F}_k \mid \mathbf{x}_k) \quad (1)$$

where $\mathcal{F}_k$ represents the available features at the $k$-th frame, $\mathbf{x}_k$ is the robot's pose to be optimized, and the likelihood $p(\mathcal{F}_k \mid \mathbf{x}_k)$ is the objective function. Assuming the measurement model to be Gaussian, problem (1) is solved as an NLS problem:

$$\hat{\mathbf{x}}_k = \arg\min_{\mathbf{x}_k} \sum_{\mathbf{p} \in \mathcal{F}_k} \rho\big(\|\mathbf{r}(\mathbf{p}, \mathbf{x}_k)\|^2_{\boldsymbol{\Sigma}}\big) \quad (2)$$

where $\rho(\cdot)$ is the robust loss [bosse2016robust] and $\boldsymbol{\Sigma}$ is the covariance matrix. Problem (2) is equivalently rewritten as [barfoot2017state]

$$\hat{\mathbf{x}}_k = \arg\min_{\mathbf{x}_k} \sum_{\mathbf{p} \in \mathcal{F}_k} \|\mathbf{r}(\mathbf{p}, \mathbf{x}_k)\|^2_{\tilde{\boldsymbol{\Sigma}}}, \qquad \tilde{\boldsymbol{\Sigma}}^{-1} = \rho'\big(\|\mathbf{r}\|^2_{\boldsymbol{\Sigma}}\big)\,\boldsymbol{\Sigma}^{-1} \quad (3)$$

where $\tilde{\boldsymbol{\Sigma}}$ is the alternative covariance matrix and $\rho'$ is the derivative of $\rho$. Problem (2) is thus simplified into an iteratively reweighted least-squares problem. Iterative methods such as Gauss–Newton or Levenberg–Marquardt can often be used to solve it. These methods locally linearize the objective function by computing the Jacobian w.r.t. $\mathbf{x}_k$ as $\mathbf{J} = \partial\mathbf{r}/\partial\mathbf{x}_k$. Given an initial guess, $\mathbf{x}_k$ is iteratively optimized by applying the update $\mathbf{x}_k \leftarrow \mathbf{x}_k \boxplus \delta\mathbf{x}$ until convergence to an optimum.

At the final iteration, the least-squares covariance of the state is calculated as $\boldsymbol{\Xi} = \boldsymbol{\Lambda}^{-1}$ [censi2007accurate], where $\boldsymbol{\Lambda} = \mathbf{J}^{\top}\tilde{\boldsymbol{\Sigma}}^{-1}\mathbf{J}$ is called the information matrix. This equation reveals the relationship between the pose covariance and the information matrix. Generally, exploiting plentiful measurements in optimization should minimize the poses' uncertainties. This explains why SLAM systems commonly use all available features.
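To make this pipeline concrete, the following toy sketch runs Gauss–Newton with Huber reweighting on a 2D translation-only alignment and recovers the state covariance from the information matrix. It is a minimal illustration of the iteratively reweighted least-squares structure, not the paper's solver (which handles full 6-DoF poses):

```python
import numpy as np

def huber_weight(r2, delta=1.0):
    # IRLS weight derived from the Huber loss: 1 inside the quadratic
    # region, delta/|r| outside (playing the role of rho' in the text)
    r = np.sqrt(r2)
    return np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))

def gauss_newton_translation(src, dst, iters=5):
    """Estimate a 2D translation t so that src + t ~ dst via
    iteratively reweighted Gauss-Newton. The Jacobian of each residual
    r_i = (src_i + t) - dst_i w.r.t. t is the identity, so the
    information matrix reduces to Lambda = sum_i w_i * I."""
    t = np.zeros(2)
    for _ in range(iters):
        r = (src + t) - dst                     # residuals, shape (N, 2)
        w = huber_weight(np.sum(r**2, axis=1))  # robust weights, shape (N,)
        lam = np.sum(w) * np.eye(2)             # information matrix J^T W J
        g = (w[:, None] * r).sum(axis=0)        # gradient J^T W r
        t -= np.linalg.solve(lam, g)            # Gauss-Newton update
    return t, np.linalg.inv(lam)                # estimate and its covariance
```

The returned covariance is the inverse of the final information matrix, mirroring the relationship stated above.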

This paper focuses on low-latency applications in which speed is highly prioritized. This requires us to utilize only a subset of features to accelerate the algorithm. As suggested in [zhao2020good], we can decide whether a feature is selected by comparing the gains in the spectrum of $\boldsymbol{\Lambda}$. The word "spectrum" denotes the set of eigenvalues of a matrix [golub2013matrix].

IV Methodology

This section first formulates the good feature selection problem and introduces a metric to guide the selection. We apply the stochastic-greedy method, which incorporates a random sampling procedure, to solve this problem in real time. We then extend this algorithm to achieve efficient feature selection for pose estimation. Finally, we propose to evaluate the environment's degeneracy online to avoid ill-conditioned estimation.

IV-A Problem Formulation

We denote $N$ as the number of all available features, $\kappa$ as the maximum number of selected features, and $\mathcal{S}$ as the good feature set. We denote $f(\cdot)$ as the metric to quantify the spectral attribute of a matrix. We formulate the feature selection problem under a cardinality constraint as

$$\max_{\mathcal{S} \subseteq \mathcal{F}_k,\ |\mathcal{S}| \le \kappa} f(\boldsymbol{\Lambda}_{\mathcal{S}}) \quad (4)$$

where $\boldsymbol{\Lambda}_{\mathcal{S}}$ is the information matrix computed on the good feature set. There are several options to define $f(\cdot)$: the trace [summers2015submodularity], the minimum eigenvalue [carlone2018attention], and the log determinant [zhao2020good].

Since problem (4) is NP-hard, we cannot find efficient algorithms to obtain the optimal subset for real-time applications. Fortunately, all these metrics are submodular and monotone increasing [mirzasoleiman2015lazier], allowing the solution to be approximated via greedy methods with a performance guarantee. Zhao et al. [zhao2020good] experimented with these metrics in bundle adjustment, where the log determinant was demonstrated to have the lowest pose error and computation time. We thus choose the log determinant as our metric.

IV-B Stochastic Greedy Algorithm

The class of greedy methods has been studied to solve problem (4). Here, we introduce the stochastic-greedy algorithm [mirzasoleiman2015lazier], which applies randomized acceleration to avoid a brute-force search. The idea is simple: at each round, the current best feature is picked by examining all elements of a random subset, instead of searching the whole set as the simple greedy approach does. We set the size of the random subset to $\frac{N}{\kappa}\log\frac{1}{\epsilon}$, where $\epsilon$ is the decay factor. The resulting time complexity is $O(N\log\frac{1}{\epsilon})$, which is independent of $\kappa$. The stochastic-greedy algorithm has been proved to have near-optimal performance in [mirzasoleiman2015lazier]:

Theorem 1: Let $f$ be a non-negative, monotone, and submodular function, and set the size of the random subset to $\frac{N}{\kappa}\log\frac{1}{\epsilon}$. Denote $\mathcal{S}^{\star}$ as the optimal set and $\mathcal{S}^{g}$ as the result found by the stochastic-greedy algorithm. Then $\mathcal{S}^{g}$ enjoys the following approximation guarantee in expectation:

$$\mathbb{E}\big[f(\boldsymbol{\Lambda}_{\mathcal{S}^{g}})\big] \ge \Big(1 - \frac{1}{e} - \epsilon\Big)\, f(\boldsymbol{\Lambda}_{\mathcal{S}^{\star}}) \quad (5)$$
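A minimal sketch of this sampling-accelerated greedy loop under the log-determinant metric, assuming each feature contributes an information increment $\mathbf{J}_i^{\top}\mathbf{J}_i$; the function names, the small regularizer, and the unit-weight Jacobian products are our illustrative choices:

```python
import numpy as np

def logdet(mat, prior=1e-6):
    # log-determinant metric f; a small prior keeps the matrix invertible
    _, val = np.linalg.slogdet(mat + prior * np.eye(mat.shape[0]))
    return val

def stochastic_greedy(jacobians, kappa, eps=0.1, seed=0):
    """Select kappa features whose stacked information matrix
    Lambda_S = sum_i J_i^T J_i approximately maximizes logdet(Lambda_S),
    scanning only a random subset of ~ (N/kappa) * log(1/eps) candidates
    per round instead of the whole remaining set."""
    rng = np.random.default_rng(seed)
    n, dim = len(jacobians), jacobians[0].shape[1]
    subset_size = max(1, int(np.ceil(n / kappa * np.log(1.0 / eps))))
    remaining = set(range(n))
    lam = np.zeros((dim, dim))
    selected = []
    while len(selected) < kappa and remaining:
        cand = rng.choice(list(remaining),
                          size=min(subset_size, len(remaining)),
                          replace=False)
        # pick the candidate with the largest marginal logdet gain
        best = int(max(cand, key=lambda i: logdet(lam + jacobians[i].T @ jacobians[i])))
        lam = lam + jacobians[best].T @ jacobians[best]
        selected.append(best)
        remaining.discard(best)
    return selected, lam
```

On a toy set dominated by one constraint direction, the selector prefers the rare orthogonal constraints, which is exactly the behavior that motivates good features.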

IV-C Good Feature Selection for Pose Estimation

Input: $f$, $\kappa$, $\epsilon$, the map $\mathcal{M}$, the feature set $\mathcal{F}$, the initial pose $\hat{\mathbf{x}}$;
Output: good feature set $\mathcal{S}$;
1       Initialize $\mathcal{S} \leftarrow \emptyset$ and $\boldsymbol{\Lambda}_{\mathcal{S}} \leftarrow \mathbf{0}$;
2       while $|\mathcal{S}| < \kappa$ and the time budget is not exhausted do
3             Sample a random subset $\mathcal{R}$ of $\frac{N}{\kappa}\log\frac{1}{\epsilon}$ elements from $\mathcal{F}$;
4             foreach $\mathbf{p} \in \mathcal{R}$ do
5                   Search the correspondence of $\mathbf{p}$ from $\mathcal{M}$;
6                   if the correspondence is found then
7                         Compute the residual $\mathbf{r}(\mathbf{p}, \hat{\mathbf{x}})$;
8                         Compute $\Delta\boldsymbol{\Lambda}_{\mathbf{p}} = \mathbf{J}_{\mathbf{p}}^{\top}\tilde{\boldsymbol{\Sigma}}^{-1}\mathbf{J}_{\mathbf{p}}$ w.r.t. $\hat{\mathbf{x}}$, where $\mathbf{J}_{\mathbf{p}}$ is the Jacobian of $\mathbf{r}$;
9                   else
10                        Remove $\mathbf{p}$ from $\mathcal{F}$;
11            Select $\mathbf{p}^{\star} = \arg\max_{\mathbf{p} \in \mathcal{R}} f(\boldsymbol{\Lambda}_{\mathcal{S}} + \Delta\boldsymbol{\Lambda}_{\mathbf{p}})$;
12            $\boldsymbol{\Lambda}_{\mathcal{S}} \leftarrow \boldsymbol{\Lambda}_{\mathcal{S}} + \Delta\boldsymbol{\Lambda}_{\mathbf{p}^{\star}}$;
13            $\mathcal{S} \leftarrow \mathcal{S} \cup \{\mathbf{p}^{\star}\}$;   $\mathcal{F} \leftarrow \mathcal{F} \setminus \{\mathbf{p}^{\star}\}$;
Algorithm 1 Stochastic Greedy-Based Good Feature Selection for NLS Pose Estimation

Based on these theoretical results, we employ the stochastic-greedy algorithm to achieve feature selection for pose estimation. The detailed pipeline is summarized in Algorithm 1.

  1. Overview: The algorithm takes as input the metric $f$, the number of good features $\kappa$, the decay factor $\epsilon$, the map in the reference frame $\mathcal{M}$, the set of all available features in the current frame $\mathcal{F}$, and the initial pose $\hat{\mathbf{x}}$. It produces the good feature set $\mathcal{S}$.

  2. Loop condition: The loop is exited if one of the following conditions is satisfied: $\kappa$ good features are found, or the computation time exceeds the budget. The second condition limits the cost of finding good features. Since $f$ is submodular with diminishing returns, early termination does not induce much information loss.

  3. Correspondence search: The correspondence of each feature in the random subset is searched from $\mathcal{M}$. The residual is then computed. If a feature has already been visited in a previous iteration, we skip these steps.

  4. Feature selection: The feature that leads to the maximum enhancement of the objective is selected. After that, the information matrix, the good feature set, and the remaining feature set are updated.

Furthermore, the process of feature selection implicitly performs outlier rejection: outliers are penalized by the robust loss in (3) with relatively small weights. They contribute less to the information matrix than standard features and will be selected with low probability. Therefore, selecting good features may reduce the bias between estimates and the ground truth.

IV-D Setting the Number of Good Features

Setting a proper size for the good feature set is essential to the system. Previous methods manually set the size as a constant value [zhao2020good] or a fixed ratio of all features [carlone2018attention]. These schemes are feasible if sufficient features are always available. But if a robot has to work in non-ideal scenarios, such as textureless walls or narrow corridors, utilizing a small set of features is not reliable. On the other hand, changing the hard-coded number for each specific situation inevitably increases the cost of deploying and maintaining a SLAM system on real platforms.

It would be better if $\kappa$ were adapted by evaluating the degeneracy online. Inspired by [zhang2016degeneracy], the magnitude of the degeneracy can be quantified by a factor; differently from that work, we define the factor using the log determinant metric. Computing the information matrix on the full feature set is time-consuming. Since the robot moves continuously through an environment, it suffices to compute the factor at a fixed time interval.

Fig. 1 plots the values of the degeneracy factor on different sequences. RHD01lab contains several degenerate scenarios, while the other sequences do not (see Section VI-C). We therefore set a threshold empirically. If the factor falls below the threshold, we select a large proportion of the full set as the good features; otherwise, a small proportion suffices.
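The adaptive scheme can be sketched as follows; the degeneracy factor is assumed here to be the log-determinant of the full-set information matrix, and the threshold and selection ratios are illustrative placeholders rather than the paper's tuned constants:

```python
import numpy as np

def degeneracy_factor(lam_full):
    # assumed definition: log-determinant of the information matrix
    # computed on the full feature set (exact formula elided in the text)
    sign, val = np.linalg.slogdet(lam_full)
    return val if sign > 0 else -np.inf

def feature_budget(lam_full, n_features, d_thresh=30.0,
                   ratio_degenerate=0.8, ratio_normal=0.2):
    """Return kappa: keep a large fraction of the features when the
    factor signals degeneracy, and a small fraction otherwise.
    All numeric constants here are hypothetical."""
    d = degeneracy_factor(lam_full)
    ratio = ratio_degenerate if d < d_thresh else ratio_normal
    return max(1, int(ratio * n_features))
```

A well-conditioned scene (large factor) thus yields a small budget, while a near-degenerate scene keeps most of the available features in play.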

Fig. 1: The value of the degeneracy factor on different sequences.

V GF-Enhanced Multi-LiDAR SLAM System

(a) Full feature set.
(b) Selected good features.
Fig. 2: A qualitative example of good features selected by our greedy-based feature selection algorithm. The method picks out points on objects that provide strong geometric constraints, as indicated in regions 1 and 3. Points on the ground, which occupy a large portion of the full feature set, are indicated by region 2. Since they mainly constrain poses in the vertical direction, our method selects only a small number of them. This is the main difference from the fully random sampling method (see Section VI-B).

The proposed feature selection method has been verified in an L-SLAM system called M-LOAM [jiao2020robust]. To distinguish it from the original system, the enhanced system is denoted by M-LOAM-gf. M-LOAM-gf solves SLAM with multiple LiDARs by two algorithms: odometry and mapping. Generally, these algorithms are designed to estimate poses in a coarse-to-fine fashion. Since they similarly formulate the NLS problem for pose estimation, the feature selection can be applied to both. Fig. 3 illustrates the overall structure of M-LOAM-gf. Note that loop closure is not included.

We now give concrete definitions of the feature set. We extract features located on local edges and planar surfaces from the LiDARs' raw measurements. According to the points' curvatures, we select a set of edge and planar features to form $\mathcal{F}$. The next step is to match features between the reference frame and the robot's current frame. In both odometry and mapping, we use the feature map in the reference frame to associate data with $\mathcal{F}$. The only difference is the scale: the local map in odometry is built within a short time window, while the global map in mapping is constructed using all features in keyframes. For convenience, we use $\mathcal{M}$ to denote both the local and global maps.
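The curvature-based extraction step can be illustrated with a simplified LOAM-style smoothness computation: points whose neighborhood difference vectors fail to cancel are labeled edges, and the rest planar. The window size and threshold below are illustrative assumptions, not the system's tuned values:

```python
import numpy as np

def curvature(scan, i, half_window=5):
    """LOAM-style local smoothness of point i within a scan line: the
    norm of the summed difference vectors to its neighbors, normalized
    by neighborhood size and range. Near zero on planes, larger at edges."""
    lo, hi = max(0, i - half_window), min(len(scan), i + half_window + 1)
    neigh = np.vstack([scan[lo:i], scan[i + 1:hi]])
    diff = (neigh - scan[i]).sum(axis=0)
    return np.linalg.norm(diff) / (len(neigh) * np.linalg.norm(scan[i]))

def classify_features(scan, edge_thresh=0.02, half_window=5):
    # split point indices into edge (high curvature) and planar (low)
    edges, planars = [], []
    for i in range(half_window, len(scan) - half_window):
        c = curvature(scan, i, half_window)
        (edges if c > edge_thresh else planars).append(i)
    return edges, planars
```

On a synthetic L-shaped scan line, the corner vertex scores high (difference vectors from the two legs do not cancel) while mid-line points score near zero.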

With the found correspondences, we can optimize the relative transformation by minimizing the sum of all errors. The good feature selection algorithm enables M-LOAM-gf to exploit only a subset of features in optimization while preserving the spectrum of the information matrix. An example of good features is shown in Fig. 2. After obtaining the good feature set $\mathcal{S}$ in Section IV-C, the objective function is written as

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \sum_{\mathbf{p} \in \mathcal{S}} \|\mathbf{r}(\mathbf{p}, \mathbf{x})\|^2_{\tilde{\boldsymbol{\Sigma}}} \quad (6)$$

for which the expressions of the residuals and Jacobians are detailed in our supplementary material [jiao2020supplementary].

Fig. 3: Block diagram of the pipeline of M-LOAM-gf.

VI Experiment

Sequence Greedy Rnd Full
RHD01lab
RHD02garden
OR01
OR02
OR03
OR04
OR05
Average
TABLE I: Mean and standard deviation of the log-determinant objective on the RHD and OR sequences.
(a) The real handheld device (RHD).
(b) Results on the RHD01lab.
(c) Results on the RHD02garden.
Fig. 4: Estimated trajectories of different methods and the scene image on two RHD sequences.
Fig. 5: M-LOAM-gf’s trajectories on OR01, OR03, OR04, and OR05 from the Oxford Robocar dataset [barnes2019oxford] are aligned with the ground truth .
Fig. 6: Results on OR02. (left) Map and M-LOAM-gf's path. (right) Estimated trajectories of different methods, aligned with the ground truth.

We evaluate the performance of M-LOAM-gf in real-world experiments. First, we validate the stochastic-greedy-based feature selection. Second, we demonstrate the localization accuracy of M-LOAM-gf in various scenarios covering indoor environments and outdoor urban roads with two multi-LiDAR setups; three SOTA L-SLAM systems are compared. We also study M-LOAM-gf's latency on an on-board processor with limited computation resources.

VI-A Implementation Details

We use the PCL library [rusu20113d] to process point clouds and the Ceres Solver [agarwal2015others] to solve the NLS problems. Our method is tested on sequences collected with two platforms:

  • Real Handheld Device (RHD) is made for indoor tests and shown in Fig. 4(a). It carries two VLP-16 LiDARs (https://velodynelidar.com/products/puck). We held this device to collect two sequences, RHD01lab and RHD02garden.

  • Oxford RobotCar (OR) [barnes2019oxford] is a vehicle equipped with two HDL-32E LiDARs (https://velodynelidar.com/products/hdl-32e). Datasets were recorded by driving the car along repeated traversals of a route on urban roads. Ground-truth poses are available. We select one long sequence and split it into sequences named OR01–OR05 for evaluation.

VI-B Validation on Good Feature Selection

This section validates that the greedy algorithm selects a set of valuable features with a large objective value. Our stochastic-greedy method (label: greedy) is compared with the fully randomized selection method (label: rnd). Table I reports the means and standard deviations (std) of the objective; the values with the full feature set (label: full) are provided for reference. On OR01 and OR02, the greedy method attains a smaller objective than the rnd method. This is reasonable since the greedy algorithm cannot always achieve the best performance, as bounded by (5). Considering the std and the larger means on most sequences, we conclude that the greedy method outperforms the rnd method.

Seq. Dimension M-LOAM-gf M-LOAM-rnd M-LOAM-full A-LOAM F-LOAM LEGO-LOAM

OR01
OR02
OR03
OR04
OR05
Average
TABLE II: Translational ATE [zhang2018tutorial] on the OR sequences (the two best results are marked in bold).

VI-C Performance of SLAM

We compare the accuracy, robustness, and latency of M-LOAM-gf with several baseline methods, including M-LOAM-rnd, M-LOAM-full, A-LOAM, F-LOAM, and LEGO-LOAM.

The odometry and mapping of all methods run at fixed frequencies. For a fair comparison, the loop closure modules in some baselines are deactivated. The edge and planar features are downsampled with voxel filters [rusu20113d] of fixed resolutions.

VI-C1 Qualitative Comparison

We first test our method on the RHD sequences. RHD01lab is recorded by moving around an office area, in which several scenes provide only poor geometric constraints. Fig. 4(b) shows two examples. Scene 1 is a long and narrow corridor, a typical degenerate environment [ye2019tightly]. Scene 2 is an indoor office, providing well-conditioned constraints. M-LOAM-gf successfully tracks the robot poses thanks to its capability of evaluating the environment's degeneracy. M-LOAM-rnd has a sudden drift in scene 1, since using only a small fraction of features cannot constrain the poses. A-LOAM also fails because it cannot model the uncertainty in mapping, as detailed in [jiao2020robust]. RHD02garden is collected in a garden. Estimated trajectories and scene images are shown in Fig. 4(c). Since the environment is well-conditioned, all trajectories are comparable.

VI-C2 Localization Accuracy

We then perform a large-scale outdoor test on the OR sequences. Environments in the OR sequences commonly provide sufficient features. We visualize M-LOAM-gf's trajectory against the ground truth and the built map on OR02 in Fig. 6, and plot the trajectories on the other sequences in Fig. 5. Each method is evaluated with the absolute trajectory error (ATE) and the relative pose error (RPE) [zhang2018tutorial]. Due to limited space, we report the translational ATE in Table II and show complete results in our supplementary material [jiao2020supplementary]. M-LOAM-gf does not just preserve the accuracy of M-LOAM-full; it also reduces the ATE on all sequences. The average translational ATE of M-LOAM-gf is lower than that of F-LOAM (the second-best method). The feature selection implicitly rejects outliers (see Section IV-C), which is essential to such accuracy gains. Thus, M-LOAM-rnd also improves on M-LOAM-full, but its drift is larger than that of M-LOAM-gf. The performance of the fully randomized operation in M-LOAM-rnd is not guaranteed, which occasionally leads to inconsistent results.

VI-C3 Latency

Experiments in the above sections are conducted on a desktop with an i7 CPU@4.2 GHz and 32 GB of memory, on which we measure the average mapping latency of M-LOAM-gf over the RHD and OR sequences. To demonstrate that our feature selection method boosts an L-SLAM system on processors with limited resources, M-LOAM-gf is also tested on an Intel NUC (zh.wikipedia.org/wiki/Next_Unit_of_Computing) with an i7 CPU@3.1 GHz and 8 GB of memory. The average latency is reported in Table III. We run the rosbag at a low frequency to ensure no data loss.

First of all, we observe that M-LOAM-rnd has lower latency than M-LOAM-gf in the GF-based data association. This is expected, since fully random selection requires no objective evaluation, while the stochastic-greedy algorithm must score candidates over random subsets. Second, compared with M-LOAM-full, M-LOAM-gf may need more time for feature matching but saves significant time in nonlinear optimization. Finally, M-LOAM-gf, M-LOAM-rnd, and LEGO-LOAM run in real time on the Intel NUC. Both M-LOAM-gf and M-LOAM-rnd outperform LEGO-LOAM in terms of accuracy. LEGO-LOAM implicitly performs feature selection, since it filters out points whose distances to their correspondences exceed a threshold. But this naive, hard-coded solution leads to a large accuracy loss.

Seq. Method Mapping
Data association Optimization Total
RHD M-LOAM-gf
M-LOAM-rnd
M-LOAM-full
A-LOAM
OR M-LOAM-gf
M-LOAM-rnd
M-LOAM-full
A-LOAM
F-LOAM
LEGO-LOAM
Latency: Time delay between the input and output of a function.
TABLE III: Average latency [ms] of mapping on an Intel NUC.

VII Conclusion

In this paper, we propose a greedy-based feature selection method for NLS pose estimation using LiDARs. The feature selector retains the most valuable LiDAR features with the objective of preserving the information matrix's spectrum. The stochastic-greedy algorithm is applied for real-time selection. Moreover, we investigate the degeneracy issue of utilizing good features for pose estimation in structureless environments, and propose a strategy to adaptively change the number of good features to avoid ill-conditioned estimation. The feature selection is integrated into a multi-LiDAR SLAM system and evaluated on sequences with two sensor setups and computation platforms. The enhanced system is shown to achieve higher efficiency and localization accuracy than SOTA methods. The idea of feature selection is general and can be applied to many NLS problems.

Future work will explore two directions. The first is to utilize data-driven methods [wong2020data] to tune online the parameters that were manually set. The other is to apply the proposed feature selection to other tasks, such as bundle adjustment [zhao2020goodgraph] and cross-modal localization [huang2020gmmloc].

References