State estimation is a classic and fundamental problem in robotics [cadena2016past]. Over the past decades, LiDARs have attracted much attention from the simultaneous localization and mapping (SLAM) community due to their accuracy and reliability in range measurements. Recent work [zhang2014loam, shan2018lego, behley2018efficient, lin2020decentralized, liu2020balm] has pushed LiDAR-SLAM (L-SLAM) systems that are accurate and robust. However, L-SLAM systems commonly exhibit high latency on a variety of on-board processors with limited computation resources. This issue becomes critical as the scale of SLAM grows, or when modules such as high-level decision making are integrated. Thus, towards real-time SLAM for diverse applications, such systems must exhibit low latency (time delay between input and output) while preserving their accuracy and robustness.
L-SLAM comprises two major computational tasks: data association and optimization. Data association matches features between the current frame and the reference frames, while optimization solves for the pose parameters by maximizing a likelihood function given a set of constraints. Compared to visual features such as SIFT [lowe2004distinctive] and ORB [rublee2011orb], matching 3D features is known to be less accurate [yang2019polynomial], thus producing much higher outlier rates. To enforce accuracy, most L-SLAM systems exploit thousands of features to solve a large nonlinear least-squares (NLS) problem. However, this scheme presents significant drawbacks. The data association has to perform numerous nearest-neighbor queries to match correspondences, which is time-consuming. Given plentiful measurements, the computational complexity of gradient-based optimization also grows quadratically.
A prevalent solution to bound the complexity is to perform data sampling. For instance, many LiDAR-based object detectors [qi2017pointnet, qi2017pointnet++, shi2018pointrcnn] leverage farthest point sampling (FPS) [eldar1997farthest] to process input points. However, classic sampling methods do not consider the downstream task. Taking FPS as an example, it selects a subset of points with the objective of achieving maximal coverage of the input set [lang2020samplenet]. But for SLAM, it would be better if the point selection process conformed to the optimization objective. Ideally, exploiting the set of selected features in optimization should lead to both low latency and performance improvements.
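For concreteness, the coverage-maximizing behavior of FPS can be sketched in a few lines (a minimal NumPy illustration; the function name and the choice of starting index are ours, not taken from the cited detectors):

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedily pick k points that maximize coverage of the input set.

    points: (N, 3) array of coordinates; k: number of samples.
    Returns the indices of the selected points.
    """
    selected = [0]  # start from an arbitrary point
    # distance from every point to its nearest already-selected point
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))       # point farthest from the current set
        selected.append(idx)
        new_d = np.linalg.norm(points - points[idx], axis=1)
        dists = np.minimum(dists, new_d)  # refresh nearest-selected distances
    return np.array(selected)
```

Note that the objective here is purely geometric coverage: nothing in the loop knows about the pose-estimation objective, which is exactly the gap our method targets.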
This paper proposes a general and straightforward feature selection algorithm for L-SLAM systems. A crucial observation underlies our approach: not all geometric constraints contribute equally to the localization accuracy. Intuitively, well-conditioned constraints should be distributed in different directions, constraining the pose from different angles [zhang2016degeneracy]. For instance, orthogonal constraints commonly outweigh their parallel counterparts. The selected features, which are the most valuable/informative for pose estimation, are defined as good features [zhao2020good], and only they are used in both data association and state optimization.
This paper extends our previous work on multi-LiDAR SLAM [jiao2020robust]. Multiple LiDARs enable a robot to maximize its perceptual awareness of the environment and obtain sufficient measurements, but they inevitably increase the processing time. In this paper, we investigate this latency issue. From the traditional perspective, there is a trade-off between latency and accuracy [bodin2018slambench2]. In Section VI, however, we demonstrate that the proposed feature selection method boosts both the accuracy and the efficiency of an L-SLAM system. Furthermore, by evaluating the environment's degeneracy and adaptively setting the number of good features, our method also works well in non-ideal cases. Overall, our work presents the following contributions:
We transform the good feature selection in LiDAR-based pose estimation into a problem of preserving the spectral property of information matrices.
We propose and integrate the feature selection method into a multi-LiDAR SLAM system. We also propose to evaluate the environment’s degeneracy and adaptively change the number of good features online.
We evaluate our approach in extensive experiments with two sensor setups and computation platforms in terms of accuracy, robustness, and latency.
II Related Work
Scholarly works on SLAM are extensive. Since we focus on the optimal feature selection to improve L-SLAM’s efficiency, we review related works on two research topics: feature extraction and selection.
II-A Feature Extraction
Feature extraction is the process of building an informative, compact, and interpretable representation of raw measurements [guyon2008feature]. It plays a crucial role in the front end of many L-SLAM systems to facilitate subsequent tasks. SuMa and its variants [behley2018efficient, chen2019suma++, chen2020overlapnet] convert point clouds into range images and generate surfel-based maps. In contrast, LOAM [zhang2014loam] extracts features on both edge lines and planar surfaces, and LEGO-LOAM [shan2018lego] leverages ground features to constrain poses in the vertical direction. Subsequent approaches apply visual detection [chen2020sloam] or directly use dense scanners [bosse2012zebedee, lin2020decentralized] to enhance feature extraction in noisy or structureless environments. To decrease the number of features, Zhou et al. [zhou2020roi] presented a probabilistic framework that extracts important regions of interest by calculating features' densities and distributions. This solution enables the removal of dynamic objects and redundant points in busy urban environments.
All of these methods extract features depending on local geometric structures, but they do not consider the explicit relationship between features and pose estimation when selecting the most useful ones. Our system is built on LOAM's framework for feature extraction. As a complement to the above appearance-based approaches, our solution identifies a set of good features by utilizing motion information.
II-B Feature Selection
Motion-based feature selection methods have been widely applied in visual SLAM [choudhary2015information, carlone2018attention, zhao2020good, fontan2020information], and many of them are based on the covariance or information matrices that capture the uncertainties of poses [kaess2009covariance]. These methods formulate feature selection as a submatrix selection problem and aim to find a subset of features with the objective of preserving the information matrix's spectral attributes. Recent works by Carlone et al. [carlone2018attention] and Zhao et al. [zhao2020good] have investigated greedy algorithms [mirzasoleiman2015lazier] to solve this NP-hard feature selection problem in polynomial time.
A common limitation of Carlone's and Zhao's works is that they assume sufficient features to be available. Under this assumption, the pose optimization with a set of good features remains well-conditioned. However, robots sometimes need to work in degraded environments such as textureless regions for cameras and narrow corridors for LiDARs. There, utilizing only a fixed-size set of good features can become degenerate. Building on Zhao's feature selection approach, our method additionally evaluates the environment's degeneracy online, which enables us to adaptively change the number of good features and avoid the risk of ill conditioning.
III Nonlinear Least-Squares Pose Estimation
We formulate the pose estimation of an L-SLAM system as a maximum likelihood estimation (MLE) problem [barfoot2017state]:
where represents the available features at the frame, is the robot’s pose to be optimized, and is the objective function. Assuming the measurement model to be Gaussian, problem (1) is solved as an NLS problem:
where is the robust loss [bosse2016robust] and is the covariance matrix. Problem (2) is equivalently rewritten as [barfoot2017state]
where is the alternative covariance matrix and is the derivative of . (2) is thereby simplified into an iteratively reweighted least-squares problem. Iterative methods such as Gauss-Newton or Levenberg-Marquardt can often be used to solve this problem. These methods locally linearize the objective function by computing the Jacobian w.r.t. as . Given an initial guess, is iteratively optimized using until convergence to an optimum.
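As an illustration of this iterative scheme, the following sketch estimates a pure 3D translation by Gauss-Newton with Huber reweighting. The toy problem, the function name, and the loss width are simplifying assumptions for illustration, not the paper's full pose parameterization:

```python
import numpy as np

def gauss_newton_translation(src, dst, iters=10, delta=1.0):
    """Estimate a 3D translation aligning src to dst by iteratively
    reweighted least squares with a Huber robust loss of width `delta`."""
    t = np.zeros(3)
    H = np.eye(3)
    for _ in range(iters):
        r = src + t - dst                      # residuals, shape (N, 3)
        norms = np.linalg.norm(r, axis=1)
        # Huber weights: 1 inside the inlier region, delta/|r| outside
        w = np.where(norms <= delta, 1.0, delta / np.maximum(norms, 1e-12))
        # The Jacobian of each residual w.r.t. t is the identity, so the
        # normal equations reduce to a weighted mean of the residuals.
        H = np.sum(w) * np.eye(3)              # Gauss-Newton matrix J^T W J
        g = (w[:, None] * r).sum(axis=0)       # gradient J^T W r
        t -= np.linalg.solve(H, g)
    return t, H
```

At convergence, the returned `H` is the Gauss-Newton approximation of the information matrix, i.e., the quantity whose spectrum the rest of the paper reasons about.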
At the final iteration, the least-squares covariance of the state is calculated as [censi2007accurate], where is called the information matrix. This equation reveals the relationship between the pose covariance and information matrix. Generally, exploiting plentiful measurements in optimization should minimize poses’ uncertainties. This explains why SLAM systems commonly use all available features.
This paper focuses on low-latency applications in which speed is highly prioritized. This requires us to utilize only a subset of features to accelerate the algorithm. As suggested in [zhao2020good], we can decide whether a feature is selected by comparing the gains in the spectrum of . The word "spectrum" denotes the set of eigenvalues of a matrix [golub2013matrix].
This section first formulates the good feature selection problem and introduces a metric to guide the selection. We apply the stochastic-greedy method, which incorporates a random sampling procedure, to solve this problem in real time. We then extend this algorithm to achieve efficient feature selection. Finally, we propose to evaluate the environment's degeneracy online to avoid ill-conditioned estimation.
IV-A Problem Formulation
We denote as the number of all available features, as the maximum number of selected features, and as the good feature set. We denote as the metric that quantifies the spectral attribute of a matrix. We formulate the feature selection problem under a cardinality constraint as
where is the information matrix on the good feature set. There are several options to define : the trace [summers2015submodularity], the minimum eigenvalue [carlone2018attention], and the log determinant [zhao2020good].
Since problem (4) is NP-hard, we cannot find efficient algorithms to obtain the optimal subset for real-time applications. Fortunately, all these metrics are submodular and monotone increasing [mirzasoleiman2015lazier], allowing the solution to be approximated via greedy methods with a performance guarantee. Zhao et al. [zhao2020good] experimented with these metrics in bundle adjustment, where the log determinant was demonstrated to have the lowest pose error and computation time. We thus choose the log determinant as our metric.
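For concreteness, all three candidate metrics can be computed from the eigenvalues of an information matrix (a small sketch; the function name is ours):

```python
import numpy as np

def spectral_metrics(H):
    """Candidate spectral metrics f(H) on an information matrix H
    (symmetric positive semi-definite): trace, minimum eigenvalue,
    and log determinant."""
    eig = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    return {"trace": float(np.sum(eig)),
            "min_eig": float(eig[0]),
            "logdet": float(np.sum(np.log(eig)))}
```

The log determinant sums the logs of all eigenvalues, so it rewards a balanced spectrum rather than a single dominant direction, which is consistent with the intuition that well-conditioned constraints should span different directions.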
IV-B Stochastic Greedy Algorithm
The class of greedy methods has been studied to solve problem (4). Here, we introduce the stochastic-greedy algorithm [mirzasoleiman2015lazier], which applies randomized acceleration to avoid brute-force search. The idea is simple: at each round, the current best feature is picked by examining all elements of a random subset. This differs from the simple greedy approach, which has to search the whole set. We define the size of the random subset as , where is the decay factor. The time complexity is , which is independent of . The stochastic-greedy algorithm has been proven to have near-optimal performance in [mirzasoleiman2015lazier]:
Theorem 1: Let be a non-negative, monotone, and submodular function, and set the size of the random subset as . Denote as the optimal set and as the result found by the stochastic-greedy algorithm. Then enjoys the following approximation guarantee in expectation:
IV-C Good Feature Selection for Pose Estimation
Based on these theoretical results, we employ the stochastic-greedy algorithm to perform feature selection for pose estimation. The detailed pipeline is summarized in Algorithm 1.
Overview: The algorithm starts with the metric, the number of good features , the decay factor , the map in the reference frame , the set of all available features in the current frame , and the initial pose . It produces the good feature set .
Line : The loop is exited if one of the following conditions is satisfied: good features are found or the computation time exceeds . The second condition limits the cost of finding good features. Since is submodular with diminishing returns, early termination does not induce much information loss.
Lines –: The correspondence of each feature in the random subset is found from . The residual is then computed. If a feature has already been visited at previous iterations, we skip these steps.
Lines –: The feature which leads to the maximum enhancement of the objective is selected. After that, the information matrix , , and are updated.
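The core selection loop above can be sketched as follows. This is a simplified stand-in, not the full Algorithm 1: per-feature information blocks H_i = J_i^T W_i J_i are assumed precomputed, and correspondence search, the time budget, and the skip-visited bookkeeping are omitted; the function name, the subset-size formula, and the small prior added for numerical stability are our illustrative assumptions:

```python
import numpy as np

def stochastic_greedy_logdet(infos, k, eps=0.1, seed=0):
    """Select k feature information blocks (each a d x d PSD matrix)
    that greedily maximize the logdet of their sum, scanning only a
    random subset per round as in the lazier-than-lazy greedy."""
    rng = np.random.default_rng(seed)
    n, d = len(infos), infos[0].shape[0]
    m = int(np.ceil(n / k * np.log(1.0 / eps)))   # random-subset size
    H = 1e-6 * np.eye(d)                          # prior keeps logdet finite
    remaining, selected = set(range(n)), []
    for _ in range(k):
        cand = rng.choice(sorted(remaining),
                          size=min(m, len(remaining)), replace=False)
        # pick the candidate with the largest marginal logdet gain
        gains = [np.linalg.slogdet(H + infos[i])[1] for i in cand]
        best = int(cand[int(np.argmax(gains))])
        H += infos[best]
        selected.append(best)
        remaining.remove(best)
    return selected, H
```

Because the logdet gain of a direction that is already well constrained diminishes quickly, the loop naturally prefers features that constrain the pose from new directions, matching the intuition in Section I.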
Furthermore, the process of feature selection implicitly performs outlier rejection: outliers are penalized by the robust loss in (3) with relatively small weights. They contribute less to than standard features and are thus selected with low probability. Therefore, selecting good features can reduce the bias between estimates and the ground truth.
IV-D Setting the Number of Good Features
Setting a proper size for the good feature set is essential to the system. Previous methods manually set as a constant value ([zhao2020good]) or a fixed ratio of all features ([carlone2018attention]). These schemes are feasible if sufficient features are always available. But if a robot has to work in non-ideal scenarios such as textureless walls or narrow corridors, utilizing a small set of features is not reliable. On the other hand, if we retune the hard-coded number for each specific situation, it will inevitably increase the cost of deploying and maintaining a SLAM system on real platforms.
It would be better if were adaptively adjusted by evaluating the degeneracy online. Inspired by [zhang2016degeneracy], the magnitude of the degeneracy can be quantified by a factor . In contrast, we define the factor using the log determinant metric as . Computing the information matrix on the full feature set is time-consuming. Since the robot moves continuously through an environment, it is sufficient to compute at every time interval ().
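The adapt-the-budget idea can be illustrated as follows. Since the exact degeneracy-factor expression is not reproduced here, the logdet-floor criterion below is a hypothetical stand-in for it, and all names and thresholds are our own:

```python
import numpy as np

def adaptive_feature_count(infos, k_min, k_max, logdet_floor):
    """Grow the good-feature budget until the accumulated information
    matrix is well conditioned (hypothetical criterion: logdet above a
    floor). Illustrates the adapt-k-online idea only, not the paper's
    exact degeneracy factor."""
    d = infos[0].shape[0]
    H = 1e-6 * np.eye(d)     # small prior keeps the logdet finite
    k = 0
    for H_i in infos:        # assume infos are already greedily ordered
        H += H_i
        k += 1
        if k >= k_min and np.linalg.slogdet(H)[1] >= logdet_floor:
            break            # enough information gathered: stop early
        if k >= k_max:
            break            # hard cap on the budget
    return k, H
```

In a well-conditioned scene the loop stops early with a small budget, while in a degenerate scene it keeps adding features up to the cap, which mirrors the behavior described above.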
V GF-Enhanced Multi-LiDAR SLAM System
The proposed feature selection method has been verified in an L-SLAM system called M-LOAM [jiao2020robust]. To distinguish it from the original system, the enhanced system is denoted by M-LOAM-gf. M-LOAM-gf solves SLAM with multiple LiDARs via two algorithms: odometry and mapping. Generally, these algorithms estimate poses in a coarse-to-fine fashion. Since both formulate a similar NLS problem for pose estimation, the feature selection can be applied to each. Fig. 3 illustrates the overall structure of M-LOAM-gf. Note that loop closure is not included.
We now give concrete definitions of the feature set . We extract features located on local edges and planar surfaces from the LiDARs' raw measurements. According to the points' curvatures, we select a set of edge and planar features to form . The next step is to match features between the reference frame and the robot's current frame. In both odometry and mapping, we use the feature map in the reference frame for data association with . The only difference is the scale: the local map in odometry is built within a small time interval (), while the global map in mapping is constructed from all features in keyframes. For convenience, we use to denote both the local and global maps.
With the found correspondences, we can optimize the relative transformation by minimizing the sum of all errors. The good feature selection algorithm enables M-LOAM-gf to use only a subset of features in optimization while preserving the spectrum of the information matrix. An example of good features is shown in Fig. 2. After obtaining the good feature set in Section IV-C, the objective function is written as
for which the expression of the residuals and Jacobians is detailed in our supplementary material [jiao2020supplementary].
We evaluate the performance of M-LOAM-gf in real-world experiments. First, we validate the stochastic-greedy-based feature selection. Second, we demonstrate the localization accuracy of M-LOAM-gf in various scenarios covering indoor environments and outdoor urban roads with two multi-LiDAR setups, comparing against three SOTA L-SLAM systems. We also study M-LOAM-gf's latency on an on-board processor with limited computation resources.
VI-A Implementation Details
We use the PCL library [rusu20113d] to process point clouds and the Ceres Solver [agarwal2015others] to solve the NLS problems. Our method is tested on sequences collected with two platforms:
Oxford RobotCar (OR) [barnes2019oxford] is a vehicle equipped with two HDL-32E LiDARs (https://velodynelidar.com/products/hdl-32e). Datasets were recorded by driving the car on urban roads at an average speed of . Repeated traversals of a route were collected. Ground-truth poses in are available. We select one sequence lasting minutes and split it into sequences named OR01–OR05 for evaluation.
VI-B Validation on Good Feature Selection
This section validates that the greedy algorithm selects a set of valuable features with a large . Our stochastic-greedy method (label: greedy) is compared with a fully randomized selection method (label: rnd). Table I reports the means and standard deviations (std) of . The values with the full feature set (label: full) are provided for reference. On OR01–OR02, the greedy method attains a smaller objective than the rnd method. This is reasonable since the greedy algorithm cannot always achieve the best performance according to (5). Considering the std and the larger means on most sequences, we conclude that the greedy method outperforms the rnd method.
VI-C Performance of SLAM
We compare the accuracy, robustness, and latency of M-LOAM-gf with several baseline methods:
M-LOAM-rnd is the variant of M-LOAM with the rnd feature selection module in mapping.
M-LOAM-full is the original M-LOAM that uses the full feature set in mapping.
The odometry and mapping of all methods run at Hz and Hz. For a fair comparison, the loop closure modules in some baselines are deactivated. The resolutions of the voxel filter [rusu20113d] on the edge and planar features are and .
VI-C1 Qualitative Comparison
We first test our method on RHD sequences. RHD01lab is recorded by moving around an office space, in which several scenes provide only poor geometric constraints. Fig. 4(b) shows two examples. Scene 1 is a long and narrow corridor, a typical degenerate environment [ye2019tightly]. Scene 2 is an indoor office, providing well-conditioned constraints. M-LOAM-gf successfully tracks robot poses due to its capability of evaluating the environment's degeneracy. M-LOAM-rnd has a sudden drift in scene 1 since using only of the features cannot constrain the poses. A-LOAM also fails because it cannot model the uncertainty in mapping, which is detailed in [jiao2020robust]. RHD02garden is collected in a garden. Estimated trajectories and scene images are shown in Fig. 4(c). Since the environment is well-conditioned, all trajectories are comparable.
VI-C2 Localization Accuracy
We then perform a large-scale outdoor test on OR sequences under repetitions. Environments in OR sequences commonly provide sufficient features. We visualize M-LOAM-gf's trajectory against the ground truth and the built map on OR02 in Fig. 6, and plot trajectories of the other sequences in Fig. 5. Each method is evaluated with the absolute trajectory error (ATE) and the relative pose error (RPE) [zhang2018tutorial]. Due to limited space, we report the translation ATE in Table II and show complete results in our supplementary material [jiao2020supplementary]. M-LOAM-gf not only preserves the accuracy of M-LOAM-full but also reduces the ATE on all sequences. The average translation ATE of M-LOAM-gf is lower than that of F-LOAM (the second-best method). The feature selection implicitly rejects outliers (see Section IV-C), which is essential to such accuracy gains. M-LOAM-rnd thus also improves on M-LOAM-full, but its drift is larger than that of M-LOAM-gf. The performance of the fully randomized operation in M-LOAM-rnd is not guaranteed, which occasionally leads to inconsistent results.
Experiments in the above sections are conducted on a desktop with an i7 CPU@4.2 GHz and 32 GB memory. The average latency of mapping of M-LOAM-gf over and frames on the RHD and OR sequences is and , respectively. To demonstrate that our feature selection method boosts an L-SLAM system on processors with limited resources, M-LOAM-gf is also tested on an Intel NUC (zh.wikipedia.org/wiki/Next_Unit_of_Computing) with an i7 CPU@3.1 GHz and 8 GB memory. The average latency is reported in Table III. We run the rosbag at a low frequency to ensure no data loss.
First of all, we observe that M-LOAM-rnd has lower latency than M-LOAM-gf in the GF-based data association. This is because rnd is an algorithm, while the stochastic-greedy algorithm is . Second, compared with M-LOAM-full, M-LOAM-gf may need more time for feature matching but saves significant time in nonlinear optimization. Finally, M-LOAM-gf, M-LOAM-rnd, and LEGO-LOAM are the three real-time systems ( at each mapping frame) on the Intel NUC. Both M-LOAM-gf and M-LOAM-rnd outperform LEGO-LOAM in terms of accuracy. LEGO-LOAM implicitly performs feature selection since it filters out points whose distances to their correspondences exceed a threshold. But this naive, hard-coded solution leads to a large accuracy loss.
In this paper, we propose a greedy-based feature selection method for NLS pose estimation using LiDARs. The feature selector retains the most valuable LiDAR features with the objective of preserving the information matrix's spectrum, and the stochastic-greedy algorithm is applied for real-time selection. Moreover, we investigate the degeneracy issue of utilizing good features for pose estimation in structureless environments, and propose a strategy to adaptively change the number of good features to avoid ill-conditioned estimation. The feature selection is integrated into a multi-LiDAR SLAM system and evaluated on sequences with two sensor setups and computation platforms. The enhanced system is shown to have higher efficiency and localization accuracy than SOTA methods. The idea of feature selection is general and can be applied to many NLS problems.
Future work will explore two directions. The first is to utilize data-driven methods [wong2020data] to tune online the parameters that were manually set. The second is to apply the proposed feature selection to other tasks, such as bundle adjustment [zhao2020goodgraph] and cross-modal localization [huang2020gmmloc].