I Introduction
Ia Motivation
State estimation is a classic and fundamental problem in robotics [cadena2016past]. Over the past decades, LiDARs have attracted much attention from the simultaneous localization and mapping (SLAM) community due to their accuracy and reliability in range measurements. Recent work [zhang2014loam, shan2018lego, behley2018efficient, lin2020decentralized, liu2020balm] has pushed LiDARSLAM (LSLAM) systems that are accurate and robust. However, LSLAM systems commonly present a high latency on a variety of onboard processors with limited computation resources. This issue is critical if the scale of SLAM becomes large, or modules such as highlevel decision making are integrated. Thus, towards realtime SLAM for diverse applications, such systems must exhibit low latency (time delay between input and output) along with the preservation of their accuracy and robustness.
LSLAM comprises two major computational tasks in LSLAM: data association and optimization. Data association indicates feature matching between the current frames to the reference frames, while optimization solves the pose parameters by maximizing a likelihood function given a set of constraints. Compared to visual features such as SIFT [lowe2004distinctive] and ORB [rublee2011orb], matching 3D features is known to be less accurate [yang2019polynomial]
, thus producing much higher outlier rates. To enforce accuracy, most LSLAM systems exploit thousands of features to solve a large nonlinear leastsquares (NLS) problem. However, this scheme presents significant drawbacks. The data association has to perform numerous nearestneighbor queries to match correspondences, which is commonly timeconsuming. Given plentiful measurements, the computational complexity of the optimization based on gradient descent also grows quadratically.
A prevalent solution to bound the complexity is to perform data sampling. For instance, many LiDARbased object detectors [qi2017pointnet, qi2017pointnet++, shi2018pointrcnn] leverage the farthest point sampling (FPS) [eldar1997farthest] to process input points. However, classic sampling methods do not consider downstream tasks specifically. Taking FPS as an example, it selects a subset of points with the objective to achieve a maximal coverage of the input set [lang2020samplenet]. But for SLAM, it would be better if the process of point selection conforms to the optimization objective. Ideally, exploiting the set of selected features in optimization should lead to low latency and performance improvements.
IB Contributions
This paper proposes a general and straightforward feature selection algorithm for LSLAM systems. We have a crucial observation behind our approach: that not all geometric constraints contribute equally to the localization accuracy. Intuitively, wellconditioned constraints should distribute different directions, constraining the pose from different angles [zhang2016degeneracy]
. For instance, orthogonal constraints commonly outweigh their parallel counterparts. The selected features, which are the most valuable/informative to the pose estimation, are defined as
good features [zhao2020good], and both the data association and state optimization utilize them only.This paper extends our previous work on multiLiDAR SLAM [jiao2020robust]. Multiple LiDARs enable a robot to maximize its perceptual awareness of environments and obtain sufficient measurements, but inevitably increase the processing time. In this paper, we investigate the latency issue. From the traditional perspective, there is a tradeoff between the latency and accuracy [bodin2018slambench2]. But in Section VI, we demonstrate that the proposed feature selection method boosts the accuracy ( error reduction) and efficiency ( time reduction) of an LSLAM system. Furthermore, by evaluating the environment’s degeneracy and adaptively setting the number of good features, our method also works well in nonideal cases. Overall, our work presents the following contributions:

We transform the good feature section in LiDARbased pose estimation into a problem that preserves the spectral property of information matrices.

We propose and integrate the feature selection method into a multiLiDAR SLAM system. We also propose to evaluate the environment’s degeneracy and adaptively change the number of good features online.

We evaluate our approach under extensive experiments with two sensor setups, and computation platforms in terms of accuracy, robustness, and latency.
Ii Related Work
Scholarly works on SLAM are extensive. Since we focus on the optimal feature selection to improve LSLAM’s efficiency, we review related works on two research topics: feature extraction and selection.
Iia Feature Extraction
Feature extraction is a process to build an informative, compact, and interpretable representation of raw measurements [guyon2008feature]. It has played a crucial role in the front end of many LSLAM systems to facilitate subsequent tasks. SuMa and its variants [behley2018efficient, chen2019suma++, chen2020overlapnet] convert point clouds into range images and generated surfelbased maps. In contrast, LOAM [zhang2014loam] was proposed to extract features on both edge lines and planar surfaces, and LEGOLOAM [shan2018lego] leverages ground features to constrain poses in the vertical direction. The following approaches apply visual detection [chen2020sloam] or directly used dense scanners [bosse2012zebedee, lin2020decentralized] to enhance the feature extraction in noisy or structureless environments. To decrease the number of features, Zhao et al. [zhou2020roi] presented a probabilistic framework to extract important region of interest by calculating features’ densities and distributions. This solution enables the removal of dynamic objects and redundant points in busy urban environments.
All of these methods extract features depending on local geometric structures, but they have not considered the explicit relationship between the pose estimation to select the most useful features. Our system is built on LOAM’s framework in feature selection. As a complement to the above appearancebased approaches, our solution identifies a set of good features by utilizing motion information.
IiB Feature Selection
Motionbased feature selection methods have been widely applied in visual SLAM [choudhary2015information, carlone2018attention, zhao2020good, fontan2020information], and many of them are based on the covariance or information matrices that capture uncertainties of poses [kaess2009covariance]. These methods formulate the feature selection as a submatrix selection problem and aim to find a subset of features with the objective to preserve the information matrix’s spectral attributes. Recent works by Carlone et al. [carlone2018attention] and Zhao et al. [zhao2020good] have investigated greedybased algorithms [mirzasoleiman2015lazier] to solve this NPhard feature selection problem at a polynomialtime complexity.
A common limitation of Carlone’s and Zhao’s works is that they assure sufficient features to be available. Under this assumption, the pose optimization with a set of good features remains wellconditioned. However, robots sometimes need to work in degraded environments such as textureless regions for cameras and narrow corridors for LiDARs. Therefore, only utilizing good features with a fixed size becomes degenerate. Based on Zhao’s feature selection approach, our method additionally evaluates environments’ degeneracy online, which enables us to adaptively change the number of good features to avoid the risk of ill conditions.
Iii Nonlinear LeastSquares Pose Estimation
We formulate the pose estimation of a LSLAM system as an maximum likelihood estimation (MLE) [barfoot2017state]:
(1) 
where represents the available features at the frame, is the robot’s pose to be optimized, and is the objective function. Assuming the measurement model to be Gaussian, problem (1) is solved as an NLS problem:
(2) 
where is the robust loss [bosse2016robust] and is the covariance matrix. Problem (2) is equivalently rewritten as [barfoot2017state]
(3) 
where is the alternative covariance matrix and is the derivative of . (2) is simplified as an iterative reweighed leastsquares problem. Iterative methods such as GaussNewton or LevenbergMarquardt can offen be used to solve this problem. These methods locally linearize the objective function by computing the Jacobian w.r.t. as . Given an initial guess, is iteratively optimized by usage of until convergence to find an optimum.
At the final iteration, the leastsquares covariance of the state is calculated as [censi2007accurate], where is called the information matrix. This equation reveals the relationship between the pose covariance and information matrix. Generally, exploiting plentiful measurements in optimization should minimize poses’ uncertainties. This explains why SLAM systems commonly use all available features.
This paper focuses on lowlatency applications in which the speed is highly prioritized. It requires us to utilize only a subset of features to accelerate the algorithm. As suggested in [zhao2020good], we can check whether a feature is selected or not by comparing the gains in spectrum of
. The word: “spectrum” denotes the set of eigenvalues of a matrix
[golub2013matrix].Iv Methodology
This section first formulates the good feature selection problem and introduces a metric to guide the selection. We apply the stochasticgreedy method, which combines the random sampling procedure, to solve this problem in real time. We then extend this algorithm to achieve efficient feature selection. Finally, we propose to evaluate the environment’s degeneracy online to avoid illconditioning estimation.
Iva Problem Formulation
We denote as the number of all available features, as the maximum number of selected features, and as the good feature set. We denote as the metric to quantify the spectral attribute of a matrix. We formute the feature selection problem under a cardinality constraint as
(4) 
where is the information matrix on the good feature set. There are several options to define : the trace [summers2015submodularity], the minimum eigenvalue [carlone2018attention], and the log determinant [zhao2020good].
Since problem (4) is NPhard, we cannot find efficient algorithms to obtain the optimal subset for realtime applications. Fortunately, all these metrics are submodular and monotone increasing [mirzasoleiman2015lazier], allowing the solution to be approximate via greedy methods with a performance guarantee. Zhao et al. [zhao2020good] experimented with these metrics in bundle adjustment, where the log determinant was demonstrated to have the lowest pose error and computation time. We thus choose the option as our metric.
IvB Stochastic Greedy Algorithm
The class of greedy methods has been studied to solve problem (4). Here, we introduce the stochasticgreedy algorithm [mirzasoleiman2015lazier], which applies randomized acceleration to avoid bruteforce search. The idea is simple: at each round, the current best feature is picked up by examining all elements from a random subset. This is different from the simple greedy approach, which has to search the whole set. We define the size of the random subset as , where is the decay factor. The time complexity is , which is independent of . The stochasticgreedy algorithm has been proved to have nearoptimal performance in [mirzasoleiman2015lazier]:
Theorem 1: Let be the nonnegative, monotone, and submodular function. Setting the size of the random subset as . Denote as the optimal set and the result found by the stochasticgreedy algorithm. enjoys the approximation guarantees in expectation:
(5) 
IvC Good Feature Selection for Pose Estimation
Based on the theoretical results, we employ the stochasticgreedy to achieve the feature selection for pose estimation. The detailed pipeline is summarized in Algorithm 1.

Overview: The algorithm starts with the metric, the number of good features , the decay factor , the map in the reference frame , the set of all available features in the current frame , and the initial pose . It produces the good feature set .

Line : The loop is exited if one of the following conditions is satisfied: good features are found or the computation time exceeds . The second condition limits the cost of finding good features. Since is submodular with diminishing returns, early termination does not induce much information loss.

Lines –: The correspondence of each feature in the random subset is found from . The residual is then computed. If a feature has already been visited at previous iterations, we skip these steps.

Lines –: The feature which leads to the maximum enhancement of the objective is selected. After that, the information matrix , , and are updated.
Furthermore, the process of feature selection implicitly performs outlier rejection: outliers are penalized by the robust loss in (3) with relatively small weights. They contribute less to
than standard features and will be selected with a low probability. Therefore, selecting good features might reduce the biases between estimates and the ground truth.
IvD Setting the Number of Good Features
Setting a proper size for the good feature set is essential to the system. Previous methods manually set as a constant value ( [zhao2020good]) or a fixed ratio of all features ( [carlone2018attention]). These schemes are feasible if sufficient features are always available. But if a robot has to work in nonideal scenarios such as textureless walls or narrow corridors, utilizing a small set of features is not reliable. On the other hand, if we change the hardcoded number in a specific situation, it will inevitably increase the cost of deploying and maintaining a SLAM system on real platforms.
It would be better if were adaptively changed by evaluating the degeneracy online. Inspired by [zhang2016degeneracy], the magnitude of the degeneracy can be quantified by a factor . Differently, we define the factor using the log determinant metric as . Computing the information matrix on the full feature set is timeconsuming. Since the robot performs a continuous movement in an environment, it is enough to compute at every time interval ().
V GFEnhanced MultiLiDAR SLAM System
The proposed feature selection method has been verified in an LSLAM system called MLOAM [jiao2020robust]. To distinguish it from the original system, the enhanced system is denoted by MLOAMgf. MLOAMgf solves SLAM with multiple LiDARs by two algorithms: odometry and mapping. Generally, these algorithms are designed to estimate poses in a coarsetofine fashion. Since they similarly formulate the NLS problem for pose estimation, the feature selection can be applied to both. Fig. 3 illustrates the overall structure of MLOAMgf. Note that loop closure is not included.
We give real definitions to the feature set . We extract features located on local edges and planar surfaces from the LiDARs’ raw measurements. According to the points’ curvatures, we select a set of edge and planar features to form . The next step is to match features between the reference frame and the robot’s current frame. In both odometry and mapping, we use the feature map in the reference frame to associate data with . The only difference is the scale. The local map in odometry is built within a small time interval (), while the global map in mapping is constructed using all features in keyframes. For convenience, we use to denote both the local and global maps.
With the found correspondences, we can optimize the relative transformation by minimizing the sum of all errors. The good feature selection algorithm enables MLOAMgf to select only a set of features in optimization while preserving the spectrum of the information matrix. An example of good features are shown in Fig. 2. After obtaining the good feature set in Section IVC, the objective function is written as
(6) 
for which the expression of the residuals and Jacobians is detailed in our supplementary material [jiao2020supplementary].
Vi Experiment
Sequence  Greedy  Rnd  Full 

RHD01lab  
RHD02garden  
OR01  
OR02  
OR03  
OR04  
OR05  
Average 
We evaluate the performance of MLOAMgf on realworld experiments. First, we validate the stochasticgreedybased feature selection. Second, we demonstrate the localization accuracy of MLOAMgf in various scenarios covering indoor environments and outdoor urban roads with two multiLiDAR setups. Three SOTA LSLAM systems are compared. We also study MLOAMgf’s latency on an onboard processor with limited computation resources.
Via Implementation Details
We use the PCL library [rusu20113d] to process point clouds and the Ceres Solver [agarwal2015others] to solve the NLS problems. Our method is tested on sequences collected with two platforms:

Real Handheld Device (RHD) is made for indoor tests and shown in Fig. 4(a). It is installed with two VLP16^{1}^{1}1https://velodynelidar.com/products/puck. We held this device to collect two sequences (RHD01lab and RHD02garden) with an average speed of .

Oxford Robocar (OR) [barnes2019oxford] is a vehicle equipped with two HDL32E^{2}^{2}2https://velodynelidar.com/products/hdl32e. Datasets were recorded by driving the car on urban roads at an average speed of . repeated traversals of a route were collected. Groundtruth poses in are available. We select one sequence lasting minutes and split it into sequences named OR01–OR05 for evaluation.
ViB Validation on Good Feature Selection
This section validates that the greedy algorithm selects a set of valuable features with a large . Our stochasticgreedy method (label: greedy) is compared with the fully randomized selection method (label: rnd). Table I
reports the means and standard deviations (std) of
. The values with the full feature set (label: full) are provided for reference. On OR01–OR02, the greedy method gains a smaller objective than the rnd method. This is reasonable since the greedy algorithm cannot always achieve the best performance according to (5). By considering the std and the larger means on most sequences, we conclude that the greedy method outperforms the rnd method.Seq.  Dimension  MLOAMgf  MLOAMrnd  MLOAMfull  ALOAM  FLOAM  LEGOLOAM  


OR01  
OR02  
OR03  
OR04  
OR05  
Average 
ViC Performance of SLAM
We compare the accuracy, robustness, and latency of MLOAMgf with several baseline methods:

MLOAMrnd is the variant of MLOAM with the rnd feature selection module in mapping.

MLOAMfull is the original MLOAM that uses the full feature set in mapping.

ALOAM^{3}^{3}3https://github.com/HKUSTAerialRobotics/ALOAM, FLOAM^{4}^{4}4https://github.com/wh200720041/floam, and LEGOLOAM [shan2018lego]
are three SOTA, opensource LSLAM systems. All of them are the improved versions of LOAM
[zhang2014loam].
The odometry and mapping of all methods run at Hz and Hz. For a fair comparison, the loop closure modules in some baselines are deactivated. The resolutions of the voxel filter [rusu20113d] on the edge and planar features are and .
ViC1 Qualitative Comparison
We first test our method on RHD sequences. RHD01lab is recorded by moving around an office space, in which several scenes provide only poor geometric constraints. Fig. 4(b) indicates two examples. Scene 1 is a long and narrow corridor, which is a typical degenerate environment [ye2019tightly]. Scene 2 is an indoor office, providing wellconditioned constraints. MLOAMgf successfully tracks robot poses due to its capability in evaluating the environment’s degeneracy. MLOAMrnd has a sudden drift in scene 1 since using only of features cannot constrain the poses. ALOAM also fails because it cannot model the uncertainty in mapping, which is detailed in [jiao2020robust]. RHD02garden is collected in a garden. Estimated trajectories and scene images are shown in Fig. 4(c). Since the environment is wellconditioned, all trajectories are comparative.
ViC2 Localization Accuracy
We then perform a largescale outdoor test on OR sequences under repetitions. Environments on OR sequences commonly provide sufficient features. We visualize MLOAMgf’s trajectory against the ground truth and the built map on OR02 in Fig. 6, and plot trajectories the other sequences in Fig. 5. Each method is evaluated the absolute trajectory error (ATE) and the relative pose error (RPE) [zhang2018tutorial]. Due to limited space, we report the translation ATE in Table II, and show complete results in our supplementary material [jiao2020supplementary]. MLOAMgf does not just preserve the accuracy of MLOAMfull, but it also reduces the ATE on all sequences. The average translation ATE of MLOAMgf is lower than that of FLOAM (the secondbest method). The feature selection implicitly rejects outliers (see Section IVC), which is essential to such accuracy gains. Thus, MLOAMrnd also improves MLOAMfull. But its drift is larger than that of MLOAMgf. The performance of the fully randomized operation in MLOAMrnd is not guaranteed, which occasionally lead to inconsistent results.
ViC3 Latency
Experiments in the above sections are conducted on a desktop with an i7 CPU@4.2 GHz and 32 GB memory. The average latency of mapping of MLOAMgf over and frames on the RHD and OR sequences is and respectively. To demonstrate that our feature selection method boosts an LSLAM system on processors with limited resources, MLOAMgf is also tested on an Intel NUC^{5}^{5}5zh.wikipedia.org/wiki/Next_Unit_of_Computing with an i7 CPU@3.1 GHz and 8 GB memory. The average latency is reported in Table III. We run the rosbag at a low frequency to ensure no data loss.
First of all, we observe that MLOAMrnd has lower latency than MLOAMgf in the GFbased data association. This is because the rnd is an algorithm, while the stochasticgreedy algorithm is . Second, compared with MLOAMfull, MLOAMgf may need more time for feature matching but significantly save time in nonlinear optimization. Finally, MLOAMgf, MLOAMrnd, and LEGOLOAM are three realtime systems ( at each mapping frame) for the Intel NUC. Both MLOAMgf and MLOAMrnd outperform LEGOLOAM in terms of accuracy. LEGOLOAM implicitly performs feature selection since it filters out points if the distances to their correspondences are larger than a threshold. But this naive and hardcoded solution leads to large accuracy loss.
Seq.  Method  Mapping  

Data association  Optimization  Total  
RHD  MLOAMgf  
MLOAMrnd  
MLOAMfull  
ALOAM  
OR  MLOAMgf  
MLOAMrnd  
MLOAMfull  
ALOAM  
FLOAM  
LEGOLOAM 
Latency: Time delay between the input and output of a function. 
Vii Conclusion
In this paper, we propose a greedybased feature selection method for NLS pose estimation using LiDARs. The feature selector retains the most valuable LiDAR features with the objective of preserving the information matrix’s spectrum. The stochasticgreedy algorithm is applied for the realtime selection. Moreover, we also investigate the degeneracy issue of utilizing good features for pose estimation in structureless environments. We propose a strategy to adaptively change the number of good features to avoid illconditioned estimation. The feature selection is integrated into a multiLiDAR SLAM system, followed by evaluation on sequences with two sensor setups and computation platforms. The enhanced system is shown to have great efficiency and higher localization accuracy than SOTA methods. The idea of feature selection is general and can be applied to many NLS problems.
Future work will concern two possible directions. The first direction is to utilize datadriven methods [wong2020data] to online tune parameters which were mannually set. Another direction is to apply the proposed feature selection to other tasks, such as bundle adjustment [zhao2020goodgraph] and crossmodel localization [huang2020gmmloc].
Comments
There are no comments yet.