MULLS: Versatile LiDAR SLAM via Multi-metric Linear Least Square [ICRA '21]
The rapid development of autonomous driving and mobile mapping calls for off-the-shelf LiDAR SLAM solutions that are adaptive to LiDARs of different specifications on various complex scenarios. To this end, we propose MULLS, an efficient, low-drift, and versatile 3D LiDAR SLAM system. For the front-end, roughly classified feature points (ground, facade, pillar, beam, etc.) are extracted from each frame using dual-threshold ground filtering and principal components analysis. Then the registration between the current frame and the local submap is accomplished efficiently by the proposed multi-metric linear least square iterative closest point algorithm. Point-to-point (plane, line) error metrics within each point class are jointly optimized with a linear approximation to estimate the ego-motion. Static feature points of the registered frame are appended into the local map to keep it updated. For the back-end, hierarchical pose graph optimization is conducted among regularly stored history submaps to reduce the drift resulting from dead reckoning. Extensive experiments are carried out on three datasets with more than 100,000 frames collected by six types of LiDAR on various outdoor and indoor scenarios. On the KITTI benchmark, MULLS ranks among the top LiDAR-only SLAM systems with real-time performance.READ FULL TEXT VIEW PDF
This paper proposes a 3D LiDAR SLAM algorithm named Ground-SLAM, which
We propose a framework for bundle adjustment (BA) on sparse lidar points...
The Simultaneous Localization And Mapping (SLAM) problem has been well
Real-time six degree-of-freedom pose estimation with ground vehicles
This paper presents a 3D lidar SLAM system based on improved regionalize...
In this paper, we studied a line-based SLAM method for road structure ma...
Modern LiDAR-SLAM (L-SLAM) systems have shown excellent results in
MULLS: Versatile LiDAR SLAM via Multi-metric Linear Least Square [ICRA '21]
Simultaneous localization and mapping (SLAM) plays a key role in robotics tasks, including robot navigation[temeltas2008slam], field surveying[ebadi2020lamp] and high-definition map production[ma2019exploiting] for autonomous driving. Compared with vision-based [mur2015orb, forster2014svo, qin2018vins] and RGB-D-based[izadi2011kinectfusion, whelan2015real, labbe2019rtab] SLAM systems, LiDAR SLAM systems are more reliable under various lighting conditions and are more suitable for tasks demanding for dense 3D map.
The past decade has witnessed enormous development on LiDAR odometry (LO)[zhang2014loam, deschaud2018imls, graeter2018limo, neuhaus2018mc2slam, shan2018lego, behley2018efficient, chen2019suma++, kovalenko2019sensor, li2019net, ye2019tightly, zuo2019lic, lin2020livox, chen2020sloamforest, liosam2020shan, garcia2020fail, zhou2020s4] and LiDAR-based loop closure detection[he2016m2dp, dube2020segmap, kim2018scan, wang2020isc, jiang2020lipmatch, liang2020novel, chen2020overlapnet]. Though state-of-the-art LiDAR-based approaches performs well in structured urban or indoor scenes with the localization accuracy better than most of the vision-based methods, the lack of versatility hinders them from being widely used in the industry. Most of methods are based on sensor dependent point cloud representations such as, ring[zhang2014loam, zuo2019lic] and range image[shan2018lego, behley2018efficient, chen2019suma++, kovalenko2019sensor, li2019net], which requires detailed LiDAR model parameters such as, scan-line distribution and the field of view. It’s non-trivial to directly adapt them on recently developed mechanical LiDARs with unique configuration of beams and solid-state LiDARs with limited field of view[lin2020livox]. Besides, some SLAM systems target on specific application environments such as, urban road[behley2018efficient, li2020urbanslamisprs], highway[zhao2019highway], coal mine[li2018coalmine], and forest[chen2020sloamforest] using the ad-hoc framework. They tend to get stuck in degenerated corner-cases or a rapid switch of the scene.
In this work, we proposed a LiDAR-only self-contained SLAM system (as shown in Fig. 1) to cope with the aforementioned challenges. The continuous inputs of the system are sets of 3D coordinates (optionally with point-wise intensity and timestamp) without the conversion to rings or range images. Therefore the system is independent of the LiDAR’s specification and without the loss of 3D structure information. Geometric feature points with plentiful categories such as ground, facade, pillar, beam, and roof are extracted from each frame, giving rise to better adaptability to the surroundings. The ego-motion is estimated efficiently by the proposed multi-metric linear least square (MULLS) iterative closest points (ICP) algorithm based on the classified feature points. Furthermore, a submap-based pose graph optimization is employed as the back-end with loop closure constraints constructed via map-to-map global registration.
The main contributions of this work are listed as follows:
A scan-line independent LiDAR-only SLAM solution named MULLS111https://github.com/YuePanEdward/MULLS, with low drift and real-time performance on various scenarios. Currently, MULLS ranks top 10 on the competitive KITTI benchmark222http://www.cvlibs.net/datasets/kitti/eval_odometry.php.
An efficient point cloud local registration algorithm named MULLS-ICP that realizes the linear least square optimization of point-to-point (plane, line) error metrics jointly in roughly classified geometric feature points.
As the vital operation of LiDAR SLAM, point cloud registration can be categorized into local and global registration. Local registration requires good transformation initial guess to finely align two overlapping scans without stuck in the local minima while global registration can coarsely align them regardless of their relative pose.
Local registration based on ICP[besl1992method] has been widely used in LO to estimate the relatively small scan-to-scan(map) transformation. Classic ICP keeps alternately solving for dense point-to-point closest correspondences and rigid transformation. To guarantee LO’s real-time performance (mostly at 10Hz) on real-life regularized scenarios, faster and more robust variants of ICP have been proposed in recent years focusing on the sampled points for matching and the error metrics[rusinkiewicz2001pointplaneicp, segal2009generalized, censi2008icp] for optimization. As a pioneering work of sparse point-based LO, LOAM[zhang2014loam] selects the edge and planar points by sorting the curvature on each scan-line. The squared sum of point-to-line(plane) distance is minimized by non-linear optimization to estimate the transformation. As a follow-up, LeGO-LOAM[shan2018lego] conducts ground segmentation and isotropic edge points extraction from the projected range image. Then a two-step non-linear optimization is executed on ground and edge correspondences to solve two sets of transformation coefficients successively. Unlike LOAM, SuMa[behley2018efficient] is based on the dense projective normal ICP between the range image of the current scan and the surfel map. SuMa++[chen2019suma++] further improved SuMa by realizing dynamic filtering and multi-class projective matching with the assistance of semantic masks. These methods share the following limits. First, they require the LiDAR’s model and lose 3D information due to the operation based on range image or scan-line. [deschaud2018imls][zhou2020s4] operate based on raw point cloud but lose efficiency. Second, ICP with different error metrics is solved by less efficient non-linear optimization. Though efficient linear least square optimization for point-to-plane metric has been applied on LO[low2004linear, pomerleau2015review, kuhner2020large], the joint linear solution to point-to-point (plane, line) metrics has not been proposed. These limits are solved in our work.
Global registration is often required for the constraint construction in LiDAR SLAM’s back-end, especially when the odometry drifts too much to close the loop by ICP. One solution is to apply correlation operation on global features[kim2018scan, wang2020isc, jiang2020lipmatch, liang2020novel, chen2020overlapnet] encoded from a pair of scans to get the yaw angle estimation. Other solutions solve the 6DOF transformation based on local feature matching and verification among the keypoints[rusu2009fpfh] or segments[dube2020segmap]. Our system follows these solutions by adopting the recently proposed TEASER[Yang20arXiv-TEASER] algorithm. By applying truncated least square estimation with semi-definite relaxation[yang2020graduated], TEASER is more efficient and robust than methods based on RANSAC[rusu2009fpfh, mellado2014super] and branch & bound (BnB)[yang2015go, cai2019practical].
Given the point-wise timestamp of a frame, the time ratio for a point with timestamp is , where , are the timestamp at the start and the end of the frame. When IMU is not available, the transformation of to can be computed under the assumption of uniform motion:
represents the spherical linear interpolation,and stands for the rotation and translation estimation throughout the frame. The undistorted point cloud is achieved by packing all the points at the end of the frame.
The workflow of this section is summarized in Fig. 2
. The input is the raw point cloud of each frame, and the outputs are six types of feature points with primary and normal vectors.
The point cloud after optional preprocessing is projected to a reference plane, which is horizontal or fitted from last frame’s ground points. The reference plane is then divided into equal-size 2D grids. The minimum height of the points in each grid and in its neighboring grids, denoted as and , are recorded respectively. With two thresholds , , each point in grid is classified as a roughly determined ground point or a nonground point by:
To refine the ground points, RANSAC is employed in each grid to fit the grid-wise ground plane. Inliers are kept as ground points and their normal vectors are defined as the surface normal of the grid-wise best-fit planes.
Nonground points are downsampled to a fixed number and then fed into the principal components analysis (PCA) module in parallel. The point-wise K-R neighborhood is defined as the nearest points within a sphere of radius . The covariance matrix for is calculated as , where is the center of gravity for
. Next, eigenvalues
and the corresponding eigenvectorsare determined by the eigenvalue decomposition of , where are the primary and the normal vector of . Then the local linearity , planarity , and curvature [hackel2016fast] are defined as .
According to the size of local feature and the direction of , five categories of feature points can be extracted, namely facade , roof , pillar , beam , and vertex . To refine the feature points, non-maximum suppression (NMS) based on are applied on linear points (, ), planar points (, ) and vertices respectively, followed by an isotropic downsampling. Together with , the roughly classified feature points are packed for registration.
Based on the extracted feature points, the neighborhood category context (NCC) is proposed to describe each vertex keypoint with almost no other computations roughly. As shown in (3), the proportion of feature points with different categories in the neighborhood, the normalized intensity and height above ground are encoded. NCC is later used as the local feature for global registration in the SLAM system’s back-end (III-E):
This section’s pipeline is summarized in Fig. 3. The inputs are the source and the target point cloud composed of multi-class feature points extracted in III-B, as well as the initial guess of the transformation from the source point cloud to the target point cloud . After the iterative procedure of ICP[besl1992method], the outputs are the final transformation estimation and its accuracy evaluation indexes.
For each iteration, the closest point correspondences are determined by nearest neighbor (NN) searching within each feature point category (, , , , , ) under an ever-decreasing distance threshold. Direction consistency check of the normal vector and the primary vector are further applied on the planar points (, , ) and the linear points (, ) respectively. Constraints on category, distance and direction make the point association more robust and efficient.
Suppose are corresponding points in the source point cloud and the target point cloud, the residual after certain rotation and translation of the source point cloud is defined as:
With the multi-class correspondences, we expect to estimate the optimal transformation that jointly minimized the point-to-point, point-to-plane and point-to-line distance between the vertex , planar (, , ) and linear (, ) point correspondences. As shown in Fig. 4, the point-to-point (plane, line) distance can be calculated as , , and respectively from from the residual, normal and primary vector (, , ). Thus, the transformation estimation is formulated as a weighted least square optimization problem with the following target function:
where is the weight for each correspondence. Under the tiny angle assumption, can be linearized as:
where , and are roll, pitch and yaw, respectively under Euler angle convention. Regarding the unknown parameter vector as , (5) becomes a weighted linear least square problem. It is solved by Gauss-Markov parameter estimation with the functional model and the stochastic model , where
is a prior standard deviation, andis the weight matrix. Deduced from (5), the overall design matrix and observation vector is formulated as:
where the components of and for each correspondence under point-to-point (plane, line) error metric are defined as:
Therefore, the estimation of unknown vector is:
where both and can be accumulated efficiently from each correspondence in parallel. Finally, the transformation matrix is reconstructed from .
Throughout the iterative procedure, a multi-strategy weighting function based on residuals, balanced direction contribution and intensity consistency is proposed. The weight for each correspondence is defined as , whose multipliers are explained as follows.
First, to better deal with the outliers, we present a residual weighting function deduced from the general robust kernel function covering a family of M-estimators[barron2019general]:
where is the coefficient for the kernel’s shape, is the normalized residual and is the inlier noise threshold. Although an adaptive searching for the best is feasible [chebrolu2020adaptive], it would be time-consuming. Therefore, we fix in practice, thus leading to a pseudo-Huber kernel function.
Second, the contribution of the correspondences is not always balanced on x, y, z direction. The following weighting function considering the number of correspondences from each category is proposed to guarantee the observability.
Third, since the intensity channel provides extra information for registration, (12) is devised to penalize the contribution of correspondences with large intensity inconsistency.
After the convergence of ICP, the posterior standard deviation and information matrix of the registration are calculated as:
where , together with the nonground point overlapping ratio , are used for evaluating the registration quality.
and a map management module. With an incoming scan, roughly classified feature points are extracted and further downsampled into a sparser group of points for efficiency. Besides, a local map containing static feature points from historical frames is maintained with the reference pose of the last frame. Taking the last frame’s ego-motion as an initial guess, the scan-to-scan MULLS-ICP is conducted between the sparser feature points of the current frame and the denser feature points of the last frame with only a few iterations. The estimated transformation is provided as a better initial guess for the following scan-to-map MULLS-ICP. It regards the feature points in the local map as the target point cloud and keeps iterating until the convergence. The sparser group of feature points are appended to the local map after the map-based dynamic object filtering. For this step, within a distance threshold to the scanner, those nonground feature points far away from their nearest neighbors with the same category in the local map are filtered. The local map is then cropped with a fixed radius. Only the denser group of feature points of the current frame are kept for the next epoch.
As shown in Fig. 5(b), the periodically saved submap is the processing unit. The adjacent and loop closure edges among the submaps are constructed and verified by the certificated and efficient TEASER[Yang20arXiv-TEASER]
global registration. Its initial correspondences are determined according to the cosine similarity of NCC features encoded inIII-B among the vertex keypoints. Taking TEASER’s estimation as the initial guess, the map-to-map MULLS-ICP is used to refine inter-submap edges with accurate transformation and information matrices calculated in III-C. Those edges with higher or lower than thresholds would be deleted. As shown in Fig. 6, once a loop closure edge is added, the pose correction of free submap nodes is achieved by the inter-submap PGO. Finally, the inner-submap PGO fixes each submap’s reference frame and adjusts the other frames’ pose.
The proposed MULLS system is evaluated qualitatively and quantitatively on KITTI, MIMAP and HESAI datasets, covering various outdoor and indoor scenes using six different types of LiDAR, as summarized in Table I. All the experiments are conducted on a PC equipped with an Intel Core i7-7700HQ@2.80GHz CPU for fair comparison.
|TABLE I: The proposed method is evaluated on three datasets|
|collected by various LiDARs in various scenarios.|
|KITTI||HDL64E||urban, highway, country||22||43k|
|HESAI||PandarQT, XT, 128||residential, urban, indoor||8||30k|
The quantitative evaluations are done on the KITTI Odometry/SLAM dataset [geiger2012we], which is composed of 22 sequences of data with more than 43k scans captured by a Velodyne HDL-64E LiDAR. KITTI covers various outdoor scenarios such as, urban road, country road, and highway. Seq. 0-10 are provided with GNSS-INS ground truth pose while seq. 11-21 are used as the test set for online evaluation and leaderboard without provided ground truth. Following the odometry evaluation criterion in [geiger2012we], the average translation & rotation error (ATE & ARE) are used for localization accuracy evaluation. The performance of the proposed MULLS-LO (w/o loop closure) and MULLS-SLAM (w/ loop closure) as well as seven state-of-the-art LiDAR SLAM solutions (results taken from their original papers and KITTI leaderboard) are reported in Table II. MULLS-LO achieves the best ATE (0.49%) on seq. 00-10 and runner-up on online test sets. With loop closure and PGO, MULLS-SLAM gets the best ARE (0.12/100m) but worse ATE. This is because ATE and ARE can’t fully reflect the improvement of global positioning accuracy since they are measured locally within a distance up to 800m. Besides, the maps and trajectories constructed from MULLS for two representative sequences are shown in Fig. 7 and 8. Specifically, our method surpasses state-of-the-art methods with a large margin. It has only a lane-level drift in the end on the featureless and highly dynamic highway sequence, as shown in Fig. 8(c).
|TABLE II: Quantitative evaluation on KITTI Dataset. Red and blue fonts denote the first and second place, respectively.|
|(U: urban road, H: highway, C: country road, *: with loop closure, -: not available)|
The ISPRS Multi-sensor Indoor Mapping and Positioning dataset[wang2020isprs] is collected by a backpack mapping system with Velodyne VLP32C and HDL32E in a 5-floor building. High accuracy terrestrial laser scanning (TLS) point cloud of floor 1 & 2 scanned by Rigel VZ-1000 is regarded as the ground truth for map quality evaluation. Fig. 9 demonstrates MULLS’s indoor mapping quality with a 6.7cm mean mapping error, which is calculated as the mean distance between each point in the map point cloud and the corresponding nearest neighbor in TLS point cloud.
HESAI dataset is collected by three kinds of mechanical LiDAR (Pandar128, XT, and QT Lite). Since the ground truth pose or map is not provided, the qualitative result of the consistent maps built by MULLS-SLAM indicate MULLS’s ability for adapting to LiDARs with different number and distribution of beams in both the indoor and outdoor scenes, as shown in Fig. 10.
|TABLE III: Ablation study w.r.t. geometric feature points|
|(-: LiDAR odometry failed)|
|KITTI 00 U||KITTI 01 H|
|ATE / ARE||ATE / ARE|
|✓||✓||✓||✓||✗||0.53 / 0.20||0.62 / 0.09|
|✓||✓||✓||✗||✗||0.51 / 0.18||1.02 / 0.16|
|✓||✓||✗||✗||✗||0.54 / 0.21||- / -|
|✓||✗||✓||✗||✗||0.92 / 0.28||- / -|
|✓||✗||✓||✓||✗||0.70 / 0.33||0.68 / 0.09|
|✓||✓||✓||✓||✓||0.57 / 0.23||0.92 / 0.11|
Further, in-depth performance evaluation on the adopted categories of geometric feature points and the weighting functions are conducted to verify the proposed method’s effectiveness. Table III shows that vertex point-to-point correspondences undermine LO’s accuracy on both sequences because points from regular structures are more reliable than those with high curvature on vegetations. Therefore, vertex points are only used in back-end global registration in practice. Besides, linear points (pillar and beam) are necessary for the supreme performance on highway scene. Points on guardrails and curbs are extracted as linear points to impose cross direction constraints. As shown in Table IV, all the three functions presented in III-C are proven effective on both sequences. Specifically, translation estimation realizes obvious improvement on featureless highway scenes with the consideration of intensity consistency.
|TABLE IV: Ablation study w.r.t. weighting function|
|weighting function||KITTI 00 U||KITTI 01 H|
|ATE / ARE||ATE / ARE|
|✓||✓||✓||0.51 / 0.18||0.62 / 0.09|
|✗||✗||✗||0.57 / 0.23||0.87 / 0.13|
|✗||✓||✓||0.52 / 0.19||0.66 / 0.10|
|✓||✗||✓||0.54 / 0.22||0.76 / 0.12|
|✓||✓||✗||0.53 / 0.20||0.83 / 0.11|
Besides, detailed analysis on the average runtime and iteration number of the registration with two different parameter settings is performed. As shown in Table V, is the total runtime for registration while , , and are the average runtime per iteration for correspondence determination, linear least square estimation (used by MULLS), Levenberg-Marquardt (LM) analytical optimization (used by LOAM[zhang2014loam]) and LM with auto-derivation (used by A-LOAM333https://github.com/HKUST-Aerial-Robotics/A-LOAM). It is shown that linear least square is more than five times faster than non-linear optimization methods such as LM. Moreover, Fig. 11 shows MULLS’s detailed timing report of four typical sequences from KITTI and HESAI dataset with different number of incoming points per frame. MULLS operates faster than 10Hz on average for all of the four sequences, though it sometimes runs beyond 100ms on sequences with lots of loops such as Fig. 11(a). Since the loop closure can be implemented on another thread, the real-time property of MULLS on moderate PC can be guaranteed.
|TABLE V: Runtime analysis for the registration module|
|(the first two columns indicate the average feature point number)|
|Source||Target||# Iter.||Time (ms)|
In this work, we present a versatile LiDAR-only SLAM system named MULLS, which is based on the efficient multi-metric linear least square ICP. Experiments on more than 100 thousand frames collected by six different LiDARs show that MULLS manages to achieve real-time performance with low drift and high map quality on various challenging outdoor and indoor scenarios regardless of the LiDAR specifications.