Direct LiDAR Odometry: Fast Localization with Dense Point Clouds

10/01/2021 ∙ by Kenny Chen, et al. ∙ NASA 0

This paper presents a light-weight frontend LiDAR odometry solution with consistent and accurate localization for computationally-limited robotic platforms. Our Direct LiDAR Odometry (DLO) method includes several key algorithmic innovations which prioritize computational efficiency and enables the use of full, minimally-preprocessed point clouds to provide accurate pose estimates in real-time. This work also presents several important algorithmic insights and design choices from developing on platforms with shared or otherwise limited computational resources, including a custom iterative closest point solver for fast point cloud registration with data structure recycling. Our method is more accurate with lower computational overhead than the current state-of-the-art and has been extensively evaluated in several perceptually-challenging environments on aerial and legged robots as part of NASA JPL Team CoSTAR's research and development efforts for the DARPA Subterranean Challenge.



There are no comments yet.


page 1

page 4

page 5

page 6

page 7

page 8

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Accurate state estimation and mapping in large, perceptually-challenging environments has become a critical capability for autonomous mobile robots. Whereas typical visual SLAM approaches often perform poorly in dust, fog, or low-light conditions, LiDAR-based methods can provide more reliable localization due to the superior range and accuracy of direct depth measurements. However, recent work on LiDAR odometry (LO) has revealed the challenges of processing the large number of depth returns generated by commercial LiDAR sensors in real-time for high-rate state estimation [13, 4]. This work will present several algorithmic innovations that make real-time localization with dense LiDAR scans feasible while also demonstrating the superiority of our method in terms of accuracy and computational complexity when compared to the state-of-the-art.

Current LO algorithms estimate a robot’s egomotion in two stages: first, by performing a “scan-to-scan” alignment between adjacent LiDAR frames to recover an immediate motion guess, followed by a “scan-to-map” registration between the current scan and past environmental knowledge to increase global pose consistency. Unfortunately, the large number of data points per scan from modern LiDARs can quickly overwhelm computationally-limited processors and bottleneck performance during alignment, which can ultimately induce frame drops and cause poor pose estimation. More specifically, “scan-to-scan” alignment requires a registration of corresponding points between two clouds, but this process often involves a nearest-neighbor search which grows exponentially with the number of points per scan. Feature-based methods

[13, 12, 15, 11]

attempt to mitigate this by using only the most salient points, but these methods require an often computationally-intensive feature extraction step and may accidentally discard data which could otherwise help anchor the downstream registration. Moreover, in “scan-to-map” alignment, keyed environmental history (which consists of all or a subset of past points) can grow rapidly in size as new scans are acquired and stored in memory by the system, significantly expanding the nearest-neighbor search space for typical submap extraction methods. Although tree-based data structures have been shown to decrease this nearest-neighbor search cost significantly

[8], the extraction of a local submap can still involve on the order of hundreds of thousands of points after just a few keyframes and prevent consistent performance for long-term navigation.

Fig. 1: Fast and light-weight LiDAR odometry. Two of Team CoSTAR’s robotic platforms which are limited in computational resources partially due to weight limitations. (A) Our custom quadrotor platform which features an Ouster OS1 LiDAR sensor on top. (B) Boston Dynamics’ Spot robot with a mounted custom payload and a Velodyne VLP-16 with protective guards. (C) Top-down view of a mapped limestone mine using our lighweight odometry method on these robots during testing and integration for the DARPA Subterranean Challenge.
Fig. 2: LiDAR odometry architecture. Our system first retrieves a relative transform between two temporally-adjacent scans of times and

through “scan-to-scan” (S2S) matching with RANSAC outlier rejection and an optional rotational prior from IMU. This initial estimate is then propagated into the world frame and used as the initialization point for our secondary GICP module for “scan-to-map” optimization (S2M), which scan-matches the current point cloud

with a derived submap consisting of scans from nearby and boundary keyframes. The output of this is our globally-consistent pose estimate which is subsequently checked against several metrics to determine if the current pose should be stored as a new keyframe or not.

In this letter, we will present the Direct LiDAR Odometry (DLO) algorithm – a high-speed and computationally-efficient frontend localization solution which permits the direct use of raw point cloud scans without significant preprocessing. A key insight of our work is the link between algorithmic speed and accuracy. Our contributions are as follows. First, a custom “speed-first” pipeline that can accurately resolve robot egomotion in real-time using minimally-preprocessed LiDAR scans and an optional IMU on consumer-grade processors. Second, a novel keyframing system which adapts to environmental signals and permits fast and permissive submap generation via convex optimization. Third, several important algorithmic insights developed from real-world implementation to further reduce computational overhead, which resulted in NanoGICP, our custom iterative closest point solver for light-weight point cloud scan-matching with cross-object data sharing, And finally, extensive evaluation of our methods in several challenging environments on computationally-limited robotic platforms as part of Team CoSTAR’s research and development efforts for the DARPA Subterranean Challenge.

Related Work

LiDAR-based odometry is typically cast as a non-linear optimization problem to calculate a best-fit homogeneous transform which minimizes the error across corresponding, i.e., matching, points and/or planes between two point clouds. Since correspondences are not known a priori, algorithms such as the iterative closest point (ICP) algorithm [3] or other variants like Generalized ICP (GICP) [10] have become the standard aligning two point clouds. However, searching over all data points can be computationally costly. Feature-based methods attempt to extract and use only the most salient points before scan-matching to reduce computation. Such features are found either via hand-tuned methods [17] or through learned networks [16] and include planes [15], edges and lines [11, 12], or ground points [13], and these works aim to translate insights gained from visual odometry (VO) techniques into the 3D domain. However, adding this step increases computational overhead and risks discarding data points which could help with better correspondence matching for odometry accuracy. Alternatively, direct methods attempt to align dense point clouds but current approaches either heavily down-sample [8] or use a filtering-based framework [14] to achieve real-time performance.

Nevertheless, a second stage immediately following scan alignment between adjacent clouds has been shown to help with global pose estimation consistency across a history of past scans [8, 4]. In “scan-to-map,” the transformation is further refined by aligning the current scan with an existing in-memory map. In practice, aligning with a submap (rather than the full history of scans) helps with computational efficiency, and this submap is typically derived by retrieving nearby map points within some radius of the robot’s current position. This search in “point-space,” however, can quickly explode in computational expense due to the sheer number of data points per scan, and while there are techniques to mitigate this such as only incrementally storing map data via keyframe clouds [8, 11], this search can still involve thousands of operations and can quickly bottleneck system performance as time goes on.

To address these issues, DLO only minimally preprocesses point clouds to provide accurate pose estimates even for robots with limited computational resources. The key contribution of our work lies in how we efficiently derive our submap for global refinement in scan-to-map matching. That is, rather than extracting points within a local vicinity of a robot’s current position as most works do, DLO instead searches in keyframe-space by associating a scan’s set of points with its corresponding keyframe position. The submap is subsequently constructed by concatenating the clouds from a subset of historic keyframes derived from nearby keyframes and those which make up the convex hull; this provides the current scan with both nearby and distant points in the submap to anchor to. In addition, a custom GICP solver enables extensive reuse of data structures across multiple solver instantiations to reduce redundant operations across the two-stage process. Our system also optionally accepts a relative prior from IMU in a loosely-coupled fashion to further improve accuracy in the optimization process, which can help especially during aggresive rotational motions. The reliability of our approach is demonstrated through extensive tests on multiple computationally-limited robotic platforms in several environments as part of Team CoSTAR’s research and development efforts for the DARPA Subterranean Challenge in support of NASA JPL’s Networked Belief-aware Perceptual Autonomy (NeBula) framework [1], in which DLO was the primary state estimation component on our fleet of autonomous aerial vehicles (Fig. 1A).

Ii Methods

Ii-a Notation

A point cloud, , is composed of a set of points with Cartesian coordinates . We denote {} as the LiDAR’s coordinate system, {} as the robot’s coordinate system located at the base link, and {} as the world coordinate system which coincides with {} at the initial position. Note that in this work we assume {} and {} reference frames coincide. Additionally, for optional optimization priors via proprioceptive sensors as described in Section II-D, we also assume that these frames coincide with {}, and subsequently {}, for convenience. We adopt standard convention such that points forward, points left, and points upward, and our work attempts to address the following problem: given adjacent point clouds scans and at time , estimate the robot’s current pose , trajectory , and map , all in {}.

Ii-B Preprocessing

Our system assumes an input of 3D point cloud data gathered by a 360 LiDAR such as an Ouster OS1 or Velodyne VLP-16 (although our methods are not limited to these two). To minimize information loss from the raw sensor data, only two filters are used during preprocessing: first, we remove all point returns that may be from the robot itself through a box filter of size m around the origin. This is especially important if the LiDAR’s field of view is within range of an aerial robot’s propellers (Fig. 1A) or is surrounded by protective guards (Fig. 1B). The resulting cloud is then sent through a 3D voxel grid filter with a resolution of m to lightly downsample the data for downstream tasks while maintaining dominate structures within the surrounding environment; the output of this is then used for subsequent tasks in the pipeline. Note that we do not correct for motion distortion in the point cloud as we assume that any distortion from robot movement is negligible due to high frame rate, and we directly use the dense point cloud rather than extracting features. On average, each cloud contains points after preprocessing.

1 0.9 input: , ; initialize: = I or gravityAlign()
2 output: ,
3 while  do
        blue// bluepreprocessing
4        preprocessPoints() ;
5        computeAdaptiveParameters() ;
        blue// blueinitialization
6        if  then
7               , NanoGICP.buildTargetTree() ;
8               updateKeyframeDatabase(, ) ;
9               continue;
11        end if
       blue// blueprior
12        if IMU then integrateGyro() else  I ;
         blue// bluescan-to-scan
13        , NanoGICP.buildSourceTree() ;
14        NanoGICP.align(, , , , ) ;
15        ;
        blue// bluescan-to-map
16        ; ;
17        getKeyframeNeighbors(, ) ;
18        getKeyframeHulls(, ) ;
19        ;
20        , NanoGICP.buildTargetTree() ;
21        NanoGICP.align(, , , , ) ;
        blue// blueupdate keyframe database and map
22        updateKeyframeDatabase(, ) ;
23        ;
        blue// bluereuse structs for next iteration
24        ; ;
25        return ,
27 end while
Algorithm 1 Direct LiDAR Odometry

Ii-C Scan Matching via Generalized-ICP

LiDAR-based odometry can viewed as the process of resolving a robot’s egomotion by means of comparing successive point clouds and point clouds in-memory to recover an transformation, which translates to the robot’s 6-DOF motion between consecutive LiDAR acquisitions. This process is typically performed in two stages, first to provide a best instantaneous guess, which is subsequently refined to be more globally consistent.

Ii-C1 Scan-to-Scan

In the first stage, the scan-to-scan matching objective is to compute an optimal relative transform between a source and a target (where = ) captured in such that:


The residual error from GICP is defined as:


such that the overall objective for this stage is:


for number of corresponding points between point clouds and , where , , and and are the corresponding estimated covariance matrices associated with each point of the source or target cloud, respectively, using ten nearest neighbors. As will be further discussed in Section II-D, we can initialize the above objective function with a prior supplied by external sensors or odometry in an attempt to converge towards a global minimum. That is, for Eq. (3), if a prior is available by means of IMU preintegration, we can set the initial guess = to create a loosely-coupled system. If a prior is not available however, the system reverts to pure LiDAR odometry in which = I and relies solely on point cloud correspondence matching for this step.

Ii-C2 Scan-to-Map

After recovering an initial robot motion estimate, a secondary stage of scan-to-map matching is performed and follows a similar procedure to that of scan-to-scan. However, rather than computing a relative transform between two instantaneous point clouds, the objective here is to further refine the motion estimate from the previous step to be more globally-consistent by means of matching with a local submap. In other words, the task here is to compute an optimal transform between the current source cloud and some derived submap such that:


After similarly defining the residual error from GICP as in Eq. (2), the overall objective function for scan-to-map is:


for number of corresponding points between point cloud and submap , where is the corresponding scan-stitched covariance matrix for point in the submap as defined later in Section II-F. Optimization Eq. (5) is initialized using the propagated result from scan-to-scan in the previous section from to , i.e. = , so that this prior motion can be compared against historical map data for global consistency. The output of this stage is the final estimated robot pose.

We note here that a key innovation of this work is how we derive and manage our submap for this stage: whereas previous works create a submap by checking the locality of each point individually in a stored map, we associate scans to keyframes and search rather in keyframe-space to stitch point clouds together and create . The implications of this include a far faster and more consistent submap creation process each iteration, which is additionally more permissive as compared to a radius-based search and will be further discussed in Section II-E.

Ii-D Optimization Prior

Eq. (3) describes the scan-to-scan non-linear optimization problem and can be initialized with a prior to reduce the chances of converging into a sub-optimal local minima. This prior represents an initial guess of the relative motion between two LiDAR frames and can come from integrating angular velocity measurements an inertial measurement unit (IMU). More specifically, angular velocity measurements from an inertial measurement unit can be defined as as is assumed to be measured in with static bias

and zero white noise

for convenience. After calibrating for the bias and subtracting it off from subsequent raw measurements to provide , a relative rotational motion of the robot’s body between two LiDAR frames can be computed via gyroscopic propagation of the quaternion kinematics , where is initialized to identity prior to integration, is the difference in time between IMU measurements in seconds, and only gyroscopic measurements found between the current LiDAR scan and the previous one are used. Note that we are only concerned with a rotational prior during IMU preintegration and leave the retrieval of a translational prior for future work. The resultant quaternion of this propagation is converted to an matrix with zero translational component to be used as , the scan-to-scan prior.

Ii-E Fast Keyframe-Based Submapping

Fig. 3: Keyframe-based submapping. A comparison between the different submapping approaches, visualizing the current scan (white), the derived submap (red), and the full map (blue). (A) A common radius-based submapping approach of = m retrieved in point cloud-space. (B) Our keyframe-based submapping approach, which concatenates a subset of keyed scans and helps anchor even the most distant points in the current scan (green box) during the scan-to-map stage.

A key innovation of this work lies in how our system manages map information and derives the local submap in scan-to-submap matching for global egomotion refinement. Rather than working directly with point clouds and storing points into a typical octree data structure, we instead create a secondary point cloud of keyframes to search within, in which each keyframe is linked to its corresponding point cloud scan in a key-value dictionary. The resulting submap used for scan-to-submap matching is then created by concatenating the corresponding point clouds from a subset of the keyframes, rather than directly retrieving local points within some radius of the robot’s current position.

The implications of this are twofold: first, by searching in “keyframe-space” rather than “point cloud-space,” a much more computationally-tractable problem is presented to the system when constructing a submap. Radius-based searches within a cumulative point cloud map can require distance calculations against hundreds of thousands of points, and even with an incremental octree data structure this process can quickly become infeasible and rapidly increase the computational expense over time as the number of points increases. Searching against keyframes however typically involves only a few hundred points even after long traversals and provides much more consistent computational performance, reducing the chance of a frame drop due to slow processing.

Additionally, a keyframe-based approach can construct a much more permissive submap as compared to range-based methods. That is, since the size of a submap derived from keyframe point clouds relies solely on the LiDAR sensor’s range rather than a predetermined distance, the derived submap can have a larger overlap with the current scan; this is illustrated in Fig. 3. In this example, a submap of fixed radius = m insufficiently overlaps with the current scan and can introduce drift over time due to containing only spatially-nearby points; however, a keyframe-based approach covers most of the current scan which can help with better global consistency. Expanding the radius size may help increase this overlap for radius-based methods, but doing so can significantly slowdown downstream tasks such as the point normal calculations for GICP.

Ii-E1 Keyframe Selection via kNN and Convex Hull

Fig. 4: Keyframe selection and adaptive thresholds. (A) Our method’s submap (red) is generated by concatenating the scans from a subset of keyframes (green spheres), which consists of nearest neighbor keyframes and those that construct the convex hull of keyframe set. (B) An illustration of adaptive thresholding. In this scenario, the threshold decreases when traversing down a narrow ramp to better capture small-scale details.

To construct the submap , we concatenate the corresponding point clouds from a selected subset of environmental keyframes. Let be the set of all keyframe point clouds such that . We define submap as the concatenation of nearest neighbor keyframe scans and nearest neighbor convex hull scans such that , where the indices which specify the convex hull are defined by the set of keyframes which make up the intersection of all convex sets containing the keyframes which compose .

The result of this is illustrated in Fig. 4

A, in which the keyframes highlighted in green are those that compose the extracted submap, indicated in red. Intuitively, extracting nearest neighbor keyframes aims to help with overlap of nearby points in the current scan, while those from the convex hull — which contain boundary map points — increase the overlap with more distant points in the scan. This combination aims to reduce overall trajectory drift by providing the system with multiple scales of environmental features to align with. Note that keyframes which are classified as both a nearest neighbor and a convex hull index are only used once to reduce redundancy in the submap.

Ii-E2 Adaptive Keyframing

The location of environmental keyframes affects the derived submap and can subsequently influence accuracy and robustness of the odometry. Keyframe nodes are commonly dropped using fixed thresholds (e.g., every m or of translational or rotational change) [8, 11, 12], but the optimal position can be highly dependent on a surrounding environment’s structure. More specifically, in large-scale settings, features captured by the point cloud scan are much more prominent and can be depended on for longer periods of time; however, for narrower, smaller-scale environments, a smaller threshold is necessary to continually capture the small-scale features (i.e., tight corners) in the submap for better localization. Thus, we choose to scale the translational threshold for new keyframes according to some notion of “spaciousness” in the instantaneous point cloud scan, defined as , where is the median Euclidean point distance from the origin to each point in the preprocessed point cloud, = , and = chosen empirically, and is the smoothed signal used to scale keyframe distance threshold such that: = m if m; = m if mm; = m if mm; and = m if m, with rotational threshold held fixed at . Fig. 4B illustrates the effects of this adaptive thresholding.

Ii-F Algorithmic Implementation

Element Scan-to-Scan Scan-to-Map
build when
TABLE I: Summary of Data Structure Recycling

Ii-F1 Scan-Stitched Submap Normals

Generalized-ICP involves minimizing the plane-to-plane distance between two clouds, in which these planes are modeled by a computed covariance for each point in the scan. Rather than computing the normals for each point in the submap on every iteration (which can be infeasible for real-time operation), we assume that the set of submap covariances can be approximated by concatenating the normals from keyframes which populate the submap such that . As a consequence, each submap’s set of normals need not be explicitly computed, but rather just reconstructed by stitching together those calculated previously.

Ii-F2 Data Structure Recycling

Expanding on the above idea, we’ve in fact identified several redundant instances in our pipeline which can be benefit from data structure sharing between modules; this helps to reduce overall system overhead by removing unnecessary computations in the loop. Summarized in Table I, the system requires eight total elements to successfully perform scan-to-scan and scan-to-map matching: a kdtree used to search for point correspondences, and the set of point covariances , for both source and target clouds in each scan-matching process.

Out of the four required kdtrees data structures, only two need to be built explicitly. That is, the tree for the source (input) cloud can be built just once per scan acquisition and shared between both modules (as the same scan is used for both sources). For the scan-to-scan target tree , this is simply just the previous iteration’s scan-to-scan source tree and thus can be propagated. The scan-to-map target tree needs to be built explicitly, but since the submap is derived from a set of keyframes, this build only needs to be performed when the set of keyframes that create the submap changes, such that . Otherwise, the data structure can just be reused again for additional computational savings. Point covariances needed for GICP, on the other hand, only need to be computed once per scan aquisition, and its data can be shared directly in the other three instances.

Fig. 5: Alpha course map. Different views and angles of the dense 3D point cloud map generated using our DLO algorithm on the Urban Alpha dataset. Estimated positions at each timestamp were used to transform the provided scan into a world frame; this was performed for all scans across the dataset and concatenated / voxel filtered to generated the above images.
Fig. 6: Error comparison. The absolute pose error plotted across a s window of movement, showing the difference between radius and keyframe submapping schemes. Keyframe-based approaches do not have the range restriction that radius-based approaches inherently contain, which directly translates to a lower error in odometry due to more perceptive submapping.

Ii-F3 Dual NanoGICP

To facilitate the cross-talking between scan-matching modules, we assembled NanoGICP, a custom iterative closest point solver which combines the FastGICP[6] and NanoFLANN[2]open-source packages with additional modifications for data structure sharing as described above. In particular, NanoGICP uses NanoFLANN to efficiently build kdtree data structures, which are subsequently used for point cloud correspondence matching by FastGICP. In practice, data structure sharing is performed between two separate NanoGICP instantiations — one to target each scan-matching problem — and done procedurally as detailed in Algorithm 1.

Iii Results

Fig. 7: Average convergence time. A comparison of average convergence times across benchmark alignments for each algorithm, including our NanoGICP solver and two other open-source GICP packages.
Fig. 8: Ablation study of data recycling schemes. Box plots of the processing time and CPU usage for four different recycling variants, ranging from no data structure reuse to partial reuse and full reuse.
None KDTrees Covariances Both
% Scans
TABLE II: Dropped LiDAR Scans per Recycling Scheme
Method Alpha Course Beta Course CPU Usage
APE [m] ME [m] APE [m] ME [m] No. of Cores
max mean std rmse max mean std rmse max mean
BLAM[7] 3.44 1.01 0.94 0.43 3.89 2.27 0.89 1.27 1.14 0.93
Cartographer[5] 5.84 2.91 1.60 1.05 2.64 1.37 0.67 0.31 1.75 0.88
LIO-Mapping[15] 2.12 0.99 0.51 0.45 1.60 1.18 0.22 0.61 1.80 1.53
LOAM[17] 4.33 1.38 1.19 0.60 2.58 2.11 0.44 0.99 1.65 1.41
LOCUS[8] 0.63 0.26 0.18 0.28 1.20 0.58 0.39 0.48 3.39 2.72
DLO 0.40 0.18 0.06 0.19 0.50 0.16 0.09 0.19 0.92 0.62
TABLE III: Comparison on Benchmark Datasets

Iii-a Component Evaluation

To investigate the impact of our system’s components, including keyframe-based submapping, submap normal approximation, and the reuse of data structures, we compare each component with its counterpart using the Alpha Course dataset from the Urban circuit of the DARPA Subterranean Challenge. This dataset contains LiDAR scans from a Velodyne VLP-16 sensor, in addition to IMU measurements from a VN-100, collected across 60 minutes in an abandoned powerplant located in Elma, WA which contains multiple perceptual challenges such as large or self-similar scenes (Fig. 5). For these component-wise evaluations, data was processed using a 4-core Intel i7 1.30GHz CPU.

Iii-A1 Keyframe-Based Submapping

We compared the absolute pose error (APE), processing time, and CPU load across three submapping schemes, including: radius-based ( = m), keyframe-based with a m static threshold, and keyframe-based with adaptive thresholding. For keyframe-based variants, we used 10 nearest-neighbor and 10 convex hull keyframes for submap derivation. From Fig. 6, the influence of our approach is clear: submapping in keyframe-space can significantly reduce positional error by considering more distant points that would otherwise be outside the scope of a radius-based approach. These additional points influence the outcome of the GICP optimization process as they are considered during error minimization for the optimal transform; this is especially important in purely frontend-based odometry, since any additional error in pose can quickly propagate over time due to drift. Processing time and CPU load showed similar trends: radius-based processed each scan notably slower at ms per scan with an average of CPU load as compared to ms / and ms / for static and adaptive schemes, respectively.

Fig. 9: Extreme environments. Top: Part of an underground mine in Lexington, KY mapped autonomously using our custom drone while running DLO. This environment contains challenging conditions such as: (A) low illuminance, (B) object obstructions, and (C) wet and muddy terrain. Bottom: Top-down (D) and side view (E) of the three levels of an abandoned subway located in Downtown Los Angeles, mapped via DLO using a Velodyne VLP-16. In this run, we manually tele-operated the legged robot to walk up, down, and around each floor for a total of m.

Iii-A2 Data Structure Recycling

To evaluate the effectiveness of data reusage, we measured and compared the processing time and CPU usage between different recycling schemes via a box plot (Fig. 8) and percentage of dropped scans over the dataset (Table II). In a naive system which explicitly calculates each kdtree and cloud covariance, computation time exceeded LiDAR rate (10Hz for Velodyne) with a high average of ms per scan and nearly of scans dropped due to high processing time. Recycling kdtrees but not covariances provides a slight improvement in processing time and CPU percentage, while recycling covariances but not kdtrees provides a more prominent performance boost; this is reasonable since our covariance recycling scheme is more aggressive than kdtree reusage. Finally, using the full scheme as detailed in Table I significantly decreases both metrics, with an average processing time of ms and CPU load, which prevents any LiDAR frames from dropping.

Iii-A3 NanoGICP

To compare NanoGICP with the state-of-the-art, we use FastGICP’s [6] benchmark alignment code found in the authors’ open-source repository. This benchmark measures the average convergence time to align two LiDAR scans across runs, and we compare against PCL’s [9] GICP implementation as well as FastGICP’s multithreaded implementation. Note that we do not compare against the voxelized FastGICP variant, since this method approximates planes with groups of planes and decreases overall accuracy. As shown in Fig. 7, we observed that NanoGICP converged faster on average (ms) when compared to FastGICP (ms) and PCL GICP (ms).

Iii-B Benchmark Results

To examine how our algorithm compares against the state-of-the-art, we use the Alpha and Beta courses from the Urban dataset of the Subterranean Challenge and evaluate accuracy and CPU load against several other LiDAR-based odometry packages, including BLAM[7], Cartographer[5], LIO-Mapping[15], LOAM[17], and LOCUS[8]. Table III provides an overview of these comparison (numbers retrieved from [8]). We note that LIO-SAM [11], a current state-of-the-art tightly-coupled approach, and LVI-SAM [12], which combines LIO-SAM [11]

with VIO, could not tested at the time of this work due to their sensitive calibration procedure and strict requirements input data. We observed that our method’s CPU load was measured to be far lower than any other algorithm, using less than one core both on average and at its peak. This is likely a result how our system derives its submap, in addition its extensive reuse of internal data structures. This observation can also explain DLO’s far lower absolute pose error (APE) and root-mean-square error (RMSE). With this faster processing time, our method outperformed all other methods in both Alpha and Beta courses, having more than twice the accuracy in the Beta course for max, mean and standard deviation. In addition to our more permissive submapping approach, we are less likely to drop frames than other methods and hence more up-to-date information for the system, ultimately resulting in a lower trajectory error.

Iii-C Field Experiments

We additionally tested and implemented our solution on several custom robotic platforms for real-world field operation. Specifically, we integrated DLO onto an aerial vehicle (Fig. 1A) with an Ouster OS1, VN-100 IMU, and an Intel NUC Board NUC7i7DNBE 1.90 GHz CPU , and a Boston Dynamics Spot (Fig. 1B) with a Velodyne VLP-16, VN-100 IMU, and an Intel NUC7i7DN 1.9 GHz CPU. For these real-world runs, we conducted both manual and autonomous traversals across two perceptually-degraded environments: in an underground limestone cave in Lexington, KY and at an abandoned subway in Downtown Los Angeles (Fig. 9). Both environments contained environmental properties which often challenge perceptual systems, including poor lighting conditions, featureless corridors, and the presence of particulates such as dust or fog; these, however, did not pose a significant issue to our system. Despite traversing over m across three different levels in the abandoned subway, our system reported only a cm end-to-end drift. Our tests in the underground mine showed similar promise: while the environment completely lacked any external lighting deep within the cave, DLO could still stably track our aerial vehicle across m of flight for autonomous traversal and exploration. The results from these field operations demonstrate the real-world reliability of our method.

Fig. 10: Mega Cavern. Different views of the Mega Cavern in Louisville, KY mapped by our DLO algorithm. Data is courtesy of Team Explorer.

Iv Discussion

This work presents Direct LiDAR Odometry (DLO), a light-weight and accurate frontend localization solution with minimal computational overhead for long-term traversals in extreme environments. A key innovation which distinguishes our work from others is how we efficiently derive a local submap for global pose refinement using a database of keyframe-point cloud pairs. This in turn permits a substantial number of solver data structures to be shared and reused between system modules, all of which is facilitated using our custom NanoGICP cloud registration package. We demonstrate the reliability of our approach through benchmarks and extensive field experiments on multiple platforms operating in large-scale perceptually-degraded environments. DLO was developed for and used on NASA JPL’s Team CoSTAR’s fleet of quadrotors in the DARPA Subterranean Challenge (Fig. 10), and in the future we are interested in tighter IMU integration through a backend component as well as adding loop closures to further reduce long-term drift.

Acknowledgements: The authors would like to thank Team CoSTAR teammates and colleagues, including Amanda Bouman, Luca Carlone, Micah Corah, Kamak Ebadi, Seyed Fakoorian, David Fan, Sung Kim, Benjamin Morrell, Joshua Ott, Andrzej Reinke, Toni Rosinol, and Patrick Spieler, for their valuable insight and productive discussions.


  • [1] A. Agha, K. Otsu, B. Morrell, D. D. Fan, R. Thakker, A. Santamaria-Navarro, S. Kim, et al. (2021) Nebula: quest for robotic autonomy in challenging environments; team costar at the darpa subterranean challenge. arXiv:2103.11470. Cited by: §I.
  • [2] J. L. Blanco and P. K. Rai (2014) Nanoflann: a C++ header-only fork of FLANN, a library for nearest neighbor (NN) with kd-trees. Note: Cited by: §II-F3.
  • [3] Y. Chen and G. Medioni (1992) Object modelling by registration of multiple range images. Image and vision computing. Cited by: §I.
  • [4] K. Ebadi, Y. Chang, M. Palieri, A. Stephens, A. Hatteland, E. Heiden, A. Thakur, N. Funabiki, B. Morrell, S. Wood, et al. (2020) LAMP: large-scale autonomous mapping and positioning for exploration of perceptually-degraded subterranean environments. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 80–86. Cited by: §I, §I.
  • [5] W. Hess, D. Kohler, H. Rapp, and D. Andor (2016) Real-time loop closure in 2d lidar slam. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1271–1278. Cited by: §III-B, TABLE III.
  • [6] K. Koide, M. Yokozuka, S. Oishi, and A. Banno (2020) Voxelized gicp for fast and accurate 3d point cloud registration. IEEE International Conference on Robotics and Automation (ICRA). Cited by: §II-F3, §III-A3.
  • [7] E. Nelson B(erkeley) l(ocalization) a(nd) m(apping). External Links: Link Cited by: §III-B, TABLE III.
  • [8] M. Palieri, B. Morrell, A. Thakur, K. Ebadi, J. Nash, A. Chatterjee, C. Kanellakis, L. Carlone, C. Guaragnella, and A. Agha-Mohammadi (2020) Locus: a multi-sensor lidar-centric solution for high-precision odometry and 3d mapping in real-time. IEEE Robotics and Automation Letters. Cited by: §I, §I, §I, §II-E2, §III-B, TABLE III.
  • [9] R. B. Rusu and S. Cousins (2011-May 9-13) 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China. Cited by: §III-A3.
  • [10] A. Segal, D. Haehnel, and S. Thrun (2009) Generalized-icp.. In Robotics: science and systems, Vol. 2, pp. 435. Cited by: §I.
  • [11] T. Shan, B. Englot, D. Meyers, W. Wang, C. Ratti, and D. Rus (2020) Lio-sam: tightly-coupled lidar inertial odometry via smoothing and mapping. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5135–5142. Cited by: §I, §I, §I, §II-E2, §III-B.
  • [12] T. Shan, B. Englot, C. Ratti, and R. Daniela (2021) LVI-sam: tightly-coupled lidar-visual-inertial odometry via smoothing and mapping. In IEEE International Conference on Robotics and Automation (ICRA), Cited by: §I, §I, §II-E2, §III-B.
  • [13] T. Shan and B. Englot (2018) Lego-loam: lightweight and ground-optimized lidar odometry and mapping on variable terrain. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4758–4765. Cited by: §I, §I, §I.
  • [14] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang (2021) FAST-lio2: fast direct lidar-inertial odometry. arXiv preprint arXiv:2107.06829. Cited by: §I.
  • [15] H. Ye, Y. Chen, and M. Liu (2019) Tightly coupled 3d lidar inertial odometry and mapping. In 2019 International Conference on Robotics and Automation (ICRA), pp. 3144–3150. Cited by: §I, §I, §III-B, TABLE III.
  • [16] Z. J. Yew and G. H. Lee (2018) 3dfeat-net: weakly supervised local 3d features for point cloud registration. In

    Proceedings of the European Conference on Computer Vision (ECCV)

    pp. 607–623. Cited by: §I.
  • [17] J. Zhang and S. Singh (2014) LOAM: lidar odometry and mapping in real-time.. In Robotics: Science and Systems, Vol. 2. Cited by: §I, §III-B, TABLE III.