1 Introduction
The driverless transportation is becoming a technological dream on a verge of turning a daytoday reality, as more competitors are joining to [yandex_in_vegas] the pursuit. Early prototypes appeared decades ago [leonard1991simultaneous], while DARPA Urban Challenge could be considered an important milestone. AV (Autonomous Vehicle), hardware and models of the victorious team are highlighted in [urmson2007tartan] with an emphasis on the crucial role that a lidar plays in providing that level of autonomy. The comprehensive analysis [fletcher2008cornell] of a lowspeed collision between MIT’s AV "Talos" and Cornell’s AV "Skynet" occured during the challenge puts the positioning error as a one of causes. The lidar odometry is a subject of this work and our main contributions are:

point cloud filtering by the covariance of point normal, informed of the measurement error, specific for a lidar sensor

scheme for the false matches rejection between consecutive point clouds, based on a neighbor beam flight distance metric

reliable performance in the realistic environment is demonstrated on KITTI [geiger2012we] and on the internal dataset, collected from the autonomous fleet facing intense modes of operation every day (Fig. 1) in comparison with LOAM, stateoftheart competitor [zhang2014loam]
2 Related work
Driverassist systems (autonomy level 3 [SAE_ADAS]) tend to avoid 3D SLAM (Simultaneous Localization and Mapping) by reliance on GNSS, odometry and the lane detection. However, level 4 AVs often benefit from data stored in a HD map. This study is concerned with a level 4 setup, which dictates the separation of localization and mapping because doing the latter repetitively on an every car in a fleet in a same location is wasteful.
A substantial body of reseach exists on a problem of egomotion estimation and SLAM, yet the application to AVs imposes an unique set of challenges, as the environment is:

unprepared for autonomous navigation and is unfeasible to be retrofitted with markers

featurepoor: highways, tunnels, wastelands

dynamic: vegetation growth, snowbanks accumulation, construction works

hard to perceive: the weather, time of a day and an annual season change influence

rich with occlusions of background features by other vehicles
The degenerative effect of dense urbanisation on the SLAM performance was shown in [wen2018performance]. This challenge is addressed by graph optimization [grisetti2010tutorial], achieving the map global consistency. However, to reduce the computation time and decrease the chance of convergence to a suboptimal local minima, the better maps for optimization are called for. The input maps are built with odometry, which is the objective of our study.
The comparison [sokolov2017analysis] of lidar SLAM with the visual counterparts favours the former. The top of the KITTI’s leaderboard is also occupied by lidar methods. Laser range measurements tend to show a higher reliability and lower dependence on luminance than the depth measurements produced by cameras, which drives our focus on a lidar.
The seminal contribution is Iterative Closest Point [besl1992method]
, a method to register a relative 3D transformation between point clouds of arbitrary shape, based on the least squares. The closest point correspondence estimation is replaced by a grid lookup, and Euclidian loss is replaced by Mahalanobis in Normal Distributions Transform
[biber2003normal]. Novel learningbased methods are proposed [velas2018cnn], [cho2019deeplo], but do not perform on par yet. Reflectance intensity images with depth are reconstructed from range scans in [dong2014lighting] and then used as an input for the visual odometry algorithm. Contrary to that, many works [deschaud2018imls], [neuhaus2018mc2slam], [ji2018cpfg], [behley2018efficient] rely solely on the geometric properties of registered laser point clouds.The continuous time was found benefitial by H. Alismail et al in [alismail2014continuous] as it allows to solve simultaneously and explicitely for the pose and the point cloud drift, which happens due to an egomotion during the sensor spin time. The proposed method does not estimate the point cloud drift correction, as the wheel odometry data could be employed for that.
Optimal registration algorithms [cai2019practical] guarantee convergence without initialization, but are prohibitively expensive to compute. The landmarkbased registration is advocated in [li2010extracting] on grounds of decreasing a SLAM graph density. In spite of this valid concern our method operates directly on point clouds, because the dense approach provides a greater robustness in featurepoor environments.
Experimental results are evaluated against an another featurebased approach, LOAM [zhang2014loam], which holds the top result on KITTI’s leaderboard^{1}^{1}1http://www.cvlibs.net/datasets/kitti/eval_odometry.php at the time of writing this paper. LOAM makes use of two types of features (cornerlike and planelike) associated with dedicated losses (pointtoline, pointtoplane). These pointtoplane losses may include outlier entries because plane normal estimates are computed with three points each and could contain noise, while the proposed method estimates normals robutly and filters point clouds by normal covariances.
3 Lidar Odometry
The odometry task could be stated formally as the problem of determining a relative rigidbody 3D transformation , which brings the sensor coordinate frame at a time of the current range scan (source point cloud ) registration to the sensor coordinate frame at a time of the previous range scan (target point cloud ) registration.
The proposed method is built on the ICP framework [besl1992method]. We proceed with detailing specific choices made for every subroutine mentioned in Alg. 1 without a dive into the discussion of available alternatives, which are covered by the excellent review [rusinkiewicz2001efficient] and the bestpractices for choosing among them [pomerleau2013comparing].
Initialization of the transformation (initial guess) practically is derived from the independent sensors available: IMU, wheel odometry. However, this work considers the lidaronly odometry and uses the linear extrapolation from past states for initialization.
Filtration of Points is crucial as the data rate of a modern lidar would hinder the computational performance. Depending on the lidar model and the configuration of the environment 30K150K points are registered within one range scan , while only are passed on through the filter (Alg. 1, line 1) to the model as the point cloud . The filtration steps are:

voxel grid filter, passing on centroids of occupied voxels

normal estimation and normal covariance filtration, covered in detail in Sec. 3.1

local curvature filter: points with a higher curvature are discarded (estimated as in [pcl_curvature])
Matching of Points is done with a kd tree, selecting a closest point in the target cloud. These preliminary correspondences undergo the rejection of outliers:

geometric correspondence rejector, highlighted in Sec. 3.2

remove of matches having the largest distance (introduced by D. Chetverikov et al in [chetverikov2002trimmed])
Transformation Estimation is performed by the uniformly weighted linearized least squares optimization with the pointtoplane metric, which was shown to be generally superior to pointtopoint by F. Pomerleau et al in [pomerleau2013comparing].
Termination Criteria is a combination of thresholds on the absolute and relative magnitude of transformation increment with the threshold on an iterations count.
With that the ICP framework is defined and the proposed models for point filtration and matching follow.
3.1 Normal Covariance Filtration
The filter introduced in Alg. 1, line 1, takes a raw range scan and outputs , where each point has an estimated normal associated with it, as well as the normal covariance:
, as the points having highuncertainty normals are rejected. The reason is miscalculated normals impair transformation estimation precision, as individual losses would be off.
The normal
is estimated using SVD (Singular Value Decomposition) in a
neighborhood of the point, so when a substantial fraction of points is sampled from an another surface, which could happen in many boundary cases, SVD would yeild a normal vector, that is not describing a true plane well. The higher values of the last singular value are indicative of the lower confidence in the normal orientation. However, such a model discounts the fact that the point coordinates are not certain and are subject to the measurement error.
We propose a normal covariance filter (NCF), which accounts for the uncertainty in point coordinates, treating it as Gaussian random variables. The measurement error is approximated by a sphere with the standard deviation
.For the notation clarity, until the rest of the section the lower indices are used (as in ) to access a matrix entry and a single lower index to denote a matrix column , counting indices from .
The SVD input is an array of unbiased local points, sampled around from , i.e. the data matrix . The SVD output is , where , , , and , are orthogonal.
When entries of are random variables, is as well. The covariance of the plane normal is estimated by the propagation of uncertainty technique:
(1) 
The data matrix covariance was informally introduced above and is . The function is a sequence of computations, yielding from . To avoid the appearance of 3dimensional Jacobian matrix, is reformulated w.r.t. , simplified further, given the form of :
(2) 
Then the intermediate matrix is defined:
(3) 
where , and defines a matrix, having zeros everywhere except th element. With Eq. 3 the Jacobian w.r.t. to is obtained:
(4) 
For the full derivation of the SVD Jacobian please refer to [candes2013unbiased], [papadopoulo2000estimating].
Recovered in Eq. 1, lies in the sensor coordinate frame, hence it should be rotated into an alignment with before the threshold may be applied:
(5) 
The matrix is a covariance of in a normalaligned coordinate frame computed in Eq. 5 through eigendecomposition. Having for an each point in a range scan , the points associated with the higher normal uncertainty are discarded: . The filtration result is a range scan with precise normals, utilized on later stages of the ICP framework.
3.2 Geometric Correspondence Rejector
For a current range scan the set of correspondences is found by a search in the previous range scan (Alg. 1, line 5). The proposed model yields filtered correspondences , where is a point in the current range scan, participating in the correspondence with a point from the previous range scan.
The test isInlier is conducted independently for an every correspondence , characterized by the Euclidian distance between matched points: . The test description follows:
(6) 
The matching distance in compared to an upper bound of neighbor beam flight distances, or, shortly, , highlighted red in Fig. 2.
The neighbors of , , are four hypothetical points, which might have been registered by lasers, vertically adjacent to the one registering . Conventionally for spinning lidars, these laser levels are called rings, so if is a part of the ring , then it’s neighbors are on the adjacent rings , , shown in purple and orange in Fig. 2.
The model introduces two neighbors on an each of the adjacent rings assuming that "left" neighbors were registered at a previous increment of lidar azimutal rotation, at a current, and "right" neighbors at a next one.
For the calibrated lidar, all angular distances between and it’s neighbors are easily found. The calibration parameters are: (an angular increment of lidar azimutal rotation), , are angular distances in pitch between rings.
Hence, with it’s neighbors define a segment of the unitsphere. This sphere is scaled to and the segment is approximated by a gray plane on Fig. 2, in turn assuming neighbor lidar beams parallel. This assumption was found admissable for lidars in our use, for the hardware parameters are close to : , .
The diagonal vector between and on the approximated plane (shown in cyan in Fig. 2) is obtained as follows:
(7) 
The vectors , define absciss and ordinate axes of coordinate frame. With proceed to the neighbor beam flight distance metric:
(8) 
The intuition behind the metric is a distance between the scaled laser beam sphere surface and a local reflecting surface along the laser beam (grey and green surfaces in Fig. 2). Matches having
higher than that are probably made with points laying even further away from
than and should be discarded.4 Experiments
The thorough evaluation of the proposed method is provided. Two parts of this exist: the dedicated study of the geometric correspondence rejector and the integrated test of the proposed lidar odometry. The former is a comparison of the proposed rejector with the Euclidian distance rejector while the latter is a test with three ICP odometry variants and the competitor apporach, LOAM, on two realworld AV benchmarks, renown KITTI and the internal dataset, taken during Yandex own AV fleet operation.
4.1 Outlier Correspondence Rejection
The geometric correspondence rejector (GCR) is evaluated against the baseline, which is the Euclidian distance rejector. The test is run on point cloud pairs from the internal dataset. The ground truth for the test is relative transformations between pairs, refined by the graph optimization.
In an every test instance the initial guess is the composition of the ground truth transformation with a random noise. The method’s ability to recover the correct transformation is evaluated for several noise levels. The proposed method have shown the performance, superior to the baseline, as in Fig. 3. Only distributions of translation estimation errors are included as the rotation errors were found to be very correlated with it.
The noise transformation have two parameters , (magnitude limits in rotation and translation) and is sampled as follows: generate two random unit vectors on a sphere, where the former is the rotation axis and the latter is the translation axis , rotate for about and translate for along .
4.2 Lidar Odometry Evaluation
The study of the fully integrated odometry method follows with the baseline (BL), where the Euclidian distance rejector is used within the ICP framework, described in Alg. 1 along with the common approach for points filtration, following Sec. 3.
The baseline method is used as a control in a performance evaluation of LOAM, the proposed method (SALO) and an intermediate variant, integrating NCF into BL. GCR was not evaluated separately from NCF, because it relies on the quality of point cloud normals, which is ensured by NCF.
The internal Yandex dataset of total length of 16 minutes, 12 km was compiled for this study. It was taken in different cities and seasons of the year with a 32beam lidar, midgrade singleband GNSS with RTK. GNSS signal frequency was increased by the Kalman filter interpolation with IMU and wheel odometry to be used as the ground truth in this evaluation. The removal of dynamic objects from lidar scans was not performed.
Detailed results for all three methods and the competitor are showcased in Tab. I with trajectory visualizations in Fig. 4.
A. Geiger argues in [geiger2012we] that the absolute metric accounts errors nonuniformly based on a time passed since the start of the sequence. We adhere to KITTI’s standard and use the same metric and the official devkit in all evaluations, with a minor change: the reference pose is set metres away instead of averaging across , … to foster the results interpretability. The other motivation for that is space constraints and we may suggest readers to refer to our submission on the benchmark website [geiger2012we] for more results.
Along with error averages, standard deviations are listed and it may be noted that the lower mean error always coincides with the lower deviation.
It was shown before [sun2018dlo] and is confirmed now on the internal dataset and KITTI (Fig. 4.d) that the communitysupported implementation of LOAM [community_loam_velodyne] does not perform accordingly to the version, competing in the KITTI’s leaderboard and tends to degrade in many cases. Contrary to that, the proposed method produces a trajectory close to the ground truth, accumulating little drift over 7 minutes (Fig. 4.d).
The proposed method achieves 27% and 45% drift reduction on KITTI and internal dataset respectively along with an even stronger error variance reduction.
The avarage processing time of a point cloud is s. Studied variants do not vary substantially in computational cost, which mostly comes from the normal estimation.
a) internal #02  b) internal #06  c) internal #07  d) KITTI #00 
Training set of KITTI  

Method  00  01  02  03  04  05  06  07  08  09  10  Overall  
LOAM  
BL  
NCF  
SALO 
Internal Dataset  

Method  00  01  02  03  04  05  06  07  08  Overall  
LOAM  
BL  
NCF  
SALO 
BL is a baseline odometry, as described in Alg. 1, NCF – the baseline enhanced with the normal covariance filtration, as in Sec. 3.1, SALO is the proposed method, i.e. BL+NCF+GCR.
5 Conclusions
The MIT/Cornell collision during DARPA Grand Urban Challenge is the first AV road incident, but regretfully would not be the last. We share our experience in designing odometry solution for selfdriving car hoping to foster safety and reliabilty of the autonomous fleets across the globe. In this work ICP lidar odometry performance boosted by the normal covariance point cloud filter and by the geometric outlier correspondence rejector. These routines jointly enable for cleaner point clouds with better correspondences yielding better convergence properties, which is demonstrated on KITTI and internal dataset with relative positioning errors of and respectively. Further pursuit is an implementation of a local map and fast map query mechanisms to allow for implicit loop closures. Another conceived improvement is mixing pointtopoint and pointtoplane losses depending on the local structure, as well as the preparation of internal dataset for public release.
We are grateful to our colleague Sergey Krutikov for his invaluable contribution into this project.
Comments
There are no comments yet.