In order to create safe and efficient mobile robots, introspective and reliability-aware capabilities are required to assess and recover from perception failures. Many perception tasks, including localization, scene understanding and sensor calibration, rely on point cloud registration. However, registration may provide incorrect estimates due to local minima of the registration cost function, uncompensated motion distortion, noise, or when the registration problem is geometrically under-constrained [24, 3]. Consequently, it is essential to measure alignment quality and to reject or re-estimate the alignment when quality is low. A large number of methods have been proposed to assess the alignment quality of point cloud pairs [6, 28, 8, 32, 19, 4, 17, 10, 30, 27, 16]. These metrics can typically be used to measure a relative alignment error during registration, but they provide limited information on whether the point clouds are correctly aligned once registration has been carried out. To date, few studies have targeted the measurement of alignment correctness after registration [2, 9], and previous work reports that the accuracy of alignment correctness classification based on AdaBoost and the NDT score function decreases when applied to point clouds acquired in new environments.
In this paper, we propose “CorAl” (Correctly Aligned?): a method to introspectively measure and detect misalignment between previously registered point cloud pairs. CorAl specifically aims to close the gap in classification performance when applied to new, unseen environments.
Our method is well grounded in information theory and gives an intuitive alignment correctness measure. CorAl measures the difference between the average differential entropy in the joint and separate point clouds. For well-aligned point clouds, the joint and the separate point clouds have similar entropy. In contrast, misaligned point clouds tend to “blur” the scene, which can be measured as an increase in joint entropy, as depicted in fig. 7. By using the separate point clouds to estimate the entropy inherent in the scene, our proposed method can assess quality in a range of different environments.
The contribution of this paper is an intuitive and simple measure of alignment correctness between point cloud pairs. We demonstrate how to use this quality measure to train a simple model that detects small alignment errors between point clouds; large errors are not considered in this paper. To train our model, we use previously corrected scans that are assumed to have no alignment error.
We make the following claims: (i) Our proposed method CorAl measures the correctness of point cloud alignment by accounting for the expected scene entropy. (ii) Our method is accurate in a wide range of environments and can generalize well to new environments without retraining.
II Related work
Several methods have been used to assess alignment quality in the literature. However, in most cases these methods are used in an ad-hoc manner. Few systematic evaluations of their general ability to be used as a classifier to detect aligned vs. misaligned point clouds have been made.
One well-used alignment measure is the root-mean-squared (RMS) point-to-point distance, truncated by some outlier rejection threshold. This is also the function that is minimized by iterative closest point registration. However, this measure has been shown to be highly sensitive to the environment and the choice of the outlier threshold [29, 2]. Consequently, this is a poor measure for alignment correctness classification.
One family of methods instead attempts to estimate the alignment uncertainty between point cloud pairs in the form of a covariance matrix [18, 5, 23, 26]. Some use Monte Carlo strategies to estimate uncertainty by sampling registrations in a region. This exhaustive search is impractical in mobile robotics. Others attempt to estimate uncertainty in closed form using the Hessian [11, 21, 2], which represents the steepness of the alignment score function around a minimum. These methods assume that the registration has reached the global optimum, which is not necessarily true. To date, alignment classification based on the uncertainty covariance has been less accurate than classification based on the matching score.
Almqvist et al. explored alignment classifiers based on RMS as well as other existing methods [27, 7, 21, 12, 22, 29], including the NDT score function, and investigated how to combine the measures with AdaBoost into a stronger classifier. The classifiers were evaluated on two outdoor data sets, and although their classifiers reached almost 90% accuracy for the hardest cases on each data set individually, accuracy dropped to around 80% when cross-evaluating between the data sets. In their evaluations, the NDT score function proved to be the best individual measure for alignment assessment. The combined AdaBoost classifier did not have significantly higher accuracy, but reduced parameter sensitivity.
Liao et al. recently proposed a registration method based on fuzzy clusters, which involves a registration quality assessment. This fuzzy cluster-based quality assessment (FuzzyQA) compares the similarity of dispersion and disposition of points around fuzzy cluster centers. It has been used to detect whether point clouds are coarsely aligned, whereas CorAl attempts to detect small alignment errors.
Nobili et al. proposed a method to predict alignment risk prior to registration by combining overlap information and an alignment metric. The alignment metric quantifies the geometric constraints in the registration problem. The alignment metric is based on point-to-plane residuals and has been evaluated in structured scenes with planar surfaces, while our method can operate well even in unstructured environments. Additionally, our method seeks to estimate the alignment after registration has been completed to introspectively measure the registration success, as opposed to predicting the risk prior to registration.
Bogoslavskyi et al. defined a quality metric based on positive and negative point information, and used it to measure alignment error and to cluster three known object types in a controlled experiment. Rather than focusing on objects, our method aims to classify the alignment quality of observed scenes in different environments. Additionally, their method operates on range images, which might not be available, while our method operates on unorganized point clouds.
To the best of our knowledge, there is no method for binary point cloud alignment classification that performs accurately and transfers well to new environments without parameter tuning or retraining.
III CorAl method
The mean map entropy (MME) measures the randomness of multivariate Gaussian distributions. Droeschel and Behnke used MME in the absence of accurate ground truth when evaluating map refinement. As shown in our evaluation, MME cannot be used as a general alignment quality measure, as it is also affected by measurement noise, sample density and environment geometry. MME is more affected by changes in the environment than CorAl. Hence, the measure is not expected to generalize between, e.g., a structured warehouse and an unstructured outdoor forest environment. We overcome this effect using dual entropy measurements computed 1) in both point clouds separately and 2) in the joint point cloud. The intuition is that joining two well-aligned point clouds should not introduce additional uncertainty, so the entropy should remain constant if the point clouds overlap sufficiently.
III-A Computing joint and separate entropy
Our method operates on the dense point clouds $P_a$ and $P_b$, given in a common fixed world frame, each containing a set of points $p_k \in \mathbb{R}^3$. For later use, we define the joint point cloud $P_J = P_a \cup P_b$, i.e., all points in $P_a$ and $P_b$ together.
From all points within a radius $r$ around each point $p_k$, we compute the sample covariance $\Sigma(p_k)$. From the determinant of the sample covariance we can then compute the differential entropy as:

$$h(p_k) = \frac{1}{2}\ln\left|2\pi e\,\Sigma(p_k)\right| \tag{1}$$

and the average differential entropy of a point cloud $P$ as:

$$H(P) = \frac{1}{N}\sum_{k=1}^{N} h(p_k) \tag{2}$$

where $N$ is the number of points in the point cloud $P$.
Using Eq. 2 we can derive measures of the separate and joint average differential entropy of two point clouds $P_a$ and $P_b$, containing $N_a$ and $N_b$ points respectively:

$$H_{sep} = \frac{N_a H(P_a) + N_b H(P_b)}{N_a + N_b} \tag{3}$$

$$H_{joint} = H(P_J) \tag{4}$$

Our first alignment quality measure uses the difference between the joint and the separate average differential entropy:

$$Q = H_{joint} - H_{sep} \tag{5}$$

which can also be given per-point by

$$q_k = h_J(p_k) - h_S(p_k) \tag{6}$$

where the point entropy $h_J(p_k)$ is evaluated on the joint point cloud $P_J$ and $h_S(p_k)$ on the separate point cloud $P_S$ ($P_a$ or $P_b$) from which $p_k$ originates. An example of point clouds colored by per-point entropy difference according to Eq. 6 is depicted in Fig. (c) and (d). Typically, $Q$ is close to zero for well-aligned point clouds and increases with the alignment error, as depicted in fig. 11, which visualizes the function’s surface for position and angular alignment errors around the correct alignment.
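The computation of Eqs. 1, 5 and 6 can be sketched in a few lines. This is a minimal, unoptimized sketch, not the authors' implementation: it uses a brute-force radius search instead of a spatial index, skips points with fewer than four neighbors, and adds a hypothetical `min_det` floor purely to avoid `log(0)`. The function names and the toy planar data are illustrative assumptions.

```python
import math
import random


def sample_covariance(pts):
    """3x3 sample covariance of a list of (x, y, z) tuples."""
    n = len(pts)
    mean = [sum(p[i] for p in pts) / n for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in pts:
        d = [p[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def point_entropy(query, cloud, r, min_det=1e-15):
    """Eq. 1 for the neighbors of `query` within radius r (None if too few).

    min_det is a hypothetical guard against log(0), not the paper's variant.
    """
    nbrs = [p for p in cloud if math.dist(p, query) <= r]
    if len(nbrs) < 4:
        return None
    det = max(det3(sample_covariance(nbrs)), min_det)
    return 0.5 * math.log((2 * math.pi * math.e) ** 3 * det)


def coral_q(cloud_a, cloud_b, r):
    """Q = H_joint - H_sep (Eq. 5), computed as the mean of per-point q_k (Eq. 6)."""
    joint = cloud_a + cloud_b
    q = []
    for own in (cloud_a, cloud_b):
        for p in own:
            h_sep = point_entropy(p, own, r)      # entropy in the point's own cloud
            h_joint = point_entropy(p, joint, r)  # entropy in the joint cloud
            if h_sep is not None and h_joint is not None:
                q.append(h_joint - h_sep)
    return sum(q) / len(q)


# Toy data: two interleaved 10x10 grids (0.1 m spacing) on a noisy plane.
rng = random.Random(0)


def noisy_plane(shift, z0):
    return [(shift + 0.1 * i, shift + 0.1 * j, z0 + rng.gauss(0.0, 0.005))
            for i in range(10) for j in range(10)]


cloud_a = noisy_plane(0.0, 0.0)
q_good = coral_q(cloud_a, noisy_plane(0.05, 0.0), r=0.25)  # aligned pair
q_bad = coral_q(cloud_a, noisy_plane(0.05, 0.05), r=0.25)  # 5 cm vertical offset
print(f"aligned Q = {q_good:.3f}, misaligned Q = {q_bad:.3f}")
```

With these settings, the aligned pair should give a $Q$ close to zero, while the 5 cm offset should raise $Q$ clearly, because it inflates the smallest eigenvalue of the joint covariances. A practical implementation would replace the brute-force search with a k-d tree.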
Well-aligned point clouds acquired in structured environments have low differential entropy for most query points. This is reflected by low values of the determinant of the sample covariance. As the determinant can be expressed as the product of the eigenvalues of the sample covariance, the measure is sensitive to an increase in the smallest eigenvalue when the larger eigenvalues are constant. For example, points on a planar surface are represented by a flat distribution with two large ($\lambda_1$, $\lambda_2$) and one small ($\lambda_3$) eigenvalue. Misalignment changes the point distribution in the joint point cloud from flat to ellipsoidal, which can be observed as an increase of the smallest eigenvalue $\lambda_3$. This makes the measure sensitive to misalignment of planar surfaces, while it also generalizes well to other geometries. As shown in the evaluation, the measure can capture discrepancies between point clouds regardless of whether these are due to rigid misalignments or to distortions, which can occur when scanning while moving, e.g., because of vibrations or sensor velocity estimation errors. This means that the method can be overly sensitive when used together with a registration method or odometry framework that does not compensate for motion distortion or has low accuracy.
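To make the eigenvalue argument concrete, Eq. 1 can be expanded using $|\Sigma(p_k)| = \lambda_1\lambda_2\lambda_3$ for a $3 \times 3$ covariance. The numbers in the example are illustrative, not measured values:

$$h(p_k) = \tfrac{1}{2}\ln\left((2\pi e)^3\lambda_1\lambda_2\lambda_3\right) = \tfrac{3}{2}\ln(2\pi e) + \tfrac{1}{2}\sum_{i=1}^{3}\ln\lambda_i$$

$$\Delta h = \tfrac{1}{2}\ln\frac{\lambda_3'}{\lambda_3} \qquad \text{(with } \lambda_1, \lambda_2 \text{ unchanged)}$$

For a planar patch with $\lambda_1 = \lambda_2 = 10^{-2}\,\mathrm{m}^2$, an increase of $\lambda_3$ from $10^{-6}$ to $10^{-4}\,\mathrm{m}^2$ gives $\Delta h = \tfrac{1}{2}\ln 100 \approx 2.30$ nats, whereas an increase of roughly the same size in $\lambda_1$ (from $10^{-2}$ to about $1.01 \times 10^{-2}\,\mathrm{m}^2$) gives only $\tfrac{1}{2}\ln 1.01 \approx 0.005$ nats. Because the entropy depends on the logarithm of each eigenvalue, it responds to relative rather than absolute changes, which is why the smallest eigenvalue dominates.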
Overlap between point clouds is required to produce evidence of alignment. For that reason, we classify point clouds with less than 10% overlap as misaligned. By defining the overlap as all points with a neighbor within the radius $r$ in the other point cloud, non-overlapping points have no effect on the quality measure in Eq. 5, since their neighborhoods in the joint and in the separate point cloud are identical.
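The overlap rule can be sketched as follows, assuming the overlap percentage is computed over the joint point cloud (the text does not state the denominator). A brute-force neighbor search keeps the sketch self-contained; the function names are hypothetical.

```python
import math


def overlap_fraction(cloud_a, cloud_b, r):
    """Fraction of joint points that have a neighbor within r in the *other* cloud.

    Assumption: the fraction is taken over all points of both clouds.
    """
    def has_neighbor(p, other):
        return any(math.dist(p, q) <= r for q in other)

    n_overlap = sum(has_neighbor(p, cloud_b) for p in cloud_a)
    n_overlap += sum(has_neighbor(p, cloud_a) for p in cloud_b)
    return n_overlap / (len(cloud_a) + len(cloud_b))


def passes_overlap_check(cloud_a, cloud_b, r, min_overlap=0.10):
    """Pairs with less than 10% overlap are classified as misaligned outright."""
    return overlap_fraction(cloud_a, cloud_b, r) >= min_overlap
```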
III-B Dynamic radius selection and outlier rejection
For well-aligned point clouds, the quality measure is close to zero, meaning that the joint and separate point clouds have a similar mean and probability distribution of per-point entropy, as depicted in Fig. (a). Unfortunately, the entropy in Eq. 1 is ill-posed when the determinant is close to zero, since a small increase of the determinant then causes a large increase of the entropy. Accordingly, the lowest measured entropies can increase (which indicates misalignment) even when joining well-aligned point clouds, as depicted in fig. (a). The ill-posed entropies are found where point density is low, typically for solitary points or far from the sensor, where the radius is not large enough to include points that represent the geometry of the environment. The effect of these ill-posed entropies can be mitigated by maximizing the ratio . A larger ratio indicates that the measure is able to discriminate between aligned and misaligned point clouds.
We propose three strategies to address the ill-posed entropies due to variations in sampling density originating from the sensor.
(1): Eq. 1 is modified so that a lower bound limits the lowest possible entropy. This ensures that the entropy is similar for points distributed along a line and on a plane. The improvement can be seen by comparing fig. 23(a-b).
(2): The radius is chosen based on the distance between the point and the sensor location, to account for the decrease in point density with distance. The radius is hence selected as a function of this distance and the vertical resolution of the sensor, within a bounded range. For other sensor types, e.g., RGB-D, the resolution could be chosen similarly according to the angular sensing resolution. A dynamic radius enables the quality measure to include more points far from the sensor and to correctly detect alignment and misalignment for these, as seen in fig. 23(c).
(3): A given percentage of the points with the lowest entropies is removed. The effect is depicted in fig. 23(d).
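The three strategies can be sketched as small helper functions. All three are assumptions about form, not the paper's expressions: strategy (1) is modelled as a floor on the covariance determinant, strategy (2) as the range multiplied by the vertical resolution and clamped to `[r_min, r_max]`, and strategy (3) ranks points by their separate-cloud entropy. The parameter names are hypothetical.

```python
import math


def dynamic_radius(point, sensor, vertical_res_rad, r_min, r_max):
    """Strategy (2), hypothetical form: range times vertical resolution, clamped."""
    d = math.dist(point, sensor)  # distance from the sensor to the point
    return min(r_max, max(r_min, d * vertical_res_rad))


def floored_entropy(det_cov, det_min):
    """Strategy (1), hypothetical form: Eq. 1 with a floor on the determinant."""
    return 0.5 * math.log((2 * math.pi * math.e) ** 3 * max(det_cov, det_min))


def drop_lowest(per_point, fraction):
    """Strategy (3): discard `fraction` of the points with the lowest entropy.

    per_point holds (h_joint, h_sep) pairs; ranking by h_sep is an assumption.
    """
    k = int(len(per_point) * fraction)
    return sorted(per_point, key=lambda hh: hh[1])[k:]
```

The product of range and angular resolution approximates the spacing between adjacent scan rings at that range, which is the intuition behind growing the radius with distance; the clamp keeps the neighborhood from becoming degenerate close to the sensor or excessively large far away.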
We use logistic regression as a model for classification:

$$y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}, \qquad \text{aligned if } y > T$$

where $x_1$ and $x_2$ are input variables (described for each method in section IV-A). Instead of passing the quality measure itself, the joint and the separate average entropies are passed separately as $x_1$ and $x_2$. The parameters $\beta_0, \beta_1, \beta_2$ are learned, $y$ is the class probability, and $T$ is a class probability threshold that can be adjusted to the needs of the application. For example, in mobile robotics it is undesirable for misaligned point clouds to be reported as aligned (false positives), as this can cause a system failure. In contrast, aligned point clouds classified as misaligned are typically harmless. For that reason, $T$ can be increased to reject false positives and hence improve robustness. We used the default threshold.
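The classifier can be sketched without external libraries. The fitting routine is plain batch gradient descent on the mean log-loss, chosen only to keep the sketch self-contained; the paper does not state which solver was used. Following the text, the joint and separate average entropies are passed as $x_1$ and $x_2$, and label 1 means aligned, so raising `threshold` makes the classifier more reluctant to report alignment. The toy feature values are made up for illustration.

```python
import math


def predict_proba(beta, x1, x2):
    """Class probability y that a pair is aligned (numerically stable sigmoid)."""
    z = beta[0] + beta[1] * x1 + beta[2] * x2
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(beta, x1, x2, threshold=0.5):
    """Report 'aligned' only if y exceeds the threshold T."""
    return predict_proba(beta, x1, x2) > threshold


def fit(samples, labels, lr=1.0, iters=5000):
    """Fit (b0, b1, b2) by batch gradient descent on the mean log-loss.

    samples: list of (x1, x2) = (H_joint, H_sep); labels: 1 aligned, 0 misaligned.
    """
    beta = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(samples, labels):
            err = predict_proba(beta, x1, x2) - y  # derivative of log-loss w.r.t. z
            grad[0] += err / n
            grad[1] += err * x1 / n
            grad[2] += err * x2 / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


# Toy features: aligned pairs have H_joint close to H_sep, misaligned ones 1 nat higher.
samples = [(h, h) for h in (-0.5, 0.0, 0.5)] + [(h + 1.0, h) for h in (-0.5, 0.0, 0.5)]
labels = [1, 1, 1, 0, 0, 0]
beta = fit(samples, labels)
print([classify(beta, x1, x2) for x1, x2 in samples])
```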
We evaluate an equal number of aligned and misaligned point clouds. Misaligned point clouds are created by adding an offset to each point cloud pair: an angular offset around the sensor’s vertical axis and a random translational offset at a given distance from the ground truth. These errors are large enough to be meaningful to detect in various environments, yet challenging to classify.
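The error-injection step can be sketched as a rigid transform. Because the offset magnitudes are not reproduced here, `yaw_rad` and `dist` are free parameters, and the values in the usage lines are illustrative only. The sketch further assumes that the points are expressed in the sensor frame, with the sensor at the origin, and that the random translation direction lies in the horizontal plane.

```python
import math
import random


def perturb(points, yaw_rad, dist, rng):
    """Rotate by yaw_rad about the sensor's vertical (z) axis, then translate
    by `dist` in a random horizontal direction.

    Assumes points are in the sensor frame, i.e., the sensor sits at the origin.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    phi = rng.uniform(0.0, 2.0 * math.pi)  # random heading of the translation
    tx, ty = dist * math.cos(phi), dist * math.sin(phi)
    return [(c * x - s * y + tx, s * x + c * y + ty, z) for x, y, z in points]


# Illustrative values only; the paper's offsets are not reproduced here.
rng = random.Random(42)
scan = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
misaligned = perturb(scan, math.radians(1.0), 0.1, rng)
```

Pairing every aligned example with one perturbed copy is one simple way to obtain the balanced set described above, although the text does not say exactly how the pairs were sampled.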
IV-A Evaluated methods
The evaluated methods are summarized here together with their most important parameters.
CorAl (proposed in the paper)
CorAl-Median (proposed in the paper)
The separate and joint average entropies are modified to use the median rather than the mean per-point entropy; we hypothesize that this modification is more robust. The parameters are unchanged.
NDT (point-to-distribution normal-distributions transform)
The method uses the 3D NDT representation similarly to Almqvist et al. (NDT3), which constructs a voxel grid over one point cloud and computes a Gaussian function based on the points in each voxel. The likelihood of finding the points of the other point cloud, given this NDT representation, is computed as

$$S_{NDT} = \frac{1}{N_o}\sum_{k=1}^{N_o} \tilde{p}(x_k) \tag{9}$$

where $N_o$ is the number of overlapping points $x_k$, defined as those points that fall in an occupied NDT voxel or in a voxel that is a direct neighbor of an occupied voxel, and $\tilde{p}$ is the probability density function associated with the nearest overlapping NDT cell. The most important parameter for NDT is the voxel size, which in our evaluation is set to match the CorAl neighborhood size, so that the NDT cell covariances and the CorAl entropies are computed from points in similarly large volumes.
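A compact sketch of the point-to-distribution NDT score in Eq. 9 follows. It is an illustration under stated assumptions, not the implementation used in the paper: a cell counts as occupied only if it holds at least `min_pts` points, a hypothetical diagonal regularizer `reg` keeps near-planar covariances invertible, and the "nearest overlapping cell" is taken to be the occupied cell in the 3×3×3 voxel neighborhood whose mean is closest to the point.

```python
import math
import random
from collections import defaultdict


def _mean_cov(pts, reg):
    """Mean and regularized 3x3 sample covariance of a list of (x, y, z) tuples."""
    n = len(pts)
    mu = [sum(p[i] for p in pts) / n for i in range(3)]
    cov = [[reg if i == j else 0.0 for j in range(3)] for i in range(3)]
    for p in pts:
        d = [p[i] - mu[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return mu, cov


def _inv_det3(m):
    """Inverse (via the adjugate) and determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, k) = m
    det = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    adj = [[e * k - f * h, c * h - b * k, b * f - c * e],
           [f * g - d * k, a * k - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj], det


def _voxel(p, size):
    return tuple(math.floor(v / size) for v in p)


def build_ndt(cloud, size, min_pts=5, reg=1e-4):
    """Fit one Gaussian (mean, inverse covariance, determinant) per occupied voxel."""
    cells = defaultdict(list)
    for p in cloud:
        cells[_voxel(p, size)].append(p)
    ndt = {}
    for key, pts in cells.items():
        if len(pts) >= min_pts:
            mu, cov = _mean_cov(pts, reg)
            inv, det = _inv_det3(cov)
            ndt[key] = (mu, inv, det)
    return ndt


def ndt_score(cloud, ndt, size):
    """Average density of overlapping points under their nearest NDT cell (Eq. 9)."""
    total, n_overlap = 0.0, 0
    for p in cloud:
        vx, vy, vz = _voxel(p, size)
        cands = [ndt[(vx + i, vy + j, vz + k)]
                 for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (-1, 0, 1)
                 if (vx + i, vy + j, vz + k) in ndt]
        if not cands:
            continue  # p does not overlap the NDT map
        mu, inv, det = min(cands, key=lambda cell: math.dist(p, cell[0]))
        d = [p[i] - mu[i] for i in range(3)]
        m2 = sum(d[i] * inv[i][j] * d[j] for i in range(3) for j in range(3))
        total += math.exp(-0.5 * m2) / math.sqrt((2 * math.pi) ** 3 * det)
        n_overlap += 1
    return total / n_overlap if n_overlap else 0.0


# Toy data: a noisy plane at z = 0.25 m and an offset grid evaluated against it.
rng = random.Random(0)
plane_b = [(0.05 * i, 0.05 * j, 0.25 + rng.gauss(0.0, 0.005))
           for i in range(20) for j in range(20)]
ndt_b = build_ndt(plane_b, size=0.5)
grid_a = [(0.025 + 0.05 * i, 0.025 + 0.05 * j) for i in range(19) for j in range(19)]
score_good = ndt_score([(x, y, 0.25) for x, y in grid_a], ndt_b, 0.5)
score_bad = ndt_score([(x, y, 0.30) for x, y in grid_a], ndt_b, 0.5)
```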
Rel-NDT (proposed in the paper)
We wanted to investigate whether entropy can be used to improve the generalization of NDT to different environments. The idea is that the environment type is reflected in the average entropy of the scene, which can be combined with the NDT score to improve classification. We did this by computing the average entropy of all NDT covariances used in the point-likelihood terms and feeding it, together with the NDT score (9), to the classifier. No parameters beyond those of NDT are required.
FuzzyQA measures the alignment quality by a ratio of AFCCD and AFPCD, two indexes describing the points’ disposition and dispersion around fuzzy cluster centers. The two point clouds are considered coarsely aligned based on a threshold on this ratio. Here, however, AFCCD and AFPCD are passed separately as classifier inputs.
Input to the classifier
CorAl, FuzzyQA and Rel-NDT output two decision variables that are passed as the input variables $x_1$ and $x_2$ to the classifier (III-C). The other evaluated methods output a single variable $x_1$, and $x_2$ is fixed.
IV-B Qualitative evaluation, live robot data
First, we present qualitative results from real-world data in a structured warehouse environment. A forklift equipped with a Velodyne HDL-32E spinning laser scanner was manually driven at fast walking speed in the environment depicted in fig. 24. The environment in the sequence varies from large and open with visible walls, to small and narrow between aisles of pallets.
To generate ground-truth alignments for the warehouse dataset, we first aligned the point clouds using a scan-to-map approach. We then inspected the alignment between subsequent scans and found that at least 40/484 (8.3%) point clouds were impaired by rigid misalignments or non-rigid distortions from vibrations and motion, to the extent that these could easily be located visually.
Alignment classification was then performed on the remaining scans by inducing errors as described in section IV. We used the following parameters as they provided a relatively high value of for the first scan pair in the dataset: , , , and voxel size . We found that CorAl-mean, MME and NDT reached an accuracy of , and respectively. In this case, NDT performs slightly better than CorAl. We believe that CorAl is more sensitive to the typical alignment noise that is still present in the aligned scans. This alignment noise introduces variance in the CorAl score and makes it hard to train a classifier that is sensitive to small misalignments. Whether this is desired behavior depends on the application.
IV-C Quantitative evaluation, ETH benchmark data set
Our main quantitative evaluation is done using the public ETH registration dataset. This dataset includes 3 sequences in structured (blue) environments (Apartments, ETH Hauptgebaude, Stairs), 3 sequences in semi-structured (brown) environments (Gazebo in summer, Gazebo in winter, Mountain plain) and 2 challenging sequences in unstructured (green) environments (Wood in summer, Wood in autumn). Each sequence contains between 31 and 47 scans acquired from stationary positions. The dataset contains accurate ground-truth positions, which are required to evaluate the different methods. To make the evaluation fairer and more representative of real applications, we downsample the original, dense point clouds using a voxel grid of 0.08 m. As the dataset has less variation in sampling density than the warehouse dataset, we used a fixed radius and set . The NDT voxel size was set equal to the diameter to create a fair comparison.
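The 0.08 m voxel-grid downsampling can be sketched as follows. Replacing the points of each voxel by their centroid is an assumption that mirrors common voxel-grid filters; the text only specifies the voxel size.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel=0.08):
    """Replace the points in each cubic voxel (side `voxel`, in m) by their centroid."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(math.floor(c / voxel) for c in p)].append(p)
    return [tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))
            for pts in cells.values()]


demo = voxel_downsample([(0.01, 0.01, 0.01), (0.03, 0.03, 0.03), (0.5, 0.5, 0.5)])
print(demo)
```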
CorAl has an overall run-time of seconds per point cloud pair on an Intel Core i7 and depends on the point cloud density.
IV-C1 Separate training
The first test evaluates the capability to learn classification in a specific type of environment and serves as a reference for further evaluations. The classifiers were trained and evaluated on each sequence separately, using 5-fold cross validation.
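A 5-fold split of the pairs in one sequence can be sketched as below. Shuffling with a fixed seed is an assumption; the text does not say how the folds were formed or whether they were balanced between aligned and misaligned pairs.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


splits = list(k_fold_indices(23, k=5))
```

Each sample then appears in exactly one test fold, and the reported accuracy is the average over the five held-out folds.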
Results are shown in fig. 25.
We found that all methods except FuzzyQA performed well in the structured environments. We did not expect FuzzyQA to handle this, as it is specifically designed to classify coarse alignment. Surprisingly, even MME scored 90–100% in the structured environments. This indicates that even naive methods can assess alignment quality in a highly structured environment. In the semi-structured and unstructured sequences, only CorAl and CorAl-median performed well, with consistently 90% accuracy, even in the most challenging sequences. All other methods are only slightly better than random, except in the Gazebo sequences. Rel-NDT improves NDT in most cases, however not consistently. We believe this is because entropy alone provides little information about the environment, which is supported by the low overall accuracy of MME. Both NDT methods performed decently (77–90%) in the Gazebo sequences, indicating that NDT requires at least some structure or surfaces free from foliage to be effective as an alignment correctness measure.
IV-C2 Joint training
The second test evaluates how the methods are able to learn alignment classification when trained in a variety of environments. To do that, the methods need to be versatile. Training was performed on all the ETH sequences, evaluation was then performed on each sequence individually. The results are shown in fig. 26.
The accuracy of all classifiers decreased compared to the previous test. CorAl performed best, with an accuracy of 85–100% in all cases. CorAl-median reached a slightly lower accuracy than CorAl. Rel-NDT performed better than NDT in most cases, however not consistently. The generally high accuracy of CorAl indicates that it is possible to find general parameters that make the method valid in various environments.
IV-C3 Generalization to unseen environments
The final test evaluates how classifiers perform in environments with different characteristics than those observed in the training set. We trained and evaluated on different sequences and environments. The 3 structured environments were used for training and the remaining 5 (semi-structured and unstructured) were used for evaluation and vice versa. The classification accuracy is depicted in fig. 27.
When trained on structured and evaluated on semi-structured environments, CorAl performed accurately (85–98%), while other methods performed close to random, except NDT for Gazebo summer. No method generalized well from structured to unstructured environments. On the other hand, learning from semi-structured and unstructured environments was enough to afford very high accuracy in structured environments with CorAl, very close to what was attained with joint training on all sequences. The previous joint evaluation shows that it is possible to train a model that is simultaneously accurate in all environment types. For that reason, we believe that the classifier trained in structured environments fails to generalize to unstructured environments because the model overfits when the training data is not sufficiently diverse and challenging.
In this paper we introduced CorAl, a principled and intuitive measure of alignment correctness between point clouds. Using a dual entropy measurement that compares the expected entropy found in the separate point clouds with the actual entropy of the joint point cloud, CorAl can measure point cloud alignment correctness and substantially outperforms previous methods when evaluated on a public data set. Specifically, we were able to use CorAl to train a classifier based on logistic regression that is simultaneously accurate in a diverse range of environments. Our experiments show that our method generalizes well (i) from unstructured and semi-structured to structured environments, and (ii) from structured to semi-structured environments. None of the evaluated methods generalized well from structured to unstructured environments. Therefore, we conclude that it is possible to train a general and accurate alignment classifier, given that the training data is sufficiently diverse. Relatively modest results were achieved on live data. We think that the poor quality of the ground truth (obtained by lidar odometry and manual inspection) causes high variance in the CorAl score. The score is sensitive to small misalignments; therefore, higher-quality ground truth is required to make a fair evaluation. We believe that CorAl per-point quality and classification can be a useful tool for alignment evaluation and can improve robustness in various perception tasks by serving as a fault detection step.
In the future we will investigate how to automatically learn sensor-specific parameters or use the range image to find neighbouring points for covariance computation. This could address variations in point density due to different sensors and environment scales.
- [1] (2019-09) A Submap per Perspective - Selecting Subsets for SuPer Mapping that Afford Superior Localization Quality. In 2019 European Conference on Mobile Robots (ECMR), pp. 1–7.
- [2] (2018) Learning to detect misaligned point clouds. Journal of Field Robotics 35 (5), pp. 662–677.
- [3] (2017-09) Incorporating ego-motion uncertainty estimates in range data registration. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1389–1395.
- [4] (2019) PointNetLK: robust & efficient point cloud registration using PointNet. pp. 7163–7172.
- [5] (2003-07-31) Robot localization based on scan-matching—estimating the covariance matrix for the IDC algorithm. Robotics and Autonomous Systems 44 (1), pp. 29–40.
- [6] (1992-02) A method for registration of 3-D shapes. IEEE TPAMI 14 (2), pp. 239–256.
- [7] The normal distributions transform: a new approach to laser scan matching. In Proceedings 2003 IEEE/RSJ (IROS 2003), Vol. 3, pp. 2743–2748.
- [8] (2015-05) Generalized iterative most likely oriented-point (G-IMLOP) registration. International Journal of Computer Assisted Radiology and Surgery 10.
- [9] (2017) Analyzing the quality of matched 3D point clouds of objects. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6685–6690.
- [10] (2013) Sparse iterative closest point. In Proceedings of the Eleventh Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 113–123.
- [11] (2007-04) An accurate closed-form estimate of ICP’s covariance. In IEEE International Conference on Robotics and Automation (ICRA), pp. 3167–3172.
- [12] (2007-01) Assessing map quality and error causation using conditional random fields. IFAC Proceedings Volumes 40 (15), pp. 463–468.
- [13] (2000-04) Entropy expressions for multivariate continuous distributions. IEEE Transactions on Information Theory 46, pp. 709–712.
- [14] (2019-04) Unified Motion-Based Calibration of Mobile Multi-Sensor Platforms With Time Delay Estimation. IEEE Robotics and Automation Letters 4 (2), pp. 902–909.
- [15] (2018-05) Efficient continuous-time SLAM for 3D lidar-based online mapping. In ICRA, pp. 1–9.
- [16] (2015) MLMD: maximum likelihood mixture decoupling for fast and accurate point cloud registration. In 2015 International Conference on 3D Vision, pp. 241–249.
- [17] Joint alignment of multiple point sets with batch and incremental expectation-maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (6), pp. 1397–1410.
- [18] (2019-05) CELLO-3D: estimating the covariance of ICP in the real world. In IEEE International Conference on Robotics and Automation (ICRA), pp. 8190–8196.
- [19] (2021) Point set registration for 3D range scans using fuzzy cluster-based metric and efficient global optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (9), pp. 3229–3246.
- [20] (2007-10) Scan registration for autonomous mining vehicles using 3D-NDT. Journal of Field Robotics 24 (10), pp. 803–827.
- [21] (2009-12) The three-dimensional normal-distributions transform — an efficient representation for registration, surface analysis, and loop detection. Ph.D. Thesis, Örebro University. Örebro Studies in Technology 36.
- [22] (2006-06) Fully automatic registration of 3D point clouds. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 1, pp. 1297–1304.
- [23] (2006) Scan-SLAM: combining EKF-SLAM and scan correlation. In Field and Service Robotics, P. Corke and S. Sukkariah (Eds.), Springer Tracts in Advanced Robotics, pp. 167–178.
- [24] (2018) Predicting alignment risk to prevent localization failure. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1003–1010.
- [25] (2012-12) Challenging data sets for point cloud registration algorithms. IJRR 31 (14), pp. 1705–1711.
- [26] (2015-05) A closed-form estimate of 3D ICP covariance. In 2015 14th IAPR (MVA), pp. 526–529.
- [27] (2001) Efficient variants of the ICP algorithm. In The Third International Conference on 3D Digital Imaging and Modeling, pp. 145–152.
- [28] (2009-06) Generalized-ICP. In Robotics: Science and Systems V.
- [29] Precision range image registration using a robust surface interpenetration measure and enhanced genetic algorithms. TPAMI 27 (5), pp. 762–776.
- [30] (2012) Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations. The International Journal of Robotics Research 31 (12), pp. 1377–1393.
- [31] (2020) Assessing losses for point set registration. IEEE Robotics and Automation Letters 5 (2), pp. 3360–3367.
- [32] (2010-10) Point set registration using Havrda–Charvat–Tsallis entropy measures. IEEE Transactions on Medical Imaging 30, pp. 451–460.
- [33] (2014) LOAM: lidar odometry and mapping in real-time. In Robotics: Science and Systems.