Literature review of point cloud registration methods (For geomatics seminar at ETH Zurich)
Point cloud registration is one of the basic steps of point cloud processing and has many applications in remote sensing and robotics. In this report, we summarize the basic workflow of target-less point cloud registration, namely correspondence determination and transformation estimation. We then review three commonly used groups of registration approaches: feature matching based methods, the iterative closest point algorithm, and randomized hypothesize-and-verify based methods. Besides, we analyze the advantages and disadvantages of these methods and introduce their common application scenarios. Finally, we discuss the challenges facing current point cloud registration methods and pose several open questions for the future development of automatic registration approaches.
In recent decades, point clouds have become an increasingly common representation of the 3D world. Point clouds collected by laser scanners or RGB-D cameras can be used for landslide monitoring [rowlands2003landslide], solar potential analysis [solar], three-dimensional model reconstruction [citymodel], cultural heritage protection [Heritage], forest management [Forestreg][forestreg2], robot ego-localization [loam] and high-definition map production for self-driving cars [hdmapping].
The issue is that a single scan is limited by its viewpoint: it is impossible to sample the complete 360-degree surface of the target object from a single viewpoint (station). Usually, several stations are set up around the target object to obtain a thorough scan. However, these scans from different viewpoints are expressed in their own station-centered coordinate systems. To unify them into a common mapping coordinate system, we must perform the so-called point cloud registration procedure, as shown in Fig.1.
As a basic step of point cloud processing and a prerequisite for segmentation, classification and 3D model reconstruction, point cloud registration plays an important role in various remote sensing and robotics applications. By adopting point cloud registration to obtain the transformation between two adjacent frames (scans), we can estimate the pose change of a robot or an unmanned vehicle. This is called LiDAR odometry, which is a hot topic in Simultaneous Localization and Mapping (SLAM) technology.
The traditional solution to point cloud registration is to use highly reflective targets as tie points for the coordinate system transformation. Since this solution requires artificial targets and the manual picking of those targets in the point clouds, the process is labor-intensive and time-consuming. To automate the process, plenty of target-less point cloud registration approaches have been proposed over the past twenty years in the fields of remote sensing and computer vision.
In the literature, the task of point cloud registration generally follows a two-step workflow: determine correspondences and then estimate the transformation. The first step is correspondence determination. The correspondences can be geometric primitives like points, lines, planes or even specific objects. As preparation, we usually need to detect keypoints, fit lines or planes, or extract the specific objects. Then we can extract neighborhood features and match the geometric primitives according to feature similarity. Alternatively, geometric or adjacency relationships can be used to establish correspondences. Besides, we can repeatedly sample a minimum set of correspondences at random and finally choose the set that leads to the transformation with the largest number of inliers.
The second step is transformation estimation. Given the correspondences, the goal is to solve for the transformation (namely, translation and rotation) between the two point clouds. Generally, we first define a reasonable target function with regard to the transformation parameters, such that a small function value corresponds to a good registration result. Then we minimize (optimize) the target function using methods like Singular Value Decomposition (SVD), Linear Least Squares (LLS), or non-linear optimization algorithms such as Gauss-Newton and Levenberg-Marquardt. The transformation parameters corresponding to the minimal target function value are the desired result.
The closely related studies are briefly reviewed and discussed as follows. In section 2, common transformation estimation methods shared by various registration algorithms are reviewed. In sections 3, 4 and 5, registration methods based on feature matching, iterative closest points and randomized hypothesize-and-verify are reviewed, respectively. Section 6 summarizes the introduced algorithms and gives an outlook on existing challenges and open questions.
Given the corresponding points in the source (moving) point cloud and the target (reference) point cloud, we would like to estimate the transformation from the source to the target point cloud, as shown in Fig.2. The target function under the point-to-point distance metric can be written as Eq.1, which leads to the minimum sum of distances between correspondences after registration. In this case, at least three pairs of correspondences are needed.
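Eq.1 itself is not reproduced in this extract; in standard notation (symbols are our own), the point-to-point least-squares objective it refers to has the form:

```latex
(\mathbf{R}^{*}, \mathbf{t}^{*})
  = \arg\min_{\mathbf{R},\,\mathbf{t}}
    \sum_{i=1}^{N} \left\| \mathbf{R}\,\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i \right\|^{2}
```

where $\mathbf{p}_i$ and $\mathbf{q}_i$ are corresponding points in the source and target clouds, and $\mathbf{R}$, $\mathbf{t}$ are the rotation matrix and translation vector to be estimated.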
The plane-to-plane target function (Eq.2) is defined on corresponding planes. When fitting a plane, we obtain its normal vector and its distance from the coordinate origin. The registration target is to minimize the sum of the differences of the normal vectors and of the distances between corresponding planes after updating the estimated transformation. In this case, still at least three pairs of corresponding planes are needed.
A popular closed-form solution to the point-to-point target function Eq.1 is the SVD-based method [svd]. Firstly, we calculate the centroids of the source and target correspondence sets (Eq.3) and the decentralized (centered) coordinates of all the correspondences (Eq.4). After that, we apply SVD (Eq.5) and obtain the rotation matrix and translation vector from the decomposed matrices as in Eq.6.
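The SVD steps above can be sketched in a few lines of NumPy; this is a minimal illustration of the closed-form (Kabsch-style) solution, not the exact formulation of [svd], and the function name is our own:

```python
import numpy as np

def estimate_rigid_transform(src, tgt):
    """Closed-form SVD solution for R, t minimizing
    sum ||R p_i + t - q_i||^2 over corresponding points (Nx3 arrays)."""
    # Centroids of both correspondence sets (cf. Eq.3)
    c_src = src.mean(axis=0)
    c_tgt = tgt.mean(axis=0)
    # Decentralized (centered) coordinates (cf. Eq.4)
    P = src - c_src
    Q = tgt - c_tgt
    # Cross-covariance matrix and its SVD (cf. Eq.5)
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard keeps det(R) = +1 (cf. Eq.6)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = c_tgt - R @ c_src
    return R, t
```

Given noise-free correspondences, this recovers the transformation exactly; with noisy correspondences it returns the least-squares optimum.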
Another popular solution to Eq.1 is linear least squares parameter estimation [lls]. Since for small rotation angles we can apply the small-angle approximation, the rotation matrix can be linearized as in Eq.7. Then we can construct the observation function as Eq.8, which can be rearranged as Eq.9. Since the design matrix and observation vector can be calculated, the transformation parameters can then be estimated as in Eq.10. The rotation matrix is finally restored from the rotation vector.
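Eq.7 is not reproduced in this extract; the standard small-angle linearization it refers to replaces the rotation matrix with (symbols are our own):

```latex
\mathbf{R} \approx \mathbf{I} + [\boldsymbol{\omega}]_{\times}
  = \begin{pmatrix} 1 & -\gamma & \beta \\ \gamma & 1 & -\alpha \\ -\beta & \alpha & 1 \end{pmatrix}
```

where $(\alpha, \beta, \gamma)$ are small rotation angles about the $x$, $y$ and $z$ axes. This makes the observation function linear in the six transformation parameters, so they can be solved in one least-squares step.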
As for the plane-to-plane target function Eq.2, a simple solution is the three-planes-plus-one-intersection-point method [visualplanar]. The selected planes have to be linearly independent and intersect at a unique point in order for the transformation parameters to be fully recovered. We can calculate the rotation matrix from the normal vectors as in Eq.11 and Eq.12. The intersection point is calculated as in Eq.13. The translation vector is then calculated from the vector between the corresponding intersection points of these planes as in Eq.14.
Most feature matching based registration algorithms (as shown in Fig.4) follow a similar workflow.
Firstly, keypoint detectors such as Intrinsic Shape Signatures [intrinsic], 3D Harris [3dharris] and local curvature maxima are employed to detect keypoints in the original point clouds. These keypoints are geometrically more significant, so more representative features can be extracted from them.
Secondly, local feature descriptors such as Spin Image [SI], Fast Point Feature Histograms (FPFH) [fpfhdes], the SHOT descriptor [Shot], Rotational Projection Statistics (RoPS) [rops], 3D Shape Context [3dsc] and Binary Shape Context [BSC] are generated to encode the local neighborhood information of each keypoint. These feature descriptors should be invariant or insensitive to rigid transformations (translation and rotation) and have high precision and recall for matching. Several popular handcrafted feature descriptors are shown in Fig.5.
Recently, apart from these handcrafted features, learned features based on deep neural networks have emerged. A state-of-the-art point-based model is PointNet [Pointnet], which is able to learn descriptive point-wise features of a point cloud for classification and semantic segmentation. A better network structure for point feature extraction for matching and registration is the so-called siamese network with a triplet loss function. An example is the Perfect Match Net [perfectmatch], which outperforms existing handcrafted and learned features in matching accuracy and efficiency with only about 16-dimensional learned features. Since deep learning has already proved its superiority over traditional methods in both 2D and 3D computer vision, point cloud registration based on deep learning may eventually become the mainstream solution.
Thirdly, various feature matching strategies such as reciprocal nearest neighbor, the nearest neighbor similarity ratio test and bipartite graph minimal matching [IGSP], as shown in Fig.6, are adopted to identify initial matches.
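The reciprocal nearest neighbor strategy, for instance, can be sketched as follows; this is a brute-force illustration on descriptor arrays (function and variable names are our own), not an optimized implementation:

```python
import numpy as np

def reciprocal_nn_matches(desc_src, desc_tgt):
    """Keep a match (i, j) only if j is the nearest target descriptor
    to source descriptor i AND i is the nearest source descriptor to j."""
    # Pairwise squared Euclidean distances between descriptor sets
    d2 = ((desc_src[:, None, :] - desc_tgt[None, :, :]) ** 2).sum(-1)
    nn_st = d2.argmin(axis=1)   # for each source row, nearest target index
    nn_ts = d2.argmin(axis=0)   # for each target row, nearest source index
    return [(i, j) for i, j in enumerate(nn_st) if nn_ts[j] == i]
```

Reciprocity discards one-sided matches, which removes many outliers cheaply before any geometric verification is applied.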
However, there may still be many outlier matches among them (red lines in Fig.7). Incorrect correspondences are then eliminated with methods such as RANSAC [RANSAC], geometric consistency constraints [yangreg] or Game Theory based matching [gametheory]. Correspondence-based RANSAC selects the correspondence triplet that gives rise to the most inliers after transformation.
Finally, the spatial transformation between the point cloud pair is estimated based on the correspondences after filtering.
Feature matching based registration methods are global registration approaches because no initial guess of the transformation is required. Their drawback is limited accuracy: since they rely on keypoints instead of the denser raw point cloud, they are often regarded as coarse registration [coarsereg]. Only with a proper matching strategy and outlier filtering can feature matching based methods be robust to noise, occlusion and low overlap ratios. Besides, these methods are usually time-consuming due to their complex feature extraction, matching and filtering procedures.
The Iterative Closest Point (ICP) algorithm [ICP] is the most commonly used fine registration method due to its conceptual simplicity and high usability. Given a good initial transformation, ICP accomplishes a locally optimal registration by alternately solving for closest-point correspondences and the optimal rigid transformation until convergence. At the correspondence determination step, ICP simply takes the closest points between the source and target point clouds as correspondences. The transformation is then estimated from these closest points by minimizing Eq.1. The source point cloud is updated with the transformation, new closest-point correspondences are computed, and the process is repeated iteratively, as shown in Fig.8(a).
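The alternating loop can be sketched compactly; this is a minimal point-to-point ICP assuming SciPy's KD-tree for nearest-neighbor search (function names and the convergence test are our own simplifications, not the formulation of [ICP]):

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid(src, tgt):
    # Closed-form SVD solution for the point-to-point objective (Eq.1)
    cs, ct = src.mean(0), tgt.mean(0)
    H = (src - cs).T @ (tgt - ct)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, ct - R @ cs

def icp(src, tgt, n_iter=50, tol=1e-8):
    """Classic ICP: alternate closest-point association and
    closed-form transformation estimation until convergence."""
    tree = cKDTree(tgt)                       # nearest-neighbor index on target
    cur = src.copy()
    prev_err = np.inf
    for _ in range(n_iter):
        dist, idx = tree.query(cur)           # closest-point correspondences
        R, t = best_rigid(cur, tgt[idx])      # transformation estimation
        cur = cur @ R.T + t                   # update the moving cloud
        err = dist.mean()
        if abs(prev_err - err) < tol:         # mean residual stopped improving
            break
        prev_err = err
    return cur
```

Starting from a small initial misalignment, the closest-point assumption is mostly correct and the loop converges quickly; starting far from the optimum, it can lock onto wrong correspondences, which is exactly the local-minimum behavior discussed below.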
The variants of ICP mainly modify individual processing steps of the classic ICP algorithm [ICPcompare]: correspondence determination, outlier correspondence rejection and the construction of the transformation estimation target function.
For correspondence determination, as shown in Fig.9, there are alternative principles like normal shooting, which is suitable for registering smooth structures, and viewpoint projection, which is more efficient when the viewpoint is already known.
As for outlier correspondence rejection, as shown in Fig.10, we can apply a correspondence distance threshold, normal vector compatibility checks and matching uniqueness to get rid of correspondence outliers. [trimmed] proposes the Trimmed-ICP algorithm, which estimates the distance threshold according to the approximate overlap ratio.
There are also ICP variants focusing on the distance metric [pclreg] of the transformation estimation target function, as shown in Fig.11. In comparison with the point-to-point distance, the point-to-plane [pointtoplaneicp] and point-to-line [pointtolineicp] distance metrics perform better in scenes with plenty of facades (planes) or pillars (lines). Their target functions for transformation estimation are listed in Eq.15 and Eq.16 respectively, in which the normal vectors of the corresponding points' neighborhoods appear. The idea of multiple distance metrics is applied in the state-of-the-art LiDAR odometry solution LOAM [loam], which uses non-linear optimization to solve point-to-plane and point-to-line ICP. Furthermore, [gicp] proposed Generalized ICP, which adopts neighborhood covariance matrices to combine different distance metrics. For these methods, neighborhood Principal Component Analysis (PCA) needs to be carried out to obtain the normal vectors as well as the neighborhood covariances.
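Eq.15 and Eq.16 are not reproduced in this extract; in standard notation (symbols are our own), the point-to-plane and point-to-line objectives take the forms:

```latex
E_{\text{pt-plane}}(\mathbf{R},\mathbf{t})
  = \sum_i \big( (\mathbf{R}\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i) \cdot \mathbf{n}_i \big)^2 ,
\qquad
E_{\text{pt-line}}(\mathbf{R},\mathbf{t})
  = \sum_i \big\| (\mathbf{R}\mathbf{p}_i + \mathbf{t} - \mathbf{q}_i) \times \mathbf{u}_i \big\|^2
```

where $\mathbf{n}_i$ is the normal vector of the neighborhood of the target point $\mathbf{q}_i$ and $\mathbf{u}_i$ is the unit direction of the corresponding line. Both metrics only penalize the displacement component perpendicular to the local surface or line, which lets points slide along planes and pillars instead of being pinned to sampled positions.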
Since ICP tends to converge to a wrong local optimum given a bad initial transformation guess [icpreview], as shown in Fig.8(b-c), other variants of ICP focus on broadening the basin of convergence and avoiding local optima. ICP with Invariant Features (ICPIF) [icpif] combines invariant features with geometric distance in the closest-distance calculation; it is more likely to converge to the global optimum than ICP under ideal, noise-free conditions. [goicp] proposed Globally Optimal ICP (Go-ICP), which integrates ICP with a branch-and-bound (BnB) scheme so that coarse registration is not needed. However, Go-ICP is much more time-consuming than ICP and sensitive to outliers.
In conclusion, the advantages and limitations of ICP are as follows. On one hand, ICP depends heavily on a good initialization, without which the algorithm is likely to get trapped in a local optimum; it is therefore a local registration method. On the other hand, ICP achieves high registration accuracy when the rotation deviation from the ground truth is small, so it is often the preferred fine registration method. The general strategy for registering TLS point clouds is to apply a coarse registration method first and then use ICP to refine the result. Besides, since ICP is fairly efficient and follows a simple and versatile processing structure, it is the most popular algorithm in SLAM-related applications nowadays [icpreview].
The most representative randomized hypothesize-and-verify algorithm is RANSAC [RANSAC]. Without correspondence determination (feature matching), RANSAC can also be applied directly to find the largest common point set that determines the correct registration: randomly select three different points from the source point cloud and three from the target point cloud to form a group of base correspondences, estimate the candidate transformation that registers the base pairs, and then count the number of points of the transformed source point cloud that lie within an inlier distance threshold of their nearest points in the target point cloud. The transformation estimated from the base pairs with the most points within the distance threshold is finally accepted.
The problem is efficiency, due to the minimum number of iterations required for a trustworthy sample set, as shown in Eq.17, which takes the standard form N = log(1 - p) / log(1 - w^s), in which w is the point inlier ratio, s is the sample number (6 here), p is the confidence and N is the trial number.
A better solution is to pick the base points randomly from the source point cloud only, and efficiently search for geometrically congruent corresponding points in the target point cloud. Though congruent group searching has a nontrivial time complexity, the sample number decreases to 3 and the inlier ratio increases considerably, leading to a much smaller total trial number.
The 4-Points Congruent Sets (4PCS) algorithm [4PCS] adopts an efficient search for affine-invariant coplanar four-point groups in different point sets [fastaffine] to improve congruent group searching, exploiting the fact that rigid transformations are a subset of affine transformations. 4PCS determines the corresponding four-point base sets by taking advantage of the invariant intersection distance ratios of these four points, as shown in Fig.12 and Fig.13.
Since 4PCS performs well on challenging global registration cases but remains time-consuming due to the huge total number of points, several more efficient variants have been proposed recently. Super4PCS [Super4PCS] decreases the time complexity of congruent set searching from quadratic to linear in the number of points by using smart indexing. K-4PCS [k4pcs] operates on significant keypoints instead of the raw point cloud, decreasing the number of processed points.
The randomized hypothesize-and-verify strategy can also be used with plane correspondences. [planarreg] uses RANSAC for plane correspondence based registration. [v4pcs] proposed V4PCS, a plane-based variant of 4PCS. In V4PCS, a plane primitive is fitted in each voxel; the invariant intersection angles between the normal vectors of each four-plane set are then used to speed up the congruent plane search.
These random-sampling based algorithms also need no initial transformation guess, so they can perform global registration. Since candidates are verified under a certain confidence, these methods are fairly robust to noise, outliers and similar (repetitive) structures, but the total trial number also increases in such circumstances because the likelihood of picking outlier-free subsets degrades rapidly. Besides, these algorithms are mostly coarse registration solutions, since the final transformation estimated from only the minimum required number of correspondences is not accurate enough.
Apart from the aforementioned methods, there are also probability based methods that do not follow the correspondence determination and transformation estimation workflow. These algorithms fit some kind of probability distribution to the target point cloud and then maximize the product of the probabilities of the transformed source points under that distribution. Examples are the 3D Normal Distribution Transform (NDT) algorithm [NDT], the Coherent Point Drift (CPD) algorithm [cpd] and Gaussian Mixture Model Registration (GMMReg) [GMMReg]. NDT has been widely used in LiDAR-assisted localization. Since there is no exact one-to-one correspondence between two point clouds due to measurement noise and sampling, the probability based strategy can handle such cases better than correspondence based methods. However, most probability based methods are still not robust enough for registration in large-scale real-world scenarios.
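The core idea of NDT's representation can be sketched as follows; this is a simplified illustration assuming one Gaussian per voxel and a plain log-likelihood score (function names, cell size and the minimum-point threshold are our own choices, not the exact formulation of [NDT]):

```python
import numpy as np

def fit_ndt_cells(points, cell=1.0):
    """Fit a Gaussian (mean, covariance) to the points in each voxel,
    as in NDT's probabilistic representation of the target cloud."""
    buckets = {}
    keys = np.floor(points / cell).astype(int)
    for k, p in zip(map(tuple, keys), points):
        buckets.setdefault(k, []).append(p)
    models = {}
    for k, pts in buckets.items():
        pts = np.asarray(pts)
        if len(pts) >= 5:  # need enough points for a stable covariance
            models[k] = (pts.mean(0), np.cov(pts.T) + 1e-6 * np.eye(3))
    return models

def ndt_score(points, models, cell=1.0):
    """Sum of per-point Gaussian log-density terms; registration would
    maximize this score over the transformation parameters."""
    score = 0.0
    for p in points:
        k = tuple(np.floor(p / cell).astype(int))
        if k in models:
            mu, cov = models[k]
            d = p - mu
            score += -0.5 * d @ np.linalg.solve(cov, d)
    return score
```

A well-aligned source cloud scores higher than a displaced one, and the score is smooth in the transformation parameters, which is what makes gradient-based optimization over the pose possible without explicit point correspondences.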
Generally speaking, for practical registration tasks with many scans, we often follow a 'coarse to fine' and 'pairwise to multi-view' processing strategy, as shown in Fig.14. Coarse registration algorithms like feature matching and 4PCS are used first, and then fine registration algorithms such as ICP and NDT refine the result. A global adjustment strategy like pose graph optimization [globallyreg][comparereg] is then usually applied to jointly register multiple scans and minimize the misclosure.
Although many target-less registration methods exist that are suited to different kinds of datasets, they still face common challenges, namely the huge number of points, low overlap rates, the presence of clutter, occlusion and noise, as well as repetitive structures. Besides, the trade-off between accuracy and efficiency remains a major issue in practice. These challenges will be the main focus of future improvements to registration methods.
There are also other open questions to be solved in the near future. For example, cross-platform registration (such as between ALS and TLS point clouds) is still a challenging problem due to the large differences in perspective, range and point density. Besides, the low-overlap registration or shape matching problem, which can be very useful in digital cultural relic restoration, still awaits good solutions.