I. Introduction
Current 3D laser scanners and RGB-D cameras produce large 3D point clouds at a high frequency. For many applications, the first step is to register or align pairs of such point clouds with each other. Accurately estimating the relative transformation between two such point clouds is paramount in applications such as SLAM, where it serves to constrain the pose graph, or in manipulation, where it enables scene reconstruction for subsequent grasp planning. A widely used method for this, thanks to its simplicity and efficiency, is iterative closest point (ICP) [2].
ICP takes two point clouds, source and reference, as input with the goal of estimating the transformation that aligns the two point clouds. Using an initial transformation guess, ICP finds corresponding points in the two point clouds and updates the transformation in order to minimise the Euclidean distance between the correspondences. These steps are repeatedly executed until convergence is achieved. The simplicity of ICP stems from the fact that the transformation is estimated solely on the basis of distances between points in the two point clouds. This is in contrast to other approaches which rely on features extracted from the point clouds in order to perform the data association.
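The two-step loop described above can be sketched in a few lines. The following is a hypothetical, minimal illustration (our own, not the authors' implementation), using NumPy, SciPy's KD-tree for the correspondence step, and the closed-form SVD (Kabsch) solution for the alignment step:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(source, reference, T_init=np.eye(4), iters=30):
    """Minimal point-to-point ICP for 3D clouds given as (N, 3) arrays.

    Illustrative sketch: alternates nearest-neighbour data association with
    a closed-form SVD (Kabsch) update of the rigid transform.
    """
    T = T_init.copy()
    tree = cKDTree(reference)
    for _ in range(iters):
        # Transform the source cloud with the current estimate.
        src = source @ T[:3, :3].T + T[:3, 3]
        # Data association: closest reference point for every source point.
        _, idx = tree.query(src)
        matched = reference[idx]
        # Closed-form least-squares rigid alignment (Kabsch algorithm).
        mu_s, mu_m = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_m - R @ mu_s
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T
    return T
```

Given a good initial guess, the association step finds mostly correct pairs, and the loop converges in a handful of iterations; with a poor guess it can fall into a local minimum, which is the sensitivity discussed below.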
The quality of an ICP solution depends on the accuracy of the initial guess and the overlap between the two point sets. The initial guess is typically obtained from a motion model using data from an inertial measurement unit (IMU), odometer, etc. An inaccurate initial guess, due to drift and the limited precision of the motion model, can result in large alignment errors and ultimately incorrect transformation estimates.
Standard ICP uses all available data points in every iteration when computing the transformation between the two point clouds. With sensors capable of producing point clouds with tens or hundreds of thousands of points, this approach is computationally expensive. In this paper, we propose a novel approach to solving the optimisation problem of ICP by exploiting recent advances in stochastic gradient descent (SGD) based optimisation. Instead of computing gradient information from the entire point clouds, as done by traditional ICP methods, we use smaller partial point clouds, or minibatches, which can be processed faster and in constant time. While the gradient information obtained in this manner is noisier than that computed over the full point clouds, the minibatch gradients match the full-batch gradients in expectation [4]. This results in a method which solves the original ICP problem more quickly, without sacrificing stability or quality of the solution.
The contributions of this paper are as follows:

A novel way to improve the speed and data efficiency of ICP to handle large-scale point clouds;

A robust and efficient ICP method which is less sensitive to parameters when processing different data sources.
An implementation of the method described in this paper is available online at https://bitbucket.org/fafz/sgd_icp/.
II. Related Work
Selecting corresponding points in the source and reference point clouds is an important step in ICP, as evidenced by the various ICP variants proposed in the literature to improve this part of the algorithm. Luck et al. [10] use simulated annealing to obtain a good initial starting point. Masuda and Yokoya [11] combine a least median square estimator and ICP using random samples of points in each iteration. While this method uses random subsampling, its goal is to remove outliers from the point cloud processed by ICP, and it does not perform incremental updates of the transformation estimate as our proposed method does.
Chen and Medioni [5] propose a robust version of ICP which minimises the distance between points and planes (point-to-plane ICP) instead of the traditional point-to-point distance. In point-to-plane ICP, matching points in the two point clouds are determined by intersecting a ray from the source point, in the direction of the source point's normal, with the reference point's surface. This method is more robust to local minima than standard ICP [18]. Segal et al. [19] combine point-to-point and point-to-plane ICP into a single probabilistic framework called Generalised-ICP (GICP). GICP is a plane-to-plane algorithm that models the local planar structure in both the source and reference point clouds, instead of only the reference's, as is typically done in point-to-plane ICP. Comprehensive summaries of different ICP variants and their performance and properties can be found in [16] and [13].
Once the corresponding point pairs are selected, the next challenge is to use this information to find the optimal transformation. Several optimisation techniques have been used to minimise ICP's cost function. Fitzgibbon [7] presents a method which uses the non-linear Levenberg-Marquardt algorithm [15], combining batch gradient descent and Gauss-Newton, to optimise the cost function. Blais and Levine [3] use simulated annealing to minimise the cost function.
Another avenue of research is methods that improve the ability of ICP to handle an unknown degree of overlap between the point clouds. This includes the use of a trimmed square cost function to estimate the optimal transformation [6], the rejection of corresponding point pairs based on a threshold defined by the standard deviation of point pair distances [22], or the use of a semi-differential invariant for correspondence estimation [12]. Aligning point clouds and estimating relative transforms between them can also be achieved with other methods, including [21] and [20]. These methods use distribution-based filtering approaches to model the uncertainties in the transformation parameters.
Our proposed method does not introduce a novel formulation of ICP; rather, it provides a different method of solving the underlying optimisation problem, in a manner that is computationally more efficient and less reliant on parameter tuning.
III. Stochastic Gradient Descent ICP
In the following, we introduce the optimisation problem underlying ICP before describing our proposed method, SGD-ICP, which solves the ICP problem using stochastic gradient descent.
III-A Standard ICP
Standard ICP aligns a pair of point clouds in two steps. In the first step, the source cloud is transformed with the current transform estimate and correspondences between pairs of points in the transformed source and the reference point cloud are established on the basis of Euclidean distance. In the second step, a transformation which minimises the point-to-point Euclidean distance between the corresponding points is computed. The cost function or loss optimised by ICP in the second step is a function of the parameter vector \theta representing the translation and rotation needed to align the two point clouds, i.e.:
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\lVert T(\theta)\, s_i - r_i \right\rVert^2    (1)
where N represents the total number of points in the source cloud, s_i is a point in the source cloud, r_i is the corresponding point in the reference cloud, and T(\theta) is a transformation matrix parametrised by the parameter vector \theta. T(\theta) can be decomposed into R and t, where t is a translation vector and R is a rotation matrix. Rewriting (1) using the decomposition of T(\theta), we obtain:

\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\lVert R\, s_i + t - r_i \right\rVert^2    (2)
III-B Stochastic Gradient Descent ICP
The optimisation problem in (1) is usually solved using batch optimisation methods in ICP. However, batch optimisation algorithms scale badly to problems in which a large number of data points has to be processed in each step. This motivates the use of stochastic gradient descent (SGD) to solve the optimisation step in ICP. Stochastic gradient descent computes a gradient based on a single point or a small number of points (a minibatch), as opposed to the full point cloud, resulting in a computational cost that is independent of the size of the point cloud. While computing a gradient in this manner yields a noisy estimate of the true gradient, one can prove convergence and optimality of SGD [4].
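The unbiasedness of minibatch gradients can be illustrated numerically. The following toy sketch (our own example, not from the paper) uses a simplified translation-only squared loss and shows that averaging many minibatch gradients recovers the full-batch gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic "source" and "reference" points differing by a fixed offset
# plus noise; the loss is the mean of ||s_i + t - r_i||^2 over all points.
s = rng.normal(size=(10000, 3))
r = s + np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=(10000, 3))
t = np.zeros(3)  # current translation guess

def grad(si, ri, t):
    # Gradient of the mean squared residual with respect to t.
    return 2.0 * (si + t - ri).mean(axis=0)

full = grad(s, r, t)            # full-batch gradient over all 10000 points
batch_grads = []
for _ in range(2000):
    idx = rng.choice(len(s), size=50, replace=False)  # one minibatch
    batch_grads.append(grad(s[idx], r[idx], t))
avg = np.mean(batch_grads, axis=0)  # average of noisy minibatch gradients
```

Each individual minibatch gradient is noisy, but their average approaches the full-batch gradient, which is the property SGD relies on.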
We can minimise the loss in (1) with SGD using minibatches containing m points, with the following update equation for the transformation parameters \theta:

\theta_k = \theta_{k-1} - \alpha A \frac{1}{m} \sum_{i=1}^{m} \frac{\partial \mathcal{L}_i(\theta_{k-1})}{\partial \theta}    (3)

where \partial \mathcal{L}_i / \partial \theta are the partial derivatives of the loss with respect to the individual parameters, and \theta_k and \theta_{k-1} are the values of the transformation parameters at the current and previous iteration, respectively. The matrix A acts as a preconditioner and is assumed to be the identity matrix in our case, while the learning rate or step size \alpha dictates how quickly parameter values change. The step size can be set according to various schedules, including a fixed step size and adaptive schemes such as ADAM [9]. The full update rule for the parameter vector has the following form:
\theta_k = \theta_{k-1} - \alpha \frac{1}{m} \sum_{i=1}^{m} 2 \left( R\, s_i + t - r_i \right)^{\top} \frac{\partial (R\, s_i + t)}{\partial \theta}    (4)
For the translational update, \partial (R\, s_i + t) / \partial t is independent of the rotational components, thus \partial (R\, s_i + t) / \partial t = I. Similarly, for the rotation angle updates, R does not depend on the translational parameters, resulting in \partial (R\, s_i + t) / \partial \phi = (\partial R / \partial \phi)\, s_i. Equation (4) can thus be made more explicit by splitting it into translational updates,

t_k = t_{k-1} - \alpha \frac{1}{m} \sum_{i=1}^{m} 2 \left( R\, s_i + t - r_i \right)    (5)

and rotation angle updates

\phi_k = \phi_{k-1} - \alpha \frac{1}{m} \sum_{i=1}^{m} 2 \left( R\, s_i + t - r_i \right)^{\top} \frac{\partial R}{\partial \phi}\, s_i    (6)

where t and \phi = (\phi_x, \phi_y, \phi_z) are vectors, and the partial derivatives \partial R / \partial \phi_x, \partial R / \partial \phi_y, and \partial R / \partial \phi_z are matrices which can be derived analytically. The individual components of the rotation matrix can be found in [5].
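As a concrete example (added here for illustration, not part of the original derivation), for a rotation about the z-axis alone, the matrix \partial R / \partial \phi_z needed in (6) is obtained by differentiating each entry of R_z:

```latex
R_z(\phi_z) =
\begin{pmatrix}
\cos\phi_z & -\sin\phi_z & 0\\
\sin\phi_z & \cos\phi_z & 0\\
0 & 0 & 1
\end{pmatrix},
\qquad
\frac{\partial R_z}{\partial \phi_z} =
\begin{pmatrix}
-\sin\phi_z & -\cos\phi_z & 0\\
\cos\phi_z & -\sin\phi_z & 0\\
0 & 0 & 0
\end{pmatrix}
```

The derivatives with respect to the other two angles follow in the same entry-wise manner from the full rotation matrix.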
In SGD-ICP, a minibatch is obtained by picking a random set of m points from the source point cloud. Each point s_i in the minibatch is then transformed with the current transformation matrix, i.e., s_i' = T(\theta)\, s_i. The corresponding point to each transformed point in the reference point cloud is found using the point-to-point distance metric:

\hat{r}_i = \operatorname*{arg\,min}_{r_j} \left\lVert s_i' - r_j \right\rVert    (7)
In the case of partial overlap between the two point clouds, we use a maximum correspondence threshold to reject unlikely matches. Using the point pairs obtained in this manner, we iteratively minimise the Euclidean distance between the two clouds using (4).
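One possible implementation of this minibatch correspondence step, sketched with a SciPy KD-tree (the function name and threshold default are our own, hypothetical choices):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_minibatch(batch, reference_tree, reference, T, max_dist=0.5):
    """Correspondence step for one minibatch of source points (m, 3).

    Transforms the sampled points with the current estimate T (4x4 matrix),
    queries a KD-tree of the reference cloud for nearest neighbours, and
    rejects pairs further apart than max_dist to handle partial overlap.
    """
    transformed = batch @ T[:3, :3].T + T[:3, 3]
    dists, idx = reference_tree.query(transformed)
    keep = dists <= max_dist          # maximum correspondence threshold
    return transformed[keep], reference[idx[keep]]
```

Building the KD-tree once over the reference cloud and querying only m points per iteration is what keeps the per-iteration cost independent of the source cloud size.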
Algorithm 1 shows the steps performed by our method. First, a new minibatch is obtained from the source cloud in line 2 and transformed with the latest transformation matrix in line 3. In lines 4 to 8, the corresponding points in the reference cloud are found and stored. Once the corresponding pairs have been obtained, the stochastic gradient is computed and used to update the translation and rotation parameters of the transformation in lines 9 and 10. Finally, upon convergence, the final set of parameters is returned in line 13.
In order to avoid bias in the minibatch selection, sampling is performed without replacement until all points have been selected, at which point all points are added back to the sampling pool and reconsidered for sampling. As the spatial extents of the point clouds can vary substantially, each point's position is scaled to a fixed range. This makes the step size independent of the spatial extent of the point clouds being aligned. The scaling is undone before the final transformation is returned.
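The epoch-style sampling scheme described above can be sketched as follows (a hypothetical helper; the class and method names are our own):

```python
import numpy as np

class MinibatchSampler:
    """Draws minibatches without replacement until the source cloud is
    exhausted, then refills and reshuffles the pool, avoiding the selection
    bias of sampling with replacement."""

    def __init__(self, n_points, batch_size, seed=0):
        self.n, self.m = n_points, batch_size
        self.rng = np.random.default_rng(seed)
        self.pool = []

    def next_batch(self):
        while len(self.pool) < self.m:
            # Every point has been used: add all indices back, reshuffled.
            self.pool.extend(self.rng.permutation(self.n).tolist())
        batch, self.pool = self.pool[:self.m], self.pool[self.m:]
        return np.array(batch)  # indices into the source cloud
```

Over one full cycle through the pool, every source point is used exactly once, so successive minibatches jointly cover the entire cloud.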
Although this paper focuses on the point-to-point cost variant of ICP, the approach is in principle applicable to any other cost variant of ICP.
IV. Experiments
The following experiments investigate the performance of SGD-ICP in terms of data-efficiency, runtime and accuracy, robustness, and parameter sensitivity. We compare our proposed method against standard ICP and Generalised-ICP (GICP) [19], using the PCL [17] implementations, as well as against point-to-point (LIBPOINT) and point-to-plane (LIBPLANE) ICP variants implemented in libpointmatcher, as described in [14]. The libpointmatcher methods use random subsampling in order to improve runtime at the expense of some solution quality. We also investigate the effect the minibatch size has on the performance of SGD-ICP in terms of accuracy and runtime. The quality of the transformation estimates is measured separately for translation and rotation. Translational error is measured as the sum of Euclidean distance errors, while rotational error is measured as the sum of the absolute angular differences.
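The two error metrics can be computed as follows; this sketch assumes Euler-angle differences for the rotational error and is our own illustration, not the evaluation code used for the experiments:

```python
import numpy as np

def alignment_errors(T_est, T_true):
    """Translational and rotational error between two 4x4 rigid transforms.

    Translational error: Euclidean distance between the two translations.
    Rotational error: sum of absolute differences of Euler angles.
    """
    t_err = np.linalg.norm(T_est[:3, 3] - T_true[:3, 3])

    def euler_zyx(R):
        # Intrinsic Z-Y-X Euler angles of a rotation matrix
        # (no gimbal-lock handling in this sketch).
        return np.array([np.arctan2(R[2, 1], R[2, 2]),
                         -np.arcsin(np.clip(R[2, 0], -1.0, 1.0)),
                         np.arctan2(R[1, 0], R[0, 0])])

    r_err = np.abs(euler_zyx(T_est[:3, :3]) - euler_zyx(T_true[:3, :3])).sum()
    return t_err, r_err
```

For small misalignments this angle-sum metric behaves smoothly; near gimbal lock a quaternion- or geodesic-distance based metric would be preferable.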
Experiments are performed using the ETHZ-ASL Kinect dataset (https://projects.asl.ethz.ch/datasets/doku.php) [14], which contains Kinect scans of an indoor scene with varying amounts of clutter. The ground truth for this dataset is obtained by aligning the point clouds using standard ICP, with the transform obtained from the Vicon system as the initial guess. As we are interested in pairwise transformations rather than tracking, this approach provides a better ground truth. The second dataset we use is the KITTI dataset [8], from which we use the Velodyne data. As no high-accuracy ground truth is available for scan-to-scan alignment, we apply random transformations, bounded in both translation and rotation, to the Velodyne scans.
In all experiments, we use a fixed maximum correspondence threshold to discard incorrect correspondences, together with a fixed minibatch size and step size. These values work well in practice and no attempt was made to find the best performing ones. For the libpointmatcher-based methods, we use a subsampling ratio chosen to achieve good accuracy and speed. The experiments are performed on a desktop PC with an Intel Core i7-7700 CPU and 16 GB RAM. The source code implementation is single-threaded and does not employ any GPU acceleration.
IV-A Data-Efficiency
We begin by evaluating how many data points the different methods need to process to obtain a given level of solution quality. This is done by recording the number of points used by each method in each iteration until convergence is achieved. Figure 1 shows the number of points processed (in log scale) along the X-axis and the translational error (in meters) along the Y-axis. The starting point of each line indicates the translational error and the number of points used by each algorithm after one iteration. This plot clearly shows that our method (SGD-ICP) requires significantly fewer points than the other methods to converge, equivalent to a single pass through the point cloud of 15,646 points. GICP needs fewer passes over the cloud than the remaining methods, which is explained by its more informative error metric and robust point correspondence method; however, as we will see, this comes at the cost of runtime. The other three methods all use a large number of points, yet all five methods achieve comparable errors. From Figure 1, we can additionally see that the libpointmatcher variants are more data efficient than standard ICP. This stems from the preprocessing these methods apply to the initial point clouds, which includes randomly subsampling the original point clouds to reduce the number of points processed in each iteration.

This example demonstrates that the stochastic gradient estimates of our method are sufficient to achieve results comparable to the other methods while processing only a fraction of the data. In the next section we examine how this reduction in data usage translates into runtime efficiency and how it affects solution accuracy.
IV-B Solution Quality and Runtime
To evaluate the solution quality and runtime of the algorithms, we use 1000 cloud pairs randomly selected from the medium-medium-fly ETHZ-ASL Kinect dataset. The average alignment error and runtime, along with their standard deviations, are shown in Table I. The translational and rotational errors are presented in meters (m) and radians (rad) respectively, with runtime in seconds (sec).
Method    Translational Error (m)    Rotational Error (rad)    Run-Time (sec)
SGD-ICP
ICP
LIBPOINT
LIBPLANE
GICP
Looking at the errors, we can see that SGD-ICP obtains results equivalent to those of ICP. This is expected, as the optimisation problem solved by the two methods is identical. GICP obtains the solutions with the lowest error, which can be attributed to its more informative error metric as well as its point correspondence method. Finally, both libpointmatcher-based methods (LIBPOINT and LIBPLANE) obtain results with a larger error than standard ICP, which is part of the trade-off incurred by the preprocessing of these methods.
Turning our attention to the runtime values, it is clear that SGD-ICP performs best, being several times faster than standard ICP while achieving the same error. This shows that the reduction in the number of points needed by SGD-ICP to converge translates directly into runtime efficiency. It also reveals the runtime cost incurred to obtain the high-quality results of GICP, which is 40 times slower than standard ICP. A possible reason for this increase in runtime is that the computation of normal information can be costly, especially in combination with a full batch optimiser such as Broyden-Fletcher-Goldfarb-Shanno (BFGS) [1], as used by GICP. Looking at the libpointmatcher methods, one can see that the reduction in quality has a beneficial impact on the runtime.
While both our proposed method and the LIBPOINT and LIBPLANE methods process smaller parts of the original point clouds in each iteration, we can see the impact the fixed choice of points in LIBPOINT and LIBPLANE has on the error. Fixing the set of points up front can discard informative points; in contrast, our method has access to different parts of the point cloud in each iteration. This results in fast per-iteration times without losing potentially useful information.
Another interesting point is the comparison of the convergence speed of LIBPLANE to standard ICP and LIBPOINT, both of which use a point-to-point error metric. The faster convergence of LIBPLANE indicates that point-to-plane ICP provides a substantial benefit over point-to-point metrics. As SGD-ICP only addresses the way in which the optimisation is performed, it will be interesting to see how it performs with the point-to-plane metric.
IV-C Impact of Error in Initial Transformation on Accuracy
As our proposed method relies on stochastic gradients, this experiment explores whether any instability arises from this. To this end, we analyse how the algorithms perform under various levels of misspecification of the initial guess of the translation and rotation. We use the medium-medium-translation and medium-medium-rotation recordings of the ETHZ-ASL Kinect dataset. The possible pairings between point clouds are grouped into bins based on the amount of translation or rotation needed to align them correctly. The ranges of translation and rotation are each split into ten evenly sized bins, and 100 point cloud pairs are selected for each bin. For rotations, the last two bins are excluded, as very few examples exist for these.
Figure 2 shows the distribution of errors in translation (left) and rotation (right) as box plots. The X-axis shows the different error bins, while the Y-axis shows the error. For errors in translation, we can see that all methods perform without fault up to a certain initial translation error, at which point they all start to degrade in a similar manner. The reason for this is that at that point the overlap between the scans becomes very small, resulting in many ambiguous solutions, for example a single planar surface with no or few features, which hinders finding the correct alignment. This is shown in the two examples in Figure 3, representing the ground truth alignment of a reference cloud on the right and a source cloud on the left.
The comparison of the impact of error in the initial rotation (right-hand side of Figure 2) shows the amount of initial rotational offset along the X-axis. The behaviour here is similar to the translational case in that all methods start to degrade at a similar point. Again, as in the previous test, this coincides with low overlap between the clouds. The equivalent performance of SGD-ICP to that of the other methods shows that its stochastic nature does not reduce its ability to handle an initial misspecification of the transformation.
IV-D Effect of Batch Size on Solution Quality and Runtime
To investigate the effect of the batch size on solution quality and runtime, we use the same dataset as in Section IV-B. This experiment is performed with a fixed step size (vanilla-SGD) as well as the adaptive step size ADAM [9] (Adam-SGD).
Figure 4 shows the number of points in a minibatch along the X-axis and compares vanilla-SGD against Adam-SGD in terms of translational and rotational error, in meters (m) and radians (rad) respectively, runtime in seconds (sec), and number of iterations along the Y-axis. The plot shows that a minibatch containing 15 or more points gives the same solution quality. However, a larger minibatch containing 300 or more points needs more time to converge. This increase in runtime can be attributed to the longer data-association computation required for larger minibatches in each iteration.
The comparison of vanilla-SGD against Adam-SGD indicates that both perform equally well in terms of solution quality, except that vanilla-SGD fails to converge to the correct solution for minibatches containing fewer than 15 points. Overall, Adam-SGD is superior to vanilla-SGD in terms of runtime efficiency, as it converges in fewer iterations and in less time. The reason for Adam-SGD's superior performance is likely its smarter step size, which considers an exponentially decaying average of both past gradients and squared gradients in each dimension independently, resulting in quicker convergence.
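The adaptive step-size rule of ADAM can be sketched as follows, following the description in [9]; the class name is our own and the constructor defaults are the standard values suggested in that paper:

```python
import numpy as np

class Adam:
    """Per-parameter adaptive step size (Kingma and Ba [9]), illustrative."""

    def __init__(self, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = self.v = 0.0  # first and second moment estimates
        self.k = 0             # step counter

    def step(self, theta, grad):
        self.k += 1
        # Exponentially decaying averages of gradients and squared gradients.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        # Bias correction for the zero-initialised moments.
        m_hat = self.m / (1 - self.beta1 ** self.k)
        v_hat = self.v / (1 - self.beta2 ** self.k)
        return theta - self.alpha * m_hat / (np.sqrt(v_hat) + self.eps)
```

Because the update is normalised by the square root of the second moment, the effective step magnitude is roughly alpha regardless of the gradient scale, which is what makes the scheme insensitive to the noisy, varying-magnitude minibatch gradients.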
IV-E Velodyne Data
So far, all experiments were conducted on data gathered with a Kinect, which produces dense and evenly spaced point clouds with small ranges. In this experiment we use data gathered by a Velodyne from the KITTI dataset. This data contains long-range readings with highly variable point density. All methods use the exact same parameters as in the previous experiments, the goal being to assess the impact of these parameters when moving between data from different sensors.
Method    Translational Error (m)    Rotational Error (rad)
SGD-ICP
ICP
LIBPOINT
LIBPLANE
GICP
The results presented in Table II show that SGD-ICP has the lowest error, while standard ICP and GICP fail to produce usable results. A possible explanation for the poor performance of ICP and GICP is that they require proper parameter tuning to perform well. SGD-ICP, LIBPOINT, and LIBPLANE, on the other hand, all perform well. The scaling of point clouds performed by SGD-ICP enables our method to handle Velodyne scans in the same manner as Kinect data. This scaling removes the need to perform the usually critical step-size tuning of SGD-based methods. The random subsampling of points performed by the libpointmatcher methods renders them robust to the changes in the point clouds, in addition to improving runtime.
In summary, these experiments demonstrate that SGD-ICP produces alignment errors comparable to standard ICP at a significantly reduced computational cost. In comparison to other methods that attempt to improve the runtime of ICP, the stochastic sampling of SGD-ICP has the advantage that no a priori data reduction, which can lead to loss of information, needs to be performed.
However, SGD-ICP still shares the same drawbacks as standard ICP in that it is sensitive to outliers in the case of partial overlap, which can lead to wrong data associations, as discussed in the experiments on the impact of the initial guess. This is an inherent characteristic of point-to-point distance based data association.
V. Conclusion
In this paper we present an efficient and accurate ICP variant, called SGD-ICP, which employs stochastic gradient descent (SGD) to solve the optimisation problem at the core of ICP. Solving the optimisation using stochastic gradient updates results in significantly faster runtime without any loss in the quality of the final transformation estimate. Comparisons with other common methods show that our proposed method is faster, as accurate as standard ICP and other popular ICP variants, and easily applicable to point clouds from different sources without additional parameter tuning.
Acknowledgements
This research is supported by an Australian Government Research Training Program (RTP) Scholarship.
References
[1] M. Avriel. Nonlinear Programming: Analysis and Methods. Courier Corporation, 2003.
[2] P. Besl and N. D. McKay. A method for registration of 3-D shapes. In Sensor Fusion IV: Control Paradigms and Data Structures, 1992.
[3] G. Blais and M. D. Levine. Registering multiview range data to create 3D computer objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995.
[4] L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 2018.
[5] Y. Chen and G. Medioni. Object modeling by registration of multiple range images. In Proceedings of the IEEE International Conference on Robotics and Automation, 1991.
[6] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek. The trimmed iterative closest point algorithm. In International Conference on Pattern Recognition, 2002.
[7] A. Fitzgibbon. Robust registration of 2D and 3D point sets. Image and Vision Computing, 2003.
[8] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR), 2013.
[9] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
[10] J. Luck, C. Little, and W. Hoff. Registration of range data using a hybrid simulated annealing and iterative closest point algorithm. In IEEE International Conference on Robotics and Automation, 2000.
[11] T. Masuda and N. Yokoya. A robust method for registration and segmentation of multiple range images. Computer Vision and Image Understanding, 1995.
[12] T. Pajdla and L. Van Gool. Matching of 3-D curves using semi-differential invariants. In Proceedings of the IEEE International Conference on Computer Vision, 1995.
[13] F. Pomerleau, F. Colas, and R. Siegwart. A review of point cloud registration algorithms for mobile robotics. Foundations and Trends in Robotics, 2015.
[14] F. Pomerleau, S. Magnenat, F. Colas, M. Liu, and R. Siegwart. Tracking a depth camera: Parameter exploration for fast ICP. In IEEE International Conference on Intelligent Robots and Systems, 2011.
[15] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2nd edition, 1992.
[16] S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In Proceedings of the International Conference on 3-D Digital Imaging and Modeling, 2001.
[17] R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation, 2011.
[18] J. Salvi, C. Matabosch, D. Fofi, and J. Forest. A review of recent range image registration methods with accuracy evaluation. Image and Vision Computing, 2007.
[19] A. Segal, D. Haehnel, and S. Thrun. Generalized-ICP. In Robotics: Science and Systems, 2009.
[20] R. A. Srivatsan, G. Rosen, D. F. Naina Mohamed, and H. Choset. Estimating SE(3) elements using a dual quaternion based linear Kalman filter. In Robotics: Science and Systems, 2016.
[21] R. A. Srivatsan, M. Xu, N. Zevallos, and H. Choset. Bingham distribution-based linear filter for online pose estimation. In Robotics: Science and Systems, 2017.
[22] T. Zinsser, J. Schmidt, and H. Niemann. A refined ICP algorithm for robust 3-D correspondence estimation. In Proceedings of the International Conference on Image Processing, 2003.