I Introduction
There is currently a wide diversity of techniques for obtaining point clouds (e.g. Kinect, Lidar, range cameras, structure from motion (SFM) and simultaneous localization and mapping (SLAM)). Their registration is a long-standing and difficult challenge in computer vision, computer graphics, robotics, and medical applications. Because a point cloud usually contains tens of thousands, or even millions, of points in each scene, its registration is much more complex and difficult than the point set registration problem, which typically processes fewer than 1000 points in each scene. When the point clouds come from different kinds of sensors, the registration problem becomes much more difficult because of the disparate sensing mechanisms. However, current research mainly focuses on registering point sets of fewer than 1000 points [1][2][3][4][5]. Unlike these previous methods, we propose a method in this paper to deal with the cross-source point cloud registration problem, which is a generalization of complex point cloud registration. The existing point cloud registration methods can be categorized into two types: same-source and cross-source.
In terms of same-source point cloud registration, existing methods can be divided into two categories: direct methods and transformed methods. Direct methods usually minimize the distance between paired points or features [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Transformed methods usually transform 3D points from Euclidean space to other models and convert the registration problem into a model correspondence problem [1, 2, 17, 18, 19, 20].
There is a single paper about cross-source point cloud matching/registration [21], but its registration is executed using conventional iterative closest point (ICP) [7] and many assumptions are made, including removing sparse outliers and manually selecting the dense point regions. A 4-Points Congruent Sets-based method (4PCS) reports some experiments that deal with cross-source problems [16], although such 4PCS-based methods are suboptimal in the direction of slippage as a result of operating at a point level [22, 15].
The fact that point clouds come from different kinds of sensors (e.g. SFM with mobile phones, Kinect, range cameras and Lidar) presents many challenges, and the existing related methods have many limitations. There is a paucity of research in the literature on this issue. In real applications, however, multiple types of sensors have much greater ability than single sensors. For example, SLAM [23] constructs complete depth maps in real time and converts the depth to a point cloud, and SFM uses images captured by RGB cameras to create point clouds for urban scenes [24] and heritage objects [25]. Other devices, such as Kinect and Lidar, offer effective ways of producing standard point cloud datasets. With the development of new technology, there are increasing means of sensing 3D point clouds describing the same objects or scenes. Registering these datasets will have great value in cultural heritage protection, city development and technology. Nevertheless, cross-source point cloud registration presents a challenge.
Figure 1 shows two cross-source point clouds with two monitors and audio equipment, which illustrates the difficulties confronting a robust registration method:
(1) Varying densities: large variations in the density of cross-source point clouds often lead to the failure of existing registration methods.
(2) Missing data: data is often missing because the same object may reflect differently, or not at all, for different types of sensors as a result of the imaging mechanisms of the different sensing techniques. For instance, this problem is pronounced for point clouds created by SFM, which is unable to generate points in uniform image regions.
(3) Large variations in scale and rotation angles: even though a registration method is supposed to recover scale and rotation angles, exceedingly large variations in scale and angle are often outside the capture zones of many existing methods (see Figure 7 for an example).
As demonstrated in our experiments, these combined challenges make cross-source point cloud registration difficult, and many existing methods fail in such adverse scenarios.
Despite the large variations in cross-source point clouds, the human visual system seems able to align them effortlessly with high accuracy. This is probably due to the fact that humans exploit the similarities between the structures of two cross-source point clouds instead of the detailed points. Motivated by this insight, a method is proposed to extract and describe the macro structure (e.g. the global outline of objects) and the micro structure (e.g. voxels and segments) of point clouds. These macro and micro structures act like a net to robustly describe the invariant components of cross-source point clouds, and graph theory is a strong tool for preserving these structures from a mathematical viewpoint. A structure-preserving representation method that ignores local point cloud details is proposed to deal with missing data and varying density. A scale normalization method is proposed to deal with the scale problem, and a systematic approach using these two methods is proposed to deal with all cross-source point cloud registration problems.
To the best of our knowledge, this is the first time a method has been proposed that successfully registers two cross-source point clouds in adverse scenarios. The proposed approach preserves the structure properties well by, firstly, extracting reliable macro and micro structures that are robust to large noise, outliers and some of the missing data; secondly, integrating the point cloud structures as graphs and describing them; thirdly, finding the optimal graph-matching solution; and lastly, refining the solution with 3D RANSAC (Random Sample Consensus) to remove outliers and ICP to finalize the outlier-free registration.
The contributions of this work are (1) a feasible structure-based framework to deal with the cross-source point cloud registration problem; (2) a new graph construction method to practically integrate macro and micro structures as a graph and robustly describe these structures; and (3) a new iteration method to solve the graph matching problem taking the global geometric constraint into consideration.
II Related work
Same-source point clouds are captured by the same kind of sensor (e.g. all captured by Kinect), while cross-source point clouds are captured by different kinds of sensors (e.g. one by Kinect, another by an RGB camera). In this section, the related methods are reviewed in terms of their ability to deal with the three challenges of cross-source point cloud registration. As noted in Section I, existing methods can be categorized as direct methods and transformed methods.
II-A Direct methods
Direct point set registration methods usually minimize the Euclidean distance between nearby points. The most popular approach is the ICP [7] algorithm, which alternates between estimating the point correspondence and estimating the transformation matrix for a given correspondence [11, 10, 13]. The vanilla ICP method [7] relies on the assumption that all points have pairwise counterparts between the two sets and is very sensitive to the given initialization. The method is widely used in same-source and cross-source registration. Iterative non-rigid point set matching has improved ICP by incorporating outlier detection in the iterative correspondence estimation steps [8]. The above methods are all heuristic, hence they cannot guarantee the global optimality of the solutions. Go-ICP [14] provides a globally optimal solution to ICP in 3D Euclidean registration by combining ICP with a branch-and-bound (BnB) scheme. Similarly, GOGMA [26] combines a Gaussian mixture model (GMM) with a BnB scheme. These globally optimal methods are sensitive to scale differences. A roots-finding technique was used in [27] for affine-invariant point set registration. The method is sensitive to outliers due to its use of moments. To tackle the outlier sensitivity of [27], [5] proposes a method that uses an L2E estimator and ICP. The method of [5] is suitable for both 2D and 3D situations. In the 2D case, shape context is used as the descriptor and the Hungarian method is used for matching, with the test statistic as the cost measure. In the 3D case, the spin image can be used as a feature descriptor, where the local similarity is measured by an improved correlation coefficient. [5] uses L2E, which is particularly appropriate for analyzing massive data sets when data cleaning is impractical. With the ICP refinement, this algorithm robustly estimates the transformation in the presence of noise and outliers. Also, the initial correspondences do not need to be highly accurate. The experimental results show that this algorithm performs well under deformation, occlusion, rotation, noise and outliers. However, experiments have only been conducted in the same-source situation; in the cross-source situation, the L2E estimator may face the problem of large outliers. Despite these improvements to the ICP method, the direct registration approaches above are intrinsically sensitive to missing data, large variations in point density, and scale differences, which makes them unsuitable for cross-source point cloud registration (see the experimental results in Section V for examples).
In contrast to these ICP-based methods, when scan pairs start in arbitrary initial poses, registration amounts to solving a global problem: finding the best aligning rigid transform over the 6-DOF space of all possible rigid transforms composed of translations and rotations. Since an aligning rigid transform is uniquely determined by three pairs of (non-degenerate) corresponding points, one popular strategy is to invoke RANSAC [28] to find the aligning triplets of point pairs [29]. This approach, however, regularly degrades to its worst-case complexity in the number of data samples in the presence of partial matching with low overlap.
Various alternatives to RANSAC have been proposed to counter the cubic complexity, such as hierarchical representation in the normal space [30]; supersymmetric tensors to represent the constraints between the tuples [31]; stochastic nonlinear optimization to reduce the distance between scan pairs [32]; branch-and-bound using pairwise distance invariants [33]; or evolutionary game theoretic matching [34, 35]. However, these methods are all sensitive to missing data.
Following the concept of RANSAC, another kind of method is 4PCS [15], which uses a randomized alignment approach and the idea of planar congruent sets to compute the optimal global rigid transformation. The 4PCS method is widely used and has been extended to take into account uniform scale variations [36]. However, these methods have a complexity of $O(n^2 + k)$, where $n$ denotes the size of the point clouds and $k$ is the number of candidate congruent 4-points. This is a severe limitation when the number of points is large. To remove the quadratic complexity of the original 4PCS, [16] extends it to a fast algorithm that needs only linear computation time. This method reports the points or spheres in $\mathbb{R}^3$ and uses a smart index to quickly find the matched plane among all candidate congruent 4-point planes. One cross-source point cloud registration experiment is reported in [16]. However, these methods have many limitations due to their point-level operation, and the transformations they compute may easily be suboptimal. The varying density of the cross-source problem makes the performance of the 4PCS-based methods even worse.
Although these direct methods show some ability in addressing elements of the cross-source problem, none of them can deal with the complete cross-source problem. In this paper, a novel method is proposed to robustly deal with the entire cross-source problem. By extracting and combining macro and micro structures, the method is robust to large variations in density, noise and outliers. In addition, the enhanced graph matching globally registers the two structures. Lastly, a scale normalization step is used to eliminate most of the scale variation.
II-B Transformed methods
One of the mathematical tools typically used for registration is mutual information (MI), which captures the nonlinear correlations between the point clouds and the geometric properties of the target surface. The authors in [37] use ICP and MI to build one-to-one correspondence between a magnetic resonance (MR) surface and a laser-scanned cortical surface; however, this method is highly dependent on initialization and overlap rate. The work in [38] registers unstructured 3D point clouds by using K-means to form a set of codewords and using an estimator to optimize the MI value to obtain the final rigid relations. Cross-correlation of the horizontal cross-section images of the two point clouds is used in [39] to coarsely register the point clouds, and ICP is then used to refine the coarse results. These MI-based methods perform poorly when data is missing, because missing data makes the MI of the two point clouds differ from the outset.
Another type of transformed method is the feature-based method, which extracts features from 3D point clouds and transforms the point cloud registration problem from Euclidean space into feature space. Typical 3D feature extraction methods (a tutorial about 3D features is available at http://robotica.unileon.es/index.php/PCL/OpenNI_tutorial_4:_3D_object_recognition_(descriptors)) are FPFH [40], ESF [41], Spin image [42] and SHOT [43]. These feature-based methods produce exciting results on same-source point clouds. However, it is very difficult to reliably extract similar features from cross-source point clouds, and these methods typically fail in this situation, because the features extracted from the two point clouds may differ greatly and therefore cannot be used for registration.
Torki and Elgammal [19] use local features in images to learn a manifold representation. The authors first learn a feature embedding representation that contains the spatial structure of the features as well as the local appearance similarity. The out-of-sample method is then used to embed the features from new images. Similarly, Yuan [18] transforms every point in a point cloud into a shape representation, the Schrödinger distance transform (SDT), in order to cast the point set matching problem as a shape registration problem. The problem is then transformed into solving a static Schrödinger equation in place of the consistent static Hamilton-Jacobi equation in this setting. The SDT representation is an analytic expression which can be normalized to have unit L2 norm in accordance with the theoretical physics literature. The outline of this method is "point sets" → "SDTs" → "minimize the geodesic distance".
Another kind of method related to point cloud registration is the GMM-based method. To deal with the noise and outliers in the point set registration problem, Bing et al. [2] proposed a method in which point clouds are represented as Gaussian mixture models (GMMs) and the registration problem is solved by minimizing the statistical discrepancy between the corresponding GMMs. This approach can be used for both rigid and non-rigid point cloud registration, and has demonstrated its ability to deal with noise and outliers to some extent. Georgios et al. [1] introduced a motion drift idea into the GMM framework and achieved good results on rigid and non-rigid point set registration. A solution to the GMM-based approach that recasts registration as a clustering problem was proposed in [4]. However, an increasing number of GMM models is required to robustly represent point clouds. When the number of points increases to tens of thousands or millions, these methods become impractical in terms of both computational and memory cost. Moreover, the GMMs describing two cross-source point clouds differ considerably when there is missing data and large variation in noise and outliers, which makes the registration inaccurate or may even cause it to fail. The experiments in Section V demonstrate that these approaches do not lead to satisfactory results for cross-source point cloud registration.
The above transformed methods demonstrate an ability to deal with some noise and outliers or with density variation, but none of them can successfully address the cross-source registration problem, which comprises issues of scale, density variation, noise and outliers, and missing data. In this paper, we aim to address this difficult cross-source problem. Motivated by the human registration process, a structure-based framework is proposed to robustly register two cross-source point clouds.
III Macro and micro structure representation
As mentioned in Section I, the significant challenges for 3D cross-source point cloud registration are the large variations in density, missing data, scale and angle between two point clouds. To address these variations, we define two structures, known as macro and micro structures, to describe the point clouds based on our observations. In our work, we extract structures from the cross-source point clouds and use these structures to register the point clouds indirectly, instead of trying to deal with the highly variable points directly. As with human perception, these structures robustly describe the global and local invariants of the cross-source point clouds, even though the point clouds themselves vary in many ways.
The macro structure is the overall outline or large-scale structure of an object or scene. It is important to note that it represents the global properties of the structure, such as the boundary outline, the contour and the shape, but not the global light, global color or global material. Figure 3(a) illustrates that the rectangle above the square (the blue outline) is the macro structure. When humans judge whether two objects are similar, they usually first consider the macro structure, and an overall alignment is obtained on this basis. We define a micro structure to work alongside the macro structure. The micro structure is defined as a small-scale structure, such as a stable cell or part of the object or scene. It is a local property that describes the internal details of the object or scene. In our work, the micro structure consists of a 3D region, such as a super voxel in 3D point clouds. Figure 3(b) illustrates that super voxels contain points with the same 3D spatial geometric properties. We use these micro and macro structures to iteratively obtain the correspondences between two point clouds.
IV Registration algorithm
In this section, we describe the registration method based on the proposed macro and micro structure theory and describe the components that make up our system. Figure 2 provides an overview of our method in block form. It comprises the following five components:
Step 1, Scale normalization: The preprocessing stage. Two cross-source point clouds, which come from different sensors, are normalized to the same scale. The details are given in Section IV-A.
Step 2, Macro/micro structure extraction: The main novelty of this stage is the reliable extraction of structure from highly variable cross-source point clouds, which is robust to most cross-source problems. The point clouds are segmented into many super voxels using their 3D geometric properties, and the statistical property of each super voxel is used because of its robustness to local variations. The super voxels together form the macro structure and the statistical property of each super voxel becomes the micro structure, as detailed in Section IV-B. These structures are integrated in the next step.
Step 3, Graph construction: The main novelty of this stage is the combination of micro and macro structures using graphs. Although there are many variations between two cross-source point clouds, the invariant structure properties are preserved in this method. The nodes are the extracted voxels and the edges are the adjacency relations. In addition, a new similarity measure is proposed which robustly describes these two graphs, as detailed in Section IV-C. After the graphs have been constructed, the registration problem is converted to a graph matching problem. An optimization method is thus needed to solve the graph matching problem.
Step 4, Optimization: The novelty of this stage is an enhanced optimization method. Factorized graph matching [3] is an optimization algorithm that optimizes graph matching at a constant time and is less prone to local optima. To better suit our problem and to pursue the global optimum, we consider geometric constraints in our optimization, as detailed in Section IV-D. The matching result then needs further refinement.
Step 5, Transformation estimation: The transformation matrix computation stage. RANSAC is first performed to remove outliers, after which ICP refines the initial matching from the graph matching, as detailed in Section IV-E.
IV-A Scale normalization
The two point clouds come from different sensors and therefore have different scales. To remove scale variation, we conduct scale normalization before the super voxel extraction step. Previously, the scale was normalized by taking manual measurements in the real world and calibrating the two point clouds accordingly; although manual measurement is accurate, it is sometimes difficult. We propose an automatic method to estimate the scale without the need for manual work. To achieve this goal, we first compute the mean distance between 3D points in each point cloud and then compute the scale by comparing these two means as follows:
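$$s = \frac{\bar{d}_1}{\bar{d}_2} \qquad (1)$$
where $\bar{d}_1$ and $\bar{d}_2$ are the mean distances between 3D points in the first and the second point cloud, respectively.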
We use this scale to transform one of the point clouds and remove the scale difference between the cross-source point clouds as far as possible. Although this does not resolve the scale problem completely, the results of this stage are sufficient for the graph matching stage since most of the scale difference is eliminated. After the scale difference has been removed, the voxels can be extracted.
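The scale estimation described above can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that are not confirmed by the text: the "mean distance" is taken over all point pairs within a cloud, and the two means are compared by taking their ratio. The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def estimate_scale(cloud_a, cloud_b):
    """Assumed comparison: ratio of the two mean distances.

    Multiplying cloud_b by the returned factor brings it to the scale of cloud_a.
    """
    return mean_pairwise_distance(cloud_a) / mean_pairwise_distance(cloud_b)


def rescale(cloud, scale):
    """Apply a uniform scale factor to every point of a cloud."""
    return [tuple(scale * c for c in p) for p in cloud]


# Toy example: cloud_a is cloud_b enlarged by a factor of two.
cloud_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
cloud_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

scale = estimate_scale(cloud_a, cloud_b)
normalized_b = rescale(cloud_b, scale)
```

Enumerating all pairs is quadratic in the number of points, so for clouds with millions of points a random subsample would be needed in practice; that choice is also an assumption of this sketch.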
IV-B Macro/micro structure extraction
Due to the large variations in cross-source point clouds, a method is needed to extract the invariant components. Figure 1 shows that even though the two cross-source point clouds have many variations, the structure can still be recognized. For these cross-source point clouds, therefore, the focus is on the structure information rather than the detailed information, since the latter is full of noise, outliers and different densities.
We are motivated by the idea of clustering, where points with the same property are grouped together. As shown in Figure 1, humans have the ability to register these monitors at first glance. This is because the macro structure information remains in the cross-source data, and when humans conduct the registration work they are not concerned with detailed information (e.g. the location of a point). However, if we want to register these two point clouds accurately, macro structure information alone is insufficient, and micro structure information is also needed. Hence, to develop an intelligent registration algorithm, we need a method that retains the common macro and micro structure information and is robust to varying densities and missing data.
To fulfill this goal, we improve a recently developed segmentation method [44] to segment the two point clouds into many super voxels and extract the directed adjacency graph of these voxels. As the segmentation method adheres to object boundaries while remaining efficient by only using the 3D geometric property, it obtains robust results for two point clouds, regardless of differences in density, angle, noise and missing data (see the third column of Figure 4). Figure 4 shows that using the centers of the segmented super voxels deals with much of the noise, density and missing data problem. Unlike [44], we do not flow back at the extraction of each edge in the adjacency graph extraction step, which means that the direction information is considered in our new adjacency graph. This is because in the following optimization step (Section IV-D), directed graph matching achieves more robust results than undirected graph matching [45]. This revision is a key element in ensuring that the extracted voxels are correctly and robustly registered. At the same time, the ESF descriptor [41] of each voxel is extracted to describe its statistical property as a local structure. Based on the definition of macro and micro structures, therefore, each segmented super voxel is a micro structure, and the adjacency graph as a whole, together with the voxel centers, is the macro structure. After these structures have been extracted, they are integrated as a graph in the graph construction stage.
IV-C Graph construction
A new graph construction method is proposed to utilize macro and micro structures to deal with the cross-source point cloud registration problem. The new graph construction method integrates these structures and casts the registration problem as a graph matching problem. We choose a graph because it is a strong tool for maintaining the properties (e.g. topology) of macro structures. At the same time, the nodes and the edges of the graph are able to maintain the properties of the micro structure.
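Before introducing the new method, the graph matching notations are introduced. A graph with $n$ nodes and $m$ directed edges is defined as $\mathcal{G} = \{P, Q, G\}$. $P$ and $Q$ are the features for the nodes and edges of the graph, which are defined as $P = [p_1, \dots, p_n] \in \mathbb{R}^{d_p \times n}$ and $Q = [q_1, \dots, q_m] \in \mathbb{R}^{d_q \times m}$ respectively. For example, $p_i$ could be a SIFT descriptor or ESF descriptor extracted from the original data around the $i$-th node and $q_c$ could be the length of the $c$-th edge. $G \in \{0,1\}^{n \times m}$ is a node-edge incidence matrix which describes the topology of the graph. We define $g_{ic} = g_{jc} = 1$ if the $c$-th edge connects the $i$-th node and the $j$-th node, and zero otherwise. To perform graph matching, given a pair of graphs, we first need to define $\mathcal{G}_1 = \{P_1, Q_1, G_1\}$ and $\mathcal{G}_2 = \{P_2, Q_2, G_2\}$. Next, we compute two affinity matrices, $K_p \in \mathbb{R}^{n_1 \times n_2}$ and $K_q \in \mathbb{R}^{m_1 \times m_2}$, to measure the similarity of each node and edge pair: $\kappa^p_{i_1 i_2}$ measures the similarity between the $i_1$-th node of $\mathcal{G}_1$ and the $i_2$-th node of $\mathcal{G}_2$, and $\kappa^q_{c_1 c_2}$ measures the similarity between the $c_1$-th edge of $\mathcal{G}_1$ and the $c_2$-th edge of $\mathcal{G}_2$. Only when these matrices are defined correctly can a graph matching method be used.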
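As an illustration of the node-edge incidence matrix described above, the following minimal Python sketch builds such a matrix from an edge list. The function name `incidence_matrix` and the toy triangle graph are assumptions made for this example only. Each column marks the two nodes joined by one edge, so edge direction is not encoded in this sketch.

```python
def incidence_matrix(num_nodes, edges):
    """Return an n x m 0/1 matrix whose column c marks both end nodes of edge c."""
    matrix = [[0] * len(edges) for _ in range(num_nodes)]
    for c, (i, j) in enumerate(edges):
        matrix[i][c] = 1
        matrix[j][c] = 1
    return matrix


def neighbors(matrix, node):
    """Nodes sharing at least one edge with `node`, read off the incidence matrix."""
    found = set()
    for c, flag in enumerate(matrix[node]):
        if flag:
            found.update(k for k in range(len(matrix)) if matrix[k][c] and k != node)
    return sorted(found)


# Toy graph: three super-voxel centers connected in a triangle.
edges = [(0, 1), (1, 2), (0, 2)]
G = incidence_matrix(3, edges)
```

Reading neighbors directly from the incidence matrix, as `neighbors` does, is one simple way to look up which nodes are connected to a given node.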
A robust structure-based graph construction method is proposed in this paper. To deal robustly with the many variations in the cross-source problem, a structure-retaining similarity measure is needed in addition to structure extraction. In other words, the graph should be robustly described despite the cross-source problem. As previously discussed, humans can still register cross-source point clouds correctly by their structure. Similar to the human registration process, the graph is constructed as an expression of the relations between structures. This is another key element in obtaining robust registration results. The micro structures are utilized as the node descriptors and the spatial relations of the centers of the micro structures are utilized as the edge descriptors. The resulting graph is robust to large variations in density and angle and to missing data in cross-source point clouds. Here, we describe how to design the nodes and edges of these graphs, and their similarity measures.
IV-C1 Graph node and similarity measurement
To robustly represent the micro structures of point clouds, the method should be resilient to large variations in density and to missing data. We segment the two point clouds into super voxels and extract the centroid point of each super voxel. The graph nodes are constituted by these centroid points. To match these nodes correctly, they need to be described discriminatively. Due to the cross-source problems discussed above (i.e. varying density, missing data and large variations in scale and rotation angles), the coordinates of these centroid points alone do not describe the nodes discriminatively, and correctly matched node pairs are very rare. To deal robustly with the cross-source problem, we select the ESF descriptor [41] instead of the conventional node coordinates, because the ESF descriptor is a global descriptor that accumulates the properties of the distances, angles and areas of the point cloud. Using the ESF descriptor, we transform the variable Euclidean space into feature space (the 640-dimensional ESF space). If two points come from corresponding segments, their ESF descriptors will be largely the same and should be matched, even though the centroid points may not match perfectly in Euclidean space.
The node similarity matrix is computed by comparing the distance between the nodes' ESF descriptors (see the left-hand side of Figure 5). Here, the node similarity is not computed in Euclidean space but in feature space. Because ESF is a statistical and global descriptor, it avoids the large local variations in Euclidean space and hence is more robust to the cross-source problem. The node similarity (Eq. 2) is computed from the normalized distance between the ESF descriptors of the two 3D points, where the normalization is applied to the raw distance between the two descriptors.
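The idea of comparing nodes in feature space rather than Euclidean space can be sketched as follows. The sketch assumes a similarity of the form one minus the max-normalized Euclidean distance between descriptors, which is only one plausible choice and is not claimed to be the exact measure used in this paper. The 3-bin toy histograms stand in for 640-bin ESF descriptors, and `node_affinity` is a hypothetical helper name.

```python
from math import dist


def node_affinity(desc_a, desc_b):
    """Affinity matrix: rows index nodes of graph A, columns index nodes of graph B.

    Assumed form: 1 - d / d_max, where d is the Euclidean distance between two
    descriptors and d_max is the largest such distance over all node pairs.
    """
    raw = [[dist(a, b) for b in desc_b] for a in desc_a]
    d_max = max(max(row) for row in raw) or 1.0  # guard against all-zero distances
    return [[1.0 - d / d_max for d in row] for row in raw]


# Toy descriptors: node 0 of A resembles node 1 of B, node 1 of A resembles node 0 of B.
desc_a = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1)]
desc_b = [(0.1, 0.7, 0.2), (0.8, 0.2, 0.0), (0.0, 0.1, 0.9)]

K_p = node_affinity(desc_a, desc_b)
# Index of the most similar node in B for every node in A.
best_match = [max(range(len(row)), key=row.__getitem__) for row in K_p]
```

Because only descriptor distances enter the computation, the result does not depend on where the centroids lie in Euclidean space, which is the property the text relies on.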
IV-C2 Graph edge and similarity measurement
To robustly and discriminatively describe the point cloud, it is necessary to build the edges accurately to reflect its macro structure. We record the adjacency relations (extracted in Section IV-B) between super voxels and use these adjacency relations as edges. The adjacency relations correctly reflect the relations of the super voxels through the boundary property. The edges need to be described discriminatively and meaningfully to ensure they are correctly matched. We reiterate that humans can register these two cross-source point clouds because their structures are almost the same. We therefore need to retain the structure property of these two graphs in describing edges. Edge direction is also an important factor for the structure of the graph, in addition to the edge length.
In this paper, we use the spatial distance and geometric properties of these edges (see the right-hand side of Figure 5). The Euclidean distance and Euler angles of two connected nodes are combined to construct a descriptor vector for each edge. The similarity of two edges is obtained by comparing their descriptors. To make the comparison more robust, the descriptors are normalized before the edge similarity matrix is computed (Eq. 3).
This is a simple means of obtaining features in 3D point clouds (the Euclidean distance and Euler angles of two points). At the same time, it describes the edges while taking the spatial relations and structures into consideration. Its ability to register cross-source point clouds will be demonstrated in the experiment section.
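A direction-aware edge descriptor of this kind can be sketched as below. The exact angle convention is not recoverable from the text, so this sketch assumes, purely for illustration, that the angles are those between the directed edge vector and the three coordinate axes. The function names and the normalization by a maximum length and by pi are likewise assumptions.

```python
from math import acos, pi, sqrt


def edge_descriptor(p, q):
    """[length, angle to x, angle to y, angle to z] of the directed edge p -> q."""
    v = [b - a for a, b in zip(p, q)]
    length = sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("edge endpoints coincide")
    # Clamp the cosine to [-1, 1] to protect acos from rounding error.
    angles = [acos(max(-1.0, min(1.0, c / length))) for c in v]
    return [length] + angles


def normalize(descriptor, max_length):
    """Scale the length by max_length and the angles by pi, so entries lie in [0, 1]."""
    return [descriptor[0] / max_length] + [a / pi for a in descriptor[1:]]


# Two edges between the same pair of nodes, traversed in opposite directions.
forward = edge_descriptor((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
backward = edge_descriptor((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```

The two toy edges have the same length but different angle entries, which illustrates why direction information helps to distinguish edges that length alone cannot separate.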
IV-D Optimization
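We propose an enhanced factorized graph matching method which considers a global geometric constraint to deal with the local minima problem in graph matching. Before introducing our method, we briefly review graph matching and FGM [3]. Suppose there is a pair of graphs, $\mathcal{G}_1$ and $\mathcal{G}_2$. The problem of graph matching consists of finding a correspondence between the nodes of $\mathcal{G}_1$ and $\mathcal{G}_2$ that maximizes the following score of global consistency:
$$J_{gm}(X) = \sum_{i_1 i_2} x_{i_1 i_2}\,\kappa^p_{i_1 i_2} + \sum_{\substack{i_1 \neq j_1,\ i_2 \neq j_2 \\ g^1_{i_1 c_1} g^1_{j_1 c_1} = 1 \\ g^2_{i_2 c_2} g^2_{j_2 c_2} = 1}} x_{i_1 i_2}\, x_{j_1 j_2}\, \kappa^q_{c_1 c_2} \qquad (4)$$
where $X$ denotes the node correspondence, for example, if the $i_1$-th node of $\mathcal{G}_1$ and the $i_2$-th node of $\mathcal{G}_2$ correspond, $x_{i_1 i_2} = 1$. $\kappa^p_{i_1 i_2}$ is the element of the node affinity matrix $K_p$ in row $i_1$ and column $i_2$, $\kappa^q_{c_1 c_2}$ is the element of the edge affinity matrix $K_q$ in row $c_1$ and column $c_2$, and $g^1$ and $g^2$ are elements of the incidence matrices of the two graphs.
It is more convenient to write $J_{gm}(X)$ in a quadratic form, $J_{gm}(X) = x^T K x$, where $x = \mathrm{vec}(X)$ is an indicator vector and $K$ is computed as follows:
$$\kappa_{i_1 i_2,\, j_1 j_2} = \begin{cases} \kappa^p_{i_1 i_2} & \text{if } i_1 = j_1 \text{ and } i_2 = j_2, \\ \kappa^q_{c_1 c_2} & \text{if } i_1 \neq j_1,\ i_2 \neq j_2 \text{ and } g^1_{i_1 c_1} g^1_{j_1 c_1} g^2_{i_2 c_2} g^2_{j_2 c_2} = 1, \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$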
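To make the consistency score concrete, the following Python sketch evaluates it for a given correspondence: the node affinities of matched node pairs are summed, and an edge affinity is added whenever both end nodes of an edge in the first graph are matched to the end nodes of an edge in the second graph. The function name, the toy affinity values and the representation of edges as ordered node pairs are assumptions made for illustration.

```python
def matching_score(X, K_p, K_q, edges_1, edges_2):
    """Node term plus edge term of the graph matching score for a 0/1 matrix X."""
    n1, n2 = len(X), len(X[0])
    node_term = sum(X[i1][i2] * K_p[i1][i2] for i1 in range(n1) for i2 in range(n2))
    edge_term = 0.0
    for c1, (i1, j1) in enumerate(edges_1):
        for c2, (i2, j2) in enumerate(edges_2):
            # Counts only when both end nodes of edge c1 map onto edge c2.
            edge_term += X[i1][i2] * X[j1][j2] * K_q[c1][c2]
    return node_term + edge_term


# Two toy graphs with three nodes and two directed edges each.
edges_1 = [(0, 1), (1, 2)]
edges_2 = [(0, 1), (1, 2)]
K_p = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
K_q = [[1.0, 0.1], [0.1, 1.0]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
swapped = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

score_identity = matching_score(identity, K_p, K_q, edges_1, edges_2)
score_swapped = matching_score(swapped, K_p, K_q, edges_1, edges_2)
```

In this toy case the structure-preserving correspondence `identity` receives a higher score than `swapped`, which breaks both edges. Searching over all correspondences is combinatorial, which is why a relaxation-based optimizer is used instead of enumeration.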
A factorized graph matching (FGM) method [3] is used to develop an initialization-free optimization scheme with no accuracy loss to address the non-convex issue. This method factorizes the matrix $K$ into many smaller matrices. Using these smaller matrices, the graph matching optimization problem can be transformed into iteratively optimizing the following nonlinear problem:
$$\max_{X}\ J_{\alpha}(X) = (1 - \alpha)\, J_{vex}(X) + \alpha\, J_{cav}(X) \qquad (6)$$
where $J_{vex}$ and $J_{cav}$ are the two relaxations (convex and concave) in FGM [3].
Enhanced factorized graph matching. Although FGM iteratively uses a different to apply the FrankWolfe (FW) algorithm to avoid local optimal, it still exists to some extent. To effectively deal with the local optima in FGM, we improve the algorithm by considering global geometry constraint and introduce a new iteration method to solve the new algorithm. The improved energy function is :
(7) 
As our registration problem involves only rigid rotation and translation, neighboring correspondence points should have similar projection errors under the correct transformation. We use this property to avoid local minima and obtain more accurate transformation relations. We design this regularization term by considering the projection difference of neighboring correspondence points. The term is defined as
(8) 
where D represents the points connected to point i, is the matched point of and is the matched point of . We can easily obtain the points in D by searching matrix G of the graph.
To optimize this nonlinear problem, we use FW [46], which iteratively updates the solution of . Given an initial , we update through optimal direction and step size . As the smooth term needs a correspondence relation, we divide the computation of the optimal direction into two steps: (1) compute an initial using and . We compute this initial by running the Hungarian algorithm to solve a linear program, as in FGM [3]. (2) Compute the final by using , and . We compute the energy of the smooth term using and obtain the final using the new energy. As the computation of involves linear programming, adding one more computation step of is not computationally costly. Similar to the FGM strategy, we also use 100 iterations to discard the inferior temporary solution and compute an alternative solution using another FW step to optimize J(X). The final transformation matrix is computed in the next stage, following optimization.
IV-E Transformation estimation
Our goal is the registration of two crosssource point clouds. As the results of the graph matching contain a small number of outliers, we cannot use these results directly to compute the transformation matrix (used to bring the two point clouds into a common coordinate system). We need to remove the outliers to obtain the final transformation matrix. We use 3D RANSAC [47] to remove the outliers, after which we use the remaining inliers to compute the transformation matrix and apply it to the point clouds. The transformation matrix may still contain small errors, so we add an ICP step to locally refine the registration after the outlier removal process. After completing these steps, the two crosssource point clouds are registered together. The pseudo code of the complete registration algorithm is shown in Algorithm 1.
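The outlier removal and transformation estimation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the efficient RANSAC of [47]: it uses a plain RANSAC loop over matched point pairs and the standard SVD-based least-squares rigid fit. The function names, the inlier threshold and the iteration count are hypothetical, and the final ICP refinement is omitted.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

def ransac_rigid(src, dst, threshold=0.05, iterations=200, seed=0):
    """Estimate a rigid transformation from correspondences that contain outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        # Three correspondences are the minimal sample for a 3D rigid motion.
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_fit(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC did not find a consistent set of correspondences")
    # Refit on all inliers of the best hypothesis.
    R, t = rigid_fit(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

Here `src[i]` and `dst[i]` are assumed to be the 3D positions of the i-th pair of matched graph nodes; the returned boolean mask marks the correspondences kept as inliers.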
V Experiments
The proposed method provides a solution to the crosssource point cloud registration problem. In this section, we conduct comparative experiments against many state-of-the-art registration methods: first, we compare the performance of the method on samesource datasets, and then conduct thorough experiments on challenging crosssource datasets.
V-A Experimental setup
For comparison purposes, we select the representative 3D registration algorithms ICP [7], GoICP [14], 4PCS [15], super4PCS [16], TPSRPM [8], GMMReg [2], CPD [1] and JRMPC [4] as our compared methods. TPSRPM and JRMPC cannot be run on large point clouds because of their memory cost, so to make a fair and reasonable comparison, we downsample the original point clouds to approximately 2000 points.
For the samesource database, we conduct a quantitative evaluation experiment with the 3D models ”Bunny”, ”Lucy” and ”Armadillo” from the Stanford 3D scanning repository^{2}^{2}2https://graphics.stanford.edu/data/3Dscanrep/3Dscanrep.html. We only consider points with positive z coordinates. For each view, following [4], the original models are rotated in the xzplane and the points with negative coordinates are rejected. In this way, only a part of the object is viewed in each set; the point sets do not fully overlap, and the extent of the overlap depends on the rotation angle, as in real scenarios.
There are three types of crosssource database:
Database A: KinectFusion and Phones’ RGB camera. We build a database with four sets of crosssource objects, which are typical examples of the different properties of crosssource point clouds. We use KinectFusion to build one source, and use VSFM to build the other source from images captured by an iPhone 6S Plus. As KinectFusion uses a physical device to capture 3D points, it can usually obtain dense and uniform point clouds on an object’s surface. VSFM, in contrast, builds 3D point clouds from 2D images. It uses keypoints to initially build highly accurate 3D points and uses CMVS [9] to build denser 3D points. These two sources are typical examples of crosssource problems, as previously discussed.
Database B: KinectFusion and KinectFusion’s RGB camera. We build the database in the following steps. Step 1: the original KinectFusion SDK (https://www.microsoft.com/enau/download/details.aspx?id=40276) is revised to output the image sequence and the camera pose of each image when capturing KinectFusion point clouds. Step 2: another point cloud is computed from these images using VSFM, which also computes a set of camera poses. As these two crosssource point clouds come from the same image sequence, the camera poses of KinectFusion and VSFM should be the same. Using this principle, a crosssource point cloud database is produced, as illustrated in Figure 6. The VSFM point cloud is backprojected into the image coordinate system and then reprojected into the KinectFusion coordinate system. To limit the effect of inaccurate camera pose computation in VSFM and KinectFusion, we consider many poses whose reprojection error is less than (=0.5), and use the camera center points of these poses and the least-squares method to compute the final rigid transformation between the two sets of camera center points. The rigid transformation matrix is built on critical prior information and can therefore be used as ground truth. These benchmark data contain 13 datasets and can be used to perform quantitative evaluation for crosssource point cloud registration.
Database C: Synthetic crosssource point clouds. We build the synthetic datasets according to the crosssource properties. Simulating the crosssource problems discussed in Section I, we build the synthetic datasets in four steps. Step 1: Different density and different viewpoints. We upsample the original point cloud by adding one point at the gravity center of each triangle of the original surface. We then remove all points whose coordinate is less than 0 in the upsampled point cloud, and obtain view 1 as S1. The coordinate system is rotated relative to the axis and the point cloud is downsampled every 3 points. We obtain view 2 by removing all the points. Step 2: Missing point cloud construction. Starting from view 2, we randomly delete ten parts in the plane to simulate a VSFM point cloud. Step 3: Rigid transformation. A random scale of 3 to 5, a random rotation matrix in the axis of to , and a random translation in the axis of 0 to 50% of the largest point-point distance are added to view 2. Step 4: Construction of noise and outliers. Noise of 40 dB is added to the original view 2 point cloud. The outliers are constructed by downsampling the original view 2 to 30% and adding a random offset (ranging from 0 to 1% of the largest point-point distance) to the coordinates of the downsampled point cloud. The noise and outliers are combined to form the final point cloud S2. The S1 and S2 point clouds simulate crosssource point clouds that exhibit the crosssource problems. Ten crosssource datasets are synthesized using Stanford 3D objects (http://graphics.stanford.edu/data/3Dscanrep/). Figure 8 shows one sample of the synthetic datasets.
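Steps 3 and 4 of the synthetic construction can be sketched as below. This is only an illustration under several assumptions that are not stated in the text: the rotation is taken about the z axis with a hypothetical angle range, the translation is applied along the x axis, the 40 dB figure is interpreted as a signal-to-noise ratio relative to the centered coordinates, and the random offset is drawn independently per coordinate. All function and parameter names are hypothetical.

```python
import numpy as np

def largest_pairwise_distance(points):
    """Largest point-point distance, computed by brute force (fine for small clouds)."""
    diff = points[:, None, :] - points[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=2)).max())

def random_similarity(points, rng, scale_range=(3.0, 5.0),
                      max_angle=np.pi, max_shift_ratio=0.5):
    """Apply a random scale, rotation (assumed about z) and translation (assumed along x)."""
    scale = rng.uniform(*scale_range)
    angle = rng.uniform(0.0, max_angle)
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    shift = rng.uniform(0.0, max_shift_ratio) * largest_pairwise_distance(points)
    t = np.array([shift, 0.0, 0.0])
    return scale * points @ R.T + t, (scale, R, t)

def add_noise_and_outliers(points, rng, snr_db=40.0, keep_ratio=0.3, offset_ratio=0.01):
    """Combine a noisy copy of the cloud with outliers built from a downsampled copy."""
    centered = points - points.mean(axis=0)
    signal_power = np.mean(centered ** 2)
    # Gaussian noise whose power is snr_db below the signal power (an assumption).
    noise_std = np.sqrt(signal_power / 10 ** (snr_db / 10))
    noisy = points + rng.normal(0.0, noise_std, size=points.shape)
    # Outliers: keep a random 30% of the points and displace each by a small offset.
    n_out = int(round(keep_ratio * len(points)))
    idx = rng.choice(len(points), size=n_out, replace=False)
    max_offset = offset_ratio * largest_pairwise_distance(points)
    outliers = points[idx] + rng.uniform(0.0, max_offset, size=(n_out, 3))
    return np.vstack([noisy, outliers])
```

Returning the sampled scale, rotation and translation alongside the transformed cloud keeps the ground truth available for the quantitative evaluation.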
We first compute the radius of the point clouds for parameter setting by , where is the centroid point of the point cloud. To retain the same density and the same crosssource point cloud structure, we set the radius of the super voxels as 1% of the point cloud radius for both the KinectFusion and SFM point clouds. For the proposed method, we first compute the transformation matrix on macro and micro structures and then use the transformation matrix to perform transformation on the original crosssource point cloud.
V-B Experiments on samesource point cloud datasets
We use the root-mean-square error (RMSE) of the rotation parameters as the registration error since translation estimation is not challenging. We select "Armadillo" and "Bunny" with and respectively (SNR = 10 dB and 20% outliers).
Table I
RMSE       JRMPC  ICP    CSGM
Armadillo  1.456  1.725  0.508
Bunny      1.789  2.022  1.792
An extensive evaluation and comparison of registration methods on samesource databases has already been conducted in the JRMPC work [4]. We therefore only run JRMPC, ICP and the proposed method (CSGM) on the samesource database. Table I shows the quantitative comparison results. Note that ICP is more affected by the presence of outliers as a result of its one-to-one correspondence and incurs a higher error. JRMPC demonstrates similar performance to the proposed method, because GMM models perform well when the overlapping areas do not have a significant amount of missing data or a scale problem. We can see from this experiment that the proposed method is robust to outliers, noise and angle variations on samesource point clouds. The visual results are shown in Figure 7.
In addition, we test the robustness of the algorithms with respect to the rotation angle between two point clouds. We register the points under different angles from to and use RMSE to test the performance. The results are shown in Figure 9, and it can be seen that the angle affects the final error. As the proposed method uses macro and micro structures to describe the point clouds, it shows robustness in dealing with outliers, noise and missing data on samesource databases. However, as with the other methods, the error increases when the rotation angle increases, because the outliers and the mismatched parts become a larger proportion of each point cloud.
V-C Qualitative evaluation on real crosssource point clouds
As discussed previously, crosssource point clouds have large variations in density, scale, angle and missing data, which makes the already difficult point cloud registration problem even more challenging. To test the ability of our method to register crosssource point clouds and compare it with other related methods, we conduct qualitative analysis experiments on four real crosssource datasets: Twobox, Chair, Threemonitor and Monitor. To make a thorough comparison, TPSRPM [8], ICP [7], CPD [1], GMMReg [2], GoICP [14], 4PCS [15], JRMPC [4] and super4PCS [16] are selected as our comparison methods. Since many of the selected methods are unable to handle the scale problem, we first normalize the scale difference for ICP, GoICP, 4PCS, super4PCS, TPSRPM, GMMReg and JRMPC using our scale normalization method. In our proposed method, scale normalization is an integrated step.
Figure 10 shows the final registration results, which indicate that the proposed method gives successful registration results, whereas the other methods fail in almost all cases. This is because many of these methods cannot handle the scale problem, the density problem or missing data. Note that TPSRPM obtains good results on Threemonitor and Monitor, but fails on Twobox and Chair. Also, TPSRPM is a nonrigid registration method. The proposed method obtains good results on crosssource datasets because it describes the micro and macro structure of point clouds, and uses the new optimization method to obtain correspondence relations.
Note that we do not iterate between enhanced graph matching and outlier detection (RANSAC). We find that when we use the outlier detection method to remove graph nodes, the graph structure in some cases becomes totally different. As an alternative, we use ICP to smoothly refine the graph matching result and obtain the final registration result.
V-D Quantitative evaluation on real and synthetic crosssource point clouds
To test the ability of the proposed method, we conduct quantitative evaluation on real and synthetic crosssource databases.
We first conduct quantitative evaluation on Database B, which contains real crosssource point clouds. In these experiments, we compare the proposed method with methods that deal with rigid registration. To the best of our knowledge, the relevant methods are ICP [7], GoICP [14], GMMReg [2], JRMPC [4], CPD [1], 4PCS [15] and super4PCS [16], and we compare against all of them on the crosssource database.
Many rigid methods are unable to handle the scale problem. To make a fair comparison, scale normalization is performed before running these methods, except for CPD, which estimates scale internally. The transformation matrix for each comparison method is then computed and these matrices are used for quantitative evaluation. In this experiment, the matrices all transform VSFM point clouds to KinectFusion point clouds. The VSFM point clouds are first transformed using both the newly computed and the ground truth transformation matrices. The point clouds transformed by the computed matrices are then compared with those transformed by the ground truth matrices. As in [4], we compare the Frobenius norm (Fnorm) between the newly computed matrices and the ground truth transformation matrices. To obtain a better visual representation of the comparison results, we use as the final performance value. The smaller the value, the better the performance of the algorithm. We also compute the mean of the Fnorm over all 13 datasets for each method, and the results are shown in Figure 12.
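The Fnorm comparison can be illustrated with a few lines of code. This is a minimal sketch assuming both transformations are given as 4x4 homogeneous matrices; the function name `fnorm_error` is hypothetical, and the additional rescaling applied for visualization is not reproduced here.

```python
import numpy as np

def fnorm_error(T_est, T_gt):
    """Frobenius norm of the difference between two 4x4 transformation matrices."""
    return float(np.linalg.norm(T_est - T_gt, ord="fro"))
```

For example, an estimate that differs from the ground truth only by a translation of (3, 4, 0) has an Fnorm error of 5, and a perfect estimate has an error of 0.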
Figure 12 shows the quantitative evaluation results. 4PCS and Super4PCS obtain the worst results, followed by ICP, because the point-point level strategy performs poorly on crosssource problems. GMMReg, JRMPC and CPD are more robust and more accurate than the other comparison methods; to some extent, this demonstrates the advantage of using statistical properties. The proposed CSGM method obtains the highest accuracy on all datasets. This is because we use the macro structure to globally register two point clouds with little attention to the detail, and use the micro structure to accurately register the two point clouds. We also use RANSAC and ICP to further improve the accuracy and robustness.
Figure 11 shows several sample visual results of these methods. The results show that the proposed CSGM clearly achieves better results than the other methods. GoICP and JRMPC obtain similar results to the proposed CSGM on the dataset in the fourth row. Because of the BnB strategy in GoICP and the generative strategy in JRMPC, good results are obtained if the scale is normalized well and no large portion of the data is missing. If these conditions are not met, these methods fail completely, as shown for example in the first two rows of Figure 11. However, the proposed CSGM achieves robust and accurate registration results on all crosssource datasets.
The proposed method is also evaluated on Database C, which consists of synthetic crosssource point clouds. The transformation from the view 2 point cloud to the view 1 point cloud is estimated by the compared methods and the proposed method. The computed and ground truth transformation matrices are then used to transform the synthetic point cloud. The RMSE is computed according to the statistical distance between these two transformed point clouds. We also compare the Fnorm of the difference between the transformation matrices.
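One plausible reading of this RMSE is sketched below, assuming that the same point cloud is transformed once with the computed matrix and once with the ground truth matrix, and that the error is the root mean square of the per-point Euclidean distances between the two results. The function names are hypothetical and the exact distance used in the experiments may differ.

```python
import numpy as np

def apply_transform(points, T):
    """Apply a 4x4 homogeneous transformation to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

def rmse_between_transforms(points, T_est, T_gt):
    """RMSE of the distances between the cloud transformed by T_est and by T_gt."""
    diff = apply_transform(points, T_est) - apply_transform(points, T_gt)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

Because the point correspondence is known by construction, no nearest-neighbor search is needed in this sketch.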
Figure 14 shows the mean RMSE, and Figure 15 shows the mean Fnorm between the computed transformation matrix and the ground truth transformation matrix, over all ten sets of Database C. The results show that our method achieves accurate registration results which are better than those of the other methods. Figure 13 illustrates the visual results of the synthetic evaluation. The results show that the proposed CSGM obtains robust and visually correct registration results which are clearly better than those of the compared methods. Some of the comparison methods even fail, because the crosssource problems pose a great challenge to them.
VI Conclusion
In this paper, we proposed a new registration pipeline to deal with the crosssource point cloud registration problem using four novel components. Firstly, a scale normalization method was proposed to eliminate the scale problem. Secondly, a micro and macro structure concept was proposed to describe the point clouds, and a new graph construction method was used to combine these structures. Thirdly, an optimization method was proposed to solve the problem. Lastly, a registration pipeline was proposed which combines the initial correspondence from graph matching and refinement using RANSAC and ICP.
Acknowledgment
The authors would like to thank the Nokia Corporation for their help and acknowledge the useful discussions with colleagues in GBDTC. This work is partially supported by Nokia research funding (MM12030846235).
References
 [1] A. Myronenko and X. Song, “Point set registration: Coherent point drift,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 12, pp. 2262–2275, 2010.
 [2] B. Jian and B. C. Vemuri, “Robust point set registration using gaussian mixture models,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 8, pp. 1633–1645, 2011.
 [3] F. Zhou and F. De la Torre, “Factorized graph matching,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 127–134.
 [4] G. D. Evangelidis, D. KounadesBastian, R. Horaud, and E. Z. Psarakis, “A generative model for the joint registration of multiple point sets,” in Computer Vision–ECCV 2014. Springer, 2014, pp. 109–122.
 [5] J. Ma, J. Zhao, J. Tian, Z. Tu, and A. L. Yuille, “Robust estimation of nonrigid transformation for point set registration,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2147–2154.
 [6] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 4, pp. 509–522, 2002.
 [7] P. J. Besl and N. D. McKay, “A method for registration of 3d shapes,” IEEE Transactions on pattern analysis and machine intelligence, vol. 14, no. 2, pp. 239–256, 1992.
 [8] H. Chui and A. Rangarajan, “A new algorithm for nonrigid point matching,” in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 2. IEEE, 2000, pp. 44–51.
 [9] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, “Towards internetscale multiview stereo,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, June 2010, pp. 1434–1441.
 [10] D. F. Huber and M. Hebert, “Fully automatic registration of multiple 3d data sets,” Image and Vision Computing, vol. 21, no. 7, pp. 637–650, 2003.
 [11] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, “Kinectfusion: Realtime dense surface mapping and tracking,” in Mixed and augmented reality (ISMAR), 2011 10th IEEE international symposium on. IEEE, 2011, pp. 127–136.
 [12] A. Nüchter, K. Lingemann, J. Hertzberg, and H. Surmann, “6d slam—3d mapping outdoor environments,” Journal of Field Robotics, vol. 24, no. 8-9, pp. 699–722, 2007.
 [13] A. Torsello, E. Rodola, and A. Albarelli, “Multiview registration via graph diffusion of dual quaternions,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 2441–2448.
 [14] J. Yang, H. Li, and Y. Jia, “Goicp: Solving 3d registration efficiently and globally optimally,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1457–1464.
 [15] D. Aiger, N. J. Mitra, and D. CohenOr, “4points congruent sets for robust pairwise surface registration,” ACM Transactions on Graphics (TOG), vol. 27, no. 3, p. 85, 2008.
 [16] N. Mellado, D. Aiger, and N. J. Mitra, “Super 4pcs fast global pointcloud registration via smart indexing,” in Computer Graphics Forum, vol. 33, no. 5. Wiley Online Library, 2014, pp. 205–215.
 [17] F. Wang, B. C. Vemuri, A. Rangarajan, and S. J. Eisenschenk, “Simultaneous nonrigid registration of multiple point sets and atlas construction,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 2011–2022, 2008.
 [18] Y. Deng, A. Rangarajan, S. Eisenschenk, and B. C. Vemuri, “A riemannian framework for matching point clouds represented by the schrodinger distance transform,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3756–3761.
 [19] M. Torki and A. M. Elgammal, “Putting local features on a manifold.” in CVPR, vol. 2, 2010, p. 4.
 [20] I. Cleju and D. Saupe, “Stochastic optimization of multiple texture registration using mutual information,” in Joint Pattern Recognition Symposium. Springer, 2007, pp. 517–526.
 [21] F. Peng, Q. Wu, L. Fan, J. Zhang, Y. You, J. Lu, and J.Y. Yang, “Street view crosssourced point cloud matching and registration,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 2026–2030.
 [22] N. Gelfand and L. J. Guibas, “Shape segmentation using local slippage analysis,” in Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing. ACM, 2004, pp. 214–223.
 [23] X. Huang, L. Fan, J. Zhang, Q. Wu, and C. Yuan, “Real time complete dense depth reconstruction for a monocular camera,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 32–37.
 [24] P. Musialski, P. Wonka, D. G. Aliaga, M. Wimmer, L. Gool, and W. Purgathofer, “A survey of urban reconstruction,” in Computer graphics forum, vol. 32, no. 6. Wiley Online Library, 2013, pp. 146–177.
 [25] A. M. Manferdini, “A methodology for the promotion of cultural heritage sites through the use of lowcost technologies and procedures,” in Proceedings of the 17th International Conference on 3D Web Technology. ACM, 2012, pp. 180–180.
 [26] D. Campbell and L. Petersson, “Gogma: Globallyoptimal gaussian mixture alignment,” arXiv preprint arXiv:1603.00150, 2016.
 [27] J. Ho, A. Peter, A. Rangarajan, and M.H. Yang, “An algebraic approach to affine registration of point sets,” in 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009, pp. 1335–1340.
 [28] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
 [29] C.S. Chen, Y.P. Hung, and J.B. Cheng, “Ransacbased darces: A new approach to fast automatic registration of partially overlapping range images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 11, pp. 1229–1234, 1999.
 [30] Y. Diez, J. Martí, and J. Salvi, “Hierarchical normal space sampling to speed up point cloud coarse matching,” Pattern Recognition Letters, vol. 33, no. 16, pp. 2127–2133, 2012.
 [31] Z.Q. Cheng, Y. Chen, R. R. Martin, Y.K. Lai, and A. Wang, “Supermatching: Feature matching using supersymmetric geometric constraints,” IEEE Transactions on Visualization and Computer graphics, vol. 19, no. 11, pp. 1885–1894, 2013.
 [32] C. Papazov and D. Burschka, “Stochastic global optimization for robust point set registration,” Computer Vision and Image Understanding, vol. 115, no. 12, pp. 1598–1609, 2011.
 [33] N. Gelfand, N. J. Mitra, L. J. Guibas, and H. Pottmann, “Robust global registration,” in Symposium on geometry processing, vol. 2, no. 3, 2005, p. 5.
 [34] A. Albarelli, E. Rodola, and A. Torsello, “Loosely distinctive features for robust surface alignment,” in European Conference on Computer Vision. Springer, 2010, pp. 519–532.
 [35] E. Rodolà, A. Albarelli, F. Bergamasco, and A. Torsello, “A scale independent selection process for 3d object recognition in cluttered scenes,” International journal of computer vision, vol. 102, no. 1-3, pp. 129–145, 2013.
 [36] M. Corsini, M. Dellepiane, F. Ganovelli, R. Gherardi, A. Fusiello, and R. Scopigno, “Fully automatic registration of image sets on approximate geometry,” International journal of computer vision, vol. 102, no. 1-3, pp. 91–111, 2013.
 [37] T. K. Sinha, D. M. Cash, R. J. Weil, R. L. Galloway, and M. I. Miga, “Cortical surface registration using texture mapped point clouds and mutual information,” in International Conference on Medical Image Computing and ComputerAssisted Intervention. Springer, 2002, pp. 533–540.
 [38] G. Pandey, J. R. McBride, S. Savarese, and R. M. Eustice, “Toward mutual information based automatic registration of 3d point clouds,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 2698–2704.
 [39] A. Moussa and N. Elsheimy, “Automatic registration of approximately leveled point clouds of urban scenes.” ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 145–150, 2015.
 [40] R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (fpfh) for 3d registration,” in Robotics and Automation, 2009. ICRA’09. IEEE International Conference on. IEEE, 2009, pp. 3212–3217.
 [41] W. Wohlkinger and M. Vincze, “Ensemble of shape functions for 3d object classification,” in Robotics and Biomimetics (ROBIO), 2011 IEEE International Conference on. IEEE, 2011, pp. 2987–2992.
 [42] A. E. Johnson, “Spinimages: a representation for 3d surface matching,” Ph.D. dissertation, Citeseer, 1997.
 [43] F. Tombari, S. Salti, and L. Di Stefano, “Unique signatures of histograms for local surface description,” in European conference on computer vision. Springer, 2010, pp. 356–369.
 [44] J. Papon, A. Abramov, M. Schoeler, and F. Wörgötter, “Voxel cloud connectivity segmentation  supervoxels for point clouds,” in Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, Portland, Oregon, June 22-27 2013.
 [45] F. Zhou and F. De la Torre, “Deformable graph matching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2922–2929.
 [46] M. Zaslavskiy, F. Bach, and J.P. Vert, “A path following algorithm for the graph matching problem,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, no. 12, pp. 2227–2242, 2009.
 [47] C. Papazov and D. Burschka, “An efficient ransac for 3d object recognition in noisy and occluded scenes,” in Computer Vision–ACCV 2010. Springer, 2010, pp. 135–148.