1 Introduction and Related Work
Registration, i.e. determining a spatial transformation that aligns two images or point sets, is a fundamental task in medical image and shape analysis and a prerequisite for numerous clinical applications. It is widely used for image-guided intervention, motion compensation in radiation therapy, atlas-based segmentation or monitoring of disease progression. Non-rigid registration is an ill-posed, non-convex optimization problem with a very high number of degrees of freedom. In addition, the medical domain poses particular challenges for the registration task, e.g. non-linear intensity differences in multi-modal images or high inter-patient variations in anatomical shape and appearance.
Iconic registration: Voxel-based, intensity-driven medical image registration has been an active area of research; it can, for example, be solved using discrete optimization [7] of a similarity metric and a regularization constraint on the smoothness of the deformation field. Data-driven deep learning methods based on convolutional neural networks (CNNs) have only recently been used in the field of medical image registration. In [14], an iconic and unsupervised learning approach is introduced that learns features to drive a registration and replaces the iterative optimization with a feed-forward CNN. While it achieves impressive runtimes of under a second on a GPU, its accuracy for CT lung motion estimation is inferior to conventional methods. Weak supervision in the form of landmarks or multi-label segmentations was used in the CNN framework of [9], where the similarity measure is based on the alignment of the registered labels.
Geometric registration: To capture large deformations, e.g. present in intra-patient inhale-exhale examinations of COPD patients [5] or vessel-guided brain shift compensation [1], geometric registration models based on keypoints or surfaces offer a promising solution. Point-based registration has not yet profited from the advantages of deep feature learning due to the restriction of conventional CNNs to densely gridded input. Many current geometric methods (e.g. [1] and [12]) are based on the well-established coherent point drift (CPD) algorithm [10]. In addition to 3D coordinates, they incorporate further image- or segmentation-derived features, such as point orientations or scalar fractional anisotropy (FA) values [12].
Deep geometric learning: While these hand-crafted features clearly improved on the results of the CPD, recent methods from the field of geometric deep learning [4] would enable a data-driven feature extraction directly from point sets. The PointNet framework [11] was one of the first approaches to apply deep learning methods to unordered point sets. A limitation of the approach is that it does not consider local neighborhood information, which was addressed in [15] by dynamically building a k-nearest-neighbour graph on the point set and thus also enabling feature propagation along edges in that graph. Combining convolutional feature learning with a differentiable and robustly regularized fitting process was first proposed for multi-camera scene reconstruction in [3] (DSAC), but has so far been limited to rigid alignment.
Large deformation lung registration: Both iconic and geometric approaches have often been found to yield relatively large residual errors for large-motion lung registration (forced inhale-to-exhale): e.g. 4.68 mm for the discrete optimization algorithm in [7] applied to the DIR-Lab COPD data [5] and 3.61 mm (on the inhale-exhale pairs of the EMPIRE10 challenge) for [6], which used both keypoint- and intensity-based information. Learning the alignment of such difficult data has so far appeared to be out of reach for intensity-driven CNN approaches, which already struggle with shallower breathing in 4D-CT [14]. Thus, being able to directly match vessel and airway trees based on geometric features alone can provide a valuable pre-alignment for further intensity-based registration (cf. [8]) or be directly used in clinical applications to perform atlas-based labelling of anatomical segments and branch points for physiological studies [13].
1.1 Contributions
Our work contributes two important steps towards data-driven point set registration that enable the incorporation of deep feature learning into a regularized CPD fitting algorithm. First, we utilize dynamic graph CNNs [15] in an auxiliary metric learning task to establish robust correspondences between a moving and a fixed point set. These learned features are shown to yield an improved modeling of prior probabilities in the CPD algorithm. Second, since all operations of the CPD algorithm are differentiable, we show that it is possible to further optimize the parameters of the feature extraction network directly on the registration task. To evaluate our method, we register keypoints extracted from inhale and exhale states in lung CT scans from the challenging DIR-Lab COPD dataset [5], showing the general feasibility of an end-to-end deep learning point set registration framework that uses only geometric information.
2 Methods
In this section, we introduce our proposed method for deformable point set registration with deeply learned features. Figure 1 summarizes the method's general idea. The inputs to our method are the fixed point set and the moving point set. While we make no assumptions on the number of points or correspondences in the input point sets, we assume a further set of corresponding keypoints in the fixed and moving data for the supervised learning task. We compute geometric features from both point sets with a shared dynamic graph CNN (DGCNN [15]). The spatial positions together with the extracted descriptors are input to the feature-based CPD algorithm, which produces displacement vectors for all points in the moving point set. We then employ thin-plate splines (TPS) [2] as a scattered data interpolation method to compute the displacements for the moving keypoint correspondences, which yields their transformed positions. Finally, we compute the mean squared error (MSE) of the Euclidean distance between the transformed and the fixed keypoint correspondences as a loss for the optimization of the feature extraction network. In the following, we describe the descriptor learning with the DGCNN as well as the extensions to the CPD algorithm to exploit point features as prior probabilities.
2.1 Descriptor Learning on Point Sets with Dynamic Graph CNNs
Our proposed network architecture for geometric feature extraction is illustrated in Figure 2. A key component is the edge convolution introduced in [15], which dynamically builds a k-nearest-neighbor (kNN) graph from the points in the input feature space and then aggregates information from neighbouring points to output a final feature map. We employ several edge convolutions with DenseNet-style feature concatenation to efficiently capture both local and global geometry. The final feature descriptor is obtained by fully connected layers that reduce the point information to a given dimensionality. We restrict the output descriptor space by normalization to enable constant parametrization of subsequent operations in the registration pipeline, which stabilizes network training. To establish robust initial correspondences between the moving and fixed point sets, the model is pre-trained in an auxiliary metric learning task using a triplet loss.
2.2 Feature-based Coherent Point Drift
The CPD algorithm formulates the alignment of two point sets as a probability density estimation problem. The points in the moving point set are described as centroids of Gaussian mixture models (GMMs) and are fitted to the points in the fixed point set by maximizing the likelihood. To find the displacements for the moving points, the Expectation Maximization (EM) algorithm is used, where point correspondence probabilities are computed in the E-step and the displacement vectors are updated in the M-step. We incorporate the learned geometric feature descriptors of both point sets as additional prior probabilities according to Eq. (1), which involves the spatial point correspondence described in [10], a trade-off and scaling parameter, and a term given by Eq. (2) whose indices run over the points of the two point sets. In addition to the parameter in Eq. (2) that controls the width of the Gaussian, the CPD algorithm includes three more free parameters (w, β and λ in the notation of [10]). Parameter w models the amount of noise and outliers in the point sets, while parameters β and λ control the smoothness of the deformation field.
3 Experiments
Registering fully inflated to exhaled lungs is considered one of the most demanding tasks in medical image registration and is important for analyzing e.g. local ventilation defects in COPD patients. We use the DIR-Lab COPD data set [5] with 10 inhale-exhale pairs of 3D CT scans for all our experiments. The thorax volumes are resampled to isotropic voxel sizes of mm and a few thousand keypoints are extracted from inner lung structures with the Foerstner operator. Automatic correspondences to supervise the learning of our DGCNN are established using the discrete and intensity-based registration algorithm of [8], which has an accuracy of approximately 1 mm. In all experiments, no CT-based intensity information is used and all processing relies entirely on the geometric keypoint locations.
In our first experiment, we learn point descriptors directly in a supervised metric learning task. To this end, a triplet loss is employed that enforces feature similarity between corresponding keypoint regions in point set pairs. The inhale and exhale point sets form the positive pair, while points from the permuted exhale point set serve as negative examples. These learned features can be directly used in a kNN registration. We then investigate the combination of spatial positions and learned descriptors in the feature-based CPD algorithm. Finally, in our concluding experiment, the feature network is trained in an end-to-end manner as described in Section 2 to further optimize the pre-trained geometric features.
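The triplet setup described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the experiments: the hinge form of the loss, the squared Euclidean distance, the margin of 0.2 and the names `sq_dist` and `triplet_loss` are all hypothetical, and descriptors are plain Python lists rather than network outputs.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(inhale_desc, exhale_desc, margin=0.2, seed=0):
    """Mean hinge triplet loss over corresponding keypoints.

    inhale_desc[i] is the anchor and exhale_desc[i] its positive.
    Negatives come from a random permutation of the exhale descriptors.
    The margin value is an assumption made for this sketch.
    """
    n = len(inhale_desc)
    if n < 2 or n != len(exhale_desc):
        raise ValueError("need at least two corresponding descriptor pairs")
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # permuted exhale set supplies negatives
    total = 0.0
    for i, anchor in enumerate(inhale_desc):
        # Never use the positive itself as the negative example.
        j = perm[i] if perm[i] != i else (i + 1) % n
        d_pos = sq_dist(anchor, exhale_desc[i])
        d_neg = sq_dist(anchor, exhale_desc[j])
        total += max(0.0, d_pos - d_neg + margin)
    return total / n
```

The loss vanishes once every anchor is closer to its corresponding descriptor than to the permuted one by at least the margin, which is the property that makes the learned descriptors usable for a nearest-neighbour search.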
Implementation details: Due to the limited number of instances in the dataset, we perform a leave-one-out validation, where we evaluate on one pair of inhale and exhale point sets and train our network with the remaining nine pairs. During training, we use farthest point sampling to obtain points from the inhale and exhale point set, respectively. Each evaluation is run ten times and results are averaged to account for the effect of the sampling step. The employed network parameters are specified in Figure 2. For the CPD algorithm ( iterations) we use the following parameters: , , and . For the end-to-end training we relax parameters and to and , respectively, to allow for further optimization of input features.
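Farthest point sampling, mentioned above as the subsampling step, can be illustrated with the following greedy sketch. It is a generic version of the technique and not the code used in the experiments; the function name, the `start` argument and the use of plain tuples are assumptions. A random `start` index would be one possible source of the run-to-run variation that the repeated evaluations average out.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen greedily to be mutually far apart.

    points: sequence of coordinate tuples, k: number of samples,
    start: index of the first selected point (hypothetical parameter).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # Distance of every point to its nearest already selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point that is farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected
```

For example, `farthest_point_sampling([(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0, 0)], 3)` first adds the distant point at index 2 and then index 1, so that the sample covers the extent of the point set rather than its densest region.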
4 Results and Discussion
Table 1: Target registration error (TRE) in mm for each case and method.
Case #  initial  center-aligned  triplet + kNN@20  CPD [10]  triplet + CPD (ours)  end-to-end (ours)
1       26.3     17.8            8.1               5.5       4.2                   3.4
2       21.8     14.7            15.6              8.4       9.3                   8.9
3       12.6     10.6            6.4               2.7       2.5                   2.4
4       29.6     19.0            8.3               4.8       3.4                   3.2
5       30.1     18.4            7.8               8.4       5.2                   4.6
6       28.5     16.2            7.5               14.0      5.1                   4.3
7       21.6     10.2            6.3               3.0       2.6                   2.5
8       26.5     17.4            6.3               6.8       4.3                   3.9
9       14.9     14.1            9.0               3.5       3.1                   3.6
10      21.8     19.6            14.9              7.4       7.5                   7.4
mean    23.4     15.7            9.0               6.4       4.7                   4.3
std     11.9     7.0             5.5               5.2       4.1                   3.6
val   
Qualitative results are shown in Figure 3, where our approach demonstrates a good trade-off between the very smooth motion of the CPD and the potential for large correspondences of the features from triplet learning. Our quantitative results, evaluated on 300 independent expert landmark pairs for each patient, demonstrate that registering the point clouds directly with CPD (3D coordinates as input) yields a relatively large target registration error (TRE) of 6.4 ± 5.2 mm (see Table 1). A kNN registration without regularization, based on geometric features extracted by a DGCNN trained with keypoint correspondences, is still inferior with a TRE of 9.0 ± 5.5 mm, highlighting the challenges of this point-based registration task and the difficulties of addressing the deformable alignment with one-to-one correspondence search. Combining the geometric features of a pre-trained DGCNN with the regularizing CPD that is extended to use 19-dimensional inputs (16 features + 3 coordinates) yields a substantial improvement over each individual method with a TRE of 4.7 ± 4.1 mm. Finally, using end-to-end learning to backpropagate the regularized alignment errors through the iterative point drift layers to further improve the feature learning shows another small but significant improvement to 4.3 ± 3.6 mm. These alignment errors cannot be directly compared to those of the large variety of image- and feature-based registration algorithms that reached 3.6 mm [6], 4.7 mm [7] or 1.1 mm [8] on similar datasets, because these were based on intensity information, while our comparison is restricted to purely geometric approaches without intensity. In addition, a better outcome would be expected by extending the keypoint extraction to focus on vessel- or airway-based nodes and to include anatomical tree-based edges in the graph model. Nevertheless, the results clearly show that our models are already able to directly learn semantic geometric features in a data-driven manner based on the inherent correspondence information.
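The TRE values reported above are means and standard deviations of Euclidean landmark distances. A minimal sketch of that computation follows, assuming the landmarks have already been transformed by the estimated deformation. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def target_registration_error(warped, reference):
    """Mean and standard deviation of landmark distances, in input units.

    warped: transformed landmark coordinates, reference: their expert-annotated
    counterparts. Both are equally long sequences of coordinate tuples.
    """
    if len(warped) != len(reference) or not warped:
        raise ValueError("landmark lists must be non-empty and equally long")
    dists = [math.dist(p, q) for p, q in zip(warped, reference)]
    # Population standard deviation (assumed estimator).
    return statistics.mean(dists), statistics.pstdev(dists)
```

With coordinates given in mm, the two returned numbers correspond to entries such as "4.3 ± 3.6 mm" when computed over all landmark pairs.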
5 Conclusion
We have presented a new method for deformable point set registration that learns geometric features from irregular point sets using a dynamic graph CNN (DGCNN) together with a regularizing and fully differentiable high-dimensional coherent point drift (CPD) model. Our results clearly indicate that geometric feature learning, even from relatively uninformative point clouds, is possible with DGCNNs and can be further enhanced when incorporating the CPD model into the optimization. Evaluated on challenging inhale-exhale lung registration of COPD patients, we achieve an improvement of 2.1 mm over the classical CPD method and are competitive with many classical image-based registration algorithms despite the fact that no intensity information is used. In addition to these encouraging findings, we believe that alternative regularization models to the CPD that require fewer iteration steps have the potential to further improve this approach. In future work, many more applications, e.g. surface point shape alignment and analysis, could benefit from deep point registration.
References
[1] Bayer, S., Ravikumar, N., Strumia, M., Tong, X., Gao, Y., Ostermeier, M., Fahrig, R., Maier, A.: Intraoperative brain shift compensation using a hybrid mixture model. In: MICCAI. pp. 116–124 (2018)
[2] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. TPAMI 11(6), 567–585 (1989)
[3] Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S., Rother, C.: DSAC - differentiable RANSAC for camera localization. In: CVPR. pp. 6684–6692 (2017)
[4] Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34(4), 18–42 (2017)
[5] Castillo, R., Castillo, E., Fuentes, D., Ahmad, M., Wood, A.M., Ludwig, M.S., Guerrero, T.: A reference dataset for deformable image registration spatial accuracy evaluation using the COPDgene study archive. Physics in Medicine & Biology 58(9), 2861 (2013)
[6] Ehrhardt, J., Werner, R., Schmidt-Richberg, A., Handels, H.: Automatic landmark detection and non-linear landmark- and surface-based registration of lung CT images. Medical Image Analysis for the Clinic - A Grand Challenge, MICCAI 2010, 165–174 (2010)
[7] Glocker, B., Komodakis, N., Tziritas, G., Navab, N., Paragios, N.: Dense image registration through MRFs and efficient linear programming. Medical Image Analysis 12(6), 731–741 (2008)
[8] Heinrich, M.P., Handels, H., Simpson, I.J.: Estimating large lung motion in COPD patients by symmetric regularised correspondence fields. In: MICCAI. pp. 338–345 (2015)
[9] Hu, Y., Modat, M., Gibson, E., Li, W., Ghavami, N., Bonmati, E., Wang, G., Bandula, S., Moore, C.M., Emberton, M., et al.: Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis 49, 1–13 (2018)
[10] Myronenko, A., Song, X.: Point set registration: Coherent point drift. TPAMI 32(12), 2262–2275 (2010)
[11] Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proceedings of the Conference on Computer Vision and Pattern Recognition. pp. 652–660 (2017)
[12] Ravikumar, N., Gooya, A., Beltrachini, L., Frangi, A.F., Taylor, Z.A.: Generalised coherent point drift for group-wise multi-dimensional analysis of diffusion brain MRI data. Medical Image Analysis 53, 47–63 (2019)
[13] Tschirren, J., McLennan, G., Palágyi, K., Hoffman, E.A., Sonka, M.: Matching and anatomical labeling of human airway tree. TMI 24(12), 1540–1547 (2005)
[14] de Vos, B.D., Berendsen, F.F., Viergever, M.A., Sokooti, H., Staring, M., Išgum, I.: A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, 128–143 (2019)
[15] Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829 (2018)