1 Introduction and Related Work
Registration, i.e. determining a spatial transformation that aligns two images or point sets, is a fundamental task in medical image and shape analysis and a prerequisite for numerous clinical applications. It is widely used for image-guided intervention, motion compensation in radiation therapy, atlas-based segmentation and monitoring of disease progression. Non-rigid registration is ill-posed and thus a non-convex optimization problem with a very high number of degrees of freedom. In addition, the medical domain poses particular challenges for the registration task, e.g. non-linear intensity differences in multi-modal images or high inter-patient variations in anatomical shape and appearance.
Iconic registration: Voxel-based, intensity-driven medical image registration has been an active area of research and can, e.g., be solved using discrete optimization of a similarity metric and a regularization constraint on the smoothness of the deformation field. Data-driven deep learning methods based on convolutional neural networks (CNNs) have only recently been used in the field of medical image registration. One iconic and unsupervised learning approach learns features to drive a registration and replaces the iterative optimization with a feed-forward CNN. While achieving impressive runtimes of under a second on a GPU, the accuracy for CT lung motion estimation is inferior to conventional methods. Weak supervision in the form of landmarks or multi-label segmentations has also been used in a CNN framework, where the similarity measure is based on the alignment of the registered labels.
Geometric registration: As an alternative, geometric registration models based on keypoints or surfaces offer a promising solution. Point-based registration has not yet profited from the advantages of deep feature learning due to the restriction of conventional CNNs to densely gridded input. Many current geometric methods are based on the well-established coherent point drift (CPD) algorithm. In addition to 3D coordinates, they incorporate further image- or segmentation-derived features, such as point orientations or scalar fractional anisotropy (FA) values.
Deep geometric learning: While these hand-crafted features clearly improved on the results of the CPD, recent methods from the field of geometric deep learning would enable a data-driven feature extraction directly from point sets. The PointNet framework was one of the first approaches to apply deep learning methods to unordered point sets. A limitation of the approach is that it does not consider local neighborhood information, which was later addressed by dynamically building a k-nearest-neighbour graph on the point set and thus also enabling feature propagation along edges in that graph. Combining convolutional feature learning with a differentiable and robustly regularized fitting process was first proposed for multi-camera scene reconstruction (DSAC), but has so far been limited to rigid alignment.
Large deformation lung registration: Both iconic and geometric approaches have often been found to yield relatively large residual errors for large motion lung registration (forced inhale-to-exhale): e.g. 4.68 mm for a discrete optimization algorithm applied to the DIR-Lab COPD data and 3.61 mm (on the inhale-exhale pairs of the EMPIRE10 challenge) for a method that used both keypoint- and intensity-based information. Learning the alignment of such difficult data appears to be impossible so far with intensity-driven CNN approaches, which already struggle with more shallow breathing in 4D-CT. Thus, being able to directly match vessel and airway trees based on geometric features alone can provide a valuable pre-alignment for further intensity-based registration or be directly used in clinical applications to perform atlas-based labelling of anatomical segments and branchpoints for physiological studies.
Our work contributes two important steps towards data-driven point set registration that enable the incorporation of deep feature learning into a regularized CPD fitting algorithm. First, we utilize dynamic graph CNNs in an auxiliary metric learning task to establish robust correspondences between a moving and a fixed point set. These learned features are shown to yield an improved modeling of prior probabilities in the CPD algorithm. Second, since all operations of the CPD algorithm are differentiable, we show that it is possible to further optimize the parameters of the feature extraction network directly on the registration task. To evaluate our method, we register keypoints extracted from inhale and exhale states in lung CT scans from the challenging DIR-Lab COPD dataset, showing the general feasibility of a deep learning point set registration framework in an end-to-end manner and with only geometric information.
2 Methods
In this section, we introduce our proposed method for deformable point set registration with deeply learned features. Figure 1 summarizes the method's general idea. Input to our method are a fixed point set and a moving point set. While we make no assumptions on the number of points or correspondences in the input point sets, we assume a further set of keypoint correspondences between the two point sets for the supervised learning task. We compute geometric features from both point sets with a shared dynamic graph CNN (DGCNN). The spatial positions together with the extracted descriptors are input to the feature-based CPD algorithm, which produces displacement vectors for all points in the moving point set. We then employ thin-plate splines (TPS) as a scattered data interpolation method to compute the displacements for the moving keypoints of the correspondence set, which yields their transformed positions. Finally, we compute the mean squared error (MSE) of the Euclidean distance between the transformed keypoints and their fixed counterparts as a loss for the optimization of the feature extraction network. In the following, we describe the descriptor learning with the DGCNN as well as the extensions to the CPD algorithm to exploit point features as prior probabilities.
2.1 Descriptor Learning on Point Sets with Dynamic Graph CNNs
We extract descriptors with a DGCNN that dynamically builds a k-nearest-neighbor (kNN) graph from the points in the input feature space and then aggregates information from neighbouring points to output a final feature map. We employ several edge convolutions with DenseNet-style feature concatenation to efficiently capture both local and global geometry. The final feature descriptor is obtained by fully connected layers that reduce the point information to a given dimensionality. We restrict the output descriptor space by normalization to enable a constant parametrization of subsequent operations in the registration pipeline, which stabilizes network training. To establish robust initial correspondences between the moving and fixed point set, the model is pretrained in an auxiliary metric learning task using a triplet loss.
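The graph construction described above can be illustrated with a small, dependency-free sketch. It only shows how a kNN graph is built in the current feature space and how edge features could be formed and aggregated; the learned layers are omitted. The function names, the concatenation of a point's feature with the neighbour difference, and the max aggregation follow the general DGCNN idea and are assumptions here, not the exact network of this work.

```python
import math


def knn_indices(features, k):
    """For each point, indices of its k nearest neighbours (excluding itself),
    measured by Euclidean distance in the given feature space."""
    neighbours = []
    for i, fi in enumerate(features):
        dists = [(math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def edge_features(features, k):
    """EdgeConv-style input: for each edge (i, j), concatenate f_i with (f_j - f_i)."""
    nbrs = knn_indices(features, k)
    return [
        [list(fi) + [b - a for a, b in zip(fi, features[j])] for j in nbrs[i]]
        for i, fi in enumerate(features)
    ]


def max_aggregate(edge_feats):
    """Channel-wise maximum over the edges of each point.
    A real layer would apply a learned MLP to every edge feature first."""
    return [[max(channel) for channel in zip(*edges)] for edges in edge_feats]


if __name__ == "__main__":
    points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
    # In a dynamic graph CNN the graph is rebuilt from the output features of each layer.
    print(max_aggregate(edge_features(points, k=2)))
```

Because the neighbourhood is recomputed from the features of each layer rather than from the 3D coordinates alone, points that are far apart in space can become neighbours once their descriptors are similar.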
2.2 Feature-based Coherent Point Drift
The CPD algorithm formulates the alignment of two point sets as a probability density estimation problem. The points in the moving point set are described as centroids of a Gaussian mixture model (GMM) and are fitted to the points in the fixed point set by maximizing the likelihood. To find the displacements of the moving points, the Expectation Maximization (EM) algorithm is used: in the E-step, point correspondence probabilities are computed, and in the M-step, the displacement vectors are updated. We incorporate the learned geometric feature descriptors of both point sets as additional prior probabilities, which are combined with the spatial point correspondence term of the original CPD through a trade-off and scaling parameter. In addition to the parameter that controls the width of the Gaussian, the CPD algorithm includes three more free parameters: one models the amount of noise and outliers in the point sets, while the other two control the smoothness of the deformation field.
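To make the role of the feature prior more concrete, the following sketch computes E-step correspondence probabilities from spatial positions and descriptors. It is only an illustration under stated assumptions: the spatial term and the uniform outlier term follow the standard CPD formulation, whereas the multiplicative Gaussian on descriptor distance, the scale `alpha`, and all function and argument names are hypothetical and need not match the exact equations of this work.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def e_step(fixed_xyz, moving_xyz, fixed_feat, moving_feat, sigma2, alpha, w):
    """Return post[m][n], the probability that moving centroid m explains fixed point n.

    sigma2: variance of the spatial Gaussians (standard CPD).
    alpha:  assumed scale of the descriptor term (hypothetical form).
    w:      assumed outlier weight in [0, 1) as in standard CPD.
    """
    n_fixed, n_moving, dim = len(fixed_xyz), len(moving_xyz), len(fixed_xyz[0])
    # Uniform outlier component of standard CPD.
    outlier = (2 * math.pi * sigma2) ** (dim / 2) * w / (1 - w) * n_moving / n_fixed
    post = [[0.0] * n_fixed for _ in range(n_moving)]
    for n in range(n_fixed):
        scores = []
        for m in range(n_moving):
            spatial = math.exp(-sq_dist(fixed_xyz[n], moving_xyz[m]) / (2 * sigma2))
            # Assumption: descriptors act as a multiplicative Gaussian prior.
            feature = math.exp(-alpha * sq_dist(fixed_feat[n], moving_feat[m]))
            scores.append(spatial * feature)
        denom = sum(scores) + outlier
        for m in range(n_moving):
            post[m][n] = scores[m] / denom
    return post
```

With `alpha = 0` the descriptor term is constant and the sketch reduces to the purely spatial E-step, which makes the influence of the learned features easy to inspect in isolation.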
3 Experiments
Registering fully inflated to exhaled lungs is considered one of the most demanding tasks in medical image registration and is important for analyzing, e.g., local ventilation defects in COPD patients. We use the DIR-Lab COPD data set with 10 inhale-exhale pairs of 3D CT scans for all our experiments. The thorax volumes are resampled to an isotropic voxel size and a few thousand keypoints are extracted from inner lung structures with the Foerstner operator. Automatic correspondences to supervise the learning of our DGCNN are established using a discrete and intensity-based registration algorithm, which has an accuracy of 1 mm. In all experiments, no CT-based intensity information is used and all processing relies entirely on the geometric keypoint locations.
In our first experiment, we learn point descriptors directly in a supervised metric learning task. To this end, a triplet loss is employed that enforces feature similarity between corresponding keypoint regions in point set pairs. The inhale and exhale point sets form the positive pair, while points from the permuted exhale point set serve as negative examples. These learned features can be directly used in a kNN registration. We then investigate the combination of spatial positions and learned descriptors in the feature-based CPD algorithm. Finally, in our concluding experiment, the feature network is trained in an end-to-end manner as described in Section 2 to further optimize the pretrained geometric features.
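The triplet construction described above can be sketched as follows. Descriptors of corresponding inhale and exhale keypoints form anchor and positive, and a permuted copy of the exhale descriptors supplies the negatives. The cyclic shift used as permutation, the squared Euclidean distance, the margin value and the function name are assumptions for illustration; the text does not specify them.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(inhale_desc, exhale_desc, margin=1.0, shift=1):
    """Mean hinge triplet loss over corresponding keypoints.

    inhale_desc[i] and exhale_desc[i] describe the same anatomical keypoint.
    Negatives come from a cyclic shift of the exhale set, which is one simple
    permutation without fixed points (assumed here, requires shift % n != 0).
    """
    n = len(inhale_desc)
    if n < 2 or shift % n == 0:
        raise ValueError("need at least two points and a non-trivial shift")
    total = 0.0
    for i in range(n):
        d_pos = sq_dist(inhale_desc[i], exhale_desc[i])
        d_neg = sq_dist(inhale_desc[i], exhale_desc[(i + shift) % n])
        # Penalize triplets whose negative is not at least `margin` farther away.
        total += max(0.0, d_pos - d_neg + margin)
    return total / n
```

The loss is zero once every corresponding pair is closer in descriptor space than the permuted pair by at least the margin, which is the property the subsequent kNN registration relies on.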
Implementation details: Due to the limited number of instances in the dataset, we perform a leave-one-out validation, where we evaluate on one inhale-exhale point set pair and train our network on the remaining nine pairs. During training we use farthest point sampling to obtain a fixed number of points from the inhale and exhale point sets, respectively. Each evaluation is run ten times and results are averaged to account for the effect of the sampling step. The employed network parameters are specified in Figure 2. The CPD algorithm is run with a fixed number of iterations and fixed parameter values. For the end-to-end training we relax two of these parameters to allow for further optimization of input features.
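Farthest point sampling, as used for subsampling the keypoints, can be sketched in a few lines. The version below is a plain greedy implementation with a caller-supplied start index; the function name and the deterministic start are assumptions, and a random start would be one way to obtain the run-to-run variation that the repeated evaluations average over.

```python
import math


def farthest_point_sampling(points, n_samples, start=0):
    """Greedy farthest point sampling.

    Returns the indices of n_samples points such that each newly chosen point
    maximizes its distance to the set of points chosen so far.
    """
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and len(points)")
    selected = [start]
    # Distance of every point to its closest already selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_samples:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected
```

Compared with uniform random subsampling, this strategy spreads the selected keypoints more evenly over the lung, at the cost of a runtime proportional to the number of points times the number of samples.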
4 Results and Discussion
| Case # | initial | center-aligned | triplet + kNN@20 | CPD | triplet + CPD (ours) | end-to-end (ours) |
Qualitative results are shown in Figure 3, where our approach demonstrates a good trade-off between the very smooth motion of the CPD and the potential for large correspondences of the features from triplet learning. Our quantitative results, evaluated on 300 independent expert landmark pairs for each patient, demonstrate that registering the point clouds directly with CPD (3D coordinates as input) yields a relatively large target registration error (TRE) of 6.4 ± 5.2 mm (see Table 1). Employing kNN registration based on a DGCNN trained with keypoint correspondences to extract geometric features, without regularization, is still inferior with a TRE of 9.0 ± 5.5 mm, highlighting the challenges of this point-based registration task and the difficulties of addressing the deformable alignment with a one-to-one correspondence search. Combining the geometric features of a pre-trained DGCNN with the regularizing CPD that is extended to use 19-dimensional inputs (16 features + 3 coordinates) yields a substantial improvement over each individual method with a TRE of 4.7 ± 4.1 mm. Finally, using end-to-end learning to back-propagate the regularized alignment errors through the iterative point drift layers to further improve the feature learning shows another small but significant improvement to 4.3 ± 3.6 mm. These alignment errors cannot be directly compared to the large variety of image- and feature-based registration algorithms that reached 3.6 mm, 4.7 mm or 1.1 mm for similar datasets but were based on intensity information, while our comparison is restricted to purely geometric approaches without intensity. In addition, a better outcome would be expected by extending the keypoint extraction to focus on vessel- or airway-based nodes and to include anatomical tree-based edges in the graph model. Nevertheless, the results clearly show that our models are already able to directly learn semantic geometric features in a data-driven manner based on the inherent correspondence information.
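For reference, a TRE summary of the form mean ± standard deviation can be computed from landmark pairs as sketched below. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def target_registration_error(warped_landmarks, reference_landmarks):
    """Mean and standard deviation of the Euclidean distances (e.g. in mm)
    between warped landmarks and their reference positions."""
    errors = [math.dist(p, q) for p, q in zip(warped_landmarks, reference_landmarks)]
    # Assumption: population standard deviation over all landmark pairs.
    return statistics.mean(errors), statistics.pstdev(errors)
```

Both inputs are expected to be in the same physical coordinate system, so that voxel spacing has already been accounted for before the distances are taken.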
5 Conclusion
We have presented a new method for deformable point set registration that learns geometric features from irregular point sets using a dynamic graph CNN (DGCNN) together with a regularizing and fully differentiable high-dimensional coherent point drift (CPD) model. Our results clearly indicate that geometric feature learning, even from relatively uninformative point clouds, is possible with DGCNNs and can be further enhanced by incorporating the CPD model into the optimization. Evaluated on challenging inhale-exhale lung registration of COPD patients, we achieve an improvement of 2.1 mm over the classical CPD method and are competitive with many classical image-based registration algorithms despite the fact that no intensity information is used. In addition to these encouraging findings, we believe that alternative regularization models to the CPD that require fewer iteration steps could have the potential to further improve this approach. In future work, many more applications, e.g. surface point shape alignment and analysis, could benefit from deep point registration.
References
- Bayer, S., Ravikumar, N., Strumia, M., Tong, X., Gao, Y., Ostermeier, M., Fahrig, R., Maier, A.: Intraoperative brain shift compensation using a hybrid mixture model. In: MICCAI. pp. 116–124 (2018)
- Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. TPAMI 11(6), 567–585 (1989)
- Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S., Rother, C.: DSAC - differentiable RANSAC for camera localization. In: CVPR. pp. 6684–6692 (2017)
- Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34(4), 18–42 (2017)
- Castillo, R., Castillo, E., Fuentes, D., Ahmad, M., Wood, A.M., Ludwig, M.S., Guerrero, T.: A reference dataset for deformable image registration spatial accuracy evaluation using the COPDgene study archive. Physics in Medicine & Biology 58(9), 2861 (2013)
- Ehrhardt, J., Werner, R., Schmidt-Richberg, A., Handels, H.: Automatic landmark detection and non-linear landmark- and surface-based registration of lung CT images. Medical Image Analysis for the Clinic - A Grand Challenge, MICCAI 2010, 165–174 (2010)
- Glocker, B., Komodakis, N., Tziritas, G., Navab, N., Paragios, N.: Dense image registration through MRFs and efficient linear programming. Medical Image Analysis 12(6), 731–741 (2008)
- Heinrich, M.P., Handels, H., Simpson, I.J.: Estimating large lung motion in COPD patients by symmetric regularised correspondence fields. In: MICCAI. pp. 338–345 (2015)
- Hu, Y., Modat, M., Gibson, E., Li, W., Ghavami, N., Bonmati, E., Wang, G., Bandula, S., Moore, C.M., Emberton, M., et al.: Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis 49, 1–13 (2018)
- Myronenko, A., Song, X.: Point set registration: Coherent point drift. TPAMI 32(12), 2262–2275 (2010)
- Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proceedings of the Conference on Computer Vision and Pattern Recognition. pp. 652–660 (2017)
- Ravikumar, N., Gooya, A., Beltrachini, L., Frangi, A.F., Taylor, Z.A.: Generalised coherent point drift for group-wise multi-dimensional analysis of diffusion brain MRI data. Medical Image Analysis 53, 47–63 (2019)
- Tschirren, J., McLennan, G., Palágyi, K., Hoffman, E.A., Sonka, M.: Matching and anatomical labeling of human airway tree. TMI 24(12), 1540–1547 (2005)
- de Vos, B.D., Berendsen, F.F., Viergever, M.A., Sokooti, H., Staring, M., Išgum, I.: A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, 128–143 (2019)
- Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. arXiv preprint arXiv:1801.07829 (2018)