Learning Deformable Point Set Registration with Regularized Dynamic Graph CNNs for Large Lung Motion in COPD Patients

by Lasse Hansen et al.
Universität Lübeck

Deformable registration continues to be one of the key challenges in medical image analysis. While iconic registration methods have started to benefit from the recent advances in medical deep learning, the same does not yet apply to the registration of point sets, e.g. registration based on surfaces, keypoints or landmarks. This is mainly due to the restriction of the convolution operator in modern CNNs to densely gridded input. However, with the newly developed methods from the field of geometric deep learning, suitable tools are now emerging that enable powerful analysis of medical data on irregular domains. In this work, we present a new method that enables the learning of regularized feature descriptors with dynamic graph CNNs. By incorporating the learned geometric features as prior probabilities into the well-established coherent point drift (CPD) algorithm, formulated as a differentiable network layer, we establish an end-to-end framework for robust registration of two point sets. Our approach is evaluated on the challenging task of aligning keypoints extracted from lung CT scans in inhale and exhale states with large deformations and without any additional intensity information. Our results indicate that the inherent geometric structure of the extracted keypoints is sufficient to establish descriptive point features, which yield a significantly improved performance and robustness of our registration framework.








1 Introduction and Related Work

Registration, i.e. determining a spatial transformation that aligns two images or point sets, is a fundamental task in medical image and shape analysis and a prerequisite for numerous clinical applications. It is widely used for image-guided intervention, motion compensation in radiation therapy, atlas-based segmentation and monitoring of disease progression. Non-rigid registration is ill-posed and thus a non-convex optimization problem with a very high number of degrees of freedom. In addition, the medical domain poses particular challenges for the registration task, e.g. non-linear intensity differences in multi-modal images or high inter-patient variations in anatomical shape and appearance.

Iconic registration: Voxel-based intensity-driven medical image registration has been an active area of research and can e.g. be solved using discrete optimization of a similarity metric under a regularization constraint on the smoothness of the deformation field [7]. Data-driven deep learning methods based on convolutional neural networks (CNNs) have only recently been used in the field of medical image registration. In [14], an iconic and unsupervised learning approach is introduced that learns features to drive a registration and replaces the iterative optimization with a feed-forward CNN. While achieving impressive runtimes of under a second on a GPU, its accuracy for CT lung motion estimation is inferior to conventional methods. Weak supervision in the form of landmarks or multi-label segmentations was used in the CNN framework of [9], where the similarity measure is based on the alignment of the registered labels.

Geometric registration: To capture large deformations, e.g. present in intra-patient inhale-exhale examinations of COPD patients [5] or vessel-guided brain shift compensation [1], geometric registration models based on keypoints or surfaces offer a promising solution. Point-based registration has not yet profited from the advantages of deep feature learning due to the restriction of conventional CNNs to densely gridded input. Many current geometric methods (e.g. [1] and [12]) are based on the well-established coherent point drift (CPD) algorithm [10]. In addition to 3D coordinates, they incorporate further image- or segmentation-derived features, such as point orientations or scalar fractional anisotropy (FA) values [12].

Deep geometric learning: While these hand-crafted features clearly improved on the results of the CPD, recent methods from the field of geometric deep learning [4] enable a data-driven feature extraction directly from point sets. The PointNet framework [11] was one of the first approaches to apply deep learning methods to unordered point sets. A limitation of the approach is that it does not consider local neighborhood information, which was addressed in [15] by dynamically building a k-nearest-neighbour graph on the point set, thus also enabling feature propagation along edges in that graph. Combining convolutional feature learning with a differentiable and robustly regularized fitting process was first proposed for multi-camera scene reconstruction in [3] (DSAC), but has so far been limited to rigid alignment.
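To make the graph-based operation concrete, the following NumPy sketch illustrates the core idea of an edge convolution as used in [15]: a kNN graph is rebuilt from the current features, edge features (x_i, x_j - x_i) are mapped by a shared layer and max-aggregated per point. This is a simplified illustration, not the authors' implementation; a single linear layer plus ReLU stands in for the shared MLP, and all names are illustrative.

```python
import numpy as np

def knn_indices(x, k):
    """Indices of the k nearest neighbours (excluding self) for each point.
    x: (n, d) array of point features."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # (n, n) squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-edges
    return np.argsort(d2, axis=1)[:, :k]                  # (n, k)

def edge_conv(x, weight, k=4):
    """Simplified DGCNN edge convolution: a shared linear map over the
    edge features (x_i, x_j - x_i), followed by max-aggregation over
    each point's kNN neighbourhood.
    x: (n, d_in), weight: (2*d_in, d_out) -> (n, d_out)."""
    idx = knn_indices(x, k)                               # graph rebuilt from x itself
    x_j = x[idx]                                          # (n, k, d_in) neighbour features
    x_i = np.repeat(x[:, None, :], idx.shape[1], axis=1)  # (n, k, d_in) centre features
    edges = np.concatenate([x_i, x_j - x_i], axis=-1)     # (n, k, 2*d_in)
    h = np.maximum(edges @ weight, 0.0)                   # shared "MLP": linear + ReLU
    return h.max(axis=1)                                  # max over neighbours

rng = np.random.default_rng(0)
pts = rng.normal(size=(32, 3))                            # toy 3-D point set
w = rng.normal(size=(6, 16)) * 0.1
feat = edge_conv(pts, w, k=4)
print(feat.shape)                                         # (32, 16)
```

Because the graph is recomputed from the current feature space rather than fixed once, stacking such layers lets the neighbourhood structure itself adapt during learning.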

Large deformation lung registration: Both iconic and geometric approaches have often been found to yield relatively large residual errors for large-motion lung registration (forced inhale-to-exhale): e.g. 4.68 mm for the discrete optimization algorithm in [7] applied to the DIR-lab COPD data [5] and 3.61 mm (on the inhale-exhale pairs of the EMPIRE10 challenge) for [6], which used both keypoint- and intensity-based information. Learning the alignment of such difficult data appears to be so far impossible with intensity-driven CNN approaches, which already struggle with the shallower breathing motion in 4D-CT [14]. Being able to directly match vessel and airway trees based on geometric features alone can thus provide a valuable pre-alignment for further intensity-based registration (cf. [8]) or be directly used in clinical applications to perform atlas-based labelling of anatomical segments and branchpoints for physiological studies [13].

1.1 Contributions

Our work contributes two important steps towards data-driven point set registration, enabling the incorporation of deep feature learning into a regularized CPD fitting algorithm. First, we utilize dynamic graph CNNs [15] in an auxiliary metric learning task to establish robust correspondences between a moving and a fixed point set. These learned features are shown to yield an improved modeling of prior probabilities in the CPD algorithm. Second, since all operations of the CPD algorithm are differentiable, we show that it is possible to further optimize the parameters of the feature extraction network directly on the registration task. To evaluate our method we register keypoints extracted from inhale and exhale states of lung CT scans from the challenging DIR-Lab COPD dataset [5], demonstrating the general feasibility of an end-to-end deep learning point set registration framework that uses geometric information only.

2 Methods

Figure 1: Illustration of our proposed method for supervised non-rigid point set registration. While we investigate the problem of 3D registration, point sets are depicted here in two dimensions for simplicity. Point sets are underlaid with coronal lung CT slices as visualization aids; no image information is used in our registration pipeline.

In this section, we introduce our proposed method for deformable point set registration with deeply learned features. Figure 1 summarizes the method's general idea. Input to our method are the fixed point set and the moving point set. While we make no assumptions on the number of points or correspondences in the input point sets, we assume a further set of keypoint correspondences between the two scans for the supervised learning task. We compute geometric features from the fixed and the moving point set with a shared dynamic graph CNN (DGCNN [15]). The spatial positions together with the extracted descriptors are input to the feature-based CPD algorithm, which produces displacement vectors for all points of the moving set. We then employ thin-plate splines (TPS) [2] as a scattered data interpolation method to compute the displacements for the correspondence keypoints, which yields the transformed keypoint set. Finally, we compute the mean squared error (MSE) of the Euclidean distances between the transformed and the fixed correspondence keypoints as a loss for the optimization of the feature extraction network. In the following, we describe the descriptor learning with the DGCNN as well as the extensions to the CPD algorithm that exploit point features as prior probabilities.
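As an illustration of the scattered-data interpolation step, displacements known at a sparse set of control points can be propagated to a denser point set. The sketch below uses SciPy's RBFInterpolator with a thin-plate-spline kernel as a stand-in for the TPS model of [2]; all point sets and the displacement field are synthetic.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Control points for which a registration step produced displacements,
# and a denser point set onto which they should be propagated.
ctrl = rng.uniform(-1, 1, size=(50, 3))          # (n_ctrl, 3) control positions
disp = 0.1 * np.sin(ctrl)                        # (n_ctrl, 3) toy displacement field
dense = rng.uniform(-1, 1, size=(500, 3))        # (n_dense, 3) points to warp

# Thin-plate-spline interpolation of the vector-valued displacement field.
tps = RBFInterpolator(ctrl, disp, kernel='thin_plate_spline')
dense_disp = tps(dense)                          # (n_dense, 3) interpolated displacements
warped = dense + dense_disp                      # transformed point set
print(dense_disp.shape)
```

With zero smoothing, the thin-plate spline reproduces the given displacements exactly at the control points while extending them smoothly elsewhere, which is exactly the property needed to carry a sparse registration result over to the full point set.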

2.1 Descriptor Learning on Point Sets with Dynamic Graph CNNs

Figure 2: Proposed network architecture for geometric feature extraction from the fixed and moving point sets. Input is a three-dimensional point set and the network computes a 16-dimensional geometric descriptor for each of the 4096 points. The number of layer neurons for each operation is specified in the corresponding brackets.

Our proposed network architecture for geometric feature extraction is illustrated in Figure 2. A key component is the edge convolution introduced in [15], which dynamically builds a k-nearest-neighbour (kNN) graph from the points in the input feature space and then aggregates information from neighbouring points to output a final feature map. We employ several edge convolutions with DenseNet-style feature concatenation to efficiently capture both local and global geometry. The final feature descriptor is obtained by fully connected layers that reduce the point information to a given dimensionality. We restrict the output descriptor space by normalization to enable a constant parametrization of subsequent operations in the registration pipeline, which stabilizes network training. To establish robust initial correspondences between the moving and fixed point sets, the model is pretrained on an auxiliary metric learning task using a triplet loss.
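The auxiliary metric learning objective can be sketched as follows: a generic margin-based triplet loss on unit-normalized descriptors, evaluated here on synthetic data. The margin value and all variable names are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project descriptors onto the unit sphere (keeps later distance scales constant)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss on Euclidean descriptor distances.
    Each row of the three arrays forms one (anchor, positive, negative) triplet."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

rng = np.random.default_rng(0)
# Toy descriptors: inhale anchors, near-matching exhale positives,
# and permuted exhale descriptors as negatives.
inhale = l2_normalize(rng.normal(size=(128, 16)))
exhale = l2_normalize(inhale + 0.05 * rng.normal(size=(128, 16)))
negatives = exhale[rng.permutation(128)]

loss = triplet_loss(inhale, exhale, negatives)
print(float(loss))
```

Minimizing such a loss pulls corresponding inhale/exhale descriptors together while pushing permuted (non-corresponding) pairs at least a margin apart, which is what later allows a simple kNN search in descriptor space to propose correspondences.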

2.2 Feature-based Coherent Point Drift

The CPD algorithm formulates the alignment of two point sets as a probability density estimation problem. The points in the moving point set are described as centroids of Gaussian mixture models (GMMs) and are fitted to the points in the fixed point set by maximizing the likelihood. To find the displacements, the expectation maximization (EM) algorithm is used, where the E-step computes point correspondence probabilities and the M-step updates the displacement vectors. We incorporate the learned geometric feature descriptors $\theta^F$ and $\theta^M$ as additional prior probabilities with

$P(m \,|\, f_n) \propto P_{\mathrm{spatial}}(m \,|\, f_n) \, P_{\theta}(m \,|\, f_n)$,   (1)

where $P_{\mathrm{spatial}}$ denotes the spatial point correspondence described in [10], $\gamma$ is a trade-off and scaling parameter and

$P_{\theta}(m \,|\, f_n) = \exp(-\gamma \, \lVert \theta^M_m - \theta^F_n \rVert^2) \, / \, \sum_{m'=1}^{M} \exp(-\gamma \, \lVert \theta^M_{m'} - \theta^F_n \rVert^2)$,   (2)

with $\theta^F_n$ and $\theta^M_m$ the descriptors of the $n$-th fixed and $m$-th moving point. $N$ and $M$ denote the number of points in the fixed and moving point sets, respectively. In addition to the parameter $\gamma$ in (2), which controls the width of the feature Gaussian, the CPD algorithm includes three more free parameters: $w$, $\beta$ and $\lambda$. Parameter $w$ models the amount of noise and outliers in the point sets, while parameters $\beta$ and $\lambda$ control the smoothness of the deformation field.
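As a minimal illustration of how a feature prior can modulate the E-step responsibilities, the following NumPy sketch combines a spatial Gaussian kernel with a descriptor-similarity term. The exact normalization of the full model is simplified here; the parameter names sigma2, gamma and w are assumptions, and only the uniform outlier term of standard CPD [10] is kept.

```python
import numpy as np

def feature_cpd_estep(fixed, moving, feat_f, feat_m, sigma2=0.1, gamma=5.0, w=0.1):
    """One E-step of a CPD-style GMM fit where correspondence probabilities
    combine a spatial Gaussian term with a feature-similarity prior.
    fixed: (N, 3), moving: (M, 3); feat_f: (N, d), feat_m: (M, d).
    Returns the (M, N) responsibility matrix P; each column sums to at most 1,
    the remainder being the uniform outlier term weighted by w."""
    d2_spatial = ((moving[:, None, :] - fixed[None, :, :]) ** 2).sum(-1)  # (M, N)
    d2_feat = ((feat_m[:, None, :] - feat_f[None, :, :]) ** 2).sum(-1)    # (M, N)
    # Feature prior modulates the spatial Gaussian kernel.
    num = np.exp(-d2_spatial / (2 * sigma2)) * np.exp(-gamma * d2_feat)
    M, N = num.shape
    outlier = w / (1 - w) * M * (2 * np.pi * sigma2) ** 1.5 / N  # uniform outlier mass
    return num / (num.sum(axis=0, keepdims=True) + outlier)

rng = np.random.default_rng(0)
fixed = rng.normal(size=(40, 3))
moving = fixed + 0.05 * rng.normal(size=(40, 3))   # slightly perturbed copy
feat = rng.normal(size=(40, 8))                    # identical features per index
P = feature_cpd_estep(fixed, moving, feat, feat, sigma2=0.05)
# With informative features, each fixed point's mass concentrates on its counterpart.
print((P.argmax(axis=0) == np.arange(40)).mean())
```

With uninformative (constant) descriptors the feature term cancels and the update degenerates to the purely spatial CPD responsibilities, which is why the learned prior can only sharpen, never contradict, the geometric evidence in this formulation.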

3 Experiments

Registering the fully inflated to the exhaled lungs is considered one of the most demanding tasks in medical image registration and is important, e.g., for analyzing local ventilation defects in COPD patients. We use the DIR-Lab COPD dataset [5] with 10 inhale-exhale pairs of 3D CT scans for all our experiments. The thorax volumes are resampled to isotropic voxel sizes and a few thousand keypoints are extracted from inner lung structures with the Foerstner operator. Automatic correspondences to supervise the learning of our DGCNN are established using the discrete and intensity-based registration algorithm of [8], which has an accuracy of 1 mm. In all experiments, no CT-based intensity information is used and all processing relies entirely on the geometric keypoint locations.

In our first experiment, we learn point descriptors directly in a supervised metric learning task. To this end, a triplet loss is employed that enforces feature similarity between corresponding keypoint regions in point set pairs. The inhale and exhale point sets form the positive pairs, while points from a permuted exhale point set serve as negative examples. These learned features can be directly used in a kNN registration. We then investigate the combination of spatial positions and learned descriptors in the feature-based CPD algorithm. Finally, in our concluding experiment, the feature network is trained in an end-to-end manner as described in Section 2 to further optimize the pretrained geometric features.

Implementation details: Due to the limited number of instances in the dataset we perform a leave-one-out validation, where we evaluate on one inhale-exhale point set pair and train our network on the remaining nine pairs. During training we use farthest point sampling to obtain 4096 points from the inhale and the exhale point set, respectively. Each evaluation is run ten times and results are averaged to account for the effect of the sampling step. The employed network parameters are specified in Figure 2. For the CPD algorithm we use a fixed number of EM iterations and fixed settings of the parameters $w$, $\beta$ and $\lambda$. For the end-to-end training we relax $\beta$ and $\lambda$ to allow for further optimization of the input features.
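The farthest point sampling step mentioned above can be sketched with a generic greedy implementation (not the authors' code): starting from a random seed point, the point farthest from the already selected set is added repeatedly, which yields an evenly spread subsample of the cloud.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Greedy farthest point sampling: iteratively pick the point that is
    farthest from the already selected set, giving good spatial coverage.
    points: (n, 3) -> indices of the n_samples selected points."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = [rng.integers(n)]                       # random start point
    min_d2 = ((points - points[selected[0]]) ** 2).sum(-1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_d2))                   # farthest from current set
        selected.append(nxt)
        d2 = ((points - points[nxt]) ** 2).sum(-1)
        min_d2 = np.minimum(min_d2, d2)                # update distance-to-set
    return np.array(selected)

rng = np.random.default_rng(1)
cloud = rng.normal(size=(1000, 3))
idx = farthest_point_sampling(cloud, 64)
print(idx.shape)
```

Because each training run draws a different subsample, averaging over repeated evaluations, as done above, accounts for the variance this step introduces.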

4 Results and Discussion

Figure 3: Qualitative results in terms of 3D motion vectors on test case #5. The magnitude is color coded from blue (small motion) to red (large motion).
Case # initial center-aligned triplet + kNN@20 CPD [10] triplet + CPD (ours) end-to-end (ours)
1 26.3 17.8 8.1 5.5 4.2 3.4
2 21.8 14.7 15.6 8.4 9.3 8.9
3 12.6 10.6 6.4 2.7 2.5 2.4
4 29.6 19.0 8.3 4.8 3.4 3.2
5 30.1 18.4 7.8 8.4 5.2 4.6
6 28.5 16.2 7.5 14.0 5.1 4.3
7 21.6 10.2 6.3 3.0 2.6 2.5
8 26.5 17.4 6.3 6.8 4.3 3.9
9 14.9 14.1 9.0 3.5 3.1 3.6
10 21.8 19.6 14.9 7.4 7.5 7.4
mean 23.4 15.7 9.0 6.4 4.7 4.3
std 11.9 7.0 5.5 5.2 4.1 3.6
Table 1: Results for the 10 inhale and exhale CT scan pairs of the DIR-Lab COPD data set [5]. The mean target registration error (TRE) in mm is computed on the 300 expert-annotated landmark pairs per case. The p-values are obtained by a rank-sum test over all 3000 landmark errors with respect to our best performing approach.

Qualitative results are shown in Figure 3, where our approach demonstrates a good trade-off between the very smooth motion of the CPD and the ability of the triplet-learned features to capture large displacements. Our quantitative results, evaluated on 300 independent expert landmark pairs for each patient, demonstrate that registering the point clouds directly with CPD (3D coordinates as input) yields a relatively large target registration error (TRE) of 6.4±5.2 mm (see Table 1). Employing kNN registration based on a DGCNN trained with keypoint correspondences to extract geometric features without regularization is still inferior, with a TRE of 9.0±5.5 mm, highlighting the challenges of this point-based registration task and the difficulty of addressing the deformable alignment with a one-to-one correspondence search. Combining the geometric features of a pre-trained DGCNN with the regularizing CPD, extended to use 19-dimensional inputs (16 features + 3 coordinates), yields a substantial improvement over each individual method with a TRE of 4.7±4.1 mm. Finally, using end-to-end learning to back-propagate the regularized alignment errors through the iterative point drift layers to further improve the feature learning shows another small but significant improvement to 4.3±3.6 mm. These alignment errors cannot be directly compared to the large variety of image- and feature-based registration algorithms that reached 3.6 mm [6], 4.7 mm [7] or 1.1 mm [8] on similar datasets, as those were based on intensity information, while our comparison is restricted to purely geometric approaches. In addition, a better outcome would be expected from extending the keypoint extraction to focus on vessel- or airway-based nodes and from including anatomical tree-based edges in the graph model.
Nevertheless, the results clearly show that our models are already able to directly learn semantic geometric features in a data-driven manner based on the inherent correspondence information.

5 Conclusion

We have presented a new method for deformable point set registration that learns geometric features from irregular point sets using a dynamic graph CNN (DGCNN) together with a regularizing and fully differentiable high-dimensional coherent point drift (CPD) model. Our results clearly indicate that geometric feature learning, even from relatively uninformative point clouds, is possible with DGCNNs and can be further enhanced by incorporating the CPD model into the optimization. Evaluated on the challenging inhale-exhale lung registration of COPD patients, we achieve an improvement of 2.1 mm over the classical CPD method and are competitive with many classical image-based registration algorithms even though no intensity information is used. In addition to these encouraging findings, we believe that alternative regularization models to the CPD that require fewer iteration steps could further improve this approach. In future work, many more applications, e.g. surface point shape alignment and analysis, could benefit from deep point registration.


  • [1] Bayer, S., Ravikumar, N., Strumia, M., Tong, X., Gao, Y., Ostermeier, M., Fahrig, R., Maier, A.: Intraoperative brain shift compensation using a hybrid mixture model. In: MICCAI. pp. 116–124 (2018)
  • [2] Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. TPAMI 11(6), 567–585 (1989)
  • [3] Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S., Rother, C.: DSAC - differentiable RANSAC for camera localization. In: CVPR. pp. 6684–6692 (2017)
  • [4] Bronstein, M.M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P.: Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34(4), 18–42 (2017)
  • [5] Castillo, R., Castillo, E., Fuentes, D., Ahmad, M., Wood, A.M., Ludwig, M.S., Guerrero, T.: A reference dataset for deformable image registration spatial accuracy evaluation using the COPDGene study archive. Physics in Medicine & Biology 58(9), 2861 (2013)
  • [6] Ehrhardt, J., Werner, R., Schmidt-Richberg, A., Handels, H.: Automatic landmark detection and non-linear landmark- and surface-based registration of lung CT images. Medical Image Analysis for the Clinic: A Grand Challenge, MICCAI 2010, 165–174 (2010)
  • [7] Glocker, B., Komodakis, N., Tziritas, G., Navab, N., Paragios, N.: Dense image registration through MRFs and efficient linear programming. Medical Image Analysis 12(6), 731–741 (2008)
  • [8] Heinrich, M.P., Handels, H., Simpson, I.J.: Estimating large lung motion in copd patients by symmetric regularised correspondence fields. In: MICCAI. pp. 338–345 (2015)
  • [9] Hu, Y., Modat, M., Gibson, E., Li, W., Ghavami, N., Bonmati, E., Wang, G., Bandula, S., Moore, C.M., Emberton, M., et al.: Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image analysis 49, 1–13 (2018)
  • [10] Myronenko, A., Song, X.: Point set registration: Coherent point drift. TPAMI 32(12), 2262–2275 (2010)
  • [11] Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: Deep learning on point sets for 3D classification and segmentation. In: CVPR. pp. 652–660 (2017)

  • [12] Ravikumar, N., Gooya, A., Beltrachini, L., Frangi, A.F., Taylor, Z.A.: Generalised coherent point drift for group-wise multi-dimensional analysis of diffusion brain MRI data. Medical Image Analysis 53, 47–63 (2019)
  • [13] Tschirren, J., McLennan, G., Palágyi, K., Hoffman, E.A., Sonka, M.: Matching and anatomical labeling of human airway tree. TMI 24(12), 1540–1547 (2005)
  • [14] de Vos, B.D., Berendsen, F.F., Viergever, M.A., Sokooti, H., Staring, M., Išgum, I.: A deep learning framework for unsupervised affine and deformable image registration. Medical image analysis 52, 128–143 (2019)
  • [15] Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829 (2018)