Large Deformation Diffeomorphic Image Registration with Laplacian Pyramid Networks

06/29/2020 ∙ by Tony C. W. Mok, et al.

Deep learning-based methods have recently demonstrated promising results in deformable image registration for a wide range of medical image analysis tasks. However, existing deep learning-based methods are usually limited to small-deformation settings, and desirable properties of the transformation, including bijectivity and topology preservation, are often ignored by these approaches. In this paper, we propose a deep Laplacian Pyramid Image Registration Network, which solves the image registration optimization problem in a coarse-to-fine fashion within the space of diffeomorphic maps. Extensive quantitative and qualitative evaluations on two MR brain scan datasets show that our method outperforms existing methods by a significant margin while maintaining desirable diffeomorphic properties and promising registration speed.




1 Introduction

Deformable registration is the process of computing a non-linear transformation to align a pair of images or image volumes by maximizing a similarity metric between them. Deformable registration is crucial in a variety of medical image analysis applications, including diagnostic tasks, radiotherapy and image-guided surgery. Conventional image registration methods [23, 18, 2, 24] often rely on a multi-resolution strategy and estimate the target transformation iteratively under a smoothness regularization. Although conventional image registration methods excel in registration accuracy and diffeomorphic properties (i.e., invertibility and topology preservation), the running time of the registration process depends on the degree of misalignment between the input images and can be prohibitive for high-resolution 3D image volumes. Recent unsupervised deep learning-based image registration (DLIR) methods [4, 5, 20, 26] have demonstrated promising registration speed and quality in a variety of deformable image registration tasks. They treat image registration as a pixel-wise image translation problem, attempting to learn the pixel-wise spatial correspondence between a pair of input images using convolutional neural networks (CNNs). This significantly speeds up the registration process and shows immense potential for time-sensitive medical applications such as image-guided surgery and motion tracking. However, these approaches may not be good solutions for unsupervised large-deformation image registration, for two reasons. First, the gradient of the similarity metric at the finest resolution is generally rough, as many possible transformations of the moving image can yield similar similarity measurements. Second, without an initial transformation, the optimization problem at the finest resolution is difficult due to the large number of degrees of freedom in the transformation parameters.

To address this challenge, a preliminary study [25] stacks multiple CNNs for affine and deformable image registration, optimized separately. Zhao et al. [30] leverage an end-to-end recursive cascaded network to refine the registration result progressively, which amounts to breaking a large deformation down into multiple small deformations. However, both methods are optimized only at the finest level using gradient descent, and the results can therefore be sub-optimal, as the gradient of the similarity metric can be rough at the finest resolution. Moreover, recursive cascaded networks consume a tremendous amount of extra GPU memory, which limits the possible degree of refinement in 3D settings and results in only minimal improvement on brain MR registration tasks. A recent paper [9] avoids these pitfalls and utilizes multiple separate CNNs to mimic the conventional multi-resolution strategy. However, the networks are trained separately, and the non-linear feature maps of each network are collapsed into a warped image before being fed into the next level. Furthermore, these methods ignore the desirable diffeomorphic properties of the transformation, which further limits their potential for clinical use.

In this paper, we address the above challenges and present a new deep Laplacian Pyramid Image Registration Network (LapIRN) for large deformation image registration. The main contributions of this work are as follows. We

  • present a novel LapIRN for large-deformation image registration that utilizes the advantages of a multi-resolution strategy while maintaining the non-linearity of the feature maps throughout the coarse-to-fine optimization scheme;

  • propose a new pyramid similarity metric for a pyramid network to capture both large and small misalignments between the input scans, which helps to avoid local minima during the optimization; and

  • present an effective diffeomorphic setting of our method and show that it guarantees desirable diffeomorphic properties, including invertibility and topology preservation, of the computed transformations.

2 Methods

The Laplacian pyramid network has demonstrated its efficiency and effectiveness in a variety of computer vision tasks, including high-resolution image synthesis [28, 6], super-resolution [13] and optical flow estimation [29], by constructing high-resolution solutions, stabilizing training and avoiding local minima. Motivated by these successes, we propose LapIRN, which naturally integrates the conventional multi-resolution strategy while maintaining the non-linearity of the feature maps across pyramid levels. In the following sections, we describe the methodology of our proposed LapIRN, including the Laplacian pyramid architecture, the coarse-to-fine training scheme and the loss function, and finally the diffeomorphic settings of our method.

Figure 1: Overview of the proposed 3-level deep Laplacian pyramid image registration network in a 2D setting. We utilize three identical CNN-based registration networks to mimic registration with the multi-resolution schema. The feature maps from the feature encoder, the set of residual blocks, and the feature decoder are colored blue, green and red, respectively. The dotted paths are only included in the training phase. We highlight that all registrations are done in 3D throughout this paper; for clarity and simplicity, we depict the 2D formulation of our method in the figure.

2.1 Deep Laplacian Pyramid Image Registration Networks

Given a fixed 3D scan F and a moving 3D scan M, the objective of our method is to estimate a time-1 diffeomorphic deformation field φ such that the warped moving scan M(φ) is aligned with F, subject to a smoothness regularization on the predicted velocity field v. Specifically, we parametrize the deformable registration problem as a function f_θ(F, M) = v within the Laplacian pyramid framework, where θ represents the learnable parameters of the networks.

2.1.1 Network architecture

We implement our LapIRN using an L-level Laplacian pyramid framework to mimic the conventional multi-resolution strategy. For simplicity, we set L to 3 throughout this paper. The overview of LapIRN is illustrated in Fig. 1. Specifically, we first create the input image pyramid by downsampling the input images with trilinear interpolation to obtain F_i (and M_i), where F_i denotes F downsampled to scale 1/2^(L−i) and F_L = F. We employ a CNN-based registration network (CRN) to solve the optimization problem at each pyramid level. At the first pyramid level, the CRN captures the non-linear misalignment from the concatenated input scans at the coarsest resolution (F_1 and M_1) and outputs a 3-channel dense velocity field v_1 and deformation field φ_1. At pyramid level i > 1, we first upsample the output deformation field from the previous pyramid level (φ_{i−1}) by a factor of 2 and warp M_i with it to obtain a warped moving image. We then also upsample the output velocity field from the previous level (v_{i−1}) by a factor of 2 and concatenate it with the input scans (F_i and the warped M_i) to form a 5-channel input for the CRN at level i. Finally, we add the output velocity field from level i to the upsampled velocity field to obtain v_i and integrate the resulting velocity field to produce the final deformation field φ_i for pyramid level i. The feature embeddings from the CRN at the lower level are added to the next level via a skip connection, which greatly increases the receptive field as well as the non-linearity of the network, allowing it to learn complex non-linear correspondences at the finer levels.
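As a minimal sketch, the input image pyramid above can be built as follows (NumPy; 2×2×2 average pooling stands in for the trilinear downsampling, and the function names are illustrative, not from the official implementation):

```python
import numpy as np

def downsample(vol):
    """Halve each spatial dimension via 2x2x2 average pooling
    (a simple stand-in for trilinear downsampling)."""
    d, h, w = (s // 2 for s in vol.shape)
    return (vol[:2 * d, :2 * h, :2 * w]
            .reshape(d, 2, h, 2, w, 2)
            .mean(axis=(1, 3, 5)))

def image_pyramid(vol, levels=3):
    """Return [F_1, ..., F_L], coarsest first, with F_L the original volume."""
    pyr = [vol]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr[::-1]
```

Each level halves every spatial dimension, so level i has scale 1/2^(L−i) relative to the full-resolution input, matching the pyramid described above.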

2.1.2 CNN-based Registration Network

The architecture of the CRN is identical across all pyramid levels. The CRN consists of 3 components: a feature encoder, a set of residual blocks, and a feature decoder. As shown in Fig. 1, the feature encoder comprises two 3D convolutional layers with stride 1 and one 3D convolutional layer with stride 2. In our implementation, we use 5 residual blocks for each CRN, each containing two 3D convolutional layers with a pre-activation structure [8] and a skip connection. Finally, a feature decoder module with one transposed convolutional layer and two consecutive 3D convolutional layers with stride 1, followed by a SoftSign activation, is appended at the end to output the target velocity field v. A skip connection from the feature encoder to the feature decoder prevents the vanishing of low-level features when learning the target deformation fields. In the CRN, each convolutional layer has 28 filters and is followed by a leaky rectified linear unit (LeakyReLU) activation [15] with a negative slope of 0.2, except for the output convolutional layers.
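A rough PyTorch sketch of one CRN follows. Assumptions not stated in the text: 3×3×3 kernels, and only 2 residual blocks instead of 5 to keep the example short; `CRN` and `ResBlock` are illustrative names, not the official implementation:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Pre-activation residual block [8]: act -> conv -> act -> conv, plus skip."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.LeakyReLU(0.2), nn.Conv3d(ch, ch, 3, padding=1),
            nn.LeakyReLU(0.2), nn.Conv3d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class CRN(nn.Module):
    """Feature encoder -> residual blocks -> feature decoder, with an
    encoder-to-decoder skip connection and a SoftSign-bounded velocity output."""
    def __init__(self, in_ch, ch=28, n_blocks=2):  # paper uses 5 blocks
        super().__init__()
        act = nn.LeakyReLU(0.2)
        self.enc = nn.Sequential(
            nn.Conv3d(in_ch, ch, 3, padding=1), act,
            nn.Conv3d(ch, ch, 3, padding=1), act)
        self.down = nn.Sequential(
            nn.Conv3d(ch, ch, 3, stride=2, padding=1), act)
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.up = nn.Sequential(nn.ConvTranspose3d(ch, ch, 2, stride=2), act)
        self.out = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), act,
            nn.Conv3d(ch, 3, 3, padding=1), nn.Softsign())  # 3-channel velocity

    def forward(self, x):
        f = self.enc(x)                        # full-resolution features
        y = self.up(self.blocks(self.down(f)))
        return self.out(y + f)                 # encoder-to-decoder skip
```

The first-level CRN would take the 2-channel concatenation of F_1 and M_1 (`CRN(2)`), while finer levels take the 5-channel input described above (`CRN(5)`).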

2.1.3 Coarse-to-fine Training

Intuitively, our proposed LapIRN could be trained end-to-end, which is equivalent to learning multi-resolution registration with deep supervision [14]. However, we found that end-to-end training is not an ideal training scheme for LapIRN, as it is difficult to balance the weights of the multiple losses across resolutions. To address this issue, we train LapIRN using a coarse-to-fine scheme with a stable warm start, similar to [11, 28]. Specifically, we first train the CRN at the coarsest level alone, and then progressively add the CRN of the next level to learn the registration problem at a finer resolution. To avoid an unstable warm start, we freeze the learnable parameters of all pre-trained CRNs for a constant number of iterations N whenever a new CRN is added to the training. We set N to 2000 and repeat this scheme until the finest level is trained.
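The warm start above can be sketched as follows (PyTorch; `add_level` and `unfreeze` are hypothetical helper names — the caller would invoke `unfreeze` after the 2000-iteration freeze window):

```python
import torch.nn as nn

def add_level(trained_crns, new_crn):
    """Freeze every pre-trained CRN when the next-level CRN joins training."""
    for crn in trained_crns:
        for p in crn.parameters():
            p.requires_grad = False
    return trained_crns + [new_crn]

def unfreeze(crns):
    """Resume joint optimization of all levels after the warm-start phase."""
    for crn in crns:
        for p in crn.parameters():
            p.requires_grad = True
```

Freezing via `requires_grad` keeps the coarse levels fixed while the newly added level adapts, which is one straightforward reading of the scheme described above.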

Figure 2: Example axial MR slices from the moving, atlas and resulting warped images from Demons, SyN, DIF-VM, VM and LapIRN. The caudate and putamen are colored in red and blue respectively. Major artifacts are pointed out by yellow arrows.

2.2 Similarity Pyramid

Solving the image registration problem with an intensity-based similarity metric at the finest resolution alone often yields solutions that are local minima. Leveraging the fact that a perfectly aligned image pair yields high similarity values at all resolutions, we propose a similarity pyramid framework to address this challenge. Although the proposed framework applies to a multitude of similarity measures, we formulate it using the local normalized cross-correlation (NCC) of [4] for simplicity. The proposed similarity pyramid is then formulated as:

S_K(F, M) = Σ_{i ∈ [1..K]} (1/2^{K−i}) · NCC_w(F_i, M_i)

where S_K denotes the similarity pyramid with K levels, NCC_w represents the local normalized cross-correlation with window size w, and F_i, M_i denote the images in the image pyramid (i.e., F_1 and M_1 are the images with the lowest resolution). A lower weight 1/2^{K−i} is assigned to the similarity values at lower resolutions to prevent the similarity from the lower levels from dominating; the window size w is fixed in our implementation. The proposed similarity pyramid captures the similarity in a multi-resolution fashion. Since the similarity metric is smoother and less sensitive to noise at coarser resolutions, integrating the similarity metric from the lower levels helps to avoid local minima during the optimization at high resolution.


The loss function at pyramid level l is then defined as:

L_l = −S_l(F, M(φ)) + λ ||∇v||²₂

where l denotes the current pyramid level, the second term is the smoothness regularization on the velocity field v, and λ is a regularization parameter.
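A NumPy sketch of this objective follows. Two assumptions not in the text: a *global* NCC stands in for the windowed local NCC, and the smoothness term is read as the mean squared spatial gradient of v; the function names are illustrative:

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation (the text uses a local, windowed NCC)."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def similarity_pyramid(fixed_pyr, warped_pyr):
    """S_K = sum_{i=1..K} (1 / 2**(K - i)) * NCC(F_i, M_i(phi)), coarsest first."""
    K = len(fixed_pyr)
    return sum(ncc(f, m) / 2 ** (K - i)
               for i, (f, m) in enumerate(zip(fixed_pyr, warped_pyr), start=1))

def level_loss(fixed_pyr, warped_pyr, v, lam=1.0):
    """-similarity + lam * mean squared spatial gradient of the velocity field v
    (v has shape (3, D, H, W)); lam here is an illustrative default."""
    smooth = sum(float((np.gradient(v[c], axis=ax) ** 2).mean())
                 for c in range(v.shape[0]) for ax in range(3))
    return -similarity_pyramid(fixed_pyr, warped_pyr) + lam * smooth
```

Note how the coarsest level contributes with weight 1/2^{K−1}, so the finest-resolution similarity dominates while the coarse terms steer the optimizer away from local minima.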

2.3 Diffeomorphic Deformation

Recent DLIR methods often parameterize the deformation model with a displacement field u such that the dense deformation field φ = Id + u, where Id represents the identity transformation. Although this parameterization is common and intuitive, desirable properties of the predicted solution, including topology preservation and invertibility, cannot be guaranteed. Therefore, we parameterize our deformation model with a stationary velocity field v under the Log-Euclidean framework and optimize our model within the space of diffeomorphic maps. Specifically, the diffeomorphic deformation field φ^t is defined through dφ^t/dt = v(φ^t), subject to φ^0 = Id. We follow [1, 5] and integrate the (smooth) stationary velocity field v over unit time using the scaling and squaring method with T time steps to obtain the time-1 deformation field φ^1, which approximates exp(v), a member of the Lie group of diffeomorphisms. Apart from that, we also report the results of LapIRN_disp, a variant of LapIRN that parameterizes the deformation model with displacement fields instead.
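A 1-D NumPy sketch of the scaling-and-squaring integration (np.interp plays the role of the trilinear resampling used to compose the map with itself; T = 7 is an illustrative choice of time steps, not necessarily the paper's):

```python
import numpy as np

def integrate_svf(v, T=7):
    """phi = exp(v) by scaling and squaring: start from phi ≈ Id + v / 2**T,
    then square (self-compose) the map T times. v: stationary velocity on a
    1-D integer grid."""
    grid = np.arange(len(v), dtype=float)
    phi = grid + v / 2.0 ** T
    for _ in range(T):
        phi = np.interp(phi, grid, phi)  # phi <- phi o phi
    return phi
```

For a constant velocity field the result is a pure translation, and the mapping stays monotone (no folding), which is the invertibility property discussed above.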

3 Experiments

3.0.1 Data and Pre-processing

We evaluated our method on brain atlas registration tasks using 425 T1-weighted brain MR scans from the OASIS [16] dataset and 40 brain MR scans from the LPBA40 [22] dataset. The OASIS dataset includes subjects aged 18 to 96, and 100 of the included subjects suffered from very mild to moderate Alzheimer's disease. We carry out standard preprocessing steps, including skull stripping, spatial normalization and subcortical structure segmentation, for each MR scan using FreeSurfer [7]. For OASIS, we utilize the subcortical segmentation maps of 26 anatomical structures as the ground truth in the evaluation. For LPBA40, the MR scans in atlas space and their subcortical segmentation maps of 56 structures, manually delineated by experts, are used in our experiments. We resample all MR scans to an isotropic voxel size and center-crop all preprocessed scans to a common spatial size. We randomly split the OASIS dataset into 255, 20 and 150 volumes and the LPBA40 dataset into 28, 2 and 10 volumes for the training, validation and test sets, respectively. We randomly select 5 MR scans and 2 MR scans from the test sets as the atlases for OASIS and LPBA40, respectively. Finally, we register each subject to an atlas using the different deformable registration methods and list the results in Table 1. In total, 745 and 18 combinations of test scans from OASIS and LPBA40, respectively, are included in the evaluation.

3.0.2 Measurement

While recent DLIR methods [30, 4, 10] evaluate their methods solely based on the Dice score between the segmentation maps of the warped moving scans and the atlas, the quality of the predicted deformation fields, as well as the desirable diffeomorphic properties, should by no means be ignored. Therefore, we evaluate our method using a set of measurements: the Dice score of the subcortical segmentation maps (DSC), the percentage of voxels with a non-positive Jacobian determinant (%|J_φ|≤0), the standard deviation of the Jacobian determinant of the deformation fields (std(|J_φ|)), the volume change of the segmentation maps before and after transformation (TC) [21], and the average running time to register each pair of MR scans in seconds (Time), to provide a comprehensive evaluation of registration accuracy and the quality of the solutions.
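The Jacobian-based measurements can be computed directly from a displacement field; a NumPy sketch (central finite differences via np.gradient; the helper names are illustrative):

```python
import numpy as np

def jacobian_det(disp):
    """Jacobian determinant of phi = Id + u at every voxel.
    disp: displacement field u with shape (3, D, H, W)."""
    # grad[i, j] = d(u_i)/d(x_j), so J = I + grad at each voxel
    grad = np.stack([np.stack(np.gradient(disp[c]), axis=0) for c in range(3)])
    J = grad + np.eye(3)[:, :, None, None, None]
    return np.linalg.det(np.moveaxis(J, (0, 1), (-2, -1)))

def folding_percentage(disp):
    """Percentage of voxels with a non-positive Jacobian determinant."""
    return float((jacobian_det(disp) <= 0).mean() * 100.0)
```

std(|J_φ|) is then simply the standard deviation of `jacobian_det(disp)` over all voxels.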

Figure 3: Boxplots depicting the average Dice scores of each anatomical structure in OASIS for DIF-VM, SyN, Demons and our method. The left and right hemispheres of the brain are combined into one structure for visualization. The brain stem (BS), thalamus (Th), cerebellum cortex (CblmC), lateral ventricle (LV), cerebellum white matter (WM), putamen (Pu), caudate (Ca), pallidum (Pa), hippocampus (Hi), 3rd ventricle (3V), 4th ventricle (4V), amygdala (Am), CSF (CSF), and cerebral cortex (CeblC) are included.

3.0.3 Implementation

Our proposed method LapIRN and its variant LapIRN_disp are implemented with PyTorch [19]. We employ the Adam optimizer with a fixed learning rate of 1e-4. We set λ to 4 for LapIRN, which is just enough to guarantee the smoothness of the velocity fields, and to 1 for LapIRN_disp. We train our networks from scratch and select the model with the highest Dice score on the validation set.

3.0.4 Baseline Methods

We compare our method with two conventional approaches (denoted Demons [27] and SyN [2]) and two cutting-edge DLIR methods (denoted VM [4] and DIF-VM [5]). Demons and SyN are among the top-performing methods of the 14 classical non-linear deformation algorithms evaluated in [12]. Both Demons and SyN utilize a 3-level multi-resolution strategy to capture large deformations. VM employs a U-shaped CNN to learn the dense non-linear correspondence between input scans, while DIF-VM is a probabilistic diffeomorphic variant of VM. For Demons, we use the official implementation in the ITK toolkit [17]. For SyN, we adopt the official implementation in the ANTs software package [3]. The parameters of Demons and SyN are carefully tuned to balance the tradeoff between registration accuracy and runtime. For the DLIR methods (VM and DIF-VM), we use their official online implementations with default parameters. All DLIR methods are trained from scratch.

OASIS

| Method      | DSC   | Folding (%) | std(J) | TC    | Time (s) |
|-------------|-------|-------------|--------|-------|----------|
| Initial     | 0.567 | -           | -      | -     | -        |
| Demons      | 0.715 | 0.000       | 0.259  | 1.102 | 192      |
| SyN         | 0.723 | 0.000       | 0.357  | 1.109 | 1439     |
| DIF-VM      | 0.693 | 0.008       | 0.592  | 1.086 | 0.695    |
| VM          | 0.727 | 2.626       | 0.611  | 1.054 | 0.517    |
| LapIRN_disp | 0.808 | 3.031       | 0.651  | 1.161 | 0.312    |
| LapIRN      | 0.765 | 0.007       | 0.319  | 1.101 | 0.331    |

LPBA40

| Method      | DSC   | Folding (%) | std(J) | TC    | Time (s) |
|-------------|-------|-------------|--------|-------|----------|
| Initial     | 0.586 | -           | -      | -     | -        |
| Demons      | 0.720 | 0.048       | 0.174  | 1.004 | 190      |
| SyN         | 0.725 | 0.000       | 0.241  | 1.069 | 1225     |
| DIF-VM      | 0.680 | 0.970       | 0.414  | 0.986 | 0.683    |
| VM          | 0.705 | 0.884       | 0.319  | 1.025 | 0.519    |
| LapIRN_disp | 0.756 | 3.110       | 0.728  | 1.033 | 0.310    |
| LapIRN      | 0.736 | 0.008       | 0.301  | 1.032 | 0.334    |

Table 1: Quantitative evaluation of the results on the OASIS and LPBA40 datasets. DSC indicates registration accuracy. Folding (%) represents the average percentage of folding voxels (non-positive Jacobian determinant) in the deformation fields. std(J) indicates the smoothness of the deformation fields via the standard deviation of the Jacobian determinant (lower is better). TC indicates the topology change of the anatomical structures (closer to 1 is better). Time indicates the average running time to register each pair of MR scans in seconds. Initial: spatial normalization only.

3.0.5 Results

Table 1 gives a comprehensive summary of the results. The variant of our method, LapIRN_disp, achieves 0.808 Dice on a large-scale MR brain dataset (OASIS), outperforming both the conventional methods and the DLIR methods Demons, SyN, DIF-VM and VM by significant margins of 13%, 18%, 17% and 11% in Dice score, respectively. Nevertheless, similar to other methods that work with displacement fields (i.e., VM), the solutions from LapIRN_disp and VM cannot be guaranteed to be smooth and locally invertible, as indicated by the high standard deviation of the Jacobian determinant (0.65 and 0.61, respectively) and the large percentage of folding voxels (3% and 2.6%) on both datasets. Our proposed method LapIRN alleviates this issue and achieves the best registration performance among all baseline methods, yielding plausible and smooth deformation fields with a standard deviation of the Jacobian determinant of 0.319 and a negligible fraction of folding voxels. Furthermore, the inference time of LapIRN is only 0.33 sec, which is significantly faster than the conventional methods (Demons and SyN). We also highlight that our methods outperform the conventional methods even on the small-scale LPBA40 dataset, which has limited training data. Figure 2 illustrates example MR slices with large initial misalignment for all methods. The qualitative results show that LapIRN is capable of recovering large deformations, while the results from VM and DIF-VM are sub-optimal. Figure 3 depicts the average DSC for each anatomical structure in the OASIS dataset. Compared to the methods with diffeomorphic properties, our proposed method LapIRN achieves consistently better registration performance across the 14 anatomical structures.

4 Conclusion

In this paper, we have presented a novel deep Laplacian pyramid network for deformable image registration with a similarity pyramid, which mimics the conventional multi-resolution strategy to capture large misalignments between input scans. To guarantee desirable diffeomorphic properties of the deformation fields, we formulate our method with diffeomorphisms using stationary velocity fields under the Log-Euclidean framework. Extensive experiments show that not only does our method achieve state-of-the-art registration accuracy with a very efficient running time (0.3 sec), it also guarantees desirable diffeomorphic properties of the deformation fields. The formulation of our method can be transferred to various applications with minimal effort and demonstrates immense potential for time-sensitive medical studies.


  • [1] Arsigny, V., Commowick, O., Pennec, X., Ayache, N.: A log-euclidean framework for statistics on diffeomorphisms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 924–931. Springer (2006)
  • [2] Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis 12(1), 26–41 (2008)
  • [3] Avants, B.B., Tustison, N., Song, G.: Advanced normalization tools (ants). Insight j 2(365), 1–35 (2009)
  • [4] Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: An unsupervised learning model for deformable medical image registration. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 9252–9260 (2018)
  • [5] Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning for fast probabilistic diffeomorphic registration. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 729–738. Springer (2018)
  • [6] Denton, E.L., Chintala, S., Fergus, R., et al.: Deep generative image models using a laplacian pyramid of adversarial networks. In: Advances in neural information processing systems. pp. 1486–1494 (2015)
  • [7] Fischl, B.: Freesurfer. Neuroimage 62(2), 774–781 (2012)
  • [8] He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European conference on computer vision. pp. 630–645. Springer (2016)
  • [9] Hering, A., van Ginneken, B., Heldmann, S.: mlvirnet: Multilevel variational image registration network. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 257–265. Springer (2019)
  • [10] Hu, X., Kang, M., Huang, W., Scott, M.R., Wiest, R., Reyes, M.: Dual-stream pyramid registration network. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 382–390. Springer (2019)
  • [11] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)
  • [12] Klein, A., Andersson, J., Ardekani, B.A., et al.: Evaluation of 14 nonlinear deformation algorithms applied to human brain mri registration. Neuroimage 46(3), 786–802 (2009)
  • [13] Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Fast and accurate image super-resolution with deep laplacian pyramid networks. IEEE transactions on pattern analysis and machine intelligence 41(11), 2599–2613 (2018)
  • [14] Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial intelligence and statistics. pp. 562–570 (2015)
  • [15] Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: International Conference on Machine Learning (ICML). vol. 30, p. 3 (2013)

  • [16] Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L.: Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. Journal of cognitive neuroscience 19(9), 1498–1507 (2007)
  • [17] McCormick, M.M., Liu, X., Ibanez, L., Jomier, J., Marion, C.: Itk: enabling reproducible research and open science. Frontiers in neuroinformatics 8,  13 (2014)
  • [18] Ou, Y., Sotiras, A., Paragios, N., Davatzikos, C.: Dramms: Deformable registration via attribute matching and mutual-saliency weighting. Medical image analysis 15(4), 622–639 (2011)
  • [19] Paszke, A., Gross, S., Chintala, S., et al.: Automatic differentiation in pytorch. In: NIPS-W (2017)
  • [20] Rohé, M.M., Datar, M., Heimann, T., Sermesant, M., Pennec, X.: Svf-net: Learning deformable image registration using shape matching. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 266–274. Springer (2017)
  • [21] Rohlfing, T., Maurer, C.R., Bluemke, D.A., Jacobs, M.A.: Volume-preserving nonrigid registration of mr breast images using free-form deformation with an incompressibility constraint. IEEE transactions on medical imaging 22(6), 730–741 (2003)
  • [22] Shattuck, D.W., Mirza, M., Adisetiyo, V., Hojatkashani, C., Salamon, G., Narr, K.L., Poldrack, R.A., Bilder, R.M., Toga, A.W.: Construction of a 3d probabilistic atlas of human cortical structures. Neuroimage 39(3), 1064–1080 (2008)
  • [23] Thirion, J.P.: Image matching as a diffusion process: an analogy with maxwell’s demons. Medical image analysis 2(3), 243–260 (1998)
  • [24] Vercauteren, T., Pennec, X., Perchant, A., Ayache, N.: Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage 45(1), S61–S72 (2009)
  • [25] de Vos, B.D., Berendsen, F.F., Viergever, M.A., Sokooti, H., Staring, M., Išgum, I.: A deep learning framework for unsupervised affine and deformable image registration. Medical image analysis 52, 128–143 (2019)
  • [26] de Vos, B.D., Berendsen, F.F., Viergever, M.A., Staring, M., Išgum, I.: End-to-end unsupervised deformable image registration with a convolutional neural network. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 204–212. Springer (2017)
  • [27] Wang, H., Dong, L., O’Daniel, J., Mohan, R., Garden, A.S., Ang, K.K., Kuban, D.A., Bonnen, M., Chang, J.Y., Cheung, R.: Validation of an accelerated ‘demons’ algorithm for deformable image registration in radiation therapy. Physics in Medicine & Biology 50(12),  2887 (2005)
  • [28] Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional gans. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 8798–8807 (2018)
  • [29] Xiang, X., Zhai, M., Zhang, R., Lv, N., El Saddik, A.: Optical flow estimation using spatial-channel combinational attention-based pyramid networks. In: IEEE International Conference on Image Processing (ICIP). pp. 1272–1276. IEEE (2019)
  • [30] Zhao, S., Dong, Y., Chang, E.I., Xu, Y., et al.: Recursive cascaded networks for unsupervised medical image registration. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 10600–10610 (2019)