MvMM-RegNet: A new image registration framework based on multivariate mixture model and neural network estimation

06/28/2020 ∙ by Xinzhe Luo, et al.

Current deep-learning-based registration algorithms often exploit intensity-based similarity measures as the loss function, where dense correspondence between a pair of moving and fixed images is optimized through backpropagation during training. However, intensity-based metrics can be misleading when the assumption of intensity class correspondence is violated, especially in cross-modality or contrast-enhanced images. Moreover, existing learning-based registration methods are predominantly applicable to pairwise registration and are rarely extended to groupwise registration or simultaneous registration with multiple images. In this paper, we propose a new image registration framework based on a multivariate mixture model (MvMM) and neural network estimation. A generative model consolidating both appearance and anatomical information is established to derive a novel loss function capable of implementing groupwise registration. We highlight the versatility of the proposed framework for various applications on multimodal cardiac images, including single-atlas-based segmentation (SAS) via pairwise registration and multi-atlas segmentation (MAS) unified by groupwise registration. We evaluated performance on two publicly available datasets, i.e. MM-WHS-2017 and MS-CMRSeg-2019. The results show that the proposed framework achieved an average Dice score of 0.871 ± 0.025 for whole-heart segmentation on MR images and 0.783 ± 0.082 for myocardium segmentation on LGE MR images.




1 Introduction

The purpose of image registration is to align images into a common coordinate space, where further medical image analysis can be conducted, including image-guided intervention, image fusion for treatment decision, and atlas-based segmentation [15]. In the last few decades, intensity-based registration has received considerable scholarly attention. Commonly used similarity measures comprise intensity difference and correlation-based methods for intra-modality registration, and information-theoretic metrics for inter-modality registration [11, 19, 23, 24, 15].

Recently, deep learning (DL) techniques have formulated registration as a parameterized mapping function, which not only made one-shot registration possible but also achieved state-of-the-art accuracy [10, 12, 4, 8]. de Vos et al. [10] computed dense correspondence between two images by optimizing normalized cross-correlation between intensity pairs. While intensity-based similarity measures are widely used for intra-modality registration, there are circumstances in which no robust metric based solely on image appearance can be applied. Hu et al. [12] therefore resorted to weak labels from corresponding anatomical structures and landmarks to predict voxel-level correspondence. Balakrishnan et al. [4] proposed leveraging both intensity- and segmentation-based metrics as loss functions for network optimization. More recently, Dalca et al. [8] developed a probabilistic generative model and derived a framework that could incorporate both intensity images and anatomical surfaces.

Meanwhile, several studies in the literature have suggested coupling registration with segmentation, in which image registration and tissue classification are performed simultaneously within the same model [2, 21, 6, 27]. However, the search for the optimal solution in these methods usually entails computationally expensive iterations and may suffer from parameter-tuning difficulties and local optima. A recent study attempted to leverage registration to perform Bayesian segmentation on brain MRI with an unsupervised deep learning framework [9]. Nevertheless, it can still be difficult to apply unsupervised intensity-based approaches to inter-modality registration or to datasets with poor imaging quality and obscure intensity class correspondence. Besides, previous DL-integrated registration methods have mainly focused on pairwise registration and are rarely extended to groupwise registration or simultaneous registration with multiple images.

In this paper, we consider the scenario in which multiple images from various modalities need to be co-registered simultaneously onto a common coordinate space, which is set onto a reference subject or can be implicitly assumed during groupwise registration. To this end, we propose a probabilistic image registration framework based on both multivariate mixture model (MvMM) and neural network estimation, referred to as MvMM-RegNet. The model incorporates both types of information from the appearance and anatomy associated with each image subject, and explicitly models the correlation between them. A neural network is then employed to estimate likelihood and achieve efficient optimization of registration parameters. Besides, the framework provides posterior estimation for MAS on novel test images.

The main contribution of this work is four-fold. First, we extend the conventional MvMM for image registration with multiple subjects. Second, a DL-integrated groupwise registration framework is proposed, with a novel loss function derived from the probabilistic graphical model. Third, by modelling the relationship between appearance and anatomical information, our model outperforms previous ones in terms of segmentation accuracy on cardiac medical images. Finally, we investigate two applications of the proposed framework on cardiac image segmentation, i.e. SAS via pairwise registration and MAS unified by groupwise registration, and achieve state-of-the-art results on two publicly available datasets.

2 Methods

Groupwise registration aims to align every subject in a population to a common coordinate space $\Omega$ [5, 7], referred to as the common space [27]. Assume we have $N_I$ moving subjects $\mathcal{I} = \{I_i\}_{i=1}^{N_I}$, each of which is defined on a spatial domain $\Omega_i$. For each subject $I_i$, we can observe its appearance $A_i$ from medical imaging as well as labels of anatomical structures $S_i$ in various cases for image registration tasks. Thus, we can formulate $I_i = (A_i, S_i)$ as a pair of appearance and anatomical observations for each subject.

Associated with the moving subjects is a set of spatial transforms $\phi$ that map points from the common space to counterparts in each subject space:

$$\phi = \{\phi_i : y_i = \phi_i(x),\ i = 1, \dots, N_I\}, \tag{1}$$

where $x \in \Omega$, $y_i \in \Omega_i$. The framework is demonstrated in Fig. 1(a).

Figure 1: (a) Groupwise registration framework; (b) graphical representation of the proposed generative model, where random variables are in circles, deterministic parameters are in boxes, observed variables are shaded and plates indicate replication.

2.1 Multivariate mixture model

The proposed method builds on a generative model of the appearance and anatomical information over a population of subjects. The likelihood function is computed as a similarity measure to drive the groupwise registration process.

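For spatial coordinates in the common space, an exemplar atlas can be determined a priori, providing anatomical statistics of the population regardless of their corresponding appearances through medical imaging. For notational convenience, we denote tissue types using label values $k_x$, where $k \in K$ and $K$ is the set of labels, with its prior distribution defined as $\pi_{k_x} = p(k_x)$. Assuming independence of each location, the likelihood can be written as $L(\phi \mid \mathcal{I}) = \prod_{x \in \Omega} p(I_x \mid \phi)$. Moreover, by summing over all states of the hidden variable $k_x$, we have

$$L(\phi \mid \mathcal{I}) = \prod_{x \in \Omega} \sum_{k \in K} \pi_{k_x}\, p(I_x \mid k_x, \phi). \tag{2}$$

Given the common-space anatomical structures, the multivariate mixture model assumes conditional independence of the moving subjects, namely

$$p(I_x \mid k_x, \phi) = \prod_{i=1}^{N_I} p(I_{i,x} \mid k_x, \phi_i) = \prod_{i=1}^{N_I} p(I_{i,y_i} \mid k_x), \tag{3}$$

where $I_{i,y_i}$ denotes a patch of observations centred at $y_i = \phi_i(x)$. Given the anatomical structures of each subject, one can further assume that its appearance is conditionally independent of the groupwise anatomy, i.e. $p(A_{i,y_i} \mid S_{i,y_i}, k_x) = p(A_{i,y_i} \mid S_{i,y_i})$. Hence, we can further factorize the conditional probability into

$$p(I_{i,y_i} \mid k_x) = p(A_{i,y_i} \mid S_{i,y_i})\, p(S_{i,y_i} \mid k_x). \tag{4}$$

Accordingly, the log-likelihood is given by

$$l(\phi \mid \mathcal{I}) = \sum_{x \in \Omega} \log \Big\{ \sum_{k \in K} \pi_{k_x} \prod_{i=1}^{N_I} p(A_{i,y_i} \mid S_{i,y_i})\, p(S_{i,y_i} \mid k_x) \Big\}. \tag{5}$$

In practice, we optimize the negative log-likelihood as a dissimilarity measure to obtain the desired spatial transforms $\hat{\phi}$. The graphical representation of the proposed model is shown in Fig. 1(b).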
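The negative log-likelihood described above can be illustrated with a minimal NumPy sketch. It assumes that the per-subject label and appearance terms have already been resampled into the common space; the function name, argument names and array layout are hypothetical choices made for this illustration and are not taken from the authors' implementation.

```python
import numpy as np


def mvmm_negative_log_likelihood(prior, label_cpd, appearance_cpd, eps=1e-12):
    """Negative log-likelihood of a multivariate mixture over a common space.

    prior:          (K,) class weights (assumed layout).
    label_cpd:      (N, K, *spatial) label term per subject and class,
                    assumed to be already resampled into the common space.
    appearance_cpd: (N, *spatial) appearance term per subject.
    """
    prior = np.asarray(prior, dtype=float)
    label_cpd = np.asarray(label_cpd, dtype=float)
    appearance_cpd = np.asarray(appearance_cpd, dtype=float)

    # Per-subject, per-class likelihood: appearance term times label term.
    per_subject = appearance_cpd[:, None, ...] * label_cpd      # (N, K, *spatial)
    # Subjects are conditionally independent given the common-space label.
    per_class = np.prod(per_subject, axis=0)                    # (K, *spatial)
    # Marginalise the hidden label using the prior weights.
    weights = prior.reshape((-1,) + (1,) * (per_class.ndim - 1))
    mixture = np.sum(weights * per_class, axis=0)               # (*spatial)
    # Sum of log-likelihoods over locations, negated to give a dissimilarity.
    return -np.sum(np.log(mixture + eps))


# Toy check: two subjects, two classes, a single common-space location.
aligned = np.array([[[1.0], [0.0]], [[1.0], [0.0]]])       # both subjects agree
misaligned = np.array([[[1.0], [0.0]], [[0.0], [1.0]]])    # subjects disagree
appearance = np.ones((2, 1))
nll_aligned = mvmm_negative_log_likelihood([0.5, 0.5], aligned, appearance)
nll_misaligned = mvmm_negative_log_likelihood([0.5, 0.5], misaligned, appearance)
```

In this toy case the agreeing subjects give a lower value than the disagreeing ones, which is the behaviour a dissimilarity measure for registration needs.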

2.2 The conditional parameterization

In this section, we specify in detail the conditional probability distributions (CPDs) for a joint distribution that factorizes according to the Bayesian network structure represented in Fig. 1(b).

2.2.1 Spatial prior.

One way to define the common-space spatial prior is to average over a cohort of subjects [2], and the resulting probabilistic atlas is used as a reference. To avoid bias from a fixed reference and consider the population as a whole, we simply adopt a flat prior over the common space, i.e. $\pi_{k_x} = c_k$ for all $x \in \Omega$, satisfying $\sum_{k \in K} c_k = 1$, where $c_k$ is the weight to balance each tissue class.

2.2.2 Label consistency.

Spatial alignment of a group of subjects can be measured by their label consistency, defined as the joint distribution of the anatomical information $p(S_y \mid k_x)$, where $S_y = \{S_{i,y_i}\}_{i=1}^{N_I}$. Each CPD $p(S_{i,y_i} \mid k_x)$ gives the likelihood of the anatomical structure around a subject location being labelled as $k_x$, conditioned on the transform that maps from the common space to each subject space. We model it efficiently by a local Gaussian weighting:

$$p(S_{i,y_i} \mid k_x) = \sum_{z \in \mathcal{N}_{y_i}} w_z \cdot \delta\big(S_i(z) = k\big), \tag{6}$$

where $\delta(\cdot)$ is the Kronecker delta function, $\mathcal{N}_{y_i}$ defines a neighbourhood around $y_i$ of radius $r_s$, and $w_z$ specifies the weight for each voxel within the neighbourhood. This formulation is equivalent to applying Gaussian filtering with an isotropic standard deviation $\sigma_s$ to the segmentation mask [12].
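The equivalence with Gaussian filtering of the segmentation mask can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the kernel is truncated at three standard deviations, borders are edge-padded, and the function names are hypothetical. None of these choices is confirmed by the text.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Normalised, truncated 1D Gaussian weights."""
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    return weights / weights.sum()


def smooth_axis(volume, kernel, axis):
    """Convolve along one axis, edge-padding so the shape is preserved."""
    radius = len(kernel) // 2
    pad = [(0, 0)] * volume.ndim
    pad[axis] = (radius, radius)
    padded = np.pad(volume, pad, mode="edge")
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
    )


def label_consistency(seg, num_classes, sigma=1.0, radius=None):
    """Gaussian-smoothed one-hot label maps of shape (num_classes, *seg.shape)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumed truncation radius
    kernel = gaussian_kernel_1d(sigma, radius)
    one_hot = np.stack([(seg == k).astype(float) for k in range(num_classes)])
    smoothed = one_hot
    # Separable filtering: smooth every spatial axis in turn (axis 0 is the class).
    for axis in range(1, one_hot.ndim):
        smoothed = smooth_axis(smoothed, kernel, axis)
    return smoothed


seg = np.zeros((6, 6), dtype=int)
seg[2:4, 2:4] = 1
probs = label_consistency(seg, num_classes=2, sigma=1.0)
```

Because the kernel is normalised and the one-hot channels sum to one, the smoothed channels still sum to one at every voxel, so each voxel carries a soft label distribution that decays smoothly away from structure boundaries.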

2.2.3 Appearance model.

Finally, we seek to specify the term $p(A_{i,y_i} \mid S_{i,y_i})$. A common approach adopted by many tissue classification algorithms [17, 2, 13, 27, 9] is to model this CPD as a mixture of Gaussians (MOG), where intensities of the same tissue type should be clustered and voxel locations are assumed independent. Nevertheless, we hypothesize that using such an appearance model can mislead the image registration when the assumption of intensity class correspondence is violated due to poor imaging quality, particularly in cross-modality or contrast-enhanced images [20]. Let $\nabla(\cdot)$ and $\|\cdot\|$ be the voxel-wise gradient and Euclidean-norm operators, respectively. A vanilla approach is to use a mask around the ROI boundaries:

$$p(A_{i,y_i} \mid S_{i,y_i}) = \begin{cases} 1 & \text{if } \exists\, z \in \mathcal{N}_{y_i} \text{ s.t. } \|\nabla S_i(z)\| > 0, \\ 0 & \text{otherwise}, \end{cases} \tag{7}$$

which ignores the appearance information. However, we argue that a reasonable CPD design should reflect the fidelity of medical imaging and serve as a voxel-wise weighting factor for likelihood estimation. Thus, we formalize a CPD that 1) is defined for individual subjects, 2) is zero on voxels distant from the ROIs, and 3) takes larger values in regions where appearance and anatomy have consistent rates of change. In other words, we speculate that voxels with concordant gradient norms between appearance and anatomy contribute more to determining the spatial correspondence. Based on these principles, one can estimate the CPD as a Gibbs distribution computed from an energy function, or negative similarity measure, between gradient-norm maps of appearance and anatomy, i.e.

$$p(A_{i,y_i} \mid S_{i,y_i}) = \frac{1}{Z} \exp\big\{ -E\big(\|\nabla A_{i,y_i}\|, \|\nabla S_{i,y_i}\|\big) \big\}, \tag{8}$$

where $Z$ is the normalization factor and $E(\cdot)$ can be the negative normalized cross-correlation (NCC) [3] or the negative entropy correlation coefficient (ECC) [18]. Fig. 2 visualises the different appearance models.
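A rough 2D sketch of the NCC variant of this Gibbs-style weighting is given below. Several details are assumptions made only for illustration: the anatomy is a binary mask, the NCC is computed over square patches, the weight is explicitly zeroed where no mask boundary falls inside the patch, and the normalization factor is replaced by division by the maximum so that values lie in [0, 1]. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gradient_norm(image):
    """Euclidean norm of the voxel-wise gradient of a 2D array."""
    grads = np.gradient(np.asarray(image, dtype=float))
    return np.sqrt(sum(g ** 2 for g in grads))


def patches(array, radius):
    """All (2r+1) x (2r+1) patches of an edge-padded 2D array."""
    size = 2 * radius + 1
    return sliding_window_view(np.pad(array, radius, mode="edge"), (size, size))


def local_ncc(a, b, radius, eps=1e-8):
    """Patch-wise normalized cross-correlation between two 2D maps."""
    pa, pb = patches(a, radius), patches(b, radius)
    pa = pa - pa.mean(axis=(-2, -1), keepdims=True)
    pb = pb - pb.mean(axis=(-2, -1), keepdims=True)
    num = (pa * pb).sum(axis=(-2, -1))
    den = np.sqrt((pa ** 2).sum(axis=(-2, -1)) * (pb ** 2).sum(axis=(-2, -1)))
    return num / (den + eps)


def gibbs_appearance_weight(image, mask, radius=2):
    """exp(-energy) with energy = -NCC of the gradient-norm maps, scaled to [0, 1]."""
    grad_a, grad_s = gradient_norm(image), gradient_norm(mask)
    energy = -local_ncc(grad_a, grad_s, radius)
    weight = np.exp(-energy)
    # Assumed masking step: zero weight where no ROI boundary lies in the patch.
    near_roi = patches(grad_s, radius).max(axis=(-2, -1)) > 0
    weight = np.where(near_roi, weight, 0.0)
    peak = weight.max()
    return weight / peak if peak > 0 else weight


mask = np.zeros((8, 8))
mask[:, 4:] = 1.0
image = 100.0 * mask  # toy image whose only edge coincides with the mask boundary
weight = gibbs_appearance_weight(image, mask, radius=1)
```

In the toy example the image edge coincides with the mask boundary, so the weight peaks along that boundary and vanishes in columns far from it.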

Figure 2: Visualization of different appearance models computed from a coronal view of a whole heart MR image subject at background areas, where "Mask", "MOG", "NCC" and "ECC" denote appearance models using the ROI mask, mixture of Gaussians, normalized cross-correlation and entropy correlation coefficient, respectively. For comparison, values are normalized to the interval between 0 and 1.

2.3 Neural network estimation

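We formulate a neural network $g_\theta(\cdot)$ parameterized by $\theta$ that takes as input a group of $N_I$ images to predict the deformation fields, based on a 3D UNet-style architecture designed for image registration [12]. To discourage non-smooth displacement, we resort to bending energy as a deformation regularization term and incorporate it into the loss function [10, 12]. Hence, the final loss function for network optimization becomes

$$\mathcal{L}(\theta; \mathcal{I}) = -l(\phi_\theta \mid \mathcal{I}) + \lambda \cdot \mathcal{R}(\phi_\theta), \tag{9}$$

where $l(\phi_\theta \mid \mathcal{I})$ is the log-likelihood of Section 2.1 evaluated at the predicted transforms $\phi_\theta$, $\mathcal{R}(\phi_\theta)$ denotes the deformation regularization term and $\lambda$ is a regularization coefficient.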
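The structure of this loss, a negative log-likelihood plus a weighted bending-energy penalty, can be sketched in NumPy. The sketch below is a 2D finite-difference approximation for readability, whereas the network in the text is 3D; the function names and the `reg_weight` argument are hypothetical, and the exact discretisation used by the authors is not stated.

```python
import numpy as np


def bending_energy(displacement, spacing=1.0):
    """Mean squared second derivatives of a 2D displacement field.

    displacement: array of shape (2, H, W), one channel per displacement component.
    """
    total = 0.0
    for component in displacement:
        d_row, d_col = np.gradient(component, spacing)
        d_rr, d_rc = np.gradient(d_row, spacing)
        _, d_cc = np.gradient(d_col, spacing)
        # Standard bending-energy integrand: u_rr^2 + 2 u_rc^2 + u_cc^2.
        total += np.mean(d_rr ** 2 + 2.0 * d_rc ** 2 + d_cc ** 2)
    return total


def registration_loss(neg_log_likelihood, displacement, reg_weight):
    """Dissimilarity term plus weighted smoothness penalty."""
    return neg_log_likelihood + reg_weight * bending_energy(displacement)


rows, cols = np.meshgrid(np.arange(8.0), np.arange(8.0), indexing="ij")
affine = np.stack([0.1 * rows + 0.2 * cols, -0.3 * rows])    # no curvature
curved = np.stack([0.05 * rows ** 2, np.zeros_like(rows)])   # curved field
```

An affine displacement has zero second derivatives and therefore incurs no penalty, which is why bending energy discourages local irregularity without penalising global linear motion.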

2.4 Applications

In this section, we present two applications of the proposed MvMM-RegNet framework, which are validated in our experiments.

2.4.1 Pairwise MvMM-RegNet for SAS.

Pairwise registration can be considered as a specialization of groupwise registration where the number of subjects equals two and one of the spatial coordinate transforms is the identity mapping. We refer to this pairwise variant as pMvMM-RegNet and demonstrate its registration capacity on a real clinical dataset.

2.4.2 Groupwise MvMM-RegNet for MAS.

During multi-atlas segmentation (MAS), multiple expert-annotated images with segmented labels, called atlases, are co-registered to a target space, where the warped atlas labels are combined by label fusion [14]. Our model provides a unified framework for this procedure through groupwise registration, denoted as gMvMM-RegNet. By setting the common space onto the target as the reference space, we can derive the following segmentation formula:

$$\hat{S}_T(x) = \arg\max_{k \in K}\, p(k_x \mid I_x, \phi) = \arg\max_{k \in K} \Big\{ \pi_{k_x} \prod_{i=1}^{N_I} p(I_{i,x} \mid k_x, \phi_i) \Big\}. \tag{10}$$

In practice, the MAS result with $N \times t$ atlases can be generated from $t$ rounds of groupwise registration over $(N+1)$ subjects followed by label fusion using Eq. 10.
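The label-fusion step can be sketched as a maximum a posteriori decision per voxel. The NumPy sketch below assumes the per-atlas, per-class likelihood maps have already been warped to the target space and works in the log domain for numerical stability; the function name and array layout are hypothetical.

```python
import numpy as np


def fuse_labels(prior, likelihoods, eps=1e-12):
    """Per-voxel MAP label from warped atlas likelihoods.

    prior:       (K,) class weights.
    likelihoods: (N, K, *spatial) per-atlas, per-class likelihood maps,
                 assumed to be already warped into the target space.
    """
    prior = np.asarray(prior, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    spatial_ndim = likelihoods.ndim - 2
    log_prior = np.log(prior + eps).reshape((-1,) + (1,) * spatial_ndim)
    # A product over atlases becomes a sum of logs; the argmax is unchanged.
    log_posterior = log_prior + np.log(likelihoods + eps).sum(axis=0)
    return np.argmax(log_posterior, axis=0)


# Three atlases, two classes, two voxels.
lik = np.empty((3, 2, 2))
lik[:, 0, 0], lik[:, 1, 0] = 0.9, 0.1   # voxel 0 favours class 0
lik[:, 0, 1], lik[:, 1, 1] = 0.2, 0.8   # voxel 1 favours class 1
fused = fuse_labels([0.5, 0.5], lik)
```

Results from several rounds of groupwise registration could be fused the same way by stacking their likelihood maps along the first axis before calling the function.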

3 Experiments and Results

In this section, we investigate two applications of the proposed framework described in Section 2.4. In both experiments, the neural networks were trained on an NVIDIA RTX 2080 Ti GPU, with the spatial transformer module adapted from the open-source VoxelMorph code [4] and implemented in TensorFlow [1]. The Adam optimizer was adopted [16], with a cyclical learning rate bouncing between 1e-5 and 1e-4 to accelerate convergence and avoid shallow local optima [22].
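A cyclical learning rate of this kind can be sketched with the triangular policy from [22]. Only the bounds 1e-5 and 1e-4 come from the text; the choice of the triangular variant and the `step_size` of 1000 iterations are assumptions for illustration.

```python
import math


def triangular_lr(iteration, base_lr=1e-5, max_lr=1e-4, step_size=1000):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size iterations,
    then falls back over the next step_size iterations, and repeats.
    step_size is a hypothetical value, not taken from the paper.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


schedule = [triangular_lr(it) for it in (0, 500, 1000, 2000)]
```

The resulting value would be passed to the optimizer at every training step, for instance through a learning-rate callback.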

3.1 pMvMM-RegNet for SAS on whole heart MRI

3.1.1 Materials and baseline.

This experiment was performed on the MM-WHS challenge dataset, which provides 120 multi-modality whole-heart images from multiple sites, including 60 cardiac CT and 60 cardiac MRI [26, 25], of which 20 subjects from each modality were selected as training data. Both intra-modality (MR-to-MR) and inter-modality (CT-to-MR) inter-subject registration tasks were explored on this dataset, resulting in 800 propagated labels in total for 40 test MR subjects.

An optimal weighting of the bending energy can lead to a low registration error while maintaining the global smoothness of the deformations. To balance the two, a fixed regularization coefficient was used as the default regularization strategy (see Fig. 1 in the supplementary material for an empirical result). We analysed different variants of the appearance model described in Section 2.2.3, i.e. "MOG", "Mask", "NCC" and "ECC", and compared them with a reimplementation of [12], known as "WeaklyReg", which exploits the Dice similarity metric for weakly-supervised registration. In addition, with the propagated labels obtained from pairwise registrations, we evaluated the performance of MAS by applying a simple majority vote to the results, denoted as "MVF-MvMM".
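Majority-vote fusion of propagated labels can be sketched in a few lines of NumPy. The function name is hypothetical, and ties are resolved towards the lowest label index, which is an assumption rather than something stated in the text.

```python
import numpy as np


def majority_vote(label_maps, num_classes):
    """Fuse propagated label maps of shape (M, *spatial) by per-voxel majority vote."""
    label_maps = np.asarray(label_maps)
    # Count, for every class, how many propagated maps voted for it at each voxel.
    votes = np.stack([(label_maps == k).sum(axis=0) for k in range(num_classes)])
    # argmax returns the first maximum, so ties go to the lowest label index.
    return np.argmax(votes, axis=0)


propagated = [[0, 1, 2], [0, 1, 1], [1, 1, 2]]
fused_vote = majority_vote(propagated, num_classes=3)
```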

3.1.2 Results and discussion.

Methods MR-to-MR CT-to-MR
Dice HD (mm) Dice HD (mm)
Baseline-MOG *
Baseline-Mask * * *
Baseline-ECC * *
Baseline-NCC *
MVF-MvMM Dice= HD (mm)=
Table 1: Average substructure Dice and Hausdorff distance (HD) of MR-to-MR and CT-to-MR inter-subject registration, with * indicating statistically significant improvement given by a Wilcoxon signed-rank test.

Table 1 presents the Dice statistics of both intra- and inter-modality registration tasks on the MM-WHS dataset. With increasingly plausible modelling of the relationship between appearance and anatomy, we observed better registration accuracy, especially for MR images, indicating the efficacy of the proposed framework. Fusing labels by majority vote ("MVF-MvMM") produces better segmentation accuracy, reaching an average Dice score of 0.871 ± 0.025 (see Fig. 2 in the supplementary material for evaluation statistics on all cardiac substructures), comparable to the inter-observer variability reported in [25].
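For reference, the Dice score averaged over substructures can be computed as in the following NumPy sketch. The function names are hypothetical, and returning NaN for a label absent from both maps is an illustrative convention, not the challenge's official evaluation protocol.

```python
import numpy as np


def dice_score(pred, target, label):
    """Dice overlap of one label between a predicted and a reference label map."""
    p = np.asarray(pred) == label
    t = np.asarray(target) == label
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")  # label absent from both maps
    return 2.0 * np.logical_and(p, t).sum() / denom


def mean_substructure_dice(pred, target, labels):
    """Average Dice over the given substructure labels, ignoring absent ones."""
    return float(np.nanmean([dice_score(pred, target, k) for k in labels]))


pred = [0, 1, 1, 2]
target = [0, 1, 2, 2]
mean_dice = mean_substructure_dice(pred, target, labels=[1, 2])
```

In this toy example each of the two labels overlaps on one voxel out of three labelled in total, so both per-label scores equal 2/3.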

3.2 gMvMM-RegNet for MAS on LGE CMR

3.2.1 Materials and baseline.

In this experiment, we explored MAS with the application of Eq. 10 on the MS-CMRSeg challenge dataset [27]. The dataset consists of 45 patients scanned using three CMR sequences, i.e. LGE, T2 and bSSFP, from which 20 patients were chosen at random for training, 5 for validation and 20 for testing. We implemented inter-subject and inter-modality groupwise registration and evaluated the MAS results on LGE CMR images.

A 2D version of the network architecture described in Section 2.3 was devised to jointly predict the deformation fields for $N$ atlases by optimizing Eq. 9. The MAS result was generated by $t$ rounds of groupwise registration over $N$ randomly sampled subjects followed by label fusion using Eq. 10.

3.2.2 Results and discussion.

Figure 3: Dice scores of MAS results using $N \times t$ atlases, where $N$ denotes the number of subjects used in each groupwise registration and $t$ counts the number of groupwise registrations performed before label fusion.

The comparison between SAS and MAS highlights that groupwise registration generates more accurate and realistic segmentation than pairwise registration, especially for apical and basal slices (see Fig. 3 in the supplementary material for visualization of the segmentation results). Fig. 3 further reports the mean Dice scores for each cardiac substructure obtained from MAS using $t$ rounds of groupwise registration with $N$ subjects. With a fixed total number of atlases, label fusion on 2D slices resulting from groupwise registration outperforms that from conventional pairwise registration, reaching an average myocardium Dice score of 0.783 ± 0.082. However, we also observe a decline in accuracy when a large number of subjects is groupwise registered. This decline could be attributed to the limited number of network parameters compromising the predicted deformations.

4 Conclusion

In this work, we propose a probabilistic image registration framework based on a multivariate mixture model and neural network estimation, coupling groupwise registration and multi-atlas segmentation in a unified fashion. We have evaluated two applications of the proposed model, i.e. SAS via pairwise registration and MAS unified by groupwise registration, on two publicly available cardiac image datasets and compared them with state-of-the-art methods. The proposed appearance model, together with the MvMM, has shown its efficacy in registering cardiac medical images characterized by poor intensity class correspondence. Our method has also shown superiority over conventional pairwise registration algorithms in terms of segmentation accuracy, highlighting the advantage of groupwise registration as a subroutine of MAS.


This work was supported by the National Natural Science Foundation of China (grant no. 61971142).


  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Józefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. G. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker, V. Vanhoucke, V. Vasudevan, F. B. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng (2015) TensorFlow: large-scale machine learning on heterogeneous distributed systems. ArXiv abs/1603.04467. Cited by: §3.
  • [2] J. Ashburner and K. J. Friston (2005) Unified segmentation. NeuroImage 26, pp. 839–851. Cited by: §1, §2.2.1, §2.2.3.
  • [3] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee (2008) Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis 12 1, pp. 26–41. Cited by: §2.2.3.
  • [4] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. V. Guttag, and A. V. Dalca (2019) VoxelMorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging 38, pp. 1788–1800. Cited by: §1, §3.
  • [5] S. K. Balci, P. Golland, M. E. Shenton, and W. M. Wells (2007) Free-form b-spline deformation model for groupwise registration.. Medical image computing and computer-assisted intervention : MICCAI … International Conference on Medical Image Computing and Computer-Assisted Intervention 10 WS, pp. 23–30. Cited by: §2.
  • [6] K. K. Bhatia, P. Aljabar, J. P. Boardman, L. Srinivasan, M. Murgasova, S. J. Counsell, M. A. Rutherford, J. V. Hajnal, A. D. Edwards, and D. Rueckert (2007) Groupwise combined segmentation and registration for atlas construction. Medical image computing and computer-assisted intervention : MICCAI … International Conference on Medical Image Computing and Computer-Assisted Intervention 10 Pt 1, pp. 532–40. Cited by: §1.
  • [7] K. K. Bhatia, J. V. Hajnal, A. Hammers, and D. Rueckert (2007) Similarity metrics for groupwise non-rigid registration. Medical image computing and computer-assisted intervention : MICCAI … International Conference on Medical Image Computing and Computer-Assisted Intervention 10 Pt 2, pp. 544–52. Cited by: §2.
  • [8] A. V. Dalca, G. Balakrishnan, J. V. Guttag, and M. R. Sabuncu (2019) Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Medical image analysis 57, pp. 226–236. Cited by: §1.
  • [9] A. V. Dalca, E. M. Yu, P. Golland, B. Fischl, M. R. Sabuncu, and J. E. Iglesias (2019) Unsupervised deep learning for bayesian brain mri segmentation. In MICCAI, Cited by: §1, §2.2.3.
  • [10] B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Išgum (2018) A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, pp. 128–143. Cited by: §1, §2.3.
  • [11] D. L. G. Hill, P. G. Batchelor, M. Holden, and D. J. Hawkes (2001) Medical image registration. Physics in medicine and biology 46 3, pp. R1–45. Cited by: §1.
  • [12] Y. Hu, M. Modat, E. Gibson, W. Li, N. Ghavami, E. Bonmati, G. Wang, S. Bandula, C. M. Moore, M. Emberton, S. Ourselin, J. A. Noble, D. C. Barratt, and T. Vercauteren (2018) Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis 49, pp. 1–13. Cited by: §1, §2.2.2, §2.3, §3.1.1.
  • [13] J. E. Iglesias, M. R. Sabuncu, and K. V. Leemput (2013) A unified framework for cross-modality multi-atlas segmentation of brain mri. Medical Image Analysis 17, pp. 1181–1191. Cited by: §2.2.3.
  • [14] J. E. Iglesias and M. R. Sabuncu (2014) Multi-atlas segmentation of biomedical images: a survey. Medical image analysis 24 1, pp. 205–219. Cited by: §2.4.2.
  • [15] A. Khalil, S. Ng, Y. M. Liew, and K. W. Lai (2018) An overview on image registration techniques for cardiac diagnosis and treatment. Cardiology research and practice. Cited by: §1.
  • [16] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. CoRR abs/1412.6980. Cited by: §3.
  • [17] K. V. Leemput, F. Maes, D. Vandermeulen, and P. Suetens (1999) Automated model-based tissue classification of mr images of the brain. IEEE Transactions on Medical Imaging 18, pp. 897–908. Cited by: §2.2.3.
  • [18] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens (1997) Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging 16, pp. 187–198. Cited by: §2.2.3.
  • [19] T. Mäkelä, P. Clarysse, O. Sipilä, N. Pauna, Q. Pham, T. Katila, and I. E. Magnin (2002) A review of cardiac image registration methods. IEEE Transactions on Medical Imaging 21, pp. 1011–1021. Cited by: §1.
  • [20] J. P. W. Pluim, J. B. A. Maintz, and M. A. Viergever (2003) Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22, pp. 986–1004. Cited by: §2.2.3.
  • [21] K. M. Pohl, J. W. Fisher, W. E. L. Grimson, R. Kikinis, and W. M. Wells (2006) A bayesian model for joint segmentation and registration. NeuroImage 31, pp. 228–239. Cited by: §1.
  • [22] L. N. Smith (2015) Cyclical learning rates for training neural networks. 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. Cited by: §3.
  • [23] A. Sotiras, C. Davatzikos, and N. Paragios (2013) Deformable medical image registration: a survey. IEEE Transactions on Medical Imaging 32, pp. 1153–1190. Cited by: §1.
  • [24] M. A. Viergever, J. B. A. Maintz, S. Klein, K. Murphy, M. Staring, and J. P. W. Pluim (2016) A survey of medical image registration - under review. Medical image analysis 33, pp. 140–144. Cited by: §1.
  • [25] X. Zhuang, L. Li, C. Payer, D. Štern, M. Urschler, M. P. Heinrich, J. Oster, C. Wang, Ö. Smedby, C. Bian, X. Yang, P. Heng, A. Mortazi, U. Bagci, G. Yang, C. Sun, G. Galisot, J. Ramel, T. Brouard, Q. Tong, W. Si, X. Liao, G. Zeng, Z. Shi, G. Zheng, C. Wang, T. MacGillivray, D. E. Newby, K. S. Rhode, S. Ourselin, R. Mohiaddin, J. Keegan, D. N. Firmin, and G. Yang (2019) Evaluation of algorithms for multi-modality whole heart segmentation: an open-access grand challenge. In Medical Image Analysis, Cited by: §3.1.1, §3.1.2.
  • [26] X. Zhuang and J. Shen (2016) Multi-scale patch and multi-modality atlases for whole heart segmentation of mri. Medical image analysis 31, pp. 77–87. Cited by: §3.1.1.
  • [27] X. Zhuang (2019) Multivariate mixture model for myocardial segmentation combining multi-source images. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, pp. 2933–2946. Cited by: §1, §2.2.3, §2, §3.2.1.