1 Introduction
The purpose of image registration is to align images into a common coordinate space, where further medical image analysis can be conducted, including image-guided intervention, image fusion for treatment decisions, and atlas-based segmentation [15]. In the last few decades, intensity-based registration has received considerable scholarly attention. Commonly used similarity measures comprise intensity-difference and correlation-based methods for intra-modality registration, and information-theoretic metrics for inter-modality registration [11, 19, 23, 24, 15].
Recently, deep learning (DL) techniques have formulated registration as a parameterized mapping function, which has not only made one-shot registration possible but also achieved state-of-the-art accuracy [10, 12, 4, 8]. de Vos et al. [10] computed dense correspondence between two images by optimizing the normalized cross-correlation between intensity pairs. While intensity-based similarity measures are widely used for intra-modality registration, there are circumstances in which no robust metric based solely on image appearance can be applied. Hu et al. [12] therefore resorted to weak labels from corresponding anatomical structures and landmarks to predict voxel-level correspondence. Balakrishnan et al. [4] proposed leveraging both intensity- and segmentation-based metrics as loss functions for network optimization. More recently, Dalca et al. [8] developed a probabilistic generative model and derived a framework that can incorporate both intensity images and anatomical surfaces.
Meanwhile, several studies in the literature have suggested coupling registration with segmentation, in which image registration and tissue classification are performed simultaneously within the same model [2, 21, 6, 27]. However, the search for the optimal solution of these methods usually entails computationally expensive iterations and may suffer from parameter tuning and local optima. A recent study attempted to leverage registration to perform Bayesian segmentation of brain MRI with an unsupervised deep learning framework [9]. Nevertheless, it can still be difficult to apply unsupervised intensity-based approaches to inter-modality registration or to datasets with poor imaging quality and obscure intensity class correspondence. Besides, previous DL-integrated registration methods have mainly focused on pairwise registration and are rarely extended to groupwise registration, i.e. simultaneous registration of multiple images.
In this paper, we consider the scenario in which multiple images from various modalities need to be co-registered simultaneously onto a common coordinate space, which is either set onto a reference subject or implicitly assumed during groupwise registration. To this end, we propose a probabilistic image registration framework based on a multivariate mixture model (MvMM) and neural network estimation, referred to as MvMM-RegNet. The model incorporates both the appearance and the anatomical information associated with each subject, and explicitly models the correlation between them. A neural network is then employed to estimate the likelihood and achieve efficient optimization of the registration parameters. Besides, the framework provides posterior estimation for multi-atlas segmentation (MAS) on novel test images.
The main contribution of this work is fourfold. First, we extend the conventional MvMM to image registration with multiple subjects. Second, a DL-integrated groupwise registration framework is proposed, with a novel loss function derived from the probabilistic graphical model. Third, by modelling the relationship between appearance and anatomical information, our model outperforms previous ones in terms of segmentation accuracy on cardiac medical images. Finally, we investigate two applications of the proposed framework to cardiac image segmentation, i.e. single-atlas segmentation (SAS) via pairwise registration and MAS unified by groupwise registration, and achieve state-of-the-art results on two publicly available datasets.
2 Methods
Groupwise registration aims to align every subject in a population to a common coordinate space [5, 7], referred to as the common space [27]. Assume we have $N$ moving subjects $\mathcal{I} = \{I_i\}_{i=1}^{N}$, each defined on a spatial domain $\Omega_i$. For each subject, we can observe its appearance $A_i$ from medical imaging as well as labels $L_i$ of its anatomical structures, as is the case in various image registration tasks. Thus, we can formulate each subject as a pair $I_i = (A_i, L_i)$ of appearance and anatomical observations.
Associated with the moving subjects is a set of spatial transforms that map points from the common space $\Omega$ to their counterparts in each subject space:

$x_i = \phi_i(x), \quad i = 1, \dots, N,$ (1)

where $x \in \Omega$ and $\phi_i : \Omega \to \Omega_i$. The framework is demonstrated in Fig. 1(a).
Fig. 1. (a) Groupwise registration framework; (b) graphical representation of the proposed generative model, where random variables are in circles, deterministic parameters are in boxes, observed variables are shaded, and plates indicate replication.
2.1 Multivariate mixture model
The proposed method builds on a generative model of the appearance and anatomical information over a population of subjects. The likelihood function is computed as a similarity measure to drive the groupwise registration process.
For spatial coordinates $x$ in the common space, an exemplar atlas can be determined a priori, providing anatomical statistics of the population regardless of their corresponding appearances through medical imaging. For notational convenience, we denote tissue types by the hidden label variable $z_x = k$, where $k \in K$ and $K$ is the set of labels, with its prior distribution defined as $\pi_{kx} = p(z_x = k)$. Assuming independence of each location, the likelihood can be written as $p(\mathcal{I} \mid \{\phi_i\}) = \prod_{x \in \Omega} p(\mathcal{I}(x) \mid \{\phi_i\})$. Moreover, by summing over all states of the hidden variable $z_x$, we have

$p(\mathcal{I}(x) \mid \{\phi_i\}) = \sum_{k \in K} \pi_{kx}\, p(\mathcal{I}(x) \mid z_x = k, \{\phi_i\}).$ (2)
Given the common-space anatomical structures, the multivariate mixture model assumes conditional independence of the moving subjects, namely

$p(\mathcal{I}(x) \mid z_x = k, \{\phi_i\}) = \prod_{i=1}^{N} p(I_i(\mathcal{N}_x) \mid z_x = k, \phi_i),$ (3)

where $I_i(\mathcal{N}_x)$ denotes a patch of observations centred at $\phi_i(x)$. Given the anatomical structures of each subject, one can further assume that its appearance is conditionally independent of the groupwise anatomy, i.e. $p(A_i(\mathcal{N}_x) \mid L_i(\mathcal{N}_x), z_x = k, \phi_i) = p(A_i(\mathcal{N}_x) \mid L_i(\mathcal{N}_x))$. Hence, we can further factorize the conditional probability into

$p(I_i(\mathcal{N}_x) \mid z_x = k, \phi_i) = p(A_i(\mathcal{N}_x) \mid L_i(\mathcal{N}_x))\, p(L_i(\mathcal{N}_x) \mid z_x = k, \phi_i).$ (4)
Accordingly, the log-likelihood is given by

$\mathcal{LL} = \sum_{x \in \Omega} \log \sum_{k \in K} \pi_{kx} \prod_{i=1}^{N} p(A_i(\mathcal{N}_x) \mid L_i(\mathcal{N}_x))\, p(L_i(\mathcal{N}_x) \mid z_x = k, \phi_i).$ (5)
In practice, we minimize the negative log-likelihood as a dissimilarity measure to obtain the desired spatial transforms $\{\phi_i\}$. The graphical representation of the proposed model is shown in Fig. 1(b).
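To make the optimization target concrete, the negative log-likelihood of Eq. 5 can be sketched in NumPy, assuming the per-subject label-consistency and appearance terms have already been evaluated on a flattened voxel grid; the function name and array shapes are illustrative, not taken from the original implementation:

```python
import numpy as np

def mvmm_neg_log_likelihood(prior, label_consistency, appearance_weight, eps=1e-8):
    """Negative log-likelihood of the multivariate mixture model (Eq. 5).

    prior:             (K,) class prior pi_k over the K labels.
    label_consistency: (N, V, K) p(L_i(N_x) | z_x = k, phi_i) per subject/voxel/label.
    appearance_weight: (N, V)    p(A_i(N_x) | L_i(N_x)) per subject/voxel.
    Returns a scalar dissimilarity to minimise w.r.t. the transforms phi_i.
    """
    # Per-subject conditional term: appearance weight times label consistency (Eq. 4).
    cond = appearance_weight[..., None] * label_consistency        # (N, V, K)
    # Conditional independence of subjects given z_x: product over subjects (Eq. 3).
    joint = np.prod(cond, axis=0)                                  # (V, K)
    # Marginalise the hidden label z_x with the class prior (Eq. 2).
    marginal = np.sum(prior[None, :] * joint, axis=1)              # (V,)
    return -np.sum(np.log(marginal + eps))
```

Perfectly consistent labels across subjects yield a lower value than uninformative ones, which is the behaviour the registration objective exploits.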
2.2 The conditional parameterization
In this section, we specify in detail the conditional probability distributions (CPDs) for a joint distribution that factorizes according to the Bayesian network structure represented in Fig. 1(b).
2.2.1 Spatial prior.
One way to define the common-space spatial prior is to average over a cohort of subjects [2], and the resulting probabilistic atlas is used as a reference. To avoid bias from a fixed reference and to consider the population as a whole, we simply adopt a flat prior over the common space, i.e. $\pi_{kx} = \pi_k$, satisfying $\sum_{k \in K} \pi_k = 1$, where $\pi_k$ is the weight that balances each tissue class.
2.2.2 Label consistency.
Spatial alignment of a group of subjects can be measured by their label consistency, defined as the joint distribution of the anatomical information $\{L_i\}_{i=1}^{N}$, conditioned on $z_x$ and $\{\phi_i\}$. Each CPD $p(L_i(\mathcal{N}_x) \mid z_x = k, \phi_i)$ gives the likelihood of the anatomical structure around the subject location $\phi_i(x)$ being labelled as $k$, conditioned on the transform that maps from the common space to the subject space. We model it efficiently by a local Gaussian weighting:

$p(L_i(\mathcal{N}_x) \mid z_x = k, \phi_i) = \sum_{y \in \mathcal{N}(\phi_i(x), r)} w_y\, \delta(L_i(y), k),$ (6)
where $\delta(\cdot,\cdot)$ is the Kronecker delta function, $\mathcal{N}(\phi_i(x), r)$ defines a neighbourhood of radius $r$ around $\phi_i(x)$, and $w_y$ specifies the weight for each voxel within the neighbourhood. This formulation is equivalent to applying Gaussian filtering with an isotropic standard deviation $\sigma$ to the segmentation mask [12], where $\sigma$ is a fixed hyperparameter.
2.2.3 Appearance model.
Finally, we seek to specify the appearance term $p(A_i(\mathcal{N}_x) \mid L_i(\mathcal{N}_x))$. A common approach, adopted by many tissue classification algorithms [17, 2, 13, 27, 9], is to model this CPD as a mixture of Gaussians (MOG), where intensities of the same tissue type are clustered and voxel locations are assumed independent. Nevertheless, we hypothesize that such an appearance model can mislead the registration when the assumption of intensity class correspondence is violated due to poor imaging quality, particularly in cross-modality or contrast-enhanced images [20]. Let $\nabla$ and $\|\cdot\|$ be the voxel-wise gradient and Euclidean-norm operators, respectively. A vanilla approach is to use a mask around the ROI boundaries:

$p(A_i(\mathcal{N}_x) \mid L_i(\mathcal{N}_x)) \propto \mathbb{1}\big(\|\nabla L_i(\mathcal{N}_x)\| > 0\big),$ (7)
which ignores the appearance information. However, we argue that a reasonable CPD design should reflect the fidelity of medical imaging and serve as a voxel-wise weighting factor for likelihood estimation. Thus, we formalize a CPD that 1) is defined on individual subjects, 2) is zero on voxels distant from the ROIs, and 3) takes increasing values in regions where appearance and anatomy have a consistent rate of change. Accordingly, we speculate that voxels with concordant gradient norms between appearance and anatomy contribute more to determining the spatial correspondence. Based on these principles, one can estimate the CPD as a Gibbs distribution computed from an energy function, or negative similarity measure, between the gradient-norm maps of appearance and anatomy, i.e.

$p(A_i(\mathcal{N}_x) \mid L_i(\mathcal{N}_x)) = \frac{1}{Z} \exp\big(-E(\|\nabla A_i(\mathcal{N}_x)\|, \|\nabla L_i(\mathcal{N}_x)\|)\big),$ (8)

where $Z$ is the normalization factor and $E(\cdot,\cdot)$ can be the negative normalized cross-correlation (NCC) [3] or the negative entropy correlation coefficient (ECC) [18]. Fig. 2 visualises the different appearance models.
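As an illustration of the NCC variant of Eq. 8, the following NumPy/SciPy sketch computes a Gibbs-style appearance weight from the local correlation between gradient-norm maps; the Gaussian-window approximation of patch statistics, the Sobel gradients and the parameter values are our own assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def gradient_norm(img):
    """Voxel-wise Euclidean norm of the image gradient (Sobel approximation)."""
    grads = [sobel(img.astype(float), axis=a) for a in range(img.ndim)]
    return np.sqrt(sum(g ** 2 for g in grads))

def appearance_weight(appearance, label_map, sigma=2.0, eps=1e-8):
    """Unnormalised Gibbs weight exp(-E), with E the negative local NCC between
    gradient-norm maps of appearance and anatomy (Eq. 8, up to the factor Z)."""
    ga = gradient_norm(appearance)
    gl = gradient_norm(label_map.astype(float))
    # Local means, covariance and variances via Gaussian windows
    # approximate the patch statistics over N_x.
    mu_a, mu_l = gaussian_filter(ga, sigma), gaussian_filter(gl, sigma)
    cov = gaussian_filter(ga * gl, sigma) - mu_a * mu_l
    var_a = gaussian_filter(ga ** 2, sigma) - mu_a ** 2
    var_l = gaussian_filter(gl ** 2, sigma) - mu_l ** 2
    ncc = cov / np.sqrt(np.clip(var_a * var_l, eps, None))
    # E = -NCC, hence weight = exp(NCC); clip for numerical safety.
    return np.exp(np.clip(ncc, -1.0, 1.0))
```

Voxels whose appearance edges coincide with the anatomical boundary receive a larger weight than voxels where the two gradient maps are unrelated, which is precisely the behaviour principle 3) above asks for.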
2.3 Neural network estimation
We formulate a neural network parameterized by $\theta$ that takes as input a group of images and predicts the deformation fields, based on a 3D U-Net-style architecture designed for image registration [12]. To discourage non-smooth displacements, we resort to bending energy as a deformation regularization term and incorporate it into the loss function [10, 12]. Hence, the final loss function for network optimization becomes
$\mathcal{L}(\theta) = -\mathcal{LL}\big(\{\phi_i^{\theta}\}\big) + \lambda\, \mathcal{R}\big(\{\phi_i^{\theta}\}\big),$ (9)

where $\mathcal{R}$ denotes the deformation regularization term and $\lambda$ is a regularization coefficient.
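The bending-energy regularization term can be approximated by finite differences on the predicted displacement field. A minimal 2D NumPy sketch (our own illustration; the 3D case adds the remaining second-order terms) might look as follows:

```python
import numpy as np

def bending_energy(disp, spacing=1.0):
    """Second-order (bending) energy of a dense 2D displacement field.

    disp: array (H, W, 2) of displacement vectors.
    Sums the squared second derivatives u_xx, u_yy and the mixed term
    2 * u_xy^2 over both displacement components, averaged over voxels.
    """
    energy = 0.0
    for c in range(disp.shape[-1]):
        u = disp[..., c]
        uxx = np.gradient(np.gradient(u, spacing, axis=0), spacing, axis=0)
        uyy = np.gradient(np.gradient(u, spacing, axis=1), spacing, axis=1)
        uxy = np.gradient(np.gradient(u, spacing, axis=0), spacing, axis=1)
        energy += np.mean(uxx ** 2 + uyy ** 2 + 2.0 * uxy ** 2)
    return energy
```

By construction the penalty vanishes for affine displacements, so it discourages local folding and oscillation without penalising global pose, which is why it is a popular smoothness term for registration networks [10, 12].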
2.4 Applications
In this section, we present two applications of the proposed MvMM-RegNet framework, which are validated in our experiments.
2.4.1 Pairwise MvMM-RegNet for SAS.
Pairwise registration can be considered a specialization of groupwise registration in which the number of subjects equals two and one of the spatial transforms is the identity mapping. We demonstrate the registration capacity of our model by performing pairwise registration on a real clinical dataset, and refer to this variant as pMvMM-RegNet.
2.4.2 Groupwise MvMM-RegNet for MAS.
In multi-atlas segmentation (MAS), multiple expert-annotated images with segmented labels, called atlases, are co-registered to a target space, where the warped atlas labels are combined by label fusion [14]. Conveniently, our model provides a unified framework for this procedure through groupwise registration, denoted as gMvMM-RegNet. By setting the common space onto the target as the reference space, we can derive the following segmentation formula:

$\hat{z}_x = \arg\max_{k \in K}\; \pi_{kx} \prod_{i=1}^{N} p(I_i(\mathcal{N}_x) \mid z_x = k, \phi_i).$ (10)

In practice, the MAS result can be generated from repeated groupwise registrations over subsets of the atlases, followed by label fusion using Eq. 10.
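A label-fusion step in the spirit of Eq. 10 can be sketched as follows, with the per-atlas label probabilities obtained by Gaussian filtering of one-hot masks as in Eq. 6; the shapes, names and default $\sigma$ are illustrative assumptions, not the released implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def soft_label_probabilities(seg, num_labels, sigma=1.0, eps=1e-8):
    """Soft per-voxel label probabilities from a hard segmentation, obtained by
    Gaussian-filtering the one-hot mask (the local weighting of Eq. 6)."""
    onehot = np.stack([(seg == k).astype(float) for k in range(num_labels)], axis=-1)
    smooth = np.stack(
        [gaussian_filter(onehot[..., k], sigma) for k in range(num_labels)], axis=-1)
    # Renormalise so each voxel carries a distribution over labels.
    return smooth / np.clip(smooth.sum(axis=-1, keepdims=True), eps, None)

def fuse_labels(prior, atlas_probs, eps=1e-8):
    """Fuse warped atlas label probabilities into a target segmentation (Eq. 10).

    prior:       (K,) class prior pi_k on the common (target) space.
    atlas_probs: (N, ..., K) per-atlas conditional label probabilities after
                 warping the N atlases to the target space.
    """
    # Work in the log domain so the product over many atlases does not underflow.
    log_post = np.log(prior + eps) + np.sum(np.log(atlas_probs + eps), axis=0)
    return np.argmax(log_post, axis=-1)
```

With identical, well-aligned atlases the fused result reproduces the atlas labels; with disagreeing atlases the log-domain sum acts as a probabilistic vote weighted by the prior.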
3 Experiments and Results
In this section, we investigate the two applications of the proposed framework described in Section 2.4. In both experiments, the neural networks were trained on a 2080 Ti GPU, with the spatial transformer module adapted from the open-source VoxelMorph code [4] and implemented in TensorFlow [1]. The Adam optimizer was adopted [16], with a cyclical learning rate bouncing between 1e-5 and 1e-4 to accelerate convergence and avoid shallow local optima [22].
3.1 pMvMM-RegNet for SAS on whole-heart MRI
3.1.1 Materials and baseline.
This experiment was performed on the MM-WHS challenge dataset, which provides 120 multi-modality whole-heart images from multiple sites, including 60 cardiac CT and 60 cardiac MRI scans [26, 25], of which 20 subjects from each modality were selected as training data. Intra-modality (MR-to-MR) and inter-modality (CT-to-MR) inter-subject registration tasks were explored on this dataset, resulting in 800 propagated labels in total for the 40 test MR subjects.
An optimal weighting of the bending energy can lead to a low registration error while maintaining the global smoothness of the deformations. For balance, we set $\lambda$ accordingly as the default regularization strategy (see Fig. 1 in the supplementary material for an empirical result). We analysed different variants of the appearance model described in Section 2.2.3, i.e. "MOG", "Mask", "NCC" and "ECC", and compared them with a reimplementation of [12], denoted "Weakly-Reg", which exploits the Dice similarity metric for weakly-supervised registration. In addition, with the propagated labels obtained from pairwise registrations, we evaluated the performance of MAS by applying a simple majority vote to the results, denoted "MVF-MvMM".
3.1.2 Results and discussion.
Table 1. Dice scores and Hausdorff distances (HD, mm) of the MR-to-MR and CT-to-MR registration tasks, for Weakly-Reg, the baseline variants (MOG, Mask, ECC, NCC) and MVF-MvMM.
Table 1 presents the Dice statistics of both the intra- and inter-modality registration tasks on the MM-WHS dataset. With increasingly plausible modelling of the relationship between appearance and anatomy, we observed better registration accuracy, especially for MR images, indicating the efficacy of the proposed framework. Fusing labels by majority vote ("MVF-MvMM") produces a further gain in segmentation accuracy, with an average Dice score comparable to the inter-observer variability reported in [25] (see Fig. 2 in the supplementary material for evaluation statistics on all cardiac substructures).
3.2 gMvMM-RegNet for MAS on LGE CMR
3.2.1 Materials and baseline.
In this experiment, we explored MAS with the application of Eq. 10 on the MS-CMRSeg challenge dataset [27]. The dataset consists of 45 patients scanned using three CMR sequences, i.e. LGE, T2 and bSSFP, from which 20 patients were chosen at random for training, 5 for validation and 20 for testing. We implemented inter-subject and inter-modality groupwise registration and evaluated the MAS results on the LGE CMR images.
A 2D version of the network architecture described in Section 2.3 was devised to jointly predict the deformation fields of the atlases by optimizing Eq. 9. The MAS result was generated by repeated groupwise registrations over randomly sampled subjects, followed by label fusion using Eq. 10.
3.2.2 Results and discussion.
The comparison between SAS and MAS highlights that groupwise registration generates more accurate and realistic segmentations than pairwise registration, especially for apical and basal slices (see Fig. 3 in the supplementary material for a visualization of the segmentation results). Fig. 3 further reports the mean Dice scores for each cardiac substructure obtained from MAS using repeated groupwise registrations. With a fixed total number of atlases, label fusion on 2D slices resulting from groupwise registration outperforms that from conventional pairwise registration in average myocardium Dice score. However, we also observe a decline in accuracy when a large number of subjects are groupwise registered. This discrepancy could be attributed to the limited network capacity compromising the predicted deformations.
4 Conclusion
In this work, we have proposed a probabilistic image registration framework based on a multivariate mixture model and neural network estimation, coupling groupwise registration and multi-atlas segmentation in a unified fashion. We evaluated two applications of the proposed model, i.e. SAS via pairwise registration and MAS unified by groupwise registration, on two publicly available cardiac image datasets and compared them with state-of-the-art methods. The proposed appearance model, along with the MvMM, has shown its efficacy for registration of cardiac medical images characterized by poor intensity class correspondence. Our method has also demonstrated superiority over conventional pairwise registration algorithms in terms of segmentation accuracy, highlighting the advantage of groupwise registration as a subroutine of MAS.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (grant no. 61971142).
References

[1] (2015) TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv abs/1603.04467.
[2] (2005) Unified segmentation. NeuroImage 26, pp. 839–851.
[3] (2008) Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12(1), pp. 26–41.
[4] (2019) VoxelMorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging 38, pp. 1788–1800.
[5] (2007) Free-form B-spline deformation model for groupwise registration. In: MICCAI Workshop, pp. 23–30.
[6] (2007) Groupwise combined segmentation and registration for atlas construction. In: MICCAI, Part 1, pp. 532–540.
[7] (2007) Similarity metrics for groupwise non-rigid registration. In: MICCAI, Part 2, pp. 544–552.
[8] (2019) Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Medical Image Analysis 57, pp. 226–236.
[9] (2019) Unsupervised deep learning for Bayesian brain MRI segmentation. In: MICCAI.
[10] (2018) A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis 52, pp. 128–143.
[11] (2001) Medical image registration. Physics in Medicine and Biology 46(3), pp. R1–45.
[12] (2018) Weakly-supervised convolutional neural networks for multimodal image registration. Medical Image Analysis 49, pp. 1–13.
[13] (2013) A unified framework for cross-modality multi-atlas segmentation of brain MRI. Medical Image Analysis 17, pp. 1181–1191.
[14] (2014) Multi-atlas segmentation of biomedical images: a survey. Medical Image Analysis 24(1), pp. 205–219.
[15] (2018) An overview on image registration techniques for cardiac diagnosis and treatment. Cardiology Research and Practice.
[16] (2014) Adam: a method for stochastic optimization. arXiv abs/1412.6980.
[17] (1999) Automated model-based tissue classification of MR images of the brain. IEEE Transactions on Medical Imaging 18, pp. 897–908.
[18] (1997) Multi-modality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging 16, pp. 187–198.
[19] (2002) A review of cardiac image registration methods. IEEE Transactions on Medical Imaging 21, pp. 1011–1021.
[20] (2003) Mutual-information-based registration of medical images: a survey. IEEE Transactions on Medical Imaging 22, pp. 986–1004.
[21] (2006) A Bayesian model for joint segmentation and registration. NeuroImage 31, pp. 228–239.
[22] (2015) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472.
[23] (2013) Deformable medical image registration: a survey. IEEE Transactions on Medical Imaging 32, pp. 1153–1190.
[24] (2016) A survey of medical image registration – under review. Medical Image Analysis 33, pp. 140–144.
[25] (2019) Evaluation of algorithms for multi-modality whole heart segmentation: an open-access grand challenge. Medical Image Analysis.
[26] (2016) Multi-scale patch and multi-modality atlases for whole heart segmentation of MRI. Medical Image Analysis 31, pp. 77–87.
[27] (2019) Multivariate mixture model for myocardial segmentation combining multi-source images. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, pp. 2933–2946.