Image registration is a key element of medical image analysis. Most state-of-the-art registration algorithms, such as ANTs [1], use geometric methods that are guaranteed to produce smooth, invertible deformations, which are highly desirable in medical image registration. Over the last couple of years, a revolution has been taking place in the application of machine learning methods, especially convolutional neural networks, to this problem. While recent registration networks can predict the nonlinear transformation much faster and obtain registration accuracy comparable to traditional methods, they usually offer no theoretical guarantees on the smoothness and invertibility of their predicted deformations.
Supervised methods, such as [8, 11, 14], learn from known reference deformations for the training data: either actual “ground truth” in the case of synthetic image pairs, or deformations computed by other automatic or semiautomatic methods. They usually do not have problems with smoothness, but they still rely on other tools such as ANTs being run beforehand to produce the desired transformations. The registration problem is much harder in the unsupervised setting. Most unsupervised approaches, such as [15, 7, 12, 10, 2], adopt the idea of the spatial transformer (ST) [4]. The spatial transformer used in registration usually consists of two basic functional units: a deformation unit and a sampling unit. With the input $S$ (source image) and $T$ (target image) stacked as an ordered pair, the deformation unit produces a static displacement field $u$. The warped image $\hat{T}$ is then constructed in the sampling unit by sampling the source image $S$ with $u$ via $\hat{T} = S \circ (\mathrm{Id} + u)$, where $\mathrm{Id}$ is the identity map. In summary, the right action of a diffeomorphism $\phi$ on the image $S$ is approximated by $S \circ (\mathrm{Id} + u)$. The smoothness constraint on $u$ is usually addressed by regularizing its derivative $Du$. The work [2] is one representative, and Figure 1 shows the workflow of the idea introduced above. The whole network is trained to minimize the loss $\mathcal{L} = \mathcal{L}_{cc}(T, \hat{T}) + \lambda \lVert Du \rVert$, where $\mathcal{L}_{cc}$ stands for the cross correlation loss and $\lambda$ is a hyperparameter controlling the strength of the regularization.
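The sampling step and the loss described above can be illustrated with a minimal 2D NumPy sketch. It is not the implementation used in the cited works: the function names `warp` and `registration_loss`, the bilinear interpolation, the global (rather than windowed) cross correlation, and the squared-gradient penalty are all assumptions made for illustration.

```python
import numpy as np

def warp(source, u):
    """Sample a 2D `source` image at Id + u with bilinear interpolation.

    u has shape (2, H, W): u[0] is the row displacement, u[1] the column one.
    """
    h, w = source.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling locations (Id + u), clamped to the image domain.
    r = np.clip(rows + u[0], 0, h - 1)
    c = np.clip(cols + u[1], 0, w - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    wr, wc = r - r0, c - c0
    top = (1 - wc) * source[r0, c0] + wc * source[r0, c1]
    bottom = (1 - wc) * source[r1, c0] + wc * source[r1, c1]
    return (1 - wr) * top + wr * bottom

def registration_loss(source, target, u, lam):
    """Negative squared cross correlation plus lam times a gradient penalty."""
    warped = warp(source, u)
    a = warped - warped.mean()
    b = target - target.mean()
    # Global squared normalized cross correlation (assumed similarity term).
    cc = (a * b).sum() ** 2 / ((a * a).sum() * (b * b).sum() + 1e-8)
    # Mean squared finite-difference gradient of each displacement component.
    smooth = sum(np.mean(g ** 2) for ui in u for g in np.gradient(ui))
    return -cc + lam * smooth
```

With a zero displacement field and identical source and target images, the similarity term is at its best value and the penalty vanishes, so the sketch returns approximately -1.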
These works place more emphasis on the accuracy and efficiency of registration compared to classical methods, but usually do not put equal emphasis on checking geometric properties of the predicted deformations, such as smoothness, invertibility or orientation preservation. In particular, the Jacobian determinant of the predicted transformation $\phi = \mathrm{Id} + u$, i.e. $\det(D\phi) = \det(\mathrm{Id} + Du)$, where $u$ is the displacement field predicted by the neural network, can very likely be negative at multiple locations. This “folding” issue may persist even when one increases the regularization strength $\lambda$ (see Figure 2).
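To make the folding criterion concrete, the following sketch computes the pointwise Jacobian determinant of Id + u for a 2D displacement field. The function name and the use of `numpy.gradient` finite differences are illustrative assumptions, not the procedure of any particular registration package.

```python
import numpy as np

def jacobian_determinant_2d(u):
    """Return det(Id + Du) at every pixel for u of shape (2, H, W)."""
    # np.gradient on a 2D array returns derivatives along rows, then columns.
    du0_dr, du0_dc = np.gradient(u[0])
    du1_dr, du1_dc = np.gradient(u[1])
    return (1 + du0_dr) * (1 + du1_dc) - du0_dc * du1_dr

# A zero displacement gives determinant 1 everywhere (no folding).
rows, cols = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
identity_det = jacobian_determinant_2d(np.zeros((2, 5, 5)))

# A displacement that reverses the row axis flips orientation,
# so the determinant is negative everywhere.
flip = np.zeros((2, 5, 5))
flip[0] = -2.0 * rows
folded = jacobian_determinant_2d(flip) < 0
```

Locations where the returned array is negative are the ones referred to as folding in the text.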
Additionally, the hyperparameter $\lambda$ is usually difficult to set so as to reach a good balance between nice geometrical properties (in this paper, this mainly refers to smoothness, invertibility and, in particular, the transformation having a positive Jacobian determinant everywhere) and registration accuracy, since larger values often cause smaller deformations, which reduces accuracy. In this paper, we propose two separate ideas, a cycle consistent design and a refinement module, that focus specifically on the negative Jacobian determinant issue of unsupervised registration networks. The two ideas approach the problem from a different angle by altering the training mechanism instead of tuning hyperparameters or changing the commonly adopted loss functions. In our experiments, both methods greatly reduce the occurrence of negative Jacobian determinants in predicted transformations with little sacrifice in registration accuracy.
2 Proposed methods
2.1 Cycle consistent design
From a mathematical point of view, the transformations used in registration tasks should ideally be diffeomorphisms, so that topological properties are not changed by the transformation. To approximate this ideal, training of the network should also respect invertibility. In fields such as computer vision, there has already been research such as [17] that uses this idea for better quality control in cross-domain image generation. That work defines two joint cycle consistent loops to better train two separate generative adversarial networks for unpaired image-to-image translation back and forth. We use a related idea in a different setting here to regularize the predicted static displacement field. This “cycle consistent loss” idea does not involve new losses, but forces the same network to perform a backward prediction that tries to recover the input right after it completes the forward prediction.
As seen in Figure 3, the spatial transformer first predicts the warped image $\hat{T}$ and the displacement field $u$ from the stacked source image $S$ and target image $T$. The predicted target image $\hat{T}$ (now as source) is then stacked with the original source image $S$ (now as target). They are fed into the same spatial transformer to produce a reconstruction $\hat{S}$ of $S$ and the corresponding inverted displacement field $\tilde{u}$. The whole network is trained with the cycle consistent loss, which combines the losses of the forward and the backward prediction.
While this design straightforwardly addresses the invertibility of the network, the cycle constraint also contributes indirectly to learning a smooth solution: the design regularizes the network by forcing the spatial transformer to learn a solution and its inverse at the same time. This helps the network rule out possible transformations that are not cycle consistent. The design also adds no learnable parameters to the original spatial transformer and can be trained equally efficiently.
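The forward and backward passes of the cycle consistent design can be sketched as below. The exact form of the cycle consistent loss is not reproduced in this text, so the equal-weight sum used here is an assumption, and `transformer` and `loss_fn` are hypothetical callables standing in for the spatial transformer and the registration loss.

```python
import numpy as np

def cycle_consistent_loss(transformer, loss_fn, source, target):
    """Run the same transformer forward and backward and add both losses.

    transformer(moving, fixed) -> (warped, displacement)   [hypothetical]
    loss_fn(fixed, warped, displacement) -> float          [hypothetical]
    """
    # Forward prediction: warp the source towards the target.
    warped, u = transformer(source, target)
    # Backward prediction: the warped image is now the source and the
    # original source is the target; the same network is reused.
    reconstructed, u_inv = transformer(warped, source)
    forward = loss_fn(target, warped, u)
    backward = loss_fn(source, reconstructed, u_inv)
    # Equal weighting of the two terms is an assumption of this sketch.
    return forward + backward

# Toy stand-ins so the sketch runs end to end (not a registration model).
def toy_transformer(moving, fixed):
    return 0.5 * (moving + fixed), np.zeros_like(moving)

def toy_loss(fixed, warped, displacement):
    return float(np.mean((fixed - warped) ** 2))

S = np.array([0.0, 2.0])
T = np.array([2.0, 4.0])
total = cycle_consistent_loss(toy_transformer, toy_loss, S, T)
```

Because both passes call the same `transformer`, the sketch adds no parameters, which mirrors the property stated in the text.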
2.2 Refinement Module
The other method we propose in this paper is more straightforward: it adds a refinement module to the original registration network. This module focuses directly on the task of refining the displacement field. The resulting local smoothness can also contribute to the local invertibility of the transformation through the Inverse Function Theorem. Our goal is to separate the task of “generating a smooth deformation field (with less folding) that warps the source to be as similar as possible to the target” into two competing subtasks. The generation network (active in phase I) focuses only on improving the similarity between the warped and target images, with mild constraints on smoothness. The subsequent refinement network (active in phase II), which uses a stronger regularization strength, instead concentrates on reducing folding locations by adopting more rigorous smoothness constraints. To train both networks together to reach an ideal “equilibrium” state where the transformation is both accurate and smooth with fewer foldings, we adopt an alternating training algorithm, as seen in Figure 4.
3 Related Work
To the author’s best knowledge at the time of completing this paper, [16] and [3] are the most relevant research. [16] designed an inverse consistent network and argued for adding an “anti-folding constraint” to prevent folding in the predicted transformation. Unlike that work, we do not create a new loss in this paper, but focus on discovering possible training mechanisms that help regularize the network. The alternating training with the refinement module is similar to [3], but our purpose is to regularize the deformation in image transformation rather than image generation. The code for the paper is released at https://github.com/dykuang/Medical-image-registration.
We used the Mindboggle101 dataset [6] for experiments. Details of data collection and processing, including atlas creation, are described in [6]. In the present paper, we used brain volumes from the following three named subsets of Mindboggle101: NKI-RS-22, NKI-TRT-20 and OASIS-TRT-20. We truncated the margins of each image to reduce its size. These images are already linearly aligned to MNI152 space. We also normalized the intensity of each brain volume to $[0, 1]$ by its maximum voxel intensity.
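The two preprocessing steps mentioned above can be sketched as follows. The function names are hypothetical, the symmetric center crop is an assumption about how the margins were truncated, and the output shape is left as a parameter because the sizes are not recoverable from this text.

```python
import numpy as np

def normalize_by_max(volume):
    """Scale a non-negative volume to [0, 1] by its maximum voxel intensity."""
    volume = np.asarray(volume, dtype=np.float64)
    peak = volume.max()
    return volume / peak if peak > 0 else volume

def center_crop(volume, out_shape):
    """Remove equal margins on each side (assumed cropping strategy)."""
    starts = [(s - o) // 2 for s, o in zip(volume.shape, out_shape)]
    slices = tuple(slice(st, st + o) for st, o in zip(starts, out_shape))
    return volume[slices]

# Toy volume; the shapes here are placeholders, not the dataset's sizes.
vol = np.arange(4 * 6 * 8, dtype=float).reshape(4, 6, 8)
prepared = center_crop(normalize_by_max(vol), (2, 4, 6))
```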
Figure 5 (left) shows one subject of the dataset with two annotated labels. The labels used in the Mindboggle101 dataset are cortical surface labels. Their geometrical complexity leads to more challenging registration tasks, especially for neural network approaches. In the following experiments, the original VoxelMorph network [2] is used as the baseline. We compare this baseline network alone, the baseline with the cycle consistent design, and the baseline with refinement. The baseline method and the method with cycle consistent design are trained for 10 epochs. In the baseline network with refinement, training phase I uses a milder regularization strength than training phase II. Each phase runs for 3 consecutive epochs before switching to the other. A total of 4 iterations of this training loop are used. The “Adam” optimizer [5] is used for all three networks. For the results shown below, block R in Figure 4 simply consists of two consecutive convolutional layers with Leaky ReLU activation. Better results are possible with a more complex architecture.
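The alternating schedule described above (3 consecutive epochs per phase, 4 iterations of the loop) can be laid out with a short sketch. The function name and the `ACTIVE` mapping are illustrative assumptions; only the phase order and the counts come from the text, and the sketch does not show how the inactive network is handled.

```python
# Which network each phase focuses on, following the description in Section 2.2.
ACTIVE = {"I": "generation", "II": "refinement"}

def alternating_schedule(iterations=4, epochs_per_phase=3):
    """Yield (iteration, phase, active_network, epoch) in training order."""
    for it in range(1, iterations + 1):
        for phase in ("I", "II"):
            for epoch in range(1, epochs_per_phase + 1):
                yield it, phase, ACTIVE[phase], epoch

schedule = list(alternating_schedule())
for step in schedule[:6]:
    print(step)
```

With the default arguments this gives 24 epochs in total, 12 for each phase.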
We assess the accuracy of the predicted registration via the Dice score between ROI labels/masks. For an image pair $(S, T)$, each indexed label associated with $S$ is warped with the predicted deformation from the registration network, and the Dice score against the corresponding label of $T$ is then calculated.
This metric on the test set (OASIS-TRT-20) is summarized in Figure 5.
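For reference, the Dice score between two binary masks is twice the size of their overlap divided by the sum of their sizes. A minimal sketch is given below; the function names are hypothetical, and returning 1.0 for two empty masks is a convention assumed here.

```python
import numpy as np

def dice_score(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom

def label_dice(warped_labels, target_labels, label_id):
    """Dice score for one indexed label in two label maps."""
    return dice_score(warped_labels == label_id, target_labels == label_id)

# Toy label maps standing in for warped source labels and target labels.
warped = np.array([[1, 1], [2, 0]])
target = np.array([[1, 0], [2, 2]])
per_label = {k: label_dice(warped, target, k) for k in (1, 2)}
```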
The folding of the deformation is assessed by examining locations where a negative Jacobian determinant occurs. Let $\mathcal{N}$ be defined as the percentage of voxel locations where the Jacobian determinant $\det(D\phi)$ of the predicted transformation $\phi$ is negative over all voxels $V$, i.e. $\mathcal{N} = \frac{|\{x \in V : \det(D\phi(x)) < 0\}|}{|V|} \times 100\%$. The ideal predicted transformation should have this number as small as possible. To better assess the general performance of our proposed methods, we perform a 3-fold cross-validation with the 3 datasets at hand (each fold uses 2 of the 3 datasets to form the training set and tests on the third; Figure 5 and Figure 6 are from the fold in which pairs from the OASIS dataset are used for testing, which has 1722 training pairs and 380 test pairs). We summarize this number for the different methods in Table 1 for comparison. The mean ($\mu$) and standard deviation ($\sigma$) of $\mathcal{N}$ on the test set are recorded. For better visualization, we also show one slice of the Jacobian determinant and the projected warped grid on the same slice in Figure 6. The transformation used for visualization in the figure is predicted on the pair formed by subject OASIS-TRT-3 (source) and subject OASIS-TRT-8 (target).
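The folding percentage defined above can be computed for a 3D displacement field along the following lines. This is a sketch under assumptions: the function name is hypothetical, derivatives are approximated with `numpy.gradient`, and unit voxel spacing is assumed.

```python
import numpy as np

def folding_percentage(u):
    """Percentage of voxels with det(Id + Du) < 0, for u of shape (3, D, H, W)."""
    # grads[i, j] holds d u_i / d x_j, each entry of shape (D, H, W).
    grads = np.stack(
        [np.stack(np.gradient(u[i]), axis=0) for i in range(3)], axis=0
    )
    # Move the 3x3 matrix axes last so np.linalg.det batches over voxels.
    jac = np.moveaxis(grads, (0, 1), (-2, -1)) + np.eye(3)
    det = np.linalg.det(jac)
    return 100.0 * np.mean(det < 0)

# Zero displacement: no folding at all.
no_fold = folding_percentage(np.zeros((3, 4, 4, 4)))

# Reversing the first axis flips orientation at every voxel.
zz = np.arange(4.0).reshape(4, 1, 1) * np.ones((4, 4, 4))
flip = np.zeros((3, 4, 4, 4))
flip[0] = -2.0 * zz
all_fold = folding_percentage(flip)
```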
Table 1 suggests there are big differences between the underlying transformations in terms of this measure. In the cross validation results, the baseline method has a mean of 1.97% of locations where the Jacobian determinant is negative. When the cycle consistent design is applied, this value drops to 0.13%. In other words, more than 90% of the unsatisfactory locations in the baseline prediction are eliminated. The refinement module, though it does not reduce the value as much as the cycle consistent design, achieves a mean of 0.20% with a smaller standard deviation. Surprisingly, Figure 5 shows that these big improvements in eliminating folding locations do not sacrifice much registration accuracy as measured by the Dice score. Figure 6 shows an example of the locations of negative Jacobian determinant, which gives an intuitive view of what happens behind the curtain. From the warped grid column, one can clearly see that networks with the cycle consistent design or the refinement module do not change much in locations where the baseline prediction is already smooth, but focus on foldings and “unfold” them to produce a much smoother transformation.
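As a quick check, the relative reductions implied by the mean values quoted above can be worked out directly. This arithmetic uses only the rounded means from the text, so the results are approximate.

```latex
\[
1 - \frac{0.13}{1.97} \approx 0.934,
\qquad
1 - \frac{0.20}{1.97} \approx 0.898
\]
```

That is, roughly 93% of the baseline folding locations are removed by the cycle consistent design and roughly 90% by the refinement module.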
We contributed two separate ideas for improving the smoothness of the deformation when a deep neural network is used for unsupervised registration tasks. These ideas do not create new kinds of losses, but focus on another direction that can bring improvements: adopting different training mechanisms. Both ideas work well in reducing the number of locations that have a negative Jacobian determinant compared to the baseline neural network, with little sacrifice in accuracy. They also do not change the baseline registration network and can be used together with other ideas for enhancing smoothness or registration.
[1] Avants, B.B., Tustison, N.J., Song, G., Cook, P.A., Klein, A., Gee, J.C.: A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54(3), 2033–2044 (2011)
[2] Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: An unsupervised learning model for deformable medical image registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9252–9260 (2018)
[3] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems. pp. 2672–2680 (2014)
[4] Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in neural information processing systems. pp. 2017–2025 (2015)
[5] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
[6] Klein, A., Tourville, J.: 101 labeled brain images and a consistent human cortical labeling protocol. Frontiers in neuroscience 6, 171 (2012)
[7] Li, H., Fan, Y.: Non-rigid image registration using fully convolutional networks with deep self-supervision. arXiv preprint arXiv:1709.00799 (2017)
[8] Rohé, M.M., Datar, M., Heimann, T., Sermesant, M., Pennec, X.: Svf-net: Learning deformable image registration using shape matching. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 266–274. Springer (2017)
[9] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)
[10] Shan, S., Guo, X., Yan, W., Chang, E.I., Fan, Y., Xu, Y., et al.: Unsupervised end-to-end learning for deformable medical image registration. arXiv preprint arXiv:1711.08608 (2017)
[11] Sokooti, H., de Vos, B., Berendsen, F., Lelieveldt, B.P., Išgum, I., Staring, M.: Nonrigid image registration using multi-scale 3d convolutional neural networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 232–239. Springer (2017)
[12] Wang, S., Kim, M., Wu, G., Shen, D.: Scalable high performance image registration framework by unsupervised deep feature representations learning. In: Deep Learning for Medical Image Analysis, pp. 245–269. Elsevier (2017)
[13] Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015)
[14] Yang, X., Kwitt, R., Styner, M., Niethammer, M.: Quicksilver: Fast predictive image registration–a deep learning approach. NeuroImage 158, 378–396 (2017)
[15] Yoo, I., Hildebrand, D.G., Tobin, W.F., Lee, W.C.A., Jeong, W.K.: ssemnet: Serial-section electron microscopy image registration using a spatial transformer network with learned features. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 249–257. Springer (2017)
[16] Zhang, J.: Inverse-consistent deep networks for unsupervised deformable image registration. arXiv preprint arXiv:1809.03443 (2018)
[17] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint (2017)