On Reducing Negative Jacobian Determinant of the Deformation Predicted by Deep Registration Networks

06/28/2019 ∙ by Dongyang Kuang, et al.

Image registration is a fundamental step in medical image analysis. Ideally, the transformation that registers one image to another should be a diffeomorphism that is both invertible and smooth. Traditional methods like geodesic shooting approach the problem via differential geometry, with theoretical guarantees that the resulting transformation will be smooth and invertible. Most previous research using unsupervised deep neural networks for registration has used a local smoothness constraint (typically, a spatial variation loss) to address the smoothness issue. These networks usually produce non-invertible transformations with "folding" in multiple voxel locations, indicated by a negative determinant of the Jacobian matrix of the transformation. While using a loss function that specifically penalizes the folding is a straightforward solution, this usually requires carefully tuning the regularization strength, especially when there are also other losses. In this paper we address this problem from a different angle, by investigating possible training mechanisms that will help the network avoid negative Jacobians and produce smoother deformations. We contribute two independent ideas in this direction. Both ideas greatly reduce the number of folding locations in the predicted deformation, without making changes to the hyperparameters or the architecture used in the existing baseline registration network.




1 Introduction

Image registration is a key element of medical image analysis. Most state-of-the-art registration algorithms, such as ANTs [1], can utilize geometric methods that are guaranteed to produce the smooth, invertible deformations that are much desired in medical image registration. Over the last couple of years, a revolution has been taking place in the application of machine learning methods, especially convolutional neural networks, to this problem. While recent registration networks can predict the nonlinear transformation much faster and obtain registration accuracy comparable to traditional methods, they usually do not have theoretical guarantees on the smoothness and invertibility of their predicted deformations.

Supervised methods, such as in [8, 11, 14], learn from known reference deformations for training data, either actual “ground truth” in the case of synthetic image pairs, or deformations computed by other automatic or semiautomatic methods. They usually do not have problems of smoothness, but still rely on other tools such as ANTs being run beforehand to produce the desired transformations. The registration problem is much harder in the setting of unsupervised methods. Most unsupervised approaches, like [15, 7, 12, 10, 2], adopt the idea of the spatial transformer (ST) [4]. The spatial transformer used in registration usually consists of two basic function units: a deformation unit and a sampling unit. With input $S$ (source image) and $T$ (target image) stacked as an ordered pair, the deformation unit produces a static displacement field $u$. The warped image is then constructed in the sampling unit by sampling the source image with $u$ via $S \circ (\mathrm{Id} + u)$, where $\mathrm{Id}$ is the identity map. In summary, the right action of a diffeomorphism $\phi$ on the image $S$ is approximated by $S \circ (\mathrm{Id} + u)$. The smoothness constraint on $u$ is usually addressed by regularizing its derivative $Du$. The work [2] is one representative, and Figure 1 shows the workflow of the idea introduced above. The whole network is trained so that it minimizes the loss $L = L_{cc}(S \circ (\mathrm{Id} + u), T) + \lambda \|Du\|$, where $L_{cc}$ stands for the cross correlation loss and $\lambda$ is a hyperparameter controlling the strength of the regularization.
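The sampling step can be illustrated with a small sketch. It is not the paper's implementation: it is a 2D NumPy illustration with a hypothetical helper `warp_image`, assuming the displacement field is stored in voxel units, that sampling locations outside the image are clamped to the border, and that bilinear interpolation is used.

```python
import numpy as np

def warp_image(source, displacement):
    """Sample `source` at (Id + u)(x) using bilinear interpolation.

    source:       (H, W) array, the source image.
    displacement: (2, H, W) array, the displacement u in voxel units
                  (row component first, then column component).
    """
    h, w = source.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling locations (Id + u)(x), clamped to the image domain (an assumption).
    r = np.clip(rows + displacement[0], 0, h - 1)
    c = np.clip(cols + displacement[1], 0, w - 1)
    # Integer corners surrounding each sampling location.
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr, fc = r - r0, c - c0
    # Interpolate along columns first, then along rows.
    top = (1 - fc) * source[r0, c0] + fc * source[r0, c1]
    bottom = (1 - fc) * source[r1, c0] + fc * source[r1, c1]
    return (1 - fr) * top + fr * bottom
```

With a zero displacement field the function returns the source image unchanged, which corresponds to the identity map.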

Figure 1: An overview of the registration network usually used for registration. The popular U-net architecture[9] is used as the deformation unit for generating the displacement field.

These works emphasize the accuracy and efficiency of registration compared to classical methods, but usually do not put equal emphasis on checking geometric properties such as smoothness, invertibility or orientation preservation of the predicted deformations. In particular, the Jacobian determinant of a transformation predicted by a neural network, i.e. $\det(\mathrm{Id} + Du)$ with $u$ the predicted displacement field, can very likely be negative at multiple locations. This “folding” issue during prediction may persist even when one increases the regularization strength (see Figure 2).
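A one-dimensional case, added here purely as an illustration, shows why a penalty on the size of the derivative of the displacement does not by itself rule out folding. Writing the transformation as the identity plus a displacement $u$:

```latex
\[
\phi(x) = x + u(x), \qquad \phi'(x) = 1 + u'(x), \qquad
\phi'(x) < 0 \iff u'(x) < -1 .
\]
```

A regularizer keeps $|u'|$ small on average over the domain, but nothing prevents $u'(x) < -1$ at a few isolated locations, and those are exactly the places where the map reverses orientation. In higher dimensions the same role is played by the sign of $\det(\mathrm{Id} + Du)$.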

Figure 2: A snapshot of the projected warped grid at the same location for different regularization strengths. From left to right, each network is trained with a different regularization strength.

Additionally, the regularization hyper-parameter is usually difficult to set in order to reach a good balance between nice geometrical properties (in this paper, mainly smoothness, invertibility and, in particular, a positive Jacobian determinant everywhere) and registration accuracy, since larger values often cause smaller deformations, reducing the accuracy. In this paper, we propose two separate ideas, a cycle consistent design and a refinement module, that focus specifically on the negative Jacobian determinant issue of unsupervised registration networks. The two ideas approach the problem from a different angle by altering the training mechanism instead of tuning hyper-parameters or changing the commonly adopted loss functions. In our experiments, both methods greatly reduce the chance of negative Jacobian determinants in predicted transformations with little sacrifice in registration accuracy.

2 Proposed methods

2.1 Cycle consistent design

From the mathematical point of view, the transformations used in registration tasks should ideally be diffeomorphisms so that topological properties are not changed by the transformations. In order to approximate the ideal property of invertibility, training of the network should also respect this invertibility property. In fields such as computer vision, there has already been research on unpaired image-to-image translation that utilizes this idea for better quality control of cross-domain image generation. That work defines two joint cycle consistent loops to better train two separate generative adversarial networks that translate back and forth between domains. We use a related idea in a different setting here, for regularizing the predicted static displacement field. This “cycle consistent loss” idea does not involve new losses but forces the same network to perform a backward prediction that tries to recover the input right after it completes the forward prediction.

Figure 3: A diagram illustrating the cycle consistent design.

As seen in Figure 3, the spatial transformer first predicts the warped image $\tilde{T}$ and the displacement field $u$ from the stacked source image $S$ and target image $T$. The predicted target image $\tilde{T}$ (now as source) is then stacked with the original source image $S$ (now as target). They are fed into the same spatial transformer to produce a reconstruction $\tilde{S}$ of $S$ and the corresponding inverted displacement field $\tilde{u}$. The whole network is trained with the cycle consistent loss, which combines the registration loss of the forward prediction with that of the backward prediction.


While it is clear that this design directly addresses the invertibility of the network, the cycle constraint also contributes indirectly to the task of learning a smooth solution: the design regularizes the network by forcing the spatial transformer to learn a solution and its inverse at the same time. This helps the network rule out possible transformations that are not cycle consistent. This design also does not add any learnable parameters to the original spatial transformer and can be trained just as efficiently.
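The forward and backward passes described above can be sketched as follows. This is only an illustration under stated assumptions: `register` and `loss_fn` are hypothetical callables standing in for the spatial transformer and the registration loss, and the two terms are simply added with equal weight, which may differ from the weighting used in the paper.

```python
import numpy as np

def cycle_consistent_loss(register, loss_fn, source, target):
    """Forward loss plus backward loss, both computed with the same `register`.

    register(moving, fixed) -> (warped, displacement)   # hypothetical interface
    loss_fn(warped, fixed, displacement) -> float        # hypothetical interface
    """
    # Forward pass: warp the source toward the target.
    warped, u_forward = register(source, target)
    forward = loss_fn(warped, target, u_forward)
    # Backward pass: feed the prediction back in and try to recover the source.
    reconstruction, u_backward = register(warped, source)
    backward = loss_fn(reconstruction, source, u_backward)
    # Equal weighting of the two terms is an assumption of this sketch.
    return forward + backward

def toy_register(moving, fixed):
    # Stand-in for the network: moves the image halfway toward `fixed`.
    return 0.5 * (moving + fixed), fixed - moving

def mse(warped, fixed, displacement):
    # Toy similarity loss that ignores the displacement regularizer.
    return float(np.mean((warped - fixed) ** 2))

loss = cycle_consistent_loss(toy_register, mse, np.zeros(4), np.ones(4))
```

Because the same `register` callable is used in both directions, no extra parameters are introduced, which mirrors the point made in the text.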

Figure 4: A diagram showing the activity of the network during the training phases. The colored part of the figure is the baseline network; the greyed, dashed part is the attachable refinement network. D: the network producing the displacement field. S: the sampling module. R: the network that learns the corrections needed for a smoother field with less “folding”. Training alternates between these two phases.

2.2 Refinement Module

The other method we propose in this paper is more straightforward: it adds a refinement module to the original registration network. This module focuses directly on the task of refining the displacement field. The resulting local smoothness can also contribute to the local invertibility of the transformation through the Inverse Function Theorem. Our goal is to separate the task of “generating a smooth (with less folding) deformation field that warps the source to be as similar as possible to the target” into two competing subtasks. The generation network (active in phase I) focuses only on improving the similarity between the warped and target images, with mild constraints on smoothness. The subsequent refinement network (active in phase II), with a stronger regularization strength, instead puts its attention on reducing folding locations by adopting more rigorous smoothness constraints. In order to train both networks together to reach an ideal “equilibrium” state where the transformation is both accurate and smooth with less folding, we adopted an alternating training algorithm, as seen in Fig 4.
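The alternating schedule can be sketched as a simple loop. The sketch below is an assumption-laden illustration rather than the released training code: `train_generation_epoch` and `train_refinement_epoch` are hypothetical callbacks for one epoch of phase I and phase II, and the example values (4 loop iterations, 3 epochs per phase) are the ones reported in the experiment section.

```python
def alternating_schedule(n_iterations, epochs_per_phase):
    """Yield (iteration, phase, epoch) for two-phase alternating training."""
    for iteration in range(n_iterations):
        for phase in ("I", "II"):
            for epoch in range(epochs_per_phase):
                yield iteration, phase, epoch

def train_alternating(train_generation_epoch, train_refinement_epoch,
                      n_iterations=4, epochs_per_phase=3):
    """Run phase I and phase II in turn; both callbacks are hypothetical."""
    for _, phase, _ in alternating_schedule(n_iterations, epochs_per_phase):
        if phase == "I":
            # Phase I: generation network, mild smoothness constraint.
            train_generation_epoch()
        else:
            # Phase II: refinement network, stronger smoothness constraint.
            train_refinement_epoch()

# Record the order in which the phases run.
log = []
train_alternating(lambda: log.append("I"), lambda: log.append("II"))
```

Keeping the schedule in a separate generator makes it easy to change the number of epochs per phase without touching the per-phase training code.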

3 Related Work

To the best of the authors’ knowledge when completing this paper, [16] and [3] are the most relevant research. [16] designed an inverse consistent network and argued for adding an “anti-folding constraint” to prevent folding in the predicted transformation. Different from that work, we did not create a new loss in this paper, but focus on discovering possible training mechanisms that help regularize the network. The alternating training with a refinement module is similar to [3], but our purpose is to regularize the deformation in image transformation rather than image generation. The code for the paper is released at https://github.com/dykuang/Medical-image-registration.

4 Experiment

4.1 Dataset

We used the Mindboggle101 dataset [6] for experiments. Details of data collection and processing, including atlas creation, are described in [6]. In the present paper, we used brain volumes from the following three named subsets of Mindboggle101: NKI-RS-22, NKI-TRT-20 and OASIS-TRT-20. We cropped the margins of each image volume to reduce its size. These images are already linearly aligned to MNI152 space. We also normalized the intensity of each brain volume to $[0, 1]$ by dividing by its maximum voxel intensity.

Figure 5 (left) shows one subject of the dataset with two annotated labels. Labels used in the Mindboggle101 dataset are cortical surface labels. Their geometrical complexity leads to more challenging registration tasks, especially for neural network approaches. In the following experiments, the original VoxelMorph network [2] is used as the baseline. Three variants are compared: the baseline alone, the baseline with cycle consistent design and the baseline with refinement. The baseline method and the method with cycle consistent design are trained with the same regularization strength for 10 epochs. In the baseline network with refinement, training phase I uses a weaker regularization strength than training phase II. Each phase runs for 3 epochs continuously before switching to the other, and a total of 4 iterations of this training loop are used. The Adam optimizer [5] with the same learning rate is used for all three networks. For the results shown below, block R in Figure 4 simply consists of two consecutive convolutional layers with Leaky ReLU [13] activation. Better results may be possible with a more complex architecture.

We assess the accuracy of the predicted registration via the Dice score between ROI labels/masks. For an image pair $(S, T)$, each indexed label associated with $S$ is warped with the deformation predicted by the registration network, and the Dice score between the warped label $A$ and the corresponding label $B$ of $T$ is then calculated:

$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$

This metric on the test set (OASIS-TRT-20) is summarized in Fig 5.
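For concreteness, the Dice score between two binary label masks can be computed as in the sketch below. The helper name `dice_score` is chosen here for illustration, and returning 1.0 when both masks are empty is a convention of this sketch, not something stated in the paper.

```python
import numpy as np

def dice_score(label_a, label_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(label_a, dtype=bool)
    b = np.asarray(label_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

# Two masks of two voxels each that share one voxel.
example = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In the setting of the paper, one mask would be a warped source label and the other the matching target label, evaluated once per label index.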

Figure 5: Left: Sample brain volume and 2 labels. Right: Mean dice scores of different methods on selected regions. Each point is the mean dice score averaged over the corresponding ROI labels per registration pair, rather than over the union of labels in that region. Results from the SyNQuick algorithm in the ANTs package are also listed to help interpret these dice scores, but not for the purpose of comparison.
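Method              Mean     Std
Baseline            1.97%    0.73%
Cycle Consistent    0.13%    0.05%
Refinement          0.20%    0.02%
Table 1: Mean and standard deviation of the percentage of voxels with negative Jacobian determinant on the test set, under 3-fold validation.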

Folding of the deformation is assessed by examining locations where the Jacobian determinant is negative. Let $\mathcal{N}$ be the percentage of voxel locations where the Jacobian determinant is negative over all voxels, i.e. $\mathcal{N} = 100 \cdot |\{x \in \Omega : \det(\mathrm{Id} + Du(x)) < 0\}| / |\Omega|$, where $\Omega$ is the set of all voxel locations and $u$ is the predicted displacement field. The ideal predicted transformation should have this number as small as possible. To better assess the general performance of our proposed methods, we perform a 3-fold validation with the 3 datasets at hand (each fold uses 2 of the 3 datasets to form the training set and tests on the third; Figure 5 and Figure 6 are from the fold in which pairs from the OASIS dataset are used for testing, which has 1722 training pairs and 380 test pairs). We summarize this number for the different methods in Table 1 for comparison. The mean and standard deviation of $\mathcal{N}$ on the test set are recorded. For better visualization, we also show one slice of the Jacobian determinant and the projected warped grid on the same slice in Figure 6. The transformation used for visualization in the figure is predicted on the pair formed by subject OASIS-TRT-3 (source) and subject OASIS-TRT-8 (target).
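The folding measure can be approximated numerically from a predicted displacement field. The sketch below is one possible way to do so, not the paper's evaluation code: the helper names are hypothetical, the field is assumed to be stored in voxel units with the component axis first, and derivatives are approximated with `numpy.gradient` finite differences.

```python
import numpy as np

def jacobian_determinant(displacement):
    """det(Id + Du) at every voxel, using finite differences.

    displacement: (n, d1, ..., dn) array with n = 2 or 3, in voxel units.
    """
    ndim = displacement.shape[0]
    jac = np.empty(displacement.shape[1:] + (ndim, ndim))
    for i in range(ndim):
        # grads[j] approximates the derivative of u_i along axis j.
        grads = np.gradient(displacement[i])
        for j in range(ndim):
            jac[..., i, j] = grads[j] + (1.0 if i == j else 0.0)
    return np.linalg.det(jac)

def folding_percentage(displacement):
    """Percentage of voxels where the Jacobian determinant is negative."""
    det = jacobian_determinant(displacement)
    return 100.0 * float(np.mean(det < 0))

# A zero displacement is the identity map and has no folding.
identity_folding = folding_percentage(np.zeros((3, 4, 4, 4)))
```

A displacement such as $u_0(x) = -2x_0$ flips the first axis, so every voxel has determinant $-1$ and the measure is 100%.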

Figure 6: Determinant of Jacobian map and the warped grid projected on the same slice. From left to right: the baseline VoxelMorph prediction, baseline with cycle consistent design and baseline with refinement module. Locations where the determinants are negative are shown in red.

Table 1 suggests there are big differences between the underlying transformations in terms of this measure. In the cross validation results, the baseline method has a mean of 1.97% of locations where the Jacobian determinant is negative. When the cycle consistent design is applied, this value drops to 0.13%. In other words, more than 90% of the unsatisfactory locations in the baseline prediction are eliminated. The refinement module does not reduce the mean as much as the cycle consistent design, but it achieves a mean of 0.20% with a smaller standard deviation. Surprisingly, Figure 5 shows that this big improvement in eliminating folding locations did not sacrifice much registration accuracy as measured by dice score. Figure 6 shows an example of the locations of negative Jacobian determinant, which helps give an intuitive view of what happens behind the scenes. From the warped grid column, one can clearly see that networks with the cycle consistent design or the refinement module did not change much in locations where the baseline prediction is already smooth, but put their attention on foldings and “unfold” them to produce a much smoother transformation.
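The relative reductions implied by the means in Table 1 can be checked directly:

```latex
\[
\frac{1.97 - 0.13}{1.97} = \frac{1.84}{1.97} \approx 0.934,
\qquad
\frac{1.97 - 0.20}{1.97} = \frac{1.77}{1.97} \approx 0.898 .
\]
```

That is, relative to the baseline mean, the cycle consistent design removes about 93% of the folding locations and the refinement module about 90%.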

5 Conclusion

We contributed two separate ideas for improving the smoothness of the deformation when a deep neural network is used for unsupervised registration tasks. These ideas do not create new kinds of losses but focus on another direction that could bring improvements, namely adopting different training mechanisms. Both ideas work well in reducing the number of locations that have a negative Jacobian determinant compared to the baseline neural network, with little sacrifice of accuracy. They also do not change the baseline registration network and can be used together with other ideas for enhancing smoothness or registration.