One Shot Learning for Deformable Medical Image Registration and Periodic Motion Tracking

07/10/2019 ∙ by Tobias Fechter, et al. ∙ 4

Deformable image registration is a very important field of research in medical imaging. Recently multiple deep learning approaches were published in this area showing promising results. However, drawbacks of deep learning methods are the need for a large amount of training datasets and their inability to register unseen images different from the training datasets. One shot learning comes without the need of large training datasets and has already been proven to be applicable to 3D data. In this work we present an one shot registration approach for periodic motion tracking in 3D and 4D datasets. When applied to 3D dataset the algorithm calculates the inverse of a registration vector field simultaneously. For registration we employed a U-Net combined with a coarse to fine approach and a differential spatial transformer module. The algorithm was thoroughly tested with multiple 4D and 3D datasets publicly available. The results show that the presented approach is able to track periodic motion and to yield a competitive registration accuracy. Possible application are the use as a stand-alone algorithm for 3D and 4D motion tracking or in the beginning of studies until enough datasets for a separate training phase are available.



There are no comments yet.


page 1

page 4

page 5

page 7

page 10

Code Repositories


One Shot Deformable Medical Image Registration

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Deformable image registration (DIR) is a technique to determine spatial non linear correspondences between two or more images and finds many important applications in the medical field. DIR can be used to detect anatomical changes within a patient (e.g. monitoring, breathing or heart motion) or to find correspondences among different patients (e.g. atlas based segmentation). Although, DIR has been an active research field for years, due to its complexity it remains a challenging problem. Some reasons that hamper the alignment of two or more images are: multiple image modalities, ill-posed high dimensional optimization, different types of motion, bad image contrast [1]. These limitations require usually specialised algorithms tailored to a specific problem. Therefore, many innovative ideas have been proposed over the past few decades. Good overviews of the field can be found in [1, 2, 3].

With the rise of machine learning in the recent years deep learning (DL) based algorithms were presented to solve medical image registration problems. DL algorithms can learn relevant image features, the relations between features and how to translate the learned features into a deformation vector.

The best performing methods base on convolutional neural networks (

CNNs). CNNs have an advantage in handling spatial information and are able to learn complex image features specific for the current registration problem.

One of the first approaches to align images with a CNN was the spatial transformer module [4], which was used to improve classification accuracy.

[5] presented a CNN to register magnetic resonance images to intra-procedural transrectal ultrasound images of prostate cancer patients. They trained their network with labelled image datasets but after completion of training the network is able to register unlabelled images.

Other supervised methods were presented by [6] who showed a network for predicting the momentum of large deformation diffeomorphic metric mapping model, [7] who presented a network for the registration of magnetic resonance images (MRI) of the brain and [8, 9] showed a method for pulmonary CT registration.

The aforementioned methods can be classified as supervised methods. They need example registrations

[6, 7, 8] or manual segmentations [5] for training the neural networks. Generating these information is complex, time consuming and problem specific. Recently, unsupervised methods [10, 11, 12, 13, 14]

were presented. Similarly to conventional registration methods the authors of these methods find a solution to the registration problem at hand by optimizing an image similarity metric.

One drawback of the above mentioned supervised and unsupervised DL methods is that a lot of training data is needed to optimise the network weights. Training data has to be selected carefully as the datasets should cover future registration tasks as best as possible. Therefore these algorithms cannot be applied to unseen domains [15]. Another machine learning approach that comes without the demand for big training datasets is one shot learning. Here the parameters of the network are trained from scratch only with the images to be registered. In [15] it has been shown that one shot learning can produce registration results comparable to state of the art methods for two dimensional image slices.

So far the presented CNN based registration methods can only generate a deformation field for two three dimensional (3D) image datasets. However, in some medical fields the application of four dimensional (4D

) data (3D + time) plays an important role. By way of example, organ or tumour movement due to respiratory motion is estimated by a 4D computed tomography (

CT) scan for radiotherapy treatment planning [16] or the progress of a disease (e.g. tumour growth or shrinkage) during the course of treatment can be measured by follow up scans [17]. To measure the growth of a tumour by means of a 4D dataset, DIR can be done by applying a 3D DIR algorithm to each 3D dataset and its temporal successor as the deformation of the tumour does not follow a specific pattern. Whereas, the motion of organs due to breathing or heartbeat is periodic and can not be covered by the multiple application of a 3D DIR algorithm.

For organ specific motion patterns [18] used Gaussian kernel to calculate temporal smooth deformation fields for cardiac cine MRI datasets. Cubic B-splines with a periodic constraint were used by [19] for lung motion estimation. A trajectory constraint for periodic motion is presented in [20, 21]. The authors presented a multi-channel Diffeomorphic Demons based algorithm to register two 4D cardiac time series. In [22] temporal coherence was ensured by smoothing temporal fibers. All these methods for the registration of 4D datasets rely on conventional image registration methods.

In this work we show a way how to use the strengths of CNNs to enable DIR of 4D periodic image datasets. We present an unsupervised CNN based DIR algorithm to calculate dense displacement fields for 4D image data with a periodic motion pattern. In case of 3D to 3D registration the algorithm calculates the deformation vector field (DVF) and an approximation of the inverse field simultaneously. Registration is performed in one shot, so no training data is necessary. We evaluated the algorithm on multiple image datasets publicly available and are able to show that its performance is equal or superior to state of the art DL registration algorithms.

Ii Data

The proposed algorithm was evaluated on 31 publicly available 4D datasets. Each of them having either manually set landmarks or manually drawn contours for registration evaluation. Registration results of the proposed algorithm were evaluated and compared to the results of other state of the art algorithms.

Ii-a DirLab

To evaluate the presented algorithm in the thoracic region the publicly available DirLab [23, 24] dataset was used. The dataset consists of 10 4D CTs of the thoracic region. Each 4D CT consists of 10 3D CTs and manually set reference landmarks in the lungs for registration evaluation. The full number of landmarks was identified only in the maximum inhalation and exhalation phase but in addition a subset of 75 features was marked on each of the expiratory phase images. In table II more details can be seen.

Ii-B Popi

Another dataset for evaluating image registration algorithms is provided by the Léon Bérard Cancer Center & CREATIS lab, Lyon, France [19]. Six 4D CTs consisting of 10 3D CTs each, are provided. The datasets show the lung region and contain landmarks for registration evaluation. For the first three patients landmarks were identified in each of the 10 3D CTs, for the remaining three patients landmarks were identified in the CTs depicting the maximum inhale and exhale phase. Dataset details can be found in table II.

Ii-C Sunnybrook

To evaluate the performance of our algorithm on a different body site and modality we used the cardiac MRI datasets [25] provided by the Sunnybrook Health Sciences Centre. In particular, we used the 12 datasets that were also provided as training data for the MICCAI left ventricle segmentaion challenge [26]. The datasets contain cine-MRI images from patients with different pathologies and expert-drawn contours for validation. Each datasets consists of 20 time-bins and expert drawn contours are available for end-diastolic and end-systolic phases. In table II details on the datasets are listed (the datasets are abbreviated as SC-*).

Iii Methods

Let be a 4D medical image dataset consisting of 3D images (for conventional 3D to 3D DIR is 2). The aim of our deformable DIR algorithm is to find a transformation that maps each 3D image in to its timely adjacent successor (as we deal with periodic data is aligned to ). consist of 3D dense vector fields with 3 components (x-, y- and z-direction) each and describes the trajectory for each voxel in . is the sequential application of the deformation fields to to position . The deformation vector at a given position and time-bin is indicated as .

In this work we assume that the images are pre-registered using affine transformations. The task at hand is to calculate the deformable transformation that represents anatomically plausible spatial relationships between voxels in and takes care of the periodic character of the underlying motion.

Iii-a Preprocessing

To reduce the amount of data to be processed every dataset was divided into foreground and background region and only the foreground region was processed by the registration algorithm. For the Popi and DirLab datasets the foreground was defined as all voxels inside the body outline contour. The body outline contour was generated automatically with [27]. For the Sunnybrook datasets the foreground was generated by discarding the largest connected region containing only voxels with intensity 0.

The Sunnybrook datasets have much smaller voxel spacing along the x- and y-axis than along the z-axis, which had a negative impact on the registration results. Therefore these datasets were resampled to a voxel size of 1.5 x 1.5 x 1.5 mm. For the other two dataset collections this step was neglected because the differences in voxel spacing along the axes were not as big as for the Sunnybrook datasets and preliminary tests showed that resampling the datasets leads to no significant change in registration performance.

In a last preprocessing step the foreground voxel intensities were normalised to zero mean and standard deviation of one.

Iii-B Network Architecture and Training

Following the recently published works about DL based image registration [15, 10, 11, 12, 13, 14] we decided for a CNN based architecture. Similar to [15] we employed the U-net architecture [28] with 2 downsampling steps with average pooling layers, 2 upsampling steps with transposed convolution layers [29] and skip connections by summation instead of concatenation. The network takes or a patch of as input (each time-bin as a single channel) and calculates deformation fields with the network parameters . An illustration of all layers and connections can be found in Fig. 1.

The idea of one shot learning in image registration is to start with an untrained model and to optimise the parameters until convergence only with the one dataset to be registered [15].

Figure 1: This figure shows the multi-resolution network architecture used in this work exemplarily for a 4D CT scan of the thoracic region with 10 time-bins. The gray shaded rectangles stand for processing the input data in a different image resolution. To calculate large deformations (indicated in red in the output image) the U-net is trained till convergence on the two times downsampled input dataset. Then, medium and small deformations (yellow and green) are calculated by training the net on the simple downsampled and the original input dataset, respectively. The number on the lower left end of the squares indicates the number of channels, the number on the lower edge indicates the channel size. Note that the U-Net architecture is the same for all image resolutions, solely the input is changing. Convolutional layers had a kernel size of

, stride and padding were set to 1, the pooling layers had a kernel size of

and a stride of 2. One downsampling step reduces the image size by a factor of 2.

Iii-C Multi-Resolution Patch Processing

The large amount of memory which is needed for 4D image data, the corresponding deformation fields and the network parameters prevented to process datasets as a whole. Therefore, before processing an input dataset it is split into smaller non overlapping patches. Splitting is performed along the spatial dimensions only. The network is then trained till convergence with each patch. After termination of training the computed vector field patches are assembled.

Patch based processing in combination with the networks receptive field ( voxels) inhibits the presented network with the original image as input to account for large deformations (occurring e.g. in the diaphragmatic region). Thus we adopted a coarse-to-fine approach like it is often used in conventional image registration [30]. Our multi-resolution approach consists of 3 image registration steps. First, basic deformations are calculated with the input image downsampled by a factor of 4. This step is repeated with the input image downsampled by a factor of 2 to capture intermediate deformations. In a last step the original image is fed to the network. The calculated vector fields of a respective downsample step are added to the upsampled vector fields of previous steps before the loss is calculated. Thereby, the net first calculates a basic deformation field and adds more and more details after every upsampling step. The whole multi-resolution architecture is depicted in Fig. 1.

Iii-D Loss Function

To find we optimize

to minimise a loss function

. The value of indicates the quality of the deformation field and how well aligns the images in . In this work is defined as:


and consists of 3 parts: an image dissimilarity metric , a regularization constraint for smoothness and to prevent folding as well as a cyclic constraint to encourage periodic deformations. and are weighting terms to control the influence of the regularisation terms. For the negative normalized cross correlation was adopted as it is known to perform well for mono modal image registration [31]. was chosen to be the l2-norm derivatives of the deformation field in combination with a boundary smoothness constraint to guarantee a homogeneous deformation field also at patch transitions:


with being the set of all 3D voxel positions, the number of time-bins, the set of voxel positions at the border of an image patch and the set of neighbouring voxel positions along a given axis. For a better understanding we visualised the sets and in Fig. 2. controls the influence of the boundary smoothness constraint and calculates the Euclidean distance between positions and .

Figure 2: The voxel sets needed for the calculation of the regularization constraint are illustrated in this figure. The red line indicates a fictive image patch of 5 voxels. The area filled in blue represents the set of border voxels and the orange filled area contains the neighbouring voxels of the dark blue coloured voxel along the x-axis.

The periodic constraint was defined as the sum of all deformation vectors along the trajectory of a voxel:


A sum of zero indicates a perfect periodic motion. To calculate in (1) and in (3) we implemented a differential spatial transformer module [4]

that warps the input images and allows backpropagation of the loss to update


Iii-E Implementation

The presented algorithm was implemented with pytorch v1.0.1 and is provided on GitHub

111 Computations were done on a PC with a 8 core Intel Xeon CPU, 60 GB memory and a Nvidia Titan XP GPU. In our test setting the following parameter values were used: : , : , : zero in the first multi-resolution step and 0.1 for the following steps. We assumed convergence when the current value of the loss function (1) and the moving average of previous loss values do not differ by more than . The parameter values were determined on one test dataset (DirLab08) and left unchanged for all test cases.

Iv Evaluation

For the datasets used in this study either manually set landmarks or contours were provided for registration quality assessment. To evalute the datasets with provided landmarks we calculated the distance between corresponding landmarks before and after registration. To evaluate datasets with manually drawn contours we made use of 3 methods usually used in the field of image segmentation. First, we investigated volume similarities by the Sørensen-Dice index (DSC) [32]. Volume-based metrics alone might miss clinically relevant differences between contours as they show a lower sensitivity to errors where outlines deviate and the volume of the erroneous region is small compared to the total volume. Thus, we considered also distance-based metrics like the Hausdorff distance (HD) and the average symmetric surface distance (ASSD).

To detect anatomically implausible deformations we calculated the determinant of the Jacobian matrix for every point in a deformation vector field. The value of the determinant gives information about how the deformation changes the volume at a given point. A value of 1 indicates expansion, a value of 1 indicates no volume change, a value between 0 and 1 indicates shrinkage and a value 0 indicates image folding. For reasons of simplification we state only the mean of the determinant values standard deviation as well as the fraction of foldings (FoF) per image.

Statistical analysis was performed with the Wilcoxon signed‐rank test. This test, which has for null hypothesis that the median group difference is zero, was chosen due to non‐normal distribution and heteroscedasticity of the data. In our experiments, the confidence alpha was set to 1%.

Iv-a Experiments

We evaluated the presented algorithm in two different ways for each 4D dataset. First, we were interested in the general performance of the presented algorithm by registering only the two images of each dataset depicting extreme phases (maximum inhale/exhale, end-systolic/diastolic). In a second step we evaluated the ability of the presented algorithm to conceive periodic deformations over all time-bins. The size of the patches sequentially processed by the network may have an influence on the results. A patch size too small could hamper the algorithm from capturing large deformations and has a large computational overhead by copying data to and from the GPU. A large patch size can prevent registration by running out of memory. To measure the impact of the patch size on the results we conducted the two evaluation ways mentioned before with patches of size 484848 (Patch48), 646464 (Patch64), 808080 (Patch80) and 969696 voxels (Patch96). With a patch size of 96 voxels we were running out of memory for some 4D cases, therefore larger patch sizes were not evaluated. In a last test setting the impact of the downsampling layers is analysed.

Iv-A1 3D registration evaluation

For several datasets at hand landmarks or segmentations are only available for the two extreme phases. To analyse the registration accuracy and the ability of our algorithm to calculate a DVF and its inverse simultaneously just these two phases were employed. The registration of only two phases resembles a conventional 3D registration task and facilitates also a comparison to published results in the literature. The figures of merit mentioned in section IV were calculated for the DVF from exhale/diastolic phase to inhale/systolic phase and vice versa. To measure how well the inverse DVF is approximated by the presented algorithm we applied the two calculated DVF of a dataset to a landmark-set or a segmentation in sequence and compared the double deformed input to the original input. This procedure was carried out two times for each dataset. First by starting with the exhale/diastolic phase and in a second run by starting with the inhale/systolic phase. In an ideal case double deformed input and original input would be the same. To account for anatomically implausible deformations we calculated the Jacobi determinant for every DVF.

Iv-A2 4D registration evaluation

When registering 4D datasets it is important that the algorithm can track moving structures over all time-bins and takes care of the underlying motion pattern. In contrast to 3D registration the network has to combine multiple small deformations instead of calculating one large deformation. We investigated how the registration error changes from time-bin to time-bin, whether it increases or decreases compared to 3D registration when focusing on the extreme phases only and how well the periodic motion pattern is reflected. This is done by calculating landmark distance, DSC, ASSD, HD for every single DVF from one time-bin to its successor (provided landmarks or contour are available) as well as for combinations of successive DVF. The evaluation scheme for one time-bin is defined as follows:

  1. set to

  2. deform landmarks/segmentations of a time-bin with and compare the result to landmarks/segmentations of time-bin

  3. if set to otherwise set to

  4. if is equal to terminate otherwise go to 2)

This scheme is repeated for all time-bins in a dataset and results in comparisons provided that landmarks/segmentations are available for all time-bins. If landmarks/segmentations are missing for a given time-bin the comparison is skipped. In an ideal case the consecutively deformed landmarks/segmentations of a given time-bin with all DVF of a dataset are equal to the original landmarks/segmentations of that given time-bin. Foldings were again detected by calculating the Jacobi determinant.

Iv-A3 Downsampling

Calculating a dense displacement field demands a lot of calculations and memory. To determine the gain of accuracy by the third layer of our network (which calculates a DVF without downsampling) we performed tests by using only the first two layers and evaluated performance and computational costs.

V Results

The first part of this section covers the analysis of the different input patch sizes. With the best performing patch size we then evaluated our algorithm for the registration of two 3D image datasets with a focus on accuracy, inverse calculation and comparison to other state-of-the-art algorithms. The third part deals with the evaluation of the 4D dataset registration results with an emphasis on accuracy and periodicity. In the last subsection results for the algorithm with only 2 layers are given.

V-a Patch size

Table I summarises the registration performance of our algorithm with different patch sizes per datasets. At a first glance the high computation time of our algorithm with a patch size of 48 voxles attracts attention. The higher the patch size the lower the computation time, which indicates that additional computations for copying data to and from the GPU and assembling the deformation field are an important factor. For the 3D test cases the worst results were given also with Patch48. Especially for the lung datasets a big difference to the other settings could be seen. A deeper inspections showed that the higher the initial registration error the higher is the difference to the better performing methods. Thus, we assume that a patch size of 48 voxels is already too small to gather enough information to account for large deformations. Therefore, Patch48 was disregarded in the remainder. The results for patch sizes of 64, 80 and 96 voxels look similar, which is also confirmed by a statistical analysis. Only for the datasets Sunnybrook3D, Popi4D and Dirlab4D the best performing algorithm yielded a statistically better performance compared to the others. The algorithm with a patch size of 80 voxels yielded best results in two out of these three test cases. Therefore, in further analyses and comparisons, we only report the results of our algorithm with a patch size of 80 voxels.

Method Dirlab3D (mm)* Popi3D (mm)* Sunnybrook3D (DSC)** Dirlab4D (mm)** Popi4D (mm)* Sunnybrook4D (DSC)** Computation Time (min)
Patch48 1.92 2.35 1.77 2.03 0.83 0.12 1.99 1.78 1.75 1.36 0.87 0.10 44 50
Patch64 1.77 2.07 1.50 1.67 0.85 0.11 1.91 1.58 1.73 1.24 0.89 0.09 31 33
Patch80 1.83 2.35 1.49 1.59 0.86 0.10 1.95 1.64 1.71 1.32 0.87 0.13 28 29
Patch96 1.86 2.08 1.44 1.50 0.84 0.12 1.99 1.57 1.77 1.30 0.87 0.15 26 26
Table I: This table compares the registration performance of our algorithm with four different patch size settings ranging from 484848 voels to 969696 voxels. The last column shows the computation time averaged over 3D and 4D datasets. * Landmark distance; ** Sørensen-Dice index

V-B 3D registration evaluation

In table II the results for the 3D registration evaluation can be seen. The average landmark distance between maximum inhale and exhale phase of the lung datasets could be reduced from an initial value of 8.33 mm to 1.70 mm for both, maximum inspiration to expiration phase and vice versa. DSC could be increased to a value of 0.86 and HD and ASSD could be decreased to values of 11.73 and 1.72, respectively with DVF calculated between end-diastolic and end-systolic phase of the cardiac cine MRI datasets. The analysis of the ability to calculate the inverse transformation showed an average landmark distance of 0.89 0.42 mm comparing the original landmark set with the two times deformed landmark sets of the lung datasets. The same analysis for the Sunnybrook datasets showed DSC, HD and ASSD values of 0.97 0.02, 6.17 4.27 mm and 0.36 0.25 mm, respectively. The large difference of HD and ASSD can be explained with the big slice thickness of several millimetres.

For the cardiac datasets the DVF for mapping the systolic phase to diastolic phase showed better results (DSC: 0.90, HD: 10.35 mm, ASSD: 1.49 mm) compared to the registration from diastolic to systolic phase (DSC: 0.81, HD: 13.12 mm, ASSD: 1.95 mm). We reduce this direction-dependent behaviour to the resampling of the Sunnybrook datasets. In the diastolic images the border of the LV segmentation is in the area of a high gradient in the deformation field where small and large deformations meet. Due to the nearest neighbour interpolation used for resampling the segmentation masks, the mask for interpolated slices is either too small or too large with respect to the linearly interpolated image slices. An example is shown in Fig.

3. However, to exclude an impact of the networks design we repeated the registration with switched input channels. The repeated experiments showed similar results and confirm the more accurate evaluation results from systolic to diastolic phase. Another point that contradicts a methodological failure is that for the lung datasets the better performance was distributed equally.

a) x,y-plane of SC-HYP-38 b) SC-HYP-38 along z-axis

Figure 3: In this figure the result for registering extreme systolic to diastolic phase for case SC-HYP-38 is depicted. The spikes in z-direction of the calculated contour (red) in b) and the closeness of the input contour (green) to tissue with a lower motion amplitude are an indicator that a reason for the poor registration result lies in the resampling along the z-asix and the nearest neighbour interpolation.

Table III gives results of our proposed algorithm alongside those reported in the literature for DL based and conventional registration algorithms. Summarised, our algorithm is able to keep up with the results in the literature. One has to note that in the literature the results are stated only for one registration direction. As it was not always clear which registration direction the authors reported, we averaged our results over both directions.

Lung Datasets Before Registration Inhale to Exhale Exhale to Inhale
Dataset Dimensions Time- bins Voxelsize (mm) LD (mm) LD (mm) LD (mm)
DirLab01 256 x 256 x 94 10 0.97 x 0.97 x 2.50 3.89 2.78 1.26 1.06 1.16 0.65
DirLab02 256 x 256 x 112 10 1.16 x 1.16 x 2.50 4.34 3.90 1.14 0.71 1.13 0.59
DirLab03 256 x 256 x 104 10 1.15 x 1.15 x 2.50 6.94 4.05 1.31 0.85 1.33 0.79
DirLab04 256 x 256 x 99 10 1.13 x 1.13 x 2.50 9.83 4.88 1.81 1.78 1.87 1.73
DirLab05 256 x 256 x 106 10 1.10 x 1.10 x 2.50 7.47 5.5 1.85 1.81 1.75 1.35
DirLab06 512 x 512 x 128 10 0.97 x 0.97 x 2.50 10.89 6.96 2.52 4.90 2.09 2.13
DirLab07 512 x 512 x 136 10 0.97 x 0.97 x 2.50 11.02 7.42 1.84 1.66 1.98 1.64
DirLab08 512 x 512 x 128 10 0.97 x 0.97 x 2.50 14.99 9.00 3.36 4.75 3.57 5.24
DirLab09 512 x 512 x 128 10 0.97 x 0.97 x 2.50 7.92 3.97 1.44 0.84 1.50 0.85
DirLab10 512 x 512 x 120 10 0.97 x 0.97 x 2.50 7.30 6.34 1.79 2.52 1.80 1.93
Popi01 512 x 512 x 141 10 0.98 x 0.98 x 2.00 5.90 2.73 1.04 0.46 1.15 0.84
Popi02 512 x 512 x 169 10 0.98 x 0.98 x 2.00 14.04 7.20 2.51 2.89 2.90 3.62
Popi03 512 x 512 x 170 10 0.88 x 0.88 x 2.00 7.67 5.05 1.43 1.52 1.38 1.56
Popi04 512 x 512 x 187 10 0.78 x 0.78 x 2.00 7.33 4.89 1.25 1.91 1.10 1.74
Popi05 512 x 512 x 139 10 1.17 x 1.17 x 2.00 7.09 5.08 1.32 1.01 1.27 0.92
Popi06 512 x 512 x 161 10 1.17 x 1.17 x 2.00 6.68 3.68 1.29 1.01 1.25 0.88
8.33 8.86 1.70 2.26 1.70 2.04
Cardiac Datasets Before Registration Systolic to Diastolic Diastolic to Systolic
Dataset Dimensions Time- bins Voxelsize (mm) DSC HD (mm) ASSD (mm) DSC HD (mm) ASSD (mm) DSC HD (mm) ASSD (mm)
SC-HF-I-1 256 x 256 x 12 20 1.37 x 1.37 x 10.00 0.84 14.73 2.05 0.93 8.08 1.40 0.90 12.00 1.58
SC-HF-I-2 256 x 256 x 13 20 1.37 x 1.37 x 10.00 0.84 12.11 1.94 0.94 6.71 1.27 0.93 7.50 1.26
SC-HF-I-4 256 x 256 x 10 20 1.29 x 1.29 x 8.00 0.87 13.05 1.85 0.96 6.00 0.86 0.94 6.71 1.04
SC-HF-I-40 256 x 256 x 11 20 1.37 x 1.37 x 8.00 0.73 16.46 2.95 0.94 8.75 0.96 0.87 12.73 1.56
SC-HF-NI-3 256 x 256 x 12 20 1.37 x 1.37 x 8.00 0.90 11.68 1.25 0.96 4.74 0.82 0.94 7.65 0.99
SC-HF-NI-4 256 x 256 x 11 20 1.37 x 1.37 x 8.00 0.86 9.40 1.96 0.93 7.50 1.21 0.89 15.00 1.80
SC-HF-NI-34 256 x 256 x 13 20 1.37 x 1.37 x 10.00 0.78 12.30 2.17 0.92 10.50 1.41 0.89 13.08 1.57
SC-HF-NI-36 256 x 256 x 10 20 1.21 x 1.21 x 10.00 0.78 12.30 2.17 0.95 20.00 0.59 0.93 11.11 0.84
SC-HYP-1 256 x 256 x 9 20 1.37 x 1.37 x 10.00 0.53 20.05 5.13 0.84 15.00 2.48 0.68 16.50 3.02
SC-HYP-3 256 x 256 x 8 20 1.37 x 1.37 x 10.00 0.32 19.91 8.94 0.89 10.50 1.24 0.71 13.50 2.17
SC-HYP-38 256 x 256 x 12 20 1.37 x 1.37 x 10.00 0.71 12.60 2.47 0.88 7.79 1.68 0.61 16.84 3.02
SC-HYP-40 256 x 256 x 16 20 1.37 x 1.37 x 8.00 0.55 20.14 6.44 0.88 6.71 1.55 0.78 11.52 1.95
SC-N-2 256 x 256 x 9 20 1.37 x 1.37 x 8.00 0.54 21.96 6.98 0.93 6.00 0.90 0.78 10.06 1.74
SC-N-3 256 x 256 x 20 20 1.37 x 1.37 x 8.00 0.62 20.62 4.22 0.76 21.53 3.33 0.67 25.01 3.75
SC-N-40 256 x 256 x 12 20 1.25 x 1.25 x 10.00 0.56 17.29 4.72 0.80 15.37 2.63 0.66 17.62 3.02
0.70 15.64 3.68 0.90 10.35 1.49 0.81 13.12 1.95
0.17 4.07 2.30 0.06 5.23 0.77 0.12 4.71 0.87
Table II: In this table image details of the datasets used or testing are listed in addition with the initial registration error and the results of the 3D baseline registration evaluation tests. For the lung datasets the average landmark distance (LD) standard deviation is given for maximum exhalation to inhalation phase registration and vice versa, average Sørensen-Dice index (DSC), Hausdorff distance (HD) and average symmetric surface distance (ASSD) standard deviation of the contours for end-diastolic and end-systolic phase registrations can be seen for the cardiac MRI datasets.
Dataset Proposed De Voss 2019 [11] Eppenhof 2018 [8] Eppenhof 2018 [9] Vishnevskiy 2017 [33] Dataset Proposed Vandemeulebroucke 2011 [19]
DirLab01 1.21 0.88 1.27 1.16 - 1.65 0.89 0.76 - Popi01 1.09 0.68 0.96 0.57
DirLab02 1.13 0.65 1.20 1.12 1.24 0.61 2.26 1.16 0.77 - Popi02 2.71 3.28 1.56 1.34
DirLab03 1.32 0.82 1.48 1.26 - 3.15 1.63 0.90 - Popi03 1.40 1.54 1.53 1.70
DirLab04 1.84 1.76 2.09 1.93 1.70 1.00 4.24 2.69 1.24 - Popi04 1.17 1.83 1.96 2.92
DirLab05 1.80 1.60 1.95 2.10 - 3.52 2.23 1.12 - Popi05 1.30 0.97 1.48 1.39
DirLab06 2.30 3.78 5.16 7.09 - 3.19 1.50 0.8 - Popi06 1.27 0.95 1.25 0.95
DirLab07 1.91 1.65 3.05 3.01 - 4.25 2.08 0.80 - Average 1.49 1.59 1.46 1.65
DirLab08 3.47 5.00 6.48 5.37 - 9.03 5.08 1.34 - Proposed De Voss 2019 [11]
DirLab09 1.47 0.85 2.10 1.66 1.61 0.82 3.85 1.86 0.92 -
DirLab10 1.79 2.24 2.09 2.24 - 5.07 2.31 0.82 - Sunnybrook 0.86 0.10 0.89 0.14
Average 1.83 2.35 2.64 4.32 1.52 0.85 4.02 3.08 0.95 - (cardiac)
Table III: Comparison of registration results between our proposed registration algorithm and those reported in the literature. The algorithms by De Voss et al. [11], Eppenhof et al. [8, 9] use deep learning based methods, the algorithms by Vishnevskiy et al. [33] and Vandemeulebroucke et al. [19] are conventional registration algorithms. For the DirLab and Popi datasets the table shows the average landmark distances in mm, for the Sunnybrook dataset the average Sørensen-Dice index is given. As the presented algorithm calculates the vector fields from one image to another and the inverse at the same time the results of our algorithm are averaged over both deformations. In the literature the results are reported only for one direction.

Analysis of the DVF with respect to anatomically implausible deformations showed that the FoF per image dataset was on average 0.25 % of the calculated deformation vectors. The average Jacobi determinant value was 1.00 0.32. A deeper inspection of the Jacobi determinant values revealed that folding occurs mainly in the area of large deformations close to the image or patch border where tissue can move out of or into the field of view.

V-C 4D Evaluation

The initial average landmark distance between consecutive time-bins was 2.25 1.55 mm for the Popi datasets and 2.19 2.02 mm for the DirLab datasets. The proposed registration algorithm could decrease the average distance to 1.24 0.90 mm and 1.54 1.31 mm, respectively. The construction of trajectories for voxels requires the combination of all DVF of a 4D dataset. As a consequence the registration error at the start of a trajectory gets propagated over time and increases or decreases with the combination of the different DVF. In Fig. 4 the error evolution is visualised for the three test datasets. At the beginning the mean registration error is 1.54 mm for the DirLab datasets and 1.24 mm for the Popi datasets. The error then increases to a mean of 2.58 mm for DirLab and 2.07 mm for Popi in the middle of the periodic motion. In the end of the breathing cycle the error falls to an error around 1 mm for both datasets. A similar error propagation pattern could be measured for the Sunnybrook datasets. For the Sunnybrook datasets segmentations are only available for the extreme phases, which allows only four comparisons per dataset. When starting with the end-diastolic phase a comparison in the middle of the cyclic motion to the end-systolic segmentation and in the end of the cycle motion to the original end-diastolic segmentation is possible. The same applies when starting with the end-systolic segmentation. In Fig. 4 the red dots indicate the measured DSC overlap, the red dotted line was fitted to the dots and estimates the DSC overlap for time-bins without a segmentation available. Compared to a direct registration of the maximum exhale and inhale phases the registration error increased on average by 0.48 mm with the combination of the DVF. The average registration error for the extreme phases in the Popi dataset increased from 1.46 1.59 mm to 1.98 1.56 mm when we registered the whole 4D dataset instead of the two extreme phases solely. The same analysis reported an increase from 1.83 2.35 mm to 2.54 2.01 mm for the DirLab images and and decrease in the DSC from 0.86 0.10 to 0.71 0.17 for the Sunnybrook data. The DVF of the 4D evaluation showed a FoF of 0.02 % and an average Jacobi determinant of 1.00 0.06.

a) before registration
b) after registration

Figure 4: In this figure the development of the registration error over time is plotted for the different 4D datasets before a) and after registration b). In the beginning the error is low, then due to large deformations and the combination of deformation vector fields the error increases towards the middle of a time series until it decreases again because of the periodicity constraint. For the cardiac Sunnybrook datasets only segmentations for the 2 extreme phases are available, therefore we interpolated the remaining values (red line). The magenta filled area indicates the average registration error for the Popi and DirLab datasets standard deviation.

V-D Impact Downsampling

When we used only the first two layers of our network for registration the performance decreased slightly but the gain in computation time was huge. The average registration error for registering 3D respiratory images was 1.84 1.88 mm compared to 1.70 2.15 mm with all 3 layers. The computation time for two 3D images was on average 4 minutes, for 4D datasets the average time needed for registration increased to 8 minutes. The peak in error propagation increased to an average value of 2.32 1.55 mm, 3.08 2.27 mm and 0.75 0.14 DSC for the Popi, DirLab and Sunnybrook datasets, respectively.

a) DirLab08 deformation grid for inhale to exhale registration b) DirLab08 with overlayed image foldings (Jacobi determinant 0) c) DirLab08 registration error

d) SC-N-3 deformation grid for diastolic to systolic registration e) SC-N-3 with overlayed image foldings (Jacobi determinant 0) f) SC-N-3 contours (green: input, blue: target, red: result)

Figure 5: This figures show an analysis for two test cases that resulted in the worst results for 3D registration evaluation. The first row shows the extreme inhale phase of dataset Dirlab08 with overlayed deformation grid (a), Jacobi determinant (b) and registration error (c). The patient had a very strong sheering motion along the ribcage in caudal direction which could not be fully captured by our network. The second row shows the extreme diastolic phase for the dataset SC-N-3 with overlayed deformation grid (d) and Jacobi determinant (e) and the extreme systolic phase with contours of the left ventricle (f).

Vi Discussion

We proposed a new method to combine DL and classical medical image registration techniques to track organ motion on 3D and 4D medical image datasets. Compared to other methods our approach offers three main benefits. First, no training data is needed to achieve good results. Second, when the algorithm is applied to register two 3D datasets, it calculates the DVF and an approximation of its inverse simultaneously. Third, we demonstrated that our algorithm achieves a good performance independent of organ site or modality.

We showed that our algorithm can calculate the motion between two 3D images and adequately approximate the inverse. Also the periodic motion of the beating heart or the respiratory cycle could be reflected by our algorithm. When we compared the direct registration of the two extreme phases of a dataset with the registration of the whole 4D dataset, an increase of the registration error at the extreme phases could be seen. In our work we decided to register each image to its directly adjacent timely neighbour. Another approach would be to register each time-bin to a reference image. The reasons why we decided against the registration with a reference image are that our approach is more intuitive and directly calculates a trajectory for each voxel. Moreover we made the experience in the beginning of the work that the network is better in registering images with small deformations than in registering images with large deformations.

By evaluating the proposed algorithm with multiple 3D and 4D publicly available datasets we were able to show that the benefits of a DL architecture can be used without training data to achieve results as good as those reported in the literature with training data available. However, the results showed also that our algorithm is outperformed in 3D registration tasks by conventional image registration algorithms. One has to take into account that these algorithms make use of motion specific parameters and weights [33] whereas our approach is a more generic one without the need of motion, modality or organ specific information for registering two 3D images (only for 4D we assume periodicity).

The approach we used in this work is unsupervised but before a registration problem can be solved the influence of the different parts of the objective function needs to be tuned. This task requires expert knowledge and has a big impact on the accuracy. Due to simplicity we used one parameter set for all datasets. Although the general performance of our algorithm is very promising we made the experience that a different parameter setting would have increased the performance for some datasets. Here supervised methods definitely have an advantage.

One drawback of the presented registration algorithm is the computation time. Compared to DL based registration methods with an extra training phase the actual time needed to register one dataset is rather long (even with only 2 layers).

On grounds of our analysis we see two applications setting for our algorithm: When the training of network is not possible due to a lack of datasets or very heterogeneous datasets, which is often the case in small mono-centric studies in the field of radiotherapy that investigate a specific organ motion for treatment planning. Such studies usually have a low patient number as the acquisition of 4D images is time consuming, costly and implicates a higher radiation exposure in case of 4D CT. A second application scenario would be e.g. in the early stage of a study with a larger patient cohort. Datasets could be registered with our one shot approach until enough datasets are available to allow for an extra training phase. Subsequently the same architecture used for one shot registration could be trained on the existing datasets to enable a faster registration of upcoming datasets with similar accuracy. In a lifelong learning fashion [34], the network would learn over time and increase its performance with additional datasets.

In our future work a focus will be on an easier optimization of the objective function parameters. Especially in the case of a deep architecture the tuning of the meta parameters is very time consuming and unintuitive as it is hard to comprehend how a change in the parameters influences the network weights and the registration performance. Depending on the registration task at hand, different parameters are needed. For datasets with a large amount of shearing motion a lower weighting of the smoothness term could increase the registration quality. On the other hand for images with a homogeneous motion but with a lot of noise (like in some ultrasound images) a bigger influence of the smoothness term might improve the registration result. The use of a minimum spanning tree approach [35] in the loss calculation to tackle sheering motion or the classification of an image into motion classes could be possible options.

Vii Conclusion

In this paper we presented a DL based algorithm to calculate DVF for periodic motion tracking with 3D and 4D medical image datasets without the need of any training phase. For 3D datasets the algorithm is able to calculate forward and inverse transformation simultaneously. The algorithm was tested thoroughly with multiple public datasets. Compared to existing state of the art algorithms, our approach showed a competitive performance. The main benefits of our algorithm are that it is generic and can be used for different modalities and body regions and that no training data is needed in advance. The promising results suggest a possible utilization in two different setting. First, as generic stand-alone algorithm for conventional image registration and motion tracking in e.g. radiotherapy treatment planning. Second, for studies focused on a specific organ and image modality the algorithm could be used without training phase until enough data is available and then be trained and thus specialised on a specific registration task to further increase registration accuracy.

Viii Acknowledgements

We are thanking Richard Castillo, the Sunnybrook Health Sciences Centre as well as the Léon Bérard Cancer Center & CREATIS lab for sharing the datasets that we used in this work to evaluate the presented algorithm. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Disclosure of Conflicts of Interest: The authors have no relevant conflicts of interest to disclose.


  • [1] A. Sotiras, C. Davatzikos, and N. Paragios, “Deformable medical image registration: A survey,” IEEE Transactions on Medical Imaging, vol. 32, no. 7, pp. 1153–1190, jul 2013.
  • [2] A. P. Keszei, B. Berkels, and T. M. Deserno, “Survey of non-rigid registration tools in medicine,” Journal of Digital Imaging, vol. 30, no. 1, pp. 102–116, oct 2016.
  • [3] S. Oh and S. Kim, “Deformable image registration in radiation therapy,” Radiation Oncology Journal, vol. 35, no. 2, pp. 101–111, jun 2017.
  • [4]

    M. Jaderberg, K. Simonyan, A. Zisserman, and k. kavukcuoglu, “Spatial transformer networks,” in

    Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds.   Curran Associates, Inc., 2015, pp. 2017–2025.
  • [5] Y. Hu, M. Modat, E. Gibson, W. Li, N. Ghavami, E. Bonmati, G. Wang, S. Bandula, C. M. Moore, M. Emberton et al., “Weakly-supervised convolutional neural networks for multimodal image registration,” Medical image analysis, vol. 49, pp. 1–13, 2018.
  • [6] X. Yang, R. Kwitt, M. Styner, and M. Niethammer, “Quicksilver: Fast predictive image registration – a deep learning approach,” NeuroImage, vol. 158, pp. 378 – 396, 2017.
  • [7] J. Fan, X. Cao, P.-T. Yap, and D. Shen, “Birnet: Brain image registration using dual-supervised fully convolutional networks.”
  • [8]

    K. A. J. Eppenhof and J. P. W. Pluim, “Pulmonary CT registration through supervised learning with convolutional neural networks,”

    IEEE Transactions on Medical Imaging, pp. 1–1, 2018.
  • [9] Deformable image registration using convolutional neural networks, vol. 10574, 2018.
  • [10] B. D. de Vos, F. F. Berendsen, M. A. Viergever, M. Staring, and I. Išgum, “End-to-end unsupervised deformable image registration with a convolutional neural network,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support.   Springer, 2017, pp. 204–212.
  • [11] B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Išgum, “A deep learning framework for unsupervised affine and deformable image registration,” Medical image analysis, vol. 52, pp. 128–143, 2019.
  • [12]

    A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu, “Unsupervised learning for fast probabilistic diffeomorphic registration,”

    Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Jan. 2018.
  • [13] J. Krebs, H. e Delingette, B. Mailhe, N. Ayache, and T. Mansi, “Learning a probabilistic model for diffeomorphic registration,” IEEE Transactions on Medical Imaging, pp. 1–1, 2019.
  • [14] H. Li and Y. Fan, “Non-rigid image registration using self-supervised fully convolutional networks without training data,” CoRR, vol. abs/1801.04012, 2018.
  • [15] E. Ferrante, O. Oktay, B. Glocker, and D. H. Milone, “On the adaptability of unsupervised cnn-based deformable image registration to unseen image domains,” Machine Learning in Medical Imaging, jan 2018.
  • [16] A. Maciejczyk, I. Skrzypczyńska, and M. Janiszewska, “Lung cancer. radiotherapy in lung cancer: Actual methods and future trends,” Reports of practical oncology and radiotherapy : journal of Greatpoland Cancer Center in Poznan and Polish Society of Radiation Oncology, vol. 19, pp. 353–360, Nov. 2014.
  • [17] O. Oehlke, M. Mix, E. Graf, T. Schimek-Jasch, U. Nestle, I. Götz, S. Schneider-Fuchs, A. Weyerbrock, I. Mader, B. G. Baumert, S. C. Short, P. T. Meyer, W. A. Weber, and A.-L. Grosu, “Amino-acid pet versus mri guided re-irradiation in patients with recurrent glioblastoma multiforme (gliaa) - protocol of a randomized phase ii trial (noa 10/aro 2013-1).” BMC cancer, vol. 16, p. 769, Oct. 2016.
  • [18] H. Sundar, H. Litt, and D. Shen, “Estimating myocardial motion by 4d image warping.” Pattern recognition, vol. 42, pp. 2514–2526, Nov. 2009.
  • [19] J. Vandemeulebroucke, S. Rit, J. Kybic, P. Clarysse, and D. Sarrut, “Spatiotemporal motion estimation for respiratory-correlated imaging of the lungs,” Medical physics, vol. 38, no. 1, pp. 166–178, 2011.
  • [20] J.-M. Peyrat, H. Delingette, M. Sermesant, X. Pennec, C. Xu, and N. Ayache, “Registration of 4d time-series of cardiac images with multichannel diffeomorphic demons.” Medical image computing and computer-assisted intervention : MICCAI … International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 11, pp. 972–979, 2008.
  • [21] J.-M. Peyrat, H. Delingette, M. Sermesant, C. Xu, and N. Ayache, “Registration of 4d cardiac ct sequences under trajectory constraints with multichannel diffeomorphic demons.” IEEE transactions on medical imaging, vol. 29, pp. 1351–1368, Jul. 2010.
  • [22]

    G. Wu, Q. Wang, J. Lian, and D. Shen, “Estimating the 4d respiratory lung motion by spatiotemporal registration and super-resolution image reconstruction,”

    Medical Physics, vol. 40, no. 3, p. 031710, feb 2013.
  • [23] R. Castillo, E. Castillo, R. Guerra, V. E. Johnson, T. McPhail, A. K. Garg, and T. Guerrero, “A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets,” Physics in Medicine & Biology, vol. 54, no. 7, p. 1849, 2009.
  • [24] E. Castillo, R. Castillo, J. Martinez, M. Shenoy, and T. Guerrero, “Four-dimensional deformable image registration using trajectory modeling,” Physics in Medicine & Biology, vol. 55, no. 1, p. 305, 2009.
  • [25] P. Radau, Y. Lu, K. Connelly, G. Paul, A. Dick, and G. Wright, “Evaluation framework for algorithms segmenting short axis cardiac mri,” The MIDAS Journal-Cardiac MR Left Ventricle Segmentation Challenge, vol. 49, 2009.
  • [26] “Cardiac mr left ventricle segmentation challenge,”, accessed: 2019-04-25.
  • [27] T. Fechter, J. Dolz, U. Nestle, and D. Baltas, “Ep-1417: Clinical evaluation of a fully automatic body delineation algorithm for radiotherapy,” Radiotherapy and Oncology, vol. 123, pp. S757 – S758, 2017, eSTRO 36, May 5-9, 2017, Vienna, Austria.
  • [28] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds.   Cham: Springer International Publishing, 2015, pp. 234–241.
  • [29] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning.”
  • [30] J. A. Schnabel, D. Rueckert, M. Quist, J. M. Blackall, A. D. Castellano-Smith, T. Hartkens, G. P. Penney, W. A. Hall, H. Liu, C. L. Truwit, F. A. Gerritsen, D. L. G. Hill, and D. J. Hawkes, “A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2001, W. J. Niessen and M. A. Viergever, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2001, pp. 573–581.
  • [31] A. Roche, G. Malandain, and N. Ayache, “Unifying maximum likelihood approaches in medical image registration,” International Journal of Imaging Systems and Technology, vol. 11, no. 1, pp. 71–80, 2000.
  • [32] T. Sørensen, “A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on danish commons,” Biol. Skr., vol. 5, pp. 1–34, 1948.
  • [33] V. Vishnevskiy, T. Gass, G. Szekely, C. Tanner, and O. Goksel, “Isotropic total variation regularization of displacements in parametric image registration,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 385–395, feb 2017.
  • [34] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review.” Neural networks : the official journal of the International Neural Network Society, vol. 113, pp. 54–71, May 2019.
  • [35] M. P. Heinrich, M. Jenkinson, M. Brady, and J. A. Schnabel, “Mrf-based deformable registration and ventilation estimation of lung ct,” IEEE Transactions on Medical Imaging, vol. 32, no. 7, pp. 1239–1248, July 2013.