One Shot Deformable Medical Image Registration
Deformable image registration is a very important field of research in medical imaging. Recently multiple deep learning approaches were published in this area showing promising results. However, drawbacks of deep learning methods are the need for a large amount of training datasets and their inability to register unseen images different from the training datasets. One shot learning comes without the need of large training datasets and has already been proven to be applicable to 3D data. In this work we present an one shot registration approach for periodic motion tracking in 3D and 4D datasets. When applied to 3D dataset the algorithm calculates the inverse of a registration vector field simultaneously. For registration we employed a U-Net combined with a coarse to fine approach and a differential spatial transformer module. The algorithm was thoroughly tested with multiple 4D and 3D datasets publicly available. The results show that the presented approach is able to track periodic motion and to yield a competitive registration accuracy. Possible application are the use as a stand-alone algorithm for 3D and 4D motion tracking or in the beginning of studies until enough datasets for a separate training phase are available.READ FULL TEXT VIEW PDF
Image registration, the process of aligning two or more images, is the c...
The recent application of deep learning technologies in medical image
Deep learning-based methods have recently demonstrated promising results...
While deep learning has achieved significant advances in accuracy for me...
Deformable registration is ubiquitous in medical image analysis. Many
This paper presents a review of deep learning (DL) based medical image
Our anatomy is in constant motion. With modern MR imaging it is possible...
One Shot Deformable Medical Image Registration
Deformable image registration (DIR) is a technique to determine spatial non linear correspondences between two or more images and finds many important applications in the medical field. DIR can be used to detect anatomical changes within a patient (e.g. monitoring, breathing or heart motion) or to find correspondences among different patients (e.g. atlas based segmentation). Although, DIR has been an active research field for years, due to its complexity it remains a challenging problem. Some reasons that hamper the alignment of two or more images are: multiple image modalities, ill-posed high dimensional optimization, different types of motion, bad image contrast . These limitations require usually specialised algorithms tailored to a specific problem. Therefore, many innovative ideas have been proposed over the past few decades. Good overviews of the field can be found in [1, 2, 3].
With the rise of machine learning in the recent years deep learning (DL) based algorithms were presented to solve medical image registration problems. DL algorithms can learn relevant image features, the relations between features and how to translate the learned features into a deformation vector.
The best performing methods base on convolutional neural networks (CNNs). CNNs have an advantage in handling spatial information and are able to learn complex image features specific for the current registration problem.
One of the first approaches to align images with a CNN was the spatial transformer module , which was used to improve classification accuracy.
 presented a CNN to register magnetic resonance images to intra-procedural transrectal ultrasound images of prostate cancer patients. They trained their network with labelled image datasets but after completion of training the network is able to register unlabelled images.
Other supervised methods were presented by  who showed a network for predicting the momentum of large deformation diffeomorphic metric mapping model,  who presented a network for the registration of magnetic resonance images (MRI) of the brain and [8, 9] showed a method for pulmonary CT registration.
The aforementioned methods can be classified as supervised methods. They need example registrations[6, 7, 8] or manual segmentations  for training the neural networks. Generating these information is complex, time consuming and problem specific. Recently, unsupervised methods [10, 11, 12, 13, 14]
were presented. Similarly to conventional registration methods the authors of these methods find a solution to the registration problem at hand by optimizing an image similarity metric.
One drawback of the above mentioned supervised and unsupervised DL methods is that a lot of training data is needed to optimise the network weights. Training data has to be selected carefully as the datasets should cover future registration tasks as best as possible. Therefore these algorithms cannot be applied to unseen domains . Another machine learning approach that comes without the demand for big training datasets is one shot learning. Here the parameters of the network are trained from scratch only with the images to be registered. In  it has been shown that one shot learning can produce registration results comparable to state of the art methods for two dimensional image slices.
So far the presented CNN based registration methods can only generate a deformation field for two three dimensional (3D) image datasets. However, in some medical fields the application of four dimensional (4D
) data (3D + time) plays an important role. By way of example, organ or tumour movement due to respiratory motion is estimated by a 4D computed tomography (CT) scan for radiotherapy treatment planning  or the progress of a disease (e.g. tumour growth or shrinkage) during the course of treatment can be measured by follow up scans . To measure the growth of a tumour by means of a 4D dataset, DIR can be done by applying a 3D DIR algorithm to each 3D dataset and its temporal successor as the deformation of the tumour does not follow a specific pattern. Whereas, the motion of organs due to breathing or heartbeat is periodic and can not be covered by the multiple application of a 3D DIR algorithm.
For organ specific motion patterns  used Gaussian kernel to calculate temporal smooth deformation fields for cardiac cine MRI datasets. Cubic B-splines with a periodic constraint were used by  for lung motion estimation. A trajectory constraint for periodic motion is presented in [20, 21]. The authors presented a multi-channel Diffeomorphic Demons based algorithm to register two 4D cardiac time series. In  temporal coherence was ensured by smoothing temporal fibers. All these methods for the registration of 4D datasets rely on conventional image registration methods.
In this work we show a way how to use the strengths of CNNs to enable DIR of 4D periodic image datasets. We present an unsupervised CNN based DIR algorithm to calculate dense displacement fields for 4D image data with a periodic motion pattern. In case of 3D to 3D registration the algorithm calculates the deformation vector field (DVF) and an approximation of the inverse field simultaneously. Registration is performed in one shot, so no training data is necessary. We evaluated the algorithm on multiple image datasets publicly available and are able to show that its performance is equal or superior to state of the art DL registration algorithms.
The proposed algorithm was evaluated on 31 publicly available 4D datasets. Each of them having either manually set landmarks or manually drawn contours for registration evaluation. Registration results of the proposed algorithm were evaluated and compared to the results of other state of the art algorithms.
To evaluate the presented algorithm in the thoracic region the publicly available DirLab [23, 24] dataset was used. The dataset consists of 10 4D CTs of the thoracic region. Each 4D CT consists of 10 3D CTs and manually set reference landmarks in the lungs for registration evaluation. The full number of landmarks was identified only in the maximum inhalation and exhalation phase but in addition a subset of 75 features was marked on each of the expiratory phase images. In table II more details can be seen.
Another dataset for evaluating image registration algorithms is provided by the Léon Bérard Cancer Center & CREATIS lab, Lyon, France . Six 4D CTs consisting of 10 3D CTs each, are provided. The datasets show the lung region and contain landmarks for registration evaluation. For the first three patients landmarks were identified in each of the 10 3D CTs, for the remaining three patients landmarks were identified in the CTs depicting the maximum inhale and exhale phase. Dataset details can be found in table II.
To evaluate the performance of our algorithm on a different body site and modality we used the cardiac MRI datasets  provided by the Sunnybrook Health Sciences Centre. In particular, we used the 12 datasets that were also provided as training data for the MICCAI left ventricle segmentaion challenge . The datasets contain cine-MRI images from patients with different pathologies and expert-drawn contours for validation. Each datasets consists of 20 time-bins and expert drawn contours are available for end-diastolic and end-systolic phases. In table II details on the datasets are listed (the datasets are abbreviated as SC-*).
Let be a 4D medical image dataset consisting of 3D images (for conventional 3D to 3D DIR is 2). The aim of our deformable DIR algorithm is to find a transformation that maps each 3D image in to its timely adjacent successor (as we deal with periodic data is aligned to ). consist of 3D dense vector fields with 3 components (x-, y- and z-direction) each and describes the trajectory for each voxel in . is the sequential application of the deformation fields to to position . The deformation vector at a given position and time-bin is indicated as .
In this work we assume that the images are pre-registered using affine transformations. The task at hand is to calculate the deformable transformation that represents anatomically plausible spatial relationships between voxels in and takes care of the periodic character of the underlying motion.
To reduce the amount of data to be processed every dataset was divided into foreground and background region and only the foreground region was processed by the registration algorithm. For the Popi and DirLab datasets the foreground was defined as all voxels inside the body outline contour. The body outline contour was generated automatically with . For the Sunnybrook datasets the foreground was generated by discarding the largest connected region containing only voxels with intensity 0.
The Sunnybrook datasets have much smaller voxel spacing along the x- and y-axis than along the z-axis, which had a negative impact on the registration results. Therefore these datasets were resampled to a voxel size of 1.5 x 1.5 x 1.5 mm. For the other two dataset collections this step was neglected because the differences in voxel spacing along the axes were not as big as for the Sunnybrook datasets and preliminary tests showed that resampling the datasets leads to no significant change in registration performance.
In a last preprocessing step the foreground voxel intensities were normalised to zero mean and standard deviation of one.
Following the recently published works about DL based image registration [15, 10, 11, 12, 13, 14] we decided for a CNN based architecture. Similar to  we employed the U-net architecture  with 2 downsampling steps with average pooling layers, 2 upsampling steps with transposed convolution layers  and skip connections by summation instead of concatenation. The network takes or a patch of as input (each time-bin as a single channel) and calculates deformation fields with the network parameters . An illustration of all layers and connections can be found in Fig. 1.
The idea of one shot learning in image registration is to start with an untrained model and to optimise the parameters until convergence only with the one dataset to be registered .
The large amount of memory which is needed for 4D image data, the corresponding deformation fields and the network parameters prevented to process datasets as a whole. Therefore, before processing an input dataset it is split into smaller non overlapping patches. Splitting is performed along the spatial dimensions only. The network is then trained till convergence with each patch. After termination of training the computed vector field patches are assembled.
Patch based processing in combination with the networks receptive field ( voxels) inhibits the presented network with the original image as input to account for large deformations (occurring e.g. in the diaphragmatic region). Thus we adopted a coarse-to-fine approach like it is often used in conventional image registration . Our multi-resolution approach consists of 3 image registration steps. First, basic deformations are calculated with the input image downsampled by a factor of 4. This step is repeated with the input image downsampled by a factor of 2 to capture intermediate deformations. In a last step the original image is fed to the network. The calculated vector fields of a respective downsample step are added to the upsampled vector fields of previous steps before the loss is calculated. Thereby, the net first calculates a basic deformation field and adds more and more details after every upsampling step. The whole multi-resolution architecture is depicted in Fig. 1.
To find we optimize
to minimise a loss function. The value of indicates the quality of the deformation field and how well aligns the images in . In this work is defined as:
and consists of 3 parts: an image dissimilarity metric , a regularization constraint for smoothness and to prevent folding as well as a cyclic constraint to encourage periodic deformations. and are weighting terms to control the influence of the regularisation terms. For the negative normalized cross correlation was adopted as it is known to perform well for mono modal image registration . was chosen to be the l2-norm derivatives of the deformation field in combination with a boundary smoothness constraint to guarantee a homogeneous deformation field also at patch transitions:
with being the set of all 3D voxel positions, the number of time-bins, the set of voxel positions at the border of an image patch and the set of neighbouring voxel positions along a given axis. For a better understanding we visualised the sets and in Fig. 2. controls the influence of the boundary smoothness constraint and calculates the Euclidean distance between positions and .
The periodic constraint was defined as the sum of all deformation vectors along the trajectory of a voxel:
The presented algorithm was implemented with pytorch v1.0.1 and is provided on GitHub111https://github.com/ToFec/OneShotImageRegistration. Computations were done on a PC with a 8 core Intel Xeon CPU, 60 GB memory and a Nvidia Titan XP GPU. In our test setting the following parameter values were used: : , : , : zero in the first multi-resolution step and 0.1 for the following steps. We assumed convergence when the current value of the loss function (1) and the moving average of previous loss values do not differ by more than . The parameter values were determined on one test dataset (DirLab08) and left unchanged for all test cases.
For the datasets used in this study either manually set landmarks or contours were provided for registration quality assessment. To evalute the datasets with provided landmarks we calculated the distance between corresponding landmarks before and after registration. To evaluate datasets with manually drawn contours we made use of 3 methods usually used in the field of image segmentation. First, we investigated volume similarities by the Sørensen-Dice index (DSC) . Volume-based metrics alone might miss clinically relevant differences between contours as they show a lower sensitivity to errors where outlines deviate and the volume of the erroneous region is small compared to the total volume. Thus, we considered also distance-based metrics like the Hausdorff distance (HD) and the average symmetric surface distance (ASSD).
To detect anatomically implausible deformations we calculated the determinant of the Jacobian matrix for every point in a deformation vector field. The value of the determinant gives information about how the deformation changes the volume at a given point. A value of 1 indicates expansion, a value of 1 indicates no volume change, a value between 0 and 1 indicates shrinkage and a value 0 indicates image folding. For reasons of simplification we state only the mean of the determinant values standard deviation as well as the fraction of foldings (FoF) per image.
Statistical analysis was performed with the Wilcoxon signed‐rank test. This test, which has for null hypothesis that the median group difference is zero, was chosen due to non‐normal distribution and heteroscedasticity of the data. In our experiments, the confidence alpha was set to 1%.
We evaluated the presented algorithm in two different ways for each 4D dataset. First, we were interested in the general performance of the presented algorithm by registering only the two images of each dataset depicting extreme phases (maximum inhale/exhale, end-systolic/diastolic). In a second step we evaluated the ability of the presented algorithm to conceive periodic deformations over all time-bins. The size of the patches sequentially processed by the network may have an influence on the results. A patch size too small could hamper the algorithm from capturing large deformations and has a large computational overhead by copying data to and from the GPU. A large patch size can prevent registration by running out of memory. To measure the impact of the patch size on the results we conducted the two evaluation ways mentioned before with patches of size 484848 (Patch48), 646464 (Patch64), 808080 (Patch80) and 969696 voxels (Patch96). With a patch size of 96 voxels we were running out of memory for some 4D cases, therefore larger patch sizes were not evaluated. In a last test setting the impact of the downsampling layers is analysed.
For several datasets at hand landmarks or segmentations are only available for the two extreme phases. To analyse the registration accuracy and the ability of our algorithm to calculate a DVF and its inverse simultaneously just these two phases were employed. The registration of only two phases resembles a conventional 3D registration task and facilitates also a comparison to published results in the literature. The figures of merit mentioned in section IV were calculated for the DVF from exhale/diastolic phase to inhale/systolic phase and vice versa. To measure how well the inverse DVF is approximated by the presented algorithm we applied the two calculated DVF of a dataset to a landmark-set or a segmentation in sequence and compared the double deformed input to the original input. This procedure was carried out two times for each dataset. First by starting with the exhale/diastolic phase and in a second run by starting with the inhale/systolic phase. In an ideal case double deformed input and original input would be the same. To account for anatomically implausible deformations we calculated the Jacobi determinant for every DVF.
When registering 4D datasets it is important that the algorithm can track moving structures over all time-bins and takes care of the underlying motion pattern. In contrast to 3D registration the network has to combine multiple small deformations instead of calculating one large deformation. We investigated how the registration error changes from time-bin to time-bin, whether it increases or decreases compared to 3D registration when focusing on the extreme phases only and how well the periodic motion pattern is reflected. This is done by calculating landmark distance, DSC, ASSD, HD for every single DVF from one time-bin to its successor (provided landmarks or contour are available) as well as for combinations of successive DVF. The evaluation scheme for one time-bin is defined as follows:
deform landmarks/segmentations of a time-bin with and compare the result to landmarks/segmentations of time-bin
if set to otherwise set to
if is equal to terminate otherwise go to 2)
This scheme is repeated for all time-bins in a dataset and results in comparisons provided that landmarks/segmentations are available for all time-bins. If landmarks/segmentations are missing for a given time-bin the comparison is skipped. In an ideal case the consecutively deformed landmarks/segmentations of a given time-bin with all DVF of a dataset are equal to the original landmarks/segmentations of that given time-bin. Foldings were again detected by calculating the Jacobi determinant.
Calculating a dense displacement field demands a lot of calculations and memory. To determine the gain of accuracy by the third layer of our network (which calculates a DVF without downsampling) we performed tests by using only the first two layers and evaluated performance and computational costs.
The first part of this section covers the analysis of the different input patch sizes. With the best performing patch size we then evaluated our algorithm for the registration of two 3D image datasets with a focus on accuracy, inverse calculation and comparison to other state-of-the-art algorithms. The third part deals with the evaluation of the 4D dataset registration results with an emphasis on accuracy and periodicity. In the last subsection results for the algorithm with only 2 layers are given.
Table I summarises the registration performance of our algorithm with different patch sizes per datasets. At a first glance the high computation time of our algorithm with a patch size of 48 voxles attracts attention. The higher the patch size the lower the computation time, which indicates that additional computations for copying data to and from the GPU and assembling the deformation field are an important factor. For the 3D test cases the worst results were given also with Patch48. Especially for the lung datasets a big difference to the other settings could be seen. A deeper inspections showed that the higher the initial registration error the higher is the difference to the better performing methods. Thus, we assume that a patch size of 48 voxels is already too small to gather enough information to account for large deformations. Therefore, Patch48 was disregarded in the remainder. The results for patch sizes of 64, 80 and 96 voxels look similar, which is also confirmed by a statistical analysis. Only for the datasets Sunnybrook3D, Popi4D and Dirlab4D the best performing algorithm yielded a statistically better performance compared to the others. The algorithm with a patch size of 80 voxels yielded best results in two out of these three test cases. Therefore, in further analyses and comparisons, we only report the results of our algorithm with a patch size of 80 voxels.
|Method||Dirlab3D (mm)*||Popi3D (mm)*||Sunnybrook3D (DSC)**||Dirlab4D (mm)**||Popi4D (mm)*||Sunnybrook4D (DSC)**||Computation Time (min)|
|Patch48||1.92 2.35||1.77 2.03||0.83 0.12||1.99 1.78||1.75 1.36||0.87 0.10||44 50|
|Patch64||1.77 2.07||1.50 1.67||0.85 0.11||1.91 1.58||1.73 1.24||0.89 0.09||31 33|
|Patch80||1.83 2.35||1.49 1.59||0.86 0.10||1.95 1.64||1.71 1.32||0.87 0.13||28 29|
|Patch96||1.86 2.08||1.44 1.50||0.84 0.12||1.99 1.57||1.77 1.30||0.87 0.15||26 26|
In table II the results for the 3D registration evaluation can be seen. The average landmark distance between maximum inhale and exhale phase of the lung datasets could be reduced from an initial value of 8.33 mm to 1.70 mm for both, maximum inspiration to expiration phase and vice versa. DSC could be increased to a value of 0.86 and HD and ASSD could be decreased to values of 11.73 and 1.72, respectively with DVF calculated between end-diastolic and end-systolic phase of the cardiac cine MRI datasets. The analysis of the ability to calculate the inverse transformation showed an average landmark distance of 0.89 0.42 mm comparing the original landmark set with the two times deformed landmark sets of the lung datasets. The same analysis for the Sunnybrook datasets showed DSC, HD and ASSD values of 0.97 0.02, 6.17 4.27 mm and 0.36 0.25 mm, respectively. The large difference of HD and ASSD can be explained with the big slice thickness of several millimetres.
For the cardiac datasets the DVF for mapping the systolic phase to diastolic phase showed better results (DSC: 0.90, HD: 10.35 mm, ASSD: 1.49 mm) compared to the registration from diastolic to systolic phase (DSC: 0.81, HD: 13.12 mm, ASSD: 1.95 mm). We reduce this direction-dependent behaviour to the resampling of the Sunnybrook datasets. In the diastolic images the border of the LV segmentation is in the area of a high gradient in the deformation field where small and large deformations meet. Due to the nearest neighbour interpolation used for resampling the segmentation masks, the mask for interpolated slices is either too small or too large with respect to the linearly interpolated image slices. An example is shown in Fig.3. However, to exclude an impact of the networks design we repeated the registration with switched input channels. The repeated experiments showed similar results and confirm the more accurate evaluation results from systolic to diastolic phase. Another point that contradicts a methodological failure is that for the lung datasets the better performance was distributed equally.
Table III gives results of our proposed algorithm alongside those reported in the literature for DL based and conventional registration algorithms. Summarised, our algorithm is able to keep up with the results in the literature. One has to note that in the literature the results are stated only for one registration direction. As it was not always clear which registration direction the authors reported, we averaged our results over both directions.
|Lung Datasets||Before Registration||Inhale to Exhale||Exhale to Inhale|
|Dataset||Dimensions||Time- bins||Voxelsize (mm)||LD (mm)||LD (mm)||LD (mm)|
|DirLab01||256 x 256 x 94||10||0.97 x 0.97 x 2.50||3.89 2.78||1.26 1.06||1.16 0.65|
|DirLab02||256 x 256 x 112||10||1.16 x 1.16 x 2.50||4.34 3.90||1.14 0.71||1.13 0.59|
|DirLab03||256 x 256 x 104||10||1.15 x 1.15 x 2.50||6.94 4.05||1.31 0.85||1.33 0.79|
|DirLab04||256 x 256 x 99||10||1.13 x 1.13 x 2.50||9.83 4.88||1.81 1.78||1.87 1.73|
|DirLab05||256 x 256 x 106||10||1.10 x 1.10 x 2.50||7.47 5.5||1.85 1.81||1.75 1.35|
|DirLab06||512 x 512 x 128||10||0.97 x 0.97 x 2.50||10.89 6.96||2.52 4.90||2.09 2.13|
|DirLab07||512 x 512 x 136||10||0.97 x 0.97 x 2.50||11.02 7.42||1.84 1.66||1.98 1.64|
|DirLab08||512 x 512 x 128||10||0.97 x 0.97 x 2.50||14.99 9.00||3.36 4.75||3.57 5.24|
|DirLab09||512 x 512 x 128||10||0.97 x 0.97 x 2.50||7.92 3.97||1.44 0.84||1.50 0.85|
|DirLab10||512 x 512 x 120||10||0.97 x 0.97 x 2.50||7.30 6.34||1.79 2.52||1.80 1.93|
|Popi01||512 x 512 x 141||10||0.98 x 0.98 x 2.00||5.90 2.73||1.04 0.46||1.15 0.84|
|Popi02||512 x 512 x 169||10||0.98 x 0.98 x 2.00||14.04 7.20||2.51 2.89||2.90 3.62|
|Popi03||512 x 512 x 170||10||0.88 x 0.88 x 2.00||7.67 5.05||1.43 1.52||1.38 1.56|
|Popi04||512 x 512 x 187||10||0.78 x 0.78 x 2.00||7.33 4.89||1.25 1.91||1.10 1.74|
|Popi05||512 x 512 x 139||10||1.17 x 1.17 x 2.00||7.09 5.08||1.32 1.01||1.27 0.92|
|Popi06||512 x 512 x 161||10||1.17 x 1.17 x 2.00||6.68 3.68||1.29 1.01||1.25 0.88|
|8.33 8.86||1.70 2.26||1.70 2.04|
|Cardiac Datasets||Before Registration||Systolic to Diastolic||Diastolic to Systolic|
|Dataset||Dimensions||Time- bins||Voxelsize (mm)||DSC||HD (mm)||ASSD (mm)||DSC||HD (mm)||ASSD (mm)||DSC||HD (mm)||ASSD (mm)|
|SC-HF-I-1||256 x 256 x 12||20||1.37 x 1.37 x 10.00||0.84||14.73||2.05||0.93||8.08||1.40||0.90||12.00||1.58|
|SC-HF-I-2||256 x 256 x 13||20||1.37 x 1.37 x 10.00||0.84||12.11||1.94||0.94||6.71||1.27||0.93||7.50||1.26|
|SC-HF-I-4||256 x 256 x 10||20||1.29 x 1.29 x 8.00||0.87||13.05||1.85||0.96||6.00||0.86||0.94||6.71||1.04|
|SC-HF-I-40||256 x 256 x 11||20||1.37 x 1.37 x 8.00||0.73||16.46||2.95||0.94||8.75||0.96||0.87||12.73||1.56|
|SC-HF-NI-3||256 x 256 x 12||20||1.37 x 1.37 x 8.00||0.90||11.68||1.25||0.96||4.74||0.82||0.94||7.65||0.99|
|SC-HF-NI-4||256 x 256 x 11||20||1.37 x 1.37 x 8.00||0.86||9.40||1.96||0.93||7.50||1.21||0.89||15.00||1.80|
|SC-HF-NI-34||256 x 256 x 13||20||1.37 x 1.37 x 10.00||0.78||12.30||2.17||0.92||10.50||1.41||0.89||13.08||1.57|
|SC-HF-NI-36||256 x 256 x 10||20||1.21 x 1.21 x 10.00||0.78||12.30||2.17||0.95||20.00||0.59||0.93||11.11||0.84|
|SC-HYP-1||256 x 256 x 9||20||1.37 x 1.37 x 10.00||0.53||20.05||5.13||0.84||15.00||2.48||0.68||16.50||3.02|
|SC-HYP-3||256 x 256 x 8||20||1.37 x 1.37 x 10.00||0.32||19.91||8.94||0.89||10.50||1.24||0.71||13.50||2.17|
|SC-HYP-38||256 x 256 x 12||20||1.37 x 1.37 x 10.00||0.71||12.60||2.47||0.88||7.79||1.68||0.61||16.84||3.02|
|SC-HYP-40||256 x 256 x 16||20||1.37 x 1.37 x 8.00||0.55||20.14||6.44||0.88||6.71||1.55||0.78||11.52||1.95|
|SC-N-2||256 x 256 x 9||20||1.37 x 1.37 x 8.00||0.54||21.96||6.98||0.93||6.00||0.90||0.78||10.06||1.74|
|SC-N-3||256 x 256 x 20||20||1.37 x 1.37 x 8.00||0.62||20.62||4.22||0.76||21.53||3.33||0.67||25.01||3.75|
|SC-N-40||256 x 256 x 12||20||1.25 x 1.25 x 10.00||0.56||17.29||4.72||0.80||15.37||2.63||0.66||17.62||3.02|
|Dataset||Proposed||De Voss 2019 ||Eppenhof 2018 ||Eppenhof 2018 ||Vishnevskiy 2017 ||Dataset||Proposed||Vandemeulebroucke 2011 |
|DirLab01||1.21 0.88||1.27 1.16||-||1.65 0.89||0.76 -||Popi01||1.09 0.68||0.96 0.57|
|DirLab02||1.13 0.65||1.20 1.12||1.24 0.61||2.26 1.16||0.77 -||Popi02||2.71 3.28||1.56 1.34|
|DirLab03||1.32 0.82||1.48 1.26||-||3.15 1.63||0.90 -||Popi03||1.40 1.54||1.53 1.70|
|DirLab04||1.84 1.76||2.09 1.93||1.70 1.00||4.24 2.69||1.24 -||Popi04||1.17 1.83||1.96 2.92|
|DirLab05||1.80 1.60||1.95 2.10||-||3.52 2.23||1.12 -||Popi05||1.30 0.97||1.48 1.39|
|DirLab06||2.30 3.78||5.16 7.09||-||3.19 1.50||0.8 -||Popi06||1.27 0.95||1.25 0.95|
|DirLab07||1.91 1.65||3.05 3.01||-||4.25 2.08||0.80 -||Average||1.49 1.59||1.46 1.65|
|DirLab08||3.47 5.00||6.48 5.37||-||9.03 5.08||1.34 -||Proposed||De Voss 2019 |
|DirLab09||1.47 0.85||2.10 1.66||1.61 0.82||3.85 1.86||0.92 -|
|DirLab10||1.79 2.24||2.09 2.24||-||5.07 2.31||0.82 -||Sunnybrook||0.86 0.10||0.89 0.14|
|Average||1.83 2.35||2.64 4.32||1.52 0.85||4.02 3.08||0.95 -||(cardiac)|
Analysis of the DVF with respect to anatomically implausible deformations showed that the FoF per image dataset was on average 0.25 % of the calculated deformation vectors. The average Jacobi determinant value was 1.00 0.32. A deeper inspection of the Jacobi determinant values revealed that folding occurs mainly in the area of large deformations close to the image or patch border where tissue can move out of or into the field of view.
The initial average landmark distance between consecutive time-bins was 2.25 1.55 mm for the Popi datasets and 2.19 2.02 mm for the DirLab datasets. The proposed registration algorithm could decrease the average distance to 1.24 0.90 mm and 1.54 1.31 mm, respectively. The construction of trajectories for voxels requires the combination of all DVF of a 4D dataset. As a consequence the registration error at the start of a trajectory gets propagated over time and increases or decreases with the combination of the different DVF. In Fig. 4 the error evolution is visualised for the three test datasets. At the beginning the mean registration error is 1.54 mm for the DirLab datasets and 1.24 mm for the Popi datasets. The error then increases to a mean of 2.58 mm for DirLab and 2.07 mm for Popi in the middle of the periodic motion. In the end of the breathing cycle the error falls to an error around 1 mm for both datasets. A similar error propagation pattern could be measured for the Sunnybrook datasets. For the Sunnybrook datasets segmentations are only available for the extreme phases, which allows only four comparisons per dataset. When starting with the end-diastolic phase a comparison in the middle of the cyclic motion to the end-systolic segmentation and in the end of the cycle motion to the original end-diastolic segmentation is possible. The same applies when starting with the end-systolic segmentation. In Fig. 4 the red dots indicate the measured DSC overlap, the red dotted line was fitted to the dots and estimates the DSC overlap for time-bins without a segmentation available. Compared to a direct registration of the maximum exhale and inhale phases the registration error increased on average by 0.48 mm with the combination of the DVF. The average registration error for the extreme phases in the Popi dataset increased from 1.46 1.59 mm to 1.98 1.56 mm when we registered the whole 4D dataset instead of the two extreme phases solely. The same analysis reported an increase from 1.83 2.35 mm to 2.54 2.01 mm for the DirLab images and and decrease in the DSC from 0.86 0.10 to 0.71 0.17 for the Sunnybrook data. The DVF of the 4D evaluation showed a FoF of 0.02 % and an average Jacobi determinant of 1.00 0.06.
When we used only the first two layers of our network for registration the performance decreased slightly but the gain in computation time was huge. The average registration error for registering 3D respiratory images was 1.84 1.88 mm compared to 1.70 2.15 mm with all 3 layers. The computation time for two 3D images was on average 4 minutes, for 4D datasets the average time needed for registration increased to 8 minutes. The peak in error propagation increased to an average value of 2.32 1.55 mm, 3.08 2.27 mm and 0.75 0.14 DSC for the Popi, DirLab and Sunnybrook datasets, respectively.
We proposed a new method to combine DL and classical medical image registration techniques to track organ motion on 3D and 4D medical image datasets. Compared to other methods our approach offers three main benefits. First, no training data is needed to achieve good results. Second, when the algorithm is applied to register two 3D datasets, it calculates the DVF and an approximation of its inverse simultaneously. Third, we demonstrated that our algorithm achieves a good performance independent of organ site or modality.
We showed that our algorithm can calculate the motion between two 3D images and adequately approximate the inverse. Also the periodic motion of the beating heart or the respiratory cycle could be reflected by our algorithm. When we compared the direct registration of the two extreme phases of a dataset with the registration of the whole 4D dataset, an increase of the registration error at the extreme phases could be seen. In our work we decided to register each image to its directly adjacent timely neighbour. Another approach would be to register each time-bin to a reference image. The reasons why we decided against the registration with a reference image are that our approach is more intuitive and directly calculates a trajectory for each voxel. Moreover we made the experience in the beginning of the work that the network is better in registering images with small deformations than in registering images with large deformations.
By evaluating the proposed algorithm with multiple 3D and 4D publicly available datasets we were able to show that the benefits of a DL architecture can be used without training data to achieve results as good as those reported in the literature with training data available. However, the results showed also that our algorithm is outperformed in 3D registration tasks by conventional image registration algorithms. One has to take into account that these algorithms make use of motion specific parameters and weights  whereas our approach is a more generic one without the need of motion, modality or organ specific information for registering two 3D images (only for 4D we assume periodicity).
The approach we used in this work is unsupervised but before a registration problem can be solved the influence of the different parts of the objective function needs to be tuned. This task requires expert knowledge and has a big impact on the accuracy. Due to simplicity we used one parameter set for all datasets. Although the general performance of our algorithm is very promising we made the experience that a different parameter setting would have increased the performance for some datasets. Here supervised methods definitely have an advantage.
One drawback of the presented registration algorithm is the computation time. Compared to DL based registration methods with an extra training phase the actual time needed to register one dataset is rather long (even with only 2 layers).
On grounds of our analysis we see two applications setting for our algorithm: When the training of network is not possible due to a lack of datasets or very heterogeneous datasets, which is often the case in small mono-centric studies in the field of radiotherapy that investigate a specific organ motion for treatment planning. Such studies usually have a low patient number as the acquisition of 4D images is time consuming, costly and implicates a higher radiation exposure in case of 4D CT. A second application scenario would be e.g. in the early stage of a study with a larger patient cohort. Datasets could be registered with our one shot approach until enough datasets are available to allow for an extra training phase. Subsequently the same architecture used for one shot registration could be trained on the existing datasets to enable a faster registration of upcoming datasets with similar accuracy. In a lifelong learning fashion , the network would learn over time and increase its performance with additional datasets.
In our future work a focus will be on an easier optimization of the objective function parameters. Especially in the case of a deep architecture the tuning of the meta parameters is very time consuming and unintuitive as it is hard to comprehend how a change in the parameters influences the network weights and the registration performance. Depending on the registration task at hand, different parameters are needed. For datasets with a large amount of shearing motion a lower weighting of the smoothness term could increase the registration quality. On the other hand for images with a homogeneous motion but with a lot of noise (like in some ultrasound images) a bigger influence of the smoothness term might improve the registration result. The use of a minimum spanning tree approach  in the loss calculation to tackle sheering motion or the classification of an image into motion classes could be possible options.
In this paper we presented a DL based algorithm to calculate DVF for periodic motion tracking with 3D and 4D medical image datasets without the need of any training phase. For 3D datasets the algorithm is able to calculate forward and inverse transformation simultaneously. The algorithm was tested thoroughly with multiple public datasets. Compared to existing state of the art algorithms, our approach showed a competitive performance. The main benefits of our algorithm are that it is generic and can be used for different modalities and body regions and that no training data is needed in advance. The promising results suggest a possible utilization in two different setting. First, as generic stand-alone algorithm for conventional image registration and motion tracking in e.g. radiotherapy treatment planning. Second, for studies focused on a specific organ and image modality the algorithm could be used without training phase until enough data is available and then be trained and thus specialised on a specific registration task to further increase registration accuracy.
We are thanking Richard Castillo, the Sunnybrook Health Sciences Centre as well as the Léon Bérard Cancer Center & CREATIS lab for sharing the datasets that we used in this work to evaluate the presented algorithm. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Disclosure of Conflicts of Interest: The authors have no relevant conflicts of interest to disclose.
M. Jaderberg, K. Simonyan, A. Zisserman, and k. kavukcuoglu, “Spatial transformer networks,” inAdvances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 2017–2025.
K. A. J. Eppenhof and J. P. W. Pluim, “Pulmonary CT registration through supervised learning with convolutional neural networks,”IEEE Transactions on Medical Imaging, pp. 1–1, 2018.
A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu, “Unsupervised learning for fast probabilistic diffeomorphic registration,”Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Jan. 2018.
G. Wu, Q. Wang, J. Lian, and D. Shen, “Estimating the 4d respiratory lung motion by spatiotemporal registration and super-resolution image reconstruction,”Medical Physics, vol. 40, no. 3, p. 031710, feb 2013.