A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image

02/28/2020, by Yuyu Guo et al.

Dynamic medical imaging is usually limited in application due to the large radiation doses and the longer image scanning and reconstruction times. Existing methods attempt to reduce the number of acquired volumes by interpolating the volumes between adjacent acquired image volumes. However, these methods are limited to 2D images and/or are unable to support large variations in motion between the image volume sequences. In this paper, we present a spatiotemporal volumetric interpolation network (SVIN) designed for 4D dynamic medical images. SVIN introduces dual networks: the first is a spatiotemporal motion network that leverages a 3D convolutional neural network (CNN) for unsupervised parametric volumetric registration to derive a spatiotemporal motion field from a pair of image volumes; the second is a sequential volumetric interpolation network, which uses the derived motion field to interpolate image volumes, together with a new regression-based module that characterizes the periodic motion cycles in functional organ structures. We also introduce an adaptive multi-scale architecture to capture large volumetric anatomical motions. Experimental results demonstrate that our SVIN outperformed state-of-the-art temporal medical interpolation methods and natural video interpolation methods that have been extended to support volumetric images. Our ablation study further exemplified that our motion network was able to better represent large functional motion compared with state-of-the-art unsupervised medical registration methods.




1 Introduction

Dynamic medical imaging modalities enable the examination of functional and mechanical properties of the human body and are used in clinical applications, e.g., four-dimensional (4D) computed tomography (CT) for respiratory organ motion modelling [32], 4D magnetic resonance (MR) imaging for functional heart analysis [8], and 4D ultrasound (US) for echocardiography analysis [39]. These 4D modalities have high spatial (volumetric) and temporal (time sequence) sampling rates to capture the periodic motion cycles of organ activities, and this information is used for clinical decision making. However, the acquisition of these dynamic images requires larger radiation doses, which may cause harm to patients, as well as longer image scanning and reconstruction times; these factors limit the use of 4D imaging modalities in broader clinical applications [31, 9].

Figure 1: The cardiac motion at two time phases: End-Systole (ES) and End-Diastole (ED). The red bounding boxes highlight the heart structure. All images show transaxial views, cropped to enlarge the heart.

To mitigate these factors, reducing the temporal sampling has been widely employed, but this compromises valuable temporal information [22, 14]. In these approaches, intermediary image volumes are interpolated from their adjacent volumes. Such interpolation methods rely on either non-rigid registration [5, 27, 39] or optical flow-based [29, 18] algorithms. Non-rigid registration approaches calculate the dense image volume correspondences from one volume to another, and then use the calculated correspondences to generate the intermediary volumes. Such approaches, however, often generate artifacts or fuzzy boundaries and do not perform well when the variations in anatomy or organ activity (e.g., size and shape) are large. An alternative approach is to use optical flow-based methods (using deep learning) [18, 37] to estimate a dense motion (i.e., deformation) field between image pairs. However, these methods were limited to 2D image interpolation and therefore did not utilize the rich spatial information inherent in medical image volumes. They are also limited when the motion between the image sequences does not follow a linear trajectory or a constant velocity. Therefore, these approaches are not applicable to volumetric temporal imaging modalities that exhibit large non-linear motions in spatiotemporal space.

In this paper we propose a spatiotemporal volumetric interpolation network (SVIN) designed for 4D dynamic medical images. To the best of our knowledge, this is the first deep learning-based method for 4D dynamic medical image interpolation. An overview of our model is illustrated in Fig. 2; it comprises two main networks. The first, a spatiotemporal motion network, leverages a 3D convolutional neural network (CNN) for unsupervised parametric volumetric registration to derive a spatiotemporal motion field from a pair of image volumes. The second, a sequential volumetric interpolation network, uses the derived motion field to interpolate the image volumes, together with a new regression-based module that characterizes the periodic motion cycles in functional organ structures. We also propose an adaptive multi-scale architecture that learns the spatial and appearance deformation in multiple volumes to capture large motion characteristics. We demonstrate the application of our method on cardiac motion interpolation, using both 4D CT and 4D MR images. These images are characterized by the twisting action of the heart during contraction and relaxation, and have complex changes in muscle morphology, as depicted in Fig. 1. Our method was used to increase the temporal resolution in both the CT and MR image volumes. We evaluated our method in comparison to state-of-the-art interpolation methods, and further conducted an ablation study to demonstrate the effectiveness of our motion network.

2 Related Works

We partition the related works into three categories relevant to our research: (1) dynamic medical image interpolation; (2) spatiotemporal motion field calculation for medical images; and (3) natural video interpolation approaches.

2.1 Dynamic medical image interpolation

Many existing medical image interpolation methods rely upon optical flow-based or non-rigid registration methods to generate a linearly interpolated image by averaging pixel values between adjacent images in the sequence [5, 21, 29, 27, 39, 36]. For instance, Ehrhardt et al. [13] presented an optical flow-based method to establish spatial correspondence between adjacent slices of temporal cardiac images. Zhang et al. [39] used a non-rigid registration-based method to synthesize echocardiography and cardiovascular MR image sequences. The main advantage of these approaches is that they track the spatiotemporal motion field, in a pixel-wise manner, between neighboring images to estimate the interpolation. However, they assume that the spatiotemporal motion between adjacent images follows a linear trajectory, and thus disregard the complex, non-linear motions apparent in functional organ structures. Recently, two CNN-based methods for temporal MR interpolation via motion fields were proposed by Zhang et al. [24] and Karani et al. [18]. They achieved outstanding performance compared with previous works. However, their methods did not support full 3D volumetric information and did not perform well when there were large variations in motion.

2.2 Learning spatiotemporal motion fields from volume image sequence

Many studies used deformable medical image registration techniques to estimate the motion field between the input image sequences. These techniques can be divided into two categories: non-learning-based [35, 1, 19, 4, 12] and learning-based methods [20, 38, 34]. Typical non-learning-based approaches are free-form deformations with B-splines [1], Demons [35], and ANTs [2]. These approaches optimize displacement vector fields by calculating the similarity of the topological structures. Deep learning-based methods, in recent years, used labelled data of spatiotemporal motion fields and have shown great performance [20, 38, 34]. However, their performance depends on the availability of large-scale labelled data. To address this, several unsupervised methods were proposed to predict the spatiotemporal motion field [11, 23, 3]. Although these methods demonstrated promising results, [11] and [23] were only applicable to patch-based volumes or 2D slices. Balakrishnan et al. [3] recently developed VoxelMorph, a CNN which uses full 3D volumetric information. However, it was not designed for dynamic image sequences with large variations in motion.

Figure 2: An overview of the proposed method, which contains a motion network and an interpolation network. An adaptive multi-scale architecture is used in both the motion and interpolation networks to cover large motions. A regression module is integrated into our interpolation network to constrain the intermediate motion fields.

2.3 Natural video interpolation approaches

Video interpolation is an active research task for natural scenes, e.g., model-based tracking, patch identification and matching, and frame-rate upsampling [16, 10, 26, 28]. Niklaus et al. [30] developed a spatially-adaptive convolution kernel to estimate the motion for each pixel. Liu et al. [25] divided frame interpolation into two steps: optical flow estimation and image interpolation. Their network learned from input pairs of consecutive frames in an unsupervised manner and then refined the interpolation based on the outputs of the estimation. Jiang et al. [17] presented Slomo, a technique which interpolates frame motion by linearly combining bi-directional optical flows and then further refining the estimated motion flow field through an end-to-end CNN. Recently, Peleg et al. [33] presented a multi-scale structured neural network architecture to better capture local details from high-resolution frames. However, applying these methods to dynamic medical image interpolation is challenging, as the temporal sampling in medical image volume sequences is much lower than that of natural scene videos. In addition, the deformation and visual differences in dynamic medical images are comparatively more complex and non-trivial than in natural scene videos.

3 Proposed Method

Let {I_t} be a sequence of volumetric images representing the cardiac motion from the end-diastole (ED) to the end-systole (ES) phase, and let (I_a, I_b) be a pair of cardiac images at two random time points within the cardiac motion. Our aim is to interpolate the intermediate image I_t. For this work, we used the images at the ED (denoted as I_0) and ES (denoted as I_1) phases to interpolate the complete cardiac motion. φ_{0→1} and φ_{1→0} denote the motion fields between I_0 and I_1 in the two directions.

Figure 3: The architecture of our spatiotemporal volumetric motion network with an adaptive multi-scale architecture.

Fig. 2 shows the overall proposed method. Initially, the spatiotemporal motion network learns and captures the bi-directional motion fields φ_{0→1} and φ_{1→0} between I_0 and I_1 in an unsupervised manner. Two linearly interpolated intermediate images are then coarsely created using the learned spatiotemporal motion fields. Using the coarsely interpolated intermediate images and their corresponding deformation fields, we further refine the coarse intermediate images with the volumetric interpolation network, where a regression-based module constrains the interpolation to follow the patterns of cardiac biological motion. Specifically, both our volumetric motion estimation and interpolation networks use an adaptive multi-scale architecture, which enables capturing various types of motions – both small and large volumetric spatiotemporal deformations (see Figs. 2 and 3).

3.1 Spatiotemporal volumetric motion field estimation

Fig. 3 presents the architecture of the 3D CNN for spatiotemporal motion field estimation. We estimate a motion field that represents the voxel-wise motion flow between volume images at two individual time points. This can be represented as a function (φ_{0→1}, φ_{1→0}) = F_θ(I_0, I_1), where φ contains the vectors that represent the movement in 3D space and θ are the learnable parameters of the network. We used an encoder-decoder architecture with skip connections for generating φ given I_0 and I_1.

In order to produce a volumetric motion field that can cover various types of deformations, we propose an adaptive multi-scale architecture that embeds both global and local learning. More specifically, for global learning, our motion field estimation network focuses on large deformations: the volumetric images at a low scale ignore local details, while more detailed information is covered at a high scale. In addition, the global deformation from the low scale is integrated into the high scale, which reduces the learning difficulty and constrains the network to pay more attention to detailed deformation. Our deformation field at scale s can be defined as:

φ^(s)_{0→1} = φ̂^(s)_{0→1} + Up(φ^(s−1)_{0→1}),    Î^(s)_1 = I^(s)_0 ∘ φ^(s)_{0→1},

where φ̂^(s) is the residual field predicted at scale s, Up(·) upsamples the lower-scale field, and I ∘ φ represents the image warped by the spatial vector field φ with trilinear interpolation.
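As a concrete illustration of the warping operation, the following is a minimal NumPy/SciPy sketch, not the paper's differentiable PyTorch layer; the function name `warp_volume` and the voxel-unit flow convention are assumptions:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_volume(volume, flow, order=1):
    """Warp a 3D volume by a dense displacement field.

    volume: (D, H, W) intensity array.
    flow:   (3, D, H, W) displacement vectors in voxel units.
    order=1 performs trilinear interpolation at the displaced
    sampling locations.
    """
    grid = np.mgrid[tuple(slice(0, s) for s in volume.shape)].astype(float)
    return map_coordinates(volume, grid + flow, order=order, mode="nearest")

# Zero displacement must reproduce the input volume exactly.
vol = np.random.rand(8, 8, 8)
assert np.allclose(warp_volume(vol, np.zeros((3, 8, 8, 8))), vol)
```

In the actual network this resampling is implemented as a differentiable layer so gradients can flow through the motion field.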

For training our motion field estimation network, we used an image-wise similarity loss and a motion field smoothness regularization loss with an adaptive multi-scale network architecture (as shown in Fig. 3). Given the network output φ^(s), where s denotes the scale of the volumetric images (we used 3 different scales in total), we define the motion field smoothness regularization loss as:

L_smooth = Σ_{s=1}^{3} Σ_{p∈Ω} ( ‖∇φ^(s)_{0→1}(p)‖² + ‖∇φ^(s)_{1→0}(p)‖² ),

where ∇ is the gradient operator and Ω is the voxel domain. The image-wise similarity loss was leveraged from VoxelMorph [3] and can be defined as:

L_sim = − Σ_{s=1}^{3} [ CC(I^(s)_1, I^(s)_0 ∘ φ^(s)_{0→1}) + CC(I^(s)_0, I^(s)_1 ∘ φ^(s)_{1→0}) ],

where CC(·, ·) denotes the cross-correlation similarity used in [3].
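The two training losses can be sketched as follows; this is a simplified NumPy illustration (a global rather than VoxelMorph's local windowed cross-correlation, and a single scale), with all function names assumed:

```python
import numpy as np

def smoothness_loss(flow):
    """Mean squared finite-difference gradient of a displacement
    field, approximating the gradient-based smoothness term.
    flow: (3, D, H, W) displacement vectors."""
    loss = 0.0
    for axis in (1, 2, 3):                 # the three spatial axes
        diff = np.diff(flow, axis=axis)    # forward differences
        loss += np.mean(diff ** 2)
    return loss

def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation between two volumes.
    (VoxelMorph uses a local windowed variant; this global form is a
    simplification for illustration.)"""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps))

flow = np.zeros((3, 8, 8, 8))
assert smoothness_loss(flow) == 0.0        # constant field: no gradient penalty
vol = np.random.rand(8, 8, 8)
assert abs(ncc(vol, vol) - 1.0) < 1e-6     # identical volumes correlate perfectly
```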
3.2 Sequential volumetric interpolation network

Based on the derived deformation fields φ_{0→1} and φ_{1→0}, we used a linear interpolation approach to synthesize the intermediate deformation fields, as follows:

φ_{0→t} = t · φ_{0→1},    (3)
φ_{1→t} = (1 − t) · φ_{1→0}.    (4)
Based on Eqs. 3 and 4, the linear-interpolation-based estimates of the intermediate volume I_t can be approximated as:

Î⁰_t = I_0 ∘ φ_{0→t},    Î¹_t = I_1 ∘ φ_{1→t}.
To improve the consistency in both directions, Eqs. 3 and 4 can be modified as follows:

φ_{0→t} = t · ((1 − t) · φ_{0→1} − t · φ_{1→0}),
φ_{1→t} = (1 − t) · (t · φ_{1→0} − (1 − t) · φ_{0→1}).
In addition, we introduce a hyper-weight map w_t to balance the importance of the deformations from the two directions (forward and backward); it is defined by the temporal proximity to each endpoint:

w_t = 1 − t.
Thus, the linear image interpolation based on the bi-directional deformations φ_{0→t} and φ_{1→t} can be defined as:

Î_t = w_t · (I_0 ∘ φ_{0→t}) + (1 − w_t) · (I_1 ∘ φ_{1→t}).
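Under the simple linear assumption of Eqs. 3 and 4, the coarse intermediate-frame synthesis can be sketched as below; this is a hedged NumPy illustration with assumed names, not the paper's refinement network:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(volume, flow):
    """Trilinear warping of a (D, H, W) volume by a (3, D, H, W) field."""
    grid = np.mgrid[tuple(slice(0, s) for s in volume.shape)].astype(float)
    return map_coordinates(volume, grid + flow, order=1, mode="nearest")

def interpolate_frame(i0, i1, phi_01, phi_10, t):
    """Coarse intermediate frame under the linear assumption of
    Eqs. 3-4: scale each bi-directional field by its temporal
    distance, warp both endpoint volumes, and blend them with the
    temporal-proximity weight w_t = 1 - t."""
    phi_0t = t * phi_01            # forward field scaled toward time t
    phi_1t = (1.0 - t) * phi_10    # backward field scaled toward time t
    w_t = 1.0 - t
    return w_t * warp(i0, phi_0t) + (1.0 - w_t) * warp(i1, phi_1t)

i0, i1 = np.random.rand(8, 8, 8), np.random.rand(8, 8, 8)
zero = np.zeros((3, 8, 8, 8))
# With zero motion, the result reduces to a plain temporal cross-fade.
assert np.allclose(interpolate_frame(i0, i1, zero, zero, 0.5), 0.5 * i0 + 0.5 * i1)
```

In SVIN this coarse estimate is only the input to the interpolation network, which refines it with the multi-scale architecture and the regression constraint.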
As exemplified on the right side of Fig. 2, we use an adaptive multi-scale network architecture to ensure the synthesized intermediate volumetric images have a high spatiotemporal resolution.

3.3 Regression-based module for interpolation constraints

Since most biological movements have a relatively fixed motion pattern, especially cardiac motion [7], we present a regression-based module to model the relationship between the cardiac motion of the cardiac cycle and the time phase (as shown in Fig. 4). Specifically, we build a regression model representing the population-based cardiac motion vector, which indicates the shape variability at each individual time point. The population-based cardiac motions at individual time points are then used to constrain the appearance of the synthetic intermediate volumetric images. Our regression module estimates this population-based motion vector r(t) at each intermediate time point t.

Figure 4: Illustration of the left ventricle (LV) volume changing during the cardiac contraction period. The brown curve shows the real motion flow of the LV, and the blue dashed line shows the simple linear assumption. The blue and green points represent the intermediate time points.
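A minimal sketch of the regression idea in Fig. 4, assuming a low-order polynomial model and synthetic placeholder LV-volume samples (not measurements from the paper):

```python
import numpy as np

# Normalized cardiac phase from ED (t = 0) to ES (t = 1) and a synthetic,
# population-averaged LV volume curve; these sample values are placeholders,
# not measurements from the paper.
phase = np.linspace(0.0, 1.0, 6)
lv_volume = 1.0 - 0.6 * phase ** 2   # non-linear contraction, as in Fig. 4

# Fit a low-order polynomial regression over the cardiac phase; querying it
# at intermediate time points yields the population-based constraint.
model = np.poly1d(np.polyfit(phase, lv_volume, deg=2))

# The fitted curve reproduces the non-linear samples, unlike the
# linear assumption (dashed line in Fig. 4).
assert np.allclose(model(phase), lv_volume, atol=1e-6)
```

The fitted curve can then be queried at any intermediate t to penalize interpolations that drift toward the naive linear assumption.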

3.4 Training details for volumetric interpolation

For training our sequential volumetric interpolation network, the loss function is defined as a weighted sum of an image-wise similarity loss L_sim, a regression loss L_regr, and a regularization loss L_regu:

L = λ₁ L_sim + λ₂ L_regr + λ₃ L_regu,
where the image-wise similarity loss L_sim evaluates the similarity of the predicted synthetic intermediate images and the real intermediate images at multiple image scales and is defined as:

L_sim = Σ_{s=1}^{3} ‖ I^(s)_t − Î^(s)_t ‖₁,
where s indexes the three image scales, I^(s)_t represents the real intermediate volumetric images, and Î^(s)_t represents the predicted synthetic intermediate volumetric images. The regression loss L_regr is defined as the appearance difference at each individual time point:

L_regr = Σ_t ‖ Î_t − r(t) ‖₂²,

where r(t) denotes the regression-estimated appearance at time point t.
The regularization loss L_regu is used to constrain the predicted motions to be consistent in both directions and is defined as:

L_regu = Σ_t ‖ I_0 ∘ φ_{0→t} − I_1 ∘ φ_{1→t} ‖₁.
The weights λ₁, λ₂, and λ₃ were set empirically using a validation set.
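The multi-scale similarity term and the weighted loss combination can be sketched as follows; a NumPy illustration with hypothetical weight values (the paper sets the weights on a validation set):

```python
import numpy as np

def downsample(vol):
    """Halve each dimension by 2x2x2 average pooling."""
    d, h, w = (s // 2 * 2 for s in vol.shape)
    return vol[:d, :h, :w].reshape(d // 2, 2, h // 2, 2, w // 2, 2).mean(axis=(1, 3, 5))

def multiscale_l1(pred, target, scales=3):
    """Image-wise similarity: mean absolute error accumulated over
    three image scales."""
    loss = 0.0
    for _ in range(scales):
        loss += np.abs(pred - target).mean()
        pred, target = downsample(pred), downsample(target)
    return loss

pred = np.random.rand(16, 16, 16)
assert multiscale_l1(pred, pred) == 0.0       # perfect prediction: zero loss

# Hypothetical loss weights; the paper sets them empirically on a validation set.
lam_sim, lam_regr, lam_regu = 1.0, 0.1, 0.1
l_sim, l_regr, l_regu = multiscale_l1(pred, pred), 0.0, 0.0
total = lam_sim * l_sim + lam_regr * l_regr + lam_regu * l_regu
assert total == 0.0
```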

4 Experiments

4.1 Materials and implementation details

We demonstrate our method on two datasets: 4D Cardiac CT (4D-C-CT) and ACDC (4D cardiac cine MR imaging) [6]. Fig. 5 shows a snapshot of randomly sampled cardiac sequence volume slices. The 4D-C-CT dataset consists of data from 18 patients, each having 5 time points (image volumes) from ED to ES. The image volumes have a high resolution, ranging from 0.32 to 0.45 mm intra-slice (x- and y-resolution) and from 0.37 to 0.82 mm inter-slice (z-resolution). The ACDC dataset contains data from 100 patients. On average, each patient has 10.93 time points from ED to ES, with an imaging resolution of 1.37 to 1.68 mm in x- and y-resolution and 5 to 10 mm in z-resolution. All 4D-C-CT scans were resampled to a 128×128×96 grid and the resulting images cropped to 96×96×96. For the ACDC dataset, we resampled all scans to 160×160×10 and zero-padded the z-axis to 160×160×12 to reduce the border effects of 3D convolution. We randomly selected 80 training / 20 testing patients and applied contrast normalization to both datasets, consistent with other similar research [15].
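The resampling, cropping, and padding steps can be sketched as below; a NumPy/SciPy illustration with assumed function names, using trilinear resampling as one plausible choice:

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_ct(vol, grid=(128, 128, 96), crop=(96, 96, 96)):
    """Resample a CT volume to a fixed grid, then center-crop,
    mirroring the 4D-C-CT preprocessing described above."""
    factors = [g / s for g, s in zip(grid, vol.shape)]
    vol = zoom(vol, factors, order=1)                     # trilinear resampling
    starts = [(g - c) // 2 for g, c in zip(vol.shape, crop)]
    sl = tuple(slice(s, s + c) for s, c in zip(starts, crop))
    return vol[sl]

def preprocess_acdc(vol, grid=(160, 160, 10), pad_z=12):
    """Resample an ACDC volume, then zero-pad the z-axis to reduce
    3D-convolution border effects."""
    factors = [g / s for g, s in zip(grid, vol.shape)]
    vol = zoom(vol, factors, order=1)
    extra = pad_z - vol.shape[2]
    return np.pad(vol, ((0, 0), (0, 0), (extra // 2, extra - extra // 2)))

assert preprocess_ct(np.random.rand(100, 100, 80)).shape == (96, 96, 96)
assert preprocess_acdc(np.random.rand(200, 200, 9)).shape == (160, 160, 12)
```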


We implemented all networks using the PyTorch library and trained them on two 11GB Nvidia 1080Ti GPUs. All models were trained with a learning rate of 0.0001. In all our evaluations, we used 3-fold cross-validation on both datasets.

Figure 5: A snapshot of our training data

4.2 Evaluation and metrics

In order to evaluate the two networks in our SVIN, we conducted an ablation study. For the unsupervised spatiotemporal motion network, we compared it with the state-of-the-art CNN-based deformable medical image registration method VoxelMorph [3]. For the interpolation network, state-of-the-art image interpolation methods were used in the comparison, including (i) RVLI [39] – registration-based volume linear interpolation for medical images, (ii) MFIN [24] – CNN-based medical image interpolation (2D slice-based), and (iii) Slomo [17] – natural video interpolation in 2D, as well as its extension to medical image volumes (3D-Slomo). For image volume interpolation, we interpolated 3 intermediate volumes between the ED and ES frames (see Fig. 5), evenly distributed across the time points.

We used the standard image interpolation evaluation metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Mean Squared Error (MSE), and Normalized Root Mean Square Error (NRMSE). We used the same evaluation metrics for the spatiotemporal motion field estimation, consistent with other medical image registration approaches [24]. In addition, we used the Dice Similarity Coefficient (DSC) to measure the usefulness of our interpolation in medical imaging applications.
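A few of these metrics can be computed directly in NumPy; the sketch below covers PSNR, NRMSE, and DSC (SSIM is omitted for brevity), with all function names assumed:

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB over a known intensity range."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def nrmse(pred, target):
    """Root mean square error normalized by the target intensity range."""
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    return rmse / (target.max() - target.min())

def dice(seg_a, seg_b):
    """Dice Similarity Coefficient between two binary masks."""
    inter = np.logical_and(seg_a, seg_b).sum()
    return 2.0 * inter / (seg_a.sum() + seg_b.sum())

target = np.linspace(0.0, 1.0, 512).reshape(8, 8, 8)
assert psnr(target + 0.01, target) > psnr(target + 0.1, target)  # less error, higher PSNR
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 2:6, 2:6] = True
assert dice(mask, mask) == 1.0
```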

5 Results and Discussion

Figure 6: Comparison of spatiotemporal volumetric motion estimation results. The intensity images are warped using the estimated spatiotemporal motion fields. The red curves represent the real segmentation results, while the green curves show the warped segmentation results (indicated by the yellow arrows). The red arrows indicate some organ boundaries.

5.1 Ablation study: spatiotemporal volumetric motion field estimation

Method       MSE (×10⁻²)  PSNR   NRMSE  SSIM   DSC
VoxelMorph   0.787        27.10  0.276  0.807  0.880
Ours         0.197        33.17  0.138  0.918  0.944
Table 1: The performance of spatiotemporal motion field estimation on the 4D-C-CT dataset.
Method       MSE (×10⁻²)  PSNR   NRMSE  SSIM   DSC
VoxelMorph   0.194        38.06  0.132  0.912  0.920
Ours         0.168        38.93  0.121  0.914  0.936
Table 2: The performance of spatiotemporal motion field estimation on the ACDC dataset.
Figure 7: Visual results of two samples from 4D-C-CT. The leftmost column shows the paired input volumes (ED and ES) and the rightmost column shows the real intermediate volume. The remaining columns show the interpolated intermediary volumes of the different approaches.

The results of motion field estimation on the two datasets, 4D-C-CT and ACDC, are shown in Tables 1 and 2. Our results show that the motion estimation network with our adaptive architecture outperforms the recent VoxelMorph [3] across all metrics on the 4D-C-CT dataset, achieving a PSNR of 33.176, NRMSE of 0.1388, SSIM of 0.9185, and MSE of 0.00197. Similarly, it also had better scores across all metrics on the ACDC dataset. Our motion estimation architecture showed larger improvements over VoxelMorph on the 4D-C-CT dataset than on the ACDC dataset. We attribute this to our robust multi-scale adaptive 3D CNN, which can effectively learn both large and small variations in motion.

Fig. 6 shows the synthesized volumes based on the derived motion fields and their corresponding warped segmentation results. It clearly shows that the warped segmentation results from the motion field learnt by our motion architecture are more similar to the ground truth.

5.2 Comparison with the state-of-the-art interpolation methods

Tables 3 and 4 present the interpolation results at different time points from ED to ES on the 4D-C-CT and ACDC datasets, respectively. As expected, the results show that intermediate volumes at later time points were interpolated more accurately. This is because the earlier time points have larger motion variations, which contributed to their lower accuracy.

Time point  MSE   PSNR   NRMSE  SSIM
1st-point   0.45  29.45  0.211  0.830
2nd-point   0.43  29.47  0.210  0.825
3rd-point   0.28  31.52  0.165  0.863
Table 3: Multi-volume cardiac sequence interpolation results on the 4D-C-CT dataset.
Time point  MSE   PSNR   NRMSE  SSIM
1st-point   1.22  39.34  0.109  0.934
2nd-point   0.95  40.42  0.087  0.950
3rd-point   0.28  45.86  0.052  0.977
Table 4: Multi-volume cardiac sequence interpolation results on the ACDC dataset.

The comparative quantitative results for volume interpolation are shown in Tables 5 and 6. SVIN outperformed all other state-of-the-art interpolation methods on the 4D-C-CT dataset across all measures. Similarly, it also had the best scores across all metrics on the ACDC dataset. We attribute this to our adaptive multi-scale architecture, which captures various types of motions, and our regression-based module, which effectively constrains the intermediate volumetric motions and learns relevant inherent functional motion patterns (see Figs. 7 and 8). Our results show that RVLI was the closest to our results. However, RVLI was not able to accurately interpolate the volumes when there were artifacts, as evident in Figs. 7 and 8. MFIN and Slomo also did not consider full 3D volumetric information, i.e., they were limited to 2D space, which contributed to their lower scores. As expected, our implemented 3D-Slomo produced a better result relative to the 2D methods. The 3D-Slomo, however, was not able to accurately synthesize clear organ boundaries or estimate the motion trajectory when there were large changes in cardiac activity (see Fig. 7).

Figure 8: Visual results of two samples from ACDC. The leftmost column shows the paired input volumes (ED and ES) and the rightmost column shows the real intermediate volume. The remaining columns show the interpolated intermediary volumes of the different approaches.
Method    MSE    PSNR   NRMSE  SSIM   DSC
MFIN      1.06   26.84  0.308  0.709  0.844
Slomo     1.13   26.52  0.308  0.704  0.839
3D-Slomo  0.92   26.33  0.303  0.713  0.872
RVLI      0.54   28.70  0.237  0.806  -
Ours      0.39   30.15  0.196  0.840  0.917
Table 5: Performance comparisons on the 4D-C-CT dataset.
Method    MSE    PSNR   NRMSE  SSIM
MFIN      1.082  30.69  0.309  0.607
Slomo     1.001  31.08  0.296  0.630
3D-Slomo  0.341  35.27  0.178  0.845
RVLI      0.331  35.66  0.173  0.860
Ours      0.081  41.87  0.085  0.953
Table 6: Performance comparisons on the ACDC dataset.

6 Conclusion

6.1 Summary

We presented a novel interpolation method for 4D dynamic medical images. Our proposed two-stage network was designed to exploit volumetric medical images that exhibit large variations between the motion sequences. Experimental results demonstrated that our SVIN outperformed state-of-the-art temporal medical interpolation methods and natural video interpolation methods that have been extended to support volumetric images. Our ablation study further exemplified that our motion network with our adaptive multi-scale architecture was able to better represent large functional motion compared with state-of-the-art unsupervised medical registration methods.

6.2 Extensions

In Section 3, we discussed our general multi-scale architecture for learning the spatial appearance volumes at different scales to retain the spatial information for volume synthesis. Beyond learning a spatial transform model, in the future we will apply our architecture to other volume synthesis tasks.

We leveraged a regression-based constraint module to exploit the underlying patterns of functional motion. This could be extended to other 4D volumetric tasks, and the module could be further optimized to constrain the interpolation.

Although we demonstrated our SVIN model on cardiac imaging modalities, there is no restriction on applying our method to other dynamic images. We suggest that our method is broadly applicable to other medical and non-medical image volume interpolation problems where the motion field can be modelled.

References


  • [1] J. Ashburner (2007) A fast diffeomorphic image registration algorithm. Neuroimage 38 (1), pp. 95–113. Cited by: §2.2.
  • [2] B. B. Avants, N. J. Tustison, G. Song, P. A. Cook, A. Klein, and J. C. Gee (2011) A reproducible evaluation of ants similarity metric performance in brain image registration. Neuroimage 54 (3), pp. 2033–2044. Cited by: §2.2.
  • [3] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca (2018) An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9252–9260. Cited by: §2.2, §3.1, §4.2, §5.1.
  • [4] S. K. Balci, P. Golland, M. E. Shenton, and W. M. Wells (2007) Free-form b-spline deformation model for groupwise registration. Med Image Comput Comput Assist Interv. Cited by: §2.2.
  • [5] C. F. Baumgartner, C. Kolbitsch, J. R. McClelland, D. Rueckert, and A. P. King (2013) Groupwise simultaneous manifold alignment for high-resolution dynamic mr imaging of respiratory motion. In International Conference on Information Processing in Medical Imaging, pp. 232–243. Cited by: §1, §2.1.
  • [6] O. Bernard, A. Lalande, C. Zotti, F. Cervenansky, X. Yang, P. Heng, I. Cetin, K. Lekadir, O. Camara, M. A. G. Ballester, et al. (2018) Deep learning techniques for automatic mri cardiac multi-structures segmentation and diagnosis: is the problem solved?. IEEE transactions on medical imaging 37 (11), pp. 2514–2525. Cited by: §4.1.
  • [7] B. Biffi, J. L. Bruse, M. A. Zuluaga, H. N. Ntsinjana, A. M. Taylor, and S. Schievano (2017) Investigating cardiac motion patterns using synthetic high-resolution 3d cardiovascular magnetic resonance images and statistical shape analysis. Frontiers in pediatrics 5, pp. 34. Cited by: §3.3.
  • [8] A. Bornstedt, E. Nagel, S. Schalla, B. Schnackenburg, C. Klein, and E. Fleck (2001) Multi-slice dynamic imaging: complete functional cardiac mr examination within 15 seconds. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 14 (3), pp. 300–305. Cited by: §1.
  • [9] F. Canè, B. Verhegghe, M. De Beule, P. B. Bertrand, R. J. Van der Geest, P. Segers, and G. De Santis (2018) From 4d medical images (ct, mri, and ultrasound) to 4d structured mesh models of the left ventricular endocardium for patient-specific simulations. BioMed research international 2018. Cited by: §1.
  • [10] B. Choi, J. Han, C. Kim, and S. Ko (2007) Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation. IEEE Transactions on Circuits and Systems for Video Technology 17 (4), pp. 407–416. Cited by: §2.3.
  • [11] B. D. de Vos, F. F. Berendsen, M. A. Viergever, M. Staring, and I. Išgum (2017) End-to-end unsupervised deformable image registration with a convolutional neural network. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 204–212. Cited by: §2.2.
  • [12] I. L. Dryden (2014) Shape analysis. Wiley StatsRef: Statistics Reference Online. Cited by: §2.2.
  • [13] J. Ehrhardt, D. Säring, and H. Handels (2006) Optical flow based interpolation of temporal image sequences. In Medical Imaging 2006: Image Processing, Vol. 6144, pp. 61442K. Cited by: §2.1.
  • [14] K. G. Hollingsworth (2015) Reducing acquisition time in clinical mri by data undersampling and compressed sensing reconstruction. Physics in Medicine & Biology 60 (21), pp. R297. Cited by: §1.
  • [15] Y. Jang, Y. Hong, S. Ha, S. Kim, and H. Chang (2017) Automatic segmentation of lv and rv in cardiac mri. In International Workshop on Statistical Atlases and Computational Models of the Heart, pp. 161–169. Cited by: §4.1.
  • [16] B. Jeon, G. Lee, S. Lee, and R. Park (2003) Coarse-to-fine frame interpolation for frame rate up-conversion using pyramid structure. IEEE Transactions on Consumer Electronics 49 (3), pp. 499–508. Cited by: §2.3.
  • [17] H. Jiang, D. Sun, V. Jampani, M. Yang, E. Learned-Miller, and J. Kautz (2018) Super slomo: high quality estimation of multiple intermediate frames for video interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9000–9008. Cited by: §2.3, §4.2.
  • [18] N. Karani, C. Tanner, S. Kozerke, and E. Konukoglu (2017) Temporal interpolation of abdominal mris acquired during free-breathing. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 359–367. Cited by: §1, §2.1.
  • [19] A. Klein, J. Andersson, B. A. Ardekani, J. Ashburner, B. Avants, M. Chiang, G. E. Christensen, D. L. Collins, J. Gee, P. Hellier, et al. (2009) Evaluation of 14 nonlinear deformation algorithms applied to human brain mri registration. Neuroimage 46 (3), pp. 786–802. Cited by: §2.2.
  • [20] J. Krebs, T. Mansi, H. Delingette, L. Zhang, F. C. Ghesu, S. Miao, A. K. Maier, N. Ayache, R. Liao, and A. Kamen (2017) Robust non-rigid registration through agent-based action learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 344–352. Cited by: §2.2.
  • [21] G. Lee, R. Park, Y. Song, C. Kim, and J. Hwang (2003) Real-time 3d ultrasound fetal image enhancement techniques using motion-compensated frame rate up-conversion. In Medical Imaging 2003: Ultrasonic Imaging and Signal Processing, Vol. 5035, pp. 375–385. Cited by: §2.1.
  • [22] G. Li, D. Citrin, K. Camphausen, B. Mueller, C. Burman, B. Mychalczak, R. W. Miller, and Y. Song (2008) Advances in 4d medical imaging and 4d radiation therapy. Technology in cancer research & treatment 7 (1), pp. 67–81. Cited by: §1.
  • [23] H. Li and Y. Fan (2017) Non-rigid image registration using fully convolutional networks with deep self-supervision. arXiv preprint arXiv:1709.00799. Cited by: §2.2.
  • [24] L. Zhang, N. Karani, C. Tanner, and E. Konukoglu (2018) Temporal interpolation via motion field prediction. Cited by: §2.1, §4.2, §4.2.
  • [25] Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala (2017) Video frame synthesis using deep voxel flow. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4463–4471. Cited by: §2.3.
  • [26] G. Long, L. Kneip, J. M. Alvarez, H. Li, X. Zhang, and Q. Yu (2016) Learning image matching by simply watching video. In European Conference on Computer Vision, pp. 434–450. Cited by: §2.3.
  • [27] C. T. Metz, S. Klein, M. Schaap, T. van Walsum, and W. J. Niessen (2011) Nonrigid registration of dynamic medical imaging data using nd+ t b-splines and a groupwise optimization approach. Medical image analysis 15 (2), pp. 238–249. Cited by: §1, §2.1.
  • [28] S. Meyer, A. Djelouah, B. McWilliams, A. Sorkine-Hornung, M. Gross, and C. Schroers (2018) Phasenet for video frame interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 498–507. Cited by: §2.3.
  • [29] T. Nam, R. Park, and J. Yun (2006) Optical flow based frame interpolation of ultrasound images. In International Conference Image Analysis and Recognition, pp. 792–803. Cited by: §1, §2.1.
  • [30] S. Niklaus, L. Mai, and F. Liu (2017) Video frame interpolation via adaptive separable convolution. In Proceedings of the IEEE International Conference on Computer Vision, pp. 261–270. Cited by: §2.3.
  • [31] Y. Ohno, H. Hatabu, D. Takenaka, S. Adachi, M. Kono, and K. Sugimura (2002) Solitary pulmonary nodules: potential role of dynamic mr imaging in management—initial experience. Radiology 224 (2), pp. 503–511. Cited by: §1.
  • [32] T. Pan, T. Lee, E. Rietzel, and G. T. Chen (2004) 4D-ct imaging of a volume influenced by respiratory motion on multi-slice ct. Medical physics 31 (2), pp. 333–340. Cited by: §1.
  • [33] T. Peleg, P. Szekely, D. Sabo, and O. Sendik (2019) IM-net for high resolution video frame interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2398–2407. Cited by: §2.3.
  • [34] H. Sokooti, B. de Vos, F. Berendsen, B. P. Lelieveldt, I. Išgum, and M. Staring (2017) Nonrigid image registration using multi-scale 3d convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 232–239. Cited by: §2.2.
  • [35] J. Thirion (1998) Image matching as a diffusion process: an analogy with maxwell’s demons. Medical image analysis 2 (3), pp. 243–260. Cited by: §2.2.
  • [36] E. Tryggestad, A. Flammang, S. Han-Oh, R. Hales, J. Herman, T. McNutt, T. Roland, S. M. Shea, and J. Wong (2013) Respiration-based sorting of dynamic mri to derive representative 4d-mri for radiotherapy planning. Medical physics 40 (5), pp. 051909. Cited by: §2.1.
  • [37] W. Yan, Y. Wang, Z. Li, R. J. Van Der Geest, and Q. Tao (2018) Left ventricle segmentation via optical-flow-net from short-axis cine mri: preserving the temporal coherence of cardiac motion. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 613–621. Cited by: §1.
  • [38] X. Yang, R. Kwitt, M. Styner, and M. Niethammer (2017) Quicksilver: fast predictive image registration–a deep learning approach. NeuroImage 158, pp. 378–396. Cited by: §2.2.
  • [39] W. Zhang, J. M. Brady, H. Becher, and J. A. Noble (2011) Spatio-temporal (2d+ t) non-rigid registration of real-time 3d echocardiography and cardiovascular mr image sequences. Physics in Medicine & Biology 56 (5), pp. 1341. Cited by: §1, §1, §2.1, §4.2.