Convolutional Neural Networks vs. Deformable Image Registration For Medical Slice Interpolation

04/28/2020
by Dilip Kumar Verma, et al.

Medical image slice interpolation is an active field of research. The methods for this task can be categorized into two broad groups: intensity-based and object-based interpolation methods. Intensity-based methods are generally simpler and less computationally expensive. Object-based methods can produce more accurate results and account for deformable changes in the objects within the slices. In this paper, the performance of two well-known object-based interpolation methods is analyzed and compared: a deformable registration-based method specifically designed for medical applications, and a learning-based method trained for video frame interpolation. The deformable registration-based technique can accurately model the changes in the shapes of the objects within slices. The learning-based method produces results of similar accuracy, but with a much sharper appearance and in a fraction of the time. This holds even though the learning-based approach is trained on regular video footage rather than on medical images. Nevertheless, experiments show that it is capable of accurate slice interpolation.

I Introduction

As a common research area, image interpolation and super-resolution have been widely studied in various branches of image processing. Using the available image data, image interpolation/super-resolution techniques aim to enhance the resolution of the data. For image sequences, such resolution enhancement can be performed in both the spatial and the temporal domain. Image interpolation, and more specifically slice interpolation, has found widespread use in biomedical applications. There it enhances the resolution of data acquired with imaging modalities such as CT and MRI. This practice is especially useful for the visualization and analysis of biomedical data. In such modalities, a sequence of 2D images is acquired from the patient. However, given the limitations of the imaging systems, the resolution is usually not the same in all three coordinate directions: the through-plane resolution of the acquired data is significantly lower than the in-plane resolution. This discrepancy produces step-shaped iso-surfaces and discontinuous structures in reconstructed 3D models. Therefore, image interpolation techniques are needed to overcome these limitations.

Generally, slice interpolation techniques can be divided into two groups: intensity-based interpolation and object-based interpolation. In the first group, the interpolated slices are computed directly from the intensity information of the available data, without considering the shape of the objects contained in the images. Nearest-neighbor, linear, and cubic spline interpolation are examples of techniques in this group. Their simplicity results in very low computational complexity, which makes them highly popular in visualization applications. However, their results suffer from blurring artifacts at object boundaries, so they are not recommended for analysis purposes.
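As an illustration of the intensity-based group, the following minimal NumPy sketch inserts a linearly interpolated slice between each pair of consecutive slices. It is not taken from any of the cited works, and the function names `linear_slice` and `upsample_through_plane` are hypothetical. The usage example shows the boundary artifact described above. A thin structure that shifts between two slices shows up in the interpolated slice as two half-intensity copies rather than as one displaced structure.

```python
import numpy as np


def linear_slice(a, b, t=0.5):
    """Intensity-based linear interpolation between slices a and b at fraction t in [0, 1]."""
    return (1.0 - t) * np.asarray(a, dtype=float) + t * np.asarray(b, dtype=float)


def upsample_through_plane(volume):
    """Insert one linearly interpolated slice between each pair of consecutive slices.

    volume: array of shape (num_slices, height, width);
    returns an array of shape (2 * num_slices - 1, height, width).
    """
    volume = np.asarray(volume, dtype=float)
    n = volume.shape[0]
    out = np.empty((2 * n - 1,) + volume.shape[1:], dtype=float)
    out[0::2] = volume                                  # original slices at even output positions
    out[1::2] = linear_slice(volume[:-1], volume[1:])   # midpoints between neighbouring slices
    return out


# A one-pixel-wide vertical stripe that moves from column 2 to column 4 between two slices.
a = np.zeros((7, 9))
a[:, 2] = 1.0
b = np.zeros((7, 9))
b[:, 4] = 1.0
mid = upsample_through_plane(np.stack([a, b]))[1]
print(mid[0])  # half-intensity copies at columns 2 and 4, nothing at column 3
```

The interpolated slice never contains the stripe at its true intermediate position (column 3). Object-based methods are designed to avoid this ghosting effect.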

In object-based methods, the shape information of the objects contained in the images is taken into account to guide the interpolation toward more accurate representations of the interpolated slices. Examples of such methods can be found in [8, 10, 9, 14, 13]. Goshtasby et al. used a gradient magnitude-based approach to find corresponding points between consecutive slices, and then applied linear interpolation to compute the interpolated slices [8]. This method proved useful when the shape difference is small, since the search domain is limited to small neighborhoods. However, this assumption does not generally hold in practical applications. To overcome these limitations, other techniques have been proposed, such as column fitting interpolation [10], shape-based interpolation [9], morphology-based interpolation [14], and feature-guided interpolation [13]. To find pixel-wise correspondence between images, registration and/or optical flow estimation methods can be employed. Such correspondence can be formulated using the intensity or feature information of the images [3, 1, 2]. A relatively new group of slice interpolation techniques uses image registration as the main building block to estimate the changes in the shapes of the objects contained in the available slices [19, 7, 15, 4, 11]. In this group, deformable (non-rigid) image registration serves as a means to estimate the pixel-wise correspondence between consecutive slices.

Deep Convolutional Neural Networks (CNNs) have transformed the field of image processing. In supervised CNNs, the hyperparameters of the network (such as the number of layers, the structure of the layers, and the loss functions) are set first. The network is then trained on sets of data, and its parameters are optimized to reduce the error between the network's output and the ground truth. Such techniques have also been applied to 3D slice interpolation of medical images. Chen et al. [6] proposed a 3D Densely Connected Super-Resolution Network (DCSRN) for slice interpolation of medical images. Peng et al. [18], on the other hand, proposed a 2D CNN for interpolating anisotropic brain MRI data to enhance the lower resolution along the through-plane direction. Their method uses CNN-based data fusion and refinement to achieve the final results. Kudo et al. [12] explored the use of conditional Generative Adversarial Networks (GANs) for super-resolution of CT images.

In this paper, we explore the use of CNN-based frame interpolation techniques for slice interpolation of biomedical images. More specifically, we analyze the performance of the adaptive separable convolution method for video frame interpolation proposed by Niklaus et al. [17]. We compare it with the registration-based approach proposed by Leng et al. [15]. One major challenge in using deep learning-based methods in biomedical applications is limited access to the large amounts of data required to train the networks. Given this, we investigate whether a pre-trained video frame interpolation technique can be used for biomedical slice interpolation, even though it is trained on regular video footage without any human annotation.

The rest of the paper is organized as follows. Section II provides an overview of the registration-based slice interpolation and the video frame interpolation methods. Section III presents experiments on biomedical volume data, along with quantitative and qualitative performance comparisons. Section IV concludes the paper.

II Methods

II-A Registration-Based Slice Interpolation

The slice interpolation method proposed in [15] is based on multi-resolution deformable image registration. Its registration model is as follows. The two input images are represented as continuous functions over the image domain using bilinear interpolation. The model aims to compute two displacement maps that minimize the following energy function:

(1)

In this equation, the derivative terms denote first-order partial derivatives with respect to the two spatial coordinates. The first term of the energy function is the fidelity term, which aims to minimize the difference between the two deformed input images. Bi-directional matching is used in this term for improved performance. The sum of squared differences (SSD) tolerates mismatches differently in low-intensity and high-intensity regions. To balance this, a modified SSD with an empirically set positive constant is employed. Because registration is an ill-posed problem, the energy function must be regularized with smoothing terms. Here, a first-order regularization term and an area regularization term are used. The contributions of these terms to the final energy are weighted empirically according to the given images. The energy is minimized using a geometric flow-based method. Once the registration is computed, the in-between slices are obtained using linear or cubic spline interpolation. The interested reader is referred to the original paper [15] for more details on the implementation.
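The role of the displacement maps can be made concrete with a short example that synthesizes a midpoint slice once a dense displacement field is available. This is a generic symmetric-warping scheme written for illustration. It is not the geometric-flow registration or the exact interpolation of [15]. It assumes a hypothetical field `(dy, dx)`, defined on the grid of the missing slice, such that a structure at `p - d/2` in slice `a` corresponds to `p + d/2` in slice `b`. No registration is performed; the field is supplied by hand.

```python
import numpy as np


def bilinear_sample(img, ys, xs):
    """Sample img at real-valued coordinates (ys, xs) with bilinear interpolation and edge clamping."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def midpoint_slice(a, b, dy, dx):
    """Warp both neighbouring slices halfway along the displacement (dy, dx) and average them."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    h, w = a.shape
    yy, xx = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij")
    from_a = bilinear_sample(a, yy - 0.5 * dy, xx - 0.5 * dx)   # pull from slice a, half a step back
    from_b = bilinear_sample(b, yy + 0.5 * dy, xx + 0.5 * dx)   # pull from slice b, half a step forward
    return 0.5 * (from_a + from_b)


a = np.zeros((7, 9))
a[:, 2] = 1.0
b = np.zeros((7, 9))
b[:, 4] = 1.0
dy = np.zeros((7, 9))
dx = np.full((7, 9), 2.0)   # the stripe moves two columns to the right
mid = midpoint_slice(a, b, dy, dx)
print(mid[0])  # a single full-intensity stripe at column 3
```

Plain intensity averaging duplicates a moving structure, whereas here the structure is moved. The quality of the result therefore hinges on the accuracy of the estimated displacement, which is what the registration model is designed to provide.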

II-B Convolutional Neural Network-Based Frame Interpolation

The CNN-based adaptive separable convolution method proposed in [17] builds on earlier work by the same authors [16] and aims to reduce the computational complexity of that approach. As in many image processing applications, estimating higher-dimensional kernels in separable form reduces the computational complexity significantly. This lowers the computational time of both training and inference.

The goal of the CNN-based video frame interpolation method is to synthesize a frame $\hat{I}$ between two input frames $I_1$ and $I_2$. For each output pixel $(x, y)$ of the interpolated frame, a pair of 2D convolution kernels, $K_1(x,y)$ and $K_2(x,y)$, is estimated. The intensity of the interpolated pixel is then computed as:

$$\hat{I}(x,y) = K_1(x,y) \ast P_1(x,y) + K_2(x,y) \ast P_2(x,y) \tag{2}$$

In this formulation, $P_1(x,y)$ and $P_2(x,y)$ are patches centered at $(x,y)$ in the input frames $I_1$ and $I_2$, and $\ast$ is the convolution operator. The kernels encode both the displacement and the re-sampling information for the interpolation. Estimating these 2D kernels is computationally expensive for large displacements. Each 2D kernel is therefore approximated by a pair of 1D kernels, one vertical and one horizontal: $k_{1,v}, k_{1,h}$ for $K_1$ and $k_{2,v}, k_{2,h}$ for $K_2$, where:

$$K_1(x,y) = k_{1,v}(x,y)\, k_{1,h}(x,y)^{\top}, \qquad K_2(x,y) = k_{2,v}(x,y)\, k_{2,h}(x,y)^{\top} \tag{3}$$
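Equations (2) and (3) can be checked numerically at a single output pixel. In the NumPy example below, the function names are hypothetical and random values stand in for the network's predicted kernels. Each local kernel is applied as an element-wise product and sum over its patch. Because the kernel is predicted rather than fixed, whether it is flipped (convolution) or not (correlation) does not matter. The example confirms that the outer-product kernel of Eq. (3) can be applied as $k_v^{\top} P\, k_h$ without forming the full 2D kernel. With kernels of size 51, each 2D kernel has 51 × 51 = 2601 coefficients, whereas its separable form needs only 2 × 51 = 102.

```python
import numpy as np


def local_filter(kernel, patch):
    """Apply a per-pixel kernel to a same-sized patch (element-wise product and sum)."""
    return float(np.sum(kernel * patch))


def interpolate_pixel_full(p1, p2, k1v, k1h, k2v, k2h):
    """Eq. (2) with the 2D kernels built explicitly from Eq. (3) as outer products."""
    K1 = np.outer(k1v, k1h)
    K2 = np.outer(k2v, k2h)
    return local_filter(K1, p1) + local_filter(K2, p2)


def interpolate_pixel_separable(p1, p2, k1v, k1h, k2v, k2h):
    """Same result without forming the 2D kernels: sum_ij v_i h_j P_ij = v^T P h."""
    return float(k1v @ p1 @ k1h + k2v @ p2 @ k2h)


n = 51                                   # kernel size
rng = np.random.default_rng(0)
p1, p2 = rng.random((n, n)), rng.random((n, n))
k1v, k1h, k2v, k2h = (rng.random(n) for _ in range(4))

full = interpolate_pixel_full(p1, p2, k1v, k1h, k2v, k2h)
sep = interpolate_pixel_separable(p1, p2, k1v, k1h, k2v, k2h)
print(np.isclose(full, sep))             # True
print(n * n, 2 * n)                      # 2601 102: coefficients per 2D kernel vs. per separable pair
```

If every 1D kernel is a centred impulse and the horizontal ones are scaled by 0.5, Eq. (2) reduces to averaging the two co-located pixels, which is the linear intensity-based case. The learned kernels generalize this by shifting and reshaping the weights.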

The neural network consists of a contracting component (encoder), which extracts features from the two input frames, and an expanding component (decoder), which performs upsampling and dense prediction. Skip connections link layers of the contracting component to the corresponding layers of the expanding component. The last expanding layer feeds four sub-networks, each estimating one of the four required 1D kernels. The contracting component uses stacks of 3×3 convolution layers with Rectified Linear Units (ReLU), combined with average pooling. The expanding component uses bilinear interpolation for upsampling.

A combination of two loss functions is used to train the network. The first is based on the $\ell_1$ norm of the intensity differences between the interpolated image $\hat{I}$ and the ground truth $I_{gt}$, i.e., $\|\hat{I} - I_{gt}\|_1$. The second, referred to as the perceptual loss, is based on the $\ell_2$ norm of the differences between high-level features of the interpolated image and the ground truth:

$$\mathcal{L}_F = \left\| \phi(\hat{I}) - \phi(I_{gt}) \right\|_2^2 \tag{4}$$

where $\phi$ is the feature extraction function. According to [17], the relu4_4 layer of the VGG-19 network [20] is used for feature extraction.
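The two training losses can be written directly from their definitions. In the NumPy example below, `gradient_features` is a hypothetical, deliberately simple stand-in for the VGG-19 relu4_4 feature extractor $\phi$, which would require a pre-trained network. The losses are returned separately because the weighting used to combine them is not stated here. Norms are computed as sums; many implementations use means instead, which only rescales the values.

```python
import numpy as np


def l1_loss(est, gt):
    """Pixel-wise loss: l1 norm of the intensity differences."""
    return float(np.sum(np.abs(np.asarray(est, float) - np.asarray(gt, float))))


def perceptual_loss(est, gt, features):
    """Eq. (4): squared l2 norm of the difference between feature maps phi(est) and phi(gt)."""
    diff = features(np.asarray(est, float)) - features(np.asarray(gt, float))
    return float(np.sum(diff ** 2))


def gradient_features(img):
    """Hypothetical stand-in for phi: horizontal and vertical finite differences, flattened."""
    return np.concatenate([np.diff(img, axis=1).ravel(), np.diff(img, axis=0).ravel()])


gt = np.zeros((4, 4))
gt[:, 2:] = 1.0                                     # a vertical edge
est = gt + 0.5                                      # same edge, uniformly brighter
print(l1_loss(est, gt))                             # 8.0: the offset is penalised at all 16 pixels
print(perceptual_loss(est, gt, gradient_features))  # 0.0: the stand-in features ignore a global offset
```

The toy case shows how the two terms differ in character. The feature-based term tolerates a change that leaves image structure intact, whereas the $\ell_1$ term penalizes it at every pixel.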

Training uses 250,000 randomly selected samples, each containing patches from high-quality YouTube videos with sufficiently large motion. Random data augmentation is performed on the fly. The 1D kernels have a size of 51. For more details on the implementation and training of the network, the reader is referred to the original work [17].

III Results and Discussion

Two datasets are used to assess the performance of the two methods for slice interpolation of biomedical volume images. The first is a sequence of 69 chest CT slices [15]. The even slices are removed and then interpolated from the odd slices using the registration-based and the CNN-based methods. The second dataset (RESECT) is a series of 391 brain MRI slices [22]. As with the chest sequence, the even slices are removed, and the odd slices are used to interpolate the missing ones.
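The evaluation protocol can be sketched as follows. The example assumes 1-based slice numbering, so the odd slices sit at 0-based indices 0, 2, 4, and so on. A hypothetical `interpolate(a, b)` callable stands in for either method. Each removed even slice lies midway between two retained odd slices. A trailing even slice without an upper neighbour, if any, is skipped.

```python
import numpy as np


def hold_out_even_slices(volume, interpolate):
    """Remove the even (1-based) slices and re-estimate them from their odd neighbours.

    Returns (predictions, ground_truth), both of shape (num_pairs, height, width).
    """
    volume = np.asarray(volume, dtype=float)
    kept = volume[0::2]        # 1-based slices 1, 3, 5, ...
    held_out = volume[1::2]    # 1-based slices 2, 4, 6, ...
    num_pairs = min(len(held_out), len(kept) - 1)
    preds = np.stack([interpolate(kept[i], kept[i + 1]) for i in range(num_pairs)])
    return preds, held_out[:num_pairs]


# Toy volume with 391 slices whose intensity grows linearly with depth.
volume = np.arange(391, dtype=float)[:, None, None] * np.ones((1, 4, 4))
preds, truth = hold_out_even_slices(volume, lambda a, b: 0.5 * (a + b))
print(preds.shape)   # (195, 4, 4)
```

Because the toy intensity varies linearly through the volume, plain averaging recovers every held-out slice exactly. Real anatomy does not vary linearly along the through-plane axis, and that non-linear variation is what the two compared methods address.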

Three metrics are used for performance assessment: Peak Signal-to-Noise Ratio (PSNR), Structural SIMilarity (SSIM) [21], and Brenner Sharpness [5]. PSNR and SSIM measure performance with respect to the ground truth. Brenner Sharpness, in contrast, belongs to the class of blind (no-reference) quality assessment methods, since it measures the sharpness of the interpolation results independently of the ground truth.

To compute the PSNR, we first compute the Mean Squared Error (MSE) between the ground truth and the interpolation result. With $I$ and $\hat{I}$ denoting the ground truth and estimated images, respectively, the MSE is defined as:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( I_i - \hat{I}_i \right)^2 \tag{5}$$

where $I_i$ and $\hat{I}_i$ are the $i$-th pixels of the ground truth and estimated images, respectively, and $N$ is the total number of pixels. Given the MSE, the PSNR is defined as:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{R^2}{\mathrm{MSE}} \right) \tag{6}$$

where $R$ is the dynamic range of pixel intensities in the images.
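Equations (5) and (6) translate directly into NumPy. The dynamic range $R$ must be supplied by the caller (for example, 255 for 8-bit data). The value used for the CT and MRI volumes is not stated, so this sketch leaves it as a parameter.

```python
import numpy as np


def mse(gt, est):
    """Eq. (5): mean of the squared pixel differences over all N pixels."""
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    return float(np.mean((gt - est) ** 2))


def psnr(gt, est, dynamic_range):
    """Eq. (6): 10 * log10(R^2 / MSE) in decibels; infinite for identical images."""
    err = mse(gt, est)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(dynamic_range ** 2 / err))


gt = np.zeros((4, 4))
est = gt + 1.0                  # every pixel off by one intensity level
print(mse(gt, est))             # 1.0
print(psnr(gt, est, 255.0))     # equals 20 * log10(255) for an MSE of 1
```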

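For SSIM, three components play significant roles: luminance, contrast, and structure. The simplified SSIM equation can be written as [21]:

$$\mathrm{SSIM}(I, \hat{I}) = \frac{(2\mu_I \mu_{\hat{I}} + C_1)(2\sigma_{I\hat{I}} + C_2)}{(\mu_I^2 + \mu_{\hat{I}}^2 + C_1)(\sigma_I^2 + \sigma_{\hat{I}}^2 + C_2)} \tag{7}$$

where $\mu_I$ and $\mu_{\hat{I}}$ are the means and $\sigma_I^2$ and $\sigma_{\hat{I}}^2$ are the variances of the ground truth and the estimated image, respectively, and $\sigma_{I\hat{I}}$ is their covariance. $C_1$ and $C_2$ are constants defined as $C_1 = (K_1 R)^2$ and $C_2 = (K_2 R)^2$, where $R$ is the dynamic range and $K_1, K_2 \ll 1$ are small constants ($K_1 = 0.01$ and $K_2 = 0.03$ in [21]).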
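Equation (7) can be evaluated over the whole image in a few lines. This sketch implements the simplified global form only. The SSIM of [21] is normally computed in local (for example, Gaussian-weighted) windows and averaged into a mean SSIM, as reference implementations such as `skimage.metrics.structural_similarity` do. Values from the sketch are therefore not expected to match those reported in this section. The defaults `k1=0.01` and `k2=0.03` follow [21].

```python
import numpy as np


def global_ssim(gt, est, dynamic_range, k1=0.01, k2=0.03):
    """Simplified SSIM of Eq. (7) computed once over the whole image (no sliding window)."""
    x = np.asarray(gt, dtype=float).ravel()
    y = np.asarray(est, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                  # population variances
    cov_xy = np.mean((x - mu_x) * (y - mu_y))        # covariance of the two images
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


rng = np.random.default_rng(1)
img = rng.random((32, 32))
noisy = np.clip(img + 0.2 * rng.standard_normal((32, 32)), 0.0, 1.0)
print(global_ssim(img, img, 1.0))    # 1.0 (up to rounding) for identical images
print(global_ssim(img, noisy, 1.0))  # below 1 once noise is added
```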

Brenner Sharpness is computed as the sum of the squared first-order differences of the image in both the horizontal and vertical directions [5].
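A minimal sketch of the Brenner-style measure follows. The difference step and normalization behind the values reported in this section are not specified. The sketch assumes the classic two-pixel difference of Brenner et al. (`step=2`, an assumed default) and an unnormalized sum. Absolute values therefore depend on image size and intensity range and are only comparable between images of the same size.

```python
import numpy as np


def brenner_sharpness(img, step=2):
    """Sum of squared differences between pixels `step` apart, horizontally and vertically."""
    img = np.asarray(img, dtype=float)
    dx = img[:, step:] - img[:, :-step]   # horizontal differences
    dy = img[step:, :] - img[:-step, :]   # vertical differences
    return float(np.sum(dx ** 2) + np.sum(dy ** 2))


sharp = np.zeros((8, 8))
sharp[:, 4:] = 1.0                        # hard vertical edge
soft = np.zeros((8, 8))                   # the same edge spread over a ramp
soft[:, 3] = 1.0 / 3
soft[:, 4] = 2.0 / 3
soft[:, 5:] = 1.0
print(brenner_sharpness(sharp))                             # 16.0
print(brenner_sharpness(soft) < brenner_sharpness(sharp))   # True
```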

Fig. 1 compares the average slice interpolation performance of the registration-based and CNN-based methods on the chest CT sequence. From left to right, PSNR, SSIM, and Brenner Sharpness are shown. For this dataset, the CNN-based method achieves average PSNR, SSIM, and Brenner Sharpness values of 26.45, 0.9088, and 1940, respectively. The registration-based method achieves 26.85, 0.9090, and 1500. The CNN-based method is inferior to registration-based interpolation in terms of PSNR, but the two perform similarly in terms of SSIM. Moreover, the CNN-based method produces much sharper results, as the Brenner Sharpness measure shows. The lower PSNR can be attributed to the fact that the CNN-based method relies on weights trained on regular video datasets rather than on medical datasets.

Fig. 2 provides a qualitative comparison of the two methods on the chest CT sequence. The top row shows the ground truth, the result of registration-based slice interpolation, and the result of CNN-based slice interpolation, respectively. The bottom row shows the difference between the two surrounding slices used for the interpolation. It also shows the differences between each method's result and the ground truth. Close inspection reveals that the two methods perform almost identically in terms of their differences from the ground truth. However, the result of the registration-based method suffers from over-smoothing, while the result of the CNN-based method is much sharper.

Fig. 1: Average performance comparison of the registration-based (REG) and learning-based (CNN) methods on the chest CT images using PSNR, SSIM, and Brenner Sharpness as performance metrics.
Fig. 2: Sample interpolated slices from the chest CT sequence. Top row: the ground truth and the results of registration-based and CNN-based slice interpolation, respectively. Bottom row: the difference between the two surrounding slices used for the interpolation, and the differences of the registration-based and CNN-based results with respect to the ground truth, respectively.

Fig. 3 shows the average performance of the two methods for the RESECT brain MRI dataset. Similar to Fig. 1, on the left the average PSNR is shown while average SSIM and average Brenner Sharpness are shown in the middle and right panels of the figure. For this dataset, the CNN-based method’s average performance metrics are 34.19, 0.9598, 1574 while the performance metrics for the registration-based method are 33.49, 0.9669, 1290, for PSNR, SSIM and Brenner Sharpness, respectively.
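Putting the reported averages side by side quantifies the differences discussed above. The numbers below are copied from this section, and the snippet only does the arithmetic. The dictionary layout and the `compare` helper are illustrative.

```python
# Average metrics reported in this section: (PSNR in dB, SSIM, Brenner Sharpness).
reported = {
    "chest CT": {"REG": (26.85, 0.9090, 1500), "CNN": (26.45, 0.9088, 1940)},
    "RESECT MRI": {"REG": (33.49, 0.9669, 1290), "CNN": (34.19, 0.9598, 1574)},
}


def compare(reg, cnn):
    """Return (PSNR gap in dB, SSIM gap, relative sharpness gain in %) of CNN over REG."""
    psnr_gap = cnn[0] - reg[0]                     # positive favours the CNN
    ssim_gap = cnn[1] - reg[1]
    sharp_gain = 100.0 * (cnn[2] - reg[2]) / reg[2]
    return psnr_gap, ssim_gap, sharp_gain


for dataset, m in reported.items():
    p, s, g = compare(m["REG"], m["CNN"])
    print(f"{dataset}: PSNR {p:+.2f} dB, SSIM {s:+.4f}, sharpness {g:+.1f}%")
```

The PSNR gap changes sign between the two datasets, and both SSIM gaps are below 0.01. By contrast, the CNN-based results are about 29% (chest CT) and 22% (RESECT) sharper by the Brenner measure.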

Fig. 4 provides a qualitative comparison of the two methods on the RESECT brain MRI dataset. As before, the ground truth and the results of the registration-based and CNN-based methods are shown in the top row, while the difference images are shown in the bottom row. Visual comparison reveals that both methods perform similarly, while the result of the CNN-based method is much sharper than that of the registration-based method.

Average computational time can also be compared. For the registration-based method, the C implementation provided by the authors of the original paper is used. For the CNN-based method, the Python implementation provided by its authors on GitHub is used. Python, being an interpreted language, is generally much slower than compiled C code. Despite this, our experiments showed that the CNN-based approach needs much less computational time for slice interpolation than the registration-based approach. This is to be expected, since inference with a trained deep learning model is generally fast. For the chest CT images, interpolating 35 in-between slices takes 320 seconds with the registration-based method and 93 seconds with the CNN-based method. For the RESECT dataset, interpolating 195 in-between slices takes 1965 seconds with the registration-based method and 777 seconds with the CNN-based method.
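Normalizing the reported timings by the number of interpolated slices makes the two datasets easier to compare. The figures are taken from this paragraph. The hardware used is not stated, so the ratios are only indicative, and the helper name is illustrative.

```python
# Timings reported in this section: (in-between slices, REG seconds, CNN seconds).
timings = {"chest CT": (35, 320, 93), "RESECT MRI": (195, 1965, 777)}


def per_slice_and_speedup(num_slices, reg_seconds, cnn_seconds):
    """Seconds per interpolated slice for each method, and the REG/CNN speed-up factor."""
    return reg_seconds / num_slices, cnn_seconds / num_slices, reg_seconds / cnn_seconds


for dataset, (n, reg, cnn) in timings.items():
    reg_ps, cnn_ps, speedup = per_slice_and_speedup(n, reg, cnn)
    print(f"{dataset}: REG {reg_ps:.2f} s/slice, CNN {cnn_ps:.2f} s/slice, speed-up {speedup:.2f}x")
```

By this measure, the CNN-based method is roughly 3.4 times faster on the chest CT sequence and 2.5 times faster on RESECT.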

Fig. 3: Average performance comparison of the registration-based (REG) and learning-based (CNN) methods on the RESECT brain MRI images using PSNR, SSIM, and Brenner Sharpness as performance metrics.
Fig. 4: Sample interpolated slices from the RESECT brain MRI sequence. Top row: the ground truth and the results of registration-based and CNN-based slice interpolation, respectively. Bottom row: the difference between the two surrounding slices used for the interpolation, and the differences of the registration-based and CNN-based results with respect to the ground truth, respectively.

IV Conclusion

This paper explored the use of registration-based and learning-based methods for slice interpolation of medical images. In the registration-based technique, slice interpolation is formulated as linear/cubic interpolation combined with deformable registration. The registration models the variations in the shapes of the objects contained in the available slices [15]. In the learning-based approach, a deep convolutional neural network handles both displacement estimation and frame synthesis. It uses separable convolution kernels to reduce the computational complexity of both training and inference [17]. The learning-based method is trained on regular video footage rather than on medical volume images. Even so, it produces highly accurate results in a fraction of the computational time required by the registration-based method. This demonstrates the strong potential of learning-based methods in such applications. These techniques are ultimately meant to improve the analysis and visualization of medical images. It is therefore necessary to incorporate domain knowledge into the models to obtain more faithful results. This is left for future research.

References

  • [1] A. Baghaie, R. M. D’Souza, and Z. Yu (2015) Dense correspondence and optical flow estimation using Gabor, Schmid and steerable descriptors. In International Symposium on Visual Computing, pp. 406–415.
  • [2] A. Baghaie, R. M. D’Souza, and Z. Yu (2017) Dense descriptors for optical flow estimation: a comparative study. Journal of Imaging 3 (1), pp. 12.
  • [3] A. Baghaie, Z. Yu, and R. M. D’Souza (2014) Fast mesh-based medical image registration. In International Symposium on Visual Computing, pp. 1–10.
  • [4] A. Baghaie and Z. Yu (2014) Curvature-based registration for slice interpolation of medical images. In International Symposium Computational Modeling of Objects Represented in Images, pp. 69–80.
  • [5] J. F. Brenner, B. S. Dew, J. B. Horton, T. King, P. W. Neurath, and W. D. Selles (1976) An automated microscope for cytologic research: a preliminary evaluation. Journal of Histochemistry & Cytochemistry 24 (1), pp. 100–111.
  • [6] Y. Chen, Y. Xie, Z. Zhou, F. Shi, A. G. Christodoulou, and D. Li (2018) Brain MRI super resolution using 3D deep densely connected neural networks. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 739–742.
  • [7] D. H. Frakes, L. P. Dasi, K. Pekkan, H. D. Kitajima, K. Sundareswaran, A. P. Yoganathan, and M. J. Smith (2008) A new method for registration-based medical image interpolation. IEEE Transactions on Medical Imaging 27 (3), pp. 370–377.
  • [8] A. Goshtasby, D. A. Turner, and L. V. Ackerman (1992) Matching of tomographic slices for interpolation. IEEE Transactions on Medical Imaging 11 (4), pp. 507–516.
  • [9] G. J. Grevera and J. K. Udupa (1996) Shape-based interpolation of multidimensional grey-level images. IEEE Transactions on Medical Imaging 15 (6), pp. 881–892.
  • [10] W. E. Higgins, C. J. Orlick, and B. E. Ledell (1996) Nonlinear filtering approach to 3-D gray-scale image interpolation. IEEE Transactions on Medical Imaging 15 (4), pp. 580–587.
  • [11] A. Horváth, S. Pezold, M. Weigel, K. Parmar, and P. Cattin (2017) High order slice interpolation for medical images. In International Workshop on Simulation and Synthesis in Medical Imaging, pp. 69–78.
  • [12] A. Kudo, Y. Kitamura, Y. Li, S. Iizuka, and E. Simo-Serra (2019) Virtual thin slice: 3D conditional GAN-based super-resolution for CT slice interval. In International Workshop on Machine Learning for Medical Image Reconstruction, pp. 91–100.
  • [13] T. Lee and C. Lin (2002) Feature-guided shape-based image interpolation. IEEE Transactions on Medical Imaging 21 (12), pp. 1479–1489.
  • [14] T. Lee and W. Wang (2000) Morphology-based three-dimensional interpolation. IEEE Transactions on Medical Imaging 19 (7), pp. 711–721.
  • [15] J. Leng, G. Xu, and Y. Zhang (2013) Medical image interpolation based on multi-resolution registration. Computers & Mathematics with Applications 66 (1), pp. 1–18.
  • [16] S. Niklaus, L. Mai, and F. Liu (2017) Video frame interpolation via adaptive convolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 670–679.
  • [17] S. Niklaus, L. Mai, and F. Liu (2017) Video frame interpolation via adaptive separable convolution. In Proceedings of the IEEE International Conference on Computer Vision, pp. 261–270.
  • [18] C. Peng, W. Lin, H. Liao, R. Chellappa, and S. K. Zhou (2019) Deep slice interpolation via marginal super-resolution, fusion and refinement. arXiv preprint arXiv:1908.05599.
  • [19] G. P. Penney, J. A. Schnabel, D. Rueckert, M. A. Viergever, and W. J. Niessen (2004) Registration-based interpolation. IEEE Transactions on Medical Imaging 23 (7), pp. 922–926.
  • [20] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • [21] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612.
  • [22] Y. Xiao, M. Fortin, G. Unsgård, H. Rivaz, and I. Reinertsen (2017) REtrospective Evaluation of Cerebral Tumors (RESECT): a clinical database of pre-operative MRI and intra-operative ultrasound in low-grade glioma surgeries. Medical Physics 44 (7), pp. 3875–3882.