I Introduction
Image interpolation and super-resolution is a common research area that has been widely studied in various branches of image processing. Using the available image data, image interpolation/super-resolution techniques aim to enhance the resolution of the data. Such resolution enhancement can be done in the spatial domain as well as, for image sequences, the temporal domain. Image interpolation, more specifically slice interpolation, has found widespread use in biomedical applications to enhance the resolution of data acquired using biomedical imaging modalities such as CT, MRI, etc. This practice is especially useful for visualization and analysis of biomedical data. In such modalities, a sequence of 2D images is acquired from the patient. However, given the limitations of the imaging systems, the resolution is usually not symmetric in all three coordinate directions, as the through-plane resolution of the acquired data is significantly lower than the in-plane resolution. This discrepancy results in step-shaped isosurfaces and discontinuities in structures in 3D reconstructed models. It is therefore necessary to develop image interpolation techniques to overcome these limitations.
Generally, slice interpolation techniques can be divided into two groups: intensity-based interpolation and object-based interpolation. In the first group, the interpolated slices are computed directly from the intensity information of the available data, without considering the shape information of the objects contained in the images. Nearest-neighbor, linear, and cubic spline interpolation are examples of techniques in this group. The simplicity of such techniques results in very low computational complexity, which makes them highly popular in visualization applications. However, because the results they produce suffer from blurring artifacts on object boundaries, they are not recommended for analysis purposes.
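As a minimal illustration of an intensity-based method, the slice midway between two acquired slices under linear interpolation is simply a pixel-wise average (a numpy sketch; function and array names are illustrative, not from either compared implementation):

```python
import numpy as np

def linear_mid_slice(slice_a: np.ndarray, slice_b: np.ndarray) -> np.ndarray:
    """Linearly interpolate the slice halfway between two adjacent slices.

    Each output pixel is the average of the corresponding pixels in the two
    surrounding slices; no object/shape information is used, which is why
    boundaries of moving structures come out blurred.
    """
    return 0.5 * (slice_a.astype(np.float64) + slice_b.astype(np.float64))

# Two toy 2x2 "slices" with a bright structure shifted by one pixel:
a = np.array([[100.0, 0.0], [0.0, 0.0]])
b = np.array([[0.0, 100.0], [0.0, 0.0]])
mid = linear_mid_slice(a, b)
```

The toy example exhibits exactly the blurring problem described above: a structure that moves by one pixel between slices is smeared over both locations instead of appearing halfway between them.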
In object-based methods, the shape information of the objects contained in the images is taken into account to guide the interpolation toward more accurate representations of the interpolated slices. Examples of such methods can be seen in [8, 10, 9, 14, 13]. Goshtasby et al. used a gradient-magnitude-based approach to find the corresponding points between consecutive slices and then applied linear interpolation to compute the interpolated slices [8]. The proposed method proves useful when the shape difference is small, as the search domain is limited to small neighborhoods; however, this assumption does not generally hold in practical applications. To overcome these limitations, other techniques such as column-fitting interpolation [10], shape-based interpolation [9], morphology-based interpolation [14], and feature-guided interpolation [13] have been proposed. To find pixel-wise correspondence between images, registration and/or optical flow estimation methods can be employed. Such correspondence can be formulated by taking into account intensity or feature information of the images
[3, 1, 2]. A relatively new group of techniques for slice interpolation uses image registration as the main building block to estimate the changes in shape of the objects contained in the available slices [19, 7, 15, 4, 11]. In this group, deformable (non-rigid) image registration serves as a means to estimate the pixel-wise correspondence between consecutive slices.

With the introduction of deep Convolutional Neural Networks (CNNs), the field of image processing has been completely transformed. In supervised CNNs, after setting the hyperparameters of the network (such as the number of layers, the structure of the layers, the evaluation functions, etc.), the network is trained by introducing sets of data, and the parameters of the network are adjusted to reduce the error between the network's output and the ground truth. Examples of such techniques can be found in 3D slice interpolation of medical images as well. Chen et al. [6] proposed a 3D Densely Connected Super-Resolution Network (DCSRN) for slice interpolation of medical images. Peng et al. [18], on the other hand, proposed to use a 2D CNN for interpolating anisotropic brain MRI data to enhance the lower resolution along the through-plane direction; the proposed method takes advantage of CNN-based data fusion and refinement to achieve the final results. In the work of Kudo et al. [12], the use of conditional Generative Adversarial Networks (GANs) is explored for super-resolution of CT images.

In this paper, we explore the use of CNN-based frame interpolation techniques for slice interpolation of biomedical images. More specifically, we aim to analyze the performance of the adaptive separable convolutions for video frame interpolation proposed in the work of Niklaus et al. [17] and compare it with the registration-based approach proposed by Leng et al. [15]. One major challenge in using deep learning-based methods in biomedical applications is the limited access to the abundance of data required to train the networks. Given this, we aim to see whether a pre-trained video frame interpolation network, trained on regular video footage without any human annotation, can be used for biomedical slice interpolation.
The rest of the paper is organized as follows. In Section II, an overview of both video frame interpolation and registration-based slice interpolation methods is provided. Section III presents experiments using biomedical volume data along with quantitative and qualitative performance comparisons. Section IV concludes the paper.
II Methods
II-A Registration-Based Slice Interpolation
The slice interpolation proposed in [15] works based on multi-resolution deformable image registration. The registration model is as follows. Given two input images $I_1$ and $I_2$, they are represented as continuous functions $I_1(X)$ and $I_2(X)$, defined on the domain $\Omega$ using bilinear interpolation. The model aims to compute two displacement maps, $u_1$ and $u_2$, that minimize the following energy function:

$$E(u_1, u_2) = \int_\Omega \frac{\big(I_1(X + u_1(X)) - I_2(X + u_2(X))\big)^2}{I_1(X + u_1(X))^2 + I_2(X + u_2(X))^2 + \beta}\, dX + \alpha \int_\Omega \big(\|\nabla u_1\|^2 + \|\nabla u_2\|^2\big)\, dX + \gamma \int_\Omega \big(A(u_1) + A(u_2)\big)\, dX \qquad (1)$$

In this equation, $\nabla u_1$ and $\nabla u_2$ are the first-order derivatives of the displacement maps with respect to the spatial coordinates $x$ and $y$, respectively. The first term in the energy function is the fidelity term, which aims to minimize the difference between the two deformed input images; bidirectional matching is used for improved performance. Also, to balance the mismatch endurance of the sum of squared intensity differences (SSD) in low-intensity and high-intensity regions, a modified SSD is employed, with $\beta$ a positive constant set empirically. Because of the ill-posedness of the registration problem, the energy function needs to be regularized by introducing smoothing functions: here, the first-order regularization term and the area regularization term $A(\cdot)$ are used, respectively. Their contributions to the final energy function are weighted empirically, via $\alpha$ and $\gamma$, according to the given images. The registration model is minimized using a geometric flow-based method. After the registration, the in-between slices are computed using linear or cubic spline interpolation. The interested reader is referred to the original paper [15] for more details on the implementation.
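As an illustration of the final synthesis step, once the displacement maps are available the in-between slice can be obtained by warping each input slice halfway along its displacement field and blending the results linearly. The numpy sketch below shows this idea only; the registration itself, the modified SSD, and the multi-resolution scheme of [15] are omitted, and all function names are illustrative:

```python
import numpy as np

def warp_bilinear(img: np.ndarray, disp: np.ndarray) -> np.ndarray:
    """Sample img at X + disp(X) using bilinear interpolation.

    disp has shape (h, w, 2): disp[..., 0] is the x-displacement and
    disp[..., 1] the y-displacement. Samples are clamped to the image.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    sx = xs + disp[..., 0]
    sy = ys + disp[..., 1]
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx = np.clip(sx - x0, 0.0, 1.0)
    fy = np.clip(sy - y0, 0.0, 1.0)
    return ((1 - fy) * (1 - fx) * img[y0, x0]
            + (1 - fy) * fx * img[y0, x0 + 1]
            + fy * (1 - fx) * img[y0 + 1, x0]
            + fy * fx * img[y0 + 1, x0 + 1])

def interpolate_mid_slice(i1, i2, u1, u2):
    """Warp each slice halfway along its displacement field and average."""
    return 0.5 * (warp_bilinear(i1, 0.5 * u1) + warp_bilinear(i2, 0.5 * u2))
```

With zero displacement fields the mid-slice reduces to the plain intensity average, which is the expected degenerate case when the registration finds no motion.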
II-B Convolutional Neural Network-Based Frame Interpolation
The CNN-based adaptive separable convolution method proposed in [17] builds on a previous work by the same authors [16] and aims to reduce the computational complexity of that earlier approach. As is common in many image processing applications, if higher-dimensional kernels can be estimated in separable form, the computational complexity of the process is reduced significantly, leading to lower computation times for both training and inference.
The goal of the CNN-based video frame interpolation method is to synthesize a frame $\hat{I}$ in-between the two input frames $I_1$ and $I_2$. For each output pixel $(x, y)$ in the interpolated frame, a pair of 2D convolution kernels, $K_1(x, y)$ and $K_2(x, y)$, are estimated to compute the intensity value of the interpolated pixel as:

$$\hat{I}(x, y) = K_1(x, y) * P_1(x, y) + K_2(x, y) * P_2(x, y) \qquad (2)$$

In this formulation, $P_1(x, y)$ and $P_2(x, y)$ are patches centered at location $(x, y)$ in the input frames, and $*$ is the convolution operator. The kernels are estimated to represent both displacement and resampling information for the interpolation procedure. Given the high computational complexity of estimating these 2D kernels for large displacements, the aim is instead to estimate a pair of 1D kernels, one for the vertical and one for the horizontal direction, for each 2D kernel, i.e. $\langle k_{1,v}, k_{1,h} \rangle$ and $\langle k_{2,v}, k_{2,h} \rangle$, where:

$$K_1(x, y) = k_{1,v}\, k_{1,h}^{\top}, \qquad K_2(x, y) = k_{2,v}\, k_{2,h}^{\top} \qquad (3)$$
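The separable-kernel idea for a single output pixel can be sketched in a few lines of numpy (illustrative only; in the actual network the four 1D kernels are predicted per pixel by the sub-networks described below):

```python
import numpy as np

def sepconv_pixel(patch1, patch2, k1v, k1h, k2v, k2h) -> float:
    """Compute one interpolated pixel from two patches and four 1D kernels.

    Each 2D kernel is the outer product of a vertical and a horizontal
    1D kernel, so an n x n kernel costs 2n parameters instead of n^2.
    """
    K1 = np.outer(k1v, k1h)  # 2D kernel applied to the patch from frame 1
    K2 = np.outer(k2v, k2h)  # 2D kernel applied to the patch from frame 2
    return float(np.sum(K1 * patch1) + np.sum(K2 * patch2))
```

For example, if all four 1D kernels are uniform and each outer product sums to one, the output pixel is simply the sum of the two patch means, showing how the kernel magnitudes jointly encode the blending weights.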
The neural network architecture consists of a contracting component (encoder) for feature extraction from the two input frames and an expanding component (decoder) that performs upsampling and dense prediction. Skip connections are incorporated in the network to connect the layers of the contracting component to the layers of the expanding component. The last expanding layer is then connected to four sub-networks, each estimating one of the four required 1D kernels. The layers of the contracting component use stacks of convolution kernels with Rectified Linear Units (ReLU) combined with average pooling, while the layers of the expanding component use bilinear interpolation for upsampling.
A combination of two loss functions is used for training of the network. The first is based on the $\ell_1$ norm of the intensity differences between the interpolated image and the ground truth. The second, denoted the perceptual loss, is based on the $\ell_2$ norm of the high-level feature differences between the interpolated image and the ground truth:

$$\mathcal{L}_F = \left\| \phi(\hat{I}) - \phi(I_{gt}) \right\|_2^2 \qquad (4)$$

where $\phi$ is the feature extraction function. It is reported in [17] that the relu4_4 layer of the VGG-19 network [20] is used for feature extraction.
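The combined training objective can be sketched as follows. Since reproducing VGG-19 features is out of scope here, a simple image-gradient feature map stands in for the real $\phi$; everything in this block is illustrative and is not the authors' training code:

```python
import numpy as np

def phi(img: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor (horizontal/vertical gradients).

    In [17] this role is played by the relu4_4 activations of VGG-19;
    gradients are used here only to keep the sketch self-contained.
    """
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    return np.concatenate([gx.ravel(), gy.ravel()])

def combined_loss(pred: np.ndarray, gt: np.ndarray, lam: float = 1.0) -> float:
    l1 = np.sum(np.abs(pred - gt))           # intensity (l1) term
    lf = np.sum((phi(pred) - phi(gt)) ** 2)  # perceptual (feature) term
    return float(l1 + lam * lf)
```

The weighting `lam` between the two terms is a hypothetical knob for this sketch; the loss is zero exactly when prediction and ground truth coincide.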
The training is done by randomly selecting 250,000 data samples, each containing patches from high-quality YouTube videos with sufficiently large motions. Random data augmentation is performed on the fly. As for the kernel sizes, 1D kernels of length 51 are used. For more details on the implementation and training of the network, the reader is referred to the original work [17].
III Results and Discussions
To assess the performance of the two methods for slice interpolation of biomedical volume images, two sets of data are used here. The first is a sequence of chest CT images [15] consisting of 69 slices. The even slices are removed and then reconstructed from the odd slices using the registration-based and CNN-based methods. The second data set (RESECT) is a series of brain MRI images consisting of 391 slices [22]. Similar to the chest sequence, the even slices are removed and the odd slices are used for interpolation of the missing slices.

Three subjective/objective metrics are used for performance assessment: Peak Signal to Noise Ratio (PSNR), Structural SIMilarity (SSIM)
[21], and Brenner sharpness [5]. While PSNR and SSIM measure the performance with respect to the ground truth, the Brenner sharpness belongs to the class of blind quality assessment methods, since it measures the sharpness of the interpolation result independently of the ground truth.

To compute the PSNR, first the Mean Squared Error (MSE) between the ground truth and the interpolation result is needed. Denoting the ground truth and estimated images by $I$ and $\hat{I}$, respectively, the MSE can be defined as:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big(I(i) - \hat{I}(i)\big)^2 \qquad (5)$$

where $I(i)$ and $\hat{I}(i)$ are the $i$-th pixels of the ground truth and estimated images, respectively, and $N$ is the total number of pixels. Having the MSE, PSNR can be defined as:

$$\mathrm{PSNR} = 10 \log_{10} \frac{R^2}{\mathrm{MSE}} \qquad (6)$$

where $R$ is the dynamic range of pixel intensities in the images.
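The two definitions above combine into a short numpy sketch (an illustrative helper, not tied to either method's codebase):

```python
import numpy as np

def psnr(gt: np.ndarray, est: np.ndarray, dynamic_range: float = 255.0) -> float:
    """PSNR in dB from the MSE between ground truth and estimate.

    Note: identical images give MSE = 0 and hence infinite PSNR.
    """
    mse = np.mean((gt.astype(np.float64) - est.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(dynamic_range ** 2 / mse))
```

A maximally wrong 8-bit image (every pixel off by the full dynamic range) scores 0 dB; smaller errors score higher.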
For the SSIM, three components play significant roles: luminance, contrast and structure. The simplified equation for SSIM can be written as [21]:

$$\mathrm{SSIM}(I, \hat{I}) = \frac{(2\mu_I \mu_{\hat{I}} + c_1)(2\sigma_{I\hat{I}} + c_2)}{(\mu_I^2 + \mu_{\hat{I}}^2 + c_1)(\sigma_I^2 + \sigma_{\hat{I}}^2 + c_2)} \qquad (7)$$

where $\mu_I$ and $\mu_{\hat{I}}$ are the averages and $\sigma_I^2$ and $\sigma_{\hat{I}}^2$ are the variances of the ground truth and the estimated image, respectively, while $\sigma_{I\hat{I}}$ is the covariance. $c_1$ and $c_2$ are constants defined as $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, with $L$ the dynamic range of the pixel intensities.

For the Brenner sharpness measure, the squared sum of the image's first derivatives in both horizontal and vertical directions is calculated [5].
Fig. 1 shows the average performance of slice interpolation for the registration-based and CNN-based methods on the chest CT sequence. From left to right, PSNR, SSIM and Brenner sharpness are depicted, respectively. For this dataset, the CNN-based method's average performance metrics are 26.45, 0.9088 and 1940, while those of the registration-based method are 26.85, 0.9090 and 1500, for PSNR, SSIM and Brenner sharpness, respectively. While the CNN-based method is inferior to registration-based interpolation in terms of PSNR, the two perform similarly in terms of SSIM, and the CNN-based method produces much sharper results, as is evident from the Brenner sharpness measure. The inferior PSNR performance can be attributed to the fact that the CNN-based method used here relies on weights trained on regular video datasets rather than medical datasets.
Fig. 2 provides a qualitative comparison of the two methods on the chest CT sequence. In the top row, the ground truth, the result of registration-based slice interpolation, and the result of CNN-based slice interpolation are shown, respectively. In the bottom row, the difference between the two surrounding slices used for the interpolation, as well as the differences between the registration-based and CNN-based results and the ground truth, are shown. Close inspection reveals that the two methods perform almost identically in terms of their differences with the ground truth, but the result of the registration-based method suffers from smoothing while the result of the CNN-based method is much sharper.
Fig. 3 shows the average performance of the two methods on the RESECT brain MRI dataset. Similar to Fig. 1, the average PSNR is shown on the left, while the average SSIM and average Brenner sharpness are shown in the middle and right panels. For this dataset, the CNN-based method's average performance metrics are 34.19, 0.9598 and 1574, while those of the registration-based method are 33.49, 0.9669 and 1290, for PSNR, SSIM and Brenner sharpness, respectively.

Fig. 4 provides a qualitative comparison of the two methods on the RESECT brain MRI dataset. As before, the ground truth and the results of the registration-based and CNN-based methods are shown in the top row, while the difference images are shown in the bottom row. Visual comparison reveals that both methods perform similarly, while the result of the CNN-based method is much sharper than that of the registration-based method.
Average computational time can also be compared. For the registration-based method, the C implementation provided by the authors of the original paper is used; for the CNN-based method, the Python implementation provided by the authors on GitHub is used. In general, given that Python is an interpreted language, its computation times are much slower than those of compiled C code. Despite this, our experiments showed that the computational time needed for slice interpolation using the CNN-based approach is much lower than that of the registration-based approach. This is to be expected, since deep learning algorithms are generally fast at inference with trained models. For the chest CT images, the computational time for interpolating 35 in-between slices is 320 seconds and 93 seconds for the registration-based and CNN-based methods, respectively. For the RESECT data set, the computational time for interpolating 195 in-between slices is 1965 seconds and 777 seconds for the registration-based and CNN-based methods, respectively.
IV Conclusion
In this paper, the use of registration-based and learning-based methods for slice interpolation of medical images is explored. For the registration-based technique, the slice interpolation is formulated as a linear/cubic interpolation combined with a deformable registration that models the variations in the shapes of the objects contained in the available slices [15]. As for the learning-based approach, a deep convolutional neural network architecture is used to account for both displacement analysis and frame synthesis, taking advantage of separable convolution kernels to reduce the computational complexity of both training and inference [17]. Even though the learning-based method is trained on regular video footage, and not on actual medical volume images, it is capable of producing highly accurate results in a fraction of the computational time of the registration-based method. This demonstrates the great capability of learning-based methods in such applications. Given that the ultimate purpose of these techniques in medical image processing is to improve analysis and visualization workflows, it is necessary to incorporate domain knowledge into the models for a more truthful performance. This is left for future research.
References
 [1] (2015) Dense correspondence and optical flow estimation using Gabor, Schmid and steerable descriptors. In International Symposium on Visual Computing, pp. 406–415. Cited by: §I.
 [2] (2017) Dense descriptors for optical flow estimation: a comparative study. Journal of Imaging 3 (1), pp. 12. Cited by: §I.
 [3] (2014) Fast mesh-based medical image registration. In International Symposium on Visual Computing, pp. 1–10. Cited by: §I.
 [4] (2014) Curvature-based registration for slice interpolation of medical images. In International Symposium Computational Modeling of Objects Represented in Images, pp. 69–80. Cited by: §I.
 [5] (1976) An automated microscope for cytologic research: a preliminary evaluation. Journal of Histochemistry & Cytochemistry 24 (1), pp. 100–111. Cited by: §III, §III.
 [6] (2018) Brain MRI super-resolution using 3D deep densely connected neural networks. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 739–742. Cited by: §I.
 [7] (2008) A new method for registration-based medical image interpolation. IEEE Transactions on Medical Imaging 27 (3), pp. 370–377. Cited by: §I.
 [8] (1992) Matching of tomographic slices for interpolation. IEEE Transactions on Medical Imaging 11 (4), pp. 507–516. Cited by: §I.
 [9] (1996) Shape-based interpolation of multidimensional grey-level images. IEEE Transactions on Medical Imaging 15 (6), pp. 881–892. Cited by: §I.
 [10] (1996) Nonlinear filtering approach to 3D gray-scale image interpolation. IEEE Transactions on Medical Imaging 15 (4), pp. 580–587. Cited by: §I.
 [11] (2017) High order slice interpolation for medical images. In International Workshop on Simulation and Synthesis in Medical Imaging, pp. 69–78. Cited by: §I.

 [12] (2019) Virtual thin slice: 3D conditional GAN-based super-resolution for CT slice interval. In International Workshop on Machine Learning for Medical Image Reconstruction, pp. 91–100. Cited by: §I.
 [13] (2002) Feature-guided shape-based image interpolation. IEEE Transactions on Medical Imaging 21 (12), pp. 1479–1489. Cited by: §I.
 [14] (2000) Morphology-based three-dimensional interpolation. IEEE Transactions on Medical Imaging 19 (7), pp. 711–721. Cited by: §I.
 [15] (2013) Medical image interpolation based on multi-resolution registration. Computers & Mathematics with Applications 66 (1), pp. 1–18. Cited by: §I, §I, §II-A, §II-A, §III, §IV.

 [16] (2017) Video frame interpolation via adaptive convolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 670–679. Cited by: §II-B.
 [17] (2017) Video frame interpolation via adaptive separable convolution. In Proceedings of the IEEE International Conference on Computer Vision, pp. 261–270. Cited by: §I, §II-B, §II-B, §II-B, §IV.
 [18] (2019) Deep slice interpolation via marginal super-resolution, fusion and refinement. arXiv preprint arXiv:1908.05599. Cited by: §I.
 [19] (2004) Registration-based interpolation. IEEE Transactions on Medical Imaging 23 (7), pp. 922–926. Cited by: §I.
 [20] (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §II-B.
 [21] (2004) Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §III, §III.
 [22] (2017) REtrospective evaluation of cerebral tumors (RESECT): a clinical database of pre-operative MRI and intra-operative ultrasound in low-grade glioma surgeries. Medical Physics 44 (7), pp. 3875–3882. Cited by: §III.