High-resolution (HR) images are in great demand for many real-world applications [trinh2014novel]. However, image resolution and quality are normally limited by the imaging hardware. HR is particularly desirable for medical images, which provide crucial anatomical and physiological details for patient care. In addition to possible hardware restrictions, medical imaging is further constrained by health limitations (e.g., the ionizing radiation dose of X-ray) and acquisition time limitations (e.g., the Specific Absorption Rate limits of MRI). Moreover, movements due to patient fatigue and organ pulsation further degrade image quality and result in images with lower signal-to-noise ratio (SNR). Low-resolution (LR) medical images with a limited field of view and degraded image quality can reduce the visibility of vital pathological details and compromise diagnostic accuracy and prognosis [yang2016combined].
Research studies have shown that image super-resolution (SR) provides an alternative, relatively inexpensive way to improve the perceptual quality of medical images through spatial resolution enhancement rather than hardware improvement. Compared to conventional image interpolation, SR methods can produce better HR outputs with higher SNR and fewer blurring artifacts. Broadly speaking, there are two types of SR: (1) reconstructing the HR output from multiple LR images acquired from different views of the same object, although acquiring multi-view images can be expensive and sometimes infeasible; and (2) learning an SR model from LR-HR training pairs, then performing inference on a new input LR image to yield the HR output [yang2012coupled, trinh2014novel].
More recently, deep learning based SR methods have boosted the performance of super-resolved HR images, owing mainly to advances in computing power and the availability of big data. For example, the SRGAN method [ledig2017photo], built on a Generative Adversarial Network (GAN) model, has demonstrated fast and accurate SR results. However, SRGAN was developed for natural images, and studies on medical images remain limited.
In this study, we developed a lesion focused SR (LFSR) method that leverages the merits of GAN-based models to generate perceptually more realistic SR results while avoiding the introduction of non-existing features into the lesion area after SR. Through simulation-based studies on the Multimodal Brain Tumor Segmentation Challenge (BraTS) datasets, we demonstrate the efficacy of our SR method for spatial resolution enhancement of brain tumor MRI images, potentially preserving crucial diagnostic information for further clinical tasks.
2.1 Lesion Focused SR
Our LFSR includes a lesion detection neural network, a super-resolution image generator, an HR/SR image discriminator, and a pre-trained 19-layer VGG [vgg19] (Fig. 1). The lesion detection network aims to detect the region of interest (ROI, e.g., brain tumors) in the whole-size LR and HR images before we apply the GAN.
We propose a max-pooling residual block and an input-scale-free residual neural network for the lesion detection. In contrast to the widely used residual blocks with skip connections [resnet], a max-pooling layer is added after every two residual blocks, which together contain two skip connections across four convolution and batch-normalization layers. This helps accelerate training and reduces the memory cost of the ROI detection task.
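As an illustration of this building block, the following numpy sketch composes two identity-skip residual blocks with a 2x2 max-pooling layer. The convolution and batch-normalization layers are replaced by a placeholder function `toy_f` (our assumption for illustration), so the sketch shows only the data flow, not the learned weights.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, f):
    # Identity skip connection: output = activation(input + transformed input).
    # `f` stands in for the conv + batch-norm layers of a real block.
    return relu(x + f(x))

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling over a (H, W) feature map.
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Two residual blocks followed by one max-pooling layer, mirroring the
# block structure described above (with a toy map instead of convolutions).
x = np.arange(16.0).reshape(4, 4)
toy_f = lambda t: 0.1 * t          # placeholder for conv + batch norm
y = max_pool_2x2(residual_block(residual_block(x, toy_f), toy_f))
```

The pooling after every pair of blocks halves the spatial size, which is where the memory savings during ROI detection come from.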
During training, the generator and the discriminator of the GAN play a min-max game: the generator aims to estimate SR images from the LR inputs that are as realistic as possible, while the discriminator aims to distinguish them from the ground-truth HR images. With the lesion detection in place, training solves the resulting adversarial objective over the detected ROIs.
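Written out, this is the standard GAN min-max objective used by SRGAN [ledig2017photo]; the notation here ($\theta_G$, $\theta_D$ for the trainable parameters of the generator $G$ and discriminator $D$, $I^{LR}$ and $I^{HR}$ for the image pairs) follows that paper:

```latex
\min_{\theta_G} \max_{\theta_D} \;
\mathbb{E}_{I^{HR}}\!\left[\log D_{\theta_D}\!\left(I^{HR}\right)\right]
+
\mathbb{E}_{I^{LR}}\!\left[\log\!\left(1 - D_{\theta_D}\!\left(G_{\theta_G}\!\left(I^{LR}\right)\right)\right)\right]
```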
The generator and discriminator are parameterized by their trainable weights, each optimized with its own loss function. In our proposed LFSR, we use an SR residual network (SRResNet) as the generator, which includes 16 residual blocks followed by sub-pixel convolution layers. The discriminator is trained simultaneously with the generator, and the pre-trained VGG provides feature-space supervision to encourage perceptually realistic image features.
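The sub-pixel convolution layers perform upscaling by rearranging feature-map channels into spatial positions (depth-to-space). A minimal numpy sketch of that rearrangement for an X2 upscale, with the learned convolution itself omitted:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: (C*r^2, H, W) -> (C, H*r, W*r).

    Only the depth-to-space step of a sub-pixel convolution layer is
    shown; the convolution producing the C*r^2 channels is omitted.
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into an r x r offset grid
    x = x.transpose(0, 3, 1, 4, 2)    # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# A 4-channel 2x2 feature map becomes a 1-channel 4x4 image for X2 upscaling.
x = np.arange(16).reshape(4, 2, 2)
y = pixel_shuffle(x, 2)
```

Each group of $r^2$ channels contributes one pixel to every $r \times r$ output neighborhood, so the upscaling is learned in feature space rather than by interpolation.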
2.2 Data Preprocessing and Training Settings
We have tested bilinear interpolation, SRResNet, SRGAN [ledig2017photo], and LFSR on the post-contrast T1-weighted (T1Gd) MRI scans from the BraTS 2018 datasets [menze2015multimodal], which were randomly divided into training and validation datasets. All slices were normalized to zero mean and unit variance. We simulated the LR images by downsampling the HR ground truth, and additionally tested with additive white Gaussian noise (AWGN) applied in k-space [Bao2003].
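A toy sketch of this LR simulation, assuming plain subsampling for the downsampling step and complex-valued AWGN added to the 2-D Fourier transform (k-space) of each slice:

```python
import numpy as np

def simulate_lr(hr, scale=2, noise_sigma=0.0, rng=None):
    """Toy LR simulation: downsample an HR slice, add complex AWGN in
    k-space, and return the magnitude image. Plain subsampling stands in
    for the paper's downsampling operator (an assumption for illustration).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    lr = hr[::scale, ::scale]                      # naive downsampling
    k = np.fft.fft2(lr)                            # image -> k-space
    k = k + noise_sigma * (rng.standard_normal(k.shape)
                           + 1j * rng.standard_normal(k.shape))
    return np.abs(np.fft.ifft2(k))                 # k-space -> magnitude image

hr = np.ones((8, 8))
lr_clean = simulate_lr(hr, scale=2, noise_sigma=0.0)
lr_noisy = simulate_lr(hr, scale=2, noise_sigma=0.5)
```

Adding the noise in k-space rather than in the image domain mimics how acquisition noise enters MRI measurements.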
All the experiments were performed on a Linux workstation with NVIDIA TITAN Xp GPUs. All the models were implemented in Python, based on the TensorLayer [tensorlayer2017] library, and were trained with the Adam optimizer. The lesion detection network was trained independently for 100 epochs. The SRResNet was trained for 350 epochs with the pixel-wise mean squared error (MSE) loss. The generator in SRGAN and LFSR was initially trained with the MSE loss for 50 epochs; the full GAN was then trained adversarially, during which we monitored the percentage of SR images the discriminator failed to distinguish from HR, together with the percentages of SR and HR images it classified correctly.
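For reference, the generator loss in the SRGAN formulation [ledig2017photo] combines a content term (pixel-wise MSE or VGG feature-space MSE) with an adversarial term weighted by $10^{-3}$; the symbols below follow that paper rather than this abstract:

```latex
\ell^{SR} \;=\; \ell^{SR}_{content} \;+\; 10^{-3}\,\ell^{SR}_{adv},
\qquad
\ell^{SR}_{adv} \;=\; \sum_{n=1}^{N} -\log D\!\left(G\!\left(I^{LR}_{n}\right)\right)
```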
3 Results and Discussions
3.1 Lesion Detection
Our lesion detection network has achieved high accuracy on both X2 and X4 downsampled images. In evaluation, we defined a detection as perfect if the tumor was fully covered by the predicted ROI, and as acceptable if it was only partially covered. In both the X2 and X4 cases, all images yielded either perfect or acceptable detections.
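The perfect/acceptable grading above can be sketched as a coverage check between the predicted ROI box and a binary tumor mask; the exact criterion here (full vs. partial overlap) is our illustrative reading, not necessarily the paper's threshold:

```python
import numpy as np

def detection_grade(tumor_mask, roi_box):
    """Grade a predicted ROI box against a binary tumor mask:
    'perfect' if the box fully covers the tumor, 'acceptable' if it
    covers part of it, 'miss' otherwise. Illustrative criterion only.
    """
    y0, y1, x0, x1 = roi_box
    inside = tumor_mask[y0:y1, x0:x1].sum()   # tumor pixels inside the box
    total = tumor_mask.sum()                  # tumor pixels overall
    if total == 0 or inside == total:
        return "perfect"
    return "acceptable" if inside > 0 else "miss"

mask = np.zeros((8, 8), dtype=int)
mask[2:5, 2:5] = 1                            # a 3x3 synthetic "tumor"
```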
3.2 X2 and X4 SR
Fig. 2 shows the X2 and X4 SR results. Both bilinear interpolation and SRResNet produced blurry SR outputs, although SRResNet achieved the highest PSNR. SRGAN and our proposed LFSR produced images with texture features closer to those of the ground truth. Compared to SRGAN, our LFSR obtained higher (X2 cases) or equivalent (X4 cases) PSNR. More importantly, LFSR significantly reduced the GPU memory cost, allowing the batch size to be doubled and accelerating training to 266.8 s/epoch for X2 and 194.8 s/epoch for X4 (versus 649.8 s/epoch and 370.8 s/epoch for SRGAN, respectively).
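PSNR, the fidelity metric compared throughout, is a simple function of the mean squared error; a minimal implementation for images scaled to a known data range:

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction, for images with values in [0, data_range]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf                      # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((4, 4))
approx = np.full((4, 4), 0.1)              # uniform reconstruction error of 0.1
```

Because PSNR depends only on pixel-wise MSE, an MSE-trained network can score highest on it while still looking blurry, which is exactly the trade-off the GAN-based results illustrate.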
3.3 X2 SR with Additive Noise
We have also tested LFSR on X2 SR with additive Gaussian noise. Bilinear interpolation with non-local means denoising [nonlocaldenoising] (B+NLD) was included to suppress the noise and provide a fairer comparison. All three deep learning methods achieved higher PSNR and SSIM than B+NLD when noise was present (Table 1). The MSE-trained SRResNet still achieved the highest PSNR and SSIM, while both SRGAN and LFSR were still able to generate perceptually more realistic textures in our qualitative studies. Furthermore, LFSR achieved higher PSNR and SSIM than SRGAN in the noisy cases (Table 1), as well as more efficient training.
In summary, we have developed and validated a lesion focused SR (i.e., LFSR) method to super-resolve tumor ROIs in MRI images. Compared to state-of-the-art SR methods, our proposed LFSR is more efficient and produces perceptually more realistic SR results, preserving crucial image features for further clinical tasks and decisions. In the final camera-ready version, we will include a more detailed description of our method and additional comparison results.