Magnetic resonance imaging (MRI) is widely employed in a variety of medical applications to aid precise diagnosis. However, doctors are frequently confronted with a trade-off between resolution and acquisition time in clinical environments, owing to hardware limits and the need for patient cooperation. Acquiring high-resolution (HR) magnetic resonance (MR) images requires more scan time, but patients cannot remain still for long periods, which makes motion artifacts likely. Shortening the acquisition is more tolerable for patients, but images with sufficient resolution can then hardly be obtained. Deep learning-based super-resolution (SR) and motion artifact reduction (MAR) can recover HR detail from low-resolution (LR) images and reduce motion artifacts, and are therefore a promising approach: they reconstruct high-quality images without extra scan time or any modification of the MR pulse sequence.
Super-resolution image reconstruction, one of the major fields of computer vision, has been deeply influenced by deep learning. In this data-driven approach, a large number of image pairs, each consisting of a low-resolution image and the corresponding high-resolution image, are collected as training data, and a deep neural network is trained on these pairs to extract pixel-wise features and generate the upsampled super-resolution image. This end-to-end learning-based method was first applied using a 2D CNN by Dong et al. Although a larger neural network results in enhanced overall performance, training such a deep CNN has been shown to be challenging. Recently, Zhang et al. proposed a network with channel attention and a residual-in-residual structure that further improves super-resolution performance.
Single image super resolution (SISR) has been adopted in medical imaging following its rapid advancement on natural images. The majority of previous studies apply 2D network structures to medical images slice by slice. However, medical images such as computed tomography (CT) and MRI normally carry information about 3D anatomical structures, and processing each slice independently may lead to mismatches between adjacent slices in the reconstructed images. A 3D model is therefore desirable, since it is capable of extracting 3D structural information. According to recent research, 3D CNNs outperform 2D networks by a wide margin in MRI SR. However, because of the extra dimension, 3D networks demand a large amount of GPU resources and a fairly long inference time, which prevents them from being deployed easily in real clinics. Although Chen et al. have demonstrated fast 3D models, the gap between the GPU consumption of 2D and 3D networks is still large.
Regarding the downsampling methods for synthetic LR image generation, there are two commonly used approaches. The first is an image-based method using Gaussian blurring, which is transferred from generic computer vision tasks without considering how MR images are acquired. This gap leads to inconsistencies between the training and inference phases of deep learning-based restoration and thus degrades performance in real clinics. The second is k-space truncation, which has recently been recognized as a way of mimicking real LR image acquisition in MRI. Besides, multiple downsampling strategies have been employed in previous research. In the studies of 2D networks, (frequency-encoding (FE)phase-encoding (PE)slice-encoding (SL)) and , as and in-plane downsampling, were commonly used . And for 3D networks, HR images were downsampled with scale factors of , and . These downsampling strategies lead to various acceleration factors in acquisition time and different difficulties for SR reconstruction. For downsampling strategies with the same acceleration factor, selecting a strategy with lower difficulty can therefore result in better SR reconstruction and more accurate diagnosis. However, the downsampling strategies were not analyzed in previous studies.
Furthermore, patient movement is inevitable during MRI measurements, resulting in motion artifacts that degrade the image quality. Deep learning algorithms are also capable of reducing motion artifacts and restoring high-quality images. Most previous studies used 2D UNet-based models to reduce motion artifacts; 3D models may improve motion artifact reduction but face the same GPU consumption problem. Besides, previous research normally used random movement to retrospectively generate motion artifacts, so the severity of the motion artifact was not quantifiable.
Finally, doctors are usually concerned about the accuracy of restored high-quality images. Tanno et al. and Qin et al. employed a method to generate aleatoric uncertainty as auxiliary information that helps doctors understand the uncertainty of a restored MR image. However, such a method cannot tell whether the uncertainty is caused by noise in the training data or by errors of the deep neural network on out-of-distribution (OOD) data, i.e., when the distribution of the training data is not identical to that of the test data, which is common in real clinical environments. The latter is the uncertainty that actually reflects the quality of the restored MR image. Moreover, during the training and evaluation phases of the deep neural network, the ground-truth (GT) images are available, and the algorithms are evaluated by calculating the difference or similarity between the output images and the GT using specific metrics. When deep learning techniques are applied in clinical settings, however, the GT is no longer available. This necessitates quantitative methods that estimate the accuracy of a restored MR image at both the pixel level and the overall level without the GT, e.g., the quality of each pixel and the SSIM / PSNR of the entire restored image, to help medical doctors make a trustworthy diagnosis.
Therefore, with consideration of clinical application of MRI, we propose the following methods as contributions in this paper:
1). Super resolution reconstruction and motion artifact reduction for MR images are comparable tasks, which restore the images with corrupted k-spaces. Therefore, we propose a universal convolutional neural network modified from the 2D residual channel attention network with multiple slices input, for multiple tasks in high quality MR image restoration including 3D MRI super resolution and motion artifact reduction, with state-of-the-art performance and minimized demands on computation resource and inference time.
2). K-space truncation is recognized as the most realistic approach for generating low-resolution MR images, and we use it in place of the conventional downsampling methods from computer vision. As a result, the trained model can be applied in real clinical practice to in-vivo low-resolution images. Besides, we considered several downsampling strategies, including in-plane down-sampling, through-plane down-sampling and their combination with different scale factors. These strategies were grouped by the ratio of acquisition time reduction, and the super-resolution reconstruction performance was evaluated and compared within each group. The results illustrate the difficulty of super-resolution reconstruction under different downsampling strategies and provide a guideline for selecting a proper low-resolution image acquisition strategy.
3). Furthermore, we propose a reproducible and quantifiable retrospective motion artifact generation method. With our method, motion artifact was generated by mimicking the real motion artifact with a predefined motion pattern and controllable severity.
4). Finally, we designed a method to estimate the pixel-wise uncertainty of the generated super-resolution MR image. With our method, the estimated pixel-wise uncertainty is obtained simultaneously with the restored images from the same deep neural network, guided by an additional loss component. The correlation between the uncertainty and SSIM / PSNR was further investigated and quantified in our study, in order to give doctors more trustworthy advice for diagnosis even though the GT is not available.
2 Methods and Experiments
As a universal network for MR image restoration, our model is designed to learn an accurate inverse mapping from the input low-quality images (LR images or MA-corrupted images) to the output high-quality reference images (HR images or original images without MA), together with the uncertainty maps, in an end-to-end training procedure. The high-quality images were used exclusively as the system's target during the training phase. During training, a loss function computed from the restored images, the reference images and the uncertainty maps is back-propagated through the layers to update the weights of the network. In the test phase, the model is fed only with low-quality images, restores them to high-quality images and provides the uncertainty maps. In the following sections, we describe our proposed network and training procedure in further detail.
In addition, the image resolution is determined by the matrix size of the k-space for a pre-defined field of view (FOV). To accelerate image acquisition, down-sampling of the k-space, which leads to lower spatial resolution, is commonly used, since for 3D imaging the acquisition time is proportional to the number of phase-encoding steps in the phase-encoding (PE) and slice-encoding (SL) directions. Meanwhile, the resolution in the frequency-encoding (FE) direction is normally bound to that in the PE direction, which means the resolution changes simultaneously in the FE and PE directions, while the resolution in the SL direction is independent and more flexible. Therefore, down-sampling in the SL direction is another option besides the in-plane down-sampling studied in previous publications on 2D SISR. In our study, we investigated multiple down-sampling strategies, including in-plane down-sampling, through-plane down-sampling and their combination, classified by acceleration factor. More details are presented in the following sections.
Furthermore, we employed the image-based motion artifact generation method with reproducible and quantifiable severity of motion artifact. We designed a pattern of patient movement including in-plane and through-plane movement. The severity of motion artifact can be controlled by adjusting the severity, duration and frequency of movement.
Finally, we used evidential regression learning to predict the uncertainty of the restored images. The uncertainty map was generated simultaneously with the restored images by our model and was guided by the normal-inverse-gamma loss. The generated uncertainty map can be used to predict the accuracy of the restored images when no ground truth is available in real clinical settings.
2.1 Super-resolution and Motion Artifact Reduction Background
The high-quality image restoration task is to design a restoration system function, h, which takes a low-quality (LQ) image (e.g., a low-resolution image, or an image with blur or motion artifact) as input and outputs the corresponding restored high-quality (HQ) image (e.g., an SR image or an image without blur or motion artifact), as shown in the following equation.
For the super-resolution task, the LQ image is generated from the HR image through a resolution degradation system function, f, which usually represents a particular down-sampling or performance degradation operation, as below:
As can be seen, in order to restore an SR image as close to the HR image as possible, the task of designing h is equivalent to finding the inverse of the function f:
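Under the definitions in this subsection, the three relations can be sketched as follows. The image symbols $I_{LQ}$, $I_{HR}$ and $\hat{I}_{HQ}$ (the LQ input, the HR reference and the restored output) are names assumed here for illustration; only h and f come from the text.

```latex
\hat{I}_{HQ} = h\left(I_{LQ}\right), \qquad
I_{LQ} = f\left(I_{HR}\right), \qquad
h \approx f^{-1} \;\; \text{so that} \;\; \hat{I}_{HQ} = h\left(f\left(I_{HR}\right)\right) \approx I_{HR}
```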
Prior studies indicate that this task is an ill-posed inverse problem: an analytical solution for the inverse of f does not exist. Research in computer vision has shown that CNN-based models have great potential to fit the inverse of f in the generic computer vision domain. CNN-based models are trained to simultaneously extract low-level features, build up a representative feature space and accurately reconstruct the image from the learned features. Consequently, CNN-based approaches achieve state-of-the-art performance in high-quality MR image restoration, including MRI super-resolution and motion-artifact reduction, in recent studies.
In previous studies, several 2D CNN-based networks were implemented to process images with a single slice in each patch. However, most medical images carry anatomical information in a 3D volume. Therefore, 3D CNN-based networks were proposed for the restoration of 3D medical images. Previous studies have shown that 3D models outperform 2D models by a wide margin in restoring 3D MR images, as cross-slice information can be of tremendous assistance. However, the additional dimension of a 3D network puts a significant strain on computation resources. The number of parameters grows to many times that of a 2D network; the size of the image patch increases, since a 3D patch consists of several slices, resulting in higher VRAM consumption; and the number of operations grows sharply due to the extra strides along the third dimension. For these reasons, we propose a method that uses a 2D CNN-based network to process thin-slab 3D image volumes.
2.2 Proposed High Quality MR image Restoration Network
Our network is based on the 2D Residual Channel Attention Network (RCAN), which was proposed in the generic computer vision field to generate an SR image from an LR image. The basic pipeline of our model for high-quality MRI reconstruction is shown in Fig.1. In the main part of our model, a dedicated module called the channel attention (CA) layer is embedded into every residual block to form a residual channel attention block (RCAB). Several RCABs and a long skip connection build a residual group (RG), and the same pattern is extended to form the residual in residual (RIR) module, which includes several RGs and a long skip connection. The attention weights for the different channels, which carry statistics of the semantic information in the MR image feature maps, are learned and used to guide the network to focus on restoring the more semantically important information. Meanwhile, the skip connections in the RCAB, RG and RIR make the network easier to train and, in particular, push it to focus on learning high-frequency information by bypassing the low-frequency information directly. All the weights of the network are adjusted by back-propagation, driven by minimization of the primary and refinement loss functions, which represent differences between the predicted HQ MR images and the ground-truth MR images. The up-sampling module up-scales the learned feature maps so that the SR images have the same size as the HR images when the LR images in the super-resolution task are downscaled.
For the MRI SR task, the HR images had a larger matrix size than the generated LR images along the down-sampled directions. We constructed the RCAN model with 5 RGs, 5 RCABs per RG, and a 2x up-sampling module when in-plane upscaling of the MR image is needed. The model processes cropped 2D LR images with only one channel, or 3D LR images with the third dimension placed on the channels, and generates the corresponding 2D/3D SR patches, which are then reassembled into a completely reconstructed SR image. For the MRI motion-artifact reduction task, the MA-corrupted MR images have the same size as the MA-reduced (MAR) images, so the up-sampling module was removed to keep the MA and MAR images the same size while applying a larger receptive field for learning multi-scale features.
In our experiments, we used the 2D RCAN-based network to process both 2D (single-slice mode) and thin-slab 3D (thin-slab mode) MR image patches. A 2D network is intrinsically capable of processing a 3D dataset through its channel dimension. Therefore, the network was applied in multi-channel input and multi-channel output mode, and we placed the third dimension of the 3D image patch on the channel dimension. In the first convolutional layer, the number of input channels equals the number of slices in the input patch, so each 2D convolution kernel spans all input slices. This is equivalent to a convolution with a 3D kernel whose depth equals the number of input slices, applied without padding along the slice direction. In this step, the features of the multiple input slices are extracted and compressed into a single-channel feature map, once per output channel and each time with a different filter. In the hidden layers, these extracted features are then used to reconstruct the high-quality images. At the last convolutional layer of the network, the number of output channels equals the expected number of slices of the target image, and the high-quality images are restored. Our experiments reveal that with this method, 2D networks are able to learn through-slice features, so that they can reconstruct high-quality images from through-plane degraded images and achieve better performance than 3D networks while placing very low demands on computation resources.
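The equivalence between a multi-channel 2D convolution and a depth-spanning 3D convolution can be checked numerically. The following is a minimal NumPy sketch, not the network code: the slab size (3 slices of 8x8), the kernel size (3x3) and the number of output channels (4) are arbitrary values chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np

def conv2d_multichannel(slab, kernels):
    """Valid 2D cross-correlation. slab: (C_in, H, W); kernels: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = kernels.shape
    _, h, w = slab.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = slab[:, i:i + k, j:j + k]              # all slices at once
            out[:, i, j] = np.tensordot(kernels, window, axes=3)
    return out

def conv3d_valid(volume, kernel):
    """Valid 3D cross-correlation of volume (D, H, W) with one kernel (kd, kh, kw)."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(d - kd + 1):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                out[z, i, j] = np.sum(volume[z:z + kd, i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
slab = rng.standard_normal((3, 8, 8))        # 3 slices placed on the channel axis
kernels = rng.standard_normal((4, 3, 3, 3))  # 4 output channels (illustrative)

out2d = conv2d_multichannel(slab, kernels)
# One 3D kernel per output channel; its depth equals the slab thickness, so the
# unpadded slice axis collapses to a single position.
out3d = np.stack([conv3d_valid(slab, kernels[c])[0] for c in range(4)])
```

Each output channel of the 2D layer matches the single-slice result of the corresponding 3D kernel, which is why the slab can be handled without any 3D layers.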
Besides, ensemble methods, which include network-based and data-based ones, are often used to boost the performance of neural networks. A network-based ensemble employs several networks trained in a de-correlated manner or performs the inference several times with different dropout, whereas a data-based ensemble uses data augmentation: the original images are rotated, flipped or re-scaled before being fed to the network and transformed back to the original position and scale after inference. However, all these conventional ensemble methods cost more training time or operational complexity. We designed a simple but useful strategy to exploit ensembling. In the thin-slab mode of our network, each patch consists of several slices, so every slice appears at different positions in different patches and is processed by the network differently, resulting in several outputs for the same image slice. Our experiments showed that the average of all the outputs for the same slice achieved higher values in all metrics than any single output image of this slice. Our method does not need any additional operation or training time.
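The slab-based averaging can be sketched as follows. This is an illustrative reading of the paragraph, assuming slabs that slide one slice at a time and a network whose output has as many slices as its input; `restore` is a hypothetical stand-in for the trained model, and the toy position-dependent bias only exists to make the averaging visible.

```python
import numpy as np

def slab_self_ensemble(volume, restore, slab_size=3):
    """Average every slice over all overlapping slabs that contain it.

    volume: array of shape (slices, H, W).
    restore: callable mapping a slab (slab_size, H, W) to an array of the same shape.
    """
    n = volume.shape[0]
    acc = np.zeros(volume.shape, dtype=float)
    count = np.zeros(n)
    for start in range(n - slab_size + 1):
        out = restore(volume[start:start + slab_size])
        acc[start:start + slab_size] += out
        count[start:start + slab_size] += 1
    return acc / count[:, None, None]

def toy_restore(slab):
    # Stand-in model: identity plus a bias that depends on the slice position
    # inside the slab (-1, 0, +1), mimicking position-dependent outputs.
    bias = np.array([-1.0, 0.0, 1.0])[:, None, None]
    return slab + bias

volume = np.arange(5 * 2 * 2, dtype=float).reshape(5, 2, 2)
ensembled = slab_self_ensemble(volume, toy_restore, slab_size=3)
```

In this toy case the middle slice is seen at all three slab positions, so the position-dependent biases cancel in the average, while the first slice is seen only once and keeps its bias.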
2.3 Down-sampling Strategies for Super-resolution
Different from super-resolution tasks for visual images, where the low-resolution images are generated using bicubic or Gaussian-blurring down-sampling, k-space truncation was performed in this study. In MR image acquisition, the signal is collected in the frequency domain to form the k-space, which is then transformed to the image domain using the inverse Fast Fourier Transform (iFFT) to reconstruct the image. Following the k-space truncation approach, the images were transformed to k-space with the Fast Fourier Transform (FFT); the central region of k-space, whose size depends on the down-sampling factor, was retained and the surrounding region was truncated. The truncated k-space was then transformed back to the image domain with the iFFT. For both the high-resolution and the generated low-resolution images, the voxel intensities were scaled to the range between 0 and 1. An example of the generation of 3D LR MR images in three directions is shown in Fig.2.
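The truncation procedure described in this paragraph can be sketched with NumPy's FFT routines. This is a minimal illustration rather than the authors' pipeline: it assumes even matrix sizes and integer scale factors, the function names are hypothetical, and the axis order (FE, PE, SL) is an assumption.

```python
import numpy as np

def kspace_truncate(image, factors):
    """Keep the central k-space block of size shape[i] // factors[i] along each axis."""
    k = np.fft.fftshift(np.fft.fftn(image))      # move the k-space center to the middle
    crop = []
    for n, f in zip(image.shape, factors):
        keep = n // f
        start = (n - keep) // 2
        crop.append(slice(start, start + keep))
    k_low = k[tuple(crop)]                        # discard the outer k-space
    return np.abs(np.fft.ifftn(np.fft.ifftshift(k_low)))

def rescale01(x):
    """Scale intensities linearly to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(0)
image = rng.random((8, 8, 4))                     # toy volume, axes assumed (FE, PE, SL)
low = rescale01(kspace_truncate(image, (2, 2, 1)))
```

Cropping k-space directly shrinks the matrix size; zero-filling the truncated region instead would keep the matrix size while removing the same high frequencies.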
The matrix size and resolution of the image depend on the numbers of sampling points in the frequency-encoding (FE), phase-encoding (PE) and slice-encoding (SL) directions for 3D imaging. When the FOV is fixed and the number of sampling points is reduced, the matrix size and resolution decrease in one or multiple directions, depending on the down-sampling factor in each direction. K-space truncation is therefore a down-sampling method that mimics the real acquisition of low-resolution images. Furthermore, the acquisition time depends strongly on the number of phase-encoding steps in the PE and SL directions for 3D imaging, so only down-sampling in the PE and SL directions shortens the acquisition time. However, in regular MRI measurements, reducing the resolution in the PE direction also reduces the resolution in the FE direction by the same scale factor to maintain an isotropic in-plane resolution. The k-space is therefore simultaneously down-sampled in the FE direction, resulting in an unnecessary loss of k-space regions and greater difficulty in restoring the HR image. Meanwhile, the slice thickness is independent of the in-plane resolution and can therefore be down-sampled with more flexibility. More specifically, when we expect a 2× acceleration, we have two down-sampling options, which are 2×2×1 (FE×PE×SL) and 1×1×2. With the 2×2×1 down-sampling, 75% of the k-space is lost, while with 1×1×2 only 50% of the k-space is lost.
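The arithmetic behind this comparison is short enough to write out. The sketch below assumes, as the paragraph states, that scan time scales only with the PE and SL steps while k-space is truncated along all three axes; the function name is hypothetical.

```python
def acceleration_and_loss(fe, pe, sl):
    """Return (acceleration factor, fraction of k-space discarded) for scale factors per axis."""
    acceleration = pe * sl                 # FE sampling does not cost scan time
    lost = 1.0 - 1.0 / (fe * pe * sl)      # retained fraction is 1 / (fe * pe * sl)
    return acceleration, lost

in_plane = acceleration_and_loss(2, 2, 1)
through_plane = acceleration_and_loss(1, 1, 2)
```

Both calls give the same acceleration, but the in-plane option discards three quarters of k-space and the through-plane option only half.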
Nonetheless, the difficulty of restoring the HR image does not depend only on the ratio of truncated k-space. Since the low-frequency components of the acquired signal concentrate at the k-space center, the closer the dropped regions are to the k-space center, the more difficult it is to restore the HR image. Therefore, we investigated multiple down-sampling strategies and their super-resolution reconstruction performance in this study. Based on the regular MRI measurement process, we tested the super-resolution reconstruction with the down-sampling factors of and for acceleration; for acceleration; , , and for acceleration.
After down-sampling, the high-resolution and low-resolution images were cropped into smaller patches to save computation resources. For 2D models, each slice of the HR images was cropped into patches with 32 voxels overlapped between neighbouring patches to avoid artifacts on the edges of the patches; the low-resolution images were cropped into patches with 16 voxels overlapped for a scale factor of 2 and patches with 8 voxels overlapped for a scale factor of 4. In the through-plane direction, each LR patch contains 1/3/5 neighbouring slices from the LR image, and the number of slices in the HR patches is 1/3/5 times the scale factor in the SL direction. For 3D models, zero-filling of the truncated k-space was used to maintain the matrix size, since most 3D models do not have an upsampling module. Both HR and LR images were cropped into patches with 32 voxels overlapped between neighbouring patches.
2.4 Motion Pattern and Motion Artifact Quantification
For the retrospective generation of motion artifacts in MR images, lines from multiple k-spaces were spliced together to mimic the generation of real motion artifacts. As shown in Fig.3A, a group of images was generated from each slice of the original image by rotating it in certain directions and by certain angles. The original image and the manipulated images were then transformed to k-space using FFT, and k-space segments of the original image were replaced with segments from the k-spaces of the manipulated images following a predefined pattern. The images for the motion-artifact correction task were not cropped, so the size of the input MA images and GT images was 320x256 in the axial direction.
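The k-space splicing step can be sketched as follows. This is a simplified illustration under stated assumptions: rows of the 2D k-space are treated as phase-encoding lines, only one manipulated image is used, and a circular shift (`np.roll`) stands in for the rotation used in the paper so that the example needs nothing beyond NumPy. The function name and the choice of corrupted lines are hypothetical.

```python
import numpy as np

def splice_kspace(original, moved, corrupted_lines):
    """Replace selected k-space lines of `original` with those of `moved`."""
    k_orig = np.fft.fft2(original)
    k_moved = np.fft.fft2(moved)
    k_mix = k_orig.copy()
    # Rows are assumed to be phase-encoding lines acquired while the head was displaced.
    k_mix[corrupted_lines, :] = k_moved[corrupted_lines, :]
    return np.abs(np.fft.ifft2(k_mix))

rng = np.random.default_rng(0)
original = rng.random((8, 8))
moved = np.roll(original, 2, axis=1)      # stand-in for a rotated copy of the slice
lines = np.arange(0, 8, 2)                # every second line corrupted (illustrative)
corrupted = splice_kspace(original, moved, lines)
```

Because the set of replaced lines is explicit, the fraction of corrupted k-space, and with it the severity of the artifact, is known exactly for every generated image.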
Regarding the motion pattern, we employed simplified and commonly recognized patterns of motion in brain imaging, namely head rotation as in-plane movement and nodding as through-plane movement, and the severity was managed by the frequency of motion. The scheme of the motion pattern is shown in Fig.3B. For severity control, we used the echo group (EG) as the unit of time: the minimal period in which a certain number of successive echoes are acquired (it can also be considered as the TR for sequences from the turbo-spin-echo family). The duration of any action must be an integer multiple of the EG. As to the type of movement, we employed head rotation by 5 degrees to the left and to the right, with and without head nodding by 5 degrees. The whole process of patient movement was therefore built as below:
1). at , the patient stayed in the original position and stayed for ;
2). from to , the patient’s head rotated to the left by 5 degrees;
3). from to , the patient’s head stayed at the position of 5 degrees to the left;
4). from to , the patient’s head rotated back to the starting position;
5). from to , the patient’s head stayed in the starting position;
6). from to , the patient’s head rotated to the right and returned to the starting position following the same process as steps 2 to 4;
7). from to , the patient’s head stayed in the starting position.
The process from step 2 to step 7 was repeated until the whole k-space was acquired, and head nodding was performed together with head rotation. The severity of motion-artifact was controlled by and EG.
In our study, was set to and , leading to the ratio of corrupted k-space lines of and . consists of 80 echoes. A centric trajectory was selected to fill the k-space. The SSIMs and PSNRs of the images with different severity of motion-artifact follow a linear tendency as shown in Table 3 in the Results section.
In previous studies, the pixel-wise aleatoric uncertainty was generated by incorporating the Gaussian negative log-likelihood (NLL) loss into the neural network and was applied in MR super-resolution reconstruction. Such uncertainty only represents the uncertainty in the data, which cannot be prevented, and it is not the main issue when applying deep learning-based MRI restoration in clinical practice. In contrast, the inevitable OOD data issue (i.e., images acquired from different patients may have diverse distributions in a real clinical environment due to differences in image quality) can be represented by epistemic uncertainty. In our study, both the pixel-wise aleatoric uncertainty and the epistemic uncertainty are estimated using evidential regression. Evidential deep learning formulates learning as an evidence acquisition process. Every training example adds support to a learned higher-order, evidential distribution. Sampling from this distribution yields instances of lower-order likelihood functions from which the data was drawn. Instead of placing priors on network weights, as is done in Bayesian neural networks, evidential approaches place priors directly over the likelihood function. By training a neural network to output the hyperparameters of the higher-order evidential distribution, a grounded representation of both epistemic and aleatoric uncertainty can then be learned without the need for sampling. Amini et al. proposed a method to estimate the posterior distribution, whose approximation takes the form of the Gaussian conjugate prior, the Normal Inverse-Gamma (NIG) distribution. Afterwards, the prediction and uncertainties can be calculated as:
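In the deep evidential regression formulation of Amini et al., the NIG distribution is parameterized by $(\gamma, \nu, \alpha, \beta)$, and the prediction and the two uncertainties take the following form. The symbols follow that work's notation and are assumed here; the paper's own notation may differ.

```latex
\underbrace{\mathbb{E}[\mu] = \gamma}_{\text{prediction}}, \qquad
\underbrace{\mathbb{E}[\sigma^{2}] = \frac{\beta}{\alpha - 1}}_{\text{aleatoric uncertainty}}, \qquad
\underbrace{\mathrm{Var}[\mu] = \frac{\beta}{\nu\,(\alpha - 1)}}_{\text{epistemic uncertainty}}
```

These expressions require $\alpha > 1$; a small $\nu$ (little evidence) inflates the epistemic term while leaving the aleatoric term unchanged.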
Furthermore, we investigated the relationships between the epistemic uncertainty and the SSIM and PSNR of the reconstructed images. Linear regression and exponential regression were performed to estimate the correlation of the epistemic uncertainty with SSIM and PSNR for the test datasets, respectively. The obtained regression curves can then be used to estimate the SSIM and PSNR of the reconstructed images when the ground truth is not available.
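A regression of this kind can be sketched with `numpy.polyfit`. The example below is only illustrative: the data are synthetic values generated from known curves (they are not results from the study), the function names are hypothetical, and which metric is paired with which regression type is left open.

```python
import numpy as np

def fit_linear(u, y):
    """Least-squares fit of y = slope * u + intercept."""
    slope, intercept = np.polyfit(u, y, 1)
    return slope, intercept

def fit_exponential(u, y):
    """Fit y = a * exp(b * u) by linear regression on log(y); requires y > 0."""
    b, log_a = np.polyfit(u, np.log(y), 1)
    return np.exp(log_a), b

# Synthetic mean epistemic uncertainty per volume and synthetic metric values.
u = np.linspace(0.01, 0.10, 10)
y_lin = 0.98 - 0.5 * u
y_exp = 40.0 * np.exp(-3.0 * u)

slope, intercept = fit_linear(u, y_lin)
a, b = fit_exponential(u, y_exp)

# Estimating a metric for a new volume from its uncertainty alone, without ground truth.
estimated_metric = a * np.exp(b * 0.05)
```

Once the curve parameters are fixed on data with ground truth, only the uncertainty map of a new scan is needed to read off an estimated metric.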
2.6 Loss Functions
Prior studies applied different types of loss functions to train neural networks towards specific feature refinements. The pixel-wise L1 loss is the most commonly used loss function and is employed in nearly all studies to minimize the pixel-to-pixel difference between the reconstructed and the target images. The pixel-wise Charbonnier loss was used in this study as a differentiable L1 loss to avoid a strong smoothing effect:
where is assigned as .
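For reference, the usual form of the Charbonnier loss is the mean of sqrt((x - y)^2 + eps^2) over all pixels. The sketch below implements that standard form; the default `eps=1e-3` is a commonly used value and an assumption here, not the constant chosen in this study.

```python
import numpy as np

def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((pred - target)^2 + eps^2): a smooth, differentiable variant of L1."""
    return float(np.mean(np.sqrt((pred - target) ** 2 + eps ** 2)))

pred = np.array([[0.2, 0.4], [0.6, 0.8]])
target = np.array([[0.2, 0.5], [0.5, 0.8]])
loss = charbonnier_loss(pred, target)
```

For errors much larger than eps the loss behaves like the absolute error, while near zero it is smooth, which is what makes it differentiable everywhere.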
Furthermore, the SSIM loss is used in some of the most recent studies to drive the network to reconstruct high-quality images with better structural similarity to the HRGT image. It is normally applied as 1 minus the SSIM value measured between the MRI SR images and the HRGT images. In this study, we used the L1 loss of the square of the SSIM value.
The calculation of SSIM is introduced in a later section.
In our study, we utilized the weighted sum of Charbonnier loss and SSIM loss:
where in our study for the best performance.
Besides, the Normal-Inverse-Gamma (NIG) loss was employed when the uncertainty map was required:
. , , and are the outputs of the network. is the ground truth.
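As a reference point, the NIG loss in the deep evidential regression formulation of Amini et al. combines a negative log-likelihood term with an evidence regularizer. The sketch below follows that published formulation; the parameter names `gamma`, `nu`, `alpha`, `beta` and the regularization weight `coeff` are assumptions taken from that work, and whether this study uses exactly this form or weight is not confirmed by the text.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)   # element-wise log-gamma without extra dependencies

def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of y under the Student-t marginal of the NIG prior."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * np.log(np.pi / nu)
            - alpha * np.log(omega)
            + (alpha + 0.5) * np.log(nu * (y - gamma) ** 2 + omega)
            + _lgamma(alpha) - _lgamma(alpha + 0.5))

def nig_regularizer(y, gamma, nu, alpha):
    """Penalize strong evidence (large nu, alpha) on pixels with large error."""
    return np.abs(y - gamma) * (2.0 * nu + alpha)

def nig_loss(y, gamma, nu, alpha, beta, coeff=0.01):
    """Pixel-averaged NIG loss; coeff is a hypothetical regularization weight."""
    return float(np.mean(nig_nll(y, gamma, nu, alpha, beta)
                         + coeff * nig_regularizer(y, gamma, nu, alpha)))

# Toy pixel-wise network outputs and ground truth.
y = np.array([0.50, 0.20])
gamma = np.array([0.45, 0.30])
nu = np.array([1.0, 2.0])
alpha = np.array([1.5, 2.0])
beta = np.array([1.0, 0.5])
loss = nig_loss(y, gamma, nu, alpha, beta)
```

In practice the network outputs must be constrained (for example with a softplus) so that nu and beta stay positive and alpha stays above 1.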
Therefore, we utilized the weighted sum of Charbonnier loss, SSIM loss and NIG loss when evidential regression was employed:
where 0.5 and 1 were selected as the value of and in our study for the best performance.
In this study, we employed the T1w images of the Human Connectome Project (HCP) dataset, which consists of multi-contrast images from 1113 patients. The T1w images were acquired in the sagittal direction with 3D MPRAGE on a Siemens 3T PRISMA platform. The matrix size was 320x320x256, and the resolution was 0.7mm isotropic. We randomly selected 80/10/10 patients from the HCP patient cohort for the training / validation / test groups in our experiments. The training datasets were used for training the neural networks, the validation datasets were used for monitoring the performance of the neural networks during training, and the test datasets were used for evaluating the neural networks after training. The three groups have no shared datasets. Besides, for verifying the quantified correlation between uncertainty and SSIM / PSNR, we included another 40 datasets, which were isolated from the training / validation / test groups mentioned above.
To save GPU resources, each 3D image volume was cropped into small patches. For 2D networks, the patch size was for the LR images with 16 voxels overlapped between neighbouring patches, and for the HR images with 32 voxels overlapped between neighbouring patches. For thin-slab 3D networks, the patch sizes of the LR and HR images were the same as for 2D networks, and the number of slices per patch was and for the LR images with slices overlapped between neighbouring patches. For 3D networks, the LR images were interpolated to the same size as the HR images, since most 3D models do not have an upsampling module. The patch size was with 32 voxels overlapped between neighbouring patches for both the LR and the HR images.
2.8 Implementation Details
As to the implementation of RCAN, for an SR factor of 2, the number of RGs was set to 5, with 5 RCABs in each RG. The convolutional layers in the shallow feature extraction and the RIR structure have 64 filters, except for those in the channel-downscaling. For an SR factor of 4, we connected two stages of the RCAN network sequentially for progressive upsampling; each stage consists of 5 RGs with 5 RCABs and one up-sampling module at its end, which up-samples the LR images by an SR factor of 2.
The networks were trained on two workstations equipped with either two Quadro GV100 graphics cards (Nvidia, Santa Clara, Calif). We used PyTorch 1.7 as the back end for all deep learning experiments. In each training batch, eight LR patches were randomly extracted as inputs. We trained our model for 50 epochs using the Adam optimizer with, , and , and a cosine-decay learning rate was applied starting from and ending at .
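A cosine-decay schedule of the kind mentioned here can be written in a few lines. The sketch below is a generic per-epoch form; the starting and final learning rates (`1e-4` and `1e-7`) are placeholder values chosen only for illustration, since the values used in the study are not given in this text, and the function name is hypothetical.

```python
import math

def cosine_lr(epoch, total_epochs, lr_start, lr_end):
    """Learning rate at a given epoch, decaying from lr_start to lr_end along half a cosine."""
    t = epoch / max(total_epochs - 1, 1)          # progress in [0, 1]
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * t))

# Placeholder rates; 50 epochs as stated in the text.
schedule = [cosine_lr(e, 50, 1e-4, 1e-7) for e in range(50)]
```

The rate changes slowly at the beginning and end of training and fastest in the middle, which is the usual motivation for this schedule.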
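We employed the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) to evaluate the quality of the reconstructed images:

$$\mathrm{PSNR}(x, y) = 10 \log_{10} \left( \frac{\mathrm{MAX}^{2}}{\mathrm{MSE}(x, y)} \right)$$

$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^{2} + \mu_y^{2} + c_1)(\sigma_x^{2} + \sigma_y^{2} + c_2)}$$

where MAX is the maximum possible intensity and MSE is the mean squared error between x and y; $\mu_x$ and $\mu_y$ are the means of x and y, representing the luminance of the images; $\sigma_x^{2}$ and $\sigma_y^{2}$ are the variances of x and y, representing the contrast of the images; and $\sigma_{xy}$ is the covariance of x and y, measuring the tendency of x and y to vary together. $c_1$ and $c_2$ are small constants.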
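Both metrics can be computed directly from their standard definitions. The sketch below evaluates SSIM once over the whole image rather than over local windows, which is a simplification; the constants `k1=0.01` and `k2=0.03` are the commonly used defaults and are assumptions here, as is the intensity range of 1.0 (images scaled to [0, 1]).

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images with the given intensity range."""
    mse = np.mean((x - y) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed from global means, variances and covariance (no sliding window)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
restored = np.clip(reference + 0.05 * rng.standard_normal((16, 16)), 0.0, 1.0)
scores = (global_ssim(restored, reference), psnr(restored, reference))
```

Windowed implementations average the same expression over local neighbourhoods, so their values generally differ from this global version.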
3 Results and Discussion
3.1 Super-Resolution with Different Down-sampling Strategies
|Scale|Acceleration Factor|# Input / Output Slices|# Params (M)|# Ops (GFlop)|Inference Time (s)|SSIM / PSNR|
|---|---|---|---|---|---|---|
| | |1 / 1|2.23|9.09|8.10|0.9625 / 38.7468|
| | |3 / 3|2.23|9.12|21.74|0.9631 / 38.8662|
| | |3 / 3 +|2.23|9.12|21.74|0.9634 / 38.9355|
| | |5 / 5|2.24|9.14|33.90|0.9627 / 38.7745|
| | |5 / 5 +|2.24|9.14|33.90|0.9631 / 38.8605|
| | |1 / 2|2.08|33.95|10.58|0.9558 / 38.5996|
| | |3 / 6|2.09|33.99|18.48|0.9724 / 40.7656|
| | |3 / 6 +|2.09|33.99|18.48|0.9725 / 40.8979|
| | |5 / 10|2.09|34.05|32.12|0.9736 / 40.9537|
| | |5 / 10 +|2.09|34.05|32.12|0.9739 / 41.1143|
| | |1 / 1|2.88|7.36|11.15|0.9272 / 35.5164|
| | |3 / 3|2.88|7.38|19.48|0.9291 / 35.6303|
| | |3 / 3 +|2.88|7.38|19.48|0.9300 / 35.7243|
| | |5 / 5|2.88|7.40|19.44|0.9289 / 35.4683|
| | |5 / 5 +|2.88|7.40|19.44|0.9301 / 35.5939|
| | |1 / 4|2.09|33.96|6.75|0.9148 / 34.1939|
| | |3 / 12|2.09|34.05|13.89|0.9450 / 36.9546|
| | |3 / 12 +|2.09|34.05|13.89|0.9449 / 37.0753|
| | |5 / 20|2.10|34.15|19.00|0.9458 / 37.0159|
| | |5 / 20 +|2.10|34.15|19.00|0.9463 / 37.2224|
| | |1 / 2|2.23|9.10|5.70|0.9356 / 36.0732|
| | |3 / 6|2.24|9.15|12.50|0.9490 / 37.5758|
| | |3 / 6 +|2.24|9.15|12.50|0.9494 / 37.6909|
| | |5 / 10|2.24|9.19|19.53|0.9491 / 37.5689|
| | |5 / 10 +|2.24|9.19|19.53|0.9498 / 37.7166|
As explained in the method section, we employed several downsampling strategies, including in-plane downsampling, through-plane downsampling, and their combination with different scale factors. In this section, we first evaluate the performance of our network in single-slice mode and thin-slab mode for SR image reconstruction from LR images generated by the different downsampling strategies, and then compare the downsampling strategies in terms of the difficulty of SR image reconstruction.
The performance of SR reconstruction using our network in both single-slice mode and thin-slab mode is shown in Table 1. Thin-slab mode outperforms single-slice mode for all downsampling strategies. Particularly for the strategies involving through-plane downsampling, the SSIM and PSNR values of SR images reconstructed with 3-slice input increase significantly compared with single-slice input, and rise again with 5-slice input. For the strategies with only in-plane downsampling, the SSIM and PSNR values peak at 3-slice input and drop slightly with 5-slice input. In addition, self-ensemble improves the performance of thin-slab mode for all downsampling strategies. Furthermore, with our network, the number of parameters and the number of operations increase only marginally with the number of input slices within each downsampling strategy.
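The self-ensemble rows (marked with "+" in Table 1) can be understood through the following sketch. It assumes the common flip-based variant, in which the input is flipped, restored, flipped back, and the results are averaged; the exact set of transforms used in this work is not stated here, so this choice is an assumption. `toy_model` is a hypothetical stand-in for the network.

```python
def hflip(img):
    """Flip a 2D image (list of rows) left-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Flip a 2D image (list of rows) top-bottom."""
    return img[::-1]


def self_ensemble(model, img):
    """Average the model outputs over flipped copies of the input."""
    transforms = [
        (lambda a: a, lambda a: a),  # identity
        (hflip, hflip),              # each flip is its own inverse
        (vflip, vflip),
        (lambda a: vflip(hflip(a)), lambda a: hflip(vflip(a))),
    ]
    outputs = [inverse(model(forward(img))) for forward, inverse in transforms]
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [
        [sum(out[r][c] for out in outputs) / len(outputs) for c in range(cols)]
        for r in range(rows)
    ]


def toy_model(img):
    """Hypothetical stand-in for a network: mixes each pixel with its left neighbour."""
    return [
        [v + (0.5 * row[c - 1] if c > 0 else 0.0) for c, v in enumerate(row)]
        for row in img
    ]


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(self_ensemble(toy_model, image))
```

Averaging over the flips removes the directional bias of the toy model, which is the same mechanism by which self-ensemble typically yields a small gain at the cost of several forward passes.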
Furthermore, the SR reconstruction performance is also compared between different downsampling strategies with the same acceleration factor in Table 1. With an acceleration factor of 2, the SSIM and PSNR values of downsampling are over 0.01 and 2 dB higher than those of , respectively. With an acceleration factor of 4, the downsampling has the highest SSIM and PSNR values, which are around 0.003 and 0.5 dB higher than and 0.02 and 2 dB higher than . The results reveal that it is easier to reconstruct accurate SR images from downsampling for acceleration and for acceleration.
Besides the difference in the metrics, the visual effect is also compared in Fig.4. The left half of Fig.4 shows the difference between single-slice and thin-slab SR. In the sagittal view, errors in reconstructing small anatomical structures can be found in the single-slice SR image, while the thin-slab SR image restores small anatomical structures much more accurately. Furthermore, mismatch between neighbouring slices can be observed in the axial view of the single-slice SR image, but not in the thin-slab SR image. The right half of Fig.4 compares SR images from different downsampling strategies with an acceleration factor of 4, revealing the limitation of the and downsamplings for SR reconstruction. In the sagittal view, the reconstructed image of loses nearly all the small anatomical structures, some of which are restored with the , and most of the small anatomical structures are accurately restored with the . In the axial view, the images reconstructed from and are blurred in the vertical and horizontal directions, respectively, whereas the image reconstructed from is less blurry than the other two, with most of the small anatomical structures restored.
3.2 Super-Resolution Performance Compared with the State of the Art
|Model||Scale||# Ops (GFlops)||GPU Consumption (GB)||Inference Time (s)||SSIM / PSNR|
|2D VDSR||9.68||1.81||6.45||0.9582 / 37.2060|
|2D DRRN||19.35||2.84||13.76||0.9580 / 37.2497|
|DCSRN||57.50||10.96||19.67||0.9573 / 37.3491|
|mDCSRN||107.54||22.60||46.78||0.9618 / 38.2984|
|Ours (1 slice input)||9.09||2.29||8.10||0.9625 / 38.7468|
|Ours (3 slices input)||9.12||2.29||21.74||0.9631 / 38.8662|
|Ours+ (3 slices input)||9.12||2.29||21.74||0.9634 / 38.9355|
|2D VDSR||9.68||1.81||5.88||0.9157 / 33.9240|
|2D DRRN||19.35||2.84||13.90||0.9156 / 34.1150|
|DCSRN||57.50||10.96||19.91||0.9055 / 33.8439|
|mDCSRN||107.54||22.60||47.06||0.9208 / 34.5682|
|Ours (1 slice input)||7.356||2.14||11.15||0.9272 / 35.5164|
|Ours (3 slices input)||7.396||2.14||19.48||0.9291 / 35.6303|
|Ours+ (3 slices input)||7.396||2.14||19.48||0.9300 / 35.7243|
|3D FSRCNN||7.079||5.32||11.07||0.9405 / 35.8810|
|DCSRN||57.50||10.96||20.68||0.9398 / 35.8276|
|mDCSRN||107.54||22.60||46.84||0.9469 / 36.3776|
|Ours (5 slices input)||9.19||2.29||19.53||0.9491 / 37.5689|
|Ours+ (5 slices input)||9.19||2.29||19.53||0.9498 / 37.7166|
We implemented previously published state-of-the-art models, including VDSR and DRRN, as the 2D baselines to compare with our model for 2D SR. We extended FSRCNN to 3D, and implemented DCSRN and mDCSRN as the baselines for 3D SR. All models were trained with the same number of epochs and the same settings (i.e. learning rate decay and optimizer). With respect to the quantitative similarity metrics shown in Table 2, our model with thin-slab SR outperformed all previous SISR models in both 2D and 3D SR tasks, while taking less inference time and consuming less GPU resource than most of the previous models.
More specifically, for SR reconstruction from downsampling, our model in single-slice mode achieves 0.9625 in SSIM and 38.7468 dB in PSNR, which are over 0.004 and about 1.5 dB higher than the other 2D models, and 0.0007 and 0.45 dB higher than the 3D mDCSRN model. The performance of our model is further enhanced in thin-slab mode. For SR reconstruction from downsampling, the performance of our model in both modes exceeds that of the other models by at least 0.0064 in SSIM and about 0.95 dB in PSNR. For SR reconstruction from downsampling, which is only applicable to 3D models, our model in thin-slab mode outperforms mDCSRN by 0.0022 and 1.2 dB without ensemble, and by 0.0029 and 1.3 dB with ensemble, in SSIM and PSNR, respectively.
With regard to the number of operations, our model has the lowest value for and SR, and the second lowest value for SR. In addition, our model consumes VRAM comparable to the other 2D models, which is only about 10% of the VRAM consumption of mDCSRN for all tasks. The inference time of our model is below 50% of that of mDCSRN for 3D SR reconstruction.
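The resource ratios quoted above can be checked directly against Table 2. The short calculation below uses only values copied from the table (GPU consumption in GB and inference time in seconds for the last, 3D group of rows); the variable names are illustrative.

```python
# Values copied from Table 2.
ours_vram_gb, mdcsrn_vram_gb = 2.29, 22.60      # GPU consumption
ours_time_s, mdcsrn_time_s = 19.53, 46.84       # inference time, 5-slice input vs mDCSRN

vram_ratio = ours_vram_gb / mdcsrn_vram_gb
time_ratio = ours_time_s / mdcsrn_time_s

print(f"VRAM: {vram_ratio:.1%} of mDCSRN, inference time: {time_ratio:.1%} of mDCSRN")
```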
Fig.5 illustrates the difference in visual effect between the 3D SR of various networks with downsampling. In the sagittal view, the reconstructed SR images of FSRCNN and DCSRN are still highly blurry, while the others have comparable sharpness. In the axial view, our network shows enhanced accuracy in the reconstructed small anatomical structures.
3.3 Motion Artifact Generation and Reduction
In previous studies, motion artifact was normally generated by random movement and by corrupting randomly selected k-space lines, which leads to non-reproducible severity of motion artifact. Our motion artifact generation method uses a repeating predefined motion pattern with certain frequencies, resulting in controllable severity. As shown in Table 3, with our method, the SSIMs and PSNRs of the motion artifact-corrupted images, which reveal the severity of the motion artifact, follow a consistent tendency along with the increase of . In the group with only in-plane rotation, the SSIM increases by 0.05 to 0.07 and the PSNR increases by 2.5 to 3.7 dB every time the is doubled. The SSIM and PSNR of the group with both in-plane and through-plane rotation follow the same tendency. In addition, the comparison between these two groups reveals that the through-plane rotation leads to decrements of around 0.003 to 0.012 in SSIM and around 0.5 dB in PSNR.
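The idea of a repeating, predefined motion pattern can be sketched as follows. This is a deliberately simplified illustration and not the authors' generator: it replaces the rotation with a small in-plane translation (`np.roll`), treats axis 0 of the 2D FFT as the phase-encoding direction, and assumes the subject is "moved" during the first `duration` lines of every `period` lines. All parameter names are hypothetical.

```python
import numpy as np


def simulate_periodic_motion(image, period, duration, shift=3):
    """Mix k-space lines of a still and a moved copy in a periodic pattern.

    Lines with (index % period) < duration are taken from the moved copy.
    Returns the corrupted magnitude image and the boolean line mask.
    """
    k_still = np.fft.fft2(image)
    # Simplification: a translation stands in for the rotation used in the paper.
    k_moved = np.fft.fft2(np.roll(image, shift, axis=1))
    line = np.arange(image.shape[0])
    moving = (line % period) < duration
    k_mixed = np.where(moving[:, np.newaxis], k_moved, k_still)
    corrupted = np.abs(np.fft.ifft2(k_mixed))
    return corrupted, moving


rng = np.random.default_rng(0)
phantom = rng.random((64, 64))  # random toy image, not MR data
for period in (8, 16, 32):
    corrupted, moving = simulate_periodic_motion(phantom, period, duration=4)
    print(period, moving.mean(), np.abs(corrupted - phantom).mean())
```

Because the pattern is deterministic, the fraction of corrupted lines (here `duration / period`) is fixed for a given setting, which is what makes the severity reproducible, in contrast to randomly selected lines.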
In addition, our network was applied to the task of motion artifact reduction. As also shown in Table 3, our network achieved significant improvements in SSIM and PSNR compared with the UNet, which was widely used in previous studies. In single-image motion artifact reduction mode (1 slice input and output), our network outperformed the UNet with consistent improvements in both SSIM and PSNR of the motion artifact-reduced image for different severities of motion artifact. In thin-slab mode, the performance of our network was further improved, and the SSIM and PSNR reached the highest values with self-ensemble. For the correction of the most severe artifact, our network in thin-slab mode outperformed single-image mode by 0.006 to 0.008 in SSIM and 0.4 to 0.6 dB in PSNR, and outperformed the UNet by around 0.01 in SSIM and around 1 dB in PSNR.
|In-plane / Through-plane||Before Correction||U-Net||Ours (1 Slice input)||Ours (3 Slices input)||Ours+ (3 Slices input)|
|Rotation||SSIM / PSNR||SSIM / PSNR||SSIM / PSNR||SSIM / PSNR||SSIM / PSNR|
|5 / 0||0.7515 / 28.4205||0.9334 / 36.1477||0.9378 / 36.6720||0.9434 / 37.0669||0.9447 / 37.1948|
|5 / 0||0.8142 / 30.9466||0.9547 / 37.8081||0.9573 / 38.4131||0.9608 / 38.7964||0.9615 / 38.8977|
|5 / 0||0.8843 / 34.6632||0.9726 / 40.2495||0.9742 / 41.0761||0.9762 / 41.3078||0.9766 / 41.3979|
|5 / 0||0.9335 / 37.9792||0.9815 / 42.4534||0.9824 / 43.0007||0.9838 / 43.3416||0.9841 / 43.4338|
|5 / 5||0.7391 / 27.8964||0.9319 / 35.5853||0.9349 / 36.0326||0.9428 / 36.6468||0.9442 / 36.7868|
|5 / 5||0.8051 / 30.4936||0.9538 / 37.3098||0.9564 / 37.9584||0.9601 / 38.3078||0.9609 / 38.4133|
|5 / 5||0.8798 / 34.2659||0.9726 / 40.0343||0.9742 / 40.6765||0.9760 / 41.0267||0.9764 / 41.1266|
|5 / 5||0.9301 / 37.5567||0.9804 / 41.5370||0.9815 / 42.1384||0.9830 / 42.4736||0.9834 / 42.5703|
The visual comparison of motion artifact reduction also reveals significant improvement with our network. Fig.6 shows an example of images in the axial and sagittal planes with motion artifact reduced from the most severe case. On the axial plane, which is the plane of the image patches, the UNet-corrected image contains various incorrectly restored anatomical structures compared with the ground truth, while with our network in single-image mode the quality of the restored image is improved, with only minor errors in anatomical structures. Our model in thin-slab mode provides further enhanced image quality: the majority of features are preserved and well represented, while the motion artifact is substantially removed. On the sagittal plane, the margin between the models is even more significant. Because each slice of the 3D image volume was restored independently, without through-slice information, severe through-slice mismatch can be found in the UNet-generated image; it is slightly reduced with our model in single-image mode. With our network in thin-slab mode, the through-slice mismatch is significantly reduced.
3.4 Restoration with Uncertainty
The uncertainty maps were generated simultaneously with the restored images using our network with the NIG loss. In this section, the aleatoric and epistemic uncertainty quantification for the super-resolution task is evaluated. First, Fig.7 shows an example of an HRGT image with the corresponding SR image, absolute error map, uncertainty maps and SSIM map for qualitative evaluation. As mentioned in the method section, out-of-distribution data can be identified with epistemic uncertainty, but not with aleatoric uncertainty alone. Fig.7D shows that the aleatoric uncertainty pervades the whole image volume, including the background region, while the epistemic uncertainty map shows significantly emphasized regions in anatomical structures. In addition, the regions with high values in the epistemic uncertainty map match the regions with higher error in the absolute error map and the regions with lower SSIM in the SSIM map, and vice versa.
Furthermore, by investigating the relationships between the mean epistemic uncertainty of each image slice and its SSIM/PSNR values, we observed strong correlations between them. As shown in Fig.8, we first performed linear regression on the pairs of mean epistemic uncertainty and SSIM values using the 10 datasets from the test group (blue dots in the figures), and obtained a regression line (the red solid line) with a 95% prediction interval (the region between the red dashed lines). The same process was performed on the pairs of mean epistemic uncertainty and PSNR values using exponential regression, because PSNR is a logarithmic function, and a regression equation with a 95% prediction interval was also obtained. The values of the regressions were above 0.8, indicating good fits. Then, we used another 40 datasets, which were isolated from the training/validation/test groups, to verify the accuracy of the regressions in predicting the SSIM and PSNR from the mean epistemic uncertainty values. The additional 40 datasets are presented as green crosses in Fig.8, with around 95% of the data located within the prediction intervals for all regressions, revealing that the correlations between mean epistemic uncertainty and SSIM/PSNR follow the predicted distribution with high accuracy. Therefore, the SSIM and PSNR values can be predicted with high accuracy using the proposed method in clinical settings when the ground truth is not available.
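The regression step for SSIM can be sketched in pure Python as below. The data points are synthetic numbers invented purely for illustration (they are not measurements from this work), the function names are hypothetical, and the prediction interval uses the normal quantile (about 1.96 for 95%) as an approximation of the Student t quantile, which is an assumption that becomes less accurate for small samples.

```python
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x with summary statistics."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / ss_tot,        # coefficient of determination
        "s": (ss_res / (n - 2)) ** 0.5,     # residual standard error
        "n": n,
        "mx": mx,
        "sxx": sxx,
    }


def prediction_interval(fit, x_new, level=0.95):
    """Approximate prediction interval for a new observation at x_new."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation of the t quantile
    centre = fit["intercept"] + fit["slope"] * x_new
    spread = (1 + 1 / fit["n"] + (x_new - fit["mx"]) ** 2 / fit["sxx"]) ** 0.5
    half_width = z * fit["s"] * spread
    return centre - half_width, centre + half_width


# Synthetic (made-up) pairs of mean epistemic uncertainty and SSIM per slice.
uncertainty = [0.010, 0.015, 0.020, 0.025, 0.030, 0.035]
ssim = [0.975, 0.968, 0.962, 0.953, 0.949, 0.940]

fit = fit_line(uncertainty, ssim)
print(fit["slope"], fit["r2"], prediction_interval(fit, 0.0225))
```

Given a fitted line, the SSIM of a new slice is estimated from its mean epistemic uncertainty alone, and the interval indicates how far the true value is expected to deviate from that estimate.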
In this paper, we first proposed a method using a 2D RCAN-based network for the restoration of 3D high-quality MR images. In the SR task, our network outperformed the other 2D and 3D networks for all downsampling strategies, while consuming only about 10% of the GPU resource and less than 50% of the inference time compared with 3D networks, so it can be deployed easily on consumer GPUs. In the MAR task, our network in single-slice mode achieved significantly better performance than the UNet, and its performance was further improved in thin-slab mode by enhancing through-slice agreement. Our method can be extended by using a 3D network to process 4D images, for instance to enhance temporal and spatial resolutions simultaneously.
Furthermore, we compared various combinations of in-plane and through-plane downsampling with different scale factors. Our experiments revealed that using for acceleration and for acceleration results in the best SR reconstruction performance. Therefore, these downsampling strategies are recommended for the acquisition of real LR images when applying deep learning-based SR reconstruction in a clinical environment.
In addition, we proposed a novel retrospective motion artifact generation method. In our method, a predefined motion pattern with adjustable duration and frequency of movement was used, resulting in controllable and quantifiable severity of the generated motion artifact. In our experiments, motion artifact-corrupted images with linearly increasing SSIM and PSNR values were obtained by adjusting the frequency of movement. The motion pattern can be adjusted to suit any specific scenario.
Finally, we utilized evidential regression learning to generate uncertainty maps with our network simultaneously with the reconstructed images. The generated epistemic uncertainty map matched the absolute error map and the SSIM map between the ground truth and the reconstructed image. In addition, we investigated the relationship between epistemic uncertainty and the SSIM/PSNR values. Our experiments showed that the mean uncertainty of each image slice followed linear and exponential regressions to the SSIM and PSNR, respectively. Therefore, using the regression equations obtained from pairs of images from the test dataset, the SSIM and PSNR values can be predicted in clinical settings, where the ground truth is not available.
-  Dong, C., Loy, C.C., He, K., Tang, X., 2016. Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence 38, 295-307.
-  Dong, C., Loy, C.C., Tang, X., 2016. Accelerating the super-resolution convolutional neural network, in: European Conference on Computer Vision, Springer. pp. 391-407.
-  Sun, S., Chen, W., Wang, L., Liu, X., Liu, T.Y., 2016. On the depth of deep neural networks: A theoretical view., in: AAAI, pp. 2066-2072.
-  Glorot, X., Bengio, Y.. Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics 2010, 249-256.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-  Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286-301.
-  Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., van Ginneken, B., Sanchez, C.I.. A survey on deep learning in medical image analysis. Medical image analysis 2017 42, 60-88.
-  Oktay, O., Bai, W., Lee, M., Guerrero, R., Kamnitsas, K., Caballero, J., de Marvao, A., Cook, S., O'Regan, D., Rueckert, D., 2016. Multi-input cardiac image super-resolution using convolutional neural networks, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 246-254.
-  You, C., Li, G., Zhang, Y., Zhang, X., Shan, H., Li, M., Ju, S., Zhao, Z., Zhang, Z., Cong, W., et al., 2019. CT super-resolution GAN constrained by the identical, residual, and cycle learning ensemble (GAN-CIRCLE). IEEE Transactions on Medical Imaging 39, 188-203.
-  Wang, S., Su, Z., Ying, L., Peng, X., Zhu, S., Liang, F., Feng, D., Liang, D., 2016. Accelerating magnetic resonance imaging via deep learning, in: Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, IEEE. pp. 514-517.
-  Chen, Y et al.. Brain MRI Super Resolution Using 3D Deep Densely Connected Neural Networks. 2018 IEEE 15th International Symposium on Biomedical Imaging.
-  Chen, Y et al.. Efficient and Accurate MRI Super-Resolution using a Generative Adversarial Network and 3D Multi-Level Densely Connected Network, 2018 MICCAI, pp. 91-99.
-  Pham, C.H., Ducournau, A., Fablet, R., Rousseau, F., 2017. Brain MRI superresolution using deep 3D convolutional networks, in: Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on, IEEE. pp. 197-200.
-  Xiaole Zhao, Yulun Zhang, Tao Zhang, and Xueming Zou. Channel Splitting Network for Single MR Image Super-Resolution. IEEE Transactions on Image Processing 2019, 28, 5649-5662.
-  Evan M. Masutani, Naeim Bahrami, Albert Hsiao. Deep Learning Single-Frame and Multiframe Super-Resolution for Cardiac MRI. Radiology 2020; 295:
-  Akshay S. Chaudhari, Zhongnan Fang, Feliks Kogan, Jeff Wood, Kathryn J. Stevens, Eric K. Gibbons, Jin Hyung Lee, Garry E. Gold, Brian A. Hargreaves. Super-resolution musculoskeletal MRI using deep learning. Magn Reson Med 2018; 80: 2139-2154.
-  Zhang Siyuan, Dong Jingxian, Jiang Caiwen, Hou Wenguang, Deng Xianbo. 2D CNN-Based Slices-to-Volume Superresolution Reconstruction. IEEE Access 2020, 8, 86357-86366.
-  Wei Lu, Zhijin Song, Jinghui Chu. A novel 3D medical image super-resolution method based on densely connected network. Biomedical Signal Processing and Control 2020, 62, 102120.
-  Küstner T, Armanious K, Yang J, Yang B, Schick F, Gatidis S. Retrospective Correction of Motion‐affected MR Images using Deep Learning Frameworks. Magn Reson Med 2019; 82: 1527-1540.
-  Johnson PM, Drangova M. Conditional generative adversarial network for 3D rigid‐body motion correction in MRI. Magn Reson Med 2019; 82: 901-910.
-  Chung, H et al.. Simultaneous super-resolution and motion artifact removal in diffusion-weighted MRI using unsupervised deep learning. arXiv:2105.00240v1
-  Chengyan Wang, Yucheng Liang, Yuan Wu, Siwei Zhao, Yiping P. Du. Correction of out-of-FOV motion artifacts using convolutional neural network. Magnetic Resonance Imaging 2020, 71, 93-102.
-  Ben A Duffy, Lu Zhao, Farshid Sepehrband, Joyce Min, DannyJJ Wang, Yonggang Shi, Arthur W Toga, Hosung Kim. Retrospective motion artifact correction of structural MR images using deep learning improves the quality of cortical surface reconstructions. NeuroImage 230 (2021) 117756.
-  Gyutaek Oh, Jeong Eun Lee, and Jong Chul Ye. Unpaired MR Motion Artifact Deep Learning Using Outlier-Rejecting Bootstrap Aggregation. IEEE TMI.2021.3089708.
-  Seul Lee, Soozy Jung, Kyu-Jin Jung, Dong-Hyun Kim. Deep Learning in MR Motion Correction: a Brief Review and a New Motion Simulation Tool (view2Dmotion). iMRI 2020;24:196-206.
-  Ryutaro Tanno, Daniel E. Worrall, Enrico Kaden, Aurobrata Ghosh, Francesco Grussu, Alberto Bizzi, Stamatios N. Sotiropoulos, Antonio Criminisi, Daniel C. Alexander. Uncertainty Quantification in Deep Learning for Safer Neuroimage Enhancement. arXiv:1907.13418v1.
-  Yu Qin, Zhiwen Liu, Chenghao Liu, Yuxing Li, Xiangzhu Zeng, Chuyang Ye. Super-Resolved q -Space deep learning with uncertainty quantification. Medical Image Analysis 2021; 67: 101885.
-  Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. NIPS 2017.
-  Alex Kendall, Yarin Gal. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? NIPS 2017.
-  Alexander Amini, Wilko Schwarting, Ava Soleimany, Daniela Rus. Deep Evidential Regression. NeurIPS 2020.
-  Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, Eero P. Simoncelli: Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 2004.
-  Evan M. Masutani, Naeim Bahrami, Albert Hsiao. Deep Learning Single-Frame and Multiframe Super-Resolution for Cardiac MRI. Radiology 2020; 295:
-  Jun Shi, Qingping Liu, Chaofeng Wang, Qi Zhang, Shihui Ying, and Haoyu Xu. Super-resolution reconstruction of MR image with a novel residual learning network algorithm. Phys. Med. Biol. 2018; 63.
-  Guang Yang, Simiao Yu, Hao Dong, Greg Slabaugh, Pier Luigi Dragotti, Xujiong Ye, Fangde Liu, Simon Arridge, Jennifer Keegan, Yike Guo, and David Firmin. DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction. IEEE Transactions on Medical Imaging 2018; 37.
-  Barron JT, A General and Adaptive Robust Loss Function. ArXiv:1701.03077.
-  Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K., Consortium, W.M.H., et al., 2013. The WU-Minn human connectome project: an overview. Neuroimage 80, 62-79.
-  J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1646–1654.
-  Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2790–2798.
-  C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 391–407