Multi-Contrast Super-Resolution MRI Through a Progressive Network

08/05/2019 ∙ by Qing Lyu, et al. ∙ Rensselaer Polytechnic Institute

Magnetic resonance imaging (MRI) is widely used for screening, diagnosis, image-guided therapy, and scientific research. A significant advantage of MRI over other imaging modalities such as computed tomography (CT) and nuclear imaging is that it clearly shows soft tissues in multiple contrasts. Whereas most medical image super-resolution (SR) methods operate on a single contrast, multi-contrast super-resolution can synergize multiple contrast images to achieve better SR results. In this paper, we propose a one-level non-progressive neural network for multi-contrast super-resolution at a low up-sampling factor and a two-level progressive network for multi-contrast super-resolution at a high up-sampling factor. Multi-contrast information is combined in a high-level feature space. Our experimental results demonstrate that the proposed networks produce MRI super-resolution images of good quality and outperform other multi-contrast super-resolution methods in terms of structural similarity and peak signal-to-noise ratio. Furthermore, the progressive network yields better SR image quality than the non-progressive network even when the original low-resolution images are highly down-sampled.


I Introduction

Magnetic resonance imaging (MRI) is one of the most widely used medical imaging modalities. Compared with other modalities such as computed tomography (CT) and nuclear imaging, MRI is advantageous in providing clear tissue structure and functional information without inducing ionizing radiation. With different pulse sequences, the MRI system can be flexibly configured to generate multi-contrast images such as T1, T2, and proton density (PD) weighted images, which contain important physiological and pathological features. However, a major shortcoming of current MRI systems is that it is difficult to obtain high-resolution (HR) MR images in clinical applications due to the trade-off between system cost-effectiveness and signal-to-noise ratio [plenge2012super, van2012super]. Clinically, in order to obtain HR MR images, patients are required to remain still in the scanner for a long time, which intensifies patients' discomfort and inevitably introduces motion artifacts that compromise image quality.

Super-resolution (SR) techniques have great utility in improving MR image quality without any hardware modification. Current SR approaches can be categorized into single-image super-resolution (SISR) methods and multi-contrast super-resolution (MCSR) methods. The SISR approach, which aims to improve low-resolution (LR) images obtained in a single contrast mode, has been extensively studied over the past few decades, and many SISR methods exist in the literature. These range from model-based methods, such as interpolation algorithms [park2003super] and iterative deblurring algorithms [hardie2007fast, manjon2010non], to learning-based methods such as dictionary learning [yang2010image, zeyde2010single, timofte2014a+]. Many impressive SR results have been achieved, but these methods are still insufficiently mature for clinical use. In recent years, SISR has become a hot topic in the deep learning field, and a number of neural-network-based SR models have been proposed. According to their designs, these models can be categorized into linear models such as SRCNN [dong2015image] and VDSR [kim2016accurate]; residual models like CARN [ahn2018fast] and REDNet [mao2016image]; recursive models like DRCN [kim2016deeply] and DRRN [tai2017image]; densely connected models such as RDN [zhang2018residual] and D-DBPN [haris2018deep]; attention-based models, with SelNet [choi2017deep] and RCAN [zhang2018image] as examples; progressive models such as SCN [wang2015deep] and LapSRN [lai2017deep]; and generative adversarial network (GAN) models such as EnhanceNet [sajjadi2017enhancenet] and SRGAN [ledig2017photo].

Clinically, T1, T2 and PD weighted images are often generated together for diagnosis with complementary information. Although each weighted image is only good at showing certain types of tissue, the weighted images reflect the same anatomy and can provide synergy when used in combination. Based on this consideration, the MRI MCSR approach naturally arises: use MRI multi-contrast images collectively for SR imaging superior to SISR results. Typically, MCSR imaging achieves super-resolution for a specific contrast image aided by HR images of other contrasts; relevant information in those reference images is extracted to guide the recovery of details in the image of interest. Similar to the SISR methods, current MCSR methods can also be categorized into model-based and learning-based methods. Examples of model-based methods are based on non-local means [jafari2014mri], total variation [brudfors2018mri], edge gradients [zheng2018multi], shareable information [gong2015promise], and similarity [manjon2010mri, zheng2017multi]. The methods using dictionary learning [lu2015mr] and convolutional neural networks [zeng2018simultaneous] are representatives of the learning-based methods.

In this paper, we propose a one-level non-progressive network for MCSR at a low up-sampling factor and a two-level progressive neural network for MCSR at a high up-sampling factor. The main contributions of this paper are: 1) a two-level progressive neural network based on the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) architecture that achieves excellent MCSR results in the case of a high up-sampling factor; 2) the finding that, among multiple strategies for integrating multi-contrast information, combining the information in a high-level feature space gives the best MCSR results; and 3) a composite loss involving the mean-squared-error (MSE), perceptual, and texture matching losses that ensures the generated images recover texture details and remain faithful to the ground truth.

Fig. 1: Two proposed network architectures for MCSR. (a) The generator structure of the proposed one-level non-progressive model, which contains an encoder-decoder network and a reference feature extraction network. (b) The generator structure of the proposed two-level progressive model.

II Methodology

II-A Overall SR Process

Given an HR MR image $x$ of size $N \times N$, the LR image $y$ obtained after the down-sampling process with a factor $s$ will be of size $(N/s) \times (N/s)$. This down-sampling process can be expressed as:

$y = f(x) + n$ (1)

where $f$ denotes the down-sampling or blurring function, and $n$ represents the system noise. Theoretically, an SR process is to find the inverse of the original down-sampling function $f$. As SR imaging is an ill-posed problem, it is impossible to find the exact inverse, and only approximate solutions can be obtained. The goal of the SR imaging process is to find a function $\hat{f}^{-1}$ that best approximates the ideal inverse $f^{-1}$. Then, the SR process can be expressed as:

$\hat{x} = \hat{f}^{-1}(y)$ (2)

To obtain such an approximate inverse $\hat{f}^{-1}$, image priors should be used. In our previous SISR study [lyu2019mri], we proposed an ensemble-learning-based deep learning framework to achieve MRI SR imaging with complementary image priors. Although a dataset with rich image features was built for single-contrast images, the amount of prior information obtainable from single-contrast images is still limited. Therefore, in this study we take advantage of multi-contrast images for MRI SR imaging, since they contain more prior information than single-contrast images. Specifically, we propose a deep learning-based MCSR method for SR T2 weighted imaging that incorporates HR PD or T1 weighted images as reference images.

II-B Down-sampling and Zero-filling

In our previous study [lyu2019mri], LR MRI images were obtained by down-sampling T2 weighted images in the frequency domain. We first converted the original T2 weighted image of size 256×256 into the k-space. Then, we cropped the data and kept only the data points in a central low-frequency region: for the down-sampling factors 2, 3, and 4, the central 25%, 11.1%, and 6.25% of the data points were kept, respectively. All the peripheral data points were zeroed out, which is called zero-filling. Finally, we used the inverse Fourier transform to convert the modified data back into the image domain to produce LR images. Through such a down-sampling and zero-filling process, we degraded the image quality while keeping the image size unchanged.
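
The following is a minimal sketch of this down-sampling and zero-filling step using NumPy's FFT routines; the function and variable names are our own illustration, not the authors' code.

```python
import numpy as np

def downsample_zero_fill(hr_image: np.ndarray, factor: int) -> np.ndarray:
    """Degrade an HR image by keeping only the central low-frequency k-space block."""
    h, w = hr_image.shape
    k_space = np.fft.fftshift(np.fft.fft2(hr_image))   # center the low frequencies
    mask = np.zeros_like(k_space)
    ch, cw = h // 2, w // 2
    hh, hw = h // (2 * factor), w // (2 * factor)      # half-extent of the kept block
    mask[ch - hh:ch + hh, cw - hw:cw + hw] = 1.0       # keep the center, zero-fill the rest
    lr_image = np.fft.ifft2(np.fft.ifftshift(k_space * mask))
    return np.abs(lr_image)                            # same size, degraded quality

# For factor 2 the kept block holds 25% of the k-space points; for factors 3 and 4
# it holds roughly 11.1% and exactly 6.25%, matching the ratios stated above.
```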

II-C One-Level Non-progressive Network

The proposed one-level non-progressive network is based on the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) [gulrajani2017improved], which includes a generator and a discriminator. As shown in Fig. 1(a), the generator consists of an encoder-decoder network and a reference feature extraction network.

The encoder-decoder network [shan20183] mainly contributes to the improvement of image quality. The encoder has 8 sequential convolutional layers, each of which is followed by a rectified linear unit (ReLU). The numbers of convolutional filters are 32, 32, 64, 64, 128, 128, 256, and 256, respectively. Each convolutional layer uses convolution kernels with a stride of 1 and a padding of 0. The encoder condenses information and extracts features layer by layer. The decoder consists of 8 sequential transposed convolutional layers, each of which is also followed by a ReLU. The numbers of filters in the transposed convolutional layers are 256, 128, 128, 64, 64, 32, 32, and 1, respectively. Differing from convolutional layers, transposed convolutional layers enlarge the feature maps, here with a stride of 1 and a padding of 0. The decoder gradually dilates information layer by layer. Between the encoder and the decoder, there are three skip connections, through which feature maps extracted in the encoder are transferred to the decoder. These skip connections can boost the training process by utilizing the similarity between input and output images. The discriminator includes three blocks followed by two fully-connected layers. Each block contains two convolutional layers, two ReLU activation functions, and a max-pooling layer. Each convolutional layer uses convolution kernels with a stride of 1 and a padding of 1. The numbers of convolutional filters are 64, 64, 128, 128, 256, and 256, respectively. The numbers of neurons in the two fully-connected layers are 128 and 1, respectively.

The reference feature extraction network is utilized to extract feature maps from the reference images. These extracted features are then fed into the encoder-decoder network. The structure of the reference feature extraction network is the same as that of the encoder in the encoder-decoder network.
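
Below is a minimal PyTorch sketch of this generator under stated assumptions: the kernel size, which is not legible in the text above, is taken to be 3×3; the reference features are assumed to be fused with the encoder output by concatenation at the bottleneck; and the three skip connections are assumed to add T2 encoder features to decoder features at the matching resolutions. It is an illustration of the described structure, not the authors' code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Eight conv layers (32, 32, 64, 64, 128, 128, 256, 256 filters), each
    followed by a ReLU; stride 1, padding 0; 3x3 kernels are an assumption."""
    def __init__(self):
        super().__init__()
        chans = [1, 32, 32, 64, 64, 128, 128, 256, 256]
        self.layers = nn.ModuleList(
            [nn.Conv2d(chans[i], chans[i + 1], 3, stride=1, padding=0)
             for i in range(8)])
        self.relu = nn.ReLU()

    def forward(self, x):
        feats = []
        for layer in self.layers:
            x = self.relu(layer(x))
            feats.append(x)
        return feats  # feats[-1] is the bottleneck; feats[1], feats[3], feats[5] feed the skips

class Generator(nn.Module):
    """Encoder-decoder for the T2 LR input plus a reference feature extraction
    network (same structure as the encoder) for the HR reference image."""
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.ref_encoder = Encoder()
        dec_in = [512, 256, 128, 128, 64, 64, 32, 32]   # 512 = 256 (T2) + 256 (reference)
        dec_out = [256, 128, 128, 64, 64, 32, 32, 1]
        self.decoder = nn.ModuleList(
            [nn.ConvTranspose2d(dec_in[i], dec_out[i], 3, stride=1, padding=0)
             for i in range(8)])
        self.relu = nn.ReLU()

    def forward(self, lr_t2, hr_ref):
        feats = self.encoder(lr_t2)
        ref_feats = self.ref_encoder(hr_ref)
        x = torch.cat([feats[-1], ref_feats[-1]], dim=1)  # high-level feature fusion
        skips = {1: feats[5], 3: feats[3], 5: feats[1]}   # three T2-only skip connections
        for i, layer in enumerate(self.decoder):
            x = self.relu(layer(x))
            if i in skips:
                x = x + skips[i]   # channel counts and spatial sizes match here
        return x
```

With padding 0, each encoder layer shrinks the feature maps by 2 pixels per side-pair and each transposed convolutional layer restores them symmetrically, so the output image has the same size as the input.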

II-D Objective Function

The objective function of the generator contains four parts: the adversarial loss $L_{adv}$, the MSE loss $L_{MSE}$, the perceptual loss $L_{perc}$, and the texture matching loss $L_{text}$.

The adversarial loss is used to train the generator to produce results indistinguishable from the original HR images so that it can fool the discriminator. For a WGAN-GP model, the adversarial loss of the generator can be expressed as:

$L_{adv} = -\mathbb{E}_{y}\left[ D(G(y)) \right]$ (3)

where $G$ represents the generator, $D$ stands for the discriminator, and $y$ is the input LR image of the generator.

The MSE is one of the most widely used fidelity terms in SR studies. It evaluates the similarity between the generator outputs and the ground truths in the image space, and can greatly improve the signal-to-noise ratio of the generated results [wang2004image]. The MSE loss can be expressed as:

$L_{MSE} = \mathbb{E}\left[ \| G(y) - x \|_2^2 \right]$ (4)

where $x$ is the ground-truth HR image.

Although the MSE loss promotes generator outputs with high PSNR and structural similarity to the ground truth, it tends to produce over-smoothed SR results [dosovitskiy2016generating, johnson2016perceptual]. To overcome this problem, the perceptual loss is included in the objective function. Different from the MSE loss, which evaluates image similarity at the pixel level, the perceptual loss measures image similarity in a high-level feature space. We used the pre-trained VGG16 model [simonyan2014very] to extract feature maps. Specifically, feature maps from four selected convolutional layers were used for the calculation of the perceptual loss, with all feature maps contributing equally. The perceptual loss can be expressed as the MSE of the feature maps of the generator output and the ground truth:

$L_{perc} = \mathbb{E}\left[ \| \phi(G(y)) - \phi(x) \|_2^2 \right]$ (5)

where $\phi$ represents the extracted feature maps.

Proposed by Gatys et al. [gatys2016image], the texture matching loss promotes texture similarity between the generator output and the ground truth [sajjadi2017enhancenet]. It compares the Gram matrices of the generator result and the ground truth in a feature space; the Gram matrix measures the correlations among the feature maps of a layer and is computed over multiple layers. Texture information can be preserved by decreasing the Gram matrix difference between the generator result and the ground truth. This texture matching loss is also called the style loss in image style transfer studies because the Gram matrix can represent the style of an image. This characteristic is suitable for MCSR studies as well: in MCSR, there is effectively a style transfer between different contrast images, and this loss function ensures that the correct features are used in that process. The texture matching loss is the MSE of the Gram matrices of the generator output and the ground truth in a feature space:

$L_{text} = \mathbb{E}\left[ \| \mathcal{G}(\phi(G(y))) - \mathcal{G}(\phi(x)) \|_2^2 \right]$ (6)

where $\mathcal{G}(F) = \frac{1}{N_l M_l} F F^T$ stands for the Gram matrix, defined via the product of the feature map matrix $F$ and its transpose $F^T$; $N_l$ is the number of feature maps in the $l$-th convolutional layer, and $M_l$ stands for the height times the width of the feature maps. In this study, the feature maps involved in the texture matching loss are the same as those used for the perceptual loss.
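
As an illustration, here is a minimal PyTorch sketch of the Gram matrix and the texture matching loss, following the normalization above; the function names are our own.

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """feat: (batch, N_l, h, w) feature maps from one layer; returns (batch, N_l, N_l)."""
    b, n, h, w = feat.shape
    f = feat.reshape(b, n, h * w)                      # flatten spatial dims: M_l = h * w
    return torch.bmm(f, f.transpose(1, 2)) / (n * h * w)

def texture_loss(feats_sr, feats_hr):
    """MSE between Gram matrices, summed over the selected feature layers."""
    return sum(torch.mean((gram_matrix(a) - gram_matrix(b)) ** 2)
               for a, b in zip(feats_sr, feats_hr))
```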

In total, the objective function of the generator is:

$\min_{\theta_G} \ L_{adv} + \lambda_{MSE} L_{MSE} + \lambda_{perc} L_{perc} + \lambda_{text} L_{text}$ (7)

where $\theta_G$ represents the trainable parameters of the generator, and $\lambda_{MSE}$, $\lambda_{perc}$, and $\lambda_{text}$ are weighting hyperparameters.
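
Continuing the sketches above, the composite objective of Eq. (7) could be assembled as follows; the weighting values shown are the ones selected in Section III-B.

```python
import torch

def generator_loss(d_fake, sr, hr, feats_sr, feats_hr,
                   lam_mse=0.1, lam_perc=1.0, lam_text=0.1):
    """Composite generator objective of Eq. (7); feats_* are VGG16 feature maps."""
    l_adv = -d_fake.mean()                        # WGAN-GP adversarial term, Eq. (3)
    l_mse = torch.mean((sr - hr) ** 2)            # Eq. (4)
    l_perc = sum(torch.mean((a - b) ** 2)         # Eq. (5), layers weighted equally
                 for a, b in zip(feats_sr, feats_hr))
    l_text = texture_loss(feats_sr, feats_hr)     # Eq. (6), from the sketch above
    return l_adv + lam_mse * l_mse + lam_perc * l_perc + lam_text * l_text
```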

The objective function of the discriminator is the same as that of the original WGAN-GP model, which is defined as follows:

$\min_{\theta_D} \ \mathbb{E}_{y}\left[ D(G(y)) \right] - \mathbb{E}_{x}\left[ D(x) \right] + \lambda\, \mathbb{E}_{\hat{x}}\left[ \left( \| \nabla_{\hat{x}} D(\hat{x}) \|_2 - 1 \right)^2 \right]$ (8)

where $\theta_D$ represents the trainable parameters of the discriminator, $\hat{x}$ denotes samples uniformly interpolated between real and generated images, and $\lambda$ is the gradient penalty coefficient.
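
A minimal sketch of this discriminator objective follows, with the gradient penalty computed on random interpolates between real and generated images as in [gulrajani2017improved]; lam = 10 as stated in Section III-B.

```python
import torch

def discriminator_loss(D, real, fake, lam=10.0):
    """WGAN-GP discriminator objective of Eq. (8)."""
    loss = D(fake).mean() - D(real).mean()           # negative Wasserstein estimate
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(interp).sum(), interp, create_graph=True)[0]
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return loss + lam * penalty
```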

II-E Two-Level Progressive Network

Different from neural networks that achieve SR imaging in one step, a progressive network splits an SR task into several sequential steps, each of which up-samples the image by a small factor [lai2017deep, Anwar2019]. Through the combination of several steps, it can implement SR for a large up-sampling factor. LapSRN [lai2017deep] is a good example of a progressive network: it contains three sequentially placed subnets, each of which increases the image resolution by a factor of 2, so that the full network improves image resolution by a factor of 8.

Here we propose a two-level progressive neural network to achieve MCSR for an up-sampling factor of 4. The proposed model is the combination of two one-level non-progressive networks. Its generator contains two encoder-decoder networks and a reference feature extraction network, as shown in Fig. 1(b). The discriminator is the same as that used with the one-level network. To obtain 4-fold MCSR results, we trained the progressive network so that it progressively improves image quality level by level: the first level achieves 2-fold MCSR results, and the second level provides the remaining 2-fold improvement. The input of the progressive network is the LR images obtained through 4-fold down-sampling and zero-filling. The ground truth of the first level is the LR images obtained through 2-fold down-sampling and zero-filling, and the ground truth of the second level is the original HR images.
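
A minimal sketch of chaining two such levels is given below, reusing the Generator sketch above. Note that in Fig. 1(b) the two levels share a single reference feature extraction network, whereas this illustration simply passes the reference image to each level.

```python
import torch.nn as nn

class ProgressiveGenerator(nn.Module):
    """Two-level progressive generator: level 1 maps the 4-fold LR input toward
    2-fold quality, and level 2 maps that output toward HR quality."""
    def __init__(self, level1: nn.Module, level2: nn.Module):
        super().__init__()
        self.level1 = level1
        self.level2 = level2

    def forward(self, lr4, hr_ref):
        sr2 = self.level1(lr4, hr_ref)   # supervised by the 2-fold LR ground truth
        sr4 = self.level2(sr2, hr_ref)   # supervised by the original HR ground truth
        return sr2, sr4                  # both outputs enter the constrained objective
```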

The objective function of the progressive network combines the results of both levels. The modified versions of Eqs. (4)-(6) are expressed as:

$L_{MSE} = \mathbb{E}\left[ \| G_1(y_4) - y_2 \|_2^2 \right] + \mathbb{E}\left[ \| G_2(G_1(y_4)) - x \|_2^2 \right]$ (9)

$L_{perc} = \mathbb{E}\left[ \| \phi(G_1(y_4)) - \phi(y_2) \|_2^2 \right] + \mathbb{E}\left[ \| \phi(G_2(G_1(y_4))) - \phi(x) \|_2^2 \right]$ (10)

$L_{text} = \mathbb{E}\left[ \| \mathcal{G}(\phi(G_1(y_4))) - \mathcal{G}(\phi(y_2)) \|_2^2 \right] + \mathbb{E}\left[ \| \mathcal{G}(\phi(G_2(G_1(y_4)))) - \mathcal{G}(\phi(x)) \|_2^2 \right]$ (11)

where $y_4$ stands for the 4-fold down-sampled LR images, $y_2$ for the 2-fold down-sampled LR images, $x$ for the original HR images, and $G_1$ and $G_2$ denote the first- and second-level subnetworks.
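
For instance, the constrained MSE term of Eq. (9) could be written as sketched below; the perceptual and texture terms of Eqs. (10) and (11) are formed the same way, evaluated at both levels.

```python
import torch

def progressive_mse(sr2, y2, sr4, x):
    """sr2/sr4: level-1/level-2 outputs; y2: 2-fold LR target; x: HR target."""
    return torch.mean((sr2 - y2) ** 2) + torch.mean((sr4 - x) ** 2)
```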

II-F Image Quality Evaluation Metrics

The structural similarity (SSIM), peak signal-to-noise ratio (PSNR) [wang2004image] and information fidelity criterion (IFC) [sheikh2005information] metrics are used to evaluate the image quality of MCSR results.

SSIM is a fidelity metric that compares images in terms of luminance, contrast, and structure. A widely used SSIM expression is as follows:

$\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$ (12)

where $\mu_x$ and $\mu_y$ stand for the average pixel intensities of images $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ represent the pixel intensity variances of $x$ and $y$, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1$ and $c_2$ are two constants that stabilize the division. As SSIM measures similarity at the patch level, all SSIM scores in this work are given as the mean of patch-wise SSIM values.

PSNR is another commonly used fidelity measure. It is related to the MSE, and can be expressed as the ratio between the square of the maximum pixel value and the MSE:

$\mathrm{PSNR} = 10\log_{10}\left(\dfrac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$ (13)

IFC was proposed by Sheikh et al. [sheikh2005information]. It measures image quality based on natural scene statistics. Since image information is extracted by the human visual system, IFC uses the mutual information between the reference image and a distorted image to evaluate image quality. Compared with SSIM and PSNR, IFC takes the correlation between image content and human visual perception into account. However, IFC is not the best at evaluating structural similarity.

Currently, there is no widely accepted metric that can accurately evaluate image quality in all possible circumstances. Each of the above-mentioned metrics has its own limitations. To comprehensively evaluate image quality, the three metrics are used in combination.
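
As an aside, SSIM and PSNR can be computed with scikit-image as sketched below; IFC has no standard scikit-image implementation and is omitted here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(sr: np.ndarray, hr: np.ndarray):
    """SSIM and PSNR for images normalized to [0, 1], as in this study."""
    ssim = structural_similarity(hr, sr, data_range=1.0)
    psnr = peak_signal_noise_ratio(hr, sr, data_range=1.0)
    return ssim, psnr
```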

Fig. 2: Comparisons between different models used in our ablation studies in terms of the perceptual loss, mean-squared-error (MSE) loss, Wasserstein distance, PSNR and SSIM.

III Experiment

III-A Datasets

IXI dataset: The IXI dataset [IXI] contains registered T2 weighted and PD weighted MRI images of 578 patients. T2 weighted images were used for SR with PD weighted images as the reference. 7,000 pairs of T2 and PD weighted images were selected for training, and another 1,955 pairs for testing. We adopted 10-fold cross-validation in this study. The size of the original HR images, for both T2 and PD weighted images, is 256×256. 2-fold, 3-fold, and 4-fold down-sampled T2 weighted LR images were created through down-sampling and zero-filling. Before training, all images were normalized to the range [0, 1]. Each LR and HR image pair was cropped into 16 patch pairs.

NAMIC Dataset: The NAMIC brain multimodality dataset [NAMIC] includes registered T1 and T2 weighted MRI images scanned from 20 patients. 1,620 pairs were used for training, and 180 different pairs for testing. Similar to what we did with the IXI dataset, we chose T2 weighted images for SR with T1 weighted images as the reference. The size of the original HR images is 256×256. We down-sampled and zero-filled the T2 weighted images to obtain 2-, 3-, and 4-fold LR images. Before training, all images were normalized to the interval [0, 1]. Each LR and HR image pair was cropped into 16 patch pairs.
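
A sketch of this patch extraction is shown below. The patch size is not legible in the text; non-overlapping tiles on a 4×4 grid (64×64 pixels for a 256×256 image) are our assumption.

```python
import numpy as np

def crop_patches(img: np.ndarray, grid: int = 4) -> list:
    """Crop an image into grid x grid non-overlapping patches (16 for grid=4)."""
    h, w = img.shape
    ph, pw = h // grid, w // grid
    return [img[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            for i in range(grid) for j in range(grid)]
```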

III-B Experimental Details

For training, 112K image patches were randomly selected with a batch size of 32. The training process continued for 50 epochs with a fixed learning rate. The discriminator was trained four times before the generator was trained once. The hyperparameters $\lambda_{MSE}$, $\lambda_{perc}$, and $\lambda_{text}$ in the objective function of the generator were determined based on the results shown in Fig. 3. We first determined the coefficient of the perceptual loss. As shown in Fig. 3, both the SSIM and PSNR scores increase dramatically when $\lambda_{perc}$ goes up from 0.01 to 1.0. Some further SSIM and PSNR improvement can be obtained by increasing $\lambda_{perc}$ from 1.0 to 100; however, this improvement is far smaller than that over the range from 0.01 to 1.0. Considering that a too-large perceptual loss coefficient would overwhelm the contributions from the other loss components, we set $\lambda_{perc}$ to 1.0. Next, we tuned $\lambda_{MSE}$ and $\lambda_{text}$. For the coefficient of the MSE loss, the highest PSNR and the second highest SSIM scores were obtained when $\lambda_{MSE}$ was set to 0.1; based on this, we set $\lambda_{MSE}$ to 0.1. Similarly, we set $\lambda_{text}$ to 0.1. The value of $\lambda$ in the objective function of the discriminator was 10, consistent with the original WGAN-GP paper [gulrajani2017improved]. The Adam optimization method [kingma2014adam] was used to train the neural networks. All the work was implemented using PyTorch on a GTX 1080Ti GPU.
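
The training schedule above could be sketched as follows, reusing the loss sketches from Section II-D. Here `vgg` stands for an assumed helper returning the selected VGG16 feature maps, `loader` for a dataloader yielding (LR T2, HR reference, HR T2) patch triples, and the learning rate value, which is not legible in the text, is only a placeholder.

```python
import torch

def train_epoch(generator, discriminator, vgg, loader, opt_g, opt_d):
    for lr_t2, hr_ref, hr_t2 in loader:               # batches of 32 patch triples
        for _ in range(4):                            # four discriminator steps...
            fake = generator(lr_t2, hr_ref).detach()
            loss_d = discriminator_loss(discriminator, hr_t2, fake)
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        sr = generator(lr_t2, hr_ref)                 # ...then one generator step
        loss_g = generator_loss(discriminator(sr), sr, hr_t2, vgg(sr), vgg(hr_t2))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)  # placeholder LR
# opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
```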

Fig. 3: Results of tuning hyperparameters in the objective function (the x-axes are on logarithmic scale).

IV Results

IV-A Convergence Behavior

During the training process, the perceptual loss $L_{perc}$, the MSE loss $L_{MSE}$, and the Wasserstein distance $W(\mathbb{P}_r, \mathbb{P}_g)$ in each epoch were recorded to monitor the convergence of the network model. Variations of the perceptual loss and MSE loss reflect the similarity between the generated result and the ground truth in the high-level feature space and the low-level image space, respectively. The Wasserstein distance, also known as the earth mover's (EM) distance, indicates the similarity between the probability distribution of the generator outputs and that of the true data. It is defined as:

$W(\mathbb{P}_r, \mathbb{P}_g) = \inf_{\gamma \in \Pi(\mathbb{P}_r, \mathbb{P}_g)} \mathbb{E}_{(x, \tilde{x}) \sim \gamma}\left[ \| x - \tilde{x} \| \right]$ (14)

where $\Pi(\mathbb{P}_r, \mathbb{P}_g)$ denotes the set of all joint distributions whose marginals are $\mathbb{P}_r$ and $\mathbb{P}_g$.

As shown in Fig. 2, all perceptual loss, MSE loss, and Wasserstein distance curves gradually decrease towards zero as training proceeds, indicating that our model converged stably. After 40 epochs, all curves become stable and close to zero, which reflects the fact that the generator can stably produce super-resolution results with great similarity to the corresponding HR ground truth.

Fig. 4: Schematic views of the four ablation studies. (a)-(d) correspond to the four ablation studies. The blue block represents the encoder-decoder network and the orange block stands for the reference feature extraction network. (a) The first study is an SISR study in which the input is a T2 weighted LR image, and the output is a T2 weighted SR image. (b) The second study is an image synthesis study in which the input is a PD weighted HR image, and the output is a T2 weighted SR image. (c) The third study is an MCSR study in which the input is a T2 weighted LR image with a PD weighted HR image as the reference, and the output is a T2 weighted SR image; the multi-contrast information is combined in a low-level image space. (d) The fourth study is also an MCSR study with the same inputs and output, but the multi-contrast information is combined in a high-level feature space.

IV-B Ablation Study

To find the best neural network structure for MCSR imaging, we built four models and conducted a systematic ablation evaluation. In this subsection, only the IXI dataset was used. In the first study, to investigate the utility of introducing reference images, we removed the reference feature extraction network from the proposed network and kept only a single encoder-decoder network, with T2 weighted LR images as the input. In the second study, we were interested in directly converting PD weighted images into T2 weighted images through an image synthesis process; hence, we used PD weighted HR images as the input to the same neural network as in the first study, instead of T2 weighted LR images. In the third and fourth studies, we focused on how to effectively utilize the information in the PD weighted HR images. In the third study, the two contrast images were combined in a low-level image space and fed into the same neural network model used in the previous two studies. In the fourth study, the two contrast images were combined in a high-level feature space, which is the proposed network. The four studies are illustrated in Fig. 4. For comparison, all input T2 weighted LR images used in the ablation studies are 2-fold down-sampled LR images. Results from the ablation studies are shown in Fig. 5 and Table I.

Fig. 2 shows that the second ablation study has the worst convergence behavior, with the highest Wasserstein distance, perceptual loss, and MSE loss and the lowest SSIM and PSNR values. As shown in Fig. 5, the SR result obtained in the second study also has the worst image quality, with the largest pixel-value difference from the ground truth. Taking the central white matter regions in the first three rows as an example, all other SR results show the fissure with clear shapes while the second SR result does not. Quantitative results in Table I also indicate that the results from the second ablation study have the lowest SSIM, PSNR and IFC scores. All these results emphasize that an image synthesis approach converting PD weighted HR images into T2 weighted HR images cannot deliver high-quality SR results.

When comparing the first, third and fourth ablation studies, the MCSR results from the third and fourth studies beat the SISR results from the first study: the MCSR results have higher PSNR, SSIM, and IFC scores and smaller pixel-value differences from the ground truths. As an example, the dark curve highlighted by the red arrow in the T2 weighted LR image in the fourth row of Fig. 5 can hardly be distinguished from the background, whereas it shows clearly in the corresponding PD weighted HR image. After training, this curve is still hard to see in the SR result of the first ablation study, but it becomes much clearer in the third and fourth ablation studies, which utilized PD weighted HR images in the training process.

Between the third and fourth ablation studies, the curves of the third study in Fig. 2 decrease quickly in the first several training epochs; however, after 10 epochs, the curves of the fourth study fall slightly below those of the third. Fig. 5 demonstrates that the fourth study achieves SR results with smaller pixel-value differences, and the quantitative results in Table I also show that the fourth ablation study beats the third in terms of PSNR, SSIM and IFC scores.

Fig. 5: Results of the ablation-based evaluation. PSNR, SSIM, and IFC values below each SR result are shown in blue, red, and green, respectively. In the second and fifth rows, SR1, SR2, SR3, and SR4 indicate the results from the four ablation studies, respectively. The heat maps in the bottom row show the absolute pixel-value difference between the ablation study results and the corresponding ground truth T2 weighted HR image.
        SR1            SR2            SR3            SR4
SSIM    0.960±0.012    0.915±0.025    0.975±0.006    0.976±0.006
PSNR    34.705±1.457   28.282±1.544   38.225±1.684   38.513±1.765
IFC     3.990±0.246    1.890±0.178    4.666±0.298    5.058±0.332
TABLE I: Statistical Results of the Ablation Studies on the IXI Dataset (mean±std)

IV-C Non-progressive Model MCSR Results with 2-fold, 3-fold and 4-fold Up-sampling

In the previous subsection, we demonstrated that our method can achieve 2-fold up-sampling MCSR results with good image quality. Now, let us study MCSR with higher up-sampling factors. MCSR results for 2-, 3-, and 4-fold up-sampling rates based on the one-level model are shown in Figs. 6 and 7 and Table II. It can be seen that when images are down-sampled by a larger factor, the LR images become much more blurred with far fewer visible details. After training, we obtain quite good MCSR results with clearer shapes and textural details, and these MCSR results show great structural similarity to the corresponding HR images. For example, the fissures in the white matter pointed to by the red arrows in Fig. 6 become gradually less clear and less detectable as the down-sampling factor increases; however, in each MCSR result, these fissures become much clearer and more detectable. Among all MCSR results, those obtained from a smaller down-sampling factor have higher PSNR, SSIM, and IFC scores and smaller pixel-value differences from the HR ground truth, indicating that MCSR based on LR images with a small down-sampling factor yields higher image quality than MCSR based on LR images with a larger down-sampling factor, as heuristically expected.

Fig. 6: T2 weighted SR results with different down-sampling factors based on the IXI dataset. LR images with different down-sampling factors are shown in the first and fourth rows, and their corresponding SR results in the second and fifth rows. PSNR, SSIM, and IFC scores are in blue, red, and green, respectively, below each subfigure. The heat maps show the absolute pixel-value differences between the MCSR results and the ground truth T2 weighted HR image.
Fig. 7: T2 weighted SR results with different down-sampling factors based on the NAMIC dataset. LR images with different down-sampling factors are shown in the first and fourth rows, and their corresponding SR results in the second and fifth rows. PSNR, SSIM, and IFC scores are in blue, red, and green, respectively, below each subfigure. The heat maps show the absolute pixel-value differences between the corresponding MCSR results and the ground truth T2 weighted HR images.
                       2SR            3SR            4SR
IXI Dataset     SSIM   0.976±0.006    0.960±0.010    0.950±0.013
                PSNR   38.513±1.765   35.215±1.820   33.329±2.116
                IFC    5.058±0.332    3.304±0.245    2.708±0.215
NAMIC Dataset   SSIM   0.974±0.006    0.954±0.010    0.925±0.014
                PSNR   34.127±0.978   31.562±1.149   28.207±1.178
                IFC    3.538±0.164    2.504±0.105    1.923±0.113
TABLE II: Statistical Results of Larger-Factor MCSR Imaging Using the One-Level Non-progressive Model (mean±std)

IV-D Progressive Model Further Improves Image Quality with 4-fold Up-sampling

In the previous subsection, we showed that although quite good MCSR results can be obtained from highly down-sampled LR images, such as the 4-fold SR results, these results still have lower image quality than those based on LR images with lower down-sampling factors. We hope to further improve the quality of SR results from highly down-sampled LR images, and proposed the two-level progressive model to achieve this goal. In this subsection, we present two MCSR results from the proposed progressive model trained with different objective functions: one is called 4PRO U SR and the other 4PRO C SR, where "U" and "C" mean unconstrained and constrained, respectively. The unconstrained results are from the progressive model trained with the objective functions (4), (5), and (6), which only consider the final 4-fold SR results. The constrained results are from the progressive model trained with the objective functions (9), (10), and (11), which consider both the 2-fold and 4-fold SR results.

Fig. 8 compares each level's output with its corresponding ground truth for the 4PRO C SR model. As shown in Fig. 8, our two-level progressive network can generate two levels of MCSR results under a strong restriction, and compared with their ground truths, the network produces very faithful results.

Fig. 8: Comparison of the two-level progressive network outputs with their ground truths. 2LR represents the ground truth at the first level, and T2 HR stands for the ground truth at the second level. 2PRO SR shows the outputs of the first level, and 4PRO SR the outputs of the second level.

Figs. 9 and 10 compare the 4-fold MCSR results of the one-level non-progressive model and the two-level progressive model. The MCSR results produced by the progressive network have sharper edges, clearer texture details, higher PSNR, SSIM, and IFC scores, and smaller pixel-value differences from the corresponding ground truths than those of the one-level non-progressive model. Table III statistically compares the different methods. Both the 4PRO U SR and 4PRO C SR progressive results are better than the non-progressive results. Compared with the result from the unconstrained model (4PRO U SR), the result from the constrained model (4PRO C SR) has higher SSIM, PSNR, and IFC scores, indicating that the constrained progressive model brings about SR imaging results with better image quality than the unconstrained model.

Fig. 9: Comparisons of T2 weighted MCSR results from the one-level non-progressive network and the two-level progressive network, based on the IXI dataset. 4SR and 4PRO C SR denote MCSR results using the one-level non-progressive network and the two-level progressive network, respectively. PSNR, SSIM, and IFC scores are in blue, red, and green, respectively, below each subfigure. The heat maps show the absolute pixel-value differences between the super-resolution results and the ground truth T2 weighted HR images.
Fig. 10: Comparisons of T2 weighted MCSR results from the one-level non-progressive network and the two-level progressive network, based on the NAMIC dataset. 4SR and 4PRO C SR denote MCSR results using the one-level non-progressive network and the two-level progressive network, respectively. PSNR, SSIM, and IFC scores are in blue, red, and green, respectively, below each subfigure. The heat maps show the absolute pixel-value differences between the corresponding results and the ground truth T2 weighted HR images.
                       4LR            4SR            4PRO U SR      4PRO C SR
IXI Dataset     SSIM   0.793±0.029    0.950±0.013    0.958±0.011    0.961±0.010
                PSNR   27.151±1.309   33.329±2.116   34.814±2.192   35.140±2.210
                IFC    1.627±0.150    2.708±0.215    3.129±0.270    3.354±0.303
NAMIC Dataset   SSIM   0.751±0.024    0.925±0.014    0.951±0.011    0.953±0.011
                PSNR   26.827±1.131   28.207±1.178   30.763±1.229   30.942±1.584
                IFC    1.626±0.120    1.923±0.113    2.352±0.118    2.385±0.120
TABLE III: Statistical Comparison of MCSR Results from the Non-progressive and Progressive Models (mean±std)

IV-E Comparison with State-of-the-Art Methods

We also compared our results with recently published state-of-the-art (SOTA) MCSR methods, including SSIP [manjon2010mri], SRGR [zheng2018multi], and Zeng's model [zeng2018simultaneous]. According to the results shown in Table IV, our one-level non-progressive model achieved the highest SSIM scores among all the methods, and for 4-fold up-sampling the progressive model produced the highest SSIM and PSNR scores among all the methods.

Up-sampling Factor    SSIP*    Zeng*    SRGR**    Non-progressive Model    Progressive Model
2             SSIM    0.924    0.945    0.959     0.974                    -
              PSNR    25.34    38.32    33.98     34.13                    -
3             SSIM    0.790    0.872    -         0.954                    -
              PSNR    21.08    33.76    -         31.56                    -
4             SSIM    0.663    0.811    -         0.925                    0.953
              PSNR    18.90    30.84    -         28.20                    30.94

  • * Results are from Table II in [zeng2018simultaneous].

  • ** Results are the average of the two MCSR results listed in Table I in [zheng2018multi].

TABLE IV: Comparison of Our Methods with Other SOTA MCSR Methods on the NAMIC Dataset

V Discussions

In this study, we have jointly used the adversarial loss, MSE loss, perceptual loss, and texture matching loss in the objective function. Table V summarizes the comparison of SR imaging results with different objective functions. The results obtained with all four loss components enjoy the highest SSIM score and the second highest PSNR score. Adding the texture matching loss decreases the PSNR score, which is in accordance with the results in [sajjadi2017enhancenet]. This can be explained by the fact that the MSE loss directly contributes to the PSNR score, and adding the texture matching loss dilutes the contribution of the MSE loss term. Although adding the texture matching loss lowers the PSNR score, the SR imaging results actually have better overall image quality; we support this point by showing that the highest SSIM score is obtained by using all four loss functions together, in line with the results presented in [sajjadi2017enhancenet].

Loss combination                              SSIM     PSNR
L_adv + L_MSE                                 0.973    37.899
L_adv + L_MSE + L_perc                        0.975    38.966
L_adv + L_MSE + L_perc + L_text (all four)    0.976    38.513
TABLE V: Comparison of SR Results on the IXI Dataset Using Different Combinations of Loss Functions in the Objective Function

To investigate whether the combination of different contrast images helps improve the SR results, and which information combination achieves the best results, we conducted four ablation studies. According to the ablation results shown in Figs. 2 and 5 and Table I, using only PD weighted HR images for image synthesis cannot generate good T2 weighted images, which indicates that PD weighted HR images alone cannot provide enough information or image priors for the neural network to reconstruct a high-quality T2 weighted image. Compared with the SISR method, the MCSR methods show superior results, suggesting that multi-contrast images provide many more image priors than single-contrast images for super-resolution imaging.

When comparing the results of the third ablation study with those of the fourth, the fourth study slightly outperforms the third. With respect to the different multi-contrast information combinations, the SR imaging results indicate that combining multi-contrast image information in a high-level feature space brings about better MCSR results than a low-level image space combination. These results may be appreciated from three aspects. First, adding the reference feature extraction network separates the feature extraction of the two contrast images, which allows the neural network to flexibly extract features based on each contrast's unique characteristics. Second, more parameters are used in the high-level combination strategy, suggesting that the neural network may have a stronger ability to extract multi-contrast features for SR imaging. Third, the high-level combination model only transfers features of T2 weighted images through the skip connections to the decoder. Such a design enhances the contribution of T2 weighted features in the image restoration process in the decoder and ensures that the SR images are restored with more T2 characteristics.

Results of the one-level non-progressive model in Fig. 6 and Table II show that our method produces MCSR results with clear edges and textures even if the original LR images are highly down-sampled with severe information loss. However, the image quality of the MCSR results decreases as the down-sampling factor increases, which is not surprising: more heavily down-sampled LR images contain fewer details, and it is more difficult for the neural network to recover details from such blurred images.

To obtain good SR results at high up-sampling factors, we proposed a two-level progressive network that achieves MCSR level by level. Compared with the 4-fold MCSR results obtained from the one-level non-progressive network, the proposed two-level progressive network shows significantly better results. According to Table IV, the progressive model can even achieve 4-fold up-sampling MCSR results close to the 3-fold results of the non-progressive model. The improvement of the progressive model is attributed to the strong constraint on model parameters. Parameters in the generator of the progressive network are strictly constrained during the training process by the objective function: at each level, the encoder-decoder network is trained to produce 2-fold up-sampling SR results. Compared with a loosely constrained training process, such as one that only requires the final output to be close to the ground truth with no constraint on intermediate results (4PRO U SR in Table III), such a strong constraint can boost the optimization process and ensure the parameters adapt to generate good results.

Our progressive model can be extended to MCSR tasks with larger up-sampling factors. For example, our current two-level progressive model can be extended into a three-level model for 8-fold up-sampling MCSR. We believe that for an 8-fold resolution improvement, more contrast mechanisms may be needed; for example, we may need to couple CT and MRI images together [wang2015vision]. We have not implemented such 8-fold MCSR because of the limitations of the dataset. Also, the size of the original HR images used in this study is 256×256; with 8-fold down-sampling, the obtained LR images would be too blurry.

VI Conclusion

In this paper, we have proposed a one-level non-progressive neural network for low-factor up-sampling (such as 2-fold) and a two-level progressive neural network for large-factor up-sampling (such as 4-fold). To achieve better results, multiple losses have been combined in a composite objective function. According to our results, MCSR images of high quality can be generated when multi-contrast information is combined in a high-level feature space. The proposed progressive network could be extended for MCSR at even larger up-sampling factors if additional contrast images are available.

References