A Multimodal Deep Network for the Reconstruction of T2W MR Images

by Antonio Falvo, et al.
Sapienza University of Rome

Multiple sclerosis (MS) is one of the most common chronic neurological diseases affecting the central nervous system. MS lesions can be observed through two magnetic resonance (MR) modalities, the T2W and FLAIR sequences, both of which provide useful information for formulating a diagnosis. However, long acquisition times make MR images vulnerable to motion artifacts, which creates the need to accelerate the MR analysis. In this paper, we present a deep learning method that reconstructs subsampled MR images, obtained by reducing the k-space data, while maintaining an image quality high enough to observe brain lesions. The proposed method exploits a multimodal neural network and also addresses the data acquisition and processing stages to reduce the execution time of the MR analysis. Results show the effectiveness of the proposed method in reconstructing subsampled MR images while saving execution time.



1 Introduction

Nuclear magnetic resonance (NMR) is a transmission analysis technique that provides information on the state of matter by exploiting the interaction between magnetic fields and atomic nuclei. In the biomedical field, the information derived from NMR is represented in the form of tomographic images. Nowadays, NMR plays an important role in healthcare and supports a wide range of diagnostic exams, from traditional to functional neuroradiology, and from internal diagnostics to obstetric and pediatric diagnostics [1].

During the acquisition of an MR signal, it is necessary to sample the entire k-space to obtain images that are as detailed as possible [4, 10]. Data in the k-space encode information on spatial frequencies and are generally captured line by line. Therefore, the acquisition time for a given sequence depends on the number of lines sampled in the k-space, which makes the acquisition process rather slow. Moreover, significant artifacts may occur in the MR images because of slow movements of the patient, due to physiological factors or to fatigue, e.g., after too much time in the same position [4, 10]. The long scan time also increases the healthcare cost for the patient and limits the availability of MR scanners.
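As a rough illustration of why the number of sampled lines drives the scan time, consider a conventional Cartesian sequence that acquires one phase-encoding line per repetition. The symbols and numbers below are illustrative assumptions, not values from this work: $N_{PE}$ is the number of phase-encoding lines, $T_R$ the repetition time, and $R$ the subsampling factor.

```latex
T_{acq} = N_{PE} \cdot T_R,
\qquad
T_{acq}^{sub} = \frac{N_{PE}}{R} \cdot T_R
```

For example, with $N_{PE} = 256$ and $T_R = 3$ s, a full acquisition takes $256 \cdot 3 = 768$ s (12.8 minutes), while keeping one line in four ($R = 4$) reduces this to $192$ s (3.2 minutes). Sequences that acquire several lines per repetition scale in the same way with the number of lines.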

Over the years, several methods, such as compressed sensing and parallel magnetic resonance imaging [11, 3, 12, 7], have been proposed to accelerate MRI scans by skipping k-space phase-encoding lines while avoiding the aliasing introduced by subsampling. The problem of accelerating magnetic resonance can also be tackled through deep learning techniques. In particular, the reconstruction of tomographic images has often been addressed efficiently by using convolutional neural networks (CNNs) [8, 13, 17, 16, 14].

Most state-of-the-art methods focus on the reconstruction of MR images using a unimodal neural architecture, where only the subsampled image to be reconstructed is provided as input. In this paper, we propose a new deep learning method for reconstructing MR images by exploiting additional information provided by FLAIR images. Such images are widely used in MR diagnosis, as they enhance the brain lesions caused by the disease. FLAIR images are highly correlated with T2-weighted images (T2WIs), thus the joint use of such images increases the efficiency of the reconstruction and also provides much more information in the lesion region. To exploit both images, we propose a multimodal deep neural network inspired by the well-known U-Net, a convolutional model developed for biomedical image segmentation [15]. In the literature, several studies have used a multimodal approach for image reconstruction. In [19], an attempt was made to estimate T2WIs from T1WIs, while other works focus on improving the quality of subsampled images with the help of high-resolution images with different contrast [6, 9]. However, to the best of our knowledge, no attempt has been made to reconstruct T2WIs from subsampled T2WIs (T2WIsub) and FLAIR images while maintaining high image quality in the area of lesions.

Experimental results show that the proposed method accelerates the MR analysis by a factor of four while preserving image quality, with a high level of detail in any lesion and negligible aliasing artifacts.

The rest of the paper is organized as follows. In Section 2, we introduce the proposed approach, including a new subsampling mask, while the proposed Multimodal Dense U-Net is presented in Section 3. Results are shown in Section 4 and, finally, our conclusion is drawn in Section 5.

2 Proposed Approach: Main Definitions

We first focus on the images to be provided as input in order to reconstruct the T2WI.

2.1 Problem Formulation

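We denote by $K_{T2}$ the k-space of the T2WI that represents the target. Multiplying this k-space element-wise by a suitably designed mask $M$, it is possible to obtain a subsampled version of the k-space, i.e.,

$$K_{T2sub} = M \odot K_{T2}.$$

The two-dimensional inverse Fourier transform $\mathcal{F}^{-1}$ maps the data into the spatial domain. Therefore, we define the fully-sampled target image $I_{T2} = \mathcal{F}^{-1}(K_{T2})$ and the subsampled T2 image $I_{T2sub} = \mathcal{F}^{-1}(K_{T2sub})$ to be used for reconstruction through the proposed deep network. Finally, we denote the FLAIR image to be provided as input by $I_{FLAIR}$.

We want to reconstruct the fully-sampled T2 image $I_{T2}$, given only the subsampled $I_{T2sub}$ and $I_{FLAIR}$. The reconstructed T2 image is denoted as $\hat{I}_{T2}$. To this end, we build and train a deep network to minimize the following loss function:

$$\mathcal{L} = \mathrm{MSE}(I_{T2}, \hat{I}_{T2}) + \mathrm{DSSIM}(I_{T2}, \hat{I}_{T2}),$$

in which the MSE denotes the mean-square error and DSSIM the structural dissimilarity index. The former is defined as:

$$\mathrm{MSE}(I_{T2}, \hat{I}_{T2}) = \frac{1}{N} \sum_{i=1}^{N} \left( I_{T2}(i) - \hat{I}_{T2}(i) \right)^2,$$

where $N$ is the number of pixels. On the other hand, the DSSIM is complementary to the structural similarity index (SSIM), which is often adopted to assess the perceived quality of television and film images as well as other types of digital images and videos. The SSIM was designed to improve on traditional measures such as the peak signal-to-noise ratio (PSNR) and the mean square error (MSE), and it is defined as:

$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$$

where $\mu_x$, $\mu_y$ represent the mean values, $\sigma_x^2$ and $\sigma_y^2$ the variances, and $\sigma_{xy}$ the covariance of the two images $x$ and $y$, while $c_1$ and $c_2$ are small constants that stabilize the division.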
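As a rough illustration of how the two loss terms behave, the following Python sketch computes the MSE and a simplified SSIM on flat lists of pixel intensities. Several choices here are assumptions rather than details given in the text: SSIM is normally computed over local windows and averaged, whereas this sketch uses whole-image statistics; the constants `k1` and `k2` are the commonly used defaults; DSSIM is taken as (1 - SSIM) / 2; and the two terms are summed without weights. All function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def mse(x, y):
    """Mean squared error between two equally sized flat pixel lists."""
    return mean((a - b) ** 2 for a, b in zip(x, y))


def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM from whole-image statistics (assumption: no sliding window)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean((a - mu_x) ** 2 for a in x)
    var_y = mean((b - mu_y) ** 2 for b in y)
    cov_xy = mean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def dssim(x, y):
    """Structural dissimilarity (assumption: (1 - SSIM) / 2)."""
    return (1.0 - ssim_global(x, y)) / 2.0


def reconstruction_loss(target, predicted):
    """Unweighted sum of MSE and DSSIM (the weighting is an assumption)."""
    return mse(target, predicted) + dssim(target, predicted)


# Toy intensities in [0, 1]; a perfect prediction gives a loss close to zero.
target = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
noisy = [0.1, 0.2, 0.3, 0.6, 0.9, 1.0]
print(reconstruction_loss(target, target))
print(reconstruction_loss(target, noisy))
```

The MSE term penalizes pixel-wise differences, while the DSSIM term penalizes differences in local structure, which is why the two are often combined for image reconstruction.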

2.2 Customization of a New Subsampling Mask

Most of the existing literature dealing with MRI acceleration focuses mainly on the reconstruction of images. However, the quality of the reconstruction depends significantly on how the k-space is sampled. This problem can essentially be addressed by adopting one of the following approaches: 1) a dynamic approach based on deep learning, in which cells of fixed width are allowed to move in the k-space and change position based on the reconstruction performance; 2) a static approach, in which fixed sampling masks select only certain areas of the k-space.

In this work, we choose the static approach, since the dynamic one does not guarantee that the power spectrum of a reference image is similar to that of the test image. The adopted sampling method consists of a mask that acts along the phase-encoding direction of the k-space. Once the subsampling factor is set, it is possible to choose the percentage of samples that will occupy the central part of the k-space, leaving the rest of the samples equidistant from each other.

Figure 1 shows two different types of mask, both obtained by setting a subsampling factor of 4. In the center mask of Fig. 1a, samples are taken exclusively in the central area of the k-space, where most of the low-frequency components are found, providing useful information on the contrast of the image [19]. In this work, we instead propose a new mask, depicted in Fig. 1b, which selects 80% of the total samples from the center and the remainder in an equidistant manner, so as to retain information in the high frequencies as well.

Figure 1: Subsampling masks: a) center mask and b) the proposed custom mask.
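The two masks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the helper names `build_mask` and `zero_filled` are hypothetical, the exact placement of the equidistant outer lines is a guess, and a 1D profile stands in for one image column along the phase-encoding direction. A naive discrete Fourier transform is used so that the example needs only the standard library.

```python
import cmath


def build_mask(n_lines, factor=4, center_fraction=0.8):
    """Return a 0/1 list that keeps n_lines // factor phase-encoding lines."""
    n_keep = n_lines // factor
    n_center = round(center_fraction * n_keep)
    start = (n_lines - n_center) // 2
    mask = [0] * n_lines
    for i in range(start, start + n_center):
        mask[i] = 1  # contiguous block around the k-space centre
    # Assumption: spread the remaining lines evenly over the outer indices.
    outer = [i for i in range(n_lines) if mask[i] == 0]
    n_outer = n_keep - n_center
    for j in range(n_outer):
        mask[outer[int((j + 0.5) * len(outer) / n_outer)]] = 1
    return mask


def dft(values, inverse=False):
    """Naive 1D discrete Fourier transform, O(n^2), fine for a toy example."""
    n = len(values)
    sign = 1 if inverse else -1
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        for k in range(n)
    ]


def swap_halves(values):
    """Move the zero-frequency sample to or from the centre (even length)."""
    half = len(values) // 2
    return values[half:] + values[:half]


def zero_filled(profile, mask):
    """Subsample the k-space of a 1D profile and transform it back."""
    kspace = swap_halves(dft(profile))
    kspace_sub = [m * k for m, k in zip(mask, kspace)]
    return [abs(v) for v in dft(swap_halves(kspace_sub), inverse=True)]


n = 64
profile = [1.0 if 20 <= i < 44 else 0.0 for i in range(n)]
center_mask = build_mask(n, factor=4, center_fraction=1.0)
custom_mask = build_mask(n, factor=4, center_fraction=0.8)
for name, mask in (("center", center_mask), ("custom", custom_mask)):
    recon = zero_filled(profile, mask)
    error = sum((a - b) ** 2 for a, b in zip(profile, recon)) / n
    print(name, "mask keeps", sum(mask), "lines, MSE =", round(error, 4))
```

Both masks keep the same number of lines, so they correspond to the same acceleration; they differ only in how many of those lines are moved from the center to the high-frequency region.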

3 Multimodal Dense U-Net

The proposed neural network is a multimodal architecture: on one branch we provide the T2WIsub as input, while on the other branch we provide the FLAIR image to be used to improve the reconstruction quality. We expect the spatial information in the FLAIR image to help estimate the anatomical structures in the T2WI.

Both inputs initially undergo separate contraction transformations and are then merged, after which the network follows the classic encoder-decoder approach of U-Net models. The proposed Multimodal Dense U-Net is depicted in Fig. 2.

The network consists essentially of four components, namely convolutional layers, pooling layers, deconvolutional layers and dense blocks. The size of the feature maps decreases along the contraction path through the pooling layers and increases along the expansion path through deconvolution. Pooling partitions the input into a set of square regions and, for each region, returns the maximum value as output. Its purpose is to progressively reduce the size of the representations, so as to reduce the number of parameters and the computational complexity of the network, while also counteracting overfitting.

Figure 2: Scheme of the proposed Multimodal Dense U-Net architecture.

Deconvolutional layers act inversely with respect to pooling and aim to increase the spatial dimensions of their inputs. This makes it possible to obtain output images of a size comparable to that of the network's input images. In the simplest case, these layers can be implemented as static upsampling with bilinear interpolation.
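The effect of these two operations on the size of a feature map can be sketched as follows. This is only an illustration of the static case mentioned above, not of a learned deconvolution; the function names are hypothetical, and the corner-aligned coordinate mapping used for the interpolation is one of several possible conventions.

```python
def max_pool(image, size=2):
    """Max pooling over non-overlapping size x size squares."""
    height, width = len(image), len(image[0])
    return [
        [max(image[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, width - size + 1, size)]
        for i in range(0, height - size + 1, size)
    ]


def upsample_bilinear(image, factor=2):
    """Static upsampling with bilinear interpolation (corners aligned)."""
    height, width = len(image), len(image[0])
    out_h, out_w = height * factor, width * factor
    output = []
    for i in range(out_h):
        y = i * (height - 1) / (out_h - 1)  # source row coordinate
        y0 = int(y)
        y1 = min(y0 + 1, height - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (width - 1) / (out_w - 1)  # source column coordinate
            x0 = int(x)
            x1 = min(x0 + 1, width - 1)
            fx = x - x0
            top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
            bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
            row.append((1 - fy) * top + fy * bottom)
        output.append(row)
    return output


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
pooled = max_pool(image)              # 4 x 4 -> 2 x 2
print(pooled)                         # [[6, 8], [14, 16]]
restored = upsample_bilinear(pooled)  # 2 x 2 -> 4 x 4
print(len(restored), len(restored[0]))
```

Pooling halves each spatial dimension and discards detail, and upsampling restores the size but not the detail, which is why the expansion path of a U-Net also relies on learned filters and skip connections.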

The dense block, proposed in [5], makes it possible to effectively increase the depth of the entire network while maintaining a low complexity. Moreover, it requires fewer parameters to be trained. The dense block consists of three consecutive operations: batch normalization (BN), exponential linear unit (ELU) activation functions [2] and convolution filters. The hyper-parameters of the dense block are the growth rate (GR) and the number of convolutional layers (NC). The network ends with a reconstruction stage consisting of a dense block followed by a convolutional layer that yields the reconstructed T2WI.
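For reference, the first two operations of the dense block have standard closed forms. They are given here in common notation rather than in notation taken from this work: $x$ is an input activation, $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, $\gamma$ and $\beta$ are learned scale and shift parameters, $\epsilon$ is a small constant, and $\alpha > 0$ is the ELU saturation parameter [2].

```latex
\mathrm{BN}(x) = \gamma \, \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta,
\qquad
\mathrm{ELU}(x) =
\begin{cases}
x, & x > 0 \\
\alpha \left( e^{x} - 1 \right), & x \le 0
\end{cases}
```

Unlike the ReLU, the ELU produces small negative outputs for negative inputs, which keeps the mean activation closer to zero.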

4 Experimental Results

4.1 Dataset and Network Setting

We test the proposed network on a dataset containing MRIs of multiple sclerosis patients [18]. In particular, the dataset covers 30 patients and contains axial 2D-T1W, 2D-T2W and 3D-FLAIR images. The final voxel size of such images is mm. In our work, further preprocessing was performed in MATLAB to make the voxel size isotropic to mm, to extract slices of size , and to rescale the intensity to the range . T2WIsub images were created by considering two types of masks, the center mask and the proposed custom mask, with a subsampling factor of 4.

The proposed Multimodal Dense U-Net has been implemented in Keras. In the training stage, for each patient we provide the network with 150 FLAIR and T2WIsub images, using the T2WIs as targets. For the dense blocks, we set a growth rate of zero and a number of layers equal to 5, with 64 feature maps and ELU activations. We use Adam as the optimizer for training. A total of 80 epochs are performed, with early stopping. Each epoch takes about 15 minutes with a batch size of 4 on a desktop PC with an Intel Core i5-6600K 3.50 GHz CPU, 16 GB of RAM and an NVIDIA GeForce GTX 970 GPU. To quantitatively evaluate reconstruction performance, we use the MSE and DSSIM metrics.

Figure 3: Predicted images using: a) center mask and b) the proposed custom mask.

4.2 Evaluation of the Proposed Mask

We first evaluate the effect of the proposed custom subsampling mask, compared to the center mask, on the quality of reconstruction in terms of SSIM, using a Dense U-Net network.

Results are shown in Fig. 3, where it is clear that the proposed custom mask yields a markedly better reconstruction of the image. In particular, using the center mask we obtain an SSIM of 71% with respect to the target (Fig. 3a), while using the proposed custom mask (Fig. 3b) the similarity index rises to 88%.

4.3 Evaluation of the Proposed Deep Architecture

Conceptually, the standard Dense U-Net and the proposed network might seem similar, but the proposed architecture handles the two inputs differently from the standard Dense U-Net. Moreover, the hyper-parameters chosen for the dense blocks of our network considerably change the concept of dense block, since the growth rate was set to zero, thus avoiding internal expansion in the dense blocks.

We compare the reconstruction quality of the two networks in terms of SSIM, using the mask that provided the best subsampling performance, i.e., the proposed custom mask. Results are shown in Fig. 4, where it is clear that the quality of reconstruction is considerably improved compared to the Dense U-Net. In particular, the degree of similarity with respect to the target is 94%, against 88% for the Dense U-Net. With the proposed architecture, high image quality is achieved, thus enabling the recognition of brain lesions caused by the disease. We also show the loss function behavior for the proposed method in Fig. 5.

Figure 4: Predicted T2WI reconstructed by the proposed Multimodal Dense U-Net.
Figure 5: Loss function behavior.

5 Conclusion

In this work, we propose a deep learning model exploiting the capabilities of both multimodal networks and dense blocks. In particular, the proposed approach reconstructs T2WIs subsampled by a factor of 4 by leveraging their correlation with FLAIR images. At the same time, the proposed method maintains a high quality of image reconstruction, in particular in the area of the brain lesions due to multiple sclerosis. The comparison with a state-of-the-art Dense U-Net architecture has shown that the proposed network outperforms it both in terms of perceptual quality and in terms of execution time. Future work will focus on increasing the speed of the MRI scan, with the goal of achieving an acceleration of at least 10 times, and on further improving the reconstruction quality.


  • [1] Beall, P.T., Amtey, S.R., Kasturi, S.R.: NMR Data Handbook for Biomedical Applications. Pergamon Books Inc., Elmsford, NY (1984)
  • [2] Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: International Conference on Learning Representations (ICLR). pp. 1–14. San Juan, Puerto Rico (May 2016)
  • [3] Gamper, U., Boesiger, P., Kozerke, S.: Compressed sensing in dynamic MRI. Magnetic Resonance in Medicine 59(2), 365–373 (Feb 2008)
  • [4] Haacke, E.M., Brown, R.W., Thompson, M.R., Venkatesan, R.: Magnetic Resonance Imaging: Physical Principles and Sequence Design, vol. 82. Wiley-Liss, New York, NY (1999)
  • [5] Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2261–2269. Honolulu, HI (Jul 2017)
  • [6] Huang, J., Chen, C., Axel, L.: Fast multi-contrast MRI reconstruction. Magnetic Resonance Imaging 32(10), 1344–1352 (Dec 2014)
  • [7] Jaspan, O.N., Fleysher, R., Lipton, M.L.: Compressed sensing MRI: A review of the clinical literature. The British Journal of Radiology 88(1056) (Dec 2015)
  • [8] Jin, K.H., McCann, M.T., Froustey, E., Unser, M.: Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26(9), 4509–4522 (Sep 2017)
  • [9] Kim, K.H., Do, W.J., Park, S.H.: Improving resolution of MR images with an adversarial network incorporating images with different contrast. Medical Physics 45(7), 3120–3131 (Jul 2018)
  • [10] Liang, Z.P., Lauterbur, P.C.: Principles of Magnetic Resonance Imaging: A Signal Processing Perspective. The Institute of Electrical and Electronics Engineers, New York, NY (2000)
  • [11] Lustig, M., Donoho, D., Pauly, J.M.: Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine 58, 1182–1195 (Oct 2007)
  • [12] Lustig, M., Donoho, D.L., Santos, J.M., Pauly, J.M.: Compressed sensing MRI. IEEE Signal Processing Magazine 25(2), 72–82 (Mar 2008)
  • [13] McCann, M.T., Jin, K.H., Unser, M.: Convolutional neural networks for inverse problems in imaging: A review. IEEE Signal Processing Magazine 34(6), 85–95 (Nov 2017)
  • [14] Qin, C., Schlemper, J., Caballero, J., Price, A.N., Hajnal, J.V., Rueckert, D.: Convolutional recurrent neural networks for dynamic MR image reconstruction. IEEE Transactions on Medical Imaging 38(1), 280–290 (Jan 2019)
  • [15] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Cham (2015)
  • [16] Roy, S., Butman, J.A., Reich, D.S., Calabresi, P.A., Pham, D.L.: Multiple sclerosis lesion segmentation from brain MRI via fully convolutional neural networks. arXiv preprint arXiv:1803.09172v1 (Mar 2018)
  • [17] Schlemper, J., Caballero, J., Hajnal, J.V., Price, A.N., Rueckert, D.: A deep cascade of convolutional neural networks for dynamic MR image reconstruction. IEEE Transactions on Medical Imaging 37(2), 491–503 (Feb 2018)
  • [18] Lesjak, Ž., Galimzianova, A., Koren, A., Lukin, M., Pernuš, F., Likar, B., Špiclin, Ž.: A novel public MR image dataset of multiple sclerosis patients with lesion segmentations based on multi-rater consensus. Neuroinformatics 16(1), 51–63 (Jan 2018)
  • [19] Xiang, L., Chen, Y., Chang, W., Zhan, Y., Lin, W., Wang, Q., Shen, D.: Deep learning based multi-modal fusion for fast MR reconstruction. IEEE Transactions on Biomedical Engineering (Early Access) (2018)