stylish-fastmri
None
view repo
Adaptive intelligence aims at empowering machine learning techniques with the additional use of domain knowledge. In this work, we present the application of adaptive intelligence to accelerate MR acquisition. Starting from undersampled k-space data, an iterative learning-based reconstruction scheme inspired by compressed sensing theory is used to reconstruct the images. We adopt deep neural networks to refine and correct prior reconstruction assumptions given the training data. The network was trained and tested on a knee MRI dataset from the 2019 fastMRI challenge organized by Facebook AI Research and NYU Langone Health. All submissions to the challenge were initially ranked based on similarity with a known groundtruth, after which the top 4 submissions were evaluated radiologically. Our method was evaluated by the fastMRI organizers on an independent challenge dataset. It ranked #1, shared #1, and #3 on respectively the 8x accelerated multi-coil, the 4x multi-coil, and the 4x single-coil track. This demonstrates the superior performance and wide applicability of the method.
READ FULL TEXT VIEW PDFNone
Magnetic Resonance Imaging (MRI) is a widely applied non-invasive imaging modality, with excellent soft tissue contrast and high spatial resolution. Unlike Computed Tomography (CT) scanning, MRI does not expose patients to any ionizing radiation, making it a favourable modality compared to CT. MR images are essential for clinical assessment of soft tissue as well as functional and structural measurements, which leads to early detection and diagnosis of many diseases. However, MRI is relatively slow compared to other imaging modalities. The total examination time can vary from 15 minutes for knee imaging to an hour or more for cardiac imaging. Remaining motionless for this long in a confined space is challenging, especially for children, elderly and claustrophobic patients. Motion artifacts are difficult to correct, which may require a complete re-scan [1]. Furthermore, the acquisition time affects the temporal resolution and subsequently limits the potential of MRI for dynamic imaging of the abdomen [2] and the heart [3], where high temporal resolution and robustness against motion are critical for diagnosis. Moreover, the relatively long scan times lead to high costs that limit the availability of MRI scanners [4]. Therefore, fast acquisition and reconstruction are crucial to improve the performance of current MR scanners, which led in recent years to the development of techniques such as parallel reception, compressed sensing and multi-band accelerations. However, there is still need for further scan acceleration.
The long acquisition time is intrinsic to the scanner and physics properties of MRI. For the majority of scans performed in clinical practice, this acquisition is done through consecutive reading-out of single k-lines. These readouts are constrained by physical limitations of the hardware, the contrast generating principle, and human physiology. The scanning time could be shortened by reducing the number of acquired k-lines, i.e. by undersampling the 2D or 3D k-space. However, this could violate the Nyquist criterion, resulting in aliasing and blurriness in the reconstructed images, rendering them unqualified for clinical purpose. Compressed Sensing (CS) and Parallel Imaging are the most common solutions for acceleration by undersampling, while maintaining image quality. Compressed Sensing, the focus of this paper, introduced by Donoho [5] and Lustig [6], leverages the fact that MR images can be compressed in some domain, restoring the missing k-space data through an iterative reconstruction algorithm [7]. Parallel Imaging focuses on exploiting measurements by multiple receive coils to reduce the aliasing artifacts [8].
When CS is used to accelerate MR acquisitions, k-spaces are sampled pseudo-randomly and the image is subsequently reconstructed by promoting a sparse solution. In the optimal setting, the reconstructed image will be identical to the Fourier transform of the full k-space and have a limited number of large coefficients when transformed to the sparse domain. Equation (
1) shows the optimization function that describes the CS algorithm:(1) |
where is the reconstructed image, is the fully measured k-space data, is the Fourier transform, (mask) is the undersampling operation, represents the sparsity transform coefficients, and is the regularization parameter. The norm is used to enforce sparsity of the solution in a domain specified by the transformation . The norm is used as a similarity measure between the measured k-space data and the reconstructed k-space , called the ’data consistency’ term. Note that, in case of multi-coil acquisitions, the data consistency term is given by:
(2) |
where denotes the coil element and the coil sensitivity map. To simplify notation, without loss of generality, the single-coil data consistency term will be used throughout this paper.
Recently, deep learning has shown promising results for speeding up MR reconstruction algorithms using Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN). In contrast to iteratively solving optimization problems, deep learning offers a solution for reconstructing highly-accelerated scans by adopting learnable reconstruction schemes.
The literature of deep learning-based reconstruction algorithms can be divided into two categories [9]. First, data-driven approaches, where a neural network is trained to find the optimal transformation from the zero-filled k-space to the desired reconstruction. Here, the network is completely dependent on the underlying training dataset without any task specific prior-knowledge on the domain; following are selected exemplar algorithms of this approach. Quan et al. [10] developed a GAN network for MR reconstruction starting from undersampled data. Their network consists of two consecutive networks, one for reconstruction and one for refining the results. They used a cyclic data consistency term alongside the WGAN loss. Mardani et al. [11] developed a GAN network for CS. The proposed network corrects aliasing artifacts of MR images. Akcakaya et al. [12] trained a neural network on patient-specific auto-calibration MR signals. The network was trained on phantom data and tested on cardiac MR scans. AUTOMAP [13] reports good reconstruction results with an architecture that learns to directly transform k-space into image data. Lee et al. [14] introduced two separate deep residual networks for magnitude and phase. The proposed networks successfully reconstructed images even when obtained with high undersampling factors.
Second, hybrid approaches are presented in the literature. This class of algorithms builds on top of existing reconstruction solutions and integrate learning-based approaches to substitute part of the original computations. An example of an existing reconstruction algorithm is the Iterative Shrinkage-Thresholding Algorithm (ISTA) [15]
, which iteratively updates an estimate of the reconstruction by applying a filtering operation in a sparsified transformation of the reconstruction. Recently, Zhang and Ghanem
[16]developed a deep learning approach called ISTA-Net that mimics the conventional ISTA algorithm, but enriches it by replacing the sparsifying transform and the thresholding with learned operations. The resulting network does not implement a fully iterative algorithm, but it simulates it by adopting a fixed number of iterations, effectively enabling the implementation of a deep neural network that can be trained by the backpropagation algorithm.
Inspired by the work of Zhang and Ghanem [16], in this paper we propose a deep-learning based solution, Adaptive-CS-Network, that mimics the ISTA algorithm, but introduces strong prior information, i.e., inductive biases, to better constrain the reconstruction problem. The main contributions of this work are: i) we propose a novel CNN network that integrates and enhances the conventional CS approach; ii) it integrates multiscale sparsification, inspired by wavelet transforms, but in a learnable manner; iii) we adopt domain-specific knowledge, such as data consistency, a prior on known phase behavior, and the location of the background: these computations cannot be easily learned by a CNN; iv) the proposed model exploits the correlation between neighbouring slices by adopting a 2.5D learning approach. In addition, we propose a hierarchical training strategy that leverages the available data. We conducted extensive experiments to investigate the performance of the network, and show that domain specific information is crucial for reconstructing high-quality MR images. The proposed network showed superior performance by winning one, and co-winning a second track out of the three tracks of the fastMRI challenge [17].
The fastMRI challenge is a challenge organized by Facebook AI Research and NYU Langone Health [17]. The aim of the challenge is to advance and encourage AI-based research in MR reconstruction in order to allow acceleration of the acquisition and, subsequently, to reduce the examination time. The challenge is divided in three tracks: 4x single-coil, 4x multi-coil, and 8x multi-coil accelerations. Eight teams participated in the multi-coil track and 17 teams in the single-coil track [18].
The challenge organizers released a large-scale dataset of raw MR data of the knee [19]. The data was acquired with a 2D protocol in the coronal direction with a 15 channel knee coil array using Siemens MR machines at two different field strengths: 1.5T and 3T [17]. The data was acquired using two pulse sequences: a proton density weighting with (PDFS) and without (PD) fat suppression. The data is divided approximately equally between these pulse sequences. The pixel size is 0.5 mm 0.5 mm with a slice thickness of 3 mm.
The dataset is divided in 4 categories: training (973 volumes, 34,742 slices), validation (199 volumes, 7,135 slices), test (118 volumes, 4,092 slices), and challenge (104 volumes, 3,810 slices). These numbers are the same for multi-coil and single-coil data, with the exception of the test and challenge categories, where single-coil data has respectively 10 and 12 volumes less than the multi-coil data. The training, validation and test sets were publicly available since late November, 2018, while the challenge set was available since September 2019. The full k-space was available for all the datasets except for the test and challenge sets. Training and validation sets were considered for training and optimizing our model, while the test set was used for evaluating model performance on a public leaderboard. The final model was evaluated by the organizers on the independent challenge set.
The k-space data provided in the challenge were undersampled using a Cartesian mask, where k-space lines are set to zero in the phase encoding direction. The sampling density is dependent on the acceleration rate (4x or 8x). All the masks fully sample the low-frequency area of k-space. For the 4x accelerated scans, it corresponded to 8% of the total k-space lines, while it was 4% for 8x acceleration.
In order to measure the accuracy of the reconstructed volumes compared to the target volumes , the following metrics were considered:
measures the square of the Euclidean norm between a pair of images:
(3) |
the ratio between the maximum intensity and the underlying distortion noise:
(4) |
measures image similarity using human perception aspects
[20]. SSIM is calculated by measuring three image distortions including luminance , contrast and structure :(5) |
where are the distortion weights, here chosen as 1.
We submitted the reconstructions on the challenge dataset via an online form, which were then evaluated independently by the fastMRI organizers, described in detail by Knoll et al. [18]. All submissions were ranked by the SSIM metric, after which only the 4 highest ranking submissions were evaluated by a panel of 7 radiologists. The panel was asked to evaluate the reconstructions on a scale from 1 to 5 on four different categories, where 1 is the best and 5 is the worst. The 4 categories were the rating of artifacts, reconstruction sharpness, perceived contrast-to-noise ratio and diagnostic confidence. The radiological scores were subsequently averaged and translated to a final ranking.
In this section we present the background of our solution, first by introducing the Iterative Shrinkage-Thresholding Algorithm (ISTA) [15] and, second, by introducing its deep learning-based variant, ISTA-Net [16]. Then, we present our solution, the Adaptive-CS-Network, that builds on top of the ISTA-Net framework by introducing several improvements, including strong inductive biases derived from domain knowledge on the reconstruction problem.
ISTA is an optimization algorithm to solve (1) in an iterative fashion, starting from the reconstruction , which is often obtained by reconstructing the zero-filled undersampled k-space. The initial estimate is refined using the following update rules:
(6) | ||||
(7) |
where is an update of the estimate , where the error in the measured data is corrected by a step . Equation (7) is a special case of the proximal mapping, with a regularization weight , and a crucial step for optimization algorithms such as ISTA, ADMM and AMP. When is a wavelet transform , it can be proven that
(8) |
where is the soft-tresholding operator defined as . In general, solving (7) is not straightforward for non-linear operators
, limiting the applicability of the ISTA framework to simple transforms. Another problem of this family of algorithms, is the difficulty of tuning the hyperparameters
and in addition to its slow convergence, hence requiring a lot of iterations to achieve the optimal solution of (1).Recently, Zhang and Ghanem introduced a deep-learning approach to overcome the limitations of the ISTA framework for image-to-image reconstruction. Their solution, called ISTA-Net [16], replaces the handcrafted transform with a learned operator
, which consists of a 2D learnable convolution followed by a rectified linear unit (ReLU) and a second convolution. By replacing
with in (7), we can rewrite the update rule as(9) |
and, by defining as the inverse of , i.e., , Zhang and Ghanem propose to update (8) as follows:
(10) |
where has a similar architecture as .
The model is trained end-to-end, where the iterations of the ISTA algorithm are “unrolled”, i.e., a number of identical reconstruction blocks are created. Note that in the ISTA-Net approach, the learnable parameters are shared among all the blocks in the unrolled network, unlike our solution. The training loss is defined as a combination of the reconstruction and discrepancy loss:
(11) | ||||
(12) | ||||
(13) |
The reconstruction loss encodes the need for the final reconstruction, defined as , to be as close as possible in the least squares sense to the ground-truth image, i.e., . The discrepancy loss stimulates that . The parameter allows to control the weight given to the discrepancy loss, and it is chosen to be arbitrarily small, e.g., . An extension, called ISTA-Net is also presented by the authors, where residual computations are adopted.
Starting from the network developed by Zhang and Ghanem, we developed the Adaptive-CS-Network approach. Our solution builds on top of the ISTA-Net solution based on three key innovations, here ordered by importance to the final network performance: i) the use of multi-scale and ii) multi-slice computations, together with iii) the introduction of soft MRI priors. We present them independently, building towards the update rule of the Adaptive-CS-Network model as presented in (15). Fig. 1 illustrates the proposed network.
First, many non-learned CS algorithms make use of multi-scale transforms to sparsify the signal. An example is given in (8), where is a wavelet transform; a decomposition of the signal into a set of basis functions at different scales. We include this inductive bias in our design, and adopt a multi-scale transform , and its inverse . As an additional design choice, we decide to sparsify and learn only the residual, therefore our update rule is written as follows:
(14) |
where
comprises of 2D convolutions, since the acquisition protocol was 2D, and LeakyReLU operations. To generate a multiscale representation, a max-pooling layer is used and the resulting features are then processed again by convolutional blocks and non-linearities. The exact design of
is presented in Fig. 1. The feature maps produced at the different scales are then thresholded using the soft-max function. Differently from ISTA-Net, we learn a lambda parameter and feature channel for each scale . This approach gives the network the flexibility of tuning the thresholds independently, hence reducing the complexity of the transforms learned by the convolutional operators. Finally, the filtered channels are transformed back into the image domain by the inverse, consisting of interpolation, 2D convolutions and Leaky-ReLu operators. Note that, contrary to the latest literature in deep learning networks, we decided not to adopt strided convolutions for sub- and up-sampling, which would increase the risk of creating checkerboard artifacts
[21]; instead we took the more conservative approach of adopting pooling and interpolation layers for achieving better image quality. Overall, the computation represented by is implemented with a UNet-like architecture [22], where the feature maps before the skip connections are filtered according to the parameter .Second, it is important to note that the slice thickness of the dataset is much higher than the in-plane resolution. This indicates that inter-slice correlations are less useful for finer scales, and potentially damaging as the network could see the extra slices as noisy information. However, such information can be beneficial at coarser scales. Since our transform is multi-scale by nature, we found it beneficial to inject neighboring slices
into the model. To reduce the memory footprint of the model, we adopted a 2.5D convolution approach by concatenating neighbouring slices into the input tensor along the channel dimension, enabling to “reinvest” the saved GPU memory as compared to a truly 3D convolution approach, into more unrolled iterations. More details on the number of slices used and the definition of the loss function are given in Section
IV-C.Finally, we adopted a hybrid- or nudge- approach to incorporate additional prior knowledge into the reconstruction algorithm. We therefore computed additional information derived from the current estimate together with k-space . These soft priors, which are presented in the next section, capture some properties of an MR image that cannot be easily learned by a deep neural network due to the limited size of the receptive field. The priors come in the form of images, and are provided as extra input channel to the transform . In this way, they are integrated in the computations performed by whenever this is beneficial for the optimization of the loss function.
The overall update for a block in the Adaptive-CS-Network model is defined as follows:
(15) |
Each block in the network learns different transforms and , enabling each block to focus on different properties of the image and effectively increasing the network capacity.
In our final design, the transform does not receive the data consistent image , as defined in (6), but rather the current estimate together with the data consistency prior computed as follows:
(16) |
This “soft data-consistency” update allows the network to evaluate the reliability of the acquired data and potentially compensate errors in the coil combination defined by in (1).
The second prior we provided to the network, , represents the known phase response for spin-echo MR sequences. Theoretically, spin-echo sequences have zero phase everywhere in the image. In practice, however, slowly varying phase will occur, i.e. nonzero phase only in the low frequencies, due to hardware and acquisition imperfections. Taking this into account, it is noted that the final reconstructed image should be a real valued image after removal of the slowly varying phase. This information is captured in the following prior:
(17) |
where denotes the complex conjugate, and refers to low pass filtering. The low pass filter is chosen such that it corresponds to the center part of k-space which is fully sampled. By doing so, the low pass filtered image can be derived beforehand only once, hence is replaced by .
Finally, we adopt a simple approach to estimate the location in where the background is found, which is common in parallel imaging techniques. The following prior is applied:
(18) |
This prior will penalize estimated signal content where is low, i.e., within the background. Again, is replaced by . Because is based on the fully measured centeral part of k-space, the image is artefact free albeit at low spatial resolution, leading to a reliable background identification.
In Fig. 1 the design of the Adaptive-CS-Network is shown, including the multi-scale transforms, the multi-slice computation and the priors provided as input. Note how the spin-echo and background priors are computed only for the central slice, in order to save GPU memory.
We implemented our models in PyTorch
[23]. All the optimization experiments were performed on NVIDIA GeForce GTX 1080 GPUs with 16 GB RAM and the final network was trained on two NVIDIA V100 GPU with 16 GB RAM. In order to run as many experiments as possible given the challenge deadline, model optimization (see Section IV) was done with a relatively small model (10 blocks), which we trained for 20 epochs. All the optimization networks were trained and validated on the highest acceleration rate of the challenge, i.e. 8x and for single-coil data, except for the number of the blocks which was performed for both 4x and 8x, and for the priors which are more relevant for the multi-coil data. Since the ground truth for the test set was not available, all the quantitative comparisons were only done on the validation set.
For the challenge, we trained the final model using the training and validation datasets for 25 epochs and accelerations randomly selected from 2x to 10x. The model was subsequently fine-tuned on eight data sub-populations identified by the acceleration (4x and 8x), the protocol (PD and PDFS) and the scanner field strength (1.5T and 3T). Fine-tuning was then performed for 10 epochs on the sub-populations. This procedure was performed independently for the single- and multi-coil datasets, resulting in a total of 8 models. All models were trained using an exponentially decaying learning rate of . The final models had 33M trainable parameters each.
In this section we present how we optimized the network configuration, on a smaller model with reconstruction blocks, using the quantitative measures reported in Section II-B for validation. We performed experiments on the number of the blocks, the loss functions, the influence of using adjacent slices, the optimizer, and the soft priors. A repeated measure one-way ANOVA test was performed on the SSIM values using a significance level of . P-values are only stated for the comparisons between the best method and the other methods.
The proposed model consists of multiple blocks, related to the number of unrolled iterations of the ISTA scheme. Increasing the number of blocks leads to an increase in the number of parameters of the model, and subsequently training time and GPU memory usage as well as an increase in risk of overfitting. In this experiment we investigated the effect of the number of the blocks on the quality of reconstructed images. Tests were ran with the 2D network for 4x and 8x acceleration rates without neighboring slices, MSE as loss function, RMSprop as optimizer, and with the Unet-like architecture of 16 filter maps for each convolutional layer. Fig.
2 reports the relative changes to a single block of our quantitative metrics. Based on the experiments, increasing the number of the blocks will improve the performance of the network. Therefore, the final network was configured with the maximum number of blocks that could be fitted into GPU memory: 25 blocks. However, for the optimization experiments below only 10 blocks were employed to limit the duration of the training.In this experiment we investigated the effect of a wide range of loss functions on the performance of our network. Here, we used the single slice reconstruction network with only 10 blocks, RMSprop as the optimizer, and 16 filter maps for each convolutional layer. The models were trained for 20 epochs to ensure convergence of the model. The evaluated loss functions included MSE, perceptual loss (PL) [24], , Huber [25] and multi-scale structural similarity index (MSSIM) [26]. The PL loss function was calculated using a pre-trained VGG-16 at layers relu12, relu22, and relu33.
Loss function | SSIM | NMSE | PSNR | -value |
---|---|---|---|---|
MSE | 0.6570.149 | 0.0460.029 | 30.22.8 | 0.001 |
Perceptual loss | 0.6640.157 | 0.0610.044 | 29,23.2 | 0.001 |
Huber | 0.6640.148 | 0.0620.041 | 29.13.0 | 0.001 |
0.6640.148 | 0.0620.041 | 29.13.0 | 0.001 | |
SSIM | 0.6620.145 | 0.0650.041 | 28.92.8 | 0.001 |
MSSIM [26] | 0.6710.143 | 0.0500.034 | 30.13.1 | 0.001 |
Eq. (20) | 0.6730.143 | 0.0480.033 | 30.33.1 |
MSSIM [26] builds upon SSIM (see Section II-B3) by incorporating structural similarity at multiple image resolutions, thereby supplying more flexibility compared to SSIM, and is defined as follows:
(19) |
where denote the reconstructed and target images respectively, is the number of scales used, , c and s are the luminance, contrast, and structure as defined in [20], , , and are the weights of the distortion factors at different resolution levels. We adopted the same weights as reported in [26].
Zhao et al. [27] reported that a linear combination of SSIM and preserves the different properties of an image better than each separately: SSIM encourages the network to preserve structural and contrast information, while enforces sensitivity to uniform biases to preserve luminance [28]. Since MSSIM reached higher metric values than SSIM (see Table I), we deployed a weighted summation of MSSIM [26] and :
(20) |
where was chosen, following Zhao et al. [27].
Table I reports the quantitative results for the different loss functions. The weighted linear combination of MSSIM and yielded the best results, where the -values indicate that the improvement was highly consistent, albeit with small improvements on SSIM-values. Fig. 3 shows two example results for the different loss functions, confirming the favorable results for the model trained using a combination of MSSIM and . Therefore, this loss function was selected for training the final model. For the remainder of the experiments, MSSIM is used as loss function.
Network | SSIM | NMSE | PSNR | -value |
---|---|---|---|---|
2D | 0.6710.143 | 0.0500.034 | 30.13.1 | 0.001 |
2.5D W0.1 | 0.5490.128 | 0.0890.034 | 26.82.1 | 0.001 |
2.5D W0.2 | 0.5480.128 | 0.0900.033 | 26.82.1 | 0.001 |
2.5D | 0.6740.143 | 0.0480.033 | 30.33.1 |
The resolution of the images in the dataset is anisotropic with a voxel size of . Due to the correlation between adjacent slices with respect to anatomical structures in MRI images, we performed an experiment to assess whether inclusion of neighbouring slices into the reconstruction might improve the performance. We compared the 2D scheme using only the center slice with three alternative 2.5D schemes: i) the neighboring slices were used together with the center slice as input, but only the center slice was used in the loss function (network 2.5D); ii) and iii) the neighboring slices are also used in the loss, with different weights (0.1 vs 0.2 for the neighbors; 1.0 for the center slice). MSSIM was used for the loss function, 10 blocks, RMSprop as the optimizer, and 16 feature maps.
Table II shows the results of this experiment, showing that the 2.5D schema very consistently improves over the 2D scheme, and that the loss should only be defined on the center slice. For the final model, this scheme was selected.
We experimented with different optimizers including RMSprop, rectified Adam (RAdam) [29], LookAhead [30] and Ranger [31]. RAdam exploits a dynamic rectifier to adjust the adaptive momentum of Adam [32]. LookAhead not only uses an adaptive learning rate and accelerated schemes but also iteratively updates two sets of weights, i.e. fast and slow weights. Ranger combines Radam and LookAhead optimizers into a single one. We used the 2D network with 10 blocks and 16 feature maps for each layer, and MSSIM the loss function.
Table III tabulates the results for the different optimizers. Since the best results were obtained for the RAdam optimizer, very consistently improving over the other optimizers, this was used for the final network.
Optimizer | SSIM | NMSE | PSNR | -value |
---|---|---|---|---|
RMSprop | 0.6730.143 | 0.0480.033 | 30.33.1 | 0.001 |
LookAhead | 0.6680.140 | 0.0500.032 | 30.02.9 | 0.001 |
Ranger | 0.6680.140 | 0.0500.032 | 30.02.9 | 0.001 |
RAdam | 0.6740.141 | 0.0480.032 | 30.33.0 |
In this experiment, we compare the proposed model to ISTA-Net [16]. For this experiment, a 2D network with 10 blocks and 16 feature maps per layer was used, SSIM as loss function, and RAdam as the optimizer. Since ISTA-Net uses a much smaller single scale architecture with much fewer network parameters, we added an experiment increasing the feature maps for ISTA-Net such that the number of parameters was the same as for our architecture. According to the results reported in Table IV, the proposed model outperforms ISTA-Net significantly.
Model | SSIM | NMSE | PSNR | -value |
---|---|---|---|---|
ISTA-Net | 0.5470.117 | 0.1690.022 | 23.81.9 | 0.001 |
ISTA-Net-L | 0.5430.119 | 0.1030.038 | 26.22.0 | 0.001 |
A-CS-Net | 0.6710.143 | 0.0500.034 | 30.13.1 |
To assess the contribution of the additional soft priors, we compared the full model against a version without known phase behaviour and without background information . Visually, we observed only small differences. To verify the differences in a realistic setting, we submitted the results to the public leaderboard of the fastMRI challenge. As shown in Table V, the network with all priors performed better in terms of the SSIM metric, although the results worsened in terms of NMSE and PSNR. Despite the fact that the improvement was minimal, we decided to adopt all priors for the final model to ensure our participation in the last challenge phase, since the selection was based on SSIM.
Acceleration | prior | SSIM | NMSE | PSNR |
---|---|---|---|---|
4x | 0.772 | 0.025 | 30.98 | |
0.773 | 0.028 | 33.49 | ||
8x | 0.674 | 0.038 | 30.90 | |
0.675 | 0.044 | 30.27 |
In this section, we describe the configuration of the submitted model [33] and analyze the resulting reconstructions. The final performance is evaluated with the quantitative metrics on the test and challenge datasets, and by presenting the radiological scores for the challenge dataset as performed by the fastMRI challenge organizers.
Following our model optimization study, the configuration of the final model was determined as follows. The linear combination of MSSIM and (20) was chosen as the loss function. The 2.5D scheme was chosen with two neighboring slices, with the loss applied only on the central slice. For training the model, the RAdam optimizer was deployed. Fig. 4 shows the structure of the final network. Each block is determined by three parameters for the denoiser: 1) the number of scales for the denoiser , 2) the kernel size used in the convolutions and, 3) the number of feature maps in the first convolutional layer, which is then doubled at each scale. According to the experiments presented in Fig. 2, the number of reconstruction blocks greatly affects the reconstruction performance, empirically observing that performance still improves when 15 blocks are used. The available GPU memory is a limiting factor when designing a deep neural network. To allow for a large number of blocks, we chose a different design in each block, mixing a less powerful design (16 filters) with more powerful ones (64 filters). By adopting this strategy, our final design contained 25 reconstruction blocks.
Dataset | Coil | Detail | SSIM | NMSE | PSNR | |
Test | multi | 4x | ALL | 0.928 | 0.005 | 39.9 |
PD | 0.961 | 0.002 | 41.7 | |||
PDFS | 0.891 | 0.009 | 37.9 | |||
8x | ALL | 0.888 | 0.009 | 36.8 | ||
PD | 0.937 | 0.005 | 38.5 | |||
PDFS | 0.843 | 0.013 | 35.3 | |||
single | 4x | ALL | 0.777 | 0.027 | 33.7 | |
PD | 0.877 | 0.010 | 36.9 | |||
PDFS | 0.685 | 0.043 | 30.7 | |||
8x | ALL | 0.680 | 0.042 | 30.5 | ||
PD | 0.777 | 0.019 | 32.4 | |||
PDFS | 0.575 | 0.067 | 28.5 | |||
Challenge | multi | 4x | ALL | 0.927 | 0.005 | 39.9 |
8x | ALL | 0.901 | 0.009 | 37.4 | ||
single | 4x | ALL | 0.751 | 0.030 | 32.7 |
Fig. 5 shows example results of the final network for the multi-coil track from the validation dataset. Fig. 6 shows examples from the test and challenge datasets. Table VI shows the SSIM, NMSE, and PSNR values for the test and challenge set (as described in Section II-A), for the images with and without fat suppression and both combined, for both single- and multi-coil MRI scans. For the radiological evaluation, our method scored 2.285, 1.286, and 2.714 for multi-coil 4x, multi-coil 8x, and single-coil 4x, respectively (the closer to 1, the better). More details on the results for the challenge were presented in [18].
In this paper we propose a general method, named Adaptive-CS-Net, for reconstructing undersampled MRI data, combining ideas from compressed sensing theory with ideas from MR physics and deep learning. The method was developed in the context of the 2019 fastMRI challenge, which focused on accelerating knee MR imaging. The proposed network is an unrolled iterative learning-based reconstruction scheme, in which a large number of reconstruction blocks refine the MR image by denoising the signal in a learned and multi-scale fashion. Moreover, we added neighboring slices as input to the sparsifying transform, as well as a number of soft priors that encode MRI domain knowledge.
The main driver of the performance of our network is the multi-scale architecture, as demonstrated in a direct comparison with ISTA-Net that is corrected by the number of trainable parameters. According to the experimental results on the number of blocks for 4x and 8x accelerations of both single- and multi-coil data, we showed that the number of blocks has a large impact on model performance. Therefore, it was decided to use the maximum number of blocks that we could fit into the GPU memory, where we adopted different model designs for the different blocks to save memory. It might be expected that beyond a certain number of blocks, overfitting of the data might occur. However, signs of overfitting were not observed during training and the final number of blocks was only marginally larger than tested in the optimization experiments. Whether further increase in the number of blocks could results in even better performance could be the topic of further experiments. This would, however, need better hardware, as the current design is memory- and time-bound during training. With the current configuration, final model training took approximately 7 days on two V100 GPUs.
We experimented with a large variety of loss functions. Results showed that the linear summation of MSSIM and performed best. We defined a 2.5D scheme to train the network in which three adjacent slices were reconstructed while the loss function was calculated only for the central slice. The proposed scheme outperformed the 2D network as well as 2.5D networks in which the loss was calculated over all slices. By incorporating the neighbouring slices, the network can exploit existing correlations into the reconstruction of the target slice, which is our main target as defined by our loss. It can be expected that for MRI acquisition with less asymmetric voxel sizes, the inclusion of information of neighbouring slices would become more important. We tested different optimizers, where the newly introduced RAdam outperformed the others and we used it for training the final network. We also incorporated prior knowledge, including data consistency, known phase behaviour and background discrimination to support the network in the reconstruction process. We observed that these priors provided only limited extra performance to the network, resulting in visually similar images and minimal difference in the metrics. We can conclude that the Adaptive-CS-Net is sufficiently powerful to learn directly from the data how to reconstruct the undersampled k-space, being the multi-scale structure and the use of many reconstruction blocks the main driver of our performance. As a future work, we want to better understand how much the network is relying on the priors by adopting interpretable AI techniques such as differentiable image parameterizations for feature visualization [34]. Stronger use of the priors via the loss function is an additional option.
As mentioned before, the radiologist scores were based on the visual quality of the reconstructed images and not on diagnostic interchangeability. Therefore, designing a network based on the diagnosis can be considered a point for further research. We furthermore observed that optimizing for SSIM was needed for reaching the final stage of the challenge, but is not necessarily an ideal representative of radiological image quality. This observation was very recently confirmed in a comparative study by others [35].
In this paper we propose an adaptive intelligence algorithm called Adaptive-CS-Net, which was developed in the context of the 2019 fastMRI challenge. In the two clinically relevant tracks of the challenge, using multi-coil MRI acquisitions, the proposed method was leading, while on a simulated single-coil track the method ranked 3rd.
Facebook AI Research and NYU Langone Health are acknowledged for organizing the 2019 fastMRI challenge.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 1828–1837, 2018.Radiology: Artificial Intelligence
, vol. 2, no. 1, p. e190007, 2020.J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in
European Conference on Computer Vision, pp. 694–711, Springer, 2016.