I Introduction
Magnetic Resonance Imaging (MRI) is a widely applied noninvasive imaging modality, with excellent soft tissue contrast and high spatial resolution. Unlike Computed Tomography (CT) scanning, MRI does not expose patients to any ionizing radiation, making it a favourable modality compared to CT. MR images are essential for clinical assessment of soft tissue, as well as functional and structural measurements, enabling early detection and diagnosis of many diseases. However, MRI is relatively slow compared to other imaging modalities. The total examination time can vary from 15 minutes for knee imaging to an hour or more for cardiac imaging. Remaining motionless for this long in a confined space is challenging, especially for children, the elderly, and claustrophobic patients. Motion artifacts are difficult to correct and may require a complete rescan [1]. Furthermore, the acquisition time limits the temporal resolution and subsequently the potential of MRI for dynamic imaging of the abdomen [2] and the heart [3], where high temporal resolution and robustness against motion are critical for diagnosis. Moreover, the relatively long scan times lead to high costs that limit the availability of MRI scanners [4]. Therefore, fast acquisition and reconstruction are crucial to improve the performance of current MR scanners, which in recent years led to the development of techniques such as parallel reception, compressed sensing and multiband acceleration. However, there is still a need for further scan acceleration.
The long acquisition time is intrinsic to the scanner and the physics of MRI. For the majority of scans performed in clinical practice, the acquisition is done through consecutive readouts of single k-space lines. These readouts are constrained by physical limitations of the hardware, the contrast-generating principle, and human physiology. The scanning time can be shortened by reducing the number of acquired k-lines, i.e., by undersampling the 2D or 3D k-space. However, this can violate the Nyquist criterion, resulting in aliasing and blurring in the reconstructed images, rendering them unsuitable for clinical use. Compressed Sensing (CS) and Parallel Imaging are the most common solutions for acceleration by undersampling while maintaining image quality. Compressed Sensing, the focus of this paper, introduced by Donoho [5] and Lustig [6], leverages the fact that MR images can be compressed in some domain, restoring the missing k-space data through an iterative reconstruction algorithm [7]. Parallel Imaging exploits measurements from multiple receive coils to reduce the aliasing artifacts [8].
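The effect of violating the Nyquist criterion can be demonstrated in a few lines; the toy sketch below (ours, not from the paper) keeps every 4th k-space line of a random "image" and reconstructs by zero-filling, which produces a large aliasing error:

```python
import numpy as np

# Toy illustration: undersampling k-space below the Nyquist rate and
# reconstructing by zero-filling causes aliasing.
rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))     # stand-in for a real-valued MR image
kspace = np.fft.fft2(image)               # fully sampled k-space

mask = np.zeros((64, 1))
mask[::4] = 1.0                           # keep every 4th k-line (4x acceleration)
zero_filled = np.fft.ifft2(mask * kspace).real

# The zero-filled reconstruction deviates strongly from the original image.
rel_error = np.linalg.norm(zero_filled - image) / np.linalg.norm(image)
```

CS and Parallel Imaging exist precisely to remove this aliasing while keeping the acceleration.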
When CS is used to accelerate MR acquisitions, k-space is sampled pseudo-randomly and the image is subsequently reconstructed by promoting a sparse solution. In the optimal setting, the reconstructed image will be identical to the Fourier transform of the full k-space and have a limited number of large coefficients when transformed to the sparse domain. Equation (1) shows the optimization problem that describes the CS algorithm:

$$\hat{x} = \arg\min_{x}\; \tfrac{1}{2} \big\| M F x - y \big\|_2^2 + \lambda \big\| \Psi x \big\|_1 \tag{1}$$
where $x$ is the reconstructed image, $y$ is the measured k-space data, $F$ is the Fourier transform, $M$ (mask) is the undersampling operation, $\Psi x$ represents the sparsity transform coefficients, and $\lambda$ is the regularization parameter. The $\ell_1$ norm is used to enforce sparsity of the solution in the domain specified by the transformation $\Psi$. The $\ell_2$ norm is used as a similarity measure between the measured k-space data $y$ and the reconstructed k-space $MFx$, called the 'data consistency' term. Note that, in the case of multi-coil acquisitions, the data consistency term is given by:
$$\sum_{i} \big\| M F S_i x - y_i \big\|_2^2 \tag{2}$$
where $i$ denotes the coil element and $S_i$ the corresponding coil sensitivity map. To simplify notation, and without loss of generality, the single-coil data consistency term will be used throughout this paper.
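For concreteness, the single-coil objective of Eq. (1) can be evaluated numerically. The sketch below uses a finite-difference operator as a stand-in for the sparsifying transform $\Psi$ (our choice for illustration; the paper later uses learned transforms):

```python
import numpy as np

def cs_objective(x, y, mask, lam, transform):
    """Evaluate Eq. (1): 0.5 * ||M F x - y||_2^2 + lam * ||Psi x||_1.
    `transform` plays the role of the sparsifying transform Psi."""
    k = mask * np.fft.fft2(x)                      # M F x
    data_consistency = 0.5 * np.sum(np.abs(k - y) ** 2)
    sparsity = np.sum(np.abs(transform(x)))        # l1 norm of the coefficients
    return data_consistency + lam * sparsity

# Vertical finite differences as a crude sparsifying transform.
finite_diff = lambda v: np.diff(v, axis=0)
```

A piecewise-constant image that matches the measured data attains a near-zero objective, which is exactly the solution CS promotes.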
Recently, deep learning has shown promising results for speeding up MR reconstruction using Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). In contrast to iteratively solving optimization problems, deep learning offers a solution for reconstructing highly accelerated scans by adopting learnable reconstruction schemes.
The literature on deep learning-based reconstruction algorithms can be divided into two categories [9]. First, data-driven approaches, where a neural network is trained to find the optimal transformation from the zero-filled k-space to the desired reconstruction. Here, the network is completely dependent on the underlying training dataset, without any task-specific prior knowledge of the domain; the following are selected exemplar algorithms of this approach. Quan et al. [10] developed a GAN for MR reconstruction starting from undersampled data. Their network consists of two consecutive networks, one for reconstruction and one for refining the results, and uses a cyclic data consistency term alongside the WGAN loss. Mardani et al. [11] developed a GAN for CS that corrects aliasing artifacts in MR images. Akcakaya et al. [12] trained a neural network on patient-specific auto-calibration MR signals; the network was trained on phantom data and tested on cardiac MR scans. AUTOMAP [13] reports good reconstruction results with an architecture that learns to directly transform k-space into image data. Lee et al. [14] introduced two separate deep residual networks for magnitude and phase, which successfully reconstructed images even at high undersampling factors.
Second, hybrid approaches are presented in the literature. This class of algorithms builds on top of existing reconstruction solutions and integrates learning-based approaches to substitute part of the original computations. An example of an existing reconstruction algorithm is the Iterative Shrinkage-Thresholding Algorithm (ISTA) [15], which iteratively updates an estimate of the reconstruction by applying a filtering operation in a sparsified transformation of the reconstruction. Recently, Zhang and Ghanem [16] developed a deep learning approach called ISTA-Net that mimics the conventional ISTA algorithm, but enriches it by replacing the sparsifying transform and the thresholding with learned operations. The resulting network does not implement a fully iterative algorithm, but simulates it by adopting a fixed number of iterations, effectively enabling the implementation of a deep neural network that can be trained by backpropagation.
Inspired by the work of Zhang and Ghanem [16], in this paper we propose a deep learning-based solution, the Adaptive-CS-Network, that mimics the ISTA algorithm but introduces strong prior information, i.e., inductive biases, to better constrain the reconstruction problem. The main contributions of this work are: i) we propose a novel CNN that integrates and enhances the conventional CS approach; ii) it integrates multi-scale sparsification, inspired by wavelet transforms, but in a learnable manner; iii) we adopt domain-specific knowledge, such as data consistency, a prior on known phase behavior, and the location of the background, computations that cannot easily be learned by a CNN; iv) the proposed model exploits the correlation between neighbouring slices by adopting a 2.5D learning approach. In addition, we propose a hierarchical training strategy that leverages the available data. We conducted extensive experiments to investigate the performance of the network, and show that domain-specific information is crucial for reconstructing high-quality MR images. The proposed network showed superior performance by winning one, and co-winning a second, of the three tracks of the fastMRI challenge [17].
II FastMRI Challenge
The fastMRI challenge was organized by Facebook AI Research and NYU Langone Health [17]. The aim of the challenge is to advance and encourage AI-based research in MR reconstruction, in order to accelerate the acquisition and, subsequently, reduce the examination time. The challenge is divided into three tracks: 4x single-coil, 4x multi-coil, and 8x multi-coil acceleration. Eight teams participated in the multi-coil track and 17 teams in the single-coil track [18].
II-A Dataset
The challenge organizers released a large-scale dataset of raw MR data of the knee [19]. The data was acquired with a 2D protocol in the coronal direction with a 15-channel knee coil array, using Siemens MR machines at two different field strengths: 1.5T and 3T [17]. The data was acquired using two pulse sequences: proton density weighting with (PDFS) and without (PD) fat suppression, divided approximately equally between these pulse sequences. The pixel size is 0.5 mm × 0.5 mm, with a slice thickness of 3 mm.
The dataset is divided into four categories: training (973 volumes, 34,742 slices), validation (199 volumes, 7,135 slices), test (118 volumes, 4,092 slices), and challenge (104 volumes, 3,810 slices). These numbers are the same for multi-coil and single-coil data, with the exception of the test and challenge categories, where the single-coil data has respectively 10 and 12 volumes fewer than the multi-coil data. The training, validation and test sets have been publicly available since late November 2018, while the challenge set became available in September 2019. The full k-space was available for all datasets except the test and challenge sets. The training and validation sets were used for training and optimizing our model, while the test set was used for evaluating model performance on a public leaderboard. The final model was evaluated by the organizers on the independent challenge set.
The k-space data provided in the challenge were undersampled using a Cartesian mask, where k-space lines are set to zero in the phase encoding direction. The sampling density depends on the acceleration rate (4x or 8x). All masks fully sample the low-frequency area of k-space: for the 4x accelerated scans, this corresponds to 8% of the total k-space lines, and for 8x acceleration to 4%.
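A sketch of such a mask generator (our implementation of the scheme described above; the function name and the uniform random fill-in outside the centre are ours):

```python
import numpy as np

def cartesian_mask(n_lines, acceleration, center_fraction, seed=0):
    """Cartesian undersampling mask: the central `center_fraction` of the
    phase-encode lines is always sampled (8% at 4x, 4% at 8x), and the
    remaining lines are drawn uniformly at random so that a total of
    n_lines / acceleration lines is acquired."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_lines, dtype=bool)
    n_center = int(round(n_lines * center_fraction))
    start = (n_lines - n_center) // 2
    mask[start:start + n_center] = True               # fully sampled centre
    n_total = n_lines // acceleration
    outside = np.flatnonzero(~mask)
    extra = max(n_total - n_center, 0)
    mask[rng.choice(outside, size=extra, replace=False)] = True
    return mask
```

The fully sampled centre is what later enables the low-pass priors of Section III-D to be computed artifact-free.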
II-B Quantitative Evaluation
In order to measure the accuracy of the reconstructed volumes $\hat{v}$ compared to the target volumes $v$, the following metrics were considered:
II-B1 Normalized Mean Square Error (NMSE)
measures the squared Euclidean distance between a pair of images, normalized by the energy of the target:

$$\mathrm{NMSE}(\hat{v}, v) = \frac{\| \hat{v} - v \|_2^2}{\| v \|_2^2} \tag{3}$$
II-B2 Peak Signal-to-Noise Ratio (PSNR)
the ratio between the maximum intensity and the underlying distortion noise:

$$\mathrm{PSNR}(\hat{v}, v) = 10 \log_{10} \frac{\max(v)^2}{\mathrm{MSE}(\hat{v}, v)} \tag{4}$$
II-B3 Structural Similarity Index Metric (SSIM)
measures image similarity using aspects of human perception [20]. SSIM is calculated from three image distortion components: luminance $l$, contrast $c$ and structure $s$:

$$\mathrm{SSIM}(\hat{v}, v) = l(\hat{v}, v)^{\alpha} \cdot c(\hat{v}, v)^{\beta} \cdot s(\hat{v}, v)^{\gamma} \tag{5}$$

where $\alpha$, $\beta$ and $\gamma$ are the distortion weights, here chosen as 1.
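The NMSE and PSNR metrics translate directly into code (a simple NumPy sketch; the peak follows Eq. (4) as the maximum of the target):

```python
import numpy as np

def nmse(v_hat, v):
    """Normalized mean square error, Eq. (3)."""
    return np.sum((v_hat - v) ** 2) / np.sum(v ** 2)

def psnr(v_hat, v):
    """Peak signal-to-noise ratio in dB, Eq. (4); the peak is taken as max(v)."""
    mse = np.mean((v_hat - v) ** 2)
    return 10.0 * np.log10(v.max() ** 2 / mse)
```

SSIM, by contrast, requires windowed local statistics and is typically taken from an existing image-quality library.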
II-C Radiological Evaluation on the Challenge Dataset
We submitted the reconstructions of the challenge dataset via an online form; they were then evaluated independently by the fastMRI organizers, as described in detail by Knoll et al. [18]. All submissions were ranked by the SSIM metric, after which only the 4 highest-ranking submissions were evaluated by a panel of 7 radiologists. The panel was asked to rate the reconstructions on a scale from 1 (best) to 5 (worst) in four categories: artifacts, reconstruction sharpness, perceived contrast-to-noise ratio, and diagnostic confidence. The radiological scores were subsequently averaged and translated into a final ranking.
III Methods
In this section we present the background of our solution, first by introducing the Iterative Shrinkage-Thresholding Algorithm (ISTA) [15] and, second, its deep learning-based variant, ISTA-Net [16]. Then, we present our solution, the Adaptive-CS-Network, which builds on top of the ISTA-Net framework by introducing several improvements, including strong inductive biases derived from domain knowledge of the reconstruction problem.
III-A ISTA Background
ISTA is an optimization algorithm that solves (1) in an iterative fashion, starting from an initial reconstruction $x_0$, often obtained by reconstructing the zero-filled undersampled k-space. The initial estimate is refined using the following update rules:

$$r_k = x_{k-1} - \rho\, F^H M^H \big( M F x_{k-1} - y \big) \tag{6}$$

$$x_k = \arg\min_{x}\; \tfrac{1}{2} \big\| x - r_k \big\|_2^2 + \lambda \big\| \Psi x \big\|_1 \tag{7}$$
where $r_k$ is an update of the estimate $x_{k-1}$, in which the error in the measured data is corrected with a step size $\rho$. Equation (7) is a special case of the proximal mapping with regularization weight $\lambda$, a crucial step for optimization algorithms such as ISTA, ADMM and AMP. When $\Psi$ is an orthogonal wavelet transform, it can be proven that

$$x_k = \Psi^H \mathrm{soft}\big( \Psi r_k, \lambda \big) \tag{8}$$
where $\mathrm{soft}$ is the soft-thresholding operator, defined as $\mathrm{soft}(u, \lambda) = \mathrm{sign}(u) \max(|u| - \lambda, 0)$. In general, solving (7) is not straightforward for nonlinear operators $\Psi$, limiting the applicability of the ISTA framework to simple transforms. Another problem of this family of algorithms is the difficulty of tuning the hyperparameters $\rho$ and $\lambda$, in addition to its slow convergence, requiring many iterations to reach the optimal solution of (1).

III-B ISTA-Net
Recently, Zhang and Ghanem introduced a deep learning approach to overcome the limitations of the ISTA framework for image-to-image reconstruction. Their solution, called ISTA-Net [16], replaces the handcrafted transform $\Psi$ with a learned operator $\mathcal{F}$, which consists of a 2D learnable convolution, followed by a rectified linear unit (ReLU) and a second convolution. By replacing $\Psi$ with $\mathcal{F}$ in (7), we can rewrite the update rule as

$$x_k = \arg\min_{x}\; \tfrac{1}{2} \big\| x - r_k \big\|_2^2 + \lambda \big\| \mathcal{F}(x) \big\|_1 \tag{9}$$
and, by defining $\tilde{\mathcal{F}}$ as the inverse of $\mathcal{F}$, i.e., $\tilde{\mathcal{F}} \circ \mathcal{F} = I$, Zhang and Ghanem propose to update (8) as follows:

$$x_k = \tilde{\mathcal{F}}\big( \mathrm{soft}\big( \mathcal{F}(r_k), \lambda \big) \big) \tag{10}$$

where $\tilde{\mathcal{F}}$ has a similar architecture as $\mathcal{F}$.
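One unrolled block, combining the gradient step of Eq. (6) with the learned thresholding of Eq. (10), can be sketched as follows for real-valued images. The `forward`/`inverse` callables stand in for the learned transforms $\mathcal{F}$ and $\tilde{\mathcal{F}}$; the names and the NumPy setting are ours:

```python
import numpy as np

def soft(u, lam):
    """Soft-thresholding operator: sign(u) * max(|u| - lam, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def istanet_block(x, y, mask, rho, lam, forward, inverse):
    """One unrolled ISTA-Net block (sketch): a gradient step on the data
    term (Eq. (6)), then soft-thresholding in the learned domain (Eq. (10))."""
    grad = np.fft.ifft2(mask * (mask * np.fft.fft2(x) - y)).real
    r = x - rho * grad
    return inverse(soft(forward(r), lam))
```

Stacking a fixed number of such blocks and replacing `forward`/`inverse` with small CNNs yields the trainable, "unrolled" network described next.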
The model is trained end-to-end, where the iterations of the ISTA algorithm are "unrolled", i.e., a number of identical reconstruction blocks are created. Note that in the ISTA-Net approach the learnable parameters are shared among all blocks in the unrolled network, unlike in our solution. The training loss is defined as a combination of a reconstruction and a discrepancy loss:
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{rec}} + \gamma\, \mathcal{L}_{\mathrm{disc}} \tag{11}$$

$$\mathcal{L}_{\mathrm{rec}} = \big\| x_N - x_{\mathrm{gt}} \big\|_2^2 \tag{12}$$

$$\mathcal{L}_{\mathrm{disc}} = \sum_{k=1}^{N} \big\| \tilde{\mathcal{F}}(\mathcal{F}(x_k)) - x_k \big\|_2^2 \tag{13}$$

The reconstruction loss $\mathcal{L}_{\mathrm{rec}}$ encodes the need for the final reconstruction $x_N$ to be as close as possible, in the least-squares sense, to the ground-truth image $x_{\mathrm{gt}}$. The discrepancy loss $\mathcal{L}_{\mathrm{disc}}$ stimulates that $\tilde{\mathcal{F}}(\mathcal{F}(x)) \approx x$. The parameter $\gamma$ controls the weight given to the discrepancy loss, and it is chosen to be arbitrarily small. An extension, called ISTA-Net$^{+}$, is also presented by the authors, where residual computations are adopted.
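A minimal sketch of this loss (NumPy stand-in; the value 0.01 for $\gamma$ is our placeholder for the "arbitrarily small" weight):

```python
import numpy as np

def istanet_loss(x_final, x_gt, discrepancies, gamma=0.01):
    """Total loss of Eqs. (11)-(13): least-squares reconstruction error plus
    a small discrepancy term pushing F_tilde(F(x_k)) toward x_k in every
    block. `discrepancies` holds the per-block residuals
    F_tilde(F(x_k)) - x_k."""
    recon = np.sum((x_final - x_gt) ** 2)
    disc = sum(np.sum(d ** 2) for d in discrepancies)
    return recon + gamma * disc
```
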
III-C Adaptive-CS-Network
Starting from the network developed by Zhang and Ghanem, we developed the Adaptive-CS-Network. Our solution builds on top of ISTA-Net with three key innovations, here ordered by importance to the final network performance: i) the use of multi-scale and ii) multi-slice computations, together with iii) the introduction of soft MRI priors. We present them independently, building towards the update rule of the Adaptive-CS-Network model as presented in (15). Fig. 1 illustrates the proposed network.
First, many non-learned CS algorithms make use of multi-scale transforms to sparsify the signal. An example is given in (8), where $\Psi$ is a wavelet transform: a decomposition of the signal into a set of basis functions at different scales. We include this inductive bias in our design and adopt a learned multi-scale transform $\mathcal{G}$, with inverse $\tilde{\mathcal{G}}$. As an additional design choice, we decided to sparsify and learn only the residual; our update rule is therefore written as follows:

$$x_k = r_k + \tilde{\mathcal{G}}\big( \mathrm{soft}\big( \mathcal{G}(r_k), \lambda \big) \big) \tag{14}$$
where $\mathcal{G}$ comprises 2D convolutions, since the acquisition protocol was 2D, and LeakyReLU operations. To generate a multi-scale representation, a max-pooling layer is used, and the resulting features are then processed again by convolutional blocks and nonlinearities. The exact design of $\mathcal{G}$ is presented in Fig. 1. The feature maps produced at the different scales are then filtered with the soft-thresholding operator. Differently from ISTA-Net, we learn a separate threshold $\lambda_{s,c}$ for each scale $s$ and feature channel $c$. This approach gives the network the flexibility to tune the thresholds independently, hence reducing the complexity of the transforms learned by the convolutional operators. Finally, the filtered channels are transformed back into the image domain by the inverse $\tilde{\mathcal{G}}$, consisting of interpolation, 2D convolutions and LeakyReLU operations. Note that, contrary to the latest literature in deep learning networks, we decided not to adopt strided convolutions for sub- and upsampling, which would increase the risk of creating checkerboard artifacts [21]; instead, we took the more conservative approach of adopting pooling and interpolation layers to achieve better image quality. Overall, the computation represented by $\mathcal{G}$ is implemented with a U-Net-like architecture [22], where the feature maps before the skip connections are filtered according to the parameters $\lambda_{s,c}$.

Second, it is important to note that the slice thickness of the dataset is much larger than the in-plane resolution. This indicates that inter-slice correlations are less useful at finer scales, and potentially damaging, as the network could see the extra slices as noisy information. However, such information can be beneficial at coarser scales. Since our transform is multi-scale by nature, we found it beneficial to inject neighbouring slices into the model. To reduce the memory footprint of the model, we adopted a 2.5D convolution approach by concatenating neighbouring slices into the input tensor along the channel dimension, enabling us to "reinvest" the GPU memory saved, compared to a truly 3D convolution approach, into more unrolled iterations. More details on the number of slices used and the definition of the loss function are given in Section IV-C.

Finally, we adopted a hybrid or nudge approach to incorporate additional prior knowledge into the reconstruction algorithm. We therefore computed additional information derived from the current estimate $x_{k-1}$ together with the measured k-space $y$. These soft priors, which are presented in the next section, capture properties of an MR image that cannot easily be learned by a deep neural network due to the limited size of the receptive field. The priors come in the form of images, and are provided as extra input channels to the transform $\mathcal{G}$. In this way, they are integrated in the computations performed by $\mathcal{G}$ whenever this is beneficial for the optimization of the loss function.
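The 2.5D input construction can be sketched as follows (boundary handling by edge clamping is our assumption; the paper does not specify it):

```python
import numpy as np

def make_25d_input(volume, center, n_neighbors=1):
    """Stack neighbouring slices along the channel axis so that 2D
    convolutions can exploit inter-slice correlation at low memory cost.
    `volume` has shape (slices, H, W); out-of-range neighbours are clamped
    to the first/last slice."""
    last = volume.shape[0] - 1
    idx = [min(max(center + d, 0), last)
           for d in range(-n_neighbors, n_neighbors + 1)]
    return np.stack([volume[i] for i in idx], axis=0)  # (channels, H, W)
```

Compared to true 3D convolutions, this costs only extra input channels, which is what allows the saved memory to go into additional unrolled blocks.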
III-D Final Design
The overall update for block $k$ in the Adaptive-CS-Network model is defined as follows:

$$x_k = x_{k-1} + \tilde{\mathcal{G}}_k\big( \mathrm{soft}\big( \mathcal{G}_k\big(x_{k-1}, p_{\mathrm{dc}}, p_{\mathrm{se}}, p_{\mathrm{bg}}\big), \lambda_k \big) \big) \tag{15}$$
Each block $k$ in the network learns different transforms $\mathcal{G}_k$ and $\tilde{\mathcal{G}}_k$, enabling each block to focus on different properties of the image and effectively increasing the network capacity.
In our final design, the transform $\mathcal{G}_k$ does not receive the data-consistent image $r_k$, as defined in (6), but rather the current estimate $x_{k-1}$ together with the data consistency prior $p_{\mathrm{dc}}$, computed as follows:

$$p_{\mathrm{dc}} = F^H M^H \big( M F x_{k-1} - y \big) \tag{16}$$
This "soft data-consistency" update allows the network to evaluate the reliability of the acquired data and potentially compensate for errors in the coil combination defined by the sensitivity maps $S_i$ in (2).
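In code, the prior of Eq. (16) amounts to mapping the k-space residual on the acquired lines back to the image domain (a NumPy sketch):

```python
import numpy as np

def dc_prior(x, y, mask):
    """Soft data-consistency prior of Eq. (16): the image-domain residual on
    the acquired k-space lines. It is fed to the network as an extra channel
    instead of being used to hard-overwrite k-space samples."""
    residual_k = mask * np.fft.fft2(x) - y    # error on acquired lines only
    return np.fft.ifft2(residual_k)
```

For an estimate that already matches the measurements, this channel is identically zero, so the network only reacts where data consistency is violated.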
The second prior we provided to the network, $p_{\mathrm{se}}$, represents the known phase response for spin-echo MR sequences. Theoretically, spin-echo sequences have zero phase everywhere in the image. In practice, however, a slowly varying phase will occur, i.e., non-zero phase only in the low frequencies, due to hardware and acquisition imperfections. Taking this into account, the final reconstructed image should be a real-valued image after removal of the slowly varying phase. This information is captured in the following prior:

$$p_{\mathrm{se}} = x_{k-1} - x_{k-1}^{*}\, e^{\,2i \angle \mathrm{LPF}(x_{k-1})} \tag{17}$$
where $(\cdot)^{*}$ denotes the complex conjugate, $\angle$ the phase angle, and $\mathrm{LPF}$ refers to low-pass filtering. The low-pass filter is chosen such that it corresponds to the central part of k-space, which is fully sampled. By doing so, the low-pass filtered image can be derived once beforehand; hence $\mathrm{LPF}(x_{k-1})$ is replaced by $\mathrm{LPF}(x_0)$.
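Our reading of this prior in code (the exact formula is an assumption; note that for an ideal spin-echo image, whose only phase is the slowly varying component, the prior vanishes):

```python
import numpy as np

def phase_prior(x, x_low):
    """Sketch of the spin-echo phase prior of Eq. (17): x_low is the low-pass
    image obtained once from the fully sampled k-space centre. Subtracting
    the conjugated, re-phased estimate leaves only the component that
    violates the 'real after phase removal' assumption."""
    return x - np.conj(x) * np.exp(2j * np.angle(x_low))
```
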
Finally, we adopt a simple approach to estimate the locations in $x_{k-1}$ where the background is found, as is common in parallel imaging techniques. The following prior is applied:

$$p_{\mathrm{bg}} = x_{k-1} \odot \Big( 1 - \tfrac{|\mathrm{LPF}(x_{k-1})|}{\max |\mathrm{LPF}(x_{k-1})|} \Big) \tag{18}$$
This prior penalizes estimated signal content where $|\mathrm{LPF}(x_{k-1})|$ is low, i.e., within the background. Again, $\mathrm{LPF}(x_{k-1})$ is replaced by $\mathrm{LPF}(x_0)$. Because $\mathrm{LPF}(x_0)$ is based on the fully measured central part of k-space, this image is artifact-free, albeit at low spatial resolution, leading to reliable background identification.
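A sketch of this prior (the hard-thresholded background mask is our simplification of the weighting in Eq. (18)):

```python
import numpy as np

def background_prior(x, x_low, threshold=0.05):
    """Sketch of the background prior of Eq. (18): the artifact-free,
    low-resolution image |x_low| identifies the background; any signal the
    current estimate places there is exposed to the network as an extra
    channel so it can be penalized."""
    background = np.abs(x_low) < threshold * np.abs(x_low).max()
    return x * background
```
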
In Fig. 1 the design of the Adaptive-CS-Network is shown, including the multi-scale transforms, the multi-slice computation and the priors provided as input. Note that the spin-echo and background priors are computed only for the central slice, in order to save GPU memory.
III-E Network Training and Implementation Details
We implemented our models in PyTorch [23]. All the optimization experiments were performed on NVIDIA GeForce GTX 1080 GPUs with 16 GB RAM, and the final network was trained on two NVIDIA V100 GPUs with 16 GB RAM. In order to run as many experiments as possible given the challenge deadline, model optimization (see Section IV) was done with a relatively small model (10 blocks), which we trained for 20 epochs. All the optimization networks were trained and validated on the highest acceleration rate of the challenge, i.e., 8x, and on single-coil data, except for the number-of-blocks experiment, which was performed for both 4x and 8x, and for the priors, which are more relevant for the multi-coil data. Since the ground truth for the test set was not available, all the quantitative comparisons were done on the validation set only.
For the challenge, we trained the final model on the training and validation datasets for 25 epochs, with accelerations randomly selected from 2x to 10x. The model was subsequently fine-tuned for 10 epochs on eight data subpopulations identified by the acceleration (4x and 8x), the protocol (PD and PDFS), and the scanner field strength (1.5T and 3T). This procedure was performed independently for the single- and multi-coil datasets, resulting in a total of 8 models. All models were trained using an exponentially decaying learning rate. The final models had 33M trainable parameters each.
IV Experiments and Results: Model Optimization
In this section we present how we optimized the network configuration on a smaller model with 10 reconstruction blocks, using the quantitative measures reported in Section II-B for validation. We performed experiments on the number of blocks, the loss function, the influence of using adjacent slices, the optimizer, and the soft priors. A repeated-measures one-way ANOVA test was performed on the SSIM values using a significance level of 0.05. P-values are only stated for the comparisons between the best method and the other methods.
IV-A Number of blocks
The proposed model consists of multiple blocks, corresponding to the number of unrolled iterations of the ISTA scheme. Increasing the number of blocks increases the number of parameters of the model, and consequently the training time and GPU memory usage, as well as the risk of overfitting. In this experiment we investigated the effect of the number of blocks on the quality of the reconstructed images. Tests were run with the 2D network for 4x and 8x acceleration rates, without neighbouring slices, MSE as loss function, RMSprop as optimizer, and with the U-Net-like architecture using 16 feature maps for each convolutional layer. Fig. 2 reports the changes of our quantitative metrics relative to a single block. Based on the experiments, increasing the number of blocks improves the performance of the network. Therefore, the final network was configured with the maximum number of blocks that could be fitted into GPU memory: 25 blocks. However, for the optimization experiments below, only 10 blocks were employed to limit the duration of the training.

IV-B Loss functions
In this experiment we investigated the effect of a wide range of loss functions on the performance of our network. Here, we used the single-slice reconstruction network with only 10 blocks, RMSprop as the optimizer, and 16 feature maps for each convolutional layer. The models were trained for 20 epochs to ensure convergence. The evaluated loss functions included MSE, perceptual loss (PL) [24], $\ell_1$, Huber [25], SSIM, and the multi-scale structural similarity index (MS-SSIM) [26]. The PL loss was calculated using a pretrained VGG16 at layers relu1_2, relu2_2, and relu3_3.
TABLE I: Quantitative results for the evaluated loss functions (mean ± std); p-values compare each loss against the best-performing loss, Eq. (20).

| Loss function | SSIM | NMSE | PSNR | p-value |
|---|---|---|---|---|
| MSE | 0.657 ± 0.149 | 0.046 ± 0.029 | 30.2 ± 2.8 | <0.001 |
| Perceptual loss | 0.664 ± 0.157 | 0.061 ± 0.044 | 29.2 ± 3.2 | <0.001 |
| Huber | 0.664 ± 0.148 | 0.062 ± 0.041 | 29.1 ± 3.0 | <0.001 |
| ℓ1 | 0.664 ± 0.148 | 0.062 ± 0.041 | 29.1 ± 3.0 | <0.001 |
| SSIM | 0.662 ± 0.145 | 0.065 ± 0.041 | 28.9 ± 2.8 | <0.001 |
| MS-SSIM [26] | 0.671 ± 0.143 | 0.050 ± 0.034 | 30.1 ± 3.1 | <0.001 |
| Eq. (20) | 0.673 ± 0.143 | 0.048 ± 0.033 | 30.3 ± 3.1 | — |
MS-SSIM [26] builds upon SSIM (see Section II-B3) by incorporating structural similarity at multiple image resolutions, thereby supplying more flexibility than SSIM, and is defined as follows:

$$\text{MS-SSIM}(\hat{v}, v) = l_S(\hat{v}, v)^{\alpha_S} \prod_{j=1}^{S} c_j(\hat{v}, v)^{\beta_j}\, s_j(\hat{v}, v)^{\gamma_j} \tag{19}$$

where $\hat{v}$ and $v$ denote the reconstructed and target images respectively, $S$ is the number of scales used, $l$, $c$ and $s$ are the luminance, contrast and structure as defined in [20], and $\alpha_j$, $\beta_j$ and $\gamma_j$ are the weights of the distortion factors at the different resolution levels. We adopted the same weights as reported in [26].
Zhao et al. [27] reported that a linear combination of SSIM and $\ell_1$ preserves the different properties of an image better than either separately: SSIM encourages the network to preserve structural and contrast information, while $\ell_1$ enforces sensitivity to uniform biases to preserve luminance [28]. Since MS-SSIM reached higher metric values than SSIM (see Table I), we deployed a weighted sum of MS-SSIM [26] and $\ell_1$:

$$\mathcal{L}(\hat{v}, v) = \alpha \big( 1 - \text{MS-SSIM}(\hat{v}, v) \big) + (1 - \alpha) \big\| \hat{v} - v \big\|_1 \tag{20}$$

where $\alpha = 0.84$ was chosen, following Zhao et al. [27].
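A sketch of Eq. (20) in code (the MS-SSIM value itself is assumed to come from an external routine; the mean-reduced L1 is our implementation choice):

```python
import numpy as np

def combined_loss(ms_ssim_value, x_hat, x, alpha=0.84):
    """Weighted loss of Eq. (20): alpha * (1 - MS-SSIM) + (1 - alpha) * L1,
    with alpha = 0.84 following Zhao et al. [27]."""
    l1 = np.mean(np.abs(x_hat - x))
    return alpha * (1.0 - ms_ssim_value) + (1.0 - alpha) * l1
```
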
Table I reports the quantitative results for the different loss functions. The weighted linear combination of MS-SSIM and $\ell_1$ yielded the best results, where the p-values indicate that the improvement was highly consistent, albeit with small gains in SSIM values. Fig. 3 shows two example results for the different loss functions, confirming the favorable results for the model trained using the combination of MS-SSIM and $\ell_1$. Therefore, this loss function was selected for training the final model. For the remainder of the optimization experiments, MS-SSIM is used as the loss function.
TABLE II: Results for the 2D and 2.5D schemes (mean ± std); W indicates the loss weight on the neighbouring slices.

| Network | SSIM | NMSE | PSNR | p-value |
|---|---|---|---|---|
| 2D | 0.671 ± 0.143 | 0.050 ± 0.034 | 30.1 ± 3.1 | <0.001 |
| 2.5D W0.1 | 0.549 ± 0.128 | 0.089 ± 0.034 | 26.8 ± 2.1 | <0.001 |
| 2.5D W0.2 | 0.548 ± 0.128 | 0.090 ± 0.033 | 26.8 ± 2.1 | <0.001 |
| 2.5D | 0.674 ± 0.143 | 0.048 ± 0.033 | 30.3 ± 3.1 | — |
IV-C Multi-slice Network
The resolution of the images in the dataset is anisotropic, with a voxel size of 0.5 mm × 0.5 mm × 3 mm. Due to the correlation between adjacent slices with respect to anatomical structures in MR images, we performed an experiment to assess whether inclusion of neighbouring slices in the reconstruction might improve performance. We compared the 2D scheme, using only the center slice, with three alternative 2.5D schemes: i) the neighbouring slices were used together with the center slice as input, but only the center slice was used in the loss function (network 2.5D); ii) and iii) the neighbouring slices were also used in the loss, with different weights (0.1 vs 0.2 for the neighbours; 1.0 for the center slice). MS-SSIM was used as the loss function, with 10 blocks, RMSprop as the optimizer, and 16 feature maps.
Table II shows the results of this experiment, showing that the 2.5D scheme very consistently improves over the 2D scheme, and that the loss should only be defined on the center slice. This scheme was selected for the final model.
IV-D Optimizer
We experimented with different optimizers, including RMSprop, rectified Adam (RAdam) [29], LookAhead [30] and Ranger [31]. RAdam exploits a dynamic rectifier to adjust the adaptive momentum of Adam [32]. LookAhead not only uses an adaptive learning rate and accelerated schemes, but also iteratively updates two sets of weights, i.e., fast and slow weights. Ranger combines the RAdam and LookAhead optimizers into a single one. We used the 2D network with 10 blocks and 16 feature maps for each layer, and MS-SSIM as the loss function.
Table III tabulates the results for the different optimizers. Since the best results were obtained for the RAdam optimizer, very consistently improving over the other optimizers, this was used for the final network.
TABLE III: Results for the evaluated optimizers (mean ± std).

| Optimizer | SSIM | NMSE | PSNR | p-value |
|---|---|---|---|---|
| RMSprop | 0.673 ± 0.143 | 0.048 ± 0.033 | 30.3 ± 3.1 | <0.001 |
| LookAhead | 0.668 ± 0.140 | 0.050 ± 0.032 | 30.0 ± 2.9 | <0.001 |
| Ranger | 0.668 ± 0.140 | 0.050 ± 0.032 | 30.0 ± 2.9 | <0.001 |
| RAdam | 0.674 ± 0.141 | 0.048 ± 0.032 | 30.3 ± 3.0 | — |
IV-E Adaptive-CS-Net vs ISTA-Net
In this experiment, we compare the proposed model to ISTA-Net [16]. A 2D network with 10 blocks and 16 feature maps per layer was used, SSIM as loss function, and RAdam as the optimizer. Since ISTA-Net uses a much smaller, single-scale architecture with far fewer network parameters, we added an experiment increasing the feature maps of ISTA-Net (ISTA-Net-L) such that the number of parameters matched our architecture. According to the results reported in Table IV, the proposed model outperforms ISTA-Net significantly.
TABLE IV: Comparison with ISTA-Net (mean ± std).

| Model | SSIM | NMSE | PSNR | p-value |
|---|---|---|---|---|
| ISTA-Net | 0.547 ± 0.117 | 0.169 ± 0.022 | 23.8 ± 1.9 | <0.001 |
| ISTA-Net-L | 0.543 ± 0.119 | 0.103 ± 0.038 | 26.2 ± 2.0 | <0.001 |
| ACS-Net | 0.671 ± 0.143 | 0.050 ± 0.034 | 30.1 ± 3.1 | — |
IV-F Soft Priors
To assess the contribution of the additional soft priors, we compared the full model against a version without the known phase behaviour prior $p_{\mathrm{se}}$ and without the background prior $p_{\mathrm{bg}}$. Visually, we observed only small differences. To verify the differences in a realistic setting, we submitted the results to the public leaderboard of the fastMRI challenge. As shown in Table V, the network with all priors performed better in terms of the SSIM metric, although the results worsened in terms of NMSE and PSNR. Despite the improvement being minimal, we decided to adopt all priors for the final model to ensure our participation in the last challenge phase, since the selection was based on SSIM.
TABLE V: Public leaderboard results with and without the additional soft priors.

| Acceleration | Priors | SSIM | NMSE | PSNR |
|---|---|---|---|---|
| 4x | without $p_{\mathrm{se}}$, $p_{\mathrm{bg}}$ | 0.772 | 0.025 | 30.98 |
| 4x | all priors | 0.773 | 0.028 | 33.49 |
| 8x | without $p_{\mathrm{se}}$, $p_{\mathrm{bg}}$ | 0.674 | 0.038 | 30.90 |
| 8x | all priors | 0.675 | 0.044 | 30.27 |
V Adaptive-CS-Net: Submitted Model
In this section, we describe the configuration of the submitted model [33] and analyze the resulting reconstructions. The final performance is evaluated with the quantitative metrics on the test and challenge datasets, and by presenting the radiological scores for the challenge dataset as performed by the fastMRI challenge organizers.
Following our model optimization study, the configuration of the final model was determined as follows. The weighted combination of MS-SSIM and $\ell_1$ (20) was chosen as the loss function. The 2.5D scheme was chosen with two neighbouring slices, with the loss applied only to the central slice. For training the model, the RAdam optimizer was deployed. Fig. 4 shows the structure of the final network. Each block is determined by three parameters of the denoiser: 1) the number of scales, 2) the kernel size used in the convolutions, and 3) the number of feature maps in the first convolutional layer, which is then doubled at each scale. According to the experiments presented in Fig. 2, the number of reconstruction blocks greatly affects the reconstruction performance, with performance still improving when 15 blocks are used. The available GPU memory is a limiting factor when designing a deep neural network. To allow for a large number of blocks, we chose a different design in each block, mixing a less powerful design (16 filters) with more powerful ones (64 filters). By adopting this strategy, our final design contained 25 reconstruction blocks.
TABLE VI: Quantitative results of the submitted model on the test and challenge sets.

| Dataset | Coil | Acc. | Subset | SSIM | NMSE | PSNR |
|---|---|---|---|---|---|---|
| Test | multi | 4x | ALL | 0.928 | 0.005 | 39.9 |
| Test | multi | 4x | PD | 0.961 | 0.002 | 41.7 |
| Test | multi | 4x | PDFS | 0.891 | 0.009 | 37.9 |
| Test | multi | 8x | ALL | 0.888 | 0.009 | 36.8 |
| Test | multi | 8x | PD | 0.937 | 0.005 | 38.5 |
| Test | multi | 8x | PDFS | 0.843 | 0.013 | 35.3 |
| Test | single | 4x | ALL | 0.777 | 0.027 | 33.7 |
| Test | single | 4x | PD | 0.877 | 0.010 | 36.9 |
| Test | single | 4x | PDFS | 0.685 | 0.043 | 30.7 |
| Test | single | 8x | ALL | 0.680 | 0.042 | 30.5 |
| Test | single | 8x | PD | 0.777 | 0.019 | 32.4 |
| Test | single | 8x | PDFS | 0.575 | 0.067 | 28.5 |
| Challenge | multi | 4x | ALL | 0.927 | 0.005 | 39.9 |
| Challenge | multi | 8x | ALL | 0.901 | 0.009 | 37.4 |
| Challenge | single | 4x | ALL | 0.751 | 0.030 | 32.7 |
Fig. 5 shows example results of the final network for the multi-coil track on the validation dataset. Fig. 6 shows examples from the test and challenge datasets. Table VI shows the SSIM, NMSE, and PSNR values for the test and challenge sets (as described in Section II-A), for the images with and without fat suppression and both combined, for both single- and multi-coil MRI scans. For the radiological evaluation, our method scored 2.285, 1.286, and 2.714 for multi-coil 4x, multi-coil 8x, and single-coil 4x, respectively (the closer to 1, the better). More details on the challenge results are presented in [18].
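The NMSE and PSNR values reported in Table VI follow standard definitions, which can be sketched as below. This is a minimal numpy version; the challenge's exact evaluation code may differ in details such as the chosen peak value.

```python
import numpy as np

def nmse(gt, pred):
    """Normalized mean squared error: ||gt - pred||^2 / ||gt||^2."""
    return np.linalg.norm(gt - pred) ** 2 / np.linalg.norm(gt) ** 2

def psnr(gt, pred):
    """Peak signal-to-noise ratio in dB.

    Uses the ground-truth maximum as the peak value, a common convention
    for MRI magnitude images (an assumption here, not the challenge spec).
    """
    mse = np.mean((gt - pred) ** 2)
    return 10 * np.log10(gt.max() ** 2 / mse)
```

For example, a prediction that is a uniform 10% underestimate of a constant image yields NMSE = 0.01 and PSNR = 20 dB.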
VI Discussion
In this paper we propose a general method, named Adaptive-CS-Net, for reconstructing undersampled MRI data, combining ideas from compressed sensing theory with ideas from MR physics and deep learning. The method was developed in the context of the 2019 fastMRI challenge, which focused on accelerating knee MR imaging. The proposed network is an unrolled, iterative, learning-based reconstruction scheme, in which a large number of reconstruction blocks refine the MR image by denoising the signal in a learned, multi-scale fashion. Moreover, we added neighboring slices as input to the sparsifying transform, as well as a number of soft priors that encode MRI domain knowledge.
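The unrolled scheme can be illustrated with a minimal numpy sketch, alternating a denoising step with a data-consistency step that re-inserts the measured k-space samples. The trivial smoothing step below only stands in for the learned multi-scale denoiser, and the hard (rather than soft) data consistency is an assumption for illustration.

```python
import numpy as np

def data_consistency(x, measured_k, mask):
    """Replace the k-space of x with the measured samples where mask is True."""
    k = np.fft.fft2(x)
    k[mask] = measured_k[mask]
    return np.fft.ifft2(k)

def smooth(x):
    """Placeholder denoiser: simple local averaging (NOT the learned CNN)."""
    return 0.5 * x + 0.5 * np.roll(x, 1, axis=0)

def unrolled_reconstruct(measured_k, mask, n_blocks=25):
    """Unrolled loop: each 'block' denoises, then enforces data consistency."""
    x = np.fft.ifft2(measured_k * mask)  # zero-filled initial estimate
    for _ in range(n_blocks):
        x = smooth(x)                    # learned denoiser in the actual network
        x = data_consistency(x, measured_k, mask)
    return x
```

In the actual network each block is a trained multi-scale denoiser with its own weights, and the data-consistency behaviour is learned as a soft prior rather than hard-coded.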
The main driver of the performance of our network is the multi-scale architecture, as demonstrated in a direct comparison with ISTA-Net that is corrected for the number of trainable parameters. The experiments on the number of blocks, for 4x and 8x accelerations of both single- and multi-coil data, showed that the number of blocks has a large impact on model performance. Therefore, we decided to use the maximum number of blocks that we could fit into GPU memory, adopting different designs for the different blocks to save memory. It might be expected that beyond a certain number of blocks the model would overfit the data. However, signs of overfitting were not observed during training, and the final number of blocks was only marginally larger than that tested in the optimization experiments. Whether a further increase in the number of blocks could result in even better performance could be the topic of further experiments. This would, however, require better hardware, as the current design is memory- and time-bound during training. With the current configuration, final model training took approximately 7 days on two V100 GPUs.
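To see why mixing block designs saves memory, consider the parameter count of a single convolutional layer. The layer shapes below are illustrative assumptions, not the exact architecture.

```python
# Back-of-the-envelope parameter count for a 2D convolutional layer,
# illustrating the memory gap between 16-filter and 64-filter block designs.

def conv_params(kernel, c_in, c_out):
    """Parameters of a 2D conv layer: kernel*kernel*c_in*c_out weights + c_out biases."""
    return kernel * kernel * c_in * c_out + c_out

light = conv_params(3, 16, 16)  # one layer of a 16-filter block
heavy = conv_params(3, 64, 64)  # one layer of a 64-filter block
```

A 64-filter layer carries roughly 16 times the parameters of a 16-filter one, so interleaving cheaper blocks frees enough memory to fit substantially more blocks overall.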
We experimented with a large variety of loss functions. Results showed that the linear combination of MS-SSIM and the loss in (20) performed best. We defined a 2.5D scheme to train the network, in which three adjacent slices are reconstructed while the loss function is calculated only for the central slice. The proposed scheme outperformed the 2D network as well as 2.5D networks in which the loss was calculated over all slices. By incorporating the neighbouring slices, the network can exploit existing correlations between slices for the reconstruction of the target slice, which is our main target as defined by the loss. It can be expected that for MRI acquisitions with less asymmetric voxel sizes, the inclusion of information from neighbouring slices would become even more important. We tested different optimizers, of which the newly introduced RAdam performed best, and we used it for training the final network. We also incorporated prior knowledge, including data consistency, known phase behaviour and background discrimination, to support the network in the reconstruction process. We observed that these priors provided only limited extra performance, resulting in visually similar images and minimal differences in the metrics. We conclude that Adaptive-CS-Net is sufficiently powerful to learn directly from the data how to reconstruct the undersampled k-space, with the multi-scale structure and the use of many reconstruction blocks being the main drivers of its performance. As future work, we want to better understand how much the network relies on the priors, by adopting interpretable AI techniques such as differentiable image parameterizations for feature visualization [34]. A stronger use of the priors via the loss function is an additional option.
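The 2.5D training scheme can be sketched as follows: a stack of adjacent slices is reconstructed, but only the central slice contributes to the loss. Plain L1 stands in here for the MS-SSIM-based composite loss; function names are illustrative assumptions.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error; a stand-in for the paper's composite loss."""
    return np.mean(np.abs(a - b))

def central_slice_loss(pred_stack, target_stack, similarity=l1):
    """Apply the loss only to the central slice of a 2.5D stack.

    pred_stack, target_stack: arrays of shape (n_slices, H, W), n_slices odd.
    Neighbouring slices are reconstructed too, but their errors are ignored.
    """
    c = pred_stack.shape[0] // 2
    return similarity(pred_stack[c], target_stack[c])
```

Because gradients only flow from the central slice, the neighbouring slices act purely as extra input context for the target reconstruction.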
As mentioned before, the radiologist scores were based on the visual quality of the reconstructed images and not on diagnostic interchangeability. Therefore, designing a network that is optimized for diagnostic value can be considered a point for further research. We furthermore observed that optimizing for SSIM was needed to reach the final stage of the challenge, but SSIM is not necessarily an ideal representative of radiological image quality. This observation was very recently confirmed in a comparative study by others [35].
VII Conclusion
In this paper we propose an adaptive intelligence algorithm called Adaptive-CS-Net, which was developed in the context of the 2019 fastMRI challenge. In the two clinically relevant tracks of the challenge, using multi-coil MRI acquisitions, the proposed method ranked first, while on the simulated single-coil track the method ranked 3rd.
VIII Acknowledgements
Facebook AI Research and NYU Langone Health are acknowledged for organizing the 2019 fastMRI challenge.
References
 [1] M. Zaitsev, J. Maclaren, and M. Herbst, “Motion artifacts in MRI: A complex problem with many partial solutions,” Journal of Magnetic Resonance Imaging, vol. 42, no. 4, pp. 887–901, 2015.
 [2] S. S. Vasanawala, M. T. Alley, B. A. Hargreaves, R. A. Barth, J. M. Pauly, and M. Lustig, “Improved pediatric MR imaging with compressed sensing,” Radiology, vol. 256, no. 2, pp. 607–616, 2010.
 [3] A. Hsiao, M. Lustig, M. T. Alley, M. Murphy, F. P. Chan, R. J. Herfkens, and et al, “Rapid pediatric cardiac assessment of flow and ventricular volume with compressed sensing parallel imaging volumetric cine phase-contrast MRI,” American Journal of Roentgenology, vol. 198, no. 3, pp. W250–W259, 2012.
 [4] M. S. Cohen and R. M. Weisskoff, “Ultrafast imaging,” Magnetic Resonance Imaging, vol. 9, no. 1, pp. 1–37, 1991.
 [5] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [6] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, 2007.
 [7] Z.-P. Liang, F. Boada, R. Constable, E. Haacke, P. Lauterbur, and M. Smith, “Constrained reconstruction methods in MR imaging,” Rev Magn Reson Med, vol. 4, no. 2, pp. 67–185, 1992.
 [8] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, “Sense: sensitivity encoding for fast MRI,” Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 952–962, 1999.
 [9] D. Liang, J. Cheng, Z. Ke, and L. Ying, “Deep MRI reconstruction: Unrolled optimization algorithms meet neural networks,” arXiv preprint arXiv:1907.11711, 2019.
 [10] T. M. Quan, T. Nguyen-Duc, and W.-K. Jeong, “Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1488–1497, 2018.
 [11] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and et al, “Deep generative adversarial neural networks for compressive sensing MRI,” IEEE Transactions on Medical Imaging, vol. 38, no. 1, pp. 167–179, 2018.
 [12] M. Akçakaya, S. Moeller, S. Weingärtner, and K. Uğurbil, “Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging,” Magnetic Resonance in Medicine, vol. 81, no. 1, pp. 439–453, 2019.
 [13] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, no. 7697, pp. 487–492, 2018.
 [14] D. Lee, J. Yoo, S. Tak, and J. C. Ye, “Deep residual learning for accelerated MRI using magnitude and phase networks,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 9, pp. 1985–1995, 2018.
 [15] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
 [16] J. Zhang and B. Ghanem, “ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1828–1837, 2018.
 [17] J. Zbontar, F. Knoll, A. Sriram, M. J. Muckley, M. Bruno, A. Defazio, and et al, “fastMRI: An open dataset and benchmarks for accelerated MRI,” arXiv preprint arXiv:1811.08839, 2018.
 [18] F. Knoll, T. Murrell, A. Sriram, N. Yakubova, J. Zbontar, M. Rabbat, and et al, “Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge,” arXiv preprint arXiv:2001.02518, 2020.
 [19] F. Knoll, J. Zbontar, A. Sriram, M. J. Muckley, M. Bruno, A. Defazio, and et al, “fastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning,” Radiology: Artificial Intelligence, vol. 2, no. 1, p. e190007, 2020.
 [20] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
 [21] A. Araujo, W. Norris, and J. Sim, “Computing receptive fields of convolutional neural networks,” Distill, vol. 4, no. 11, p. e21, 2019.
 [22] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
 [23] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, and et al, “Automatic differentiation in PyTorch,” in Proceedings of Neural Information Processing Systems, 2017.
 [24] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, pp. 694–711, Springer, 2016.
 [25] P. J. Huber, “Robust estimation of a location parameter,” in Breakthroughs in Statistics, pp. 492–518, Springer, 1992.
 [26] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, pp. 1398–1402, IEEE, 2003.
 [27] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2016.
 [28] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for neural networks for image processing,” arXiv preprint arXiv:1511.08861, 2015.
 [29] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and et al, “On the variance of the adaptive learning rate and beyond,” ICLR, 2020.
 [30] M. Zhang, J. Lucas, J. Ba, and G. E. Hinton, “Lookahead optimizer: k steps forward, 1 step back,” in Advances in Neural Information Processing Systems, pp. 9593–9604, 2019.
 [31] Q. Tong, G. Liang, and J. Bi, “Calibrating the learning rate for adaptive gradient methods to improve generalization performance,” arXiv preprint arXiv:1908.00700, 2019.
 [32] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference for Learning Representations, 2015.
 [33] N. Pezzotti, E. de Weerdt, S. Yousefi, M. S. Elmahdy, J. van Gemert, C. Schülke, and et al, “Adaptive-CS-Net: fastMRI with adaptive intelligence,” arXiv preprint arXiv:1912.12259, 2019.
 [34] A. Mordvintsev, N. Pezzotti, L. Schubert, and C. Olah, “Differentiable image parameterizations,” Distill, vol. 3, no. 7, p. e12, 2018.
 [35] A. Mason, J. Rioux, S. E. Clarke, A. Costa, M. Schmidt, V. Keough, T. Huynh, and S. Beyea, “Comparison of objective image quality metrics to expert radiologists’ scoring of diagnostic quality of MR images,” IEEE Transactions on Medical Imaging, vol. 39, no. 4, pp. 1064–1072, 2020.