I. Brief Literature Review
The idea of applying machine learning to MR reconstruction is not new. Nearly thirty years ago, for example, neural networks were investigated in the context of minimizing Gibbs artifacts resulting from k-space truncation [yan1993data, hui1995mri, hui1995comments]. More recently, hardware and software advancements have allowed for the training of more advanced models, and by 2016 the first deep learning models were being investigated [sun2016deep, wang2016accelerating]. Since then, the deep-learning-based MR reconstruction field has grown rapidly. Several deep learning models have been proposed for MR CS reconstruction; however, most have been validated using private datasets and in a SC acquisition setting. In late 2018, the fastMRI initiative [zbontar2018fastmri] made SC and MC knee raw MR data available for benchmarking purposes. The Calgary-Campinas initiative [RN136] has also added publicly available SC brain MR raw data. With this report, we provide access to MC data.

Deep-learning-based MR reconstruction models can be categorized into four groups (Figure 1):

Image domain learning uses an image obtained by inverse Fourier transforming the zero-filled k-space as a starting point. It uses a deep learning model that operates solely in the image domain;

Sensor domain (k-space) learning uses a model operating solely in the acquisition domain, which is the spatial-frequency domain in the case of MR imaging. It tries to estimate the missing k-space samples, followed by application of the inverse FT to reconstruct the final image;

Domain transform learning methods try to learn the appropriate transform directly from the sparsely sampled k-space data in order to generate alias-free image-domain reconstructions; and

Hybrid (sensor and image domains) learning comprises blocks that process the data in both the sensor (k-space) and image domains. These blocks are connected through the appropriate FT (i.e., direct or inverse).
The majority of techniques proposed to date are image domain learning methods (see Table I).
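At a high level, the four groups can be seen as different compositions of learned blocks and Fourier transforms. The following Python sketch is purely illustrative: the `net_img`, `net_k`, and `net_transform` functions are identity placeholders standing in for trained networks, not models from the cited literature.

```python
import numpy as np

ift = lambda k: np.fft.ifft2(k)   # sensor (k-space) -> image domain
ft  = lambda x: np.fft.fft2(x)    # image -> sensor (k-space) domain

net_img = lambda x: x  # stand-in for an image-domain CNN
net_k   = lambda k: k  # stand-in for a k-space CNN

def image_domain_learning(k_u):
    # 1) invert first, then learn entirely in the image domain
    return net_img(ift(k_u))

def sensor_domain_learning(k_u):
    # 2) estimate missing k-space samples, then invert
    return ift(net_k(k_u))

def domain_transform_learning(k_u):
    # 3) the network itself learns the k-space -> image mapping
    net_transform = lambda k: ift(k)  # stand-in for a learned transform
    return net_transform(k_u)

def hybrid_learning(k_u):
    # 4) alternate k-space and image-domain blocks, linked by FTs
    return net_img(ift(net_k(k_u)))
```

With the identity placeholders, all four pipelines reduce to a plain inverse FFT; the point is only where the learned blocks sit relative to the domain transforms.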
Model  |  References
Image domain learning  |  [mardani2019deep, Schlemper2018StochasticDC, dedmari2018complex, qin2019convolutional, semantic_interpretability, xiang2018deep, RN306, RN307, RN305, RN253, wang2016accelerating, kwon2017parallel, hammernik2018learning, han2018deep, zengK2019]
Sensor domain learning  |  [zhang_micccai_2018, akccakaya2019scan, kim2019loraki]
Domain transform learning  |  [RN289, RN323, dautomap]
Hybrid domain learning  |  [souza19a, wang2018, eo2018kiki, souza2018hybrid]

Table I: Literature summary classifying key magnetic resonance imaging reconstruction methods into four groups: 1) image domain learning, 2) sensor domain learning, 3) domain transform learning, and 4) hybrid domain learning.
I-A. Image Domain Learning
The seminal work of Jin et al. [RN254] proposed to use a direct inversion, which would be the zero-filled k-space inverse FT in the case of MR reconstruction, followed by a residual U-net to solve normal-convolutional inverse problems. The residual connection learns the difference between input and output, mitigating the vanishing gradient problem that can potentially disturb the network training process. Jin et al. tested their model by reconstructing X-ray computed tomography images from synthetic phantoms and real sinograms, but their model is directly extendable to MR reconstruction. In another study, Lee et al. [RN253] compared residual against non-residual U-nets to reconstruct MC MR data. Their results clearly indicated the advantage of using the residual connection, which has subsequently been incorporated in the majority of recently proposed models (cf., [RN307, RN305, RN306, qin2019convolutional, eo2018kiki, mardani2019deep]).

The model proposed by Schlemper et al. [RN306] consists of a flat unrolled deep cascade of CNNs interleaved with data consistency (DC) operations. DC replaces the network k-space signal estimates with the measurements obtained in the sampling process. For dynamic MR reconstruction, their model also included data sharing layers. Seitzer et al. [semantic_interpretability] built upon [RN306] by adding a visual refinement network, in their case a residual U-net, that was trained independently using the result of the flat unrolled deep cascade as its input. Their results showed improvement in terms of semantic interpretability and mean opinion scores [semantic_interpretability], but the flat unrolled cascade was still better in terms of peak signal-to-noise ratio (pSNR). In subsequent work, Schlemper et al. [Schlemper2018StochasticDC] added dilated convolutions and a stochastic component to their originally proposed model [RN306]. The dilated convolutions were used to efficiently increase the network receptive field, while the stochastic component consisted of dropping sub-networks of the cascade with a given probability. These authors claimed that their stochastic component accelerated learning because the network became shorter, and that each sub-network seeing different levels of residual noise made the model more robust.
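The DC operation that these cascades interleave between network blocks can be written in a few lines. The sketch below assumes a binary sampling mask and follows the "replacement" form described above; a noise-aware variant that linearly blends prediction and measurement is also included, since both forms recur throughout the literature discussed here.

```python
import numpy as np

def data_consistency(k_pred, k_measured, mask):
    """Keep network estimates where k-space was not sampled; restore the
    measured values at sampled positions (noiseless DC)."""
    return (1 - mask) * k_pred + mask * k_measured

def data_consistency_noisy(k_pred, k_measured, mask, lam):
    """Noise-aware variant: at sampled positions, linearly combine the
    prediction and the measurement with a weight lam in [0, 1]
    (lam = 1 recovers the noiseless replacement above)."""
    blended = lam * k_measured + (1 - lam) * k_pred
    return (1 - mask) * k_pred + mask * blended
```

Because DC is element-wise and differentiable, it can be inserted between network blocks and trained through end-to-end.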
Cascaded deep learning models basically consist of a stack of convolutional layers. However, as the number of convolutional layers increases, the influence of early layers on subsequent layers decreases, which makes the training process more difficult. In order to overcome this problem, Zeng et al. [zengK2019] proposed a very deep network that combines sub-networks through dense connections. Each sub-network generates a reconstructed MR image using the information of the previous sub-network. The sub-networks are composed of convolutional and DC layers. The dense connections are expected to help with the vanishing gradient problem and the training process, improving the overall reconstruction performance of the network.
A commonly perceived problem with CS MR reconstruction techniques is the loss of high-frequency information, which happens due to two factors: 1) most k-space sampling schemes favor sampling the low frequencies more densely, i.e., the high frequencies are less densely sampled; and 2) commonly used network loss functions, such as the $\ell_2$ norm, tend to give smooth reconstructions. Adversarial models try to mitigate this problem by including a new term in the reconstruction model (generator) cost function based on the capacity of a properly trained classifier (discriminator) to distinguish between a fully sampled inverse FT reconstruction and a CS accelerated MR reconstruction. Yang et al. [RN307] proposed a deep de-aliasing generative adversarial network (DAGAN) that used a residual U-net as generator with a loss function composed of four different components: an image domain loss, a frequency domain loss, a perceptual loss, and an adversarial loss. Quan et al. [RN305] proposed an adversarial model with a cyclic loss [RN315]. Their method consisted of a reconstruction network cascaded with a refinement network, governed by a cyclic loss component that tried to enforce that the mapping between input (sampled k-space) and output (reconstructed image) was a bijection, i.e., invertible. Mardani et al. [mardani2019deep] proposed a generative adversarial network for compressed sensing (GANCS) that tried to model the low-dimensional manifold of high-quality MR images by leveraging a mixture of a least-squares generative adversarial network and a pixel-wise cost.

The work of Dedmari et al. [dedmari2018complex] is, to the best of our knowledge, the only study that implemented a complex-valued fully convolutional neural network (CNN) for MR reconstruction. The main advantage of their work was that they took full advantage of complex-number arithmetic, as opposed to the other techniques, which represent complex values by splitting the real and imaginary components into separate image channels. Unfortunately, complex-valued neural network implementations are still in their infancy, and deep learning frameworks do not provide support for defining complex networks, which is the reason we did not use them in this work.
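The distinction between native complex arithmetic and the common two-channel workaround can be made concrete with complex multiplication. This is a toy illustration, not the authors' implementation: the two-channel form must explicitly encode the cross-terms that complex arithmetic provides for free.

```python
import numpy as np

def complex_mul(a, b):
    # native complex arithmetic: the framework handles the cross-terms
    return a * b

def two_channel_mul(a_ri, b_ri):
    # a_ri, b_ri: arrays of shape [..., 2] holding (real, imag) channels,
    # the representation used when complex support is unavailable
    ar, ai = a_ri[..., 0], a_ri[..., 1]
    br, bi = b_ri[..., 0], b_ri[..., 1]
    return np.stack([ar * br - ai * bi,   # real part
                     ar * bi + ai * br],  # imaginary part
                    axis=-1)
```

A complex-valued network layer generalizes this idea to weights and activations, which is why framework support is needed for anything beyond such hand-written operations.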
We would like to emphasize that even though some of the image domain learning models described have DC blocks or a frequency-domain term in the network training loss function, the learning portion of these models happens in the image domain. Therefore, we did not classify these models as hybrid.
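The adversarial formulations discussed in this subsection share a common structure: the generator cost is a weighted sum of a pixel-wise term and additional terms. The sketch below is a hedged, DAGAN-style composite loss; the weights are placeholders, and the perceptual and adversarial components are passed in as precomputed scalars rather than implemented, since their exact forms vary between the cited papers.

```python
import numpy as np

def dagan_style_loss(x_rec, x_ref, perc_term, adv_term,
                     w_img=1.0, w_freq=1.0, w_perc=1.0, w_adv=1.0):
    """Weighted sum of image-domain, frequency-domain, perceptual, and
    adversarial components (weights are illustrative, not from the papers)."""
    img = np.mean(np.abs(x_rec - x_ref) ** 2)                 # image domain
    freq = np.mean(np.abs(np.fft.fft2(x_rec) -
                          np.fft.fft2(x_ref)) ** 2)           # frequency domain
    return w_img * img + w_freq * freq + w_perc * perc_term + w_adv * adv_term
```

The frequency-domain term penalizes errors that are small in image space but concentrated in particular spatial frequencies, which is one way such models counteract the smoothing tendency of the pixel-wise term.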
I-B. Sensor Domain (K-space) Learning
The work of Zhang et al. [zhang2018multi] followed the trend of using adversarial models. The authors proposed a MC generative adversarial network for MR reconstruction in the k-space domain. They tested their approach on 8-channel data using coherent sampling. Their network output estimated the fully sampled k-spaces for each channel. Their final reconstruction was obtained by taking the channel-wise inverse FT and combining the channels through sum of squares [larsson2003snr]. Akçakaya et al. [akccakaya2019scan] proposed a scan-specific model for k-space interpolation that was trained on the autocalibration signal. Their model outperformed GRAPPA, especially at higher acceleration factors. Kim et al. [kim2019loraki] proposed a similar approach, but using a recurrent neural network model. In their experiments, they outperformed the model proposed in [akccakaya2019scan].

I-C. Domain Transform Learning
Zhu et al. [RN289] proposed to learn the manifold of the transform that connects the sampled k-space and image domains. Their technique is called automated transform by manifold approximation (AUTOMAP). Their model has quadratic parameter complexity, which, due to hardware limitations, did not allow them to train their model for images of dimensions greater than pixels. Subsequent work by Schlemper et al. [dautomap] proposed a decomposed AUTOMAP (dAUTOMAP). Instead of learning a two-dimensional transform, they decomposed it into two one-dimensional transforms, which made their model's parameter complexity linear. In their comparison, dAUTOMAP outperformed AUTOMAP. A somewhat similar approach, which looks into the translation of the one-dimensional inverse FT of k-space to an image, was investigated in [RN323]. The authors only compared their proposal against traditional CS reconstruction models and demonstrated superior results.
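The parameter-complexity argument can be checked with simple arithmetic. For an n-by-n image, a fully connected AUTOMAP-style transform maps all n² k-space points to all n² image points, while the dAUTOMAP decomposition uses two one-dimensional n-to-n transforms applied row- and column-wise. The counts below cover only these transform layers (biases and any convolutional stages are ignored), so they are illustrative rather than the papers' exact totals.

```python
def automap_params(n):
    # one dense layer mapping n*n k-space points to n*n image points
    return (n * n) ** 2          # quadratic in the number of pixels n^2

def dautomap_params(n):
    # two 1-D transforms, each an n-to-n dense map applied row/column-wise
    return 2 * n * n             # linear in the number of pixels n^2

n = 256
print(automap_params(n))   # 4294967296 weights
print(dautomap_params(n))  # 131072 weights
```

At n = 256, the dense transform alone would need over four billion weights, which makes the hardware limitation reported for AUTOMAP, and the appeal of the decomposition, easy to see.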
I-D. Hybrid Learning
Hybrid models leverage information as presented in the k-space and image domains without trying to learn the domain transform, making the parameter complexity more manageable. A previous study [souza2018hybrid] proposed a hybrid model consisting of a k-space U-net connected to an image domain U-net through the inverse FT. The model was trained end-to-end; however, it did not have DC steps and was assessed only on single-coil data. Eo et al. [eo2018kiki] developed a dual-domain model named KIKI-net that cascaded k-space domain networks with image domain networks, interleaved by DC layers and the appropriate domain transform. A similar approach has also been used for computed tomography reconstruction [adler2018learned]. A further investigation of KIKI-net [souza19a] looked at other possible domain configurations for the sub-networks in the cascade, and their results indicated that starting the cascade with an image domain sub-network may be advantageous.
II. Materials and Methods
II-A. Dataset
One hundred and eleven volumetric T1-weighted, partially Fourier-encoded, hybrid datasets were consecutively acquired as part of the ongoing Calgary Normative Study [tsang2017white]. Data were acquired on a clinical MR scanner (Discovery MR750; General Electric Healthcare, Waukesha, WI) with a 12-channel coil. A three-dimensional, T1-weighted, gradient-recalled echo, sagittal acquisition was employed on presumed healthy subjects (age: years [mean ± standard deviation]; range: years to years). Acquisition parameters were TR/TE/TI = ms/ ms/ ms (92 scans) and TR/TE/TI = ms/ ms/ ms (19 scans), with to contiguous mm slices and a field of view of mm × mm. The acquisition matrix size for each channel was . In the slice-encoded direction, data were partially collected up to and then zero filled to . The scanner automatically applied the inverse FT, using fast Fourier transform (FFT) algorithms, to the k-space data in the frequency-encoded direction, so a hybrid dataset was saved. K-space undersampling was then performed retrospectively in the two remaining directions (corresponding to the phase-encoding and slice-encoding directions). Note that the reconstruction problem is therefore effectively a two-dimensional problem in the plane defined by these two directions. The partial Fourier data were reconstructed by taking the channel-wise iFFT of the collected k-spaces and combining the outputs through the conventional sum of squares algorithm, which has been shown to be optimal in terms of signal-to-noise ratio for the reconstruction of MC MR [larsson2003snr]. The reconstructed spatial resolution was mm.
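The channel-wise inverse FFT plus sum-of-squares combination described above can be sketched as follows; the coils-first array layout is an assumption for illustration.

```python
import numpy as np

def rss_reconstruction(kspaces):
    """Channel-wise 2-D inverse FFT followed by root-sum-of-squares (RSS)
    channel combination. `kspaces` has shape (n_coils, n1, n2)."""
    coil_images = np.fft.ifft2(kspaces, axes=(-2, -1))  # per-channel iFFT
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
```

Because the magnitude is taken per channel before summing, RSS needs no coil sensitivity estimates, which is what makes it a convenient reference reconstruction.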
The acquired data were used to train, validate, and test the proposed SC and MC deep learning reconstruction models. The raw dataset used in this work is publicly available for benchmarking purposes as part of the Calgary-Campinas dataset [RN136] (https://sites.google.com/view/calgary-campinas-dataset/home).
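Retrospective undersampling of the stored hybrid data amounts to applying a binary mask over the two encoded directions. The sketch below substitutes uniform random sampling for the Poisson-disc scheme used in the experiments described later, but it mirrors the idea of a fully sampled central disc that preserves the low frequencies; the radius value is only illustrative.

```python
import numpy as np

def undersampling_mask(n_ky, n_kz, R, center_radius=16, seed=0):
    """Binary retrospective undersampling mask in the phase/slice-encode
    plane: random sampling at rate 1/R, with a fully sampled central disc
    (uniform sampling is a simplification of the Poisson-disc scheme)."""
    rng = np.random.default_rng(seed)
    mask = rng.random((n_ky, n_kz)) < (1.0 / R)
    # distance of each position from the k-space centre
    ky, kz = np.meshgrid(np.arange(n_ky) - n_ky // 2,
                         np.arange(n_kz) - n_kz // 2, indexing="ij")
    mask[ky ** 2 + kz ** 2 <= center_radius ** 2] = True  # keep low freqs
    return mask.astype(np.float32)
```

Multiplying a fully sampled k-space by such a mask, and regenerating the mask each epoch, reproduces the retrospective undersampling and data augmentation strategy described in the experimental setup.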
II-B. Cascade of U-net Models
Let $\mathbf{y} \in \mathbb{C}^{n_1 \times n_2 \times n_c}$ represent the fully sampled k-spaces, one for each of the $n_c$ coil channels, each of size $n_1 \times n_2$ pixels. The fully sampled reconstruction is given by:

$$\mathbf{x} = \mathcal{F}^{-1}(\mathbf{y}), \qquad (1)$$

where $\mathcal{F}^{-1}$ is the two-dimensional inverse FT operator applied across each channel component of the multidimensional array. The input for our model is the undersampled and zero-filled set of measurements, which can be conveniently defined by:

$$\mathbf{y}_u = \mathbf{M} \odot \mathbf{y}, \qquad (2)$$

where $\odot$ is the element-wise multiplication and $\mathbf{M}$ represents the sampling function defined by:

$$M(k_1, k_2) = \begin{cases} 1, & (k_1, k_2) \in \Omega \\ 0, & \text{otherwise,} \end{cases} \qquad (3)$$

where $\Omega$ is the set of k-space positions sampled. Our models consist of cascades of U-nets, where each U-net block operates either on the k-space or the image domain. The k-space domain U-net ($f_K$) is:

$$f_K(\mathbf{z}) = (1 - \mathbf{M}) \odot U_K(\mathbf{z}) + \mathbf{y}_u, \qquad (4)$$

and the image domain U-net ($f_I$) is:

$$f_I(\mathbf{z}) = (1 - \mathbf{M}) \odot \mathcal{F}\left(U_I\left(\mathcal{F}^{-1}(\mathbf{z})\right)\right) + \mathbf{y}_u. \qquad (5)$$

In these equations, $\mathbf{z}$ represents a generic input in the k-space domain, and $U_K$ and $U_I$ are the k-space and image domain U-nets. The right-hand sides of Equations 4 and 5 enforce DC for the k-space positions measured during the sampling process. This DC implementation considers a noiseless setting. Another common implementation consists of linearly combining the outputs predicted by the network with the values measured during sampling, based on an estimated noise level [eo2018kiki, RN306]. Our final cascade of U-nets model is given by:

$$\hat{\mathbf{x}} = \left| \mathcal{F}^{-1}\left( (f_n \circ f_{n-1} \circ \cdots \circ f_1)(\mathbf{y}_u) \right) \right|, \qquad (6)$$

where each $f_j$ is either $f_K$ or $f_I$, and $\hat{\mathbf{x}}$ is the reconstruction estimated by the model. The loss function used to train the model was simply the mean squared error:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{x}^{(i)} - \hat{\mathbf{x}}^{(i)} \right\|_2^2, \qquad (7)$$

where $N$ is the number of samples used to compute the loss and the superscript $(i)$ indicates a sample in this set.
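The data flow of the cascade with interleaved DC can be sketched in a few lines of numpy. The U-nets are replaced by identity placeholders here, so this illustrates only the alternation of domains and the DC steps, not any learned behavior.

```python
import numpy as np

ft  = lambda z: np.fft.fft2(z, axes=(-2, -1))   # image -> k-space
ift = lambda z: np.fft.ifft2(z, axes=(-2, -1))  # k-space -> image

def k_block(z, y_u, mask, unet_k=lambda z: z):
    # k-space U-net followed by noiseless DC (Equation 4 form)
    return (1 - mask) * unet_k(z) + y_u

def i_block(z, y_u, mask, unet_i=lambda x: x):
    # image-domain U-net, mapped back to k-space for DC (Equation 5 form)
    return (1 - mask) * ft(unet_i(ift(z))) + y_u

def wnet_ik(y_u, mask):
    # an "IK" configuration: image-domain block first, then k-space block
    z = i_block(y_u, y_u, mask)
    z = k_block(z, y_u, mask)
    return np.abs(ift(z))  # magnitude of the final image-domain estimate
```

Deeper configurations (e.g., a four-element IKIK cascade) are obtained by composing more `i_block`/`k_block` calls, always threading through the same measured `y_u` and `mask`.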
II-C. Deep Learning Models
Four different models were first investigated in this study. The two-element cascade of U-nets, termed W-net, was tested using all four possible domain configurations: a) W-net II, b) W-net KK, c) W-net IK, and d) W-net KI. The U-net model (Figure 2) used in this work is a modified version of the originally proposed U-net [RN196] and was designed empirically. The modification was made because, when designing our model, we noticed that a network with fewer convolutions and convolutional layers yielded similar results compared to more complex models. Our U-net has 22 convolutional layers and 3,000,674 trainable parameters for the SC configuration and 3,011,156 trainable parameters for the MC configuration. The W-net models consist of two cascaded U-nets and thus have twice as many convolutional layers and trainable parameters.
In the second stage, the best-performing W-net was identified and concatenated with itself to form a four-element WW-net model. We compared the WW-net against the four previously described models. The WW-net model consists of four cascaded U-nets and thus has four times as many convolutional layers and trainable parameters as the basic U-net.
In addition, we compared our W-net and WW-net results against the previously published Deep Cascade method [RN306]. We implemented Deep Cascade using six sub-networks, each with five convolutional layers with filters and a final convolutional layer that maps back to the number of channels of the input, i.e., either 2 or 24 depending on the model. Our Deep Cascade implementation had 894,348 trainable parameters for the SC configuration and 978,960 for the MC configuration. We chose to compare our approaches against Deep Cascade because recent work (cf., [RN306, souza19a]) has demonstrated its superior performance compared to other recently published deep-learning-based MR image reconstruction techniques, such as Dictionary Learning MR Imaging [ravishankar2010mr], DAGAN [RN307], KIKI-net [eo2018kiki], and the networks discussed in [RN305, souza2018hybrid]. We used our own implementation of Deep Cascade because the original implementation provided by the authors only worked in the SC configuration.
II-D. Experimental Setup and Implementation
Each of the six networks described in the previous subsection was trained four times: once each for the SC and MC configurations, and for each of two different acceleration factors, $R$, where $R$ is the reciprocal of the fraction of k-space that was sampled. This resulted in a total of 24 trained models. All models were trained from scratch over 50 epochs using the Adam optimizer [kingma2014adam] with a learning rate of and decay of . Network training was interrupted if the cost function did not improve for five consecutive epochs; some networks completed training prior to 50 epochs, which provided the rationale for our chosen number of epochs. Forty-three volumes (consisting of 11,008 slices) were used for training, eighteen volumes (4,608 slices) for model selection (validation), and 50 volumes (12,800 slices) for testing. A Poisson disc distribution sampling scheme [cook1986stochastic] in the undersampling plane was used, where the center of k-space was fully sampled within a circle of radius 16 to preserve the low-frequency information. The radius of 16 was determined experimentally. During training, the sampling patterns were randomly generated on each epoch for data augmentation purposes. The deep learning reconstructions were compared against the fully sampled partial Fourier reconstruction reference. The best SC and MC models were also assessed over a range of acceleration factors.

Our reconstruction models were implemented in Python 3 using the Keras library (https://keras.io/) with TensorFlow (https://www.tensorflow.org/) as the backend. Training, validation, and testing were performed on a seventh generation Intel Core i7 processor with 16 GB of RAM and a GTX 1070 graphics processing unit (GPU). The code is publicly available at https://github.com/rmsouza01/CD-Deep-Cascade-MR-Reconstruction.

II-E. Performance Metrics and Statistical Analysis
The reconstructed images were assessed both qualitatively (visual assessment) and quantitatively (performance metrics). Qualitative assessment consisted of a single blinded expert (NN) reviewing the resulting images and assessing image artifact. Quantitatively, images were assessed using two commonly used image reconstruction performance metrics: normalized root mean squared error (NRMSE) and peak signal-to-noise ratio (pSNR). We also used the visual information fidelity (VIF) [sheikh2006image] metric, which has been shown to have a strong correlation with radiologist opinion when rating MR image quality [mason2019comparison]. Lower NRMSE represents better reconstructions, while higher pSNR and VIF represent better reconstructions. Where appropriate, mean ± standard deviation values are reported. Because the metrics did not follow a normal distribution, statistical significance between the experimental network models was determined using a non-parametric Friedman chi-squared test. Post-hoc testing to assess specific pairwise differences was performed using Dunn's test with Bonferroni correction. A threshold value was used as the level of statistical significance. Processing times of the SC and MC configurations were measured across two hundred and fifty-six (256) image slices using the hardware previously described (see Section II-D). The average reconstruction time per slice for each of the models is reported.
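The exact definitions of the scalar metrics are not spelled out in the text; the forms below follow common conventions and may differ in normalization or peak-value choice from the authors' implementation.

```python
import numpy as np

def nrmse(ref, rec):
    """Root mean squared error normalized by the reference dynamic range
    (one common convention; normalizing by the mean or norm also occurs)."""
    rmse = np.sqrt(np.mean((ref - rec) ** 2))
    return rmse / (ref.max() - ref.min())

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB, with the peak taken as ref.max()."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)
```

VIF is considerably more involved (it is defined via a wavelet-domain natural scene statistics model [sheikh2006image]) and is omitted here.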
III. Results
A range of slices towards the edges of the three-dimensional acquisition volumes did not contain anatomy and were essentially noise. Although qualitatively agreeing with the noise properties of the reference image (Supplementary Figure 2), reconstruction of these edge slices resulted in large changes in the residual maps; these slices were therefore excluded from the quantitative image analysis, leaving a total of slices in the test set.
The metrics for the SC configuration reconstruction are summarized in Table II. Statistically significant differences were found between the group means (). Post-hoc testing indicated that Deep Cascade had the overall best metrics for , although the differences were small when compared with WW-net IIII. For , WW-net IIII obtained the best results. In the SC configuration, across all experiments, image domain learning methods had superior results in the quantitative analysis, followed by hybrid models and then the k-space-only model.
The performance metrics for the MC configuration reconstruction for and are summarized in Table III. Statistically significant differences were observed between group means (). Post-hoc testing indicated that WW-net IKIK had the overall best metrics for both acceleration factors. In the MC configuration, across all experiments, hybrid-domain learning methods had superior results in the quantitative analysis.
Model  NRMSE  pSNR (dB)  VIF
W-net II
W-net KK
W-net IK
W-net KI
WW-net IIII
Deep Cascade
W-net II
W-net KK
W-net IK
W-net KI
WW-net IIII
Deep Cascade
Model  NRMSE  pSNR (dB)  VIF
W-net II
W-net KK
W-net IK
W-net KI
WW-net IKIK
Deep Cascade
W-net II
W-net KK
W-net IK
W-net KI
WW-net IKIK
Deep Cascade
Representative sample reconstructed images using the SC and MC configurations for each of the two acceleration factors are depicted in Figures 3 and 4, respectively. Visual assessment of the reconstructed images showed noticeable reconstruction artifacts, particularly with the SC configuration and the W-net KK model. Artifacts are more noticeable at the higher acceleration factor.
The arguably best SC model was WW-net IIII and the best MC model was WW-net IKIK. These two models were trained and tested over a range of acceleration factors. The average NRMSE, pSNR, and VIF results are depicted in Figure 5. On average, the MC WW-net IKIK decreased NRMSE and increased pSNR and VIF relative to the SC WW-net IIII, and these differences were statistically significant. Representative reconstructions for each acceleration factor in the SC and MC configurations are depicted in Figures 6 and 7, respectively.
The average reconstruction times for each of the models assessed are reported in Table IV. The SC configuration was slower because it reconstructed each of the 12 channels independently prior to combining them through sum of squares. The slowest model in the SC configuration was WW-net IIII, which took 452.4 ms to reconstruct each slice. The second slowest was Deep Cascade, followed by the W-net models. Our MC configuration implementation was not optimal, especially the portion that computes the channel-wise FT (FFT or iFFT), which was implemented through a slow interpreted loop in Python. Even so, the slowest MC model required only 62.4 ms to reconstruct a slice.
Model  SC (ms)  MC (ms)
W-net II  222.0  39.2
W-net KK  222.0  34.4
W-net IK  222.0  38.8
W-net KI  222.0  36.4
WW-net IIII/IKIK  452.4  57.0
Deep Cascade  400.8  62.4
IV. Discussion
Our experiments indicated that cascades of U-nets can improve CS MR reconstruction. In our comparison with the Deep Cascade method, which is composed of six flat unrolled sub-networks, our WW-net model (composed of a cascade of four U-nets) achieved statistically significantly better results in three out of four experiments (one SC experiment was the exception). In the MC configuration, Deep Cascade was also outperformed by the hybrid W-net models (IK and KI) in terms of pSNR and NRMSE. These results are not surprising, since U-nets are more flexible models that work across different scales compared to flat convolutional neural networks. Also, a single U-net had previously been shown to be superior to a flat CNN model for MR reconstruction when using architectures that produced the same number of feature maps [RN253]. WW-net has more trainable parameters than Deep Cascade. Nevertheless, WW-net and Deep Cascade had similar processing times: the Deep Cascade implementation was faster than WW-net in the SC configuration by 51.6 ms per slice reconstructed, while it was slower in the MC configuration by 5.4 ms per slice (Table IV).
In all four experiments, W-net IK models slightly outperformed W-net KI models, indicating that it may be advantageous to start the cascade with an image domain network. A similar result using flat unrolled structures was reported in [souza19a]. This finding can be explained by the fact that the high frequencies of k-space are less densely sampled, potentially resulting in regions where the convolutional kernels have no signal to operate upon. By starting with an image domain CNN block, and because of the global property of the FT, the output of this first network has a corresponding k-space that is complete, effectively mitigating the problem of regions with no samples for the convolution kernels to operate on.
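This "global property of the FT" argument can be verified numerically: applying any non-trivial image-domain operation (here, simply taking the magnitude of the zero-filled reconstruction) generically repopulates the zeroed k-space regions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))
mask = (rng.random((8, 8)) < 0.3).astype(float)

k_u = mask * np.fft.fft2(x)        # undersampled k-space: zeros where unsampled
img = np.abs(np.fft.ifft2(k_u))    # one (nonlinear) image-domain operation
k_after = np.fft.fft2(img)         # its k-space is (generically) fully populated

print(np.count_nonzero(k_u), "of", k_u.size, "k-space samples nonzero before")
print(np.count_nonzero(np.abs(k_after) > 1e-12), "of", k_after.size, "after")
```

A subsequent k-space block therefore always has signal to convolve over, which is the mechanism the paragraph above attributes to the IK ordering.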
The SC configuration results indicated that image domain learning methods are better suited for the reconstruction of this kind of data, followed by hybrid learning approaches and then sensor (k-space) learning models. W-net KK produced clear blurring and ringing artifacts that made it especially difficult to distinguish the transition between white-matter and gray-matter tissue (Figures 3 and 4). The ranking of techniques for the MC configuration was different: hybrid learning models achieved the best quantitative metrics. The observed performance boost of hybrid and sensor (k-space) learning methods can be attributed to the fact that the k-space (sub-)networks learn not just potential correlations within the same k-space, but also correlations present across coil channels. Correlations in k-space across channels are known to be strong and are the underlying basis of PI techniques [grappa].
The results of the MC configuration experiments were superior to the SC configuration results (Tables II and III): the best MC configuration reduced NRMSE and increased pSNR and VIF compared to the best SC model. This observation is explained by the fact that the MC configuration looks at all channels simultaneously. The advantage of the SC configuration is its flexibility: a properly trained SC network can work for an arbitrary number of coil channels (see Supplementary Figure 3 for an example using a 32-channel coil). In contrast, for the MC configuration, it is necessary to train one model for every coil-channel configuration. The models trained using the MC configuration were several times faster than the SC configuration models, and that difference is expected to increase when using a larger number of channels. Nonetheless, by using modern GPUs and by optimizing the reconstruction code, both models have the potential for online MR reconstruction, i.e., while the patient is still in the scanner.
Visual inspection of the images agrees with the quantitative results. Visual differences between SC and MC reconstructions are more noticeable at higher acceleration factors (Figure 4). The VIF metric, which is correlated with radiologist assessment of image quality [mason2019comparison], showed a larger difference between the SC and MC configurations at higher acceleration factors (Figure 5). Assuming that an image with a VIF above a given threshold is good enough to be incorporated into the clinical setting, the MC configuration would allow higher acceleration factors than the SC configuration (Figure 5) when using a 12-channel coil. The use of more sophisticated coils with more elements could potentially allow for further acceleration.
V. Conclusions
In this work, we investigated cascades of U-nets across different domain configurations for MR reconstruction. Two different configurations were investigated: the SC and the MC configurations. Our results indicate that image domain learning approaches are advantageous when processing channels independently (SC configuration), while hybrid approaches are better when reconstructing all channels simultaneously (MC configuration). The MC configuration also proved to be considerably faster than the SC configuration, with a speed difference proportional to the number of coil channels. The SC configuration, however, is more flexible than the MC configuration because it is independent of the number of coil channels. Our WW-net (IIII for SC and IKIK for MC) method outperformed the state-of-the-art Deep Cascade method in three of the four comparisons. Unlike previous studies (cf. [souza2018hybrid, eo2018kiki]), our investigations indicated that starting the cascade of U-nets with an image domain network for MC data leads to better results. Future studies should investigate the SC and MC configurations using a range of different coils (e.g., 4-channel, 8-channel, etc.). Also, the optimal domain configuration for the sub-networks that compose the cascade of U-nets, which is a search problem that grows exponentially ($2^n$, where $n$ is the number of sub-networks), is not yet known.