Dual-domain Cascade of U-nets for Multi-channel Magnetic Resonance Image Reconstruction

11/04/2019 ∙ by Roberto Souza, et al.

The U-net is a deep-learning network model that has been used to solve a number of inverse problems. In this work, cascades of two U-nets, termed W-nets, with each element operating in either the k-space (K) or the image (I) domain, were evaluated for multi-channel magnetic resonance (MR) image reconstruction. The two-element networks were evaluated for the four possible image/k-space domain configurations: a) W-net II, b) W-net KK, c) W-net IK, and d) W-net KI. Selected promising four-element networks (WW-nets) were also examined. Two configurations of each network were compared: 1) each coil channel processed independently, and 2) all channels processed simultaneously. One hundred and eleven volumetric, T1-weighted, 12-channel coil k-space datasets were used in the experiments. Normalized root mean squared error, peak signal-to-noise ratio, visual information fidelity and visual inspection were used to assess the reconstructed images against the fully sampled reference images. Our results indicated that networks that operate solely in the image domain are better suited when processing individual channels of multi-channel data independently. Dual-domain methods are more advantageous when simultaneously reconstructing all channels of multi-channel data. Also, the appropriate cascade of U-nets compared favorably (p < 0.01) to the previously published, state-of-the-art Deep Cascade model in three out of four experiments.




I Brief Literature Review

The idea of applying machine learning to MR reconstruction is not new. Nearly thirty years ago, for example, neural networks were investigated for minimizing Gibbs artifacts resulting from k-space truncation [yan1993data, hui1995mri, hui1995comments]. More recently, hardware and software advancements have allowed the training of advanced models, and by 2016 the first deep learning models were being investigated [sun2016deep, wang2016accelerating]. Since then, the field of deep-learning-based MR reconstruction has grown rapidly. Several deep-learning models have been proposed for compressed sensing (CS) MR reconstruction; however, most have been validated using private datasets and in a single-channel (SC) acquisition setting. In late 2018, the fastMRI initiative [zbontar2018fastmri] made SC and multi-channel (MC) knee raw MR data available for benchmarking purposes. The Calgary-Campinas initiative [RN136] has also made SC brain MR raw data publicly available. With this report, we provide access to MC data.

Deep-learning-based MR reconstruction models can be categorized into four groups (Figure 1):

  1. Image domain learning uses, as its starting point, an image obtained by inverse Fourier transforming the zero-filled k-space. It uses a deep learning model that operates solely in the image domain;

  2. Sensor domain (k-space) learning uses a model operating solely in the acquisition domain, which is the spatial-frequency domain in the case of MR imaging. It tries to estimate the missing k-space samples and then applies the inverse Fourier transform (FT) to reconstruct the final image;

  3. Domain transform learning methods try to learn the appropriate transform directly from the sparsely sampled k-space data in order to generate alias-free image-domain reconstructions; and

  4. Hybrid (sensor and image domain) learning comprises blocks that process the data in both the sensor (k-space) and image domains. These blocks are connected through the appropriate FT (i.e., direct or inverse).
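As a rough illustration of where the learned component sits in groups 1, 2 and 4, the following NumPy sketch uses identity functions as stand-ins for trained networks; `net_image`, `net_kspace` and `ifft2c` are hypothetical names, not taken from any of the cited works. With identity networks, all three pipelines reduce to the zero-filled reconstruction, which still differs from the fully sampled image because of the missing samples. A domain transform learning method (group 3) would instead replace the fixed `ifft2c` call with a learned mapping.

```python
import numpy as np

def ifft2c(k):
    """Inverse 2D FFT over the last two axes (no fftshift, for simplicity)."""
    return np.fft.ifft2(k, axes=(-2, -1))

# Hypothetical placeholders for trained networks: identity maps here.
def net_image(x):
    return x

def net_kspace(k):
    return k

def image_domain_learning(k_zf):
    # Group 1: zero-filled image as input, learning only in the image domain.
    return net_image(ifft2c(k_zf))

def kspace_domain_learning(k_zf):
    # Group 2: estimate missing k-space samples, then a fixed inverse FFT.
    return ifft2c(net_kspace(k_zf))

def hybrid_learning(k_zf):
    # Group 4: k-space block, inverse FFT, then image block.
    return net_image(ifft2c(net_kspace(k_zf)))

rng = np.random.default_rng(0)
k_full = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
mask = rng.random((32, 32)) < 0.3        # keep roughly 30% of k-space
k_zf = np.where(mask, k_full, 0)         # zero-filled measurements

x_ref = ifft2c(k_full)                   # fully sampled reference
x_zf = image_domain_learning(k_zf)
# Relative error caused by zero filling alone (aliasing and lost energy).
print(np.linalg.norm(x_zf - x_ref) / np.linalg.norm(x_ref))
```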

The majority of techniques proposed to date are image domain learning methods (see Table I).

Fig. 1: Groups of deep learning techniques proposed for magnetic resonance image reconstruction: (a) image domain learning, (b) sensor-domain (k-space) learning, (c) domain transform learning, and (d) hybrid sensor- and image-domain learning. See text for technique descriptions. iFFT = inverse Fast Fourier Transform.
Model Reference
Image domain learning [mardani2019deep, Schlemper2018StochasticDC, dedmari2018complex, qin2019convolutional, semantic_interpretability, xiang2018deep, RN306, RN307, RN305, RN253, wang2016accelerating, kwon2017parallel, hammernik2018learning, han2018deep, zengK2019]
Sensor domain learning [zhang_micccai_2018, akccakaya2019scan, kim2019loraki]
Domain transform learning [RN289, RN323, dautomap]
Hybrid domain learning [souza19a, wang2018, eo2018kiki, souza2018hybrid]

TABLE I: Literature summary classifying key magnetic resonance imaging reconstruction methods into four groups: 1) image domain learning, 2) sensor domain learning, 3) domain transform learning, and 4) hybrid domain learning.

I-A Image Domain Learning

The seminal work of Jin et al. [RN254] proposed using a direct inversion, which in the case of MR reconstruction would be the inverse FT of the zero-filled k-space, followed by a residual U-net to solve normal-convolutional inverse problems. With the residual connection, the network learns the difference between its input and output, which mitigates the vanishing gradient problem that can disturb the training process. Jin et al. tested their model by reconstructing x-ray computed tomography images from synthetic phantoms and real sinograms, but their model is directly extendable to MR reconstruction. In another study, Lee et al. [RN253] compared residual against non-residual U-nets for reconstructing MC MR data. Their results clearly indicated the advantage of using the residual connection, which has subsequently been incorporated into the majority of recently proposed models (cf. [RN307, RN305, RN306, qin2019convolutional, eo2018kiki, mardani2019deep]).

The model proposed by Schlemper et al. [RN306] consists of a flat unrolled deep cascade of convolutional neural networks (CNNs) interleaved with data consistency (DC) operations. DC replaces the network's k-space estimates with the measurements obtained in the sampling process at the sampled positions. For dynamic MR reconstruction, their model also included data sharing layers. Seitzer et al. [semantic_interpretability] built upon [RN306] by adding a visual refinement network, in their case a residual U-net, that was trained independently using the output of the flat unrolled deep cascade as its input. Their results showed improvement in terms of semantic interpretability and mean opinion scores [semantic_interpretability], but the flat unrolled cascade was still better in terms of peak signal-to-noise ratio (pSNR). In a subsequent work, Schlemper et al. [Schlemper2018StochasticDC] added dilated convolutions and a stochastic component to their originally proposed model [RN306]. The dilated convolutions were used to efficiently increase the network receptive field, while the stochastic component consisted of dropping subnetworks of the cascade with a given probability. The authors claimed that this stochastic component accelerated learning, because the network became shorter, and made the model more robust, because each subnetwork could see different levels of residual noise.
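The stochastic component can be pictured as stochastic depth applied to a cascade. The sketch below assumes that a dropped sub-network is simply replaced by the identity for one training iteration and that all sub-networks are used at inference; these details, the function name `run_cascade` and the toy sub-networks are our assumptions for illustration, not details taken from [Schlemper2018StochasticDC].

```python
import numpy as np

def run_cascade(x, blocks, drop_prob=0.0, training=True, rng=None):
    """Apply a cascade of sub-networks, randomly skipping each one during training.

    A skipped block behaves as the identity, so the effective cascade is shorter
    for that iteration. When training is False, every block is applied.
    Returns the output and the indices of the blocks that were applied.
    """
    rng = np.random.default_rng() if rng is None else rng
    applied = []
    for i, block in enumerate(blocks):
        if training and rng.random() < drop_prob:
            continue  # drop this sub-network for the current iteration
        x = block(x)
        applied.append(i)
    return x, applied

# Toy sub-networks: each adds 1, so the output counts how many blocks ran.
blocks = [lambda x: x + 1 for _ in range(5)]
out, applied = run_cascade(np.zeros(3), blocks, drop_prob=0.5,
                           rng=np.random.default_rng(1))
print(applied, out)
```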

Cascaded deep learning models basically consist of a stack of convolutional layers. However, as the number of convolutional layers increases, the influence of early layers on subsequent layers decreases, which makes the training process more difficult. To overcome this problem, Zeng et al. [zengK2019] proposed a very deep densely connected network that combines sub-networks through dense connections. Each sub-network generates a reconstructed MR image using the information from the previous sub-network. The sub-networks are composed of convolutional and DC layers. The dense connections are expected to alleviate the vanishing gradient problem and ease training, improving the overall reconstruction performance of the network.

A commonly perceived problem with CS MR reconstruction techniques is the loss of high-frequency information, which happens due to two factors: 1) most k-space sampling schemes sample the low frequencies more densely, i.e., high frequencies are less densely sampled; and 2) commonly used network loss functions, such as the $\ell_2$ norm, tend to produce smooth reconstructions. Adversarial models try to mitigate this problem by adding a term to the cost function of the reconstruction model (generator) based on the capacity of a properly trained classifier (discriminator) to distinguish between a fully sampled inverse FT reconstruction and a CS-accelerated MR reconstruction. Yang et al. [RN307] proposed a deep de-aliasing generative adversarial network (DAGAN) that used a residual U-net as generator with a loss function composed of four different components: an image domain loss, a frequency domain loss, a perceptual loss, and an adversarial loss. Quan et al. [RN305] proposed an adversarial model with a cyclic loss [RN315]. Their method consisted of a reconstruction network cascaded with a refinement network, governed by a cyclic loss component that tried to enforce that the mapping between input (sampled k-space) and output (reconstructed image) was a bijection, i.e., invertible. Mardani et al. [mardani2019deep] proposed a generative adversarial network for compressed sensing (GANCS) that tried to model the low-dimensional manifold of high-quality MR images by leveraging a mixture of a least-squares generative adversarial network and a pixel-wise cost.
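Two of the four DAGAN loss components can be written without auxiliary networks. The sketch below shows a weighted sum of an image-domain and a frequency-domain mean squared error, with perceptual and adversarial terms passed in as precomputed values; the weights and function names are hypothetical and not taken from [RN307]. The check also makes one property explicit: with NumPy's unnormalized `fft2`, Parseval's theorem makes the unweighted frequency-domain MSE exactly the number of pixels times the image-domain MSE.

```python
import numpy as np

def image_loss(x_hat, x):
    """Mean squared error in the image domain."""
    return np.mean(np.abs(x_hat - x) ** 2)

def frequency_loss(x_hat, x):
    """Mean squared error between the 2D Fourier transforms of prediction and target."""
    return np.mean(np.abs(np.fft.fft2(x_hat) - np.fft.fft2(x)) ** 2)

def composite_loss(x_hat, x, w_img=1.0, w_freq=0.1, other_terms=()):
    """Weighted image and frequency MSE plus optional (weight, value) pairs.

    `other_terms` can carry precomputed perceptual or adversarial losses, which
    require separate networks and are not implemented here.
    """
    total = w_img * image_loss(x_hat, x) + w_freq * frequency_loss(x_hat, x)
    for weight, value in other_terms:
        total += weight * value
    return total

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
x_hat = x + 0.1 * rng.standard_normal((8, 8))
print(composite_loss(x_hat, x, other_terms=[(0.01, 2.0)]))
```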

The work of Dedmari et al. [dedmari2018complex] is, to the best of our knowledge, the only study that implemented a complex-valued fully convolutional neural network (CNN) for MR reconstruction. The main advantage of their work was that it took full advantage of complex-number arithmetic, as opposed to the other techniques, which represent complex values by splitting the real and imaginary components into separate image channels. Unfortunately, complex-valued neural network implementations are still in their infancy, and deep learning frameworks do not provide support for defining complex networks, which is the reason we did not use them in this work.
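The relation between the two representations can be made concrete: a complex-valued convolution equals a fixed combination of four real-valued convolutions of the real and imaginary parts. The 1D NumPy sketch below uses toy sizes and is our own illustration, not the implementation of [dedmari2018complex]. A real-valued layer that takes real and imaginary parts as two input channels can represent these combinations, but is not constrained to them.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)   # complex signal
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)     # complex kernel

def conv(u, v):
    """1D linear convolution; output length is max(len(u), len(v))."""
    return np.convolve(u, v, mode="same")

# Reference: a genuinely complex-valued convolution.
y_complex = conv(x, w)

# Same result from four real-valued convolutions, using
# (a + ib) * (c + id) = (ac - bd) + i(ad + bc).
a, b = x.real, x.imag
c, d = w.real, w.imag
y_split = (conv(a, c) - conv(b, d)) + 1j * (conv(a, d) + conv(b, c))
print(np.abs(y_complex - y_split).max())
```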

We would like to emphasize that, even though some of the image domain learning models described have DC blocks or a frequency-domain term in the network training loss function, the learning portion of these models happens in the image domain. Therefore, we did not classify these models as hybrid.

I-B Sensor Domain (K-space) Learning

The work of Zhang et al. [zhang2018multi] followed the trend of using adversarial models. The authors proposed an MC generative adversarial network for MR reconstruction in the k-space domain. They tested their approach on 8-channel data using coherent sampling. Their network output estimated the fully sampled k-space for each channel, and the final reconstruction was obtained by taking the channel-wise inverse FT and combining the channels through sum of squares [larsson2003snr]. Akçakaya et al. [akccakaya2019scan] proposed a scan-specific model for k-space interpolation that was trained on the autocalibration signal. Their model outperformed GRAPPA, especially at high acceleration factors. Kim et al. [kim2019loraki] proposed a similar approach, but using a recurrent neural network model. In their experiments, they outperformed the model proposed in


I-C Domain Transform Learning

Zhu et al. [RN289] proposed to learn the manifold of the transform that connects the sampled k-space and image domains. Their technique is called automated transform by manifold approximation (AUTOMAP). Their model had quadratic parameter complexity, which, due to hardware limitations, did not allow them to train it on images of dimensions greater than pixels. Subsequent work by Schlemper et al. [dautomap] proposed to decompose AUTOMAP (d-AUTOMAP): instead of learning a two-dimensional transform, they decomposed it into two one-dimensional transforms, which made their model's parameter complexity linear. In their comparison, d-AUTOMAP outperformed AUTOMAP. A somewhat similar approach, which learns the mapping from the one-dimensional inverse FT of k-space to an image, was investigated in [RN323]. The authors compared their proposal only against traditional CS reconstruction models and demonstrated superior results.
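The complexity gap between a dense two-dimensional transform and a decomposed one can be illustrated with the plain 2D discrete Fourier transform standing in for a learned linear map. In the simplified count below (a single complex linear map, not the actual layer layout of AUTOMAP or d-AUTOMAP), a dense transform on an n × n grid needs (n²)² weights, whereas two 1D transforms, one per axis, need only 2n². The check confirms that, for the DFT, the dense and separable forms both reproduce NumPy's `fft2`.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalized 1D DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

def dense_params(n):
    """Complex weights of one dense map from an n x n k-space to an n x n image."""
    return (n * n) ** 2          # quadratic in the number of pixels

def separable_params(n):
    """Complex weights of two 1D n x n transforms (one per image axis)."""
    return 2 * n * n             # linear in the number of pixels

n = 8
rng = np.random.default_rng(0)
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
F = dft_matrix(n)

separable = F @ X @ F.T                                  # 1D transform along each axis
dense = (np.kron(F, F) @ X.reshape(-1)).reshape(n, n)    # one big matrix on the flattened input

print(dense_params(256), separable_params(256))          # 4294967296 131072
```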

I-D Hybrid Learning

Hybrid models leverage information as presented in both the k-space and image domains without trying to learn the domain transform, which keeps the parameter complexity manageable. A previous study [souza2018hybrid] proposed a hybrid model consisting of a k-space U-net connected to an image domain U-net through the inverse FT. The model was trained end-to-end; however, it did not have DC steps and was assessed only on single-coil data. Eo et al. [eo2018kiki] developed a dual-domain model named KIKI-net that cascades k-space domain networks with image domain networks, interleaved with DC layers and the appropriate domain transforms. A similar approach has also been used for computed tomography reconstruction [adler2018learned]. A further investigation of KIKI-net [souza19a] looked at other possible domain configurations for the sub-networks in the cascade, and its results indicated that starting the cascade with an image domain sub-network may be advantageous.

II Materials and Methods

II-A Dataset

One hundred and eleven volumetric T1-weighted partially Fourier-encoded hybrid datasets were consecutively acquired as part of the ongoing Calgary Normative Study [tsang2017white]. Data were acquired on a clinical MR scanner (Discovery MR750; General Electric Healthcare, Waukesha, WI) with a 12-channel coil. A three-dimensional, T1-weighted, gradient-recalled echo, sagittal acquisition was employed on presumed healthy subjects (age: years years [mean standard deviation]; range: years to years). Acquisition parameters were TR/TE/TI = ms/ ms/ ms (92 scans) and TR/TE/TI = ms/ ms/ ms (19 scans), with to contiguous -mm slices and a field of view of mm mm. The acquisition matrix size for each channel was . In the slice-encoded direction (), data were partially collected up to and then zero filled to . The scanner automatically applied the inverse FT, using the fast Fourier transform (FFT) algorithm, to the k-space data in the frequency-encoded direction, so a hybrid dataset was saved. K-space undersampling was then performed retrospectively in two directions (corresponding to the phase encoding, , and slice encoding, , directions). Note that the reconstruction problem is therefore effectively two-dimensional (i.e., in the plane). The partial Fourier data were reconstructed by taking the channel-wise iFFT of the collected k-spaces and combining the outputs through the conventional sum of squares algorithm, which has been shown to be optimal in terms of signal-to-noise ratio for the reconstruction of MC MR data [larsson2003snr]. The reconstructed spatial resolution was mm.
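The data layout described above can be checked on a toy example. The sketch below assumes an array order (coil, kx, ky, kz) and synthetic coil sensitivities whose squared magnitudes sum to one (so that sum of squares returns the object magnitude exactly), and it ignores fftshift and partial Fourier zero filling; all sizes and names are hypothetical. It shows that undersampling in the ky-kz plane commutes with the inverse FFT along the frequency-encoded axis, so each x position is an independent two-dimensional problem, and that the reference image follows from a channel-wise 2D iFFT and sum-of-squares combination.

```python
import numpy as np

def sum_of_squares(coil_images, axis=0):
    """Root-sum-of-squares combination of complex coil images along `axis`."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=axis))

rng = np.random.default_rng(0)
n_c, n_x, n_y, n_z = 4, 8, 6, 5      # toy sizes: coils, kx, ky, kz

# Synthetic object and coil sensitivities normalized so that sum_c |s_c|^2 = 1.
obj = rng.standard_normal((n_x, n_y, n_z)) + 1j * rng.standard_normal((n_x, n_y, n_z))
sens = rng.standard_normal((n_c, n_x, n_y, n_z)) + 1j * rng.standard_normal((n_c, n_x, n_y, n_z))
sens /= np.sqrt(np.sum(np.abs(sens) ** 2, axis=0))
k3d = np.fft.fftn(sens * obj, axes=(1, 2, 3))   # per-coil 3D k-space

# Inverse FFT along the frequency-encoded (kx) axis gives the stored hybrid data.
hybrid = np.fft.ifft(k3d, axis=1)

# Undersampling acts only in the ky-kz plane: one 2D mask shared by all coils and x positions.
mask = rng.random((n_y, n_z)) < 0.5
hybrid_under = hybrid * mask

# Each x position is an independent 2D problem: channel-wise 2D iFFT, then sum of squares.
coil_images = np.fft.ifft2(hybrid, axes=(-2, -1))
recon = sum_of_squares(coil_images, axis=0)
print(recon.shape)
```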

The acquired data were used to train, validate and test the proposed SC and MC deep learning reconstruction models. The raw dataset used in this work is publicly available for benchmark purposes as part of the Calgary-Campinas dataset [RN136] (https://sites.google.com/view/calgary-campinas-dataset/home).

II-B Cascade of U-net Models

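Let $k \in \mathbb{C}^{H \times W \times N_c}$ represent $N_c$ fully sampled k-spaces, one for each coil channel, each of size $H \times W$ pixels. The fully sampled reconstruction is given by:

$$x = \mathcal{F}^{-1}(k), \tag{1}$$

where $\mathcal{F}$ is the two-dimensional FT operator applied across each channel component of the multi-dimensional array. The input for our model is the undersampled and zero-filled set of measurements, which can be conveniently defined by:

$$k_u = M \odot k, \tag{2}$$

where $\odot$ is the element-wise multiplication and $M$ represents the sampling function, applied identically to every channel and defined by:

$$M(u, v) = \begin{cases} 1, & (u, v) \in \Omega, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

and $\Omega$ is the set of k-space positions sampled. Our models consist of cascaded U-nets ($U$), where each U-net block operates either in the k-space or in the image domain. The k-space domain U-net block ($f_K$) is:

$$f_K(\hat{k}) = k_u + (1 - M) \odot U(\hat{k}), \tag{4}$$

and the image domain U-net block ($f_I$) is:

$$f_I(\hat{k}) = k_u + (1 - M) \odot \mathcal{F}\big(U(\mathcal{F}^{-1}(\hat{k}))\big). \tag{5}$$

In these equations, $\hat{k}$ represents a generic input in the k-space domain, and $U$ denotes the U-net of the corresponding block. The right-hand sides of Equations 4 and 5 enforce DC for the k-space positions measured during the sampling process. This DC implementation assumes a noiseless setting. Another common implementation consists of linearly combining the outputs predicted by the network with the values measured during sampling, based on an estimated noise level [eo2018kiki, RN306]. Our final cascade of U-nets model is given by:

$$\hat{x} = \mathcal{F}^{-1}\big(f_{D_n} \circ \cdots \circ f_{D_1}(k_u)\big), \quad D_i \in \{K, I\}, \tag{6}$$

where $\hat{x}$ is the reconstruction estimated by the model and $D_i$ indicates the domain of the $i$-th U-net block. The loss function used to train the model was simply the mean squared error:

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \big\| x^{(i)} - \hat{x}^{(i)} \big\|_2^2, \tag{7}$$

where $N$ is the number of samples used to compute the loss and the superscript $(i)$ indicates a sample in this set.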
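Equations 2 to 7 translate almost line by line into array code. The NumPy sketch below is our illustration, not the released Keras implementation: the U-nets are replaced by arbitrary callables, multi-channel data are stored as (channels, rows, columns) complex arrays, the 2D FFTs are applied per channel without fftshift, and `cascade` builds the composition in Equation 6 from a configuration string such as "IKIK". All function names are hypothetical.

```python
import numpy as np

def fft2c(x):
    """2D FFT applied to every channel (last two axes)."""
    return np.fft.fft2(x, axes=(-2, -1))

def ifft2c(k):
    """2D inverse FFT applied to every channel (last two axes)."""
    return np.fft.ifft2(k, axes=(-2, -1))

def data_consistency(k_pred, k_u, mask):
    """Noiseless DC: measured samples from k_u, network predictions elsewhere."""
    m = np.asarray(mask, dtype=float)
    return k_u + (1.0 - m) * k_pred

def k_block(unet):
    """f_K in Eq. 4: the U-net acts directly on k-space."""
    return lambda k_hat, k_u, mask: data_consistency(unet(k_hat), k_u, mask)

def i_block(unet):
    """f_I in Eq. 5: the U-net acts on the image, then returns to k-space."""
    return lambda k_hat, k_u, mask: data_consistency(fft2c(unet(ifft2c(k_hat))), k_u, mask)

def cascade(config, unets):
    """Eq. 6: compose blocks in order, e.g. config="IKIK" starts with an image block."""
    blocks = [k_block(u) if d == "K" else i_block(u) for d, u in zip(config, unets)]

    def model(k_u, mask):
        k_hat = k_u
        for block in blocks:
            k_hat = block(k_hat, k_u, mask)
        return ifft2c(k_hat)

    return model

def mse_loss(x_batch, x_hat_batch):
    """Eq. 7: average over samples of the squared l2 error."""
    return np.mean([np.sum(np.abs(x - xh) ** 2) for x, xh in zip(x_batch, x_hat_batch)])

rng = np.random.default_rng(0)
n_c, h, w = 12, 16, 16
k_full = rng.standard_normal((n_c, h, w)) + 1j * rng.standard_normal((n_c, h, w))
mask = (rng.random((h, w)) < 0.25).astype(float)   # one mask shared by all channels
k_u = mask * k_full                                 # Eq. 2

identity = lambda z: z                              # stand-in for a trained U-net
x_hat = cascade("IKIK", [identity] * 4)(k_u, mask)
print(mse_loss([ifft2c(k_full)], [x_hat]))
```

With identity stand-ins the cascade returns the zero-filled image, and whatever the blocks compute, the final DC step guarantees that the sampled k-space positions of the output match the measurements.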

II-C Deep Learning Models

Four different models were first investigated in this study. The two-element U-net cascade, termed W-net, was tested using all four possible domain configurations: a) W-net II, b) W-net KK, c) W-net IK, and d) W-net KI. The U-net model (Figure 2) used in this work is a modified version of the originally proposed U-net [RN196] and was designed empirically. The modification was made because, when designing our model, we noticed that a network with fewer convolutions and convolutional layers yielded results similar to those of more complex models. Our U-net has 22 convolutional layers and 3,000,674 trainable parameters for the SC configuration and 3,011,156 trainable parameters for the MC configuration. The W-net models consist of two cascaded U-nets and thus have twice as many convolutional layers and trainable parameters.

In the second stage, the best-performing W-net was identified and concatenated with itself to form a four-element WW-net model. We compared WW-net against the four previously described models. The WW-net model consists of four cascaded U-nets and thus has four times as many convolutional layers and trainable parameters as the basic U-net.

In addition, we compared our W- and WW-net results against the previously published Deep Cascade method [RN306]. We implemented Deep Cascade using six sub-networks, each with five convolutional layers with filters and a final convolutional layer that maps back to the number of channels of the input, i.e., either 2 or 24 depending on the configuration. Our Deep Cascade implementation had 894,348 trainable parameters for the SC configuration and 978,960 trainable parameters for the MC configuration. We chose to compare our approaches against Deep Cascade because recent work (cf. [RN306, souza19a]) has demonstrated its superior performance compared with other published MR image reconstruction techniques, such as Dictionary Learning MR Imaging [ravishankar2010mr], DAGAN [RN307], KIKI-net [eo2018kiki] and the networks discussed in [RN305, souza2018hybrid]. We used our own implementation of Deep Cascade because the original implementation provided by the authors only worked in the SC configuration.

Fig. 2: The base U-net model architecture. The network receives as input either single-channel (SC) or multi-channel (MC) k-space data. This U-net has 22 convolutional layers, three max-pooling layers, three up-sampling layers, and one residual connection. The kernel sizes of the convolutions are , with the exception of the final layer, where we use convolutions.

II-D Experimental Setup and Implementation

Each of the six networks described in the previous subsection was trained four times: once for each combination of the SC and MC configurations and two different acceleration factors. The acceleration factor is the reciprocal of the fraction of k-space that was sampled. In this work we tested and . This resulted in a total of 24 trained models. All models were trained from scratch over 50 epochs using the Adam optimizer [kingma2014adam] with a learning rate of and decay of  . Training was interrupted if the cost function did not improve for five consecutive epochs, and some of the networks completed training before 50 epochs, which provided the rationale for our chosen number of epochs. Forty-three volumes (11,008 slices) were used for training, eighteen volumes (4,608 slices) for model selection (validation), and 50 volumes (12,800 slices) for testing. A Poisson disc distribution sampling scheme [cook1986stochastic] was used in the plane, with the center of k-space fully sampled within a circle of radius 16 to preserve the low-frequency information. The radius of 16 was determined experimentally. During training, the sampling patterns were randomly generated at each epoch for data augmentation purposes. The deep learning reconstructions were compared against the fully sampled partial Fourier reconstruction reference. The best SC and MC models were also assessed for a range of acceleration factors extending between and .
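A sampling pattern of this kind, together with its acceleration factor, can be sketched as follows. The generator below is naive uniform-density dart throwing, which is only one way to produce Poisson-disc patterns; the exact generator used in this work (including any variable-density weighting, grid size and minimum distance) is not specified here, so `min_dist`, the 64 × 64 grid, the attempt budget and the function names are hypothetical. Only the fully sampled central disc of radius 16 follows the text, and a new seed per epoch mimics the per-epoch regeneration used for data augmentation.

```python
import numpy as np

def poisson_disc_mask(shape, min_dist, center_radius=16, n_attempts=5000, seed=None):
    """Binary k-space mask from naive dart-throwing Poisson-disc sampling.

    Candidate points are drawn uniformly over the grid and accepted only if they
    lie at least `min_dist` pixels from every previously accepted point. The
    disc of radius `center_radius` around the k-space center is then fully sampled.
    """
    rng = np.random.default_rng(seed)
    ny, nz = shape
    points = np.empty((n_attempts, 2))
    count = 0
    for _ in range(n_attempts):
        p = rng.uniform((0.0, 0.0), (ny, nz))
        if count == 0 or np.min(np.hypot(*(points[:count] - p).T)) >= min_dist:
            points[count] = p
            count += 1
    mask = np.zeros(shape, dtype=bool)
    rows = np.minimum(points[:count, 0].astype(int), ny - 1)
    cols = np.minimum(points[:count, 1].astype(int), nz - 1)
    mask[rows, cols] = True
    yy, zz = np.mgrid[:ny, :nz]
    mask |= np.hypot(yy - (ny - 1) / 2, zz - (nz - 1) / 2) <= center_radius
    return mask

def acceleration_factor(mask):
    """R = 1 / (fraction of k-space sampled)."""
    return mask.size / np.count_nonzero(mask)

# A new pattern per epoch (different seed) mimics the augmentation described above.
masks = [poisson_disc_mask((64, 64), min_dist=3.0, seed=epoch) for epoch in range(3)]
print([round(acceleration_factor(m), 2) for m in masks])
```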

Our reconstruction models were implemented in Python 3 using the Keras library with TensorFlow (https://www.tensorflow.org/) as the backend. Training, validation and testing were performed on a seventh-generation Intel Core i7 processor with 16 GB of RAM and a GTX 1070 graphics processing unit (GPU). The code is publicly available at https://github.com/rmsouza01/CD-Deep-Cascade-MR-Reconstruction.

II-E Performance Metrics and Statistical Analysis

The reconstructed images were assessed both qualitatively (visual assessment) and quantitatively (performance metrics). Qualitative assessment consisted of a single blinded expert (NN) reviewing the resulting images and assessing image artifacts. Quantitatively, images were assessed using two commonly used image reconstruction performance metrics: normalized root mean squared error (NRMSE) and peak signal-to-noise ratio (pSNR). We also used the visual information fidelity (VIF) [sheikh2006image] metric, which has been shown to correlate strongly with radiologist opinion when rating MR image quality [mason2019comparison].
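For reference, one common way to compute NRMSE and pSNR is sketched below. Normalization conventions vary between papers and toolkits (for example, normalizing by the reference norm versus its intensity range, or using the reference maximum versus a fixed peak), so these definitions are assumptions rather than the exact formulas used in this work. VIF relies on a natural-scene statistics model and is considerably longer to implement, so it is omitted.

```python
import numpy as np

def nrmse(ref, rec):
    """Normalized root mean squared error: ||rec - ref||_2 / ||ref||_2 (lower is better)."""
    return np.linalg.norm(rec - ref) / np.linalg.norm(ref)

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB, with the reference maximum as the peak (higher is better)."""
    mse = np.mean((rec - ref) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

ref = np.ones((4, 4))
rec = ref + 0.1
print(nrmse(ref, rec), psnr(ref, rec))   # 0.1 and 20 dB, up to floating-point rounding
```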

Lower NRMSE represents better reconstructions, while the opposite is true for pSNR and VIF. Where appropriate, mean ± standard deviation values were reported. Because the metrics did not follow a normal distribution, statistical significance between the experimental network models was determined using a non-parametric Friedman chi-squared test. Post-hoc testing to assess specific pairwise differences was performed using Dunn's test with Bonferroni correction. A p-value of  was used as the level of statistical significance.
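The Friedman statistic itself is short to compute. The sketch below ranks the models within each block (for example, one metric value per test slice and model), assuming no ties, and applies the standard formula; the p-value would then come from a chi-squared distribution with k − 1 degrees of freedom (SciPy provides `scipy.stats.chi2.sf` for this, and `scipy.stats.friedmanchisquare` runs the full test). Dunn's post-hoc test is not reproduced here; only the Bonferroni adjustment of the significance level is shown, with α = 0.05 chosen purely for illustration.

```python
import numpy as np

def friedman_statistic(scores):
    """Friedman chi-squared statistic for an (n_blocks, k_treatments) array.

    Rows are blocks (e.g. test slices) and columns are treatments (e.g. models).
    Values are ranked within each row; ties are assumed absent, which is
    reasonable for continuous metrics such as NRMSE.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1   # ranks 1..k in each row
    rank_sums = ranks.sum(axis=0)
    return 12.0 * np.sum(rank_sums ** 2) / (n * k * (k + 1)) - 3.0 * n * (k + 1)

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons

# Every block ranks the three models identically -> maximal statistic n * (k - 1).
print(friedman_statistic([[1.0, 2.0, 3.0]] * 3))    # 6.0
# WW-net versus the other five models, with an illustrative alpha of 0.05.
print(bonferroni_alpha(0.05, 5))
```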

Processing times of the SC and MC configurations were measured across 256 image slices using the hardware previously described (see Section II-D). The average reconstruction time per slice for each model was reported.

III Results

A range of slices towards the edges of the three-dimensional acquisition volumes did not contain anatomy and consisted essentially of noise. Although their reconstructions qualitatively agreed with the noise properties of the reference image (Supplementary Figure 2), these edge slices produced large changes in the residual maps and were thus excluded from the quantitative image analysis, leaving a total of slices in the test set.

The metrics for the SC configuration reconstructions are summarized in Table II. Statistically significant differences were found between the group means (). Post-hoc testing indicated that Deep Cascade had the overall best metrics for , although the differences were small when compared with WW-net IIII. For , WW-net IIII obtained the best results. Across all SC configuration experiments, image domain learning methods had superior quantitative results, followed by hybrid models and then the k-space-only model.

The performance metrics for the MC configuration reconstructions for and are summarized in Table III. Statistically significant differences were observed between group means (). Post-hoc testing indicated that WW-net IKIK had the overall best metrics for both and . Across all MC configuration experiments, hybrid-domain learning methods had superior quantitative results.

Rows (repeated for each acceleration factor): W-net II, W-net KK, W-net IK, W-net KI, Deep Cascade.
TABLE II: Single-channel (SC) configuration: average normalized root mean squared error (NRMSE), peak signal-to-noise ratio (pSNR) and visual information fidelity (VIF) reconstruction results. Mean ± standard deviation is reported. The best results for each acceleration factor are shown in bold. A Friedman chi-squared test determined statistical significance across the six experimental models () for . Post-hoc pairwise Dunn's tests with Bonferroni correction between WW-net IIII and the other five methods were significant for all comparisons at each acceleration factor (). Image domain learning methods achieved the best quantitative results.
Rows (repeated for each acceleration factor): W-net II, W-net KK, W-net IK, W-net KI, Deep Cascade.
TABLE III: Multi-channel (MC) configuration: average normalized root mean squared error (NRMSE), peak signal-to-noise ratio (pSNR) and visual information fidelity (VIF) reconstruction results. Mean ± standard deviation is reported. The best results for each acceleration factor are shown in bold. A Friedman chi-squared test determined statistical significance across the six experimental models () for both acceleration factors. Post-hoc pairwise Dunn's tests with Bonferroni correction between WW-net IKIK and the other five methods were significant for all comparisons at each acceleration factor (). Hybrid learning methods achieved the best quantitative results.

Representative reconstructed images using the SC and MC configurations for and are depicted in Figures 3 and 4, respectively. Visual assessment of the reconstructed images showed noticeable reconstruction artifacts, particularly with the SC configuration and the W-net KK model. Artifacts are more noticeable at .

Arguably, the best SC model was WW-net IIII and the best MC model was WW-net IKIK. These two models were trained and tested for a range of acceleration factors (). The average NRMSE, pSNR and VIF results are depicted in Figure 5. On average, the MC WW-net IKIK decreased NRMSE by and increased pSNR and VIF by and , respectively, compared to the SC WW-net IIII. Differences were statistically significant (). Representative reconstructions for each acceleration factor in the SC and MC configurations are depicted in Figures 6 and 7, respectively.

The average reconstruction times for each of the models assessed are reported in Table IV. The SC configuration was slower because it reconstructed each of the 12 channels independently before combining them through sum of squares. The slowest model in the SC configuration was WW-net IIII, which took 452.4 ms to reconstruct each slice. The second slowest was Deep Cascade, followed by the W-net models. Our MC configuration implementation was not optimal, especially the portion that computes the channel-wise FT (FFT or iFFT), which was implemented through a slow interpreted loop in Python. Despite this sub-optimal implementation, the slowest MC model required 62.4 ms to reconstruct a slice.

Model SC (ms) MC (ms)
W-net II 222.0 39.2
W-net KK 222.0 34.4
W-net IK 222.0 38.8
W-net KI 222.0 36.4
WW-net IIII/IKIK 452.4 57.0
Deep Cascade 400.8 62.4
TABLE IV: Average reconstruction times for the different models across the single-channel (SC) and multi-channel (MC) configurations. The SC configuration is considerably slower because it reconstructs each of the 12 channels independently. In the SC configuration, the reconstruction times roughly double with the depth of the cascade (i.e., W-net versus WW-net). For the MC configuration, the implementation of the channel-wise Fourier transform (direct or inverse) was sub-optimal; therefore, the processing times did not scale with the cascade depth.
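The relative speeds implied by Table IV can be recomputed directly from its entries with a few lines of Python (standard library only); the script uses nothing beyond the values printed in the table.

```python
# Average per-slice reconstruction times from Table IV, in ms: (SC, MC).
times = {
    "W-net II": (222.0, 39.2),
    "W-net KK": (222.0, 34.4),
    "W-net IK": (222.0, 38.8),
    "W-net KI": (222.0, 36.4),
    "WW-net IIII/IKIK": (452.4, 57.0),
    "Deep Cascade": (400.8, 62.4),
}

# How many times faster the MC configuration is than the SC configuration.
speedup = {name: sc / mc for name, (sc, mc) in times.items()}
for name, ratio in speedup.items():
    print(f"{name:<17} MC is {ratio:.1f}x faster than SC")

# Effect of doubling the cascade depth in the SC configuration.
depth_ratio_sc = times["WW-net IIII/IKIK"][0] / times["W-net II"][0]
print(f"SC time ratio, WW-net / W-net: {depth_ratio_sc:.2f}")
```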

IV Discussion

Our experiments indicated that cascades of U-nets can improve CS MR reconstruction. In our comparison with the Deep Cascade method, which is composed of six flat unrolled sub-networks, our WW-net model (a cascade of four U-nets) achieved statistically significantly better results in three out of four experiments (SC, was the exception). In the MC configuration, Deep Cascade was also outperformed by the hybrid W-net models (IK and KI) in terms of pSNR and NRMSE. These results are not surprising, since U-nets are more flexible models that work across different scales when compared to flat convolutional neural networks. Also, a single U-net has been shown to be superior to a flat CNN model for MR reconstruction when using architectures that produce the same number of feature maps [RN253]. WW-net has more trainable parameters than Deep Cascade; nevertheless, the two models had similar processing times. The Deep Cascade implementation was faster than WW-net in the SC configuration by 51.6 ms per slice, while it was slower in the MC configuration by 5.4 ms per slice.

In all of the four experiments, W-net IK models slightly outperformed W-net KI models indicating that it may be advantageous to start the cascade using an image domain network. A similar result using flat unrolled structures was reported in [souza19a]. This finding can be explained by the fact that high frequencies of k-space are less densely sampled, potentially resulting in regions where the convolutional kernel would have no signal to operate upon. By starting with an image domain CNN block and because of the global property of the FT, the output of this network has a corresponding k-space that is now complete, effectively mitigating the problem of regions with no samples for the convolution kernels to operate on.
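The mechanism behind this argument can be probed numerically, with one refinement: a purely linear, shift-invariant image-domain filter corresponds to a point-wise multiplication in k-space and therefore leaves the unsampled positions empty; it is the network's nonlinearities (here a ReLU applied to the real and imaginary channels, an assumption about the activation) that spread energy into them. The NumPy sketch below uses a random image, a random mask and a circular 3 × 3 box filter, all chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
x_true = rng.standard_normal((n, n))
k_full = np.fft.fft2(x_true)

mask = rng.random((n, n)) < 0.3          # keep roughly 30% of k-space
k_u = np.where(mask, k_full, 0)
x_zf = np.fft.ifft2(k_u)                 # zero-filled (complex) image

def unsampled_energy_fraction(x):
    """Fraction of the k-space energy of image x that lies at unsampled positions."""
    k = np.fft.fft2(x)
    return np.sum(np.abs(k[~mask]) ** 2) / np.sum(np.abs(k) ** 2)

# Linear, shift-invariant filter: circular 3x3 box blur, i.e. a product in k-space.
kernel = np.zeros((n, n))
kernel[:3, :3] = 1.0 / 9.0
x_lin = np.fft.ifft2(np.fft.fft2(x_zf) * np.fft.fft2(kernel))

# Point-wise nonlinearity on the real and imaginary channels, as in a real-valued CNN.
relu = lambda z: np.maximum(z, 0.0)
x_nonlin = relu(x_zf.real) + 1j * relu(x_zf.imag)

print(unsampled_energy_fraction(x_zf))      # essentially zero
print(unsampled_energy_fraction(x_lin))     # essentially zero
print(unsampled_energy_fraction(x_nonlin))  # clearly positive
```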

The SC configuration results indicated that image-domain learning methods are better suited for reconstructing this kind of data, followed by hybrid learning approaches and then sensor (k-space) learning models. W-net KK had clear blurring and ringing artifacts that made it especially difficult to distinguish the transition between white-matter and gray-matter tissue (Figures 3 and 4). The ranking of techniques for the MC configuration was different: hybrid learning models achieved the best quantitative metrics. The observed performance boost of hybrid and sensor (k-space) learning methods can be attributed to the fact that the k-space (sub-)networks now learn not just potential correlations within the same k-space, but also correlations present across coil channels. Correlations in k-space across channels are known to be strong and are the underlying basis of parallel imaging (PI) techniques [grappa].

The results of the MC configuration experiments were superior to the SC configuration results (Tables II and III). The best MC configuration metrics reduced NRMSE by and increased pSNR and VIF by and , respectively, compared to the best SC model. This is explained by the fact that the MC configuration looks at all channels simultaneously. The advantage of the SC configuration is its flexibility: a properly trained SC network can work for an arbitrary number of coil channels (see Supplementary Figure 3 for an example using a 32-channel coil). In contrast, the MC configuration requires training one model for every coil-channel configuration. The models trained using the MC configuration were roughly 5.7 to 7.9 times faster than the SC configuration models (Table IV), and that difference is expected to increase with larger numbers of channels. Nonetheless, by using modern GPUs and optimizing the reconstruction code, both models have the potential for online MR reconstruction, i.e., while the patient is still in the scanner.

Visual inspection of the images agrees with the quantitative results. Visual differences between SC and MC reconstructions are more noticeable at higher acceleration factors (Figure 4). The VIF metric, which correlates with radiologist assessment of image quality [mason2019comparison], shows a larger difference between the SC and MC configurations when compared at (Figure 5). Assuming that an image with VIF is good enough to be incorporated into the clinical setting, the SC configuration would allow acceleration factors of up to , while the MC configuration would allow accelerations of up to (Figure 5) when using a 12-channel coil. The use of more sophisticated coils with more elements could potentially allow for further acceleration.

V Conclusions

In this work, we investigated cascades of U-nets across different domain configurations for MR reconstruction. Two different configurations were investigated: the SC and the MC configurations. Our results indicate that image domain learning approaches are advantageous when processing channels independently (SC configuration), while hybrid approaches are better when reconstructing all channels simultaneously (MC configuration). The MC configuration also proved to be considerably faster than the SC configuration, with a speed difference proportional to the number of coil channels. The SC configuration, however, is more flexible than the MC configuration because it is independent of the number of coil channels. Our WW-net (IIII for SC and IKIK for MC) method outperformed the state-of-the-art Deep Cascade method in three of the four comparisons. Unlike previous studies (cf. [souza2018hybrid, eo2018kiki]), our investigations indicated that starting the cascade of U-nets with an image domain network for MC data leads to better results. Future studies should investigate the SC and MC configurations using a range of different coils (e.g., 4-channel, 8-channel, etc.). Also, the optimal domain configuration for the sub-networks that compose the cascade of U-nets, a problem that grows exponentially ($2^n$, where $n$ is the number of sub-networks), is not yet known.
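The exponential growth follows because each of the n blocks can operate in either domain and the order of the blocks matters. Enumerating the configurations makes this explicit; the function name below is hypothetical.

```python
from itertools import product

def domain_configurations(n_blocks):
    """All 'I'/'K' domain assignments for a cascade of n_blocks U-nets (order matters)."""
    return ["".join(c) for c in product("IK", repeat=n_blocks)]

print(domain_configurations(2))                              # ['II', 'IK', 'KI', 'KK']
print([len(domain_configurations(n)) for n in range(1, 7)])  # [2, 4, 8, 16, 32, 64]
```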