## 1 Introduction

MRI is one of the most applied imaging procedures for disease diagnosis and treatment planning. As a non-invasive, radiation-free, and in-vivo imaging modality, it provides significantly better soft-tissue contrast than many other imaging modalities and offers accurate measurements of both anatomical and functional signals. However, its long acquisition time, owing to sampling full k-space, especially under the multiple protocols requiring long TE and TR, could lead to significant artifacts in the reconstructed image caused by patient or physiological motions during acquisitions, *i.e*. cardiac motion and respiration. Furthermore, a long acquisition time also limits the availability of MR scanner for patients and causes delayed patient care in the medical system.

To accelerate the MRI acquisition, various efforts have been made for improving the reconstruction image quality with undersampled k-space data. Previously, CS and Parallel Imaging have achieved significant progresses in fast MRI [otazo2010combination, griswold2002generalized, huang2005k]. In CS-MRI, one assumes that images have a sparse representation in certain domains [lustig2007sparse, murphy2012fast, liang2009accelerating]. In conventional CS-MRI, previous works focus on using sparse coefficients in sparsifying transforms, *i.e*. wavelet transform [qu2010combined, zhang2015exponential] and contourlet transform [qu2010iterative], combined with regularization parameters, *i.e*. total variation, to solve the ill-posed inverse problems in an iterative manner. However, these iterative minimization process based on sparsifying transforms tends to generate smaller sparse coefficient values, and lead to loss of details and unwanted artifacts in the reconstruction when undersampling rate is high [ravishankar2010mr]. Thus, current CS-MRI is limited to a rate of [lustig2007sparse, ravishankar2010mr]. Furthermore, reconstruction based on these iterative algorithms is time-consuming; thus it is challenging to deploy them in near real-time MRI scenarios, *i.e*

. Cardiac-MRI and Functional-MRI. More recently, triggered by the success of computer vision

[lecun2015deep], deep learning based algorithms have been developed for fast MRI reconstruction and demonstrated significant advantages

[wang2016accelerating, schlemper2017deep, sun2016deep, hammernik2018learning, liu2019theoretically, lonning2019recurrent, han2019k, mardani2018deep, zhang2019reducing, zhu2018image]. Wang*et al*. [wang2016accelerating] first proposed to train a multi-layer CNN to recover the fully sampled MRI image from undersampled MRI image using supervised training with paired data. Similarly, Jin

*et al*. [jin2017deep] proposed to use U-Net [ronneberger2015u] to solve the inverse problem in imaging. Yang

*et al*. [yang2017dagan]

developed a De-Aliasing GAN that uses a U-Net as the generator with a loss function consisting of four components: an image domain loss, a frequency domain loss, a perceptual loss, and an adversarial loss. Quan

*et al*. [quan2018compressed] introduced a RefineGAN that uses a U-Net structure based generator with a cyclic loss for MRI de-aliasing. However, individual training is required for different sampling patterns and undersampling rates. Schlemper

*et al*. [schlemper2017deep, qin2018convolutional] came up with the data consistency layer in a deep cascade of CNN, which ensures the consistency of the reconstruction image in k-space and potentially reduces the issue of overfitting in deep training. Other than learning from pre-defined undersampling patterns, Zhang

*et al*. [zhang2019reducing]

proposed to use active learning to determine the 1D Cartesian undersampling pattern on the fly during the MRI acquisition. These pioneering deep learning based methods surpass conventional CS algorithms owing to the high-nonlinearity properties of data-driven feature extraction and achieve significantly lower computation during run-time. However, existing deep learning based methods have three major limitations. In this work, we aim to break these limitations.

Firstly, the above-mentioned algorithms are principally learned in the image domain alone, with a few amendments that use the frequency domain in the loss design [yang2017dagan] or in the data consistency layer [schlemper2017deep]. All of these deep learning based algorithms are designed to receive an image reconstructed from undersampled k-space as input and output an image as if reconstructed from fully sampled k-space; but unfortunately in the input images to the CNNs, detailed structures are likely distorted or even disappear. While there are recent attempts on restoring fully sampled image from undersampled k-space [eo2018kiki], the large scale trainable parameters limits the model can only be trained in an incremental fashion which could be further improved. As the first step to tackle this issue, we develop a dual domain learning scheme in MRI, which allows the network to restore the data in both image and frequency domains in a recurrent fashion. Previous works on CT metal artifact reduction and limited angle CT reconstruction also demonstrated the advantages of cross domain learning [lin2019dudonet, zhou2019limited].

Secondly, the previous studies are limited to conventional network design, such as multi-layer CNN and U-Net, and there are few attempts to design a customized network structure for undersampled MRI reconstruction. Inspired by the recent development of super-resolution imaging techniques

[dong2015image, kim2016accurate, zhang2018residual, hu2018squeeze], we propose a Dilated Residual Dense Network (DRD-Net) with a Squeeze-and-Excitation Dilated Dense Residual Block (SDRDB) as the building module. The DRD-Net is used for both image and frequency domain restorations. Our SDRDB is customized for MRI reconstruction task. In fast MRI acquisitions, we observe significant sparsity in undersampled k-space, especially when undersampling rate is high. Previous studies have demonstrated that better k-space recovery can be achieved via using non-local information interpolation, such as GRAPPA

[griswold2002generalized]. Based on this assumption, the first motivation of our DRD Block design in k-space is synthesis of missing k-space from a large receptive field by utilizing non-local k-space data, thus bringing more robustness and reliability. In the image domain, human organ anatomy is correlated in different regions. Our DRD Block with a large receptive field in the image domain can better capture this correlation between anatomical regions and synthesize the missing anatomical information even when the signal is highly distorted.Thirdly, previous studies have not fully explored the use of MRI protocol that requires a short acquisition time as deep prior to guide the MRI reconstruction process. In clinical routines, typical total scanning time for T1WI, T2WI, and FLAIR is mins, in which T2WI and FLAIR take the majority due to their long TR and TE. However, using the undersampled image with the detailed structures may already disappeared as input, the existing methods could synthesize artificial structure that does not belong to the patient. Recently, Xiang *et al*. explored the merits of using T1WI as additional channel input in the image domain to aid T2WI reconstruction [xiang2018ultra]. To further address this issue, we propose to use T1WI as deep prior in both image domain and k-space domain for improving the MRI reconstruction fidelity, given that the structural information in T1WI is highly correlated with that in different MRI protocols.

In summary, we propose a Dual Domain Recurrent Network (DuDoRNet) embedding with T1 priors to address these problems by learning two DRD-Nets on dual domains in a recurrent fashion to restore k-space and image domains simultaneously. Our method (Figure 2) consists of three major parts: recurrent blocks comprised of image restoration network (iDRD-Net) and k-space restoration network (kDRD-Net), recurrent T1 priors embedded in image and k-space domains, and recurrent data consistency regularization. A correct reconstruction should ensure the consistency in both domains linked by the linear operation of FT. Our intuition is that image domain restoration can be enhanced by fusing signal back-propagated from the k-space restoration, vice versa. Given sparse signal in both domains, DRD-Net with a large receptive field can sense more signal for a better restoration. Our recurrent learning can better avoid overfitting in directly optimizing restoration networks in dual domains. Extensive experiments on MRI patients with different sampling patterns and acceleration rates demonstrate that our DuDoRNet generates superior reconstructions.

## 2 Problem Formulation

Denoting a 2D k-space with complex values as , and a 2D image reconstructed from as , we need to reconstruct a fully sampled image from both the undersampled k-space () and reconstructed image (). The relationship between and can be written as:

(1) |

(2) |

where represents the binary undersampling mask used for accelerations; and denote the fully sampled and the undersampled k-space, respectively; and denote the images reconstructed from fully sampled and undersampled k-space, respectively; is the element-wise multiplication operation; and and are FT and IFT.

To tackle the ill-posed inverse problem of reconstructing from limited sampled data, we propose to restore both image domain and k-space domain. In image domain restoration, the optimization can be expressed as minimizing the prediction error:

(3) |

where is the prediction of the fully sampled image (

) generated by the estimation function (

) with its parameters () using the undersampled image () as input. Furthermore, the data consistency constraint [schlemper2017deep] is often used in addition to the prediction error:(4a) | ||||

(4b) |

where (4a) is the same as (3), and (4b) is the regularization term for data consistency. Similarly, we can formulate the optimization target for k-space restoration as:

(5) |

where is the prediction of the fully sampled k-space () generated by the estimation function () with its parameters () using the undersampled k-space () as input. Combining the image domain and k-space domain restoration with data consistency, the target function can thus be formulated as:

(6a) | ||||

(6b) | ||||

(6c) |

Directly optimizing multiple terms in the above target function is challenging in traditional network design, owing to its high computational complexity, overfitting, and local optima problems. Thus, we propose a dual-domain recurrent learning strategy that optimizes and recurrently. Our proposed approach is illustrated in details in the following sections.

## 3 Methods

The overall pipeline of our network is illustrated in Figure 1. It consists of three major parts: 1) the dual domain recurrent network consisting of recurrent blocks with image and k-space restoration networks in it; 2) the deep prior information generated from T1 data in the image and k-space domains for feeding into the recurrent blocks of the network; and 3) the recurrent data consistency regulations using sampled k-space data. In each recurrent block, we propose to use a Dilated Residual Dense Network (DRD-Net) for both image and k-space restorations.

### 3.1 Dilated Residual Dense Network

We develop a network structure for both MRI image and k-space restoration, called Dilated Residual Dense Network (DRD-Net). The idea is to use a Squeeze-and-excitation Dilated Residual Dense Block (SDRDB) as the backbone in our DRD-Net. The DRD-Net design and SDRDB are shown in Figure 2 and Figure 3, respectively. Compared to RDN [zhang2018residual], we customize the local and global structure design for our MRI reconstruction task.

#### 3.1.1 Global Structure

Our DRD-Net consists of three parts: initial feature extraction (IFE) via two sequential convolution layers, multiple SDRDBs followed by global feature fusion, and global residual learning. The overall pipeline is as follow:

(7) |

(8) |

where and denote the first and second convolutional operations in IFE, respectively. The first extracted feature is used for global residual learning in the third part. The second extracted feature is used as SDRDB input. If there are SDRDBs, the -th output can be written as:

(9) |

where represents the n-th SDRDB operation with . Given the extracted local features from a set of SDRDBs, we apply global feature fusion (GFF) to extract the global feature via:

(10) |

where denotes the concatenation operation along feature channel. Our global feature fusion function consists of a and convolution layers to fuse the extracted local features from different levels of SDRDB. The GFF output is used as input for global residual learning:

(11) |

The element-wise addition of global feature and initial feature are fed into our final convolution layer for reconstruction output.

#### 3.1.2 SDRD Block

The design of our SDRDB is shown in Figure 3. It contains four densely connected atrous convolution layers [chen2017deeplab], local feature fusion, and local residual learning. Expanding the expression of SDRDB, the -th convolution output in -th SDRDB can be written as:

(12) |

where denotes the -th convolution followed by Leaky-ReLU in the -th SDRDB, is the concatenation operation along feature channel, and the number of convolution is set to . Our SDRDB begins by composing a feature pyramid using 4 atrous convolution layers with dilation rates of 1, 2, 4, and 4. For an atrous convolution layer with kernel size (K) and kernel dilation (D), the receptive field (R) can be written as . The combination of two atrous convolution layers creates a new receptive field of . With that being said, the dense connection over 4 atrous layers enables diverse combinations of layers with various receptive fields, which can more efficiently extract features from different scales than traditional dilation approaches. Figure 4 demonstrates the feature pyramid from our 4 densely connected atrous convolution layers.

Cartesian | ZP | GRAPPA[griswold2002generalized] | TV[ma2008efficient] | Wang[wang2016accelerating] | DeepCas[schlemper2017deep] | RefGAN[quan2018compressed] | Ours w/o Prior | UF-T2 [xiang2018ultra] | Ours |

PSNR [dB] | 22.498 | 23.318 | 23.026 | 26.061 | 26.993 | 25.848 | 27.834 | 30.594 | 32.511 |

SSIM | 0.667 | 0.730 | 0.725 | 0.828 | 0.859 | 0.819 | 0.898 | 0.929 | 0.957 |

MSE() | 0.622 | 0.508 | 0.555 | 0.306 | 0.221 | 0.340 | 0.178 | 0.097 | 0.066 |

Radial | ZP | GRAPPA | TV | Wang | DeepCas | RefGAN | Ours w/o Prior | UF-T2 | Ours |

PSNR [dB] | 24.294 | 29.548 | 27.222 | 32.586 | 34.955 | 35.110 | 37.270 | 33.318 | 40.815 |

SSIM | 0.581 | 0.822 | 0.727 | 0.889 | 0.957 | 0.959 | 0.974 | 0.939 | 0.981 |

MSE() | 0.412 | 0.122 | 0.216 | 0.102 | 0.025 | 0.023 | 0.021 | 0.049 | 0.010 |

Spiral | ZP | GRAPPA | TV | Wang | DeepCas | RefGAN | Ours w/o Prior | UF-T2 | Ours |

PSNR [dB] | 26.181 | 31.893 | 33.890 | 35.987 | 43.867 | 36.049 | 48.418 | 37.793 | 49.186 |

SSIM | 0.675 | 0.882 | 0.909 | 0.932 | 0.972 | 0.960 | 0.991 | 0.961 | 0.993 |

MSE() | 0.266 | 0.076 | 0.046 | 0.029 | 0.0046 | 0.024 | 0.0018 | 0.019 | 0.0014 |

Then, we apply our local feature fusion (LFF), consisting of convolution layer and SE layer, to fuse the output from the last SDRDB and all convolution layers in current SDRDB. Thus, the LFF output can be expressed as:

(13) |

where

denotes the LFF operation. Finally, we apply the local residual learning to LFF output by adding the residual connection from SDRDB input. Thus, the SDRDB output is:

(14) |

### 3.2 Dual-Domain Recurrent Learning

In this section, we present details about our proposed DuDoRNet. As illustrated in Figure 1, each of the recurrent blocks of DuDoRNet contains one image restoration DRD-Net (iDRD-Net), one k-space restoration DRD-Net (kDRD-Net), and two data consistency layers interleaved. The T1 deep prior provides data in both image space and k-space that are fed into DuDoRNet recurrent blocks. In the -th recurrent block, denoting the image input as , the image restoration optimization target can be written as:

(15) |

where is the image restoration network based on DRD-Net with parameters . The image output from the last recurrent block and T1 image prior are concatenated channel-wise for inputting into iDRD-Net. is the binary undersampling mask used for accelerations. The k-space values in can be altered after inference through iDRD-Net since it only optimizes the first term. To maintain the k-space fidelity at sampled locations of , we add the data consistency as the second term. Denoting the output of iDRD-Net as and its FT as , the corresponding output from data consistency layer can be thus formulated as:

(16) |

where controls the level of linear combination between sampled k-space values and predicted values. When , the sampled k-space directly substitutes the prediction at in k-space. Denoting this output as , then and T1 k-space prior are concatenated channel-wise to feed into the kDRD-Net for k-space restoration. Similarly, the k-space restoration optimization target can be written as:

(17) |

where is the k-space restoration network based on DRD-Net with network parameters . Similarly here, the second term is to ensure the data consistency in the restored k-space. Thus, the loss function for each recurrent block is . The final loss is the summation of recurrent blocks losses: , where denotes the number of recurrent blocks.

In our experiments, is set to , is set to , and the number of SDRDB is set to . Each recurrent block in our DuDoRNet share the same network parameters. The final reconstruction output during testing is obtained by applying IFT to the last kDRD-Net output after data consistency operation.

## 4 Experiments

### 4.1 Experimental Settings

Dataset and Training We acquired an in-house MRI dataset consisting of 20 patients. We scanned each patient using three different protocols (T1, T2, and FLAIR) with full k-space sampled, resulting in three 3D volumes of for each patient in both image and k-space domains. 360 2D images are generated for each protocol. We split the dataset patient-wise into training/validation/test sets with a ratio of . As a result, our dataset consists of 252 training images, 36 validation images, and 72 test images for each protocol. Three different k-space undersampling patterns are examined in our experiments, including Cartesian, radial, and spiral trajectories. Examples of the sampling patterns are illustrated in Figure 5 (green box). The acceleration factor (R) is set to a value between 1 and 6 for all three patterns, corresponding to acceleration in acquisition time. Using these patterns, randomly undersampled T2/FLAIR and fully-sampled T1 are used as input to our model for training. During training, we also randomly augment the image data by flipping horizontally or vertically, and by rotating at different angles.

Performance Evaluation

The final evaluation is performed on our 72 test images. For quantitative evaluation, we evaluate our image reconstruction results using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Square Error (MSE). For comparative study, we compared our results with the following algorithms in previous works: total variation with penalized compress-sensing (TV-CS)

[lustig2007sparse], GRAPPA [griswold2002generalized], and four CNN-based algorithms including a sequential convolutional model [wang2016accelerating], a cascade image restoration model called DeepCas [schlemper2017deep], a structural refinement GAN model called RefGAN [quan2018compressed], and a DenseUNet model with complementary T1 image called UF-T2 [xiang2018ultra]. All CNN-based methods are trained using the same settings as to our method.### 4.2 Results

Image Quality Evaluation and Comparison We evaluated our reconstruction results by computing their SSIM, PSNR, and MSE, where the fully sampled images are used as ground-truth for this calculations. In Table 1, we demonstrated our T2 reconstruction evaluations on three sampling patterns with a high acceleration rate of . The first sub-table summarized the image quality when Cartesian-based acceleration is applied. Our DuDoRNet without T1 prior achieved PSNR up to dB, and boosted to dB when T1 prior is given. As compared to the previous state-of-the-art method with T1 prior called UF-T2 [xiang2018ultra], we improved the reconstruction from dB to dB. Similarly for radial-based acceleration, our DuDoRNet without T1 prior achieved PSNR up to dB, and further boosted to dB when T1 prior is provided. As compared to the best results with PSNR from RefGAN [quan2018compressed] without T1 prior, our DuDoRNet achieves significantly better results. In the last sub-table, we found our DuDoRNet with spiral pattern yields the best image quality among all sampling patterns and reconstruction methods. Under the spiral pattern, our DuDoRNet without T1 prior achieved PSNR dB and reinforced to dB when T1 prior is present. The qualitative comparison results with non-CNN based methods are shown in Figure 5

. As we can see, the reconstructions with zero padding (ZP) at a high acceleration rate create significant aliasing artifacts and lose anatomical details. Non-CNN based methods can improve the reconstruction as compared to ZP, but it is hard to see a significant improvement when a significant level of aliasing artifact is presented. In comparison, our reconstructions are robust to aliasing artifacts and structural loss in the input. The qualitative comparison results with CNN based methods are shown in Figure

6. At a high acceleration rate, the CNN based methods achieve better results than non-CNN based methods. Among them, our method restores information in both image and k-space better preserved the important anatomical details, as demonstrated in Figure 6 with arrows and ellipses. More results on FLAIR reconstruction are summarized in supplemental materials with similar performances.Reconstruction Stability Evaluation To evaluate the performance stability at different acceleration rates, we recorded the reconstruction performance by varying from 2 to 6 on all three patterns. The evaluation results are summarized in Figure 7. For the Cartesian pattern, our method can consistently maintain the SSIM to above 0.95 even when an aggressive acceleration rate is used, *i.e*. . Due to the large aliasing artifact created from the Cartesian sampling pattern as the acceleration rate increases, the reconstruction performance is more challenging to remain stable without T1 prior, but we were still able to maintain SSIM to above 0.89 and consistently outperforms the other methods for all undersampling rates. For radial pattern, all methods performed approximately the same at low acceleration rates . However, for a more aggressive undersampling rate, *i.e*. , our DuDoRNet is able to reduce the structural loss by a considerable margin. At , Our DuDoRNet with and without T1 prior can consistently maintain the SSIM to above 0.98 and 0.97, respectively. Lastly, we found the best performance stability when the spiral pattern is applied. Our DuDoRNet keeps the SSIM to above 0.99 over the whole undersampling range regardless of T1 prior. Under the same acceleration rate, radial and spiral patterns sample the k-space more uniformly than random Cartesian, thus leading to less aliasing artifact in the initial reconstruction input for the models, as demonstrated in Figure 5. As a result, radial and spiral patterns generate less aliasing artifacted input and are able to output more stable reconstruction in our experiments.

### 4.3 Ablation Studies

SSIM | Cartesian | Radial | Spiral | Average |
---|---|---|---|---|

(A) Net-baseline | 0.839 | 0.902 | 0.951 | 0.897 |

(B) Net-Rec | 0.860 | 0.959 | 0.973 | 0.931 |

(C) Net-DD | 0.851 | 0.929 | 0.962 | 0.914 |

(D) Net-DIL | 0.844 | 0.909 | 0.956 | 0.903 |

(E) Net-Rec-DD | 0.891 | 0.968 | 0.989 | 0.949 |

(F) Net-Rec-DIL | 0.869 | 0.962 | 0.978 | 0.936 |

(G) Net-DD-DIL | 0.859 | 0.935 | 0.969 | 0.921 |

(H) Net-Rec-DD-DIL | 0.898 | 0.974 | 0.991 | 0.954 |

Firstly, we evaluated the effectiveness of different components in our DuDoRNet. Without loss of generality, the reconstruction performance is evaluated on for all three patterns. We evaluate three key components, including: dual domain learning (DD), dilated residual dense learning (DIL), and recurrent learning (Rec) without T1 prior. in Rec is set to 5. The component analysis is summarized in Table 2. As we can observe, recurrent learning (B) and dual domain learning (C) each improve the performance over the baseline (A) by 0.015, which are more significant than dilated residual dense learning (D). Combining recurrent learning and dual domain learning (E), the performance achieves the largest boost as compared to the other two component combinations (F and G). Our DuDoRNet, equipping all components (H), produces the best reconstruction results. Overall, all three components help DuDoRNet to enhance the performance.

Secondly, we evaluated the effect of in our DuDoRNet. As shown in Figure 8, the reconstruction performance, measured by SSIM, increases monotonically when increases, while the rate of improvement starts to converge after for all three patterns. We also found that our DuDoRNet achieves the best performance when the spiral pattern is used even when different is implemented.

## 5 Conclusion

We present a dual domain recurrent network for fast MRI reconstruction with T1 prior embedded. Specifically, we propose to restore both image and k-space domains recurrently through DRD-Nets with large receptive fields. The T1 prior is embedded at each recurrent block to deeply guide restorations for both domains. Extensive experimental results demonstrate that while previous fast MRI methods on single domain for individual protocol have limited capability of directly reducing aliasing artifacts in the image domain, our DuDoRNet can efficiently restore the reconstruction, and the T1 prior can further significantly improve the structural recovery. Future work includes exploring the application of DuDoRNet to other signal recovery tasks, such as noise reduction and image super-resolution.

Comments

There are no comments yet.