I. Introduction
Magnetic Resonance Imaging (MRI) is a widely applied non-invasive technology that reveals both functional and anatomical information. Unfortunately, traditional MRI suffers from inherently slow acquisition owing to the large volume of data in the k-space, i.e., Fourier space [21]. Researchers have introduced the compressed sensing (CS) theory into MRI reconstruction, allowing fast acquisition at a sampling rate much lower than the Nyquist rate without degrading image quality. The main challenge for CS-MRI is the ill-posedness of the inherent inverse problem of estimating high-quality images from undersampled data points. Moreover, acquisition noise and the computation of the discrete Fourier transform deteriorate the reconstruction stability.
Conventional approaches to CS-MRI devote significant effort to incorporating prior knowledge on images as regularizers in order to restore details and suppress artifacts. Sparsity constraints in a specific transform domain are the most commonly used [5, 30, 22, 14]. Dictionary learning based methods consider the sparsity in the subspace spanned by reference images [34, 2, 45]. Nonlocal paradigms take advantage of the sparse representation by grouping similar local image patches [31, 11]. These methods, derived from certain prior models, are able to find a stable and satisfactory reconstruction by optimizing an energy functional with various forms of regularizers. Nevertheless, this optimization typically demands iterative calculation of gradient information in a high-dimensional image space, making the process rather slow.
Recent deep learning based approaches directly apply pre-trained deep architectures to infer MR images from degraded sampled inputs [38, 17, 35], rendering fast reconstruction. Nevertheless, these methods built upon direct deep mappings are not as stable as conventional ones under data variations, because they abandon principled prior knowledge and depend entirely on training examples. By taking advantage of both domain knowledge and available data, hybrid paradigms attempt to bridge the gap between data learning and the numerical optimization of functional energies with regularizers. For instance, Sun et al. [43] and Diamond et al. [9] unroll the iterative optimization as the prediction process of deep networks. Unfortunately, this type of ad hoc combination breaks the convergence of the optimization iterations, so it cannot guarantee the optimal solution to the original objective energy. Figure 1 illustrates an example of reconstruction from a recent deep learning based method [32] where the circled anatomic structures mistakenly appear as artifacts while part of the tissues is smeared, despite an increase in PSNR. Therefore, neither medical research nor clinical diagnosis can trust reconstruction without a theoretical guarantee.
Existing methods thus fail to provide an efficient and reliable treatment for CS-MRI. In this study, we design a convergent deep framework comprising modules assembled in principle for trustworthy fast MR reconstruction. Specifically, we provide an efficient iterative process by integrating task-driven domain knowledge with data-driven deep networks. Meanwhile, we introduce an automatic feedback mechanism that leads the deep iterations to converge to the optimal solution of the original energy. Further, we consider two extended tasks in practical scenarios, i.e., parallel imaging and Rician noise pollution, to investigate the robustness of our paradigm for real-world applications. Our framework flexibly adapts to the energies targeting these scenarios, and robustly finds their solutions. Our contributions can be listed as follows:

We propose a generic deep framework assembling modules in principle to efficiently and reliably solve the CS-MRI model, which is typically nonconvex and nonsmooth.

We establish an automatic feedback mechanism, by virtue of the first-order optimality condition, to reject improper predictions from the embedded deep architectures and to guide the deep iterations towards the desired solution.

We custom-tailor two extensions of our framework to handle the cases of parallel imaging and of Rician noise contamination in practical scenarios.
Extensive experimental results on benchmarks and raw data collected in the k-space of an MRI scanner demonstrate the superiority of the proposed paradigm over state-of-the-art techniques in both reconstruction accuracy and efficiency, as well as its robustness to noise contamination.
(Fig. 1: Ground Truth | Zero-Filling | RefineGAN | Ours)
II. Related Work
This section reviews approaches to CS-MRI reconstruction based on prior models and deep learning, followed by algorithms for two practical reconstruction scenarios, i.e., parallel imaging and Rician noise removal.
II-A. Model based CS-MRI
Conventional CS-MRI solves an energy optimization with regularizations derived from image priors in a specific transform domain or subspace. This term plays a vital role in the reconstruction quality. Researchers have devoted tremendous efforts to exploring the sparsity prior in various transform domains, including the gradient [5], discrete cosine transform [22], partial Fourier transform (RecPF) [42], and wavelet transform [30] domains. These techniques, developed upon the sparse representation with a fixed set of basis functions, can hardly provide satisfactory quality. Alternatively, researchers resort to the data-driven sparse prior in the subspace spanned by an overcomplete dictionary of reference images, performing better in detail recovery and artifact removal [34, 2, 45, 31, 11]. These dictionary based CS-MRI methods are able to generate reliable and plausible reconstructions following the underlying physical prior or constraint, but typically suffer from an extremely slow process, as the energy optimization turns out to be computationally expensive and iterative.
Additionally, most regularization terms express the sparsity prior in the form of the l1 norm. The l1 norm has nice mathematical properties, but may not be able to characterize complex data distributions in real applications. Also, simply concatenating reference images into an overcomplete dictionary as the data dependent prior is limited in handling the flexible structures of real imaging problems.
II-B. Deep learning based CS-MRI
Recent studies introduce deep learning techniques into MRI reconstruction in order to utilize the information given by vaults of available data examples. Wang et al. design and train a convolutional neural network (CNN) to learn the direct mapping from the input undersampled k-space data to the desired fully-sampled image [38]. A modified U-Net architecture is used to learn the aliasing artifacts in [17], and a deep cascade of convolutional networks is applied to accelerate MR imaging in [35]. Lately, generative adversarial networks (GANs) have proved able to remove aliasing artifacts with generated details, and attain state-of-the-art quality on specific benchmarks [32, 41, 26]. These methods provide extremely fast feed-forward inference for reconstruction, but depend so heavily on training examples, without any principled knowledge, that they fail to give the desired solution when the input deviates from the mode of the training data distribution.
There also exist hybrid techniques that integrate deep architectures into an iterative optimization process for the reconstruction energy with prior knowledge. Sun et al. present ADMM-Net for CS-MRI by introducing learnable architectures into the Alternating Direction Method of Multipliers (ADMM) iterations [43]. A similar unrolling scheme is developed in [9]. These methods built upon deep priors can improve the reconstruction accuracy and reliability at a relatively fast speed. Unfortunately, unrolling schemes that directly replace iterations with deep architectures cannot guarantee the convergence of the iterative optimization process. Moreover, no mechanism is available to control the errors generated by the nested structures. Therefore, we can still hardly rely on these reconstructed results in medical analysis and diagnosis.
II-C. Parallel imaging algorithms
Parallel imaging (PI) techniques, similar to CS-MRI, are also commonly used in clinical applications to accelerate MRI acquisition by exploiting the different sensitivities of multiple coils installed at different locations. To save time, PI conducts spatial encoding through receiver coils with various sensitivity profiles instead of gradient encoding. SENSE and its variants directly restore the images from undersampled data in the image domain [29, 44]. Another line of studies first interpolates the missing data in the frequency domain, and then transfers the results to the image domain, e.g., GRAPPA [15], SPIRiT [23], and their derived techniques. Recent approaches combine a CS energy model with PI in order to further speed up the acquisition, such as CS-SENSE [18], CS-GRAPPA [7], L1-SPIRiT [27], and SAKE [36]. It is worth investigating how these energy-based PI approaches may benefit from deep learning.
II-D. Rician noise removal
Independent zero-mean Gaussian noise with equal variances may simultaneously corrupt both the real and imaginary parts of the complex data in the k-space during practical MR acquisition [33]. Typical MR imaging techniques convert the original complex Gaussian noise into Rician noise when computing the magnitude of the complex k-space data [16]. The non-additive Rician noise challenges traditional noise removal techniques. Researchers have designed special treatments targeting Rician noise removal, e.g., non-local-means algorithms [39, 25], wavelet packets [40], wavelet domain filtering [28], and total variation (TV) denoisers [8, 19]. Deep learning based denoisers have also been developed to tackle Rician noise [43, 41]. Nevertheless, it remains an open problem to fuse Rician noise removal with efficient CS-MRI reconstruction in a unified framework for practical MR acquisition.
III. Method
We first give the problem formulation of CS-MRI and present our deep framework. Subsequently, we detail the reconstruction process that converges to the optimum of the original energy, and finally provide the theoretical analysis.
III-A. Problem formulation
According to the CS theory [11], typical CS-MRI reconstruction attempts to recover a fully-sampled image from undersampled k-space data as:

    \min_{x} \frac{1}{2}\|PFx - y\|_2^2 + \lambda R(x),    (1)

where y represents the undersampled observation in the k-space, and x denotes the column vector of the desired MR image to be reconstructed. The Fourier transform and the undersampling operation are denoted as F and P, respectively; the composite operator constitutes a linear operation A = PF. R(x) represents the regularization imposed on the ill-posed inverse problem for reconstruction in order to constrain the search space of the desired solution.
Existing CS-MRI techniques usually solve the energy optimization (1) by iterations upon gradient information with the prior penalty. These methods suffer from expensive time consumption owing to the iterative gradient calculations. Alternatively, an end-to-end inference upon deep networks can achieve efficient MR reconstruction from undersampled data, but such direct approaches without explicit constraints from prior knowledge lack reliability and robustness to data variations. Hybrid strategies integrate both principled knowledge and deep architectures into the optimization procedure, yet this simple combination has no theoretical guarantee on convergence, so the reliability issue remains unresolved. Such a trade of reliability for efficiency may produce serious consequences in medical analysis and diagnosis, because one cannot tell whether a reconstruction algorithm restores authentic anatomical structures or fabricates/hallucinates details from an available data distribution. In this work, we propose a deep reconstruction framework with a theoretical guarantee on convergence, enabling both efficiency and robustness.
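As an illustration of the linear operation A = PF in (1), the following minimal numpy sketch implements the undersampled Fourier forward model and its adjoint (the zero-filled reconstruction). The function names and toy sizes are ours, not from the paper:

```python
import numpy as np

def forward(x, mask):
    """A x = P F x: keep only the sampled k-space entries of image x."""
    return mask * np.fft.fft2(x, norm="ortho")

def adjoint(y, mask):
    """A^H y = F^H P^T y: the zero-filled reconstruction."""
    return np.fft.ifft2(mask * y, norm="ortho")

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))      # toy "image"
mask = rng.random((8, 8)) < 0.3      # ~30% random sampling pattern
y = forward(x, mask)

# With orthonormal FFTs, the pair satisfies <A x, y> = <x, A^H y>.
lhs = np.vdot(forward(x, mask), y)
rhs = np.vdot(x, adjoint(y, mask))
```

Since P merely zeroes out unsampled entries, P^T P is a diagonal 0/1 matrix in k-space, which is what makes the closed-form updates of Sec. III-C cheap.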
III-B. Deep optimization framework
We consider the classical CS-MRI model with the sparsity regularization on a wavelet basis:

    \min_{\alpha} \frac{1}{2}\|PF\Phi^{\top}\alpha - y\|_2^2 + \lambda\|\alpha\|_p^p,    (2)

where α denotes the sparse code of x on the wavelet basis Φ (i.e., x = Φ^T α), and λ indicates the trade-off parameter. We use the l_p regularization with 0 < p < 1, so the problem turns out to be a challenging nonconvex sparse optimization. To tackle the challenge, our framework embraces imaging principles, deep inference, and sparsity priors, each of which has proved critical for effective reconstruction in previous studies. Accordingly, we design the fidelity, data-driven, and prior modules in every optimization iteration to exploit these three types of information. Further, the framework embeds an optimality condition module that automatically checks the output of each iteration and rejects an improper one that would lead to an undesired result. This module maintains the convergence, yielding a reliable solution to the optimization.
The top row of Fig. 2 shows the cascade of all ingredients at each iteration of the optimization in our framework, and we elaborate the four modules in the order shown in Fig. 2.
Fidelity module: The fidelity term reveals the image formation process of MR imaging in (2), which is necessary for reconstruction algorithms. This module constitutes a mapping at the (k+1)-th iteration from the previous output α^k (initialized as the degraded input α^0 = ΦF^H P^T y) to an intermediate reconstruction, characterizing the inverse imaging process as:

    \alpha_{\mathcal{F}}^{k+1} = \mathcal{F}(\alpha^{k}; \vartheta_{\mathcal{F}}),    (3)

where ϑ_F denotes the parameters of the mapping. The calculation of the mapping and its parameters will be detailed later in the reconstruction process.
Data-driven module: Networks provide a flexible mechanism for characterizing the underlying structures shared by numerous training pairs, and our framework employs deep architectures encoding these image priors. We regard the artifacts introduced by the fidelity mapping as unknown noise, and thus employ a residual denoising network with shortcuts (IRCNN) [46] as the data-driven module of our framework:

    \alpha_{\mathcal{D}}^{k+1} = \mathcal{D}(\alpha_{\mathcal{F}}^{k+1}; \vartheta_{\mathcal{D}}^{k+1}),    (4)

where ϑ_D^{k+1} denotes the parameters of the residual network in the (k+1)-th stage. We will give a detailed depiction of the choice of this CNN-based denoiser later.
Optimality condition module: One major issue with the data-driven module is whether the inference of the deep network follows a direction converging to the optimum of the model energy. This module resolves the issue with a checking mechanism that automatically examines and, if necessary, rejects the improper output of the data-driven module:

    \alpha_{\mathcal{C}}^{k+1} = \mathcal{C}(\alpha_{\mathcal{D}}^{k+1}, \alpha^{k}; \vartheta_{\mathcal{C}}),    (5)

where α_D^{k+1} and α^k are the outputs of the data-driven module at this stage and of the previous iteration, respectively, and ϑ_C denotes the parameters for the optimality condition. This module calculates an indicator from the first-order optimality condition that determines whether to accept the update from the deep architecture.
Prior module: The sparsity prior upon a specific basis is necessary to constrain the solution space in traditional CS-MRI. This prior naturally describes intrinsic mutual characteristics of MR images, so reconstruction benefits from incorporating it. Thus, we append a prior module to the condition module to further improve the restoration quality:

    \alpha^{k+1} = \mathcal{P}(\alpha_{\mathcal{C}}^{k+1}; \vartheta_{\mathcal{P}}),    (6)

where ϑ_P denotes the parameters for the prior. Consequently, we are able to include domain knowledge for MR images and recover more details. We apply an iterative process with these four modules cascaded to solve the optimization (2), as shown in the bottom row of Fig. 2. The converged optimum found by the process gives the final reconstruction.
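The cascade above can be sketched as a plain loop. The following is a hypothetical skeleton with scalar stand-in modules (the names fidelity/denoise/check/prior are ours, not the authors' implementation):

```python
def reconstruct(alpha0, fidelity, denoise, check, prior, max_iter=50, tol=1e-6):
    """Run the four cascaded modules of Fig. 2 until the iterate stabilizes."""
    alpha = alpha0
    for _ in range(max_iter):
        a_f = fidelity(alpha)        # Eq. (3): closed-form fidelity update
        a_d = denoise(a_f)           # Eq. (4): CNN-based residual denoiser
        a_c = check(a_d, alpha)      # Eq. (5): accept or reject the CNN output
        alpha_new = prior(a_c)       # Eq. (6): proximal step on the sparsity prior
        if abs(alpha_new - alpha) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha

# Toy 1-D instantiation: every module contracts toward 0, the "optimum".
shrink = lambda a: 0.5 * a
accept = lambda a_d, a_prev: a_d
alpha_star = reconstruct(4.0, shrink, shrink, accept, shrink)
```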
III-C. Reconstruction process
We elaborate the computation of each module in our deep optimization framework, forming an efficient and convergent reconstruction process.
Closed-form solution for module F: This module gives an intermediate solution to the fidelity term in (2) without the complex sparsity constraint. Instead, we constrain the solution to stay close to the reconstruction at the previous iteration using the l2 norm:

    \alpha_{\mathcal{F}}^{k+1} = \arg\min_{\alpha} \frac{1}{2}\|PF\Phi^{\top}\alpha - y\|_2^2 + \frac{\mu}{2}\|\alpha - \alpha^{k}\|_2^2,    (7)

where μ balances between the fidelity term and the proximal term. This constraint guarantees that the recovery from the fidelity term does not deviate from the overall optimization direction. Moreover, the continuity of (7) renders a closed-form solution:

    \alpha_{\mathcal{F}}^{k+1} = \Phi F^{H}\big(P^{\top}P + \mu I\big)^{-1}\big(P^{\top}y + \mu F\Phi^{\top}\alpha^{k}\big).    (8)
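Because P^T P is diagonal in k-space, the update amounts to an elementwise division there. A hedged numpy sketch of this closed-form fidelity solve, taking the wavelet transform Φ as the identity for brevity (so the update acts on the image itself):

```python
import numpy as np

def fidelity_update(x_prev, y, mask, mu):
    """Closed-form minimizer of 0.5*||P F x - y||^2 + 0.5*mu*||x - x_prev||^2."""
    k_prev = np.fft.fft2(x_prev, norm="ortho")
    # (P^T P + mu I) is diagonal in k-space: entries are (1 + mu) or mu.
    k_new = (mask * y + mu * k_prev) / (mask.astype(float) + mu)
    return np.fft.ifft2(k_new, norm="ortho")

rng = np.random.default_rng(1)
x_true = rng.standard_normal((16, 16))
mask = rng.random((16, 16)) < 0.4
y = mask * np.fft.fft2(x_true, norm="ortho")

x0 = np.zeros((16, 16))
x1 = fidelity_update(x0, y, mask, mu=0.1)

def objective(x, mu=0.1):
    r = mask * np.fft.fft2(x, norm="ortho") - y
    return 0.5 * np.linalg.norm(r) ** 2 + 0.5 * mu * np.linalg.norm(x - x0) ** 2
```

As a minimizer of the quadratic objective, the update can only decrease it relative to the previous iterate.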
Residual learning for module D: Inspired by the observation that artifacts from undersampled data have a topologically simpler structure than the original images [17], we choose a deep architecture with shortcuts to learn the structural information underlying pre-reconstructed images with noise/artifacts. The architecture consists of seven blocks. The first block is composed of a dilated convolution layer cascaded with a rectified linear unit (ReLU). Each of the five middle blocks consists of three layers, i.e., a dilated convolution, a batch normalization, and a ReLU. The last block is a single convolution layer. Dilated convolutions, with their larger receptive fields, are adopted to differentiate anatomical structures from artifacts, resulting in better denoising performance.
Such a deep approach accommodates various image structures reflected by training examples without the need for explicit models. More importantly, a well trained deep network provides fast forward inference, needing no gradient calculation in the image domain at each iteration. Specifically, we formulate this forward process as:

    \alpha_{\mathcal{D}}^{k+1} = \Phi\,\mathrm{ResNet}\big(\Phi^{\top}\alpha_{\mathcal{F}}^{k+1}; \vartheta_{\mathcal{D}}^{k+1}\big),    (9)

where ResNet denotes the inference via the trained deep network with shortcuts. The output of the fidelity module is a set of coefficients in the wavelet domain rather than an image, so we convert α_F^{k+1} into the image domain by Φ^T before feeding it to the network. Subsequently, we convert the network output back into coefficients by applying Φ, the inverse transformation of Φ^T.
Optimality condition for module C: To maintain the theoretical convergence of the proposed framework, we design a checking mechanism that guarantees the iterations always converge towards the optimal solution of the original energy (2). First, we introduce a momentum proximal gradient to connect the output of the data-driven module with the first-order optimality condition of the minimized energy. We define the momentum proximal gradient as

    u^{k+1} = \mathrm{prox}_{\eta\lambda\|\cdot\|_p^p}\big(\alpha_{\mathcal{D}}^{k+1} - \eta\nabla f(\alpha_{\mathcal{D}}^{k+1})\big),    (10)

where η is the step size and f(α) = (1/2)||PFΦ^T α − y||_2^2 denotes the fidelity term in (2). Then, we establish a feedback mechanism by considering the first-order optimality condition of (10) as

    \|u^{k+1} - \alpha^{k}\|_2 \le \varepsilon.    (11)

Here, ε is a positive constant revealing the tolerance scale of the distance between the current solution u^{k+1} and the previous update α^k at the k-th stage. The previous output α^k is reconsidered when (11) is not satisfied. Finally, we summarize the checking module as

    \alpha_{\mathcal{C}}^{k+1} = \begin{cases} u^{k+1}, & \text{if (11) holds},\\ \mathrm{prox}_{\eta\lambda\|\cdot\|_p^p}\big(\alpha^{k} - \eta\nabla f(\alpha^{k})\big), & \text{otherwise}. \end{cases}    (12)

In this manner, the iterative process bypasses improper outputs of the deep network and is directed to the desired solution.
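The following toy numpy sketch mimics this check; the l_p proximal operator is replaced by an l1 soft-threshold and f by a simple quadratic, both illustrative stand-ins of ours rather than the paper's operators:

```python
import numpy as np

def soft_threshold(a, t):
    """l1 proximal operator, used here as a stand-in for the l_p prox."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def check_module(alpha_d, alpha_prev, y, eta=0.5, lam=0.1, eps=1.0):
    grad = lambda a: a - y                       # gradient of the toy fidelity
    u = soft_threshold(alpha_d - eta * grad(alpha_d), eta * lam)   # Eq. (10)
    if np.linalg.norm(u - alpha_prev) <= eps:    # Eq. (11): accept
        return u
    # Eq. (12): reject the network output, restart from the previous iterate
    return soft_threshold(alpha_prev - eta * grad(alpha_prev), eta * lam)

y = np.array([1.0, -2.0, 0.5])
alpha_prev = np.array([0.9, -1.8, 0.4])
good = np.array([1.0, -1.9, 0.45])     # network output close to the iterate
bad = np.array([50.0, 40.0, -30.0])    # wildly off: should be rejected

out_good = check_module(good, alpha_prev, y)
out_bad = check_module(bad, alpha_prev, y)
```

The rejected branch never lets a wild network prediction propagate, which is the mechanism behind the descent guarantee analyzed below.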
Proximal gradient for module P: The sparsity prior in (2) acts as the last module of each iteration. Considering the nonconvexity of the l_p regularization, we apply one step of proximal gradient:

    \alpha^{k+1} = \mathrm{prox}_{\eta_{p}\lambda\|\cdot\|_p^p}\big(\alpha_{\mathcal{C}}^{k+1} - \eta_{p}\nabla f(\alpha_{\mathcal{C}}^{k+1})\big),    (13)

where η_p is the step size. This operation enforces the prior term of the reconstruction model, and thus preserves more details, avoiding an over-smoothed reconstruction.
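For general 0 < p < 1 this proximal operator has no closed form; a brute-force scalar evaluation (our own illustrative helper, not the paper's solver) shows its characteristic thresholding behavior:

```python
import numpy as np

def prox_lp(v, t, p, grid=np.linspace(-5.0, 5.0, 100001)):
    """Numerically evaluate prox_{t|.|^p}(v) = argmin_u t*|u|^p + 0.5*(u - v)^2."""
    costs = t * np.abs(grid) ** p + 0.5 * (grid - v) ** 2
    return grid[int(np.argmin(costs))]

small = prox_lp(0.1, t=0.5, p=0.5)   # small inputs are thresholded to (near) zero
large = prox_lp(4.0, t=0.5, p=0.5)   # large inputs are only mildly shrunk
```

Unlike the l1 soft-threshold, the l_p operator for p < 1 acts like a hard threshold on small coefficients while barely biasing large ones, which is consistent with the detail-preserving behavior claimed above.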
III-D. Theoretical investigations
Existing deep optimization strategies rely heavily on the distribution of the data and discard the convergence guarantee of the iterations [43, 9]. Our scheme not only includes energy optimization with imaging principles and prior knowledge, but also embraces a learnable deep architecture for fast inference. Furthermore, we design an effective mechanism to determine whether the output of the network at the current iteration follows a descent direction towards the energy optimum. This subsection analyzes the convergence behavior of our method theoretically.
To simplify the following derivations, we first rewrite the objective in (2) as Ψ(α) = f(α) + λg(α), with f(α) = (1/2)||PFΦ^T α − y||_2^2 and g(α) = ||α||_p^p.
Furthermore, we give the following properties of f, g and Ψ that are helpful for the convergence analysis (we place the details of these properties, and all the proofs of the following propositions and theorem, in the supplemental material due to the page limit):
a) f is proper and Lipschitz smooth;
b) g is proper and lower semicontinuous;
c) Ψ satisfies the KŁ property and is coercive.
Then two important propositions and one theorem are given to characterize the convergence properties of our method.
Proposition 1
Let {α^k} and {u^k} be the sequences generated by Alg. 1. Supposing that the error condition in our module C is satisfied, there exists a sequence {c_k} such that

    \Psi(u^{k+1}) \le \Psi(\alpha^{k}) - c_{k}\,\|u^{k+1} - \alpha^{k}\|_2^2,    (14)

where c_k > 0 depends on the step size η and on L, the Lipschitz coefficient of ∇f.
Proposition 2
If 0 < η_p < 1/L, let {u^k} and {α^k} be the sequences generated by the proximal operator in (13); then we have

    \Psi(\alpha^{k+1}) \le \Psi(u^{k+1}) - \Big(\frac{1}{2\eta_{p}} - \frac{L}{2}\Big)\|\alpha^{k+1} - u^{k+1}\|_2^2.    (15)
Remark 1
The inequalities in Propositions 1 and 2 provide a descent sequence by relating Ψ(α^{k+1}), Ψ(u^{k+1}) and Ψ(α^k) as Ψ(α^{k+1}) ≤ Ψ(u^{k+1}) ≤ Ψ(α^k).
Thus, the checking operator is a key criterion for verifying whether the output of the learnable deep module propagates along a descent direction toward the optimum. Moreover, it also ingeniously builds a bridge connecting the adjacent iterates α^k and α^{k+1}.
Theorem 1
Suppose {α^k} is a sequence generated by our algorithm. The following properties hold:

1) The sequence of successive differences is square summable, i.e., ∑_k ||α^{k+1} − α^k||_2^2 < ∞;

2) The sequence {α^k} converges to a critical point of Ψ.
Remark 2
The second property in Theorem 1 implies that {α^k} is a Cauchy sequence, so it globally converges to a critical point of Ψ.
IV. Extensions to practical scenarios
Our framework can adapt to practical scenarios whose models share a similar form with (2). This section gives two examples, i.e., parallel imaging and reconstruction with Rician noise.
IV-A. Extension 1: Parallel imaging based CS-MRI
Reconstruction of sparse multi-coil data can be formulated as:

    \min_{\alpha} \frac{1}{2}\sum_{j=1}^{c}\|PFS_{j}\Phi^{\top}\alpha - y_{j}\|_2^2 + \lambda\|\alpha\|_p^p,    (16)

where c is the number of receivers used for multi-coil data acquisition, and y_j denotes the undersampled MR data recorded by the j-th receiver with the corresponding sensitivity map S_j, which can be estimated via commonly used self-calibration techniques [37, 12].
This CS-PI model (16) shares a similar form with (2), so we can readily apply the proposed framework for efficient and reliable reconstruction. The fidelity term in (16) yields the closed-form solution for module F:

    \alpha_{\mathcal{F}}^{k+1} = \Big(\sum_{j=1}^{c}\Phi S_{j}^{H}F^{H}P^{\top}PFS_{j}\Phi^{\top} + \mu I\Big)^{-1}\Big(\sum_{j=1}^{c}\Phi S_{j}^{H}F^{H}P^{\top}y_{j} + \mu\,\alpha^{k}\Big).    (17)

Accordingly, the derivative of the fidelity term in the modules C and P changes into:

    \nabla f(\alpha) = \sum_{j=1}^{c}\Phi S_{j}^{H}F^{H}P^{\top}\big(PFS_{j}\Phi^{\top}\alpha - y_{j}\big).    (18)

We find the final reconstruction by running Alg. 1 with the other computations unchanged.
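A hedged numpy sketch of the per-coil fidelity gradient, again with Φ taken as the identity and toy random sensitivity maps of our own:

```python
import numpy as np

def pi_forward(x, sens, mask):
    """Per-coil undersampled k-space: y_j = P F (S_j * x)."""
    return [mask * np.fft.fft2(s * x, norm="ortho") for s in sens]

def pi_grad(x, ys, sens, mask):
    """Gradient of 0.5 * sum_j ||P F S_j x - y_j||^2 with respect to x."""
    g = np.zeros_like(x, dtype=complex)
    for s, yj in zip(sens, ys):
        resid = mask * np.fft.fft2(s * x, norm="ortho") - yj
        g += np.conj(s) * np.fft.ifft2(mask * resid, norm="ortho")
    return g

rng = np.random.default_rng(2)
shape = (8, 8)
sens = [rng.random(shape) for _ in range(4)]   # 4 toy coil sensitivity maps
mask = rng.random(shape) < 0.5
x_true = rng.standard_normal(shape) + 0j
ys = pi_forward(x_true, sens, mask)

# At the true image the per-coil residuals vanish, so the gradient is zero.
g = pi_grad(x_true, ys, sens, mask)
```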
IV-B. Extension 2: CS-MRI with Rician noise
In order to enable the proposed framework to tackle reconstruction with non-additive Rician noise, we reformulate the CS-MRI model in (2) as:

    \min_{\alpha,\beta}\ \frac{1}{2}\|PF\Psi^{\top}\beta - y\|_2^2 + \lambda_{1}\|\beta\|_p^p + \lambda_{2}\|\alpha\|_p^p,\quad \mathrm{s.t.}\ \Psi^{\top}\beta = |\Phi^{\top}\alpha + n|,    (19)

where x = Φ^T α, y and n denote the fully sampled clean MR image, the observed k-space data and the independent Gaussian noise in the real and imaginary components, respectively. The matrices Φ^T and Ψ^T are the inverses of the wavelet transforms Φ and Ψ, respectively.
We then split (19) into two subproblems:

    \min_{\beta}\ \frac{1}{2}\|PF\Psi^{\top}\beta - y\|_2^2 + \lambda_{1}\|\beta\|_p^p + \frac{\rho}{2}\big\|\Psi^{\top}\beta - |\Phi^{\top}\alpha + n|\big\|_2^2,    (20a)

    \min_{\alpha}\ \lambda_{2}\|\alpha\|_p^p + \frac{\rho}{2}\big\|\Psi^{\top}\beta - |\Phi^{\top}\alpha + n|\big\|_2^2,    (20b)

where ρ is a penalty parameter. Therefore, we can alternately apply the iterative process in Alg. 1 to solve for β and α, finally converging to the optimal estimate.
The subproblem (20a) aims to reconstruct a fully sampled noisy MR image from the k-space observation y. Therefore, Alg. 1 is applicable to this subproblem with all modules unchanged but taking the two quadratic terms as the fidelity module:

    \beta_{\mathcal{F}}^{k+1} = \arg\min_{\beta}\ \frac{1}{2}\|PF\Psi^{\top}\beta - y\|_2^2 + \frac{\mu}{2}\|\beta - \beta^{k}\|_2^2 + \frac{\rho}{2}\|\Psi^{\top}\beta - z^{k}\|_2^2,    (21)

where β^k and z^k represent the outputs at the previous iteration of the subproblems (20a) and (20b), respectively, and μ and ρ balance among the fidelity term and the two proximal terms. Also, the continuity of (21) renders a closed-form solution:

    \beta_{\mathcal{F}}^{k+1} = \Psi F^{H}\big(P^{\top}P + (\mu+\rho)I\big)^{-1}\big(P^{\top}y + \mu F\Psi^{\top}\beta^{k} + \rho F z^{k}\big).    (22)
Accordingly, we derive the derivative of the fidelity term in both modules C and P as:

    \nabla f(\beta) = \Psi F^{H}P^{\top}\big(PF\Psi^{\top}\beta - y\big) + \rho\,\Psi\big(\Psi^{\top}\beta - z^{k}\big).    (23)
The second subproblem (20b) targets noise removal, with the fidelity module written as:

    (24)

Unfortunately, we can hardly express a closed-form solution for module F in this case, and thus resort to an efficient learnable strategy with two-stage IRCNNs, trained using pairs of noise-free and Gaussian-noisy MR images, in order to tackle the non-additive Rician noise. The first stage predicts the clean squared magnitude from the squared noisy input, removing the corruption, which is additive in the squared-magnitude domain. The second stage feeds the square root of the output of the first stage to the trained IRCNN, inferring a clean image that eliminates the remaining additive noise. Given this output of module F, we obtain one iteration step by subsequently calculating the other three modules, D, C and P, in Alg. 1. The gradient in C and P changes into:

    (25)

where the quantities required by (25) are handily available as byproducts of the two-stage IRCNN inference.
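An illustrative simulation of the Rician noise model described above: equal-variance Gaussian noise corrupts the real and imaginary channels, and taking the magnitude yields non-additive, signal-dependent Rician noise (the array sizes and noise level are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.full(100_000, 4.0)                 # constant "clean" magnitude image
sigma = 1.0
noisy = np.abs(x + sigma * rng.standard_normal(x.size)
                 + 1j * sigma * rng.standard_normal(x.size))

# Rician noise biases the magnitude upward: E|z| > x for x > 0.
bias = noisy.mean() - x.mean()

# In the squared-magnitude domain the corruption is additive in expectation,
# E[|z|^2] = x^2 + 2*sigma^2, which motivates the two-stage denoiser above.
sq_bias = (noisy**2).mean() - (x**2).mean()
```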
Table I: PSNR (dB) / RLNE on T1- and T2-weighted MR data under the Cartesian, Radial and Gaussian sampling masks.

                       T1-weighted MR data                              T2-weighted MR data
Method             Cartesian         Radial           Gaussian         Cartesian         Radial           Gaussian
Zero-Filling [4]   22.16 / 0.2874    25.19 / 0.2026   25.08 / 0.2051   23.80 / 0.3724    25.69 / 0.2636   26.48 / 0.2737
TV [22]            23.85 / 0.2366    27.76 / 0.1514   29.76 / 0.1204   26.76 / 0.2650    33.27 / 0.1258   35.85 / 0.0938
SIDWT [3]          23.43 / 0.2479    28.82 / 0.1449   29.64 / 0.1220   25.96 / 0.2905    33.51 / 0.1173   36.14 / 0.0905
PBDW [30]          25.96 / 0.1857    30.86 / 0.1145   32.38 / 0.0890   29.18 / 0.2008    35.33 / 0.0951   38.54 / 0.0685
PANO [31]          27.88 / 0.1492    30.51 / 0.1104   33.94 / 0.0746   31.74 / 0.1499    36.09 / 0.0912   40.43 / 0.0557
FDLCP [45]         26.47 / 0.1757    30.75 / 0.1075   26.47 / 0.1757   31.74 / 0.1500    37.33 / 0.0788   31.74 / 0.1500
BM3D-MRI [11]      26.20 / 0.1802    31.08 / 0.1035   35.41 / 0.0630   29.23 / 0.1999    37.55 / 0.0770   42.67 / 0.0428
D-AMP [10]         25.90 / 0.1866    30.31 / 0.1128   35.30 / 0.0638   27.98 / 0.2306    35.98 / 0.0919   42.30 / 0.0447
ADMM-Net [43]      25.14 / 0.2041    29.42 / 0.1254   33.22 / 0.0807   28.62 / 0.2141    36.30 / 0.0888   39.19 / 0.0636
DAGAN [41]         24.57 / 0.1707    26.79 / 0.1324   27.79 / 0.1177   25.89 / 0.1196    29.99 / 0.0748   30.40 / 0.0714
RefineGAN [32]     25.27 / 0.2057    28.20 / 0.1462   26.72 / 0.1713   27.58 / 0.2506    33.54 / 0.1240   33.81 / 0.1226
Ours               28.22 / 0.1433    32.01 / 0.0928   36.17 / 0.0576   33.57 / 0.1211    38.47 / 0.0691   42.77 / 0.0422
Table II: Quantitative comparison on real complex k-space data.

Evaluation   Zero-Filling  SIDWT   PBDW    PANO    FDLCP   BM3D-MRI  D-AMP   ADMM-Net  Ours
PSNR (dB)    22.85         26.66   26.55   26.53   25.65   24.88     26.23   24.90     26.73
SSIM         0.5353        0.5863  0.5553  0.5804  0.5293  0.5674    0.5282  0.5431    0.5949
RLNE         0.3062        0.2339  0.2359  0.2289  0.2653  0.2729    0.2529  0.3085    0.2186
(Figure: reconstructions on real data, PSNR in dB — Ground Truth; ZF (23.94); PBDW (27.21); PANO (27.66); FDLCP (26.75); D-AMP (27.20); Ours (28.33).)
Table III: Comparison with parallel imaging techniques.

Evaluation  SENSE    ESPIRiT  L1-ESPIRiT  GRAPPA   Ours
PSNR (dB)   27.82    29.14    30.13       30.96    31.02
RLNE        0.31     0.27     0.25        0.23     0.23
Time (s)    8.5570   9.8833   234.982     116.407  9.1078
(Figure: Rician denoising comparison, PSNR in dB — Noisy (20.77); LMMSE (22.82); NLM (25.76); UNLM (29.23); RiceVST (29.93); Ours (30.84).)
(Figure: reconstructions with Rician noise, PSNR in dB — Ground Truth; Zero-Filling (24.22); SIDWT (25.50); PBDW (26.64); PANO (27.03); FDLCP (25.00); BM3D-MRI (27.36); D-AMP (26.27); ADMM-Net (26.09); DAGAN (24.19); RefineGAN (25.12); Ours (27.88).)
V. Experiments
This section first explores the impact of each module in our framework on accuracy and convergence behavior. Next, we compare our algorithm with state-of-the-art CS-MRI methods to demonstrate its superiority in accuracy, efficiency, and robustness. Besides benchmarks, we also experiment on raw k-space data provided by the manufacturer of a Prodiva 1.5T scanner (Philips Healthcare, Best, Netherlands). Further, we conduct experiments on multi-coil data and Rician noisy data. All experiments were executed on a PC with an Intel(R) CPU @3.0GHz, 256GB RAM and an NVIDIA TITAN Xp. Source code and testing data are available at https://github.com/dlutdimt/TGDFTMI.
V-A. Ablation analysis
First, we evaluate four combinations of the modules in our framework to figure out their roles. The first case performs reconstruction with only the constraint of the sparsity prior, and the second combines the fidelity and data-driven modules in order to demonstrate the performance of deep networks. The third case uses both the sparsity and deep priors without the optimality condition module. The entire paradigm with all modules is the last choice, Ours. We apply these strategies on weighted MR data using the Radial sampling pattern with a 20% sampling ratio, with the same stopping criterion for the iterations in all cases.
Figure 3 illustrates that the loss of the sparsity-only case is slightly larger than that of the fidelity-plus-data-driven case during the first several iterations; the data-driven module matters more when the image quality is low in the beginning. As the process goes on, the loss of the fidelity-plus-data-driven case fluctuates and its PSNR decreases, showing unstable inference, while the iterations of the sparsity-only case remain stable. Combining the deep and sparsity priors improves performance over the former two, but converges neither as stably nor as fast as Ours. The higher PSNR values of Ours demonstrate more accurate reconstruction. The execution time of Ours is 2.5225s, less than that of the three ablated cases (4.4762s, 3.3240s, and 6.2760s, respectively). Visual results in Fig. 4 also verify the superior quality of Ours over the other three.
V-B. Comparisons with the State-of-the-art
We compared our method with eleven CS-MRI techniques: eight model based, i.e., Zero-Filling [4], TV [22], SIDWT [3], PBDW [30], PANO [31], FDLCP [45], BM3D-MRI [11] and D-AMP [10], as well as three learning based, i.e., ADMM-Net [43], DAGAN [41] and RefineGAN [32]. The latter three methods and ours were tested on a GPU.
Comparisons on benchmark sets: We randomly selected 25 T1-weighted and 25 T2-weighted MRI datasets from 50 different subjects in the IXI dataset (http://brain-development.org/ixi-dataset/) for testing, in order to evaluate reconstruction accuracy, time consumption and robustness to data variations. Three commonly used sampling patterns are adopted, i.e., the Cartesian [30], Radial [43] and Gaussian [41] patterns, with the sampling ratio varying from 10% to 50%.
The noise level for the data-driven module ranges from 3.0 to 49.0, and the maximum number of iterations is 50; the remaining parameters of our modules are fixed across all benchmark experiments. Comparative approaches take the parameter settings of their respective papers. We fine-tune the deep models in DAGAN and RefineGAN from their released ones, using image pairs generated by the various patterns and sampling ratios, for fair comparison.
First, we evaluate reconstruction accuracy on the benchmark data using the three sampling patterns in terms of PSNR and RLNE. Table I shows that ours leads all the competitors by a large margin for all sampling patterns. Next, reconstruction efficiency is explored via comparisons of time consumption and reconstruction error with Radial sampling at a 50% ratio. The marker at the lower left of the left plot in Fig. 5 indicates that our method gives the least reconstruction error with a comparably low processing time, balancing accuracy and efficiency. Third, we use the Gaussian mask at five sampling ratios varying from 10% to 50%. As shown in the right plot of Fig. 5, ours has the highest PSNR values over all others at all sampling ratios, demonstrating its robustness.
These comparisons demonstrate that traditional methods attain good accuracy but suffer from large time consumption, while learning based ones enjoy fast forward inference but depend so highly on training data that testing data deviating from the training distribution may severely degrade their performance.
Comparisons on real complex data: We conduct comparisons on real k-space data directly collected from a Philips machine to investigate the performance in practical applications. Many algorithms, like BM3D-MRI and the learning based ones, do not consider the complex form of these real data. The flexibility of our framework enables us to deal with complex data directly: we first handle the real and imaginary parts separately, and then merge them into the complex form for the subsequent module. In this experiment, the noise level ranges from 20.0 to 49.0, the corresponding balancing parameter is set to 1.7, and the maximum iteration number is 60.
V-C. Experiments on Parallel Imaging
We perform reconstruction from multiple coils, comparing with four PI techniques: SENSE [29], GRAPPA [15], SPIRiT [23] and L1-SPIRiT [27]. Table III shows that our approach achieves higher accuracy in less time, possessing an ideal ability to handle multi-coil data. The top row of Fig. 7 visualizes the input from multiple coils, and the lower row shows the reconstructions using the different techniques, where our method restores richer details with higher PSNR.
V-D. Experiments on reconstruction with Rician noise
The two-stage deep network solving the subproblem (20b) acts as a Rician denoiser. We first compare this strategy with classical Rician denoisers including NLM [6], UNLM [24], LMMSE [1] and RiceVST [13]. We generate training data by adding Rician noise at different levels to 500 weighted MRI data randomly chosen from the MICCAI 2013 grand challenge dataset^5 (^5http://masiweb.vuse.vanderbilt.edu/workshop2013/index.php/Segmentation_Challenge_Details).
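The corruption used to synthesize training pairs can be sketched as follows (our illustration of the standard Rician noise model; the actual data pipeline may differ in details):

```python
import numpy as np

def add_rician_noise(magnitude, sigma, rng=None):
    """Corrupt a clean magnitude image with Rician noise:
    independent Gaussian noise of std `sigma` is added to the real
    and imaginary channels before taking the magnitude. The
    resulting noise is signal-dependent and non-additive, which is
    why plain Gaussian denoisers underperform on MRI magnitude
    data."""
    rng = np.random.default_rng() if rng is None else rng
    n_real = rng.normal(0.0, sigma, magnitude.shape)
    n_imag = rng.normal(0.0, sigma, magnitude.shape)
    return np.sqrt((magnitude + n_real) ** 2 + n_imag ** 2)
```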
Figure 8 shows error heatmaps between the ground truth and the images denoised by the various techniques. Our strategy successfully resolves the non-additivity, so it attenuates noise more effectively than the others, especially in the background. RiceVST performs closest to ours, and hence we further compare with RiceVST at five noise levels varying from 10 to 50, as shown in Fig. 9. Ours yields higher PSNR at levels below 30 while taking much less time than RiceVST.
Our framework naturally embeds the two-stage deep denoiser into the iterations for reconstruction. We investigate the performance of this one-stop solution for reconstruction with Rician noise on weighted data in the IXI dataset. The parameters in (20a) and (20b) are set as 0.01, 1.0 and 1.0, respectively. For fair comparisons, we cascade RiceVST or our two-stage denoiser onto the other reconstruction algorithms as postprocessing. The right plot of Fig. 9 illustrates PSNR values obtained by the comparative algorithms with RiceVST (x-axis) and our two-stage denoiser (y-axis) cascaded. The brown triangle denotes the PSNR (dB) obtained by our one-stop solution. This figure shows that a) our framework combining reconstruction with denoising produces the best quality among the reconstruction algorithms with postprocessing, and b) our learnable two-stage denoiser performs better than RiceVST as postprocessing for reconstruction. Figure 10 provides a visual illustration of reconstruction with noise, where our framework also preserves more details and introduces lower errors.
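The cascaded postprocessing baseline used for comparison can be expressed as a simple composition (an illustrative sketch; the function names are ours):

```python
def cascade(reconstruct, denoise, kspace):
    """Postprocessing baseline: run a reconstruction algorithm
    first, then apply a Rician denoiser to its output. Our
    framework instead couples the two steps inside one iteration
    loop, which is what the one-stop comparison measures."""
    return denoise(reconstruct(kspace))
```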
VI Conclusion
We propose a theoretically convergent deep framework for CS-MRI reconstruction. This paradigm takes advantage of both model-based priors and deep representations, improving accuracy and efficiency. Further, we devise an optimal condition that guides the iterative propagation to converge to a critical point of the energy with priors, guaranteeing reliability. We also apply this framework to parallel imaging and to reconstruction with Rician noise for practical scenarios, without requiring significant retraining. Extensive experiments demonstrate the superiority of our framework over the state-of-the-art. Future work will address applications to more tasks such as low-dose CT, and the incorporation of other effective deep architectures.
Acknowledgments
This paper extends the conference version published at AAAI 2019 [20]. The authors would like to thank MR method manager Dr. Chenguang Zhao and sequence engineer Dr. Yuli Huang at Philips Healthcare (Suzhou) Co. Ltd. for providing testing data from a Prodiva 1.5T scanner (Philips Healthcare, Best, Netherlands). This work is partially supported by the National Natural Science Foundation of China (Nos. 61922019, 61672125, 61572096 and 61632019), and the Fundamental Research Funds for the Central Universities.
Supplemental Materials
Proof
To simplify the following derivations, we first rewrite the function in Eq. (2) as
We first provide detailed explanations of the properties used below, stated for a function $f$.

- $f$ is proper if $\mathrm{dom}\,f := \{\mathbf{x} : f(\mathbf{x}) < +\infty\}$ is nonempty.
- $f$ is Lipschitz smooth if $f$ is differentiable and there exists $L > 0$ such that $\|\nabla f(\mathbf{x}_1) - \nabla f(\mathbf{x}_2)\| \le L\,\|\mathbf{x}_1 - \mathbf{x}_2\|$ for all $\mathbf{x}_1, \mathbf{x}_2$. If $f$ is Lipschitz smooth, we have the following inequality: $f(\mathbf{x}_1) \le f(\mathbf{x}_2) + \langle \nabla f(\mathbf{x}_2), \mathbf{x}_1 - \mathbf{x}_2 \rangle + \frac{L}{2}\,\|\mathbf{x}_1 - \mathbf{x}_2\|^2$.
- $f$ is lower semicontinuous if $\liminf_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) \ge f(\mathbf{x}_0)$ at any point $\mathbf{x}_0$.
- $f$ is coercive if $f$ is bounded from below and $f(\mathbf{x}) \to +\infty$ when $\|\mathbf{x}\| \to +\infty$, where $\|\cdot\|$ is the $\ell_2$ norm.
- $f$ is said to have the Kurdyka-Łojasiewicz (KŁ) property at $\mathbf{x}_0 \in \mathrm{dom}\,\partial f$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\mathbf{x}_0$, and a desingularizing function $\varphi : [0, \eta) \to \mathbb{R}_{+}$ which satisfies (1) $\varphi$ is continuous at $0$ and $\varphi(0) = 0$; (2) $\varphi$ is concave and $C^1$ on $(0, \eta)$; (3) $\varphi' > 0$ on $(0, \eta)$; such that for all $\mathbf{x} \in U \cap \{\mathbf{x} : f(\mathbf{x}_0) < f(\mathbf{x}) < f(\mathbf{x}_0) + \eta\}$ the following inequality holds: $\varphi'\big(f(\mathbf{x}) - f(\mathbf{x}_0)\big)\,\mathrm{dist}\big(0, \partial f(\mathbf{x})\big) \ge 1$. Moreover, if $f$ satisfies the KŁ property at each point of $\mathrm{dom}\,\partial f$, then $f$ is called a KŁ function.
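As a concrete illustration of these properties (our example, assuming the standard CS-MRI data-fidelity term with a binary sampling mask $\mathbf{P}$ and unitary Fourier transform $\mathbf{F}$, which matches the setting in the Introduction but is not taken verbatim from the paper):

```latex
% Data-fidelity term and its gradient:
f(\mathbf{x}) = \tfrac{1}{2}\,\|\mathbf{P}\mathbf{F}\mathbf{x} - \mathbf{y}\|_2^2,
\qquad
\nabla f(\mathbf{x}) = (\mathbf{P}\mathbf{F})^{\mathsf{H}}(\mathbf{P}\mathbf{F}\mathbf{x} - \mathbf{y}).
% Since P is a binary mask and F is unitary, the Hessian norm is bounded:
\|\nabla f(\mathbf{x}_1) - \nabla f(\mathbf{x}_2)\|
  \le \|(\mathbf{P}\mathbf{F})^{\mathsf{H}}\mathbf{P}\mathbf{F}\|_2\,
      \|\mathbf{x}_1 - \mathbf{x}_2\|
  \le \|\mathbf{x}_1 - \mathbf{x}_2\|,
% so f is proper, lower semicontinuous, and Lipschitz smooth with L <= 1,
% and the descent lemma above applies:
f(\mathbf{x}_1) \le f(\mathbf{x}_2)
  + \langle \nabla f(\mathbf{x}_2), \mathbf{x}_1 - \mathbf{x}_2 \rangle
  + \tfrac{L}{2}\,\|\mathbf{x}_1 - \mathbf{x}_2\|_2^2 .
```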
Proposition 3
Let and be the sequences generated by Alg. 1. Suppose that the error condition in our is satisfied. Then there exists a sequence such that
(26) 
where and is the Lipschitz coefficient of .
Proof VI.1
In our stage, we have
if the error condition is satisfied. Thus, by the definition of the proximal operator we get
(27)  
Hence in particular, taking and respectively, we obtain
(28) 
Invoking the Lipschitz smooth property of , we also have
(29) 
Combining Eqs. (28) and (29), we obtain
The last inequality holds under the assumption in that iteration. This proves that the inequality in Proposition 3 holds.
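The proximal steps in the proof use the standard proximity operator $\mathrm{prox}_{t g}(\mathbf{v}) = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{x}-\mathbf{v}\|^2 + t\,g(\mathbf{x})$. As an illustrative instance (our example, not necessarily the prior used in our model), for $g = \|\cdot\|_1$ it reduces to soft-thresholding:

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t*||.||_1: the unique minimizer of
    (1/2)*||x - v||^2 + t*||x||_1, given in closed form by
    elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```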
Proposition 4
If , let and be the sequences generated by a proximal operator in Alg.1. Then we have
(30) 
Proof VI.2
Theorem 2
Suppose that is a sequence generated by Alg. 1. The following assertions hold.

The sequence is square summable, i.e.,

The sequence converges to a critical point of .
Proof VI.3
We first verify that the sequence is square summable. From Propositions 3 and 4, we deduce that
is established. It follows that both sequences and are nonincreasing. Then, since both and are proper, we have that is bounded and that the objective sequences and converge to the same value , i.e.,
Moreover, using the assumption that is coercive, we have that both and are bounded and thus have accumulation points. Considering Eq. (30) and the relationship between and , we get, for any ,
(32) 
Summing over , we further have
(33) 
So far, the first assertion holds.
Next, we prove that is a critical point of . Eq. (33) implies that when , i.e., there exist subsequences and