Converged Deep Framework Assembling Principled Modules for CS-MRI

10/29/2019 ∙ by Risheng Liu, et al. ∙ 11

Compressed Sensing Magnetic Resonance Imaging (CS-MRI) significantly accelerates MR data acquisition at a sampling rate much lower than the Nyquist criterion. A major challenge for CS-MRI lies in solving the severely ill-posed inverse problem to reconstruct aliasing-free MR images from the sparse k-space data. Conventional methods typically optimize an energy function, producing reconstruction of high quality, but their iterative numerical solvers unavoidably bring extremely slow processing. Recent data-driven techniques are able to provide fast restoration by either learning direct prediction to final reconstruction or plugging learned modules into the energy optimizer. Nevertheless, these data-driven predictors cannot guarantee the reconstruction following constraints underlying the regularizers of conventional methods so that the reliability of their reconstruction results are questionable. In this paper, we propose a converged deep framework assembling principled modules for CS-MRI that fuses learning strategy with the iterative solver of a conventional reconstruction energy. This framework embeds an optimal condition checking mechanism, fostering efficient and reliable reconstruction. We also apply the framework to two practical tasks, i.e., parallel imaging and reconstruction with Rician noise. Extensive experiments on both benchmark and manufacturer-testing images demonstrate that the proposed method reliably converges to the optimal solution more efficiently and accurately than the state-of-the-art in various scenarios.



There are no comments yet.


page 1

page 2

page 3

page 6

page 8

page 9

page 12

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Magnetic Resonance Imaging (MRI) is a widely applied none-invasive technology that reveals both functional and anatomical information. Unfortunately, traditional MRI suffers from inherently slow acquisition from a large volume of data in the k-space, i.e., Fourier space [21]

. Researchers introduce the compressed sensing (CS) theory into MRI reconstruction, allowing fast acquisition at a sampling rate much lower than the Nyquist rate without degrading image quality. The main challenge for CS-MRI is the illness of the inherited inverse problem to estimate images of high quality from under-sampled data points. Moreover, the existence of acquisition noise and computation of discrete Fourier transformation deteriorate the reconstruction stability.

Conventional approaches to CS-MRI devote significant efforts to incorporating prior knowledge on images as regularizers in order to restore details as well as suppress artifacts. Sparsity constraints in a specific transform domain are the most commonly used [5, 30, 22, 14]. Dictionary learning based methods consider the sparsity in the subspace spanned by reference images [34, 2, 45]. Non-local paradigms take the advantage of the sparse representation by grouping similar local image patches [31, 11]. These methods derived from certain prior models are able to find a stable and satisfactory reconstruction by optimizing an energy functional with various forms of regularizers. Nevertheless, this optimization typically demands iterative calculation of gradient information in a high-dimensional image space, suffering from rather slow process.

Recent deep learning based approaches directly apply pre-trained deep architectures to infer MR images from degraded sampled inputs [38, 17, 35], rendering fast reconstruction. Nevertheless, these methods upon direct deep mappings are not so stable as conventional ones to data variations because they abandon the principled prior knowledge and totally depend on training examples. By taking advantage of both domain knowledge and available data, hybrid paradigms attempt to bridge the gap between the data learning and numerical optimization for functional energy with regularizers. For instance, Sun et al. [43] and Diamond et al. [9] unroll the iterative optimization as the prediction process using deep networks. Unfortunately, this type of ad hoc combinations breaks the convergence of optimization iterations so that it cannot guarantee the optimal solution to the original objective energy. Figure 1 illustrates an example of reconstruction from the latest deep learning based method [32] where the circled anatomic structures mistakenly appear artifacts while smearing part of tissues, though increasing PSNR from to . Therefore, neither medical research nor clinical diagnosis can trust reconstruction without theoretical guarantee.

Existing methods fail to provide an efficient or reliable treatment for CS-MRI. In this study, we design a converged deep framework comprising modules assembled in principle for trustworthy fast MR reconstruction. Specifically, we provide an efficient iterative process by integrating the task-driven domain knowledge with the data-driven deep networks. Meanwhile, we introduce an automatic feedback mechanism that leads deep iterations converging to the optimal solution of the original energy. Further, we consider two extensive tasks in practical scenarios, i.e., parallel imaging and Rician noise pollution, to investigate the robustness of our paradigm for real-world applications. Our framework flexibly adapts to the energies targeting to these scenarios, and robustly finds the solutions. Our contributions can be listed as follows:

  • We propose a generic deep framework assembling modules in principle to efficiently and reliably solve the CS-MRI model, typically non-convex and non-smooth.

  • We establish an automatic feedback mechanism in virtue of the first-order optimality condition to reject improper predictions from embedded deep architectures and to guide the deep iterations towards the desired solution.

  • We custom-tailor two extensions of our framework to handle the case of parallel imaging and of Rician noise contamination in practical scenarios.

Extensive experimental results on benchmarks and raw data collected in the -space of an MRI equipment demonstrate the superiority of the proposed paradigm against state-of-the-art techniques in both reconstruction accuracy and efficiency, as well as the robustness to noise contamination.

Ground Truth Zero-Filling RefineGAN Ours
Fig. 1: A reconstruction example from naive zero-filling, recent RefineGAN, and ours. Undesired artifacts and smoothing details in red circles emerge in the result of RefineGAN.
Fig. 2: The fidelity , data-driven , optimal condition , and prior modules forms one iteration of our reconstruction framework.

Ii Related Work

This section reviews approaches to CS-MRI reconstruction based on prior models and deep learning, followed by the algorithms for two practical reconstruction scenarios, i.e., parallel imaging and Rician noise removal.

Ii-a Model based CS-MRI

Conventional CS-MRI needs to solve an energy optimization with regularizations derived from image priors in a specific transform domain or subspace. This term plays a vital role in the reconstruction quality. Researchers have devoted tremendous efforts to exploring the sparsity prior in various transform domains including the gradient [5], discrete Cosine transform [22], partial Fourier transform (RecPF) [42], and wavelet transform domains [30]. These techniques, developed upon the sparse representation with a fixed set of basis functions, can hardly provide satisfactory quality. Alternatively, researchers resort to the data-driven sparse prior in the subspace spanned by an over-complete dictionary of reference images, performing better in details recovery and artifacts removal [34, 2, 45, 31, 11]. These dictionary based CS-MRI methods are able to generate reliable or plausible reconstruction following the underlying physical prior or constraint, but typically suffer from extremely slow process as the energy optimization turns out to be a computationally expensive iterative process.

Additionally, most of regularization terms derive the sparsity prior in the forms of the norm. The norm has nice mathematical properties, but may not be able to characterize complex data distributions in real applications. Also, simply concatenating reference images into an over-complete dictionary as the data dependent prior has the limitation in handling flexible structures of real imaging problems.

Ii-B Deep learning based CS-MRI

Recent studies introduce deep learning techniques into MRI reconstruction in order to utilize the information given by vaults of available data examples. Wang et al.

design and train a convolutional neural network (CNN) to learn the direct mapping from the input under-sampled

-space data to the desired fully-sampled image [38]. A modified U-Net architecture is used to learn the aliasing artifacts in [17] and a deep cascading of convolutional network is applied to accelerate MR imaging in[35].Lately, generative adversarial networks (GANs) are able to remove aliasing artifacts by generated details, and gain the state-of-the-art quality for a specific benchmark [32, 41, 26]. These methods provide an extremely fast feed-forward inference for reconstruction, but highly depend on training examples without any principled knowledge, failing to give the desired solution when the input deviates from the mode of training data distribution.

There exist hybrid techniques that integrate deep architectures into an iterative optimization process for the reconstruction energy with prior knowledge. Sun et al. present ADMM-Net for CS-MRI by introducing learnable architectures into the Alternating Direction Method of Multipliers (ADMM) iterations [43]. A similar unrolling scheme is developed in [9]. These methods upon deep priors can improve the reconstruction accuracy and reliability at a relatively faster speed. Unfortunately, the unrolling schemes that directly replace iterations by deep architectures cannot guarantee the convergence of the iterative optimization process. Moreover, no mechanism is available to control the errors generated by nested structures. Therefore, we can still hardly rely on these reconstructed results in medical analysis and diagnosis.

Ii-C Parallel imaging algorithms

Parallel imaging (PI) techniques, similar with CS-MRI, are also commonly used in clinic applications to accelerate MRI acquisition by using different sensitivities received from multiple coils installed at different locations. For the sake of saving time, PI conducts spatial encoding in the form of receiver coils with various sensitivity profiles instead of gradient encoding. SENSE and its variants directly restore the images from under-sampled data in the image domain [29, 44]

. Another line of studies is to first interpolate the missing data in the frequency domain, and then to transfer the results to the image domain,

e.g., GRAPPA[15], SPIRiT [23], and their derived techniques. Recent approaches combine a CS energy model with PI in order to further speed up the acquisition such as CS-SENSE [18], CS-GRAPPA [7], -SPIRiT [27], and SAKE [36]. It is worth investigating how these energy-based PI approaches may benefit from deep learning.

Ii-D Rician noise removal

Independent zero-mean Gaussian noise with equal variances may simultaneously corrupt both real and imaginary parts of the complex data in the

-space during practical MR acquisition [33]. Typical MR imaging techniques convert the original complex Gaussian noise to Rician noise when computing the magnitude of the complex -space data [16]. The non-additive Rician noise challenges traditional noise removal techniques. Researchers designate special treatments targeting at Rician noise removal, e.g., nonlocal-means algorithms[39, 25], wavelet packet[40], wavelet domain filtering[28], and total variation(TV) denoisers [8, 19]. Deep learning based denoisers are also developed to tackle Rician noise [43, 41]. Nevertheless, it is still an open problem to fuse Rician noise removal with efficient CS-MRI reconstruction in a unified framework for practical MR acquisition.

Iii Method

We first give the problem formulation of CS-MRI, and present our deep framework. Subsequently, we detail the reconstruction process that converges to the optimum of the original energy, and finally provide the theoretical analysis.

Iii-a Problem formulation

According to the CS theory [11], typical CS-MRI reconstruction attempts to recover a fully-sampled image from under-sampled -space data as:


where represents the under-sampled observation in the -space, and

denotes the column vector of the desired MR image to be reconstructed. The Fourier transform and under-sampling operation are denoted as

and , respectively. The composite operator constitutes a linear operation . represents the regularization imposed on the ill-posed inverse problem for reconstruction in order to constrain the searching space of the desired solution .

Existing CS-MRI techniques usually solve the energy optimization (1) by iterations upon gradient information with the prior penalty. These methods suffer from expensive time consumption owing to the iterative gradient calculations. Alternatively, an end-to-end inference upon deep networks can achieve efficient MR reconstruction from under-sampled data. These direct approaches without explicit constraints from prior knowledge lack reliability or robustness to data variations. Hybrid strategies integrate both principled knowledge and deep architectures into the optimization procedure, but this simple combination has no theoretical guarantee on the convergence so that the reliability issue has not been resolved yet. Such trade of reliability for efficiency may produce serious consequence in medical analysis and diagnosis because one cannot tell whether a reconstruction algorithm restores authentic anatomical structures or fabricates/hallucinates details upon an available data distribution. In this work, we propose a deep reconstruction framework with theoretical guarantee on convergence, enabling both efficiency and robustness.

Iii-B Deep optimization framework

We consider the classical CS-MRI model with the sparsity regularization on the wavelet basis:


where denotes the sparse code of on the wavelet basis  (), and indicates the trade-off parameter. We use the regularization with so that the problem turns out to be challenging non-convex sparse optimization. To tackle the challenge, our framework embraces imaging principles, deep inference, and sparsity priors, either of which was proved to be critical for effective reconstruction in previous studies. Accordingly, we design the fidelity, data-driven, and prior modules in every optimization iteration to exploit these three types of information. Further, the framework embeds an optimality conditional module that automatically checks the output of each iteration and rejects the improper one leading to the undesired result. This module maintains the convergence, yielding a reliable solution to the optimization.

The top row of Fig. 2 demonstrates the cascade of all ingredients at each iteration of the optimization for our framework, and we elaborate the four modules in the order shown as Fig. 2.

Fidelity module: The fidelity term reveals the image formation process of MR imaging in (2), which is necessary for reconstruction algorithms. This module in our framework constitutes a mapping at the (+1)th iteration from the previous output  (initialized as the degraded input ) to an intermediate reconstructed image , characterizing the inverse imaging process as:


where denotes the parameters for the mapping. The calculation of the mapping and its parameters will be detailed later in the reconstruction process.

Data-driven module: Networks provide a flexible mechanism for characterizing underlying structures shared by numerous training pairs. Our framework consists of deep architectures encoding these image priors. We regard the artifacts introduced by the fidelity mapping as unknown noise, and thus employ a residual denoising network with shortcuts (IRCNN) [46] as the data-driven module of our framework:


where denotes the parameters of the residual network in the (+1)th stage. We will give a detailed depiction of the choice for this CNN-based denoiser later.

Optimal condition module: One major issue arises with the data-driven module whether the inference of the deep network follows the direction converging to the optimum of the model energy. This module equipped with a checking mechanism resolves this issue, via automatically checking and rejecting the improper output of the data-driven module:


where and are the outputs of the data-driven module at this stage and of the previous iteration, respectively. denotes the parameters for the optimality condition. This module calculates an indicator using the first-order optimality condition that determines whether to accept the updating from deep architectures.

Prior module: The sparsity prior upon a specific basis is necessary to constrain the solution space in traditional CS-MRI. The prior naturally describes the intrinsic mutual characteristics of MR images so that reconstruction can benefit from incorporating this prior. Thus, we append a prior module to the condition module to further improve restoration quality:


where denotes the parameters for the prior. Consequently, we are able to include domain knowledge for MR images, and recover more details. We apply an iterative process with all these four modules cascaded to solve the optimization (2) as shown in the bottom row of Fig. 2. The converged optimum , found by the process, gives the final reconstruction.

Iii-C Reconstruction process

We elaborate the computation of each module in our deep optimization framework, forming an efficient and converged reconstruction process.

Closed solution for module : This module intends to give an intermediate solution to the fidelity term in (2), , without the complex sparsity constraint. Instead, we constrain the solution close to the reconstruction at the previous iteration using the norm:


where balances between the fidelity term and . This constraint guarantees the recovery from the fidelity term not deviating from the overall optimization direction. Moreover, the continuity of (7) renders a closed solution:


Residual learning for module : Inspired by the discovery that artifacts from under-sampled data have a topologically simpler structure than original images [17], we choose a deep architecture with shortcuts to learn structural information underlying pre-reconstructed images

with noise/artifacts. The deep architecture consists of seven blocks. The first block is composed of a dilated convolution layer cascaded with a rectified linear unit (ReLU) layer. Each of the five middle blocks consists of three layers, 


, a dilated convolution, a batch normalization and a ReLU. The last one is a single convolution layer. Dilated convolutions with a larger receptive field are adopted in the network to differentiate anatomical structures from artifacts, resulting in better denoising performance.

Such a deep approach accommodates various image structures reflected by training examples without the need of explicit models. More importantly, a well trained deep network can provide rather fast forward inference needing no gradient calculation in the image domain at each iteration. Specifically, we formulate this forward process as:


where ResNet denotes the inference via the trained deep network with shortcuts. The output of the fidelity term is a set of coefficients upon the wavelet domain rather than an image, and thus we have to convert into the image domain by as the input fed to the network. Subsequently, we convert the network output back into coefficients by applying the inverse transformation of .

Optimal condition for module : To maintain the theoretical convergence of the proposed framework, we design a checking mechanism to guarantee iterations always converging towards the optimal solution of the original energy (2). First, we introduce a proximal gradient of a momentum term to connect the output of the data-driven module with the first-order optimal condition of the minimization energy. We define the momentum proximal gradient111. as


where is the step-size and denotes the fidelity term in (2). Then, we establish a feedback mechanism by considering the first-order optimal condition of (10) as


Here, is a positive constant revealing the tolerance scale of the distance between the current solution and previous updating at the -th stage. The previous output is also re-considered when (11) is not satisfied. Finally, we summarize the checking module as


In this manner, the iterative process bypasses the improper output of deep networks, and directs to the desired solution.

Proximal gradient for module : The sparsity prior in (2) acts as the last module for each iteration. Considering the non-convexity of -norm regularization, we solve it via a step of proximal gradient as follows:


where is the step size. This operation enforces the prior term of the reconstruction model, and thus preserves more details avoiding over-smoothing reconstruction.

Algorithm 1 lists the iterative reconstruction process where each iteration consists of cascaded computations for the four principled modules. This process converges to the solution to (2), and we will give the theoretical analysis below.

0:   and necessary parameters.
0:  Reconstructed MR image .
1:  while  not converged do
6:  end while
Algorithm 1 Reconstruction procedure

Iii-D Theoretical investigations

Existing deep optimization strategies highly rely on the distribution of data, and discard the convergence guarantee in iterations [43, 9]. Our scheme not only includes energy optimization with imaging principles and prior knowledge, but also embraces a learnable deep architecture for fast inference. Furthermore, we design an effective mechanism to determine whether the output of networks at current iteration follows a descent direction towards the energy optimum. This subsection theoretically analyzes the convergence behavior of our method.

To simplify the following derivations, we first rewrite the function in (2) as

Furthermore, we also give the following properties about that are helpful for the convergence analysis222We place the details of these properties, and all the proofs of the following propositions and theorem in supplemental materials due to page limit.:
a) is proper and Lipschitz smooth;
b) is proper and lower semi-continuous;
c) satisfies the KŁ property and is coercive.

Then two important propositions and one theorem are given to characterize the convergence properties of our method.

Proposition 1

Let and be the sequences generated by Alg. 1. Supposing that the error condition in our module is satisfied, there exists a sequence such that


where and is the Lipschitz coefficient of .

Proposition 2

If , let and be the sequences generated by a proximal operator, then we have

Remark 1

The inequalities in Propositions 3 and 4 provide a descent sequence by illustrating the relationship of and as

Thus, the operator is a key criterion to check the output of the learnable deep module whether to propagate along a descent direction toward the optimum. Moreover, it also ingeniously builds a bridge to connect the adjacent iteration and .

Theorem 1

Suppose be a sequence generated by our algorithm. The following properties hold.

  • The square summable of sequence is bounded, i.e.,

  • The sequence converges to a critical point of .

Remark 2

The second property in Theorem 1 implies that is a Cauchy sequence so that it globally converges to the critical point of .

Iv Extensions to practical scenarios

Our framework can adapt to practical scenarios having similar models with (2). This section gives two examples, i.e., parallel imaging and reconstruction with Rician noise.

Iv-a Extension 1: Parallel imaging based CS-MRI

Reconstruction of sparse multi-coil data can be formed as:


where is the number of receivers used for multi-coil data acquisition. denotes the under-sampled MR data recorded by the receiver with the corresponding sensitivity map which can be estimated via commonly used self-calibration techniques [37, 12].

This CS-PI model (16) shares a similar form with (2) so that we are able to readily apply the proposed framework for efficient and reliable reconstruction. The fidelity term in (16) derives the closed-form solution for the module as:


Accordingly, the derivative of the fidelity term, , in the modules and changes into:


We can find final reconstruction by running Alg. 1 with other computations unchanged.

Iv-B Extension 2: CS-MRI with Rician noise

In order to enable the proposed framework to tackle the reconstruction with non-addition Rician noise, we re-formulate the CS-MRI model in (2) as:


where , and denote the fully sampled clear MR image, observed -space data and independent Gaussian noise in real and imaginary components, respectively. The matrices and are the inverse of wavelet transforms and , respectively.

We then split  (19) into two subproblems:


where is a penalty parameter. Therefore, we can alternatively apply the iterative process in Alg. 1 to solve and , finally converging to the optimal estimate .

The subproblem (20a) aims to reconstruct a fully sampled noisy MR image from the -space observation . Therefore, Alg.1 is applicable to the subproblem with all modules unchanged but taking the two square terms as the fidelity module:


where and represent the output at previous iteration of the subproblems (20a) and (20b), respectively. and balances among the fidelity term, and . Also, the continuity of (21) renders a closed solution:


Accordingly, we derive the derivative of the fidelity term in both modules and as:


The second subproblem (20b) targets at noise removal by writing the fidelity module as:


Unfortunately, we can hardly express a closed-form solution to the module , and thus resort to an efficient learnable strategy with two-stage IRCNNs, trained using pairs of noise-free and Gaussian noisy MR images, in order to tackle non-additive Rician noise. The first stage predicts from , which removes additive noise by noticing . The second stage feeds the square root of the output of the first stage, i.e., , to the trained IRCNN, inferring that eliminates additive noise . Given , we obtain one step iteration by subsequently calculating the other three modules, , and in Alg. 1. The gradient in and changes into:


where and are handily available as byproducts of the two-stage IRCNN inference.

Fig. 3: Loss (left) and PSNR (right) evolution during iterations of four module combinations.
Fig. 4: Images reconstructed with four module combinations.
Data T-weighted MR Data T-weighted MR Data
Mask Cartesian Radial Gaussian Cartesian Radial Gaussian
ZeroFilling [4] 22.16 0.2874 25.19 0.2026 25.08 0.2051 23.80 0.3724 25.69 0.2636 26.48 0.2737
TV [22] 23.85 0.2366 27.76 0.1514 29.76 0.1204 26.76 0.2650 33.27 0.1258 35.85 0.0938
SIDWT [3] 23.43 0.2479 28.82 0.1449 29.64 0.1220 25.96 0.2905 33.51 0.1173 36.14 0.0905
PBDW [30] 25.96 0.1857 30.86 0.1145 32.38 0.0890 29.18 0.2008 35.33 0.0951 38.54 0.0685
PANO [31] 27.88 0.1492 30.51 0.1104 33.94 0.0746 31.74 0.1499 36.09 0.0912 40.43 0.0557
FDLCP [45] 26.47 0.1757 30.75 0.1075 26.47 0.1757 31.74 0.1500 37.33 0.0788 31.74 0.1500
BM3D-MRI [11] 26.20 0.1802 31.08 0.1035 35.41 0.0630 29.23 0.1999 37.55 0.0770 42.67 0.0428
DAMP [10] 25.90 0.1866 30.31 0.1128 35.30 0.0638 27.98 0.2306 35.98 0.0919 42.30 0.0447
ADMM-Net [43] 25.14 0.2041 29.42 0.1254 33.22 0.0807 28.62 0.2141 36.30 0.0888 39.19 0.0636
DAGAN [41] 24.57 0.1707 26.79 0.1324 27.79 0.1177 25.89 0.1196 29.99 0.0748 30.40 0.0714
RefineGAN [32] 25.27 0.2057 28.20 0.1462 26.72 0.1713 27.58 0.2506 33.54 0.1240 33.81 0.1226
Ours 28.22 0.1433 32.01 0.0928 36.17 0.0576 33.57 0.1211 38.47 0.0691 42.77 0.0422
TABLE I: Quantitative comparison on different sampling patterns at 20% sampling ratio.
Fig. 5: RLNE VS Running time (left) and PSNR VS sample ratios (right) reflect efficiency and robustness to sampling variations, respectively. We omit Zerofilling and TV whose results fall beyond the range of these two plots.
PSNR 22.85 26.66 26.55 26.53 25.65 24.88 26.23 24.90 26.73
SSIM 0.5353 0.5863 0.5553 0.5804 0.5293 0.5674 0.5282 0.5431 0.5949
RLNE 0.3062 0.2339 0.2359 0.2289 0.2653 0.2729 0.2529 0.3085 0.2186
TABLE II: Comparisons of time consumption on T-weighted Data using Radial pattern at 50% sampling ratio.
Ground Truth ZF (23.94) PBDW (27.21) PANO (27.66) FDLCP (26.75) DAMP (27.20) Ours (28.33)
Fig. 6: Images, zoomed-in details, and error heat maps (blue-lower and red-higher) reconstructed from raw -space data at the acceleration factor of 2.0. Resultant PSNR values are parenthesized after corresponding algorithms.
PSNR 27.82 29.14 30.13 30.96 31.02
RLNE 0.31 0.27 0.25 00.23 0.23
Time 8.5570 9.8833 234.982 116.407 9.1078
TABLE III: Comparisons on parallel imaging
c GT SENSE (27.43) ESPIRiT(29.59) ESPIRiT-L(30.59) GRAPPA (30.43) Ours (31.961)
Fig. 7: Images for parallel imaging from multi-channel data collected by 8 coils with the acceleration factor of 2.0. PNSR values are given in parentheses after corresponding algorithms.
(20.77) (22.82) (25.76) (29.23) (29.93) (30.84)
Fig. 8: Heat maps (Blue-lower, Red-higher) of Rician denoisers with the noise level of 20. PSNR values are given below corresponding algorithms.
Fig. 9: Left: PSNR and running time at different noise levels. Right: PSNR values by reconstruction algorithms with RiceVST (-axis) and the two-stage deep denoiser (-axis) cascaded as post-processing at the Rician noise level 20. Ours embeds the deep denoiser in every iteration.
Ground Truth ZeroFilling (24.22) SIDWT (25.50) PBDW (26.64) PANO (27.03) FDLCP (25.00)
BM3D-MRI (27.36) DAMP (26.27) ADMM-Net(26.09) DAGAN(24.19) RefineGAN(25.12) Ours(27.88)
Fig. 10: Images, zoomed-in details, and heat maps reconstructed by traditional CS-MRI methods with the two-stage deep denoiser as post-processing. Ours combines reconstruction with the denoiser in one framewok.The parenthesized value gives the resultant PSNR.

V Experiments

This section first explores the impact of each module in our framework on accuracy and convergence behavior. Next, we compare our algorithm with the state-of-the-art CS-MRI methods to demonstrate its superiority on accuracy, efficiency, and robustness. We also demonstrate on raw -space data provided by an MR manufacturer Prodiva 1.5T scanner (Philips Healthcare, Best, Netherlands) besides benchmarks. Further, we conduct experiments on multi-coils data and Rician noisy data. All experiments were executed on a PC with Intel(R) CPU @3.0GHz 256GB RAM and a NVIDIA TITAN Xp333Source codes and testing data are available at

V-a Ablation analysis

First we evaluate four combinations of modules in our framework to figure out their roles. The first case performs reconstruction with the constraint of sparsity prior, and the second one combines the fidelity and data-driven modules in order to demonstrate the performance of deep networks. We use both sparsity and deep priors without the optimal condition module as the third case . The entire paradigm with all modules is the last choice Ours. We apply these strategies on -weighted data using the Ridial sampling pattern with 20% ratio. The stopping criterion for iterations is set as .

Figure 3 illustrates that the loss of is slightly larger than that of at the first several iterations. The data-driven module is more significant when image quality is low in the beginning. As the process goes on, the loss for fluctuates and its PSNR decreases, showing unstable inference, while iterations for keep stable. Combining deep and sparsity priors improves performance over the former two, but converges not so stable and fast as Ours. Higher PSNR values of Ours demonstrate more accurate reconstruction. The execution time of Ours is 2.5225s, less than that of (4.4762s), (3.3240s), and  (6.2760s). Visual results in Fig. 4 also verify the superior quality of Ours over the other three.

V-B Comparisons with the State-of-the-art

We compared our method with eleven CS-MRI techniques, including eight model based, Zero-Filling [4], TV [22], SIDWT [3], PBDW [30], PANO [31], FDLCP [45], BM3D-MRI [11] and DAMP [10], as well as three learning based, ADMM-Net [43], DAGAN [41] and RefineGAN [32]. The latter three methods and ours were tested on GPU.

Comparisons on benchmark sets: We randomly selected 25 -weighted and 25 -weighted MRI data from 50 different subjects in the IXI dataset444 for testing in order to evaluate reconstruction accuracy, time consumption and robustness to data variations. Three commonly used sampling patterns are adopted, i.e., the Cartesian [30], Radial [43] and Gaussian [41], with the sampling ratio varying from 10% to 50%.

The parameter in is set as 5 and noise level in ranges from 3.0 to 49.0. The parameters (), and in the modules and are set as , and , respectively. The maximum number of iterations is 50. Comparative approaches take the parameter settings as their respective papers. We re-finetune deep models in DAGAN and RefineGAN upon their own using image pairs generated by various patterns and sampling ratios for fair comparisons.

First, we evaluate reconstruction accuracy on benchmark data using three sampling patterns in terms of PSNR and RLNE. Table I shows that ours leads all the competitors by a large margin for all sampling patterns. Next, reconstruction efficiency is explored via comparisons on time consumption and reconstruction error with the Radial sampling at a 50% ratio. The marker at the lower left of the left plot in Fig. 5 indicates that our method gives the least reconstruction error with a comparable low processing time, balancing accuracy and efficiency. Third, we use the Gaussian mask at five sampling ratios varying from 10% to 50%. As shown in the right plot of Fig. 5, ours have the highest PSNR values over all others at all sampling ratios, demonstrating its robustness.

These comparisons demonstrates that traditional methods give ideal accuracy but suffer from large time consumption, while learning based ones enjoy fast forward inference but depend so highly on training data that testing data deviating from training distribution may severely degrade the performance.

Comparisons on Real Complex Data: We conduct comparisons on real -space data directly collected from a Philip machine to investigate the performance for practical applications. Many algorithms like BM3D-MRI and those learning based ones consider no complex form of these real data. The flexibility of our framework enables us to directly deal with complex data by the modules and . We first separately handle the real and imaginary parts, and then merge to the complex form for the module . In this experiment, the noise level ranges from 20.0 to 49.0. The parameter in and is set to 1.7, and the maximum iteration number is 60.

The PSNR, SSIM and RLNE values of our method in Tab. II demonstrate more accurate reconstruction than the others on real data. Figure 6 gives the qualitative comparisons with enlarged details and error heat maps. Our method produces sharper textures and less error, showing the superior capability.

V-C Experiments on Parallel Imaging

We perform reconstruction from multi-coils by comparing with four PI techniques, SENSE [29], GRAPPA [15], SPIRiT [23] and -SPIRiT [27]. Table III shows that our approach achieves higher accuracy in less time, possessing an ideal ability to handle multi-coil data. The top row of Fig. 7 visualizes the input from multiple coils, and the lower row shows the reconstruction using different techniques, where our method restores richer details with higher PSNR.

V-D Experiments on reconstruction with Rician noise

The two-stage deep networks solving the subproblem (20b) acts as a Rician denoiser. We first compare the strategy with classical Rician denoisers including NLM [6], UNLM [24], LMMSE [1] and RiceVST [13]. We generate training data by adding Rician noise at different levels to 500 -weighted MRI data randomly chosen from the MICCAI 2013 grand challenge dataset555
. Figure 8 shows error heatmaps between the ground truth and denoised image by various techniques. Our strategy successfully resolves the non-additivity so that it attenuates noise more effectively than the others, especially for background. RiceVST performs close to ours, and hence we further compare with RiceVST on five noise levels varying from 10 to 50, shown in Fig. 9. Ours outputs higher PSNR at levels less than 30 with much less time than RiceVST.

Our framework naturally embeds the two-stage deep denoiser into iterations for reconstruction. We investigate the performance of this one-set solution for reconstruction with Rician noise on -weighted data in the IXI dataset. The parameters , and in (20a) and (20b) are set as 0.01, 1.0 and 1.0, respectively. For fair comparisons, we cascade RiceVST or our two-stage denoiser to the other reconstruction algorithms as post-processing. The right plot of Fig. 9 illustrates PSNR values reconstructed by the comparative algorithms with RiceVST (-axis) and our two-stage denoiser (-axis) cascaded. The brown triangle denotes PSNR (dB) obtained by our one-set solution. This figure shows that a) our framework combing reconstruction with denoising produces the best quality over the other reconstruction algorithms with post-processing, and b) our learnable two-stage denoiser performs better than RiceVST as post-pocessing for reconstruction. Figure 10 provides an visual illustration for reconstruction with noise, where our framework also preserves more details and introduces lower errors.

Vi Conclusion

We propose a theoretically converged deep framework for CS-MRI reconstruction. This paradigm takes advantages of both model-based and deep representation, improving accuracy and efficiency. Further, we devise an optimal condition that guides iterative propagation converging to the critical point of the energy with priors, guaranteeing reliability. We also apply this framework to parallel imaging and reconstruction with Rician noise for practical scenarios, without needing significant data re-train. Extensive experiments demonstrate the superiority of our framework over the state-of-the-art. Future work directs to applications for more tasks like low-dose CT, and to incorporation of other effective deep architectures.


This paper extends from the conference version published at AAAI 2019 [20]. The authors would like to thank MR method manager Drs. Chenguang Zhao and sequence engineer Yuli Huang at Philips Healthcare (Suzhou) Co. Ltd. for providing testing data from Prodiva 1.5T scanner (Philips Healthcare, Best, Netherlands). This work is partially supported by the National Natural Science Foundation of China (Nos. 61922019, 61672125, 61572096 and 61632019), and the Fundamental Research Funds for the Central Universities.

Supplemental Materials


To simplify the following derivations, we first rewrite the function in Eq. (2) as

We first provide detailed explanations about the properties of , and .

  • is proper if is nonempty.

  • is -Lipschitz smooth if is differentiable and there exists such that

    If f is -Lipschitz smooth, we have the following inequality

  • is lower semi-continuous if at any point .

  • is coercive if is bounded from below and if , where is the norm.

  • is is said to have the Kurdyka-Łojasiewicz (KŁ) property at if there exist , a neighborhood of and a desingularizing function which satisfies (1) is continuous at and ; (2) is concave and on ; (3) for all , such that for all

    the following inequality holds

    Moreover, if satisfies the KŁ property at each point of then is called a KŁ function.

Proposition 3

Let and be the sequences generated by Alg.1. Suppose that the error condition in our is satisfied. Then there existed a sequence such that


where and is the Lipschitz coefficient of .

Proof VI.1

In our stage, we have

if the error condition is satisfied. Thus, by the definition of the proximal operator we get


Hence in particular, taking and respectively, we obtain


Invoking the Lipschitz smooth property of , we also have


Combing Eqs. (28) and (29), we obtain

The last inequality holds under the assumption in the -th iteration. So far, this prove that the inequality  (3) in Proposition 3 holds.

Proposition 4

If , let and be the sequences generated by a proximal operator in Alg.1. Then we have

Proof VI.2

As the proximal stage shows in Alg. 1, we have


Similar with the deduction in Proposition 3, we obtain the following inequality:

Thus we get the conclusion

Theorem 2

Suppose that be a sequence generated by Alg. 1. The following assertions hold.

  • The square summable of sequence is bounded, i.e.,

  • The sequence converges to a critical point of .

Proof VI.3

We first verify that square summable of is bounded. From Propositions 3 and 4, we deduce the

is established. It follows that both sequences and are non-increasing. Then, since both and are proper, we have is bounded and that the objective sequences and converge to the same value , i.e.,

Moreover, using the assumption is coercive, we have that both and are bounded and thus have accumulation points. Considering Eq. (30) and the relationship of and , we get for any ,


Summing over , we further have


So far, the first assertion holds.

Next, we prove is a critical point of . Eq. (33) implies that when , i.e., there exist subsequences and