1 Introduction
Magnetic Resonance Imaging (MRI) is widely used in clinical applications because of its non-invasive nature and excellent capability in revealing both functional and anatomical information. However, one of its main drawbacks is the inherently slow acquisition of data in k-space (i.e., Fourier space), due to hardware limitations [Lustig et al.2008]. Compressive sensing MRI (CSMRI) is a commonly used technique that allows fast acquisition at sampling rates much lower than the Nyquist rate without deteriorating image quality.
In the process of MR data acquisition, the sparse k-space data can be approximated by the following discretized linear system [Eksioglu2016]:
(1)  $y = PFu + n$,
where $u$ is the desired MR image to be reconstructed from the observation $y$, $F$ is the Fourier transform, $P$ denotes the undersampling operation, and $n$ represents the acquisition noise. It is easy to see that estimating $u$ from Eq. (1) is ill-posed, due to the singularity of $PF$. Thus the main challenge for CSMRI lies in defining proper regularizations and designing the corresponding nonlinear optimization or iterative algorithms for this inverse problem. Following CS theory, the most typical CSMRI reconstruction techniques attempt to optimize the following nonsmooth regularized model:
(2)  $\min_x \frac{1}{2}\|PF\Psi x - y\|_2^2 + \lambda\|x\|_p^p$,
where $x$ denotes the sparse code of $u$, corresponding to the wavelet basis $\Psi$ (so we actually have $u = \Psi x$) of a given inverse wavelet transform (usually implemented by DCT or DWT), and $\lambda$ indicates the trade-off parameter. In this work, we consider the $\ell_p$ regularization with $0 < p < 1$, and thus actually need to address a challenging nonconvex sparse optimization task.
1.1 Related Works
Conventionally, some efforts explore sparse regularization in a specific transform domain [Qu et al.2012, Lustig, Donoho, and Pauly2007, Gho et al.2010]. In addition, algorithms based on the nonlocal processing paradigm aim to construct a better sparsifying transform [Qu et al.2014, Eksioglu2016]. Some dictionary-learning-based methods focus on training a dictionary from reference images in the related subspace [Ravishankar and Bresler2011, Babacan et al.2011, Zhan et al.2016]. These model-based CSMRI methods restore details satisfactorily, benefiting from model-based data priors. However, most of them rely on a sparsity-inducing $\ell_0$ or $\ell_1$ norm, which only weakly describes sparsity in the real sense, and the resulting insufficient sparse regularization may introduce artifacts. Moreover, lacking data-dependent priors, such methods are limited in handling the particular structure of the problem or specific data distributions in real scenes.
Recently, preliminary studies on deep-learning-based CSMRI have attracted great attention [Wang et al.2016, Lee, Yoo, and Ye2017, Schlemper et al.2018]. At the expense of sacrificing principled information, such techniques enjoy an extremely fast feed-forward process with the aid of deep architectures, but they are not as flexible as model-based methods in handling various data distributions.
To absorb the advantages of both principled knowledge and data-driven priors, techniques have been proposed that integrate deep architectures into an iterative optimization process [Liu et al.2017, Liu et al.2018a, Liu et al.2018c, Liu et al.2018b]. By introducing learnable architectures into the Alternating Direction Method of Multipliers (ADMM) iterations, [Sun et al.2016] proposed an ADMM-Net to address the CSMRI problem. In [Diamond et al.2017], unrolled Optimization with Deep Priors (ODP) was proposed to integrate CNN priors into the optimization process. It has been demonstrated that these deep priors can significantly improve reconstruction accuracy. Unfortunately, due to their naive unrolling schemes (i.e., directly replacing iterations by architectures), the convergence of these deep-optimization-based CSMRI methods cannot be theoretically guaranteed. Even worse, no mechanism is proposed to control the errors generated by the nested structures. Thus these methods may fail completely if improper architectures are utilized during the iterations.
Beyond the aforementioned problems, one concerning yet unsolved issue is the noise in MR images acquired in real scenes. During the practical acquisition of MR data, both the real and imaginary parts of the complex data in k-space may be corrupted with uncorrelated zero-mean Gaussian noise of equal variance [Rajan et al.2012]. Thus the actually acquired magnitude MR image is given as follows:
(3)  $u_n = \sqrt{(u + n_r)^2 + n_i^2}$,
where $u$ denotes the noise-free signal intensity, and $n_r$ and $n_i$ represent the uncorrelated real and imaginary components of the Gaussian noise, respectively. Notice that the noise in the magnitude MR image $u_n$ no longer follows a Gaussian distribution, but a Rician one.
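For intuition, the following minimal NumPy sketch simulates Eq. (3); the function name and interface are our own illustration, not from the paper.

```python
import numpy as np

def add_rician_noise(u, sigma, rng=None):
    """Corrupt a noise-free magnitude image u with Rician noise (Eq. (3)):
    Gaussian noise enters the real and imaginary channels independently,
    and taking the magnitude makes the resulting noise Rician."""
    rng = rng or np.random.default_rng()
    n_r = rng.normal(0.0, sigma, u.shape)  # real-component Gaussian noise
    n_i = rng.normal(0.0, sigma, u.shape)  # imaginary-component Gaussian noise
    return np.sqrt((u + n_r) ** 2 + n_i ** 2)
```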
Respectable efforts to remove Rician noise from MR data have been made in several ways [WiestDaesslé et al.2008, Manjón et al.2010, Chen and Zeng2015, Liu and Zhao2016], but rather few studies on CSMRI take into account the presence of actual noise in MR images: existing CSMRI methods either consider only noiseless MR images or treat the noise as Gaussian [Sun et al.2016, Yang et al.2018a]. To the best of our knowledge, this matter has not yet been truly solved.
1.2 Contributions
As discussed above, one of the most important limitations of existing deep-optimization-based CSMRI methods (e.g., [Sun et al.2016] and [Diamond et al.2017]) is the lack of theoretical investigation. To address the aforementioned issues, in this paper we propose a new deep algorithmic framework to optimize the CSMRI problem in Eq. (2). Specifically, by integrating domain knowledge of the task and learning-based architectures, as well as checking the optimality conditions, we can strictly prove the convergence of the generated deep propagations. Moreover, due to the complex noise distribution (i.e., Rician), it is still challenging to directly apply existing CSMRI methods to real-world scenarios. Thanks to the flexibility of our paradigm, we further extend the propagation framework to adaptively and iteratively remove Rician noise, guaranteeing the robustness of the MRI process. Our contributions can be summarized as follows:

To the best of our knowledge, this is the first work that establishes theoretically convergent deep optimization algorithms to efficiently solve the nonconvex and nonsmooth CSMRI model in Eq. (2). Thanks to the automatic checking and feedback mechanism (based on first-order optimality conditions), we can reject improper nested architectures and thus always propagate toward the desired solutions.

To address the robustness issues in existing CSMRI approaches, we also develop an extended deep propagation scheme to simultaneously recover the latent MR images and remove the Rician noise for real-world MRI.

Extensive experimental results on real-world benchmarks demonstrate the superiority of the proposed paradigm against state-of-the-art techniques in both reconstruction accuracy and efficiency, as well as its robustness to complex noise pollution.
2 Our Paradigm for CSMRI
In this section, we design a theoretically convergent deep model to unify the various fundamental factors that affect CSMRI performance. Different from most existing deep approaches, fidelity knowledge, data-driven architectures, manual priors and optimality conditions are all comprehensively taken into consideration in our deep framework. Meanwhile, theoretical convergence is guaranteed by our optimality-checking mechanism. Inspired by recent studies [Liu et al.2017, Liu et al.2018a, Yang et al.2018b], a proximal gradient algorithm is utilized in this work to achieve efficient optimization.
Fidelity Module: Fidelity plays an important role in revealing the intrinsic genesis of the problem and is commonly adopted in traditional handcrafted approaches. Given a reasonable manner of handling the fidelity term (i.e., $f(x) := \frac{1}{2}\|PF\Psi x - y\|_2^2$ in Eq. (2)), we can generate a rough reconstruction. Rather than introducing additional complex constraints, here we simply integrate the current restoration $x^{t-1}$ with the fidelity and consider the following subproblem:
(4)  $x_F^t = \arg\min_x \frac{1}{2}\|PF\Psi x - y\|_2^2 + \frac{\mu}{2}\|x - x^{t-1}\|_2^2$,
where $\mu$ denotes the weight parameter and $\Psi$ contains a wavelet basis corresponding to an inverse wavelet transform. Eq. (4) balances $x^{t-1}$ against the fidelity term; an additional benefit is that the restoration can be corrected by the fidelity when $x^{t-1}$ strays from the desired descent direction. Benefiting from the continuity of the function in Eq. (4), a closed-form solution can be derived as Eq. (5):
(5)  $x_F^t = (\Psi^\top F^\top P^\top PF\Psi + \mu I)^{-1}(\Psi^\top F^\top P^\top y + \mu x^{t-1})$.
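To make Eq. (5) concrete: when $\Psi$ is orthonormal, $F$ is a unitary FFT and $P$ is a binary sampling mask, the matrix inverse in Eq. (5) becomes diagonal in k-space and the update costs only two FFTs. Below is a minimal NumPy sketch under those assumptions; `Psi`/`PsiT` and the zero-filled observation `y_full` are our placeholders.

```python
import numpy as np

def fidelity_module(x_prev, y_full, mask, mu, Psi, PsiT):
    """Closed-form solution of Eq. (4):
        min_x (1/2)||P F Psi x - y||^2 + (mu/2)||x - x_prev||^2.
    With Psi orthonormal and F unitary, the normal equations are diagonal
    in k-space: K = (mask * y_full + mu * F(Psi x_prev)) / (mask + mu)."""
    k_prev = np.fft.fft2(Psi(x_prev), norm="ortho")   # k-space of current estimate
    k_new = (mask * y_full + mu * k_prev) / (mask + mu)
    return PsiT(np.fft.ifft2(k_new, norm="ortho"))    # back to sparse-code domain
```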
Data-driven Module: Deep methods can effectively capture a data distribution by training on numerous paired inputs and outputs. Inspired by this, a learning-based module with a deep architecture is introduced to exploit the implicit data distribution. Notice that previous work has demonstrated that an insufficient sparse representation in a predefined sparse transform introduces artifacts during reconstruction [Knoll et al.2011, Liang et al.2011]. Furthermore, the previous fidelity-based optimization may also bring in artifacts. Thus a residual learning network with a shortcut is adopted as a denoiser. We define this module as
(6)  $x_N^t = \mathcal{N}(x_F^t; \theta^t)$,
where $\theta^t$ is the network parameter at the $t$-th stage. Notice that both the input and the output here are in the image domain; the transformation between image and sparse code is therefore incorporated into the network. In the experiments section, we give a concrete choice for this CNN denoiser.
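A minimal PyTorch sketch of such a residual denoiser is given below; the depth and width are our own illustrative choices (the experiments use an IRCNN-style network), and the shortcut subtracts the predicted artifact map as described above.

```python
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    """Residual CNN playing the role of N(.; theta) in Eq. (6)."""
    def __init__(self, channels=1, features=64, depth=7):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # Shortcut: the network predicts the artifact/noise residual,
        # which is subtracted from the input.
        return x - self.body(x)
```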
Optimal Condition Module: Two major concerns arise with the data-driven module: 1) whether the direction generated by the deep architecture satisfies the convergence analysis (i.e., updates along a descent direction); 2) without the empirical prior, whether the network output still tends toward a desired optimal solution.
To clarify these doubts, a checking mechanism based on the first-order optimality condition is proposed to indicate whether the update produced by the deep architecture is acceptable and which variable should be adopted in the next iteration. First, we introduce a proximal gradient step, with the network output serving as a momentum term, to connect the output of the data-driven network with the first-order optimality condition of a constructed minimization energy. We define this momentum proximal gradient as
(7)  $\hat{x}^t = \mathrm{prox}_{\gamma^t \lambda\|\cdot\|_p^p}\big(x_N^t - \gamma^t \nabla f(x_N^t)\big)$,
where $\gamma^t$ is the step size and $f$ denotes the fidelity term in Eq. (2). Then we establish a feedback mechanism by considering the first-order optimality condition of Eq. (7) as
(8)  $\|x_N^t - x^{t-1}\| \le \varepsilon\,\|\hat{x}^t - x^{t-1}\|$.
Here, $\varepsilon$ is a positive constant revealing the tolerance scale of the distance between the current solution and the last update at the $t$-th stage. The fidelity output $x_F^t$ is reconsidered when Eq. (8) is not satisfied. Thus, our checking module can be summarized as
(9)  $x_C^t = \hat{x}^t$ if Eq. (8) holds, and $x_C^t = \mathrm{prox}_{\gamma^t \lambda\|\cdot\|_p^p}\big(x_F^t - \gamma^t \nabla f(x_F^t)\big)$ otherwise.
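The following sketch shows one way to realize Eqs. (7)-(9) in NumPy; `prox_g(v, step)` evaluates the proximal operator of `step * lambda * ||.||_p^p` (see the prior module below), `grad_f` is the gradient of the fidelity term, and the acceptance test is our reading of the checking condition in Eq. (8).

```python
import numpy as np

def check_module(x_prev, x_F, x_N, grad_f, prox_g, gamma, eps):
    # Eq. (7): momentum proximal gradient evaluated at the network output.
    x_hat = prox_g(x_N - gamma * grad_f(x_N), gamma)
    # Eq. (8): accept the network output only if it stays within the
    # tolerance ball around the previous iterate.
    if np.linalg.norm(x_N - x_prev) <= eps * np.linalg.norm(x_hat - x_prev):
        return x_hat
    # Eq. (9): otherwise fall back to the fidelity output x_F.
    return prox_g(x_F - gamma * grad_f(x_F), gamma)
```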
Prior Module: A manual sparse prior is a widely used component to constrain the desired solution in traditional optimization, and it naturally describes the distribution of MR images. It therefore makes sense to introduce sparsity for better CSMRI restoration. We append a prior module after the checking mechanism to pick up the penalty term in Eq. (2) again. Considering the nonconvexity of the $\ell_p$-norm regularization, we solve it by one step of proximal gradient as follows:
(10)  $x^t = \mathrm{prox}_{\eta^t \lambda\|\cdot\|_p^p}\big(x_C^t - \eta^t \nabla f(x_C^t)\big)$,
where $\eta^t$ is the step size. In this way, we enhance the effect of the original model, naturally correcting over-smoothing and preserving more details.
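For $0 < p < 1$ the proximal operator of $\lambda\|\cdot\|_p^p$ has no general closed form, but it can be evaluated elementwise by generalized soft-thresholding; the sketch below follows the fixed-point scheme of Zuo et al. (2013) and is our illustrative choice, not necessarily the paper's exact solver.

```python
import numpy as np

def prox_lp(v, w, p, iters=10):
    """Elementwise prox of w*|x|^p: min_x 0.5*(x - v)^2 + w*|x|^p, 0 < p < 1."""
    # Threshold below which the minimizer is exactly zero.
    tau = (2 * w * (1 - p)) ** (1 / (2 - p)) \
        + w * p * (2 * w * (1 - p)) ** ((p - 1) / (2 - p))
    x = np.zeros_like(v)
    idx = np.abs(v) > tau
    a = np.abs(v[idx])
    xi = a.copy()
    for _ in range(iters):  # fixed point: xi = |v| - w*p*xi^(p-1)
        xi = a - w * p * np.maximum(xi, 1e-12) ** (p - 1)
    x[idx] = np.sign(v[idx]) * xi
    return x
```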
Considering all the aforementioned components, we can reconstruct the fully sampled MR data by iteratively solving the corresponding subproblems as shown in Alg. 1, restoring the clean MR image from the final estimation as $u = \Psi x^T$. A schematic implementation of the whole loop is sketched below.
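This sketch composes the module functions from the earlier snippets; it is one possible reading of Alg. 1, with default parameter values that loosely mirror the experimental settings reported later (`lam` is an arbitrary placeholder), and `net` standing for the trained denoiser of Eq. (6), wrapped to accept arrays.

```python
import numpy as np

def csmri_reconstruct(y_full, mask, net, Psi, PsiT, grad_f,
                      mu=5.0, gamma=0.9, eta=0.9, eps=1e-5,
                      lam=1e-3, p=0.8, T=50):
    """Sketch of Alg. 1: fidelity -> network -> checking -> prior."""
    prox_g = lambda v, step: prox_lp(v, step * lam, p)
    x = PsiT(np.fft.ifft2(y_full, norm="ortho"))               # zero-filled init
    for t in range(T):
        x_F = fidelity_module(x, y_full, mask, mu, Psi, PsiT)  # Eqs. (4)-(5)
        x_N = net(x_F)                                         # Eq. (6)
        x_C = check_module(x, x_F, x_N, grad_f, prox_g, gamma, eps)  # Eqs. (7)-(9)
        x = prox_g(x_C - eta * grad_f(x_C), eta)               # Eq. (10)
    return Psi(x)                                              # u = Psi(x^T)
```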
3 Theoretical Investigations
Existing deep optimization strategies [Sun et al.2016, Diamond et al.2017] discard convergence guarantees in their iterations and rely almost entirely on the distribution of the training data. In contrast, our scheme not only merges traditionally designed model optimization over fidelity and sparse priors, but also introduces learnable network architectures into the deep optimization framework. Furthermore, we provide an effective mechanism to discriminate whether the network output at the current iteration is a desired descent direction. In the following, we analyze the strict convergence behavior of our method.
To simplify the following derivations, we first rewrite the objective in Eq. (2) as
$\Phi(x) := f(x) + g(x)$, with $f(x) = \frac{1}{2}\|PF\Psi x - y\|_2^2$ and $g(x) = \lambda\|x\|_p^p$.
Furthermore, we state some properties of $f$, $g$ and $\Phi$ that are helpful for the convergence analysis (all proofs in this paper are presented in [Liu et al.2018d]):
(1) $f$ is proper and Lipschitz smooth;
(2) $g$ is proper and lower semicontinuous;
(3) $\Phi$ satisfies the KŁ property and is coercive.
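For our instantiation, these properties can be verified directly (a sketch, assuming $P$ is a binary sampling mask, $F$ a unitary Fourier transform and $\Psi$ an orthonormal wavelet basis): the gradient $\nabla f(x) = \Psi^\top F^\top P^\top (PF\Psi x - y)$ is linear in $x$, so $\|\nabla f(x_1) - \nabla f(x_2)\| \le \|PF\Psi\|^2\,\|x_1 - x_2\|$ and $f$ is Lipschitz smooth with $L = \|PF\Psi\|^2 \le 1$; $g(x) = \lambda\|x\|_p^p$ is proper and lower semicontinuous (though nonsmooth and nonconvex); and $\Phi = f + g$ is coercive since $\|x\|_p^p \to \infty$ as $\|x\| \to \infty$, and is semi-algebraic for rational $p$, hence a KŁ function.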
Then some important propositions are given to illustrate the convergence performance of our approach.
Proposition 1
Let $\{x^t\}$ and $\{x_C^t\}$ be the sequences generated by Alg. 1, and suppose that the error condition Eq. (8) in our checking module is satisfied. Then there exists a sequence $\{x_C^t\}$ such that
(11)  $\Phi(x_C^t) \le \Phi(x^{t-1}) - \alpha\,\|x_C^t - x^{t-1}\|^2$,
where $\alpha = \frac{1 - L\gamma^t - 4\varepsilon^2}{4\gamma^t} > 0$ and $L$ is the Lipschitz coefficient of $\nabla f$.
Proposition 2
If $\eta^t < 1/L$, let $\{x_C^t\}$ and $\{x^t\}$ be the sequences generated by the proximal operator in Alg. 1 (Eq. (10)). Then we have
(12)  $\Phi(x^t) \le \Phi(x_C^t) - \left(\frac{1}{2\eta^t} - \frac{L}{2}\right)\|x^t - x_C^t\|^2$.
Remark 1
Actually, the inequalities in Propositions 1 and 2 provide a descent sequence by illustrating the relationship of $\Phi(x^t)$, $\Phi(x_C^t)$ and $\Phi(x^{t-1})$ as
$\Phi(x^t) \le \Phi(x_C^t) \le \Phi(x^{t-1})$.
Thus, the checking operator is a key criterion for testing whether the output of our designed network propagates along a descent direction. Moreover, it ingeniously builds a bridge connecting the adjacent iterations $x^{t-1}$ and $x^t$.
Theorem 1
Suppose that $\{x^t\}$ is a sequence generated by Alg. 1. The following assertions hold:
(1) The sequence $\{x^t\}$ is square summable, i.e., $\sum_{t=1}^{\infty}\|x^t - x^{t-1}\|^2 < \infty$;
(2) The sequence $\{x^t\}$ converges to a critical point of $\Phi$.
Remark 2
The second assertion in Theorem 1 implies that $\{x^t\}$ is a Cauchy sequence; thus it globally converges to a critical point of $\Phi$.
4 Realworld CSMRI with Rician Noises
It is worth noting that robustness is important in CSMRI. Unfortunately, most strategies only consider the noise-free scenario and thus usually fail on real-world cases. To enable our method to handle CSMRI with Rician noise, we extend the CSMRI model in Eq. (2) as:
(13)  $\min_{x,z} \frac{1}{2}\|PF\tilde{\Psi}z - y\|_2^2 + \frac{\rho}{2}\big\|\tilde{\Psi}z - \sqrt{(\Psi x + n_r)^2 + n_i^2}\big\|_2^2 + \lambda\|x\|_p^p$,
where $\Psi x$, $y$, $n_r$ and $n_i$ denote the fully sampled clean MR image, the sparse k-space data, and the uncorrelated Gaussian noise in the real and imaginary components, respectively. $\Psi$ and $\tilde{\Psi}$ are the inverse wavelet transforms associated with the sparse codes $x$ and $z$, respectively, so that $\tilde{\Psi}z$ represents the fully sampled noisy MR image.
Optimization Strategy: We can split Eq. (13) into the following subproblems:
(14a)  $\min_z \frac{1}{2}\|PF\tilde{\Psi}z - y\|_2^2 + \frac{\rho}{2}\big\|\tilde{\Psi}z - \sqrt{(\Psi x + n_r)^2 + n_i^2}\big\|_2^2$,
(14b)  $\min_x \frac{\rho}{2}\big\|\tilde{\Psi}z - \sqrt{(\Psi x + n_r)^2 + n_i^2}\big\|_2^2 + \lambda\|x\|_p^p$,
where $\rho$ is the penalty parameter. We can then alternately tackle each subproblem with the iterations in Alg. 1 to solve for $z$ and $x$, respectively.
The subproblem in Eq. (14a) aims to reconstruct the fully sampled noisy MR image from the k-space observation $y$; it can be optimized similarly to the aforementioned CSMRI problem by treating the fidelity as including both the CS term and the noise term (i.e., two squared terms). As for the second subproblem Eq. (14b), whose solution is the clean image, we cannot directly obtain a closed-form solution because of the quadratic term under the square root. Thus, a learnable strategy is adopted to restore a rough MR image in the fidelity module, as sketched below.
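Schematically, the alternating scheme can be organized as follows; `solve_z` and `solve_x` stand for Alg. 1-style solvers of Eqs. (14a) and (14b) (our placeholders), the latter relying on the learnable Rician fidelity described later in this section.

```python
def robust_csmri(y_full, mask, solve_z, solve_x, T_outer=10):
    """Alternating minimization sketch for Eqs. (14a)-(14b)."""
    z, x = None, None
    for _ in range(T_outer):
        z = solve_z(y_full, mask, x)  # Eq. (14a): fully sampled noisy image
        x = solve_x(z)                # Eq. (14b): clean sparse code given z
    return x
```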
Table 1: Average reconstruction time (in seconds) using the radial mask at 10% and 50% sampling rates.
| Sampling Rate | Zerofilling | TV | SIDWT | PBDW | PANO | FDLCP | ADMMNet | BM3DMRI | Ours |
| 10% | 0.0035 | 2.0098 | 14.8269 | 45.2639 | 27.9626 | 84.8494 | 1.2332 | 10.8877 | 0.3167 |
| 50% | 0.0032 | 0.7631 | 5.57611 | 26.1274 | 11.1466 | 83.0681 | 1.2499 | 11.5394 | 0.5633 |
[Figure: visual comparison of reconstructions with PSNR (dB) — Ground Truth, Zerofilling (22.33), TV (25.22), SIDWT (25.10), PBDW (27.39), PANO (28.77), FDLCP (29.78), ADMMNet (27.91), BM3DMRI (29.35), Ours (30.48).]
Learnable Architecture for Rician Noise Removal: It is tricky to learn the fidelity module with a residual network, because Rician noise is not additive. To address this, we design a two-stage network based on the relationship between $u$ and $u_n$ in Eq. (3): since $u_n^2 = (u + n_r)^2 + n_i^2$, the corruption becomes additive in the squared domain. First, we design a residual network to learn $(u + n_r)^2$ from $u_n^2$. To release the square operator under the square root, we train a second network by feeding $(u + n_r)^2$ and $u$ as input and output.
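A minimal PyTorch sketch of this two-stage idea is shown below; the wrapper and architecture choices are ours, standing in for the paper's exact networks.

```python
import torch.nn as nn

class TwoStageRicianDenoiser(nn.Module):
    """Two-stage Rician removal: since u_n^2 = (u + n_r)^2 + n_i^2, the
    corruption is additive in the squared domain, so stage 1 is a residual
    net removing n_i^2 from u_n^2; stage 2 maps (u + n_r)^2 back to u,
    absorbing both the square root and the remaining Gaussian noise n_r."""
    def __init__(self, stage1: nn.Module, stage2: nn.Module):
        super().__init__()
        self.stage1, self.stage2 = stage1, stage2

    def forward(self, u_n):
        sq = u_n ** 2
        sq_clean = sq - self.stage1(sq)  # stage 1: residual estimate of (u + n_r)^2
        return self.stage2(sq_clean)     # stage 2: learned mapping to u
```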
5 Experimental Results
In this section, we first explore the roles of each module and the theoretical results of our paradigm. To demonstrate the superiority of our method, we then compare it with state-of-the-art techniques on both traditional and real-world CSMRI. All experiments are executed on a PC with an Intel(R) Gold 6154 CPU @ 3.00GHz, 256 GB RAM and an NVIDIA TITAN Xp GPU.
5.1 CSMRI Reconstruction
We first analyze the effects of the modules and verify the theoretical convergence through ablation experiments. We then perform comparisons on traditional CSMRI in terms of reconstruction accuracy, time consumption and robustness.
Table 2: PSNR (dB) comparison on chest, cardiac and renal MRI data.
| MRI Data | Zerofilling | TV | SIDWT | PBDW | PANO | FDLCP | ADMMNet | BM3DMRI | Ours |
| Chest | 22.95 | 24.43 | 24.17 | 25.91 | 27.73 | 26.84 | 25.38 | 26.27 | 28.22 |
| Cardiac | 23.40 | 29.17 | 27.49 | 31.34 | 33.14 | 33.84 | 31.42 | 31.44 | 35.69 |
| Renal | 24.32 | 28.77 | 27.66 | 31.05 | 32.21 | 33.74 | 31.13 | 31.01 | 34.79 |
[Figure: reconstruction of the chest data with PSNR (dB) — Ground Truth, PANO (27.73), FDLCP (26.84), BM3DMRI (26.27), Ours (28.22).]
Ablation Analysis: We first compare four different combinations of the modules in our framework: reconstruction with only the fidelity and prior modules, to isolate the role of the manual prior; the data-driven module integrated with the fidelity module, to explore the effect of the data-based distribution; the combination of these three modules; and, adding the optimal condition module, the entire paradigm. For convenience, we refer to them as F+P, F+N, F+N+P and Ours, respectively. We apply these strategies on T1-weighted data using a radial sampling pattern with a 20% sampling rate. The stopping criterion is the relative change between successive iterates.
As shown in Fig. 1, at the first several iterations the loss of F+P is slightly larger than that of F+N: the input is corrupted with severe artifacts, so the role of the data-driven module is significant at the first several steps. But as the process goes on, the repeated denoising operations in turn cause over-smoothing, which the prior module can make up for by incorporating model-based knowledge. Though F+N+P improves the performance, it cannot ideally converge to a desired solution. The solid line indicates the superiority of Ours over the other choices in both convergence rate and reconstruction accuracy. The execution times of F+P, F+N, F+N+P and Ours are 4.4762s, 3.3240s, 6.2760s and 2.5225s, respectively. As expected, the proposed method provides a much faster reconstruction. We can thus verify that our framework is more efficient in terms of both theoretical convergence and practical execution time. The visualized results in Fig. 2 also verify that Ours performs better than the alternatives.
[Figure: Rician denoising comparison — Noisy input, BM3DMRI, RiceOptVST, Our Denoiser.]
Table 3: PSNR (dB) on real-world CSMRI: competing methods followed by RiceOptVST or our learnable denoiser, versus our full framework (last column).
| MRI Data | Denoiser | Zerofilling | TV | SIDWT | PBDW | PANO | FDLCP | ADMMNet | BM3DMRI | Ours |
| T1-weighted | RiceOptVST | 24.11 | 25.51 | 25.78 | 26.65 | 27.08 | 25.66 | 26.36 | 27.36 | 28.05 |
| T1-weighted | Our Denoiser | 24.62 | 25.71 | 25.94 | 26.83 | 27.30 | 25.59 | 26.49 | 27.64 | |
| T2-weighted | RiceOptVST | 25.40 | 27.83 | 28.37 | 29.13 | 29.22 | 27.55 | 29.14 | 29.67 | 30.77 |
| T2-weighted | Our Denoiser | 26.36 | 28.20 | 28.68 | 29.46 | 29.64 | 27.44 | 29.45 | 30.34 | |
[Figure: real-world CSMRI reconstruction with PSNR (dB) — Ground Truth, FDLCP (25.72), ADMMNet (26.29), BM3DMRI (27.41), Ours (28.42).]
Comparisons with Existing Methods: In this section, we compare against three traditional CSMRI approaches, Zerofilling [Bernstein, Fain, and Riederer2001], TV [Lustig, Donoho, and Pauly2007] and SIDWT [Baraniuk2007], and five state-of-the-art methods, PBDW [Qu et al.2012], PANO [Qu et al.2014], FDLCP [Zhan et al.2016], ADMMNet [Sun et al.2016] and BM3DMRI [Eksioglu2016]. 25 T1-weighted and 25 T2-weighted MRI data are randomly chosen from 50 subjects in the IXI dataset (http://brain-development.org/ixi-dataset/) as testing data. In the experiments, the weight parameter $\mu$ in module F is set to 5 and the noise level of the network in module N ranges from 3.0 to 50.0. The parameters $L$, $\varepsilon$ and $p$ in modules C and P are set to 1.1, 0.00001 and 0.8, respectively. The total number of iterations is 50. As for the parameters of the compared approaches, we adopt the settings suggested in their papers for fairness.
First, we test on the 25 T1-weighted MRI data using three different undersampling patterns with a fixed 10% sampling rate. Fig. 3 shows the quantitative results (PSNR); our method performs best in all three cases and, judging by the variance, is more stable than the second-best method. To study the effect of varying sampling ratios, we use the radial mask under 10%, 30% and 50% sampling rates, evaluated by RLNE and MSE. Fig. 4 shows that our method has the lowest reconstruction error at all sampling rates. For a more intuitive comparison, we illustrate the pixel-wise reconstruction error in Fig. 5 and offer a qualitative comparison in Fig. 6. The visualized results demonstrate that our method performs better in both artifact removal and detail restoration. Runtime is also considered: we compare our method with the others on the 25 T1-weighted data using the radial mask at 10% and 50% sampling rates. Notice that ADMMNet and ours are tested on a GPU owing to their deep architectures. Tab. 1 shows that our method provides an efficient reconstruction process and is the fastest among the state-of-the-art competitors.
To demonstrate the robustness of our approach, we first apply it to various MRI data, including chest, cardiac and renal images [Yazdanpanah and Regentova2017]. In Tab. 2, our proposed framework gives the highest PSNR for all three types of MR images. Fig. 7 visualizes the corresponding results for the chest data: our approach prevails over the others in restoring details at the junctions of blood vessels as well as in removing noise from the background. In fact, our method has a stronger ability to handle slight noise because of the learning-based optimization with the deep prior. To demonstrate this, we add Rician noise at level 20 to the 25 T1-weighted and 25 T2-weighted MRI data to generate noisy inputs. As shown in Fig. 8, our method leads all the competitors by a large margin when the input is corrupted with Rician noise.
Table 4: PSNR (dB) of the alternative denoising strategies, for pure denoising and for the subsequent real-world CSMRI reconstruction.
| | Denoiser-1 | Denoiser-2 | Denoiser-3 | Our Denoiser |
| Denoising | 35.15 | 27.69 | 29.78 | 35.06 |
| Reconstruction | 17.44 | 19.25 | 24.41 | 28.71 |
5.2 Realworld CSMRI
We further explore the performance of our approach on real-world CSMRI with Rician noise. The parameters $\lambda$, $\rho$ and $\mu$ in Eqs. (14a) and (14b) are set to 0.01, 1.0 and 1.0, respectively.
Rician Network Behavior: In the learnable architecture, the first stage obtains $(u + n_r)^2$ and the second stage obtains the desired noise-free data $u$ in Eq. (3). At the training stage, we generate Rician noisy input data from 500 MR images randomly picked from the MICCAI 2013 grand challenge dataset (http://masiweb.vuse.vanderbilt.edu/workshop2013/index.php/Segmentation_Challenge_Details).
To verify the effectiveness of our learnable Rician network, we test some other possible ways to obtain $u$ from $u_n$ through deep learning. Denoiser-1 directly learns the difference between $u_n$ and $u$. Denoiser-2 learns the Gaussian noise in the real and imaginary parts separately. Denoiser-3 treats the Rician noise as Gaussian. All four denoisers use the same network architecture as IRCNN [Zhang et al.2017]. Tab. 4 shows that Denoiser-1 gives comparable performance in pure denoising, but performs the worst for real-world CSMRI; in contrast, our learnable architecture gives much better results. This is because Rician noise is not additive: directly estimating the difference between $u_n$ and $u$ may cause errors, especially when the noise in the background is large. We also compare with the classical Rician denoising technique RiceOptVST (http://www.cs.tut.fi/~foi/RiceOptVST/) [Foi2011] to evaluate our learnable architecture. Fig. 9 shows that our learnable architecture performs better in removing noise from the background, indicating that the learnable architecture can also be applied to pure Rician denoising problems.
Benchmark: We then compare our method with other CSMRI techniques on the task of CSMRI with noise, adopting the T1-weighted and T2-weighted MRI data from the IXI dataset as the test benchmark. Since the compared methods have no mechanism for handling Rician noise, we separately assign the classical Rician noise remover RiceOptVST and our learnable architecture to them, executing the denoising after CS reconstruction. As shown in Tab. 3, our CSMRI framework is superior to the others with both RiceOptVST and our network. Furthermore, using our learnable architecture as the denoiser performs better than using RiceOptVST, and the last column shows that the proposed framework surpasses all the combinations. Fig. 10 gives a more intuitive view of the reconstruction comparison: more details are preserved by our framework than by the competing approaches.
6 Conclusions
We have proposed a theoretically convergent deep optimization framework to efficiently solve the nonconvex and nonsmooth CSMRI model. Our framework leverages fidelity, priors, data-driven architectures and optimality conditions to guarantee that the iterative variables converge to a critical point of the given model. For real-world CSMRI with Rician noise, a learning-based architecture is proposed for noise removal. Experiments demonstrate that our framework is robust and superior to existing methods.
7 Acknowledgments
This work is partially supported by the National Natural Science Foundation of China (Nos. 61672125, 61733002, 61572096 and 61632019), and the Fundamental Research Funds for the Central Universities.
8 Supplemental Materials
To simplify the following derivations, we first rewrite the objective in Eq. (2) as
$\Phi(x) := f(x) + g(x)$, with $f(x) = \frac{1}{2}\|PF\Psi x - y\|_2^2$ and $g(x) = \lambda\|x\|_p^p$.
We first provide detailed explanations of the properties of $f$, $g$ and $\Phi$. For a generic function $h$:
- $h$ is proper if $\mathrm{dom}\,h := \{x : h(x) < +\infty\}$ is nonempty.
- $h$ is Lipschitz smooth if $h$ is differentiable and there exists $L > 0$ such that $\|\nabla h(x_1) - \nabla h(x_2)\| \le L\|x_1 - x_2\|$ for all $x_1, x_2$. If $h$ is Lipschitz smooth, we have the following inequality: $h(x_1) \le h(x_2) + \langle \nabla h(x_2), x_1 - x_2\rangle + \frac{L}{2}\|x_1 - x_2\|^2$.
- $h$ is lower semicontinuous if $\liminf_{x \to x_0} h(x) \ge h(x_0)$ at any point $x_0$.
- $h$ is coercive if $h$ is bounded from below and $h(x) \to \infty$ if $\|x\| \to \infty$, where $\|\cdot\|$ is the $\ell_2$ norm.
- $h$ is said to have the Kurdyka-Łojasiewicz (KŁ) property at $\bar{x} \in \mathrm{dom}\,\partial h$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar{x}$ and a desingularizing function $\varphi: [0, \eta) \to \mathbb{R}_+$ which satisfies (1) $\varphi$ is continuous at $0$ and $\varphi(0) = 0$; (2) $\varphi$ is concave and $C^1$ on $(0, \eta)$ with $\varphi' > 0$; (3) for all $x \in U \cap \{x : h(\bar{x}) < h(x) < h(\bar{x}) + \eta\}$, the following inequality holds:
$\varphi'\big(h(x) - h(\bar{x})\big)\,\mathrm{dist}\big(0, \partial h(x)\big) \ge 1$.
Moreover, if $h$ satisfies the KŁ property at each point of $\mathrm{dom}\,\partial h$, then $h$ is called a KŁ function.
Proposition 3
Let $\{x^t\}$ and $\{x_C^t\}$ be the sequences generated by Alg. 1, and suppose that the error condition Eq. (8) in our checking module is satisfied. Then there exists a sequence $\{x_C^t\}$ such that
(15)  $\Phi(x_C^t) \le \Phi(x^{t-1}) - \alpha\,\|x_C^t - x^{t-1}\|^2$,
where $\alpha = \frac{1 - L\gamma^t - 4\varepsilon^2}{4\gamma^t} > 0$ and $L$ is the Lipschitz coefficient of $\nabla f$.
Proof 8.1
In our checking stage, we have
$x_C^t = \mathrm{prox}_{\gamma^t g}\big(x_N^t - \gamma^t \nabla f(x_N^t)\big)$
if the error condition is satisfied. Thus, by the definition of the proximal operator we get
(16)  $x_C^t = \arg\min_x g(x) + \langle \nabla f(x_N^t), x - x_N^t\rangle + \frac{1}{2\gamma^t}\|x - x_N^t\|^2$.
Hence, in particular, comparing the objective of Eq. (16) at $x_C^t$ and $x^{t-1}$ respectively, we obtain
(17)  $g(x_C^t) + \langle \nabla f(x_N^t), x_C^t - x_N^t\rangle + \frac{1}{2\gamma^t}\|x_C^t - x_N^t\|^2 \le g(x^{t-1}) + \langle \nabla f(x_N^t), x^{t-1} - x_N^t\rangle + \frac{1}{2\gamma^t}\|x^{t-1} - x_N^t\|^2$.
Invoking the Lipschitz smooth property of $f$, we also have
(18)  $f(x_C^t) \le f(x_N^t) + \langle \nabla f(x_N^t), x_C^t - x_N^t\rangle + \frac{L}{2}\|x_C^t - x_N^t\|^2$, and $f(x^{t-1}) \ge f(x_N^t) + \langle \nabla f(x_N^t), x^{t-1} - x_N^t\rangle - \frac{L}{2}\|x^{t-1} - x_N^t\|^2$.
Combining Eqs. (17) and (18), we obtain
$\Phi(x_C^t) \le \Phi(x^{t-1}) + \left(\frac{1}{2\gamma^t} + \frac{L}{2}\right)\|x_N^t - x^{t-1}\|^2 - \left(\frac{1}{2\gamma^t} - \frac{L}{2}\right)\|x_C^t - x_N^t\|^2 \le \Phi(x^{t-1}) - \alpha\,\|x_C^t - x^{t-1}\|^2$.
The last inequality holds under the error condition $\|x_N^t - x^{t-1}\| \le \varepsilon\|x_C^t - x^{t-1}\|$ in the $t$-th iteration, together with $\|x_C^t - x_N^t\|^2 \ge \frac{1}{2}\|x_C^t - x^{t-1}\|^2 - \|x_N^t - x^{t-1}\|^2$. This proves that the inequality (15) in Proposition 3 holds.
Proposition 4
If $\eta^t < 1/L$, let $\{x_C^t\}$ and $\{x^t\}$ be the sequences generated by the proximal operator in Alg. 1 (Eq. (10)). Then we have
(19)  $\Phi(x^t) \le \Phi(x_C^t) - \left(\frac{1}{2\eta^t} - \frac{L}{2}\right)\|x^t - x_C^t\|^2$.
Proof 8.2
By the definition of the proximal operator, the update in Eq. (10) satisfies
(20)  $x^t = \arg\min_x g(x) + \langle \nabla f(x_C^t), x - x_C^t\rangle + \frac{1}{2\eta^t}\|x - x_C^t\|^2$.
Comparing the objective of Eq. (20) at $x^t$ and $x_C^t$ yields $g(x^t) + \langle \nabla f(x_C^t), x^t - x_C^t\rangle + \frac{1}{2\eta^t}\|x^t - x_C^t\|^2 \le g(x_C^t)$. Adding the Lipschitz-smoothness bound $f(x^t) \le f(x_C^t) + \langle \nabla f(x_C^t), x^t - x_C^t\rangle + \frac{L}{2}\|x^t - x_C^t\|^2$ gives the inequality (19).
Theorem 2
Suppose that $\{x^t\}$ is a sequence generated by Alg. 1. The following assertions hold:
(1) The sequence $\{x^t\}$ is square summable, i.e., $\sum_{t=1}^{\infty}\|x^t - x^{t-1}\|^2 < \infty$;
(2) The sequence $\{x^t\}$ converges to a critical point of $\Phi$.
Proof 8.3
We first verify that $\{x^t\}$ is square summable. From Propositions 3 and 4, we deduce that
$\Phi(x^t) \le \Phi(x_C^t) \le \Phi(x^{t-1})$
is established, so both sequences $\{\Phi(x^t)\}$ and $\{\Phi(x_C^t)\}$ are nonincreasing. Then, since both $f$ and $g$ are proper and $\Phi$ is bounded from below, the objective sequences $\{\Phi(x^t)\}$ and $\{\Phi(x_C^t)\}$ converge to the same value $\Phi^*$, i.e.,
$\lim_{t\to\infty} \Phi(x^t) = \lim_{t\to\infty} \Phi(x_C^t) = \Phi^*$.
Moreover, using the assumption that $\Phi$ is coercive, both $\{x^t\}$ and $\{x_C^t\}$ are bounded and thus have accumulation points. Considering Eq. (19) and the relationship of $\Phi(x^t)$, $\Phi(x_C^t)$ and $\Phi(x^{t-1})$, we get for any $t \ge 1$,
(21)  $\alpha\|x_C^t - x^{t-1}\|^2 + \left(\frac{1}{2\eta^t} - \frac{L}{2}\right)\|x^t - x_C^t\|^2 \le \Phi(x^{t-1}) - \Phi(x^t)$.
Summing over $t$, we further have
(22)  $\sum_{t=1}^{\infty}\left[\alpha\|x_C^t - x^{t-1}\|^2 + \left(\frac{1}{2\eta^t} - \frac{L}{2}\right)\|x^t - x_C^t\|^2\right] \le \Phi(x^0) - \Phi^* < \infty$.
Since $\|x^t - x^{t-1}\|^2 \le 2\|x^t - x_C^t\|^2 + 2\|x_C^t - x^{t-1}\|^2$, the first assertion holds.
Next, we prove that $x^*$ is a critical point of $\Phi$. Eq. (22) implies that $\|x^t - x_C^t\| \to 0$ and $\|x_C^t - x^{t-1}\| \to 0$ when $t \to \infty$, i.e., there exist subsequences $\{x_C^{t_j}\}$ and $\{x^{t_j}\}$ sharing the same accumulation point $x^*$ as $j \to \infty$. Then by Eq. (20), we have
$g(x^{t_j}) + \langle \nabla f(x_C^{t_j}), x^{t_j} - x_C^{t_j}\rangle + \frac{1}{2\eta^{t_j}}\|x^{t_j} - x_C^{t_j}\|^2 \le g(x^*) + \langle \nabla f(x_C^{t_j}), x^* - x_C^{t_j}\rangle + \frac{1}{2\eta^{t_j}}\|x^* - x_C^{t_j}\|^2$.
Taking $\limsup_{j\to\infty}$ on the above inequality, we have $\limsup_{j\to\infty} g(x^{t_j}) \le g(x^*)$. What's more, we also get $\liminf_{j\to\infty} g(x^{t_j}) \ge g(x^*)$ since $g$ is lower semicontinuous, so $g(x^{t_j}) \to g(x^*)$. Considering the continuity of $f$, we have $f(x^{t_j}) \to f(x^*)$. Thus, we obtain
(23)  $\lim_{j\to\infty} \Phi(x^{t_j}) = \Phi(x^*)$.
By the first-order optimality condition of Eq. (20), $0 \in \partial g(x^t) + \nabla f(x_C^t) + \frac{1}{\eta^t}(x^t - x_C^t)$, so that $\nabla f(x^t) - \nabla f(x_C^t) - \frac{1}{\eta^t}(x^t - x_C^t) \in \partial\Phi(x^t)$. Hence we get
(24)  $\mathrm{dist}\big(0, \partial\Phi(x^t)\big) \le \left(\frac{1}{\eta^t} + L\right)\|x^t - x_C^t\| \to 0$.
Using the closedness of the subdifferential of $\Phi$ and Eqs. (23), (24), we finally deduce that $0 \in \partial\Phi(x^*)$, which means that $\{x^t\}$ subsequently converges to a critical point.
Furthermore, we prove that $\{x^t\}$ is sequentially convergent. Since $\Phi$ is a KŁ function, we have
$\varphi'\big(\Phi(x^t) - \Phi^*\big)\,\mathrm{dist}\big(0, \partial\Phi(x^t)\big) \ge 1$.
From Eq. (24) we get that
$\varphi'\big(\Phi(x^t) - \Phi^*\big) \ge \left(\frac{1}{\eta^t} + L\right)^{-1}\|x^t - x_C^t\|^{-1}$.
On the other hand, from the concavity of $\varphi$ and Eqs. (21) and (24) we have that
$\varphi\big(\Phi(x^t) - \Phi^*\big) - \varphi\big(\Phi(x^{t+1}) - \Phi^*\big) \ge \varphi'\big(\Phi(x^t) - \Phi^*\big)\big(\Phi(x^t) - \Phi(x^{t+1})\big) \ge \frac{c\,\|x^{t+1} - x_C^{t+1}\|^2}{\|x^t - x_C^t\|}$
for some constant $c > 0$. For convenience, we define for all $t$ the following quantities
$\Delta_t := \varphi\big(\Phi(x^t) - \Phi^*\big)$
and
$C := 1/c$.
These deduce that
$2\|x^{t+1} - x_C^{t+1}\| \le \|x^t - x_C^t\| + C(\Delta_t - \Delta_{t+1})$.
Summing up the above inequality for $t = 1, \dots, T$ yields
$\sum_{t=1}^{T}\|x^{t+1} - x_C^{t+1}\| \le \|x^1 - x_C^1\| + C\Delta_1 < \infty$,
where the last inequality holds under the fact $2\sqrt{ab} \le a + b$ for $a, b \ge 0$. Hence $\sum_t \|x^{t+1} - x_C^{t+1}\| < \infty$; combined with Eq. (22), this implies that $\{x^t\}$ is a Cauchy sequence, and therefore $\{x^t\}$ converges to the critical point $x^*$.