Denoising and deblurring have numerous applications in communications, control, machine learning, and many other fields of engineering and science. The restoration of distorted images is, from both the theoretical and the practical point of view, one of the most interesting and important problems in image processing. One special case is blurring caused, for instance, by incorrect focus, by movement, or by a Gaussian blur with added noise.
A mathematical model for the process of blurred images can be expressed as follows. Let be a two-dimensional index set representing the image domain, be the original image, be the observed image, and be a linear blurring operator. Then, the blurred image can be written  as
where is an unknown additive noise vector. In this paper, the blurring operator is assumed to be known; otherwise, one has to deal with the blind image deblurring problem, in which the blurring operator also needs to be estimated.
In image processing, one typically aims at recovering an image from noisy data while preserving its edges. This goal is the main reason for the tremendous success of Total Variation (TV) regularization in solving the deblurring problem (although other methods are also used). The TV method can be presented as
being a parameter and where is the gradient operator, , and the norms , .
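To make the regularizer concrete, the following minimal sketch evaluates a discrete total-variation-type term on a small image. It is only an illustration under stated assumptions: the function name `tv_p`, the anisotropic form (a sum of |d|^p over horizontal and vertical forward differences d) and the periodic boundary are choices made here, and are not necessarily the discretization used in the paper. With `p = 1` the term is convex; with `0 < p < 1` it is nonconvex.

```python
def tv_p(image, p=1.0):
    """Sum of |forward difference|**p over both directions.

    Assumptions of this sketch: anisotropic form and periodic
    (wrap-around) boundaries. `image` is a list of equal-length rows.
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            dx = image[i][(j + 1) % cols] - image[i][j]   # horizontal difference
            dy = image[(i + 1) % rows][j] - image[i][j]   # vertical difference
            total += abs(dx) ** p + abs(dy) ** p
    return total


flat = [[1.0, 1.0], [1.0, 1.0]]   # constant image: no variation
edge = [[0.0, 4.0], [0.0, 4.0]]   # vertical edge of height 4

tv_flat = tv_p(flat)
tv_edge_convex = tv_p(edge, p=1.0)
tv_edge_nonconvex = tv_p(edge, p=0.5)
```

On the edge image the nonconvex term with `p = 0.5` penalizes the large jump less than the convex one, which is the intuition behind its better edge preservation.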
In most situations, rather than directly minimizing the support of the image, one is interested in minimizing the support of the gradient of the recovered image. Most references consider the convex methodology [4, 5, 6], but in recent years some nonconvex methods have been developed [7, 8, 9]. In compressed sensing, the use of a suitable nonconvex and nondifferentiable function may allow a smaller number of measurements than the convex one. It has also been shown that nonconvex regularization terms in total variation-based image restoration yield even better edge preservation than convex-type regularization, and that they seem to be more robust with respect to noise. However, nonconvex regularization in image restoration poses significant challenges regarding the existence of solutions of the associated minimization problems and the development of efficient solution algorithms.
The main difference between the convex and nonconvex methods is that the -norm of the variational term is replaced by a nonconvex and nondifferentiable function that uses the nonconvex regularization function , and that we refer to as the semi-norm (). Therefore, the general nonconvex deblurring model is presented as
Many efficient numerical algorithms have been developed for solving the TV regularization problem. One of the most efficient methods for the convex problem (2) is the Alternating Direction Method of Multipliers (ADMM) [11, 12, 13]. In the general case (whether the problem is convex or nonconvex depends on the function ), the method is constructed by introducing an auxiliary variable , which actually represents , to reformulate (3) as a composite optimization problem with linear constraints. The augmented Lagrangian function is then
where is a parameter, and the norm . If , we use the notation for representing . Now, the standard convex ADMM method () for the deblurring problem can be presented as
Earlier analyses of the convergence and performance of ADMM algorithms depended directly on existing results for the ADMM framework [14, 15, 16, 17, 18]. More recently, motivated by the acceleration techniques proposed in , inertial algorithms have been proposed in many areas, such as (distributed) optimization and imaging sciences [20, 21, 22, 23, 24]. The inertial strategy has also been applied to ADMM in [4, 25], where some convergence results are proved under several assumptions in the convex case. Since nonconvex penalty functions perform more efficiently in some applications, as commented above, nonconvex ADMM has also been developed and studied [26, 27, 28, 29, 30, 31, 32, 33, 34]. The main goal of this paper is to propose a new algorithm that organically combines nonconvex methods and the inertial strategy.
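As an illustration of the convex ADMM iteration discussed above, the sketch below applies it to a one-dimensional TV denoising toy problem, min over x of 0.5*||x - f||^2 + sigma*||Dx||_1, where D is the forward-difference matrix. This is not the deblurring scheme of the paper: the blurring operator is replaced by the identity, the scaled-multiplier form is used, and all names (`admm_tv_denoise_1d`, `sigma`, `beta`) are assumptions introduced for this example.

```python
def soft_threshold(value, threshold):
    """Proximal map of threshold*|.| for a scalar."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def admm_tv_denoise_1d(f, sigma, beta=1.0, iterations=500):
    """Scaled-form ADMM for 0.5*||x - f||^2 + sigma*||D x||_1."""
    n = len(f)
    v = [0.0] * (n - 1)   # auxiliary variable standing for D x
    w = [0.0] * (n - 1)   # scaled Lagrange multiplier
    # I + beta * D^T D is tridiagonal with diagonal 1+beta, 1+2*beta, ..., 1+beta
    diag = [1.0 + beta * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = [-beta] * n
    x = list(f)
    for _ in range(iterations):
        # x-update: solve (I + beta D^T D) x = f + beta D^T (v - w)
        rhs = list(f)
        for i in range(n - 1):
            t = v[i] - w[i]
            rhs[i] -= beta * t
            rhs[i + 1] += beta * t
        x = solve_tridiagonal(off, diag, off, rhs)
        # v-update: soft-thresholding of D x + w
        dx = [x[i + 1] - x[i] for i in range(n - 1)]
        v = [soft_threshold(dx[i] + w[i], sigma / beta) for i in range(n - 1)]
        # multiplier update
        w = [w[i] + dx[i] - v[i] for i in range(n - 1)]
    return x


step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
restored = admm_tv_denoise_1d(step, sigma=0.3, iterations=2000)
```

For this step signal the minimizer can be computed by hand: the two plateaus move towards each other by sigma/3, giving the levels 0.1 and 0.9, so the jump is kept while its height shrinks.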
In this paper, when is nonconvex, we consider a new inertial scheme for the image deblurring model (3). One of the main differences (and new difficulties) with respect to the convex ADMM is that some extra assumptions are needed to properly define the nonconvex ADMM and prove its convergence. First, at least one of the objective functions has to be smooth. Moreover, the matrix corresponding to the smooth function is required to be injective (i.e., invertible). Thus, a direct application of the ADMM scheme to the image deblurring model cannot guarantee convergence because the operator fails to be injective (although the numerical performance may be good in some cases). Considering this, we first modify the model (3), and then we develop the new nonconvex inertial ADMM. By using the Kurdyka-Łojasiewicz property, we prove the convergence of the new algorithm under several requirements on the parameters. In contrast to the convex case, selecting a suitable parameter is crucial to obtain the convergence of the new algorithm. In order to make the method more useful, we provide a probabilistic strategy for selecting a suitable .
The rest of the paper is organized as follows. In Section II we collect some mathematical preliminaries needed for the convergence analysis. Section III presents the details for the new algorithm (inertial alternating minimization algorithm, IADMM) including the schemes and parameters. In Section IV, we prove the convergence of the new algorithm. Section V reports the numerical results and compares the algorithm with convex and nonconvex ADMM. Section VI gives some conclusions. Finally, we provide in the Appendixes all the detailed proofs of the proposed results.
II Mathematical tools
In this section we present the definitions and basic properties of the subdifferentials and the Kurdyka-Łojasiewicz functions used later in the convergence analysis. The basic notations used in this paper are detailed in Table I.
|stands for or ( norm)|
|stands for the Kronecker product|
|stands for the function class whose derivatives are continuous|
|for a matrix A, rank(A) stands for its rank|
|stands for the minimum eigenvalue of|
Let be a proper and lower semicontinuous function. The (limiting) subdifferential, or simply the subdifferential, of at , written as , is defined as
It is easy to verify that the Fréchet subdifferential is convex and closed, while the subdifferential is closed. When is convex, the definition agrees with the subgradient in convex analysis . We say that is strongly convex with constant if for any and any , it holds that Moreover, is said to have a Lipschitz continuous gradient with constant if Noting the closedness of the subdifferential, we have the following simple proposition.
If , and , then we have
A necessary condition for to be a minimizer of is
which is also sufficient when is convex. A point that satisfies (6) is called (limiting) critical point. The set of critical points of is denoted by .
With these basics, we can easily obtain the following proposition.
If is a critical point of , whose definition is given in (III), it must hold that
Finally, the proximal map of is defined as
Note that can be nonconvex. If is convex, is a point-to-point operator; otherwise, it may be point-to-set.
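The distinction between the convex and nonconvex cases can be seen on scalars. The sketch below, given only as an illustration, evaluates the proximal map of lam*|x| (soft-thresholding, single-valued) and approximates the proximal map of the nonconvex function lam*|x|^(1/2) by a brute-force grid search. The scaling 0.5*(x - y)^2 for the quadratic term, the choice of the exponent 1/2, and the names `prox_abs`, `prox_sqrt_abs` and `prox_objective` are assumptions of this example.

```python
def prox_objective(x, y, lam):
    """Value of lam*|x|**0.5 + 0.5*(x - y)**2."""
    return lam * abs(x) ** 0.5 + 0.5 * (x - y) ** 2


def prox_abs(y, lam):
    """Proximal map of lam*|x|: soft-thresholding, always a single point."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0


def prox_sqrt_abs(y, lam, grid=20000):
    """Grid-search approximation of one minimizer of prox_objective(., y, lam).

    A minimizer has the sign of y and modulus in [0, |y|], so only that
    interval is searched. Ties are resolved in favour of the smaller point.
    """
    if y == 0.0:
        return 0.0
    sign = 1.0 if y > 0 else -1.0
    a = abs(y)
    best_x, best_val = 0.0, prox_objective(0.0, a, lam)
    for k in range(1, grid + 1):
        x = a * k / grid
        val = prox_objective(x, a, lam)
        if val < best_val:
            best_x, best_val = x, val
    return sign * best_x


small = prox_sqrt_abs(1.0, 1.0)   # below the threshold: mapped to zero
large = prox_sqrt_abs(3.0, 1.0)   # above the threshold: shrunk slightly
# At y = 1.5 and lam = 1 the points x = 0 and x = 1 give the same value,
# so the proximal map contains two points there.
tie_gap = abs(prox_objective(0.0, 1.5, 1.0) - prox_objective(1.0, 1.5, 1.0))
```

With lam = 1 the input y = 1.5 is exactly the point where the nonconvex proximal map jumps from 0 to a positive value, which is why it is set-valued there, whereas soft-thresholding varies continuously with y.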
II-B Kurdyka-Łojasiewicz property
In this paper the convergence analysis is based on Kurdyka-Łojasiewicz functions, which originated in the seminal works of Łojasiewicz and Kurdyka . These functions have played a key role in several recent convergence results on nonconvex minimization problems, and they are ubiquitous in applications.
(a) The function is said to have the Kurdyka-Łojasiewicz property at if there exist , a neighborhood of and a continuous concave function such that
is on .
For all , .
For all in , the Kurdyka-Łojasiewicz inequality holds
(b) Proper lower semicontinuous functions which satisfy the Kurdyka-Łojasiewicz inequality at each point of are called KŁ functions.
There are large sets of functions that are KŁ functions .
Lemma 1 ()
Let be a proper lower semicontinuous function and be a compact set. If is constant on and satisfies the KŁ property at each point of , then there exists a concave function satisfying the four properties given in Definition 3, and constants such that for any and any satisfying and , it holds that
III Nonconvex IADMM algorithm
In this section we introduce the new extended Inertial Alternating Direction Method of Multipliers (IADMM) algorithm for nonconvex functions.
In this paper, we consider (equivalent to space ) as the two-dimensional index set representing the image domain. In this case, the image variable constrained on is actually a matrix. We use the symbol to denote its vectorization (a vector stacking all the columns of the image variable). The original total variation operator then takes the following form
the identity matrix and the banded matrix
If we directly apply the inertial ADMM, convergence is hard to prove because fails to be injective. Therefore, we need to modify the image deblurring model (3). To that end, we define
Obviously, we have . Noting that
and thus, is injective. The following technical lemma focuses on giving a lower bound for the operator .
For any , it holds that
Then, the image deblurring model (3) is equivalent to
Instead, we consider its extended penalty form
where is a large weight parameter. Therefore, we apply the nonconvex inertial ADMM to
where and . This leads us to define the function
Inertial methods have witnessed great success in convex ADMM and in nonconvex first-order algorithms. In the nonconvex optimization community, however, an inertial-style ADMM has not yet been proposed and analyzed. The convex inertial ADMM has been proposed in , in which one first uses the “inertial method” to refresh the current sequence with the last iteration, and then performs the ADMM scheme with the updated variables. However, a direct extension of the convex scheme is not possible in the nonconvex setting. This is because, without convexity, several descent estimates depend heavily on continuity properties of the functions, which may fail to hold. Moreover, the difference of function values at two different points is hard to estimate, which causes trouble in the convergence proof. Thus, in the update of , we use rather than the updated one. The nonconvex IADMM scheme proposed in this paper is defined as follows
where is a free parameter chosen by the user. In fact, if , the algorithm reduces to the basic ADMM.
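The role of the inertial parameter can be illustrated with the generic extrapolation step used by inertial methods, in which the current iterate is pushed along the last displacement. The sketch below assumes the common form x_hat = x_k + alpha*(x_k - x_{k-1}); the names `inertial_point` and `alpha` are introduced here for illustration, and the sketch does not reproduce which variables of the scheme above are extrapolated.

```python
def inertial_point(current, previous, alpha):
    """Extrapolate along the last displacement: x + alpha * (x - x_prev).

    `alpha` is the (hypothetical) inertial parameter of this sketch;
    alpha = 0 returns the current iterate unchanged.
    """
    return [c + alpha * (c - p) for c, p in zip(current, previous)]


x_prev = [0.0, 0.0]
x_cur = [1.0, 2.0]

no_inertia = inertial_point(x_cur, x_prev, alpha=0.0)
with_inertia = inertial_point(x_cur, x_prev, alpha=0.5)
```

Setting `alpha = 0.0` removes the extrapolation, which mirrors the remark that the inertial algorithm then reduces to the basic ADMM.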
Now we can focus on rewriting the inertial scheme for the image deblurring model (17). First, we rearrange the minimization for ,
being the proximal map of (8). For a matrix , and indices ,
The scheme for updating can be rewritten as
That is also
Assumption 1: We assume that , where . The minimum singular value is given as .
This hypothesis also indicates that the matrix is invertible. Note that the rank of is . Hence, the assumed hypothesis is easily satisfied.
We remark that is the minimizer of , which is strongly convex with . If we set , then we have
where we used the fact when .
The remaining question is what exactly is. In a real situation, the dimensions of are large, and so a direct calculation leads to a large computational cost. Therefore, we provide a probabilistic method to estimate a suitable value of . If is invertible, it is easy to see that . Then, if we obtain a bound , we then have . To that end, we employ a lemma proposed in :
Lemma 3 (Lemma 4.1, )
For a fixed positive integer and a real number , and given an independent family of standard Gaussian vectors, we have that
with probability at least
with probability at least.
Note that computing requires only a few FFTs and inverse FFTs. Therefore, its computational cost is low (), and the estimation of is very fast.
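The probabilistic estimate can be sketched as follows. The example assumes that the lemma has the form commonly used in randomized numerical linear algebra: for an operator B, an integer r and a real alpha > 1, the bound ||B|| <= alpha * sqrt(2/pi) * max_i ||B w_i|| holds for r independent standard Gaussian vectors w_i, except with probability at most alpha^(-r). This form, the function name `estimate_norm_bound` and the small diagonal test operator are assumptions of this sketch; in the setting above the operator would be applied through FFTs instead.

```python
import math
import random


def estimate_norm_bound(apply_operator, dimension, r=20, alpha=2.0, seed=0):
    """Probabilistic upper bound on the spectral norm of a linear operator.

    apply_operator: function mapping a list of length `dimension` to a list.
    The returned value bounds the norm except with probability at most
    alpha**(-r) under the assumed form of the lemma.
    """
    rng = random.Random(seed)
    largest = 0.0
    for _ in range(r):
        omega = [rng.gauss(0.0, 1.0) for _ in range(dimension)]
        image = apply_operator(omega)
        largest = max(largest, math.sqrt(sum(t * t for t in image)))
    return alpha * math.sqrt(2.0 / math.pi) * largest


def apply_diag(v):
    """Toy operator diag(3, 1, 0.5); its spectral norm is 3."""
    return [3.0 * v[0], 1.0 * v[1], 0.5 * v[2]]


bound = estimate_norm_bound(apply_diag, dimension=3)
```

If `apply_operator` applies the inverse of an invertible matrix, the reciprocal of the returned bound is a lower estimate of the minimum singular value of that matrix, which is how such a bound is used in the argument above.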
IV Convergence analysis
This section consists of two parts and provides a complete convergence analysis of the nonconvex IADMM algorithm. The first subsection contains the main convergence results, the proof sketch, the difficulties in the proof and the theoretical contributions, while the second subsection introduces the necessary technical lemmas. Assumption 1 is assumed to hold throughout this section.
IV-A Main results
Theorem 1 (Stationary point convergence)
Assume that the free parameter satisfies the condition
with , then any cluster point is also a critical point of .
Theorem 1 describes the stationary point convergence result for the IADMM method, which does not rely on the KŁ property of the functions. If the KŁ property is further assumed, the convergence of the whole sequence can be proved, which gives Theorem 2.
Theorem 2 (Sequence convergence)
The proof can be divided into two parts, and in order to help the reader we first give a brief sketch of the proof:
I. In the first part we introduce an auxiliary sequence , where are composite points from . An auxiliary function is also proposed. In Lemma 5, we prove a “sufficient descent condition” of the values of at , i.e.,
where is a positive constant.
II. We prove a “relative error condition” of , i.e., there exists such that
where is a positive constant. Note that this condition is different from the “real” relative error condition proposed in .
The major difficulty in deriving these two conditions is the use of inertial terms, with which the descent values are lower bounded by and rather than and . Similarly, the relative error is bounded by and . The relative error can be expanded by triangle inequalities, which is relatively easy to prove. However, for the sufficient descent, the use of the triangle inequalities is much more difficult and technical for the lower bound.
The theoretical contributions of this paper are two-fold. The first one is, of course, dealing with the difference caused by the inertial terms. This part also includes how to design the scheme of the algorithm, whose details have been presented in the previous section. The second one is to determine the parameters for IADMM applied to image deblurring.
The main results can be proved with the following lemmas.
IV-B Technical lemmas
Now we provide the main technical lemma, which states the descent condition for a suitable function of the sequences generated by Algorithm 1.
Based on Lemma 5, it is important to guarantee that condition (23) can be satisfied. This can be achieved if is large enough. Fortunately, in Algorithm 1 the parameter can be fixed by the user. Thus, the parameter should be chosen large enough to guarantee convergence according to condition (23).
If the nonconvex regularization function is coercive and
the sequence is bounded.
Let the sequence be generated by Algorithm 1. Then, for any , there exists and such that
Now, we recall a definition about the limit point set introduced in , which denotes the set of all the stationary points generated by the nonconvex IADMM. The specific mathematical definition of is given as follows.
Let be generated by the nonconvex IADMM . We define the set by
In this section, we illustrate the effectiveness of the proposed algorithm on different numerical blurred images with Gaussian blur.
All the programs have been written entirely in C++, and all the experiments have been run under Linux on a desktop computer with an Intel Core i5-2400S CPU (2.5 GHz) and 4 GB of memory. The FFT subroutines used in the algorithms are taken from the fftw-3 library (http://www.fftw.org/). As test problems we have selected twelve images (see Figure 1), which include seven images from the USC-SIPI image database (http://sipi.usc.edu/database/), two classical test images (Lena and cameraman), one text image from the “El Quijote” book and two medical images. In order to obtain the blurred images we use, as is common in the literature, the blurring operator generated by a convolution with a Gaussian kernel (KernelSize , KernelMu , KernelSigma ) and circular mapping on the edges of the image.
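The blurring operator described above can be sketched on a small image as a separable Gaussian convolution with wrap-around (circular) boundaries. This is only an illustration: the kernel parameters used below (3 taps, mean 0, standard deviation 1) are hypothetical values chosen for the example, the names `gaussian_kernel` and `circular_blur` are introduced here, and the convolution is computed by direct sums, whereas an FFT-based implementation would give the same result for circular boundaries.

```python
import math


def gaussian_kernel(size, mu, sigma):
    """1D Gaussian weights on `size` taps, normalised to sum to 1."""
    center = (size - 1) / 2.0
    weights = [math.exp(-((k - center - mu) ** 2) / (2.0 * sigma ** 2))
               for k in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def circular_blur(image, kernel):
    """Separable circular convolution: rows first, then columns."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    taps = range(len(kernel))
    # horizontal pass with wrap-around indices
    tmp = [[sum(kernel[k] * image[i][(j - (k - half)) % cols] for k in taps)
            for j in range(cols)] for i in range(rows)]
    # vertical pass with wrap-around indices
    return [[sum(kernel[k] * tmp[(i - (k - half)) % rows][j] for k in taps)
             for j in range(cols)] for i in range(rows)]


kernel = gaussian_kernel(3, 0.0, 1.0)          # hypothetical parameters
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0                            # single bright pixel
blurred = circular_blur(impulse, kernel)
constant = circular_blur([[2.0] * 4 for _ in range(4)], kernel)
```

Because the kernel is normalised, the blur preserves the total intensity of the impulse image and leaves a constant image unchanged.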
deblurring results for ,
|IADMM ()||IADMM ()||ADMM(TV1)|
The proposed algorithm IADMM (Algorithm 1) is compared with the widely used augmented Lagrangian method (ADMM ) for image deblurring. We mainly consider two models, i.e., and . We refer to them as the TV1 and TV(1/2) methods, respectively. Note that TV1 is a convex method, while TV(1/2) is nonconvex. Unless otherwise indicated, in the tests we have used for all the methods the value of the penalty parameter (a small one) and/or (a large one), just to observe the behaviour of the IADMM algorithm.
The performance of the deblurring algorithms is quantitatively measured by means of the objective function (Equations 2 or 3), the Real Error as of the difference between the original and deblurred images, the signal-to-noise ratio (SNR) 
where and denote the original image and the restored image, respectively, and represents the mean of the original image , and the residual ( and in the standard and inertial versions, respectively) as described in :
In the tests we do not report CPU times, as all the algorithms have a very similar computational cost per iteration (mainly from the FFT routines). Therefore, there is almost no difference between plots showing iterations or CPU cost, and the consumed CPU time is basically proportional to the respective number of iterations.
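The SNR measure described above can be computed with a few lines. The sketch assumes the definition that is common in the deblurring literature, SNR = 10*log10(||u - mean(u)||^2 / ||u - u_restored||^2), with images stored as flat lists of pixel values; this formula and the name `snr_db` are assumptions of the example.

```python
import math


def snr_db(original, restored):
    """Signal-to-noise ratio in decibels (assumed definition, see lead-in)."""
    mean = sum(original) / len(original)
    signal = sum((u - mean) ** 2 for u in original)               # energy around the mean
    error = sum((u - r) ** 2 for u, r in zip(original, restored))  # restoration error
    return 10.0 * math.log10(signal / error)


original = [0.0, 2.0]
restored = [0.1, 1.9]
snr_value = snr_db(original, restored)
```

In this toy case the signal energy is 2 and the error energy is 0.02, so the ratio is 100 and the SNR is about 20 dB.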
In our first test we use the brain M0 image with a low value of , and we show in Figure 2 that the performances of the ADMM TV1 and TV(1/2) methods are quite similar in error and SNR. Therefore, in the rest of the comparisons we will just consider the TV1 method. For the IADMM method, the inertial parameter in Algorithm 1 is first investigated using two different values ( and ). The main difference observed in these tests is that the nonconvex method becomes more unstable once the maximum precision is reached (at this point the convex ADMM TV1 method seems to be the most stable, with quite smooth behaviour). The fastest convergence is observed using the IADMM method (with the largest stepsize value ), but unstable behaviour appears once the maximum precision is obtained. Therefore, using mainly the information provided by the residual (Eq. (37)), we provide a stop control criterion (in the same spirit as ) that stops the iterative process at the black points of Figure 2 (d) in the tests. That is, we stop when the following condition holds
In the first case, convergence up to the desired error tolerance is obtained, while in the second case the algorithm has reached its stability limit and the residual grows, behaving later in an unstable way. Note that the residual is used in the stop criterion because it relies only on known data from the iterations (and does not depend on the original image, which is unknown). We remark that the use of stop control techniques avoids unnecessary iterations and also stops the process at the limiting accuracy of the method. Also, from the pictures we observe that the IADMM algorithm provides enough precision in a lower number of iterations. A larger means a faster method, but at the price of a more unstable method, as can be seen in Fig. 2(d). In that picture we observe that when the residual begins to behave chaotically, with sudden increases, it is advisable to stop the iterative process, as considered in the stop criterion (38) (black dots in Fig. 2(d)). With that criterion, the IADMM method seems to be an interesting option for fast deblurring problems.
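A two-case stopping rule of the kind discussed above can be sketched as follows. The exact form of criterion (38) is not reproduced here; the sketch assumes that the first case is "residual below a tolerance" and that the second case is "residual larger than in the previous iteration", and the names `stop_reason`, `tolerance` and `growth_factor` are hypothetical.

```python
def stop_reason(residuals, tolerance, growth_factor=1.0):
    """Decide whether to stop, based on the history of residuals.

    Returns "converged" if the latest residual is below `tolerance`,
    "unstable" if it exceeds growth_factor times the previous residual
    (assumed test for the stability limit), and None otherwise.
    """
    if not residuals:
        return None
    latest = residuals[-1]
    if latest < tolerance:
        return "converged"
    if len(residuals) >= 2 and latest > growth_factor * residuals[-2]:
        return "unstable"
    return None


still_running = stop_reason([1.0, 0.5, 0.2], tolerance=1e-3)
converged = stop_reason([1.0, 0.5, 1e-4], tolerance=1e-3)
unstable = stop_reason([1.0, 0.5, 0.8], tolerance=1e-3)
```

Both tests use only quantities available during the iteration, in line with the remark that the residual does not depend on the unknown original image. A value of `growth_factor` above 1 would tolerate small oscillations before declaring instability.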
To observe more clearly the influence of the parameter in Algorithm 1, we perform several tests on the brain M0 image in Figure 3 for values and . Note that this parameter plays a role similar to the stepsize (as also happens with the parameter ), as it controls the perturbation at each step. A large value will provide, when the method works, a quite fast method, but on the other hand it makes the method more unstable. In fact, from plot 3(b) we observe that in this case it would be optimal to use the parameter value in combination with the stop criterion, giving the maximum precision in just 28 iterations. Besides, it is shown that after the values selected by the stop criterion the residual begins to oscillate among values that provide similar error but that generate an unstable behaviour, giving rise to an increase of the error in subsequent iterations (this instability is delayed when the parameter decreases, which is expected because the increment is smaller, as the vertical lines connecting the error and residual plots show).
The influence of the penalty parameter is also quite relevant, but a detailed analysis is beyond the scope of this paper. In Figure 4 we show the evolution of the residual using two values of and several values of the parameter and . We observe that low values of the penalty parameter give a lower residual, but the error is lower for large , which provides faster convergence. The penalty parameter has a big effect on the empirical performance of the methods, as shown in . It remains to study optimal combinations of the parameters and and suitable criteria for an automatic selection (this will be part of the next steps in our study of these methods).
In Figure 5, we present the original medium resolution brain M0 image, the blurred one obtained, as indicated, by a convolution with a Gaussian kernel, and the images deblurred by IADMM using two error tolerances (, ) in the stop criterion (38). We can see that in both cases the quality of the recovered image is visually good.
deblurring results for ,
|IADMM ()||IADMM ()||ADMM(TV1)|
deblurring results for on M0 image
|IADMM ()||IADMM ()||ADMM(TV1)|