I Introduction
In the last decade, some very effective frameworks for image restoration have been proposed that (a) exploit long-distance correlations in natural images, and (b) use patches instead of pixels to robustly compare photometric similarities. These include the classical Non-Local Means (NLM) algorithm [1] and the more sophisticated BM3D algorithm [2]. The latter combines the NLM framework with other classical algorithms, and is widely considered the state-of-the-art in image denoising. We refer the reader to [3] for a comprehensive review of patch-based algorithms.
Let $f = (f_i)$ be the input noisy image, where $i$ denotes some linear indexing of the pixels. In NLM, the restored image $\hat{u} = (\hat{u}_i)$ is computed using the simple formula

$$\hat{u}_i = \frac{\sum_{j \in S(i)} w_{ij} f_j}{\sum_{j \in S(i)} w_{ij}}. \qquad (1)$$

Here, $w_{ij}$ is some weight (affinity) assigned to pixels $i$ and $j$, and $S(i)$ is some non-local (sufficiently large) neighborhood of pixel $i$ over which the averaging is performed [1]. In particular, for a given pixel $i$, let $\mathbf{P}_i$ denote the restriction of $f$ to a square window around $i$. Letting $k$ be the length of this window, this associates every pixel $i$ with a point $\mathbf{P}_i$ in $\mathbb{R}^{k^2}$ (the patch space). The weights in NLM are set to be $w_{ij} = \exp\big(-\|\mathbf{P}_i - \mathbf{P}_j\|^2 / h^2\big)$, where $\|\mathbf{P}_i - \mathbf{P}_j\|$ is the Euclidean distance between $\mathbf{P}_i$ and $\mathbf{P}_j$, and $h$ is a smoothing parameter.
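As a concrete illustration, the averaging in (1) can be sketched as follows. This is a minimal NumPy sketch for a single interior pixel; the window length, search-neighborhood size, and the value of h are illustrative choices, not the values used in [1].

```python
import numpy as np

def nlm_pixel(f, i, j, patch_rad=1, search_rad=3, h=0.5):
    """Denoise pixel (i, j) of image f by the NLM average in (1).

    Weights w_ij = exp(-||P_i - P_j||^2 / h^2) compare k x k patches
    (k = 2*patch_rad + 1) over a square search neighborhood S(i).
    Assumes (i, j) is far enough from the image boundary.
    """
    P_i = f[i - patch_rad:i + patch_rad + 1, j - patch_rad:j + patch_rad + 1]
    num, den = 0.0, 0.0
    for u in range(i - search_rad, i + search_rad + 1):
        for v in range(j - search_rad, j + search_rad + 1):
            P_j = f[u - patch_rad:u + patch_rad + 1, v - patch_rad:v + patch_rad + 1]
            w = np.exp(-np.sum((P_i - P_j) ** 2) / h ** 2)  # patch affinity
            num += w * f[u, v]
            den += w
    return num / den
```

On a constant image every weight equals one, so the average returns the pixel value unchanged.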
Recently, it was demonstrated in [4, 10] that the robustness of the Non-Local Means (NLM) algorithm [1] can be improved by incorporating regression into the NLM framework. The idea was to fix some $0 < p \le 2$, and consider the following unconstrained optimization on the patch space:

$$\min_{\mathbf{P} \in \mathbb{R}^{k^2}} \; \sum_{j \in S(i)} w_{ij} \, \|\mathbf{P} - \mathbf{P}_j\|^p \qquad (2)$$

where $w_{ij}$ are the weights used in NLM (one could also use other weights, e.g., see [9]). The denoised pixel $\hat{u}_i$ was then set to be the center pixel of the minimizing patch. Note that this reduces to the simple formula in (1) when $p = 2$. In this case, the optimization is performed pixel-wise. For any other value of $p$, the optimization in (2) becomes a generic optimization on the patch space – the regression needs to be performed on patches and not just pixels. In particular, when $0 < p \le 1$, the resulting estimator turns out to be more robust to “outliers” in the patch space (compared to standard NLM), and this leads to significant improvement in the denoising quality. We refer the reader to [10] for an intuitive understanding of the robustness of non-local patch regression (NLPR), and for denoising results on synthetic and natural images. Note that we can generally write the optimization problem in (2) as
$$\min_{x \in \mathbb{R}^d} \; \sum_{k=1}^{N} w_k \, \|x - x_k\|^p \qquad (3)$$
where $x_1, \ldots, x_N$ are given points in $\mathbb{R}^d$, and $w_1, \ldots, w_N$ are some positive weights. Motivated by the work on algorithms for $\ell_p$ minimization in [5, 6], the authors in [10] proposed to optimize (3) using iteratively reweighted least-squares (IRLS). Fixing some small $\varepsilon > 0$, and initializing $x^{(0)}$ with the NLM output, the update rule was set to be
$$x^{(t+1)} = \Big( \sum_{k=1}^{N} \gamma_k(x^{(t)}) \Big)^{-1} \sum_{k=1}^{N} \gamma_k(x^{(t)}) \, x_k \qquad (4)$$
where

$$\gamma_k(x) = w_k \, \big( \|x - x_k\|^2 + \varepsilon^2 \big)^{\frac{p}{2} - 1}. \qquad (5)$$
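In code, the update in (4) and (5) for the generic weighted problem (3) can be sketched as follows. This is a minimal NumPy sketch; the choices of p, eps, and the fixed iteration count are illustrative.

```python
import numpy as np

def irls(x_pts, w, p, x0, eps=1e-3, n_iter=100):
    """IRLS iterations (4) for min_x sum_k w_k ||x - x_k||^p.

    x_pts: (N, d) array of the points x_1, ..., x_N.
    w:     (N,) positive weights.
    x0:    (d,) initialization (the NLM output in [10]).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # gamma_k(x) = w_k (||x - x_k||^2 + eps^2)^(p/2 - 1), cf. (5)
        sq = np.sum((x_pts - x) ** 2, axis=1) + eps ** 2
        gamma = w * sq ** (p / 2.0 - 1.0)
        # weighted mean of the points, cf. (4)
        x = gamma @ x_pts / gamma.sum()
    return x
```

For $p = 2$ the weights in (5) reduce to $w_k$ and a single step returns the weighted mean; for $p = 1$ the iterations reproduce the (smoothed) Weiszfeld algorithm for the geometric median.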
We refer the reader to [10] for the heuristics behind the IRLS update in (4). Extensive numerical experiments show that the algorithm is globally convergent (for arbitrary initialization) in the convex regime $1 \le p \le 2$, and locally convergent (fails very rarely compared to, say, the gradient or Newton method) in the non-convex regime $0 < p < 1$, provided that $x^{(0)}$ is set to the NLM estimate. The former observation can be explained using the existing literature on the Weiszfeld algorithm [7, 8], which is perhaps less well-known in the signal processing community. In this letter, we adapt the “majorize-minimize” framework introduced in [8] to specifically analyze (4) when $0 < p < 1$. The analysis automatically covers the case $1 \le p \le 2$. This is the content of Section II. In particular, we will show that the algorithm forces the cost to be non-increasing as the iteration progresses, and that it exhibits linear convergence both in the convex regime and in the non-convex regime (when convergence does occur). We note that the authors in [6] have analyzed a related (but more complex) IRLS algorithm in the non-convex regime, but their analysis is quite involved. The present analysis is much simpler, and is based on elementary results from smooth unconstrained optimization.

II Convergence Analysis
The key question is: what is the cost associated with the IRLS iterations in (4)? This must be resolved even before we ask questions about convergence. It turns out that the iterations correspond to a regularized version of the original cost in (3). This is given by
$$F_\varepsilon(x) = \sum_{k=1}^{N} w_k \, \varphi_\varepsilon(\|x - x_k\|) \qquad (6)$$
where $\varphi_\varepsilon$ is the regularized version of the Euclidean norm,

$$\varphi_\varepsilon(t) = \big( t^2 + \varepsilon^2 \big)^{p/2}.$$
Note that $F_\varepsilon$ corresponds to the original cost in (3) when $\varepsilon = 0$. It can be argued that the minimizers of $F_\varepsilon$ converge to the minimizer of the original problem as $\varepsilon$ tends to zero. In other words, for sufficiently small $\varepsilon$, the minimizer of $F_\varepsilon$ is close to that of the original problem. Henceforth, to simplify notation, we fix some small $\varepsilon > 0$ and denote $F_\varepsilon$ by $F$. We note that in [10], the authors proposed to start with a relatively large $\varepsilon$, and then gradually shrink it to zero as the iteration progresses. While this does tend to speed up the convergence, the associated analysis becomes quite complicated.
The advantage we get by considering the regularized problem is that the function $F$ is smooth (infinitely differentiable). This allows us to use the powerful tools of smooth optimization. Moreover, $F$ inherits the convex nature of the original problem, namely, it is strictly convex for $1 \le p \le 2$. Since $F$ is smooth, it suffices to show that its Hessian is positive definite. In fact, the gradient (denoted by $\nabla F$) and the Hessian (denoted by $\nabla^2 F$) are given by

$$\nabla F(x) = p \sum_{k=1}^{N} \gamma_k(x) \, (x - x_k),$$

and

$$\nabla^2 F(x) = p \sum_{k=1}^{N} \gamma_k(x) \left[ I + (p - 2) \, \frac{(x - x_k)(x - x_k)^T}{\|x - x_k\|^2 + \varepsilon^2} \right].$$

Here, $I$ is the identity matrix of size $d \times d$. For any non-zero $z \in \mathbb{R}^d$,

$$z^T \nabla^2 F(x) \, z = p \sum_{k=1}^{N} \gamma_k(x) \left[ \|z\|^2 + (p - 2) \, \frac{\big( z^T (x - x_k) \big)^2}{\|x - x_k\|^2 + \varepsilon^2} \right].$$

Since $\big( z^T (x - x_k) \big)^2 \le \|z\|^2 \|x - x_k\|^2 < \|z\|^2 \big( \|x - x_k\|^2 + \varepsilon^2 \big)$ by the Cauchy-Schwarz inequality, and since $p - 2 \le 0$, the quadratic form is strictly larger than

$$p \, (p - 1) \sum_{k=1}^{N} \gamma_k(x) \, \|z\|^2.$$

This is non-negative if and only if $p \ge 1$. Therefore, $F$ is strictly convex in this case, and has a unique global minimizer $x^\star$, for which $\nabla F(x^\star) = 0$. On the other hand, it is not difficult to see that $F$ need not be convex when $0 < p < 1$. The best we can hope for in this case is that the iterates in (4) converge to some local stationary point.
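This convexity transition at $p = 1$ is easy to observe numerically. The sketch below assembles the Hessian formula above for a small synthetic configuration and checks the sign of its smallest eigenvalue; the points, weights, and probe location are arbitrary illustrative choices.

```python
import numpy as np

def hessian(x, x_pts, w, p, eps=1e-3):
    """Hessian of the regularized cost (6) at the point x."""
    d = x.size
    H = np.zeros((d, d))
    for xk, wk in zip(x_pts, w):
        r = x - xk
        sq = r @ r + eps ** 2
        gamma = wk * sq ** (p / 2.0 - 1.0)  # weight (5)
        H += gamma * (np.eye(d) + (p - 2.0) * np.outer(r, r) / sq)
    return p * H

rng = np.random.default_rng(0)
x_pts = rng.standard_normal((20, 3))  # cluster of points near the origin
w = np.ones(20)
x = np.array([10.0, 0.0, 0.0])        # probe point far from the cluster
lam_convex = np.linalg.eigvalsh(hessian(x, x_pts, w, p=1.0)).min()
lam_noncvx = np.linalg.eigvalsh(hessian(x, x_pts, w, p=0.5)).min()
```

For $p = 1$ the smallest eigenvalue stays (barely) positive thanks to the $\varepsilon$-regularization, while for $p = 0.5$ the Hessian is indefinite at points far from the data, reflecting the loss of convexity.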
Theorem 1
The IRLS update in (4) guarantees the following:

(a) For $0 < p \le 2$, the sequence $\big( F(x^{(t)}) \big)$ is monotonic: $F(x^{(t+1)}) \le F(x^{(t)})$ for all $t \ge 0$, with strict inequality unless $x^{(t+1)} = x^{(t)}$.

(b) When $1 \le p \le 2$, $\big( x^{(t)} \big)$ converges linearly to the unique global minimizer $x^\star$ of $F$.

(c) For $0 < p < 1$, under some additional assumptions, the convergence is again linear, and the limit of $\big( x^{(t)} \big)$ is a stationary point of $F$.
By linear convergence, we mean that the convergence happens at an exponential rate. The relaxation property is particularly important for the non-convex setting, providing the guarantee that the cost at the end of the iterations is less than that obtained from the initial estimate $x^{(0)}$. In the optimization literature, one calls $\big( x^{(t)} \big)$ a relaxation sequence if $F(x^{(t+1)}) \le F(x^{(t)})$ for all $t$. Since $F$ is trivially bounded below (by zero), this implies convergence of the sequence $\big( F(x^{(t)}) \big)$. As we will see, the iterates in (4) unconditionally generate a relaxation sequence in both the convex and the non-convex regime. This property turns out to be a central ingredient in the guarantees provided by Theorem 1. We note that the relaxation property was recently observed in [9] for the special case $p = 1$. However, we remark that the convergence of $\big( F(x^{(t)}) \big)$ is not sufficient to guarantee convergence of $\big( x^{(t)} \big)$, as claimed in [9]. For example, it is possible that $\big( x^{(t)} \big)$ keeps oscillating, or escapes to infinity, while still ensuring that $F(x^{(t+1)}) \le F(x^{(t)})$. One of these cases can, however, be ruled out immediately:
Proposition 2
For $0 < p \le 2$, the IRLS iterates $\big( x^{(t)} \big)$ are bounded (do not escape to infinity).
This is a simple consequence of the observation that $F(x) \to \infty$ as $\|x\| \to \infty$. Indeed, if $\big( x^{(t)} \big)$ did escape to infinity, we would have a contradiction, since we just showed that $\big( F(x^{(t)}) \big)$ is bounded.
In the convex regime, we will show that oscillations can also be ruled out. To do so in the non-convex regime, we will need additional assumptions.
II-A Majorize-Minimize interpretation
To establish Theorem 1, we will use the majorize-minimize (MM) framework from [8]. In this framework, the key idea is to globally approximate $F$ from above using a sequence of quadratic functions. More precisely, after having found $x^{(t)}$, we construct a function $G(x; x^{(t)})$ such that:

(a) $G(x; x^{(t)}) \ge F(x)$ for all $x$.

(b) $G(x^{(t)}; x^{(t)}) = F(x^{(t)})$.

(c) $G(x; x^{(t)})$ has $x^{(t+1)}$ in (4) as its unique global minimizer.
Once we have $G(x; x^{(t)})$ with the above properties, we immediately see that

$$F(x^{(t+1)}) \le G(x^{(t+1)}; x^{(t)}) \le G(x^{(t)}; x^{(t)}) = F(x^{(t)}).$$

That is, we are guaranteed that $\big( x^{(t)} \big)$ is a relaxation sequence. We now need to specify $G(x; x^{(t)})$.
Proposition 3
For every $x, y \in \mathbb{R}^d$,

$$F(x) \le G(x; y) := F(y) + (x - y)^T \nabla F(y) + \frac{p}{2} \Big( \sum_{k=1}^{N} \gamma_k(y) \Big) \|x - y\|^2. \qquad (7)$$

Note that the linear part of $G(x; y)$ is simply the linear approximation of $F$ at $y$, and the quadratic form is derived from the dominant (diagonal) part of the Hessian $\nabla^2 F(y)$. By construction, $G(y; y) = F(y)$. Moreover, it is clear that $G(x; y)$ is strictly convex in $x$ (for all $0 < p \le 2$), and has a unique global minimizer. Setting $x^{(t+1)}$ to be the minimizer of $G(x; x^{(t)})$, we have

$$\nabla F(x^{(t)}) + p \Big( \sum_{k=1}^{N} \gamma_k(x^{(t)}) \Big) \big( x^{(t+1)} - x^{(t)} \big) = 0. \qquad (8)$$

Substituting for $\nabla F(x^{(t)})$, we get the update rule in (4).
To complete the proof, we need to show that $F(x) \le G(x; y)$ for all $x$. Note that we can write $G(x; y)$ as

$$G(x; y) = F(y) + \frac{p}{2} \sum_{k=1}^{N} \gamma_k(y) \, \big( \|x - x_k\|^2 - \|y - x_k\|^2 \big).$$

We substitute the following above:

$$F(y) = \sum_{k=1}^{N} w_k \, b_k^{p/2}, \qquad \gamma_k(y) = w_k \, b_k^{p/2 - 1},$$

and

$$F(x) = \sum_{k=1}^{N} w_k \, a_k^{p/2}.$$

This allows us to simplify the expression to

$$G(x; y) - F(x) = \sum_{k=1}^{N} w_k \left[ b_k^{p/2} + \frac{p}{2} \, b_k^{p/2 - 1} \big( a_k - b_k \big) - a_k^{p/2} \right],$$

where $a_k = \|x - x_k\|^2 + \varepsilon^2$ and $b_k = \|y - x_k\|^2 + \varepsilon^2$. It can be verified that each term in the sum is non-negative for any $0 < p \le 2$: the map $u \mapsto u^{p/2}$ is concave on $(0, \infty)$, so its tangent at $b_k$ lies above it (for $p = 2$, each term vanishes and this is obvious). This shows that $G(x; y) \ge F(x)$, concluding the proof of (7).
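The defining properties of the surrogate, and the fact that its minimizer reproduces the IRLS update, can be checked numerically. The sketch below uses the expanded form of $G(x; y)$ from the proof; the points, weights, and the probe points are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x_pts = rng.standard_normal((15, 2))
w = rng.uniform(0.5, 1.5, 15)
p, eps = 1.0, 1e-3

def F(x):
    # regularized cost (6)
    return np.sum(w * (np.sum((x_pts - x) ** 2, axis=1) + eps ** 2) ** (p / 2.0))

def gamma(y):
    # weights (5)
    return w * (np.sum((x_pts - y) ** 2, axis=1) + eps ** 2) ** (p / 2.0 - 1.0)

def G(x, y):
    # quadratic surrogate used in the majorize-minimize step
    g = gamma(y)
    return F(y) + (p / 2.0) * np.sum(
        g * (np.sum((x_pts - x) ** 2, axis=1) - np.sum((x_pts - y) ** 2, axis=1)))

y = np.array([0.3, -0.2])
g = gamma(y)
x_next = g @ x_pts / g.sum()  # minimizer of G(., y) = the IRLS update (4)
```

The assertions below confirm the touching condition $G(y; y) = F(y)$, the majorization $G(\cdot; y) \ge F$, and the relaxation property of the resulting step.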
II-B Global and local convergence
Since the sequence $\big( F(x^{(t)}) \big)$ is monotonic and bounded below, it is convergent. In particular, $F(x^{(t)}) - F(x^{(t+1)}) \to 0$ as $t \to \infty$. So, what can we say about the sequence $\big( x^{(t)} \big)$?
Proposition 4
We claim that $\|x^{(t+1)} - x^{(t)}\| \to 0$ as $t$ gets large.

To see this, we use (8),

$$\nabla F(x^{(t)}) = -p \Big( \sum_{k=1}^{N} \gamma_k(x^{(t)}) \Big) \big( x^{(t+1)} - x^{(t)} \big),$$

and the majorizing property,

$$F(x^{(t+1)}) \le F(x^{(t)}) + \big( x^{(t+1)} - x^{(t)} \big)^T \nabla F(x^{(t)}) + \frac{p}{2} \Big( \sum_{k=1}^{N} \gamma_k(x^{(t)}) \Big) \|x^{(t+1)} - x^{(t)}\|^2.$$

Combining these, we see that

$$F(x^{(t)}) - F(x^{(t+1)}) \ge \frac{p}{2} \Big( \sum_{k=1}^{N} \gamma_k(x^{(t)}) \Big) \|x^{(t+1)} - x^{(t)}\|^2.$$

Now, since the iterates are bounded (Proposition 2), there exists $R > 0$ such that $\|x^{(t)} - x_k\| \le R$ for all $t$ and $k$; from (5) we then have the trivial bound $\gamma_k(x^{(t)}) \ge w_k (R^2 + \varepsilon^2)^{p/2 - 1}$. We can then write

$$F(x^{(t)}) - F(x^{(t+1)}) \ge c \, \|x^{(t+1)} - x^{(t)}\|^2,$$

where

$$c = \frac{p}{2} \big( R^2 + \varepsilon^2 \big)^{p/2 - 1} \sum_{k=1}^{N} w_k > 0.$$

Since $\big( F(x^{(t)}) \big)$ is convergent, the left-hand side goes to zero, and we arrive at our claim.
Note that we cannot directly conclude from Proposition 4 that $\big( x^{(t)} \big)$ is convergent. However, since $\big( x^{(t)} \big)$ is bounded, it has convergent subsequences (by compactness). In the convex regime, we can say something more:
Proposition 5
For $1 \le p \le 2$, every convergent subsequence of $\big( x^{(t)} \big)$ has the same limit, namely the unique stationary point $x^\star$ of $F$. In particular, it is necessary that $\big( x^{(t)} \big)$ converges to $x^\star$.

We now establish the above claim. Let $\big( x^{(t_n)} \big)$ be a subsequence that converges to some point $z$. We know that $\|x^{(t_n + 1)} - x^{(t_n)}\| \to 0$, so that the limit of both $\big( x^{(t_n)} \big)$ and $\big( x^{(t_n + 1)} \big)$ must be $z$. Note that, by (8),

$$\nabla F(x^{(t_n)}) = -p \Big( \sum_{k=1}^{N} \gamma_k(x^{(t_n)}) \Big) \big( x^{(t_n + 1)} - x^{(t_n)} \big).$$

Since both $\nabla F$ and the $\gamma_k$ are smooth, letting $n \to \infty$, we have

$$\nabla F(z) = 0.$$

That is, $z$ is a stationary point of $F$. In the convex regime $1 \le p \le 2$, we know that $F$ has a unique stationary point $x^\star$. Therefore, every convergent subsequence of $\big( x^{(t)} \big)$ must have the same limit $x^\star$.
Proposition 6
For $0 < p < 1$, the above claim holds only under certain local assumptions.

The problem in this case is that there can be multiple stationary points of $F$, and the previous argument breaks down (as is typical with non-convex problems). All we know in this case is that $\big( x^{(t)} \big)$ is bounded, and that the entire sequence can be confined to a ball of some radius around the initialization $x^{(0)}$. Suppose we assume that the initialization $x^{(0)}$ is “good”, in that it is situated sufficiently close to a local (possibly global) minimizer $x^\star$. It is then possible that this ball is small enough that it contains no other stationary point of $F$. In this case, we are guaranteed that every convergent subsequence, and hence the whole sequence $\big( x^{(t)} \big)$, converges to $x^\star$.

II-C Convergence rate
Plots of $\log \|x^{(t)} - x^\star\|$ versus $t$ for (4) suggest a linear trend, both in the convex and the non-convex case (assuming convergence for the latter). For the convex regime, we can indeed guarantee the following.
Proposition 7
For $1 \le p \le 2$, there exist $0 < \rho < 1$ and $t_0$ such that, for all $t \ge t_0$,

$$F(x^{(t+1)}) - F(x^\star) \le \rho \, \big( F(x^{(t)}) - F(x^\star) \big).$$

In other words, the convergence is exponential, where the convergence rate is controlled by $\rho$. To establish our claim, we begin by comparing $F$ at the point $x^{(t+1)}$ and at the convex combination $z^{(t)} = (1 - \mu_t) \, x^{(t)} + \mu_t \, x^\star$ (we will define $\mu_t$ later),

$$F(x^{(t+1)}) \le G(x^{(t+1)}; x^{(t)}) \le G(z^{(t)}; x^{(t)}).$$

The first inequality follows from majorization, and the second from the optimality of $x^{(t+1)}$. From (7), we can write $G(z^{(t)}; x^{(t)})$ as

$$G(z^{(t)}; x^{(t)}) = F(x^{(t)}) + \mu_t \, \big( x^\star - x^{(t)} \big)^T \nabla F(x^{(t)}) + \frac{p}{2} \mu_t^2 \, \Gamma_t \, \|x^\star - x^{(t)}\|^2,$$

where $\Gamma_t = \sum_k \gamma_k(x^{(t)})$. On the other hand, by the convexity of $F$,

$$\big( x^\star - x^{(t)} \big)^T \nabla F(x^{(t)}) \le F(x^\star) - F(x^{(t)}). \qquad (9)$$

Using this to eliminate the term containing $\nabla F(x^{(t)})$, we can write the above bound as

$$F(x^{(t+1)}) - F(x^\star) \le (1 - \mu_t) \big( F(x^{(t)}) - F(x^\star) \big) + \frac{p}{2} \mu_t^2 \, \Gamma_t \, \|x^\star - x^{(t)}\|^2.$$

Now, set

$$\mu_t = \min\left( 1, \; \frac{F(x^{(t)}) - F(x^\star)}{p \, \Gamma_t \, \|x^\star - x^{(t)}\|^2} \right). \qquad (10)$$

Then $0 < \mu_t \le 1$, and hence

$$F(x^{(t+1)}) - F(x^\star) \le \Big( 1 - \frac{\mu_t}{2} \Big) \big( F(x^{(t)}) - F(x^\star) \big).$$

By construction, $1 - \mu_t / 2 < 1$ for all $t$. We are done if we can show that $\mu_t \ge \mu$ for some constant $\mu > 0$. By Taylor's theorem, since $\nabla F(x^\star) = 0$,

$$F(x^{(t)}) - F(x^\star) = \frac{1}{2} \big( x^{(t)} - x^\star \big)^T \nabla^2 F(\xi_t) \big( x^{(t)} - x^\star \big), \qquad (11)$$

where $\xi_t$ is some point on the segment joining $x^{(t)}$ and $x^\star$. Plugging (11) into (10), and using the bound $\Gamma_t \le \Gamma$ (which holds since the iterates are bounded),

$$\mu_t \ge \min\left( 1, \; \frac{\lambda_{\min}\big( \nabla^2 F(\xi_t) \big)}{2 p \, \Gamma} \right),$$

where $\lambda_{\min}(A)$ denotes the smallest eigenvalue of the matrix $A$. Now, since $\nabla^2 F$ is continuous, $\nabla^2 F(\xi_t)$ approaches $\nabla^2 F(x^\star)$ as $t \to \infty$. Moreover, $x^\star$ is a (local) minimizer. Hence, $\nabla^2 F(x^\star)$ is necessarily positive semidefinite, that is, $\lambda := \lambda_{\min}(\nabla^2 F(x^\star)) \ge 0$. This is true for any $0 < p \le 2$. Therefore, for all sufficiently large $t$, $\mu_t \ge \mu$, where $\mu = \min\big( 1, \lambda / (4 p \, \Gamma) \big)$.

For the convex regime $1 \le p \le 2$, $\nabla^2 F(x^\star)$ is guaranteed to be positive definite, so that $\lambda > 0$, and we can take $\rho = 1 - \mu / 2$. This completes the proof of Proposition 7.
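The geometric decay of the cost error predicted by Proposition 7 can be observed on a small example. The sketch below runs the update in (4) and (5) for $p = 1$, treats a long-run iterate as a stand-in for $x^\star$, and checks that the error $F(x^{(t)}) - F(x^\star)$ shrinks by a factor strictly less than one per iteration; the points, weights, and tolerances are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
x_pts = rng.standard_normal((30, 2))
w = np.ones(30)
p, eps = 1.0, 1e-3

def F(x):
    # regularized cost (6)
    return np.sum(w * (np.sum((x_pts - x) ** 2, axis=1) + eps ** 2) ** (p / 2.0))

def irls_step(x):
    # one IRLS update, cf. (4) and (5)
    g = w * (np.sum((x_pts - x) ** 2, axis=1) + eps ** 2) ** (p / 2.0 - 1.0)
    return g @ x_pts / g.sum()

# run long enough to treat the final cost as F(x_star)
x = x_pts.mean(axis=0)
costs = [F(x)]
for _ in range(200):
    x = irls_step(x)
    costs.append(F(x))
F_star = costs[-1]
errors = [c - F_star for c in costs[:20]]
ratios = [errors[t + 1] / errors[t]
          for t in range(len(errors) - 1) if errors[t] > 1e-12]
```

The per-iteration error ratios stay below one, consistent with linear (i.e., exponential-rate) convergence of the cost.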
For the non-convex setting, the above argument holds under additional assumptions:
Proposition 8
For $0 < p < 1$, assume that $\big( x^{(t)} \big)$ converges to a local minimizer $x^\star$, and that $\nabla^2 F(x^\star)$ is positive definite. Then $\big( F(x^{(t)}) \big)$ converges linearly to $F(x^\star)$.
III Discussion
We note that it is rather difficult to extend the analysis in [9] to the non-convex setting $0 < p < 1$. Moreover, it is not possible to estimate the convergence rate using the analysis in [9]. What that paper shows is that the cost is non-increasing for the case $p = 1$. One cannot study the behavior of the iterates using this result alone. For example, as our analysis shows, while the cost is non-increasing for $0 < p \le 2$, this alone is not sufficient to guarantee convergence of the iterates. We also note that the techniques of proof used in [7] and [8] are quite different from the one used in our paper. The basic idea of “majorize-minimize” is the same, but the mathematical tools are different. Voss and Eckhardt [8], for example, use the sophisticated KKT theory (on polytopes) in their analysis. As against this, our analysis uses elementary ideas from smooth unconstrained optimization. Moreover, it is not obvious how the analyses in [7] and [8] can be extended to the non-convex regime $0 < p < 1$.
The linear convergence of IRLS is typical of first-order methods. We have also tried second-order Newton methods for optimizing (6), which are guaranteed to exhibit quadratic convergence (locally). Indeed, Newton methods typically require less than half the number of iterations needed by the updates in (4) to reach a given accuracy. However, the cost of a single Newton step (computation of the Hessian and its inversion) is significantly higher than that of the simple update in (4). As a result, the total execution time of IRLS turns out to be smaller than that of Newton methods. The other point that we noticed from the numerical simulations is that IRLS is much more stable than Newton (or gradient descent) methods in the non-convex regime. In particular, the iterates of the Newton method often diverge to infinity if the initialization is not “close” to a local minimum. On the other hand, the IRLS iterates never escape to infinity (this is clear from Theorem 1), and almost always converge to the global optimum when we initialize using the NLM output. In the rare cases when (4) gets “stuck” in a local minimum (say, due to bad initialization), Newton methods are found to have the same problem. It would be interesting to see if one could give a more accurate analysis of the IRLS algorithm in the non-convex regime.
Finally, we note that the analysis in this letter is about the convergence properties of the NLPR algorithm in [10], and not about its denoising performance. These are two entirely different aspects of the algorithm. In [10], empirical results on the performance of NLPR were reported, but the issue of convergence was not studied. We are currently investigating the “optimality” of the denoising performance of NLPR and the possible scope for improvements, which will be reported in a future correspondence.
IV Acknowledgments
The author would like to thank Prof. Gilad Lerman and Prof. Amit Singer for interesting discussions.
References
[1] A. Buades, B. Coll, J. M. Morel, “A review of image denoising algorithms, with a new one,” Multiscale Modeling and Simulation, vol. 4, pp. 490-530, 2005.
[2] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, pp. 2080-2095, 2007.
[3] P. Milanfar, “A tour of modern image filtering,” IEEE Signal Processing Magazine, vol. 30, no. 1, 2013.
[4] K. N. Chaudhury, A. Singer, “Non-local Euclidean medians,” IEEE Signal Processing Letters, vol. 19, no. 11, pp. 745-748, 2012.
[5] R. Chartrand, “Iteratively reweighted algorithms for compressive sensing,” IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3869-3872, 2008.
[6] I. Daubechies, R. DeVore, M. Fornasier, C. S. Gunturk, “Iteratively reweighted least squares minimization for sparse recovery,” Communications on Pure and Applied Mathematics, vol. 63, pp. 1-38, 2009.
[7] E. Weiszfeld, “Sur le point pour lequel la somme des distances de n points donnés est minimum,” Tohoku Mathematical Journal, vol. 43, pp. 355-386, 1937.
[8] H. Voss, U. Eckhardt, “Linear convergence of generalized Weiszfeld's method,” Computing, vol. 25, no. 3, pp. 243-251, 1980.
[9] Z. Sun, S. Chen, “Analysis of non-local Euclidean medians and its improvement,” IEEE Signal Processing Letters, vol. 20, no. 4, pp. 303-306, 2013.
[10] K. N. Chaudhury, A. Singer, “Non-local patch regression: Robust image denoising in patch space,” accepted to IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.