The Plug-and-Play (PnP) ADMM is a variant of the alternating direction method of multipliers (ADMM) algorithm. Since its introduction in 2013, the algorithm has demonstrated extremely promising results in image restoration and signal recovery problems [2, 3, 4, 5, 6, 7, 8]. However, despite the wide range of applications of PnP ADMM and several studies of its convergence [8, 9, 10], it is generally unclear why the algorithm performs so well. On the one hand, one can argue that the superior performance is attributed to the underlying image denoisers, which is evident from applications using convolutional neural networks [11, 12, 13, 14]. On the other hand, unless we can explicitly write down the optimization problem that describes the action of the denoiser, e.g., in the form of a maximum-a-posteriori (MAP) estimation, it is extremely difficult to characterize the solution even if we know that the solution exists. The contribution of this paper is to present a set of formal performance analysis results for a family of denoisers known as graph filters, so that the behavior of PnP ADMM can be understood.
In order to state the problem more concretely, it would be useful to first review the PnP ADMM. The starting point of the PnP ADMM is the ADMM algorithm, which has become a standard tool for minimizing a sum of two separable functions of the form
Here, f(x) represents the objective function arising from the physical measurement model, e.g., blur, sampling, projection, or other linear transformations. The function g(x) is the regularization function, representing the prior model of the underlying signal, e.g., the ℓ1-norm, sparsity, or total variation. The parameter λ is the regularization constant, which is assumed known a-priori and is fixed in this paper. Problems taking the form of (1) are broad, encompassing various linear inverse problems in imaging: deblurring, super-resolution, CT, MRI, and compressed sensing, to name a few.
ADMM solves (1) by converting the unconstrained optimization into a constrained problem
and considers the augmented Lagrangian function:
Then, the algorithm finds the solution by seeking a saddle point of the augmented Lagrangian, which involves solving a sequence of subproblems of the form
where u is the scaled Lagrange multiplier, and x and v are the intermediate variables. Under mild conditions, e.g., when both f and g are closed, proper and convex, and if a saddle point exists, one can show that the iterates returned by (4)-(6) converge to the solution of (1). Readers interested in the theoretical properties of ADMM can consult standard tutorials on the subject.
The idea of PnP ADMM is to modify (5) by observing that it is essentially a denoising step: we can treat x^{k+1} + u^k as a “noisy” version of v, and g as the regularization on v. Based on this observation, we can replace (5) by a denoiser D_σ such that
where σ is the denoising strength, or a hypothesized “noise level”. The choice of D_σ is broad: it can be a proximal map such as total variation denoising, or an off-the-shelf image denoiser such as BM3D, non-local means, or, more recently, a neural network.
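To make the structure of the algorithm concrete, the following is a minimal numerical sketch of the iterations (4), (7) and (6) for the simple data term f(x) = (1/2)||x − y||², with a moving-average filter standing in for the plug-in denoiser; the helper names (`pnp_admm`, `f_prox`) are illustrative, not from any particular implementation.

```python
import numpy as np

def pnp_admm(y, f_prox, denoiser, rho=1.0, iters=100):
    """Minimal PnP ADMM sketch: (4) data step, (7) plug-in denoiser, (6) dual update."""
    x = np.zeros_like(y)
    v = np.zeros_like(y)
    u = np.zeros_like(y)
    for _ in range(iters):
        x = f_prox(v - u, rho)   # (4): proximal step on the data term f
        v = denoiser(x + u)      # (7): denoiser replaces the v-subproblem (5)
        u = u + (x - v)          # (6): scaled Lagrange multiplier update
    return x

# Toy usage: f(x) = 0.5*||x - y||^2, with a moving average as the denoiser.
y = np.array([1.0, 1.2, 0.9, 1.1, 5.0, 1.0])               # noise spike at index 4
f_prox = lambda z, rho: (y + rho * z) / (1.0 + rho)        # prox of f at z
denoiser = lambda z: np.convolve(z, np.ones(3) / 3, mode="same")
x_hat = pnp_admm(y, f_prox, denoiser, rho=1.0, iters=200)  # the spike is smoothed out
```

Note that the denoiser is used purely as a black box: nothing in the loop requires knowing a regularization function g associated with it, which is precisely the question studied in this paper.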
Existence of g. Given an arbitrary denoiser D_σ, is it possible to find an equivalent g such that (5) and (7) coincide? If not, under what conditions on D_σ would this happen? Some recent studies provide an answer to this question [8, 16], which will be elaborated in later sections of this paper. An alternative approach is Regularization by Denoising (RED), which has its own advantages and limitations.
Performance. If g does exist, what is it, and why is it an effective regularization function? If g does not exist, then what does PnP ADMM solve, and why does it work well? The focus of this paper is to address this performance issue. We take two paths: one through the MAP formulation, and the other through a concept called the consensus equilibrium.
Generalization. How can the algorithm be generalized to multiple denoisers? The generalization can be done using the consensus equilibrium technique. However, can we explain the performance, and what are the optimal combination weights?
The main assumption we make in this paper is that the denoisers are in the family of symmetric smoothing filters [18, 19, 20] or graph filters [21, 22, 23, 24, 25, 26, 27]. These filters are simple in the sense that they are (pseudo) linear and hence they can be expressed using matrices. However, they are also representative enough to cover a sufficiently large group of image denoisers that we encounter in the literature.
The remainder of the paper is organized as follows. We begin with a brief review of the construction of graph filters and their properties in Section 2. The main results are presented in Sections 3 and 4: Section 3 analyzes the problem from the MAP perspective, whereas Section 4 approaches it from the consensus equilibrium perspective. Further discussion of open problems is presented in Section 5.
II Graph Filters
II-A Constructing a Graph Filter
Graph filters are a class of image denoisers studied mostly in the field of graph signal processing. Their applications are broad, e.g., deblurring using the graph Laplacian, boosting image denoisers, image denoising using global similarity, JPEG compression using random walks, and blind deconvolution using reweighted graph total variation. On the theory side, there are studies of graphs in the continuous domain, sampling of graph signals, and graph filter banks, to name a few.
Represented as matrices, graph filters take the form:
where y is the noisy image, x̂ is the denoised image, and the matrix W is the filter. There are multiple ways of constructing W. One of the most common approaches is to define a non-local weight kernel K:
where y_i is a d-dimensional patch centered at pixel i, and h is a parameter characterizing the denoising strength of the filter. Modifications to (9) are common, e.g., incorporating the spatial distance between pixels (which gives the non-local means filter), restricting y_i to individual pixels instead of patches (which gives the bilateral filter), or extending the norm to a weighted norm (which gives the kernel regression filter). For more examples of kernel matrices, we refer readers to the literature.
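As an illustration, the following sketch builds the kernel in (9) for a 1D signal using patches; the function name and parameter values are hypothetical choices for the example.

```python
import numpy as np

def nonlocal_kernel(y, d=5, h=0.5):
    """Build the weight kernel K of (9) for a 1D signal, from patches of length d."""
    n = len(y)
    pad = d // 2
    yp = np.pad(y, pad, mode="edge")
    patches = np.stack([yp[i:i + d] for i in range(n)])   # patch centered at pixel i
    dist2 = ((patches[:, None, :] - patches[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-dist2 / (2.0 * h ** 2))                # K_ij

y = np.sin(np.linspace(0.0, 2.0 * np.pi, 32))
K = nonlocal_kernel(y)
# K is symmetric with unit diagonal, but its rows do not sum to 1,
# which is why a normalization step is needed before K can act as a filter.
```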
The matrix K defined in (9) is symmetric, but its rows do not sum to 1. Thus, K cannot be used directly as a denoising filter because it amplifies or attenuates signals. To normalize the matrix while preserving symmetry, the most popular method is the Sinkhorn-Knopp balancing algorithm [31, 19, 20], which iteratively normalizes the rows and columns of K until convergence. When K is normalized in this way, the resulting matrix is called a symmetric smoothing filter, given by
where the diagonal matrix D is determined by the Sinkhorn-Knopp iteration so that the rows and columns of W = D^{-1/2} K D^{-1/2} each sum to 1. Since the columns and rows of W sum to 1, W is a doubly stochastic matrix.
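The paper's Sinkhorn-Knopp step alternates row and column normalizations; the damped symmetric fixed-point iteration below is an equivalent sketch of the same balancing idea, not necessarily the exact implementation used in the references.

```python
import numpy as np

def sinkhorn_symmetric(K, iters=2000):
    """Find s > 0 with s_i * (K s)_i = 1 so that W = diag(s) K diag(s)
    is symmetric and doubly stochastic (damped fixed-point iteration)."""
    s = np.ones(K.shape[0])
    for _ in range(iters):
        s = np.sqrt(s / (K @ s))            # damped update toward s * (K s) = 1
    return (s[:, None] * K) * s[None, :]

# Demonstration on a small strictly positive Gaussian kernel.
t = np.linspace(0.0, 1.0, 16)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.05)
W = sinkhorn_symmetric(K)
# Rows and columns of W now sum to 1 while symmetry is preserved.
```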
II-B Properties of Graph Filters
There are a few interesting properties of W.
W can be considered as an undirected graph: W_ij is the weight on the edge linking node i and node j.
The eigenvalues of W lie in [0, 1]. Since W is doubly stochastic, its eigenvalues have magnitude at most 1; since the kernel K is positive semi-definite, the eigenvalues are also non-negative.
L = I − W is the graph Laplacian, with the zero eigenvalue associated with the all-ones vector 1, because W1 = 1.
The regularization defined through the graph Laplacian can be interpreted as x^T L x = (1/2) Σ_{i,j} W_ij (x_i − x_j)², which measures the smoothness of the signal x with respect to the weights defined by the graph W.
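This smoothness identity can be checked numerically on a small doubly stochastic W (a toy matrix chosen for illustration, not one built from a kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])          # symmetric, doubly stochastic
L = np.eye(3) - W                        # graph Laplacian L = I - W
x = rng.standard_normal(3)

quad = x @ L @ x
pairwise = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                     for i in range(3) for j in range(3))
# quad equals pairwise: the Laplacian quadratic form is the weighted smoothness.
```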
The symmetric matrix W has better denoising performance than its non-symmetric counterpart D^{-1}K, where D = diag(K1). This is attributed to the implicit clustering of pixels performed by the Sinkhorn-Knopp balancing algorithm [19].
In order to present the theoretical results of this paper, we make an assumption about W:
We assume that the matrix W is invertible, i.e., there exists a matrix W^{-1} such that W^{-1}W = I.
Assumption 1 is strong and could be unrealistic for practical W's. However, it is needed to simplify some of the technical details in the derivation of the main results, at least those in Section 3. In Section 4, where we discuss the performance from the consensus equilibrium perspective, we lift this assumption.
Remark: Pseudo-Linearity. An important remark concerns the pseudo-linearity of W in the actual PnP ADMM implementation. That is, the matrix W is a function of the input noisy image, i.e., x̂ = W(y)y instead of (8). However, the dependency of W on y is typically weak if W is estimated from a pre-filtered version of y, for example by applying a baseline denoising algorithm to y and constructing the weight matrix from the baseline estimate. Practically, this is also justified in applications where W is updated for the first tens of iterations and is then kept fixed for the remaining iterations. Following the same line of argument, we assume that W is pre-defined and is independent of y. Of particular interest is the oracle case where W is estimated from the ground truth.
III Performance Analysis via MAP
In this section we present the first set of analytic results. We call it the maximum-a-posteriori (MAP) analysis because we need to explicitly derive the MAP optimization.
III-A Existence of g
Given a graph filter W, the first question we should ask is whether it is possible to obtain the regularization function g defined in (5). The rationale is that if we can do so, then we can characterize the MAP optimization in (1), and hence understand the performance.
Consider the minimization
the solution of which is characterized by
where ∂g is the sub-differential of g, and the inverse operator in (14) is the resolvent of ∂g. We assume that W is an invertible matrix. Then g must be continuously differentiable, so that the subgradient at every point reduces to a unique gradient. This replaces ∂g by ∇g, and substituting the graph filter W into (14) yields
Define G as the matrix representation of ∇g such that ∇g(x) = Gx. Since (15) holds for all x̃, this further implies that the mappings are identical:
Applying W^{-1} to both sides yields G = (ρ/λ)(W^{-1} − I), and hence
The identity in (17) offers a useful way of characterizing the existence of g.
The mapping g exists if and only if W^{-1} is symmetric.
The idea is to treat ∇g as a vector field, which must be conservative for g to exist. Therefore, its Jacobian must be symmetric, by a classic result in vector calculus (in particular, Theorems 4.3.8 and 4.3.10). ∎
For graph filters, the symmetry of W^{-1} is guaranteed because W is symmetric. In addition, W is non-expansive, as the eigenvalues of W are bounded above by 1. Therefore, if we use a graph filter and if g exists, the corresponding minimization is a proximal map, and g is convex, closed and proper. However, an arbitrary denoiser does not necessarily have a symmetric Jacobian, and hence its g is not guaranteed to exist.
We can now state the first key result of this paper.
Let W be a graph filter with full rank, so that the inverse W^{-1} exists. If g is defined by
then the solution of the minimization in (5) is
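As a sanity check, a quadratic g whose gradient matches (17), namely g(v) = (ρ/(2λ)) vᵀ(W⁻¹ − I)v, indeed makes the v-subproblem (5) return exactly W x̃; a small numerical sketch with an arbitrary symmetric, invertible, doubly stochastic W:

```python
import numpy as np

n, lam, rho = 5, 0.7, 1.3
W = np.full((n, n), 0.1) + 0.5 * np.eye(n)   # symmetric, invertible, doubly stochastic
Winv = np.linalg.inv(W)
x_tilde = np.linspace(-1.0, 1.0, n)

# First-order condition of  min_v  lam*g(v) + (rho/2)*||v - x_tilde||^2
# with g(v) = (rho/(2*lam)) * v^T (Winv - I) v:
#   rho*(Winv - I) v + rho*(v - x_tilde) = 0   <=>   Winv v = x_tilde
v_opt = np.linalg.solve(rho * (Winv - np.eye(n)) + rho * np.eye(n), rho * x_tilde)
# v_opt equals W @ x_tilde: the proximal map of g is the graph filter itself.
```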
The first-order optimality condition of (21) is a linear equation which does not imply that the minimizer is W x̃ unless W has full rank. More specifically, it requires the solution to live in the range space of W. Since W contains zero eigenvalues, there are multiple inputs which can be mapped to the same output through W. This is problematic, because the optimizer of (21) is no longer W x̃, and hence it violates our assumption that the denoiser is a graph filter.
III-C The Role of ρ
Knowing the explicit expression of g, we can now analyze its performance. But before proceeding to the discussion, we should clarify the role of the parameter ρ. By substituting (18) into (2), we can show that the problem which PnP ADMM actually solves is
There is an alternative optimization in the graph signal processing literature, known as graph Laplacian regularization, which we will study extensively in this paper:
In (22), the original regularization parameter λ is eliminated because g contains its reciprocal; λ is effectively replaced by the ADMM internal parameter ρ. This internal parameter has no influence on the final solution of classical ADMM, because ADMM converges for any ρ > 0 when f and g are closed, proper and convex. In PnP ADMM, however, the optimization cost itself is characterized by ρ. Thus, the solution changes as ρ changes, a phenomenon reported in earlier papers.
We can also ask: for the best choices of ρ and λ, how would the solutions of (22) and (23) behave? Figure 1 shows an experimental result for a denoising problem. We vary ρ (and, equivalently, λ) and evaluate the mean squared error (MSE) for a fixed oracle W. As shown in Figure 1, PnP ADMM and graph Laplacian regularization have different operating regimes: PnP ADMM prefers a smaller ρ, whereas graph Laplacian prefers a larger λ. We will explain this observation in Section 4.
III-D Comparison with Graph Laplacian
We can now compare the performance of PnP ADMM and the conventional graph Laplacian regularization. To do so, we consider a denoising problem with f(x) = (1/2)||x − y||², where y = x₀ + η and η ∼ N(0, σ²I). That is, y is a noisy version of the true signal x₀. This choice of f is also justified by the v-subproblem (5), which is itself a denoising problem.
We consider the following two optimizations:
Here, the first optimization (24) uses the graph Laplacian as the regularization, and the second optimization (25) uses the PnP regularization. We use a single parameter β instead of λ or ρ to keep the notation simple. The solutions of the two optimizations are x̂_GL = A y and x̂_PnP = B y, where we define the matrices A = (I + β(I − W))^{-1} and B = (I + β(W^{-1} − I))^{-1} to simplify notation.
Our goal is to analyze the mean squared errors (MSE) of these two estimates. To this end, we first eigen-decompose W as W = UΛU^T, where Λ = diag(λ₁, …, λ_n) is the eigenvalue matrix. Define x̄₀ = U^T x₀ as the projection of x₀ onto the columns of U. Then, the MSEs of x̂_GL and x̂_PnP can be found as follows. The MSE of x̂_GL is

MSE_GL = Σ_j [β²(1 − λ_j)² x̄₀,j² + σ²] / [1 + β(1 − λ_j)]².

Similarly, the MSE of x̂_PnP is

MSE_PnP = Σ_j [β²(1 − λ_j)² x̄₀,j² + σ² λ_j²] / [λ_j + β(1 − λ_j)]².

Now, we can consider the j-th terms of the sums, denoted e_GL^(j) and e_PnP^(j). It holds that for λ_j = 1, the two terms coincide: e_GL^(j) = e_PnP^(j) = σ².
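These per-eigenvalue expressions can be cross-checked against the direct matrix MSE, E||Ây − x₀||² = ||(Â − I)x₀||² + σ² tr(ÂÂᵀ); a sketch, assuming the estimators take the forms x̂_GL = (I + β(I − W))⁻¹y and x̂_PnP = (I + β(W⁻¹ − I))⁻¹y:

```python
import numpy as np

n, beta, sigma = 8, 2.0, 0.3
t = np.linspace(0.0, 1.0, n)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.02)   # positive definite kernel
s = np.ones(n)
for _ in range(2000):                                # symmetric Sinkhorn balancing
    s = np.sqrt(s / (K @ s))
W = (s[:, None] * K) * s[None, :]                    # doubly stochastic graph filter

x0 = np.sin(2.0 * np.pi * t)
lam, U = np.linalg.eigh(W)                           # eigenvalues lie in (0, 1]
xb = U.T @ x0                                        # projections of x0

I = np.eye(n)
A = np.linalg.inv(I + beta * (I - W))                 # graph Laplacian estimator
B = np.linalg.inv(I + beta * (np.linalg.inv(W) - I))  # PnP estimator

def mse_direct(M):
    # E||M y - x0||^2 = ||(M - I) x0||^2 + sigma^2 * tr(M M^T)
    return np.linalg.norm((M - I) @ x0) ** 2 + sigma ** 2 * np.trace(M @ M.T)

mse_gl = sum((beta**2 * (1 - l)**2 * c**2 + sigma**2)
             / (1 + beta * (1 - l))**2 for l, c in zip(lam, xb))
mse_pnp = sum((beta**2 * (1 - l)**2 * c**2 + sigma**2 * l**2)
              / (l + beta * (1 - l))**2 for l, c in zip(lam, xb))
# mse_gl and mse_pnp agree with mse_direct(A) and mse_direct(B) respectively.
```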
1. Energy Concentration by W. The graph filter W, when applied to a signal, projects the signal onto the space spanned by its columns, and the effectiveness of this projection is determined by the quality of W. In the oracle setting, W ensures that most of the signal energy is captured by the first few eigenvectors. This is illustrated in Figure 2, where we consider a 1D signal x₀ and its noisy version y. The weight matrix W is constructed from x₀ using the non-local means kernel defined in (9). By projecting x₀ and y onto the eigenvector matrix U, we see that U^T x₀ has significantly fewer non-zero entries than U^T y at large eigen-indices. Since x̄₀ is the vector U^T x₀, we have the following observation:
Observation 1: For large j, x̄₀,j is typically very small.
2. Behavior at High Frequencies. In other words, the PnP regularization drives the MSE to zero at high frequencies (i.e., large eigen-index j), whereas the graph Laplacian retains a residual MSE at high frequencies. This can also be seen from the eigenvalues of A and B, namely a_j = 1/(1 + β(1 − λ_j)) and b_j = λ_j/(λ_j + β(1 − λ_j)), where λ_j denotes the j-th eigenvalue of W. If we plot a_j and b_j as functions of the eigen-index j, we obtain the result shown in Figure 3, where β and σ are fixed. As λ_j → 0, we have a_j → 1/(1 + β) and b_j → 0, which explains why a_j reaches a non-zero steady state whereas b_j reaches zero.
Observation 2: As j increases so that λ_j → 0, the eigenvalue a_j approaches the non-zero constant 1/(1 + β), whereas b_j approaches 0. The same observation applies to the corresponding MSE terms.
3. Trade-offs in Bias and Variance. The MSE is the sum of the squared bias and the variance, so we can decompose the MSE and inspect the two parts separately. Since the eigenvalues λ_j decrease as j grows, we can show that b_j ≤ a_j.
For any β > 0, σ and λ_j ∈ (0, 1], it holds that b_j ≤ a_j for all j; consequently the variance σ²b_j² of the j-th PnP term is no larger than the variance σ²a_j² of the graph Laplacian term.
To prove this result we just need to show that the difference b_j − a_j = −β(1 − λ_j)² / [(1 + β(1 − λ_j))(λ_j + β(1 − λ_j))] is non-positive. The numerator is non-positive, and both factors in the denominator are positive because β > 0 and λ_j > 0. ∎
The conclusion of this proposition is that the variance of PnP is always lower than the variance of graph Laplacian. However, the bias of PnP is larger than the bias of graph Laplacian, as we show in Figure 4. More specifically, we can prove the following result:
For any β > 0, σ and λ_j ∈ (0, 1], it holds that 1 − b_j ≥ 1 − a_j for all j, and hence the squared bias of the j-th PnP term is no smaller than that of the graph Laplacian term.
Since the numerators of 1 − a_j and 1 − b_j are identical (both equal β(1 − λ_j)), we only need to compare the denominators. We have λ_j + β(1 − λ_j) ≤ 1 + β(1 − λ_j) if λ_j ≤ 1, which is true because the eigenvalues of W lie in (0, 1]. ∎
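Both propositions can be verified numerically over a grid of eigenvalues; a sketch, with the filter factors written out explicitly as a = 1/(1 + β(1 − λ)) for graph Laplacian and b = λ/(λ + β(1 − λ)) for PnP (a reconstruction consistent with the denominators compared in the proof):

```python
import numpy as np

beta = 3.0
lam = np.linspace(1e-3, 1.0, 1000)        # eigenvalues of W in (0, 1]
a = 1.0 / (1.0 + beta * (1.0 - lam))      # graph Laplacian filter factor
b = lam / (lam + beta * (1.0 - lam))      # PnP filter factor

var_ok = np.all(b <= a + 1e-12)                        # variance: b_j <= a_j
bias_ok = np.all((1 - b) ** 2 + 1e-12 >= (1 - a) ** 2) # bias: (1-b_j)^2 >= (1-a_j)^2
```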
The reason why the overall MSE is lower for PnP is attributed to the observation below.
Observation 3: The drop in variance of PnP is more significant than the gain in bias, and hence PnP’s overall MSE is lower than that of graph Laplacian.
4. When is Graph Laplacian Better? The analysis above suggests conditions under which PnP will perform worse than graph Laplacian. This happens when the projection x̄₀ has large entries at large eigen-indices, because the bias of the PnP estimate is then amplified more than that of the graph Laplacian. This also suggests that PnP is more sensitive to the correctness of W than graph Laplacian: when constructing W from a pre-filtered signal, PnP demands a higher-quality pre-filtered signal.
To confirm this claim, we conduct an experiment by constructing a pre-filtered signal x̃₀ = x₀ + ε, where x₀ is the true signal and ε ∼ N(0, γ²I). That is, the pre-filtered signal is a noisy version of the clean signal, with a perturbation specified by the noise variance γ². We then construct a graph filter from x̃₀; technically speaking, the filter is now W(x̃₀). Our goal is to see how the two MSEs change as γ² increases. (An alternative experiment would be to pick a specific baseline denoising algorithm and adjust its internal parameters. However, different baseline denoisers have different characteristics, and this could bias the analysis. We therefore design the pre-filtered signal by adding i.i.d. noise.)
The result of this experiment is shown in Figure 5, where we observe that PnP has a considerably lower MSE than graph Laplacian when γ² is small. But as γ² increases, the MSE of PnP quickly increases and eventually exceeds that of graph Laplacian. This explains the relatively weak robustness of PnP compared to graph Laplacian.
However, we should emphasize that this conclusion is only valid when W is fixed. In practical PnP ADMM, the weight matrix is updated in every ADMM iteration. If the estimation error decreases as PnP iterates, then W could become a good filter.
Observation 4: The performance of PnP could be worse than graph Laplacian when W is estimated wrongly.
5. Geometry: Truncating the Eigenvectors of W. If W is the oracle, is there any simple geometric interpretation, besides the bias-variance analysis, that can explain the performance?
By expressing the eigen-decomposition of W as a sum over the subspace U₁ of non-zero eigenvalues and the subspace U₂ of zero eigenvalues, the PnP ADMM regularization in (25) can be written as in (38). If the eigenvalues associated with U₂ tend to zero, then the second term in (38) dominates, as the reciprocal eigenvalues tend to infinity. In this case, we must have U₂^T x = 0 in order for the regularization to stay finite. Therefore, the overall optimization becomes constrained:
The constraint U₂^T x = 0 provides a simple geometric interpretation. While graph Laplacian regularization searches over the entire space, PnP ADMM searches only the subspace spanned by the eigenvectors with non-zero eigenvalues. As we saw in Observation 1, this provides very strong denoising power when W is the oracle. Referring to the decomposition in (37), the signal is spanned only by U₁, whereas the noise is spanned by both U₁ and U₂. By restricting U₂^T x = 0, we ensure that any noise living in the span of U₂ is eliminated. See Figure 6 for an illustration. As a result, the denoising is significantly more effective than graph Laplacian.
Observation 5: Geometrically, PnP ADMM restricts the search space so that the pure noise components are forced to zero.
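This geometric picture can be reproduced with a toy experiment: a clean signal confined to a subspace U₁, noise spread over the whole space, and an estimator that enforces U₂ᵀx = 0 (all names and dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 64, 8
# Oracle-style setting: the clean signal lives in span(U1); noise lives everywhere.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal basis
U1, U2 = U[:, :k], U[:, k:]
x0 = U1 @ rng.standard_normal(k)
y = x0 + 0.2 * rng.standard_normal(n)

x_full = y                        # searching the whole space keeps all the noise
x_trunc = U1 @ (U1.T @ y)         # enforce U2^T x = 0: noise in span(U2) removed
err_full = np.linalg.norm(x_full - x0)
err_trunc = np.linalg.norm(x_trunc - x0)
# err_trunc < err_full: forcing the pure-noise subspace to zero reduces the error.
```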
III-E Questions Not Answered
To summarize, the fundamental reason why PnP works better is that the graph filter W effectively removes the unwanted noise from the intermediate estimates before any additional processing. Thus, a stronger denoiser leads to a better PnP ADMM algorithm.
One major limitation of the above MAP-based analysis is that it tries to match the denoiser with a proximal map associated with a regularization function g. This is a particularly strong requirement, because many denoisers do not have a symmetric Jacobian and hence have no corresponding proximal map. It also requires W^{-1} to exist, which is difficult to guarantee in practice; e.g., a uniform averaging operator does not have an inverse.
The question we ask is: if we are willing to give up the proximal-map requirement, is it possible to obtain another interpretation that explains the performance of denoisers for which no g exists? This leads to the equilibrium-based analysis of the next section.
IV Performance Analysis via Equilibrium
The MAP optimization in (1) is not the only way to characterize the PnP ADMM solution. Buzzard et al. generalize PnP ADMM through a concept called consensus equilibrium, which allows us to characterize the solution of PnP ADMM by analyzing the equilibrium state of a dynamical system. In this section, we discuss the PnP ADMM performance from this equilibrium perspective.
The starting point of the equilibrium analysis is the PnP ADMM algorithm listed in (4)-(6). By defining a pair of operators F and G from (4) and (5) respectively, we can show that the PnP ADMM solution satisfies
Consider the two operators F and G:
The consensus equilibrium solution (x*, u*) satisfies
Since G(x* − u*) = x*, it holds that W(x* − u*) = x*, which implies that
The optimality condition of F gives
Substituting u* into the equation for G gives
Rearranging the terms yields the desired result. ∎
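Proposition 4 can be checked numerically: for f(x) = (1/2)||x − y||² and the graph-filter denoiser G(x) = Wx, the pair (x*, u*) below satisfies both equilibrium conditions F(x* + u*) = x* and G(x* − u*) = x*. This is a sketch under these assumed operator conventions; the matrix W and the value of ρ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 5, 0.8
W = np.full((n, n), 0.1) + 0.5 * np.eye(n)   # symmetric doubly stochastic filter
y = rng.standard_normal(n)

F = lambda z: (y + rho * z) / (1.0 + rho)    # prox of f(x) = 0.5*||x - y||^2 at z
G = lambda z: W @ z                          # graph-filter denoiser

# Eliminating u* from F(x*+u*) = x* and G(x*-u*) = x* gives a linear system:
#   (rho*I + (1-rho)*W) x* = W y,   u* = (x* - y)/rho
x_star = np.linalg.solve(rho * np.eye(n) + (1.0 - rho) * W, W @ y)
u_star = (x_star - y) / rho
```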
The interesting fact about Proposition 4 is that, given a pair (x*, u*), we can easily check whether it is a consensus equilibrium point without running the PnP ADMM algorithm. In other words, we can “create” alternative optimizations whose solutions are equilibrium points, and we can interpret the meaning of those optimizations to gain insight. To illustrate this point, let us continue with the example in Proposition 4.
Define the functions F₁, F₂, F₃ and F₄:
The minimizers of these four functions all satisfy the consensus equilibrium condition (42).
Proposition 5 shows four different optimization formulations that give the same consensus equilibrium solution. The first optimization, F₁, is the MAP formulation we saw in Section 3. The other optimizations, F₂, F₃ and F₄, admit the following interpretations.
Interpretation of F₂. The optimization of F₂ is informative in the sense that it completely eliminates the inverse matrix W^{-1}, as opposed to the MAP formulation in (25). The objective function says that instead of minimizing the residue between the optimization variable and the noisy observation y, we minimize it against the filtered version W y. For oracle W's, where the pure noise components are isolated at high frequencies (see Observation 5), W y ensures that the pure noise components are eliminated. Therefore, when minimizing F₂, we are guaranteed to search only for the signal component.
The regularization term of F₂ is a concave function, because I − W is positive semi-definite. This concave term injects high-frequency components that are lost by W. In one extreme of ρ, the regularization vanishes and the minimizer is just W y; in the other extreme, the injection is maximal and the minimizer approaches y. In order to control the amount of high-frequency injection, a small ρ is preferred over a large ρ. This explains why, in Figure 1, a smaller ρ works better for PnP ADMM.
Interpretation of F₃. The function F₃ offers an interpretation similar to F₂: the objective places no data-fidelity penalty on the subspace spanned by the eigenvectors with zero eigenvalues, and so the regularization forces those dimensions to zero.
F₃ is also a special case of the formulation of Kheradmand and Milanfar, who suggested a more general graph-based objective; with an appropriate choice of their parameters, their objective reduces to F₃.
Interpretation of F₄. The last example, F₄, while still giving the consensus equilibrium solution, is more or less useless in terms of interpretation, because it is just a restatement of the solution.
In summary, Proposition 5 shows that the solution of the PnP ADMM is completely characterized by the consensus equilibrium. Solving the MAP optimization is just one of the many possible ways of achieving the equilibrium. If we depart from the MAP formulation, i.e., if we are willing to change the forward modeling by integrating the denoiser into the forward model, then there is a wide range of optimization formulations that can offer meaningful interpretations while satisfying the equilibrium condition.
IV-B General Linear Inverse Problems
The equilibrium-based approach also allows us to extend the analysis to general linear inverse problems, which would be difficult to study by writing out the MSE as in Section 3. The general linear inverse problem using PnP ADMM with a graph filter requires two operators F and G:
of which the equilibrium solution satisfies
The condition in (48) can be written as
which is equivalent to
How do we interpret (49)? The inner minimization is the standard graph Laplacian minimization, with the observation replaced by its filtered version in the objective. If W has zero eigenvalues, the subspace along those directions receives no data-fidelity penalty; then, by virtue of the regularization, the minimizer must be zero along those directions, because I − W is positive semi-definite. The outer step of multiplying the solution by W ensures that residual noise from the inner minimization is eliminated before forming the final solution.
IV-C Performance of Multiple Priors
Beyond linear inverse problems, we can extend the analysis to PnP ADMM involving multiple priors. In this case, the MAP-based optimization is