# Plug-and-Play Unplugged: Optimization Free Reconstruction using Consensus Equilibrium

Regularized inversion methods for image reconstruction are used widely due to their tractability and their ability to combine complex physical sensor models with useful regularity criteria. Such methods were used in the recently developed Plug-and-Play prior method, which provides a framework to use advanced denoising algorithms as regularizers in inversion. However, the need to formulate regularized inversion as the solution to an optimization problem severely limits both the expressiveness of possible regularity conditions and the variety of provably convergent Plug-and-Play denoising operators. In this paper, we introduce the concept of consensus equilibrium (CE), which generalizes regularized inversion to include a much wider variety of regularity operators without the need for an optimization formulation. Consensus equilibrium is based on the solution of a set of equilibrium equations that balance data fit and regularity. In this framework, the problem of MAP estimation in regularized inversion is replaced by the problem of solving these equilibrium equations, which can be approached in multiple ways, including as a fixed point problem that generalizes the ADMM approach used in the Plug-and-Play method. We present the Douglas-Rachford (DR) algorithm for computing the CE solution as a fixed point and prove the convergence of this algorithm under conditions that include denoising operators that do not arise from optimization problems and that may not be nonexpansive. We give several examples to illustrate the idea of consensus equilibrium and the convergence properties of the DR algorithm and demonstrate this method on a sparse interpolation problem using electron microscopy data.

## Authors

• 7 publications
• 3 publications
• 10 publications
• ### Linearized ADMM and Fast Nonlocal Denoising for Efficient Plug-and-Play Restoration

In plug-and-play image restoration, the regularization is performed usin...
01/18/2019 ∙ by Unni V. S., et al. ∙ 0

• ### Plug-and-Play Priors for Bright Field Electron Tomography and Sparse Interpolation

Many material and biological samples in scientific imaging are character...
12/23/2015 ∙ by Suhas Sreehari, et al. ∙ 0

• ### Performance Analysis of Plug-and-Play ADMM: A Graph Signal Processing Perspective

The Plug-and-Play (PnP) ADMM algorithm is a powerful image restoration f...
08/31/2018 ∙ by Stanley H. Chan, et al. ∙ 0

• ### 4D X-Ray CT Reconstruction using Multi-Slice Fusion

There is an increasing need to reconstruct objects in four or more dimen...
06/15/2019 ∙ by Soumendu Majee, et al. ∙ 2

• ### Regularization by Denoising via Fixed-Point Projection (RED-PRO)

Inverse problems in image processing are typically cast as optimization ...
08/01/2020 ∙ by Regev Cohen, et al. ∙ 5

• ### Fixed-Point and Objective Convergence of Plug-and-Play Algorithms

A standard model for image reconstruction involves the minimization of a...
04/21/2021 ∙ by Pravin Nair, et al. ∙ 0

• ### Regularized Fourier Ptychography using an Online Plug-and-Play Algorithm

The plug-and-play priors (PnP) framework has been recently shown to achi...
10/31/2018 ∙ by Yu Sun, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Over the past 30 years, statistical inversion has evolved from an interesting theoretical idea to a proven practical approach. Most statistical inversion methods are based on the maximum a posteriori (MAP) estimate, or more generally regularized inversion, using a Bayesian framework, since this approach balances computational complexity with achievable image quality. In its simplest form, regularized inversion is based on the solution of the optimization problem

 x∗=argminx{f(x)+h(x)}, (1)

where is the data fidelity function and is the regularizing function. In the special case of MAP estimation, represents the forward model and represents the prior model, given by

 f(x)=−logpforward(y|x),     h(x)=−logpprior(x),

where is the data and is the unknown to be recovered. The solution of equation (1) balances the goals of fitting the data while also regularizing this fit according to the prior.

In more general settings, for example with multiple data terms from multi-modal data collection, a cost function can be decomposed as a sum of auxiliary (usually convex) functions:

 minimize f(x)=N∑i=1fi(x),

with variable and . In consensus optimization, the minimization of the original cost function is reformulated as the minimization of the sum of the auxiliary functions, each a function of a separate variable, with the constraint that the separate variables must share a common value:

 minimize N∑i=1fi(xi) subject to xi=x, i=1,…,N, (2)

with variables , , . This reformulation allows for the application of the Alternating Direction Method of Multipliers (ADMM) or other efficient minimization methods and applies to the original problem in (1) as well as many other problems. An account of this approach with many variations and examples can be found in [4].

While regularized inversion and optimization problems more generally benefit from extensive theoretical results and powerful algorithms, they are also expressively limited. For example, many of the best denoising algorithms cannot be put into the form of a simple optimization [6, 10]. Likewise, the behavior of denoising neural networks cannot generally be captured via optimization. These successful approaches to inverse problems lie outside the realm of optimization problems and give rise to the motivating question for this paper:

Question: How can we generalize the consensus optimization framework in (2) to encompass models and operators that are not associated with an optimization problem, and how can we find solutions efficiently?

There is a vast and quickly-growing literature on methods and results for convex and consensus optimization. Seminal work in this area includes the work of Lions and Mercier [19], as well as the PhD thesis of Eckstein [11] and the work of Eckstein and Bertsekas [12]. We do not provide a complete survey of this literature since our focus is on a framework beyond optimization, but some starting points for this area are [2, 4, 5].

As for approaches to fuse a data fidelity model with a denoiser that is not based on an optimization problem, the first attempt to our knowledge is [28]. The goal of this approach, called the Plug-and-Play prior method, is to replace the prior model in the Bayesian formulation with a denoising operator. This is done by taking the ADMM algorithm, which is often used to find solutions for consensus optimization problems, and replacing one of the optimization steps (proximal maps) of this algorithm with the output of a denoiser. Recently, a number of authors have built on the Plug-and-Play method as a way to construct implicit prior models through the use of denoising operators [24, 27, 26, 29]. In [27], conditions are given on the denoising operator that will ensure it is a proximal mapping, so that the MAP estimate exists and the ADMM algorithm converges. However, these conditions impose relatively strong symmetry conditions on the denoising operator that may not occur in practice. For applications where fixed point convergence is sufficient, it is possible to relax the conditions on the denoising operator by iteratively controlling the step size in the proximal map for the forward model and the noise level for the denoiser [7].

The paper [23] provides a different approach to building on the idea of Plug-and-Play. That paper uses the classical forward model plus prior model in the framework of optimization, but constructs a prior term directly from the denoising engine; this is called Regularization by Denoising (RED). For a denoiser , the prior term is given by . This approach is formulated as an optimization problem associated with any denoiser, but in the case that the denoiser itself is obtained from a prior, the RED prior is different from the denoiser prior. Other approaches that build on Plug-and-Play include [21], which uses primal-dual splitting in place of an ADMM approach, and [17], which uses FISTA in a Plug-and-Play framework to address a nonlinear inverse scattering problem.

In this paper, we introduce Consensus Equilibrium (CE) as an optimization-free generalization of regularized inversion and consensus optimization that can be used to fuse multiple sources of information implemented as maps such as denoisers, deblurring maps, data fidelity maps, proximal maps, etc. We show that CE generalizes consensus optimization problems in the sense that if the defining maps are all proximal maps associated with convex functions, then any CE solution is also a solution to the corresponding consensus optimization problem. However, the consensus equilibrium can still exist in the more general case when the defining maps are not proximal maps; in this case, there is no underlying optimization. In the case of a single data fidelity term and a single denoiser, the solution has the interpretation of achieving the best denoised inverse of the data. That is, the proximal map associated with the forward model pulls the current point towards a more accurate fit to data, while the denoising operator pulls the current point towards a “less noisy” image. We illustrate this in a toy example in two dimensions: the consensus equilibrium is given by a balance between two competing forces.

In addition to introducing the CE equations, we discuss ways to solve them and give several examples. We describe a version of the Douglas-Rachford (DR)/ADMM algorithm with a novel form of anisotropic preconditioning. We also apply Newton’s method, both in standard form and in a Jacobian-free form using Krylov subspaces.

In the experimental results section, we give several examples to illustrate the idea of consensus equilibrium and the convergence properties of these algorithms. We first demonstrate the proposed algorithms on some toy problems in order to illustrate properties of the method. We next use the consensus equilibrium framework to solve an image denosing problem using an array of convolutional neural network (CNN) denoisers, none of which is tuned to match the noise level in a noisy image. Our results demonstrate that that the consensus equilibrium result is better than any of the individually applied CNN denoisers.

## 2 Consensus Equilibrium: Optimization and Beyond

In this section we formulate the consensus equilibrium equations, show that they encompass a form of consensus optimization in the case of proximal maps, and describe the ways that CE goes beyond the optimization framework.

### 2.1 Consensus Equilibrium for Proximal Maps

We begin with a slight generalization of (2):

 minimize N∑i=1μifi(xi) subject to xi=x, i=1,…,N, (3)

with variables , , , and weights , , that sum to 1 (an arbitrary normalization, but one that supports the idea of weighted average that we use later). From the point of view of optimization, each weight could be absorbed into . However, in Consensus Equilibrium we move beyond this optimization framework to the case in which the may be defined only implicitly or the case in which there is no optimization, but only mappings that play a role similar to the proximal maps that arise in the ADMM approach to solving (3). The formulation in (3) serves as motivation and the foundation on which we build.

To extend the optimization framework of (3) to consensus equilibrium, we start with vector-valued maps, , . The Consensus Equilibrium for these maps is defined as any solution that solves the equations

 Fi(x∗+u∗i) =x∗, i=1,…,N, (4) ¯u∗μ =0. (5)

Here is a vector in obtained by stacking the vectors , and is the weighted average .

In order to relate consensus equilibrium to consensus optimization, first consider the special case in which each in (3) is a proper, closed, convex function and each is a corresponding proximal map, i.e., a map of the form

 Fi(x)=argminv{∥v−x∥22σ2+fi(v)}. (6)

Methods such as ADMM, Douglas-Rachford, and other variants of the proximal point algorithm apply these maps in sequence or in parallel with well-chosen arguments, together with some map to promote for all , in order to solve (3); see e.g., [3, 4, 9, 11, 12, 19, 25]. In the setting of Bayesian regularized inversion, each represents a data fidelity term or regularizing function. We allow for the possibility that enforces some hard constraints by taking on the value .

Our first theorem states that when the maps are all proximal maps as described above, then the solutions to the consensus equilibrium problem are exactly the solutions to the consensus optimization problem of equation (3). In this sense, Consensus Equilibrium encompasses the optimization framework of (3).

###### Theorem 1

For , let be a proper, lower-semicontinuous, convex function on , and let with . Define , and assume is finite on some open set in . Let be the proximal maps as in (6). Then the set of solutions to the consensus equilibrium equations of (4) and (5) is exactly the set of solutions to the minimization problem (3).

The proof is contained in the appendix.

### 2.2 Consensus Equilibrium Beyond Optimization

Theorem 1 tells us that consensus equilibrium extends consensus optimization, but as noted above, the novelty of consensus equilibrium is not as a recharacterization of (3) in the case of proximal maps but rather as a framework that applies even when some of the are not proximal mappings and there is no underlying optimization problem to be solved. The Plug-and-Play reconstruction method of [28], which yields high quality solutions for important applications in tomography [27] and denoising [24], is to our knowledge, the first method to use denoisers that do not arise from an optimization for regularized inversion. As we show below, the CE framework also encompasses the Plug-and-Play framework in that if Plug-and-Play converges, then the result is also a CE solution. However, Plug-and-Play grew out of ADMM, and the operators that yield convergence in ADMM are more limited than we would like. Hence, for both consensus optimization and Plug-and-Play priors, CE encompasses the original method but also allows for a wider variety of operators and solution algorithms.

An important point about moving beyond the optimization framework is that a given set of maps may lead to multiple possible CE solutions. This may also happen in the optimization framework when the are not strictly convex since there may be multiple local minima. In the optimization case, the objective function can sometimes be used to select among local minima. The analogous approach for consensus equilibrium is to choose a solution that minimizes the size of , e.g. the or norm of . This corresponds in some sense to minimizing the tension among the competing forces balanced to find equilibrium.

## 3 Solving the Equilibrium Equations

In this section, we rewrite the CE equations as an unconstrained system of equations and then use this to express the solution in terms of a fixed point problem. We also discuss particular methods of solution, including novel preconditioning methods and methods to solve for a wide range of possible . We first introduce some additional notation. For , with and each , define by

 F(v)=⎛⎜ ⎜⎝F1(v1)⋮FN(vN)⎞⎟ ⎟⎠  and  Gμ(v)=⎛⎜ ⎜⎝¯vμ⋮¯vμ⎞⎟ ⎟⎠, (7)

where has the important interpretation of redistributing the weighted average of the vector components given by across each of the output components.

Also, for , let denote the vector obtained by stacking copies of . With this notation, the CE equations are given by

 F(^x∗+u∗) =^x∗, (8) ¯u∗μ =0.

This notation allows us to reformulate the CE equations as the solution to a system of equations.

###### Theorem 2

The point is a solution of the CE equations (4) and (5) if and only if the point satisfies and

 F(v∗)=Gμ(v∗). (9)

###### Proof

Let be a solution to the CE equations, and let . Linearity of together with give , so in particular, . Using this in (8) gives (9).

Conversely, if satisfies (9), define and . Then (4) and (5) are satisfied by definition of and (9).

We use this to reformulate consensus equilibrium as a fixed point problem.

###### Corollary 1

(Consensus equilibrium as fixed point.) The point is a solution of the CE equations (4) and (5) if and only if the point satisfies and

 (2Gμ−I)(2F−I)v∗=v∗. (10)

When is a proximal map for a function , then is known as the reflected resolvent of . Discussion and results concerning this operator can be found in [2, 12, 15] among many other places. This fixed point formulation is closely related to the fixed point formulation for minimizing the sum of two functions using Douglas-Rachford splitting; this is seen clearly in section 4 of [13] among other places. The form given here is somewhat different in that the reflected resolvents are computed in parallel and then averaged, as opposed to the standard sequential form. Beyond that, the novelty here is in the equivalence of this formulation with the CE formulation.

###### Proof (Proof of Corollary 1)

By Theorem 2, is a solution of (4) and (5) if and only if satisfies and (9). From (9) we have . A calculation shows that , so by linearity of . Hence applying to both sides gives (10). Reversing these steps returns from (10) to (9).

### 3.1 Anisotropic Preconditioned Mann Iteration for Nonexpansive Maps

Define . When is nonexpansive and has a fixed point, we can use Mann iteration to find a fixed point of as required by (10). For a fixed parameter , this takes the form

 wk+1=(1−ρ)wk+ρT(wk), (11)

with iterates guaranteed to converge to a fixed point of . In the context of minimization problems in which and are both proximal maps, and depending on the choice of , iterations of this form are essentially variants of the proximal point algorithm and give rise to the (generalized) Douglas-Rachford algorithm, the Peaceman-Rachford algorithm, and the ADMM algorithm, including over-relaxed and under-relaxed variants of ADMM. In the case of and , the form in (11) is equivalent up to a change of variables to the standard ADMM algorithm; other values of give over-relaxed and under-relaxed variants. Early work in this direction appears in [11] and [12]. A concise discussion is found in [15], which together with [14] provides a preconditioned version of this algorithm in the case of . This preconditioning is obtained by replacing the original minimization of by minimization of , which gives rise to iterations involving the conjugate proximal maps , where is the proximal map for as in (6) using the norm in place of the usual Euclidean metric. [15] includes some results about the rate of convergence as a function of . In some cases, a larger value of leads to faster convergence relative to . There are also results on convergence in the case that fixed is replaced by a sequence of such that [2]. Further discussion and early work on this approach is found in [11, 12]. With some abuse of nomenclature, we use ADMM below to refer to Mann iteration with .

Here we describe an alternative preconditioning approach for the Mann iteration in which we use an invertible linear map in place of the scalar in (11). In this approach, can be any nonexpansive map and can be any symmetric matrix with and both positive definite.

###### Theorem 3

Let be a positive definite, symmetric matrix and let be nonexpansive on

with at least one fixed point. Suppose that the largest eigenvalue of

is strictly less than . For any in , define

 vk+1=(I−H)vk+HT(vk) (12)

for each . Then the sequence converges to a fixed point of .

The idea of the proof is similar to the proof of convergence for Mann iteration given in [25], but using a norm that weights differently the orthogonal components arising from the spectral decomposition of . The proof is contained in the appendix.

We note that in the case that each is a proper, closed, convex function on and is the proximal map as in (6), then the map is nonexpansive, so this preconditioning method can be used to find a solution to the problem in (3). The asymptotic rate of convergence with this method is not significantly different from that obtained with the isotropic scaling obtained with a scalar . However, we have found this approach to be useful for accelerating convergence in certain tomography problems in which various frequency components converge at different rates, leading sometimes to visible oscillations in the reconstructions as a function of iteration number. An appropriate choice of the preconditioner can dampen these oscillations and provide faster convergence in the initial few iterations. We will explore this example and related algorithmic considerations further in a future paper.

### 3.2 Beyond nonexpansive maps

The iterative algorithms obtained from (11) and (12) give guaranteed global convergence when is nonexpansive and (or ) satisfy appropriate conditions. However, the iterates of (11) may still be convergent for more general maps . We illustrate this behavior in Case 1 of Section 4.2.

In fact, when is differentiable at a fixed point, the rate of convergence is closely related to the spectrum of the linearization of near this fixed point. The parameter in (11) maintains a fixed point at but changes the linear part of the iterated map to have eigenvalues , where are the eigenvalues of the linear part of . The iterates of (11) converge locally exactly when all of these are strictly inside the unit disk in the complex plane. This can be achieved for sufficiently small precisely when the real part of each is less than 1. Since there is no constraint on the complex part of the eigenvalues, the map may be quite expansive in some directions. In this case, the optimal rate of convergence is obtained when is chosen so that the eigenvalues all lie within a minimum radius disk about the origin.

The use of to affect convergence rate and/or to promote convergence is closely related to the ideas of overrelaxation and underrelaxation as applied in a variety of contexts. See e.g. [16] for further discussion in the context of linear systems. In the current setting, the use of is a form of underrelaxation that is related to methods for iteratively solving ill-posed linear systems. In the following theorem, the main idea is to make use of underrelaxation in order to shrink the eigenvalues of the resulting operator to the unit disk and thus guarantee convergence.

###### Theorem 4

(Local convergence of Mann iterates) Let and be maps such that has a fixed point . Suppose that is differentiable at and that the Jacobian of at has eigenvalues with the real part of strictly less than 1 for all . Then there is and an open set containing such that for any initial point in the iterates defined by (11) converge to .

The proof of this theorem is given in Appendix A.

### 3.3 Newton’s Method

By formulating the CE as a solution to , we can apply a variety of root-finding methods to find solutions. Likewise, rewriting (10) as gives the same set of options.

Let be a smooth map from to . The basic form of Newton’s method for solving is to start with a vector and look for a vector to solve . A first-order approximation gives , where is the Jacobian of at . If this Jacobian is invertible, this equation can be solved for to give and the method iterated. There are a wide variety of results concerning the convergence of this method with and without preconditioning, with various inexact steps, etc. An overview and further references are available in [20].

For large scale problems, calculating the Jacobian can be prohibitively expensive. The Jacobian-Free Newton-Krylov (JFNK) method is one approach to avoid the need for a full calculation of the Jacobian. Let . The key idea in Newton-Krylov methods is that instead of trying to solve exactly, we instead minimize over the vectors in a Krylov subspace, . This subspace is defined by first calculating the residual and then taking

 Kj=span{r,Jr,…,Jj−1r}.

The basis in this form is typically highly ill-conditioned, so the Generalized Minimal RESidual method (GMRES) is often used to produce an orthonormal basis and solve the minimization problem over this subspace. This form requires only multiplication of a vector by the Jacobian, which can be approximated as

 Jr≈H(x0+ϵr)−H(x0)ϵ.

Applying this to produce requires applications of the map together with the creation of the Arnoldi basis elements, which can then be used to find the minimizing by reducing to a standard least squares problem of dimension . Various stopping conditions can be used to determine an appropriate . These calculations take the place of the solution of . In cases for which there are many contracting directions and only a few expanding directions for Newton’s method near the solution point, the JFNK method can be quite efficient. A more complete description with a discussion of the benefits and pitfalls, approaches to preconditioning, and many further references is contained in [18].

We note in connection with the previous section that if is chosen to be in (12), then the choice of in Theorem 3 is an exact Newton step applied to . That is, the formula for the step in Newton’s method in this case is

or

 vk+1=(I−H)vk+HTvk,

which is the same as the formula in Theorem 3.

In the examples below, we use standard Newton’s method applied to both and in the first example and JFNK applied to in the second. Because of the connection with Mann iteration just given, we use the term Newton Mann to describe Newton’s method applied to .

### 3.4 Other approaches

An alternative approach is to convert the CE equations back into an optimization framework by considering the residual error norm given by

 R(v)Δ=∥∥F(v)−Gμ(v)∥∥ (13)

and minimizing over . Assuming that a solution of the CE equations exists, then that solution is also a minimum of this objective function. In the case that is twice continuously differentiable, a calculation using the facts that and that is linear shows that the Hessian of is , where . Hence is locally convex near as long as has no eigenvalue equal to 0. Since is a projection, its only eigenvalues are 0 and 1, hence this is equivalent to saying that does not have 1 as an eigenvalue. If does have an eigenvalue 0, then a perturbation of the form produces a unique solution, which can be followed in a homotopy scheme as decreases to 0.

One possible algorithm for this approach is the Gauss-Newton method, which can be used to minimize a sum of squared function values and which does not require second derivatives.

We note that the residual error of equation (13) is also useful as a general measure of convergence when computing the CE solution; we use this in plots below.

Other candidate solution algorithms include the forward-backward algorithm and related algorithms as presented in [9]. We leave further investigation of efficient algorithms for solving the CE equations to future research.

## 4 Experimental Results

Here we provide some computational examples of varying complexity. For each of these examples, at least one of the component maps is not a proximal mapping, so the traditional optimization formulation of equations (1) or (3) is not applicable.

We start with a toy model in 2 dimensions to illustrate the ideas, then follow with some more complex examples.

### 4.1 Toy model

In this example we have , , both in , and maps and defined by

 F1(v1) =(I+σ2ATA)−1(v1+σ2ATy) F2(v2) =1.1(v21+0.2,v22−0.2sin(2v22))T.

In this case, is a proximal map as in (6) corresponding to and is a weakly expanding map designed to illustrate the properties of consensus equilibrium. We use and

 A=[0.30.60.40.5],  y=[11].

We take and so write for . We apply Newton iterations to and to the fixed point formulation . In both cases, the Jacobian of is evaluated only at the initial point.

Figure 1 shows the vectors obtained from each of the maps and . Blue line segments are vectors from a point to , and red line segments are vectors from a point to . The starting points of each pair of red and blue vectors are chosen so that they have a common ending point, signified by black dot. Open squares show the trajectories of in blue and in red. The trajectories converge to points for which the corresponding red and blue vectors have a common end point and are equal in magnitude and opposite in direction; this is the consensus equilibrium solution. The plots shown are for Newton’s method applied to ; the plots for Newton’s method applied to are similar (not shown). In the right panel of this figure, we use the true fixed point to plot error versus iterate for this example using all 3 methods. The expansion in prevents ADMM from converging in this example.

### 4.2 Stochastic matrix

The next example uses the proximal map form for as in the previous example, although now with dimension . and

were chosen using the random number generator rand in Matlab, approximating the uniform distribution on

in each component. The map has the form ; here is constructed by first choosing entries at random in the interval as for , then replacing the diagonal entry by the maximum entry in that row (in which case the maximum entry may appear twice in one row), and then normalizing so that each row sums to 1. This mimics some of the features in a weight matrix appearing in denoisers such as non-local means [6] but is designed to allow us to compute an exact analytic solution of the CE equations. In particular, since is not symmetric, cannot be a proximal map, as shown in [27].

In order to illustrate possible convergence behaviors, we first fix the matrices and and the vector as above and then use a one-parameter family of maps . When , this map averages and . The map is a proximal map as in (6) with and ; i.e., the proximal map associated with a quadratic regularization term. In the framework of Corollary 1, the map satisfies . Hence the scaling of controls the expansiveness of one of the component maps in , and hence the expansiveness of the operator in (10) through the averaging operator . For the examples here, we choose to be and . As described below, with appropriate choices of parameters, the Jacobian-free Newton Krylov method converges for both examples, while ADMM converges for the first one only.

Recall that if the Lipschitz constant, , is strictly less than , then the operator is a contraction, and if we say it is nonexpansive. Moreover, for linear operators, where

is the maximum singular value of

; and where is the eigenvalue with greatest magnitude.

• : In this case, has Lipschitz constant , and the conditions of Theorem 3 or similar theorems on the convergence of Mann iteration for the convergence of nonexpansive maps do not hold. However, in this case, is affine (linear map plus constant) and all eigenvalues of the linear part of lie strictly inside the unit circle. From (11) with and basic linear algebra, this means that Mann iteration converges. This is confirmed in Figure 2. In this example, convergence for Mann iteration can be improved by taking to be , in which case the convergence is marginally better than that for JFNK. For this example, we used a Krylov subspace of dimension 10, so that each Newton step requires 10 function evaluations. This is indicated by closed circles in the plots.

• : In this case, has Lipschitz constant , and there is an eigenvalue with real part approximately , so averaging with the identity as in Mann iteration will maintain an eigenvalue larger than one. In particular, Mann iteration with (labelled as ADMM) does not converge, but the JFNK algorithm does. For this example, we used a Krylov subspace of dimension 75, so that each Newton step requires 75 function evaluations. This is indicated by closed circles in the plots.

### 4.3 Image Denoising with Multiple Neural Networks

The third example we give is an image denoising problem using multiple deep neural networks. This problem is more complex in that we use several neural networks, none of which is tuned to match the noise in the image to be denoised. Nevertheless, we show that CE is often able to outperform each individual network. The images and code for this section are available at [1].

The forward model of image denoising is described by the following linear equation:

 y=x+η,

where is latent unknown image, is i.i.d. Gaussian noise, and is the corrupted observation. Our motivation is to find an estimate by solving the consensus equilibrium equation analogous to the classical maximum-a-posteriori approach:

 x∗=argminx∈Rn12σ2η∥y−x∥2−logp(x), (14)

where is the prior of . However, instead of a prior function, which would induce a proximal map, we will use a set of convolutional neural networks, which will play the role of regularization in the way that a prior term does, but which are almost certainly not themselves proximal maps for any function.

To define the CE operators , we consider a set of image denoisers. Specifically, we use the denoising convolutional neural network (DnCNN) proposed by Zhang et al. [31]. In the code provided by the authors 111Code available at https://github.com/cszn/ircnn, there are five DnCNNs trained at five different noise levels: , , , and . In other words, the user has to choose the appropriate DnCNN to match the actual noise level . In the CE framework, we see that is the operator

 Fi(vi)=DnCNN(vi,with denoising strengthσi). (15)

The CE operator is the proximal map of the likelihood function:

 FK+1(vK+1)=argminx∈Rn12σ2η∥y−x∥2+12σ2∥vK+1−x∥2, (16)

where is an internal parameter controlling the strength of the regularization . In this example, we set for simplicity.

To make the algorithm more adaptive to the data, we use weights , where

 pi=exp{−(ση−σi)22h2},andpK+1=K∑i=1pi. (17)

In this pair of equations, measures the deviation between the actual noise level and the denoising strength of the neural networks . The parameter controls the cut off. Therefore, among the five neural networks, weights more heavily the relevant networks. The weight is the weight of the map to fit to data. Its value is chosen to be the sum of the weights of the denoisers to provide appropriate balance between the likelihood and the denoisers.

Figures 3 and 4 show some results using noise levels of and , respectively. Notice that none of these noise levels is covered by the trained DnCNNs. Table 1 shows resulting SNR values for the full set of experiments using 8 test images and 3 noise levels of . The results in the center of the table indicate the result of applying an individual CNN to the noisy image. Because of the form of in (16) and , the result of this single application of the CNN is the same as the CE solution obtained by using only that single CNN together with .

Notice that in almost all cases the consensus equilibrium of the full group has the highest PSNR when compared to the individual application of the DnCNNs. Also, the improvement in terms of the PSNR is quite substantial for noise levels and . For , CE still offers PSNR improvement except for House256, which is an image with many smooth regions. In addition, visual inspection of the images shows that the CE result yields the best visual detail while also removing the noise most effectively. While DnCNN denoisers can be very effective, they must be trained in advance using the correct noise level. This demonstrates that the consensus equilibrium can be used to generate a better result by blending together multiple pre-trained DnCNNs.

In order to illustrate that the CE solution outperforms a well-chosen linear combination of the outputs from each denoiser, we report a baseline combination result in Table 1. The baseline results are generated by

 ˆxbaseline=n∑i=1μiˆxi,

where are the initial estimates provided by the denoisers, and is defined through 17 without . That is, we use the same weights as those for CE, excluding the weight for the likelihood term and rescaled to sum to 1 after this exclusion. Therefore, can be considered as a linear combination of the initial estimates, with weights defined by the distance between the current noise level and the trained noise levels. The results in Table 1 show that while very occasionally outperforms the best of the individual denoisers, it is usually worse than the best individual denoiser and is uniformly worse than CE. In the last column of Table 1 we show the result of DnCNN trained at a noise level matched with the actual noise level. It is interesting to note that CE compares favorably with the matched DnCNN in many cases, except for large sigma where the matched DnCNN is uniformly better.

We note that [30]

uses a linear transformation depending on the noise level of a noisy image in order to match the noise level of a trained neural network, and then applies the inverse linear transformation to the output. This provides another approach to the example above but doesn’t include the ability of CE to combine multiple sources of influence without a predetermined conversion from one to the other. We should also point out the recent work of Choi et al.

[8] which demonstrates an optimal mechanism of combining image denoisers.

## 5 Conclusion

We presented a new framework for image reconstruction, which we term Consensus Equilibrium (CE). The distinguishing feature of the CE solution is that it is defined by a balance among operators rather than the minimum of a cost function. The CE solution is given by the consensus vector that arises simultaneously from the balance of multiple operators, which may include various kinds of image processing operations. In the case of conventional regularized inversion, for which the optimization framework holds, the CE solution agrees with the usual MAP estimate, but CE also applies to a wide array of problems for which there is no corresponding optimization formulation.

We discussed several algorithms for solving the CE equations, including a novel anisotropic preconditioned Mann iteration and a Jacobian-free Newton Krylov method. We also introduced a novel precondition method for accelerating the Mann iterations used to solve the CE equations. There is a great deal of room to explore other methods for finding CE solutions as well as for formulating other equilibrium conditions.

Our experimental results, on a variety of problems with varying complexity, demonstrate that the Consensus Equilibrium approach can solve problems for which there is no corresponding regularized optimization and can in some cases achieve consensus results that are better than any of the individual operators. In particular, we showed how the Consensus Equilibrium can be used to integrate a number of CNN denoisers, thereby achieving a better result than any individual denoiser.

## Acknowledgments

We thank the referees for many helpful remarks that have improved this paper substantially.

## Appendix A Appendix: Proofs

###### Proof (Proof of Theorem 1)

In order to use as in (6), we multiply the objective function in (3) by , which does not change the solution. Define the Lagrangian associated with this scaled problem as

 L(x,(xi)Ni=1,(λi)Ni=1)=N∑i=1(σ2μifi(xi)+(x−xi)Tλi),

where the are the Lagrange multipliers for the equality constraints . Since the are all convex and lower-semicontinuous, the first order KKT conditions are necessary and sufficient for optimality [22, Theorem 28.3]. At a solution point , these conditions are given by

 ∇xL(x∗,(x∗i)Ni=1,(λ∗i)Ni=1) =0 ∂xiL(x∗,(x∗i)Ni=1,(λ∗i)Ni=1) ∋0,∀i=1,…,N x∗i−x∗ =0,∀i=1,…,N,

where is the subdifferential with respect to . These convert to

 N∑i=1λ∗i =0 (18) σ2μi∂fi(x∗i) ∋λ∗i, ∀i=1,…,N (19) x∗i =x∗, ∀i=1,…,N. (20)

Define , in which case (18) is the same as (5). Next, use from (20) in (19) and cancel to get for all . Adding to both sides gives or

 (I+σ2∂fi)(x∗)∋x∗+u∗i.

Since the are convex and , we can invert to get . From [2, Proposition 16.34], this is equivalent to (4) in the case that is the proximal map of (6).

###### Proof (Proof of Theorem 3)

Since

is symmetric and positive definite, there is an orthogonal matrix

and a diagonal matrix with for all and . Let be the th column of , and let be orthogonal projection onto the span of . Define the associated norm . Also, let be the product , and let (i.e., the product of all through except ). Define the weighted norm

 ∥v∥2H−1=vTH−1v=∑jλ−1j∥v∥2j,

which is equivalent to the standard norm on .

By assumption, there is a fixed point . Using and applying to both sides of the definition of gives

 ∥vk+1−v∗∥2j=∥(1 −λj)πjvk+λjπjTvk −(1−λj)πjv∗−λjπjTv∗∥2.

Here and below, we use freely as needed. As in [25], we use the equality , which holds for between 0 and 1 and can be verified by expanding both sides as a function of . In our case, we have from the assumptions on . After conversion back to the norm , this yields

 ∥vk+1−v∗∥2j=(1 −λj)∥vk−v∗∥2j+λj∥Tvk−Tv∗∥2j −λj(1−λj)∥vk−Tvk∥2j.

Summing with weights gives

 ∑jλ−1j∥vk+1−v∗∥2j=∑j(λ−1j −1)∥vk−v∗∥2j+∑j∥Tvk−Tv∗∥2j −∑jλ−1jλj(1−λj)∥vk−Tvk∥2j.

Since is nonexpansive, the right hand side is bounded above by replacing with in the second sum. This new sum then exactly cancels the term arising from in the first sum. Let be the minimum over of , and note that since for each by assumption. Putting these together and re-expressing in the norm gives

 ∥vk+1−v∗∥2H−1≤∥vk−v∗∥2H−1−c∥vk−Tvk∥2H−1.

The remainder of the proof is nearly identical to that in [25]; we include it for completeness. Iterating in the first term on the right hand side, we obtain

 ∥vk+1−v∗∥2H−1≤∥v1−v∗∥2H−1−ck∑i=1∥vj−Tvj∥2H−1, (21)

and hence

 k∑i=1∥vj−Tvj∥2H−1≤1c∥v1−v∗∥2H−1.

In particular, and hence