# Scene-Adapted Plug-and-Play Algorithm with Guaranteed Convergence: Applications to Data Fusion in Imaging

The recently proposed plug-and-play (PnP) framework allows leveraging recent developments in image denoising to tackle other, more involved, imaging inverse problems. In a PnP method, a black-box denoiser is plugged into an iterative algorithm, taking the place of a formal denoising step that corresponds to the proximity operator of some convex regularizer. While this approach offers flexibility and excellent performance, convergence of the resulting algorithm may be hard to analyze, as most state-of-the-art denoisers lack an explicit underlying objective function. In this paper, we propose a PnP approach where a scene-adapted prior (i.e., where the denoiser is targeted to the specific scene being imaged) is plugged into ADMM (alternating direction method of multipliers), and prove convergence of the resulting algorithm. Finally, we apply the proposed framework in two different imaging inverse problems: hyperspectral sharpening/fusion and image deblurring from blurred/noisy image pairs.

## I Introduction

Image denoising is one of the most studied problems at the core of image processing. It has been questioned whether it is a solved problem, given the excellent performance of several state-of-the-art methods, which seem to be approaching some sort of theoretical limit [1]. The answer to this question is negative: denoising is still a very active area of research, not only in itself, but also as a building block for many other tasks. In fact, most algorithms for imaging inverse problems (e.g., deblurring, reconstruction, or super-resolution) include a step that is formally equivalent to a denoising operation.

A possible approach to leveraging state-of-the-art denoising methods in other inverse problems is to generalize/extend their underlying rationale to these other tasks [2, 3, 4]. The main drawbacks of this approach are that, most often, it is strongly problem-dependent (i.e., different problems require different generalizations) and it may lead to computationally hard problems. The recently proposed plug-and-play (PnP) approach aims at sidestepping these drawbacks while still capitalizing on state-of-the-art denoisers. PnP methods work by plugging a black-box denoiser into an iterative algorithm that addresses the inverse problem at hand. These algorithms typically include the proximity operator [5] of a convex regularizer, which is formally equivalent to a denoising operation (under Gaussian white noise), with the convexity of the regularizer playing a central role in the convergence properties of the algorithm. The PnP approach replaces the proximity operator with some black-box denoiser [6, 7], the key advantage being that the same denoiser can be used for different problems, since its use is fully decoupled from the particularities of the inverse problem being addressed.

Replacing the proximity operator with some denoiser makes the convergence of the resulting algorithm difficult to analyze. While the original algorithm optimizes an objective function, the PnP version lacks an explicit objective function. In the original PnP scheme [6, 7], the iterative algorithm in which the denoiser is plugged in is the alternating direction method of multipliers (ADMM) [8, 9]. PnP-ADMM allows using an arbitrary denoiser to regularize an imaging inverse problem, such as deblurring or super-resolution, tackled via ADMM, departing from its standard use, where the denoiser is the proximity operator of a convex regularizer [8]. (Another recent approach, termed regularization by denoising, follows a different route by building an explicit regularizer based on a denoiser [10].) The fact that a denoiser, which may lack an explicit objective function or a closed form, is plugged into ADMM begs three obvious questions [11, 6, 7]:

• is the resulting algorithm guaranteed to converge?

• if so, does it converge to a (global or local) optimum of some objective function?

• if so, can we identify this function?

In this paper, we adopt a patch-based denoiser using Gaussian mixture models (GMM) [12, 13, 14], which allows us to provide positive answers to these three questions.

As proposed in our earlier work [15], a GMM can be adapted/specialized to specific classes of images; the rationale is that a denoiser based on a class-adapted prior is able to capture the characteristics of that class better than a general-purpose one. Here, we take this adaptation one step further, by using scene-adapted priors. As the name suggests, the model is no longer learned from a set of images from the same class, but from one or more images of the same scene being restored/reconstructed. Hyperspectral sharpening [16, 17, 18, 19] and image deblurring from noisy/blurred image pairs [20, 21] are the two data fusion problems that may leverage such priors and that we consider in this paper.

In summary, our contributions are the following.

• We propose plugging a GMM-based denoiser into an ADMM algorithm; this GMM-based denoiser is a modified version of a patch-based MMSE (minimum mean squared error) denoiser [12]; the modification consists in fixing the posterior weights of the MMSE denoiser.

• We prove that the resulting PnP-ADMM algorithm is guaranteed to converge to a global minimum of a cost function, which we explicitly identify.

• We apply the proposed framework to the two fusion problems mentioned above, showing that it yields results competitive with the state-of-the-art.

An earlier version of this work was published in [22]. With respect to that publication, this one includes more detailed and general results and proofs, as well as the application to noisy/blurred image pairs, which was not previously considered.

The paper is organized as follows. Section II briefly reviews ADMM, which is the basis of the proposed algorithm, and its PnP version. Section III studies conditions for the denoiser to be a proximity operator, focusing on linear denoisers and explicitly identifying the underlying objective function. Section IV proposes a GMM-based denoiser and shows that it satisfies the conditions for being a proximity operator. Sections V-VII present the two instantiations of the proposed method and the corresponding experimental results. Finally, Section VIII concludes the paper.

## II Alternating Direction Method of Multipliers

### II-A From Basic ADMM to SALSA

Although dating back to the 1970's [23], ADMM has seen a surge of interest in the last decade, as a flexible and efficient optimization tool, widely used in imaging problems, machine learning, and other areas [24, 8]. One of the canonical problem forms addressed by ADMM is

$$\min_x f(x) + g(Hx), \qquad (1)$$

where f and g are closed (equivalently, lower semi-continuous – l.s.c. [5]), proper (a convex function is called proper if its domain is not empty), convex functions, and H is a full column rank matrix of appropriate dimensions [24]. Each iteration of ADMM for (1) is as follows (with the superscript (k) denoting the iteration counter):

$$x^{(k+1)} = \arg\min_x f(x) + \frac{\rho}{2}\bigl\|Hx - v^{(k)} - u^{(k)}\bigr\|_2^2 \qquad (2)$$
$$v^{(k+1)} = \arg\min_v g(v) + \frac{\rho}{2}\bigl\|Hx^{(k+1)} - v - u^{(k)}\bigr\|_2^2 \qquad (3)$$
$$u^{(k+1)} = u^{(k)} - Hx^{(k+1)} + v^{(k+1)}, \qquad (4)$$

where u^(k) are the (scaled) Lagrange multipliers at iteration k, and ρ > 0 is the penalty parameter.
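As a concrete illustration of iterations (2)-(4), the sketch below (our own toy example, not from the paper) runs ADMM on min_x ½‖x − a‖² + γ‖x‖₁, with H = I; this problem has the closed-form solution soft(a, γ), which gives a direct correctness check.

```python
import numpy as np

def soft(z, t):
    # soft-thresholding: the proximity operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def admm_l1(a, gamma, rho=1.0, iters=500):
    """ADMM iterations (2)-(4) for min_x 0.5||x - a||^2 + gamma*||x||_1, with H = I."""
    x = np.zeros_like(a)
    v = np.zeros_like(a)
    u = np.zeros_like(a)
    for _ in range(iters):
        # (2): x-update; f(x) = 0.5||x - a||^2 is quadratic, so closed form
        x = (a + rho * (v + u)) / (1.0 + rho)
        # (3): v-update is the prox of g/rho, i.e., soft-thresholding
        v = soft(x - u, gamma / rho)
        # (4): scaled dual (Lagrange multiplier) update
        u = u - x + v
    return x

a = np.array([3.0, -0.5, 1.2, -2.0])
x_admm = admm_l1(a, gamma=1.0)   # approaches soft(a, 1.0)
```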

ADMM can be directly applied to problems involving the sum of J closed, proper, convex functions, composed with linear operators,

$$\min_x \sum_{j=1}^{J} g_j(H_j x) \qquad (5)$$

by casting it into the form (1) as follows:

$$f(x) = 0, \quad H = \begin{bmatrix} H_1 \\ \vdots \\ H_J \end{bmatrix}, \quad v = \begin{bmatrix} v_1 \\ \vdots \\ v_J \end{bmatrix}, \quad g(v) = \sum_{j=1}^{J} g_j(v_j).$$

The resulting instance of ADMM (called SALSA–split augmented Lagrangian shrinkage algorithm [9]) is

$$x^{(k+1)} = \arg\min_x \sum_{j=1}^{J} \bigl\|H_j x - v_j^{(k)} - u_j^{(k)}\bigr\|_2^2 \qquad (6)$$
$$v_1^{(k+1)} = \arg\min_{v_1} g_1(v_1) + \frac{\rho}{2}\bigl\|H_1 x^{(k+1)} - v_1 - u_1^{(k)}\bigr\|_2^2 \qquad (7)$$
$$\vdots$$
$$v_J^{(k+1)} = \arg\min_{v_J} g_J(v_J) + \frac{\rho}{2}\bigl\|H_J x^{(k+1)} - v_J - u_J^{(k)}\bigr\|_2^2 \qquad (8)$$
$$u^{(k+1)} = u^{(k)} - Hx^{(k+1)} + v^{(k+1)}, \qquad (9)$$

where (as for v) u = [u_1^T, …, u_J^T]^T. Computing x^(k+1) requires solving a linear system (equivalently, inverting a matrix):

$$x^{(k+1)} = \Bigl(\sum_{j=1}^{J} H_j^T H_j\Bigr)^{-1}\Bigl(\sum_{j=1}^{J} H_j^T \bigl(v_j^{(k)} + u_j^{(k)}\bigr)\Bigr), \qquad (10)$$

whereas computing v_j^(k+1) (for j = 1, …, J) requires applying the proximity operator of the corresponding g_j:

$$v_j^{(k+1)} = \mathrm{prox}_{g_j/\rho}\bigl(H_j x^{(k+1)} - u_j^{(k)}\bigr). \qquad (11)$$

Recall that the proximity operator of a convex l.s.c. function φ, computed at some point z, is given by

$$\mathrm{prox}_{\phi}(z) = \arg\min_x \frac{1}{2}\|x - z\|_2^2 + \phi(x), \qquad (12)$$

and is guaranteed to be well defined due to the coercivity and strict convexity of the quadratic term [5].
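Definition (12) can be checked numerically in one dimension: brute-force minimization of the prox objective of φ(x) = |x| over a fine grid should recover the well-known soft-thresholding operator (a sketch under that assumption):

```python
import numpy as np

def prox_numeric(phi, z, grid):
    # brute-force evaluation of (12) on a 1-D grid
    vals = 0.5 * (grid - z) ** 2 + phi(grid)
    return grid[np.argmin(vals)]

grid = np.linspace(-4.0, 4.0, 160001)
z = -2.3
p_num = prox_numeric(np.abs, z, grid)             # numerical prox of |.|
p_closed = np.sign(z) * max(abs(z) - 1.0, 0.0)    # soft-thresholding, the known answer
```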

### II-B Convergence of the ADMM Algorithm

We present a simplified version of a classical theorem on the convergence of ADMM, proved in the seminal paper of Eckstein and Bertsekas [24]. The full version of the theorem allows for inexact solution of the optimization problems in the iterations of ADMM, while this version, which we will use below, assumes exact solutions. For SALSA, being a particular instance of ADMM, the application of the theorem is trivial.

###### Theorem 1 (Eckstein and Bertsekas [24]).

Consider a problem of the form (1), where H has full column rank, and f and g are closed, proper, convex functions; let ρ > 0, v^(0), and u^(0) be given. If the sequences {x^(k)}, {v^(k)}, and {u^(k)} are generated according to (2), (3), (4), then {x^(k)} converges to a solution of (1), if one exists. Furthermore, if a solution does not exist, then at least one of the sequences {v^(k)} or {u^(k)} diverges.

The PnP framework [6] emerged from noticing that sub-problem (3) in ADMM (equivalently, (7)-(8) in SALSA), which corresponds to the application of a proximity operator, can be interpreted as a denoising step. In fact, the definition of the proximity operator in (12) is formally equivalent to the maximum a posteriori (MAP) estimate of some unknown x from noisy observations z (under additive white Gaussian noise with unit variance), under a prior proportional to exp(−φ(x)). Armed with this observation, the goal of the PnP approach is to capitalize on recent developments in image denoising, by replacing a proximity operator with a state-of-the-art denoiser. However, current state-of-the-art denoising methods do not correspond (at least, explicitly) to proximity operators of convex functions. Most of them exploit image self-similarity by decomposing the image into small patches, i.e., they rely on a divide-and-conquer principle: instead of dealing with the noisy image as a whole, small (overlapping) patches are extracted, denoised independently (using, e.g., collaborative filtering, Gaussian mixtures, dictionaries, …), and then put back in their locations [25, 26, 27, 12, 13, 28]. These denoisers often lack a closed-form expression, making the convergence analysis of the resulting algorithm much more difficult. In the following section, we focus on the key properties that the denoising function needs to satisfy in order to invoke Theorem 1, temporarily leaving aside the operator H, as it is problem-dependent.
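The PnP replacement can be sketched on a 1-D circular deconvolution toy problem (our own illustration, not the paper's algorithm): the v-update becomes a call to an arbitrary denoiser argument. With the identity map as a stand-in "denoiser", the iterations reduce to plain least squares, which gives a simple correctness check.

```python
import numpy as np

def pnp_admm_deconv(y, h, denoiser, rho=0.1, iters=200):
    """PnP-ADMM sketch for 1-D circular deconvolution: the v-update (3) is
    replaced by an arbitrary plugged-in denoiser (a stand-in here, not the
    paper's GMM denoiser)."""
    n = y.size
    Hf = np.fft.fft(h, n)                 # transfer function of the cyclic blur
    Yf = np.fft.fft(y)
    x, v, u = y.copy(), y.copy(), np.zeros(n)
    for _ in range(iters):
        # x-update: solve (B^T B + rho I) x = B^T y + rho (v + u), diagonal in Fourier
        rhs = np.conj(Hf) * Yf + rho * np.fft.fft(v + u)
        x = np.real(np.fft.ifft(rhs / (np.abs(Hf) ** 2 + rho)))
        v = denoiser(x - u)               # plugged-in denoiser replaces the prox
        u = u - x + v
    return x

rng = np.random.default_rng(0)
n = 64
h = np.array([0.5, 0.3, 0.2])             # illustrative invertible cyclic kernel
x_true = rng.standard_normal(n)
y = np.real(np.fft.ifft(np.fft.fft(h, n) * np.fft.fft(x_true)))   # noiseless blur
# with the identity "denoiser", the iterations solve plain least squares
x_hat = pnp_admm_deconv(y, h, denoiser=lambda z: z)
```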

## III When is a Denoiser a Proximity Operator?

### III-A Necessary and Sufficient Conditions

If the adopted denoiser in the PnP-ADMM scheme can be shown to correspond to the proximity operator of some l.s.c. convex function (although maybe not explicitly derived as such), then nothing really changes. The following classical result by Moreau [29] (which was used by Sreehari et al. [7] in this context) gives necessary and sufficient conditions for some mapping p to be a proximity operator.

###### Lemma 1 (Moreau [29]).

A mapping p : ℝⁿ → ℝⁿ is the proximity operator of some function if and only if it is non-expansive and it is a sub-gradient of some convex function φ, i.e., if and only if, for any x, y,

$$\|p(x) - p(y)\|_2 \leq \|x - y\|_2 \quad \text{and} \quad p(x) \in \partial\varphi(x), \qquad (13)$$

where ∂φ(x) denotes the sub-differential of φ at x [5].

In practice, given some black-box denoiser that performs a complex sequence of operations (e.g., the state-of-the-art BM3D [26]) it may be difficult, if not impossible, to verify that it satisfies the conditions of Lemma 1. In the next section, we study a class of denoisers that verify the conditions of Lemma 1, and go one step further by identifying the function of which the denoiser is the proximity operator. This second step is important to study the existence and uniqueness of solutions of the underlying optimization problem.

### III-B Denoisers as Linear Filters

Arguably, the simplest denoising methods estimate each pixel as a linear combination of the noisy pixels, with weights that depend themselves on these noisy pixels. Mathematically, given a noisy image y = x + n, where x is the underlying clean image and n is noise, these methods produce an estimate, x̂, according to

$$\hat{x} = p(y) = W(y)\, y, \qquad (14)$$

where W(y) is a square matrix, which is a function of y. Many methods, including the classical non-local means (NLM [27]), can be written in this form. Sreehari et al. [7] have studied this class of denoisers, with W(y) being a symmetric and doubly-stochastic matrix. This is the case of symmetrized NLM [6, 30], patch-based sparse representations on a dictionary with a fixed support [18], and patch-based approximate MAP estimation under a GMM prior [31].

In this paper, we study a different class of denoisers, where the matrix is fixed, W(y) = W, but it only needs to be symmetric, positive semi-definite (p.s.d.), and have spectral radius no larger than 1 (conditions that are satisfied by any symmetric doubly-stochastic matrix). As shown below, the patch-based GMM denoiser proposed in this paper satisfies these conditions.

Before proceeding, we introduce some notation. Let S(W) denote the column span of W. Let ι_S denote the indicator function of the convex set S, i.e., ι_S(x) = 0, if x ∈ S, and ι_S(x) = +∞, if x ∉ S. The following theorem is proved in Appendix A.

###### Theorem 2.

Let p : ℝⁿ → ℝⁿ be defined as p(y) = Wy, where W is a symmetric, p.s.d. matrix, with maximum eigenvalue satisfying λ_max(W) ≤ 1. Let W = QΛQ^T be the eigendecomposition of W, where Λ contains the eigenvalues, sorted in decreasing order; the first r are non-zero and the last n − r are zero. Let Λ̄ be the r × r diagonal matrix with the non-zero eigenvalues and Q̄ the matrix with the first r columns of Q (an orthonormal basis for S(W)). Then,

• p is the proximity operator of a convex function φ;

• φ is the following closed, proper, convex function:

$$\phi(x) = \iota_{S(W)}(x) + \frac{1}{2}\, x^T \bar{Q}\bigl(\bar{\Lambda}^{-1} - I\bigr)\bar{Q}^T x. \qquad (15)$$

We present (and prove) claims (i) and (ii) separately to highlight that (as done by Sreehari et al. [7]), by using Lemma 1, it is possible to show that p is a proximity operator without the need to explicitly obtain the underlying function.
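Theorem 2 can be checked numerically: build a random W with the stated properties, minimize (12) with φ from (15) directly (restricting the variable to the span S(W) to handle the indicator term), and compare the minimizer with Wy. A small sketch with assumed toy dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthonormal eigenvectors
lam = np.array([1.0, 0.7, 0.4, 0.2, 0.0, 0.0])    # eigenvalues in [0, 1], rank r = 4
W = (Q * lam) @ Q.T                               # symmetric, p.s.d., lam_max <= 1

Qb, Lb = Q[:, :r], lam[:r]
z = rng.standard_normal(n)

# minimize (12) with phi from (15); writing x = Qb @ a enforces the indicator
a = np.zeros(r)
for _ in range(4000):
    grad = a / Lb - Qb.T @ z   # gradient of 0.5||Qb a - z||^2 + 0.5 a^T (Lb^-1 - I) a
    a -= 0.15 * grad           # step below 2*min(Lb) keeps the descent stable
prox_z = Qb @ a                # should coincide with W @ z, per Theorem 2
```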

## IV GMM-Based Denoising

In this paper, we adopt a GMM patch-based denoiser to plug into ADMM. The reasons for this choice are threefold: (i) it has been shown that GMMs are good priors for clean image patches [12, 13, 14]; (ii) a GMM can be learned either from an external dataset of clean images [13, 14], or directly from noisy patches [12]; (iii) the ease of learning a GMM from an external dataset opens the door to class-specific priors, which naturally lead to superior results, since they capture the characteristics of the image class at hand better than a general-purpose image model [15, 32].

Leaving aside, for now, the question of how to learn the parameters of the model, we consider a GMM prior (where N(x; μ, C) denotes a Gaussian probability density function of mean μ and covariance C, computed at x),

$$p(x_i) = \sum_{j=1}^{K} \alpha_j\, \mathcal{N}(x_i; \mu_j, C_j), \qquad (16)$$

for any patch x_i of the image to be denoised. A common practice in patch-based image denoising is to treat the average of each patch separately: the average of the noisy patch is subtracted from it, the resulting zero-mean patch is denoised, and the average is added back to the estimated patch. The implication of this procedure is that, without loss of generality, we can assume that the components of the GMM have zero mean (μ_j = 0, for j = 1, …, K).

Given this GMM prior and a noisy version y_i = x_i + n_i of each patch, where n_i ∼ N(0, σ²I), the MMSE estimate of x_i is given by

$$\hat{x}_i = \sum_{j=1}^{K} \beta_j(y_i)\, v_j(y_i), \qquad (17)$$

where

$$v_j(y_i) = C_j\bigl(C_j + \sigma^2 I\bigr)^{-1} y_i \qquad (18)$$

and

$$\beta_j(y_i) = \frac{\alpha_j\, \mathcal{N}(y_i; 0, C_j + \sigma^2 I)}{\sum_{k=1}^{K} \alpha_k\, \mathcal{N}(y_i; 0, C_k + \sigma^2 I)}. \qquad (19)$$

Notice that β_j(y_i) is the posterior probability that the i-th patch was generated by the j-th component of the GMM, and v_j(y_i) is the MMSE estimate of the i-th patch if it was known that it had been generated by the j-th component.
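Equations (17)-(19) can be implemented in a few lines. The sketch below (ours; it assumes the zero-mean patches are stacked as rows of Y) computes the posterior weights in the log-domain for numerical stability:

```python
import numpy as np

def gmm_mmse_denoise(Y, alphas, Cs, sigma2):
    """MMSE patch estimates (17)-(19); Y holds one zero-mean patch per row (N x d)."""
    N, d = Y.shape
    K = len(alphas)
    logps = np.empty((N, K))
    V = np.empty((K, N, d))
    for j in range(K):
        S = Cs[j] + sigma2 * np.eye(d)       # covariance of y_i under component j
        _, logdet = np.linalg.slogdet(S)
        Si_Y = np.linalg.solve(S, Y.T)       # S^{-1} y_i for every patch
        # unnormalized log-posterior; constants cancel in the normalization (19)
        logps[:, j] = np.log(alphas[j]) - 0.5 * (logdet + np.sum(Y.T * Si_Y, axis=0))
        V[j] = (Cs[j] @ Si_Y).T              # v_j(y_i) as in (18)
    logps -= logps.max(axis=1, keepdims=True)    # log-sum-exp stabilization
    beta = np.exp(logps)
    beta /= beta.sum(axis=1, keepdims=True)      # posterior weights (19)
    return np.einsum('nk,knd->nd', beta, V)      # mixture combination (17)
```

With a single component, the estimate reduces to the Wiener-type shrinkage C(C + σ²I)⁻¹y.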

As is standard in patch-based denoising, after computing the MMSE estimates of the patches, they are returned to their locations and combined by straight averaging. This corresponds to solving the following optimization problem

$$\hat{x} \in \arg\min_{x \in \mathbb{R}^n} \sum_{i=1}^{N} \|\hat{x}_i - P_i x\|_2^2, \qquad (20)$$

where P_i is a binary matrix that extracts the i-th patch from the image (thus P_i^T puts the patch back into its place), N is the number of patches, n_p is the number of pixels in each patch, and n is the total number of image pixels. The solution to (20) is

$$\hat{x} = \Bigl(\sum_{i=1}^{N} P_i^T P_i\Bigr)^{-1}\Bigl(\sum_{i=1}^{N} P_i^T \hat{x}_i\Bigr) = \frac{1}{n_p} \sum_{i=1}^{N} P_i^T \hat{x}_i, \qquad (21)$$

assuming the patches are extracted with unit stride and periodic boundary conditions; thus every pixel belongs to n_p patches and Σ_i P_i^T P_i = n_p I.
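The extraction and averaging in (20)-(21) can be sketched as follows (our own illustration), with unit stride and periodic boundaries; putting back unmodified patches must return the original image, since every pixel is covered by exactly n_p patches:

```python
import numpy as np

def extract_patches(img, p):
    """All p x p patches with unit stride and periodic boundaries (N = n pixels)."""
    H, W = img.shape
    patches = np.empty((H * W, p * p))
    k = 0
    for i in range(H):
        for j in range(W):
            # patch whose top-left corner is pixel (i, j), wrapping around
            block = np.roll(np.roll(img, -i, axis=0), -j, axis=1)[:p, :p]
            patches[k] = block.ravel()
            k += 1
    return patches

def put_back(patches, shape, p):
    """(21): average the patch estimates back; every pixel lies in p*p patches."""
    H, W = shape
    acc = np.zeros(shape)
    k = 0
    for i in range(H):
        for j in range(W):
            block = patches[k].reshape(p, p)
            k += 1
            for a in range(p):
                for b in range(p):
                    acc[(i + a) % H, (j + b) % W] += block[a, b]
    return acc / (p * p)
```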

Class-adapted priors have been used successfully in image deblurring, if the image to be deblurred is known to belong to a certain class, e.g., a face or a fingerprint, and a prior adapted to that class is used [15]. Here, we take this adaptation to an extreme, which is possible if we have two types of data/images of the same scene, as is the case with some fusion problems. Instead of using a dataset of other images from the same class (e.g., faces), we leverage one type of data to learn a model that is adapted to the specific scene being imaged, and use it as a prior. The underlying assumption is that the two types of data have similar spatial statistical properties, which is reasonable, as the scene being imaged is the same. In Sections V and VI, we apply this idea to tackle two data fusion problems, namely hyperspectral (HS) sharpening [16], and image deblurring from a noisy/blurred image pair [21].

In HS sharpening, the goal is to fuse HS data of low spatial resolution, but high spectral resolution, with multispectral (MS) or panchromatic (PAN) data of high spatial resolution, but low spectral resolution, to obtain an HS image with high spatial and spectral resolutions. This can be seen as a super-resolution problem, where the HS image is super-resolved with the help of a spatial prior obtained from the MS image of the same scene. More specifically, we resort to a GMM prior, as described in the previous section, learned from (patches of the bands of) the observed MS or PAN image using the classical expectation-maximization (EM) algorithm, and then leverage this prior to sharpen the low resolution HS bands.

In the second problem, the goal is to restore a blurred, but relatively noiseless, image, with the help of another image of the same scene, which is sharp but noisy. The noisy sharp image is used to learn a spatial prior, adapted to the particular scene, which is then used as a prior to estimate the underlying sharp and clean image. Again, this is carried out by learning a GMM from the patches of the noisy, but sharp, image.

In light of the scene-adaptation scheme just described, we further claim that it not only makes sense to use a GMM prior learned from the observed sharp images, but also to keep the weights of each patch in the target image, required to compute (17), equal to the values for the corresponding observed patch, obtained during training. In other words, we use the weights as if we were simply denoising the sharp image, with a GMM trained from its noisy patches. In fact, another instance of this idea has been previously used, based on sparse representations on learned dictionaries [18]. We stress that this approach is only reasonable in the case of image denoising or image reconstruction using scene-adapted priors, where the training and target images are of the same scene, and thus have similar spatial structures. As will be shown below, this modification plays an important role in proving convergence of the proposed PnP-ADMM algorithm.

### IV-B Analysis of the Scene-Adapted GMM-Based Denoiser

Consider the role of the denoiser: for a noisy input argument y, it produces an estimate x̂ = p(y), given the parameters of the GMM and the noise variance σ². As the GMM denoiser is patch-based, the algorithm starts by extracting the N (overlapping, with unit stride) patches, followed by computing their estimates using (17). Recall that, with the scene-adaptation scheme, the weights β_ij no longer depend on the noisy patches, and thus we may write compactly

$$\hat{x}_i = \sum_{j=1}^{K} \beta_{ij}\, C_j\bigl(C_j + \sigma^2 I\bigr)^{-1} y_i = F_i\, y_i = F_i P_i\, y, \qquad (22)$$

where P_i is the same matrix that appears in (20), thus y_i = P_i y. Furthermore, using (21),

$$\hat{x} = \underbrace{\frac{1}{n_p} \sum_{i=1}^{N} P_i^T F_i P_i}_{W}\; y = W y, \qquad (23)$$

which shows that the GMM-based MMSE denoiser, with fixed weights, is a linear function of the noisy image. Matrix W, of course, depends on the parameters of the method (σ² and the parameters of the GMM), but it is fixed throughout the iterations of ADMM. Recall that the weights β_ij and covariances C_j are obtained during training, and σ² is assumed known, thus W is fixed. The following lemma summarizes the key properties of W, when using the scene-adapted GMM denoiser. The proof can be found in Appendix B.

###### Lemma 2.

Matrix W in (23) is symmetric and positive semi-definite, with maximum eigenvalue no larger than 1.

Lemma 2 and Theorem 2 yield the following corollary.

###### Corollary 1.

The GMM patch-based denoiser defined in (23) corresponds to the proximity operator of the closed, proper, convex function φ defined in (15).

Notice that Lemma 2 and Corollary 1 hinge on the fact that the denoiser uses fixed weights β_ij, rather than being a pure MMSE denoiser. In fact, the simple univariate example in Fig. 1(a) shows that an MMSE denoiser may not even be non-expansive. This figure plots the MMSE estimate in (17), as a function of the noisy observation y, under a GMM prior with two zero-mean components (one with small and one with large variance). Clearly, the presence of regions of the function with derivative larger than 1 implies that it is not non-expansive.
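This expansiveness can be reproduced numerically in the univariate case: with one near-degenerate and one broad component (the variances below are illustrative choices of ours), the finite-difference slope of the exact MMSE map (17) exceeds 1 in the transition region, while each component's linear shrinkage factor (the slope of any fixed-weight combination) never does.

```python
import numpy as np

def mmse_1d(y, alphas, cs, sigma2):
    """Scalar version of (17)-(19) for a zero-mean GMM prior."""
    s = np.asarray(cs) + sigma2                 # variance of y under each component
    logp = np.log(alphas) - 0.5 * (np.log(s) + (y[:, None] ** 2) / s)
    logp -= logp.max(axis=1, keepdims=True)
    beta = np.exp(logp)
    beta /= beta.sum(axis=1, keepdims=True)     # posterior weights beta_j(y)
    w = np.asarray(cs) / s                      # per-component linear shrinkage
    return (beta * w).sum(axis=1) * y

alphas, cs, sigma2 = np.array([0.5, 0.5]), [0.001, 25.0], 0.1
y = np.linspace(-3.0, 3.0, 6001)
xhat = mmse_1d(y, alphas, cs, sigma2)
slope = np.diff(xhat) / np.diff(y)   # exceeds 1 where the posterior switches components
```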

### IV-C Convergence Proof

Corollary 1 allows proving convergence of the proposed algorithm. This result is formalized in Corollary 2.

###### Corollary 2.

Consider the SALSA algorithm (in (6)-(9)) applied to a problem of the form (5), with H_j = I, for at least one j ∈ {1, …, J}. Consider also that g_1, …, g_{J−1} are proper closed convex functions, and g_J = φ, where φ is as defined in (15). Furthermore, assume that the intersection of the domains of g_1, …, g_J is not empty. Then, the algorithm converges.

###### Proof.

The proof amounts to verifying the conditions of Theorem 1. The first condition for convergence is that H has full column rank. Since

$$H = [H_1^T, \ldots, H_J^T]^T,$$

the fact that H_j = I, for some j, implies that H has full column rank, regardless of the other blocks. The second condition regards functions g_1 to g_J: while g_1 to g_{J−1} are assumed to be closed, proper, and convex, Corollary 1 guarantees that g_J = φ is also closed, proper, and convex. Finally, the function being minimized is the sum of closed, proper, convex functions, such that the intersection of the corresponding domains is not empty; thus, the sum is itself closed, proper, and convex, implying that a minimizer is guaranteed to exist. ∎

## V Application to Hyperspectral Sharpening

### V-A Formulation

Hyperspectral (HS) sharpening is a data fusion problem where the goal is to combine HS data of very high spectral resolution, but low spatial resolution, with multispectral (MS) or panchromatic (PAN) data of high spatial resolution, but low spectral resolution [16]. This data fusion problem can be formulated as an inverse problem, where the goal is to infer an image with both high spatial and spectral resolutions, from the observed data [16, 33]. Using compact matrix notation, the forward model can be written as

$$Y_h = Z B M + N_h, \qquad (24)$$
$$Y_m = R Z + N_m, \qquad (25)$$

where:

• Z ∈ ℝ^{L_h×n} is the image to be inferred, with L_h bands and n pixels (columns index pixels, rows index the bands);

• Y_h ∈ ℝ^{L_h×n_h} denotes the observed HS data, with L_h bands and n_h pixels (naturally, n_h < n; often, n_h ≪ n);

• Y_m ∈ ℝ^{L_m×n} is the observed MS/PAN data, with L_m bands and n pixels (naturally, L_m < L_h; often, L_m ≪ L_h);

• B ∈ ℝ^{n×n} represents a spatial low-pass filter;

• M ∈ ℝ^{n×n_h} is the spatial sub-sampling operator;

• R ∈ ℝ^{L_m×L_h} models the spectral response of the MS or PAN sensor;

• N_h and N_m represent noise, assumed white, Gaussian, with known variances σ_h² and σ_m², respectively.

Eqs. (24)-(25) model the HS data as a spatially blurred and downsampled version of Z, and the MS data as a spectrally degraded version of Z, via the spectral response matrix R. In this paper, we assume that B and R are known, and that the spatial filter B is cyclic, so that we may handle it efficiently using the fast Fourier transform (FFT); this assumption could be easily relaxed using the method in [34]. Moreover, other types of noise could also be considered within this formulation [18].

Since the number of unknowns in Z (L_h n) is typically much larger than the number of measurements (L_h n_h + L_m n), the problem of estimating Z is ill-posed. Current state-of-the-art methods tackle it using a variational formulation (equivalently, maximum a posteriori – MAP) [16], i.e., seeking an estimate according to

$$\hat{Z} \in \arg\min_Z \frac{1}{2}\|Z B M - Y_h\|_F^2 + \frac{\lambda}{2}\|R Z - Y_m\|_F^2 + \tau\, \xi(Z), \qquad (26)$$

where ξ is the regularizer (or negative log-prior), while parameters λ and τ control the relative weight of each term.

One way to alleviate the difficulty associated with the very high dimensionality of Z is to exploit the fact that the spectra in an HS image typically live in a subspace of ℝ^{L_h} of dimension L_s (often with L_s ≪ L_h), due to the high correlation among the different bands [35, 16]. This subspace can be interpreted as a set of materials existing in the scene, which is often much smaller than the number of bands. There are several methods to identify a suitable subspace, from the classical singular value decomposition (SVD) or principal component analysis (PCA), to methods specifically designed for HS data [36, 37]. Let E ∈ ℝ^{L_h×L_s} contain a basis for the identified subspace; then we may write Z = EX, where each column of X ∈ ℝ^{L_s×n}, denoted the latent image, is a coefficient representation of the corresponding column of Z. Instead of estimating Z directly, this approach reformulates the problem as that of obtaining an estimate of X, from which Z can be recovered. Replacing Z by its representation EX in (26) leads to a new estimation criterion

$$\hat{X} \in \arg\min_X \frac{1}{2}\|E X B M - Y_h\|_F^2 + \frac{\lambda}{2}\|R E X - Y_m\|_F^2 + \tau\, \phi(X), \qquad (27)$$

where ‖·‖_F² denotes the squared Frobenius norm and φ is the regularizer, now acting on the coefficients X.
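As a toy illustration of the subspace step (using a plain SVD in place of the specialized methods of [36, 37], and noiseless synthetic spectra that lie exactly in an L_s-dimensional subspace):

```python
import numpy as np

rng = np.random.default_rng(0)
Lh, Ls, n = 50, 5, 400
# synthetic spectra lying exactly in an Ls-dimensional subspace (illustrative sizes)
Z = rng.standard_normal((Lh, Ls)) @ rng.standard_normal((Ls, n))

U, s, _ = np.linalg.svd(Z, full_matrices=False)
E = U[:, :Ls]      # orthonormal basis for the signal subspace
X = E.T @ Z        # latent image: subspace coefficients, so that Z = E X
```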

### V-B SALSA for Hyperspectral Sharpening

We follow Simões et al. [17] and use SALSA to tackle problem (27). Notice that, while in (27) the optimization variable is a matrix X, in (5) the optimization variable is a vector x. This is merely a difference in notation, as we can represent matrix X by its vectorized version x = vec(X), i.e., by stacking the columns of X into vector x. In fact, using well-known equalities relating vectorization and Kronecker products, we can write

$$\|E X B M - Y_h\|_F^2 = \bigl\|(M^T \otimes E)(B^T \otimes I)\, x - \mathrm{vec}(Y_h)\bigr\|_2^2 \qquad (28)$$
$$\|R E X - Y_m\|_F^2 = \bigl\|(I \otimes (R E))\, x - \mathrm{vec}(Y_m)\bigr\|_2^2, \qquad (29)$$

and map (27) into (5) by letting J = 3 and

$$g_1(v_1) = \bigl\|(M^T \otimes E)\, v_1 - \mathrm{vec}(Y_h)\bigr\|_2^2 = \|E V_1 M - Y_h\|_F^2 \qquad (30)$$
$$g_2(v_2) = \lambda\, \bigl\|(I \otimes (R E))\, v_2 - \mathrm{vec}(Y_m)\bigr\|_2^2 = \lambda\, \|R E V_2 - Y_m\|_F^2 \qquad (31)$$
$$g_3(v_3) = 2\tau\, \phi(V_3), \qquad (32)$$

where v_j = vec(V_j), for j = 1, 2, 3, and

$$H_1 = (B^T \otimes I), \quad H_2 = I, \quad H_3 = I. \qquad (33)$$

Although the previous paragraph showed that we can arbitrarily switch between matrix and vector representations, in what follows it is more convenient to keep the matrix versions. Each iteration of the resulting instance of ADMM has the form

$$X^{(k+1)} = \arg\min_X \|X B - V_1^{(k)} - D_1^{(k)}\|_F^2 + \|X - V_2^{(k)} - D_2^{(k)}\|_F^2 + \|X - V_3^{(k)} - D_3^{(k)}\|_F^2$$
$$V_1^{(k+1)} = \arg\min_{V_1} \|E V_1 M - Y_h\|_F^2 + \rho\, \|X^{(k+1)} B - V_1 - D_1^{(k)}\|_F^2$$
$$V_2^{(k+1)} = \arg\min_{V_2} \lambda\, \|R E V_2 - Y_m\|_F^2 + \rho\, \|X^{(k+1)} - V_2 - D_2^{(k)}\|_F^2$$
$$V_3^{(k+1)} = \arg\min_{V_3} \phi(V_3) + \frac{\rho}{2\tau}\, \|X^{(k+1)} - V_3 - D_3^{(k)}\|_F^2 \qquad (34)$$
$$D_1^{(k+1)} = D_1^{(k)} - \bigl(X^{(k+1)} B - V_1^{(k+1)}\bigr) \qquad (35)$$
$$D_2^{(k+1)} = D_2^{(k)} - \bigl(X^{(k+1)} - V_2^{(k+1)}\bigr) \qquad (36)$$
$$D_3^{(k+1)} = D_3^{(k)} - \bigl(X^{(k+1)} - V_3^{(k+1)}\bigr), \qquad (37)$$

where ρ is the so-called penalty parameter of ADMM [8]. The first three problems are quadratic, thus with closed-form solutions involving matrix inversions [17]:

$$X^{(k+1)} = \bigl[(V_1^{(k)} + D_1^{(k)}) B^T + V_2^{(k)} + D_2^{(k)} + V_3^{(k)} + D_3^{(k)}\bigr]\bigl[B B^T + 2 I\bigr]^{-1} \qquad (38)$$
$$V_1^{(k+1)} = \bigl[E^T E + \rho I\bigr]^{-1}\bigl[E^T Y_h + \rho\,(X^{(k+1)} B - D_1^{(k)})\bigr] \odot M + \bigl(X^{(k+1)} B - D_1^{(k)}\bigr) \odot (1 - M) \qquad (39)$$
$$V_2^{(k+1)} = \bigl[\lambda E^T R^T R E + \rho I\bigr]^{-1}\bigl[\lambda E^T R^T Y_m + \rho\,(X^{(k+1)} - D_2^{(k)})\bigr], \qquad (40)$$

where ⊙ denotes the entry-wise (Hadamard) product (in (39), with some abuse of notation, M acts as a binary mask indicating the positions observed in Y_h). The matrix inversion in (38) can be computed with O(n log n) cost via the FFT, assuming periodic boundary conditions or unknown boundaries [34]. The matrix inversions in (39) and (40) involve matrices of size L_s × L_s; since L_s is typically small, the cost of these inversions is marginal. Moreover, with fixed ρ, these two inverses can be pre-computed [17].
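As a sanity check of the closed-form update (40), one can verify that its output zeroes the gradient of the corresponding quadratic objective in (34); the dimensions below are arbitrary toy values of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
Ls, Lh, Lm, n = 5, 30, 4, 100   # arbitrary toy dimensions
E = rng.standard_normal((Lh, Ls))
R = rng.standard_normal((Lm, Lh))
Ym = rng.standard_normal((Lm, n))
Xk = rng.standard_normal((Ls, n))
D2 = rng.standard_normal((Ls, n))
lam, rho = 0.5, 1.0

A = R @ E    # Lm x Ls, small, so the inversion below is cheap
# (40): minimizer of lam*||R E V2 - Ym||_F^2 + rho*||Xk - V2 - D2||_F^2
V2 = np.linalg.solve(lam * A.T @ A + rho * np.eye(Ls),
                     lam * A.T @ Ym + rho * (Xk - D2))
# first-order optimality: the gradient must vanish at V2
grad = 2 * lam * A.T @ (A @ V2 - Ym) + 2 * rho * (V2 - Xk + D2)
```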

### V-C GMM-Based PnP-SALSA Algorithm

Finally, the PnP algorithm is obtained by letting the regularizer φ be as defined in (15); i.e., (34) takes the form

$$V_3^{(k+1)} = W\,\bigl(X^{(k+1)} - D_3^{(k)}\bigr), \qquad (41)$$

where W is as given in (22)-(23), with the parameters of the denoiser (namely the noise variance σ²) determined by τ and ρ. Of course, (41) is just the formal representation of the denoiser acting on X^{(k+1)} − D_3^{(k)}; matrix W is never explicitly formed, and the multiplication by W corresponds to the patch-based denoiser described in Section IV. The resulting PnP-SALSA algorithm is shown in Algorithm 1, and its convergence is stated by the following corollary.

###### Corollary 3.

Algorithm 1 converges.

###### Proof.

Convergence of Algorithm 1 is a direct consequence of Corollary 2, with J = 3, g_1 and g_2 as given by (30)-(31), and g_3 = 2τφ, with φ as given by (15). The condition that at least one of the matrices H_j is an identity is satisfied in (33). Functions g_1 and g_2 (see (30) and (31)) are quadratic, thus closed, proper, and convex, and their domain is the whole Euclidean space where they are defined; consequently, the intersection of the domains of g_1, g_2, and g_3 is not empty, implying that (27) has at least one solution. ∎

### V-D Complete Hyperspectral Sharpening Algorithm

Finally, to obtain a complete HS sharpening algorithm, we still need to specify how matrices E and W are obtained. Matrix E is obtained by applying PCA to all the spectra in the HS data Y_h.

As mentioned above, matrix W is not explicitly formed; multiplication by W corresponds to the patch-based denoiser described in Subsection IV-B, based on the covariance matrices C_1, …, C_K and weights β_ij. These covariance matrices are learned from all the patches of all the bands of Y_m, using the EM algorithm. With y^i_{m,p} denoting the (vectorized) i-th patch of the p-th band of Y_m, each E-step of EM computes the posterior probability that this patch was produced by the j-th GMM component:

$$\beta^i_{p,j} = \beta_j\bigl(y^i_{m,p}\bigr) = \frac{\alpha_j\, \mathcal{N}\bigl(y^i_{m,p}; 0, C_j + \sigma_m^2 I\bigr)}{\sum_{k=1}^{K} \alpha_k\, \mathcal{N}\bigl(y^i_{m,p}; 0, C_k + \sigma_m^2 I\bigr)}, \qquad (42)$$

where α_1, …, α_K are the current weight estimates. The M-step updates the weight and covariance estimates. The weights are updated using the standard expression in EM for GMM:

$$\alpha_j \leftarrow \frac{\sum_i \sum_p \beta^i_{p,j}}{\sum_{j'} \sum_i \sum_p \beta^i_{p,j'}}, \quad \text{for } j = 1, \ldots, K.$$

The covariances update has to be modified to take into account that the patches have noise (see [12]),

 Cj←eigt(∑i∑pβip,j(yim,p)(yim,p)T∑i∑pβip,j−σ2mI),for j=1,...,K,

where eigt(·) is the eigenvalue thresholding operation, which guarantees that C_j is a p.s.d. matrix after the subtraction of σ_m² I; for a real symmetric matrix argument M, eigt(M) = U max(Λ, 0) Uᵀ, where the columns of U are the eigenvectors of M, Λ is the diagonal matrix holding its eigenvalues, and the max operation is applied element-wise.
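The eigt(·) operation above can be sketched in a few lines of NumPy; this is a minimal illustration of the eigenvalue-thresholding operator, not the paper's implementation:

```python
import numpy as np

def eigt(M):
    """Eigenvalue thresholding: project a real symmetric matrix onto
    the p.s.d. cone by clipping its negative eigenvalues to zero."""
    lam, U = np.linalg.eigh(M)              # M = U diag(lam) U^T
    return (U * np.maximum(lam, 0.0)) @ U.T  # U max(Lambda, 0) U^T
```

In the covariance M-step, eigt is applied to the empirical weighted covariance minus σ_m² I, so that the resulting C_j is a valid covariance matrix.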

After convergence of EM is declared, the posterior weights of each patch are computed by averaging across bands,

 \beta_{i,j} = \frac{1}{L_m} \sum_{p=1}^{L_m} \beta_{i,j}^{p}, \qquad (43)

and stored for use in the denoiser (one weight vector per patch). Naturally, we assume that the patches are aligned across bands, i.e., all the patches y_i^{m,p}, for p = 1, …, L_m, are in exactly the same spatial location. The complete method is described in Algorithm 2.
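The E-step in (42) and the band-averaging in (43) can be sketched as follows; function names and array layouts are illustrative, not taken from the paper's code:

```python
import numpy as np

def responsibilities(y, alphas, covs, sigma2):
    """E-step (42): posterior probability that the flattened patch y was
    generated by each zero-mean GMM component, under noise variance sigma2.
    alphas: (K,) component weights; covs: list of K (d, d) covariances."""
    d = y.size
    logp = np.empty(len(covs))
    for j, (a, C) in enumerate(zip(alphas, covs)):
        S = C + sigma2 * np.eye(d)          # component covariance + noise
        _, logdet = np.linalg.slogdet(S)
        logp[j] = np.log(a) - 0.5 * (logdet + y @ np.linalg.solve(S, y)
                                     + d * np.log(2 * np.pi))
    logp -= logp.max()                      # stabilize before normalizing
    p = np.exp(logp)
    return p / p.sum()

def average_over_bands(betas):
    """(43): average the responsibilities of co-located patches across
    the L_m bands; betas has shape (L_m, K)."""
    return betas.mean(axis=0)
```

For a patch far from the origin, the responsibility mass shifts toward the components with larger variance, as expected from (42).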

## VI Application to Deblurring Image Pairs

### VI-A Formulation

Another application that can leverage scene-adapted priors is image deblurring from blurred and noisy image pairs. This problem was first proposed in [20], and it is particularly relevant for improving image quality under dim lighting conditions. On the one hand, a long exposure with a hand-held camera is likely to yield a blurred picture; on the other hand, a short exposure with high ISO produces a sharp, yet noisy, image. The idea is to fuse two such images of the same scene into a single sharp and clean image.

The observation model in this problem is a simple instance of (24)–(25), with neither sub-sampling nor spectral degradation from the sensor. Furthermore, we only consider grayscale images, and we vectorize the images differently, with all the pixels stacked into a single column vector. In conclusion, the observation model is

 y_b = B x + n_b, \qquad (44)
 y_n = x + n_n, \qquad (45)

where:

• x is the (vectorized) target image to be inferred;

• y_b is the blurred image;

• y_n is the noisy, sharp image;

• B represents a spatial blur;

• n_b and n_n represent noise, assumed to be zero-mean, white, and Gaussian, with variances σ_b² and σ_n², respectively (the short-exposure image y_n being the noisier of the two).
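The pair (44)–(45) can be simulated as follows; the blur kernel, image size, and noise levels are illustrative choices, and the periodic (circular) convolution matches the boundary assumption used later for the FFT-based solver:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((64, 64))                 # stand-in for the sharp scene

# Periodic (circular) convolution with a normalized 5x5 box blur via the FFT.
b = np.zeros((64, 64))
b[:5, :5] = 1.0 / 25.0                   # illustrative blur kernel
B_hat = np.fft.fft2(b)                   # DFT of the kernel = diagonal of B
blur = np.real(np.fft.ifft2(B_hat * np.fft.fft2(x)))

sigma_b, sigma_n = 0.01, 0.1             # long exposure: low noise; short: high
y_b = blur + sigma_b * rng.standard_normal(x.shape)   # (44)
y_n = x + sigma_n * rng.standard_normal(x.shape)      # (45)
```

Since the kernel is normalized, the circular convolution preserves the mean of x; only the noise terms distinguish the two observations statistically.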

As before, we seek a MAP/variational estimate given by

 \hat{x} \in \arg\min_x \frac{\|Bx - y_b\|_2^2}{2} + \lambda \frac{\|x - y_n\|_2^2}{2} + \tau \phi(x), \qquad (46)

where φ is the regularizer, and λ and τ have meanings similar to those in (27).

### VI-B ADMM for Deblurring with Image Pairs

For this problem formulation, we may use the canonical form of ADMM in (1). Each iteration of the resulting algorithm becomes

 x^{(k+1)} = \arg\min_x \|Bx - y_b\|_2^2 + \lambda \|x - y_n\|_2^2 + \rho \|x - v^{(k)} - u^{(k)}\|_2^2, \qquad (47)
 v^{(k+1)} = \arg\min_v \phi(v) + \frac{\rho}{2\tau} \|x^{(k+1)} - v - u^{(k)}\|_2^2, \qquad (48)
 u^{(k+1)} = u^{(k)} - x^{(k+1)} + v^{(k+1)}, \qquad (49)

where the objective in (47) is quadratic, with a closed-form solution,

 x^{(k+1)} = \left[ B^T B + \lambda I + \rho I \right]^{-1} \left[ B^T y_b + \lambda y_n + \rho \left( v^{(k)} + u^{(k)} \right) \right]. \qquad (50)

As above, assuming periodic or unknown boundaries, (50) can be implemented with O(n log n) cost via the FFT [34]. The proximity operator in (48) corresponds to the denoising step, which thus takes the form
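Under the periodic-boundary assumption, the x-update (50) can be sketched as below; the system matrix is diagonalized by the 2-D DFT, and `B_hat` (the DFT of the blur kernel) is the only assumption beyond the symbols already defined:

```python
import numpy as np

def x_update(B_hat, y_b, y_n, v, u, lam, rho):
    """Closed-form solution of (50) under periodic boundaries: the system
    [B^T B + (lam + rho) I] x = B^T y_b + lam y_n + rho (v + u)
    is diagonal in the Fourier domain. B_hat is the 2-D DFT of the blur
    kernel, so |B_hat|^2 are the eigenvalues of B^T B."""
    rhs = (np.conj(B_hat) * np.fft.fft2(y_b)
           + np.fft.fft2(lam * y_n + rho * (v + u)))
    denom = np.abs(B_hat) ** 2 + lam + rho
    return np.real(np.fft.ifft2(rhs / denom))
```

With a delta (identity) kernel, this reduces to a pixel-wise weighted average of y_b, y_n, and v + u, which is a quick sanity check on the formula.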

 v^{(k+1)} = W\left( x^{(k+1)} - u^{(k)} \right). \qquad (51)

The proof of convergence of the algorithm in (47)–(49) is essentially the same as that in the previous section, so we abstain from presenting it here; we simply note that (51) is indeed a proximity operator and that (46) is guaranteed to have at least one solution, because its objective is proper, closed, and convex.

As in the HS sharpening case, W is not explicitly formed; multiplication by W corresponds to the patch-based denoiser described in Subsection IV-B, based on the covariance matrices C_1, …, C_K and the corresponding weights. These covariances are learned via EM from the patches of the noisy sharp image y_n, and the weights of the GMM components at each patch are kept and used to denoise the corresponding patch in the blurred image.
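Applying W to a single patch can be sketched as a weighted combination of the component-wise Wiener filters, with the weights fixed to those learned from the sharp image; this is an illustrative GMM-MMSE form, with the exact denoiser detailed in Subsection IV-B:

```python
import numpy as np

def denoise_patch(z, betas, covs, sigma2):
    """Apply the patch denoiser W to one flattened patch z: combine the
    K Wiener filters C_j (C_j + sigma2 I)^{-1} z of the GMM components,
    weighted by the fixed, precomputed responsibilities betas."""
    d = z.size
    est = np.zeros(d)
    for b_j, C in zip(betas, covs):
        est += b_j * (C @ np.linalg.solve(C + sigma2 * np.eye(d), z))
    return est
```

Because the betas are held fixed, the operator is linear in z, which is what makes (51) a proximity operator of a (quadratic) convex function, as exploited in the convergence argument.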

## VII Experimental Results

### VII-A Hyperspectral Sharpening

We begin by reporting an experiment designed to assess the adequacy of the GMM prior. We generate synthetic HS and MS data according to models (24) and (25), respectively, with 50dB SNR (nearly noiseless). Afterwards, we add Gaussian noise with a given variance and recover the clean data using the GMM-based denoiser and BM3D [26]. The results are then compared with respect to visual quality and peak SNR (PSNR). Whereas BM3D estimates the clean image from the noisy one, the GMM is trained only once, on the nearly noiseless PAN image. This prior is then used in all the denoising experiments: noisy PAN, noisy MS band (red channel), noisy HS band, and noisy image of coefficients; see Figs 8 to 23. The weights are also kept fixed in the four experiments, that is, we use the weights computed in the training stage from the PAN image. In all the experiments, we consider a GMM with 20 components, trained from overlapping patches.

The results support the claim that the GMM trained from the PAN image is a good prior for both the MS and HS bands, and also for the image of coefficients. In fact, when denoising the PAN and MS data, the GMM-based denoiser, with the precomputed model and weights, is able to outperform BM3D (Figs 8 and 13). Furthermore, when denoising the HS image, the same GMM prior is still able to perform competitively with BM3D, which uses only the noisy input (Figs 18 and 23). While BM3D performs very well in this pure denoising setting, being able to learn the model from the PAN image is an important feature of the GMM denoiser because, in HS sharpening, the spatial resolution of the HS bands may not be enough to learn a suitable prior. On the other hand, since the PAN image has the highest spatial resolution, it is the best band to train the model. Notice that the other MS bands, and also the HS bands and coefficients, retain most of the spatial features of the panchromatic band, which gives supporting evidence to our claim that the weights should be kept fixed. Table I shows the results when the weights are fixed versus weights computed with (19) depending on the noisy input patch (the exact MMSE estimate).

Next, we compare the HS sharpening results using the MMSE and MAP GMM-based denoisers from [12] and [31], respectively, the scene-adapted GMM denoiser herein proposed, and the dictionary-based method from [18] (which, to the best of our knowledge, is state-of-the-art). We use three different metrics: ERGAS (erreur relative globale adimensionnelle de synthèse), SAM (spectral angle mapper), and PSNR [16], in four different settings. In the first one, we consider sharpening the HS images using the PAN image, both at 50dB SNR. The second experiment refers to PAN-sharpening as well, at 30dB SNR on the PAN image and the last 50 HS bands, and 35dB on all the other bands [18]. Experiments 3 and 4 use the same SNR as 1 and 2, respectively, but the HS bands are sharpened based on 4 MS bands: R, G, B, and near-infrared. Once again, a GMM with K components is learned and used as explained in Subsection V-D. The PAN and MS images were generated from the original HS data using the IKONOS and LANDSAT spectral responses. For more details about the experimental procedure, we refer the reader to [18].

Table II shows the results on cropped areas of the ROSIS Pavia University and Moffett Field datasets. The four methods have comparable performance, with the proposed method being slightly better. The results also support the claim that the target image has the same spatial structure as the PAN image used to train the GMM; otherwise, keeping the posterior weights fixed would not yield good results. Figures (a)–(d) show the results of experiment 4 on the Moffett Field data in terms of visual quality, yet the differences are not visually noticeable. To better grasp the accuracy, we plot the (sorted) errors for each pixel, as well as pixel values across bands. In all experiments, the parameters of PnP-SALSA, as described in Algorithm 1, were selected using a grid search.