Image denoising is not only one of the core problems in image processing, but also a building block for many other tasks, and has been a very active research topic over the last decades. Most of the current state-of-the-art methods are patch-based, i.e., they rely on a divide and conquer principle, where instead of dealing with the noisy image as a whole, small (overlapping) patches are extracted, denoised independently, and put back in their locations [1, 2, 3]. Applying these patch-based methods in more general restoration/reconstruction problems is not trivial and has been a topic of research in the past years [4, 5, 6].
Recently, Venkatakrishnan et al. proposed a flexible framework, called plug-and-play (PnP), where a state-of-the-art denoiser is seen as a black box and plugged into the iterations of an alternating direction method of multipliers (ADMM) 
. This approach allows using an arbitrary denoiser as a regularizer in an imaging inverse problem, such as deblurring or super-resolution, tackled via an ADMM algorithm, but departing from its standard use, where the denoiser is the proximity operator of some convex regularizer. However, plugging an arbitrary denoiser (possibly without a closed-form expression) into ADMM begs obvious questions [9, 7]: is the resulting algorithm guaranteed to converge? If so, does it converge to a (global or local) optimum of some objective function? Can we identify this function? Here, we give positive answers to these questions, when the plugged denoiser is a modified Gaussian mixture model (GMM) based denoiser [2, 3, 10].
As proposed in earlier work , the GMM can be adapted to specific classes of images; the rationale is that denoisers based on such class-adapted priors are able to better capture the characteristics of the class than a general-purpose denoiser. In this paper, we take this adaptation one step further, by considering scene-adapted priors. As the name suggests, the model is no longer learned from a set of images from the same class as the input image, but from one or more images over the same scene being restored/reconstructed. Hyperspectral sharpening [12, 13, 14, 15] and image deblurring with noisy/blurred image pairs  are two examples of applications that may leverage from such priors.
In summary, the contributions are threefold: (i) we prove that the ADMM algorithm with a plugged-in GMM-based denoiser is guaranteed to converge to the global minimum of a cost function, provided that we introduce a small modification to the denoiser; (ii) this proposed modification not only guarantees convergence of the algorithm, but it is also motivated by the scene-adapted perspective; (iii) we show that the proposed algorithm yields results that are often better than other state-of-the-art hyperspectral sharpening methods.
The paper is organized as follows. Section 2 gives a brief description of ADMM, which is the basis of the proposed algorithm. Section 3 explains the rationale behind the PnP framework, and our reasons for choosing the GMM-based denoiser. Section 4 contains what is arguably the main contribution of this paper: the convergence of the resulting algorithm. The application to HS sharpening is described in Section 5, and experimental results are reported in Section 6. Finally, Section 7 concludes the paper.
2 Alternating Direction Method of Multipliers
Although dating back to the 1970’s 
, ADMM has seen a surge of interest in the last decade, as a flexible and efficient optimization tool, widely used in imaging problems, machine learning, and other areas[18, 19]. One of the canonical problems for ADMM has the form
where are the (scaled) Lagrange multipliers at iteration , and is the penalty parameter.
ADMM can be directly applied to problems involving the sum of closed, proper, convex functions, composed with linear operators,
by casting it into the form (1) as follows:
The resulting instance of ADMM (called SALSA–split augmented Lagrangian shrinkage algorithm ) is
where (as for ), .
3 Plug-and-Play Priors
The PnP framework  emerges from noticing that sub-problems (7)-(8) can be interpreted as the MAP solutions to pure denoising problems. In such situations, the resulting sub-problem has an identity observation operator, function acting as the regularizer (or negative log-prior, in Bayesian terms),
as the noisy data, and noise variance equal to. The goal is to capitalize on recent developments in image denoising, by using a state-of-the-art denoiser plugged into the iterations of ADMM, instead of trying to come up with a convex regularizer for which it is possible to compute the corresponding proximity operator.
3.1 GMM-Based Denoising
Amongst the several state-of-the-art denoising methods, we adopt a GMM patch-based denoiser. The reasons for this choice are threefold: (i) it has been shown that GMM are good priors for clean image patches [3, 10]; (ii) a GMM can be learned either from an external dataset of clean images [3, 10], or directly from noisy patches ; (iii) the ease of learning a GMM from an external dataset opens the door to class-specific priors, which, naturally, lead to superior results since they capture better the characteristics of the image class in hand, than a general-purpose image model [11, 21].
Leaving aside for now the question of how to learn the parameters of the model, we consider a GMM prior111 denotes a Gaussian probability density function of mean
denotes a Gaussian probability density function of meanand covariance , computed at .
for any patch
of the image to be denoised. A current practice in patch-based image denoising is to treat the average of each patch separately: the average of the noisy patch is subtracted from it, the resulting zero-mean patch is denoised, and the average is added back to the estimated patch. The implication of this procedure is that, without loss of generality, we can assume that the components of the GMM have zero mean (, for ).
Given this GMM prior and a noisy version of each patch, where , with , the minimum mean squared error (MMSE) estimate of is given by
is the posterior probability that thei-th patch was generated by the j-th component of the GMM, and is the MMSE estimate of the i-th patch if it was known that it had been generated by the -th component.
As is standard in patch-based denoising, after computing the MMSE estimates of the patches, they are returned to their location and combined by straight averaging. This corresponds to solving the following optimization problem
where is a binary matrix that extracts the -th patch from the image (thus puts the patch back into its place), is the number of patches, is the number of pixels in each patch, and the total number of image pixels. The solution to (14) is
assuming the patches are extracted with unit stride and periodic boundary conditions, thus every pixel belongs topatches and .
3.2 From Class-Adapted to Scene-Adapted Priors
Class-adapted priors have been used successfully in image deblurring, if the image to be deblurred is known to belong to a certain class, e.g., image of a face or fingerprint, and a prior adapted to that class is used . Here, we propose taking this adaptation to an extreme, whenever we have two types of data describing the same scene, as is the case with data fusion problems. Instead of leveraging a dataset of images from the same class as the input image, we should leverage one type of data to learn a model that is targeted to the specific scene being imaged, and use it as a prior in the reconstruction/restoration process. The underlying assumption is that the two types of data share the same spatial statistical properties; this is a reasonable assumption, since the scene being imaged is exactly the same. In Section 6, we implement this idea to tackle a data fusion problem known as hyperspectral (HS) sharpening (see  for a detailed description of the problem). We resort to a GMM prior, as described in Subsection 3.1, learned from (patches of the bands of) the observed multispectral (MS) or panchromatic (PAN) image using the classical expectation-maximization (EM) algorithm, and then leverage this prior to sharpen the low resolution HS bands.
In light of the scene-adaptation scheme just described, we further claim that, it not only makes sense to use a GMM prior learned from the observed MS or PAN image, but it also makes sense to keep the weights of each patch in the target image, used to compute (11), equal to its value for the corresponding observed patch, obtained during training. In other words, we use the weights as if we were simply denoising the high resolution MS image, with a GMM trained from its noisy patches. In fact, another instance of this idea has been previously used, based on sparse representations on learned dictionaries . We stress that this approach is only reasonable in the case of image denoising or image reconstruction using scene-adapted priors, where the training and target images are of the same scene, and thus have similar spatial structures. As will be shown in the next section, this modification will play an important role in proving convergence of the proposed PnP-ADMM algorithm.
4 Convergence of the PnP-ADMM Algorithm
We begin by presenting a simplified version of a classical theorem on the convergence of a generalized version of ADMM, proved in the seminal paper by Eckstein and Bertsekas . The full version of the theorem allows for inexact solution of the optimization problems in the iterations of ADMM, while this version, based on which our proof of convergence of PnP-ADMM will be supported, assumes exact solutions.
Theorem 1 (Eckstein and Bertsekas ).
Consider a problem of the form (1), where has full column rank, and and are closed, proper, and convex functions, and let , and be given. If the sequences , , and are generated according to (2), (3), (4), then converges to a solution of (1), , if one exists. Furthermore, if a solution does not exist, then at least one of the sequences or diverges.
Before proceeding, some properties of the denoising function are established.
4.1 Analysis of GMM-Based Denoiser
Consider the role of the denoiser: for a noisy input argument , it produces an estimate , given the parameters of the GMM, , and a noise level, . As the GMM-denoiser is patch-based, the algorithm starts by extracting (overlapping, with unit stride) patches, followed by computing their estimates using (11). Recall that, with the scene-adaptation scheme, parameters no longer depend on the noisy patches, and thus we may write compactly, for each patch,
where is the binary matrix defined in Subsection 3.1, thus . Moreover, we can relate the input of the denoiser to its image-wise output using (15), under the assumption of periodic boundary conditions. Once again, using compact notation
which shows that the GMM-based MMSE denoiser, with fixed weights, is a linear function of the noisy image. Matrix , of course, depends on the parameters of the method (, ) as well as on the parameters of the GMM, but it is fixed throughout the iterations of the ADMM loop. Recall that the weights, , and covariance matrices, , are obtained during training, and is assumed to be known, thus is fixed. The next lemma states the key properties of matrix . The proof can be found in the Appendix.
Consider that all covariance matrices in the GMM are positive definite. Then, is symmetric, positive definite, and has spectral norm strictly less than 1.
Because the denoiser corresponds to multiplying the argument by a symmetric positive definite matrix, it is the proximity operator of a quadratic function, as stated in the following lemma.
The function , defined by , where is symmetric, positive definite, and has spectral norm strictly less than 1, corresponds to the proximity operator of the quadratic funtion .
Proof: Using the definition of proximity operator ,
The function being minimized w.r.t. can be written as
which is quadratic, strictly convex, with a unique minimizer .
4.2 Convergence Proof
Proof: The proof consists in verifying the conditions of Theorem 1.
The first condition for convergence is that has full column rank. The proposed algorithm is an instance of SALSA (as defined in (6)–(9)), which in turn is an instance of ADMM (as defined in (2)–(4)), with
which has full column rank, regardless of .
The second condition regards functions to . In fact, to are closed, proper, and convex, by virtue of to being proximity operators . Furthermore, as shown by Lemma 2, the denoiser also corresponds to the proximity operator of a quadratic (thus also closed and proper) function , which is strongly convex. Finally, the function being minimized is the sum of convex functions, one of which is strongly convex, thus a minimizer is guaranteed to exist.
We conclude by observing that this proof of convergence depends critically on the fact that the denoiser uses fixed weights in , rather than being a pure MMSE denoiser, as described in Subsection 3.1. In fact, the simple univariate example in Fig. (a)a shows that an MMSE denoiser may not even be non-expansive222Recall that a function is non-expansive if, for any ,
Theorem 3 (Moreau ).A function is a proximal mapping if and only if it is non-expansive and the sub-gradient of some convex function.
5 Application: Hyperspectral Sharpening
In this section, we tackle a data fusion problem known as HS sharpening, under the PnP framework with a GMM-denoiser and a scene-adapted prior, as described above. Using compact matrix notation (as in ), the HS sharpening inverse problem can be modeled as
where is the target image to be estimated, is the observed HS data, is a spatial convolution operator, is a sub-sampling operator, is the observed MS data, models the spectral responses of the MS sensor, and and are Gaussian noises of known variances. Moreover, we assume the sensor has a fixed point spread function, and matrices and are known; see  for a blind approach.
HS data typically has hundreds of bands but the spectral vectors, i.e. the columns of, live in a low dimensional subspace which may be learned from the observed HS data . In particular, for independent and identically distributed (iid) noise, the range of
may be learned from the eigenvectors of the correlation matrix. Then, instead of estimating directly, we estimate latent images , and recover , where the columns of are the eigenvectors corresponding to the
largest eigenvalues of. The observation model becomes
A classical approach to this inverse problem is to compute the maximum a posteriori (MAP) estimate, i.e.,
where is the negative log-prior (or regularizer), while and control the relative weight of each term. We follow Simões et al.  and use SALSA to tackle (21), where once again we assume the noise is Gaussian iid; more general forms of noise correlation can be included . Notice that, while in (21) the optimization variable is a matrix , in (5) the optimization variable is a vector . These is merely a difference in notation, as we can represent matrix by its vectorized version , which corresponds to stacking the columns of into a vector. Using well-known equalities relating vectorization and Kronecker products, we can write
where , for , while
Although we can arbitrarily switch between matrix and vector representations, in what follows it is more convenient to use matrix notation. Each iteration of the resulting ADMM has the form
where is the so-called penalty parameter of ADMM, and are the scaled Lagrange dual variables . The first three problems are quadratic, thus with closed-form solutions involving matrix inversions (for details, see ):
where denotes entry-wise (Hadamard) product. The matrix inversion in (27) can be implemented with cost using the FFT, under the assumption of periodic boundary conditions. The matrix inversions in (28) and (29) involve matrices of size ; since is typically , the cost of this inversion in marginal. Moreover, if is fixed, the two inverses can be pre-computed .
The resulting HS sharpening formulation satisfies Theorem 2. In fact, functions and (see (22) and (23)) are quadratic, thus closed, proper, and convex. Moreover, using Lemma 2, the denoiser corresponds to the proximity operator of a quadratic function , which is strongly convex, closed and proper. Finally, since the cost function is the sum of three convex quadratic functions, one of which is strongly convex (defined implicitly by the denoiser), a global minimum is guaranteed to exist. We stress that, although the global objective in (21) is quadratic when is a GMM-based denoiser with fixed weights, and thus fixed , we still need to resort to the ADMM to solve the problem because is too large to be handled directly.
|Exp. 1 (PAN)||Exp. 2 (PAN)||Exp. 3 (R,G,B,N-IR)||Exp. 4 (R,G,B,N-IR)|
We compare the results using the GMM-based denoiser from , the GMM denoiser herein proposed, and the dictionary-based method from  (which, to the best of our knowledge, is the state-of-the-art). We use three different metrics: ERGAS (erreur relative globale adimensionnelle de synthèse), SAM (spectral angle mapper), and SRE (signal to reconstruction-error) , in 4 different settings. In the first experiment, we consider sharpening the HS images using the PAN image, at 50dB SNR, on both the HS and PAN images. The second experiment refers to PAN-sharpening as well, with 35dB SNR on the first 43 HS bands and 30dB on the 50 remaining HS bands and the PAN image . Experiments 3 and 4 use the same SNR as experiments 1 and 2, respectively, but the HS bands are sharpened based on 4 MS bands: R, G, B, and near-infrared. In all the experiments, a GMM with 20 components is learned from the PAN image, where the PAN and MS images are generated from the original HS bands, using IKONOS and LANDSAT spectral responses.
Table 1 shows the results on a cropped area of the ROSIS Pavia University and Moffett Field datasets. The three methods have comparable performances, with the proposed one being slightly better. Furthermore, the results supports the hypothesis that the target image has the same spatial structure as the PAN image used to train the GMM, otherwise keeping the posterior mixture weights would not yield good results. Figures (a)a–(d)d show the results of experiment 4 on the Moffett Field data, in terms of visual quality, yet the differences are not noticeable. Another way to visualize the accuracy is to plot the (sorted) errors for each pixel and to notice that the proposed method has a larger number below a given threshold, i.e., in Figure (e)e the dashed curve is below the solid one; furthermore, in Figures (f)f and (g)g, the dashed curve is closer (smaller error) to the dotted one (the true pixel value across all bands) than the solid line.
This paper proposed a PnP algorithm, with a GMM-based denoiser, guaranteed to converge to a global minimum of the underlying cost function. The denoiser is scene-adapted and it is the proximity operator of a closed, proper, and convex function; consequently, the standard convergence guarantees of ADMM (in Theorem 1) apply, if the remaining conditions are satisfied. We illustrate the application of the algorithm on a data fusion problem known as HS sharpening.
Experimental results show that the proposed method outperforms another state-of-the-art algorithm based on sparse representations on learned dictionaries , for most of the test settings. The proposed scene-adaptation also improves the results over the GMM-based denoiser in , giving supporting evidence to the hypothesis that the PAN image constitutes a useful training example, as the HS bands (and latent images) share a similar spatial structure.
Appendix: Proof of Lemma 1
Each is symmetric, as it is a convex combination of symmetric matrices, which is clear from (16) and because covariances are symmetric. Thus, is also symmetric (see (17)). Consider the eigendecomposition of each , where contains its eigenvalues, in non-increasing order. Then,
where is a diagonal matrix, and thus the eigenvalues of are in
since ( is positive definite) and .
From (16), each is a convex combination of matrices, each of which with eigenvalues in . Weyl’s inequality  implies that the eigenvalues of a convex combination of symmetric matrices is bounded below (above) by the same convex combination of the smallest (largest) eigenvalues of those matrices. The eigenvalues of are thus all in , i.e., is positive definite and .
Finally, following , we partition the set of patches into a collection of subsets of non-overlapping patches: (the number of subsets of non-overlapping patches is equal to the patch size, due to the assumption of unit stride and periodic boundaries). Using this partition, can be written as
Since the patches in are disjoint, there is a permutation of the image pixels that allows writing as a block-diagonal matrix, where the blocks are the matrices, with . Because the set of eigenvalues of a block-diagonal matrix is the union of the sets of eigenvalues of its blocks, the eigenvalues of each are bounded similarly as those of the , thus in . Finally, again using Weyl’s inequality, the eigenvalues of are bounded above (below) by the average of the largest (smallest) eigenvalues of each , thus also in .
-  K. Dabov et al., “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Trans. Image Proc., vol. 16, pp. 2080–2095, 2007.
-  A. Teodoro et al., “Single-frame image denoising and inpainting using Gaussian mixtures,” in ICPRAM, 2015.
-  D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in ICCV, 2011, pp. 479–486.
-  A. Danielyan et al., “BM3D frames and variational image deblurring,” IEEE Trans. Image Proc., vol. 21, pp. 1715–1728, 2012.
-  M. Mignotte, “A non-local regularization strategy for image deconvolution,” Patt. Rec. Lett., vol. 29, pp. 2206–2212, 2008.
-  V. Papyan and M. Elad, “Multi-scale patch-based image restoration,” IEEE Trans. Image Proc., vol. 25, pp. 249–261, 2016.
-  S. Venkatakrishnan et al., “Plug-and-play priors for model based reconstruction,” in IEEE GlobalSIP, 2013, pp. 945–948.
-  Z. Xu et al., “Adaptive ADMM with spectral penalty parameter selection,” arXiv:1605.07246, 2016.
-  S. Chan et al., “Plug-and-play ADMM for image restoration: Fixed point convergence and applications,” IEEE Trans. Comp. Imaging, 2016.
-  G. Yu et al., “Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity,” IEEE Trans. Image Proc., vol. 21, pp. 2481–2499, 2012.
-  A. Teodoro et al., “Image restoration and reconstruction using variable splitting and class-adapted image priors,” in IEEE ICIP, 2016.
-  L. Loncan et al., “Hyperspectral pansharpening: a review,” IEEE Geosc. Remote Sens. Mag., vol. 3, pp. 27–46, 2015.
-  M. Simões et al., “A convex formulation for hyperspectral image superresolution via subspace-based regularization,” IEEE Trans. Geosc. Remote Sens., vol. 55, pp. 3373–3388, 2015.
-  Q. Wei et al., “Hyperspectral and multispectral image fusion based on a sparse representation,” IEEE Trans. Geosc. Remote Sens., vol. 53, pp. 3658–3668, 2015.
-  A. Teodoro et al., “Sharpening hyperspectral images using plug-and-play priors,” in LVA/ICA, 2017.
-  L. Yuan et al., “Image deblurring with blurred/noisy image pairs,” ACM Trans. Graphics, vol. 26, 2007.
-  D. Gabay and B. Mercier, “A dual algorithm for the solution of nonlinear variational problems via finite element approximations,” Comp. and Math. Appl., vol. 2, pp. 17–40, 1976.
-  J. Eckstein and D. Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Math. Prog., vol. 5, pp. 293–318, 1992.
-  S. Boyd et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. and Trends in Machine Learning, vol. 3, pp. 1–122, 2011.
-  M. Afonso et al., “An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems,” IEEE Trans. Image Proc., vol. 20, pp. 681–695, 2011.
-  E. Luo et al., “Adaptive image denoising by targeted databases,” IEEE Trans. Image Proc., vol. 24, pp. 2167–2181, 2015.
-  H. Bauschke and P. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.
-  J. J. Moreau, “Proximité et dualité dans un espace Hilbertien,” Bull. de la Société Math. de France 93, pp. 273–299, 1965.
-  J. Bioucas-Dias and J. Nascimento, “Hyperspectral subspace identification,” IEEE Trans. Geosc. Remote Sens., vol. 46, pp. 2435–2445, 2008.
-  L. Rudin et al., “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, pp. 259–268, 1992.
-  R. Bhatia, Matrix Analysis, Springer, 1997.
-  J. Sulam et al., “Gaussian mixture diffusion,” in IEEE ICSEE, 2016.