Image denoising, deblurring, and inpainting are only a few examples of standard image processing tasks which are traditionally solved through numerical optimization. In most cases, the objective function associated with such problems is expressed as the sum of a data fidelity term and a regularization term (or a number thereof) [1, 13, 3, 4]
. In particular, considering the desired image estimate
to be a (column) vector in, both and are usually defined as non-negative functionals on , in which case the standard form of an optimization-based imaging task is given by
Here is a regularization constant that balances the effects of empirical and prior information on the optimal solution. Specifically, the first (fidelity) term in (1) forces the solution to “agree” with the observed data , as it would be the case if one set, e.g., . On the other hand, the prior information is represented by the second (regularization) term in (1), which is frequently required to prevent overfitting and/or to render the optimal solution unique and stably computable. For instance, when the optimal solution is expected to be sparse, it is common to set [1, 2, 42].
The convexity and differentiability of the squared Euclidean distance, along with the unparalleled convenience of its numerical handling, are behind the main reasons for its prevalence as a measure of image proximity. The same applies to Mean Squared Error (MSE) and Peak to Signal Noise Ratio (PSNR)—closely related metrics, which have been extensively used to quantify visual quality of images and videos. Yet, neither of the above quantitative measures can be considered a good model for the Human Visual System (HVS), as opposed to the Structural Similarity Index (SSIM), originally proposed by Wang et al. [45, 46]. The SSIM index is based upon the assumption that the HVS has evolved to perceive visual distortions as changes in structural information. On the basis of subjective quality assessments involving large databases, SSIM has been generally accepted to be one of the best measures of visual quality and, by extension, of perceptual proximity between images.
Considering the exceptional characteristics of the SSIM mentioned above, the idea of replacing the standard norm-based fidelity measures with SSIM seems rather straightforward. However, the optimization problems thus obtained demand much more careful treatment, especially when the question of existence and uniqueness of their solutions is concerned. The main difficulty here stems from the fact that SSIM is not a convex function.
Notwithstanding the above obstacles, optimization problems employing the SSIM as a data fidelity term have already been addressed. For instance, in , the authors address the problem of finding the best approximation of data images in the domain of an orthogonal transform, with the optimization performed with respect to the SSIM measure. In this work, instead of maximizing SSIM, the authors minimize
in which case is a matrix representing the synthesis phase of an orthogonal transform (e.g., either Fourier, cosine, or wavelet transform), is a vector of approximation coefficients, and is a data image to be approximated. It is important to note that, as opposed to the SSIM, may be considered a measure of structural dissimilarity between and .
Based on the above results, Rehman et al.  address the problem of sparse dictionary learning through modifying the original k-SVD method of Elad et al. in . The conceptual novelty of the work in  consists in using SSIM instead of
-norm based proximity measures, which have been shown to yield substantial improvements in the performance of the sparse learning as well as its application to a number of image processing tasks, including super-resolution. Another interesting use of the SSIM for denoising images was proposed in. Here, the authors introduce the statistical SSIM index (statSSIM), an extension of the SSIM for wide-sense stationary random processes. Subsequently, using the proposed statSSIM, the authors have been able to reformulate a number of classical statistical estimation problems, such as adaptive filter design. Moreover, it was shown in  that statSSIM is a quasi-concave function, which opens up a number of possibilities for its efficient numerical optimization, e.g., using the bisection method [3, 17]. Interestingly enough, what seems to have been omitted in  is to recognize that the SSIM is a quasi-concave function  that can be optimized by means of a number of a number of efficient numerical methods as well. For more examples of exploiting SSIM in imaging sciences, the reader is referred to [44, 40, 10], which demonstrate the application of this measure to rate distortion optimization, video coding, and image classification.
Finally, we note that maximization of is equivalent to minimization of in (2), with if and only if . Consequently, many image processing problems can be formulated in the form of a quasi-convex program as given by
where is an optimization variable,
is a linear transform,is a vector of observed data, and are convex inequality constraint functions. With little loss of generality, in this work, we assume , in which case the problem in (3) can be rewritten in its equivalent unconstrained (Lagrangian) form as
which fits the format of (1). In what follows, we refer to the above problem as an unconstrained SSIM-based optimization problem, as opposed to its constrained version in (3). Note that both problems could also be viewed as special instances under the umbrella of SSIM-based optimization.
As opposed to developing specific methods for particular applications, in this paper, we introduce a set of algorithms to solve the general problems (4) and (3). Subsequently, we demonstrate the performance of these algorithms with several applications such as (TV) denoising and sparse approximation, as well as providing comparisons against more traditional approaches.
2 Structural Similarity Index (SSIM)
The SSIM index provides a measure of visual similarity between two images, which could be, for instance, an original scheme and its distorted version. Since it is assumed that the distortionless image is always available, the SSIM is considered afull-reference measure of image quality assessment (IQA) . Its definition is based on two assumptions: (i) images are highly structured (that is, pixel values tend to be correlated, especially if they are spatially close), and (ii) HVS is particularly sensitive to structural information. For these reasons, SSIM assesses similarity by quantifying changes in perceived structural information, which can be expressed in terms of the luminance, contrast, and structure of the compared images. In particular, given two images, and , let and denote their mean values. Also, let and
be the standard deviations of the images, whose cross-correlation coefficient is equal to. Then, discrepancies between the luminance of and can be quantified using
where is added for stability purposes. Note that is sensitive to relative (rather than absolute) changes of luminance, which makes it consistent with Weber’s law—a model for light adaptation in the HVS . Further, the contrast of and can be compared using
where, as before, is added to prevent division by zero. It is interesting to note that, when there is a change in contrast, is more sensitive if the base contrast is low than when this is high—a characteristic behaviour of the HVS.
Finally, the structural component of SSIM is defined as
with , which makes it very similar to the normalized cross-correlation.
Once the three components , , and are computed, SSIM is defined according to
This definition of the SSIM will be employed for the remainder of the paper.
We close this section by pointing out that the statistics of natural images vary greatly across their spatial domain, in which case it makes sense to replace the global means ,, , and cross-correlation by their localized versions computed over (either distinct or overlapping) local neighbourhoods. In this case, using the localized statistics would result in values of the SSIM, , which can be averaged to yield a mean SSIM index (MSSIM) , which is a more frequently used metric in practice.
2.2 A normalized metric yielded by the SSIM
This less cumbersome version of the SSIM can be simplified even further if both and have zero mean, i.e., . In this case, it is rather straightforward to show that
with the associated dissimilarity index thus becoming
Note that, if , , while if and only if .
given by (12) is an example of a (squared) normalized metric, which has been discussed in [6, 9]. Thus, while tells us about how correlated or similar and are, gives us a sense of how far is from a given observation (in the normalized sense). Moreover, since in the majority of optimization problems the fidelity term quantifies the distance between a model/estimate and its observation, it seems more suitable for us to proceed with minimization of .
We finally note that, throughout the rest of the paper, unless otherwise stated, we will be working with zero-mean images, thus using the definitions of SSIM as given in (11) and (12). This simplification, however, is by no means restrictive. Indeed, many algorithms of modern image processing are implemented under Neumann boundary conditions, thus preserving the mean brightness (or, equivalently, mean value) of input images. Hence, it is rarely a problem to subtract the mean value before the processing starts, while adding it back to the final result.
2.3 Quasiconvexity and quasiconcavity
An interesting property of the dissimilarity measure is that it is quasiconvex over a half-space of . This can be easily proved by using the fact that a function is quasiconvex if its domain and all its sub-level sets
are convex .
 Let be fixed. Then, is quasiconvex if , where is the stability constant of the dissimilarity measure .
Note that, from the relation , it immediately follows that is a quasiconcave function over the halfspace defined by . Moreover, using the definition of quasiconcavity, the same line of arguments as in  can be used to show that (resp. ) is a quasiconcave (resp. quasiconvex) function, when restricted to the half-space defined by .
3 Constrained SSIM-based Optimization
A standard quasiconvex optimization problem is defined as follows:
where is a quasiconvex cost function, (with and ) represents a set of affine equality constraints, and are a set of convex inequality constraint functions. A standard approach to solving (3) is by means of the bisection method, which is designed to converge to the desired solution up to a certain accuracy . This method reduces direct minimization of to a sequence of convex feasibility problems. Using the above approach it is therefore possible to take advantage of the quasiconvexity of to cast SSIM-based optimization problems as quasiconvex optimization problems, which can be subsequently solved by efficient numerical means.
In particular, in what follows, we consider the following quasiconvex optimization problem
where represents a linear transform and is a data image. Although the dissimilarity index above is generally defined as given by (12), in practical settings, it is common to deal with non-trivial data images, suggesting . In this case, the stability constant in (12) can be set to zero, thereby leading to a simplified expression for as given by
Note that, for , the above expression is well defined and differentiable for all .
As it was mentioned earlier in this section, the quasiconvex problem (3) can be efficiently solved by means of the bisection algorithm, a particular version of which we introduce next.
3.1 Quasiconvex optimization
Before providing details on the proposed optimization procedure, we note that, in image processing, one usually deals with positive-valued vectors . Moreover, it is also frequently the case that the matrix obeys the property of producing positive-valued results when multiplied by positive-valued vectors. In other words, provided , the above property guarantees that , which is very typical for situations when describes the effects of, e.g., blurring and/or subsampling. In such cases, it is reasonable to assume that (provided, of course, as well), thereby allowing us to consider as a quasiconvex function of .
Given the latter, in general, a solution of (3) can be found by solving a sequence of feasibility problems . These problems are formulated using a family of convex inequalities, which are defined by means of a family of functions such that
with the additional property that for all , whenever . In particular, one can show that all the above properties are satisfied by
Consequently, the feasibility problems then assume the following form
Using the fact that , and defining and to be vectors in whose entries are all equal to one and zero, respectively, we propose the following algorithm for solving (3).
Notice that this method will find a solution such that in exactly iterations , provided that such solution lies in the quasiconvex region of , in which case the algorithm converges to the optimal to a precision controllable by the value of . Once again, we note that the optimal solution can be reasonably assumed to lie within the region of quasi-convexity of , since one is normally interested in reconstructions that are positively correlated to the given observation .
4 Unconstrained SSIM-based Optimization
In many practical applications, the feasible set of SSIM-based optimization can be described by a single constraint, in which case the optimization problem in (3) can be rewritten in its equivalent unconstrained (Lagrangian) form, that is,
where is a regularization parameter and is a regularization functional, which is often defined to be convex. As usual, when is strictly greater than zero, the regularization term is used to force the optimal solution to reside within a predefined “target” space, thus letting one to use a priori information on the properties of the desired solution. This is particularly important in the case where is either rank-deficient or poorly conditioned, in which case the regularization can help to render the solution well-defined and stably computable. However, since the first term in (20) is not convex, the entire cost function is not convex either. Consequently, the existence of a unique global minimizer of (20) cannot be generally guaranteed. With this restriction in mind, however, it is still possible to devise efficient numerical methods capable of converging to either a locally or globally optimal solution, as it will be shown in the following section of the paper.
4.1 ADDM-based Approach
In order to solve problem in (20) we follow an approach based on the Augmented Lagrangian Method of Multipliers (ADMM). This methodology is convenient since it allows us to solve a wide variety of unconstrained SSIM-based optimization problems by splitting the cost function to be minimized into simpler optimization problems that are easier to solve. Moreover, as will be seen in Section 4.1.2, in several cases these simpler problems will have closed form solutions which may be computed very quickly. This is, of course, advantageous since it improves the performance of the resulting algorithm in terms of execution time.
Before going any further, it is worthwhile to mention that one of this simpler optimization problems will usually have the following form,
where both and are given. We consider this special case separately in the following section.
4.1.1 Quadratic regularization
Since the cost function in (21) is twice-differentiable, root-finding algorithms such as the generalized Newton’s method  can be employed to find a local zero-mean solution . To such an end, we must first compute the gradient of (21) which satisfies the equation,
Here, , , and
is the identity matrix. If we define the following function,
then is a (zero-mean) vector in such that .
Under certain conditions, the convergence of this method is guaranteed by the Newton-Kantorovich theorem, stated below.
 Let and be Banach spaces and . Assume is Fréchet differentiable on an open convex set and
Also, for some , suppose that is defined on all and that
where and . Set
and assume that . Then, the Newton iterates
are well defined, lie in and converge to a solution of , which is unique in . Moreover, if , the order of convergence is at least quadratic.
In our particular case, the Fréchet derivative of is its Jacobian, which we denote as ,
By the previous theorem, convergence is guaranteed in any open convex subset of , where , as long as the initial guess satisfies the following condition,
Here, denotes the inverse of , and is a constant that is less or equal than the Lipschitz constant of . Indeed, it can be proved that for any open convex subset , is Lipschitz continuous.
Let be defined as in Eq. (23). Then, its Jacobian is Lipschitz continuous on any open convex set ; that is, there exists a constant such that for any ,
Here, denotes the Frobenius norm111The Frobenius norm of an matrix is defined as . and
Proof. See Appendix.
From this discussion, we propose the following algorithm for solving the problem in (21).
It is worthwhile to point out that in some cases the calculation of the inverse of may be computationally expensive which, in turn, also makes the -update costly. This difficulty can be addressed by updating the variable in the following manner,
Note that this -update involves the solution of a linear system of equations which can be conveniently done by numerical schemes such as the Gauss-Seidel method .
4.1.2 ADMM-based Algorithm
The problem in (20) can be solved efficiently by taking advantage of the fact that the objective function is separable. Let us introduce a new variable, . Also, let the independent variable of the regularizing term be equal to . We then define the following optimization problem,
Clearly, (33) is equivalent to problem (20), thus a solution of (33) is also a minimizer of the original optimization problem (20). Furthermore, notice that (33) is presented in ADMM form , implying that an analogue of the standard ADMM algorithm can be employed to solve it.
As is customary in the ADMM methodology, let us first form the corresponding augmented Lagrangian of (33),
This expression can be rewritten in its more convenient scaled form by combining the linear and quadratic terms and scaling the dual variable , yielding
where . We employ this version of Eq. (34) since the formulas associated with it are usually simpler than their unscaled counterparts . As usual, the iterations of the proposed algorithm for solving (33) will be the minimization of Eq. (35) with respect to variables and in an alternate fashion, and the update of the dual variable , which accounts for the maximization of the dual function :
As such, we define the following iteration schemes for minimizing the cost function of the equivalent counterpart of problem (20):
Observe that the -update can be computed using the method described in the previous section. Furthermore, when is convex, the -update is equal to the proximal operator of . Recall that for a convex function its proximal operator is defined as
It follows that
Taking this into account, we introduce the following algorithm for solving the problem in (20).
4.2 Alternative ADMM-based Approach
As shown in the previous section, one of the advantages of the ADMM-based approach is that it allows us to solve problem (20) by solving a sequence of simpler optimization problems. The complexity of the proposed method can be further reduced by reformulating (20) as the following equivalent optimization problem:
The scaled form of the augmented Lagrangian of this new problem is given by
Therefore, problem (43) can be solved by defining the following iterations:
Notice that the -update has a closed form solution, which is given by
where is the identity matrix. Alternatively, observe that the variable can also be updated by solving the following system of linear equations:
This option is convenient when the computation of the inverse of the matrix is expensive. Since the matrix is symmetric and positive-definite we can employ a conjugate gradient method to update the primal variable .
Finally, the -update is simply the proximal operator of the function ,
provided is a convex function of .
Given the above discussion, we present the following alternative algorithm for solving the problem in (20).
Naturally, different sets of constraints and regularization terms yield different SSIM-based imaging problems. In this section, we review some applications and the corresponding optimization problems associated with them.
5.1 SSIM with Tikhonov regularization
A common method used for ill-posed problems is Tikhonov regularization or ridge regression. This is basically a constrained version of least squares and it is found in different fields such as statistics and engineering. It is stated as follows
where is called the Tikhonov matrix. A common choice for the matrix is the identity matrix, however, other choices may be a scaled finite approximation of a differential operator or a scaled orthogonal projection [32, 27, 33].
Notice that this problem can be addressed by employing Algorithm I.
Algorithm II can be employed to solve the unconstrained counterpart of the previous problem, given by
In this particular case, the corresponding iteration schemes are defined as follows,
where is the identity matrix. In some cases, the computation of the inverse of the matrix may be expensive, making it more appropriate to employ alternative methods to compute the -update. For instance, if is smaller than , it may be convenient to employ the matrix inversion lemma . Furthermore, notice that the -update is equivalent to solving the following system of linear equations,
Since the matrix is positive-definite and symmetric, the variable can be updated efficiently by using a conjugate gradient method.
5.2 Ssim--based optimization
Other interesting applications emerge when the norm is used either as a constraint or a regularization term. For example, by defining the constraint , we obtain the following SSIM-based optimization problem,
which can be solved with Algorithm II. Problem (60) can be seen as the SSIM version of the classical regularized version of the least squares method known as LASSO (Least Absolute Shrinkage and Selection Operator) [21, 1]. The corresponding iteration schemes to solve it are as follows,
Problems (59) and (60) are appealing because they combine the concepts of similarity and sparseness, therefore, these are relevant in applications in which sparsity is the assumed underlying model for images. To the best of our knowledge, optimization of the SSIM along with the norm has been only reported in [35, 36, 37] and in this work.
5.3 SSIM and Total Variation
By employing the constraint , where is a difference matrix and is the identity matrix , we can define a SSIM-total-variation-denoising method for one-dimensional discrete signals as follows. Given a noisy signal , its denoised reconstruction is the solution of the problem,
Here, we consider as a measure of the total variation (TV) of the vector . Notice that instead of minimizing the TV, we employ it as a constraint. Moreover, methods for solving the version of (65) can be found in [18, 25]. As with the classical TV optimization problems, solutions of (65) have bounded variation as well.
As for images, these can be denoised in the following manner. Let be a noisy image. Also, let be a linear transformation that converts matrices into column vectors, that is,
A reconstruction of the noiseless image can be obtained by means of the following SSIM-based optimization problem,
where the regularizing term is a discretization of the isotropic TV seminorm for real-valued images. In particular, the TV of is the discrete counterpart of the standard total variation of a continuous image that belongs to the function space , where is an open subset of :
where is the Frobenius norm. Notice that the -update may be computed efficiently by using the algorithm introduced by Chambolle in .
As mentioned before, it is more convenient to employ an average of local SSIMs as a fidelity term. Let be a partition of the given image such that . Further, let also be partitions of the variables and such that and . Also, let be given by
Then, the optimization problem that is to be solved is
If are partitions of non-overlapping blocks, the problem in (73) can be solved by carrying out the following iterations,
where is an element of the partition of the dual variable . As expected, , and for all . The extension of this algorithm when a weighted average of local SSIMs is used as a measure of similarity between images is straightforward.
Other imaging tasks can be performed by solving the following variant of the SSIM-based optimization problem in (73),
where is a linear operator (e.g., blurring kernel, subsampling procedure, etc.). A minimizer of (77) may be found by means of Algorithm IV. In this case, the corresponding ADMM iterations are the following:
Furthermore, under certain circumstances, Algorithm III can be employed to solve problem (77). Let be a partition of such that , and for all . If there exist operators , , such that
for all , then a minimizer of (77) can be found by means of Algorithm III. The corresponding iterations are as follows: