Image restoration is an inverse problem, which aims at estimating the originalclean image from a blurry and/or noisy observation . Mathematically, this problem is formulated as:
where is a linear operator, and
are the noise vectors, anddenotes an elementwise product. Let and be column vectors of all entries equal to one and zero, respectively. When and (or and ), (1) corresponds to the additive (or multiplicative) noise model. For convenience, we adopt the vector representation for images, where a 2D image is column-wise stacked into a vector with . So, for completeness, we have , and . Before proceeding, we present an image restoration example on the well-known ‘barbara’ image using our proposed method for solving impulse noise removal in Figure 1.
In general image restoration problems, represents a certain linear operator, e.g. convolution, wavelet transform, etc., and recovering from is known as image deconvolution or image deblurring. When is the identity operator, estimating from is referred to as image denoising . The problem of estimating from is called a linear inverse problem which, for most scenarios of practical interest, is ill-posed due to the singularity and/or the ill-conditioning of . Therefore, in order to stabilize the recovery of , it is necessary to incorporate prior-enforcing regularization on the solution. Therefore, image restoration can be modelled globally as the following optimization problem:
where measures the data fidelity between and the observation , and
are two suitable linear transformation matrices such thatand compute the discrete gradients of the image along the -axis and -axis, respectively111In practice, one does not need to compute and store the matrices and explicitly. Since the adjoint of the gradient operator is the negative divergence operator , i.e., for any , the inner product between vectors can be evaluated efficiently. Fore more details on the computation of and div operators, please refer to [14, 51, 4]., is the regularizer on and , and is a positive parameter used to balance the two terms for minimization. Apart from regularization, other prior information such as bound constraints [5, 70] or hard constraints can be incorporated into the general optimization framework in (2).
|Data Fidelity Function||Noise and References|
|add. Gaussian noise [47, 14]|
|add. Laplace noise [60, 23]|
|add. uniform noise [22, 51]|
|mul. Poisson noise [36, 49]|
|mul. Gamma noise [3, 53]|
|mul. Rayleigh noise [48, 2]|
|mixed Gaussian impulse noise |
|add./mul. impulse noise [ours]|
1.1 Related Work
This subsection presents a brief review of existing TV methods, from the viewpoint of data fidelity models, regularization models and optimization algorithms.
Data Fidelity Models: The fidelity function in (2) usually penalizes the difference between and by using different norms/divergences. Its form depends on the assumed distribution of the noise model. Some typical noise models and their corresponding fidelity terms are listed in Table I. The classical TV model  only considers TV minimization involving the squared -norm fidelity term for recovering images corrupted by additive Gaussian noise. However, this model is far from optimal when the noise is not Gaussian. Other works [60, 23] extend classical TV to use the -norm in the fidelity term. Since the
-norm fidelity term coincides with the probability density function of Laplace distribution, it is suitable for image restoration in the presence of Laplace noise. Moreover, additive uniform noise[22, 51], multiplicative Poisson noise , and multiplicative Gamma noise  have been considered in the literature. Some extensions have been made to deal with mixed Rayleigh impulse noise and mixed Poisson impulse noise in . Recently, a sparse noise model using an -norm for data fidelity has been investigated in  to remove impulse and mixed Gaussian impulse noise. In this paper, we consider -norm data fidelity and show that it is particularly suitable for reconstructing images corrupted with additive/multiplicative 222The impulse noise has a discrete nature (corrupted or uncorrupted), thus it can be viewed as additive noise or multiplicative noise. impulse noise.
Regularization Models: Several regularization models have been studied in the literature (see Table II). The Tikhonov-like regularization  function is quadratic and smooth, therefore it is relatively inexpensive to minimize with first-order smooth optimization methods. However, since this method tends to overly smooth images, it often erodes strong edges and texture details. To address this issue, the total variation (TV) regularizer was proposed by Rudin, Osher and Fatemi in  for image denoising. Several other variants of TV have been extensively studied. The original TV norm in  is isotropic, while an anisotropic variation is also used. From a numerical point of view, and cannot be directly minimized since they are not differentiable. A popular method is to use their smooth approximation and (see  for details). Very recently, the Potts model [29, 42, 9], which is based on the -norm, has received much attention. It has been shown to be particularly effective for image smoothing  and motion deblurring .
|Regularization Function||Description and References|
|Isotropic [47, 53]|
|Anisotropic [50, 60]|
|smooth TV [18, 51]|
|Potts model [56, 57]|
Optimization Algorithms: The optimization problems involved in TV-based image restoration are usually difficult due to the non-differentiability of the TV norm and the high dimensionality of the image data. In the past several decades, a plethora of approaches have been proposed, which include PDE methods based on the Euler-Lagrange equation , the interior-point method , the semi-smooth Newton method , the second-order cone optimization method , the splitting Bregman method [32, 69], the fixed-point iterative method , Nesterov’s first-order optimal method [44, 5], and alternating direction methods [50, 20, 53]. Among these methods, some solve the TV problem in its primal form , while others consider its dual or primal-dual forms [18, 23]. In this paper, we handle the TV problem with -norm data fidelity using a primal-dual formulation, where the resulting equality constrained optimization is solved using proximal Alternating Direction Method of Multipliers (PADMM). It is worthwhile to note that the Penalty Decomposition Algorithm (PDA) in  can also solve our problem, however, it lacks numerical stability. This motivates us to design a new -norm optimization algorithm in this paper.
1.2 Contributions and Organization
The main contributions of this paper are two-fold. (1) -norm data fidelity is proposed to address the TV-based image restoration problem333We are also aware of Ref.  where -norm data fidelity is considered. However, their interpretation from the MAP viewpoint is not correct. . Compared with existing models, our model is particularly suitable for image restoration in the presence of impulse noise. (2) To deal with the resulting NP-hard 444The norm problem is known to be NP-hard , since it is equivalent to NP-complete subset selection problems. norm optimization, we propose a proximal ADMM to solve an equivalent MPEC form of the problem. A preliminary version of this paper appeared in .
The rest of the paper is organized as follows. Section 2 presents the motivation and formulation of the problem for impulse noise removal. Section 3 presents the equivalent MPEC problem and our proximal ADMM solution. Section 4 discusses the connection between our method and prior work. Section 5 provides extensive and comparative results in favor of our TV method. Finally, Section 6 concludes the paper.
2 Motivation and Formulations
This work focuses on image restoration in the presence of impulse noise, which is very common in data acquisition and transmission due to faulty sensors or analog-to-digital converter errors, etc. Moreover, scratches in photos and video sequences can be also viewed as a special type of impulse noise. However, removing this kind of noise is not easy, since corrupted pixels are randomly distributed in the image and the intensities at corrupted pixels are usually indistinguishable from those of their neighbors. There are two main types of impulse noise in the literature [23, 35]: random-valued and salt-and-pepper impulse noise. Let be the dynamic range of an image, where and in this paper. We also denote the original and corrupted intensity values at position as and , respectively.
Random-valued impulse noise: A certain percentage of pixels are altered to take on a uniform random number :
Salt-and-pepper impulse noise: A certain percentage of pixels are altered to be either or :
The above definition means that impulse noise corrupts a portion of pixels in the image while keeping other pixels unaffected. Expectation maximization could be used to find the MAP estimate of
by maximizing the conditional posterior probability, the probability that occurs when
is observed. By the Bayes’ theorem, we have that
Taking the negative logarithm of the above equation, the estimate is a solution of the following minimization problem:
We now focus on the two terms in (5). (i) The expression can be viewed as a fidelity term measuring the discrepancy between the estimate and the noisy image . The choice of the likelihood depends upon the property of noise. From the definition of impulse noise given above, we have that
where is the noise density level as defined in (3) and (4) and counts the number of non-zero elements in a vector. (ii) The term in (5) is used to regularize a solution that has a low probability. We use a prior which has the Gibbs form: with . Here, is the TV prior energy functional, is a normalization factor such that the TV prior is a probability, and is the free parameter of the Gibbs measure. Replacing and into (5) and ignoring a constant, we obtain the following model:
where is a positive number related to , and . The parameter can be 1 (anisotropic TV) or (isotropic TV), and and denote the th component of the vectors and , respectively. For convenience, we define :
In order to make use of more prior information, we consider the following box-constrained model:
where is specified by the user. When is 0, it indicates the pixel in position is an outlier, while when is 1, it indicates the pixel in position is a potential outlier. For example, in our experiments, we set for the random-valued impulse noise and for the salt-and-pepper impulse noise. In what follows, we focus on optimizing the general formulation in (6).
2.2 Equivalent MPEC Reformulations
In this section, we reformulate the problem in (6) as an equivalent MPEC from a primal-dual viewpoint. First, we provide the variational characterization of the -norm using the following lemma.
For any given , it holds that
and is the unique optimal solution of the problem in (7). Here, the standard signum function sign is applied componentwise, and .
The total number of zero elements in can be computed as , where . Note that when , will be achieved by maximization, when , will be enforced by the constraint. Thus, . Since the objective function is linear, maximization is always achieved at the boundaries of the feasible solution space. Thus, the constraint of can be relaxed to , we have: .
Although the MPEC problem in (8) is obtained by increasing the dimension of the original -norm problem in (6), this does not lead to additional local optimal solutions. Moreover, compared with (6), (8) is a non-smooth non-convex minimization problem and its non-convexity is only caused by the complementarity constraint .
Such a variational characterization of the -norm is proposed in [25, 34, 27, 7, 6], but it is not used to develop any optimization algorithms for -norm problems. We argue that, from a practical perspective, improved solutions to (6) can be obtained by reformulating the -norm in terms of complementarity constraints [40, 63, 65, 64, 66, 67]. In the following section, we will develop an algorithm to solve (8) based on proximal ADMM and show that such a “lifting” technique can achieve a desirable solution of the original -norm optimization problem.
3 Proposed Optimization Algorithm
This section is devoted to the solution of (8). This problem is rather difficult to solve, because it is neither convex nor smooth. Our solution is based on the proximal ADM method, which iteratively updates the primal and dual variables of the augmented Lagrangian function of (8).
First, we introduce two auxiliary vectors and to reformulate (8) as:
Let be the augmented Lagrangian function of (9).
where , and are the Lagrange multipliers associated with the constraints , and , respectively, and is the penalty parameter. The detailed iteration steps of the proximal ADM for (9) are described in Algorithm 1. In simple terms, ADM updates are performed by optimizing for a set of primal variables at a time, while keeping all other primal and dual variables fixed. The dual variables are updated by gradient ascent on the resulting dual problem.
(S.0) Choose a starting point . Set . Select step size , , , and .
(S.1) Solve the following minimization problems with and :
(S.2) Update the Lagrange multipliers:
(S.3) if , then (S.4) Set and then go to Step (S.1).
(i) -subproblem. Proximal ADM introduces a convex proximal term to the objective. The specific form of is chosen to expedite the computation of the closed form solution. The introduction of is to guarantee strongly convexity of the subproblems.
-subproblem in (10) reduces to the following minimization problem:
After an elementary calculation, subproblem (15) can be simplified as
with . Then, the solution of (10) has the following closed form expression:
Here the parameter depends on the spectral norm of the linear matrices and . Using the definition of and the classical finite differences that and (see [4, 14, 70]), the spectral norm of can be computed by: .
-subproblem in (10) reduces to the following minimization problem:
where , . Therefore, the solution can be computed as:
(iii) -subproblem. Variable in (11) is updated by solving the following problem:
where . It is not difficult to check that for ,
and when ,
Variable in (11) is updated by solving the following problem:
where and . A simple computation yields that the solution can be computed in closed form as:
Proximal ADM has excellent convergence in practice. The global convergence of ADM for convex problems was given by He and Yuan in [33, 20] under the variation inequality framework. However, since our optimization problem in (8) is non-convex, the convergence analysis for ADM needs additional conditions. By imposing some mild conditions, Wen et al.  managed to show that the sequence generated by ADM converges to a KKT point. Along a similar line, we establish the convergence property of proximal ADM. Specifically, we have the following convergence result.
Please refer to Appendix A. ∎
Remark 1. The condition holds when the multiplier does not change in two consecutive iterations. By the boundedness of the penalty parameter and Eqs (12-14), this condition also indicates that the equality constraints in (9) are satisfied. This assumption can be checked by measuring the violation of the equality constraints. Theorem 1 indicates that when the equality constraint holds, PADMM converges to a KKT point. Though not satisfactory, it provides some assurance on the convergence of Algorithm 1.
Remark 2. Two reasons explain the good performance of our method. (i) It targets a solution to the original problem in (6). (ii) It has monotone and self-penalized properties owing to the complimentarity constraints brought on by the MPEC. Our method directly handles the complimentary constraints in (9) with . These constraints are the only sources of non-convexity for the optimization problem and they characterize the optimality of the KKT solution of (6). These special properties of MPEC distinguish it from general nonlinear optimization [65, 66, 64, 67]. We penalize the complimentary error of (which is always non-negative) and ensure that the error is decreasing in every iteration.
4 Connection with Existing Work
In this section, we discuss the connection between the proposed method -PADM and prior work.
4.1 Sparse Plus Low-Rank Matrix Decomposition
Sparse plus low-rank matrix decomposition [54, 35] is becoming a powerful tool that effectively corrects large errors in structured data in the last decade. It aims at decomposing a given corrupted image (which is of matrix form) into its sparse component () and low-rank component () by solving: . Here the sparse component represents the foreground of an image which can be treated as outliers or impulse noise, while the low-rank component corresponds to the background, which is highly correlated. This is equivalent to the following optimization problem:
which is also based on -norm data fidelity. While they consider the low-rank prior in their objective function, we consider the Total Variation (TV) prior in ours.
4.2 Convex Optimization Method
It is generally believed that is able to remove the impulse noise properly. This is because -norm provides the tightest convex relaxation for the -norm over the unit ball in the sense of -norm. It is shown in  that the problem of minimizing is equivalent to with high probability under the assumptions that (i) is sparse at the optimal solution and (ii) is a random Gaussian matrix and sufficiently “incoherent” (i.e., number of rows in is greater than its number of columns). However, these two assumptions required in  do not necessarily hold true for our optimization problem. Specifically, when the noise level of the impulse noise is high, may not be sparse at the optimal solution . Moreover, the matrix
is a square identity or ill-conditioned matrix. Generally,will only lead to a sub-optimal solution.
4.3 Adaptive Outlier Pursuit Algorithm
Very recently, Yan  proposed the following new model for image restoration in the presence of impulse noise and mixed Gaussian impulse noise:
where is the regularization parameter. They further reformulate the problem above into and then solve this problem using an Adaptive Outlier Pursuit(AOP) algorithm. The AOP algorithm is actually an alternating minimization method, which separates the minimization problem over and into two steps. By iteratively restoring the images and updating the set of damaged pixels, it is shown that AOP algorithm outperforms existing state-of-the-art methods for impulse noise denoising, by a large margin.
Despite the merits of the AOP algorithm, we must point out that it incurs three drawbacks, which are unappealing in practice. First, the formulation in (17) is only suitable for mixed Gaussian impulse noise, i.e. it produces a sub-optimal solution when the observed image is corrupted by pure impulse noise. (ii) Secondly, AOP is a multiple-stage algorithm. Since the minimization sub-problem over 555It actually reduces to the optimization problem. needs to be solved exactly in each stage, the algorithm may suffer from slow convergence. (iii) As a by-product of (i), AOP inevitably introduces an additional parameter (that specifies the Gaussian noise level), which is not necessarily readily available in practical impulse denoising problems.
In contrast, our proposed TV method is free from these problems. Specifically, (i) as have been analyzed in Section 2, i.e. our -norm model is optimal for impulse noise removal. Thus, our method is expected to produce higher quality image restorations, as seen in our results. (ii) Secondly, we have integrated -norm minimization into a unified proximal ADM optimization framework, it is thus expected to be faster than the multiple stage approach of AOP. (iii) Lastly, while the optimization problem in (17) contains two parameters, our model only contains one single parameter.
4.4 Other -Norm Optimization Techniques
Actually, the optimization technique for the -norm regularization problem is the key to removing impulse noise. However, existing solutions are not appealing. The -norm problem can be reformulated as a mixed integer programming problem which can be solved by a tailored branch-and-bound algorithm but it involves high computational complexity. The simple projection methods are inapplicable to our model since they assume the objective function is smooth. Similar to the relaxation, the convex methods such as -support norm relaxation , -largest norm relaxation , QCQP and SDP relaxations  only provide loose approximation of the original problem. The non-convex methods such as Schatten norm [28, 37], re-weighted norm , norm DC (difference of convex) approximation , the Smoothly Clipped Absolute Deviation (SCAD) penalty method, the Minimax Concave Plus (MCP) penalty method  only produce sub-optimal results since they give approximate solutions for the problem or incur high computational overhead.
We take norm approximation method for example and it may suffer two issues. First, it involves an additional hyper-parameter which may not be appealing in practice. Second, the regularized norm problem for general could be difficult to solve. This includes the iterative re-weighted least square method  and proximal point method. The former approximates by with a small parameter and solves the resulting re-weighted least squares sub-problem which reduces to a weighted problem. The latter needs to evaluate a relatively expensive proximal operator in general, except that it has a closed form solution for some special values such as and .
Recently, Lu et al. propose a Penalty Decomposition Algorithm (PDA) for solving the -norm optimization algorithm . As has been remarked in , direct ADM on the norm problem can also be used for solving minimization simply by replacing the quadratic penalty functions in the PDA by augmented Lagrangian functions. Nevertheless, as observed in our preliminary experiments and theirs, the practical performance of direct ADM is worse than that of PDA.
Actually, in our experiments, we found PDA is unstable. The penalty function can reach very large values , and the solution can be degenerate when the minimization problem of the augmented Lagrangian function in each iteration is not exactly solved. This motivates us to design a new -norm optimization algorithm in this paper. We consider a proximal ADM algorithm to the MPEC formulation of -norm since it has a primal-dual interpretation. Extensive experiments have demonstrated that proximal ADM for solving the “lifting” MPEC formulation for produces better image restoration qualities.
5 Experimental Validation
In this section, we provide empirical validation for our proposed -PADMM method by conducting extensive image denoising experiments and performing a thorough comparative analysis with the state-of-the-art.
In our experiments, we use 5 well-known test images of size . All code is implemented in MATLAB using a 3.20GHz CPU and 8GB RAM. Since past studies [11, 21] have shown that the isotropic TV model performs better than the anisotropic one, we choose as the order of the TV norm here. In our experiments, we apply the following algorithms:
(i) BM3D is an image denoising strategy based on an enhanced sparse representation in transform-domain. The enhancement of the sparsity is achieved by grouping similar 2D image blocks into 3D data arrays .
(ii) MFM, Median Filter Methods. We utilize adaptive median filtering to remove salt-and-pepper impulse noise and adaptive center-weighted median filtering to remove random-valued impulse noise.
. The method first detects the damaged pixels by MFM and then solves the TV image inpainting problem.
(v) -ADMM (direct). We directly use ADMM (Alternating Direction Method of Multipliers) to solve the non-smooth non-convex problem with proximal operator being computed analytically. We only consider in our experiments .
(vi) -AOP, the Adaptive Outlier Pursuit (AOP) method described in . We use the implementation provided by the author. Here, we note that AOP iteratively calls the - procedure, mentioned above.
(viii) -PADMM, the proximal ADMM described in Algorithm 1 for solving the optimization problem in (6). We set the relaxation parameter to 1.618 and the strongly convex parameter to . All MATLAB codes to reproduce the experiments of this paper are available online at the authors’ research webpages.
5.1 Experiment Setup
For the denoising and deblurring test, we use the following strategies to generate artificial noisy images.
(a) Denoising problem. We corrupt the original image by injecting random-value, salt-and-pepper noise, and mixed noise (half random-value and half salt-and-pepper) with different densities (10% to 90%) to the images.
(b) Deblurring problem. Although blurring kernel estimation has been pursued by many studies (e.g. ), here we assume that the blurring kernel is known beforehand. We blur the original images with a Gaussian blurring kernel and add impulse noise with different densities (10% to 90%). We use the following MATLAB scripts to generate a blurring kernel of radius ( is set to 7 in the experiments):
We run all the previously mentioned algorithms on the generated noisy and blurry images. For -AOP, we adapt the author’s image denoising implementation to the image deblurring setting. Since both BM3D and Median Filter Methods (MFM) are not convenient to solve the deblurring problems, we do not test them in the deblurring problem. We terminate -PADMM whenever and and . For -PADMM, -PDA, and -PADMM, we use the same stopping criterion to terminate the optimization. For -SBM and -AOP, we adopt the default stopping conditions provided by the authors. For the regularization parameter , we swept over . For the regularization parameter in -AOP, we swept over and set to the number of corrupted pixels.
To evaluate these methods, we compute their Signal-to-Noise Ratios (SNRs). Since the corrupted pixels follow a Bernoulli-like distribution, it is generally hard to measure the data fidelity between the original images and the recovered images. Therefore, we consider three ways to measure SNR.
where is the original clean image and is the mean intensity value of , and is the soft -norm which counts the number of elements whose magnitude is greater than a threshold . We adopt in our experiments.
|Random-Value Impulse Noise|
|Salt-and-Pepper Impulse Noise|
|Mixed Impulse Noise (Half Random-Value Noise and Half Salt-and-Pepper Noise)|