Investigating Task-driven Latent Feasibility for Nonconvex Image Modeling

10/18/2019 ∙ by Risheng Liu, et al.

Properly modeling latent image distributions plays a key role in a variety of low-level vision problems. Most existing approaches, such as Maximum A Posteriori (MAP) estimation, establish optimization models with prior regularization to address this task. However, designing sophisticated priors may lead to a challenging optimization model and a time-consuming iteration process. Recent studies have tried to embed learnable network architectures into the MAP scheme. Unfortunately, for MAP models with deeply trained priors, the exact behaviors and the inference process are hard to investigate, due to their inexact and uncontrolled nature. In this work, by investigating task-driven latent feasibility for the MAP-based model, we provide a new perspective for enforcing domain knowledge and data distributions in MAP-based image modeling. Specifically, we first introduce an energy-based feasibility constraint to the given MAP model. By applying the proximal gradient updating scheme to the objective and performing an adaptive averaging process, we obtain a completely new MAP inference process, named Proximal Average Optimization (PAO), for image modeling. Owing to the flexibility of PAO, we can also incorporate deeply trained architectures into the feasibility module. Finally, we provide a simple monotone descent-based control mechanism to guide the propagation of PAO. We prove in theory that the sequences generated by both PAO and its learning-based extension converge to a critical point of the original MAP optimization task. We demonstrate how to apply our framework to different vision applications. Extensive experiments verify the theoretical results and show the advantages of our method over existing state-of-the-art approaches.




I Introduction

Fig. 1: The task-driven feasibility paradigm. The figure compares the general optimization strategy (“A”, with generally feasible regions, abbreviated as GF) with our proposed ones (TF-PAO in “B” and LTF-PAO in “C”, with task-driven feasibility and learning-based task-driven feasibility, respectively). We demonstrate that the task-driven feasibility paradigm can significantly improve the optimization process. In particular, the learning-based scheme learns the task-driven feasibility from training data and is thus more suitable for real-world applications.

Many vision tasks are formulated as inverse problems that recover a latent image $\mathbf{x}$ from an observation $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$. Here $\mathbf{A}$ denotes a degradation matrix associated with the imaging/degradation system (such as a blur kernel, downsampling or a mask), and $\mathbf{n}$ is i.i.d. white Gaussian noise with unknown standard deviation. In most real-world scenarios, these inverse problems are challenging and mathematically ill-posed, that is, a solution either does not exist or is not unique. Over the past decades, numerous methods have been developed to address these low-level vision problems, such as optimizing designed priors [1, 2, 3, 4, 5] and deep learning-based approaches [6, 7, 8, 9, 10, 11, 12]. These inverse problems are often formulated as Maximum A Posteriori (MAP) estimation with a conditional probability $p(\mathbf{y}|\mathbf{x})$ and a prior distribution $p(\mathbf{x})$, i.e.,

$$\max_{\mathbf{x}} \; p(\mathbf{x}|\mathbf{y}) \propto p(\mathbf{y}|\mathbf{x})\,p(\mathbf{x}).$$

Taking the negative logarithm, solving the MAP problem is equivalent to a minimization problem. This problem seeks the solution with the best objective value, possibly subject to a set of constraints. Following this perspective, the MAP model can be formulated as

$$\min_{\mathbf{x}} \; \Psi(\mathbf{x}) := f(\mathbf{x}) + g(\mathbf{x}), \tag{1}$$

where the functions $f$ and $g$ capture the data-fitting loss and the regularization, respectively. In this work, we assume that the loss $f$ is smooth, while the regularization $g$ can be nonconvex and nonsmooth.

Owing to the ill-posed nature of most image processing tasks, priors must be designed to obtain desired solutions. For example, many image restoration tasks use a sparsity prior as the regularization term [1, 2]. In particular, Eq. (1) can be minimized by a broad class of general numerical optimization methods. Among them, Proximal Gradient (PG) [13], Half Quadratic Splitting (HQS) [14] and the Alternating Direction Method of Multipliers (ADMM) [15] are among the most reliable and widely used. Over the past decades, much effort has been devoted to these schemes. For example, integrating Nesterov's accelerated gradient method [16] into the fundamental PG scheme produced accelerated PG (APG), first developed for convex models [1, 17]. Subsequently, other typical APG variants were derived for solving problem (1), including monotone APG (mAPG) [18], inexact APG (niAPG) [19] and momentum APG for nonconvex problems (APGnc) [20]. Strategies that optimize designed priors provide a mathematical understanding of their behaviors with well-defined regularization properties. However, flexible and exact priors are challenging to construct and solve. Simple regularizers, in turn, perform poorly compared with state-of-the-art methods in real-world applications, because they exploit neither the particular structures of the task at hand nor the input data distribution. These limitations make it difficult to solve such problems purely by optimizing designed priors.

Unlike schemes that optimize designed priors, learning-based approaches learn mapping functions that infer the desired high-quality images from the observed ones. In recent years, various learning-based strategies [6, 9, 10] have been proposed to address practical image modeling problems. These discriminative learning-based methods combine classical numerical solvers with collected training data to obtain task-specific iterations. In a similar spirit, plug-in schemes replace the regularization term with a task-related operator; they have recently been studied extensively with great empirical success in vision problems [21, 8, 22, 23, 24]. Indeed, these algorithms outperform some state-of-the-art methods in real-world applications. Unfortunately, these plug-in schemes treat the prior term as an implicit regularization, which may break the properties and structures of the objective in Eq. (1). Thus, existing proofs only show that the iterates converge to a fixed point, without establishing its relationship to the optimal solutions. By introducing spectral normalization to constrain deep learning-based denoisers more accurately, [25] establishes a fixed-point convergence theory. However, this result holds only when $f$ is strongly convex, which is not satisfied in many vision problems. For example, let the data-fitting term be $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|^2$ with $\mathbf{A}$ not of full column rank. Its Hessian $\mathbf{A}^{\top}\mathbf{A}$ is then singular, and $f$ is not strongly convex. In contrast to these implicit plug-and-play methods, [7] develops an explicit plug-in scheme, named Regularization by Denoising (RED), with an explicit Laplacian-based regularization functional. However, RED requires a symmetry condition on the underlying denoiser (its Jacobian). This condition is not satisfied by many state-of-the-art denoisers, such as NLM [26], RF [27] and BM3D [28]. Further work has been discussed in [29, 30]. Optimality condition-based approaches, such as [31, 32], have also been developed for solving Eq. (1). However, they rely on estimating the subdifferential, which is usually only available implicitly.
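The strong-convexity remark can be checked numerically. The NumPy sketch below is our own illustration, and the two toy matrices are arbitrary assumptions rather than operators from the experiments. The data term $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|^2$ is strongly convex exactly when the smallest eigenvalue of its Hessian $\mathbf{A}^{\top}\mathbf{A}$ is positive, i.e., when $\mathbf{A}$ has full column rank.

```python
import numpy as np

def min_hessian_eigenvalue(A):
    """Smallest eigenvalue of the Hessian A^T A of f(x) = 0.5 * ||Ax - y||^2.

    f is strongly convex iff this value is positive, i.e., iff A has full column rank.
    The Hessian does not depend on y, so y is not needed here.
    """
    return float(np.linalg.eigvalsh(A.T @ A).min())

# Toy 2x3 operator (mask- or downsampling-like): cannot have full column rank.
A_wide = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0]])
# Identity operator (pure denoising): full column rank.
A_full = np.eye(3)

print(min_hessian_eigenvalue(A_wide))  # ~0: f is convex but not strongly convex
print(min_hessian_eigenvalue(A_full))  # smallest eigenvalue 1: f is 1-strongly convex
```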

To address the above issues, we develop Proximal Averaged Optimization (PAO) for the challenging MAP-based nonconvex and nonsmooth problem in Eq. (1). PAO jointly optimizes the objective and a feasibility constraint. Specifically, by enforcing task-driven latent feasibility on the MAP-type model, we develop a new perspective for incorporating domain knowledge and data distributions into image modeling (see Fig. 1). For the Task-driven Feasibility (TF) type minimization problem, we establish a TF-PAO scheme. Owing to the flexibility of PAO, we further embed a learnable form into the feasibility constraint and derive an LTF-PAO scheme. In this scheme, designed/trained architectures aim to find the task-related optimal solution. We rigorously prove that both PAO and its learning-based extension converge to a critical point of the original problem (1) under a commonly used monotone descent condition. We also demonstrate how to apply our paradigm to challenging real-world vision applications, such as image deblurring, inpainting and rain streak removal. Unlike manually designed priors solved by general numerical methods, PAO exploits the particular structures and data distributions of image processing tasks. Learning-based methodologies [21, 23, 24, 8] embed learnable network architectures into the MAP framework, but their inference processes and exact behaviors are hard to investigate. In comparison, PAO allows learnable structures to be inserted into the MAP framework without changing the properties of the objective. In summary, the main contributions of this paper are as follows:

  • PAO provides a novel perspective for introducing a task-driven feasibility module into nonconvex and nonsmooth MAP-based image modeling problems. By embedding additional learning-based deep architectures through the LTF module, PAO exploits the data distribution and the particular structure of the task.

  • TF-PAO and LTF-PAO preserve the properties and structures of the objective. Specifically, we prove that every accumulation point of the generated sequences is a critical point of Eq. (1). Moreover, under a semi-algebraic assumption, the whole sequence converges (sequence convergence).

  • Further, PAO serves as a flexible ensemble framework for solving the optimization model in Eq. (1) across different real-world computer vision tasks. Extensive experiments show the superiority of PAO on the tested problems.

II The Proposed Algorithm

In this section, by enforcing task-driven feasibility, we first reformulate the minimization problem (1) into a constrained form that can embed TF and LTF modules for vision tasks. Then, by incorporating TF and LTF, we develop PAO for nonconvex model-based image restoration problems. Finally, we establish the convergence behaviors of PAO under mild conditions on the objective function.

II-A Enforcing Task-driven Feasibility

To solve the MAP-based nonconvex minimization problem in Eq. (1), we enforce a task-driven feasibility module. This reformulates the original problem into the following constrained form:

$$\min_{\mathbf{x}} \; \Psi(\mathbf{x}) \quad \text{s.t.} \quad \mathbf{x} \in \Omega. \tag{2}$$

In general, $\Omega$ is designed as a rough estimate of the solution space, such as a bounded domain or a set defined by some equality or inequality constraints. This is a common way to characterize the solution space in image processing problems [33, 34]. In this work, we construct the energy-based latent feasibility constraint for image restoration problems through the TF module $\Omega_{\mathtt{TF}}$:

$$\Omega_{\mathtt{TF}} := \arg\min_{\mathbf{x}} \; h(\mathbf{x}) + \phi(\mathbf{x}), \tag{3}$$

where $h$ is differentiable and $\phi$ is nonsmooth and nonconvex. We assume that $h$ and $\phi$ are proper and lower-semicontinuous.

Considering the complex data distributions in real-world applications, we further introduce a Learning-based TF (LTF) module that incorporates designed/trained architectures into the optimization of Eq. (1). Specifically, the network-based building block at the $k$-th iteration is denoted as $\mathcal{N}(\cdot;\boldsymbol{\theta}_k)$, where $\boldsymbol{\theta}_k$ is the set of learnable parameters at the $k$-th training stage. The detailed structures of $\mathcal{N}$ for real-world applications are given later. The output of this block at the $k$-th iteration is denoted as the temporary variable $\mathbf{w}^k$. By further treating LTF as a proximal approximation of Eq. (3), with parameter $\mu_k$ at the $k$-th iteration, the learning-based task-driven feasibility module can be written as

$$\Omega_{\mathtt{LTF}}^{k} := \arg\min_{\mathbf{x}} \; h(\mathbf{x}) + \phi(\mathbf{x}) + \frac{\mu_k}{2}\|\mathbf{x} - \mathbf{w}^k\|^2. \tag{4}$$
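The proximal approximation can be made concrete with a deliberately simple, hypothetical instance. We assume $h(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}-\mathbf{y}\|^2$ (pure denoising) and $\phi(\mathbf{x}) = \lambda\|\mathbf{x}\|_0$; these are our choices for illustration, not the paper's. Adding the proximal term $\frac{\mu}{2}\|\mathbf{x}-\mathbf{w}\|^2$ around the network output $\mathbf{w}$ merges the two quadratics into $\frac{1+\mu}{2}\|\mathbf{x}-\mathbf{c}\|^2$ plus a constant, with $\mathbf{c} = (\mathbf{y}+\mu\mathbf{w})/(1+\mu)$. The minimizer is therefore a hard thresholding of $\mathbf{c}$ at level $\sqrt{2\lambda/(1+\mu)}$.

```python
import numpy as np

def ltf_prox_l0_denoise(y, w, lam, mu):
    """argmin_x 0.5||x - y||^2 + lam * ||x||_0 + (mu / 2)||x - w||^2 (hypothetical instance).

    The two quadratics combine into (1 + mu)/2 * ||x - c||^2 + const with
    c = (y + mu * w) / (1 + mu); the l0 term is then handled by hard thresholding.
    """
    c = (y + mu * w) / (1.0 + mu)
    thresh = np.sqrt(2.0 * lam / (1.0 + mu))
    return np.where(np.abs(c) > thresh, c, 0.0)

y = np.array([3.0, 0.1])   # toy noisy observation
w = np.array([1.0, 0.1])   # toy stand-in for the network output N(x; theta_k)
print(ltf_prox_l0_denoise(y, w, lam=0.5, mu=1.0))  # [2. 0.]
```

Each coordinate is decided independently: keeping $c_i$ costs $\lambda$, while zeroing it costs $\frac{1+\mu}{2}c_i^2$.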
The TF and LTF modules are task-related. They can be selected to preserve certain characteristics of the solution, such as smoothness, enhanced edges, low noise (via a denoiser) or sparsity. Alternatively, they can characterize the task in another domain. For example, the objective and the constraint module can be defined in the image domain and the gradient domain, respectively.

II-B Proximal Averaged Optimization

Based on model (2), we develop two PAO schemes in this section, one for each type of task-driven feasibility (TF and LTF).

II-B1 PAO with Task-driven Feasibility

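Specifically, for the objective in problem (1), we simply adopt the PG method to update the variable, which yields

$$\mathbf{u}^{k+1} \in \mathrm{prox}_{\gamma g}\big(\mathbf{x}^{k} - \gamma\nabla f(\mathbf{x}^{k})\big), \tag{5}$$

where $\mathrm{prox}_{\gamma g}(\mathbf{z}) := \arg\min_{\mathbf{x}} \; g(\mathbf{x}) + \frac{1}{2\gamma}\|\mathbf{x} - \mathbf{z}\|^2$ and $\gamma$ is the step size. For the subproblem in Eq. (3), any first-order method can be used, such as PG [1], APG [18], ADMM [15, 35] or HQS [14]. We then linearly average $\mathbf{u}^{k+1}$ and the output $\mathbf{v}^{k+1}$ of the feasibility module with a weight sequence $\{\alpha_k\}$, i.e., $\mathbf{z}^{k+1} = \alpha_k\mathbf{u}^{k+1} + (1-\alpha_k)\mathbf{v}^{k+1}$. This gives the complete TF-PAO iterations summarized in Alg. 1.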
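The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation:

- $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|^2$.
- $g = \lambda\|\cdot\|_0$, which is nonconvex and nonsmooth, with hard thresholding as its exact proximal map.
- A constant averaging weight.
- A hypothetical `feasibility_step` (here one proximal ridge step) standing in for whatever first-order solver handles Eq. (3).

The acceptance test mimics the monotone correction of Alg. 2 as described in the text.

```python
import numpy as np

def hard_threshold(z, tau):
    """Exact prox of tau * ||.||_0: keep z_i when |z_i| > sqrt(2 * tau)."""
    return np.where(np.abs(z) > np.sqrt(2.0 * tau), z, 0.0)

def psi(x, A, y, lam):
    """Psi(x) = f(x) + g(x) with f = 0.5||Ax - y||^2 and g = lam * ||x||_0 (assumed)."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.count_nonzero(x)

def tf_pao_sketch(A, y, lam, feasibility_step, alpha=0.5, max_iter=200, tol=1e-6):
    """PG step + linear averaging + monotone correction (MDUS-style), as a sketch."""
    L_f = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f
    gamma = 0.9 / L_f                          # step size below 1 / L_f
    x = np.zeros(A.shape[1])
    history = [psi(x, A, y, lam)]
    for _ in range(max_iter):
        u = hard_threshold(x - gamma * A.T @ (A @ x - y), gamma * lam)  # PG step
        v = feasibility_step(x)                # output of the feasibility module
        z = alpha * u + (1.0 - alpha) * v      # linear averaging (constant weight here)
        # Monotone correction: keep z only if it does not increase Psi relative to u.
        x_new = z if psi(z, A, y, lam) <= psi(u, A, y, lam) else u
        rel_err = np.linalg.norm(x_new - x) / max(np.linalg.norm(x), 1e-12)
        x = x_new
        history.append(psi(x, A, y, lam))
        if rel_err < tol:                      # relative-error stopping rule
            break
    return x, history

# Toy sparse recovery problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 60))
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = A @ x_true + 0.01 * rng.standard_normal(40)

# Hypothetical feasibility module: one proximal ridge step,
# v = argmin_v 0.5||Av - y||^2 + (mu / 2)||v - x||^2, solved in closed form.
mu = 1.0
M = A.T @ A + mu * np.eye(60)
ridge_step = lambda x: np.linalg.solve(M, A.T @ y + mu * x)

x_hat, hist = tf_pao_sketch(A, y, lam=0.05, feasibility_step=ridge_step)
print(len(hist), hist[0], hist[-1])
```

With $\gamma < 1/L_f$, the PG step alone already decreases $\Psi$, so the recorded objective values should be non-increasing. The correction only decides whether the averaged point may replace the PG output.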

0:  The input , , parameters and .
1:  while not converged do
2:      and .
3:     .
4:     .
5:  end while
Algorithm 1 Task-driven Feasibility PAO (TF-PAO)
1:  if  then
2:     .
3:  else
4:     .
5:  end if
6:  .
Algorithm 2 Monotone Descent Updating Scheme (MDUS)

However, the averaged temporary variable may be a bad extrapolation, which can cause the iterations to fail. To address this issue, we introduce a correction step, the Monotone Descent Updating Scheme (MDUS) in Alg. 2. MDUS ensures the descent property: the accepted iterate never has a larger objective value than the proximal gradient output. Such safeguards are common in first-order methods [17, 18, 36]. Note that in nonconvex programming, the descent property alone guarantees neither sufficient decrease of the objective nor convergence to a critical point. Owing to the proximal gradient step, however, our algorithm achieves sufficient descent. The convergence behaviors of the proposed algorithms are summarized in Section III; the proofs of all theoretical results are given in the appendix.

II-B2 PAO with Learning-based Task-driven Feasibility

By embedding designed/trained architectures into the TF module, we then develop LTF-PAO to optimize the MAP-based model in Eq. (1). As in Section II-B1, any suitable method can be selected to solve the constrained subproblem. By introducing a relatively loose boundedness condition, we summarize the complete LTF-PAO scheme in Alg. 3, where the network output approximates the task-driven module at the $k$-th iteration. The boundedness condition (Step 5 of Alg. 3) controls the iteration sequence. It prevents improperly designed/trained architectures from deflecting the iterative trajectory toward unwanted solutions, and it is monitored by checking the boundedness of the temporary iterate. The convergence behaviors are summarized in the following section.
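The role of the check in Alg. 3 can be illustrated with a toy safeguard. Both the "network" (a 3-tap moving average) and the acceptance rule $\|\mathbf{w}-\mathbf{u}\| \le C$ are hypothetical stand-ins chosen for illustration. They are not the exact quantities checked in Step 5 of Alg. 3.

```python
import numpy as np

def toy_network(x):
    """Hypothetical stand-in for a trained module: 3-tap circular moving average."""
    return (np.roll(x, -1) + x + np.roll(x, 1)) / 3.0

def safeguarded_ltf(u, network, bound):
    """Return (candidate, accepted) for a boundedness-checked learned module.

    Assumed check: accept the network output w only if ||w - u|| <= bound;
    otherwise fall back to u so an unreliable output cannot deflect the iterates.
    """
    w = network(u)
    if np.linalg.norm(w - u) <= bound:
        return w, True
    return u, False

smooth = np.ones(4)
spiky = np.array([0.0, 10.0, 0.0, 0.0])
print(safeguarded_ltf(smooth, toy_network, bound=1.0)[1])  # True: output equals input
print(safeguarded_ltf(spiky, toy_network, bound=1.0)[1])   # False: ||w - u|| is about 8.16
```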

0:  The input , , , parameters , and .
1:  while not converged do
2:     , .
3:     .
4:     .
5:     if  then
6:        .
7:     else
8:        , .
9:     end if
10:     .
11:  end while
Algorithm 3 Learning-based TF-PAO (LTF-PAO)

III Convergence Results

In this section, we discuss the convergence behaviors of the proposed TF-PAO and LTF-PAO algorithms. We refer readers to [37] for the definitions from variational analysis used below, such as properness, lower semicontinuity, coercivity and the limiting subdifferential. Our convergence analysis is based on the following fairly loose assumption.

Assumption 1.

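The objective function $\Psi = f + g$ in Eq. (1) is proper, lower-semicontinuous and coercive. The function $f$ is convex and Lipschitz smooth, i.e., for all $\mathbf{x}_1, \mathbf{x}_2$, we have $\|\nabla f(\mathbf{x}_1) - \nabla f(\mathbf{x}_2)\| \le L_f\|\mathbf{x}_1 - \mathbf{x}_2\|$, where $L_f$ is the Lipschitz constant of $\nabla f$.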
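Consider the common least-squares data term $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|^2$. Its gradient is $\mathbf{A}^{\top}(\mathbf{A}\mathbf{x}-\mathbf{y})$, and $L_f = \|\mathbf{A}\|_2^2$, the squared largest singular value. The sketch below uses random toy data (our own example). It computes $L_f$ and checks the Lipschitz inequality on random pairs. In practice this value bounds the admissible PG step size, $\gamma < 1/L_f$.

```python
import numpy as np

def lipschitz_least_squares(A):
    """L_f = ||A||_2^2 for f(x) = 0.5 ||Ax - y||^2 (largest singular value squared)."""
    return np.linalg.norm(A, 2) ** 2

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
y = rng.standard_normal(30)
L_f = lipschitz_least_squares(A)

def grad_f(x):
    return A.T @ (A @ x - y)

# Empirical check of ||grad f(x1) - grad f(x2)|| <= L_f ||x1 - x2|| on random pairs.
ratios = []
for _ in range(100):
    x1, x2 = rng.standard_normal(20), rng.standard_normal(20)
    ratios.append(np.linalg.norm(grad_f(x1) - grad_f(x2)) / np.linalg.norm(x1 - x2))

gamma = 0.9 / L_f   # a PG step size satisfying gamma < 1 / L_f
print(max(ratios) <= L_f * (1 + 1e-12))  # True
```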

Theorem 1.

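Let $\{\mathbf{x}^k\}$ be the sequence generated by Alg. 1. Let $\mathbf{u}^{k+1} \in \mathrm{prox}_{\gamma g}\big(\mathbf{x}^k - \gamma\nabla f(\mathbf{x}^k)\big)$ denote the proximal gradient output in Step 2. Then the following results hold.

  • There exists a constant $c > 0$ such that $\Psi(\mathbf{x}^{k+1}) \le \Psi(\mathbf{x}^{k}) - c\|\mathbf{u}^{k+1} - \mathbf{x}^{k}\|^2$.

  • Any accumulation point $\mathbf{x}^*$ of $\{\mathbf{x}^k\}$ is a critical point of the minimization problem in Eq. (1), i.e., $0 \in \partial\Psi(\mathbf{x}^*)$.

Proof. We first show the sufficient descent property of $\{\Psi(\mathbf{x}^k)\}$. Since $\mathbf{u}^{k+1}$ minimizes the proximal subproblem in Step 2 of Alg. 1, comparing it with the candidate $\mathbf{x}^k$ gives
$$g(\mathbf{u}^{k+1}) + \langle \nabla f(\mathbf{x}^k), \mathbf{u}^{k+1} - \mathbf{x}^k\rangle + \frac{1}{2\gamma}\|\mathbf{u}^{k+1} - \mathbf{x}^k\|^2 \le g(\mathbf{x}^k).$$
The Lipschitz property of $\nabla f$ gives
$$f(\mathbf{u}^{k+1}) \le f(\mathbf{x}^k) + \langle \nabla f(\mathbf{x}^k), \mathbf{u}^{k+1} - \mathbf{x}^k\rangle + \frac{L_f}{2}\|\mathbf{u}^{k+1} - \mathbf{x}^k\|^2.$$
Adding the two inequalities yields $\Psi(\mathbf{u}^{k+1}) \le \Psi(\mathbf{x}^k) - c\|\mathbf{u}^{k+1} - \mathbf{x}^k\|^2$ with $c = \frac{1}{2}\big(\frac{1}{\gamma} - L_f\big)$, which is positive for $\gamma < 1/L_f$. The MDUS correction guarantees $\Psi(\mathbf{x}^{k+1}) \le \Psi(\mathbf{u}^{k+1})$, so the sufficient descent property holds with this $c$.

Next, we show the second item. Since $\Psi$ is proper, lower-semicontinuous and coercive, it is bounded below. Summing the sufficient descent inequality over $k$ therefore yields $\sum_{k}\|\mathbf{u}^{k+1} - \mathbf{x}^k\|^2 < \infty$, so $\|\mathbf{u}^{k+1} - \mathbf{x}^k\| \to 0$. Hence, if $\mathbf{x}^{k_j} \to \mathbf{x}^*$ along a subsequence, then $\mathbf{u}^{k_j+1} \to \mathbf{x}^*$ as well. Comparing $\mathbf{u}^{k_j+1}$ with the candidate $\mathbf{x}^*$ in the proximal subproblem and taking the limit superior gives $\limsup_j g(\mathbf{u}^{k_j+1}) \le g(\mathbf{x}^*)$. Together with the lower semicontinuity of $g$, this gives $g(\mathbf{u}^{k_j+1}) \to g(\mathbf{x}^*)$, and thus $\Psi(\mathbf{u}^{k_j+1}) \to \Psi(\mathbf{x}^*)$. From the optimality condition of the proximal step, we know that
$$-\nabla f(\mathbf{x}^{k_j}) - \frac{1}{\gamma}\big(\mathbf{u}^{k_j+1} - \mathbf{x}^{k_j}\big) \in \partial g(\mathbf{u}^{k_j+1}),$$
and therefore
$$\nabla f(\mathbf{u}^{k_j+1}) - \nabla f(\mathbf{x}^{k_j}) - \frac{1}{\gamma}\big(\mathbf{u}^{k_j+1} - \mathbf{x}^{k_j}\big) \in \partial\Psi(\mathbf{u}^{k_j+1}).$$
The norm of this element is at most $\big(L_f + \frac{1}{\gamma}\big)\|\mathbf{u}^{k_j+1} - \mathbf{x}^{k_j}\| \to 0$. By the closedness of the limiting subdifferential, $0 \in \partial\Psi(\mathbf{x}^*)$, i.e., $\mathbf{x}^*$ is a critical point. This completes the proof. ∎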
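The sufficient descent inequality in the first item can be checked numerically for a concrete nonconvex instance. This sanity check is our own and assumes $f = \frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|^2$, $g = \lambda\|\cdot\|_0$ and $c = \frac{1}{2}(1/\gamma - L_f)$. It tests $\Psi(\mathbf{u}^{k+1}) \le \Psi(\mathbf{x}^{k}) - c\|\mathbf{u}^{k+1}-\mathbf{x}^{k}\|^2$ at random sparse points.

```python
import numpy as np

def prox_l0(z, tau):
    """Exact proximal map of tau * ||.||_0 (hard thresholding)."""
    return np.where(np.abs(z) > np.sqrt(2.0 * tau), z, 0.0)

rng = np.random.default_rng(2)
A = rng.standard_normal((25, 40))
y = rng.standard_normal(25)
lam = 0.1
L_f = np.linalg.norm(A, 2) ** 2
gamma = 0.5 / L_f                       # any gamma < 1 / L_f is admissible
c = 0.5 * (1.0 / gamma - L_f)           # descent constant from the proof (= L_f / 2 here)

def psi(x):
    """Psi(x) = 0.5 ||Ax - y||^2 + lam * ||x||_0."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.count_nonzero(x)

worst_gap = np.inf
for _ in range(200):
    x = rng.standard_normal(40) * (rng.random(40) < 0.3)     # random sparse point
    u = prox_l0(x - gamma * A.T @ (A @ x - y), gamma * lam)   # PG step
    gap = psi(x) - c * np.sum((u - x) ** 2) - psi(u)          # should be >= 0
    worst_gap = min(worst_gap, gap)
print(worst_gap >= -1e-9)  # True
```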

Fig. 2: Effect of task-driven feasibility on minimizing Eq. (1), with PSNR / SSIM scores. (a) Input image. (b) Result of optimizing Eq. (1): 28.1463 / 0.8064. (c) Result of the LTF module in Eq. (4): 28.6781 / 0.8436. (d) Result of optimizing Eq. (2) with LTF-PAO: 29.2157 / 0.8447.


Note that the proposed algorithm is a modified PG scheme. Under mild conditions, for example the semi-algebraic property of $\Psi$, convergence of the whole sequence (sequence convergence) still holds.

Theorem 2.

Suppose $\Psi$ is semi-algebraic. See [38] for the formal definition; many functions arising in learning and vision are semi-algebraic, including the $\ell_0$ norm, rational $\ell_p$ norms (i.e., $p = p_1/p_2$ with positive integers $p_1$ and $p_2$) and their finite sums or products. Then the sequence $\{\mathbf{x}^k\}$ generated by Alg. 1 has finite length, i.e.,

$$\sum_{k=0}^{\infty}\|\mathbf{x}^{k+1} - \mathbf{x}^{k}\| < \infty.$$
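Proof. By the KL property (see [38]) and the definition of the subdifferential, for all sufficiently large $k$ (with $\Psi(\mathbf{x}^k) > \Psi(\mathbf{x}^*)$), we have
$$\varphi'\big(\Psi(\mathbf{x}^{k}) - \Psi(\mathbf{x}^{*})\big)\,\mathrm{dist}\big(0, \partial\Psi(\mathbf{x}^{k})\big) \ge 1,$$
where $\varphi$ is the desingularizing function. From the concavity of $\varphi$, we obtain
$$\varphi\big(\Psi(\mathbf{x}^{k}) - \Psi(\mathbf{x}^{*})\big) - \varphi\big(\Psi(\mathbf{x}^{k+1}) - \Psi(\mathbf{x}^{*})\big) \ge \varphi'\big(\Psi(\mathbf{x}^{k}) - \Psi(\mathbf{x}^{*})\big)\big(\Psi(\mathbf{x}^{k}) - \Psi(\mathbf{x}^{k+1})\big).$$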
If we denote and , the inequality holds which implies Subsequently, we have

Obviously, the above inequality implies the finite length of sequence . If , the inequality holds. If , with the proper, lower-semicontinuous and coercive property of , the sequence is bounded. Then with the update scheme about and the finite length of , we have

This completes the proof. ∎

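Indeed, the finite-length property in Theorem 2 implies that, for any $m > n$, $\|\mathbf{x}^{m} - \mathbf{x}^{n}\| \le \sum_{k=n}^{m-1}\|\mathbf{x}^{k+1} - \mathbf{x}^{k}\| \to 0$ as $n \to \infty$. It follows that $\{\mathbf{x}^k\}$ is a Cauchy sequence and hence converges to some $\mathbf{x}^*$, which by Theorem 1 is a critical point of Eq. (1). This global convergence of the whole sequence is also referred to as sequence convergence.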

Remark 1.

Following the analysis in Section II-B, the objective function also decreases sufficiently in Alg. 3. The convergence results for Alg. 3 can therefore be obtained in the same manner as in Theorem 1. The temporary iterate in Alg. 3 is bounded owing to the checking condition, which implies the boundedness of $\{\mathbf{x}^k\}$. Consequently, under the semi-algebraic assumption of Theorem 2, Alg. 3 also attains sequence convergence of $\{\mathbf{x}^k\}$.

IV Applications

We emphasize that, unlike existing image modeling approaches, PAO allows us to introduce an application-specific task-driven feasibility module when solving the optimization model in Eq. (1). This section first considers non-blind deconvolution and image inpainting, taking non-blind deconvolution as an illustrative example of how to establish PAO. We then extend PAO to the more challenging task of single image rain streak removal.

IV-A Image Deconvolution

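Here we consider a non-blind deconvolution problem, which aims to recover a latent image $\mathbf{u}$ from a blurred observation $\mathbf{y}$. We formulate this problem with the sparse coding model $\mathbf{u} = \mathbf{W}\mathbf{x}$, where $\mathbf{x}$ denotes the sparse code and $\mathbf{W}$ is a given dictionary. Following standard settings in image processing, we define $\mathbf{W}$ as the inverse wavelet transform. The observation model is then $\mathbf{y} = \mathbf{K}\mathbf{W}\mathbf{x} + \mathbf{n}$, where $\mathbf{K}$ is the matrix of the blur kernel and $\mathbf{n}$ is unknown noise; that is, the degradation matrix is $\mathbf{A} = \mathbf{K}\mathbf{W}$. This yields a specific case of Eq. (1), with the data-fitting term $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{K}\mathbf{W}\mathbf{x} - \mathbf{y}\|^2$ and a sparsity-promoting regularizer $g$ on the code $\mathbf{x}$. It can be written in the following intuitive form:

$$\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{K}\mathbf{W}\mathbf{x} - \mathbf{y}\|^2 + g(\mathbf{x}). \tag{6}$$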
In the following, we illustrate PAO with two module settings: task-driven feasibility and learning-based task-driven feasibility.

IV-A1 With Task-driven Feasibility

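For the TF module, we aim to introduce a relatively simple, task-related model that enforces our distribution assumptions on the latent image. We therefore adopt the widely used Total Variation (TV) model [39] in the image domain, i.e.,

$$\min_{\mathbf{u}} \; \frac{1}{2}\|\mathbf{K}\mathbf{u} - \mathbf{y}\|^2 + \lambda\|\nabla\mathbf{u}\|_1, \tag{7}$$

where $\lambda$ is a threshold parameter, $\nabla = [\nabla_h; \nabla_v]$, and $\nabla_h$ and $\nabla_v$ denote the gradients in the horizontal and vertical directions, respectively. Since any suitable operator can be selected for solving Eq. (7), we apply the HQS scheme to update $\mathbf{u}$ in this paper. HQS introduces two auxiliary variables, one for each gradient direction, with two constant penalty parameters. It then alternates between a quadratic update of $\mathbf{u}$ and updates of the auxiliary variables. The auxiliary variables are updated by the proximal operator, whose details we omit here. Then, applying the proximal gradient step to the sparse code $\mathbf{x}$ yields

$$\mathbf{x}^{k+1} \in \mathrm{prox}_{\gamma g}\big(\mathbf{x}^{k} - \gamma\,\mathbf{W}^{\top}\mathbf{K}^{\top}(\mathbf{K}\mathbf{W}\mathbf{x}^{k} - \mathbf{y})\big).$$

The detailed updating steps then follow directly from Alg. 1.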
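A compact sketch of an HQS solver for the TV subproblem (7) follows. It is an illustration under stated assumptions rather than the paper's code:

- Anisotropic TV with circular boundary conditions, so both subproblems diagonalize under the FFT.
- A single penalty parameter `beta` shared by the two auxiliary variables; the text mentions two constants, and we set them equal.
- A fixed number of iterations.
- A blur with nonzero DC gain.

```python
import numpy as np

def otf_of(op, shape):
    """Transfer function of a circular shift-invariant operator: FFT of its impulse response."""
    delta = np.zeros(shape)
    delta[0, 0] = 1.0
    return np.fft.fft2(op(delta))

def grad_h(u):
    return np.roll(u, -1, axis=1) - u   # horizontal forward difference (circular)

def grad_v(u):
    return np.roll(u, -1, axis=0) - u   # vertical forward difference (circular)

def soft(z, t):
    """Prox of t * |.|, applied elementwise (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def tv_hqs(y, K_otf, lam, beta=10.0, iters=50):
    """HQS sketch for min_u 0.5||Ku - y||^2 + lam * (||grad_h u||_1 + ||grad_v u||_1).

    Assumes circular boundaries and a blur with nonzero DC gain (K_otf[0, 0] != 0).
    """
    Dh, Dv = otf_of(grad_h, y.shape), otf_of(grad_v, y.shape)
    Y = np.fft.fft2(y)
    denom = np.abs(K_otf) ** 2 + beta * (np.abs(Dh) ** 2 + np.abs(Dv) ** 2)
    u = y.copy()
    for _ in range(iters):
        # Auxiliary-variable updates: soft thresholding of the current gradients.
        p_h = soft(grad_h(u), lam / beta)
        p_v = soft(grad_v(u), lam / beta)
        # Quadratic u-update, solved exactly in the Fourier domain.
        rhs = np.conj(K_otf) * Y + beta * (np.conj(Dh) * np.fft.fft2(p_h)
                                           + np.conj(Dv) * np.fft.fft2(p_v))
        u = np.real(np.fft.ifft2(rhs / denom))
    return u

# Usage on a toy piecewise-constant image blurred by a 3x3 box filter (circular).
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
box = lambda u: sum(np.roll(u, (i, j), axis=(0, 1))
                    for i in (-1, 0, 1) for j in (-1, 0, 1)) / 9.0
K_otf = otf_of(box, img.shape)
y = np.real(np.fft.ifft2(K_otf * np.fft.fft2(img)))
u_hat = tv_hqs(y, K_otf, lam=0.01)
```

Computing each transfer function as the FFT of the operator's response to a unit impulse avoids hand-placing kernels. This works for any circular, shift-invariant operator.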

IV-A2 With Learning-based Task-driven Feasibility

Specifically, in this work, the network is built with a residual formulation. The learning-based iteration step uses a basic network unit whose learnable parameters at the $k$-th training stage are $\boldsymbol{\theta}_k$. Our network has nineteen layers: seven convolution layers, six ReLU layers, five batch normalization layers and one loss layer. Further details are given in the experimental results section. Standard training strategies can be directly adopted to optimize the parameters of this basic architecture. If necessary, the parameters of the whole network can then be jointly fine-tuned after the design phase. Setting the temporary variable to the network output gives the learning-based scheme for the feasibility module. Combining it with the proximal gradient iteration for the sparse code yields the LTF-PAO updates for deconvolution.

By treating the latent image as the common argument, we obtain a PAO that integrates both synthesis (sparse coding) and analysis (TV) mechanisms. It can address different vision applications, including deblurring, inpainting and rain streak removal. Here the matrix $\mathbf{A}$ formulates the observation (forward) model of a particular imaging task. Possible choices of $\mathbf{A}$ include:

- an identity operator for denoising;
- convolution operators for deblurring;
- filtered subsampling operators for super-resolution;
- the Fourier $k$-space subsampling operator for magnetic resonance imaging (MRI) reconstruction;
- a mask for image inpainting.

We incorporate designed and trained network architectures into PAO to solve these problems. In summary, PAO can integrate the advantages of different sources of domain knowledge.

Fig. 3: Comparison of different modularization settings of PAO with additional Gaussian noise. (a) PSNR of TF-PAO with different first-order methods for the TF subproblem. (b) PSNR of LTF-PAO with different data-ensemble structures, namely BM3D, CNN, RF and TV.
Fig. 4: Iteration behaviors of PAO in different settings with additional Gaussian noise level 1‰. (a) and (b) plot the change of each updated variable between consecutive iterations for TF-PAO and LTF-PAO, respectively. (c) plots the error control condition of LTF-PAO.
Fig. 5: Iteration behaviors of TF-PAO and LTF-PAO compared with classical first-order methods, with additional Gaussian noise level 1‰. The compared methods include exact ones (PG, mAPG), inexact APG (niAPG) and APGnc.

IV-B Rain Streak Removal

This subsection focuses on single image rain streak removal, a challenging real-world computer vision problem. A rainy image $\mathbf{O}$ is often modeled as a linear combination of a rain-free background $\mathbf{B}$ and a rain streak layer $\mathbf{R}$, i.e., $\mathbf{O} = \mathbf{B} + \mathbf{R}$. In designing the optimization model, rather than exploiting complex priors, we consider the fundamental energy-based sparsity of the observation in a certain transform domain as

where two positive constants weight the corresponding terms. The indicator function $\delta_{\mathcal{C}}$ of a set $\mathcal{C}$ is defined by $\delta_{\mathcal{C}}(\mathbf{z}) = 0$ if $\mathbf{z} \in \mathcal{C}$ and $\delta_{\mathcal{C}}(\mathbf{z}) = +\infty$ otherwise. The sparse codes of the background and rain streak layers serve as two auxiliary variables for the subproblem. For the feasibility set in Eq. (2), we consider the general TV regularization in the following form

Here, as in [8], we first introduce two residual networks, one for the background layer and one for the rain streak layer, and denote the corresponding temporary variables for the two layers. For the background layer, we follow [8] to build a series of denoising CNNs, which recover natural image content well. For the rain streak layer, the network learns rain streak behavior from pairs of rainy images and synthetic rain layers, treated as degraded/clean image pairs. We update the background and rain streak variables synchronously

where the auxiliary variables are updated by the proximal gradient operator. Similarly, we obtain

Then, following the steps of LTF-PAO, we obtain the complete updating scheme.

Methods 1 2 3
APG 27.32/0.71 25.61/0.63 24.63/0.57
mAPG 26.68/0.67 25.20/0.60 24.39/0.55
niAPG 27.24/0.73 25.63/0.64 24.76/0.61
FTVD 27.56/0.77 26.63/0.73 24.88/0.62
Ours 28.48/0.81 27.06/0.75 26.13/0.71
TABLE I: Averaged PSNR/SSIM on the benchmark image set [40]. The first row gives the Gaussian noise level; the first column lists the compared traditional methods.
Fig. 6: Non-blind image deconvolution results of the proposed TF-PAO and LTF-PAO compared with proximal-based first-order methods, with PSNR / SSIM scores: PG 30.8229 / 0.9108, mAPG 38.2246 / 0.9247, niAPG 38.2269 / 0.9251, APGnc 38.0961 / 0.9324, TF-PAO (Ours) 38.3633 / 0.9581 and LTF-PAO (Ours) 40.4552 / 0.9802.
Fig. 7: Comparisons of non-blind image deconvolution results with state-of-the-art methods on a challenging real-world blurry image.
Levin 31.35/0.90 29.38/0.88 31.65/0.93 31.55/0.87
Sun 30.79/0.86 30.67/0.85 32.44/0.89 31.55/0.88
Times 48.66 6.38 721.98 0.50
Methods MLP IRCNN FDN Ours
Levin 31.32/0.90 32.28/0.92 32.04/0.93 32.98/0.94
Sun 31.47/0.88 32.61/0.89 32.65/0.89 32.90/0.90
Times 4.59 16.67 2.70 2.41
TABLE II: Averaged quantitative comparison (PSNR/SSIM and running time) of image deblurring on Levin et al.'s and Sun et al.'s benchmarks.

V Experimental Results

In this section, we first verify our theoretical results by investigating the iteration behaviors of TF-PAO and LTF-PAO on the standard deconvolution formulation in Eq. (6). We then compare LTF-PAO with both general and learning-based state-of-the-art methods on different vision applications. All experiments were conducted on a computer with an Intel Core i7-7700 CPU (3.6 GHz), 32GB RAM and an NVIDIA GeForce GTX 1060 6GB GPU.

V-A Theoretical Verifications

To verify our theoretical results, we perform experiments on non-blind deconvolution, which can be directly addressed by TF-PAO and LTF-PAO.

V-A1 Modularization Settings

We first compare the optimization models on image deblurring; the results are shown in Fig. 2. PAO with the LTF module performs better than both the model in Eq. (1) and the LTF module in Eq. (4) alone. This illustrates the effectiveness of the proposed PAO scheme.

We then analyze the performance and flexibility of PAO with different operator settings; the corresponding PSNR (peak signal-to-noise ratio) results are plotted in Fig. 3. For solving the subproblem (7) in the TF module, we consider four first-order methods, namely PG, APG, HQS and ADMM, as mentioned in Section II-B. The PSNR scores of TF-PAO under these four strategies are plotted in Fig. 3 (a). The choice of method for this subproblem has only a slight influence on the performance of TF-PAO. We adopt HQS to compute the corresponding iteration steps in both TF-PAO and LTF-PAO. For a fair comparison, we keep all parameters the same across the four settings. Hereafter, we use the relative error $\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\|/\|\mathbf{x}^{k}\|$ as the stopping criterion.

Fig. 8: Image inpainting results on a challenging example image with missing pixels. The PSNR / SSIM scores of the six panels are 24.02 / 0.79, 25.57 / 0.84, 23.21 / 0.76, 26.00 / 0.89, 25.95 / 0.85 and 29.02 / 0.92.

To analyze the effect of the network-based block, we adopt four different task-specific structures under the LTF-PAO scheme: TV [39], RF [27], CNNs and BM3D [28]. For the CNN architecture, we use a residual network consisting of nineteen layers:

- seven dilated convolutions;
- six ReLU operations, placed between consecutive convolution layers;
- five batch normalizations, placed between convolution and ReLU, except after the first convolution layer.

In the training stage, we randomly select 800 natural images from the ImageNet database [41] and crop them into patches. Fig. 3 (b) plots the PSNR obtained with the four structures. As can be seen, LTF-PAO performs better and converges faster with CNNs than with the others. Hence, we use CNNs hereafter.

We further compare LTF-PAO with traditional methods under three Gaussian noise levels on the image set collected by [40]; the PSNR and SSIM results are reported in Tab. I. LTF-PAO outperforms all the classical numerical solvers at every noise level. Its PSNR exceeds that of the strongest competitor (FTVD) by 0.43 to 1.25 dB.

Mask Text
TV 32.22/0.93 29.20/0.86 26.07/0.74 35.29/0.97
FoE 34.01/0.90 30.81/0.81 27.64/0.65 37.05/0.95
VNL 27.55/0.91 26.13/0.85 24.23/0.75 28.58/0.95
ISDSB 31.32/0.91 28.23/0.83 24.92/0.70 34.91/0.96
WNNM 31.75/0.94 28.71/0.89 25.63/0.78 34.89/0.97
IRCNN 34.92/0.95 31.45/0.91 26.44/0.79 37.26/0.97
Ours 34.94/0.96 31.61/0.91 27.88/0.81 37.38/0.98
TABLE III: Averaged quantitative comparison (PSNR/SSIM) of image inpainting on the CBSD68 dataset [42]. The first row gives the proportion of masked pixels (the last column uses text masks); the first column lists the compared inpainting methods.

V-A2 Convergence of PAO

Next, we illustrate the convergence behaviors of the PAO schemes. To evaluate how the variables in PAO evolve, Fig. 4 (a) and (b) plot the changes between consecutive iterations of the main variable and of the intermediate variables of TF-PAO and LTF-PAO. In Fig. 4 (a), the iteration behaviors of TF-PAO empirically confirm the boundedness of the iterates. Similarly, Fig. 4 (b) plots the convergence curves of LTF-PAO. To illustrate the iteration steps of LTF-PAO, Fig. 4 (c) shows the selection (boundedness-checking) condition of Alg. 3, which confirms the boundedness of the corresponding iterates.

Since TF-PAO and LTF-PAO are proximal-based methods, we compare them with existing proximal-based first-order approaches, with additional noise level 1‰. The compared methods are the classical proximal gradient (PG), monotone APG (mAPG) [18], inexact APG (niAPG) [19] and momentum APG for nonconvex problems (APGnc) [20]. Fig. 5 reports four measures: the log-transformed relative error, the reconstruction error with respect to the ground truth, the functional value and PSNR. All methods use the same stopping criterion. TF-PAO converges faster than the other PG variants, and LTF-PAO performs best in terms of both PSNR and the number of iterations. The corresponding visual results, with PSNR and SSIM (structural similarity) scores, are shown in Fig. 6: LTF-PAO removes more noise while preserving details.
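The PSNR scores and the relative-error stopping statistic used throughout this section follow standard definitions. A minimal sketch is below; it assumes intensities scaled to $[0, 1]$, so the peak value is 1, and it takes the relative error to be $\|\mathbf{x}^{k+1}-\mathbf{x}^{k}\|/\|\mathbf{x}^{k}\|$. For SSIM, scikit-image provides a reference implementation (`skimage.metrics.structural_similarity`), alongside `skimage.metrics.peak_signal_noise_ratio` for PSNR.

```python
import numpy as np

def psnr(x, ref, peak=1.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE); assumes intensities scaled to [0, peak]."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def relative_change(x_new, x_old, eps=1e-12):
    """Relative error ||x_new - x_old|| / ||x_old||, a common stopping statistic."""
    return float(np.linalg.norm(x_new - x_old) / max(np.linalg.norm(x_old), eps))

ref = np.zeros((8, 8))
print(psnr(ref + 0.1, ref))   # approximately 20 dB, since MSE = 0.01 and peak = 1
```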

32.87 / 0.91 32.12 / 0.92 29.69 / 0.86 33.40 / 0.96 28.18 / 0.89 37.10 / 0.97
Fig. 9: Rain streak removal results on a synthesized image from Test1 (top row) and on real-world rainy images (middle and bottom rows).

V-B State-of-the-art Comparisons

We then evaluated LTF-PAO on a variety of low-level vision applications, including image deblurring, image inpainting and rain streak removal.

V-B1 Image Deblurring

In this task, the degradation matrix stated in the application part is the blur kernel, and the observation is the blurry image. As usual, the blurry images are synthesized by applying a blur kernel and adding Gaussian noise, with circular boundary conditions assumed for the convolution. We reported the results of LTF-PAO on Sun et al.’s challenging benchmark [43] and Levin et al.’s dataset [44], together with other state-of-the-art methods, including traditional methods (e.g., IDDBM3D [45] and TV [46]), parameter-learning-based methods (e.g., EPLL [47] and CSF [40]) and network-based methods (e.g., MLP [48], IRCNN [8] and FDN [49]). As Tab. II shows, our method obtained the best quantitative performance (in terms of PSNR and SSIM) on both Sun et al.’s and Levin et al.’s datasets. Moreover, we present visual comparisons on real blurry images [50], whose unknown blur kernels are roughly estimated by Pan et al.’s method [51]. As shown in Fig. 7, our method preserves more details.
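Under circular boundary conditions the blur operator is diagonalized by the 2-D DFT, so synthesizing a blurry observation, and applying the operator or its adjoint inside an iterative solver, reduces to pointwise products in the Fourier domain. The sketch below assumes a grayscale image in [0, 1]; the helper names `psf2otf` (after the MATLAB function of the same idea), `blur_circular`, `blur_adjoint` and `synthesize_blurry`, the box kernel and the noise level `sigma` are illustrative, not taken from the paper.

```python
import numpy as np

def psf2otf(kernel, shape):
    """Zero-pad the kernel to `shape` and circularly shift its center to (0, 0),
    so its 2-D DFT is the transfer function of circular convolution."""
    pad = np.zeros(shape)
    kh, kw = kernel.shape
    pad[:kh, :kw] = kernel
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def blur_circular(x, kernel):
    """A x: circular convolution of image x with the blur kernel."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * psf2otf(kernel, x.shape)))

def blur_adjoint(z, kernel):
    """A^T z: correlation, i.e. multiplication by the conjugate transfer function."""
    return np.real(np.fft.ifft2(np.fft.fft2(z) * np.conj(psf2otf(kernel, z.shape))))

def synthesize_blurry(x, kernel, sigma, rng):
    """y = k (circularly convolved with) x + n, n white Gaussian with std `sigma`."""
    return blur_circular(x, kernel) + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x = rng.random((64, 64))        # stand-in for a clean image in [0, 1]
k = np.ones((5, 5)) / 25.0      # hypothetical 5x5 box blur
y = synthesize_blurry(x, k, sigma=0.01, rng=rng)
```

The same two FFT-based operators give the data-term gradient A^T(Ax - y) needed by proximal-gradient updates at O(N log N) cost per iteration.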

V-B2 Image Inpainting

In the image inpainting task, the matrix is a mask and the observation is the image with missing pixels; the task aims to recover the missing pixels of the observation. Here we compared LTF-PAO with TV [39], FoE [52], VNL [53], ISDSB [54], WNNM [55] and IRCNN [8]. We normalized the pixel values and generated random masks with three different missing-pixel ratios on the CBSD68 dataset [42]. Moreover, we collected 12 different text masks to further evaluate the proposed method. Tab. III presents the PSNR and SSIM results for the different masks: our method performs better than the state-of-the-art approaches regardless of the mask proportion. To compare visual quality, Fig. 8 shows the recovered images of LTF-PAO and of the five methods with the highest scores (TV, FoE, ISDSB, WNNM and IRCNN). Our approach recovers the images with better visual quality, especially in the zoomed-in regions with rich details.
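With a mask M as the degradation matrix, the observation is y = M ⊙ x, and the data-fidelity proximal step used by proximal schemes has a closed form: setting the gradient of 0.5||M ⊙ x − y||² + ||x − v||²/(2γ) to zero and using M ⊙ M = M for a binary mask gives x = (γ M ⊙ y + v) / (γ M + 1) elementwise. The sketch assumes that 1 marks an observed pixel and 0 a missing one; `random_mask`, `prox_data_inpainting`, the 50% ratio and the parameter `gamma` are illustrative names and values, not the paper's.

```python
import numpy as np

def random_mask(shape, missing_ratio, rng):
    """Binary mask: 1 = observed pixel, 0 = missing (assumed convention)."""
    return (rng.random(shape) >= missing_ratio).astype(float)

def prox_data_inpainting(v, mask, y, gamma):
    """argmin_x 0.5*||mask*x - y||^2 + ||x - v||^2 / (2*gamma), elementwise.
    Observed pixels blend v with y; missing pixels simply keep v."""
    return (gamma * mask * y + v) / (gamma * mask + 1.0)

rng = np.random.default_rng(0)
x = rng.random((64, 64))                 # stand-in clean image in [0, 1]
mask = random_mask(x.shape, 0.5, rng)    # hypothetical 50% missing pixels
y = mask * x                             # observation with missing pixels zeroed
v = rng.random(x.shape)                  # e.g. the output of a prior/feasibility step
z = prox_data_inpainting(v, mask, y, gamma=2.0)
```

Because the step is elementwise, its cost is negligible next to the prior or network module, and missing pixels are filled entirely by whatever the prior step produced.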

Methods Test1 (PSNR/SSIM) Test2 (PSNR/SSIM) Rain100H (PSNR/SSIM)
JCAS 31.61/0.9183 28.37/0.9050 15.23/0.5150
GMM 32.33/0.9042 29.57/0.8878 14.26/0.4225
DN 30.30/0.9151 27.34/0.9009 13.72/0.4417
DDN 33.41/0.9442 29.91/0.9433 17.93/0.5655
UGSM 33.30/0.9253 27.07/0.9220 14.90/0.4674
JORDER 35.93/0.9530 35.11/0.9732 23.45/0.7490
DID-MDN 29.08/0.9015 27.92/0.8695 17.28/0.6035
Ours 36.39/0.9630 34.88/0.9737 24.30/0.8044
TABLE IV: Averaged PSNR and SSIM results of different rain streak removal methods on three synthesized datasets: Test1 (Rain12) [34], Test2 (Rain7) [56] and Rain100H [57].

V-B3 Single-image Rain Streaks Removal

In this part, we evaluated our method on the task of rain streak removal, comparing it with state-of-the-art methods including GMM [34], DN [58], DDN [59], JCAS [60], JORDER [61], UGSM [62] and DID-MDN [63]. All comparisons in this paper were conducted on the same hardware configuration.

Tab. IV reports the quantitative scores on three datasets: (1) Test1, provided by [34], includes 12 synthesized rain images rendered with a single rain streak rendering technique; (2) Test2 consists of 7 images with photorealistic rendering of rain streaks [56]; (3) Rain100H is collected from BSD200 [57] and synthesized with five streak directions. Here we simply adopted the training set provided by Yang et al. [61] for training. Based on the quantitative results in Tab. IV, Fig. 9 provides visual comparisons with five methods that achieve relatively high PSNR and SSIM scores (i.e., GMM, DDN, UGSM, JORDER and DID-MDN). The proposed LTF-PAO scheme preserves more details and leaves very few rain streaks on both synthesized and real-world rainy images.
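The PSNR values in Tabs. IV and V follow the usual definition PSNR = 10 log10(peak² / MSE). A minimal sketch, assuming intensities scaled to [0, 1] (peak = 1); the text does not state whether scores are computed on RGB or on a luminance channel, so that choice is left to the caller. SSIM is more involved; scikit-image provides `skimage.metrics.structural_similarity` for it. The helper `mse_ratio_from_db_gap` is a hypothetical convenience for reading PSNR gaps.

```python
import numpy as np

def psnr(x, ref, peak=1.0):
    """Peak signal-to-noise ratio in dB; images assumed scaled to [0, peak]."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def mse_ratio_from_db_gap(gap_db):
    """For a single image, a PSNR gap of gap_db dB means the worse result has
    10**(gap_db/10) times the MSE of the better one."""
    return 10.0 ** (gap_db / 10.0)

ref = np.zeros((4, 4))
noisy = np.full((4, 4), 0.1)        # constant error of 0.1 -> MSE = 0.01
print(round(psnr(noisy, ref), 6))   # 20.0
```

Note that the tables report PSNR averaged over images, so a gap between two averages only approximately translates into an MSE ratio.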

Methods PSNR SSIM
DN 25.51 0.8885
UGSM 26.38 0.8261
DDN 29.90 0.8999
DID-MDN 27.94 0.8696
JORDER 27.50 0.8515
Ours 31.18 0.9152
TABLE V: Averaged PSNR and SSIM on Fu et al.’s test set [59].
Fig. 10: Rain streak removal results (PSNR / SSIM) on Fu et al.’s test set: DDN (27.79 / 0.8371) and Ours (28.48 / 0.8531).

Furthermore, we conducted experiments on a large-scale dataset of 1,400 test images collected by Fu et al. [59]. The quantitative and qualitative results are reported in Tab. V and Fig. 10, respectively. Our method outperforms all compared methods in both PSNR and SSIM, improving PSNR by 1.28 dB over the second-best method, DDN.

VI Conclusions

In this paper, we developed a Proximal Average Optimization (PAO) method for the challenging nonconvex MAP-based model in Eq. (1). By introducing two feasibility modules, i.e., a task-driven feasibility module and a learning-based task-driven feasibility module, we established TF-PAO and LTF-PAO, respectively. We then proved the convergence of PAO under relatively mild assumptions. Extensive experiments on challenging tasks, including image deblurring, image inpainting and rain streak removal, showed that our method achieves better visual quality and quantitative scores than other state-of-the-art methods.


  • [1] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM journal on imaging sciences, vol. 2, no. 1, pp. 183–202, 2009.
  • [2] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: nonlinear phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
  • [3] L. Xu, Q. Yan, Y. Xia, and J. Jia, “Structure extraction from texture via relative total variation,” ACM Transactions on Graphics (TOG), vol. 31, no. 6, p. 139, 2012.
  • [4] J. Cheng, Y. Gao, B. Guo, and W. Zuo, “Image restoration using spatially variant hyper-Laplacian prior,” Signal, Image and Video Processing, vol. 13, no. 1, pp. 155–162, 2019.
  • [5] D. Krishnan and R. Fergus, “Fast image deconvolution using hyper-Laplacian priors,” in NeurIPS, 2009, pp. 1033–1041.
  • [6] R. Liu, G. Zhong, J. Cao, Z. Lin, S. Shan, and Z. Luo, “Learning to diffuse: A new perspective to design PDEs for visual analysis,” IEEE TPAMI, vol. 38, no. 12, pp. 2457–2471, 2016.
  • [7] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
  • [8] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in IEEE CVPR, 2017, pp. 3929–3938.
  • [9] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE TPAMI, vol. 39, no. 6, 2017.
  • [10] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in ICML.   Omnipress, 2010, pp. 399–406.
  • [11] P. Mu, J. Chen, R. Liu, X. Fan, and Z. Luo, “Learning bilevel layer priors for single image rain streaks removal,” IEEE Signal Processing Letters, vol. 26, no. 2, pp. 307–311, 2018.
  • [12] R. Liu, Z. Jiang, X. Fan, and Z. Luo, “Knowledge-driven deep unrolling for robust image layer separation,” IEEE TNNLS, 2019.
  • [13] D. P. Bertsekas, Convex optimization algorithms. Athena Scientific, Belmont, 2015.
  • [14] M. Nikolova and M. K. Ng, “Analysis of half-quadratic minimization methods for signal and image recovery,” SIAM Journal on Scientific computing, vol. 27, no. 3, pp. 937–966, 2005.
  • [15] S. Boyd, “Alternating direction method of multipliers,” in NeurIPS, 2011.
  • [16] Y. E. Nesterov, “A method for solving the convex programming problem with convergence rate O(1/k^2),” in Dokl. Akad. Nauk SSSR, vol. 269, 1983, pp. 543–547.
  • [17] A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE TIP, vol. 18, no. 11, 2009.
  • [18] H. Li and Z. Lin, “Accelerated proximal gradient methods for nonconvex programming,” in NeurIPS, 2015.
  • [19] Q. Yao, J. T. Kwok, F. Gao, W. Chen, and T.-Y. Liu, “Efficient inexact proximal gradient algorithm for nonconvex problems,” in IJCAI, 2017.
  • [20] Q. Li, Y. Zhou, Y. Liang, and P. K. Varshney, “Convergence analysis of proximal gradient with momentum for nonconvex optimization,” in ICML, 2017.
  • [21] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in 2013 Global Conference on Signal and Information Processing.   IEEE, 2013, pp. 945–948.
  • [22] X. Wang and S. H. Chan, “Parameter-free plug-and-play ADMM for image restoration,” in IEEE International Conference on Acoustics, 2017.
  • [23] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
  • [24] K. Zhang, W. Zuo, and L. Zhang, “Deep plug-and-play super-resolution for arbitrary blur kernels,” in IEEE CVPR, 2019, pp. 1671–1681.
  • [25] E. K. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, and W. Yin, “Plug-and-play methods provably converge with properly trained denoisers,” arXiv preprint arXiv:1905.05406, 2019.
  • [26] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE TIP, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [27] M. Unser, A. Aldroubi, and M. Eden, “Recursive regularization filters: design, properties, and applications,” IEEE TPAMI, no. 3, pp. 272–277, 1991.
  • [28] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE TIP, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [29] T. Hong, Y. Romano, and M. Elad, “Acceleration of RED via vector extrapolation,” Journal of Visual Communication and Image Representation, vol. 63, p. 102575, 2019.
  • [30] E. T. Reehorst and P. Schniter, “Regularization by denoising: Clarifications and new interpretations,” IEEE Transactions on Computational Imaging, vol. 5, no. 1, pp. 52–67, 2019.
  • [31] R. Liu, S. Cheng, L. Ma, X. Fan, and Z. Luo, “Deep proximal unrolling: Algorithmic framework, convergence analysis and applications,” IEEE TIP, 2019.
  • [32] R. Liu, S. Cheng, Y. He, X. Fan, Z. Lin, and Z. Luo, “On the convergence of learning-based iterative methods for nonconvex inverse problems,” IEEE TPAMI, 2019.
  • [33] C. Bao, H. Ji, Y. Quan, and Z. Shen, “L0 norm based dictionary learning by proximal methods with global convergence,” in IEEE CVPR, 2014, pp. 3858–3865.
  • [34] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in IEEE CVPR, 2016, pp. 2736–2744.
  • [35] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [36] P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye, “A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems,” in ICML, 2013, pp. 37–45.
  • [37] R. T. Rockafellar and R. J.-B. Wets, Variational analysis.   Springer Science & Business Media, 2009, vol. 317.
  • [38] J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Mathematical Programming, vol. 146, no. 1-2, pp. 459–494, 2014.
  • [39] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, “An iterative regularization method for total variation-based image restoration,” Multiscale Modeling & Simulation, vol. 4, no. 2, pp. 460–489, 2005.
  • [40] U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in IEEE CVPR, 2014.
  • [41] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” IJCV, vol. 115, no. 3, pp. 211–252, 2015.
  • [42] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE TIP, vol. 26, no. 7, 2017.
  • [43] L. Sun, S. Cho, J. Wang, and J. Hays, “Edge-based blur kernel estimation using patch priors,” in ICCP, 2013, pp. 1–8.
  • [44] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in IEEE CVPR, 2009, pp. 1964–1971.
  • [45] A. Danielyan, V. Katkovnik, and K. Egiazarian, “BM3D frames and variational image deblurring,” IEEE TIP, vol. 21, no. 4, 2012.
  • [46] Y. Wang, J. Yang, W. Yin, and Y. Zhang, “A new alternating minimization algorithm for total variation image reconstruction,” SIAM Journal on Imaging Sciences, vol. 1, no. 3, pp. 248–272, 2008.
  • [47] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in IEEE ICCV, 2011, pp. 479–486.
  • [48] C. J. Schuler, H. Christopher Burger, S. Harmeling, and B. Scholkopf, “A machine learning approach for non-blind image deconvolution,” in IEEE CVPR, 2013.
  • [49] J. Kruse, C. Rother, and U. Schmidt, “Learning to push the limits of efficient FFT-based image deconvolution,” in IEEE ICCV, 2017.
  • [50] R. Köhler, M. Hirsch, B. Mohler, B. Schölkopf, and S. Harmeling, “Recording and playback of camera shake: Benchmarking blind deconvolution with a real-world database,” in ECCV, 2012.
  • [51] J. Pan, Z. Lin, Z. Su, and M.-H. Yang, “Robust kernel estimation with outliers handling for image deblurring,” in IEEE CVPR, 2016.
  • [52] S. Roth and M. J. Black, “Fields of experts,” IJCV, vol. 82, no. 2, 2009.
  • [53] P. Arias, G. Facciolo, V. Caselles, and G. Sapiro, “A variational framework for exemplar-based image inpainting,” IJCV, vol. 93, no. 3, pp. 319–347, 2011.
  • [54] L. He and Y. Wang, “Iterative support detection-based split bregman method for wavelet frame-based image inpainting,” IEEE TIP, vol. 23, no. 12, 2014.
  • [55] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, “Weighted nuclear norm minimization and its applications to low level vision,” IJCV, vol. 121, no. 2, pp. 183–208, 2017.
  • [56] S. Tariq, “Rain,” NVIDIA whitepaper, 2007.
  • [57] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in IEEE ICCV, 2001, pp. 416–423.
  • [58] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE TIP, vol. 26, no. 6, 2017.
  • [59] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, “Removing rain from single images via a deep detail network,” in IEEE CVPR, 2017.
  • [60] S. Gu, D. Meng, W. Zuo, and L. Zhang, “Joint convolutional analysis and synthesis sparse representation for single image layer separation,” in IEEE ICCV, 2017, pp. 1717–1725.
  • [61] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in IEEE CVPR, 2017.
  • [62] T.-X. Jiang, T.-Z. Huang, X.-L. Zhao, L.-J. Deng, and Y. Wang, “FastDeRain: A novel video rain streak removal method using directional gradient priors,” IEEE TIP, 2018.
  • [63] H. Zhang and V. M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in IEEE CVPR, 2018, pp. 695–704.