The basic purpose of image enhancement is to process given images so that the restored results are more suitable than the original observations for specific applications. Visual data can be degraded in many ways, each corresponding to a specific image enhancement task. For example, image deblurring (also known as deconvolution) is a classical yet high-profile application in the image enhancement community. Blurry images arise when the image sensor moves during exposure while accumulating incoming light over time. As another example, super-resolution is a class of techniques that aim to enhance, accurately and quickly, the resolution of images captured by low-quality visual devices such as hand-held computers and mobile phones. In foggy, hazy or underwater scenarios, atmospheric particles or water absorb and scatter not only the atmospheric light but also the light reflected toward the camera. Images acquired under such conditions are therefore seriously degraded and usually have poor visibility, so targets and obstacles are difficult to recognize. It is thus necessary to recover the authentic images from these corrupted observations [3, 4]. All these image enhancement tasks play active roles not only in improving the visualization of data but also in supporting many subsequent applications in computer vision, pattern recognition and image analysis.
I-A Related Work
In general, existing knowledge-based approaches to image enhancement can be roughly grouped into two categories according to how they exploit prior information, i.e., optimizing designed priors and learning parameterized priors. In addition, recently developed deep models often train fully data-dependent neural networks to address particular tasks. In this part, we briefly review related work following these categories.
I-A1 Optimizing Designed-Priors
Due to the ill-posed nature of most image enhancement tasks, it is necessary to design prior regularizations to obtain desired solutions within the maximum a posteriori (MAP) framework. Following this route, increasingly complex priors have been proposed to better characterize the generative processes of specific tasks and the structures of desired solutions.
Many image enhancement methods exploit the sparsity prior of natural images. Based on the fact that natural image gradients exhibit heavy-tailed distributions, image gradient histograms or total variation are widely employed as sparsity priors to regularize image enhancement. Using ℓ1-optimization techniques, sparse coding based methods encode an image over an over-complete dictionary and have been demonstrated to produce better image enhancement results than linear transform methods. Though it is non-convex and discontinuous, the ℓ0 penalty attracts wide attention since it globally controls the number of non-zero elements. This sparse prior also plays a major role in image filtering, deconvolution, image layer separation, and so on.
Some of the previously mentioned methods also exploit the local self-similarity prior of images, i.e., constraining a pixel to be similar to its neighboring pixels. The nonlocal self-similarity prior, which is based on the fact that patches similar to a given local patch can lie far away from it, has been demonstrated to achieve better image enhancement performance. Representative works along this line include nonlocal means, nonlocal regularization, BM3D, and so on. In particular, the BM3D method, which stacks nonlocal similar patches into a 3D cube and applies a 3D wavelet transform to it, has become a benchmark for image denoising; it has also been extended to many other image enhancement tasks. Nonlocal self-similarity can be coupled with low-rank modeling to further improve image enhancement results.
Though designing complicated priors may help narrow the feasible regions of the variables in image enhancement tasks, it also makes the optimization harder and the desired solutions more difficult to obtain. In recent decades, several numerical algorithms have been proposed for solving non-convex optimization problems [18, 19, 20]. However, their theoretical convergence results are relatively weak compared with those for convex problems. It is therefore usually time-consuming for standard numerical algorithms to globally optimize MAP formulations with sophisticated priors. Moreover, because these challenging problems contain unwanted local minima, both the final results and the convergence behavior of the iterations are very sensitive to the initial points and the algorithm parameters.
I-A2 Learning Parameterized-Priors
Instead of using pre-designed priors, one can learn parameterized prior models directly from natural images for image enhancement. It is popular to model image priors with statistical distributions. For example, a multi-dimensional Gaussian mixture model (GMM) has been learned to model image patches, and the resulting EPLL method shows promising denoising results. GMMs have also been used to learn piecewise linear estimators for image restoration. In the pioneering Fields of Experts (FoE) work, the filter responses of images are modeled with the Student's t-distribution, and the filters are learned within a Markov random field framework. Despite the successes of prior learning based methods, they usually learn a generic image prior model from high-quality images, without exploiting the statistics of degraded images or the characteristics of specific enhancement tasks. Therefore, discriminative learning methods have been proposed to model the relationship between degraded images and their corresponding latent ones, as we briefly introduce below.
Discriminative prior learning methods aim to learn a model that maps the degraded image to its corresponding latent image while maximizing the posterior probability of the latent data. For example, based on conditional random fields, the cascade of shrinkage fields (CSF) learns a set of filters from clean images and their degraded counterparts for denoising and deblurring. An adaptive version of shrinkage fields was proposed for blind image deconvolution, and a conditioned regression model was proposed for image super-resolution. Similar to CSF, Chen et al. extended FoE priors and proposed a trainable nonlinear reaction diffusion model for both image denoising and super-resolution. By transforming the MAP-based energy model into an optimization problem with linear constraints, discriminative priors can also be learned within the framework of the alternating direction method of multipliers. However, all these models rely heavily on the forms of complex statistical prior models, so their propagation architectures are mostly intricate and inflexible, which also increases the complexity of training.
I-A3 Fully Data-dependent Neural Networks
During the past few years, with the development of deep neural networks [30, 31], researchers have increasingly focused on designing deep end-to-end networks as image propagations for solving image-level applications [32, 33]. Unlike prior-based approaches, these propagations have no explicit description of image priors and do not even directly express the generative processes of image-level tasks. Instead, they learn the image propagations as deep networks by minimizing the differences between the network outputs and the desired solutions. The convolutional neural network (CNN) is one of the most commonly used network structures and has been widely applied and analyzed in the image processing community [4, 34, 35]. For intricate image-level tasks, researchers usually design complicated neural networks that combine multiple specifically designed sub-networks [36, 32]. Very recently, denoiser networks have been learned to solve image restoration problems in the framework of optimization unrolling. However, these iterations come without rigorous theoretical analysis, so their robustness cannot always be guaranteed.
[Figure panels: Ground Truth | Blurry Input | CSF | IRCNN | Ours]
I-B Our Contributions
As discussed above, most conventional approaches aim to design, optimize and learn different image priors based on particular understandings of the task. In contrast, recently proposed deep models are established in a heuristic manner and trained on a large number of data pairs. Though relatively good performance has been achieved, some important limitations remain in current deep visual enhancement models. For example, existing network structures and architectures are mostly designed from engineering experience, and their performance depends mainly on the scale and quality of the training data. However, rich domain knowledge and physical principles underlie low-level vision tasks. Moreover, the strongly data-dependent nature of existing deep neural networks limits their application to complex tasks in which very few or even no high-quality training pairs are available. More importantly, it is still challenging to design and/or control the feed-forward propagation behaviors of existing deep networks in a theoretically grounded manner.
To mitigate these issues, in this paper we propose a simple, flexible and generic framework, named deep prior ensemble (DPE), which integrates both knowledge- and data-based cues to build theoretically convergent visual propagations for different image enhancement problems. Specifically, we first design three basic propagative building blocks, i.e., task-aware warm start, data-dependent residual architecture and prior projection. By cascading these components with a novel feedback control strategy, we integrate the advantages of both designed priors and learned descent directions into a unified framework for visual propagation. Furthermore, we provide a rigorous theoretical analysis showing that the feed-forward propagation in DPE indeed converges to the desired task-related optimal solution. In summary, our main contributions are the following four items.
DPE investigates a novel visual propagation scheme to address different image enhancement tasks. At each stage, we can successfully integrate knowledge-driven priors (as warm starts) and fully data-dependent CNNs (as descent directions) for image propagation. A prior projection with error-based feedback control strategy is also introduced to guide the final propagation toward our desired output.
We provide a rigorous theoretical analysis of the feed-forward propagation behaviors of DPE and prove that, even with experience-based network architectures, we can still guarantee the convergence of DPE (to critical points of our fundamental image modeling energy) under only some mild conditions.
On the one hand, DPE can be interpreted as a data-dependent optimization scheme, in which network architectures are trained to adaptively predict descent directions for the iterations, thereby automatically avoiding unwanted local minima. On the other hand, DPE can also be understood as a theoretically convergent recurrent network with prior-based feedback control.
Finally, we demonstrate the efficiency of DPE by applying it to various image enhancement tasks, such as image deconvolution, interpolation, super-resolution, single image haze removal and underwater enhancement.
The remainder of this paper is organized as follows. Section II first designs three basic propagation building blocks and then establishes the DPE framework by cascading these components with feedback control. In Section III, we provide a solid theoretical analysis proving the convergence of DPE and discuss its intrinsic relationships with existing knowledge-based and data-driven approaches. Extensive experiments on different image enhancement tasks are conducted in Section IV to evaluate DPE. Finally, we conclude our work in Section V.
II The Proposed Framework
In this section, we first build a general but relatively simple MAP-type energy to unify our fundamental constraints on the problem of visual enhancement. Unlike most existing approaches, which directly optimize the energy to obtain their solutions, we then develop a hybrid scheme that combines task-aware and data-dependent information for image propagation. The flowchart and core mechanism of our proposed framework are illustrated in Fig. 1.
II-A Fundamental Energy for Visual Enhancement
Most visual enhancement tasks involve estimating the latent image of interest given only an observation that has been compromised by unknown corruptions. Because these problems are fundamentally ill-posed, conventional approaches often design strong priors to regularize the solution space, resulting in MAP-type estimation energies. While possibly well-motivated in principle, standard MAP approaches rely heavily on both correct prior selection and exact inference schemes, which may compromise their performance on challenging real-world tasks.
To mitigate these issues, we build such a general but relatively simple MAP-type energy to encode only our fundamental constraints and, instead of directly optimizing it, develop a hybrid scheme that combines task-aware and data-dependent information for image propagation.
Specifically, the main purpose of image enhancement is to estimate the latent image from an observation that may be degraded by corruptions, noise, blur kernels, and so on. We first build a unified MAP-type energy, consisting of a fidelity term and a regularization term, to enforce fundamental constraints on the latent image (Eq. (1)).
Here the fidelity term measures the discrepancy between the estimated latent image and the observation. It is closely related to the generative process of the specific task and reflects the noise type assumed for that task. The most common choice is a fidelity built on a linear degradation operator, which can be the identity for denoising, a mask operation for inpainting, or a convolution kernel for deblurring. The other term is the regularization term, derived from the image prior model. Since the energy in Eq. (1) only aims to enforce fundamental constraints on the solution, we simply define the regularizer by a commonly used non-convex potential function on the image gradient, with two parameters. Moreover, it is also necessary to consider the range restriction on the image: we define the feasible set (Eq. (2)) as all vectorized images, in the Euclidean space of the corresponding dimension, whose every element lies between two constants representing the lower and upper bounds of the image, respectively. Overall, we summarize our fundamental cues of visual enhancement in the following non-convex and non-smooth energy minimization model (Eq. (3)), which adds the characteristic function of the feasible set to the fidelity and regularization terms. This characteristic function (Eq. (4)) equals zero on the feasible set and +∞ outside it.
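For reference, the model above can be written compactly as follows. The symbols (u for the latent image, y for the observation, 𝒦 for the linear operator, a and b for the bounds), the quadratic fidelity and the particular gradient potential are assumptions chosen for illustration; they are one common instance of Eqs. (1)-(4), not necessarily the exact forms used here.

```latex
\begin{aligned}
&\min_{\mathbf{u}}\; f(\mathbf{u};\mathbf{y}) + \phi(\mathbf{u}),
  \qquad f(\mathbf{u};\mathbf{y}) = \tfrac{1}{2}\|\mathcal{K}\mathbf{u}-\mathbf{y}\|_2^2 \ \text{(assumed)},\\
&\phi(\mathbf{u}) = \lambda \sum_{i} \big\|[\nabla \mathbf{u}]_i\big\|^{p},
  \quad \lambda>0,\ 0<p<1 \ \text{(one common non-convex choice, assumed)},\\
&\Omega = \{\mathbf{u}\in\mathbb{R}^{n} : a \le [\mathbf{u}]_i \le b,\ i = 1,\dots,n\},\\
&\min_{\mathbf{u}}\; \Psi(\mathbf{u}) := f(\mathbf{u};\mathbf{y}) + \phi(\mathbf{u}) + \chi_{\Omega}(\mathbf{u}),
  \qquad \chi_{\Omega}(\mathbf{u}) =
  \begin{cases} 0, & \mathbf{u}\in\Omega,\\ +\infty, & \text{otherwise.} \end{cases}
\end{aligned}
```

The characteristic function turns the box constraint into an extended-valued penalty, which is why the overall model is non-smooth even when the fidelity is smooth.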
Intuitively, one may adopt standard numerical algorithms to solve the model in Eq. (3). Unfortunately, the performance of such a direct strategy is compromised by several issues, including local solutions stemming from the non-convexity of the energy and relatively weak prior choices for complex tasks. In fact, most generic numerical solvers easily fall into unwanted local minima and thus fail to find our task-related optimal solutions. Another possible idea is to train end-to-end deep networks to learn the underlying regression relationship between the observations and the desired latent images, without taking the characteristics of the task into account. Although straightforward, it is inadvisable to completely discard the explicit and rich domain knowledge of the task, i.e., the MAP-inspired energy. It is therefore necessary to integrate the advantages of both task-aware formulations and data-dependent networks to address visual enhancement tasks.
II-B The Propagation Building-blocks
In the following, we design different building blocks to establish our propagation scheme. At each stage (indexed by a non-negative integer), the purpose of our propagation is to update the current estimate to the next one based on the energy in Eq. (3). We therefore consider the proximal envelope of this energy with respect to the current estimate (Eq. (5)).
II-B1 Warm Start by Fidelity
We first generate a temporary variable by minimizing the task-related fidelity together with the proximal term centered at the current estimate (Eq. (6)).
Indeed, by examining the closed-form solution of Eq. (6) (for a smooth fidelity), this temporary variable can also be understood as a preliminary warm start generated by a fidelity-based gradient descent step with a step size parameter.
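For non-blind deconvolution, the warm start admits a simple closed form. The sketch below is a minimal illustration, assuming a quadratic fidelity ½‖k ⊗ u − y‖² with circular (periodic) boundary conditions and a proximal term (μ/2)‖u − u_prev‖²; the function names and this exact weighting are our assumptions, not necessarily the form of Eq. (6).

```python
import numpy as np


def kernel_to_otf(kernel, shape):
    """Zero-pad `kernel` to `shape` and circularly shift its centre to the
    origin, so that multiplication in the Fourier domain equals circular
    convolution with the centred kernel."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def fidelity_warm_start(u_prev, y, kernel, mu):
    """Minimise 0.5*||k (*) u - y||^2 + 0.5*mu*||u - u_prev||^2 in closed form.

    Setting the gradient to zero gives (K^T K + mu I) u = K^T y + mu u_prev,
    which is diagonal in the Fourier domain under periodic boundaries.
    """
    K = kernel_to_otf(kernel, y.shape)
    numerator = np.conj(K) * np.fft.fft2(y) + mu * np.fft.fft2(u_prev)
    denominator = np.abs(K) ** 2 + mu
    return np.real(np.fft.ifft2(numerator / denominator))
```

Rearranging the optimality condition gives v = u_prev − (1/μ)∇f(v), i.e., an implicit gradient step on the fidelity with step size 1/μ, which is one way to read the warm-start interpretation above. A large μ keeps v close to u_prev, while a small μ approaches the noise-amplifying inverse filter.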
II-B2 Descent by Residual CNN
Unlike conventional optimization techniques, which often perform descent updates based on the (sub)gradient of the energy, here we adopt a residual-type CNN to predict the directions for our image propagation (Eq. (7)),
where the network parameters are learned from data. In this way, we learn an adaptive direction from the collected training data, so this process can be regarded as a data-dependent deep propagation. It is also necessary to emphasize that the role of this network differs slightly from most existing end-to-end CNNs, which aim to directly address the particular enhancement task. In DPE, the network is only a general predictor of the propagation direction, so its training should not be sensitive to the specific type of task. More details of this architecture are discussed in Section IV.
II-B3 Prior Projection
Note that although the principles of the task (via the fidelity) and information from training data (via the residual CNN) have been used, we still cannot guarantee that the output of our propagation always lies in the feasible solution space. To address this issue, we design a prior projection process to control the propagation toward the desired outputs (Eq. (8)),
where the projection operator onto the feasible set is applied. We will verify theoretically that this step helps bring the output of the deep architecture back to the feasible region of the MAP-inspired formulation, i.e., Eq. (3).
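Since the feasible set is a box, projecting onto it reduces to elementwise clipping. A minimal sketch, assuming intensities normalized to [0, 1] (the bounds are placeholders) and covering only the projection onto the feasible set, not any other prior terms that Eq. (8) may contain:

```python
import numpy as np


def project_onto_box(w, lower=0.0, upper=1.0):
    """Euclidean projection onto Omega = {u : lower <= u_i <= upper}.

    The box is separable across elements, so the projection is clipping.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return np.clip(w, lower, upper)


def box_characteristic(u, lower=0.0, upper=1.0):
    """chi_Omega(u): 0 if every element lies in [lower, upper], +inf otherwise."""
    inside = np.all((u >= lower) & (u <= upper))
    return 0.0 if inside else np.inf
```

After projection the characteristic function is always zero, so the extended-valued term of the energy stays finite along the propagation.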
II-C Deep Prior Ensemble
Now we are ready to design the formal deep prior ensemble (DPE) scheme. The most straightforward way is to cascade all the building blocks designed above (Eq. (9)).
From an optimization perspective, Eq. (9) can also be understood as network-incorporated iterations for solving Eq. (3). However, due to the inexactness of each iteration, it is challenging to directly analyze their propagative behaviors. A natural question therefore arises: can we design a more deliberate iterative scheme that generates theoretically better (e.g., convergent) propagations for visual enhancement? By introducing the following sub-gradient error bound condition, we can answer this question positively (a detailed analysis is given in the following section).
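Condition II.1 (Sub-gradient Error Bound Condition). At each stage, denote by the sub-gradient error an element of the sub-differential of the proximal envelope in Eq. (5) with respect to the updated variable, evaluated at the new estimate. We then require this error to satisfy the bound in Eq. (10), which is governed by a constant.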
In fact, this sub-gradient error can serve as a controller that regularizes our propagation behavior. However, since the sub-gradient error is not unique (the sub-differential is in general a set), it is hard to check Condition II.1 in practice. We therefore provide an equivalent expression of the error in the following proposition (proofs of all propositions, theorems and corollaries in this paper are given in the Appendix).
The error can be reformulated as
which is easier to evaluate in practice.
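The following is a hypothetical sketch of how an error-based feedback could wrap one stage of Eq. (9). Everything here is an assumption for illustration: the bound is taken as ‖e‖ ≤ C‖u − u_prev‖, the error is supplied by a caller-provided function standing in for the expression of the proposition above, and the fallback when the check fails is a conventional exact update, motivated by the observation in Section III-B that exact schemes satisfy Condition II.1 trivially.

```python
import numpy as np
from typing import Callable, Tuple

Array = np.ndarray


def controlled_stage(u_prev: Array,
                     warm_start: Callable[[Array], Array],
                     residual_net: Callable[[Array], Array],
                     prior_projection: Callable[[Array], Array],
                     subgradient_error: Callable[[Array, Array], float],
                     fallback_step: Callable[[Array], Array],
                     C: float) -> Tuple[Array, bool]:
    """One DPE-style stage with error-based feedback (hypothetical sketch).

    Returns the new estimate and whether the network-driven candidate
    passed the assumed bound ||e|| <= C * ||u - u_prev||.
    """
    v = warm_start(u_prev)            # task-aware warm start, cf. Eq. (6)
    w = v - residual_net(v)           # negative-type residual descent, cf. Eq. (7)
    u = prior_projection(w)           # prior projection, cf. Eq. (8)
    err = subgradient_error(u, u_prev)
    if err <= C * np.linalg.norm(u - u_prev):
        return u, True
    # Assumed feedback: fall back to a conventional exact update, for which
    # the sub-gradient error vanishes and the bound holds trivially.
    return fallback_step(u_prev), False
```

Passing every block as a callable keeps the control logic independent of the particular task, network and error formula.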
III Theoretical Analysis and Discussions
III-A Propagative Behaviors Analysis
We first analyze the propagative behaviors of DPE from an optimization perspective (i.e., convergence and its rate) based on the fundamental energy model in Eq. (3). Before providing the main convergence results, we show in the following proposition that the variable propagated by DPE, i.e., the output of the highly nonlinear operations in Eq. (9), can be regarded as the solution of an error-penalized abstract optimization model.
The variable calculated by Eq. (9) can be regarded as the solution of an error-penalized abstract optimization model, that is,
We emphasize that this equivalent reformulation is used only for theoretical analysis; it is never actually solved by our method.
Theorem III.1. Suppose that a sequence is generated by Alg. 1. Then the following assertions hold.
The energy in Eq. (3) satisfies a sufficient descent property, i.e.,
for some positive constant. Moreover, the propagation is bounded across stages, namely, the sequence propagated by DPE is bounded.
The sub-gradient of the energy in Eq. (3) satisfies
Any limit point of the propagation sequence is a critical point of the objective function in Eq. (3), i.e., zero belongs to the sub-differential of the objective at that point; furthermore, the objective is constant on the set of all limit points of the sequence.
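The role of the sufficient descent assertion can be seen from a short telescoping argument. This sketch assumes that the descent takes the common form below with a constant a > 0, that the initial point is feasible (so Ψ(u⁰) is finite), and that Ψ is bounded below, which holds when the fidelity and the potential are non-negative; the symbols are ours.

```latex
\begin{aligned}
\Psi(\mathbf{u}^{k+1}) &\le \Psi(\mathbf{u}^{k}) - a\,\|\mathbf{u}^{k+1}-\mathbf{u}^{k}\|^{2},
  \qquad a>0 \ \text{(assumed form)},\\
a\sum_{k=0}^{K}\|\mathbf{u}^{k+1}-\mathbf{u}^{k}\|^{2}
  &\le \Psi(\mathbf{u}^{0}) - \Psi(\mathbf{u}^{K+1})
  \le \Psi(\mathbf{u}^{0}) - \inf_{\mathbf{u}}\Psi(\mathbf{u}) < \infty,\\
&\Longrightarrow\ \sum_{k=0}^{\infty}\|\mathbf{u}^{k+1}-\mathbf{u}^{k}\|^{2} < \infty
  \ \Longrightarrow\ \|\mathbf{u}^{k+1}-\mathbf{u}^{k}\| \to 0 .
\end{aligned}
```

Consecutive estimates therefore get arbitrarily close, which is the usual first step toward the limit-point statement in the last assertion.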
Note that in the nonconvex setting, being a critical point is only a necessary condition for local optimality. Based on this theorem, we further prove in the following corollary that the propagation generated by DPE converges to a critical point of our fundamental visual enhancement energy.
Corollary III.1. By verifying that the objective in Eq. (3) is semi-algebraic, the propagated sequence is shown to be a Cauchy sequence, so the whole sequence converges to a critical point of our fundamental visual enhancement energy. Furthermore, the convergence rate of DPE follows from a particular desingularizing function with two parameters: depending on the value of its exponent parameter, the sequence converges after finitely many iterations, or at a linear or a sub-linear rate, respectively.
In summary, the analyses in Theorem III.1 and Corollary III.1 verify the core mechanism of our hybrid propagation: we integrate both domain knowledge (i.e., the task-aware warm start in Eq. (6)) and information from training data (i.e., descent by the residual CNN in Eq. (7)) to generate a propagation scheme. By adaptively controlling the propagation (i.e., the prior projection in Eq. (8) together with Condition II.1), we guarantee that the final output of DPE always satisfies our fundamental constraints on visual enhancement tasks (i.e., it is a critical point of the energy in Eq. (3)).
III-B Relationships with Existing Approaches
As discussed above, DPE provides a flexible ensemble framework that integrates both conventional knowledge-driven cues (MAP-type models) and data-based priors (deep networks) for image propagation. DPE can be understood either as a network-driven prior optimization scheme (in analogy to conventional optimization algorithms) or as a knowledge-guided deep model (in analogy to standard experience-based CNNs). We discuss these relationships in detail below.
III-B1 Network Driven Prior Optimization
Conventional prior-based approaches often design and optimize MAP-type models to generate propagations for visual enhancement. From this perspective, DPE can also be understood as an inexact, network-guided iterative scheme for minimizing the energy in Eq. (3). Standard optimization techniques could also be used to solve Eq. (3). However, most of their updating schemes are designed for exact updates, i.e., a vanishing sub-gradient error at every iteration. In contrast, DPE only enforces the inexact bound of Condition II.1 during propagation. These standard prior optimization approaches should therefore be regarded as special cases of DPE under stricter assumptions, since for them the inequality in Condition II.1 is always satisfied at every stage. In this sense, our propagation is clearly more flexible than standard exact optimization schemes. More importantly, as illustrated in Figs. 1 and 2, this flexibility allows us to introduce a residual CNN to generate mappings from the current stage to the next, so that descent directions are learned directly from training data. Notably, this data-dependent iteration scheme still enjoys nice convergence properties.
III-B2 Knowledge Guided Deep Model
Though end-to-end deep learning approaches have obtained promising performance on relatively simple image enhancement tasks, such as denoising and super-resolution [40, 41, 42, 36], their performance depends tightly on the training data. This is because little or no task information is encoded in their network designs. Moreover, it is also challenging to establish network structures that directly learn end-to-end mappings for the complex physical principles in image enhancement tasks, e.g., image deconvolution with a large blur kernel. In contrast, DPE provides a new perspective for building deep models with domain knowledge for enhancement tasks. In other words, the task-aware processes in DPE not only significantly reduce the training complexity of the network but also control the convergence of the final propagation.
IV Experimental Results
We first verify our theoretical results on visual propagation and then compare the proposed DPE framework with state-of-the-art approaches on various image enhancement tasks, including non-blind deconvolution, image interpolation, super-resolution, single image haze removal and underwater enhancement. All experiments are conducted on a PC with an Intel Core i7 CPU at 3.6 GHz, 32 GB RAM and an NVIDIA GeForce GTX 1050 Ti GPU.
IV-A Experimental Setting
IV-A1 Summary of Enhancement Tasks
Table I summarizes the formulations and fidelities of the different enhancement tasks considered in this work. Specifically, the formulations involve the latent image, the corrupted observation and the errors/noise for non-blind deconvolution (A), image interpolation (B) and super-resolution (C). In (A), the latent image is convolved with the blur kernel. In (B), a mask is applied to the latent image through an elementwise (dot) product, while in (C) the degradation combines a binary sampling matrix with a circulant matrix representing convolution with the anti-aliasing filter. Both haze removal and underwater enhancement are based on the atmospheric scattering model, in which the color observation equals the latent scene radiance multiplied by the medium transmission plus the global atmospheric light multiplied by one minus the transmission. In these two tasks, we therefore treat an initial transmission estimate as the observation and its deviations from the true transmission as the errors.
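The degradation models of Table I can be simulated to synthesize test observations. A minimal NumPy sketch, assuming single-channel images for (A)-(C), periodic boundaries for the convolutions, and plain stride subsampling as the binary sampling matrix; all function names are hypothetical.

```python
import numpy as np


def circular_blur(x, kernel):
    """Convolution k (*) x with periodic boundaries, computed via the FFT."""
    padded = np.zeros(x.shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(padded) * np.fft.fft2(x)))


def deconvolution_observation(x, kernel, noise):
    """(A) y = k (*) x + n."""
    return circular_blur(x, kernel) + noise


def interpolation_observation(x, mask, noise):
    """(B) y = M . x + n, with an elementwise binary mask M."""
    return mask * x + noise


def super_resolution_observation(x, kernel, factor, noise):
    """(C) y = S H x + n: anti-aliasing blur H, then binary sampling S."""
    return circular_blur(x, kernel)[::factor, ::factor] + noise


def hazy_observation(J, t, A):
    """Atmospheric scattering model I = J t + A (1 - t)."""
    if t.ndim == J.ndim - 1:          # broadcast one transmission map over colour channels
        t = t[..., None]
    return J * t + A * (1.0 - t)
```

With a unit-sum kernel the circular blur preserves the total intensity, and in the scattering model a transmission of one returns the scene radiance while a transmission of zero returns the atmospheric light.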
[Figure panels: Ground Truth | Blurry Input | GD | PG]
IV-A2 Propagative Architectures and Training Strategies
As discussed above, unlike most existing end-to-end networks, which directly address the enhancement tasks, our architectures are built to discover propagative directions from the collected dataset. We therefore build a negative-type residual block, in which the predicted residual is subtracted from the block input, and the residual network consists of a stack of dilated convolution layers.
In general, our propagations should always move toward natural image/transmission distributions. Moreover, the descent directions should also be able to remove propagative errors during the iterations. We therefore add different levels of Gaussian noise to the desired images/transmissions to synthesize these propagative errors. Specifically, we train the residual network over a range of noise levels sampled with a fixed step size, resulting in a set of candidate propagative architectures, which are then incorporated into DPE following the criterion designed in Alg. 1.
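A minimal sketch of the per-noise-level data synthesis described above. The noise grid and patch shapes are illustrative only, and the choice of the noise itself as the regression target (so that the negative-type residual block w = v − R(v) removes the predicted error) is our assumption.

```python
import numpy as np


def make_noise_level_datasets(clean, sigmas, seed=0):
    """Build one (input, target) training set per noise level.

    Assumes the residual network R in w = v - R(v) is trained to predict
    the synthetic error, so the regression target is the noise itself.
    Returns {sigma: (noisy_input, noise_target)}.
    """
    rng = np.random.default_rng(seed)
    datasets = {}
    for sigma in sigmas:
        noise = rng.normal(0.0, sigma, size=clean.shape)
        datasets[float(sigma)] = (clean + noise, noise)
    return datasets
```

Training one network per entry yields the set of candidate propagative architectures from which a stage can then choose.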
Our training data are generated according to the propagation properties of these enhancement tasks. The applications considered below can be divided into two categories: image propagation (i.e., non-blind image deconvolution, image interpolation and super-resolution) and transmission propagation (i.e., single image haze removal and underwater image enhancement). For the first category, we collect 800 images, 400 of which are from a previously published dataset and the other 400 from the ImageNet database. We crop them into small patches of a fixed size and select a subset of these patches for training. For transmission-based applications, we use the NYU dataset, which includes 1449 depth images, to synthesize transmission maps, which are then cropped into patches from which our training data are selected.
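Transmission maps are commonly synthesized from depth with the exponential attenuation t = exp(−βd). The sketch below assumes that relation with a hypothetical scattering coefficient β and crops random square patches of a hypothetical size; it illustrates the pipeline rather than the exact settings of these experiments.

```python
import numpy as np


def transmission_from_depth(depth, beta=1.0):
    """t = exp(-beta * d): exponential (Beer-Lambert) attenuation, assumed here."""
    return np.exp(-beta * np.asarray(depth, dtype=float))


def random_patches(image, patch_size, count, seed=0):
    """Crop `count` square patches at uniformly random positions."""
    h, w = image.shape[:2]
    if patch_size > min(h, w):
        raise ValueError("patch larger than image")
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, h - patch_size + 1, size=count)
    cols = rng.integers(0, w - patch_size + 1, size=count)
    return np.stack([image[r:r + patch_size, c:c + patch_size]
                     for r, c in zip(rows, cols)])
```

Because depth is non-negative, the synthesized transmission always lies in (0, 1], matching its physical meaning.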
IV-B Model Analysis and Verification
In this subsection, we first conduct experiments on the standard non-blind deconvolution task to investigate the properties of DPE and verify our theoretical results.
IV-B1 Mechanism Comparisons
Here we consider different image propagation mechanisms, including conventional prior-optimization strategies and fully data-dependent deep networks, to address non-blind image deconvolution. These experiments are conducted on the widely used dataset of Levin et al., with 32 blurry images, and the dataset of Sun et al., with 640 blurry images of varying sizes corrupted by Gaussian noise. For prior optimization, we adopt two different strategies: gradient descent iterations for the smooth prior model in Eq. (1) (denoted as GD) and a proximal gradient scheme for the non-smooth model in Eq. (3) (denoted as PG). We also train our residual network building blocks on different blurry datasets, generating blurry training data using either a single kernel identical to the test one (denoted as SK-Net) or multiple kernels (denoted as MK-Net). Finally, we consider DPE under two settings: propagation without prior projection (denoted as S-DPE) and with prior projection (denoted as C-DPE).
|Levin et al.’||PSNR||29.38||30.12||32.74||31.53||31.65||33.26||31.32||32.51||33.44|
|Sun et al.’||PSNR||30.67||31.03||31.55||30.79||32.44||32.45||31.47||32.61||32.82|
Fig. 3 first illustrates visual comparisons of these approaches on an example image from Sun et al.'s benchmark. Both of our hybrid propagations achieve better qualitative performance. We then plot the convergence behaviors of conventional iterative algorithms (i.e., GD and PG) and our propagations (i.e., S-DPE and C-DPE) on this example image in Fig. 4 (a). The PSNR scores of all the images with the first blur kernel are shown in Fig. 4 (b). Our hybrid schemes perform consistently better than the conventional optimization algorithms, with C-DPE being the best among all compared strategies.
Table II further reports average quantitative performance on Sun et al.'s benchmark. PG achieves better performance than GD because it is based on a more accurate prior model. Directly performing propagations with the networks alone does not yield good performance, even when they are trained on data generated by the test kernel (i.e., SK-Net). This is mainly because relatively simple architectures may have difficulty fitting the deconvolution process. In contrast, our hybrid propagation schemes (i.e., S-DPE and C-DPE) perform much better than these approaches. Moreover, our prior projection based feedback strategy (i.e., C-DPE) further improves on the simplified DPE (i.e., S-DPE).
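The PSNR scores in Table II and Fig. 4 are assumed to follow the standard definition 10·log10(peak²/MSE). A minimal sketch, assuming intensities in [0, peak] with peak = 1 by default:

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For example, a uniform error of 0.1 on [0, 1] images gives an MSE of 0.01 and hence 20 dB; the same error expressed on a 0-255 scale with peak = 255 gives the same value.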
IV-B2 Propagative Behaviors Analysis
We then verify the sub-gradient error bound (Condition II.1) from our theoretical analysis. We plot the curves of the sub-gradient error (orange) and the relative error (blue) on example images from Levin et al.'s and Sun et al.'s benchmarks in Fig. 5. The error condition in Eq. (10) is always satisfied during the propagations, which verifies our theoretical results in Section III.
[Figure panels: Levin et al. | Sun et al.]
We also plot the intermediate results of DPE (i.e., PSNR scores of the propagative variables) on example images from both Levin et al.'s and Sun et al.'s benchmarks in Fig. 6. The fidelity-based warm start provides relatively good initial values at each stage. The network-based descent direction significantly improves performance, especially on the challenging image from Sun et al.'s dataset. Finally, our prior projection fine-tunes the deep prior ensemble and results in a more stable image propagation.
IV-C Real-world Image Enhancement Tasks
In this subsection, we evaluate DPE on various image enhancement tasks, including non-blind image deconvolution, image interpolation, super-resolution, single image haze removal and underwater image enhancement, with comparisons to state-of-the-art approaches for these problems.
(Figure panels: Levin et al.’s benchmark and Sun et al.’s benchmark.)
IV-C1 Non-blind Image Deconvolution
For this task, we compared the proposed framework with several state-of-the-art algorithms, including TV, HL, CSF, IDDBM3D, EPLL, RTF, MLP and IRCNN, on both Levin et al.’s and Sun et al.’s datasets. Table III shows that the PSNR and SSIM scores of the proposed DPE are significantly better than those of the other deconvolution methods. DPE is slower than some simple prior optimization techniques (e.g., TV, HL and CSF), but these have very poor restoration performance. Fortunately, our propagation is much faster than the CNN-based approaches (e.g., IRCNN) and the other high-performance approaches (e.g., IDDBM3D, EPLL and RTF). Fig. 7 then compares the visual performance of DPE against the approaches with relatively high quantitative scores in Table III on an example image from Sun et al.’s benchmark. Note that this image is corrupted not only by a blur kernel but also by additional 5% Gaussian noise. Our method achieves both better qualitative results (e.g., a much clearer image with fine textures) and better quantitative performance.
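The degradation described above (a blur kernel plus additional Gaussian noise) can be simulated with the usual forward model y = k ∗ x + n. This is a minimal sketch under our own assumptions: circular boundary conditions, intensities in [0, 1], and "5%" read as a noise standard deviation of 0.05 on that scale; the exact protocol used for Fig. 7 is not given here.

```python
import random


def blur_and_add_noise(image, kernel, noise_level=0.05, seed=0):
    """Simulate y = k * x + n with circular boundary conditions.

    image: H x W nested list with intensities in [0, 1] (assumed scaling).
    kernel: h x w nested list of nonnegative weights summing to 1, centred at (h // 2, w // 2).
    noise_level: standard deviation of additive white Gaussian noise (0.05 is our reading of "5%").
    """
    height, width = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2
    rng = random.Random(seed)  # fixed seed keeps the simulated observation reproducible
    observed = []
    for i in range(height):
        row = []
        for j in range(width):
            value = 0.0
            for a in range(kh):
                for b in range(kw):
                    value += kernel[a][b] * image[(i - a + cy) % height][(j - b + cx) % width]
            row.append(value + rng.gauss(0.0, noise_level))
        observed.append(row)
    return observed


box = [[1.0 / 9.0] * 3 for _ in range(3)]  # hypothetical 3 x 3 box blur
clean = [[float((i + j) % 2) for j in range(8)] for i in range(8)]
observed = blur_and_add_noise(clean, box, noise_level=0.05, seed=1)
```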
| Ground Truth | Corrupted Input | TV | FoE | VNL | ISDSB | Ours |
|---|---|---|---|---|---|---|
| – | 18.62 / 0.23 | 24.14 / 0.71 | 27.46 / 0.81 | 25.74 / 0.76 | 28.70 / 0.82 | 28.83 / 0.83 |
| Scale | Bicubic | A+ | TNRD | IRCNN | SRCNN | VDSR | Ours |
|---|---|---|---|---|---|---|---|
| ×2 | 30.24 / 0.8688 | 32.28 / 0.9056 | 32.54 / 0.9069 | 32.88 / 0.9114 | 32.42 / 0.9063 | 33.03 / 0.9124 | 32.94 / 0.9127 |
| ×3 | 27.55 / 0.7742 | 29.13 / 0.8188 | 29.46 / 0.8236 | 29.61 / 0.8285 | 29.28 / 0.8209 | 29.77 / 0.8314 | 29.69 / 0.8328 |
| ×4 | 26.00 / 0.7027 | 27.32 / 0.7491 | 27.68 / 0.7570 | 27.72 / 0.7620 | 27.49 / 0.7503 | 28.01 / 0.7674 | 27.83 / 0.7702 |
Example images from Fattal’s benchmark and the D-Hazy benchmark:

| Ground Truth | Hazy Input | | | | | | Ours |
|---|---|---|---|---|---|---|---|
| – | 0.8122 / 0.1304 | 0.9680 / 0.0313 | 0.9463 / 0.0538 | 0.8420 / 0.1135 | 0.9699 / 0.0313 | 0.9252 / 0.0632 | 0.9786 / 0.0300 |
| – | 0.8487 / 0.1120 | 0.8514 / 0.1112 | 0.9579 / 0.0472 | 0.9334 / 0.0613 | 0.9754 / 0.0363 | 0.9690 / 0.0315 | 0.9815 / 0.0308 |
IV-C2 Image Interpolation
For the task of image interpolation (a.k.a. inpainting), we generated two types of corruption on the CBSD68 dataset, which contains 68 images of size 481 × 321: random masks with 20%, 40%, 60% and 80% missing pixels, and text masks with either English or Chinese characters. We compared DPE with several state-of-the-art methods on this task, including TV, FoE, VNL and ISDSB. Table IV shows the quantitative results: our method performs well in terms of both PSNR and SSIM across different rates of missing pixels and text masks. Fig. 8 then compares the visual results of these approaches. The top row of Fig. 8 illustrates the results on an image with 80% missing pixels; the zoomed-in comparisons show that the edges of the object are successfully preserved by our image propagation. The bottom row of Fig. 8 shows the results of text removal. Existing approaches either fail to remove the bold English characters or over-smooth image details. In contrast, DPE achieves the best quantitative and qualitative performance.
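The random-mask corruption above can be sketched as follows, together with a very simple baseline for filling the missing pixels: harmonic (diffusion) inpainting, which repeatedly replaces each missing pixel with the average of its 4-neighbours. This baseline is our illustrative choice and is not DPE, TV, FoE, VNL or ISDSB; pure-Python loops like these are only practical for small images, not full 481 × 321 inputs.

```python
import random


def random_mask(height, width, missing_rate, seed=0):
    """Boolean mask where True marks an observed pixel; about missing_rate of pixels are dropped."""
    rng = random.Random(seed)
    return [[rng.random() >= missing_rate for _ in range(width)] for _ in range(height)]


def harmonic_inpaint(image, mask, iterations=500):
    """Fill missing pixels by repeatedly averaging their 4-neighbours (Jacobi sweeps).

    Observed pixels (mask True) stay fixed; neighbours outside the image are skipped.
    """
    height, width = len(image), len(image[0])
    observed = [image[i][j] for i in range(height) for j in range(width) if mask[i][j]]
    fill = sum(observed) / len(observed) if observed else 0.0
    # initialize every missing pixel with the mean of the observed ones
    current = [[image[i][j] if mask[i][j] else fill for j in range(width)] for i in range(height)]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for i in range(height):
            for j in range(width):
                if mask[i][j]:
                    continue
                neighbours = [current[a][b]
                              for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                              if 0 <= a < height and 0 <= b < width]
                if neighbours:
                    updated[i][j] = sum(neighbours) / len(neighbours)
        current = updated
    return current


image = [[(i + j) / 14.0 for j in range(8)] for i in range(8)]
mask = random_mask(8, 8, missing_rate=0.4, seed=1)
restored = harmonic_inpaint(image, mask, iterations=200)
```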
IV-C3 Super-Resolution
Super-resolution is another important image enhancement task and has received much attention in the past few years. In this experiment, we compare DPE with several state-of-the-art methods, including two conventional approaches (i.e., A+ and TNRD) and three deep networks (i.e., IRCNN, SRCNN and VDSR). For quantitative comparison, we report PSNR and SSIM on the Set14 benchmark in Table V. The PSNR score of VDSR is slightly higher than ours, mainly because its network is specifically designed for the super-resolution task: it first collects training data for several specified scales and then combines them into one large dataset for network training. Nevertheless, DPE achieves higher SSIM scores, and SSIM is designed to measure the structural information of images. We also show super-resolution results on an example image from the Urban100 dataset in Fig. 9, where our method generates clearer textures than the other state-of-the-art methods.
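Since the argument here rests on SSIM, a simplified sketch of the SSIM formula may help: it multiplies a luminance term (comparing means) by a contrast-structure term (comparing variances and covariance). For brevity this version uses a single window over the whole flattened image, which is our simplification; the standard index averages the same expression over local windows, so its values will differ from those reported in Table V.

```python
def global_ssim(x, y, peak=1.0):
    """SSIM computed with one window spanning the whole (flattened) image.

    Uses the usual constants C1 = (0.01 * peak)^2 and C2 = (0.03 * peak)^2.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


patch = [0.1, 0.4, 0.35, 0.8, 0.6, 0.2]
brighter = [v + 0.1 for v in patch]   # same structure, shifted mean: SSIM slightly below 1
inverted = [1.0 - v for v in patch]   # reversed structure: negative SSIM
```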
| Ground Truth | Hazy Input | | | | | | Ours |
|---|---|---|---|---|---|---|---|
| – | 0.6283 / 0.3830 | 0.6613 / 0.2466 | 0.6801 / 0.2479 | 0.6510 / 0.1249 | 0.7196 / 0.1330 | 0.7234 / 0.1501 | 0.8443 / 0.0622 |
| – | 0.7720 / 0.3260 | 0.8494 / 0.1756 | 0.8503 / 0.1605 | 0.8014 / 0.1314 | 0.8910 / 0.0868 | 0.8777 / 0.1013 | 0.9057 / 0.0801 |
IV-C4 Single Image Haze Removal
We compare our proposed framework with state-of-the-art approaches for single image haze removal, including He, Meng, Cai, Berman and Ren. In this task, we initialize the transmission based on an existing prior (i.e., haze lines) and perform the DPE propagation to obtain the optimal transmission map. We then estimate the latent clear image using the atmospheric scattering model stated in Table I. We first report the quantitative performance (i.e., the average PSNR, SSIM and error) of all the compared methods on two representative dehazing benchmarks (i.e., Fattal’s and D-Hazy) in Table VI. DPE achieves the best results among all the compared methods on both benchmarks. We then compare the estimated transmissions and recovered results on example images from Fattal’s dataset in Figs. 10 and 11. Additional visual comparisons on example images from the D-Hazy dataset are shown in Fig. 12. We also evaluate DPE on real-world hazy images and show the enhancement results in Fig. 13. These quantitative and qualitative analyses show that our method consistently outperforms all the compared dehazing methods.
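The last step, recovering the clear image from an estimated transmission, can be sketched with the standard atmospheric scattering model I = J·t + A·(1 − t), where I is the hazy image, J the clear scene, t the transmission and A the global atmospheric light. We assume this standard form matches the model listed in Table I; the lower bound `t_min` and the list-of-RGB-tuples pixel layout are illustrative choices, and the transmission here is given rather than estimated by DPE.

```python
def add_haze(clear, transmission, airlight):
    """Forward model I = J * t + A * (1 - t), applied per pixel and per color channel."""
    return [tuple(j * t + a * (1.0 - t) for j, a in zip(pixel, airlight))
            for pixel, t in zip(clear, transmission)]


def recover_radiance(hazy, transmission, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A.

    Clamping t at t_min avoids dividing by (near) zero in dense haze, where noise
    would otherwise be strongly amplified; 0.1 is a common but arbitrary choice.
    """
    clear = []
    for pixel, t in zip(hazy, transmission):
        t = max(t, t_min)
        clear.append(tuple((i - a) / t + a for i, a in zip(pixel, airlight)))
    return clear


scene = [(0.2, 0.4, 0.6), (0.9, 0.1, 0.3)]
t_map = [0.5, 0.8]
A = (0.95, 0.95, 0.95)
hazy = add_haze(scene, t_map, A)
restored = recover_radiance(hazy, t_map, A)  # round trip recovers scene up to rounding
```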
IV-C5 Underwater Image Enhancement
Finally, we evaluate DPE on the task of underwater image enhancement. In this application, we compare three categories of algorithms: layer decomposition (i.e., ), fusion principle (i.e., ) and transmission estimation (i.e., ,  and ours). Note that here we follow  to perform a histogram-based color correction as a post-process for the three transmission-based methods. In Fig. 14, we first compare the transmissions estimated by the work in  and by our DPE on an example underwater image. DPE obtains a more accurate transmission, which leads to a better enhanced image. Furthermore, we conduct experiments on example images collected by Berman et al.  (top two rows) and by ourselves (bottom two rows, based on the underwater robot picking contest: http://www.cnurpc.org/). Fig. 15 shows that DPE obtains results with more details and better visual quality than the other methods.
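The histogram-based color correction follows a cited work whose exact procedure is not reproduced in this section. As a stand-in, the sketch below uses a generic per-channel percentile stretch: each RGB channel is mapped linearly so that its low and high quantiles land on 0 and 1. It is hypothetical and not necessarily the post-process the authors applied.

```python
def stretch_channel(values, low_pct=0.01, high_pct=0.99):
    """Map the [low_pct, high_pct] quantiles of one channel linearly onto [0, 1], clipping outside."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(low_pct * last)]
    high = ordered[int(high_pct * last)]
    if high <= low:
        return list(values)  # flat channel: nothing to stretch
    return [min(max((v - low) / (high - low), 0.0), 1.0) for v in values]


def color_correct(pixels, low_pct=0.01, high_pct=0.99):
    """Stretch each RGB channel independently, which tends to reduce a global color cast."""
    channels = list(zip(*pixels))  # [(r, g, b), ...] -> [(r...), (g...), (b...)]
    stretched = [stretch_channel(list(c), low_pct, high_pct) for c in channels]
    return list(zip(*stretched))  # back to [(r, g, b), ...]


bluish = [(0.10, 0.30, 0.50), (0.20, 0.40, 0.70), (0.30, 0.50, 0.90)]
corrected = color_correct(bluish, low_pct=0.0, high_pct=1.0)
```

After the stretch, every channel of this toy example spans [0, 1], so the blue dominance of the input disappears.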
In this paper, we developed a deep prior ensemble (DPE) framework that integrates domain knowledge and information from training data to address image enhancement. By cascading three newly designed basic propagative building blocks with a feedback control strategy, we establish a theoretically convergent image propagation framework. The main advantage of DPE over conventional optimization-based approaches is that its iterations can avoid unwanted local minima by using network-based descent directions. Meanwhile, we also improve experience-based network structures through a task-aware warm start and a prior-projection feedback control strategy. Extensive experimental results on various image enhancement tasks demonstrate that the proposed method provides favorable enhancement performance, both quantitatively and qualitatively.
Appendix A Proofs of Our Theoretical Results
We first review some fundamental mathematical concepts (e.g., the Kurdyka-Łojasiewicz and semi-algebraic properties) in the following definitions. More details can be found in [18, 64] and the references therein.
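Kurdyka-Łojasiewicz Property: Let $f:\mathbb{R}^n \to (-\infty, +\infty]$ be a proper lower semi-continuous function. Then $f$ is said to have the Kurdyka-Łojasiewicz (KŁ) property at $\bar{x} \in \operatorname{dom}\partial f$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar{x}$ and a concave and continuous function $\varphi: [0, \eta) \to [0, +\infty)$ with $\varphi(0) = 0$, $\varphi$ continuously differentiable on $(0, \eta)$ and $\varphi' > 0$ on $(0, \eta)$, such that for all $x \in U \cap \{x : f(\bar{x}) < f(x) < f(\bar{x}) + \eta\}$, the following inequality holds
$$\varphi'\big(f(x) - f(\bar{x})\big)\,\operatorname{dist}\big(0, \partial f(x)\big) \geq 1.$$
If $f$ satisfies the KŁ property at each point of $\operatorname{dom}\partial f$, then $f$ is called a KŁ function.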
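As a quick sanity check of the definition (our own illustration, not part of the original proofs), take $f(x) = x^2$ on $\mathbb{R}$, $\bar{x} = 0$, $U = \mathbb{R}$, any $\eta > 0$ and $\varphi(s) = \sqrt{s}$, which is concave, continuous, vanishes at $0$ and has $\varphi'(s) = 1/(2\sqrt{s}) > 0$ on $(0, \eta)$. Since $f$ is smooth, $\partial f(x) = \{2x\}$, and for every $x$ with $0 < f(x) < \eta$ (i.e., $x \neq 0$):

$$
\varphi'\big(f(x) - f(0)\big)\,\operatorname{dist}\big(0, \partial f(x)\big)
= \frac{1}{2\sqrt{x^2}}\,\lvert 2x \rvert = 1 \geq 1,
$$

so $f$ has the KŁ property at $0$.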
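Semi-algebraic Set and Function: A subset $S$ of $\mathbb{R}^n$ is a real semi-algebraic set if there exists a finite number of real polynomial functions $g_{ij}, h_{ij}: \mathbb{R}^n \to \mathbb{R}$ such that
$$S = \bigcup_{j=1}^{p} \bigcap_{i=1}^{q} \{x \in \mathbb{R}^n : g_{ij}(x) = 0,\ h_{ij}(x) < 0\}.$$
A function $f: \mathbb{R}^n \to (-\infty, +\infty]$ is called semi-algebraic if its graph $\{(x, y) \in \mathbb{R}^{n+1} : f(x) = y\}$ is a semi-algebraic subset of $\mathbb{R}^{n+1}$.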
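For instance (our illustration), the absolute value on $\mathbb{R}$ is semi-algebraic. Using polynomials in $(x, y)$, and padding with the trivial pair $g = 0$, $h = -1$ where a condition is not needed,

$$
\{(x, y) : y = |x|\}
= \{y - x = 0,\ -x < 0\} \,\cup\, \{y + x = 0,\ x < 0\} \,\cup\, \big(\{x = 0,\ -1 < 0\} \cap \{y = 0,\ -1 < 0\}\big),
$$

where the three pieces cover $x > 0$, $x < 0$ and $x = 0$, respectively. Since finite sums of semi-algebraic functions are semi-algebraic, $\|x\|_1 = \sum_i |x_i|$ is semi-algebraic as well.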
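Lemma A.1. Let $f: \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semi-continuous function. Suppose a sequence $\{x^k\}$ and its (limiting) sub-gradients $v^k \in \partial f(x^k)$ satisfy $x^k \to x^*$ and $v^k \to v^*$. If in addition $f(x^k) \to f(x^*)$ as $k \to \infty$, then $v^* \in \partial f(x^*)$.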
A-B Proof of Proposition II.1
From our propagation scheme and the expression of , we have
Thus, from the definition of the proximal map , the above equality is equivalent to
which has exactly the same form as Condition II.1. Thus, we conclude that the  is equivalent to . ∎
A-C Proof of Proposition III.1
Then, with the definition of the proximal map , the above equality can be equivalently formulated as
which indicates that  can be regarded as an approximate solution of the problem , with  regarded as the error in its first-order optimality condition. This also means that  can be regarded as a solution of the following optimization problem
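For reference, with generic notation that is ours rather than the paper's ($g$ a proper lower semi-continuous function, $\gamma > 0$, $v$ the input point), the proximal map and the first-order condition used in this kind of argument read

$$
\operatorname{prox}_{\gamma g}(v) = \arg\min_{x}\; \Big\{ g(x) + \frac{1}{2\gamma}\|x - v\|^2 \Big\},
\qquad
x \in \operatorname{prox}_{\gamma g}(v) \;\Longrightarrow\; \frac{v - x}{\gamma} \in \partial g(x),
$$

where the implication is Fermat's rule (the quadratic term is smooth with gradient $(x - v)/\gamma$) and becomes an equivalence when $g$ is convex. An inexact step replaces the inclusion by $\frac{v - x}{\gamma} + e \in \partial g(x)$ for some error $e$, which mirrors, up to sign conventions, the error term mentioned above. For example, for $g = \lambda\lvert\cdot\rvert$ the proximal map is soft-thresholding, $\operatorname{prox}_{\gamma g}(v) = \operatorname{sign}(v)\max(\lvert v\rvert - \gamma\lambda, 0)$.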
A-D Proof of Theorem III.1
First of all, we prove the sufficient descent property. From the equivalent reformulation (i.e., Eq. (12)) of our propagation scheme, we have that
Using the definitions of  and , the above inequality can be rewritten as
On the other hand, since  is a coercive function, that is,  as , the sufficient descent property of  guarantees that the sequence  is bounded.
The second assertion of Theorem III.1 can be directly deduced from the form of . Since  is proper and lower semi-continuous and  is continuously differentiable, we have
which follows directly from Condition II.1. Rewriting the above inequality with the definition of , we obtain the second assertion.
From the sufficient descent property of , we have
for a positive integer . Since  is bounded from below, we have  by taking the limit as . On the other hand, from  we have  as . Furthermore, denoting , we clearly have  and  as .
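The step above follows the usual telescoping pattern. Written with generic symbols that are ours (an objective $F$ bounded below by $\underline{F} > -\infty$, and a sufficient descent constant $a > 0$ such that $F(x^{k}) - F(x^{k+1}) \geq a\|x^{k+1} - x^{k}\|^2$; the exact constants in Theorem III.1 may differ), summing over $k = 0, \dots, K$ gives

$$
a \sum_{k=0}^{K} \|x^{k+1} - x^{k}\|^2
\;\leq\; \sum_{k=0}^{K} \big(F(x^{k}) - F(x^{k+1})\big)
\;=\; F(x^{0}) - F(x^{K+1})
\;\leq\; F(x^{0}) - \underline{F}.
$$

The right-hand side does not depend on $K$, so the series converges and $\|x^{k+1} - x^{k}\| \to 0$ as $k \to \infty$.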
Since  is bounded, there exists a subsequence  such that  as . By setting the step  as , we have
Taking , we obtain the following inequality from the first condition in Theorem III.1
Then, by the lower semi-continuity of the function , we have , which further indicates  as .
Together with Lemma A.1, we have , which indicates that  is a critical point of . Moreover, since  is bounded from below and sufficiently descending,  converges to a limit value as . Together with , this concludes the proof. ∎
A-E Proof of Corollary III.1
Since  is a semi-algebraic function, it satisfies the KŁ inequality at every point of . From Condition II.1,  is a bounded sequence, so there exists a subsequence  that converges to . With Condition II.1,  has the uniformized KŁ property  on the set of all limit points of . Since  is sufficiently descending, there exists  such that for ,
From the concavity of , we get
Summing up the above inequality from to yields
where the first inequality comes from the definition of . Thus we have the following inequality for any , with the fact that
which indicates that has finite length, i.e.,
Consequently, the sequence  is a Cauchy sequence, which converges to a critical point of . ∎
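For readers who want the omitted algebra, here is the standard chain used in KŁ-based finite-length arguments, written with generic symbols that are ours: assume $F(x^k) - F(x^{k+1}) \geq a\|x^{k+1} - x^k\|^2$, $\operatorname{dist}(0, \partial F(x^k)) \leq b\|x^k - x^{k-1}\|$, $F(x^k) > F^*$ for all $k$, and the KŁ inequality at level $F^*$ for all $k \geq l$. With $\Delta_k = \varphi(F(x^k) - F^*) - \varphi(F(x^{k+1}) - F^*)$, concavity of $\varphi$ and then the KŁ inequality give

$$
\Delta_k \;\geq\; \varphi'\big(F(x^k) - F^*\big)\big(F(x^k) - F(x^{k+1})\big) \;\geq\; \frac{a\,\|x^{k+1} - x^k\|^2}{b\,\|x^k - x^{k-1}\|}.
$$

Using $2\sqrt{\alpha\beta} \leq \alpha + \beta$,

$$
2\|x^{k+1} - x^k\| \;\leq\; \|x^k - x^{k-1}\| + \frac{b}{a}\,\Delta_k,
$$

and summing over $k = l, \dots, K$ (the $\Delta_k$ telescope and $\varphi \geq 0$) yields

$$
\sum_{k=l}^{K} \|x^{k+1} - x^k\| \;\leq\; \|x^l - x^{l-1}\| + \frac{b}{a}\,\varphi\big(F(x^l) - F^*\big),
$$

a bound independent of $K$, so $\sum_k \|x^{k+1} - x^k\| < \infty$ and the iterates form a Cauchy sequence.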
-  J. Pan, Z. Hu, Z. Su, and M. H. Yang, “Deblurring text images via ℓ0-regularized intensity and gradient prior,” in CVPR, 2014, pp. 2901–2908.
-  H. Wang, X. Gao, K. Zhang, and L. Jie, “Single image super-resolution using gaussian process regression with dictionary-based sampling and student-t likelihood,” IEEE TIP, vol. 26, no. 7, pp. 3556–3568, 2017.
-  R. Fattal, “Single image dehazing,” in ACM ToG, 2008, pp. 1–9.
-  B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: An end-to-end system for single image haze removal,” IEEE TIP, vol. 25, no. 11, pp. 5187–5198, 2016.
-  G. Deng, “Guided wavelet shrinkage for edge-aware smoothing,” IEEE TIP, vol. 26, no. 2, pp. 900–914, 2016.
-  Y. Weiss and W. T. Freeman, “What makes a good model of natural images?” in CVPR, 2007, pp. 1–8.
-  L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
-  S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, “An iterative regularization method for total variation-based image restoration,” Multiscale Modeling and Simulation, vol. 4, no. 2, pp. 460–489, 2005.
-  I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on pure and applied mathematics, vol. 57, no. 11, pp. 1413–1457, 2004.
-  L. Xu, C. Lu, Y. Xu, and J. Jia, “Image smoothing via ℓ0 gradient minimization,” in SIGGRAPH Asia Conference, 2011, p. 174.
-  L. Xu, S. Zheng, and J. Jia, “Unnatural sparse representation for natural image deblurring,” in CVPR, 2013, pp. 1107–1114.
-  X. Guo, X. Cao, and Y. Ma, “Robust separation of reflection from multiple images,” in CVPR, 2014, pp. 2187–2194.
-  A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in CVPR, vol. 2, 2005, pp. 60–65.
-  X. Zhang, M. Burger, X. Bresson, and S. Osher, “Bregmanized nonlocal regularization for deconvolution and sparse reconstruction,” SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 253–276, 2010.
-  K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE TIP, vol. 16, no. 8, pp. 2080–2095, 2007.
-  A. Danielyan, V. Katkovnik, and K. Egiazarian, “Bm3d frames and variational image deblurring,” IEEE TIP, vol. 21, no. 4, pp. 1715–1728, 2012.
-  S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in CVPR, 2014, pp. 2862–2869.
-  H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, “Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the kurdyka-łojasiewicz inequality,” Mathematics of Operations Research, vol. 35, no. 2, pp. 438–457, 2010.
-  J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Mathematical Programming, vol. 146, no. 1–2, pp. 459–494, 2014.
-  J. Zeng, S. Lin, and Z. Xu, “Sparse regularization: Convergence of iterative jumping thresholding algorithm,” IEEE TSP, vol. 64, no. 19, pp. 5106–5118, 2016.
-  D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in CVPR, 2011, pp. 479–486.
-  G. Yu, G. Sapiro, and S. Mallat, “Solving inverse problems with piecewise linear estimators: From gaussian mixture models to structured sparsity,” IEEE TIP, vol. 21, no. 5, pp. 2481–2499, 2012.
-  S. Roth and M. J. Black, “Fields of experts,” IJCV, vol. 82, no. 2, pp. 205–229, 2009.
-  U. Schmidt, J. Jancsary, S. Nowozin, S. Roth, and C. Rother, “Cascades of regression tree fields for image restoration,” IEEE TPAMI, vol. 38, no. 4, pp. 677–689, 2016.
-  U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in CVPR, 2014, pp. 2774–2781.
-  L. Xiao, J. Wang, W. Heidrich, and M. Hirsch, “Learning high-order filters for efficient blind deconvolution of document photographs,” in ECCV, 2016, pp. 734–749.
-  G. Riegler, S. Schulter, M. Ruther, and H. Bischof, “Conditioned regression models for non-blind single image super-resolution,” in ECCV, 2015, pp. 522–530.
-  Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE TPAMI, vol. 39, no. 6, pp. 1256–1272, 2017.
-  J. Sun, H. Li, Z. Xu et al., “Deep admm-net for compressive sensing mri,” in NIPS, 2016, pp. 10–18.
-  K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2014.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
-  W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M. H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” in ECCV.
-  X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE TIP, vol. 26, no. 6, pp. 2944–2956, 2017.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE TIP, vol. 26, no. 7, pp. 3142–3155, 2016.
-  C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE TPAMI, vol. 38, no. 2, pp. 295–307, 2016.
-  W. Yang, R. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in CVPR, 2017.
-  K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep cnn denoiser prior for image restoration,” in CVPR, 2017.
-  G. Yuan and B. Ghanem, “ℓ0TV: A new method for image restoration in the presence of impulse noise,” in CVPR, 2015, pp. 5369–5377.
-  Z. Lin, R. Liu, and Z. Su, “Linearized alternating direction method with adaptive penalty for low-rank representation,” in NIPS, 2011, pp. 612–620.
-  L. Xu, J. S. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” in NIPS, 2014, pp. 1790–1798.
-  J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in CVPR, 2016, pp. 1646–1654.
-  R. Yan and L. Shao, “Blind estimation of blur kernels and parameters from a single image,” IEEE TIP, vol. 25, no. 4, pp. 1910–1921, 2016.
-  J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009, pp. 248–255.
-  N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in ECCV, 2012.
-  A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in CVPR, 2009, pp. 1964–1971.
-  L. Sun, S. Cho, J. Wang, and J. Hays, “Edge-based blur kernel estimation using patch priors,” in ICCP, 2013, pp. 1–8.
-  C. Li, W. Yin, H. Jiang, and Y. Zhang, “An efficient augmented lagrangian method with applications to total variation minimization,” Computational Optimization and Applications, vol. 56, no. 3, pp. 507–530, 2013.
-  D. Krishnan and R. Fergus, “Fast image deconvolution using hyper-laplacian priors,” in NIPS, 2009, pp. 1033–1041.
-  C. J. Schuler, H. C. Burger, S. Harmeling, and B. Scholkopf, “A machine learning approach for non-blind image deconvolution,” in CVPR, 2013, pp. 1067–1074.
-  J. B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in CVPR, 2015, pp. 5197–5206.
-  R. Fattal, “Dehazing using color-lines,” ACM ToG, vol. 34, no. 13, 2014.
-  C. Ancuti, C. O. Ancuti, and C. De Vleeschouwer, “D-hazy: A dataset to evaluate quantitatively dehazing algorithms,” in ICIP, 2016.
-  K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE TPAMI, vol. 33, no. 12, pp. 2341–2353, 2011.
-  G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan, “Efficient image dehazing with boundary constraint and contextual regularization,” in ICCV, 2014, pp. 617–624.
-  D. Berman, T. Treibitz, and S. Avidan, “Non-local image dehazing,” in CVPR, 2016, pp. 1674–1682.
-  P. Getreuer, “Total variation inpainting using split bregman,” Image Processing on Line, vol. 2, pp. 147–157, 2012.
-  P. Arias, G. Facciolo, V. Caselles, and G. Sapiro, “A variational framework for exemplar-based image inpainting,” IJCV, vol. 93, no. 3, pp. 319–347, 2011.
-  L. He and Y. Wang, “Iterative support detection-based split bregman method for wavelet frame-based image inpainting,” IEEE TIP, vol. 23, no. 12, pp. 5470–5485, 2014.
-  R. Timofte, V. D. Smet, and L. V. Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in ACCV, 2014, pp. 111–126.
-  Y. Li, S. You, M. S. Brown, and R. T. Tan, “Haze visibility enhancement: A survey and quantitative benchmarking,” CVIU, 2017.
-  Y. Li, F. Guo, R. T. Tan, and M. S. Brown, “A contrast enhancement framework with jpeg artifacts suppression,” in ECCV, 2014, pp. 174–188.
-  C. Ancuti, C. O. Ancuti, T. Haber, and P. Bekaert, “Enhancing underwater images and videos by fusion,” in CVPR, 2012, pp. 81–88.
-  D. Berman, T. Treibitz, and S. Avidan, “Diving into haze-lines: Color restoration of underwater images,” in BMVC, 2017.
-  H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods,” Mathematical Programming, vol. 137, no. 1-2, pp. 91–129, 2013.