I Introduction
Image restoration (IR), which aims to reconstruct a high-quality image from its low-quality observation, has many important applications, such as low-level image processing, medical imaging, remote sensing, and surveillance. Mathematically, the IR problem can be expressed as $y = Ax + n$, where $y$ and $x$ denote the degraded image and the original image, respectively, $A$ denotes the degradation matrix relating to an imaging/degradation system, and $n$ denotes the additive noise. Note that different settings of $A$ express different IR problems. For example, the IR problem is a denoising problem [1, 2, 3, 4, 5] when $A$ is an identity matrix, a deblurring problem [6, 7, 8, 9] when $A$ is a blurring matrix/operator, and a super-resolution problem [10, 11, 8, 12] when $A$ is a subsampling matrix/operator. Essentially, restoring $x$ from $y$ is a challenging ill-posed inverse problem. Although the IR problems have been extensively studied in the past few decades, they remain an active research area. Generally, existing IR methods can be classified into two main categories, i.e., model-based methods
[13, 1, 14, 15, 8, 16, 9, 17, 18] and learning-based methods [19, 20, 21, 22, 23, 24]. The model-based methods attack this problem by solving an optimization problem, which is often constructed from a Bayesian perspective. In the Bayesian setting, the solution is obtained by maximizing the posterior $P(x|y)$, which can be formulated as

$$\hat{x} = \mathop{\arg\max}_{x} \log P(x|y) = \mathop{\arg\max}_{x} \log P(y|x) + \log P(x), \qquad (1)$$
where $P(y|x)$ and $P(x)$ denote the data likelihood and the prior terms, respectively. For additive Gaussian noise, $P(y|x)$ corresponds to the $\ell_2$-norm data fidelity term, and the prior term $P(x)$ characterizes the prior knowledge of $x$ in a probability setting. Formally, Eq. (1) can be rewritten as

$$\hat{x} = \mathop{\arg\min}_{x} \frac{1}{2}\|y - Ax\|_2^2 + \lambda J(x), \qquad (2)$$
where $J(x)$ denotes the regularizer associated with the prior term $P(x)$. The desirable solution is then the one that minimizes both the $\ell_2$-norm data fidelity term and the regularization term weighted by the parameter $\lambda$. Clearly, the regularization term plays a critical role in searching for high-quality solutions. Numerous regularizers have been developed, ranging from the well-known total variation (TV) regularizer [13] and the sparsity-based regularizers with off-the-shelf transforms or learned dictionaries [1, 14, 3, 15], to the nonlocal self-similarity (NLSS) inspired regularizers [25, 2, 8]. The TV regularizer is good at characterizing piecewise constant signals but unable to model more complex image edges and textures. The sparsity-based techniques are more effective in representing local image structures with a few elemental structures (called atoms) from an off-the-shelf transformation matrix (e.g., DCT and wavelets) or a learned dictionary. Indeed, the IR community has witnessed a flurry of sparsity-based IR methods [1, 3, 15, 11] in the past decade. Motivated by the fact that natural images often contain rich repetitive structures, nonlocal regularization techniques [2, 8, 4, 5], which combine the NLSS with sparse representation and low-rank approximation, have shown significant improvements over their local counterparts. With these carefully designed priors, significant progress in IR has been achieved. In addition to these explicitly regularized IR methods, denoising-based IR methods have also been proposed [26, 27, 28, 29, 30]. In these methods, the original optimization problem is decoupled into two separate subproblems, one for the data fidelity term and the other for the regularization term, yielding simpler optimization problems.
Specifically, the subproblem related to the regularization term is a pure denoising problem, and thus more complex denoising methods that cannot be expressed as regularization terms can also be adopted, e.g., the BM3D [2], NCSR [8] and GMM [16] methods.
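To make the role of the degradation matrix $A$ concrete, the three settings above can be sketched on a toy 1-D signal. All names and sizes below are illustrative only, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n)        # toy 1-D "image" (illustrative stand-in)

# Denoising: A is the identity, y = x + noise.
A_id = np.eye(n)

# Deblurring: A is a circular blurring operator (3-tap moving average).
A_blur = sum(np.roll(np.eye(n), k, axis=1) for k in (-1, 0, 1)) / 3.0

# Super-resolution: A subsamples every other pixel of the blurred signal.
D = np.eye(n)[::2]                # 8 x 16 subsampling matrix
A_sr = D @ A_blur

sigma = 0.1
y_denoise = A_id @ x + sigma * rng.standard_normal(n)
y_deblur = A_blur @ x + sigma * rng.standard_normal(n)
y_sr = A_sr @ x + sigma * rng.standard_normal(n // 2)
```

Each observation `y_*` is an instance of $y = Ax + n$ for the corresponding choice of $A$; only the structure of $A$ changes across the three IR problems.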
Different from the model-based methods that rely on a carefully designed prior, the learning-based IR methods learn mapping functions to infer the missing high-frequency details or the desirable high-quality image from the observed image. In the past decade, many learning-based image super-resolution methods [19, 20, 22, 24] have been proposed, where mapping functions from low-resolution (LR) patches to high-resolution (HR) patches are learned. Inspired by the great success of deep convolutional neural networks (DCNNs) for image classification [31, 32], DCNN models have also been successfully applied to IR tasks, e.g., SRCNN [22], FSRCNN [33] and VDSR [34] for image super-resolution, and TNRD [35] and DnCNN [24] for image denoising. In these methods, a DCNN is used to learn the mapping function from the degraded images to the original images. Due to their powerful representation ability, DCNN-based methods have shown better performance than conventional optimization-based IR methods in various IR tasks [22, 34, 35, 24]. Though training a DCNN is very expensive, testing is much more efficient than running previous optimization-based IR methods. Despite their promising results, DCNN methods lack the flexibility to adapt to different image recovery tasks, as the data likelihood term is not explicitly exploited. To address this issue, hybrid IR methods that combine optimization-based methods with DCNN denoisers have been proposed. In [36], a set of DCNN models is pre-trained for the image denoising task and integrated into the optimization-based IR framework for different IR tasks. Compared with other optimization-based methods, the integration of the DCNN models has the advantage of exploiting a large training dataset and thus leads to superior IR performance. A similar idea has also been exploited in the autoencoder-based IR method [37], where denoising autoencoders are pre-trained as a natural image prior and a regularizer based on the pre-trained autoencoder is proposed. The resulting optimization problem is then iteratively solved by gradient descent. Despite the effectiveness of these methods [36, 37], they have to iteratively solve optimization problems, and thus their computational complexity is high. Moreover, the CNN and autoencoder models adopted in [36, 37] are pre-trained and cannot be jointly optimized with the other algorithm parameters.

In this paper, we propose a denoising prior driven deep network to take advantage of both the optimization-based and the discriminative learning-based IR methods. First, we propose a denoising-based IR method whose iterative process can be carried out efficiently. Then, we unfold the iterative process into a feedforward neural network whose layers mimic the process flow of the proposed denoising-based IR algorithm. Moreover, an effective DCNN denoiser that can exploit multi-scale redundancies is proposed and plugged into the deep network. Through end-to-end training, both the DCNN denoisers and the other network parameters can be jointly optimized. Experimental results show that the proposed method can achieve very competitive and often state-of-the-art results on several IR tasks, including image denoising, deblurring and super-resolution.
II Related Work
We briefly review the IR methods related to the proposed method, i.e., the denoising-based IR methods and the discriminative learning-based IR methods.
II-A Denoising-based IR methods
Instead of using an explicitly expressed regularizer, denoising-based IR methods [26] allow the use of a more complex image prior by decoupling the optimization problem of Eq. (2) into two subproblems, one for the data likelihood term and the other for the prior term. By introducing an auxiliary variable $v$, Eq. (2) can be rewritten as

$$(\hat{x}, \hat{v}) = \mathop{\arg\min}_{x, v} \frac{1}{2}\|y - Ax\|_2^2 + \lambda J(v), \quad \text{s.t. } x = v. \qquad (3)$$
In [26, 30], the ADMM technique is used to convert the above equality-constrained optimization problem into two subproblems,

$$x^{(t+1)} = \mathop{\arg\min}_{x} \|y - Ax\|_2^2 + \mu\|x - v^{(t)} + u^{(t)}\|_2^2, \quad v^{(t+1)} = \mathop{\arg\min}_{v} \lambda J(v) + \mu\|x^{(t+1)} - v + u^{(t)}\|_2^2, \qquad (4)$$
where $u$ denotes the augmented Lagrange multiplier, updated as $u^{(t+1)} = u^{(t)} + x^{(t+1)} - v^{(t+1)}$. The $x$-subproblem is a simple quadratic optimization that admits a closed-form solution as

$$x^{(t+1)} = (A^{\top}A + \mu I)^{-1}\left(A^{\top}y + \mu(v^{(t)} - u^{(t)})\right). \qquad (5)$$
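As a quick sanity check on one standard form of this closed-form $x$-update, $x = (A^{\top}A + \mu I)^{-1}(A^{\top}y + \mu(v - u))$, the sketch below (with hypothetical toy dimensions) verifies that it makes the gradient of the quadratic $x$-subproblem vanish:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 12, 8                      # hypothetical toy dimensions
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
v = rng.standard_normal(n)        # current denoiser output
u = rng.standard_normal(n)        # scaled Lagrange multiplier
mu = 0.5

# Closed-form solution of the quadratic x-subproblem.
x_star = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y + mu * (v - u))

def subproblem(x):
    """Objective of the x-subproblem: ||y - Ax||^2 + mu * ||x - v + u||^2."""
    return np.sum((y - A @ x) ** 2) + mu * np.sum((x - v + u) ** 2)

# Its gradient, 2[A^T(Ax - y) + mu(x - v + u)], vanishes at x_star.
grad = 2 * (A.T @ (A @ x_star - y) + mu * (x_star - v + u))
```

Any perturbation of `x_star` increases the subproblem objective, confirming it is the minimizer.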
The intermediately reconstructed image $x^{(t+1)}$ depends on both the observation model and a fixed estimate of $v$. The $v$-subproblem is the proximity operator of $J(v)$ computed at the point $x^{(t+1)} + u^{(t)}$, whose solution can be obtained by a denoising algorithm. By alternately updating $x$ and $v$ until convergence, the original optimization problem of Eq. (2) is solved. The advantage of this framework is that state-of-the-art denoising algorithms that cannot be explicitly expressed as a regularizer $J(v)$ can also be used to update $v$, leading to better IR performance. For example, the well-known BM3D [2], GMM [16] and NCSR [8] have been used for various IR applications [26, 27, 28]. In [36], state-of-the-art CNN denoisers have also been plugged in as an image prior for general IR; due to their excellent denoising ability, state-of-the-art results for different IR tasks have been obtained. Similarly, in [37], an autoencoder denoiser is plugged into the objective function of Eq. (2). However, different from the variable splitting method described above, the objective function of [37] is minimized by gradient descent. Though the denoising-based IR methods are very flexible and effective in exploiting state-of-the-art image priors, they require many iterations to converge and their components cannot be jointly optimized.

II-B Deep network based IR methods
Inspired by the great success of DCNNs for image classification [31, 32], object detection [38], semantic segmentation [39], etc., DCNNs have also been applied to low-level image processing tasks [22, 34, 35, 24]. Similar to coupled sparse coding [11], DCNNs have been used to learn a nonlinear mapping from the LR patch space to the HR patch space [22]. By designing very deep CNNs, state-of-the-art image super-resolution results have been achieved [34]. Similar network structures have also been applied to image denoising [24] and have likewise achieved state-of-the-art denoising performance. For non-blind image deblurring, a multilayer perceptron network [40] has been developed to remove deconvolution artifacts. In [41], Xu et al. proposed to use a DCNN for non-blind image deblurring. Though excellent IR performance has been obtained, these DCNN methods generally treat the IR problems as denoising problems, i.e., removing the noise or artifacts of the initially recovered images, and ignore the observation models.

There have been some attempts to leverage the domain knowledge and the observation model for IR. In [23], based on the learned iterative shrinkage/thresholding algorithm (LISTA) [42], Wang et al. developed a deep network whose layers correspond to the steps of the sparse coding based image SR. In [35], the classic iterative nonlinear reaction diffusion method is also implemented as a deep network whose parameters are jointly trained. A DNN inspired by the ADMM-based sparse coding algorithm has also been developed for compressive sensing based MRI reconstruction [43]. In [44], DNNs constructed from the truncated iterative hard thresholding algorithm have also been developed for solving the $\ell_0$-norm sparse recovery problem. These model-based DNNs have shown significant improvements in both efficiency and effectiveness over the original iterative algorithms. However, the strict implementations of the conventional sparse coding based methods result in a limited receptive field of the convolutional filters, which cannot exploit the spatial correlations of the feature maps effectively, leading to limited IR performance.
III Proposed Denoising-based Image Restoration Algorithm
In this section, we develop an efficient iterative algorithm for solving the denoising-based IR methods, based on which a feedforward DNN will be proposed in the next section. Considering the denoising-based IR problem of Eq. (3), we adopt the half-quadratic splitting method, by which the equality-constrained optimization problem can be converted into an unconstrained one, as

$$(\hat{x}, \hat{v}) = \mathop{\arg\min}_{x, v} \frac{1}{2}\|y - Ax\|_2^2 + \frac{\eta}{2}\|x - v\|_2^2 + \lambda J(v). \qquad (6)$$
The above optimization problem can be solved by alternately solving two subproblems,

$$x^{(t+1)} = \mathop{\arg\min}_{x} \frac{1}{2}\|y - Ax\|_2^2 + \frac{\eta}{2}\|x - v^{(t)}\|_2^2, \quad v^{(t+1)} = \mathop{\arg\min}_{v} \frac{\eta}{2}\|x^{(t+1)} - v\|_2^2 + \lambda J(v). \qquad (7)$$
The $x$-subproblem is a quadratic optimization problem that can be solved in closed form as $x^{(t+1)} = W^{-1}(A^{\top}y + \eta v^{(t)})$, where $W = A^{\top}A + \eta I$ is a matrix related to the degradation matrix $A$. Generally, $W$ is very large, so it is impractical to compute its inverse directly. Instead, the classic iterative conjugate gradient (CG) algorithm can be used to compute $x^{(t+1)}$, which requires many iterations. In this paper, instead of solving for an exact solution of the $x$-subproblem, we propose to compute $x^{(t+1)}$ with a single step of gradient descent for an inexact solution, as

$$x^{(t+1)} = x^{(t)} - \delta\left[A^{\top}(Ax^{(t)} - y) + \eta(x^{(t)} - v^{(t)})\right] = \bar{A}x^{(t)} + \delta A^{\top}y + \delta\eta v^{(t)}, \qquad (8)$$

where $\bar{A} = (1 - \delta\eta)I - \delta A^{\top}A$ and $\delta$ is the parameter controlling the step size. By precomputing $\bar{A}$, the update of $x^{(t+1)}$ can be computed very efficiently. As will be shown later, we do not have to solve the $x$-subproblem exactly: updating $x$ once per iteration is sufficient for the algorithm to converge to a local optimal solution. The $v$-subproblem is the proximity operator of $J(v)$ computed at the point $x^{(t+1)}$, whose solution can be obtained by a denoiser, i.e., $v^{(t+1)} = f(x^{(t+1)})$, where $f(\cdot)$ denotes a denoiser. Various denoising algorithms can be used, including those that cannot be explicitly expressed as the MAP estimator with a regularizer $J(v)$. In this paper, inspired by the success of DCNNs for image denoising, we choose a DCNN-based denoiser to exploit a large training dataset. However, different from existing DCNN models for IR, we consider a network that can exploit the multi-scale redundancies of natural images, as will be described in the next section. The proposed iterative algorithm for solving the denoising-based IR problems is summarized in Algorithm 1. We now discuss the convergence property of Algorithm 1.
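As a concrete toy illustration of the two steps of Algorithm 1, the sketch below alternates a single gradient step on the $x$-subproblem with a denoising step; soft-thresholding stands in, purely for illustration, for the learned DCNN denoiser, and all dimensions and parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 32, 8
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, 2)          # normalize so a unit-order step size is safe
x_true = np.zeros(n)
x_true[[1, 4]] = [1.0, -1.5]       # toy sparse ground truth (illustrative)
y = A @ x_true + 0.01 * rng.standard_normal(m)

def toy_denoiser(z, tau=0.02):
    """Soft-thresholding: a stand-in for the learned DCNN denoiser f(.)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

delta, eta = 0.5, 0.5              # step size and splitting parameter
x = np.zeros(n)
v = toy_denoiser(x)
for _ in range(300):
    # Inexact x-update: a single gradient step on the x-subproblem.
    x = x - delta * (A.T @ (A @ x - y) + eta * (x - v))
    # v-update: proximity operator realized by a denoiser.
    v = toy_denoiser(x)
```

Even with the inexact single-step $x$-update, the iterates drive the data-fidelity term down and recover the toy signal, consistent with the discussion above.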
Theorem 1.
Consider the energy function

$$F(x, v) = \frac{1}{2}\|y - Ax\|_2^2 + \frac{\eta}{2}\|x - v\|_2^2 + \lambda J(v).$$

Assume that $F$ is lower bounded and coercive, i.e., $F(x, v) \to \infty$ whenever $\|(x, v)\|_2 \to \infty$. For Algorithm 1, $(x^{(t)}, v^{(t)})$ has a subsequence that converges to a stationary point of the energy function, provided that the denoiser satisfies the sufficient descent condition:

$$F(x^{(t)}, v^{(t)}) - F(x^{(t+1)}, v^{(t+1)}) \ge c\left(\|x^{(t+1)} - x^{(t)}\|_2^2 + \|v^{(t+1)} - v^{(t)}\|_2^2\right), \qquad (9)$$

where $c > 0$ is a constant, and stationarity is understood in terms of a continuous limiting subgradient of $F$.
Proof See the Appendix.
Let us discuss the condition (9). We list some combinations of the energy function and the denoising mapping $f(\cdot)$ that satisfy (9):

1. $F$ is Lipschitz differentiable, and $f$ is a gradient descent map with a sufficiently small step size (the admissible step size depends on whether $F$ is convex in $v$). Then, (9) follows from standard gradient analysis.

2. $J$ is proper and lower semicontinuous, the function $\frac{\eta}{2}\|x - v\|_2^2 + \lambda J(v)$ is at least strongly convex in $v$, and $f(x) = \arg\min_{v} \frac{\eta}{2}\|x - v\|_2^2 + \lambda J(v)$. This is known as the proximal mapping of $J$; these properties ensure that $f$ is well defined. Then, by convexity and the optimality condition of the $v$-subproblem,

$$F(x^{(t+1)}, v^{(t)}) - F(x^{(t+1)}, v^{(t+1)}) \ge c\,\|v^{(t+1)} - v^{(t)}\|_2^2. \qquad (10)$$

This is different from (9), since the left-hand side uses $F(x^{(t+1)}, v^{(t)})$ rather than $F(x^{(t)}, v^{(t)})$. However, applying the right-hand side term in the proof shows that (9) is satisfied asymptotically, and the proof results still apply.

3. Let $\mathcal{M}$ denote a manifold of (noiseless) images and let $J(v)$ measure a certain kind of squared distance between $v$ and $\mathcal{M}$. In particular, consider the squared Euclidean distance $J(v) = \|v - P_{\mathcal{M}}(v)\|_2^2$, where $P_{\mathcal{M}}(v)$ denotes the orthogonal projection of $v$ onto $\mathcal{M}$. Then, for $f = P_{\mathcal{M}}$, similar to the previous point, we have (10) and thus (9) asymptotically.

In points 2 and 3 above, we can remove the proximity term $\frac{\eta}{2}\|x - v\|_2^2$ used in defining the mapping $f$ and still ensure the same result, i.e., subsequence convergence to a stationary point. However, the proof must be adapted to each case separately. We leave this to our future work.
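For instance, in the gradient-descent case of point 1, the standard descent lemma yields a sufficient-descent inequality of the required form. This is only a sketch, assuming $\nabla F$ is $L$-Lipschitz and the map uses step size $1/L$, so that $z^{(t+1)} = z^{(t)} - \frac{1}{L}\nabla F(z^{(t)})$ with $z = (x, v)$:

```latex
F(z^{(t+1)}) \le F(z^{(t)}) + \big\langle \nabla F(z^{(t)}),\, z^{(t+1)} - z^{(t)} \big\rangle
              + \frac{L}{2}\big\|z^{(t+1)} - z^{(t)}\big\|_2^2
            = F(z^{(t)}) - \frac{L}{2}\big\|z^{(t+1)} - z^{(t)}\big\|_2^2,
```

since $\langle \nabla F(z^{(t)}), z^{(t+1)} - z^{(t)} \rangle = -\frac{1}{L}\|\nabla F(z^{(t)})\|_2^2$ and $\frac{L}{2}\|z^{(t+1)} - z^{(t)}\|_2^2 = \frac{1}{2L}\|\nabla F(z^{(t)})\|_2^2$. The inequality is exactly a sufficient descent condition with constant $c = L/2$.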
It has been shown in [46] that if $F$ has the Kurdyka-Łojasiewicz (KL) property, the subsequence convergence can be upgraded to convergence of the full sequence, which has become a standard argument in recent convergence analyses. As shown in [47], functions satisfying the KL property include, but are not limited to, real analytic functions, semi-algebraic functions, and locally strongly convex functions. Therefore, $(x^{(t)}, v^{(t)})$ converges to a stationary point. It is possible that the stationary point is a saddle point rather than a local minimizer. However, it is known that first-order methods almost always avoid saddle points when the initial solution is randomly selected [48]. Therefore, converging to a saddle point is extremely unlikely.
It has been shown in [49] that the denoising autoencoder can be regarded as an approximately orthogonal projection of the noisy input onto the manifold of noiseless images. Therefore, based on the above analysis (in particular points 2 and 3), Algorithm 1 with the mapping function defined by the DCNN denoiser converges, in a loose sense, to a local minimizer.
IV Denoising Prior Driven Deep Neural Network
In general, Algorithm 1 requires many iterations to converge and is computationally expensive. Moreover, its parameters and the denoiser cannot be jointly optimized in an end-to-end training manner. To address these issues, we propose to unfold Algorithm 1 into a deep network with the architecture shown in Fig. 1(a). The network exactly executes a fixed number of iterations of Algorithm 1. The input degraded image $y$ first goes through a linear layer parameterized by the degradation matrix for an initial estimate $x^{(0)}$. $x^{(0)}$ is then fed into the linear layer parameterized by the matrix $\bar{A}$, whose output is added, via a shortcut connection, to the weighted input branch $\delta A^{\top}y$. The updated $x$ is fed into the denoiser module, whose structure is shown in Fig. 1(b). The denoised signal $v$ is fed into a linear weighting layer, whose output is further added with the $\bar{A}$ branch and the $\delta A^{\top}y$ branch via two shortcut connections, producing the updated $x$ according to Eq. (8). Such a process is repeated $T$ times, with $T$ fixed in our implementation. Instead of using fixed weights, all the weights involved in the recurrent stages can be discriminatively learned through end-to-end training. Regarding the denoising module, as we are using a DCNN-based denoiser that contains a large number of parameters, we enforce all the denoising modules to share the same parameters to avoid overfitting.
The linear layers $\bar{A}$ and $A^{\top}$ are also trainable for a typical degradation matrix $A$. For image denoising, $A = I$, and $\bar{A}$ reduces to a weighted identity matrix $(1 - \delta\eta - \delta)I$. For image deblurring, the layer $A^{\top}$ can simply be implemented with a convolutional layer, and the layer $\bar{A}$ can also be computed efficiently by convolution operations. The weight $\delta$ and the filters corresponding to $A^{\top}$ and $A$ can also be discriminatively learned. For image super-resolution, two types of degradation operators are considered: Gaussian downsampling and bicubic downsampling. For Gaussian downsampling, $A = DH$, where $H$ and $D$ denote the Gaussian blur matrix and the downsampling matrix, respectively. In this case, the layer $A^{\top}$ corresponds to first upsampling the input LR image by zero-padding and then convolving the upsampled image with a filter. The layer $\bar{A}$ can also be computed efficiently with convolution, downsampling and upsampling operations. All convolutional filters involved in these operations can be discriminatively learned. For bicubic downsampling, we simply use the bicubic interpolator function with the corresponding downscaling and upscaling factors to implement the matrix-vector multiplications with $A$ and $A^{\top}$, respectively.

IV-A The DCNN denoiser
Inspired by recent advances in semantic segmentation [39] and object segmentation [50], the architecture of the denoising network is illustrated in Fig. 1(b). Similar to the U-net [51] and the SharpMask net [50], the proposed network contains two parts: the feature encoding part and the feature decoding part. In the feature encoding part, a series of convolutional layers followed by pooling layers reduces the spatial resolution of the feature maps; the pooling layers help increase the receptive field of the neurons. All the convolutional layers of the encoding part are grouped into four feature extraction blocks, as shown by the blue blocks in Fig. 1(b). Each block contains four convolutional layers with ReLU nonlinearity. The first three layers keep the number of feature channels fixed, while the last layer doubles the number of channels and is followed by a pooling layer that reduces the spatial resolution of the feature maps. In the pooling layers, the feature maps are first convolved and then subsampled along both axes.

The feature decoding part also contains a series of convolutional layers, grouped into four blocks, each followed by an upsampling layer that increases the spatial resolution of the feature maps. As the finally extracted feature maps lose much spatial information, directly reconstructing the image from them cannot recover fine image details. To address this issue, the feature maps of the same spatial resolution generated in the encoding stage are fused with the upsampled feature maps generated in the decoding stage, yielding the newly upsampled feature maps. Each reconstruction block also consists of four convolutional layers with ReLU nonlinearity. In each reconstruction block, the first three layers produce feature maps with a fixed number of channels, and the fourth layer generates feature maps whose spatial resolution is upsampled by a deconvolution layer. The upsampled feature maps are then fused with the feature maps of the same spatial resolution from the encoding part; specifically, the fusion is conducted by concatenating the feature maps. The last feature decoding block reconstructs the output image. A skip connection from the input image to the reconstructed image is added to enforce the denoising network to predict the residual, which has been verified to be more robust [24].
IV-B Overall network training
Note that the DCNN denoisers do not have to be pre-trained. Instead, the overall deep network shown in Fig. 1(a) is trained end to end. To reduce the number of parameters and thus avoid overfitting, we enforce each DCNN denoiser to share the same parameters. A mean square error (MSE) based loss function is adopted to train the proposed deep network, which can be expressed as

$$\mathcal{L}(\Theta) = \sum_{i=1}^{N} \left\|x_i - \mathcal{F}(y_i; \Theta)\right\|_2^2, \qquad (11)$$

where $(y_i, x_i)$ denotes the $i$-th pair of degraded and original image patches, respectively, and $\mathcal{F}(y_i; \Theta)$ denotes the image patch reconstructed by the network with parameter set $\Theta$. It is also possible to train the network with perceptual loss functions, which may lead to better visual quality; we leave this as future work. The ADAM optimizer [52] is used to train the network. The learning rate is halved periodically during training. The proposed network is implemented and trained using Nvidia Titan X GPUs, taking about one day to converge.
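For reference, the ADAM update used for training can be sketched as follows. This is the standard form with common default hyperparameters; the specific settings used for our network are not shown here, so the values below are assumptions:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update (standard form; hyperparameters are common defaults)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy use: minimize an MSE-style loss ||theta - target||^2.
target = np.array([1.0, -2.0])
theta = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 5001):
    grad = 2.0 * (theta - target)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=1e-2)
```

The per-coordinate scaling by $\sqrt{\hat{v}}$ makes the step size roughly invariant to the gradient magnitude, which is convenient when jointly training the denoiser and the unfolding weights.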
V Experimental Results
In this section, we perform several IR tasks to verify the performance of the proposed network, including image denoising, deblurring, and super-resolution. We trained a separate model for each IR task. We empirically found that implementing a small, fixed number of iterations of Algorithm 1 in the network generally leads to satisfactory results for the image denoising, deblurring and super-resolution tasks, so the same number of iterations was used for all IR tasks. To train the networks, we constructed a large training set consisting of the images used in [6].
V-A Image denoising
For image denoising, $A = I$, and Algorithm 1 reduces to an iterative denoising process, i.e., the weighted noisy image is added back to the denoised image for the next denoising step. Such iterative denoising has shown improvements over conventional denoising methods that denoise only once [3]. Here, we also found that implementing multiple denoising iterations in the proposed network improves the denoising results. To train the network, we extracted image patches from the training images and added additive Gaussian noise to the extracted patches to generate the noisy patches; none of the test images was included in the training image set. The training patches were also augmented by flips and rotations. We compared the proposed network with several leading denoising methods, including three model-based denoising methods, i.e., the BM3D method [2], the EPLL method [16] and the low-rank based WNNM method [5], and two deep learning based methods, i.e., the TNRD method [35] and the DnCNN-S method [24].

Table I shows the PSNR results of the competing methods on a set of commonly used test images shown in Fig. 2. It can be seen that both DnCNN-S and the proposed network outperform the other methods. For most of the test images and noise levels, the proposed network outperforms DnCNN-S, the current state-of-the-art denoising method; on average, the PSNR gain over DnCNN-S is up to 0.33 dB. To further verify the effectiveness of the proposed method, we also employ the Berkeley segmentation dataset (BSD68), which contains 68 natural images. Table II shows the average PSNR and SSIM results of the test methods on BSD68. One can see that the PSNR gains over the other test methods become even larger for higher noise levels: the proposed method outperforms DnCNN-S by up to 0.78 dB on average on BSD68, demonstrating its effectiveness. Parts of the denoised images produced by the test methods are shown in Figs. 3-4. One can see that the image edges and textures recovered by the model-based methods, i.e., BM3D, WNNM and EPLL, are over-smoothed, while the deep learning based methods, TNRD, DnCNN-S and the proposed method, produce much more visually pleasant image structures. Moreover, the proposed method recovers even more details than TNRD and DnCNN-S.


Table I. PSNR (dB) results of the test denoising methods at noise levels σ = 15, 25, 50.

IMAGE  C.Man  House  Peppers  Starfish  Monar  Airpl  Parrot  Lena  Barbara  Boat  Man  Couple  Average

σ = 15
BM3D  31.92  34.94  32.70  31.15  31.86  31.08  31.38  34.27  33.11  32.14  31.93  32.11  32.38
WNNM  32.18  35.15  32.97  31.83  32.72  31.40  31.61  34.38  33.61  32.28  32.12  32.18  32.70
EPLL  31.82  34.14  32.58  31.08  32.03  31.16  31.40  33.87  31.34  31.91  31.97  31.90  32.10
TNRD  32.19  34.55  33.03  31.76  32.57  31.47  31.63  34.25  32.14  32.15  32.24  32.11  32.51
DnCNN-S  32.62  35.00  33.29  32.23  33.10  31.70  31.84  34.63  32.65  32.42  32.47  32.47  32.87
Ours  32.44  35.40  33.19  32.08  33.33  31.78  31.48  34.80  32.84  32.55  32.53  32.51  32.91

σ = 25
BM3D  29.45  32.86  30.16  28.56  29.25  28.43  28.93  32.08  30.72  29.91  29.62  29.72  29.98
WNNM  29.64  33.23  30.40  29.03  29.85  28.69  29.12  32.24  31.24  30.03  29.77  29.82  30.26
EPLL  29.24  32.04  30.07  28.43  29.30  28.56  28.91  31.62  28.55  29.69  29.63  29.48  29.63
TNRD  29.71  32.54  30.55  29.02  29.86  28.89  29.18  32.00  29.41  29.92  29.88  29.71  30.06
DnCNN-S  30.19  33.09  30.85  29.40  30.23  29.13  29.42  32.45  30.01  30.22  30.11  30.12  30.43
Ours  30.12  33.54  30.90  29.43  30.31  29.14  29.28  32.69  30.30  30.34  30.15  30.24  30.54

σ = 50
BM3D  26.13  29.69  26.68  25.04  25.82  25.10  25.90  29.05  27.23  26.78  26.81  26.46  26.73
WNNM  26.42  30.33  26.91  25.43  26.32  25.42  26.09  29.25  27.79  26.97  26.94  26.64  27.04
EPLL  26.02  28.76  26.63  25.04  25.78  25.24  25.84  28.43  24.82  26.65  26.72  26.24  26.35
TNRD  26.62  29.48  27.10  25.42  26.31  25.59  26.16  28.93  25.70  26.94  26.98  26.50  26.81
DnCNN-S  27.00  30.02  27.29  25.70  26.77  25.87  26.48  29.37  26.23  27.19  27.24  26.90  27.17
Ours  27.12  31.04  27.44  25.95  27.00  25.97  26.42  29.85  27.21  27.42  27.32  27.23  27.50


Table II. Average PSNR/SSIM results of the test denoising methods on BSD68.

Dataset  σ  BM3D (PSNR/SSIM)  EPLL (PSNR/SSIM)  TNRD (PSNR/SSIM)  DnCNN-S (PSNR/SSIM)  Ours (PSNR/SSIM)
BSD68  15  31.08/0.872  31.19/0.883  31.42/0.883  31.74/0.891  32.29/0.888
BSD68  25  28.57/0.802  28.68/0.812  28.91/0.816  29.23/0.828  29.88/0.827
BSD68  50  25.62/0.687  25.68/0.688  25.96/0.702  26.24/0.719  27.02/0.726

V-B Image deblurring
To train the proposed network for image deblurring, we first convolved the training images with a blur kernel to generate the blurred images and then extracted training image patches from the blurred images. Additive Gaussian noise was also added to the blurred images. Patch augmentation with flips and rotations was adopted to generate the training patches. Two types of blur kernels were considered, i.e., a Gaussian blur kernel of standard deviation 1.6 and the two motion blur kernels adopted in [53]. A separate model was trained for each blur setting. We compared the proposed method with several leading deblurring methods, i.e., three leading model-based deblurring methods (EPLL [16], IDD-BM3D [7] and NCSR [8]) and the current state-of-the-art denoising-based deblurring method with CNN denoisers [36] (denoted as DDCNN). The test images involved in this comparison study are shown in Fig. 5. In this experiment, we only conduct deconvolution on grayscale images; however, the proposed method can easily be extended to color image deblurring.

The PSNR results of the test deblurring methods are reported in Table III. For fair comparison, all the PSNRs of the other methods are generated by the codes released by the authors or taken directly from their papers. From Table III, we can see that DDCNN performs much better than the conventional model-based EPLL, IDD-BM3D and NCSR methods. For Gaussian blur, the proposed method outperforms DDCNN by 0.27 dB on average. For the motion blur kernels with higher noise levels, the proposed method is slightly worse than DDCNN. Parts of the deblurred images by the competing methods are shown in Figs. 6-8, from which one can see that the proposed method not only produces sharper edges but also recovers more details than the other methods.


Table III. PSNR (dB) results of the test deblurring methods.

Methods  Butterfly  Peppers  Parrot  Starfish  Barbara  Boats  C.Man  House  Leaves  Lena  Average

Gaussian blur with standard deviation 1.6, noise level 2
IDD-BM3D  29.79  29.64  31.90  30.57  25.99  31.17  27.68  33.56  30.13  30.91  30.13
EPLL  25.78  26.73  31.32  28.52  24.22  28.84  26.57  31.76  25.29  29.46  27.85
NCSR  29.72  30.04  32.07  30.83  26.54  31.22  27.99  33.38  30.13  30.99  30.29
DDCNN  30.44  30.69  31.83  30.78  26.15  31.41  28.05  33.80  30.44  31.05  30.48
Ours  30.67  30.18  32.40  32.00  26.47  31.54  28.24  34.25  30.23  31.48  30.75

Motion blur kernel 1 of [53], noise level 2.55
EPLL  26.23  27.40  33.78  29.79  29.78  30.15  30.24  31.73  25.84  31.37  29.63
DDCNN  32.23  32.00  34.48  32.26  32.38  33.05  31.50  34.89  33.29  33.54  32.96
Ours  32.58  32.05  34.98  32.71  32.39  33.39  31.70  35.34  32.99  33.80  33.19

Motion blur kernel 1 of [53], noise level 7.65
EPLL  24.27  26.15  30.01  26.81  26.95  27.72  27.37  29.89  23.81  28.69  27.17
DDCNN  28.51  28.88  31.07  27.86  28.18  29.13  28.11  32.03  28.42  29.52  29.17
Ours  28.24  28.42  31.03  28.00  28.01  29.19  27.77  32.06  27.98  29.42  29.01

Motion blur kernel 2 of [53], noise level 2.55
EPLL  26.48  27.37  33.88  29.56  28.29  29.61  29.66  32.97  25.69  30.67  29.42
DDCNN  31.97  31.89  34.46  32.18  32.00  33.06  31.29  34.82  32.96  33.35  32.80
Ours  31.86  31.38  34.72  32.28  31.36  32.86  31.21  35.09  32.29  33.35  32.64

Motion blur kernel 2 of [53], noise level 7.65
EPLL  23.85  26.04  29.99  26.78  25.47  27.46  26.58  30.49  23.42  28.20  26.83
DDCNN  28.21  28.71  30.68  27.67  27.37  28.95  27.70  31.95  27.92  29.27  28.84
Ours  27.47  28.02  30.46  27.82  26.86  28.84  27.48  31.91  27.28  29.23  28.54

V-C Image super-resolution
For image super-resolution, we consider two image subsampling operators, i.e., bicubic downsampling and Gaussian downsampling. In the former case, the HR images are downsampled by the bicubic interpolation function to simulate the LR images. In the latter case, the LR images are generated by applying a Gaussian blur kernel of standard deviation 1.6 to the original images, followed by subsampling. The LR/HR patch pairs are extracted from the LR/HR training image pairs and augmented by flips and rotations. We train a separate network for each of the two downsampling cases. The image datasets commonly used in the image super-resolution (SR) literature are adopted for performance verification, including Set5, Set14, the Berkeley segmentation dataset containing 100 images (denoted as BSD100), and the Urban100 dataset [34] containing 100 high-quality images. We compared the proposed method with several leading image SR methods, including two DCNN based SR methods (SRCNN [22] and VDSR [34]) and two denoising based methods (TNRD [35] and DnCNN [24]), which produce the HR images by first upsampling the LR images with the bicubic interpolator and then denoising the upsampled images to recover the high-frequency details. For fair comparison, the results of the other methods are directly borrowed from their papers or generated by the codes released by the authors.
The PSNR results of the test methods for bicubic downsampling are reported in Tables IV-V, from which we can see that the proposed method outperforms the other competing methods. We observe that the PSNR gains over the other methods become larger for larger scaling factors, verifying the importance of observation consistency for IR. The PSNR results for Gaussian downsampling with scaling factor 3 are reported in Table VI. For this case, we compare the proposed method with DDCNN [36], which obtains much better results than the earlier DnCNN [24]. Since VDSR and SRCNN are trained for bicubic downsampling, it would be unfair to apply them directly to LR images generated by Gaussian downsampling, and thus we do not include their results in this table. Parts of the reconstructed HR images by the test methods are shown in Figs. 9-11, from which we can see that the proposed method produces sharper edges than the other methods.
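For reference, the PSNR figures quoted throughout these tables can be computed as follows; a standard sketch assuming 8-bit images (peak value 255), not code from the paper.

```python
import numpy as np

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio (dB) between a ground-truth image
    and a restored image, the metric reported in Tables IV-VI."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
out = np.full((8, 8), 16.0)
print(round(psnr(ref, out), 2))  # 24.05
```

SSIM, reported alongside PSNR in Table V, additionally accounts for local luminance, contrast, and structure rather than raw pixel error.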


Images  Scaling factor  TNRD  SRCNN  VDSR  DnCNN  Ours

Baby  2  38.53  38.54  38.75  38.62  38.88
Bird  2  41.31  40.91  42.42  42.20  42.89
Butterfly  2  33.17  32.75  34.49  34.51  34.72
Head  2  35.75  35.72  35.93  35.84  36.00
Woman  2  35.50  35.37  36.05  35.99  36.24
Average  2  36.85  36.66  37.53  37.43  37.75

Baby  3  35.28  35.25  35.38  35.42  35.57
Bird  3  36.09  35.48  36.66  36.61  37.51
Butterfly  3  28.92  27.95  29.96  29.95  29.81
Head  3  33.75  33.71  33.96  33.92  34.03
Woman  3  31.79  31.37  32.36  32.31  32.71
Average  3  33.17  32.75  33.66  33.64  33.93

Baby  4  31.30  33.13  33.41  33.23  33.64
Bird  4  32.99  32.52  33.54  33.06  34.09
Butterfly  4  26.22  25.46  27.28  26.94  27.68
Head  4  32.51  32.44  32.70  32.36  32.88
Woman  4  29.20  28.89  29.81  29.46  30.34
Average  4  30.85  30.48  31.35  31.01  31.72




Dataset  Scaling factor  TNRD (PSNR / SSIM)  SRCNN (PSNR / SSIM)  VDSR (PSNR / SSIM)  DnCNN (PSNR / SSIM)  Ours (PSNR / SSIM)


Set14  2  32.54  0.907  32.42  0.906  33.03  0.912  33.03  0.911  33.30  0.915 
3  29.46  0.823  29.28  0.821  29.77  0.831  29.82  0.830  30.02  0.836  
4  27.68  0.756  27.49  0.750  28.01  0.767  27.83  0.755  28.28  0.773  


BSD100  2  31.40  0.888  31.36  0.888  31.90  0.896  31.84  0.894  32.04  0.898 
3  28.50  0.788  28.41  0.786  28.82  0.798  28.80  0.795  28.91  0.801  
4  27.00  0.714  26.90  0.710  27.29  0.725  27.08  0.709  27.39  0.729  


Urban100  2  29.70  0.899  29.50  0.895  30.76  0.914  30.63  0.911  31.50  0.922 
3  26.44  0.807  26.24  0.799  27.14  0.828  27.08  0.824  27.61  0.842  
4  24.62  0.729  24.52  0.722  25.18  0.752  24.94  0.735  25.53  0.768  



Dataset  NCSR  SRCNN  VDSR  DDCNN  Ours

Set5  32.07  -  -  33.88  34.22
Set14  29.30  -  -  29.63  29.88

VI Conclusion
In this paper, we have proposed a novel deep neural network for general image restoration (IR) tasks. Different from current deep-network-based IR methods, which generally ignore the observation models, we construct the deep network based on a denoising-based IR framework. To this end, we first developed an efficient algorithm for solving the denoising-based IR problem and then unfolded the algorithm into a deep network, which is composed of multiple denoising modules interleaved with back-projection modules that enforce data consistency. A DCNN-based denoiser exploiting the multi-scale redundancies of natural images was developed. The proposed deep network can therefore exploit not only the effective DCNN denoising prior but also the prior of the observation model. Experimental results show that the proposed method achieves very competitive and often state-of-the-art results on several IR tasks, including image denoising, deblurring, and super-resolution.
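The interleaving of denoising and back-projection modules summarized above can be sketched as follows. This is an illustrative NumPy skeleton, not the trained network: `denoiser` stands in for the learned DCNN denoising prior, and the step sizes `delta`, `eta` and stage count `T` are placeholder values, not the paper's trained parameters.

```python
import numpy as np

def restore(y, A, denoiser, T=6, delta=0.5, eta=0.5):
    """Unfolded denoising-based IR: each stage applies a denoising
    module, then a back-projection step that pulls the estimate back
    toward consistency with the observation y = A x + n."""
    x = A.T @ y  # simple back-projected initialization
    for _ in range(T):
        v = denoiser(x)                           # denoising module
        grad = A.T @ (A @ x - y) + eta * (x - v)  # data-consistency gradient
        x = x - delta * grad                      # back-projection step
    return x

# Sanity check with a trivial degradation (A = 0.5 I) and identity denoiser:
# the iteration reduces to gradient descent on the data-fidelity term.
A = 0.5 * np.eye(4)
y = np.ones(4)
x = restore(y, A, lambda v: v, T=50)
print(np.round(A @ x, 3))
```

Unrolling a fixed number of such iterations, with a learned denoiser shared or specialized per stage, yields the trainable network described above.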
Convergence
Theorem 2.
Consider the energy function
Assume that the energy function is lower bounded and coercive, i.e., it tends to infinity whenever the norm of its argument does. Then, for Algorithm 1, the generated sequence has a subsequence that converges to a stationary point of the energy function, provided that the denoiser satisfies the sufficient descent condition:
(12)
where the first quantity is a positive constant and the second is a continuous limiting subgradient of the energy function.
Proof.
Since is Lipschitz continuous with constant , it is well known that the gradient step on with step size satisfies the descent property
(13) 
where . By assumption, the step satisfies
(14) 
Since the energy function is coercive and, by (13) and (14), monotonically nonincreasing along the iterates, the sequence is bounded (otherwise coercivity would force the energy to diverge, contradicting monotonicity), so it has a convergent subsequence. Since the energy function is lower bounded, adding (13) and (14) yields
(15) 
and, by telescopic sum over and by monotonicity and boundedness of , we have the summability properties and , from which we conclude
(16)  
(17) 
Based on , we get , where we have used the continuity of in . Also, , where the first step follows from the continuity of in and (16). Therefore, is a stationary point of the energy function. ∎
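The mechanism of the proof, monotone energy descent forcing the gradient to vanish, can be checked numerically on a toy coercive quadratic energy. This sketch uses plain gradient descent with step size 1/L in place of the algorithm's x-update and omits the denoiser step; it illustrates the descent argument, not the paper's algorithm.

```python
import numpy as np

# Toy coercive, lower-bounded energy E(x) = 0.5 x^T Q x - b^T x with Q > 0.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
E = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

L = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient
x = np.array([5.0, -5.0])
energies = [E(x)]
for _ in range(200):
    x = x - (1.0 / L) * grad(x)  # gradient step with step size 1/L
    energies.append(E(x))

# The energy is monotonically nonincreasing and the gradient vanishes,
# so the iterates approach a stationary point (here the minimizer of E).
assert all(e2 <= e1 + 1e-12 for e1, e2 in zip(energies, energies[1:]))
print(np.linalg.norm(grad(x)))
```

As in the proof, boundedness plus monotone descent gives summability of the per-step decreases, which drives the gradient norm to zero along the sequence.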
References
 [1] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
 [2] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
 [3] W. Dong, X. Li, L. Zhang, and G. Shi, “Sparsity-based image denoising via dictionary learning and structural clustering,” in Proc. of the IEEE CVPR, 2011, pp. 457–464.

 [4] W. Dong, G. Shi, and X. Li, “Nonlocal image restoration with bilateral variance estimation: a low-rank approach,” IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 700–711, 2013.
 [5] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proc. of the IEEE CVPR, 2014, pp. 2862–2869.
 [6] W. Dong, L. Zhang, G. Shi, and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 1838–1857, 2011.
 [7] A. Danielyan, V. Katkovnik, and K. Egiazarian, “BM3D frames and variational image deblurring,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1715–1728, 2012.
 [8] W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally centralized sparse representation for image restoration,” IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1620–1630, 2013.

[9]
W. Dong, G. Shi, Y. Ma, and X. Li, “Image restoration via simultaneous sparse
coding: Where structured sparsity meets gaussian scale mixture,”
International Journal of Computer Vision
, pp. 1–16, 2015.  [10] A. Marquina and S. J. Osher, “Image superresolution by tvregularization and bregman iteration,” Journal of Scientific Computing, vol. 37, no. 3, pp. 367–382, 2008.
 [11] J. Yang, J. Wright, T. Huang, and Y. Ma, “Image superresolution as sparse representation of raw image patches,” in Proc. of the IEEE CVPR, 2008, pp. 1–8.
 [12] X. Gao, K. Zhang, D. Tao, and X. Li, “Image superresolution with sparse neighbor embedding,” Image Processing, IEEE Transactions on, vol. 21, no. 7, pp. 3194–3205, 2012.
 [13] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, “An iterative regularization method for total variationbased image restoration,” Multiscale Modeling and Simulation, vol. 4, no. 2, pp. 460–489, 2005.
 [14] J. M. Bioucas-Dias and M. A. Figueiredo, “A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2992–3004, 2007.
 [15] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Transactions on Image Processing, vol. 17, no. 1, pp. 53–69, 2008.
 [16] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Proc. of the IEEE ICCV, 2011, pp. 479–486.
 [17] S. Roth and M. J. Black, “Fields of experts,” International Journal of Computer Vision, vol. 82, no. 2, pp. 205–229, 2009.
 [18] G. Yu, G. Sapiro, and S. Mallat, “Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity,” IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2481–2499, 2012.
 [19] W. T. Freeman, T. R. Jones, and E. C. Pasztor, “Example-based super-resolution,” IEEE Computer Graphics and Applications, vol. 22, no. 2, pp. 56–65, 2002.
 [20] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Asian Conference on Computer Vision. Springer, 2014, pp. 111–126.
 [21] U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in Proc. of the IEEE CVPR, 2014, pp. 2774–2781.
 [22] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European Conference on Computer Vision. Springer, 2014, pp. 184–199.
 [23] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proc. of the IEEE ICCV, 2015, pp. 370–378.
 [24] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
 [25] A. Buades, B. Coll, and J.M. Morel, “A nonlocal algorithm for image denoising,” in Proc. of the IEEE CVPR, 2005, pp. 60–65.
 [26] S. Venkatakrishnan, C. Bouman, E. Chu, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Proc. of IEEE Global Conference on Signal and Information Processing, 2013, pp. 945–948.
 [27] A. Brifman, Y. Romano, and M. Elad, “Turning a denoiser into a super-resolver using plug and play priors,” in Proc. of the IEEE ICIP, 2016, pp. 1404–1408.
 [28] A. M. Teodoro, J. M. Bioucas-Dias, and M. A. T. Figueiredo, “Image restoration and reconstruction using variable splitting and class-adapted image priors,” in Proc. of IEEE ICIP, 2016, pp. 3518–3522.
 [29] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1804–1844, 2017.
 [30] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.

 [31] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
 [32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. of the IEEE CVPR, 2016, pp. 770–778.
 [33] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in European Conference on Computer Vision. Springer, 2016, pp. 391–407.

 [34] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
 [35] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2017.
 [36] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in Proc. of the IEEE CVPR, 2017, pp. 2808–2817.
 [37] S. A. Bigdeli and M. Zwicker, “Image restoration using autoencoding priors,” arXiv:1703.09964v1, 2017.
 [38] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
 [39] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. of IEEE CVPR, 2015, pp. 3431–3440.
 [40] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in Proc. of IEEE CVPR, 2012, pp. 2392–2399.
 [41] L. Xu, J. S. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” in Advances in Neural Information Processing Systems, 2014.
 [42] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. of ICML, 2010.
 [43] Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems, 2016.
 [44] B. Xin, Y. Wang, W. Gao, and D. Wipf, “Maximal sparsity with deep networks?” in Advances in Neural Information Processing Systems, 2016.
 [45] Y. Wang, W. Yin, and J. Zeng, “Global convergence of ADMM in nonconvex nonsmooth optimization,” Journal of Scientific Computing, 2018.
 [46] J. Bolte, A. Daniilidis, and A. Lewis, “The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems,” SIAM Journal on Optimization, vol. 17, no. 4, pp. 1205–1223, 2007.

 [47] Y. Xu and W. Yin, “A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion,” SIAM Journal on Imaging Sciences, vol. 6, no. 3, pp. 1758–1789, 2013.
 [48] J. Lee, I. Panageas, G. Piliouras, M. Simchowitz, M. Jordan, and B. Recht, “First-order methods almost always avoid saddle points,” arXiv:1710.07406.

 [49] G. Alain and Y. Bengio, “What regularized auto-encoders learn from the data-generating distribution,” Journal of Machine Learning Research, vol. 15, pp. 3743–3773, 2014.
 [50] P. O. Pinheiro, T.-Y. Lin, R. Collobert, and P. Dollar, “Learning to refine object segments,” in Proc. of ECCV, 2016.
 [51] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.
 [52] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. of ICLR, 2014.
 [53] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in Proc. of CVPR, 2009, pp. 1964–1971.