1 Introduction
Image deconvolution, also known as image deblurring, aims to recover a sharp image from an observed blurry image. The blurry image $\mathbf{y}$ is usually modeled as the convolution of a latent image $\mathbf{x}$ and a blur kernel $\mathbf{k}$:

$$\mathbf{y} = \mathbf{k} \ast \mathbf{x} + \mathbf{n}, \quad (1)$$

where $\ast$ denotes the convolution operator and $\mathbf{n}$ denotes an i.i.d. white Gaussian noise term with unknown standard deviation $\sigma$ (i.e. the noise level). Given a blurry image $\mathbf{y}$ and the corresponding blur kernel $\mathbf{k}$, the task of recovering the sharp image $\mathbf{x}$ is referred to as (non-blind) image deconvolution, which is often used as the final step in blind image deblurring [1, 2, 3].

Single image deconvolution is challenging and mathematically ill-posed due to the unknown noise and the loss of high-frequency information. Many conventional methods resort to different natural image priors based on manually designed empirical statistics (e.g. the sparse gradient prior [4, 5, 6]) or learned generative models (e.g. Gaussian mixture models (GMMs) [7]), which usually lead to non-convex problems and time-consuming optimization. Discriminative learning methods [8, 9, 5, 10, 11] have thus been investigated for both efficiency and restoration quality. To handle instance-specific blur kernels, previous methods [8] approach discriminative learning by parameterizing part of the conventional optimization process.
Due to their success in many computer vision applications, deep neural networks (DNNs) have been used more frequently for learning discriminative image restoration models [10, 11, 12, 13]. Since directly using end-to-end DNNs to perform deconvolution is non-trivial [10], many approaches resort to integrating neural networks into optimization methods [11, 13, 9, 14], in which the DNNs are usually only used to learn the operators corresponding to the priors/regularizers (e.g. proximal projectors [13, 12]). As a result, they need manual parameter tuning (reflecting the unknown noise level) for a specific blurred image at test time [14, 12, 13] or customized training for specific noise levels [11, 9], limiting their applications in practice. Specifically, the prior-related operators (e.g. convolutional neural network (CNN) based projectors [13, 12]) are usually learned independently of the form of the degenerations and are then used to process the intermediate results produced by the model-based optimization. Although independent training offers a flexible formulation for handling different degenerations, the learned priors may not be able to handle arbitrary intermediate images that never appear in training. Thus they fail to produce satisfactory results in some challenging cases with severe degenerations [12].

We address the above issues by proposing the Recurrent Gradient Descent Network (RGDN), a recurrent DNN architecture derived from gradient descent optimization methods. The RGDN iteratively updates the unknown variable $\mathbf{x}$ by mimicking the gradient descent optimization process. To achieve this, we parametrize and learn a universal gradient descent optimizer, which can be repeatedly used to update $\mathbf{x}$ based on its previous updates. Unlike previous methods [13, 11, 12, 14] that focus only on image prior learning, we parametrize and learn all main operations of a general gradient descent algorithm, including the gradient of a free-form image prior, based on CNNs. The RGDN implicitly learns an image prior and tunes adaptive parameters through the CNN-based gradient descent optimizer. As a result, the RGDN is free of tunable parameters in deconvolution (e.g. it has no need for the noise level) and can be trained on diverse degenerated images, leading to high effectiveness and robustness. Moreover, it is worth emphasizing that the RGDN is composed of a universal optimizer that shares parameters across all steps. It is significantly different from previous methods [11, 14] that fix the number of steps and train different parameters for each step. The structure of the RGDN ensures that a universal optimizer is trained to handle different states during the optimization, which tallies with the nature of an iterative optimizer.
To summarize, the main contributions of this paper are:


We learn an optimizer for image deconvolution by fully parameterizing the general gradient descent optimizer, instead of learning only image priors [7, 15] or prior-related operators [13, 14, 12]. The integration of trainable DNNs and the fully parameterized optimization algorithm yields a parameter-free, effective and robust deconvolution method, making a substantial step towards practical deconvolution for real-world images.

We propose a new discriminative learning model, i.e. the RGDN, to learn an optimizer for image deconvolution. The RGDN systematically incorporates a series of CNNs into the general gradient descent scheme. Benefiting from parameter sharing and recursive supervision, the RGDN tends to learn a universal updating unit (i.e. an optimizer), which can be applied iteratively an arbitrary number of times to boost performance (as in classic optimization algorithms), making it a very flexible and practical method.

Training one RGDN model is able to handle various types of blur and noise. Extensive experiments on both synthetic data and real images show that the parameter-free RGDN learned from a synthetic dataset can produce results that are competitive with, or even better than, those of other state-of-the-art methods that require a given/known noise level.
2 Related Work
Non-blind image deconvolution has been extensively studied in computer vision, signal processing and other related fields. We only discuss the most relevant works here. Existing non-blind deconvolution methods can be mainly categorized into two groups: manually designed conventional methods and learning-based methods.
Manually designed non-blind deconvolution Many manually designed approaches use empirical statistics of natural image gradients as the regularization or prior term [6, 5, 4], such as the total variation (TV) regularizer [6], a sparsity prior on second-order image gradients [4] and an approximate hyper-Laplacian distribution [5]. Meanwhile, various optimization methods have been studied for solving the image deconvolution problem, e.g. the alternating direction method of multipliers (ADMM) [16]. These conventional methods are often sensitive to the parameter settings and may be computationally expensive.
Learning-based non-blind deconvolution Rather than using manually designed regularizers, some methods learn generative models from data as image priors [7, 15, 17]. Zoran and Weiss [7] propose a GMM-based image prior and a corresponding algorithm (EPLL) for deconvolution, which is further extended in [17]. EPLL is effective but computationally very expensive. Schmidt et al. [15] train a Markov random field (MRF) based natural image prior for image restoration. Similar to the manually designed priors, the learned priors also require well-tuned parameters for specific noise levels.
To improve efficiency, some approaches address deconvolution by directly learning a discriminative function [18, 8, 9, 14, 13]. Schuler et al. [9] impose a regularized inversion of the blur in the Fourier domain and then remove the noise using a learned multi-layer perceptron (MLP). Schmidt and Roth [8] propose a cascade of shrinkage fields (CSF), an efficient discriminative learning procedure based on a random field structure. Schmidt et al. [18] propose an approach based on Gaussian conditional random fields, in which the parameters are calculated through regression trees.

Deep neural networks have been studied as a more flexible and efficient approach for deconvolution. Xu et al. [10] train a CNN to restore images with outliers in an end-to-end fashion, which requires fine-tuning for every blur kernel. As shown by the plug-and-play framework [19, 20], variable splitting techniques [21, 16] can be used to decouple the restoration problem into a data fidelity term and a regularization term corresponding to a projector in the optimization. To handle instance-specific blur kernels more easily, a series of methods [11, 13, 12] learn a denoiser and integrate it into the optimization as the projector reflecting the regularization. In [11], a fully convolutional network (FCN) is trained to remove noise in image gradients to guide the image deconvolution, and it has to be custom-trained for a specific noise level. Zhang et al. [13] learn a set of CNN denoisers (for different noise levels) and plug them into a half-quadratic splitting (HQS) scheme for image restoration. Chang et al. [12] learn a proximal operator with adversarial training as the image prior. Relying on HQS, Kruse et al. [14] learn a CNN-based prior term accompanied by an FFT-based deconvolution scheme. These methods only focus on learning the prior/regularization term, and the noise level is required to be known in the testing phase. In a recent work, Jin et al. [22] propose a Bayesian framework for noise-adaptive deconvolution.

Other related works Early works [23, 24, 25] have explored the general idea of learning to optimize, speeding up and boosting the learning process by accumulating learning experiences on specific tasks, such as few-shot learning [25]. In the most related work [24], a coordinate-wise LSTM is trained to train neural networks for image classification. Deep neural networks with recurrent structures have also been studied in many other low-level image processing tasks, such as blind image deblurring [26, 27], image super-resolution [28], and image filtering [29].

3 Recurrent Gradient Descent Network
In this section, we first briefly review the classical model-based non-blind deconvolution problem and the general gradient descent algorithm. We then propose the RGDN model with a fully parameterized gradient descent scheme. Finally, we discuss how to perform training and deconvolution with the RGDN.

We consider the common blur model in (1), which can also be rewritten as $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, where $\mathbf{H}$ denotes the convolution matrix of $\mathbf{k}$.
3.1 Revisiting Gradient Descent for Non-blind Deconvolution
Based on the blur model (1) and the common Gaussian noise assumption, given a blurry image $\mathbf{y}$ and the blur kernel $\mathbf{k}$, the desired solution of the non-blind deconvolution should minimize a data fidelity term $f(\mathbf{x}) = \frac{\gamma}{2}\|\mathbf{y} - \mathbf{k} \ast \mathbf{x}\|_2^2$, where the weighting term $\gamma$ reflects the noise level in $\mathbf{y}$. Considering the ill-posed nature of the problem, given a regularizer $\Omega(\cdot)$, the non-blind deconvolution can be achieved by solving the minimization problem

$$\min_{\mathbf{x}} \; \frac{\gamma}{2}\|\mathbf{y} - \mathbf{k} \ast \mathbf{x}\|_2^2 + \lambda\Omega(\mathbf{x}), \quad (2)$$

where the regularizer $\Omega(\cdot)$ corresponds to the image prior, and the weighting term $\lambda$ controls the strength of the regularization. Generally, $\Omega(\cdot)$ can be in any form, such as the classical TV regularizer [6] or an arbitrary learning-based free-form regularizer.
Although optimization algorithms with high-level abstractions (e.g. the proximal algorithm [30]) are often used for problem (2) [5, 7, 31], to show the potential of the proposed idea, we start from the gradient descent method sitting at a basic level. Let $t$ denote the step index. Vanilla gradient descent solves for $\hat{\mathbf{x}}$ (i.e. an estimate of $\mathbf{x}$) via a sequence of updates:

$$\mathbf{x}^{t+1} = \mathbf{x}^t - \alpha^t \mathbf{d}^t, \quad \mathbf{d}^t = \nabla f(\mathbf{x}^t) + \lambda\nabla\Omega(\mathbf{x}^t), \quad (3)$$

where $\mathbf{d}^t$ denotes the descent direction, $\alpha^t$ denotes the step length, and $\nabla f(\mathbf{x}^t)$ and $\nabla\Omega(\mathbf{x}^t)$ denote the gradients of $f(\cdot)$ and $\Omega(\cdot)$ at step $t$. In classic gradient methods, the step length $\alpha^t$ is usually determined by an exact or approximate line search procedure [32]. Specifically, for the deconvolution problem (2), $\nabla f(\mathbf{x}^t) = \gamma\mathbf{H}^\top(\mathbf{H}\mathbf{x}^t - \mathbf{y})$. Note that $\nabla\Omega(\mathbf{x}^t)$ may also be a subgradient for some regularizers.
To accelerate the optimization, we can scale the descent direction $\mathbf{d}^t$ via a scaling matrix $\mathbf{D}^t$ using curvature information, which can be determined in different ways. For example, $\mathbf{D}^t$ is the inverse Hessian matrix (or an approximation of it) when the second-order information of the objective is used [32]. We thus arrive at a general updating equation at step $t$:

$$\mathbf{x}^{t+1} = \mathbf{x}^t - \alpha^t \mathbf{D}^t\left(\nabla f(\mathbf{x}^t) + \lambda\nabla\Omega(\mathbf{x}^t)\right). \quad (4)$$

Given an initialization $\mathbf{x}^0$, a general gradient descent method solves problem (2) by repeating the update in (4) until certain stopping conditions are met. The compact formulation in (4) offers an advantage for learning a universal parametrized optimizer.
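As a concrete (non-learned) instance of the update in (4), the sketch below runs plain gradient descent on problem (2) with $\mathbf{D}^t = \mathbf{I}$ and a smoothed TV regularizer standing in for $\Omega$. The circular boundary handling and all parameter values (`gamma`, `lam`, `alpha`, the kernel, the step count) are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def conv_fft(img, psf):
    """Circular 2-D convolution H x via FFT (a simple boundary assumption)."""
    return np.real(np.fft.ifft2(np.fft.fft2(psf, img.shape) * np.fft.fft2(img)))

def convT_fft(img, psf):
    """Adjoint H^T of the circular convolution."""
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(psf, img.shape)) * np.fft.fft2(img)))

def tv_grad(x, eps=1e-6):
    """Gradient of a smoothed TV regularizer, a classic choice for grad Omega."""
    gx = np.gradient(x, axis=0)
    gy = np.gradient(x, axis=1)
    mag = np.sqrt(gx**2 + gy**2 + eps)
    return -(np.gradient(gx / mag, axis=0) + np.gradient(gy / mag, axis=1))

def update(x, y, psf, gamma=1.0, lam=1e-3, alpha=0.2):
    """One update of Eq. (4) with D^t = I: x - alpha*(gamma*H^T(Hx - y) + lam*grad_TV)."""
    grad_f = gamma * convT_fft(conv_fft(x, psf) - y, psf)
    return x - alpha * (grad_f + lam * tv_grad(x))

rng = np.random.default_rng(0)
x_true = rng.random((32, 32))
psf = np.zeros((32, 32)); psf[:3, :3] = 1.0 / 9.0   # 3x3 box blur
y = conv_fft(x_true, psf) + 0.01 * rng.standard_normal((32, 32))

x = y.copy()                 # initialize x^0 = y
for _ in range(50):
    x = update(x, y, psf)
```

Repeating the update drives down the data fitting error, mirroring the iterative scheme the RGDN later replaces with learned operators.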
3.2 Parameterization of the Gradient Descent
Our final goal is to learn a mapping function $\mathcal{F}(\cdot)$ that takes a blurry image $\mathbf{y}$ and the blur kernel $\mathbf{k}$ as input and recovers the target clear image as $\hat{\mathbf{x}} = \mathcal{F}(\mathbf{y}, \mathbf{k})$. We achieve this by learning a fully parameterized optimizer.

Given $\mathbf{x}^t$ from the previous step, the gradient descent in (4) calculates $\mathbf{x}^{t+1}$ relying on several main operations: gradient (or derivative) calculation for the data fidelity term $f(\cdot)$ and the regularizer $\Omega(\cdot)$, calculation of the scaling matrix $\mathbf{D}^t$, and step length determination. To enable flexible learning, we fully parameterize the gradient descent optimizer in (4) by replacing its main computation entities with a series of parameterized mapping functions. Firstly, we let $\mathcal{R}(\cdot)$ supplant the gradient of the regularizer $\nabla\Omega(\cdot)$; it implicitly plays the role of an image prior. Considering that the noise level in $\mathbf{y}$ is unknown and hard to estimate a priori, a predefined $f(\cdot)$ is insufficient in practice. We therefore define an operator $\mathcal{H}(\cdot)$ to handle the unknown noise and the varying estimation error (in $\mathbf{x}^t$) by adjusting the gradient of the data fidelity term; $\mathcal{H}(\cdot)$ implicitly tunes $\gamma$ adaptively. Finally, we define $\mathcal{D}(\cdot)$ as a functional operator that replaces $\alpha^t\mathbf{D}^t$ in each step to control the descent direction. $\mathcal{R}(\cdot)$ and $\mathcal{D}(\cdot)$ absorb the trade-off weight $\lambda$ and the step length $\alpha^t$, respectively. As shown in Fig. 1 (b), by replacing the calculation entities in (4) with the mapping functions introduced above, the gradient descent optimizer at each step can be formulated as:

$$\mathbf{x}^{t+1} = \mathcal{U}(\mathbf{x}^t) = \mathbf{x}^t - \mathcal{G}(\mathbf{x}^t), \quad \mathcal{G}(\mathbf{x}^t) = \mathcal{D}\big(\mathcal{H}(\mathbf{H}^\top(\mathbf{H}\mathbf{x}^t - \mathbf{y})) + \mathcal{R}(\mathbf{x}^t)\big), \quad (5)$$

where $\mathcal{U}(\cdot)$ denotes the parametrized gradient descent optimizer, and $\mathcal{G}(\cdot)$ denotes the gradient generator consisting of $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{D}(\cdot)$. Given an initial $\mathbf{x}^0$ (e.g. letting $\mathbf{x}^0 = \mathbf{y}$), we can formulate the whole estimation model as

$$\hat{\mathbf{x}} = \mathcal{F}(\mathbf{y}, \mathbf{k}) = \underbrace{\mathcal{U} \circ \mathcal{U} \circ \cdots \circ \mathcal{U}}_{S}(\mathbf{x}^0) = \mathcal{U}^{S}(\mathbf{x}^0), \quad (6)$$

where $\circ$ denotes the composition operator, $\mathcal{U}^{S}$ denotes the $S$-fold composition of $\mathcal{U}(\cdot)$, and $\Theta$ denotes the set of all parameters of $\mathcal{F}$ (i.e. the parameters of $\mathcal{H}$, $\mathcal{R}$ and $\mathcal{D}$). $\mathcal{U}^{S}$ means the optimizer $\mathcal{U}$ is performed $S$ times.
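To make the composition in (5) and (6) concrete, the sketch below wires hand-crafted stand-ins for the learned operators $\mathcal{H}$, $\mathcal{R}$ and $\mathcal{D}$ into one shared update unit and composes it $S$ times. In the actual RGDN each operator is a trained CNN, so every function body and constant here is a hypothetical placeholder, not the learned behavior.

```python
import numpy as np

# Hypothetical stand-ins for the three learned operators of Eq. (5).
def H_op(g):
    """Adjusts the fidelity gradient (learned noise handling); identity here."""
    return g

def R_op(x):
    """Implicit prior gradient; here a mild high-pass residual as a placeholder."""
    return 0.05 * (x - 0.25 * (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                               + np.roll(x, 1, 1) + np.roll(x, -1, 1)))

def D_op(d):
    """Direction scaling, absorbing the step length; a fixed scalar here."""
    return 0.2 * d

def gdu(x, y, psf):
    """One gradient descent unit, Eq. (5): x - D(H(H^T(Hx - y)) + R(x))."""
    K = np.fft.fft2(psf, x.shape)
    Hx = np.real(np.fft.ifft2(K * np.fft.fft2(x)))
    grad_f = np.real(np.fft.ifft2(np.conj(K) * np.fft.fft2(Hx - y)))
    return x - D_op(H_op(grad_f) + R_op(x))

def rgdn(y, psf, steps=5):
    """S-fold composition of the shared unit, Eq. (6), with x^0 = y."""
    x = y.copy()
    for _ in range(steps):
        x = gdu(x, y, psf)
    return x

rng = np.random.default_rng(1)
x_true = rng.random((16, 16))
psf = np.zeros((16, 16)); psf[:3, :3] = 1.0 / 9.0
y = np.real(np.fft.ifft2(np.fft.fft2(psf, (16, 16)) * np.fft.fft2(x_true)))
x_hat = rgdn(y, psf, steps=5)
```

The key structural point is that `gdu` is one shared function applied at every step, matching the parameter sharing of the RGDN.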
3.3 The Structure of the RGDN
We propose to formulate the model in equation (6) as a Recurrent Gradient Descent Network (RGDN). Considering that the updates of $\mathbf{x}^t$ in an iterative optimization scheme naturally compose a sequence of arbitrary length, we use a universal gradient descent unit (GDU) to implement $\mathcal{U}(\cdot)$ and apply it in all steps in a recurrent manner (see Fig. 1 (a)).

In each single GDU, the gradient generator $\mathcal{G}(\cdot)$ takes a current prediction $\mathbf{x}^t$ and generates a gradient of the same size. In $\mathcal{G}(\cdot)$, the subcomponents $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{D}(\cdot)$ also act as mapping functions whose input and output share the same size. Considering that CNNs with an encoder-decoder architecture have been commonly used to model similar mapping functions, we implement $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{D}(\cdot)$ using three CNNs with the same structure, shown in Fig. 1 (c). Since finding the best structure for each subnetwork is not the main focus, we use the same structure as a default plain choice. Nevertheless, the three CNNs are trained with different parameters, resulting in different functions. We then construct the GDU by assembling the three CNNs according to the model in (5) (see Fig. 1 (b)).
As shown in Fig. 1, each trainable CNN consists of 3 convolution (Conv) layers and 3 transposed convolution (Deconv) layers. Except for the first and the last layers, each Conv or Deconv layer is followed by a batch normalization (BN) layer [33] and a ReLU activation function. Following a widely used setting [13], the first Conv layer is only followed by a ReLU activation function. Apart from the last Deconv layer, we use 64 convolution features for each Conv and Deconv layer. The last Deconv layer maps the 64-channel intermediate features to a $c$-channel output, where $c$ denotes the number of channels of the image. We set the stride size to 1 for all Conv and Deconv layers. Our contributions are agnostic to the specific implementation choice for the structure of each subnetwork corresponding to $\mathcal{H}$, $\mathcal{R}$ and $\mathcal{D}$, respectively, which may be further tuned for better performance.

Some previous methods [8, 14] truncate the classic iterative optimization algorithm to a fixed number of steps and rigidly train different parameters to process the images from previous steps. However, in principle, a fixed step number may not be proper for all images. Unlike them, towards learning a universal optimizer, the proposed RGDN shares parameters among the GDUs in all steps, which enables the optimizer (i.e. the shared GDU) to see and handle different states during the iterations. Training the RGDN thus gives us the flexibility to repeat the learned optimizer an arbitrary number of times to approach the desired deconvolution results for different cases. We can stop the process relying on stopping conditions, as in classic iterative optimization algorithms.
3.4 Learning an Optimizer via Training an RGDN
Training loss We expect to determine the best model parameters $\Theta$ that accurately estimate $\mathbf{x}$ through training on a given dataset $\{(\mathbf{y}_i, \mathbf{k}_i, \mathbf{x}_i)\}_{i=1}^{N}$. We minimize the mean squared error (MSE) between the ground truth $\mathbf{x}_i$ and the estimate $\hat{\mathbf{x}}_i$ over the training dataset:

$$\mathcal{L}_{\text{mse}} = \sum_{i} \|\hat{\mathbf{x}}_i - \mathbf{x}_i\|_2^2. \quad (7)$$

Inspired by [11, 34], we also consider minimizing the gradient discrepancy in training:

$$\mathcal{L}_{\text{grad}} = \sum_{i} \left(\|\nabla_h\hat{\mathbf{x}}_i - \nabla_h\mathbf{x}_i\|_2^2 + \|\nabla_v\hat{\mathbf{x}}_i - \nabla_v\mathbf{x}_i\|_2^2\right), \quad (8)$$

where $\nabla_h$ and $\nabla_v$ denote the operators calculating the image gradients in the horizontal and vertical directions, respectively. The loss function in (8) is expected to help produce sharp images [34]. For all experiments, the models are trained by minimizing the sum of $\mathcal{L}_{\text{mse}}$ and $\mathcal{L}_{\text{grad}}$.

Recursive supervision and training objective Instead of solely minimizing the difference between the ground truth and the output of the final step, we impose recursive supervision [28], which supervises not only the final estimate but also the outputs of the intermediate steps (i.e. the outputs of $\mathcal{U}^t$, for $t = 1, \dots, S$). The recursive supervision directly forces the output of each step to approach the ground truth, which accelerates the training and enhances the performance (see Section 4.3). Let $\hat{\mathbf{x}}_i^t$ denote the estimate of $\mathbf{x}_i$ from the $t$-th step. By averaging over all training samples and steps, we obtain the whole training objective

$$\mathcal{L}(\Theta) = \frac{1}{NS}\sum_{i=1}^{N}\sum_{t=1}^{S} w^t \left(\|\hat{\mathbf{x}}_i^t - \mathbf{x}_i\|_2^2 + \|\nabla_h\hat{\mathbf{x}}_i^t - \nabla_h\mathbf{x}_i\|_2^2 + \|\nabla_v\hat{\mathbf{x}}_i^t - \nabla_v\mathbf{x}_i\|_2^2\right),$$

where $w^t$ denotes the importance weight. Note that we set $w^t = 1$ in the experiments. As shown in Fig. 2, the learned optimizer steadily pushes the results close to the ground truth, which is consistent with the recursive supervision.
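The recursively supervised objective above can be sketched as follows for a single training image. The finite-difference gradient operators and the unit weights are simple choices consistent with the description; the exact implementation may differ.

```python
import numpy as np

def grad_hv(img):
    """Finite-difference horizontal/vertical image gradients (one simple choice)."""
    gh = np.diff(img, axis=1)   # horizontal direction
    gv = np.diff(img, axis=0)   # vertical direction
    return gh, gv

def step_loss(x_hat, x):
    """MSE loss (7) plus gradient-discrepancy loss (8) for one estimate."""
    gh_e, gv_e = grad_hv(x_hat)
    gh_t, gv_t = grad_hv(x)
    return (np.sum((x_hat - x) ** 2)
            + np.sum((gh_e - gh_t) ** 2)
            + np.sum((gv_e - gv_t) ** 2))

def recursive_loss(estimates, x, weights=None):
    """Average of weighted per-step losses over all S intermediate estimates.
    Weights default to 1, as stated in the paper."""
    S = len(estimates)
    if weights is None:
        weights = [1.0] * S
    return sum(w * step_loss(xh, x) for w, xh in zip(weights, estimates)) / S
```

Supervising every intermediate estimate, rather than only the last one, is what lets a single shared unit be stopped after any number of steps at test time.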
Implementation details Although the number of steps the RGDN takes is not bounded in principle, considering the training efficiency, we run the optimizer for 5 steps in training (i.e. $S = 5$). As shown in the experiments, benefiting from parameter sharing and recursive supervision, training with a fixed step number does not interfere with the generalization of the learned optimizer.

For training, we randomly initialize the parameters of the RGDN. The training is carried out using a mini-batch Adam [35] optimizer with a batch size of 4.
3.5 Deconvolution using RGDN
Although we train the RGDN using a fixed number of steps, benefiting from the parameter sharing and recursive supervision, we obtain a universal optimizer after training on diverse samples. We can thus perform non-blind deconvolution by applying the optimizer an arbitrary number of times and stop the process relying on certain stopping conditions, as a classic optimizer does. Fig. 2 shows that the quality of the intermediate images steadily increases with the number of steps: with more iterations, more details are recovered and the artifacts are suppressed by the learned optimizer. The learned optimizer is able to handle the varying visual appearance among the input images and the intermediate results, and to consistently improve the estimates. More numerical studies are in Section 4.4. The optimization can be stopped when $\|\mathbf{x}^{t+1} - \mathbf{x}^t\|_2 / \|\mathbf{x}^t\|_2 \leq \epsilon$, where $\epsilon$ is a small tolerance parameter. In practice, a maximum iteration number is also used as a stopping criterion.
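The stopping rule can be sketched as a generic driver loop around any update function (learned or classic); the tolerance and iteration cap below are illustrative values.

```python
import numpy as np

def run_until_converged(step_fn, x0, tol=1e-3, max_iter=30):
    """Iterate an update until the relative change
    ||x^{t+1} - x^t|| / ||x^t|| drops below `tol`, or `max_iter` is reached.
    Returns the final estimate and the number of steps taken."""
    x = x0
    for t in range(max_iter):
        x_next = step_fn(x)
        rel_change = np.linalg.norm(x_next - x) / max(np.linalg.norm(x), 1e-12)
        if rel_change <= tol:
            return x_next, t + 1
        x = x_next
    return x, max_iter

# Example: a contractive update converging to the all-ones vector.
x_final, n_steps = run_until_converged(lambda v: 0.5 * v + 0.5, np.zeros(8))
```

Different inputs trigger the tolerance at different step counts, which is exactly the per-image flexibility the shared-unit design provides over fixed-step architectures.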
4 Experiments
We conduct experiments with the proposed method for single-image non-blind deconvolution. Our implementation is based on PyTorch [36] and uses an NVIDIA TITAN Xp graphics card for acceleration. Our learned optimizer takes about 0.03 seconds for one step on a small image, and roughly 0.2 seconds for an image of less than one megapixel.

4.1 Datasets and Experimental Settings
Training To generate the triplet set $\{(\mathbf{y}_i, \mathbf{k}_i, \mathbf{x}_i)\}$ for training, we crop 40,960 RGB image patches from the PASCAL VOC dataset [37] as the ground truth images. We then independently generate blur kernels according to [38] for each image and generate the blurred images based on model (1), which gives 204,800 triplets in total. After adding a Gaussian noise term, 8-bit quantization is applied following [14]. Instead of training a customized model for a specific blur kernel [9] or noise level [8, 18, 11], we uniformly sample kernel sizes from a set of candidate sizes and noise levels from an interval (an image with a given ratio of Gaussian noise is generated by adding zero-mean Gaussian noise whose standard deviation equals that ratio of the image intensity range), which helps to evaluate the ability of the network to handle diverse data.
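A minimal sketch of the data synthesis pipeline (blur, additive Gaussian noise, 8-bit quantization), assuming circular convolution and a unit intensity range; the kernel and noise ratio used here are illustrative, not the paper's sampled values.

```python
import numpy as np

def synthesize_triplet(x, psf, noise_ratio, rng):
    """Make one (y, k, x) training triplet from a clean RGB image in [0, 1]:
    blur each channel, add Gaussian noise whose std is `noise_ratio` of the
    unit intensity range, then apply 8-bit quantization."""
    K = np.fft.fft2(psf, x.shape[:2])
    y = np.stack([np.real(np.fft.ifft2(K * np.fft.fft2(x[..., c])))
                  for c in range(x.shape[-1])], axis=-1)
    y += rng.normal(0.0, noise_ratio, size=y.shape)      # additive white Gaussian noise
    y = np.round(np.clip(y, 0.0, 1.0) * 255.0) / 255.0   # 8-bit quantization
    return y, psf, x

rng = np.random.default_rng(0)
x_clean = rng.random((32, 32, 3))
psf = np.zeros((32, 32)); psf[:5, :5] = 1.0 / 25.0       # 5x5 box blur, illustrative
y, k, x_gt = synthesize_triplet(x_clean, psf, noise_ratio=0.01, rng=rng)
```

Sampling a fresh kernel and noise ratio per image, as the paper describes, is what forces a single model to cope with diverse degradations.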
Testing The testing is performed on several benchmark datasets [39, 40, 41] that are independent of the training data. Considering that RGB images are predominant in practice, we trained our model on RGB images with 3 channels. To test on the benchmark dataset [39] of gray images, we replicate the single existing channel twice. Different noise levels are used to measure the robustness of the methods. In the experiments, we apply the stopping conditions introduced in Section 3.5, with a fixed maximum iteration number if not indicated otherwise.

In the following, we first conduct a full numerical comparison with other state-of-the-art methods, i.e. FD [5], the method of Levin et al. [4], EPLL [7], MLP [9], CSF [8], IRCNN [13] and FDN [14]. We then conduct a series of empirical analyses and ablation studies for the proposed method. Finally, qualitative comparisons between the methods are conducted on real-world images. It is worth noting that, in deconvolution, apart from the RGDN, which is free of parameters, the parameters of all other methods are set using the ground truth noise level. We use the pairwise version of CSF [8] trained for deconvolution in the comparison. A comparison with the CNN-based baseline method [10] is absent since it needs fine-tuning for every blur kernel, making it impractical. We measure the performance in terms of PSNR and SSIM [42]. Following [14], the regions close to the image boundary are discarded when calculating the measurements.
4.2 Numerical Evaluations on Synthetic Datasets
We first conduct experiments on three datasets with various types of images, blur kernels and noise levels. The testing images are independent of the training data.


(Figure: visual comparison of Ground Truth, Levin et al. [4], CSF [8], IRCNN [13], FDN [14] and RGDN (ours).)
Evaluation on grayscale image benchmark We first evaluate the performance of the methods on the widely used benchmark dataset of Levin et al. [39], which contains 32 blurry gray images generated from 4 clear images and 8 blur kernels. To deal with the gray images, we generate 3-channel images via replication. Images with different noise levels are also generated by adding noise to all channels. Note that the noise level of the original blurry images is about 0.59%, as discussed in [14]. The comparison on the three noise levels is shown in Table 1. IRCNN [13], FDN [14] and RGDN outperform the other methods thanks to the deep neural networks that provide more powerful natural image priors. Although RGDN is trained as a noise-level-versatile model, its performance is better than or competitive with the best of the other methods. The performance of EPLL [7] is close to the best on this benchmark; however, it is dozens of times slower than the proposed method.
Table 1: Results on the dataset of Levin et al. [39].

Noise  Mea.   FD     Levin  EPLL   MLP    CSF    IRCNN  FDN    RGDN
0.59%  PSNR   32.36  33.60  34.35  31.55  32.08  33.35  36.15  35.04
       SSIM   0.917  0.934  0.941  0.876  0.916  0.884  0.965  0.954
1%     PSNR   30.85  32.01  32.45  30.68  28.12  33.14  33.62  33.68
       SSIM   0.892  0.913  0.930  0.882  0.828  0.896  0.949  0.954
2%     PSNR   28.84  29.92  30.03  28.16  21.68  30.09  29.70  31.01
       SSIM   0.851  0.877  0.883  0.841  0.594  0.887  0.896  0.899
Evaluation on large RGB images To study the performance of the proposed method on large images, we evaluate the methods on the dataset of [40]. We generate an RGB version of the benchmark [40] using the original 80 RGB images [43] and the same 8 blur kernels from Levin et al.'s dataset [39]. Three different noise levels are adopted. The average PSNR and SSIM values are shown in Table 2. The performance of RGDN is on par with or better than that of the other methods. Perhaps because FDN [14] takes the ground truth noise level as input, it achieves marginally better performance than the proposed method when the noise level is low (1%). Even though the RGDN is trained on data with lower noise levels, it still performs well on data with high noise levels (i.e. 2% and 3%), which also demonstrates the generalization ability of the proposed method. IRCNN [13] also performs very well for large noise levels. An example of a visual comparison is shown in Fig. 3.
Table 2: Results on the RGB version of the benchmark of [40].

Noise  Mea.   FD     Levin  EPLL   MLP    CSF    IRCNN  GradNet  FDN    RGDN
1%     PSNR   29.90  30.29  32.05  31.01  28.32  30.44  31.75    32.52  32.33
       SSIM   0.826  0.841  0.880  0.882  0.797  0.900  0.873    0.909  0.907
2%     PSNR   29.08  28.81  29.60  27.82  20.06  29.47  29.31    29.04  29.59
       SSIM   0.816  0.795  0.807  0.789  0.362  0.867  0.798    0.842  0.855
3%     PSNR   23.19  28.00  28.25  25.30  16.66  28.05  28.04    24.41  28.45
       SSIM   0.532  0.768  0.758  0.627  0.237  0.806  0.750    0.653  0.812
Evaluation on images with large blur kernels and strong noise The above datasets only use the 8 blur kernels from [39], whose sizes are limited. To study the behavior of the methods on large blur kernels, we generate a dataset (BSDBlur) with 150 images by randomly selecting 15 images from the dataset BSD [41] and 10 large blur kernels generated following [38]. To study noise robustness, high noise levels (2%, 3% and 5%) are used. As shown in Table 3, IRCNN [13] and the proposed method significantly outperform the other methods. However, as shown in Fig. 3, the results of IRCNN [13] suffer from more ringing artifacts and over-smoothness, which may be related to the conventional HQS image updating scheme in IRCNN. The performance of FDN [14] degenerates quickly as the noise level increases, although it is trained on a dataset with a noise level setting similar to ours. The proposed method achieves better generalization on the testing data. As shown in Fig. 3, the visual quality of the images recovered by the proposed method also surpasses that of the other methods. Even when the input image is degraded by severe noise, the proposed method can still recover a clear image with rich details.
Table 3: Results on BSDBlur with large blur kernels and strong noise.

Noise  Mea.   FD     Levin  MLP    CSF    IRCNN  FDN    RGDN
2%     PSNR   23.60  22.70  19.23  17.40  22.29  23.48  24.27
       SSIM   0.648  0.577  0.570  0.497  0.657  0.697  0.699
3%     PSNR   20.65  22.12  19.71  15.15  22.03  20.25  23.17
       SSIM   0.555  0.541  0.546  0.385  0.654  0.559  0.637
5%     PSNR   6.410  21.48  19.87  12.51  21.10  9.090  21.80
       SSIM   0.004  0.501  0.500  0.259  0.605  0.094  0.560
Summary of the numerical studies The three experiments above demonstrate the effectiveness and robustness of the proposed method on images with diverse blur kernels and noise levels. Compared to the previous state-of-the-art methods that require the ground truth noise level, the proposed method achieves better or competitive results with a parameter-free setting.
4.3 Ablation Study for RGDN
In this section, we perform an ablation study to analyze several aspects of the structure of the RGDN. For simplicity, we run all the studies on Levin et al.'s dataset [39] with the different noise levels used in Section 4.2.
Study on the structure of RGDN As shown in Fig. 1, the RGDN mainly consists of three subnetworks corresponding to the three parameterized operators $\mathcal{H}$, $\mathcal{R}$ and $\mathcal{D}$, which are trained jointly. To verify the importance of each subnetwork, we conduct experiments by removing each of them from the RGDN and training the resulting networks with the same settings as the complete RGDN. Table 4 shows that removing the regularization term $\mathcal{R}$ substantially degrades the results, showing that $\mathcal{R}$ is crucial for the RGDN. The RGDN without $\mathcal{R}$ corresponds to problem (2) without the regularizer $\Omega(\cdot)$, which suffers from the ill-posedness. Removing both the direction scaling operator $\mathcal{D}$ and $\mathcal{H}$ also significantly degrades the performance, which may be interpreted as a deficiency in the ability to handle noise. Table 4 also shows that the performance degenerates without $\mathcal{H}$ or $\mathcal{D}$, and that the direction scaling operator $\mathcal{D}$ plays a more important role than $\mathcal{H}$. We therefore conclude that all three operators are important to the results, and that they work interdependently.
Table 4: Ablation study on the dataset of Levin et al. [39] under three noise levels.

                                  0.59%          1%             2%
                                  PSNR   SSIM    PSNR   SSIM    PSNR   SSIM
RGDN w/o R                        18.94  0.629   18.69  0.619   18.44  0.588
RGDN w/o D                        33.71  0.939   32.61  0.928   30.47  0.886
RGDN w/o H                        33.73  0.941   33.00  0.929   30.92  0.892
RGDN w/o H and D                  11.33  0.211   11.32  0.210   11.30  0.202
RGDN w/o recursive supervision    22.04  0.744   21.54  0.709   20.82  0.655
RGDN                              35.04  0.954   33.68  0.954   31.01  0.899
Study of the recursive supervision in training We use recursive supervision to accelerate training and to enable the learned optimizer to push the image towards the ground truth in each step. We study the importance of the recursive supervision by removing the supervision on the intermediate steps (i.e. only keeping the loss at the last step) and training the model with the same settings as the RGDN. As shown in Table 4, removing the recursive supervision incurs a significant performance degeneration due to the difficulty of training. Another possible reason is that imposing supervision only on the final step restricts the training to merely minimizing the loss after a fixed number of steps, making the learned optimizer less flexible.
4.4 Empirical Convergence Analysis
Since the neural networks are too complicated to derive general convergence properties, we empirically analyze the convergence of the learned optimizer in testing. To study the instance-specific convergence speed at the same time, we select two images, one from Levin et al.'s dataset [39] and one from BSDBlur, and perform deconvolution on them. Fig. 4 shows the variation of the PSNR and the data fitting error with increasing iteration numbers. As more updating steps are performed, the PSNR values smoothly increase and the fitting error decreases, which is consistent with the results shown in Fig. 2. The empirical results in Fig. 4 show that the learned optimizer converges well after 30 iterations. Moreover, Fig. 4 also shows that different images require different numbers of steps for convergence. Compared to the previous methods with a fixed number of steps [14, 8, 11], the learned universal optimizer provides the flexibility to fit the different requirements of different images.
4.5 Visual Comparison on Real-world Images
In real-world applications, non-blind deconvolution serves as a part of blind deblurring [1, 44], where the ground truth blur kernel is unknown. The non-blind deconvolution is performed using imprecise kernels estimated by other methods, e.g. [1, 45], which brings more challenges. We thus conduct experiments to study the practicality of the proposed method. Since the ground truth images are also unknown, we only present visual comparisons against the state-of-the-art methods.

We first test on a real-world image given a blur kernel estimated by [1]. As shown in Fig. 5, even though the input kernel is imprecise, the proposed method can recover the details of the blurry image and suppress the ringing artifacts, thanks to the powerful learned optimizer. In contrast, the results of the other methods suffer from artifacts or over-smoothness due to the inaccurate kernel and the unknown noise level, which also shows that the proposed method is generally more practical in real-world scenarios.
Fig. 6 shows a comparison on a text image for which the blur kernel is from [1]. The proposed method achieves better visual quality than the other methods, which suggests that it can handle diverse images. The results of the other methods suffer from heavy artifacts due to the imprecise blur kernel and unknown noise.
To assess the robustness of the proposed method, we further test on a real-world blurry image with severe noise, where the blur kernel is estimated using the method in [45]. As shown in Fig. 7, our restored image contains more details and suffers fewer ringing artifacts than the others. Fig. 7 (c) shows that IRCNN [13] cannot competently handle the noise in the real-world image. Even though the result of FDN [14] may look sharp, it suffers from severe artifacts due to the high noise level.
[Fig. 7: blurry input and the deconvolution results of the compared methods and the proposed method.]

5 Conclusion
We have developed a Recurrent Gradient Descent Network (RGDN), which serves as an optimizer to deconvolve images. The components of the network are inspired by, and designed according to, the key components of the gradient descent method. The proposed RGDN implicitly learns an image prior and tunes adaptive parameters through the CNN components of the gradient generator. The network is trained on a diverse dataset and is thus able to restore a wide range of blurred images substantially better than previous approaches. Our gradient descent unit is designed to handle Gaussian noise, as specified in the loss (see (2)). One way to extend the network is to allow the gradient descent unit to model other types of noise or other losses.
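For reference, the Gaussian data-fidelity term underlying such a loss is f(x) = ½‖k∗x − y‖², whose gradient is the residual correlated with the kernel (i.e. convolved with the flipped kernel). The sketch below is illustrative only, with boundary handling simplified to 'same'-mode convolution, which only approximates the true adjoint at image borders:

```python
import numpy as np
from scipy.signal import fftconvolve

def data_grad(x, k, y):
    """Gradient of the Gaussian data term 0.5 * ||k * x - y||^2 w.r.t. x:
    convolve the residual with the spatially flipped kernel (correlation)."""
    residual = fftconvolve(x, k, mode="same") - y
    return fftconvolve(residual, k[::-1, ::-1], mode="same")
```

In plain gradient descent one would iterate x ← x − α(∇f(x) + ∇prior(x)); the RGDN replaces the hand-tuned step size and the prior gradient with learned CNN modules, which is what makes it adaptable to other noise models.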
References

 [1] Pan, J., Hu, Z., Su, Z., Yang, M.H.: Deblurring text images via L0-regularized intensity and gradient prior. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2014) 2901–2908
 [2] Xu, L., Jia, J.: Two-phase kernel estimation for robust motion deblurring. In: European Conference on Computer Vision (ECCV). (2010) 157–170

 [3] Gong, D., Yang, J., Liu, L., Zhang, Y., Reid, I., Shen, C., van den Hengel, A., Shi, Q.: From motion blur to motion flow: a deep learning solution for removing heterogeneous motion blur. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
 [4] Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. ACM Transactions on Graphics (TOG) 26(3) (2007) 70
 [5] Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-Laplacian priors. In: Advances in Neural Information Processing Systems (NIPS). (2009) 1033–1041
 [6] Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3) (2008) 248–272
 [7] Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: The IEEE International Conference on Computer Vision (ICCV). (2011) 479–486
 [8] Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2014) 2774–2781

 [9] Schuler, C.J., Christopher Burger, H., Harmeling, S., Schölkopf, B.: A machine learning approach for non-blind image deconvolution. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2013) 1067–1074
 [10] Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems (NIPS). (2014) 1790–1798
 [11] Zhang, J., Pan, J., Lai, W.S., Lau, R., Yang, M.H.: Learning fully convolutional networks for iterative non-blind deconvolution. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
 [12] Chang, J.R., Li, C.L., Poczos, B., Kumar, B.V., Sankaranarayanan, A.C.: One network to solve them all—solving linear inverse problems using deep projection models. In: The IEEE International Conference on Computer Vision (ICCV). (2017)
 [13] Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017) 3929–3938
 [14] Kruse, J., Rother, C., Schmidt, U.: Learning to push the limits of efficient FFT-based image deconvolution. In: The IEEE International Conference on Computer Vision (ICCV). (2017) 4586–4594
 [15] Schmidt, U., Gao, Q., Roth, S.: A generative perspective on MRFs in low-level vision. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2013) 1067–1074
 [16] Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences 7(3) (2014) 1588–1623
 [17] Sun, L., Cho, S., Wang, J., Hays, J.: Good image priors for nonblind deconvolution: Generic vs specific. In: European Conference on Computer Vision (ECCV). (2014) 231–246
 [18] Schmidt, U., Jancsary, J., Nowozin, S., Roth, S., Rother, C.: Cascades of regression tree fields for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 38(4) (2016) 677–689
 [19] Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model-based reconstruction. In: Global Conference on Signal and Information Processing (GlobalSIP), IEEE (2013) 945–948
 [20] Heide, F., Diamond, S., Nießner, M., Ragan-Kelley, J., Heidrich, W., Wetzstein, G.: ProxImaL: Efficient image optimization using proximal algorithms. ACM Transactions on Graphics (TOG) 35(4) (2016) 84
 [21] Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing 4(7) (1995) 932–946
 [22] Jin, M., Roth, S., Favaro, P.: Noise-blind image deblurring. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
 [23] Li, K., Malik, J.: Learning to optimize. arXiv preprint arXiv:1606.01885 (2016)
 [24] Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., de Freitas, N.: Learning to learn by gradient descent by gradient descent. In: Advances in Neural Information Processing Systems (NIPS). (2016) 3981–3989
 [25] Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: International Conference on Learning Representations (ICLR). (2017)
 [26] Wieschollek, P., Hirsch, M., Schölkopf, B., Lensch, H.: Learning blind motion deblurring. The IEEE International Conference on Computer Vision (ICCV) (2017)
 [27] Kim, T.H., Lee, K.M., Schölkopf, B., Hirsch, M.: Online video deblurring via dynamic temporal blending network. arXiv preprint arXiv:1704.03285 (2017)
 [28] Kim, J., Kwon Lee, J., Mu Lee, K.: Deeply-recursive convolutional network for image super-resolution. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 1637–1645
 [29] Liu, S., Pan, J., Yang, M.H.: Learning recursive filters for low-level vision via a hybrid neural network. In: European Conference on Computer Vision (ECCV). (2016) 560–576
 [30] Parikh, N., Boyd, S., et al.: Proximal algorithms. Foundations and Trends® in Optimization 1(3) (2014) 127–239
 [31] Gong, D., Tan, M., Zhang, Y., van den Hengel, A., Shi, Q.: MPGL: An efficient matching pursuit method for generalized lasso. In: AAAI. (2017) 1934–1940
 [32] Wright, S., Nocedal, J.: Numerical Optimization. Springer (1999)
 [33] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (ICML). (2015) 448–456
 [34] Fan, Q., Yang, J., Hua, G., Chen, B., Wipf, D.: A generic deep architecture for single image reflection removal and image smoothing. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). (2017)
 [35] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (ICLR). (2015)
 [36] PyTorch. https://github.com/pytorch/pytorch
 [37] Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision (IJCV) 111(1) (2015) 98–136
 [38] Chakrabarti, A.: A neural approach to blind motion deblurring. In: European Conference on Computer Vision (ECCV). (2016) 221–235
 [39] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding and evaluating blind deconvolution algorithms. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE (2009) 1964–1971
 [40] Sun, L., Cho, S., Wang, J., Hays, J.: Edge-based blur kernel estimation using patch priors. In: IEEE International Conference on Computational Photography (ICCP), IEEE (2013) 1–8
 [41] Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 33(5) (2011) 898–916
 [42] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4) (2004) 600–612
 [43] Sun, L., Hays, J.: Super-resolution from internet-scale scene matching. In: IEEE International Conference on Computational Photography (ICCP), IEEE (2012) 1–12
 [44] Gong, D., Tan, M., Zhang, Y., van den Hengel, A., Shi, Q.: Self-paced kernel estimation for robust blind image deblurring. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017) 1661–1670
 [45] Gong, D., Tan, M., Zhang, Y., Van den Hengel, A., Shi, Q.: Blind image deconvolution by automatic gradient activation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 1827–1836