Learning an Optimizer for Image Deconvolution

04/10/2018 · by Dong Gong et al.

As an integral component of blind image deblurring, non-blind deconvolution removes image blur with a given blur kernel, which is essential but difficult due to the ill-posed nature of the inverse problem. The predominant approach is based on optimization subject to regularization functions that are either manually designed or learned from examples. Existing learning-based methods have shown superior restoration quality but are not practical enough due to their restricted model design: they solely focus on learning a prior and require the noise level to be known for deconvolution. We bridge the gap between the optimization-based and learning-based approaches by learning an optimizer. We propose a Recurrent Gradient Descent Network (RGDN) by systematically incorporating deep neural networks into a fully parameterized gradient descent scheme. A hyperparameter-free update unit, built on a convolutional neural network, generates updates from the current estimates. By training on diverse examples, the Recurrent Gradient Descent Network learns an implicit image prior and a universal update rule through recursive supervision. Extensive experiments on synthetic benchmarks and challenging real-world images demonstrate that the proposed method is effective and robust, producing favorable results, and is practical for real-world image deblurring applications.


1 Introduction

Image deconvolution, also known as image deblurring, aims to recover a sharp image from an observed blurry image. The blurry image $\mathbf{y}$ is usually modeled as a convolution of a latent image $\mathbf{x}$ and a blur kernel $\mathbf{k}$:

$$\mathbf{y} = \mathbf{k} \ast \mathbf{x} + \mathbf{n}, \tag{1}$$

where $\ast$ denotes the convolution operator and $\mathbf{n}$ denotes an i.i.d. white Gaussian noise term with unknown standard deviation (i.e. the noise level). Given a blurry image $\mathbf{y}$ and the corresponding blur kernel $\mathbf{k}$, the task of recovering the sharp image $\mathbf{x}$ is referred to as (non-blind) image deconvolution, which is often used as the final step in blind image deblurring [1, 2, 3].
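To make the degradation model concrete, the following sketch simulates an observation according to (1); it is a minimal illustration with hypothetical names (`blur_observation`, a box kernel), not part of the evaluated pipeline.

```python
# A minimal sketch of the blur model (1): y = k * x + n.
import numpy as np
from scipy.signal import convolve2d

def blur_observation(x, k, noise_sigma):
    """Simulate y = k * x + n with i.i.d. Gaussian noise (grayscale, for brevity)."""
    y = convolve2d(x, k, mode="same", boundary="symm")    # k * x
    n = np.random.normal(0.0, noise_sigma, size=y.shape)  # noise term n
    return y + n

# Example: a normalized 7x7 box kernel and 1% noise on a random "image".
x = np.random.rand(64, 64)
k = np.ones((7, 7)) / 49.0
y = blur_observation(x, k, noise_sigma=0.01)
```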

Single image deconvolution is challenging and mathematically ill-posed due to the unknown noise and the loss of high-frequency information. Many conventional methods resort to different natural image priors based on manually designed empirical statistics (e.g. the sparse gradient prior [4, 5, 6]) or learned generative models (e.g. Gaussian mixture models (GMMs) [7]), which usually lead to non-convex problems and time-consuming optimization. Discriminative learning methods [8, 9, 5, 10, 11] have thus been investigated for both efficiency and image restoration quality. To handle the instance-specific blur kernel, previous methods [8] approach discriminative learning by parameterizing part of the conventional optimization process.

Due to their successes in many computer vision applications, deep neural networks (DNNs) have been used more frequently for learning discriminative image restoration models [10, 11, 12, 13]. Since directly using end-to-end DNNs to perform deconvolution is non-trivial [10], many approaches resort to integrating neural networks into optimization methods [11, 13, 9, 14], in which the DNNs are usually only used to learn the operators corresponding to the priors/regularizers (e.g. proximal projectors [13, 12]). As a result, they need manual parameter tuning (reflecting the unknown noise level) for a specific blurred image at test time [14, 12, 13] or customized training for specific noise levels [11, 9], limiting their applications in practice. Specifically, the prior-related operators (e.g. convolutional neural network (CNN) based projectors [13, 12]) are usually learned independently of the form of the degradations and then used to process the intermediate results produced in the model-based optimization. Although this independent training offers a flexible formulation for handling different degradations, the learned priors may not be able to handle arbitrary intermediate images, which never appear in training. Thus they can fail to produce satisfactory results for some challenging cases with severe degradations [12].

We address the above issues by proposing the Recurrent Gradient Descent Network (RGDN), a recurrent DNN architecture derived from gradient descent optimization methods. The RGDN iteratively updates the unknown variable by mimicking the gradient descent optimization process. To achieve this, we parameterize and learn a universal gradient descent optimizer, which can be repeatedly used to update the estimate based on its previous updates. Unlike previous methods [13, 11, 12, 14] that only focus on image prior learning, we parameterize and learn all main operations of a general gradient descent algorithm, including the gradient of a free-form image prior, based on CNNs. The RGDN implicitly learns an image prior and tunes adaptive parameters through the CNN-based gradient descent optimizer. As a result, the RGDN is free of hyperparameters in deconvolution (e.g. it has no need for the noise level) and can be trained on diverse degraded images, leading to high effectiveness and robustness. Moreover, it is worth emphasizing that the RGDN is composed of a universal optimizer sharing parameters across all steps. It is significantly different from previous methods [11, 14] that fix the step number and train different parameters for different steps. The structure of the RGDN ensures that a universal optimizer is trained to handle the different states arising during optimization, which tallies with the nature of iterative optimizers.

To summarize, the main contributions of this paper are:


  • We learn an optimizer for image deconvolution by fully parameterizing the general gradient descent optimizer, instead of learning only image priors [7, 15] or prior-related operators [13, 14, 12]. The integration of trainable DNNs and the fully parameterized optimization algorithm yields a parameter-free, effective and robust deconvolution method, making a substantial step towards practical deconvolution for real-world images.

  • We propose a new discriminative learning model, i.e. the RGDN, to learn an optimizer for image deconvolution. The RGDN systematically incorporates a series of CNNs into the general gradient descent scheme. Benefiting from parameter sharing and recursive supervision, the RGDN tends to learn a universal updating unit (i.e. an optimizer), which can be applied iteratively an arbitrary number of times to boost performance (as in classic optimization algorithms), making it a very flexible and practical method.

  • A single trained RGDN model is able to handle various types of blur and noise. Extensive experiments on both synthetic data and real images show that the parameter-free RGDN learned from a synthetic dataset produces results that are competitive with, or even better than, other state-of-the-art methods requiring a given/known noise level.

2 Related Work

Non-blind image deconvolution has been extensively studied in computer vision, signal processing and other related fields. We only discuss the most relevant works. Existing non-blind deconvolution methods can mainly be categorized into two groups: manually designed conventional methods and learning-based methods.

Manually designed non-blind deconvolution. Many manually designed approaches use empirical statistics of natural image gradients as the regularization or prior term [6, 5, 4], such as the total variation (TV) regularizer [6], a sparsity prior on second-order image gradients [4] and an approximate hyper-Laplacian distribution [5]. Meanwhile, various optimization methods have been studied for solving the image deconvolution problem, e.g. the alternating direction method of multipliers (ADMM) [16]. These conventional methods are often sensitive to parameter settings and may be computationally expensive.

Learning-based non-blind deconvolution. Rather than using manually designed regularizers, some methods learn generative models from data as image priors [7, 15, 17]. Zoran and Weiss [7] propose a GMM-based image prior and a corresponding algorithm (EPLL) for deconvolution, which is further extended in [17]. EPLL is effective but computationally very expensive. Schmidt et al. [15] train a Markov random field (MRF) based natural image prior for image restoration. Similar to the manually designed priors, the learned priors also require well-tuned parameters for specific noise levels.

To improve efficiency, some approaches address deconvolution by directly learning a discriminative function [18, 8, 9, 14, 13]. Schuler et al. [9] impose a regularized inversion of the blur in the Fourier domain and then remove the noise using a learned multi-layer perceptron (MLP). Schmidt and Roth [8] propose shrinkage fields (CSF), an efficient discriminative learning procedure based on a random field structure. Schmidt et al. [18] propose an approach based on Gaussian conditional random fields, in which the parameters are calculated through regression trees.

Deep neural networks have been studied as a more flexible and efficient approach for deconvolution. Xu et al. [10] train a CNN to restore images with outliers in an end-to-end fashion, which requires fine-tuning for every blur kernel. As shown by the plug-and-play framework [19, 20], variable splitting techniques [21, 16] can be used to decouple the restoration problem into a data fidelity term and a regularization term, the latter corresponding to a projector in the optimization. To handle the instance-specific blur kernel more easily, a series of methods [11, 13, 12] learn a denoiser and integrate it into the optimization as the projector reflecting the regularization. In [11], a fully convolutional network (FCN) is trained to remove noise in image gradients to guide the image deconvolution, and it has to be custom-trained for a specific noise level. Zhang et al. [13] learn a set of CNN denoisers (for different noise levels) and plug them into a half-quadratic splitting (HQS) scheme for image restoration. Chang et al. [12] learn a proximal operator with adversarial training as an image prior. Relying on HQS, Kruse et al. [14] learn a CNN-based prior term accompanying an FFT-based deconvolution scheme. These methods only focus on learning the prior/regularization term, and the noise level is required to be known in the testing phase. In a recent work, Jin et al. [22] propose a Bayesian framework for noise-adaptive deconvolution.

Other related works. Early works [23, 24, 25] have explored the general idea of learning to optimize for speeding up and boosting the learning process by accumulating learning experience on specific tasks, such as few-shot learning [25]. In the most related work [24], a coordinate-wise LSTM is trained to train neural networks for image classification. Deep neural networks with recurrent structures have also been studied in many other low-level image processing tasks, such as blind image deblurring [26, 27], image super-resolution [28], and image filtering [29].

3 Recurrent Gradient Descent Network

In this section, we will first briefly review the classical model-based non-blind deconvolution problem and the general gradient descent algorithm. We then propose the RGDN model with a fully parameterized gradient descent scheme. Finally, we discuss how to perform training and deconvolution with RGDN.

We consider the common blur model in (1), which can also be rewritten as $\mathbf{y} = \mathbf{K}\mathbf{x} + \mathbf{n}$, where $\mathbf{K}$ denotes the convolution matrix of $\mathbf{k}$.

3.1 Revisiting Gradient Descent for Non-blind Deconvolution

Based on the blur model (1) and the common Gaussian noise assumption, given a blurry image $\mathbf{y}$ and the blur kernel $\mathbf{k}$, the desired solution of the non-blind deconvolution should minimize a data fidelity term $f(\mathbf{x}) = \frac{\lambda}{2}\|\mathbf{y} - \mathbf{K}\mathbf{x}\|_2^2$, where the weighting term $\lambda$ reflects the noise level in $\mathbf{y}$. Considering the ill-posed nature of the problem, given a regularizer $\Omega(\mathbf{x})$, the non-blind deconvolution can be achieved by solving the minimization problem

$$\min_{\mathbf{x}} \; \frac{\lambda}{2}\|\mathbf{y} - \mathbf{K}\mathbf{x}\|_2^2 + \Omega(\mathbf{x}), \tag{2}$$

where the regularizer $\Omega(\mathbf{x})$ corresponds to the image prior, and the weighting term $\lambda$ controls the relative strength of the regularization. Generally, $\Omega(\cdot)$ can be of any form, such as the classical TV regularizer [6] or an arbitrary learning-based free-form regularizer.

Although optimization algorithms with high-level abstractions (e.g. the proximal algorithm [30]) are often used for problem (2) [5, 7, 31], to show the potential of the proposed idea, we start from the gradient descent method, which sits at a more basic level. Let $k$ denote the step index. Vanilla gradient descent solves for $\widehat{\mathbf{x}}$ (i.e. an estimate of $\mathbf{x}$) via a sequence of updates:

$$\mathbf{x}^{k+1} = \mathbf{x}^k + s^k \mathbf{d}^k, \quad \mathbf{d}^k = -\big(\nabla f(\mathbf{x}^k) + \nabla\Omega(\mathbf{x}^k)\big), \tag{3}$$

where $\mathbf{d}^k$ denotes the descent direction, $s^k$ denotes the step length, and $\nabla f(\mathbf{x}^k)$ and $\nabla\Omega(\mathbf{x}^k)$ denote the gradients of $f(\cdot)$ and $\Omega(\cdot)$ at step $k$. In classic gradient methods, the step length is usually determined by an exact or approximate line search procedure [32]. Specifically, for the deconvolution problem (2), $\nabla f(\mathbf{x}^k) = \lambda \mathbf{K}^\top(\mathbf{K}\mathbf{x}^k - \mathbf{y})$. Note that $\nabla\Omega(\mathbf{x}^k)$ may also be a subgradient for some regularizers.

To accelerate the optimization, we can scale the descent direction via a scaling matrix $\mathbf{D}^k$ using curvature information, which can be determined in different ways. For example, $\mathbf{D}^k$ is the inverse Hessian matrix (or an approximation of it) when the second-order information of the objective is used [32]. We thus arrive at a general updating equation at step $k$:

$$\mathbf{x}^{k+1} = \mathbf{x}^k + s^k \mathbf{D}^k \mathbf{d}^k. \tag{4}$$

Given an initialization $\mathbf{x}^0$, a general gradient descent solves problem (2) by repeating the update in (4) until certain stopping conditions are met. The compact formulation in (4) offers an advantage for learning a universal parametrized optimizer.
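As a reference point for the learned optimizer introduced next, the sketch below implements the classical updates (3)-(4) with an identity scaling matrix and a fixed step length; the smooth Tikhonov-on-gradients regularizer and all constants are illustrative assumptions, not the paper's setting.

```python
# A numpy sketch of vanilla gradient descent for problem (2), with D^k = I.
import numpy as np
from scipy.signal import convolve2d

def conv(x, k):    # K x
    return convolve2d(x, k, mode="same", boundary="symm")

def conv_T(x, k):  # K^T x: convolution with the spatially flipped kernel
    return convolve2d(x, k[::-1, ::-1], mode="same", boundary="symm")

def grad_Omega(x):
    """Gradient of the (illustrative) smooth prior Omega(x) = 0.5 * ||grad x||^2."""
    lap = np.zeros_like(x)
    lap[1:-1, 1:-1] = (4 * x[1:-1, 1:-1] - x[:-2, 1:-1] - x[2:, 1:-1]
                       - x[1:-1, :-2] - x[1:-1, 2:])  # discrete Laplacian
    return lap

def gradient_descent_deconv(y, k, lam=1.0, reg=0.01, s=0.5, steps=300):
    x = y.copy()                                    # initialize x^0 = y
    for _ in range(steps):
        grad_f = lam * conv_T(conv(x, k) - y, k)    # grad f = lam * K^T (K x - y)
        x = x - s * (grad_f + reg * grad_Omega(x))  # update (3) with D^k = I
    return x
```

Every quantity that the RGDN later replaces with a learned operator (the prior gradient, the noise-dependent weight, the step length and scaling) appears here as a hand-chosen constant or formula.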

Figure 1: (a) The overall architecture of our RGDN. Given a blurry image $\mathbf{y}$ and the corresponding blur kernel $\mathbf{k}$, the optimizer $\mathcal{O}(\cdot)$ (i.e. the gradient descent unit) produces a new estimate $\mathbf{x}^{k+1}$ from the estimate $\mathbf{x}^k$ of the previous step. Note that a universal optimizer with shared structure and parameters is used for all steps. (b) The structure of the optimizer $\mathcal{O}(\cdot)$. Each colored block of the optimizer (left) corresponds to an operation in the classical gradient descent method (right). (c) $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{P}(\cdot)$ share a common CNN-block architecture with different parameters to be learned. Both the input and output (for the optimizer and all subnetworks) are $m \times n \times c$ tensors, where $c$ is the number of channels of the input image $\mathbf{y}$.

3.2 Parameterization of the Gradient Descent

Our final goal is to learn a mapping function $\mathcal{F}(\cdot)$ that takes a blurry image $\mathbf{y}$ and the blur kernel $\mathbf{k}$ as input and recovers the target clear image as $\widehat{\mathbf{x}} = \mathcal{F}(\mathbf{y}, \mathbf{k})$. We achieve this by learning a fully parameterized optimizer.

Given $\mathbf{x}^k$ from the previous step, the gradient descent in (4) calculates $\mathbf{x}^{k+1}$ relying on several main operations, including gradient (or derivative) calculation for the data fidelity term $f(\cdot)$ and the regularizer $\Omega(\cdot)$, calculation of the scaling matrix $\mathbf{D}^k$, and step length determination. To enable the flexibility required for learning, we fully parameterize the gradient descent optimizer in (4) by replacing its main computation entities with a series of parameterized mapping functions. Firstly, we let $\mathcal{R}(\cdot)$ supplant $\nabla\Omega(\cdot)$, the gradient of the regularizer; it implicitly plays the role of an image prior. Considering that the noise level in $\mathbf{y}$ is unknown and hard to estimate a priori, a predefined $\lambda$ is insufficient in practice. We therefore define an operator $\mathcal{H}(\cdot)$ to handle the unknown noise and the varying estimation error (in $\mathbf{x}^k$) by adjusting the data fidelity gradient $\mathbf{K}^\top(\mathbf{K}\mathbf{x}^k - \mathbf{y})$; $\mathcal{H}(\cdot)$ implicitly tunes $\lambda$ adaptively. Finally, we define $\mathcal{P}(\cdot)$ as a functional operator replacing $s^k\mathbf{D}^k$ in each step to control the descent direction (i.e. $\mathbf{d}^k$). $\mathcal{H}(\cdot)$ and $\mathcal{P}(\cdot)$ absorb the trade-off weight $\lambda$ and the step length $s^k$, respectively. As shown in Fig. 1 (b), by replacing the calculation entities in (4) with the mapping functions introduced above, the gradient descent optimizer at each step can be formulated as:

$$\mathbf{x}^{k+1} = \mathcal{O}(\mathbf{x}^k) = \mathbf{x}^k + \mathcal{G}(\mathbf{x}^k), \quad \mathcal{G}(\mathbf{x}^k) = \mathcal{P}\Big({-\mathcal{H}}\big(\mathbf{K}^\top(\mathbf{K}\mathbf{x}^k - \mathbf{y})\big) - \mathcal{R}(\mathbf{x}^k)\Big), \tag{5}$$

where $\mathcal{O}(\cdot)$ denotes the parametrized gradient descent optimizer, and $\mathcal{G}(\cdot)$ denotes the gradient generator consisting of $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{P}(\cdot)$. Given an initialization $\mathbf{x}^0$ (e.g. letting $\mathbf{x}^0 = \mathbf{y}$), we can formulate the whole estimation model as

$$\widehat{\mathbf{x}} = \mathcal{F}(\mathbf{y}, \mathbf{k}) = \mathcal{O} \circ \mathcal{O} \circ \cdots \circ \mathcal{O}(\mathbf{x}^0) = \mathcal{O}^S(\mathbf{x}^0), \tag{6}$$

where $\circ$ denotes the composition operator, $\mathcal{O}^S$ denotes the $S$-fold composition of $\mathcal{O}(\cdot)$, and $\Theta$ denotes the set of all parameters of $\mathcal{O}(\cdot)$ (i.e. the parameters of $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{P}(\cdot)$). $\mathcal{O}^S$ means the optimizer is applied $S$ times.

3.3 The Structure of the RGDN

We propose to formulate the model in equation (6) as a Recurrent Gradient Descent Network (RGDN). Considering that the updates of $\mathbf{x}$ from an iterative optimization scheme naturally compose a sequence of arbitrary length, we use a universal gradient descent unit (GDU) to implement $\mathcal{O}(\cdot)$ and apply it at every step in a recurrent manner (see Fig. 1 (a)).

In each single GDU, the gradient generator $\mathcal{G}(\cdot)$ takes a current prediction $\mathbf{x}^k$ of size $m \times n \times c$ and generates an update of the same size. In $\mathcal{G}(\cdot)$, the subcomponents $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{P}(\cdot)$ also act as mapping functions whose input and output share the same size. Considering that CNNs with an encoder-decoder architecture have been commonly used to model similar mapping functions, we implement $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{P}(\cdot)$ using three CNNs with the same structure, shown in Fig. 1 (c). Since finding the best structure for each subnetwork is not our main focus, we use the same structure as a default plain choice. Nevertheless, the three CNNs are trained with different parameters, resulting in different functions. We then construct the GDU by assembling the three CNNs according to the model in (5) (see Fig. 1 (b)).

As shown in Fig. 1, each trainable CNN consists of 3 convolution (Conv) layers and 3 transposed convolution (ConvT) layers. Except for the first and the last layers, each Conv or ConvT is followed by a batch normalization (BN) layer [33] and a ReLU activation function. Following a widely used setting [13], the first Conv is only followed by a ReLU activation function. Apart from the last ConvT, we apply 64 convolution features for each Conv and ConvT. The last ConvT maps the 64-channel intermediate features to a $c$-channel RGB output, where $c$ denotes the number of channels of the image. We set the stride size to 1 for all Conv and ConvT layers. Our contributions are agnostic to the specific implementation choice for the structure of each subnetwork corresponding to $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{P}(\cdot)$, which may be further tuned for better performance.
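The following PyTorch sketch assembles the pieces described above: one CNN block shared in shape by $\mathcal{H}$, $\mathcal{R}$ and $\mathcal{P}$, a GDU implementing (5), and the recurrent unrolling of (6) with a single shared unit. The padding choices and the depthwise $(c, 1, k_h, k_w)$ kernel convention are our assumptions for illustration, not the authors' released code.

```python
# A PyTorch sketch of the GDU and the RGDN unrolling (illustrative, not official).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNBlock(nn.Module):
    """3 Conv + 3 ConvT layers; BN + ReLU everywhere except the first/last layers."""
    def __init__(self, c=3, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c, feat, 3, 1, 1), nn.ReLU(True),   # first: ReLU only
            nn.Conv2d(feat, feat, 3, 1, 1), nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.Conv2d(feat, feat, 3, 1, 1), nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, feat, 3, 1, 1), nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, feat, 3, 1, 1), nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, c, 3, 1, 1))         # last: linear, c channels
    def forward(self, x):
        return self.net(x)

class GDU(nn.Module):
    """One gradient descent unit implementing update (5)."""
    def __init__(self, c=3):
        super().__init__()
        self.H, self.R, self.P = CNNBlock(c), CNNBlock(c), CNNBlock(c)
    def forward(self, x, y, k):
        c, pad = x.shape[1], k.shape[-1] // 2            # odd-sized kernels assumed
        Kx = F.conv2d(x, k, padding=pad, groups=c)       # K x as a depthwise conv
        # K^T via the flipped kernel (PyTorch conv2d is cross-correlation).
        grad_f = F.conv2d(Kx - y, k.flip(-1, -2), padding=pad, groups=c)
        return x + self.P(-self.H(grad_f) - self.R(x))   # update (5)

class RGDN(nn.Module):
    def __init__(self, c=3, steps=5):
        super().__init__()
        self.unit, self.steps = GDU(c), steps            # one unit shared by all steps
    def forward(self, y, k):
        x, outs = y, []
        for _ in range(self.steps):                      # S-fold composition (6), x^0 = y
            x = self.unit(x, y, k)
            outs.append(x)                               # kept for recursive supervision
        return outs
```

Here `k` is a per-image $(c, 1, k_h, k_w)$ tensor so that each channel is blurred with the same kernel; returning every intermediate estimate supports the recursive supervision of Section 3.4.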

Some previous methods [8, 14] truncate the classic iterative optimization algorithm to a fixed number of steps and rigidly train different parameters for each step to process the images from the previous step. In principle, however, a fixed step number may not be appropriate for all images. Unlike them, towards learning a universal optimizer, the proposed RGDN shares parameters among the GDUs of all steps, which lets the optimizer (i.e. the shared GDU) see and handle the different states arising during the iterations. Training the RGDN thus gives us the flexibility to repeat the learned optimizer an arbitrary number of times to approach the desired deconvolution result for each case. We can stop the process based on stopping conditions, as in classic iterative optimization algorithms.

3.4 Learning an Optimizer via Training an RGDN

Training loss. We seek the model parameters $\Theta$ that accurately estimate $\mathbf{x}$ through training on a given dataset $\mathcal{D} = \{(\mathbf{y}_i, \mathbf{k}_i, \mathbf{x}_i)\}_{i=1}^N$. We minimize the mean squared error (MSE) between the ground truth $\mathbf{x}$ and the estimate $\widehat{\mathbf{x}}$ over the training dataset:

$$\ell_{\text{mse}}(\widehat{\mathbf{x}}, \mathbf{x}) = \|\widehat{\mathbf{x}} - \mathbf{x}\|_2^2. \tag{7}$$

Inspired by [11, 34], we also consider minimizing the gradient discrepancy in training:

$$\ell_{\text{grad}}(\widehat{\mathbf{x}}, \mathbf{x}) = \|\nabla_h \widehat{\mathbf{x}} - \nabla_h \mathbf{x}\|_2^2 + \|\nabla_v \widehat{\mathbf{x}} - \nabla_v \mathbf{x}\|_2^2, \tag{8}$$

where $\nabla_h$ and $\nabla_v$ denote the operators calculating the image gradients in the horizontal and vertical directions, respectively. The loss function in (8) is expected to help produce sharp images [34]. For all experiments, the models are trained by minimizing the sum of $\ell_{\text{mse}}$ and $\ell_{\text{grad}}$.

Recursive supervision and training objective. Instead of solely minimizing the difference between the ground truth and the output of the final step, we impose recursive supervision [28], supervising not only the final estimate but also the outputs of the intermediate steps (i.e. the outputs of $\mathcal{O}^s(\cdot)$ for $s < S$). The recursive supervision directly forces the output of each step to approach the ground truth, which accelerates the training and enhances the performance (see Section 4.3). Let $\mathbf{x}_i^s$ denote the estimate of $\mathbf{x}_i$ from the $s$-th step. By averaging over all training samples and steps, we have the whole training objective

$$\mathcal{L}(\Theta) = \frac{1}{NS}\sum_{i=1}^{N}\sum_{s=1}^{S} w_s \big(\ell_{\text{mse}}(\mathbf{x}_i^s, \mathbf{x}_i) + \ell_{\text{grad}}(\mathbf{x}_i^s, \mathbf{x}_i)\big),$$

where $w_s$ denotes the importance weight of step $s$; we set $w_s = 1$ in the experiments. As shown in Fig. 2, the learned optimizer steadily pushes the results closer to the ground truth, which is consistent with the recursive supervision.
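A sketch of this objective is given below; the finite-difference image gradients and uniform weights $w_s = 1$ follow the description above, while the exact reduction (mean vs. sum) is an assumption.

```python
# A sketch of the training loss: MSE (7) + gradient loss (8) over all S steps.
import torch

def image_grads(x):
    gh = x[..., :, 1:] - x[..., :, :-1]   # horizontal finite differences
    gv = x[..., 1:, :] - x[..., :-1, :]   # vertical finite differences
    return gh, gv

def rgdn_loss(outs, x_gt):
    """`outs` holds the estimates of all steps, as returned by the RGDN sketch."""
    loss = 0.0
    gh_t, gv_t = image_grads(x_gt)
    for x_s in outs:                      # recursive supervision: every step, w_s = 1
        gh_s, gv_s = image_grads(x_s)
        loss = loss + torch.mean((x_s - x_gt) ** 2)            # (7)
        loss = loss + torch.mean((gh_s - gh_t) ** 2) \
                    + torch.mean((gv_s - gv_t) ** 2)           # (8)
    return loss / len(outs)
```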

Implementation details. Although the number of steps the RGDN takes is not bounded in principle, for training efficiency we run the optimizer for 5 steps during training (i.e. $S = 5$). As shown in the experiments, benefiting from parameter sharing and recursive supervision, training with a fixed step number does not impair the generalization of the learned optimizer.

For training, we randomly initialize the parameters of the RGDN. The training is carried out using a mini-batch Adam [35] optimizer with a batch size of 4 and a fixed learning rate.

Figure 2: Intermediate results of the RGDN. (a) and (g) are the input blurry images $\mathbf{y}$ and the corresponding blur kernels $\mathbf{k}$. (a) is an image with noise level 0.15%; (g) is an image from the training data. Neither the image nor the kernel in (a) is in the training set. (b)-(e) are the intermediate results of the RGDN at steps #3, #20, #30 and #40. (h)-(k) show the results at steps #1-#3 and #5, since we perform 5 steps during training. (f) and (l) are the ground truth images.

3.5 Deconvolution using RGDN

Although we train the RGDN using a fixed number of steps, benefiting from the parameter sharing and recursive supervision, we obtain a universal optimizer after training on diverse samples. We can thus perform non-blind deconvolution by applying the optimizer an arbitrary number of times and stop the process based on certain stopping conditions, as with a classic optimizer. Fig. 2 shows that the quality of the intermediate images steadily increases with the number of steps: with more iterations, more details are recovered and artifacts are suppressed by the learned optimizer. The learned optimizer is able to handle the varying visual appearance of the input images and the intermediate results, and consistently improves the estimates. More numerical studies are in Section 4.4. The optimization can be stopped when $\Delta^k \leq \epsilon$, where $\Delta^k = \|\mathbf{x}^{k+1} - \mathbf{x}^k\|_2 / \|\mathbf{x}^k\|_2$ and $\epsilon$ is a small tolerance parameter. In practice, a maximum iteration number is also used as a stopping criterion.
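The test-time procedure can be sketched as below; the tolerance value and the norm used for the relative change are illustrative assumptions, and `model.unit` refers to the shared GDU from the sketch in Section 3.3.

```python
# A sketch of deconvolution with the learned optimizer and the stopping rule.
import torch

@torch.no_grad()
def deconvolve(model, y, k, eps=1e-4, max_iters=30):
    x = y                                              # initialize x^0 = y
    for _ in range(max_iters):                         # maximum-iteration criterion
        x_new = model.unit(x, y, k)                    # one optimizer step
        delta = torch.norm(x_new - x) / torch.norm(x)  # relative change Delta^k
        x = x_new
        if delta <= eps:                               # tolerance criterion
            break
    return x
```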

4 Experiments

We conduct experiments with the proposed method for single-image non-blind deconvolution. Our implementation is based on PyTorch [36] and uses an NVIDIA TITAN Xp graphics card for acceleration. Our learned optimizer takes about 0.03 seconds per step on a small image, and roughly 0.2 seconds for an image of less than one megapixel.

4.1 Datasets and Experimental Settings

Training. To generate the triplet set $\mathcal{D} = \{(\mathbf{y}_i, \mathbf{k}_i, \mathbf{x}_i)\}$ for training, we crop 40,960 RGB images from the PASCAL VOC dataset [37] as the ground truth images $\mathbf{x}_i$. We then independently generate 5 blur kernels according to [38] for each $\mathbf{x}_i$ and generate the blurred images based on model (1), which gives 204,800 triplets in total. After adding a Gaussian noise term from $\mathcal{N}(0, \sigma^2)$, 8-bit quantization is applied following [14]. Instead of training a customized model for a specific blur kernel [9] or noise level [8, 18, 11], we uniformly sample kernel sizes from a set of candidate sizes and noise levels $\sigma$ from an interval¹, which helps to evaluate the ability of the network to handle diverse data. (¹An image with a ratio $\sigma$ of Gaussian noise is generated by adding noise from $\mathcal{N}(0, \sigma^2)$ to an image with intensity range $[0, 1]$.)
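The generation of one training triplet can be sketched as follows; `sample_kernel` stands in for the kernel simulator of [38], the upper noise bound is a placeholder, and the grayscale, single-triplet form is a simplification of the batched RGB pipeline described above.

```python
# A sketch of generating one (y, k, x) training triplet per model (1).
import numpy as np
from scipy.signal import convolve2d

def make_triplet(x, sample_kernel, max_noise=0.03):
    k = sample_kernel()                                  # random motion kernel, per [38]
    y = convolve2d(x, k, mode="same", boundary="symm")   # blur: k * x
    sigma = np.random.uniform(0.0, max_noise)            # uniformly sampled noise level
    y = y + np.random.normal(0.0, sigma, y.shape)        # additive Gaussian noise
    y = np.round(np.clip(y, 0.0, 1.0) * 255.0) / 255.0   # 8-bit quantization, as in [14]
    return y, k, x
```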

Testing. Testing is performed on several benchmark datasets [39, 40, 41] that are independent of the training data. Considering that RGB images are predominant in practice, we trained our model on RGB images with 3 channels. To test on the benchmark dataset [39] of gray images, we replicate the single existing channel twice. Different noise levels are used to measure the robustness of the methods. In the experiments, we apply the stopping conditions introduced in Section 3.5 with a fixed maximum iteration number, if not indicated otherwise.

In the following, we first conduct a full numerical comparison with other state-of-the-art methods, i.e. FD [5], the method of Levin et al. [4], EPLL [7], MLP [9], CSF [8], IRCNN [13] and FDN [14]. We then conduct a series of empirical analyses and ablation studies of the proposed method. Finally, qualitative comparisons between the methods are conducted on real-world images. It is worth noting that, in deconvolution, apart from the RGDN, which is free of parameters, the parameters of all other methods are set using the ground truth noise level. We use the pairwise version of CSF [8] trained for deconvolution in the comparison. A comparison with the CNN-based baseline method [10] is absent since it needs fine-tuning for every blur kernel, making it impractical. We measure the performance in terms of PSNR and SSIM [42]. Following [14], the regions close to the image boundary are discarded when calculating the measurements.
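For reference, the boundary-cropped PSNR used in the comparisons can be sketched as below; the crop width is an illustrative assumption following the protocol of [14].

```python
# A sketch of PSNR with boundary regions discarded before measurement.
import numpy as np

def psnr_cropped(x_est, x_gt, crop=16, peak=1.0):
    e = x_est[crop:-crop, crop:-crop]        # discard regions near the boundary
    g = x_gt[crop:-crop, crop:-crop]
    mse = np.mean((e - g) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)  # PSNR in dB
```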

4.2 Numerical Evaluations on Synthetic Datasets

We first conduct experiments on three datasets with various types of images, blur kernels and noise levels. The testing images are independent of the training data.


[Figure columns: Ground Truth, Levin et al. [4], CSF [8], IRCNN [13], FDN [14], RGDN (ours)]
Figure 3: Visual comparison on images with different noise levels. The first two rows show results from the dataset [40]; the bottom two rows show results on an image from the generated BSD-Blur dataset.

Evaluation on grayscale image benchmark. We first evaluate the performance of the methods on the widely used benchmark dataset of Levin et al. [39], which contains 32 blurry gray images (of 255×255 pixels) generated from 4 clear images and 8 blur kernels. To deal with the gray images, we generate 3-channel images via replication. Images with different noise levels (0.59%, 1% and 2%) are also generated by adding noise to all channels. Note that the noise level of the original blurry images is about 0.59%, as discussed in [14]. The comparison for the three noise levels is shown in Table 1. IRCNN [13], FDN [14] and RGDN outperform the other methods thanks to the deep neural networks that provide more powerful natural image priors. Although the RGDN is trained as a noise-level-versatile model, its performance is better than the other methods or competitive with the best one. The performance of EPLL [7] is close to the best on this benchmark; however, it is dozens of times slower than the proposed method.

Noise   Mea.   FD     Levin  EPLL   MLP    CSF    IRCNN  FDN    RGDN
0.59%   PSNR   32.36  33.60  34.35  31.55  32.08  33.35  36.15  35.04
        SSIM   0.917  0.934  0.941  0.876  0.916  0.884  0.965  0.954
1%      PSNR   30.85  32.01  32.45  30.68  28.12  33.14  33.62  33.68
        SSIM   0.892  0.913  0.930  0.882  0.828  0.896  0.949  0.954
2%      PSNR   28.84  29.92  30.03  28.16  21.68  30.09  29.70  31.01
        SSIM   0.851  0.877  0.883  0.841  0.594  0.887  0.896  0.899

Table 1: Comparison on Levin et al.'s dataset [39].

Evaluation on large RGB images. To study the performance of the proposed method on large images, we evaluate the methods on the dataset of [40]. We generate an RGB version of the benchmark [40] using the original 80 RGB images [43] and the same 8 blur kernels from Levin et al.'s dataset [39]. Three different noise levels are adopted. The average PSNR and SSIM values are shown in Table 2. The performance of the RGDN is on par with or better than the other methods. Perhaps because FDN [14] takes the ground truth noise level as input, it achieves marginally better performance than the proposed method when the noise level is low (1%). Even though the RGDN is trained on data with lower noise levels, it still performs well on high-noise data (i.e. 2% and 3%), which demonstrates the generalization ability of the proposed method. IRCNN [13] also performs very well for large noise levels. An example of a visual comparison is shown in Fig. 3.

Noise  Mea.   FD     Levin  EPLL   MLP    CSF    IRCNN  GradNet  FDN    RGDN
1%     PSNR   29.90  30.29  32.05  31.01  28.32  30.44  31.75    32.52  32.33
       SSIM   0.826  0.841  0.880  0.882  0.797  0.900  0.873    0.909  0.907
2%     PSNR   29.08  28.81  29.60  27.82  20.06  29.47  29.31    29.04  29.59
       SSIM   0.816  0.795  0.807  0.789  0.362  0.867  0.798    0.842  0.855
3%     PSNR   23.19  28.00  28.25  25.30  16.66  28.05  28.04    24.41  28.45
       SSIM   0.532  0.768  0.758  0.627  0.237  0.806  0.750    0.653  0.812

Table 2: Comparison on 640 RGB images from [40] and [43]. *Note that the scores of EPLL [7] and GradNet [22] are quoted from [22] as a reference.

Evaluation on images with large blur kernels and strong noise. The above datasets only use the 8 blur kernels from [39], whose sizes are limited to 27×27 at most. To study the behavior of the methods with large blur kernels, we generate a dataset (BSD-Blur) of 150 images by randomly selecting 15 images from the dataset BSD [41] and 10 larger blur kernels from [38]. To study noise robustness, high noise levels (2%, 3% and 5%) are used. As shown in Table 3, IRCNN [13] and the proposed method significantly outperform the other methods. However, as shown in Fig. 3, the results of IRCNN [13] suffer from more ringing artifacts and over-smoothing, which may be related to the conventional HQS image updating scheme in IRCNN. The performance of FDN [14] degrades quickly with increasing noise level, although it is trained on a dataset with a noise level setting similar to ours. The proposed method achieves better generalization on the testing data. As shown in Fig. 3, the visual quality of the images recovered by the proposed method also surpasses the other methods. Even when the input image is degraded by severe noise, the proposed method can still recover a clear image with rich details.

Noise  Mea.   FD     Levin  MLP    CSF    IRCNN  FDN    RGDN
2%     PSNR   23.60  22.70  19.23  17.40  22.29  23.48  24.27
       SSIM   0.648  0.577  0.570  0.497  0.657  0.697  0.699
3%     PSNR   20.65  22.12  19.71  15.15  22.03  20.25  23.17
       SSIM   0.555  0.541  0.546  0.385  0.654  0.559  0.637
5%     PSNR   6.410  21.48  19.87  12.51  21.10  9.090  21.80
       SSIM   0.004  0.501  0.500  0.259  0.605  0.094  0.560

Table 3: Comparison on 150 images from BSD-Blur with larger blur kernels and strong noise.

Summary of the numerical studies. The three experiments above demonstrate the effectiveness and robustness of the proposed method on images with diverse blur kernels and noise levels. Compared with previous state-of-the-art methods requiring the ground truth noise level, the proposed method achieves better or competitive results with a parameter-free setting.

4.3 Ablation Study for RGDN

In this section, we perform an ablation study to analyze several aspects of the structure of the RGDN. For simplicity, we run all studies on Levin et al.'s dataset [39] with the different noise levels used in Section 4.2.

Study on the structure of the RGDN. As shown in Fig. 1, the RGDN mainly consists of three subnetworks corresponding to the three parameterized operations $\mathcal{H}(\cdot)$, $\mathcal{R}(\cdot)$ and $\mathcal{P}(\cdot)$, which are trained jointly. To verify the importance of each subnetwork, we conduct experiments by removing each of them from the RGDN in turn and training the resulting networks with the same settings as the complete RGDN. Table 4 shows that removing $\mathcal{R}(\cdot)$ substantially degrades the results, showing that $\mathcal{R}(\cdot)$ is crucial for the RGDN. The RGDN without $\mathcal{R}(\cdot)$ corresponds to problem (2) without the regularizer $\Omega(\cdot)$, which suffers from the ill-posedness. Removing both the direction scaling operator $\mathcal{P}(\cdot)$ and $\mathcal{H}(\cdot)$ also significantly degrades the performance, as does removing $\mathcal{R}(\cdot)$. This may be interpreted as a deficiency in the ability to handle noise. Table 4 also shows that the performance degrades without $\mathcal{P}(\cdot)$ or $\mathcal{H}(\cdot)$, and that the direction scaling operator $\mathcal{P}(\cdot)$ plays a slightly more important role than $\mathcal{H}(\cdot)$. We therefore conclude that all three terms are important to the results, and that they work interdependently.

     
                                  0.59%          1%             2%
                                  PSNR   SSIM   PSNR   SSIM   PSNR   SSIM
RGDN w/o R                        18.94  0.629  18.69  0.619  18.44  0.588
RGDN w/o P                        33.71  0.939  32.61  0.928  30.47  0.886
RGDN w/o H                        33.73  0.941  33.00  0.929  30.92  0.892
RGDN w/o H and P                  11.33  0.211  11.32  0.210  11.30  0.202
RGDN w/o recursive supervision    22.04  0.744  21.54  0.709  20.82  0.655
RGDN                              35.04  0.954  33.68  0.954  31.01  0.899

Table 4: Ablation study: performance of different variants of our method on Levin et al.'s dataset [39] under the noise levels of Section 4.2. Different structures and supervision settings are studied.

Study of the recursive supervision in training. We use recursive supervision to accelerate training and to enable the learned optimizer to push the image towards the ground truth at each step. We study the importance of the recursive supervision by removing the supervision on the intermediate steps (i.e. only keeping the loss at the last step) and training the model with the same settings as the RGDN. As shown in Table 4, removing the recursive supervision incurs a significant performance degradation due to the resulting training difficulties. Another possible reason is that imposing supervision only on the final step restricts the training to merely minimizing the loss after a fixed number of steps, making the learned optimizer less flexible.

4.4 Empirical Convergence Analysis

Since the neural networks are too complicated to derive general convergence properties, we empirically analyze the convergence of the learned optimizer at test time. To also study instance-specific convergence speed, we select two images, one from Levin et al.'s dataset [39] and one from BSD-Blur, and perform deconvolution on them. Fig. 4 shows the variation of the PSNR and the data fitting error with increasing iteration number. As more updating steps are performed, the PSNR values smoothly increase and the fitting error decreases, which is consistent with the results shown in Fig. 2. The empirical results in Fig. 4 show that the learned optimizer converges well after 30 iterations. Moreover, Fig. 4 also shows that different images require different numbers of steps for convergence. Compared with previous methods that use a fixed number of steps [14, 8, 11], the learned universal optimizer provides the flexibility to fit the different requirements of different images.

Figure 4: PSNR / fitting error vs. iteration: empirical convergence analysis of the learned optimizer on image deconvolution. (a) and (b) are the results on the images from Levin et al.'s dataset [39] and from BSD-Blur, respectively.

4.5 Visual Comparison on Real-world Images

In real-world applications, non-blind deconvolution is performed as part of blind deblurring [1, 44], where the ground truth blur kernel is unknown. The non-blind deconvolution is performed using imprecise kernels estimated by other methods, e.g. [1, 45], which brings additional challenges. We thus conduct experiments to study the practicability of the proposed method. Since the ground truth images are also unknown, we only present visual comparisons against the state-of-the-art methods.

We first test on a real-world image given a blur kernel estimated by [1]. As shown in Fig. 5, even though the input kernel is imprecise, the proposed method can recover the details of the blurry image and suppress the ringing artifacts, thanks to the powerful learned optimizer. In contrast, the results of the other methods suffer from artifacts or over-smoothing due to the inaccurate kernel and the unknown noise level, which also shows that the proposed method is generally more practical in real-world scenarios.

[Figure panels: (a) Input, (b) Levin et al. [4], (c) CSF [8], (d) MLP [9], (e) EPLL [7], (f) IRCNN [13], (g) FDN [14], (h) RGDN (ours)]

Figure 5: Deconvolution results on a real-world image.

Fig. 6 shows a comparison on a text image, with the blur kernel taken from [1]. The visual quality of the proposed method surpasses the other methods, which implies that the proposed method can handle diverse images. The results of the other methods suffer from heavy artifacts due to the imprecise blur kernel and unknown noise.

To assess the robustness of the proposed method, we further test on a real-world blurry image with severe noise, in which the blur kernel is estimated using the method in [45]. As shown in Fig. 7, our restored image contains more details and suffers from fewer ringing artifacts than the others. Fig. 7 (c) shows that IRCNN [13] does not competently handle the noise in the real-world image. Even though the result of FDN [14] may look sharp, it suffers from severe artifacts due to the high noise level.

[Figure panels: (a) Input, (b) Levin [4], (c) EPLL [7], (d) MLP [9], (e) CSF [8], (f) IRCNN [13], (g) FDN [14], (h) RGDN]
Figure 6: Deconvolution results on a text image from [1].
[Figure panels: (a) Input, (b) Levin et al. [4], (c) IRCNN [13], (d) FDN [14], (e) RGDN (ours)]

Figure 7: Deconvolution results on a real-world image with high level of noise. The images are best viewed by zooming in.

5 Conclusion

We have developed a Recurrent Gradient Descent Network (RGDN), which serves as an optimizer to deconvolve images. The components of the network are inspired by, and designed according to, the key components of the gradient descent method. The proposed RGDN implicitly learns a prior and tunes adaptive parameters through the CNN components of its gradient generator. The network is trained on a diverse dataset and is thus able to restore a wide range of blurred images better than previous approaches. Our gradient descent unit is designed to handle Gaussian noise, as specified by the $\ell_2$ loss in (2). One way to extend our network is to allow the gradient descent unit to model other types of noise or other losses.

References

  • [1] Pan, J., Hu, Z., Su, Z., Yang, M.H.: Deblurring text images via L0-regularized intensity and gradient prior. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2014) 2901–2908

  • [2] Xu, L., Jia, J.: Two-phase kernel estimation for robust motion deblurring. In: European Conference on Computer Vision (ECCV). (2010) 157–170
  • [3] Gong, D., Yang, J., Liu, L., Zhang, Y., Reid, I., Shen, C., van den Hengel, A., Shi, Q.: From motion blur to motion flow: A deep learning solution for removing heterogeneous motion blur. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
  • [4] Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. ACM transactions on graphics (TOG) 26(3) (2007)  70
  • [5] Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: Advances in Neural Information Processing Systems (NIPS). (2009) 1033–1041
  • [6] Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3) (2008) 248–272
  • [7] Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: The IEEE International Conference on Computer Vision (ICCV). (2011) 479–486
  • [8] Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2014) 2774–2781
  • [9] Schuler, C.J., Christopher Burger, H., Harmeling, S., Schölkopf, B.: A machine learning approach for non-blind image deconvolution. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2013) 1067–1074
  • [10] Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems (NIPS). (2014) 1790–1798
  • [11] Zhang, J., Pan, J., Lai, W.S., Lau, R., Yang, M.H.: Learning fully convolutional networks for iterative non-blind deconvolution. CVPR (2017)
  • [12] Chang, J.R., Li, C.L., Poczos, B., Kumar, B.V., Sankaranarayanan, A.C.: One network to solve them all: Solving linear inverse problems using deep projection models. In: The IEEE International Conference on Computer Vision (ICCV). (2017)
  • [13] Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep cnn denoiser prior for image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017) 3929–3938
  • [14] Kruse, J., Rother, C., Schmidt, U.: Learning to push the limits of efficient fft-based image deconvolution. In: IEEE International Conference on Computer Vision (ICCV). (2017) 4586–4594
  • [15] Schmidt, U., Gao, Q., Roth, S.: A generative perspective on mrfs in low-level vision. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2013) 1067–1074
  • [16] Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences 7(3) (2014) 1588–1623
  • [17] Sun, L., Cho, S., Wang, J., Hays, J.: Good image priors for non-blind deconvolution: Generic vs specific. In: European Conference on Computer Vision (ECCV). (2014) 231–246
  • [18] Schmidt, U., Jancsary, J., Nowozin, S., Roth, S., Rother, C.: Cascades of regression tree fields for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 38(4) (2016) 677–689
  • [19] Venkatakrishnan, S.V., Bouman, C.A., Wohlberg, B.: Plug-and-play priors for model based reconstruction. In: Global Conference on Signal and Information Processing (GlobalSIP), IEEE (2013) 945–948
  • [20] Heide, F., Diamond, S., Nießner, M., Ragan-Kelley, J., Heidrich, W., Wetzstein, G.: Proximal: Efficient image optimization using proximal algorithms. ACM Transactions on Graphics (TOG) 35(4) (2016)  84
  • [21] Geman, D., Yang, C.: Nonlinear image recovery with half-quadratic regularization. IEEE Transactions on Image Processing 4(7) (1995) 932–946
  • [22] Jin, M., Roth, S., Favaro, P.: Noise-blind image deblurring. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017)
  • [23] Li, K., Malik, J.: Learning to optimize. arXiv preprint arXiv:1606.01885 (2016)
  • [24] Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., de Freitas, N.: Learning to learn by gradient descent by gradient descent. In: Advances in Neural Information Processing Systems (NIPS). (2016) 3981–3989
  • [25] Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. (2016)
  • [26] Wieschollek, P., Hirsch, M., Schölkopf, B., Lensch, H.: Learning blind motion deblurring. The IEEE International Conference on Computer Vision (ICCV) (2017)
  • [27] Kim, T.H., Lee, K.M., Schölkopf, B., Hirsch, M.: Online video deblurring via dynamic temporal blending network. arXiv preprint arXiv:1704.03285 (2017)
  • [28] Kim, J., Kwon Lee, J., Mu Lee, K.: Deeply-recursive convolutional network for image super-resolution. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 1637–1645
  • [29] Liu, S., Pan, J., Yang, M.H.: Learning recursive filters for low-level vision via a hybrid neural network. In: European Conference on Computer Vision (ECCV). (2016) 560–576
  • [30] Parikh, N., Boyd, S., et al.: Proximal algorithms. Foundations and Trends® in Optimization 1(3) (2014) 127–239
  • [31] Gong, D., Tan, M., Zhang, Y., van den Hengel, A., Shi, Q.: Mpgl: An efficient matching pursuit method for generalized lasso. In: AAAI. (2017) 1934–1940
  • [32] Wright, S., Nocedal, J.: Numerical optimization. Springer Science 35(67-68) (1999)  7
  • [33] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (ICML). (2015) 448–456
  • [34] Fan, Q., Yang, J., Hua, G., Chen, B., Wipf, D.: A generic deep architecture for single image reflection removal and image smoothing. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). (2017)
  • [35] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (ICLR). (2015)
  • [36] https://github.com/pytorch/pytorch
  • [37] Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision (IJCV) 111(1) (2015) 98–136
  • [38] Chakrabarti, A.: A neural approach to blind motion deblurring. In: European Conference on Computer Vision (ECCV). (2016) 221–235
  • [39] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding and evaluating blind deconvolution algorithms. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE (2009) 1964–1971
  • [40] Sun, L., Cho, S., Wang, J., Hays, J.: Edge-based blur kernel estimation using patch priors. In: IEEE International Conference on Computational Photography (ICCP), IEEE (2013) 1–8
  • [41] Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 33(5) (2011) 898–916
  • [42] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4) (2004) 600–612
  • [43] Sun, L., Hays, J.: Super-resolution from internet-scale scene matching. In: IEEE International Conference on Computational Photography (ICCP), IEEE (2012) 1–12
  • [44] Gong, D., Tan, M., Zhang, Y., van den Hengel, A., Shi, Q.: Self-paced kernel estimation for robust blind image deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) 1661–1670
  • [45] Gong, D., Tan, M., Zhang, Y., Van den Hengel, A., Shi, Q.: Blind image deconvolution by automatic gradient activation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 1827–1836