Simultaneous Fidelity and Regularization Learning for Image Restoration

04/12/2018 · by Dongwei Ren, et al. · University of California, Merced

Most existing non-blind restoration methods are based on the assumption that a precise degradation model is known. As the degradation process can be only partially known or inaccurately modeled, images may not be well restored. Rain streak removal and image deconvolution with inaccurate blur kernels are two representative examples of such tasks. For rain streak removal, although an input image can be decomposed into a scene layer and a rain streak layer, there exists no explicit formulation for modeling the rain streaks and their composition with the scene layer. For blind deconvolution, as estimation errors in the blur kernel are usually inevitable, the subsequent non-blind deconvolution process does not restore the latent image well. In this paper, we propose a principled algorithm within the maximum a posteriori framework to tackle image restoration with a partially known or inaccurate degradation model. Specifically, the residual caused by a partially known or inaccurate degradation model is spatially dependent and complexly distributed. With a training set of degraded and ground-truth image pairs, we parameterize and learn the fidelity term for a degradation model in a task-driven manner. Furthermore, the regularization term can also be learned along with the fidelity term, thereby forming a simultaneous fidelity and regularization learning model. Extensive experimental results demonstrate the effectiveness of the proposed model for image deconvolution with inaccurate blur kernels and for rain streak removal. Furthermore, for image restoration with a precisely known degradation process, e.g., Gaussian denoising, the proposed model can be applied to learn the proper fidelity term for optimal performance based on visual perception metrics.


1 Introduction

Image restoration, which aims to recover the latent clean image from a degraded observation, is a fundamental problem in low-level vision. However, the degradation generally is irreversible, making image restoration an ill-posed inverse problem. While significant advances have been made in the past decades, it remains challenging to develop proper models for various image restoration tasks.

In general, the linear degradation process of a clean image $\mathbf{x}$ can be modeled as

$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$ (1)

where $\mathbf{n}$ is additive noise, $\mathbf{H}$ is the degradation operator, and $\mathbf{y}$ is the degraded observation. By changing the settings of the degradation operator and noise type, this model can be applied to different image restoration tasks. For example, $\mathbf{H}$ can be an identity matrix for denoising, a blur kernel convolution for deconvolution, and a downsampling operator for super-resolution, to name a few. The maximum a posteriori (MAP) model for image restoration can then be formulated as

$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \Phi(\mathbf{y}, \mathbf{H}\mathbf{x}) + \lambda \Psi(\mathbf{x}),$ (2)

where $\lambda$ is a trade-off parameter, $\Psi(\mathbf{x})$ is the regularization term associated with the image prior, and the fidelity term $\Phi(\mathbf{y}, \mathbf{H}\mathbf{x})$ is specified by the degradation as well as the noise [1, 2, 3]. Assuming the noise is additive white Gaussian, the fidelity term can be characterized by the squared $\ell_2$-norm, $\Phi(\mathbf{y}, \mathbf{H}\mathbf{x}) = \frac{1}{2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$.
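As a toy illustration of the degradation model (1) and the Gaussian-noise MAP objective, the sketch below evaluates the fidelity and regularization terms in numpy; the 1D signal, the hand-picked blur kernel, and the total-variation prior standing in for the regularizer are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D "image" and a hand-picked 3-tap blur kernel (H acts by convolution).
x = rng.random(64)                       # latent clean signal
k = np.array([0.25, 0.5, 0.25])          # blur kernel
n = 0.01 * rng.standard_normal(64)       # additive white Gaussian noise

H = lambda v: np.convolve(v, k, mode="same")
y = H(x) + n                             # degraded observation, as in (1)

def map_objective(x_est, lam=0.1):
    """MAP objective in the spirit of (2): squared l2 fidelity for Gaussian
    noise, plus a total-variation-style prior standing in for the regularizer."""
    fidelity = 0.5 * np.sum((y - H(x_est)) ** 2)
    regularizer = np.sum(np.abs(np.diff(x_est)))
    return fidelity + lam * regularizer

# At the ground truth, the fidelity term reduces to the small noise energy.
residual_energy = 0.5 * np.sum((y - H(x)) ** 2)
```

Minimizing this objective over `x_est` is the inference problem that the rest of the paper makes learnable.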

When the degradation operator is precisely known, the noise and image prior models play two key roles in the MAP-based image restoration model. Two widely used types of noise distributions are Gaussian and Poisson. Other distributions, e.g., hyper-Laplacian [4], Gaussian Mixture Model (GMM) [5], and Mixture of Exponential Power (MoEP) [6], have also been introduced for modeling complex noise. For the image prior, gradient-based models, e.g., total variation [7] and the hyper-Laplacian distribution [1], were studied first due to their simplicity and efficiency. Subsequently, patch-based [2] and non-local similarity [8, 9] models were developed to characterize more complex and internal dependencies among image patches. Recently, data-driven and task-driven learning methods have also been exploited to learn regularization from training images. The approach based on fields of experts (FoE) [10] is designed to learn the distribution of filter responses on images. Following the FoE framework, numerous discriminative learning approaches, e.g., cascaded shrinkage fields (CSF) [3], trainable non-linear reaction diffusion (TNRD) [11, 12], and the universal denoising network (UNET) [13], use a stage-wise learning scheme to enhance the restoration performance as well as computational efficiency.

However, the precise degradation process for most restoration tasks is not known, and thus the degradation process is modeled as

$\mathbf{y} = \mathcal{A}(\mathbf{x}; \Theta) + \mathbf{n}.$ (3)

In the restoration stage, only an estimate $\hat{\Theta}$ of the model parameters is known, while the form of $\mathcal{A}$, the noise type, or the true parameters $\Theta$ are unknown. Here we define this problem as image restoration with partially known or inaccurate degradation models.

Fig. 1: Illustration of the SFARL model on three restoration tasks. (a) In image deconvolution with inaccurate blur kernels, the SFARL method is effective in suppressing ringing artifacts (blurry image / ground-truth / ROBUST [14] / SFARL). (b) For deconvolution with saturation, Gaussian noise, and JPEG compression, the SFARL model achieves visually plausible results with less noise than DCNN [15] (degraded image / ground-truth / DCNN [15] / SFARL). (c) For rain streak removal, the SFARL model produces a cleaner image than DDNET [16] (rainy image / ground-truth / DDNET [16] / SFARL).

Image deconvolution with inaccurate blur kernels and rain streak removal are two representative image restoration tasks with partially known or inaccurate degradation models. Image deconvolution with an inaccurate blur kernel is a subproblem of blind deconvolution, which generally includes blur kernel estimation and non-blind deconvolution. In the blur kernel estimation stage, kernel error is generally inevitable for any specific estimation method [17, 18, 19, 20, 21, 22]. In the non-blind deconvolution stage, the degradation model can then be written as

$\mathbf{y} = \hat{\mathbf{k}} \otimes \mathbf{x} + \Delta\mathbf{k} \otimes \mathbf{x} + \mathbf{n},$ (4)

where $\otimes$ denotes the 2D convolution operator, $\hat{\mathbf{k}}$ is the estimated blur kernel, and $\Delta\mathbf{k} = \mathbf{k} - \hat{\mathbf{k}}$ is the kernel error. Thus, the subproblem in the non-blind deconvolution stage is equivalent to image deconvolution with an inaccurate blur kernel. Based on (3), we have $\mathcal{A}(\mathbf{x}; \hat{\mathbf{k}}) = \hat{\mathbf{k}} \otimes \mathbf{x}$, but $\Delta\mathbf{k} \otimes \mathbf{x}$ is unknown. Existing non-blind deconvolution methods are sensitive to kernel error and usually result in ringing and other artifacts [1, 2], as shown in Figure 1.

For rain streak removal, an input image $\mathbf{y}$ can be represented as the composition of a scene image layer $\mathbf{x}$ and a rain streak layer $\mathbf{r}$. However, it remains challenging to model rain streaks with any explicit formulation. On one hand, a linear summation $\mathbf{y} = \mathbf{x} + \mathbf{r}$ is usually used for combining the scene image and rain streak layers [23, 24]. On the other hand, it has been suggested [25] that a more complex model based on screen blending is more effective for combining the scene image and rain streak layers,

$\mathbf{y} = \mathbf{x} + \mathbf{r} - \mathbf{x} \circ \mathbf{r},$ (5)

where $\circ$ denotes the element-wise product. By setting $\mathcal{A}(\mathbf{x}; \mathbf{r})$ to such a composition, rain streak removal can be treated as an image restoration problem with a partially known degradation model, i.e., both the rain streak layer $\mathbf{r}$ and its composition with the scene cannot be explicitly modeled in the deraining stage. As shown in Figure 1, the method [24] is less effective for modeling rainy scenes, resulting in an over-smoothed image with visible streaks.
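The two composition models can be contrasted with a tiny numpy sketch (toy layers, not real rain data); the screen blend of (5) keeps intensities bounded without clipping:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy scene layer x and rain-streak layer r, both with intensities in [0, 1].
x = rng.random((4, 4))
r = np.zeros((4, 4))
r[:, 1] = 0.8                            # a single bright vertical streak

y_linear = np.clip(x + r, 0.0, 1.0)      # additive composition [23, 24]
y_screen = x + r - x * r                 # screen blend, as in (5)

# Screen blending stays in range without clipping and never darkens the
# scene, since y_screen = x + r * (1 - x) >= x.
assert y_screen.max() <= 1.0 and np.all(y_screen >= x)
```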

Image restoration with partially known or inaccurate degradation models cannot be simply addressed by noise modeling. From (3), we define the residual image as

$\mathbf{e} = \mathbf{y} - \mathcal{A}(\mathbf{x}; \hat{\Theta}).$ (6)

Due to the modeling error introduced by the inaccurate degradation, even if $\mathbf{n}$ is white, the residual $\mathbf{e}$ is spatially dependent and complexly distributed. Although several noise models have been suggested for modeling complex noise, they are all based on the independent and identically distributed (i.i.d.) assumption and are ineffective for modeling the spatial dependency of the residual. Furthermore, the characteristics of $\mathbf{e}$ are task-specific, and there exists no universal model that can be applied to all problems, thereby making it even more challenging to model the residual in (6).
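The spatial dependency of the residual can be seen in a small simulation. Assuming a hypothetical 1D signal and two hand-picked 3-tap kernels playing the roles of the true and estimated blur, the residual inherits strong neighbor correlation from the kernel-error term, unlike the white noise:

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.random(512)                          # latent signal
k_true = np.array([0.2, 0.6, 0.2])           # true blur kernel
k_est  = np.array([0.3, 0.4, 0.3])           # inaccurately estimated kernel
n = 0.01 * rng.standard_normal(512)

blur = lambda v, k: np.convolve(v, k, mode="same")
y = blur(x, k_true) + n

# Residual as in (6) under the inaccurate model:
# e = y - k_est (*) x = (k_true - k_est) (*) x + n.
e = y - blur(x, k_est)

def lag1_corr(v):
    """Lag-1 autocorrelation, a crude check of spatial dependency."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]

# The kernel-error term makes e strongly correlated across neighboring
# samples, whereas the white noise n is (nearly) uncorrelated.
assert abs(lag1_corr(e)) > 0.3 and abs(lag1_corr(n)) < 0.2
```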

Recently, deep CNN-based methods have achieved considerable progress on low-level vision tasks [26, 27, 28, 29, 30], e.g., rain streak removal [16, 31, 32], non-blind deconvolution [33, 34, 15] and Gaussian denoising [35]. These CNN methods, however, either do not take partially known degradations into consideration, or simply address this issue by learning a direct mapping from the degraded image to the ground-truth. In comparison with CNN-based models, we aim at providing a principled restoration framework for handling partially known or inaccurate degradations.

In this paper, we propose a principled fidelity learning algorithm for image restoration with partially known or inaccurate degradation models. For either kernel error caused by a specific kernel estimation method or rain streaks, the resulting residual is not entirely random and can be characterized by spatial dependency and distribution models. Thus, a task-driven scheme is developed to learn the fidelity term from a training set of degraded and ground-truth image pairs. For modeling spatial dependence and complex distribution, the residual is characterized by a set of nonlinear penalty functions based on filter responses, leading to a parameterized formulation of the fidelity term. Such a fidelity term is effective and flexible in modeling complex residual patterns and spatial dependency caused by partially known or inaccurate degradation for a variety of image restoration tasks. Furthermore, for different tasks (e.g., rain streak removal and image deconvolution), the residual patterns are also different. With task-driven learning, the proposed method can adaptively tailor the fidelity term to specific inaccurate or partially known degradation models.

We show that the regularization term can be parameterized and learned along with the fidelity term, resulting in our simultaneous fidelity and regularization learning (SFARL) model. In addition, we characterize the regularizer by a set of nonlinear penalty functions on the filter responses of the clean image. The SFARL model is formulated as a bi-level optimization problem, where a gradient descent scheme is used to solve the inner task and stage-wise parameters are learned from the training data. Experimental results on image deconvolution and rain streak removal demonstrate the effectiveness of the SFARL model in terms of quantitative metrics and visual quality (see Figure 1(a)(b)(c)). Furthermore, for image restoration with a precisely known degradation process, e.g., non-blind Gaussian denoising, the SFARL model can be used to learn a proper fidelity term for optimizing visual perception metrics, and obtains results with better visual quality (see the results in the supplementary material).

In CSF [3], TNRD [12], and UNET [13], a similar parametric formulation has been adopted to model natural image priors, and discriminative learning is employed to boost restoration performance. However, the degradation in these methods is assumed to be precisely known, and thus the fidelity term is explicitly specified, e.g., the squared $\ell_2$-norm for deconvolution with the ground-truth kernel. In practical applications, however, the degradation process is usually only partially known, e.g., an inaccurately estimated blur kernel, the separation of the rain layer from the background layer, or a combination of multiple degradations. In comparison, our SFARL model aims at providing a principled restoration framework in which the fidelity term is flexible and effective for modeling partially known degradation and can be jointly learned with the regularization terms during training. As a result, when applied to image restoration with partially known or inaccurate degradation models, SFARL can be trained to perform favorably against TNRD and the state-of-the-art methods.

The contributions of this work are summarized as follows:

  • We propose a principled algorithm for image restoration with partially known or inaccurate degradation. Given an image restoration task, our model can adaptively learn a proper fidelity term from the training set for modeling the spatial dependency and highly complex distribution of the task-specific residual caused by the partially known or inaccurate degradation.

  • We present a bi-level optimization model for simultaneous learning of the fidelity term, the regularization term, and the stage-wise model parameters for task-specific image restoration.

  • We carry out experiments on rain streak removal, image deconvolution with inaccurate blur kernels and deconvolution with multiple degradations to validate the effectiveness of the SFARL model.

2 Related Work

For specific vision tasks, numerous methods have been proposed for image deconvolution with inaccurate blur kernels and rain streak removal. However, considerably less effort has been made to address image restoration with partially known or inaccurate degradation models. In this section, we review related topics most relevant to this work, including noise modeling, discriminative image restoration, image deconvolution with inaccurate blur kernels, and rain streak removal.

2.1 Noise Modeling

For vision tasks based on robust principal component analysis (RPCA) or low rank matrix factorization (LRMF), noise is often assumed to be sparsely distributed and can be characterized by the $\ell_1$-norm [4, 36]. However, the noise in real scenarios is usually more complex and cannot be simply modeled using such norms. Consequently, GMM and its variants have been used as universal approximators for modeling complex noise. In RPCA models, Zhao et al. [37] use a GMM to fit a variety of noise types, such as Gaussian, Laplacian, sparse noise, and their combinations. For LRMF, GMM is used to approximate unknown noise, and its effectiveness has been validated in face modeling and structure from motion [5]. In addition, the GMM model has been extended for noise modeling in low rank tensor factorization [38], and generalized to the Mixture of Exponential Power (MoEP) scheme [6] for modeling complex noise. To determine the parameters of a GMM, the Dirichlet process has been suggested to estimate the number of Gaussian components under a variational Bayesian framework [39]. Recently, weighted mixtures of norm-based [40] and Gaussian [41, 42] models have also been used for blind denoising with unknown noise.

However, noise modeling cannot be readily used to address image restoration with partially known or inaccurate degradation models. The residual caused by inaccurate degradation is not i.i.d. Thus, both spatial dependency and complex noise distribution need to be considered to characterize the residual.

2.2 Discriminative Image Restoration

In a MAP-based image restoration model, the regularization term is associated with a statistical prior and is usually learned solely from clean images in a generative manner, e.g., K-SVD [43], GMM [2], and FoE [10]. Recently, discriminative learning has been extensively studied in image restoration. In general, discriminative image restoration aims to learn a fast inference procedure by optimizing an objective function on a training set of degraded and ground-truth image pairs. One typical discriminative learning approach is to combine existing image prior models with truncated optimization procedures [44, 45]. For example, CSF [3, 46] uses truncated half-quadratic optimization to learn stage-wise model parameters of a modified FoE. On the other hand, TNRD [11, 12] unfolds a fixed number of gradient descent inference steps. Non-parametric methods, such as regression tree fields (RTF) [44, 45] and filter forests [47], have also been used for modeling image priors.

Existing discriminative image restoration methods, however, are all based on the precise degradation assumption. These algorithms focus on learning regularization terms in a discriminative framework such that the models can be applied to arbitrary images and blur kernels. In contrast, we propose a discriminative learning algorithm that considers both fidelity and regularization terms, and apply it to image restoration with partially known or inaccurate degradation models.

2.3 Image Deconvolution with Inaccurate Blur Kernels

Typical blind deconvolution approaches consist of two stages: blur kernel estimation and non-blind deconvolution. Existing methods mainly focus on the first stage, and considerable attention has been paid to blur kernel estimation [18, 19, 48, 22]. In the second stage, conventional non-blind deconvolution methods are usually adopted to restore the clean image based on the estimated blur kernels. Despite the significant progress made in blur kernel estimation, errors are inevitably introduced in the first stage. Furthermore, non-blind deconvolution methods are not robust to kernel errors, and artifacts are likely to be introduced or exacerbated during deconvolution [1, 2].

One intuitive solution is to design specific image priors to suppress artifacts [49, 50, 51, 52]. To the best of our knowledge, there exists only one attempt [14] to implicitly model the kernel error in the fidelity term,

$\min_{\mathbf{x}, \mathbf{u}} \frac{1}{2}\|\mathbf{y} - \hat{\mathbf{k}} \otimes \mathbf{x} - \mathbf{u}\|^2 + \tau\|\mathbf{u}\|_1 + \lambda\Psi(\mathbf{x}).$ (7)

Here the residual is defined as $\mathbf{e} = \Delta\mathbf{k} \otimes \mathbf{x} + \mathbf{n}$, where the kernel-error term $\mathbf{u} \approx \Delta\mathbf{k} \otimes \mathbf{x}$ is associated with the $\ell_1$-norm, and $\mathbf{n}$ is additive white Gaussian noise. However, a method based on the $\ell_1$-norm does not model the spatial dependency of the residual. The method [14] alleviates the effect of kernel errors at the expense of potentially over-smoothed restoration results. A recent deep CNN-based approach, i.e., FCN [34], takes multiple inputs with complementary information to produce high quality restoration results, but relies on tuning the parameters of a non-blind deconvolution method to provide proper network inputs. In this work, we focus on the second stage of blind deconvolution, and propose the SFARL model to characterize the kernel error of a specific kernel estimation method.
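When the kernel-error term in a model of this kind is penalized with the $\ell_1$-norm, then with the latent image fixed the error estimate reduces to element-wise soft-thresholding of the residual. The sketch below illustrates this on synthetic numbers; the variable names, threshold, and spike positions are hypothetical and not taken from [14]:

```python
import numpy as np

def soft_threshold(v, tau):
    """Closed-form minimizer of 0.5 * (v - u)^2 + tau * |u| over u."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(7)
# Stand-in for e = y - k_est (*) x: a few large sparse kernel-error spikes
# on top of small Gaussian noise.
e = 0.01 * rng.standard_normal(100)
e[[5, 40, 77]] += np.array([0.9, -1.2, 0.7])

u = soft_threshold(e, tau=0.1)           # l1 part absorbs the sparse outliers
assert np.count_nonzero(u) == 3          # only the three spikes survive
```

Note that this per-element treatment is exactly what ignores the spatial dependency of the residual, which motivates the learned fidelity term proposed later.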

2.4 Rain Streak Removal

Rain streak and scene composition models are two important issues in removing rain streaks from input images. Based on the linear model $\mathbf{y} = \mathbf{x} + \mathbf{r}$, the MAP-based deraining model can be formulated as

$\min_{\mathbf{x}, \mathbf{r}} \frac{1}{2}\|\mathbf{y} - \mathbf{x} - \mathbf{r}\|^2 + \lambda_x \Psi(\mathbf{x}) + \lambda_r \Psi_r(\mathbf{r}), \quad \text{s.t. } \mathbf{x} \geq \mathbf{0}, \ \mathbf{r} \geq \mathbf{0},$ (8)

where $\Psi_r(\mathbf{r})$ denotes the regularization term of the rain streak layer, and the inequality constraints are introduced to obtain non-negative solutions of $\mathbf{x}$ and $\mathbf{r}$ [24].

In [23], hand-crafted regularization is employed to impose smoothness on the image layer and low rank on the rain streak layer. In [24], both the image and rain streak layers are modeled as GMMs that are separately trained on clean patches and rain streak patches. Based on the screen blend model, Luo et al. [25] use a discriminative dictionary learning scheme to separate rain streaks by enforcing that the two layers share as few dictionary atoms as possible. Recently, specifically designed CNN models [16, 32] have achieved progress in rain streak removal. Instead of using explicit analytic models, the SFARL method is developed based on a data-driven learning approach to accommodate the complexity and diversity of rain streak and scene composition models.

3 Proposed Algorithm

We consider a class of image restoration problems where the degradation model is partially known or inaccurate, but a training set of degraded and ground-truth image pairs is available. To handle these problems, we use a flexible model to parameterize the fidelity term for the residual caused by partially known or inaccurate degradation. For a given problem, a task-driven learning approach can then be developed to obtain a task-specific fidelity term from the training data.

In this section, we first present our method for parameterizing the fidelity term to characterize the spatial dependency and complex distribution of the residual images. In addition, the regularization term is also parameterized, resulting in our simultaneous fidelity and regularization learning model. Finally, we propose a task-driven scheme to learn the proposed model from training data.

3.1 Fidelity Term

The fidelity term is used to characterize the spatial dependency and highly complex distribution of the residual image $\mathbf{e}$. On one hand, popular explicit formulations, e.g., the $\ell_1$- and $\ell_2$-norms, cannot model the complex distribution of the residual image, and due to the i.i.d. assumption, existing noise modeling approaches, e.g., GMM [37] and MoEP [6], cannot be readily adopted to model the spatial dependency in the fidelity term. On the other hand, the residual generally is spatially dependent and complexly distributed. Motivated by the success of discriminative regularization learning [3, 11], we thus use a set of linear filters with diverse patterns to model the spatial dependency in $\mathbf{e}$. Moreover, due to the effect of the degradation error and its combination with the noise, the filter responses still follow a complex distribution. Therefore, a set of non-linear penalty functions is further introduced to characterize the distribution of the filter responses.

To sum up, we propose a principled residual model for the fidelity term as follows,

$\Phi(\mathbf{y}, \mathbf{x}) = \sum_{i=1}^{N_f} \sum_{p} \phi_i\big(\left[\mathbf{f}_i \otimes (\mathbf{y} - \mathbf{H}\mathbf{x})\right]_p\big),$ (9)

where $\mathbf{H}$ is the degradation operator defined in (1), $\otimes$ is the 2D convolution operator, $\mathbf{f}_i$ is the $i$-th linear filter, and $\phi_i$ is the corresponding non-linear penalty function. In the proposed fidelity term, the parameters include $\{\mathbf{f}_i, \phi_i\}_{i=1}^{N_f}$. When $N_f = 1$, $\mathbf{f}_1$ is the delta function, and $\phi_1$ is the squared $\ell_2$-norm, the proposed model (9) is equivalent to the standard MAP-based model in (2).

Due to the introduction of the linear filters $\mathbf{f}_i$ and penalty functions $\phi_i$, the proposed fidelity term can describe the complex patterns in the residual caused by partially known or inaccurate degradation models. Furthermore, our fidelity model is flexible and applicable to different tasks. With proper training, it can be specialized to particular image restoration tasks, such as rain streak removal and image deconvolution with inaccurate blur kernels. It is worth noting that the fidelity term in (9) can be regarded as a special form of convolution layer in a CNN. Nonetheless, the fidelity term (9) retains better interpretability and flexibility in characterizing the residual $\mathbf{e}$. In particular, the learned $\mathbf{f}_i$s and $\phi_i$s are closely related to the characteristics of the residual (see an example in the supplementary material). Moreover, the distribution of $\mathbf{e}$ generally is much more complex, and cannot be simply characterized by ReLU and its variants in conventional CNNs.
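The structure of the proposed fidelity term can be sketched in a few lines of numpy/scipy: non-linear penalties applied to filter responses of the residual, which reduces to the squared $\ell_2$ fidelity for a single delta filter with a quadratic penalty. The filters and penalties below are random/toy stand-ins for the learned ones:

```python
import numpy as np
from scipy.signal import convolve2d

def fidelity(y, Hx, filters, penalties):
    """Fidelity in the spirit of (9): non-linear penalties summed over the
    filter responses of the residual e = y - Hx."""
    e = y - Hx
    return sum(pen(convolve2d(e, f, mode="same")).sum()
               for f, pen in zip(filters, penalties))

rng = np.random.default_rng(3)
y, Hx = rng.random((8, 8)), rng.random((8, 8))

# Special case: one delta filter + squared penalty == 0.5 * ||y - Hx||^2.
delta = np.array([[1.0]])
phi_sq = lambda z: 0.5 * z ** 2
l2_value = 0.5 * np.sum((y - Hx) ** 2)
assert np.isclose(fidelity(y, Hx, [delta], [phi_sq]), l2_value)

# General case (random stand-ins for learned filters/penalties): several
# filters with a heavy-tailed penalty on their residual responses.
filters = [rng.standard_normal((5, 5)) for _ in range(3)]
penalties = [lambda z: np.log1p(z ** 2)] * 3
val = fidelity(y, Hx, filters, penalties)
```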

3.2 Regularization Term

To increase the modeling capacity of the image prior, the regularization term is further parameterized as

$\Psi(\mathbf{x}) = \sum_{j=1}^{N_g} \sum_{p} \rho_j\big(\left[\mathbf{g}_j \otimes \mathbf{x}\right]_p\big),$ (10)

where $\mathbf{g}_j$ is the $j$-th linear filter, $\rho_j$ is the corresponding non-linear penalty function, and $N_g$ is the number of linear filters and penalty functions for the regularization term. The parameters for the regularization term include $\{\mathbf{g}_j, \rho_j\}_{j=1}^{N_g}$. The proposed model is a generalization of the FoE [10] model that parameterizes the regularization term with both the filters and the penalty functions. Similar models have also been used in discriminative non-blind image restoration [3, 11, 13].

3.3 SFARL Model

Given a specific image restoration task, the parameters for the fidelity and regularization terms need to be specified. As a large number of parameters are involved in (9) and (10), it is not feasible to manually determine proper values. In this work, we propose to learn the parameters of both the fidelity and regularization terms in a task-driven manner.

Denote by $\{(\mathbf{y}_s, \mathbf{x}_s^{gt})\}_{s=1}^{S}$ a training set of $S$ samples, where $\mathbf{y}_s$ is the $s$-th degraded image and $\mathbf{x}_s^{gt}$ is the corresponding ground-truth image. The parameters $\Omega = \{\lambda, \mathbf{f}_i, \phi_i, \mathbf{g}_j, \rho_j\}$ can be learned by solving the following bi-level optimization problem,

$\min_{\Omega} \sum_{s=1}^{S} \mathcal{L}(\hat{\mathbf{x}}_s, \mathbf{x}_s^{gt}), \quad \text{s.t. } \hat{\mathbf{x}}_s = \arg\min_{\mathbf{x} \in \mathcal{X}} \lambda\Phi(\mathbf{y}_s, \mathbf{x}) + \Psi(\mathbf{x}),$ (11)

where $\mathcal{X}$ is the feasible solution space. For image deconvolution with an inaccurate blur kernel, the feasible solution is only constrained to be in the real number space, i.e., $\mathcal{X} = \mathbb{R}^N$. For rain streak removal, additional constraints on the feasible solution space are required, i.e., $\mathcal{X} = \{\mathbf{x} : 0 \leq x_p \leq y_p\}$, where $x_p$ (and $y_p$) is the $p$-th element of the clean image $\mathbf{x}$ (and rainy image $\mathbf{y}$). In principle, the trade-off parameter $\lambda$ can be absorbed into the non-linear penalty functions $\phi_i$ and removed from the model (11). However, the trade-off between the fidelity and regularization terms cannot be easily made because the scales of $\Phi$ and $\Psi$ vary for different restoration tasks, thereby making it necessary to include $\lambda$ in (11).

The loss function $\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}^{gt})$ measures the dissimilarity between the output of the SFARL model and the ground-truth image. One representative loss used in discriminative image restoration is the mean squared error (MSE) [11],

$\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}^{gt}) = \|\hat{\mathbf{x}} - \mathbf{x}^{gt}\|^2.$ (12)

For image restoration where the precise degradation process is known, the optimal fidelity term in terms of MSE becomes the negative log-likelihood. The standard MAP model can then be used in the inner loop of the bi-level optimization task. Thus, the MSE loss is only applicable to learning the fidelity term for image restoration with partially known or inaccurate degradation models.

In this work, we use a visual perception metric, i.e., negative SSIM [53, 54], as the loss function,

$\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}^{gt}) = -\mathrm{SSIM}(\hat{\mathbf{x}}, \mathbf{x}^{gt}).$ (13)

The reasons for using negative SSIM are two-fold. On one hand, SSIM is closely related to visual perception of image quality, and minimizing negative SSIM is expected to benefit the visual quality of the restoration results. On the other hand, even for image restoration with a precisely known degradation process, the negative log-likelihood is no longer the optimal fidelity term when the negative SSIM loss is used. Thus the residual model (9) can be utilized to learn a proper fidelity term from training data for image deconvolution with inaccurate blur kernels, rain streak removal, or Gaussian denoising. In addition, the experimental results validate the effectiveness of negative SSIM and residual modeling in terms of both visual quality and perception metrics.

4 SFARL Training

In this section, we first present an iterative solution to the inner task of the bi-level optimization problem. The SFARL model is then parameterized, and a gradient-based optimization algorithm can be used for training. The SFARL model is trained by sequentially performing greedy training (Algorithm 2) and joint fine-tuning (Algorithm 3). Finally, the derivations of the gradients for the greedy and end-to-end training processes are presented.

4.1 Iterative Solution to Inner Optimization Task

The inner task in (11) implicitly defines the solution $\hat{\mathbf{x}}(\Omega)$ as a function of the model parameters. As the optimization problem is non-convex, it is difficult to obtain an explicit analytic form of either $\hat{\mathbf{x}}(\Omega)$ or its gradient with respect to $\Omega$. In this work, we approximate $\hat{\mathbf{x}}$ by truncating an iterative optimization algorithm [3, 46, 11, 12]. Furthermore, stage-wise model parameters are also used to improve image restoration [3, 11].

To solve (11), the updated solution at stage $t$ can be written as a function of $\mathbf{x}^{t-1}$ and the stage-wise parameters $\Omega_t$, i.e., $\mathbf{x}^t = \mathcal{T}(\mathbf{x}^{t-1}; \Omega_t)$. Suppose that $\Omega_1, \ldots, \Omega_{t-1}$ are known. The stage-wise parameters $\Omega_t$ can then be learned by solving the following problem,

$\min_{\Omega_t} \sum_{s=1}^{S} \mathcal{L}(\mathbf{x}_s^t, \mathbf{x}_s^{gt}).$ (14)

Here we use a gradient descent method to solve the inner optimization loop, and $\mathbf{x}^t$ can be written as

$\mathbf{x}^{t} = \mathbf{x}^{t-1} - \Big( \sum_{j=1}^{N_g} \bar{\mathbf{g}}_j^t \otimes \rho_j^{t\prime}(\mathbf{g}_j^t \otimes \mathbf{x}^{t-1}) - \lambda^t \mathbf{H}^T \sum_{i=1}^{N_f} \bar{\mathbf{f}}_i^t \otimes \phi_i^{t\prime}\big(\mathbf{f}_i^t \otimes (\mathbf{y} - \mathbf{H}\mathbf{x}^{t-1})\big) \Big),$ (15)

where the influence functions $\rho_j^{t\prime}$ and $\phi_i^{t\prime}$ are defined as the derivatives of the penalty functions $\rho_j^t$ and $\phi_i^t$. These functions are performed entry-wise on a vector or matrix. In addition, $\bar{\mathbf{f}}_i^t$ and $\bar{\mathbf{g}}_j^t$ are the filters obtained by rotating $\mathbf{f}_i^t$ and $\mathbf{g}_j^t$ by 180 degrees, respectively. After each gradient descent step, $\mathbf{x}^t$ is projected onto the feasible solution space $\mathcal{X}$. The inference procedure is shown in Algorithm 1.
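One inference step in the spirit of (15) can be sketched as follows, assuming for simplicity that H is the identity (denoising), so that H^T is also the identity, with toy hand-picked filters and influence functions in place of the learned ones, and projection implemented as clipping to [0, 1]:

```python
import numpy as np
from scipy.signal import convolve2d

def rot180(f):
    """Filter rotated by 180 degrees (the f-bar / g-bar in (15))."""
    return f[::-1, ::-1]

def sfarl_step(x, y, g_filters, rho_prime, f_filters, phi_prime, lam):
    """One gradient-descent inference step in the spirit of (15), for the
    special case H = I (denoising), so that H^T is also the identity."""
    reg_grad = sum(convolve2d(rp(convolve2d(x, g, mode="same")),
                              rot180(g), mode="same")
                   for g, rp in zip(g_filters, rho_prime))
    e = y - x                                # residual for H = I
    fid_grad = sum(convolve2d(fp(convolve2d(e, f, mode="same")),
                              rot180(f), mode="same")
                   for f, fp in zip(f_filters, phi_prime))
    x_new = x - (reg_grad - lam * fid_grad)
    return np.clip(x_new, 0.0, 1.0)          # project onto the feasible set

rng = np.random.default_rng(4)
y = rng.random((16, 16))
x = y.copy()
g_filters = [np.array([[1.0, -1.0]])]        # toy finite-difference filter
f_filters = [np.array([[1.0]])]              # delta filter on the residual
rho_prime = [np.tanh]                        # toy influence function rho'
phi_prime = [lambda z: z]                    # quadratic phi, so phi'(z) = z
x = sfarl_step(x, y, g_filters, rho_prime, f_filters, phi_prime, lam=1.0)
```

Stacking several such steps, each with its own filters and influence functions, gives the stage-wise inference of Algorithm 1.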

We use ADAM [55] to solve the optimization problem in (14). To this end, we present the parameterization of the solution in (15) and derive the gradients for the greedy and end-to-end learning processes.

1: Input: current result $\mathbf{x}^0$, degraded image $\mathbf{y}$, degradation operator $\mathbf{H}$, model parameters $\{\Omega_t\}_{t=1}^{T}$
2: Output: restoration result $\mathbf{x}^T$
3: for $t = 1, \ldots, T$ do
4:       Compute $\mathbf{x}^t$ using (15)
5: end for
6: return $\mathbf{x}^T$
Algorithm 1 SFARL Inference
1: Input: training data $\{(\mathbf{y}_s, \mathbf{x}_s^{gt})\}_{s=1}^{S}$
2: Output: SFARL parameters $\{\Omega_t\}_{t=1}^{T}$
3: Set the stage number $T$, epoch number $E$, mini-batch size $m$, and mini-batch number $B$
4: Initialize $\mathbf{x}_s^0$ and the stage parameters
5: for $t = 1, \ldots, T$ do
6:       for $e = 1, \ldots, E$ do
7:             for $b = 1, \ldots, B$ do
8:                  Prepare the $b$-th mini-batch data
9:                  Forward the samples in the $b$-th mini-batch
10:                 Compute the gradients for stage $t$
11:                 Use ADAM to optimize the stage parameters $\Omega_t$
12:            end for
13:      end for
14: end for
Algorithm 2 Greedy Training
1: Input: training data $\{(\mathbf{y}_s, \mathbf{x}_s^{gt})\}_{s=1}^{S}$, greedily trained model parameters $\{\Omega_t\}_{t=1}^{T}$
2: Output: fine-tuned SFARL parameters $\{\Omega_t\}_{t=1}^{T}$
3: Set the epoch number $E$, mini-batch size $m$, and mini-batch number $B$
4: Initialize with the greedily trained parameters
5: for $e = 1, \ldots, E$ do
6:      for $b = 1, \ldots, B$ do
7:            Prepare the $b$-th mini-batch data
8:            Forward the samples in the $b$-th mini-batch
9:            Compute the gradients for each stage
10:           Use ADAM to optimize the parameters of all stages end-to-end
11:      end for
12: end for
Algorithm 3 Joint Fine-tuning

4.2 Parameterization

Similar to [3, 11], we use a weighted summation of Gaussian RBF functions to parameterize the influence functions $\rho_j'$ in the regularization term,

$\rho_j'(z) = \sum_{m=1}^{M} w_{j,m} \exp\big(-\frac{\gamma}{2}(z - \mu_m)^2\big),$ (16)

and $\phi_i'$ in the fidelity term,

$\phi_i'(z) = \sum_{m=1}^{M} v_{i,m} \exp\big(-\frac{\gamma}{2}(z - \mu_m)^2\big),$ (17)

where $w_{j,m}$ and $v_{i,m}$ are weight coefficients, $\mu_m$ is the mean value, and $\gamma$ is the precision.
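A minimal numpy sketch of this RBF parameterization follows; the number of Gaussians, the equally spaced means, and the precision below are hypothetical choices in the style of TNRD, since the exact values used in the paper are not reproduced here:

```python
import numpy as np

def rbf_influence(z, weights, means, precision):
    """Influence function as a weighted sum of Gaussian RBFs, as in
    (16)-(17): sum_m w_m * exp(-precision / 2 * (z - mu_m)^2)."""
    z = np.asarray(z)[..., None]             # broadcast z against the means
    return (weights * np.exp(-0.5 * precision * (z - means) ** 2)).sum(-1)

M = 63                                       # hypothetical number of RBFs
means = np.linspace(-1.0, 1.0, M)            # equally spaced means
precision = ((M - 1) / 2.0) ** 2             # hypothetical precision choice
weights = np.tanh(means)                     # toy weights: roughly tanh-shaped

z = np.linspace(-1.0, 1.0, 5)
out = rbf_influence(z, weights, means, precision)
```

Because the output is linear in the weights, fitting (and differentiating through) the influence functions amounts to a linear least-squares structure per function, which is what makes the gradient derivations below tractable.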

The filters $\mathbf{g}_j$ in the regularization term and $\mathbf{f}_i$ in the fidelity term are specified as linear combinations of DCT basis atoms with a unit norm constraint,

$\mathbf{g}_j = \frac{\bar{\mathcal{B}}\mathbf{c}_j}{\|\bar{\mathcal{B}}\mathbf{c}_j\|}, \qquad \mathbf{f}_i = \frac{\mathcal{B}\mathbf{d}_i}{\|\mathcal{B}\mathbf{d}_i\|},$ (18)

where $\mathcal{B}$ is the complete DCT basis, $\bar{\mathcal{B}}$ is the DCT basis excluding the DC component, and $\mathbf{c}_j$ and $\mathbf{d}_i$ are the coefficients for the regularization term and the fidelity term, respectively.

In our implementation, we utilize filters of the same size in both the regularization term and the fidelity term, and the numbers of non-linear functions and filters, $N_g$ and $N_f$, are set accordingly. The number of Gaussian RBF functions is fixed to the same value $M$ for both the fidelity and regularization terms. To handle the boundary condition in the convolution operation, the image is padded for processing and only the valid region is cropped for output.
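The DCT parameterization can be sketched as follows for the DC-excluded case: build the 2D DCT atoms, drop the constant atom, and normalize the linear combination to unit norm. The filter size and the random coefficients are illustrative only:

```python
import numpy as np

def dct_atoms_2d(s):
    """Rows are the s*s 2D DCT-II atoms (flattened), DC atom excluded."""
    n = np.arange(s)
    B1 = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / s)
    B1 /= np.linalg.norm(B1, axis=0)         # unit-norm 1D atoms (columns)
    atoms = [np.outer(B1[:, i], B1[:, j]).ravel()
             for i in range(s) for j in range(s)]
    return np.stack(atoms[1:])               # drop the constant DC atom

def make_filter(coeffs, s):
    """Filter as a unit-norm linear combination of DCT atoms, as in (18)."""
    f = coeffs @ dct_atoms_2d(s)
    return (f / np.linalg.norm(f)).reshape(s, s)

rng = np.random.default_rng(5)
s = 5
f = make_filter(rng.standard_normal(s * s - 1), s)

# Unit norm; being DC-free, the filter gives zero response on flat regions.
assert np.isclose(np.linalg.norm(f), 1.0)
assert abs(f.sum()) < 1e-8
```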

4.3 Greedy Training

The SFARL model is first trained stage by stage. To learn the model parameters $\Omega_t$ of stage $t$, we need to compute the gradient by the chain rule,

$\frac{\partial \mathcal{L}(\mathbf{x}^t, \mathbf{x}^{gt})}{\partial \Omega_t} = \frac{\partial \mathcal{L}(\mathbf{x}^t, \mathbf{x}^{gt})}{\partial \mathbf{x}^t} \cdot \frac{\partial \mathbf{x}^t}{\partial \Omega_t}.$ (19)

4.3.1 Derivation of $\partial \mathcal{L} / \partial \mathbf{x}^t$

When the loss function is specified as the MSE, i.e., $\mathcal{L}(\mathbf{x}^t, \mathbf{x}^{gt}) = \|\mathbf{x}^t - \mathbf{x}^{gt}\|^2$, the gradient can be simply computed as

$\frac{\partial \mathcal{L}(\mathbf{x}^t, \mathbf{x}^{gt})}{\partial \mathbf{x}^t} = 2(\mathbf{x}^t - \mathbf{x}^{gt}).$ (20)

Visual perception metric, i.e., negative SSIM

When the loss function is specified as the visual perception metric, i.e., $\mathcal{L}(\mathbf{x}^t, \mathbf{x}^{gt}) = -\mathrm{SSIM}(\mathbf{x}^t, \mathbf{x}^{gt})$ [53, 54], we derive the gradient as follows. To distinguish the entire image from a small patch, only in this subsection we use $\mathbf{X}$ and $\mathbf{Y}$ to denote the entire restored image and the reference image, respectively. The SSIM value is computed based on the small patches $\mathbf{x}_k$ and $\mathbf{y}_k$,

$\mathrm{SSIM}(\mathbf{X}, \mathbf{Y}) = \frac{1}{K}\sum_{k=1}^{K} \mathrm{SSIM}(\mathbf{x}_k, \mathbf{y}_k),$ (21)

where $K$ is the number of patches. The value on each patch is computed as

$\mathrm{SSIM}(\mathbf{x}_k, \mathbf{y}_k) = \frac{(2\mu_{x_k}\mu_{y_k} + C_1)(2\sigma_{x_k y_k} + C_2)}{(\mu_{x_k}^2 + \mu_{y_k}^2 + C_1)(\sigma_{x_k}^2 + \sigma_{y_k}^2 + C_2)},$ (22)

where $\mu_{x_k}$ is the mean value of patch $\mathbf{x}_k$, $\sigma_{x_k}^2$ is the variance of patch $\mathbf{x}_k$, $\sigma_{x_k y_k}$ is the covariance of patches $\mathbf{x}_k$ and $\mathbf{y}_k$, and $C_1$, $C_2$ are constants. Let us define $A_1 = 2\mu_{x_k}\mu_{y_k} + C_1$, $A_2 = 2\sigma_{x_k y_k} + C_2$, $B_1 = \mu_{x_k}^2 + \mu_{y_k}^2 + C_1$, and $B_2 = \sigma_{x_k}^2 + \sigma_{y_k}^2 + C_2$. Then we have $\mathrm{SSIM}(\mathbf{x}_k, \mathbf{y}_k) = \frac{A_1 A_2}{B_1 B_2}$.
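For concreteness, the patch-wise SSIM computation can be sketched as below; this simplified version uses non-overlapping patches and the common K1 = 0.01, K2 = 0.03 constants for a 255 dynamic range, which may differ from the paper's exact settings:

```python
import numpy as np

def ssim_patch(p, q, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of two patches, as in (22)."""
    mp, mq = p.mean(), q.mean()
    vp, vq = p.var(), q.var()
    cov = ((p - mp) * (q - mq)).mean()
    return (((2 * mp * mq + c1) * (2 * cov + c2))
            / ((mp ** 2 + mq ** 2 + c1) * (vp + vq + c2)))

def ssim(x, y, size=8):
    """Mean SSIM over non-overlapping patches, as in (21)."""
    vals = [ssim_patch(x[i:i + size, j:j + size], y[i:i + size, j:j + size])
            for i in range(0, x.shape[0] - size + 1, size)
            for j in range(0, x.shape[1] - size + 1, size)]
    return float(np.mean(vals))

rng = np.random.default_rng(6)
img = 255.0 * rng.random((32, 32))
noisy = img + 25.0 * rng.standard_normal((32, 32))

assert np.isclose(ssim(img, img), 1.0)       # identical images give SSIM 1
loss = -ssim(img, noisy)                     # the negative SSIM loss of (13)
```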

The gradient of the negative SSIM loss is

$\frac{\partial \mathcal{L}}{\partial \mathbf{X}} = -\frac{1}{K}\sum_{k=1}^{K} \frac{\partial\, \mathrm{SSIM}(\mathbf{x}_k, \mathbf{y}_k)}{\partial \mathbf{x}_k},$ (23)

where

$\frac{\partial\, \mathrm{SSIM}(\mathbf{x}_k, \mathbf{y}_k)}{\partial \mathbf{x}_k} = \frac{2}{N_p B_1 B_2}\big(A_1(\mathbf{y}_k - \mu_{y_k}\mathbf{1}) + \mu_{y_k} A_2 \mathbf{1}\big) - \frac{2 A_1 A_2}{N_p B_1^2 B_2^2}\big(B_1(\mathbf{x}_k - \mu_{x_k}\mathbf{1}) + \mu_{x_k} B_2 \mathbf{1}\big),$ (24)

with $N_p$ being the number of pixels in a patch and $\mathbf{1}$ the all-ones vector.

For simplicity, we hereafter use $\frac{\partial \mathcal{L}}{\partial \mathbf{x}^t}$ to denote the loss gradient for both the MSE and negative SSIM losses.

4.3.2 Derivation of $\partial \mathbf{x}^t / \partial \Omega_t$

Since the parameterizations of the fidelity term and the regularization term are similar, we only use the fidelity term as an example; the extension to the regularization term is straightforward.

Weight parameter $\lambda^t$

The gradient of $\mathbf{x}^t$ with respect to $\lambda^t$ is

$\frac{\partial \mathbf{x}^t}{\partial \lambda^t} = \mathbf{H}^T \sum_{i=1}^{N_f} \bar{\mathbf{f}}_i^t \otimes \phi_i^{t\prime}\big(\mathbf{f}_i^t \otimes (\mathbf{y} - \mathbf{H}\mathbf{x}^{t-1})\big).$ (25)

The overall gradient of the loss with respect to $\lambda^t$ is

$\frac{\partial \mathcal{L}}{\partial \lambda^t} = \Big(\frac{\partial \mathbf{x}^t}{\partial \lambda^t}\Big)^T \frac{\partial \mathcal{L}}{\partial \mathbf{x}^t}.$ (26)

Filter $\mathbf{f}_i^t$

The update function with respect to each filter $\mathbf{f}_i^t$ can be simplified to

$\mathbf{x}^t = \lambda^t \mathbf{H}^T \big(\bar{\mathbf{f}}_i^t \otimes \phi_i^{t\prime}(\mathbf{f}_i^t \otimes (\mathbf{y} - \mathbf{H}\mathbf{x}^{t-1}))\big) + \mathbf{c},$ (27)

where $\mathbf{c}$ denotes a constant term that is independent of $\mathbf{f}_i^t$. Let us define $\mathbf{e} = \mathbf{y} - \mathbf{H}\mathbf{x}^{t-1}$ and $\mathbf{z} = \phi_i^{t\prime}(\mathbf{f}_i^t \otimes \mathbf{e})$. Thus, we can obtain the gradient as

$\frac{\partial \mathbf{x}^t}{\partial \mathbf{f}_i^t} = \lambda^t \mathbf{H}^T \Big( \frac{\partial (\bar{\mathbf{f}}_i^t \otimes \mathbf{z})}{\partial \mathbf{f}_i^t}\Big|_{\mathbf{z}} + \frac{\partial (\bar{\mathbf{f}}_i^t \otimes \mathbf{z})}{\partial \mathbf{z}} \cdot \frac{\partial \mathbf{z}}{\partial \mathbf{f}_i^t} \Big).$ (28)

Based on the convolution theorem [56], we have

$\bar{\mathbf{f}}_i^t \otimes \mathbf{z} = \bar{\mathbf{F}}_i^t \mathbf{z} = \mathbf{Z} \bar{\mathbf{f}}_i^t,$ (29)

where $\bar{\mathbf{F}}_i^t$ and $\mathbf{Z}$ are the sparse convolution matrices of $\bar{\mathbf{f}}_i^t$ and $\mathbf{z}$, respectively. Thus, the first term in (28) is

$\frac{\partial (\bar{\mathbf{f}}_i^t \otimes \mathbf{z})}{\partial \mathbf{f}_i^t}\Big|_{\mathbf{z}} = \mathbf{Z} \mathbf{R},$ (30)

where $\mathbf{R}$ is the permutation matrix that rotates a filter by 180 degrees.

For the second term, we introduce an auxiliary variable $\mathbf{u} = \mathbf{f}_i^t \otimes \mathbf{e} = \mathbf{E} \mathbf{f}_i^t$, and we have $\mathbf{z} = \phi_i^{t\prime}(\mathbf{u})$. We note that $\frac{\partial \mathbf{z}}{\partial \mathbf{u}} = \mathrm{diag}(\phi_i^{t\prime\prime}(\mathbf{u}))$. Therefore, we have

$\frac{\partial \mathbf{z}}{\partial \mathbf{f}_i^t} = \mathbf{\Lambda} \mathbf{E},$ (31)

where $\mathbf{\Lambda} = \mathrm{diag}(\phi_i^{t\prime\prime}(\mathbf{u}))$ is a diagonal matrix. The gradient of $\mathbf{x}^t$ with respect to $\mathbf{f}_i^t$ is then

$\frac{\partial \mathbf{x}^t}{\partial \mathbf{f}_i^t} = \lambda^t \mathbf{H}^T \big( \mathbf{Z}\mathbf{R} + \bar{\mathbf{F}}_i^t \mathbf{\Lambda} \mathbf{E} \big).$ (32)

Since the filter is specified as a linear combination of DCT basis atoms, one needs to derive the gradient with respect to the combination coefficients $\mathbf{d}_i$, i.e.,

$\frac{\partial \mathbf{x}^t}{\partial \mathbf{d}_i} = \frac{\partial \mathbf{x}^t}{\partial \mathbf{f}_i^t} \cdot \frac{\partial \mathbf{f}_i^t}{\partial \mathbf{d}_i}.$ (33)

By introducing $\mathbf{v} = \mathcal{B}\mathbf{d}_i$, so that $\mathbf{f}_i^t = \mathbf{v}/\|\mathbf{v}\|$, we then have

$\frac{\partial \mathbf{f}_i^t}{\partial \mathbf{d}_i} = \frac{1}{\|\mathbf{v}\|}\Big(\mathbf{I} - \frac{\mathbf{v}\mathbf{v}^T}{\|\mathbf{v}\|^2}\Big)\mathcal{B}.$ (34)

Finally, the overall gradient of the loss with respect to the combination coefficients is given by

$\frac{\partial \mathcal{L}}{\partial \mathbf{d}_i} = \Big(\frac{\partial \mathbf{x}^t}{\partial \mathbf{f}_i^t} \cdot \frac{\partial \mathbf{f}_i^t}{\partial \mathbf{d}_i}\Big)^T \frac{\partial \mathcal{L}}{\partial \mathbf{x}^t}.$ (35)

Non-linear function $\phi_i^{t\prime}$

We first reformulate the fidelity-related term with respect to $\phi_i^{t\prime}$ into matrix form,

$\mathbf{x}^t = \lambda^t \mathbf{H}^T \bar{\mathbf{F}}_i^t\, \phi_i^{t\prime}(\mathbf{u}) + \mathbf{c},$ (36)

where $\mathbf{u} = \mathbf{f}_i^t \otimes (\mathbf{y} - \mathbf{H}\mathbf{x}^{t-1})$. Since the influence function is a weighted sum of Gaussian RBFs, the column vector $\phi_i^{t\prime}(\mathbf{u})$ can be reformulated into matrix form,

$\phi_i^{t\prime}(\mathbf{u}) = \mathbf{G}(\mathbf{u})\,\mathbf{v}_i,$ (37)

where $\mathbf{v}_i$ is the vectorized version of the parameters $\{v_{i,m}\}_{m=1}^{M}$, and the matrix $\mathbf{G}(\mathbf{u})$ has entries

$[\mathbf{G}(\mathbf{u})]_{p,m} = \exp\big(-\frac{\gamma}{2}(u_p - \mu_m)^2\big).$

Thus, we can get

$\frac{\partial \mathbf{x}^t}{\partial \mathbf{v}_i} = \lambda^t \mathbf{H}^T \bar{\mathbf{F}}_i^t\, \mathbf{G}(\mathbf{u}),$ (38)

and finally the overall gradient of the loss with respect to $\mathbf{v}_i$ is

$\frac{\partial \mathcal{L}}{\partial \mathbf{v}_i} = \big(\lambda^t \mathbf{H}^T \bar{\mathbf{F}}_i^t\, \mathbf{G}(\mathbf{u})\big)^T \frac{\partial \mathcal{L}}{\partial \mathbf{x}^t}.$ (39)

In our implementation, we do not explicitly construct the matrices $\bar{\mathbf{F}}_i^t$ and $\mathbf{G}(\mathbf{u})$, since the corresponding products can be efficiently computed via 2D convolutions.

4.4 Joint Fine-tuning

Once the greedy training process for each stage is carried out, an end-to-end training process is used to fine-tune all the parameters across stages. The joint training loss function is defined as

(40)

where is the maximum iteration number. The gradient can be computed by the chain rule,

(41)

where only needs to be additionally computed. By reformulating the solution in matrix form,

(42)

the gradient can be computed as

(43)

where is also a diagonal matrix.

Once is computed, the overall gradient can be obtained by the chain rule, and the other gradient terms in (41) can be reused from greedy training.

4.4.1 Training Procedure

Given a training dataset, training SFARL amounts to sequentially running greedy training (Algorithm 2) and joint fine-tuning (Algorithm 3). Algorithm 1 lists the inference procedure of SFARL given the model parameters, in which all intermediate results are recorded for backward propagation during training. In greedy training, the parameters of previous stages are fixed, and only the gradients of the current stage are computed and fed to the ADAM algorithm. In joint fine-tuning, the gradients of every stage are computed and fed to the ADAM algorithm to optimize the parameters of all stages.
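The two-phase procedure can be sketched on a toy model (a scalar step size per stage, finite-difference gradients, and our own minimal ADAM implementation; this stands in for the structure of Algorithms 1-3, not the actual SFARL model):

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    # One ADAM update (Kingma & Ba) on a scalar parameter.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy "stages": x_{k+1} = x_k - theta_k * x_k, driving x toward zero.
rng = np.random.default_rng(4)
x0 = rng.standard_normal(16)
thetas = [0.1, 0.1, 0.1]

def run(ts, upto):
    x = x0.copy()
    for th in ts[:upto]:
        x = x - th * x
    return x

def grad_theta(ts, k, upto, h=1e-6):
    # Finite-difference gradient of the stage loss w.r.t. theta_k.
    tp = list(ts)
    tp[k] += h
    loss = lambda t_: 0.5 * np.sum(run(t_, upto) ** 2)
    return (loss(tp) - loss(ts)) / h

# Greedy training: optimize each stage with earlier stages fixed.
for k in range(3):
    m = v = 0.0
    for t in range(1, 201):
        g = grad_theta(thetas, k, k + 1)
        thetas[k], m, v = adam_step(thetas[k], g, m, v, t)

# Joint fine-tuning: update the parameters of all stages together.
ms, vs = [0.0] * 3, [0.0] * 3
for t in range(1, 201):
    for k in range(3):
        g = grad_theta(thetas, k, 3)
        thetas[k], ms[k], vs[k] = adam_step(thetas[k], g, ms[k], vs[k], t)

final_loss = 0.5 * np.sum(run(thetas, 3) ** 2)
print(final_loss)
```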

5 Experimental Results

In this section, we evaluate the proposed SFARL algorithm on several restoration tasks, i.e., image deconvolution with either an inaccurate blur kernel or multiple degradations, and rain streak removal from a single image. SFARL can also be evaluated on Gaussian denoising, and we present those results in the supplementary material. In our experiments, filters are adopted in both the fidelity and regularization terms. As for the stage number, we recommend setting it based on the convergence behavior during greedy training, and empirically use a 10-stage SFARL for image deconvolution and a 5-stage SFARL for rain streak removal and Gaussian denoising. When training SFARL, greedy training runs for 10 epochs per stage, and the parameters are then jointly fine-tuned for 50 epochs. We use ADAM [55] to optimize these SFARL models with learning rate , and . Using rain streak removal as an example, it takes about 19 hours to train an SFARL model on a computer equipped with a GTX 1080Ti GPU. The SFARL models are quantitatively and qualitatively evaluated against state-of-the-art conventional and deep CNN-based approaches.

More experimental settings and results are included in the supplementary material. The testing code is available at https://github.com/csdwren/sfarl, and the training code will also be released upon acceptance of this paper.

5.1 Deconvolution with Inaccurate Blur Kernels

We consider the blind deconvolution task and use two blur kernel estimation methods, i.e., Cho and Lee [48] and Xu and Jia [18], for experiments. For each estimation approach, we evaluate the performance of SFARL in handling approach-specific blur kernel estimation error. To construct the training dataset, we apply eight blur kernels [57] to 200 clean images from the BSD dataset [58]. Gaussian noise with is added to generate the blurry images. The methods by Cho and Lee [48] and Xu and Jia [18] are used to estimate blur kernels. Thus, we have 1,600 training samples for each blur kernel estimation approach. To ensure training sample quality, we randomly select 500 samples with error ratio [57] above 3 for each image deconvolution method.
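For reference, the error-ratio criterion used for sample selection can be sketched as follows (function and variable names are ours; the ratio compares SSD errors of results restored with the estimated versus the ground-truth kernel):

```python
import numpy as np

def error_ratio(restored_est_kernel, restored_true_kernel, clean):
    # SSD of the result restored with the estimated kernel over the SSD
    # of the result restored with the ground-truth kernel (Levin et al. [57]);
    # larger values indicate more severe kernel estimation error.
    sse_est = np.sum((restored_est_kernel - clean) ** 2)
    sse_true = np.sum((restored_true_kernel - clean) ** 2)
    return sse_est / sse_true

# Tiny illustration with synthetic residuals.
clean = np.zeros(4)
r_est = np.full(4, 2.0)    # residual of magnitude 2 per pixel
r_true = np.full(4, 1.0)   # residual of magnitude 1 per pixel
print(error_ratio(r_est, r_true, clean))   # 4.0
```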

Kernel estimation EPLL[2] ROBUST[14] IRCNN[33] SFARL
Cho and Lee [48] 0.8801 0.8659 0.8825 0.8903
Xu and Jia [18] 0.9000 0.8917 0.9023 0.9164
TABLE I: Quantitative SSIM results on the dataset by Levin et al. [57].
Blurry image EPLL [2] ROBUST [14] IRCNN [33] SFARL
Fig. 2: Visual quality comparison on Levin et al.’s dataset [57].
Blurry images IRCNN [33] ROBUST [14] SFARL
Fig. 3: Deblurring results on real blurry images, in which blur kernels are estimated by Xu and Jia [18].

On the widely used synthetic dataset by Levin et al. [57], we compare our SFARL with EPLL [2], ROBUST [14] and IRCNN [33]. The testing dataset includes 4 clean images and 8 blur kernels, and the blur kernels are estimated by Cho and Lee [48] and Xu and Jia [18]. Table I lists the average SSIM values of all evaluated methods on this dataset. Overall, the SFARL algorithm performs favorably against the other methods in terms of SSIM. From Table I, we also make the following observations. First, the SFARL algorithm models the residual images induced by a specific blur kernel estimation method to improve the restoration results; for each estimation method, we only need to retrain the SFARL model on the corresponding synthetic data. Second, when the estimated blur kernel is more accurate (e.g., Xu and Jia [18]), our SFARL also attains better quantitative performance.

We evaluate the SFARL algorithm against the state-of-the-art methods on a synthetic and a real blurry image in Figures 2 and 3, where the blur kernels are estimated using the method by Xu and Jia [18]. As the blur kernel is accurately estimated in Fig. 2, all the evaluated methods perform well, and the SFARL algorithm restores more texture details. On the other hand, the blur kernel is less accurately estimated in Fig. 3. Among all the evaluated methods, the deblurred image by the SFARL algorithm is sharper with fewer ringing artifacts. We note that IRCNN [33] uses the -norm in the fidelity term and the ROBUST scheme [14] introduces an -norm regularizer on the residual caused by kernel error. However, both the -norm and the -norm are limited in modeling the complex distribution of the residual, and neither the GMM prior in EPLL nor the deep CNN prior in IRCNN can well compensate for the effect of inaccurate blur kernels. Thus, the performance gain of the SFARL model can be attributed to its effectiveness in characterizing the spatial dependency and complex distribution of residual images.

5.2 Deconvolution with Multiple Degradations

We consider a more challenging deconvolution task [15], in which blur convolution is followed by multiple degradations including saturation, Gaussian noise and JPEG compression. SFARL is compared with DCNN [15], Whyte [59], IRCNN [33] and SRN [60]. Following the degradation steps in [15], 500 clean images from the BSD dataset [58] are used to synthesize the training dataset, on which SFARL and SRN are trained. Since only the testing code of DCNN [15] and 30 testing images on a disk kernel with radius 7 (Disk7) are released, SFARL is evaluated only on the Disk7 kernel. From Table II, SFARL performs favorably in terms of average PSNR and SSIM. The results by SFARL are also visually more pleasing, while the results by the other methods suffer from visible noise and artifacts, as shown in Fig. 4. It is worth noting that IRCNN works well in reducing blur, but magnifies the other degradations, yielding ringing artifacts and noise. SRN is a recent deep motion deblurring network, but still suffers from visible noise and artifacts, since the ill-posedness caused by disk blur is usually more severe than that caused by motion blur. Thus, we conclude that SFARL is able to model these multiple degradations in its fidelity term. Moreover, it should be noted that DCNN needs to initialize its deconvolution sub-network using inverse kernels, while our SFARL is much easier to train given a proper training dataset.

Method Whyte[59] DCNN[15] IRCNN[33] SRN[60] SFARL
PSNR 26.35 26.50 23.84 26.46 26.66
SSIM 0.8307 0.8442 0.6673 0.8447 0.8532
TABLE II: Quantitative comparison on deconvolution with multiple degradations [15].
Blurry image Whyte [59] DCNN [15]
IRCNN [33] SRN [60] SFARL
Fig. 4: Visual quality comparison on deconvolution with Gaussian noise, saturation and JPEG compression.

5.3 Single Image Rain Streak Removal

Rainy image SR [61] LRA [23]
GMM [24] CNN [62] SFARL
Fig. 5: Rain streak removal results of five evaluated methods on a synthetic image in [24].

To train the SFARL model for rain streak removal, we construct a synthetic rainy dataset. We randomly select 100 clean outdoor images from the UCID dataset [63], and use the Photoshop function (http://www.photoshopessentials.com/photo-effects/rain/) to generate 7 rainy images per clean image, at 7 random rain scales and orientations ranging from 60 to 90 degrees. The training dataset thus contains 700 images with different rain orientations and scales.
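A rough NumPy stand-in for that rain synthesis pipeline (our own approximation, not the Photoshop recipe itself: sparse drops smeared along the rain angle, measured here from the horizontal axis, then additively blended):

```python
import numpy as np

def add_rain(img, angle_deg=75.0, density=0.02, length=15, strength=0.8, seed=0):
    # Sparse random drops, smeared along the rain orientation and
    # additively blended onto the image (all values kept in [0, 1]).
    rng = np.random.default_rng(seed)
    H, W = img.shape
    drops = (rng.random((H, W)) < density).astype(float)
    streaks = np.zeros((H, W))
    dy = np.sin(np.radians(angle_deg))   # vertical step along the streak
    dx = np.cos(np.radians(angle_deg))   # horizontal step along the streak
    for t in range(length):
        sy, sx = int(round(t * dy)), int(round(t * dx))
        streaks += np.roll(np.roll(drops, sy, axis=0), sx, axis=1)
    streaks = np.clip(streaks / length * strength, 0.0, 1.0)
    return np.clip(img + streaks, 0.0, 1.0), streaks
```

Sampling the angle in [60, 90] degrees and varying `density` and `length` would mimic the random scales and orientations described above.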

Method #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 Avg.
SR[61] 0.74 0.79 0.84 0.77 0.63 0.73 0.82 0.77 0.74 0.74 0.65 0.77 0.75
LRA[23] 0.83 0.88 0.76 0.96 0.92 0.93 0.94 0.81 0.90 0.82 0.85 0.80 0.87
GMM[24] 0.89 0.93 0.92 0.94 0.90 0.95 0.96 0.90 0.91 0.90 0.86 0.92 0.91
CNN[62] 0.75 0.79 0.71 0.89 0.76 0.80 0.85 0.77 0.81 0.76 0.79 0.73 0.78
SFARL 0.93 0.93 0.92 0.95 0.97 0.94 0.98 0.95 0.97 0.98 0.95 0.97 0.95
TABLE III: Deraining results on synthetic rainy images in [24] in terms of SSIM

We evaluate the SFARL method against the state-of-the-art algorithms including SR [61], LRA [23], GMM [24], and CNN [62] on the synthetic dataset [24]. The dataset consists of 12 rainy images with rain orientations ranging from left to right. Table III shows that the SFARL algorithm achieves the highest SSIM values on most test images as well as the best average SSIM. Fig. 5 shows rain streak removal results by all the evaluated algorithms on a synthetic rainy image. The results by the SFARL and GMM algorithms are significantly better than those of the other methods. However, the result by the GMM method still has visible rain streaks, while the SFARL model recovers a satisfying clean image.

Furthermore, we compare SFARL with a recent deep CNN-based method, i.e., DDNET [16]. The authors [16] provide a training dataset of 12,600 rainy images and a testing dataset of 1,400 rainy images (Rain1400). We train SFARL on the training dataset and compare it quantitatively and qualitatively with DDNET on the testing dataset. From Table IV, SFARL obtains better PSNR and SSIM values on Rain1400. In Fig. 6, SFARL produces satisfactory deraining results, while rain streaks remain visible in the results by DDNET.