1 Introduction
Image restoration, which aims to recover the latent clean image from a degraded observation, is a fundamental problem in low-level vision. However, the degradation generally is irreversible, making image restoration an ill-posed inverse problem. While significant advances have been made in the past decades, it remains challenging to develop proper models for various image restoration tasks.
In general, the linear degradation process of a clean image $\mathbf{x}$ can be modeled as
(1) $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$
where $\mathbf{n}$ is additive noise, $\mathbf{H}$ is the degradation operator, and $\mathbf{y}$ is the degraded observation. By changing the settings of the degradation operator and noise type, this model can be applied to different image restoration tasks. For example, $\mathbf{H}$ can be an identity matrix for denoising, a blur kernel convolution for deconvolution, and a downsampling operator for super-resolution, to name a few. The maximum a posteriori (MAP) model for image restoration can then be formulated as
(2) $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \mathcal{D}(\mathbf{H}\mathbf{x}, \mathbf{y}) + \lambda \Phi(\mathbf{x}),$
where $\lambda$ is a trade-off parameter, $\Phi(\mathbf{x})$ is the regularization term associated with the image prior, and the fidelity term $\mathcal{D}(\mathbf{H}\mathbf{x}, \mathbf{y})$ is specified by the degradation as well as the noise [1, 2, 3]. Assuming the noise is additive white Gaussian, the fidelity term can be characterized by the $\ell_2$ norm, i.e., $\mathcal{D}(\mathbf{H}\mathbf{x}, \mathbf{y}) = \frac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|^{2}$.
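For illustration, the degradation settings above can be sketched in a few lines of NumPy; the function name, the 3x3 box blur kernel, and the noise level below are hypothetical choices for the sketch, not part of any specific restoration method:

```python
import numpy as np

def degrade(x, task, noise_sigma=0.01, rng=None):
    """Toy instances of the degradation operator H in y = Hx + n."""
    rng = rng or np.random.default_rng(0)
    if task == "denoising":           # H = identity
        hx = x
    elif task == "deconvolution":     # H = convolution with a 3x3 box kernel
        k = np.ones((3, 3)) / 9.0
        xp = np.pad(x, 1, mode="edge")
        hx = np.zeros_like(x)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                hx[i, j] = np.sum(xp[i:i+3, j:j+3] * k)
    elif task == "super-resolution":  # H = 2x downsampling
        hx = x[::2, ::2]
    return hx + noise_sigma * rng.standard_normal(hx.shape)
```

Each branch changes only the operator $\mathbf{H}$; the additive noise term is shared across tasks, mirroring the model in (1).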
When the degradation operator is precisely known, the noise and image prior models play two key roles in the MAP-based image restoration model. Two widely used types of noise distributions are Gaussian and Poisson. Other distributions, e.g., hyper-Laplacian [4], Gaussian Mixture Model (GMM) [5] and Mixture of Exponential Power (MoEP) [6], have also been introduced for modeling complex noise. For the image prior, gradient-based models, e.g., total variation [7] and the hyper-Laplacian distribution [1], were studied first due to their simplicity and efficiency. Subsequently, patch-based [2] and non-local similarity [8, 9] models were developed to characterize more complex and internal dependence among image patches. Recently, data-driven and task-driven learning methods have also been exploited to learn regularization from training images. The approach based on fields of experts (FoE) [10] is designed to learn the distribution of filter responses on images. Following the FoE framework, numerous discriminative learning approaches, e.g., cascaded shrinkage fields (CSF) [3], trainable nonlinear reaction diffusion (TNRD) [11, 12] and the universal denoising network (UNET) [13], use a stage-wise learning scheme to enhance both restoration performance and computational efficiency.
However, the precise degradation process for most restoration tasks is not known, and thus the degradation process is modeled as
(3) $\mathbf{y} = (\hat{\mathbf{H}} + \Delta\mathbf{H})\mathbf{x} + \mathbf{n}.$
In the restoration stage, only the known part $\hat{\mathbf{H}}$ of the degradation operator is available, while the form of $\Delta\mathbf{H}$, the noise type, and the noise parameters are unknown. Here we define this problem as image restoration with partially known or inaccurate degradation models.
Figure 1: Restoration results on tasks with partially known or inaccurate degradation models. (a) Blurry image, ground-truth, ROBUST [14], SFARL. (b) Degraded image, ground-truth, DCNN [15], SFARL. (c) Rainy image, ground-truth, DDNET [16], SFARL.
Image deconvolution with inaccurate blur kernels and rain streak removal are two representative image restoration tasks with partially known or inaccurate degradation models. Image deconvolution with an inaccurate blur kernel is a subproblem of blind deconvolution, which generally includes blur kernel estimation and non-blind deconvolution. In the blur kernel estimation stage, kernel errors are inevitably introduced by any specific method [17, 18, 19, 20, 21, 22]. In the non-blind deconvolution stage, the degradation model can then be written as
(4) $\mathbf{y} = (\hat{\mathbf{k}} + \Delta\mathbf{k}) \otimes \mathbf{x} + \mathbf{n},$
where $\otimes$ denotes the 2D convolution operator, $\hat{\mathbf{k}}$ is the estimated blur kernel, and $\Delta\mathbf{k}$ is the kernel error. Thus, the subproblem in the non-blind deconvolution stage is equivalent to image deconvolution with an inaccurate blur kernel. Based on (3), we have $\mathbf{k} = \hat{\mathbf{k}} + \Delta\mathbf{k}$, but $\Delta\mathbf{k}$ is unknown. Existing non-blind deconvolution methods are sensitive to kernel errors and usually result in ringing and other artifacts [1, 2], as shown in Figure 1.
For rain streak removal, an input image $\mathbf{y}$ can be represented as the composition of a scene image layer $\mathbf{x}$ and a rain streak layer $\mathbf{s}$. However, it remains challenging to model rain streaks with any explicit formulation. On one hand, a linear summation $\mathbf{y} = \mathbf{x} + \mathbf{s}$ is usually used for combining the scene image and rain streak layers [23, 24]. On the other hand, it has been suggested [25] that a more complex model based on screen blend is more effective for combining the two layers,
(5) $\mathbf{y} = \mathbf{x} + \mathbf{s} - \mathbf{x} \circ \mathbf{s},$
where $\circ$ denotes the element-wise product. By setting $\hat{\mathbf{H}} = \mathbf{I}$, rain streak removal can be treated as an image restoration problem with a partially known degradation model, i.e., neither the rain streak layer nor its composition with the scene layer can be explicitly modeled in the deraining stage. As shown in Figure 1, the method in [24] is less effective for modeling rainy scenes, resulting in an over-smooth image with visible streaks.
Image restoration with partially known or inaccurate degradation models cannot simply be addressed by noise modeling. From (3), we define the residual image as
(6) $\mathbf{r} = \mathbf{y} - \hat{\mathbf{H}}\mathbf{x} = \Delta\mathbf{H}\mathbf{x} + \mathbf{n}.$
Due to the introduction of $\Delta\mathbf{H}\mathbf{x}$, even if $\mathbf{n}$ is white, the residual $\mathbf{r}$ is spatially dependent and has a complex distribution. Although several noise models have been suggested for complex noise modeling, they are all based on the independent and identically distributed (i.i.d.) assumption and are ineffective for modeling the spatial dependency of the residual. Furthermore, the characteristics of $\mathbf{r}$ are task-specific, and there exists no universal model that can be applied to all problems, which makes modeling the residual in (6) more challenging.
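The spatial dependency of the residual can be checked numerically. The following sketch (with a hypothetical 3x3 box kernel as the true blur and a delta kernel as the inaccurate "estimate") blurs a random image with one kernel and subtracts the prediction under the other; the residual $\Delta\mathbf{k} \otimes \mathbf{x} + \mathbf{n}$ shows strong neighbor correlation while white noise does not:

```python
import numpy as np

def conv2_same(x, k):
    # naive 'same' 2D convolution with zero padding (k is 3x3)
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i+3, j:j+3] * k[::-1, ::-1])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))            # stand-in for a clean image
k_true = np.ones((3, 3)) / 9.0               # true blur kernel
k_est = np.zeros((3, 3)); k_est[1, 1] = 1.0  # inaccurate (delta) estimate
n = 0.01 * rng.standard_normal(x.shape)

y = conv2_same(x, k_true) + n
r = y - conv2_same(x, k_est)                 # residual under the wrong kernel

def neighbor_corr(a):
    # correlation between horizontally adjacent pixels
    a = a - a.mean()
    return np.mean(a[:, :-1] * a[:, 1:]) / np.mean(a * a)

print(neighbor_corr(r), neighbor_corr(n))
```

The residual inherits the support of the kernel error, so adjacent residual pixels are correlated even though both $\mathbf{x}$ and $\mathbf{n}$ are white here.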
Recently, deep CNN-based methods have achieved considerable progress on several low-level vision tasks [26, 27, 28, 29, 30], e.g., rain streak removal [16, 31, 32], non-blind deconvolution [33, 34, 15] and Gaussian denoising [35]. These CNN methods, however, either do not take partially known degradations into consideration, or simply address this issue by learning a direct mapping from the degraded image to the ground-truth. In comparison with CNN-based models, we aim to provide a principled restoration framework for handling partially known or inaccurate degradations.
In this paper, we propose a principled fidelity learning algorithm for image restoration with partially known or inaccurate degradation models. For either kernel errors caused by a specific kernel estimation method or rain streaks, the resulting residual is not entirely random and can be characterized by spatial dependency and distribution models. Thus, a task-driven scheme is developed to learn the fidelity term from a training set of degraded and ground-truth image pairs. For modeling spatial dependence and complex distributions, the residual is characterized by a set of nonlinear penalty functions on filter responses, leading to a parameterized formulation of the fidelity term. Such a fidelity term is effective and flexible in modeling the complex residual patterns and spatial dependency caused by partially known or inaccurate degradations for a variety of image restoration tasks. Furthermore, for different tasks (e.g., rain streak removal and image deconvolution), the residual patterns are also different. With task-driven learning, the proposed method can adaptively tailor the fidelity term to specific inaccurate or partially known degradation models.
We show that the regularization term can be parameterized and learned along with the fidelity term, resulting in our simultaneous fidelity and regularization learning (SFARL) model. Specifically, we characterize the regularizer by a set of nonlinear penalty functions on filter responses of the clean image. The SFARL model is formulated as a bilevel optimization problem, where a gradient descent scheme is used to solve the inner task and stage-wise parameters are learned from the training data. Experimental results on image deconvolution and rain streak removal demonstrate the effectiveness of the SFARL model in terms of quantitative metrics and visual quality (see Figure 1(a)-(c)). Furthermore, for image restoration with a precisely known degradation process, e.g., non-blind Gaussian denoising, the SFARL model can be used to learn a proper fidelity term for optimizing visual perception metrics, obtaining results with better visual quality (see the results in the supplementary material).
In CSF [3], TNRD [12], and UNET [13], a similar parametric formulation has been adopted to model natural image priors, and discriminative learning is employed to boost restoration performance. However, the degradation in these methods is assumed to be precisely known, and thus the fidelity term is explicitly specified, e.g., the $\ell_2$ norm for deconvolution with the ground-truth kernel. But in practical applications, the degradation process is usually only partially known, e.g., inaccurately estimated blur kernels, separation of rain and background layers, and combinations of multiple degradations. In comparison, our SFARL model aims to provide a principled restoration framework, in which the fidelity term is flexible and effective in modeling partially known degradations and can be jointly learned with the regularization term during training. As a result, when applied to image restoration with partially known or inaccurate degradation models, SFARL can be trained to perform favorably in comparison with TNRD and the state-of-the-art methods.
The contributions of this work are summarized as follows:

We propose a principled algorithm for image restoration with partially known or inaccurate degradation. Given an image restoration task, our model can adaptively learn a proper fidelity term from the training set for modeling the spatial dependency and highly complex distribution of the task-specific residual caused by partially known or inaccurate degradation.

We present a bilevel optimization model for simultaneously learning the fidelity term, the regularization term, and stage-wise model parameters for task-specific image restoration.

We carry out experiments on rain streak removal, image deconvolution with inaccurate blur kernels and deconvolution with multiple degradations to validate the effectiveness of the SFARL model.
2 Related Work
For specific vision tasks, numerous methods have been proposed for image deconvolution with inaccurate blur kernels and rain streak removal. However, considerably less effort has been made to address image restoration with partially known or inaccurate degradation models in general. In this section, we review the topics most relevant to this work, including noise modeling, discriminative image restoration, image deconvolution with inaccurate blur kernels, and rain streak removal.
2.1 Noise Modeling
For vision tasks based on robust principal component analysis (RPCA) or low-rank matrix factorization (LRMF), noise is often assumed to be sparsely distributed and can be characterized by sparsity-inducing norms, e.g., the $\ell_1$ norm [4, 36]. However, the noise in real scenarios is usually more complex and cannot be simply modeled using such norms. Consequently, the GMM and its variants have been used as universal approximators for modeling complex noise. In RPCA models, Zhao et al. [37] use a GMM to fit a variety of noise types, such as Gaussian, Laplacian, sparse noise and their combinations. For LRMF, the GMM is used to approximate unknown noise, and its effectiveness has been validated in face modeling and structure from motion [5]. In addition, the GMM has been extended for noise modeling in low-rank tensor factorization [38], and generalized to the Mixture of Exponential Power (MoEP) scheme [6] for modeling complex noise. To determine the parameters of a GMM, the Dirichlet process has been suggested to estimate the number of Gaussian components under a variational Bayesian framework [39]. Recently, weighted mixture models of sparse norms [40] and of Gaussians [41, 42] have also been used for blind denoising with unknown noise.
However, noise modeling cannot be readily used to address image restoration with partially known or inaccurate degradation models. The residual caused by inaccurate degradation is not i.i.d. Thus, both the spatial dependency and the complex distribution of the residual need to be considered.
2.2 Discriminative Image Restoration
In a MAP-based image restoration model, the regularization term is associated with a statistical prior and is usually learned solely from clean images in a generative manner, e.g., K-SVD [43], GMM [2], and FoE [10]. Recently, discriminative learning has been extensively studied in image restoration. In general, discriminative image restoration aims to learn a fast inference procedure by optimizing an objective function over a training set of degraded and ground-truth image pairs. One typical discriminative learning approach is to combine existing image prior models with truncated optimization procedures [44, 45]. For example, CSF [3, 46] uses truncated half-quadratic optimization to learn stage-wise model parameters of a modified FoE. On the other hand, TNRD [11, 12] unfolds a fixed number of gradient descent inference steps. Non-parametric methods, such as regression tree fields (RTF) [44, 45] and filter forests [47], are also used for modeling image priors.
Existing discriminative image restoration methods, however, are all based on the precise degradation assumption. These algorithms focus on learning regularization terms in a discriminative framework such that the models can be applied to arbitrary images and blur kernels. In contrast, we propose a discriminative learning algorithm that considers both fidelity and regularization terms, and apply it to image restoration with partially known or inaccurate degradation models.
2.3 Image Deconvolution with Inaccurate Blur Kernels
Typical blind deconvolution approaches consist of two stages: blur kernel estimation and non-blind deconvolution. Existing methods mainly focus on the first stage, and considerable attention has been paid to blur kernel estimation [18, 19, 48, 22]. In the second stage, conventional non-blind deconvolution methods are usually used to restore the clean image based on the estimated blur kernels. Although significant progress has been made in blur kernel estimation, errors are inevitably introduced in the first stage. Furthermore, non-blind deconvolution methods are not robust to kernel errors, and artifacts are likely to be introduced or exacerbated during deconvolution [1, 2].
One intuitive solution is to design specific image priors to suppress artifacts [49, 50, 51, 52]. To the best of our knowledge, there exists only one attempt [14] to implicitly model kernel error in the fidelity term,
(7) $\min_{\mathbf{x}, \mathbf{e}} \; \frac{1}{2}\left\|\hat{\mathbf{k}} \otimes \mathbf{x} + \mathbf{e} - \mathbf{y}\right\|^{2} + \tau\|\mathbf{e}\|_{1} + \lambda\Phi(\mathbf{x}).$
Here the residual $\mathbf{e} = \Delta\mathbf{k} \otimes \mathbf{x}$ is penalized with the $\ell_1$ norm, and the noise is assumed to be additive white Gaussian. However, the $\ell_1$ norm does not model the spatial dependency of residual signals, and the method in [14] alleviates the effect of kernel errors at the expense of potentially over-smoothed restoration results. A recent deep CNN-based approach, i.e., FCN [34], receives multiple inputs with complementary information to produce high quality restoration results, but it relies on tuning the parameters of a non-blind deconvolution method to provide proper network inputs. In this work, we focus on the second stage of blind deconvolution, and propose the SFARL model to characterize the kernel error of a specific kernel estimation method.
2.4 Rain Streak Removal
Rain streak modeling and scene composition are two important issues in removing rain streaks from input images. Based on the linear model $\mathbf{y} = \mathbf{x} + \mathbf{s}$, the MAP-based deraining model can be formulated as
(8) $\min_{\mathbf{x}, \mathbf{s}} \; \frac{1}{2}\|\mathbf{y} - \mathbf{x} - \mathbf{s}\|^{2} + \lambda\Phi(\mathbf{x}) + \rho\Psi(\mathbf{s}), \quad \text{s.t.} \;\; \mathbf{x} \succeq \mathbf{0}, \; \mathbf{s} \succeq \mathbf{0},$
where $\Psi(\mathbf{s})$ denotes the regularization term of the rain streak layer, and the inequality constraints are introduced to obtain non-negative solutions of $\mathbf{x}$ and $\mathbf{s}$ [24].
In [23], hand-crafted regularization is employed to impose smoothness on the image layer and low-rankness on the rain streak layer. In [24], both the image and rain streak layers are modeled as GMMs that are separately trained on clean patches and rain streak patches. Based on the screen blend model, Luo et al. [25] use a discriminative dictionary learning scheme to separate rain streaks by enforcing that the two layers share the fewest dictionary atoms. Recently, specifically designed CNN models [16, 32] have made progress in rain streak removal. Instead of using explicit analytic models, the SFARL method is developed in a data-driven manner to accommodate the complexity and diversity of rain streak and scene composition models.
3 Proposed Algorithm
We consider a class of image restoration problems, where the degradation model is partially known or inaccurate but a training set of degraded and ground-truth image pairs is available. To handle these problems, we use a flexible model to parameterize the fidelity term for the residual caused by partially known or inaccurate degradation. For a given problem, a task-driven learning approach can then be used to obtain a task-specific fidelity term from training data.
In this section, we first present our method for parameterizing the fidelity term to characterize the spatial dependency and complex distribution of residual images. The regularization term is also parameterized, resulting in our simultaneous fidelity and regularization learning model. Finally, we propose a task-driven scheme to learn the proposed model from training data.
3.1 Fidelity Term
The fidelity term is used to characterize the spatial dependency and highly complex distribution of the residual image $\mathbf{r}$. On one hand, the popular explicit formulations, e.g., the $\ell_2$ norm and $\ell_1$ norm, cannot model the complex distribution of the residual image. Due to the i.i.d. assumption, the existing noise modeling approaches, e.g., GMM [37] and MoEP [6], also cannot be readily adopted to model spatial dependency in the fidelity term. On the other hand, the residual generally is spatially dependent and complicatedly distributed. Motivated by the success of discriminative regularization learning [3, 11], we also use a set of linear filters with diverse patterns to model the spatial dependency in $\mathbf{r}$. Moreover, due to the effect of $\Delta\mathbf{H}\mathbf{x}$ and its combination with $\mathbf{n}$, the filter responses still follow complex distributions. Therefore, a set of nonlinear penalty functions is further introduced to characterize the distribution of filter responses.
To sum up, we propose a principled residual model in the fidelity term as follows,
(9) $\mathcal{D}(\mathbf{x}; \mathbf{y}) = \sum_{l=1}^{L} \mathbf{1}^{T} \rho_l\left(\mathbf{f}_l \otimes (\hat{\mathbf{H}}\mathbf{x} - \mathbf{y})\right),$
where $\hat{\mathbf{H}}$ is the known part of the degradation operator in (3), $\otimes$ is the 2D convolution operator, and the penalty functions $\rho_l$ are applied entry-wise. In the proposed fidelity term, the parameters include $\{(\mathbf{f}_l, \rho_l)\}_{l=1}^{L}$. When $L = 1$, $\mathbf{f}_1$ is the delta filter and $\rho_1$ is the squared $\ell_2$ norm, the proposed model (9) is equivalent to the standard MAP-based model in (2).
Due to the introduction of the linear filters $\mathbf{f}_l$ and penalty functions $\rho_l$, the proposed fidelity term can describe the complex patterns in the residual caused by partially known or inaccurate degradation models. Furthermore, our fidelity model is flexible and applicable to different tasks. With proper training, it can be specialized to particular image restoration tasks, such as rain streak removal and image deconvolution with inaccurate blur kernels. It is worth noting that the fidelity term in (9) can be regarded as a special form of convolution layer in a CNN. Nonetheless, the fidelity term (9) retains better interpretability and flexibility in characterizing the residual $\mathbf{r}$. In particular, the learned $\mathbf{f}_l$'s and $\rho_l$'s are closely related to the characteristics of the residual (see an example in the supplementary material). Moreover, the distribution of $\mathbf{f}_l \otimes \mathbf{r}$ generally is much more complex, and cannot be simply characterized by ReLU and its variants in conventional CNNs.
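To make the structure of (9) concrete, a minimal NumPy sketch of the parameterized fidelity term is given below; the filters and penalty functions are placeholders for the learned $\mathbf{f}_l$ and $\rho_l$, and the degradation operator is passed in as a function:

```python
import numpy as np

def fidelity(x, y, H, filters, penalties):
    """Sketch of Eq. (9): sum_l 1^T rho_l( f_l (*) (H(x) - y) )."""
    r = H(x) - y                       # residual under the (inaccurate) degradation
    total = 0.0
    for f, rho in zip(filters, penalties):
        p = f.shape[0] // 2
        rp = np.pad(r, p, mode="edge")
        resp = np.zeros_like(r)        # filter response f_l (*) r
        for i in range(r.shape[0]):
            for j in range(r.shape[1]):
                resp[i, j] = np.sum(rp[i:i+f.shape[0], j:j+f.shape[1]] * f)
        total += np.sum(rho(resp))     # entry-wise penalty, then sum
    return total
```

With a single delta filter and a squared penalty, this reduces to the standard $\ell_2$ fidelity, mirroring the $L = 1$ special case noted above.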
3.2 Regularization Term
To increase the modeling capacity of the image prior, the regularization term is further parameterized as
(10) $\Phi(\mathbf{x}) = \sum_{i=1}^{N} \mathbf{1}^{T} \phi_i(\mathbf{g}_i \otimes \mathbf{x}),$
where $\mathbf{g}_i$ is the $i$th linear filter, $\phi_i$ is the corresponding nonlinear penalty function applied entry-wise, and $N$ is the number of linear filters and penalty functions in the regularization term. The parameters for the regularization term include $\{(\mathbf{g}_i, \phi_i)\}_{i=1}^{N}$. The proposed model is a generalization of the FoE [10] model obtained by parameterizing the regularization term with both the filters and the penalty functions. Similar models have also been used in discriminative non-blind image restoration [3, 11, 13].
3.3 SFARL Model
Given a specific image restoration task, the parameters of the fidelity and regularization terms need to be specified. As a large number of parameters are involved in $\{(\mathbf{f}_l, \rho_l)\}_{l=1}^{L}$ and $\{(\mathbf{g}_i, \phi_i)\}_{i=1}^{N}$, it is not feasible to determine proper values manually. In this work, we propose to learn the parameters of both the fidelity and regularization terms in a task-driven manner.
Denote a training set of $S$ samples by $\{(\mathbf{y}_s, \mathbf{x}_s^{gt})\}_{s=1}^{S}$, where $\mathbf{y}_s$ is the $s$th degraded image and $\mathbf{x}_s^{gt}$ is the corresponding ground-truth image. The parameters $\Theta = \{\lambda, \{(\mathbf{f}_l, \rho_l)\}_{l=1}^{L}, \{(\mathbf{g}_i, \phi_i)\}_{i=1}^{N}\}$ can be learned by solving the following bilevel optimization problem,
(11) $\min_{\Theta} \sum_{s=1}^{S} \mathcal{L}(\hat{\mathbf{x}}_s, \mathbf{x}_s^{gt}), \quad \text{s.t.} \;\; \hat{\mathbf{x}}_s = \arg\min_{\mathbf{x} \in \mathcal{X}} \sum_{l=1}^{L} \mathbf{1}^{T} \rho_l\left(\mathbf{f}_l \otimes (\hat{\mathbf{H}}\mathbf{x} - \mathbf{y}_s)\right) + \lambda \sum_{i=1}^{N} \mathbf{1}^{T} \phi_i(\mathbf{g}_i \otimes \mathbf{x}),$
where $\mathcal{X}$ is the feasible solution space. For image deconvolution with an inaccurate blur kernel, the solution is only constrained to be real-valued, i.e., $\mathcal{X} = \mathbb{R}^{n}$. For rain streak removal, additional constraints on the feasible solution space are required, i.e., $\mathcal{X} = \{\mathbf{x} : 0 \le x_j \le y_j, \forall j\}$, where $x_j$ (and $y_j$) is the $j$th element of the clean image $\mathbf{x}$ (and the rainy image $\mathbf{y}$). In principle, the trade-off parameter $\lambda$ can be absorbed into the nonlinear penalty functions $\phi_i$ and removed from the model (11). However, the trade-off between the fidelity and regularization terms cannot be easily made because the scales of the two terms vary for different restoration tasks, thereby making it necessary to include $\lambda$ in (11).
The loss function $\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}^{gt})$ measures the dissimilarity between the output of the SFARL model and the ground-truth image. One representative loss used in discriminative image restoration is the mean-squared error (MSE) [11],
(12) $\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}^{gt}) = \|\hat{\mathbf{x}} - \mathbf{x}^{gt}\|^{2}.$
For image restoration where the precise degradation process is known, the optimal fidelity term under the MSE criterion is the negative log-likelihood, and the standard MAP model can be used directly in the inner loop of the bilevel optimization task. Thus, under the MSE loss, learning the fidelity term is only meaningful for image restoration with partially known or inaccurate degradation models.
In this work, we use a visual perception metric, i.e., negative SSIM [53, 54], as the loss function,
(13) $\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}^{gt}) = -\mathrm{SSIM}(\hat{\mathbf{x}}, \mathbf{x}^{gt}).$
The reasons for using negative SSIM are twofold. On one hand, SSIM is known to be closely related to the visual perception of image quality, and minimizing negative SSIM is expected to benefit the visual quality of the restoration result. On the other hand, even for image restoration with a precisely known degradation process, the negative log-likelihood is no longer the optimal fidelity term when the negative SSIM loss is used. Thus the residual model (9) can be utilized to learn a proper fidelity term from training data for image deconvolution with inaccurate blur kernels, rain streak removal, and Gaussian denoising alike. In addition, the experimental results validate the effectiveness of negative SSIM and residual modeling in terms of both visual quality and perception metrics.
4 SFARL Training
In this section, we first present an iterative solution to the inner task of the bilevel optimization problem. The SFARL model is then parameterized so that gradient-based optimization algorithms can be used for training. The SFARL model is trained by sequentially performing greedy training (Algorithm 2) and joint fine-tuning (Algorithm 3). Finally, the derivations of the gradients for the greedy and end-to-end training processes are presented.
4.1 Iterative Solution to Inner Optimization Task
The inner task in (11) implicitly defines the restored image $\hat{\mathbf{x}}$ as a function of the model parameters $\Theta$. As the optimization problem is non-convex, it is difficult to obtain an explicit analytic form of either $\hat{\mathbf{x}}(\Theta)$ or its gradient with respect to $\Theta$. In this work, we learn $\Theta$ by considering the truncation of an iterative optimization algorithm [3, 46, 11, 12]. Furthermore, stage-wise model parameters $\Theta^{t}$ are also used to improve image restoration [3, 11].
To solve (11), the updated solution at stage $t$ can be written as a function of $\mathbf{x}^{t-1}$ and $\Theta^{t}$, i.e., $\mathbf{x}^{t} = \mathcal{F}(\mathbf{x}^{t-1}; \Theta^{t})$. Suppose that $\Theta^{1}, \ldots, \Theta^{t-1}$ are known. The stage-wise parameters $\Theta^{t}$ can then be learned by solving the following problem,
(14) $\min_{\Theta^{t}} \sum_{s=1}^{S} \mathcal{L}(\mathbf{x}_s^{t}, \mathbf{x}_s^{gt}).$
Here we use a gradient descent method to solve the inner optimization loop, and $\mathbf{x}^{t}$ can be written as
(15) $\mathbf{x}^{t} = \mathbf{x}^{t-1} - \left( \hat{\mathbf{H}}^{T} \sum_{l=1}^{L} \bar{\mathbf{f}}_l \otimes \psi_l\left(\mathbf{f}_l \otimes (\hat{\mathbf{H}}\mathbf{x}^{t-1} - \mathbf{y})\right) + \lambda \sum_{i=1}^{N} \bar{\mathbf{g}}_i \otimes \varphi_i(\mathbf{g}_i \otimes \mathbf{x}^{t-1}) \right),$
where the influence functions are defined as $\psi_l(\cdot) = \rho_l'(\cdot)$ and $\varphi_i(\cdot) = \phi_i'(\cdot)$. These functions are performed entry-wise on a vector or matrix. In addition, $\bar{\mathbf{f}}_l$ and $\bar{\mathbf{g}}_i$ are the filters obtained by rotating $\mathbf{f}_l$ and $\mathbf{g}_i$ by 180 degrees, respectively. After each gradient descent step, $\mathbf{x}^{t}$ is projected onto the feasible solution space $\mathcal{X}$. The inference procedure is shown in Algorithm 1. We use ADAM [55] to solve the optimization problem in (14). Therefore, we need to present the parameterization of the solution in (15) and derive the gradients for the greedy and end-to-end learning processes.
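A minimal sketch of one inference step of the form in (15) is given below for the deconvolution case, where $\hat{\mathbf{H}}$ is a convolution with the estimated kernel; the filters, influence functions, and step size are hypothetical stand-ins for the learned quantities:

```python
import numpy as np

def conv2_same(x, k):
    """'Same' 2D correlation with edge padding (square, odd-sized k)."""
    p = k.shape[0] // 2
    xp = np.pad(x, p, mode="edge")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i+k.shape[0], j:j+k.shape[1]] * k)
    return out

def sfarl_step(x, y, k_hat, fid, reg, lam, step=0.1):
    """One gradient-descent inference step in the spirit of Eq. (15).

    fid/reg are lists of (filter, influence_fn) pairs standing in for the
    learned (f_l, psi_l) and (g_i, phi'_i)."""
    r = conv2_same(x, k_hat) - y
    grad = np.zeros_like(x)
    for f, psi in fid:   # fidelity part: adjoint of k_hat and f_l via flipped filters
        grad += conv2_same(conv2_same(psi(conv2_same(r, f)), f[::-1, ::-1]),
                           k_hat[::-1, ::-1])
    for g, phi in reg:   # regularization part
        grad += lam * conv2_same(phi(conv2_same(x, g)), g[::-1, ::-1])
    return x - step * grad
```

With a delta kernel, a single delta filter, and an identity influence function, the step reduces to plain gradient descent on the $\ell_2$ fidelity.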
4.2 Parameterization
Similar to [3, 11], we use weighted summations of Gaussian RBF functions to parameterize the influence functions $\varphi_i$ in the regularization term,
(16) $\varphi_i(z) = \sum_{j=1}^{M} v_{ij} \exp\left(-\frac{\gamma}{2}(z - \mu_j)^{2}\right),$
and $\psi_l$ in the fidelity term,
(17) $\psi_l(z) = \sum_{j=1}^{M} w_{lj} \exp\left(-\frac{\gamma}{2}(z - \mu_j)^{2}\right),$
where $v_{ij}$ and $w_{lj}$ are weight coefficients, $\mu_j$ is the mean value and $\gamma$ is the precision.
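The RBF parameterization of (16)-(17) is straightforward to implement; the function and parameter names below are illustrative:

```python
import numpy as np

def rbf_influence(z, w, mu, gamma):
    """Influence function as a weighted sum of Gaussian RBFs, as in (16)-(17):
    psi(z) = sum_j w_j * exp(-gamma/2 * (z - mu_j)**2), applied entry-wise."""
    z = np.asarray(z, dtype=float)
    return sum(wj * np.exp(-0.5 * gamma * (z - mj) ** 2) for wj, mj in zip(w, mu))
```

Because the RBF weights enter linearly, the gradient of the output with respect to each $w_{lj}$ is simply the corresponding Gaussian response, which is what makes the training derivations below tractable.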
The filters $\mathbf{g}_i$ in the regularization term and $\mathbf{f}_l$ in the fidelity term are specified as linear combinations of the DCT basis with a unit norm constraint,
(18) $\mathbf{g}_i = \frac{\bar{\mathcal{B}}\mathbf{c}_i}{\|\bar{\mathcal{B}}\mathbf{c}_i\|}, \qquad \mathbf{f}_l = \frac{\mathcal{B}\mathbf{d}_l}{\|\mathcal{B}\mathbf{d}_l\|},$
where $\mathcal{B}$ is the complete DCT basis, $\bar{\mathcal{B}}$ is the DCT basis with the DC component excluded, and $\mathbf{c}_i$ and $\mathbf{d}_l$ are the coefficients for the regularization term and fidelity term, respectively.
In our implementation, we utilize filters of size $5 \times 5$ in both the regularization and fidelity terms. The numbers of nonlinear functions and filters are set accordingly, i.e., $N = 24$ for the regularization term and $L = 25$ for the fidelity term. The number of Gaussian RBF functions $M$ is fixed to the same value for both the fidelity and regularization terms. To handle the boundary condition in the convolution operation, the image is padded for processing and only the valid region is cropped for output.
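A sketch of the DCT filter construction in (18): build an orthonormal 2D DCT basis, drop the DC atom (for the regularization filters), and form unit-norm combinations. Function names are illustrative:

```python
import numpy as np

def dct_filters(size=5):
    """2D DCT-II filter bank with the DC atom excluded (cf. Eq. (18))."""
    n = size
    C = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            C[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0] /= np.sqrt(n)
    C[1:] *= np.sqrt(2.0 / n)          # orthonormal DCT-II rows
    basis = [np.outer(C[a], C[b]) for a in range(n) for b in range(n)]
    return basis[1:]                   # exclude the DC component

def make_filter(coeffs, basis):
    """Filter as a unit-norm linear combination of the DCT basis."""
    f = sum(c * b for c, b in zip(coeffs, basis))
    return f / np.linalg.norm(f)
```

Excluding the DC atom makes every regularization filter zero-mean, so flat image regions produce zero filter response.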
4.3 Greedy Training
The SFARL model is first trained stage-by-stage. To learn the model parameters $\Theta^{t}$ of stage $t$, we need to compute the gradient by the chain rule,
(19) $\frac{\partial \mathcal{L}(\mathbf{x}^{t}, \mathbf{x}^{gt})}{\partial \Theta^{t}} = \left(\frac{\partial \mathbf{x}^{t}}{\partial \Theta^{t}}\right)^{T} \frac{\partial \mathcal{L}(\mathbf{x}^{t}, \mathbf{x}^{gt})}{\partial \mathbf{x}^{t}}.$
4.3.1 Derivation of $\partial \mathcal{L} / \partial \mathbf{x}^{t}$
MSE. When the loss function is specified as the MSE, i.e., $\mathcal{L}(\mathbf{x}^{t}, \mathbf{x}^{gt}) = \|\mathbf{x}^{t} - \mathbf{x}^{gt}\|^{2}$, the gradient can be simply computed as
(20) $\frac{\partial \mathcal{L}}{\partial \mathbf{x}^{t}} = 2(\mathbf{x}^{t} - \mathbf{x}^{gt}).$
Visual perception metric, i.e., negative SSIM. When the loss function is specified as the visual perception metric, i.e., $\mathcal{L}(\mathbf{x}^{t}, \mathbf{x}^{gt}) = -\mathrm{SSIM}(\mathbf{x}^{t}, \mathbf{x}^{gt})$ [53, 54], we derive the gradient as follows. To distinguish the entire image from a small patch, only in this subsection we use $X$ and $Y$ to denote the entire restored image and the reference image, respectively. The SSIM value is computed based on small patches $\mathbf{x}_p$ and $\mathbf{y}_p$,
(21) $\mathrm{SSIM}(X, Y) = \frac{1}{P}\sum_{p=1}^{P} \mathrm{SSIM}(\mathbf{x}_p, \mathbf{y}_p),$
where $P$ is the number of patches. The value on each patch is computed as
(22) $\mathrm{SSIM}(\mathbf{x}_p, \mathbf{y}_p) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^{2} + \mu_y^{2} + C_1)(\sigma_x^{2} + \sigma_y^{2} + C_2)},$
where $\mu_x$ is the mean value of patch $\mathbf{x}_p$, $\sigma_x^{2}$ is the variance of patch $\mathbf{x}_p$, $\sigma_{xy}$ is the covariance of patches $\mathbf{x}_p$ and $\mathbf{y}_p$, and $C_1$, $C_2$ are constants. Let us define $A_1 = 2\mu_x\mu_y + C_1$, $A_2 = 2\sigma_{xy} + C_2$, $B_1 = \mu_x^{2} + \mu_y^{2} + C_1$, and $B_2 = \sigma_x^{2} + \sigma_y^{2} + C_2$. Then we have $\mathrm{SSIM}(\mathbf{x}_p, \mathbf{y}_p) = \frac{A_1 A_2}{B_1 B_2}$. The gradient of negative SSIM is
(23) $\frac{\partial \mathcal{L}}{\partial X} = -\frac{1}{P}\sum_{p=1}^{P} \frac{\partial\,\mathrm{SSIM}(\mathbf{x}_p, \mathbf{y}_p)}{\partial \mathbf{x}_p},$
where
(24) $\frac{\partial\,\mathrm{SSIM}(\mathbf{x}_p, \mathbf{y}_p)}{\partial \mathbf{x}_p} = \frac{2}{N_p B_1^{2} B_2^{2}}\left[ B_1 B_2 \left( A_1 (\mathbf{y}_p - \mu_y \mathbf{1}) + \mu_y A_2 \mathbf{1} \right) - A_1 A_2 \left( B_1 (\mathbf{x}_p - \mu_x \mathbf{1}) + \mu_x B_2 \mathbf{1} \right) \right],$
with $N_p$ the number of pixels in each patch and $\mathbf{1}$ the all-one vector. The per-patch gradients are accumulated back to the corresponding pixel locations of $X$.
For simplicity, we hereafter use $\mathbf{e}$ to denote $\partial \mathcal{L} / \partial \mathbf{x}^{t}$ for both the MSE and negative SSIM losses.
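For reference, the patch-wise SSIM of (21)-(22) and the negative SSIM loss of (13) can be sketched as follows; the non-overlapping patch grid and the default $C_1$, $C_2$ constants for 8-bit images are simplifying assumptions of this sketch:

```python
import numpy as np

def ssim_patch(p, q, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of two patches as in Eq. (22)."""
    mp, mq = p.mean(), q.mean()
    vp, vq = p.var(), q.var()
    cov = ((p - mp) * (q - mq)).mean()
    return ((2 * mp * mq + c1) * (2 * cov + c2)) / \
           ((mp ** 2 + mq ** 2 + c1) * (vp + vq + c2))

def neg_ssim_loss(x, x_gt, patch=8):
    """Negative SSIM loss of Eq. (13): mean SSIM over non-overlapping patches."""
    vals = [ssim_patch(x[i:i+patch, j:j+patch], x_gt[i:i+patch, j:j+patch])
            for i in range(0, x.shape[0] - patch + 1, patch)
            for j in range(0, x.shape[1] - patch + 1, patch)]
    return -float(np.mean(vals))
```

Identical images give a loss of exactly -1, the minimum of the negative SSIM objective.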
4.3.2 Derivation of $\partial \mathbf{x}^{t} / \partial \Theta^{t}$
Since the parameterization of fidelity term and regularization term is similar, we only use the fidelity term as an example, and it is easy to extend it to the regularization term.
Weight parameter $\lambda$. The gradient of $\mathbf{x}^{t}$ in (15) with respect to $\lambda$ is
(25) $\frac{\partial \mathbf{x}^{t}}{\partial \lambda} = -\sum_{i=1}^{N} \bar{\mathbf{g}}_i \otimes \varphi_i(\mathbf{g}_i \otimes \mathbf{x}^{t-1}).$
The overall gradient with respect to $\lambda$ is
(26) $\frac{\partial \mathcal{L}}{\partial \lambda} = \left(\frac{\partial \mathbf{x}^{t}}{\partial \lambda}\right)^{T} \frac{\partial \mathcal{L}}{\partial \mathbf{x}^{t}}.$
Filter $\mathbf{f}_l$. The solution $\mathbf{x}^{t}$ as a function of each filter $\mathbf{f}_l$ can be simplified to,
(27) $\mathbf{x}^{t} = \mathbf{C} - \hat{\mathbf{H}}^{T}\left(\bar{\mathbf{f}}_l \otimes \psi_l(\mathbf{f}_l \otimes \mathbf{r})\right),$
where $\mathbf{C}$ denotes a term that is independent of $\mathbf{f}_l$ and $\mathbf{r} = \hat{\mathbf{H}}\mathbf{x}^{t-1} - \mathbf{y}$. Let us define $\mathbf{u} = \mathbf{f}_l \otimes \mathbf{r}$ and $\mathbf{v} = \psi_l(\mathbf{u})$, and let $\mathbf{e} = \partial \mathcal{L} / \partial \mathbf{x}^{t}$. Thus, we can obtain the gradient as
(28) $\frac{\partial \mathcal{L}}{\partial \mathbf{f}_l} = -\left(\frac{\partial (\bar{\mathbf{f}}_l \otimes \mathbf{v})}{\partial \mathbf{f}_l}\bigg|_{\mathbf{v}}\right)^{T}\hat{\mathbf{H}}\mathbf{e} - \left(\frac{\partial \mathbf{v}}{\partial \mathbf{f}_l}\right)^{T}\left(\frac{\partial (\bar{\mathbf{f}}_l \otimes \mathbf{v})}{\partial \mathbf{v}}\right)^{T}\hat{\mathbf{H}}\mathbf{e},$
where the first term treats $\mathbf{v}$ as fixed. Based on the convolution theorem [56], we have
(29) $\bar{\mathbf{f}}_l \otimes \mathbf{v} = \mathbf{V}\bar{\mathbf{f}}_l, \qquad \mathbf{f}_l \otimes \mathbf{r} = \mathbf{R}\mathbf{f}_l,$
where $\mathbf{V}$ and $\mathbf{R}$ are the sparse convolution matrices of $\mathbf{v}$ and $\mathbf{r}$, respectively. Thus, the first term in (28) is
(30) $-\,\mathrm{rot}_{180}\!\left(\mathbf{V}^{T}\hat{\mathbf{H}}\mathbf{e}\right),$
where $\mathrm{rot}_{180}(\cdot)$ rotates a matrix by 180 degrees.
For the second term, we use the auxiliary variable $\mathbf{u} = \mathbf{R}\mathbf{f}_l$, and we have $\mathbf{v} = \psi_l(\mathbf{u})$. We note that $\frac{\partial \mathbf{v}}{\partial \mathbf{u}} = \mathrm{diag}(\psi_l'(\mathbf{u}))$. Therefore, the second term in (28) is
(31) $-\,\mathbf{R}^{T}\,\mathrm{diag}\!\left(\psi_l'(\mathbf{u})\right)\bar{\mathbf{F}}_l^{T}\hat{\mathbf{H}}\mathbf{e},$
where $\mathrm{diag}(\psi_l'(\mathbf{u}))$ is a diagonal matrix and $\bar{\mathbf{F}}_l$ is the sparse convolution matrix of $\bar{\mathbf{f}}_l$. The gradient with respect to $\mathbf{f}_l$ is then
(32) $\frac{\partial \mathcal{L}}{\partial \mathbf{f}_l} = -\,\mathrm{rot}_{180}\!\left(\mathbf{V}^{T}\hat{\mathbf{H}}\mathbf{e}\right) - \mathbf{R}^{T}\,\mathrm{diag}\!\left(\psi_l'(\mathbf{u})\right)\bar{\mathbf{F}}_l^{T}\hat{\mathbf{H}}\mathbf{e}.$
Since the filter is specified as a linear combination of the DCT basis, one needs to derive the gradient with respect to the combination coefficients $\mathbf{d}_l$, i.e.,
(33) $\frac{\partial \mathcal{L}}{\partial \mathbf{d}_l} = \left(\frac{\partial \mathbf{f}_l}{\partial \mathbf{d}_l}\right)^{T}\frac{\partial \mathcal{L}}{\partial \mathbf{f}_l}.$
By introducing $\eta = \|\mathcal{B}\mathbf{d}_l\|$, we then have
(34) $\frac{\partial \mathbf{f}_l}{\partial \mathbf{d}_l} = \frac{1}{\eta}\left(\mathbf{I} - \mathbf{f}_l\mathbf{f}_l^{T}\right)\mathcal{B}.$
Finally, the overall gradient with respect to the combination coefficients is given by
(35) $\frac{\partial \mathcal{L}}{\partial \mathbf{d}_l} = \frac{1}{\eta}\,\mathcal{B}^{T}\left(\mathbf{I} - \mathbf{f}_l\mathbf{f}_l^{T}\right)\frac{\partial \mathcal{L}}{\partial \mathbf{f}_l}.$
Nonlinear function $\psi_l$. We first reformulate the solution $\mathbf{x}^{t}$ as a function of $\psi_l$ in matrix form,
(36) $\mathbf{x}^{t} = \mathbf{C} - \hat{\mathbf{H}}^{T}\bar{\mathbf{F}}_l\,\psi_l(\mathbf{u}),$
where $\mathbf{u} = \mathbf{f}_l \otimes \mathbf{r}$ and $\mathbf{C}$ is independent of $\psi_l$. The column vector $\psi_l(\mathbf{u})$ can be reformulated into the matrix form,
(37) $\psi_l(\mathbf{u}) = \mathbf{K}_l\mathbf{w}_l,$
where $\mathbf{w}_l = [w_{l1}, \ldots, w_{lM}]^{T}$ is the vectorized version of the parameters $\{w_{lj}\}_{j=1}^{M}$, and the matrix $\mathbf{K}_l$ is
$[\mathbf{K}_l]_{mj} = \exp\left(-\frac{\gamma}{2}(u_m - \mu_j)^{2}\right).$
Thus, we can get
(38) $\frac{\partial \mathbf{x}^{t}}{\partial \mathbf{w}_l} = -\hat{\mathbf{H}}^{T}\bar{\mathbf{F}}_l\mathbf{K}_l,$
and finally the overall gradient with respect to $\mathbf{w}_l$ is
(39) $\frac{\partial \mathcal{L}}{\partial \mathbf{w}_l} = -\mathbf{K}_l^{T}\bar{\mathbf{F}}_l^{T}\hat{\mathbf{H}}\mathbf{e},$
with $\mathbf{e} = \partial \mathcal{L} / \partial \mathbf{x}^{t}$. In our implementation, we do not explicitly construct the sparse convolution matrices, since their products can be efficiently computed via 2D convolutions.
4.4 Joint Fine-tuning
Once the greedy training process for each stage is carried out, an end-to-end training process is used to fine-tune all the parameters across stages. The joint training loss function is defined as
(40) $\min_{\{\Theta^{t}\}_{t=1}^{T}} \sum_{s=1}^{S} \mathcal{L}(\mathbf{x}_s^{T}, \mathbf{x}_s^{gt}),$
where $T$ is the maximum stage number. The gradient can be computed by the chain rule,
(41) $\frac{\partial \mathcal{L}}{\partial \Theta^{t}} = \left(\frac{\partial \mathbf{x}^{t}}{\partial \Theta^{t}}\right)^{T}\left(\prod_{\tau=t}^{T-1}\frac{\partial \mathbf{x}^{\tau+1}}{\partial \mathbf{x}^{\tau}}\right)^{T}\frac{\partial \mathcal{L}}{\partial \mathbf{x}^{T}},$
where only $\partial \mathbf{x}^{t+1} / \partial \mathbf{x}^{t}$ needs to be additionally computed. By reformulating the solution (15) in the matrix form,
(42) $\mathbf{x}^{t+1} = \mathbf{x}^{t} - \hat{\mathbf{H}}^{T}\sum_{l=1}^{L}\bar{\mathbf{F}}_l\,\psi_l\!\left(\mathbf{F}_l(\hat{\mathbf{H}}\mathbf{x}^{t} - \mathbf{y})\right) - \lambda\sum_{i=1}^{N}\bar{\mathbf{G}}_i\,\varphi_i(\mathbf{G}_i\mathbf{x}^{t}),$
the gradient can be computed as
(43) $\frac{\partial \mathbf{x}^{t+1}}{\partial \mathbf{x}^{t}} = \mathbf{I} - \hat{\mathbf{H}}^{T}\sum_{l=1}^{L}\bar{\mathbf{F}}_l\,\mathrm{diag}\!\left(\psi_l'(\mathbf{u}_l)\right)\mathbf{F}_l\hat{\mathbf{H}} - \lambda\sum_{i=1}^{N}\bar{\mathbf{G}}_i\,\mathrm{diag}\!\left(\varphi_i'(\mathbf{z}_i)\right)\mathbf{G}_i,$
where $\mathbf{u}_l = \mathbf{F}_l(\hat{\mathbf{H}}\mathbf{x}^{t} - \mathbf{y})$ and $\mathbf{z}_i = \mathbf{G}_i\mathbf{x}^{t}$, $\mathbf{F}_l$, $\bar{\mathbf{F}}_l$, $\mathbf{G}_i$ and $\bar{\mathbf{G}}_i$ are the sparse convolution matrices of the corresponding filters, and $\mathrm{diag}(\psi_l'(\mathbf{u}_l))$ is also a diagonal matrix.
Once $\partial \mathbf{x}^{t+1} / \partial \mathbf{x}^{t}$ is computed, the overall gradient can be obtained by the chain rule, and the other gradient parts in (41) can be borrowed from greedy training.
4.4.1 Training Procedure
Given a training dataset, SFARL is trained by sequentially running greedy training as in Algorithm 2 and joint fine-tuning as in Algorithm 3. Algorithm 1 lists the inference procedure of SFARL given the model parameters, in which all the intermediate results are recorded for backward propagation during training. In greedy training of stage $t$, the parameters of the previous stages are fixed, and only the gradients in stage $t$ are computed and fed to the ADAM algorithm. In joint fine-tuning, the gradients in each stage are computed and fed to the ADAM algorithm to optimize the parameters of all the stages.
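The greedy stage of this procedure can be sketched as below; `grad_fn` abstracts the chain-rule gradients derived in Section 4.3, plain gradient descent replaces ADAM for brevity, and all names are illustrative:

```python
import numpy as np

def train_sfarl(pairs, stages, init_theta, grad_fn, epochs=10, lr=1e-3):
    """Greedy stage-wise training sketch (cf. Algorithm 2): each stage's
    parameters are optimized while the earlier stages stay fixed."""
    thetas = []
    for t in range(stages):
        theta = init_theta()
        for _ in range(epochs):
            g = np.zeros_like(theta)
            for y, x_gt in pairs:
                # grad_fn returns d loss / d theta^(t) given the fixed stages
                g += grad_fn(thetas, theta, y, x_gt)
            theta -= lr * g / len(pairs)
        thetas.append(theta)
    return thetas
```

Joint fine-tuning would then continue from the greedily trained `thetas`, updating all stages at once with the back-propagated gradients of (41).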
5 Experimental Results
In this section, we evaluate the proposed SFARL algorithm on several restoration tasks, i.e., image deconvolution either with an inaccurate blur kernel or with multiple degradations, and rain streak removal from a single image. SFARL can also be evaluated on Gaussian denoising, and we present those results in the supplementary material. In our experiments, $5 \times 5$ filters are adopted in both the fidelity and regularization terms. As for the stage number, we recommend setting it based on the convergence behavior during greedy training, and empirically use a 10-stage SFARL for image deconvolution, and a 5-stage SFARL for rain streak removal and Gaussian denoising. During training, greedy training runs for 10 epochs per stage, and the parameters are then jointly fine-tuned for 50 epochs. We use ADAM [55] to optimize these SFARL models. Using rain streak removal as an example, it takes about 19 hours to train an SFARL model on a computer equipped with a GTX 1080Ti GPU. The SFARL models are quantitatively and qualitatively evaluated and compared with state-of-the-art conventional and deep CNN-based approaches.
More experimental settings and results are included in the supplementary material. The testing codes are available at https://github.com/csdwren/sfarl, and the training codes will also be given after this paper is accepted.
5.1 Deconvolution with Inaccurate Blur Kernels
We consider the blind deconvolution task and use two blur kernel estimation methods, i.e., Cho and Lee [48] and Xu and Jia [18], for experiments. For each estimation approach, we evaluate the performance of SFARL in handling approach-specific blur kernel estimation errors. To construct the training dataset, we apply eight blur kernels [57] to 200 clean images from the BSD dataset [58], and add Gaussian noise to generate the blurry images. The methods by Cho and Lee [48] and Xu and Jia [18] are then used to estimate the blur kernels, yielding 1,600 training samples for each blur kernel estimation approach. To ensure training sample quality, we randomly select 500 samples with error ratio [57] above 3 for each method.
TABLE I: Average SSIM values on the dataset of Levin et al. [57].

Kernel estimation | EPLL [2] | ROBUST [14] | IRCNN [33] | SFARL
Cho and Lee [48]  | 0.8801   | 0.8659      | 0.8825     | 0.8903
Xu and Jia [18]   | 0.9000   | 0.8917      | 0.9023     | 0.9164
Figure 2: Visual comparison on a synthetic blurry image: blurry image, EPLL [2], ROBUST [14], IRCNN [33], SFARL.
Figure 3: Visual comparison on real blurry images: blurry images, IRCNN [33], ROBUST [14], SFARL.
On the widely used synthetic dataset of Levin et al. [57], we compare our SFARL with EPLL [2], ROBUST [14] and IRCNN [33]. The testing dataset includes 4 clean images and 8 blur kernels. The blur kernels are estimated by Cho and Lee [48] and Xu and Jia [18]. Table I lists the average SSIM values of all evaluated methods on the dataset of Levin et al. [57]. Overall, the SFARL algorithm performs favorably against the other methods in terms of SSIM. From Table I, we also have the following observations. First, the SFARL algorithm models the residual images produced by a specific blur kernel estimation method to improve the restoration result; for each blur kernel estimation method, we only need to retrain the SFARL model on the corresponding synthetic data. Second, when the estimated blur kernel is more accurate (e.g., Xu and Jia [18]), better quantitative performance is also attained by SFARL.
We evaluate the SFARL algorithm against the state-of-the-art methods on a synthetic and a real blurry image in Figures 2 and 3, where the blur kernels are estimated using the method by Xu and Jia [18]. As the blur kernel is accurately estimated in Fig. 2, all the evaluated methods perform well, and SFARL restores more texture details. In contrast, the blur kernel is estimated less accurately in Fig. 3; among all the evaluated methods, the image deblurred by SFARL is sharper with fewer ringing artifacts. We note that IRCNN [33] uses the ℓ2 norm in the fidelity term, and the ROBUST scheme [14] introduces an ℓ1 norm regularizer on the residual caused by kernel error. However, both norms are limited in modeling the complex distribution of the residual, and neither the GMM prior in EPLL nor the deep CNN prior in IRCNN can well compensate for the effect of inaccurate blur kernels. Thus, the performance gain of the SFARL model can be attributed to its effectiveness in characterizing the spatial dependency and complex distribution of residual images.
5.2 Deconvolution with Multiple Degradations
We consider a more challenging deconvolution task [15], in which blur convolution is followed by multiple degradations including saturation, Gaussian noise, and JPEG compression. SFARL is compared with DCNN [15], Whyte [59], IRCNN [33], and SRN [60]. Following the degradation steps in [15], 500 clean images from the BSD dataset [58] are used to synthesize the training dataset, on which SFARL and SRN are trained. Since only the testing code of DCNN [15] and 30 testing images blurred with a disk kernel of radius 7 (Disk7) are released, SFARL is evaluated only on the Disk7 kernel. From Table II, SFARL performs favorably in terms of average PSNR and SSIM. The results by SFARL are also visually more pleasing, while those by the other methods suffer from visible noise and artifacts, as shown in Fig. 4. It is worth noting that IRCNN works well in reducing blur, but magnifies the other degradations and yields ringing artifacts and noise. SRN is a recent deep motion deblurring network, yet it still suffers from visible noise and artifacts, since the ill-posedness caused by disk blur is usually more severe than that of motion blur. Thus, we conclude that SFARL is able to model these multiple degradations in the fidelity term. Moreover, DCNN needs to initialize its deconvolution subnetwork using inverse kernels, whereas SFARL is much easier to train given a proper training dataset.
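A degradation chain of this kind (blur, then saturation via clipping, then Gaussian noise, then a JPEG round-trip) can be sketched as below. This is an illustrative approximation of the steps named in the text, not the exact pipeline of [15]; the function name, parameter values, grayscale assumption, and ordering of the saturation/noise steps are assumptions.

```python
import io
import numpy as np
from PIL import Image
from scipy.signal import fftconvolve

def degrade(clean, kernel, sigma=0.02, quality=70, seed=None):
    """Apply blur, saturation (clipping), Gaussian noise, and JPEG
    compression to a clean grayscale image with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    x = fftconvolve(clean, kernel, mode="same")                   # blur
    x = np.clip(x + rng.normal(0.0, sigma, x.shape), 0.0, 1.0)    # noise + saturation
    buf = io.BytesIO()                                            # JPEG round-trip
    Image.fromarray((x * 255).astype(np.uint8)).save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(buf), dtype=np.float64) / 255.0
```

Because the JPEG step is non-linear and block-wise, the residual between the blurry observation and the linear model is structured rather than i.i.d. Gaussian, which is exactly what a learned fidelity term must absorb.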
5.3 Single Image Rain Streak Removal
(Fig. 5 panels: rainy image; SR [61]; LRA [23]; GMM [24]; CNN [62]; SFARL.)
To train the SFARL model for rain streak removal, we construct a synthetic rainy dataset. We randomly select 100 clean outdoor images from the UCID dataset [63], and use a Photoshop recipe (http://www.photoshopessentials.com/photoeffects/rain/) to generate 7 rainy images for each clean image, with random rain scales and orientations ranging from 60 to 90 degrees. The training dataset thus contains 700 images with different rain orientations and scales.
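A programmatic stand-in for the Photoshop recipe above is to motion-blur sparse bright noise along a chosen orientation and add it to the clean image. The sketch below is hypothetical (the paper uses the Photoshop procedure); all parameter names and defaults are assumptions.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import fftconvolve

def add_rain(clean, length=9, angle=75.0, density=0.02, strength=0.6, seed=None):
    """Add synthetic rain streaks of the given orientation (degrees)
    to a clean grayscale image with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    # Sparse "raindrop" seeds.
    noise = (rng.random(clean.shape) < density).astype(np.float64)
    # Motion-blur kernel oriented along `angle` to stretch drops into streaks.
    kernel = np.zeros((length, length))
    kernel[length // 2, :] = 1.0 / length
    kernel = rotate(kernel, angle, reshape=False)
    streaks = fftconvolve(noise, kernel, mode="same")
    return np.clip(clean + strength * streaks, 0.0, 1.0)
```

Varying `length`, `angle`, and `density` across samples mimics the different rain scales and orientations in the 700-image training set.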
Table III: SSIM on the 12 synthetic rainy images of [24].

Method    | #1   | #2   | #3   | #4   | #5   | #6   | #7   | #8   | #9   | #10  | #11  | #12  | Avg.
SR [61]   | 0.74 | 0.79 | 0.84 | 0.77 | 0.63 | 0.73 | 0.82 | 0.77 | 0.74 | 0.74 | 0.65 | 0.77 | 0.75
LRA [23]  | 0.83 | 0.88 | 0.76 | 0.96 | 0.92 | 0.93 | 0.94 | 0.81 | 0.90 | 0.82 | 0.85 | 0.80 | 0.87
GMM [24]  | 0.89 | 0.93 | 0.92 | 0.94 | 0.90 | 0.95 | 0.96 | 0.90 | 0.91 | 0.90 | 0.86 | 0.92 | 0.91
CNN [62]  | 0.75 | 0.79 | 0.71 | 0.89 | 0.76 | 0.80 | 0.85 | 0.77 | 0.81 | 0.76 | 0.79 | 0.73 | 0.78
SFARL     | 0.93 | 0.93 | 0.92 | 0.95 | 0.97 | 0.94 | 0.98 | 0.95 | 0.97 | 0.98 | 0.95 | 0.97 | 0.95
We evaluate the SFARL method against state-of-the-art algorithms, including SR [61], LRA [23], GMM [24], and CNN [62], on the synthetic dataset [24]. The dataset consists of 12 rainy images with rain orientations ranging from left to right. Table III shows that SFARL achieves the highest average SSIM, and the best or comparable SSIM on nearly all test images. Fig. 5 shows rain streak removal results of all the evaluated algorithms on a synthetic rainy image. The results by SFARL and GMM are significantly better than those of the other methods; however, the result by GMM still contains visible rain streaks, while SFARL recovers a satisfactory clean image.
Furthermore, we compare SFARL with a recent deep CNN-based method, DDNET [16]. The authors of [16] provide a training dataset of 12,600 rainy images and a testing dataset of 1,400 rainy images (Rain1400). We train SFARL on the training dataset, and compare it with DDNET both quantitatively and qualitatively on the testing dataset. From Table IV, SFARL obtains better PSNR and SSIM values on Rain1400. In Fig. 6, SFARL produces satisfactory deraining results, while rain streaks are still visible in the results by DDNET.
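For completeness, the PSNR measure used alongside SSIM in Tables II and IV is straightforward to compute; a minimal version for images in [0, 1] is shown below (the function name is ours, not from the paper's code).

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and an
    estimate, both with values in [0, peak]."""
    mse = np.mean((ref - est) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Averaging this quantity over the 1,400 Rain1400 test images gives the dataset-level PSNR reported in Table IV.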