Physics-Based Generative Adversarial Models for Image Restoration and Beyond

08/02/2018 ∙ by Jinshan Pan, et al. ∙ Nanjing University ∙ University of California, Merced

We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, and image deraining). These problems are highly ill-posed, and existing methods usually rely on heuristic image priors. In this paper, we show that these problems can be solved by generative models with adversarial learning. However, the basic formulation of generative adversarial networks (GANs) does not generate realistic images, and some structures of the estimated images are usually not preserved well. Motivated by the observation that the estimated results should be consistent with the observed inputs under the physics models, we propose a physics model constrained learning algorithm that guides the estimation for the specific task in the conventional GAN framework. The proposed algorithm is trained in an end-to-end fashion and can be applied to a variety of image restoration and related low-level vision problems. Extensive experiments demonstrate that our method performs favorably against state-of-the-art algorithms.


1 Introduction

Many fundamental image restoration and related low-level vision problems (e.g., image filtering, image deblurring, image super-resolution, image dehazing, image deraining) ask: given an input image $y$, how do we estimate a clear image $x$ of the same scene? The fundamental constraint is that the estimated $x$ should be consistent with the input $y$ under the formation model

$$y = \mathcal{H}(x), \tag{1}$$

where the operator $\mathcal{H}$ maps the unknown result $x$ to the observed image $y$. For example, if (1) describes an image deblurring problem, $\mathcal{H}$ corresponds to the blur operation. As estimating $x$ from $y$ is usually ill-posed, it is natural to introduce additional constraints to regularize $x$. The commonly used method is based on the maximum a posteriori (MAP) framework, where $x$ can be solved by

$$x^{*} = \arg\max_{x}\; p(x \mid y) = \arg\max_{x}\; p(y \mid x)\, p(x). \tag{2}$$

Here $p(y \mid x)$ and $p(x)$ are probability density functions, usually referred to as the likelihood term and the image prior. In recent years, numerous deep models have been developed for image restoration and related low-level vision tasks, e.g., image super-resolution [3, 4, 5, 6, 7, 8], image filtering [9, 10], noise removal [11, 12, 13], image deraining [14], and dehazing [15, 16, 17], to name a few. Mathematically, these methods directly learn the mapping function between $y$ and $x$, which can be written as

$$x = \mathcal{F}(y), \tag{3}$$

where $\mathcal{F}$ is the mapping function, which can be regarded as an inverse operator of $\mathcal{H}$ in (1). Theoretically, $\mathcal{F}(y)$ should be close to the ground truth if the network can approximate the solution well. However, due to the complexity of the problem (e.g., the solution space is too large), a simple network with a random initialization is unable to estimate the solution well. Thus, only using a feed-forward network to learn the inverse operator does not generate good results. Figure 1(e) shows an image deblurring result by a deep feed-forward network [2], where the characters of the generated image are not estimated well.

Another kind of deep neural network, the GAN [18], has been developed for several image restoration problems, e.g., image super-resolution [19], image deraining [1], and image deblurring [20]. The GAN framework contains a generative model and a discriminative model, where the discriminative model regularizes the generative model so that the distribution of the output is close to that of realistic images. However, the adversarial loss does not ensure that the contents of the outputs are consistent with those of the inputs. Although several algorithms [1, 19] use a pixel-wise loss function based on the ground truths and a perceptual loss function [21] based on pre-trained VGG features as constraints in the GAN formulation, these algorithms are still less effective, as shown in Figure 1(b).

We note that the aforementioned end-to-end trainable networks only aim to learn the solutions (i.e., (3)) and do not guarantee that the learned solutions satisfy the physics model (1). Without the physics model constraint, methods based on feed-forward networks do not generate physically correct results, and some main structures and details of the generated images are incorrect (e.g., Figure 1(b) and (e)). Thus, it is important to develop an algorithm that models both the solutions (i.e., (3)) and the physics model (i.e., (1)) in a unified framework to address image restoration and related problems.

In this paper, we propose a physics model constrained learning algorithm that guides the estimation in the conventional GAN framework. The physics model ensures that the estimated result $x$ is consistent with the observed image $y$. The GAN with the physics model constrained learning algorithm is jointly trained in an end-to-end fashion. We show that the proposed algorithm is a flexible framework which can be applied to a variety of image restoration and related low-level vision problems, and it performs favorably against state-of-the-art methods on each task. Figure 1 shows two applications of the proposed algorithm.

2 Related Work

Recent years have witnessed significant advances in numerous image restoration problems due to the use of designed image priors and deep learning. In this section, we mainly review the most related work and put this work in proper context.

Image deblurring.

Image deblurring is an ill-posed problem. Conventional methods usually design effective statistical priors to solve it. In contrast, Schuler et al. [22] develop a multi-layer perceptron approach to remove noise and artifacts in the deblurring process. Xu et al. [23] develop a convolutional neural network based on the singular value decomposition (SVD) to deal with outliers. However, these methods are designed for non-blind image deblurring, and it is not trivial to extend them to blind image deblurring. In blind image deblurring, some approaches [24, 25] first use convolutional neural networks to estimate blur kernels and then deblur images with conventional image restoration methods. To directly restore clear images, several methods propose end-to-end trainable neural networks [2, 26]. Although these methods avoid a complex blur kernel estimation step, their generated results are not physically correct (e.g., the structures of the recovered images are not preserved well, as shown in Figure 1(e)).

Image dehazing.

The success of conventional methods largely stems from handcrafted features designed for estimating transmission maps, e.g., the dark channel [27]. Recent deep learning-based methods [15, 16] first use neural networks to estimate transmission maps and then recover clear images based on traditional methods [27]. Different from these methods, we propose an end-to-end trainable network for image dehazing, which directly estimates clear images from hazy inputs.

Image deraining.

For image deraining, conventional algorithms are usually based on the statistical properties of rain streaks [28, 29, 30, 31]. Recently, Eigen et al. [14] develop a neural network to remove rain and dirt. Motivated by the success of ResNet [32], Fu et al. [33] develop a deep detail network for image deraining. In [1], Zhang et al. develop an improved GAN method by introducing a perceptual loss function [21]. Yang et al. [34] develop a multi-task network for rain detection and removal. These deep learning-based methods do not consider the physical formation model and rely solely on end-to-end trainable networks, which do not effectively solve the deraining problem (Figure 1(b)).

Image super-resolution and other problems.

Super-resolution has achieved significant progress due to the use of deep learning [5, 6, 19, 35]. In [5], Dong et al. develop an end-to-end trainable network for super-resolution (SRCNN). As SRCNN is less effective in estimating details, Kim et al. [6] propose a residual learning algorithm based on a deeper network. To generate more realistic images, Ledig et al. [19] develop a GAN for image super-resolution. In addition, deep learning methods have been applied to other low-level vision problems, such as image filtering [9, 10] and image denoising [11, 12, 13]. Different from these methods, we propose a GAN-based method under the constraint of the physics model to solve image restoration and related low-level vision problems.

Generative adversarial networks.

Goodfellow et al. [18] propose the GAN framework to generate realistic-looking images from random noise. Motivated by this framework, many methods, covering both methodology and applications [36, 37, 38, 39, 40], have been proposed. Recently, the GAN framework has also been applied to low-level vision problems [1, 19, 20, 41, 42]. Different from these methods, we propose an efficient physics model constrained learning algorithm that improves the GAN framework for image deblurring, image dehazing, and related tasks.

3 Image Restoration from GAN

The GAN algorithm learns a generative model via an adversarial process. It simultaneously trains a generator network and a discriminator network by optimizing

$$\min_{G}\max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big], \tag{4}$$

where $z$ denotes random noise, $x$ denotes a real image, and $D$ denotes the discriminator network. For simplicity, we also use $G$ to denote the generator network.

In the training process of GAN, the generator generates samples (i.e., $G(z)$) that can fool the discriminator, while the discriminator learns to distinguish the real data from the samples produced by the generator.
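To make the alternating optimization in (4) concrete, below is a minimal PyTorch sketch of one GAN training step. This is our own illustration, not the paper's implementation: it uses the common non-saturating generator loss rather than the exact minimax form, and the networks, optimizers, and data are assumed to be defined elsewhere; the discriminator is assumed to end with a sigmoid so its outputs are probabilities.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, x_real, z):
    # Discriminator step: push D(x_real) toward 1 and D(G(z)) toward 0.
    opt_D.zero_grad()
    x_fake = G(z).detach()                     # block gradients into G
    d_real, d_fake = D(x_real), D(x_fake)
    loss_D = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    loss_D.backward()
    opt_D.step()

    # Generator step: push D(G(z)) toward 1 (non-saturating heuristic).
    opt_G.zero_grad()
    d_fake = D(G(z))
    loss_G = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```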

We note that the image restoration problem (2) can be solved by minimizing the objective energy function

$$E(x) = \phi\big(\mathcal{H}(x), y\big) + \rho(x), \tag{5}$$

where $\phi(\mathcal{H}(x), y)$ is the data term, which ensures the recovered image is consistent with the input image, and $\rho(x)$ is the regularization of $x$, which models the properties of $x$ (e.g., a sparse gradient distribution [43]).

In some problems, e.g., image deblurring, $\rho(x)$ acts as a discriminator: its value is much smaller if $x$ is clear and larger otherwise [44]. In other words, optimizing the objective function (5) makes the value of $\rho(x)$ smaller. Thus, the estimated intermediate image will be much clearer and physically correct.

Based on the above discussion, if we use the observed image $y$ as the input of (4), the adversarial loss

$$\mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log\big(1 - D(G(y))\big)\big] \tag{6}$$

has a similar effect to $\rho(x)$, as the value of (6) is much smaller if $G(y)$ is clear and larger otherwise. With this property, the adversarial loss can be used as a prior to regularize the solution space of image restoration, as evidenced by [45].

We note that GANs with the observed data as the input have shown promising results in image super-resolution [19], image deraining [1], image deblurring [20, 42], etc. However, GANs do not guarantee that the solutions satisfy the physics model (1) (i.e., they do not consider the effect of the data term in (5)) and thus fail to generate clear images, as illustrated in Section 1. In the following, we develop a new method to improve the estimation of GANs under the guidance of the physics model (1).

Fig. 2: Proposed framework. The discriminative network $D_g$ is used to classify whether the distributions of the outputs from the generator $G$ are close to those of the ground truth images. The discriminative network $D_h$ is used to classify whether the regenerated result is consistent with the observed image. All the networks are jointly trained in an end-to-end manner.

4 Proposed Algorithm

Our goal is not to propose a novel network structure, but to provide a standard framework that uses the fundamental constraint (1) to guide the training of GANs and ensure that the restored results are physically correct.

To ensure that the output of GANs (i.e., $G(y)$) is consistent with the input $y$ under the model (1), we introduce an additional discriminative network. The proposed framework is shown in Figure 2. It contains two discriminative networks, one generative network, and one fundamental constraint (i.e., (1)). We take the image deblurring problem as an example. Let $x$ and $y$ denote the clear images and the corresponding blurred images. The generative network learns the mapping function and generates the intermediate deblurred image $G(y)$ from the input $y$. Then, we apply the physics model (1) to $G(y)$:

$$\tilde{y} = k * G(y), \tag{7}$$

where $k$ denotes a blur kernel which is only used in the training process (note that the blur kernel in (7) is known; it is also used to generate the blurred image from the clear image when synthesizing the training data, so the physics model used in the proposed algorithm is not stochastic) and $*$ denotes the convolution operator. The discriminative network $D_h$ takes the blurred image $y$ and the regenerated image $\tilde{y}$ as inputs and is used to classify whether the generated results satisfy the blur model. The other discriminative network $D_g$ takes the ground truth $x$ and the intermediate deblurred image $G(y)$ as inputs and is used to classify whether $G(y)$ is clear.

For image dehazing, the physics model is $y = t \cdot x + (1 - t)\,A$, where $t$ is the transmission map and $A$ is the atmospheric light. For image super-resolution, the physics model is $y = S(x)$, where $S$ denotes the down-sampling and filtering operator; in this paper, we use the bicubic interpolation operation. For other applications, we can use the corresponding physics models in the proposed algorithm.
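As a concrete illustration (our own sketch, not the released code), the degradation operator of each task can be written in a few lines of PyTorch; the kernel `k`, transmission map `t`, and atmospheric light `A` are assumed to be the known parameters used to synthesize the training data:

```python
import torch
import torch.nn.functional as F

def reblur(x, k):
    """Deblurring model (7): y = k * x, per-channel convolution with the known kernel k."""
    c = x.shape[1]
    w = k.expand(c, 1, *k.shape[-2:])          # one copy of the kernel per channel
    return F.conv2d(x, w, padding=k.shape[-1] // 2, groups=c)  # assumes an odd kernel size

def rehaze(x, t, A):
    """Haze model: y = t * x + (1 - t) * A, with transmission t and airlight A."""
    return t * x + (1.0 - t) * A

def downsample(x, scale=4):
    """Super-resolution model: y = S(x); here S is bicubic down-sampling."""
    return F.interpolate(x, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)
```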

We note that although the proposed network is trained in an end-to-end manner, it is constrained by a physics model and thus is not fully blind in the training stage. With the learned generator $G$, the test stage is blind: we directly obtain the final results by applying $G$ to the input images.

4.1 Network Architecture

Once the physics model constraint is determined, we can adopt existing network architectures for the generative and discriminative networks.

Generative network.

The generative network is used to generate the final results. As many generative networks have been used in image restoration and related low-level vision problems, e.g., super-resolution [21, 19] and image editing [36], we use a network architecture similar to that of [36] as our generative network. The detailed network parameters are shown in Table I.

Discriminative network.

We note that PatchGANs [19, 36, 39] have fewer parameters than a full-image discriminator and achieve state-of-the-art results in many vision problems. Similar to [19, 36, 39], we use PatchGANs to classify whether the generated results are real or fake. The discriminator architecture is shown in Table I.

Parameters of the generative network
Layers          CIR    CIR    CIR    ResBlock … ResBlock    CTIR   CTIR   CIR
Filter size     7      3      3      3        … 3           3      3      7
Filter numbers  64     128    256    256      … 256         128    64     3
Stride          1      2      2      1        … 1           2      2      1

Parameters of the discriminative network
Layers          CILR   CILR   CILR   CILR   CILR
Filter size     4      4      4      4      4
Filter numbers  64     128    256    512    1
Stride          2      2      2      1      1
TABLE I: Network parameters. “CIR” denotes a convolutional layer with instance normalization (IN) [46] and ReLU; “ResBlock” denotes the residual block [32], which contains two convolutional layers with IN and ReLU; “CTIR” denotes a fractionally-strided convolutional layer with IN and ReLU; “CILR” denotes a convolutional layer with IN and LeakyReLU.
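For reference, below is a minimal PyTorch sketch of a generator following Table I (with 9 ResBlocks, the setting used in Section 6); the padding scheme and the omitted output activation are our assumptions, not details stated in the table:

```python
import torch.nn as nn

def cir(in_c, out_c, k, s):
    """Conv + InstanceNorm + ReLU ("CIR" in Table I)."""
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, k, stride=s, padding=k // 2),
        nn.InstanceNorm2d(out_c),
        nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    """Two 3x3 convolutions with IN and ReLU, plus a skip connection."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.InstanceNorm2d(c),
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.InstanceNorm2d(c))
    def forward(self, x):
        return x + self.body(x)

def ctir(in_c, out_c):
    """Fractionally-strided conv + IN + ReLU ("CTIR" in Table I)."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_c, out_c, 3, stride=2, padding=1, output_padding=1),
        nn.InstanceNorm2d(out_c),
        nn.ReLU(inplace=True))

class Generator(nn.Module):
    def __init__(self, n_blocks=9):
        super().__init__()
        layers = [cir(3, 64, 7, 1), cir(64, 128, 3, 2), cir(128, 256, 3, 2)]
        layers += [ResBlock(256) for _ in range(n_blocks)]
        layers += [ctir(256, 128), ctir(128, 64),
                   nn.Conv2d(64, 3, 7, padding=3)]  # final 7x7 conv to 3 channels
        self.net = nn.Sequential(*layers)
    def forward(self, x):
        return self.net(x)
```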

4.2 Loss Function

A straightforward way of training is to use the original GAN formulation (4). However, the contents of the generated images based on this training loss may differ from those of the ground truth images, as evidenced by [19]. To ensure that the contents of the generated results are consistent with those of the inputs under the physical formation model (1) and are close to those of the ground truth images, we use the $L_1$-norm regularized pixel-wise loss functions

$$\mathcal{L}_{h}(G) = \frac{1}{N}\sum_{i=1}^{N}\big\|\mathcal{H}(G(y_i)) - y_i\big\|_1 \tag{8}$$

and

$$\mathcal{L}_{p}(G) = \frac{1}{N}\sum_{i=1}^{N}\big\|G(y_i) - x_i\big\|_1 \tag{9}$$

in the training stage, where $\{x_i, y_i\}_{i=1}^{N}$ denote the paired training samples. To make the generative network learning process more stable, we further use an additional loss function $\mathcal{L}_{r}(G)$ (10) to regularize the generator $G$.

Finally, we propose a new objective function

$$\mathcal{L}_{D_h}(G, D_h) = \mathbb{E}_{y}\big[\log D_h(y)\big] + \mathbb{E}_{y}\Big[\log\big(1 - D_h(\mathcal{H}(G(y)))\big)\Big] \tag{11}$$

to ensure that the output of GANs is consistent with the observed input under the model (1).

Based on the above considerations, the networks $G$, $D_g$, and $D_h$ are trained by solving

$$\min_{G}\max_{D_g, D_h}\; \mathcal{L}_{D_g}(G, D_g) + \mathcal{L}_{D_h}(G, D_h) + \lambda\big(\mathcal{L}_{h}(G) + \mathcal{L}_{p}(G) + \mathcal{L}_{r}(G)\big), \tag{12}$$

where $\mathcal{L}_{D_g}(G, D_g)$ denotes the adversarial loss of $G$ and $D_g$ (cf. (4) and (6)) and $\lambda$ is a weight parameter. To make the training process stable, we use the least-squares GAN loss [40] in the adversarial terms of (12) to generate higher-quality results.
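To make the joint objective concrete, below is a sketch of one training step of (12) in PyTorch. It is illustrative only: `H` stands for the known physics model (e.g., `reblur` above), each discriminator is fed single images rather than image pairs (our simplification), the stability term $\mathcal{L}_r$ in (10) is omitted, and all names are ours, not the released code:

```python
import torch
import torch.nn.functional as F

def train_step(G, D_g, D_h, H, x, y, opt_G, opt_Dg, opt_Dh, lam=50.0):
    """One step of (12) with the least-squares GAN loss [40]:
    real inputs are pushed toward 1 and fake ones toward 0."""
    # --- discriminator updates ---
    with torch.no_grad():
        x_hat = G(y)        # intermediate restored image G(y)
        y_hat = H(x_hat)    # regenerated observation H(G(y))
    loss_Dg = F.mse_loss(D_g(x), torch.ones_like(D_g(x))) \
            + F.mse_loss(D_g(x_hat), torch.zeros_like(D_g(x_hat)))
    loss_Dh = F.mse_loss(D_h(y), torch.ones_like(D_h(y))) \
            + F.mse_loss(D_h(y_hat), torch.zeros_like(D_h(y_hat)))
    opt_Dg.zero_grad(); loss_Dg.backward(); opt_Dg.step()
    opt_Dh.zero_grad(); loss_Dh.backward(); opt_Dh.step()

    # --- generator update: fool both discriminators, satisfy (8) and (9) ---
    x_hat = G(y)
    y_hat = H(x_hat)
    loss_adv = F.mse_loss(D_g(x_hat), torch.ones_like(D_g(x_hat))) \
             + F.mse_loss(D_h(y_hat), torch.ones_like(D_h(y_hat)))
    loss_pix = F.l1_loss(y_hat, y) + F.l1_loss(x_hat, x)   # (8) + (9)
    loss_G = loss_adv + lam * loss_pix                     # lam: the weight parameter
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_Dg.item(), loss_Dh.item(), loss_G.item()
```

The default `lam=50.0` here follows the parameter analysis in Section 6, which finds values of the weight within [40, 100] to work well for deblurring.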

5 Experimental Results

In this section, we evaluate the proposed algorithm on several image restoration tasks, including image deblurring, dehazing, super-resolution, and deraining. As the experiments are comprehensive, we only show a portion of the results in the main paper; more results and applications are included in the supplementary material. The trained models, datasets, and source code will be made available at https://sites.google.com/site/jspanhomepage/physicsgan/.

5.1 Datasets

For image deblurring, we use the training dataset by Hradiš et al. [2], which consists of images with both defocus blur generated by anti-aliased discs and motion blur generated by random walk. We randomly crop one million blurred image patches from the dataset for training and use the test dataset [2], which includes 100 clear images, to evaluate our algorithm. In addition, using the same blur kernels [2], we further randomly choose 50,000 clear face images from CelebA [47] and 50,000 natural images from COCO [48] to generate the training data for face image deblurring and natural image deblurring. We add random noise to each blurred image, with noise levels ranging from 0 to 10%.

For image dehazing, we use the NYU depth dataset [49] and randomly select 1,200 clear images and the corresponding depth maps to generate hazy images according to the haze model [27]. As the images in the NYU depth dataset [49] are indoor images, we also randomly choose 500 outdoor images from the Make3D dataset [50] to synthesize outdoor hazy images for training. All the training images are resized to the same canonical size. To evaluate image dehazing, we randomly select 240 images from the NYU depth dataset [49] and the Make3D dataset [50] that are not used in the training stage. In addition to these synthetic data, we also compare with state-of-the-art methods on commonly used real hazy images.

In the image deraining task, we use the dataset by Zhang et al. [1] to train and evaluate our algorithm.

5.2 Training

We train the models using the Adam optimizer [51] with an initial learning rate of 0.0002, which is linearly decayed after every 100 epochs. We set the batch size to 1. Similar to Glorot and Bengio [52], the weights of the filters in each layer are initialized using a Gaussian distribution with zero mean and a variance of $2/n$, where $n$ is the size of the respective convolutional filter. The slope of the LeakyReLU is 0.2. After obtaining the generator $G$, as we know the paired training data $\{x, y\}$ and the corresponding physics model parameters (e.g., the blur kernel $k$ in image deblurring) used to synthesize $y$ from $x$, we apply the same physics model parameters to $G(y)$ and generate $\tilde{y}$. The discriminator $D_h$ then takes $y$ and $\tilde{y}$ as inputs, while the discriminator $D_g$ takes $x$ and $G(y)$ as inputs. Similar to [53], we update the discriminators using a history of generated images instead of only the latest outputs of the generative network, following [36]. The update ratio between the generator and the discriminators is set to 1.

Methods Input Xu [54] Pan [55] Pan [44] CNN [2] Nah [26] pix2pix [39] CycleGAN [36] Ours
PSNR 18.52 17.52 18.19 18.47 26.53 22.57 23.33 11.92 28.80
SSIM 0.6658 0.4186 0.6270 0.6127 0.9422 0.8924 0.9170 0.2792 0.9744
TABLE II: Quantitative evaluations with state-of-the-art methods on the text image deblurring dataset by [2].
(a) Input (b) Xu [54] (c) Pan [55] (d) Pan [44]
(e) Nah [26] (f) CNN [2] (g) CycleGAN [36] (h) Ours
Fig. 3: One synthetic blurred image from the text image deblurring dataset [2]. The proposed method generates images with much clearer characters.

5.3 Image Deblurring

We compare our algorithm with conventional state-of-the-art deblurring methods [44, 54, 55] and CNN-based deblurring methods [2, 20, 26, 56]. We note that recent CNN-based deblurring algorithms [20, 26, 56] are designed for natural images; for the text and face image deblurring applications, we retrain them for fair comparisons, and for natural image deblurring we use the provided trained models. We further note that the pix2pix [39] and CycleGAN [36] algorithms are designed for image-to-image translation, which can be applied to image restoration; we retrain these two algorithms for fair comparisons as well.

Synthetic blurred images.

We quantitatively evaluate our method on the text image dataset described above, using PSNR and SSIM as the metrics. Table II shows that the proposed algorithm performs favorably against state-of-the-art methods in terms of PSNR and SSIM (as the implementation of [57] is not available, we do not compare with this method in this paper). Note that the method by [26] uses a multi-scale CNN with an adversarial learning algorithm; however, it fails to generate results with high PSNR values, as shown in Table II. The pix2pix [39] method achieves results similar to [26], as both methods improve GAN by introducing the pixel-wise loss (9). In contrast, our algorithm incorporates the physical formation model into the learning process and generates results with higher PSNR values.
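For reproducibility, PSNR and SSIM can be computed with scikit-image (version 0.19 or later for `channel_axis`); this is our evaluation sketch, not the paper's protocol:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, ground_truth):
    """Both images are uint8 arrays of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```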

Figure 3 shows a blurred text image from the test dataset. The conventional algorithms [44, 54, 55] fail to generate clear images. The CNN-based method [2] generates better results; however, its output still contains significant residual blur, as this method only uses a feed-forward network and does not consider the consistency between the estimated results and the blurred inputs. We note that the CycleGAN method [36] improves GAN by introducing a cycle consistency constraint for the image-to-image translation task; however, it fails to generate clear images, as shown in Figure 3(g). In contrast, our method generates much clearer images with recognizable characters, as shown in Figure 3(h), which further demonstrates the importance of the physical formation constraint.

Methods Xu [54] Pan [58] Pan [44] Zhang [56] Nah [26] DeblurGAN [20] CycleGAN [36] Ours
PSNR 18.84 18.85 21.51 23.22 22.48 19.18 20.73 24.17
SSIM 0.4054 0.4652 0.4263 0.6832 0.4962 0.2563 0.5978 0.7705
TABLE III: Quantitative evaluations with state-of-the-art deblurring methods on face images.
(a) Input (b) Xu [54] (c) Pan [58] (d) Pan [44]
(e) Zhang [56] (f) DeblurGAN [20] (g) CycleGAN [36] (h) Ours
Fig. 4: Face image deblurring results. The proposed method generates images with fewer artifacts.
Methods Xu [54] Pan [55] Pan [44] Zhang [56] Nah [26] DeblurGAN [20] CycleGAN [36] Ours
PSNR 20.11 19.97 20.72 22.48 20.89 20.10 19.98 22.63
SSIM 0.3802 0.4419 0.3450 0.5982 0.4878 0.2585 0.5963 0.7151
TABLE IV: Quantitative evaluations with state-of-the-art deblurring methods on natural images.
(a) Input (b) Xu [54] (c) Pan [55] (d) Pan [44]
(e) Zhang [56] (f) DeblurGAN [20] (g) CycleGAN [36] (h) Ours
Fig. 5: Natural image deblurring results. The proposed method generates images with fewer artifacts.
(a) Input (b) Xu [54] (c) Pan [55] (d) Pan [44]
(e) Nah [26] (f) CNN [2] (g) CycleGAN [36] (h) Ours
Fig. 6: Real text image deblurring results. The proposed method generates images with much clearer characters.

We then select 160 blurred face images, which do not overlap with the training examples, to evaluate our algorithm. We compare the proposed method with state-of-the-art algorithms, including conventional methods [44, 54, 55] and recent deep learning-based methods [20, 26, 56]. Table III shows that the proposed algorithm performs favorably against state-of-the-art methods in terms of PSNR and SSIM. Figure 4 shows the deblurred results of several algorithms. As the input image contains significant blur, the algorithms [26, 36, 56, 58] fail to generate clear images. The DeblurGAN algorithm is mainly based on a conditional GAN framework; however, its deblurred result still contains significant artifacts, as conventional GANs often introduce checkerboard artifacts. In contrast, the proposed algorithm generates a much clearer image due to the physics model constraint.

Finally, we use 160 blurred natural images, which do not overlap with the training examples, to evaluate our algorithm. The results in Table IV show that the proposed method performs well on natural image deblurring.

Although the proposed network is trained on uniformly blurred images, it also works on dynamic scenes [26]. More results are included in the supplementary material.

Real blurred images.

We further evaluate our method on real images. Figure 6(a) shows a real blurred image. The proposed method generates a visually better result with clearer characters than the other algorithms, as shown in Figure 6(h).

Methods Input He [27] Berman [59] Ren [15] Cai [16] pix2pix [39] CycleGAN [36] Ours
PSNR 25.84 28.36 28.54 27.29 28.45 28.52 27.73 28.79
SSIM 0.5868 0.6654 0.6823 0.6128 0.6789 0.6821 0.6237 0.6849
TABLE V: Quantitative evaluations with state-of-the-art methods on the proposed image dehazing dataset.
(a) Input (b) He  [27] (c) Berman [59] (d) Ren [15]
(e) Cai [16] (f) pix2pix [39] (g) Ours (h) GT
Fig. 7: One synthetic hazy image from the proposed synthetic hazy dataset. The structures enclosed in the red box in (f) are not preserved well. The proposed method does not need to estimate the transmission map and atmospheric light and generates a much clearer image which is visually close to the ground truth image.

5.4 Image Dehazing

For image dehazing, we compare our algorithm with the conventional methods [27, 59, 60] and the recent CNN-based methods [15, 16, 61]. We also retrain pix2pix [39] and CycleGAN [36] for fair comparisons.

Synthetic hazy images.

We first quantitatively evaluate our method using the synthetic dataset as mentioned in Section 5.1. The test dataset contains 240 images which include both indoor and outdoor scenes. To evaluate the recovered images by each algorithm, we use the PSNR and SSIM as the quality metrics. Table V shows that the proposed algorithm achieves competitive results against state-of-the-art methods in terms of PSNR and SSIM.

We show one example from the test dataset in Figure 7. The conventional methods [27, 59] need to estimate both the transmission map and the atmospheric light based on manually designed features; the colors of the images generated by [27] and [59] differ somewhat from those of the ground truth images due to inaccurate estimations of the transmission map and the atmospheric light. The CNN-based methods [15, 16] first estimate the transmission maps from hazy images and then follow conventional methods to estimate clear images; their recovered images contain residual haze, as shown in Figure 7(d)-(e). The pix2pix [39] method is based on a conditional GAN, but the structures of its estimated image are not preserved well. In contrast, due to the physics model constraint, our algorithm generates a much clearer image with fine details, which is visually close to the ground truth image.

In addition, we also evaluate our algorithm on the image dehazing dataset [62]. Table VI shows that the proposed algorithm achieves competitive results against state-of-the-art methods. The visual comparisons in Figure 8 demonstrate that the proposed algorithm generates a much clearer image.

Methods Input He [27] Berman [59] Chen [60] Ren [15] Cai [16] Zhang [61] Ours
PSNR 14.78 17.91 16.78 16.94 18.46 16.78 19.10 18.48
SSIM 0.6477 0.7269 0.7788 0.6607 0.7845 0.6935 0.7750 0.7992
TABLE VI: Quantitative evaluations with state-of-the-art methods on the outdoor image dehazing dataset [62].
(a) Input (b) He [27] (c) Berman [59] (d) Chen [60]
(e) Ren [15] (f) Cai [16] (g) Ours (h) GT
Fig. 8: A hazy image from the dataset [62]. The proposed method generates a much clearer image which is visually close to the ground truth.
(a) Input (b) He [27] (c) Berman [59] (d) Ren [15]
(e) Cai [16] (f) pix2pix [39] (g) CycleGAN [36] (h) Ours
Fig. 9: Results on a real hazy image. The proposed method generates a clearer image which looks more natural.

Real hazy images.

We evaluate the proposed algorithm on real hazy images and show comparisons with state-of-the-art methods in Figure 9. Our method generates good results in which the colors of the recovered images look more realistic.

5.5 Image Super-resolution

The proposed algorithm can be applied to image super-resolution by changing the physics model (1) into the image formation model of super-resolution. To show the effect of our algorithm on image super-resolution, we use the same training dataset as in [6] to train the proposed algorithm and evaluate it on the commonly used test dataset “Set5”. We compare our method with state-of-the-art algorithms including SRCNN [3], ESPCN [63], VDSR [6], SRGAN [19], and EDSR [64]. Quantitative evaluation results are shown in Table VII. Although the proposed algorithm is not specially designed for super-resolution, it achieves competitive results compared to state-of-the-art methods.

Methods Bicubic SRCNN [3] ESPCN [63] VDSR [6] SRGAN [19] EDSR [64] Ours
PSNR 28.42 30.48 30.27 31.35 32.05 32.46 30.03
SSIM 0.8104 0.8628 0.8540 0.8838 0.8910 0.8968 0.9030
TABLE VII: Quantitative evaluations with state-of-the-art methods on the image super-resolution problem.

5.6 Image Deraining

The proposed algorithm also works for image deraining, which aims to remove rain streaks or dirt from the input images. To evaluate our algorithm, we compare it with the conventional method [31] and state-of-the-art deep learning-based methods [1, 34, 65] on the test dataset by [1]. We also retrain pix2pix [39] and CycleGAN [36] for fair comparisons.

(a) Input (b) Li [31] (c) Zhang [1] (d) Fu [65]
(e) Yang [34] (f) pix2pix [39] (g) CycleGAN [36] (h) Ours
Fig. 10: Results on a real rainy image. Due to the heavy rain, both the CNN-based methods [65, 34] and GAN-based methods [1, 39, 36] are not able to generate clear images. The part enclosed in the red box in (c) contains significant color distortions (best viewed on high-resolution displays with zoom-in).

Figure 10 shows a real rainy image. The method [31], based on a handcrafted prior, does not remove the rain streaks. We note that the image deraining algorithm [1] improves GAN by introducing the perceptual loss function [21]; however, there are significant color distortions in some image regions (e.g., the part in the red box in Figure 10(c)). As mentioned in Section 1, this is mainly because the algorithm [1] does not consider the physics model constraint. The method by Fu et al. [65] decomposes the input image into a detail layer and a base layer, where the detail layer is estimated by a CNN; however, this method is less effective when the rain is heavy. Yang et al. [34] develop a multi-task network for rain detection and removal, but some rain streaks are not removed, as shown in Figure 10(e). The pix2pix [39] and CycleGAN [36] algorithms also fail to generate clear images. In contrast, the proposed method removes the rain streaks and generates a much clearer image.

In addition, we further evaluate our method on image filtering and blind super-resolution problems. More experimental results are included in the supplementary material.

6 Analysis and Discussion

In this section, we further analyze the proposed method and compare it with related methods.

Effect of the physics model constraint.

Our method without the physics model constraint reduces to a conventional GAN [18] with the loss function (9). To examine the effect of this constraint and ensure a fair comparison, we disable the physics model constrained learning process in our implementation (BaseGAN for short). As shown in Figure 11(b), BaseGAN is not able to generate clear images; the structures and colors of the recovered image differ significantly from those of the input because the physics model is not used. In contrast, our method with the physics model constrained learning process generates much clearer images in which the strokes and colors are preserved well.

We further quantitatively evaluate the effect of the proposed loss functions using the deblurring benchmark dataset [2]. Table VIII shows the average PSNR and SSIM values of the recovered images generated by the methods with different loss functions. The results by the method using (12) have the highest PSNR values and are also much clearer as shown in Figure 11(f).

Relation with GAN-based methods.

Recently, several methods have been proposed to improve the conventional GAN framework, e.g., CycleGAN [36], DiscoGAN [37], and DualGAN [38]. The CycleGAN algorithm [36] introduces two generative networks and two discriminative networks to solve image-to-image translation when paired training data is not available; in addition, a cycle consistency loss is proposed to facilitate training. Other methods [1, 19, 39, 41, 42] explore GANs in conditional settings and introduce pixel-wise loss functions (i.e., (9)) and perceptual loss functions [21] to ensure that the outputs of the generative network are close to the ground truth images in the training stage.

Although some of these methods are not designed for the image restoration problems addressed in this paper, we have retrained the most related algorithms for fair comparisons to clarify the differences. As these algorithms directly learn the inverse process (i.e., (3)) with end-to-end trainable networks and do not consider the physical formation process, they usually fail to generate physically correct results, as shown in Sections 5.3-5.6.

Following the notation of the proposed algorithm, the CycleGAN algorithm assumes that $F(G(y)) \approx y$ and $G(F(x)) \approx x$, where $F$ is a generator which may have a similar effect to the mapping function $\mathcal{H}$ in the physics model (1). With this assumption, the algorithm is likely to converge to a trivial solution, as identity mapping functions always satisfy it. This is the main reason that the results of the CycleGAN algorithm are quite similar to the inputs: the dehazing results are still similar to the hazy inputs (Figure 9(g)), the derained results still contain some rain streaks (Figure 10(g)), and the deblurred results contain significant residual blur (Figure 3(g) and Figure 6(g)).

We further note that the CycleGAN algorithm is designed for unpaired images. For a further fair comparison, we retrain the CycleGAN algorithm on the image deraining and image deblurring tasks using paired images (PCycleGAN for short). However, both the visual comparisons (Figure 11(e)) and the quantitative evaluations (Table VIII) show that PCycleGAN does not generate clear images.

Different from the CycleGAN algorithm, our algorithm does not need to learn the degradation mapping $F$, as the physics model is known. Thus, our algorithm avoids trivial solutions and performs favorably against state-of-the-art algorithms on each task.

(a) Input (b) BaseGAN (c) PCycleGAN
(d) Without (9) (e) Without (8) (f) Ours
Fig. 11: Effectiveness of the proposed method on text image deblurring.
BaseGAN w/o (8) w/o (9) w/o (10) PCycleGAN Ours
PSNR 23.48 26.86 19.83 27.50 19.71 28.80
SSIM 0.9178 0.9526 0.5941 0.9665 0.5833 0.9744
TABLE VIII: Effectiveness of the physical formation constrained learning algorithm and proposed loss functions.
Fig. 12: (a) Quantitative evaluations on the blurred text images with random noise. The proposed method is robust to image noise in image deblurring. (b) Quantitative evaluations of the convergence property on the blurred text images.

Robustness to image noise.

The proposed method is robust to image noise. We evaluate it using 1,000 noisy blurred text images, where random noise with levels ranging from 0% to 10% is added to each blurred image. Figure 12(a) shows that the proposed method performs well even when the noise level is high.

Convergence property.

As our algorithm jointly trains generative and discriminative networks, a natural question is whether it converges. We quantitatively evaluate the convergence properties of our method on the text deblurring dataset [2]. Figure 12(b) shows that the proposed method converges within 200 epochs in terms of the average PSNR.

We note that although using multiple discriminators in GANs may make training more difficult, our numerical results in Figure 12(b) show that the physics model constraint makes the training process more stable and leads to better convergence compared to a GAN with one discriminator and one generator.

Parameter analysis.

The proposed objective function involves the weight parameter $\lambda$. We analyze the effect of this parameter using 50 blurred face images by setting its value from 10 to 100 with a step size of 10. Table IX shows that the proposed method performs slightly better when $\lambda = 50$, but overall it is robust to this parameter over a wide range, i.e., within [40, 100].

Effect of the weight parameter $\lambda$ on image deblurring
$\lambda$   10      20      30      40      50      60      70      80      90      100
PSNR        22.92   23.77   24.06   24.25   24.57   24.36   24.55   24.54   24.41   24.55
SSIM        0.6782  0.7103  0.7378  0.7489  0.7531  0.7528  0.7489  0.7521  0.7477  0.7488

Influence of the number of ResBlocks on image deblurring
ResBlocks   3       6       9       20      25      30      35      40      45      50
PSNR        20.45   20.49   20.77   20.98   20.33   20.82   21.22   20.93   20.88   20.83
SSIM        0.5513  0.5628  0.6005  0.6120  0.5483  0.6137  0.6173  0.6062  0.6018  0.6075

TABLE IX: Sensitivity analysis w.r.t. the weight parameter $\lambda$ and the number of ResBlocks.

Ablation study w.r.t. the proposed network.

To analyze the effect of the number of ResBlocks, we evaluate the proposed network using 50 blurred images, varying the number of ResBlocks from 3 to 50. Table IX shows that the proposed method is insensitive to the number of ResBlocks. We therefore empirically use 9 ResBlocks as a trade-off between accuracy and speed.

Limitations.

Although the proposed method is able to restore images from degraded inputs, it is still less effective for examples affected by multiple degradation factors, e.g., both rain and haze, because the physics model does not describe such complex degradation processes well. Figure 13 shows an example where the proposed algorithm fails to remove rain and snow from the input image due to the complex degradation process. Future work will consider jointly using semi-supervised and supervised learning algorithms to solve this problem.

(a) Input (b) Restored result
Fig. 13: The proposed algorithm is less effective for examples where the physics model does not describe the complex degradation process well.

7 Concluding Remarks

Motivated by the observation that the estimated results should be consistent with the observed inputs under the fundamental constraints in image restoration and related low-level vision problems, we enforce this fundamental constraint in the conventional GAN framework. As the fundamental constraint is derived from the physical formation process of some low-level problems, the proposed algorithm can be applied to a variety of image restoration and related low-level vision problems. By training in an end-to-end fashion, the proposed algorithm performs favorably against state-of-the-art methods on each task.

References

  • [1] H. Zhang, V. Sindagi, and V. M. Patel, “Image de-raining using a conditional generative adversarial network,” CoRR, vol. abs/1701.05957, 2017.
  • [2] M. Hradiš, J. Kotera, P. Zemčík, and F. Šroubek, “Convolutional neural networks for direct text deblurring,” in BMVC, 2015, pp. 6.1–6.13.
  • [3] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE TPAMI, vol. 38, no. 2, pp. 295–307, 2016.
  • [4] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in ECCV, 2016, pp. 391–407.
  • [5] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in ECCV, 2014, pp. 184–199.
  • [6] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in CVPR, 2016, pp. 1646–1654.
  • [7] ——, “Deeply-recursive convolutional network for image super-resolution,” in CVPR, 2016, pp. 1637–1645.
  • [8] R. Liao, X. Tao, R. Li, Z. Ma, and J. Jia, “Video super-resolution via deep draft-ensemble learning,” in ICCV, 2015, pp. 531–539.
  • [9] L. Xu, J. S. J. Ren, Q. Yan, R. Liao, and J. Jia, “Deep edge-aware filters,” in ICML, 2015, pp. 1669–1678.
  • [10] S. Liu, J. Pan, and M.-H. Yang, “Learning recursive filters for low-level vision via a hybrid neural network,” in ECCV, 2016, pp. 560–576.
  • [11] C. Dong, Y. Deng, C. C. Loy, and X. Tang, “Compression artifacts reduction by a deep convolutional network,” in ICCV, 2015, pp. 576–584.
  • [12] V. Jain and H. S. Seung, “Natural image denoising with convolutional networks,” in NIPS, 2008, pp. 769–776.
  • [13] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in NIPS, 2012, pp. 350–358.
  • [14] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in ICCV, 2013, pp. 633–640.
  • [15] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” in ECCV, 2016, pp. 154–169.
  • [16] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: An end-to-end system for single image haze removal,” IEEE TIP, vol. 25, no. 11, pp. 5187–5198, 2016.
  • [17] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “Aod-net: All-in-one dehazing network,” in ICCV, 2017, pp. 4770–4778.
  • [18] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680.
  • [19] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in CVPR, 2017, pp. 4681–4690.
  • [20] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “Deblurgan: Blind motion deblurring using conditional adversarial networks,” in CVPR, 2018, pp. 8183–8192.
  • [21] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016, pp. 694–711.
  • [22] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf, “A machine learning approach for non-blind image deconvolution,” in CVPR, 2013, pp. 1067–1074.
  • [23] L. Xu, J. S. J. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” in NIPS, 2014, pp. 1790–1798.
  • [24] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in CVPR, 2015, pp. 769–777.
  • [25] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf, “Learning to deblur,” IEEE TPAMI, vol. 38, no. 7, pp. 1439–1451, 2016.
  • [26] S. Nah, T. Hyun Kim, and K. Mu Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in CVPR, 2017, pp. 3883–3891.
  • [27] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” in CVPR, 2009, pp. 1956–1963.
  • [28] L.-W. Kang, C.-W. Lin, and Y.-H. Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE TIP, vol. 21, no. 4, pp. 1742–1755, 2012.
  • [29] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in ICCV, 2015, pp. 3397–3405.
  • [30] Y.-L. Chen and C.-T. Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in ICCV, 2013, pp. 1968–1975.
  • [31] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in CVPR, 2016, pp. 2736–2744.
  • [32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
  • [33] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, “Removing rain from single images via a deep detail network,” in CVPR, 2017, pp. 3855–3863.
  • [34] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in CVPR, 2017, pp. 1357–1366.
  • [35] M. S. M. Sajjadi, B. Schölkopf, and M. Hirsch, “Enhancenet: Single image super-resolution through automated texture synthesis,” in ICCV, 2017, pp. 4501–4510.
  • [36] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017, pp. 2223–2232.
  • [37] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, “Learning to discover cross-domain relations with generative adversarial networks,” in ICML, 2017, pp. 1857–1865.
  • [38] Z. Yi, H. Zhang, P. Tan, and M. Gong, “Dualgan: Unsupervised dual learning for image-to-image translation,” in ICCV, 2017, pp. 2849–2857.
  • [39] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in CVPR, 2017, pp. 1125–1134.
  • [40] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, and Z. Wang, “Multi-class generative adversarial networks with the L2 loss function,” CoRR, vol. abs/1611.04076, 2016.
  • [41] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár, “Amortised MAP inference for image super-resolution,” in ICLR, 2017.
  • [42] X. Xu, D. Sun, J. Pan, Y. Zhang, H. Pfister, and M.-H. Yang, “Learning to super-resolve blurry face and text images,” in ICCV, 2017, pp. 251–260.
  • [43] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in CVPR, 2009, pp. 1964–1971.
  • [44] J. Pan, D. Sun, H. Pfister, and M.-H. Yang, “Blind image deblurring using dark channel prior,” in CVPR, 2016, pp. 1628–1636.
  • [45] L. Li, J. Pan, W.-S. Lai, C. Gao, N. Sang, and M.-H. Yang, “Learning a discriminative prior for blind image deblurring,” in CVPR, 2018, pp. 6616–6625.
  • [46] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” CoRR, vol. abs/1607.08022, 2016.
  • [47] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in ICCV, 2015, pp. 3730–3738.
  • [48] T.-Y. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: common objects in context,” in ECCV, 2014, pp. 740–755.
  • [49] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in ECCV, 2012, pp. 746–760.
  • [50] A. Saxena, M. Sun, and A. Y. Ng, “Make3d: Learning 3d scene structure from a single still image,” IEEE TPAMI, vol. 31, no. 5, pp. 824–840, 2009.
  • [51] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
  • [52] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in AISTATS, 2010, pp. 249–256.
  • [53] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, “Learning from simulated and unsupervised images through adversarial training,” CoRR, vol. abs/1612.07828, 2016.
  • [54] L. Xu, S. Zheng, and J. Jia, “Unnatural L0 sparse representation for natural image deblurring,” in CVPR, 2013, pp. 1107–1114.
  • [55] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “L0-regularized intensity and gradient prior for deblurring text images and beyond,” IEEE TPAMI, vol. 39, no. 2, pp. 342–355, 2017.
  • [56] J. Zhang, J. Pan, J. Ren, Y. Song, L. Bao, R. W. Lau, and M.-H. Yang, “Dynamic scene deblurring using spatially variant recurrent neural networks,” in CVPR, 2018, pp. 2521–2529.
  • [57] L. Xiao, J. Wang, W. Heidrich, and M. Hirsch, “Learning high-order filters for efficient blind deconvolution of document photographs,” in ECCV, 2016, pp. 734–749.
  • [58] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “Deblurring face images with exemplars,” in ECCV, 2014, pp. 47–62.
  • [59] D. Berman, T. Treibitz, and S. Avidan, “Non-local image dehazing,” in CVPR, 2016, pp. 1674–1682.
  • [60] C. Chen, M. N. Do, and J. Wang, “Robust image and video dehazing with visual artifact suppression via gradient residual minimization,” in ECCV, 2016, pp. 576–591.
  • [61] H. Zhang and V. M. Patel, “Densely connected pyramid dehazing network,” in CVPR, 2018, pp. 3194–3203.
  • [62] http://www.vision.ee.ethz.ch/ntire18/.
  • [63] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in CVPR, 2016, pp. 1874–1883.
  • [64] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in CVPR Workshops, 2017, pp. 1132–1140.
  • [65] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE TIP, 2017.