1 Introduction
Many fundamental image restoration and related low-level vision problems (e.g., image filtering, image deblurring, image super-resolution, image dehazing, and image deraining) ask: given an input image $y$, how can we estimate a clear image $x$ of the same scene? The fundamental constraint is that the estimated $x$ should be consistent with the input under the formation model
$$y = H(x), \tag{1}$$
where the operator $H$ maps the unknown result $x$ to the observed image $y$. For example, if (1) models an image deblurring problem, $H$ corresponds to the blur operation. As estimating $x$ from $y$ is usually ill-posed, it is natural to introduce additional constraints to regularize $x$. The commonly used approach is based on the maximum a posteriori (MAP) framework, where $x$ can be solved by
$$x = \arg\max_{x}\, p(x \mid y) = \arg\max_{x}\, p(y \mid x)\, p(x). \tag{2}$$
Here $p(y \mid x)$ and $p(x)$ are probability density functions, usually referred to as the likelihood term and the image prior, respectively. In recent years, numerous deep models have been developed for image restoration and related low-level vision tasks, e.g., image super-resolution
[3, 4, 5, 6, 7, 8], image filtering [9, 10], noise removal [11, 12, 13], image deraining [14], and dehazing [15, 16, 17], to name a few. Mathematically, these methods directly learn a mapping function between $y$ and $x$, which can be written as
$$x = G(y), \tag{3}$$
where $G$ is the mapping function, which can be regarded as an inverse operator of $H$ in (1). Theoretically, $G(y)$ should be close to the ground truth if the network can approximate the solution. However, due to the complexity of the problem (e.g., the solution space of the corresponding problem is too large), a simple network with random initialization is unable to estimate the solution well. Thus, only using a feedforward network to learn the inverse operator does not generate good results. Figure 1(e) shows an image deblurring result by a deep feedforward network [2], where the characters of the generated image are not estimated well.
Another kind of deep neural network, i.e., the generative adversarial network (GAN) [18], has been developed for some image restoration problems, e.g., image super-resolution [19], image deraining [1], and image deblurring [20]. The GAN framework contains a generative model and a discriminative model, where the discriminative model regularizes the generative model so that the distribution of the outputs is close to that of realistic images. However, the adversarial loss does not ensure that the contents of the outputs are consistent with those of the inputs. Although several algorithms [1, 19] use a pixel-wise loss function based on the ground truths and a perceptual loss function [21] based on pretrained VGG features as constraints in the GAN formulation, these algorithms are still less effective, as shown in Figure 1(b).

We note that the aforementioned end-to-end trainable networks only aim to learn the solutions (i.e., (3)) and do not guarantee that the learned solutions satisfy the physics model (1). Without the physics model constraint, methods based on feedforward networks do not generate physically correct results, and some main structures and details of the generated images are incorrect (e.g., Figure 1(b) and (e)). Thus, it is important to develop an algorithm that models both the solutions (i.e., (3)) and the physics model (i.e., (1)) in a unified framework to address image restoration and related problems.
In this paper, we propose a physics model constrained learning algorithm to guide the estimation in the conventional GAN framework. The physics model ensures that the estimated result (i.e., $G(y)$) is consistent with the observed image $y$. The GAN with the physics model constrained learning algorithm is jointly trained in an end-to-end fashion. We show that the proposed algorithm is a flexible framework which can be applied to a variety of image restoration and related low-level vision problems, and it performs favorably against state-of-the-art methods on each task. Figure 1 shows two applications of the proposed algorithm.
2 Related Work
Recent years have witnessed significant advances in numerous image restoration problems due to the use of designed image priors and deep learning. In this section, we mainly review the most related work and put this work in proper context.
Image deblurring.
Image deblurring is an ill-posed problem. Conventional methods usually design various effective statistical priors to solve it. In contrast, Schuler et al. [22] develop a multi-layer perceptron approach to remove noise and artifacts in the deblurring process. Xu et al. [23] develop a convolutional neural network based on the singular value decomposition (SVD) to deal with outliers. However, these methods are designed for non-blind image deblurring, and it is not trivial to extend them to blind image deblurring. In blind image deblurring, some approaches [24, 25] first use convolutional neural networks to estimate blur kernels and then deblur images with conventional image restoration methods. To directly restore clear images, several methods propose end-to-end trainable neural networks [2, 26]. Although these methods avoid a complex blur kernel estimation step, their generated results are not physically correct (e.g., the structures of the recovered images are not preserved well, as shown in Figure 1(e)).

Image dehazing.
The success of conventional methods is due to hand-crafted features designed for the estimation of transmission maps, e.g., the dark channel [27]. Recent deep learning-based methods [15, 16] first use neural networks to estimate transmission maps and then estimate clear images based on the traditional methods [27]. Different from these methods, we propose an end-to-end trainable network for image dehazing, which directly estimates clear images from hazy inputs.
Image deraining.
For image deraining, conventional algorithms are usually based on the statistical properties of rain streaks [28, 29, 30, 31]. Recently, Eigen et al. [14] develop a neural network to remove rain/dirt. Motivated by the success of ResNet [32], Fu et al. [33] develop a deep detail network for image deraining. In [1], Zhang et al. develop an improved GAN method by introducing a perceptual loss function [21]. Yang et al. [34] develop a multi-task network for rain detection and removal. These deep learning-based methods do not consider the physical formation model and are based on end-to-end trainable networks, which do not effectively solve the deraining problem (Figure 1(b)).
Image super-resolution and other problems.
Super-resolution has achieved significant progress due to the use of deep learning [5, 6, 19, 35]. In [5], Dong et al. develop an end-to-end trainable network for super-resolution (SRCNN). As SRCNN is less effective in detail estimation, Kim et al. [6] propose a residual learning algorithm based on a deeper network. To generate more realistic images, Ledig et al. [19] develop a GAN for image super-resolution. In addition, deep learning methods have been applied to other low-level vision problems, such as image filtering [9, 10] and image denoising [11, 12, 13]. Different from these methods, we propose a GAN-based method under the constraint of the physics model to solve image restoration and related low-level vision problems.
Generative adversarial networks.
Goodfellow et al. [18] propose the GAN framework to generate realistic-looking images from random noise. Motivated by this framework, numerous methods, in terms of both methodology and applications [36, 37, 38, 39, 40], have been proposed. Recently, the GAN framework has also been applied to some low-level vision problems [1, 19, 20, 41, 42]. Different from these methods, we propose an efficient physics model constrained learning algorithm to improve the GAN framework for image deblurring, image dehazing, and related tasks.
3 Image Restoration from GAN
The GAN algorithm learns a generative model via an adversarial process. It simultaneously trains a generator network $G$ and a discriminator network $D$ by optimizing
$$\min_{G}\max_{D}\; \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big], \tag{4}$$
where $z$ denotes random noise, $x$ denotes a real image, and $D$ denotes the discriminator network. For simplicity, we also use $G$ to denote the generator network. In the training process of a GAN, the generator generates samples (i.e., $G(z)$) that try to fool the discriminator, while the discriminator learns to distinguish real data from the samples produced by the generator.
We note that the image restoration problem (2) can be solved by minimizing the objective energy function
$$E(x) = f(x; y) + \varphi(x), \tag{5}$$
where $f(x; y)$ is the data term which ensures the recovered image $x$ is consistent with the input image $y$, and $\varphi(x)$ is the regularization of $x$ which models the properties of clear images (e.g., a sparse gradient distribution [43]).
In some problems, e.g., image deblurring, $\varphi(x)$ acts as a discriminator: its value is much smaller if $x$ is clear and larger otherwise [44]. In other words, optimizing the objective function (5) makes the value of $\varphi(x)$ smaller. Thus, the estimated intermediate image will be much clearer and physically correct.
Based on the above discussion, if we use the observed image $y$ as the input of (4), the adversarial loss
$$\mathbb{E}_{y\sim p_{\mathrm{data}}(y)}\big[\log\big(1 - D(G(y))\big)\big] \tag{6}$$
has a similar effect to the regularization term $\varphi(x)$ in (5), as the value of (6) is much smaller if $G(y)$ is clear and larger otherwise. With this property, the adversarial loss can be used as a prior to regularize the solution space of image restoration, as evidenced by [45].
We note that GANs with observed data as the input have shown promising results in image super-resolution [19], image deraining [1], image deblurring [20, 42], etc. However, a GAN does not guarantee that the solutions satisfy the physics model (1) (i.e., it does not consider the effect of the data term in (5)) and thus fails to generate clear images, as illustrated in Section 1. In the following, we develop a new method to improve the estimation of GANs under the guidance of the physics model (1).
Figure 2 caption (framework overview): the discriminative network $D$ classifies whether the distributions of the outputs from the generator $G$ are close to those of the ground truth images, while the discriminative network $D_p$ classifies whether the regenerated result is consistent with the observed image. All the networks are jointly trained in an end-to-end manner.

4 Proposed Algorithm
Our goal is not to propose a novel network structure, but to provide a general framework that uses the fundamental constraint (1) to guide the training of GANs and ensure that the restored results are physically correct.
To ensure the output of the GAN (i.e., $G(y)$) is consistent with the input $y$ under the model (1), we introduce an additional discriminative network. The proposed framework is shown in Figure 2. It contains two discriminative networks, one generative network, and one fundamental constraint (i.e., (1)). We take the image deblurring problem as an example. Let $x$ and $y$ denote the clear images and the corresponding blurred images. The generative network learns the mapping function and generates the intermediate deblurred image $G(y)$ from the input $y$. Then, we apply the physics model (1) to $G(y)$:
$$\tilde{y} = k \otimes G(y), \tag{7}$$
where $k$ denotes a blur kernel which is only used in the training process¹ and $\otimes$ denotes the convolution operator. The discriminative network $D_p$ takes the blurred image $y$ and the regenerated image $\tilde{y}$ as inputs and is used to classify whether the generated results satisfy the blur model. The other discriminative network $D$ takes the ground truth $x$ and the intermediate deblurred image $G(y)$ as inputs and is used to classify whether $G(y)$ is clear.

¹ Note that the blur kernel $k$ in (7) is known; it is also used to generate the blurred image from the clear image when synthesizing the training data. Thus, the physics model used in the proposed algorithm is not stochastic.
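As a concrete illustration, the physics-model consistency of (7) can be sketched in a few lines of NumPy: the intermediate result is re-blurred with the known kernel and compared against the observed input. This is only our own illustrative sketch, not the authors' implementation; the function names and the use of a plain zero-padded 2-D convolution are assumptions.

```python
import numpy as np

def convolve2d_same(img, k):
    """Plain 'same'-size 2-D convolution with zero padding (odd-sized kernel)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # flip the kernel for a true convolution
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k[::-1, ::-1])
    return out

def physics_consistency(g_y, y, k):
    """Re-blur the intermediate result G(y) with the known kernel k (cf. (7))
    and measure how far the regenerated image is from the observed input y."""
    y_tilde = convolve2d_same(g_y, k)
    return y_tilde, np.abs(y_tilde - y).mean()
```

With a delta (identity) kernel, the regenerated image equals the intermediate result and the consistency error is zero, which is a quick sanity check of the sketch.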
For image dehazing, the physics model is $y = x\,t + A\,(1 - t)$, where $t$ is the transmission map and $A$ is the atmospheric light. In image super-resolution, the physics model is $y = f_{\downarrow}(x)$, where $f_{\downarrow}$ denotes the downsampling and filtering operator; in this paper, we use the bicubic interpolation operation. For other applications, we can use the corresponding physics models in the proposed algorithm to solve the problem.
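The dehazing formation model above can be simulated directly, which is also how synthetic hazy training pairs are typically built from depth maps (cf. Section 5.1). A minimal sketch, where the scattering coefficient `beta` and the transmission model $t = e^{-\beta d}$ are our own standard assumptions rather than the paper's exact synthesis parameters:

```python
import numpy as np

def synthesize_haze(clear, depth, A=0.8, beta=1.0):
    """Apply the haze formation model y = x*t + A*(1 - t),
    with transmission t = exp(-beta * depth) derived from scene depth."""
    t = np.exp(-beta * depth)          # transmission map
    hazy = clear * t + A * (1.0 - t)   # airlight blended in with distance
    return hazy, t
```

At zero depth the transmission is one and the image is unchanged; at large depth the pixel value approaches the atmospheric light $A$.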
We note that although the proposed network is trained in an end-to-end manner, it is constrained by a physics model and thus is not fully blind in the training stage. With the learned generator $G$, however, the test stage is blind: we directly obtain the final results by applying $G$ to the input images.
4.1 Network Architecture
Once the physics model constraint in the proposed network is determined, we can use existing network architectures for the generative and discriminative networks.
Generative network.
The generative network is used to generate the final results. As many generative networks have been used in image restoration and related low-level vision problems, e.g., super-resolution [21, 19] and image editing [36], we use a network architecture similar to [36] as our generative network. The detailed network parameters are shown in Table I.
Discriminative network.
We note that PatchGANs [19, 36, 39] have fewer parameters than a full-image discriminator and achieve state-of-the-art results in many vision problems. Similar to [19, 36, 39], we use PatchGANs to classify whether the generated results are real or fake. The discriminator architecture is shown in Table I.
Table I: Parameters of the generative network

Layers  CIR  CIR  CIR  ResBlock ⋯ ResBlock  CTIR  CTIR  CIR
Filter size  7  3  3  3  3  3  3  7 
Filter numbers  64  128  256  256  256  128  64  3 
Stride  1  2  2  1  1  2  2  1 
Parameters of the discriminative network  

Layers  CILR  CILR  CILR  CILR  CILR 
Filter size  4  4  4  4  4 
Filter numbers  64  128  256  512  1 
Stride  2  2  2  1  1 
“CIR” denotes the convolutional layer with instance normalization (IN) and ReLU; “ResBlock” denotes the residual block [32], which contains two convolutional layers with IN and ReLU; “CTIR” denotes the fractionally-strided convolutional layer with IN and ReLU; “CILR” denotes the convolutional layer with IN and LeakyReLU.
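Given the filter sizes and strides of the discriminator in Table I, the size of the input patch that each PatchGAN output judges can be computed by walking backwards through the layers. The short sketch below is our own back-of-the-envelope check, not part of the original implementation:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs from first to last layer.
    Walking backwards, each layer expands the field as rf*stride + (k - stride)."""
    rf = 1
    for k, s in reversed(layers):
        rf = rf * s + (k - s)
    return rf

# Discriminator in Table I: five 4x4 conv layers with strides 2, 2, 2, 1, 1.
patch = receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)])  # -> 70
```

Under these assumptions, each scalar output of the discriminator corresponds to a 70×70 patch of the input, i.e., the classic 70×70 PatchGAN configuration.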
4.2 Loss Function
A straightforward way of training is to use the original GAN formulation (4). However, the contents of the generated images based on this training loss may differ from the ground truth images, as evidenced by [19]. To ensure that the contents of the generated results are consistent with those of the inputs under the physical formation model (1) and also close to those of the ground truth images, we use the $L_1$-norm regularized pixel-wise loss functions
$$\mathcal{L}_{p} = \big\| H(G(y)) - y \big\|_{1} \tag{8}$$
and
$$\mathcal{L}_{g} = \big\| G(y) - x \big\|_{1} \tag{9}$$
in the training stage. To make the generative network learning process more stable, we further use the loss function
(10)
to regularize the generator $G$.
Finally, we propose a new objective function
$$\min_{G}\max_{D_p}\; \mathbb{E}_{y\sim p_{\mathrm{data}}(y)}\big[\log D_p(y)\big] + \mathbb{E}_{y\sim p_{\mathrm{data}}(y)}\big[\log\big(1 - D_p(H(G(y)))\big)\big] \tag{11}$$
to ensure that the output of the GAN is consistent with the observed input under the model (1).
Based on the above considerations, the networks $G$, $D$, and $D_p$ are trained by solving
(12)
where $\lambda$ is a weight parameter. To make the training process stable, we use the least-squares GAN loss [40] to generate higher-quality results.
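Putting the pieces together, the generator's overall objective can be sketched as follows. This is a simplified NumPy illustration of the loss terms described above, using the least-squares GAN generator loss [40]; the function names, the placeholder physics operator `H`, and the single weight `lam` applied to both pixel-wise terms are our own assumptions, not the authors' exact formulation.

```python
import numpy as np

def l1(a, b):
    """Mean absolute (L1) difference between two arrays."""
    return np.abs(a - b).mean()

def lsgan_g(d_fake):
    """Least-squares GAN generator loss: push D's score on fakes toward 1."""
    return ((d_fake - 1.0) ** 2).mean()

def generator_loss(g_y, x, y, H, d_out, dp_out, lam=50.0):
    """Combined generator objective (cf. (12)): adversarial terms from both
    discriminators plus the weighted physics-consistency (8) and
    pixel-wise (9) losses."""
    adv = lsgan_g(d_out)        # D judges whether G(y) looks clear
    adv_phys = lsgan_g(dp_out)  # D_p judges whether H(G(y)) matches y
    return adv + adv_phys + lam * (l1(H(g_y), y) + l1(g_y, x))
```

The default `lam=50.0` mirrors the best-performing setting in the parameter analysis (Table IX), but is an assumption here rather than a quoted value from the training recipe.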
5 Experimental Results
In this section, we evaluate the proposed algorithm on several image restoration tasks including image deblurring, dehazing, superresolution, and deraining. Due to the comprehensive experiments conducted, we only show a small portion of the results in the main paper. More results and applications are included in the supplementary material. The trained models, datasets, and source code will be made available at https://sites.google.com/site/jspanhomepage/physicsgan/.
5.1 Datasets
For image deblurring, we use the training dataset of Hradiš et al. [2], which consists of images with both defocus blur generated by an anti-aliased disc and motion blur generated by a random walk. We randomly crop one million blurred image patches from the dataset for training and use the test dataset of [2], which includes 100 clear images, to evaluate our algorithm. In addition, with the same blur kernels [2], we further randomly choose 50,000 clear face images from CelebA [47] and 50,000 natural images from COCO [48] to generate the training data for face image deblurring and natural image deblurring. We add random noise to each blurred image, where the noise level ranges from 0 to 10%.
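The motion-blur part of this synthesis can be sketched as a random walk over a kernel grid. The following is only our illustrative reading of "motion blur generated by the random walk"; the step distribution, kernel size, and walk length are assumptions, not the parameters used by [2]:

```python
import numpy as np

def random_walk_kernel(size=15, steps=40, seed=0):
    """Trace a 2-D random walk on a size x size grid and normalize the
    visit counts into a motion-blur kernel that sums to one."""
    rng = np.random.default_rng(seed)
    k = np.zeros((size, size))
    r, c = size // 2, size // 2            # start at the kernel center
    for _ in range(steps):
        k[r, c] += 1.0
        dr, dc = rng.integers(-1, 2, size=2)  # one unit step per axis
        r = int(np.clip(r + dr, 0, size - 1))
        c = int(np.clip(c + dc, 0, size - 1))
    return k / k.sum()
```

Convolving a clear image with such a kernel (plus added noise) yields a blurred/clear training pair in which the kernel is known, which is exactly the property the physics model constraint in (7) relies on.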
For image dehazing, we use the NYU Depth dataset [49]: we randomly select 1,200 clear images and the corresponding depth maps to generate hazy images according to the haze model [27]. As the images in the NYU Depth dataset [49] are indoor images, we also randomly choose 500 outdoor images from the Make3D dataset [50] to synthesize outdoor hazy images for training. All training images are resized to the same canonical size. To evaluate image dehazing, we randomly select 240 images from the NYU Depth dataset [49] and the Make3D dataset [50] that are not used in the training stage. In addition to these synthetic data, we also compare with state-of-the-art methods on commonly used real hazy images.
In the image deraining task, we use the dataset by Zhang et al. [1] to train and evaluate our algorithm.
5.2 Training
We train the models using the Adam optimizer [51] with an initial learning rate of 0.0002, which is linearly decayed after every 100 epochs, and we set the batch size to 1. Similar to Glorot and Bengio [52], the weights of the filters in each layer are initialized from a zero-mean Gaussian whose variance is determined by the size of the respective convolutional filter. The slope of the LeakyReLU is 0.2. As we know the paired training data $\{x, y\}$ and the corresponding physics model parameters (e.g., the blur kernel $k$ in image deblurring) that are used to synthesize $y$ from $x$, after obtaining the generator output $G(y)$ we apply the same physics model parameters to it and generate the regenerated image $\tilde{y}$. The discriminator $D_p$ then takes $y$ and $\tilde{y}$ as input, while the discriminator $D$ takes $x$ and $G(y)$ as input. Similar to [53], we update the discriminators using a history of generated images rather than only the latest outputs of the generative network, following [36]. The update ratio between the generator and the discriminators is set to 1.

Table II: Quantitative evaluation on the text image deblurring dataset [2].
Methods  Input  Xu [54]  Pan [55]  Pan [44]  CNN [2]  Nah [26]  pix2pix [39]  CycleGAN [36]  Ours

PSNR  18.52  17.52  18.19  18.47  26.53  22.57  23.33  11.92  28.80 
SSIM  0.6658  0.4186  0.6270  0.6127  0.9422  0.8924  0.9170  0.2792  0.9744 
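The discriminator-update trick described in Section 5.2 — feeding the discriminators a history of generated images [53] rather than only the latest ones — can be sketched as a small buffer. The class name and the pool size are our own choices for illustration:

```python
import numpy as np

class ImageHistory:
    """History buffer of generated images, in the spirit of [53]: once the
    pool is full, with probability 0.5 the discriminator receives a stored
    past generation (replaced by the new one) instead of the latest image."""
    def __init__(self, pool_size=50, seed=0):
        self.pool_size = pool_size
        self.pool = []
        self.rng = np.random.default_rng(seed)

    def query(self, image):
        if len(self.pool) < self.pool_size:
            self.pool.append(image)      # still filling: pass image through
            return image
        if self.rng.random() < 0.5:
            i = self.rng.integers(len(self.pool))
            old, self.pool[i] = self.pool[i], image
            return old                   # return a historical sample
        return image
```

Training the discriminators on a mixture of current and historical generations reduces oscillation, which complements the stability benefits of the least-squares GAN loss noted in Section 4.2.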
5.3 Image Deblurring
We compare our algorithm with conventional state-of-the-art deblurring methods [44, 54, 55] and CNN-based methods [2, 20, 26, 56]. As the recent CNN-based algorithms [20, 26, 56] are designed for natural images, we retrain them for the text and face image deblurring applications; for natural image deblurring, we use the provided trained models. We further note that pix2pix [39] and CycleGAN [36] are designed for image-to-image translation, which can be applied to image restoration, so we retrain these two algorithms as well for fair comparisons.
Synthetic blurred images.
We quantitatively evaluate our method on the text image dataset described above, using PSNR and SSIM as the metrics. Table II shows that the proposed algorithm performs favorably against state-of-the-art methods in terms of PSNR and SSIM.² Note that the method of [26] uses a multi-scale CNN with an adversarial learning algorithm, yet it fails to generate results with high PSNR values, as shown in Table II. The pix2pix method [39] achieves results similar to [26], as both improve GAN by introducing the pixel-wise loss (9). In contrast, our algorithm adopts the physical formation model constrained learning algorithm and generates results with higher PSNR values.

² As the implementation of [57] is not available, we do not compare with this method in this paper.
Figure 3 shows a blurred text image from the test dataset. The conventional algorithms [44, 54, 55] fail to generate clear images. The CNN-based method [2] generates better results, but they still contain significant residual blur, as this method only uses a feedforward network and does not consider the consistency between the estimated results and the blurred inputs. We note that the CycleGAN method [36] improves GAN by introducing a cycle consistency constraint for the image-to-image translation task; however, it fails to generate clear images, as shown in Figure 3(g). In contrast, our method generates much clearer images with recognizable characters, as shown in Figure 3(h), which further demonstrates the importance of the physical formation constraint.
Table III: Quantitative evaluation on face image deblurring.
Methods  Xu [54]  Pan [58]  Pan [44]  Zhang [56]  Nah [26]  DeblurGAN [20]  CycleGAN [36]  Ours

PSNR  18.84  18.85  21.51  23.22  22.48  19.18  20.73  24.17 
SSIM  0.4054  0.4652  0.4263  0.6832  0.4962  0.2563  0.5978  0.7705 
Figure 4: Visual comparison on face image deblurring. (a) Input; (b) Xu [54]; (c) Pan [58]; (d) Pan [44]; (e) Zhang [56]; (f) DeblurGAN [20]; (g) CycleGAN [36]; (h) Ours.
Table IV: Quantitative evaluation on natural image deblurring.
Methods  Xu [54]  Pan [55]  Pan [44]  Zhang [56]  Nah [26]  DeblurGAN [20]  CycleGAN [36]  Ours

PSNR  20.11  19.97  20.72  22.48  20.89  20.10  19.98  22.63 
SSIM  0.3802  0.4419  0.3450  0.5982  0.4878  0.2585  0.5963  0.7151 
Figure 5: Visual comparison on natural image deblurring. (a) Input; (b) Xu [54]; (c) Pan [55]; (d) Pan [44]; (e) Zhang [56]; (f) DeblurGAN [20]; (g) CycleGAN [36]; (h) Ours.
Figure 6: Visual comparison on a real blurred image. (a) Input; (b) Xu [54]; (c) Pan [55]; (d) Pan [44]; (e) Nah [26]; (f) CNN [2]; (g) CycleGAN [36]; (h) Ours.
We then select 160 blurred face images to evaluate our algorithm, where the test and training examples do not overlap. We compare the proposed method with state-of-the-art algorithms, including conventional methods [44, 54, 55] and recent deep learning based methods [20, 26, 56]. Table III shows that the proposed algorithm performs favorably against state-of-the-art methods in terms of PSNR and SSIM. Figure 4 shows deblurred results from several algorithms. As the input image contains significant blur, the algorithms [26, 36, 56, 58] fail to generate clear images. The DeblurGAN algorithm [20] is mainly based on a conditional GAN framework; however, its deblurred result still contains significant artifacts, as the conventional GAN usually leads to checkerboard artifacts. In contrast, the proposed algorithm generates a much clearer image due to the proposed physics model constraint.
Finally, we use 160 blurred natural images to evaluate our algorithm, where the test and training examples do not overlap. The results in Table IV show that the proposed method also performs well on natural image deblurring.
Although the proposed network is trained on uniformly blurred images, it also works on dynamic scenes [26]. More results are included in the supplemental material.
Real blurred images.
We further evaluate our method on real images. Figure 6(a) shows a real blurred image. The proposed method generates a visually better result with clearer characters than the other algorithms, as shown in Figure 6(h).
Table V: Quantitative evaluation on synthetic hazy images.
Methods  Input  He [27]  Berman [59]  Ren [15]  Cai [16]  pix2pix [39]  CycleGAN [36]  Ours

PSNR  25.84  28.36  28.54  27.29  28.45  28.52  27.73  28.79 
SSIM  0.5868  0.6654  0.6823  0.6128  0.6789  0.6821  0.6237  0.6849 
Figure 7: Visual comparison on a synthetic hazy image. (a) Input; (b) He [27]; (c) Berman [59]; (d) Ren [15]; (e) Cai [16]; (f) pix2pix [39]; (g) Ours; (h) GT.
5.4 Image Dehazing
For image dehazing, we compare our algorithm with conventional methods [27, 59, 60] and recent CNN-based methods [15, 16, 61]. We also retrain pix2pix [39] and CycleGAN [36] for fair comparisons.
Synthetic hazy images.
We first quantitatively evaluate our method using the synthetic dataset as mentioned in Section 5.1. The test dataset contains 240 images which include both indoor and outdoor scenes. To evaluate the recovered images by each algorithm, we use the PSNR and SSIM as the quality metrics. Table V shows that the proposed algorithm achieves competitive results against stateoftheart methods in terms of PSNR and SSIM.
We show one example from the test dataset in Figure 7. The conventional methods [27, 59] need to estimate both the transmission map and the atmospheric light based on manually designed features. The colors of the images generated by [27] and [59] differ slightly from those of the ground truth due to inaccurate estimation of the transmission map and the atmospheric light. The CNN-based methods [15, 16] first estimate the transmission maps from hazy images and then follow the conventional methods to estimate clear images; their recovered images contain residual haze, as shown in Figure 7(d) and (e). The pix2pix method [39] is based on a conditional GAN, but the structures of its estimated image are not preserved well. In contrast, due to the physics model constraint, our algorithm generates a much clearer image with fine details, which is visually close to the ground truth.
In addition, we also evaluate our algorithm on the image dehazing dataset of [62]. Table VI shows that the proposed algorithm performs favorably against state-of-the-art methods, and the visual comparisons in Figure 8 demonstrate that it generates much clearer images.
Table VI: Quantitative evaluation on the image dehazing dataset [62].
Methods  Input  He [27]  Berman [59]  Chen [60]  Ren [15]  Cai [16]  Zhang [61]  Ours

PSNR  14.78  17.91  16.78  16.94  18.46  16.78  19.10  18.48 
SSIM  0.6477  0.7269  0.7788  0.6607  0.7845  0.6935  0.7750  0.7992 
Figure 8: Visual comparison on the dehazing dataset [62]. (a) Input; (b) He [27]; (c) Berman [59]; (d) Chen [60]; (e) Ren [15]; (f) Cai [16]; (g) Ours; (h) GT.
Figure 9: Visual comparison on a real hazy image. (a) Input; (b) He [27]; (c) Berman [59]; (d) Ren [15]; (e) Cai [16]; (f) pix2pix [39]; (g) CycleGAN [36]; (h) Ours.
Real hazy images.
We evaluate the proposed algorithm on real hazy images and show comparisons with state-of-the-art methods in Figure 9. Our method generates decent results, and the colors of the recovered images look more realistic.
5.5 Image Super-resolution
The proposed algorithm can be applied to image super-resolution by changing the physics model (1) into the image formation model of super-resolution. To show the effect of our algorithm on image super-resolution, we use the same training dataset as [6] to train the proposed algorithm and evaluate it on the commonly used "Set5" test dataset. We compare our method with state-of-the-art algorithms including SRCNN [3], ESPCN [63], VDSR [6], SRGAN [19], and EDSR [64]. Quantitative evaluation results are shown in Table VII. Although the proposed algorithm is not specially designed for super-resolution, it achieves competitive results compared to state-of-the-art methods.
5.6 Image Deraining
The proposed algorithm also works for image deraining, which aims to remove rain streaks or dirt from input images. To evaluate our algorithm, we compare it with the conventional method [31] and state-of-the-art deep learning based methods [1, 34, 65] using the test dataset of [1]. We also retrain pix2pix [39] and CycleGAN [36] for fair comparisons.
Figure 10: Visual comparison on a real rainy image. (a) Input; (b) Li [31]; (c) Zhang [1]; (d) Fu [65]; (e) Yang [34]; (f) pix2pix [39]; (g) CycleGAN [36]; (h) Ours.
Figure 10 shows a real rainy image. The method [31], based on hand-crafted priors, does not remove the rain streaks. We note that the image deraining algorithm [1] improves GAN by introducing the perceptual loss function [21]; however, there exist significant color distortions in some image regions (e.g., the part in the red box in Figure 10(c)). As mentioned in Section 1, this is mainly because the algorithm [1] does not consider the physics model constraint. The method by Fu et al. [65] decomposes the input image into a detail layer and a base layer, where the detail layer is estimated by a CNN; however, it is less effective when the rain is heavy. Yang et al. [34] develop a multi-task network for rain detection and removal, but some rain streaks are not removed, as shown in Figure 10(e). The pix2pix [39] and CycleGAN [36] algorithms also fail to generate clear images. In contrast, the proposed method removes the rain streaks and generates a much clearer image.
In addition, we further evaluate our method on image filtering and blind super-resolution problems. More experimental results are included in the supplemental material.
6 Analysis and Discussion
In this section, we further analyze the proposed method and compare it with related methods.
Effect of the physics model constraint.
Our method without the physics model constraint reduces to the conventional GAN [18] with the loss function (9). To examine the effect of this constraint under a fair comparison, we disable the physics model constrained learning process in our implementation (BaseGAN for short). As shown in Figure 11(b), BaseGAN is unable to generate clear images: the structures and colors of the recovered image differ significantly from the input because the physics model is not used. In contrast, our method with the physics model constrained learning process generates much clearer images where the strokes and colors are preserved well.
We further quantitatively evaluate the effect of the proposed loss functions using the deblurring benchmark dataset [2]. Table VIII shows the average PSNR and SSIM values of the images recovered by the methods with different loss functions. The results of the method using (12) have the highest PSNR values and are also much clearer, as shown in Figure 11(f).
Relation with GAN-based methods.
Recently, several methods have been proposed to improve the conventional GAN framework, e.g., CycleGAN [36], DiscoGAN [37], and DualGAN [38]. The CycleGAN algorithm [36] introduces two generative networks and two discriminative networks to solve image-to-image translation when paired training data is not available, and proposes a cycle consistency loss to facilitate training. The other methods [1, 19, 39, 41, 42] explore GANs in conditional settings and introduce pixel-wise loss functions (i.e., (9)) and perceptual loss functions [21] to ensure that the outputs of the generative network are close to the ground truth images in the training stage.
Although some of these methods are not designed for the image restoration problems addressed in this paper, we have retrained the most related algorithms for fair comparisons to clarify the differences. As these algorithms directly learn the inverse process (i.e., (3)) with end-to-end trainable networks without considering the physical formation process, they usually fail to generate physically correct results, as shown in Sections 5.3–5.6.
Following the notation of the proposed algorithm, the CycleGAN algorithm assumes that $F(G(y)) \approx y$ and $G(F(x)) \approx x$, where $F$ is a generator which may have a similar effect to the mapping function $H$ in the physics model (1). With this assumption, the algorithm is likely to converge to a trivial solution, as identity mapping functions always satisfy it. This is the main reason that the results of the CycleGAN algorithm are quite similar to the inputs: e.g., the dehazing results are still similar to the hazy inputs (Figure 9(g)), the derained results still contain some rain streaks (Figure 10(g)), and the deblurred results contain significant residual blur (Figure 3(g) and Figure 6(g)).
We further note that the CycleGAN algorithm is designed for unpaired images. For a further fair comparison, we retrain it on the image deraining and image deblurring tasks using paired images (PCycleGAN for short). However, both the visual comparisons (Figure 11(e)) and the quantitative evaluations (Table VIII) demonstrate that PCycleGAN does not generate clear images.
Different from the CycleGAN algorithm, our algorithm does not need to learn the backward mapping, as the physics model is known. Thus, our algorithm avoids trivial solutions and performs favorably against state-of-the-art algorithms on each task.
Figure 11: Effect of the proposed loss functions. (a) Input; (b) BaseGAN; (c) PCycleGAN; (d) Without (9); (e) Without (8); (f) Ours.
Table VIII: Effect of the proposed loss functions on the text deblurring dataset [2].
BaseGAN  w/o (8)  w/o (9)  w/o (10)  PCycleGAN  Ours

PSNR  23.48  26.86  19.83  27.50  19.71  28.80 
SSIM  0.9178  0.9526  0.5941  0.9665  0.5833  0.9744 
Robustness to image noise.
The proposed method is robust to image noise. We evaluate it on 1,000 noisy blurred text images, where random noise with a level ranging from 0% to 10% is added to each blurred image. Figure 12(a) shows that the proposed method performs well even when the noise level is high.
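A minimal sketch of this evaluation protocol, under the assumption that the noise level denotes the standard deviation of Gaussian noise as a percentage of the [0, 1] intensity range:

```python
import random

def add_noise(img, level_percent, seed=0):
    """Add zero-mean Gaussian noise to a flat pixel list; `level_percent`
    is relative to the [0, 1] intensity range, and noisy values are
    clipped back into range. Seeded for reproducibility."""
    rng = random.Random(seed)
    sigma = level_percent / 100.0
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in img]

blurred = [0.2, 0.5, 0.8]
print(add_noise(blurred, 0))   # 0% noise leaves the image unchanged
print(add_noise(blurred, 10))  # 10% noise, the hardest setting tested
```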
Convergence property.
As our algorithm jointly trains generative and discriminative networks, a natural question is whether the training converges. We quantitatively evaluate the convergence properties of our method on the text deblurring dataset [2]. Figure 12(b) shows that the proposed method converges within 200 epochs in terms of average PSNR.
We note that although using multiple discriminators in GANs may increase the difficulty of training, our numerical results in Figure 12(b) show that the physics model constraint makes the training process more stable and leads to better convergence than a GAN with a single generator and discriminator.
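PSNR, the metric used for the convergence curves, follows the standard definition; a minimal implementation for intensities in [0, 1]:

```python
import math

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

ref = [0.2, 0.4, 0.6, 0.8]
est = [0.3, 0.5, 0.7, 0.9]  # uniform error of 0.1 -> MSE = 0.01
print(psnr(ref, est))       # ~20.0 dB
```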
Parameter analysis.
The proposed objective function involves a weight parameter. We analyze the effect of this parameter using 50 blurred face images, varying its value from 10 to 100 with a step size of 10. Table IX shows that the proposed method performs slightly better for some settings but is overall robust to this parameter over a wide range, i.e., within [40, 100].
Effect of the weight parameter on image deblurring
        10      20      30      40      50      60      70      80      90      100
PSNR    22.92   23.77   24.06   24.25   24.57   24.36   24.55   24.54   24.41   24.55
SSIM    0.6782  0.7103  0.7378  0.7489  0.7531  0.7528  0.7489  0.7521  0.7477  0.7488
Influence of the number of ResBlocks on image deblurring
ResBlocks  3       6       9       20      25      30      35      40      45      50
PSNR       20.45   20.49   20.77   20.98   20.33   20.82   21.22   20.93   20.88   20.83
SSIM       0.5513  0.5628  0.6005  0.6120  0.5483  0.6137  0.6173  0.6062  0.6018  0.6075
Ablation study w.r.t. the proposed network.
To analyze the effect of the number of ResBlocks, we evaluate the proposed network on 50 blurred images, varying the number of ResBlocks from 3 to 50. Table IX shows that the proposed method is insensitive to the number of ResBlocks. Thus, we empirically use 9 ResBlocks as a trade-off between accuracy and speed.
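As a rough illustration of the depth hyperparameter being ablated: stacking residual blocks [32] simply composes updates of the form y = x + f(x). The toy elementwise f below is an assumption standing in for the real convolutional blocks:

```python
def res_block(x, weight):
    """y = x + f(x), with f a toy elementwise map (stand-in for the
    convolutional residual function in [32])."""
    return [v + weight * v for v in x]

def stack(x, n_blocks, weight=0.1):
    """Apply `n_blocks` residual blocks in sequence; depth is just
    the number of stacked blocks."""
    for _ in range(n_blocks):
        x = res_block(x, weight)
    return x

features = [1.0, 2.0]
# The ablation varies the block count from 3 to 50 and settles on 9.
print(stack(features, 9))
```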
Limitations.
Although the proposed method is able to restore images from degraded inputs, it is less effective on examples degraded by multiple factors, e.g., both rain and haze, because the physics model does not describe such complex degradation processes well. Figure 13 shows an example where the proposed algorithm fails to remove rain/snow from the input image due to the complex degradation process. Future work will consider jointly using semi-supervised and supervised learning algorithms to address this problem.
Figure 13: (a) Input; (b) Restored result.
7 Concluding Remarks
Motivated by the observation that the estimated results should be consistent with the observed inputs under the fundamental constraints of image restoration and related low-level vision problems, we enforce this fundamental constraint within the conventional GAN framework. As the constraint is derived from the physical formation process of several low-level vision problems, the proposed algorithm can be applied to a variety of image restoration and related low-level vision tasks. Trained in an end-to-end fashion, the proposed algorithm performs favorably against state-of-the-art methods on each task.
References
 [1] H. Zhang, V. Sindagi, and V. M. Patel, “Image deraining using a conditional generative adversarial network,” CoRR, vol. abs/1701.05957, 2017.
 [2] M. Hradis, J. Kotera, P. Zemcík, and F. Sroubek, “Convolutional neural networks for direct text deblurring,” in BMVC, 2015, pp. 6.1–6.13.
 [3] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE TPAMI, vol. 38, no. 2, pp. 295–307, 2016.
 [4] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in ECCV, 2016, pp. 391–407.
 [5] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in ECCV, 2014, pp. 184–199.
 [6] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in CVPR, 2016, pp. 1646–1654.
 [7] ——, “Deeply-recursive convolutional network for image super-resolution,” in CVPR, 2016, pp. 1637–1645.
 [8] R. Liao, X. Tao, R. Li, Z. Ma, and J. Jia, “Video super-resolution via deep draft-ensemble learning,” in ICCV, 2015, pp. 531–539.
 [9] L. Xu, J. S. J. Ren, Q. Yan, R. Liao, and J. Jia, “Deep edge-aware filters,” in ICML, 2015, pp. 1669–1678.
 [10] S. Liu, J. Pan, and M.-H. Yang, “Learning recursive filters for low-level vision via a hybrid neural network,” in ECCV, 2016, pp. 560–576.
 [11] C. Dong, Y. Deng, C. C. Loy, and X. Tang, “Compression artifacts reduction by a deep convolutional network,” in ICCV, 2015, pp. 576–584.
 [12] V. Jain and H. S. Seung, “Natural image denoising with convolutional networks,” in NIPS, 2008, pp. 769–776.
 [13] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in NIPS, 2012, pp. 350–358.
 [14] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in ICCV, 2013, pp. 633–640.
 [15] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” in ECCV, 2016, pp. 154–169.
 [16] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “DehazeNet: An end-to-end system for single image haze removal,” IEEE TIP, vol. 25, no. 11, pp. 5187–5198, 2016.
 [17] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “AOD-Net: All-in-one dehazing network,” in ICCV, 2017, pp. 4770–4778.
 [18] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680.
 [19] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in CVPR, 2017, pp. 4681–4690.
 [20] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas, “DeblurGAN: Blind motion deblurring using conditional adversarial networks,” in CVPR, 2018, pp. 8183–8192.
 [21] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016, pp. 694–711.
 [22] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf, “A machine learning approach for non-blind image deconvolution,” in CVPR, 2013, pp. 1067–1074.
 [23] L. Xu, J. S. J. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” in NIPS, 2014, pp. 1790–1798.
 [24] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in CVPR, 2015, pp. 769–777.
 [25] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf, “Learning to deblur,” IEEE TPAMI, vol. 38, no. 7, pp. 1439–1451, 2016.
 [26] S. Nah, T. Hyun Kim, and K. Mu Lee, “Deep multi-scale convolutional neural network for dynamic scene deblurring,” in CVPR, 2017, pp. 3883–3891.
 [27] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” in CVPR, 2009, pp. 1956–1963.
 [28] L.-W. Kang, C.-W. Lin, and Y.-H. Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE TIP, vol. 21, no. 4, pp. 1742–1755, 2012.
 [29] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in ICCV, 2015, pp. 3397–3405.
 [30] Y.-L. Chen and C.-T. Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in ICCV, 2013, pp. 1968–1975.
 [31] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in CVPR, 2016, pp. 2736–2744.
 [32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
 [33] X. Fu, J. Huang, D. Zeng, Y. Huang, X. Ding, and J. Paisley, “Removing rain from single images via a deep detail network,” in CVPR, 2017, pp. 3855–3863.
 [34] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in CVPR, 2017, pp. 1357–1366.
 [35] M. S. M. Sajjadi, B. Schölkopf, and M. Hirsch, “Enhancenet: Single image superresolution through automated texture synthesis,” in ICCV, 2017, pp. 4501–4510.
 [36] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017, pp. 2223–2232.
 [37] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim, “Learning to discover cross-domain relations with generative adversarial networks,” in ICML, 2017, pp. 1857–1865.
 [38] Z. Yi, H. Zhang, P. Tan, and M. Gong, “DualGAN: Unsupervised dual learning for image-to-image translation,” in ICCV, 2017, pp. 2849–2857.
 [39] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in CVPR, 2017, pp. 1125–1134.
 [40] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, and Z. Wang, “Multi-class generative adversarial networks with the L2 loss function,” CoRR, vol. abs/1611.04076, 2016.
 [41] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár, “Amortised MAP inference for image superresolution,” in ICLR, 2017.
 [42] X. Xu, D. Sun, J. Pan, Y. Zhang, H. Pfister, and M.-H. Yang, “Learning to super-resolve blurry face and text images,” in ICCV, 2017, pp. 251–260.
 [43] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in CVPR, 2009, pp. 1964–1971.
 [44] J. Pan, D. Sun, H. Pfister, and M.-H. Yang, “Blind image deblurring using dark channel prior,” in CVPR, 2016, pp. 1628–1636.
 [45] L. Li, J. Pan, W.-S. Lai, C. Gao, N. Sang, and M.-H. Yang, “Learning a discriminative prior for blind image deblurring,” in CVPR, 2018, pp. 6616–6625.
 [46] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Instance normalization: The missing ingredient for fast stylization,” CoRR, vol. abs/1607.08022, 2016.
 [47] Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in ICCV, 2015, pp. 3730–3738.
 [48] T.-Y. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in ECCV, 2014, pp. 740–755.
 [49] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in ECCV, 2012, pp. 746–760.
 [50] A. Saxena, M. Sun, and A. Y. Ng, “Make3d: Learning 3d scene structure from a single still image,” IEEE TPAMI, vol. 31, no. 5, pp. 824–840, 2009.
 [51] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
 [52] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in AISTATS, 2010, pp. 249–256.
 [53] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, “Learning from simulated and unsupervised images through adversarial training,” CoRR, vol. abs/1612.07828, 2016.
 [54] L. Xu, S. Zheng, and J. Jia, “Unnatural L0 sparse representation for natural image deblurring,” in CVPR, 2013, pp. 1107–1114.
 [55] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “L0-regularized intensity and gradient prior for deblurring text images and beyond,” IEEE TPAMI, vol. 39, no. 2, pp. 342–355, 2017.
 [56] J. Zhang, J. Pan, J. Ren, Y. Song, L. Bao, R. W. Lau, and M.-H. Yang, “Dynamic scene deblurring using spatially variant recurrent neural networks,” in CVPR, 2018, pp. 2521–2529.
 [57] L. Xiao, J. Wang, W. Heidrich, and M. Hirsch, “Learning high-order filters for efficient blind deconvolution of document photographs,” in ECCV, 2016, pp. 734–749.
 [58] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “Deblurring face images with exemplars,” in ECCV, 2014, pp. 47–62.
 [59] D. Berman, T. Treibitz, and S. Avidan, “Non-local image dehazing,” in CVPR, 2016, pp. 1674–1682.
 [60] C. Chen, M. N. Do, and J. Wang, “Robust image and video dehazing with visual artifact suppression via gradient residual minimization,” in ECCV, 2016, pp. 576–591.
 [61] H. Zhang and V. M. Patel, “Densely connected pyramid dehazing network,” in CVPR, 2018, pp. 3194–3203.
 [62] http://www.vision.ee.ethz.ch/ntire18/.
 [63] W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in CVPR, 2016, pp. 1874–1883.
 [64] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in CVPR Workshops, 2017, pp. 1132–1140.
 [65] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE TIP, 2017.