With the spread of digital cameras and mobile phones, a huge number of high-resolution images are taken every day khmag2018natural fan2018detail cambra2018generic guan2018deep yu2018deeper ; e.g., the latest Huawei Mate20 series mobile phones have over 60 megapixels. However, sensor shake is often inevitable, resulting in undesirable motion blur. Although sharp images might be obtained by fixing the device or retaking the image, on many occasions neither is possible, for example in remote sensing ineichen2018high , video surveillance sun2018dppdl , medical imaging hobbs2018physician and other related fields. Therefore, recovering sharp images from blurry ones has attracted researchers in many fields for many years, but the problem is still not well solved due to the complexity of the motion blur process and, most importantly, the rich details in high-resolution natural images. For example, when the blur kernel is complicated and the desired sharp images are rich in details, most existing methods may not produce satisfactory results, as can be seen in Figure 1.
Image deblurring is a kind of image degradation problem, which can be expressed as
$$I_b = A(I_s) + n,$$
where $I_b$ is the given blurred image, $I_s$ is the sharp image, $A$ is a degradation function and $n$ denotes possible noise. In this work, we focus on the case where the degradation process is shift invariant, so that the generation of a blurred image is given by
$$I_b = K * I_s + n,$$
where $*$ denotes 2D convolution and $K$ is the blur kernel. To obtain the sharp image and the blur kernel simultaneously, commonly used approaches are MAP zhang2011sparse chan1998total and Variational Bayes fergus2006removing levin2011efficient . Many methods have been proposed and explored in the literature. For example, Chan and Wong chan1998total proposed total variation to regularize the gradient of the sharp image. Zhang et al. zhang2011sparse proposed a sparse coding method for sharp image recovery. Cai et al. cai2009blind applied sparse representation to estimate the sharp image and the blur kernel at the same time. Although they obtain moderately good results, these methods cannot be applied to real applications and, most importantly, cannot handle high frequency features well.
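The shift-invariant blur model described above can be illustrated numerically. The following is a minimal sketch, assuming a grayscale image stored as a NumPy array, zero padding at the borders, an odd-sized kernel and Gaussian noise; the function name `blur` and its parameters are illustrative and do not come from the paper.

```python
import numpy as np


def blur(sharp, kernel, noise_sigma=0.0, seed=0):
    """Blurred image = kernel * sharp + noise ('same' size, zero padding).

    Assumes a 2D array and a kernel with odd height and width.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(sharp, ((ph, ph), (pw, pw)), mode="constant")
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros_like(sharp, dtype=float)
    for i in range(sharp.shape[0]):
        for j in range(sharp.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    if noise_sigma > 0:
        rng = np.random.default_rng(seed)
        out = out + noise_sigma * rng.standard_normal(sharp.shape)
    return out


# Usage: a single bright pixel blurred by a horizontal motion kernel of length 3.
sharp = np.zeros((5, 5))
sharp[2, 2] = 1.0
kernel = np.full((1, 3), 1.0 / 3.0)
blurred = blur(sharp, kernel)  # the pixel's energy is spread along row 2
```

Blind deblurring is the inverse task: only the blurred array is observed, and both the kernel and the sharp image have to be estimated.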
To achieve fast image deblurring, it is natural to consider deep learning, which pre-trains network models on a large amount of training data. Although the training process is computationally expensive, deep learning methods can process test images very efficiently, as they only need to pass an image through the learnt network. Most existing deep learning based methods are built upon the well-known Convolutional Neural Network (CNN) pan2018physics guo2018toward . However, CNNs tend to suppress the high frequency details in images. To relieve this issue, the generative adversarial network (GAN) goodfellow2014generative is a promising idea. Kupyn et al. kupyn2017deblurgan proposed a GAN based method that uses the ResBlocks architecture as the generator. Pan et al. pan2018physics used GANs to extract intrinsic physical features in images.
In this work, we show empirically that an encoder-decoder network architecture performs better for image deblurring problems. Specifically, we propose a GAN based method that uses a cycle consistency training strategy to ensure that the networks model high frequency features appropriately. We build a cycle generator, which transfers blurred images to the sharp domain and transfers the sharp images back to the blurred domain in a cycle manner. Different from previous works, we build two discriminators to distinguish the blurred and the sharp images separately. For the generator, an encoder-decoder based architecture performs better than a ResBlock based architecture. Experimental results show that the proposed method achieves competitive performance.
2 Related Works
Image deblurring is a classical problem in image processing and signal processing. Image deblurring methods can be divided into learning-based methods and learning-free methods.
Among learning-free methods, most works assume that the blur is shift invariant and caused by motion liu2014blind chandramouli2018plenoptic kotera2018motion , so that deblurring can be treated as a deconvolution problem liu2014blind krishnan2009fast wang2018training zhang2017learning . There are many ways to solve this problem; many works liu2014blind use Bayesian estimation, that is
$$p(I_s, K \mid I_b) \propto p(I_b \mid I_s, K)\, p(I_s)\, p(K),$$
where $I_b$ is the blurred image, $I_s$ the latent sharp image and $K$ the blur kernel.
One commonly used deblurring method is based on the Maximum A Posteriori (MAP) framework, where the latent sharp image and the blur kernel can be obtained by
$$(\hat{I}_s, \hat{K}) = \arg\max_{I_s, K}\; p(I_b \mid I_s, K)\, p(I_s)\, p(K).$$
Chan and Wong chan1998total proposed a robust total variation minimization method, which is effective for regularizing the gradient or the edges of the sharp image. Zhang et al. zhang2011sparse proposed a sparse coding method for sharp image recovery, which assumes that a natural image patch can be sparsely represented by an over-complete dictionary. Cai et al. cai2009blind applied sparse representation to estimate the sharp image and the blur kernel at the same time. Krishnan et al. krishnan2011blind found that the minimum of the loss function in many existing methods does not correspond to the real sharp image, so they proposed a normalized sparsity prior to handle this problem. Michaeli and Irani michaeli2014blind found that multiscale properties can also be used for blind deblurring, so they proposed self-similarity as an image prior. Ren et al. ren2016image proposed a low rank prior for both images and their gradients.
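To make the link between the MAP framework and these priors concrete, the following worked step is a sketch under two assumptions that are not stated in the text: the noise is i.i.d. Gaussian with variance $\sigma^2$, and the priors have the exponential forms $p(I_s) \propto \exp(-\lambda\,\rho(I_s))$ and $p(K) \propto \exp(-\gamma\,\psi(K))$. Here $I_b$, $I_s$ and $K$ denote the blurred image, the sharp image and the blur kernel, and $\rho$, $\psi$, $\lambda$, $\gamma$ are illustrative notation.

```latex
\begin{align}
(\hat{I}_s, \hat{K})
  &= \arg\max_{I_s, K}\; p(I_b \mid I_s, K)\, p(I_s)\, p(K) \\
  &= \arg\min_{I_s, K}\; -\log p(I_b \mid I_s, K) - \log p(I_s) - \log p(K) \\
  &= \arg\min_{I_s, K}\; \frac{1}{2\sigma^2}\,\lVert K * I_s - I_b \rVert_2^2
     + \lambda\,\rho(I_s) + \gamma\,\psi(K)
\end{align}
```

Under these assumptions, choosing $\rho(I_s) = \lVert \nabla I_s \rVert_1$ gives a total-variation-type regularizer of the kind used by Chan and Wong chan1998total , while sparse coding and low rank priors correspond to other choices of $\rho$.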
Another common approach to estimating the motion blur process is to maximize the marginal distribution
$$p(K \mid I_b) = \int p(K, I_s \mid I_b)\, dI_s$$
(with $I_b$, $I_s$ and $K$ the blurred image, the sharp image and the blur kernel).
Fergus et al. fergus2006removing proposed a motion deblurring method based on Variational Bayes. Levin et al. levin2011efficient proposed an Expectation Maximization (EM) method to estimate the blur process. The above two approaches have some drawbacks: they are hard to optimize, time consuming and cannot handle high frequency features well.
Learning-based methods, in contrast, use deep learning techniques, which aim to find intrinsic features that the models discover by themselves through the learning process. Deep learning lecun2015deep has boosted research in related fields such as image recognition krizhevsky2012imagenet , image segmentation he2017mask and so on. For deblurring with deep learning techniques, Kupyn et al. kupyn2017deblurgan trained a CNN architecture to learn the mapping function from blurred images to sharp ones. Pan et al. pan2018physics used a CNN architecture with a physics-based image prior to learn the mapping function.
One of the novel deep learning techniques is the Generative Adversarial Network (GAN), introduced by Goodfellow et al. goodfellow2014generative and inspired by the zero-sum game in game theory proposed by Nash nash1951non . GANs have achieved many exciting results in image inpainting yeh2017semantic and style transfer isola2017image zhu2017unpaired johnson2016perceptual , and can even be used in other fields such as material science sanchez2018inverse . The system includes a generator and a discriminator. The generator tries to capture the latent real data distribution and outputs a new data sample, while the discriminator tries to determine whether the input data comes from the real data distribution or not. Both the generator and the discriminator can be built on Convolutional Neural Nets lecun2015deep and trained according to the above ideas. Instead of feeding random noise as in the original generative adversarial nets goodfellow2014generative , conditional GANs dai2017towards take random noise together with discrete labels or even images isola2017image as input. Zhu et al. zhu2017unpaired took a step further: building on conditional GANs, they trained with a cycle consistency objective, which gives more realistic images in image transfer tasks. Inspired by these ideas, Kupyn et al. kupyn2017deblurgan proposed one of the first image deblurring algorithms based on Generative Adversarial Nets goodfellow2014generative .
3 Proposed Method
The goal of the image deblurring model proposed in this work is to recover sharp images given only blurred images, with no information about the blur process. To this end, we build a generative adversarial network based model. A CNN is trained as a generator that takes blurred images as inputs and outputs sharp images. In addition, we define critic rules and train these models in an adversarial manner. We denote the two distributions as $I_b \sim p_{data}(I_b)$ and $I_s \sim p_{data}(I_s)$, and the two mapping functions, or so-called generators in GANs, as $G_S: I_b \to I_s$ and $G_B: I_s \to I_b$. In addition, two discriminators $D_B$ and $D_S$ are introduced: $D_B$ tries to distinguish whether the input is blurred or not, while $D_S$ tries to distinguish whether the input is sharp or not. Our loss function contains two parts: an adversarial loss and a cycle loss. The architecture is shown in Figures 2 and 3.
3.1 Loss function
Our goal is to learn the mapping functions between the blurred domain $B$ and the sharp domain $S$, given samples $\{I_b^{(i)}\}_{i=1}^{N}$ where $I_b^{(i)} \in B$ and $\{I_s^{(j)}\}_{j=1}^{M}$ where $I_s^{(j)} \in S$. A combination of the following losses is used as our loss function:
$$\mathcal{L} = \mathcal{L}_{adv} + \alpha\, \mathcal{L}_{cycle},$$
where $\mathcal{L}$, $\mathcal{L}_{adv}$, $\mathcal{L}_{cycle}$ and $\alpha$ are the total loss function, the adversarial loss, the cycle loss and their weighting parameter, respectively. The adversarial loss tries to make the deblurred images as realistic as possible; the cycle loss tries to ensure that the deblurred images can be transferred back to the blurred domain, which also makes the deblurred images as realistic as possible. The two mapping functions $G_B: S \to B$ and $G_S: B \to S$ transfer sharp images to the blurred domain and blurred images to the sharp domain, respectively. The two corresponding discriminators $D_B$ and $D_S$ try to distinguish whether the input images are blurred or not and sharp or not, respectively. The adversarial loss is as follows:
$$\mathcal{L}_{adv} = \mathbb{E}_{I_b \sim p_{data}(I_b)}[\log D_B(I_b)] + \mathbb{E}_{I_s \sim p_{data}(I_s)}[\log(1 - D_B(G_B(I_s)))] + \mathbb{E}_{I_s \sim p_{data}(I_s)}[\log D_S(I_s)] + \mathbb{E}_{I_b \sim p_{data}(I_b)}[\log(1 - D_S(G_S(I_b)))],$$
where $D_B$ and $D_S$ try to distinguish whether the inputs come from the target distribution or not, and the generators $G_B$ and $G_S$ try to fool the discriminators and generate images that are as realistic as possible. Isola et al. isola2017image and Zhu et al. zhu2017unpaired showed that the least square loss mao2017least can perform better than the mean square loss in image style transfer tasks, and Kupyn et al. kupyn2017deblurgan used the least square loss mao2017least for image deblurring tasks. Since it is not yet known which objective, mean square loss or least square loss mao2017least , performs better in image deblurring problems, we carried out experiments to find the better choice.
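To make the candidate adversarial objectives concrete, the sketch below evaluates the standard GAN log-likelihood objective goodfellow2014generative and the least squares objective mao2017least on discriminator outputs. It is an illustration only: the function names are hypothetical, the discriminator outputs are assumed to be plain lists of numbers (probabilities strictly between 0 and 1 for the log objective), and the targets 1 for real and 0 for fake are one common choice for the least squares form.

```python
import math


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def log_gan_value(d_real, d_fake):
    """E[log D(real)] + E[log(1 - D(fake))].

    The discriminator tries to increase this value, the generator to decrease it.
    """
    real_term = _mean(math.log(p) for p in d_real)
    fake_term = _mean(math.log(1.0 - p) for p in d_fake)
    return real_term + fake_term


def lsgan_discriminator_loss(d_real, d_fake):
    """Least squares loss: push D(real) towards 1 and D(fake) towards 0."""
    return 0.5 * _mean((p - 1.0) ** 2 for p in d_real) + 0.5 * _mean(p ** 2 for p in d_fake)


def lsgan_generator_loss(d_fake):
    """Least squares loss for the generator: push D(fake) towards 1."""
    return 0.5 * _mean((p - 1.0) ** 2 for p in d_fake)


# Usage with made-up discriminator outputs for one domain.
d_on_real = [0.9, 0.8]
d_on_fake = [0.2, 0.1]
value = log_gan_value(d_on_real, d_on_fake)
d_loss = lsgan_discriminator_loss(d_on_real, d_on_fake)
g_loss = lsgan_generator_loss(d_on_fake)
```

In a cycle setting the same functions would be evaluated once per domain, for the blurred-domain and the sharp-domain discriminator, and the results summed.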
The cycle loss is computed on feature maps as
$$\mathcal{L}_{cycle} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left[ \left(\phi_{i,j}(I_b)_{x,y} - \phi_{i,j}(G_B(G_S(I_b)))_{x,y}\right)^2 + \left(\phi_{i,j}(I_s)_{x,y} - \phi_{i,j}(G_S(G_B(I_s)))_{x,y}\right)^2 \right],$$
where $G_S$ and $G_B$ are the blurred-to-sharp and sharp-to-blurred generators, $\phi_{i,j}$ is the feature map obtained from the i-th max-pooling layer after the j-th convolution layer of the VGG-19 network, and $W_{i,j}$ and $H_{i,j}$ are the dimensions of the corresponding feature maps. The perceptual loss can capture high level intrinsic features and has been shown to work well in image deblurring kupyn2017deblurgan and other image processing tasks isola2017image zhu2017unpaired .
The cycle loss aims to make the reconstructed images and the input images as close as possible under some measure. Classical choices for this measure are the L1 loss, the mean square loss, the least square loss mao2017least and the perceptual loss johnson2016perceptual . Our experiments showed that the perceptual loss johnson2016perceptual captures high frequency features in image deblurring tasks, which gives more texture and details. The perceptual loss is therefore used for evaluation in all experiments.
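A cycle loss of this kind can be sketched as follows. The code is illustrative rather than the implementation used in the experiments: `g_s`, `g_b` and `phi` are hypothetical callables standing in for the blurred-to-sharp generator, the sharp-to-blurred generator and a feature extractor, and the toy `phi` in the usage lines is a horizontal finite difference, not VGG-19. The default weight `alpha=10.0` follows the cycle consistency strength of 10 reported in the experiments.

```python
import numpy as np


def l1_cycle_loss(x, x_rec):
    """Mean absolute difference between an input and its reconstruction."""
    return float(np.mean(np.abs(x - x_rec)))


def feature_cycle_loss(x, x_rec, phi):
    """Mean squared difference between feature maps phi(x) and phi(x_rec)."""
    fx, fr = phi(x), phi(x_rec)
    return float(np.mean((fx - fr) ** 2))


def cycle_loss(blurred, sharp, g_s, g_b, phi):
    """Blurred -> sharp -> blurred plus sharp -> blurred -> sharp errors."""
    return (feature_cycle_loss(blurred, g_b(g_s(blurred)), phi)
            + feature_cycle_loss(sharp, g_s(g_b(sharp)), phi))


def total_loss(adversarial, cycle, alpha=10.0):
    """Total loss = adversarial loss + alpha * cycle loss."""
    return adversarial + alpha * cycle


# Usage with toy stand-ins: two mappings that invert each other exactly,
# so the cycle loss vanishes.
toy_phi = lambda img: np.diff(img, axis=1)
toy_g_s = lambda img: 2.0 * img
toy_g_b = lambda img: 0.5 * img
image = np.arange(12.0).reshape(3, 4)
loss = cycle_loss(image, image, toy_g_s, toy_g_b, toy_phi)
```

Because the feature loss compares `phi` outputs rather than pixels, a reconstruction that differs from the input only by a constant offset is not penalized by the toy `phi`, whereas the L1 loss would penalize it.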
3.2 Model Architecture
The goal of image deblurring is to map a blurred, low-quality input to a sharp, high-quality output. We use a generative adversarial network based model to deal with this problem. For the discriminator, instead of classifying whether the whole image is sharp or not, we use a PatchGAN based architecture that classifies each image patch of the whole image, which gives better results in image deblurring problems. For the objective used to distinguish whether the input is sharp or not, the perceptual objective johnson2016perceptual gives better results: rather than evaluating the results with mean square or least square objectives, it tries to capture the high frequency features of the two given inputs. So a PatchGAN based architecture isola2017image zhu2017unpaired with perceptual loss johnson2016perceptual is used as the discriminator in all experiments. Experiments showed that the PatchGAN based architecture isola2017image achieves good results if the image patch is a quarter of the size of the input image, so in this work we choose the patch size in all experiments according to the input image size.
For the generator, many previous learning-based methods use an encoder-decoder network such as that of Ronneberger et al. ronneberger2015u , shown in Figure 3, which tries to capture high frequency features; these distinguish blurred images from sharp images better than low-level features. Kupyn et al. kupyn2017deblurgan used the architecture proposed by Johnson et al. johnson2016perceptual as the generator for deblurring, which gives good performance. Comparative experiments, reported in Table 1, were carried out to find out which generator architecture and objective give better results in image deblurring problems. They showed that the optimal choice of generator architecture is the U-net based architecture and the optimal optimization objective is the least square loss. So this generator architecture is used in our model in the following experiments. The whole model and the generator architecture are shown in Figure 2 and Figure 3.
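The patch-level decision of the discriminator can be illustrated with a simplified sketch. A real PatchGAN is a fully convolutional network whose outputs have overlapping receptive fields; the code below instead cuts the image into non-overlapping patches and applies a hypothetical `score_fn` to each, which is only meant to show how one score per patch replaces a single score per image. Halving each side, as in the usage lines, gives patches that cover a quarter of the image area, which is one possible reading of the quarter-size patch mentioned above.

```python
import numpy as np


def split_into_patches(image, patch_h, patch_w):
    """Return an array of shape (n_h, n_w, patch_h, patch_w) of non-overlapping patches."""
    h, w = image.shape
    n_h, n_w = h // patch_h, w // patch_w
    cropped = image[:n_h * patch_h, :n_w * patch_w]  # drop incomplete border patches
    return cropped.reshape(n_h, patch_h, n_w, patch_w).swapaxes(1, 2)


def patch_scores(image, patch_h, patch_w, score_fn):
    """Apply score_fn to every patch and return the (n_h, n_w) score map."""
    patches = split_into_patches(image, patch_h, patch_w)
    n_h, n_w = patches.shape[:2]
    return np.array([[score_fn(patches[i, j]) for j in range(n_w)]
                     for i in range(n_h)])


def image_score(image, patch_h, patch_w, score_fn):
    """Average the patch decisions into one score for the whole image."""
    return float(np.mean(patch_scores(image, patch_h, patch_w, score_fn)))


# Usage: a 4x4 image split into four 2x2 patches, scored by local variance
# as a toy stand-in for a learned sharpness classifier.
image = np.arange(16.0).reshape(4, 4)
score_map = patch_scores(image, 2, 2, np.var)
```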
4 Experimental Results
Experiments were performed on a workstation with an NVIDIA Tesla K80 GPU. The proposed network was trained on images sampled randomly from the GoPro dataset, which were then divided into training and test sets. Figure 4 gives some images sampled from the dataset built in this paper. We sampled 10,100 images, resized each of them to a fixed size, and applied the blur process proposed by Kupyn et al. kupyn2017deblurgan . We randomly chose 10,000 images for the training procedure and the remaining 100 images for testing.
We train the network on paired data with a batch size of 2 for 50 epochs over the training data. The reconstructed images are regularized with the cycle consistency objective with a strength of 10. No dropout was used since the model does not overfit within 50 epochs. For the image pairs used during training and testing, we use a method similar to the one proposed by Kupyn et al. kupyn2017deblurgan , which produces more realistic motion blur than other existing methods xu2014deep sun2015learning . For the optimization procedure, we perform 10 steps on the two discriminators, and then one step on the two generators. We use the Adam kingma2014adam optimizer with a constant learning rate in the first 40 epochs, and then linearly decay the learning rate to zero over the following epochs to ensure convergence. Instance Normalization is applied to both the generator and the discriminators to boost convergence. We choose some deblurred results from different scenes, which are given in Figures 5, 6, 7, 8 and Tables 2, 3, 4, 5, respectively.
For image evaluation, most works use the full reference measures PSNR and SSIM in all their experiments pan2014deblurring pan2016blind pan2018deblurring pan2018physics wang2004image , which need reference images (ground truth) during assessment. Among other image quality assessments, VIF sheikh2004image captures wavelet features, which focus on high frequency content, and IFC sheikh2005information puts more weight on edge features. Lai et al. lai2016comparative point out that the full reference image assessments VIF and IFC are better than PSNR and SSIM. So in this paper we take a step further and use PSNR, SSIM wang2004image , the additional full reference methods MS-SSIM wang2003multiscale , IFC sheikh2005information and VIF sheikh2004image , and one no-reference image quality assessment, NIQE mittal2013making , in all experiments.
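The optimization schedule described above can be sketched in plain Python. This is an illustration under stated assumptions: the base learning rate is left as a parameter because its value is not given here, the ten repeated steps are taken to be discriminator updates followed by one generator update, and epochs are counted from zero. The function names are hypothetical.

```python
def learning_rate(epoch, base_lr, constant_epochs=40, total_epochs=50):
    """Constant rate for the first `constant_epochs`, then linear decay.

    The rate reaches zero at epoch index `total_epochs`.
    """
    if epoch < constant_epochs:
        return base_lr
    remaining = max(total_epochs - epoch, 0)
    return base_lr * remaining / (total_epochs - constant_epochs)


def update_schedule(num_rounds, d_steps=10, g_steps=1):
    """Yield 'D' for each discriminator step and 'G' for each generator step."""
    for _ in range(num_rounds):
        for _ in range(d_steps):
            yield "D"
        for _ in range(g_steps):
            yield "G"


# Usage: relative learning rate per epoch and the order of updates in one round.
relative_rates = [learning_rate(epoch, base_lr=1.0) for epoch in range(50)]
one_round = list(update_schedule(1))
```

With these defaults each round performs ten discriminator updates before the generators are updated once, and the relative learning rate stays at 1.0 through epoch 40 before falling linearly.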
For the experimental comparison, we choose the learning-free methods proposed by Pan et al. pan2014deblurring , Pan et al. pan2016blind and Xu and Jia xu2014deep , and for fairness we also choose one learning-based method proposed by Kupyn et al. kupyn2017deblurgan . The salient regions are marked in each image.
The results show that the model proposed in this work outperforms many existing image deblurring models: it recovers more high frequency textures and details, and the salient regions are marked in Figures 5, 6, 8. Our model outperforms many existing learning-free and learning-based methods in most full reference assessments and in human visual evaluation; the results are shown in Tables 2, 3, 4, 5. However, our model does not perform well in the no-reference assessment. Some results with higher scores (e.g. in Tables 2, 3, 4, 5) do not perform well in human visual evaluation, so we think that NIQE may not be applicable to the assessment of image deblurring. The results also show that our model can handle blur caused by motion or camera shake, and the recovered images have fewer artifacts than those of many existing methods.
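Of the full reference measures used in these comparisons, PSNR is the simplest to state. The following is a minimal sketch, assuming 8-bit images (peak value 255) stored as NumPy arrays of equal shape; the function name `psnr` is illustrative.

```python
import numpy as np


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a restored image."""
    reference = np.asarray(reference, dtype=float)
    restored = np.asarray(restored, dtype=float)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Usage: compare a ground-truth image with a slightly perturbed copy.
ground_truth = np.full((8, 8), 128.0)
deblurred = ground_truth + 2.0
score = psnr(ground_truth, deblurred)
```

A higher PSNR means the restored image is closer to the reference in the mean square sense. It does not measure structural or perceptual similarity, which is one reason to report SSIM, MS-SSIM, VIF and IFC alongside it.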
5 Conclusion
We showed that an encoder-decoder based architecture performs better for image deblurring problems than a ResBlock based architecture. For the optimization objective, the least square loss performs better than the mean square loss. The experiments showed that the proposed model deals well with image deblurring problems without any domain specific knowledge. It recovers more high frequency textures and details, outperforming many competitive methods on most full reference image quality assessments and in human visual evaluation. Our model also handles blur caused by motion or camera shake, and the recovered images have fewer artifacts than those of many existing methods.
6 Compliance with Ethical Standards
Funding: This study was funded by the National Natural Science Foundation of China (NSFC) under grant 61502238, grant 61622305, grant 6170227 and the Natural Science Foundation of Jiangsu Province of China (NSFJS) under grant BK20160040.
Conflict of Interest: The authors declare that they have no conflict of interest.
Acknowledgements. We thank Guangcan Liu, Yubao Sun, Jinshan Pan and Jiwei Chen for their helpful discussions and advice.
References
- (1) Cai, J.F., Ji, H., Liu, C., Shen, Z.: Blind motion deblurring from a single image using sparse approximation. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 104–111. IEEE (2009)
- (2) Cambra, A.B., Murillo, A.C., Muñoz, A.: A generic tool for interactive complex image editing. The Visual Computer 34(11), 1493–1505 (2018)
- (3) Chan, T.F., Wong, C.K.: Total variation blind deconvolution. IEEE Transactions on Image Processing 7(3), 370–375 (1998)
- (4) Chandramouli, P., Jin, M., Perrone, D., Favaro, P.: Plenoptic image motion deblurring. IEEE Transactions on Image Processing 27(4), 1723–1734 (2018)
- (5) Chollet, F., et al.: Keras (2015)
- (6) Dai, B., Fidler, S., Urtasun, R., Lin, D.: Towards diverse and natural image descriptions via a conditional gan. arXiv preprint arXiv:1703.06029 (2017)
- (7) Fan, Q., Shen, X., Hu, Y.: Detail-preserved real-time hand motion regression from depth. The Visual Computer pp. 1–10 (2018)
- (8) Fergus, R., Singh, B., Hertzmann, A., Roweis, S.T., Freeman, W.T.: Removing camera shake from a single photograph. In: ACM transactions on graphics (TOG), vol. 25, pp. 787–794. ACM (2006)
- (9) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680 (2014)
- (10) Guan, H., Cheng, B.: How do deep convolutional features affect tracking performance: an experimental study. The Visual Computer 34(12), 1701–1711 (2018)
- (11) Guo, S., Yan, Z., Zhang, K., Zuo, W., Zhang, L.: Toward convolutional blind denoising of real photographs. arXiv preprint arXiv:1807.04686 (2018)
- (12) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Computer Vision (ICCV), 2017 IEEE International Conference on, pp. 2980–2988. IEEE (2017)
- (13) Hobbs, J.B., Goldstein, N., Lind, K.E., Elder, D., Dodd III, G.D., Borgstede, J.P.: Physician knowledge of radiation exposure and risk in medical imaging. Journal of the American College of Radiology 15(1), 34–43 (2018)
- (14) Ineichen, P.: High turbidity solis clear sky model: Development and validation. Remote Sensing 10(3), 435 (2018)
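- (15) Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. arXiv preprint (2017)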
- (16) Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision, pp. 694–711. Springer (2016)
- (17) Khmag, A., Al Haddad, S., Ramlee, R., Kamarudin, N., Malallah, F.L.: Natural image noise removal using nonlocal means and hidden markov models in transform domain. The Visual Computer 34(12), 1661–1675 (2018)
- (18) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- (19) Kotera, J., Šroubek, F.: Motion estimation and deblurring of fast moving objects. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2860–2864. IEEE (2018)
- (20) Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: Advances in Neural Information Processing Systems, pp. 1033–1041 (2009)
- (21) Krishnan, D., Tay, T., Fergus, R.: Blind deconvolution using a normalized sparsity measure. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 233–240. IEEE (2011)
- (22) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)
- (23) Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261–2269 (2017)
- (24) Lai, W.S., Huang, J.B., Hu, Z., Ahuja, N., Yang, M.H.: A comparative study for single image blind deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1709 (2016)
- (25) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
- (26) Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Efficient marginal likelihood optimization in blind deconvolution. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 2657–2664. IEEE (2011)
- (27) Liu, G., Chang, S., Ma, Y.: Blind image deblurring using spectral properties of convolution operators. IEEE Transactions on Image Processing 23(12), 5047–5056 (2014)
- (28) Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Smolley, S.P.: Least squares generative adversarial networks. In: Computer Vision (ICCV), 2017 IEEE International Conference on, pp. 2813–2821. IEEE (2017)
- (29) Michaeli, T., Irani, M.: Blind deblurring using internal patch recurrence. In: European Conference on Computer Vision, pp. 783–798. Springer (2014)
- (30) Mittal, A., Soundararajan, R., Bovik, A.C.: Making a "completely blind" image quality analyzer. IEEE Signal Process. Lett. 20(3), 209–212 (2013)
- (31) Nash, J.: Non-cooperative games. Annals of mathematics pp. 286–295 (1951)
- (32) Pan, J., Hu, Z., Su, Z., Yang, M.H.: Deblurring face images with exemplars. In: European Conference on Computer Vision, pp. 47–62. Springer (2014)
- (33) Pan, J., Liu, Y., Dong, J., Zhang, J., Ren, J., Tang, J., Tai, Y.W., Yang, M.H.: Physics-based generative adversarial models for image restoration and beyond. arXiv preprint arXiv:1808.00605 (2018)
- (34) Pan, J., Sun, D., Pfister, H., Yang, M.H.: Blind image deblurring using dark channel prior. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1628–1636 (2016)
- (35) Pan, J., Sun, D., Pfister, H., Yang, M.H.: Deblurring images via dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(10), 2315–2328 (2018)
- (36) Ren, W., Cao, X., Pan, J., Guo, X., Zuo, W., Yang, M.H.: Image deblurring via enhanced low-rank prior. IEEE Transactions on Image Processing 25(7), 3426–3437 (2016)
- (37) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Springer (2015)
- (38) Sanchez-Lengeling, B., et al.: Inverse molecular design using machine learning: Generative models for matter engineering. Science 361(6400), 360–365 (2018)
- (39) Sheikh, H.R., Bovik, A.C.: Image information and visual quality. In: Acoustics, Speech, and Signal Processing, 2004. Proceedings.(ICASSP’04). IEEE International Conference on, vol. 3, pp. iii–709. IEEE (2004)
- (40) Sheikh, H.R., Bovik, A.C., De Veciana, G.: An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Transactions on Image Processing 14(12), 2117–2128 (2005)
- (41) Sun, J., Cao, W., Xu, Z., Ponce, J.: Learning a convolutional neural network for non-uniform motion blur removal. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 769–777 (2015)
- (42) Sun, Z., Zhang, Q., Li, Y., Tan, Y.a.: Dppdl: a dynamic partial-parallel data layout for green video surveillance storage. IEEE Transactions on Circuits and Systems for Video Technology 28(1), 193–205 (2018)
- (43) Wang, R., Tao, D.: Training very deep cnns for general non-blind deconvolution. IEEE Transactions on Image Processing 27(6), 2897–2910 (2018)
- (44) Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
- (45) Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, pp. 1398–1402. IEEE (2003)
- (46) Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems, pp. 1790–1798 (2014)
- (47) Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models. In: CVPR, vol. 2, p. 4 (2017)
- (48) Yu, Z., Liu, Q., Liu, G.: Deeper cascaded peak-piloted network for weak expression recognition. The Visual Computer 34(12), 1691–1699 (2018)
- (49) Zhang, H., Yang, J., Zhang, Y., Huang, T.S.: Sparse representation based blind image deblurring. In: Multimedia and Expo (ICME), 2011 IEEE International Conference on, pp. 1–6. IEEE (2011)
- (50) Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep cnn denoiser prior for image restoration. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2 (2017)
- (51) Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint (2017)