Introduction
Images are representations of visual information in digital form, but the image acquisition and formation process degrades the information of the original scene while capturing it. Blur, pointwise nonlinearities, and noise are common forms of degradation introduced by the image sensing system. Image blur is an unavoidable form of information degradation; it amounts to a bandwidth reduction of the image caused by the image formation process. One of the simplest remedies is to capture images with shorter exposure intervals during acquisition, but in that case noise formation becomes inevitable when capturing images in dark lighting conditions. Another possible solution is information recovery and reconstruction as an offline process. Many digital image processing techniques have been developed to preserve visual information and to increase the quality of captured images offline. The main objective of information recovery and restoration is to estimate the possible information of the original scene from the degraded images. It serves many areas such as astronomy, medical image processing, and satellite image processing, as well as the commercial photographic industry. Information recovery and reconstruction can be carried out by image inpainting and deblurring processes
[1] [2] [3] [4]. Image inpainting is the process of generating plausible information to fill damaged or missing regions in an image by utilizing the available information; applications include restoration of old images, removal of scratches, text, and special effects, and filling of damaged regions [5]. The inpainting process can be broadly classified into two categories: structural inpainting and texture inpainting. Structural inpainting is concerned with propagating structure into the missing region and synthesizing texture in that area, which is very effective for inpainting small regions. Texture inpainting uses global information from multiple images to fill the missing regions, which is very effective for large missing areas
[6]. In the early 1990s, the first image inpainting model was proposed, based on nonlinear partial differential equations, to restore the information in damaged images. In this method, the gray-level information is propagated in the direction of the isophotes to obtain a full image
[7]. Bertozzi et al. proposed a method for inpainting images using the two-dimensional Navier-Stokes equations of fluid dynamics [8]. Cheng-Shian Lin and Jin-Jang Leou suggested another four-step approach to inpainting [9]. Marcelo Bertalmio et al. proposed a hybrid approach using both structure and texture inpainting. The idea of this approach is to decompose the image into two functions by their character and perform structure and texture filling on those functions [10]. Information degradation caused by blur effects produces visually unattractive images. Fast-moving objects, image acquisition in dim lighting conditions, capturing distant objects, and regions outside the focused area of an image are typical sources of blur, where even high-speed, high-resolution sensing systems perform very badly [11]. Image deblurring is an inverse problem in which the information of sharp images is reconstructed or recovered from degraded images [12]. Numerous investigations have addressed deblurring with non-blind and blind methods. In non-blind methods such as Richardson-Lucy [13] and the Wiener filter [14], the blur kernel is assumed to be known and the information recovery is carried out using the blurred images and the kernels. In blind methods, the kernel is unknown and must be estimated from the blurred images. Recent developments in deblurring tackle both the blind and non-blind problems. Due to the ill-posed nature of the blur kernels, as well as noise in the kernel, the acquired blurred images do not exactly represent the information of the original scene, and the true kernels are significantly mismatched [15].
Recent developments in Convolutional Neural Networks (CNNs) provide the possibility of addressing this limitation. The architecture of CNNs, their learning, hyperparameter optimization, and their limitations are elaborately discussed in our previous studies
[16] [17] [18] [19]. Another remarkable invention based on the CNN architecture is the Generative Adversarial Network (GAN). A typical GAN architecture has two convolutional neural networks. Image style transformation, deblurring GANs, and text-to-image synthesis are among the most promising developments in the translation and transformation of image information. The Generative Adversarial Network is an artificial intelligence technique consisting of two networks, a discriminator and a generator, competing with each other in a zero-sum game. The idea of this generative model was first introduced by Ian Goodfellow
[20]. The generator network learns to map from the latent space and generates images from the data distribution, while the discriminator network discriminates between the real data distribution and the generator's data distribution. The training objective of the generator is to increase the error rate of the discriminative network. Subsequent developments such as DCGAN and improved DCGAN improved the performance of the vanilla GAN. In DCGAN, the authors highlighted the importance of Batch Normalization in both the generator and discriminator modules, as well as the importance of avoiding fully connected layers and of using strided convolutions instead of pooling
[21]. The techniques proposed in improved DCGAN allow the generator to produce high-resolution images; the authors suggest various training enhancements such as feature matching, historical averaging, one-sided label smoothing, and virtual batch normalization [22]. Another GAN variant is the conditional GAN, known as CGAN, where utilizing label information results in better quality of image generation and more control over the generated images. One important invention in GANs is the Wasserstein GAN, which overcomes the limitations of the vanilla GAN by optimizing the learning parameters using the Wasserstein (Earth Mover's) distance as the objective function [23]. Super-resolution GAN
[24], pix2pix GAN, and CycleGAN
[25] are notable and promising inventions among generative models that make it possible to translate image information from a real data distribution given a noisy input distribution. In super-resolution GAN, the authors proposed a content loss objective alongside the adversarial loss function. The content loss is the Euclidean distance between high-level features of the generated and real image distributions, which allows the generation of images more similar to the high-resolution original. Yeh et al. proposed an architecture for image inpainting to fill in the information in the missing regions of images [26]. Ramakrishnan et al. proposed a kernel-free blind algorithm to deblur images using pix2pix and fully connected dense layers [27]. From these recent inventions, it is clearly understood that GAN networks have the potential to preserve the inherent textural and structural information and to generate convincing images that look close to the real data distribution.
1 Proposed Network
Recently, Generative Adversarial Networks have been playing an important role in supervised, semi-supervised, and unsupervised vision tasks, as these generative models implicitly learn the probability density of high-dimensional data distributions and generate natural-looking images. The generator and the discriminator in the GAN compete with each other in a zero-sum game to optimize the learning parameters. The schematic of the GAN is shown in Figure 1. The generator produces natural-looking data samples from noise input to fool the discriminator, while the discriminator tries to distinguish the generated samples from the real data. Both the forger (generator) and the expert (discriminator) learn simultaneously by minimizing the distance between the probability distributions of real and generated data. However, while the discriminator has access to both the generated data and the real data, the generator has no access to the real data distribution; the discriminator's feedback is the only information about the ground truth that the generator receives. The same noise distribution is used throughout training so that the generator learns to produce natural-looking images close to the real data with superior quality. The generator and the discriminator are composed of deep convolutional layers and fully connected dense layers. Since gradients must flow through both networks during training, both modules have to be continuous and differentiable everywhere.
In a typical GAN architecture, the discriminator network $D$ maps the generated image distribution to the real data distribution, and the generator $G$ learns to map representations $z$ from the latent space to the space of the data distribution, where $z$ represents a sample from the latent space. For a fixed generator $G$, the discriminator $D$ is trained to classify images as fake or real. Once the discriminator is trained optimally, the generator continues to learn to generate images close to the original images. The captured statistical distribution of the training data can be applied to solve a wide variety of problems such as semantic image editing, style transfer, data augmentation, image retrieval, translation, and transformation. A detailed overview of super-resolution, style transfer, and photo generation is given in
[25]. The conditional GAN, known as cGAN, learns the statistical distribution from the training data and the noise vector $z$ by placing a condition on the discriminator. The Markovian discriminator makes it possible to achieve perceptually superior results in generating images from label maps, reconstructing objects from edge maps, and colorizing images.
1.1 Generator and Discriminator
The generator network architecture is shown in Figure 2. It consists of one convolutional block at the head, two convolutional blocks at the rear, and seven residual blocks. Each residual block consists of four sequential units of a convolutional layer, an instance normalization layer, and an activation function. The output of the third activation layer in every residual block is internally connected with the output of the first activation layer in the next residual block. Along with these local connections, a global skip connection is also introduced. Dropout with a probability of 50% is implemented in the residual blocks. InstanceNorm and LeakyReLU are introduced after every convolutional layer except the last. Reutilization of features between subsequent layers allows the network to reconstruct the possible information from the learned features. It is also noted that the performance of the architecture remains high even with a smaller network. The discriminator in the GAN architecture is the expert that distinguishes real data from generated data; in other words, it helps the generator to produce more realistic information from the learned data distribution. In this architecture, a Markovian patch discriminator with ten convolutional layers is implemented, which also enforces correct coloration in the generated natural images.
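As an illustrative aside, the residual connection with instance normalization described above can be sketched in a few lines of NumPy. This is not the trained network: the identity function stands in for a learned convolution, no affine parameters are learned, and the LeakyReLU slope of 0.2 is an assumption.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # Normalize each channel of a (C, H, W) feature map independently,
    # as InstanceNorm does (no learned affine parameters in this sketch).
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)

def leaky_relu(x, slope=0.2):
    # Assumed slope of 0.2; the paper does not state the exact value here.
    return np.where(x > 0, x, slope * x)

def residual_block(x, conv):
    # One simplified residual unit: conv -> InstanceNorm -> LeakyReLU,
    # then the input is added back through the skip connection.
    out = leaky_relu(instance_norm(conv(x)))
    return x + out

# The identity "convolution" stands in for a learned layer in this sketch.
feat = np.random.randn(8, 16, 16)
out = residual_block(feat, conv=lambda t: t)
print(out.shape)  # (8, 16, 16)
```

The skip connection leaves the feature-map shape unchanged, which is what allows many such blocks to be chained and a global skip connection to be added around them.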
1.2 Loss function
The objective of the generator is to learn a distribution $p_g$ that approximates the real distribution $p_r$, and to generate samples $x$ such that the probability density of the generated samples $p_g(x)$ equals the probability density of the real samples $p_r(x)$. This can be approached either by learning a differentiable function for $p_g(x)$ directly and optimizing it through maximum likelihood, or by learning a differentiable transformation function $G(z)$ of a latent variable $z$ and optimizing it through maximum likelihood, where $z$ is drawn from an existing common distribution $p_z$ such as a uniform or Gaussian distribution.
The discriminator has to recognize data from the real data distribution $p_r(x)$, where $D(x)$ indicates the estimated probability that a data point $x$ is real. In the case of binary classification, if the estimated probability of the positive class is $\hat{y}$ and that of the negative class is $1-\hat{y}$, the cross entropy between $y$ and $\hat{y}$ is,

$H(y, \hat{y}) = -y \log \hat{y} - (1-y) \log(1-\hat{y})$ (1)
For a given point $x_i$ and corresponding label $y_i$, Eq. (1) can be expressed as,

$L(x_i, y_i) = -y_i \log D(x_i) - (1-y_i) \log(1 - D(x_i))$ (2)
It can be understood from the above equation that one of the terms vanishes depending on the value of $y_i$. For the entire dataset distribution, the above equation can be written as,

$L = -\sum_{i} \left[ y_i \log D(x_i) + (1-y_i) \log(1 - D(x_i)) \right]$ (3)
In a Generative Adversarial Network, the data can come either from the real distribution $p_r$ or from the generator distribution $p_g$. In addition, we expect exactly half of the data from each of the two sources. To encode this information probabilistically in Eq. (3), the sum is replaced with an expectation and the label $y$ with the value $\frac{1}{2}$. Hence, the loss function is,

$L = -\frac{1}{2}\,\mathbb{E}_{x \sim p_r}[\log D(x)] - \frac{1}{2}\,\mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$ (4)
In optimizing this value, for the given real data distribution $p_r$, the discriminator makes its estimate over the real data accurate by maximizing $\mathbb{E}_{x \sim p_r}[\log D(x)]$, and for a fake sample $G(z)$ from the generator it pushes $D(G(z))$ close to zero by maximizing $\mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$. On the other hand, the generator is trained to increase the chance that the discriminator assigns a high probability to the fake data, by minimizing $\mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$. Hence, the generator and the discriminator fight each other in a min-max game over the value function,

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_r}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$ (5)
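For intuition, the binary cross-entropy discriminator loss of Eq. (4) can be evaluated numerically on toy scalar "probabilities"; the values below are made up purely for illustration.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Eq. (4): L = -1/2 E[log D(x)] - 1/2 E[log(1 - D(G(z)))],
    # with expectations replaced by sample means over mini-batches.
    return -0.5 * np.mean(np.log(d_real)) - 0.5 * np.mean(np.log(1.0 - d_fake))

# A confident discriminator (real -> ~1, fake -> ~0) gives a small loss.
confident = discriminator_loss(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
# A guessing discriminator (everything -> 0.5) gives loss log 2.
guessing = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
print(confident < guessing)  # True
```

The log 2 value for a guessing discriminator corresponds to the equilibrium point of the min-max game, where the discriminator cannot distinguish real from generated samples.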
If the discriminator is trained to optimality before each generator parameter update, minimizing the loss function is equal to minimizing the Jensen-Shannon divergence between the real data distribution $p_r$ and the generated data distribution $p_g$. However, optimizing the value function in Eq. (5) suffers from vanishing gradients and mode collapse as the discriminator saturates. Arjovsky et al. [23] proposed another way to measure the distance between probability distributions, based on the Wasserstein distance, known as the Earth Mover distance. It can be stated as the minimum transportation cost of moving mass from the distribution $p_g$ into the distribution $p_r$, provided the distributions are continuous and differentiable everywhere. By using the Kantorovich-Rubinstein duality, the Wasserstein loss function can be expressed as,

$W(p_r, p_g) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim p_r}[f(x)] - \mathbb{E}_{x \sim p_g}[f(x)]$ (6)
where $\|f\|_L \le 1$ denotes the set of 1-Lipschitz functions $f$. To ensure after optimization of the network that the discriminator's probability estimate stays within the set of 1-Lipschitz functions, the authors introduced weight clipping, though it leads to undesirable results. In order to overcome this, a gradient penalty term is introduced,

$L_{GP} = \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}} \left[ \left( \|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1 \right)^2 \right]$ (7)

where $\hat{x}$ is sampled along straight lines between pairs of real and generated samples.
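The effect of the gradient penalty in Eq. (7) can be sketched with a toy linear critic, for which the input gradient is available in closed form. The interpolation point and the weight λ = 10 follow the gradient-penalty construction; the critic itself is a made-up stand-in, not the paper's discriminator.

```python
import numpy as np

LAMBDA = 10.0  # gradient-penalty weight, kept at 10 as in the original paper

def gradient_penalty(w, x_real, x_fake, eps):
    # Toy linear critic D(x) = w . x, whose gradient w.r.t. x is simply w,
    # so the penalty of Eq. (7) can be computed in closed form.
    x_hat = eps * x_real + (1.0 - eps) * x_fake  # sample along the line
    grad = w                                      # grad_x (w . x) = w
    return LAMBDA * (np.linalg.norm(grad) - 1.0) ** 2

w = np.array([0.6, 0.8])  # ||w|| = 1, so the critic is already 1-Lipschitz
x_r, x_f = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(gradient_penalty(w, x_r, x_f, eps=0.3))      # close to 0
print(gradient_penalty(2 * w, x_r, x_f, eps=0.3))  # close to 10 * (2 - 1)^2
```

The penalty is zero exactly when the critic's gradient norm is one, which is the constraint the optimization drives the discriminator toward.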
As stated in the original paper, $\lambda$ is kept at 10 during learning. When the gradient penalty in Eq. (7) is fully optimized by backpropagation, the discriminator acts as a 1-Lipschitz function. However, if the image latent information transforming GAN architecture is trained without the perceptual content loss, the network does not converge to a meaningful state. The perceptual loss is simply an $L_2$ loss, defined as the Euclidean difference between deep network feature maps of the real data samples and the generated data samples. The perceptual loss is given in Eq. (8),

$L_X = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^S)_{x,y} - \phi_{i,j}(G(I^B))_{x,y} \right)^2$ (8)

where $\phi_{i,j}$ represents the feature maps of a deep Convolutional Neural Network before the max-pooling layer, and $W_{i,j}$ and $H_{i,j}$ are the pixel dimensions of the feature maps. The adversarial network loss function is a combination of the above loss functions (7) and (8). Hence, the total loss is given by,

$L_{total} = L_{adv} + \lambda_X L_X$
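A minimal sketch of the perceptual loss in Eq. (8), using small made-up arrays in place of the CNN feature maps $\phi_{i,j}$ (a real implementation would extract these from a pretrained network):

```python
import numpy as np

def perceptual_loss(feat_real, feat_gen):
    # Eq. (8): squared Euclidean difference between feature maps,
    # normalized by the feature-map dimensions W and H.
    w, h = feat_real.shape
    return np.sum((feat_real - feat_gen) ** 2) / (w * h)

# Hypothetical 4x4 "feature maps" stand in for CNN activations.
phi_sharp = np.ones((4, 4))
phi_gen = np.zeros((4, 4))
print(perceptual_loss(phi_sharp, phi_gen))    # 1.0
print(perceptual_loss(phi_sharp, phi_sharp))  # 0.0
```

Because the loss is computed on feature maps rather than raw pixels, it penalizes differences in high-level content rather than exact pixel values.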
2 Dataset preparation
The real blurs in images are extremely complex and cannot be approximated by a simple parametric model to generate them synthetically. Also, it is very unlikely to acquire image pairs of a blurred image and the corresponding sharp image to train the GAN network. Hence, the image pairs are created artificially. High-resolution sharp images are collected from various mobile phone cameras. A YOLO network is trained to localize the faces in the images, and the cropped face images are later used for training. The main objective of generating the blurred images is to degrade the information present in the original image. To that end, high degrees of motion blur, camera shake blur, and defocus blur are applied to the original images, and image pairs are created for training. Since we only want to learn the statistical information about the sharp images, a repeated image may also be present in the training set, but with a different blur. The original images and the corresponding blurred images are taken for training the GAN network. Many investigations have successfully developed synthetic blur images; these methods propose that synthetic blurred images can be created by convolving the sharp images with a linear motion blur kernel, or by randomly sampling random points and fitting a spline. Our synthetic generation of blurred images is concerned with varying the direction of the blur kernel.
2.1 Image blur generation
The blur kernel, known as the point spread function, causes an image pixel to record light photons from multiple scene points. In practice, many factors can cause image blur that degrades the information and quality of the objects appearing in the scene [28]. Commonly, image blur can be induced by object motion, atmospheric turbulence, intrinsic physical properties, camera shake, and defocus. In classical deblurring methods, information recovery from the degraded images requires understanding the kernel and appropriately modeling the information present. It is also highly complicated to generate blurred images as they would occur in real conditions. Hence, it is necessary to understand the image formation model. Image formation encodes radiometric and geometric information by projecting the 3D world onto the 2D focal plane. The light rays passing through the camera lens are projected onto the focal plane; this can be modeled as a concatenation of the perspective projection and geometric distortion. The digital information of the image is formed by discretization of the analog image transduced from the light photons [29]. This can be expressed as,
$B = S\left(k_e * k_i * P(X)\right)$ (9)

where $B$ represents the observed blurred image as a function of the sampling operator $S$, $P(X)$ represents the perspective projection of the real planar scene, $k_e$ represents the extrinsic blur kernel, $k_i$ represents the intrinsic blur kernel, and $*$ represents the convolution operation. Eq. (9) shows the blurred image formation. Ignoring the sampling effects, information recovery from the blurred image can be expressed from the model,
$B = h * I$ (10)

where $I$ is the latent sharp image recovered from $B$, and $h$ is the estimated kernel combining the extrinsic and intrinsic blurs. The general objective is to recover the information $I$ and the kernel $h$ from the degraded observed blurred image $B$. For simplification, the effect of the camera response function can be neglected and the blur generation can be written as,
$B = h * I + n$ (11)
Considering the whole image, this can be expressed in matrix-vector form as follows,
$b = Hi + n$ (12)
where $n$ is the noise that appears in the image information due to the sensing system; $n$ can be modeled as Gaussian noise, Poisson noise, or impulse noise. The various degradation models based on these noise assumptions are shown in Figure 3.
Though many other factors cause blur in images, $h$ can be characterized by the specific properties of the blur, which leads to different types of blur such as motion blur, camera shake blur, defocus blur, and atmospheric or intrinsic blur.
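The matrix-vector degradation model of Eq. (12) can be sketched for a 1D signal by explicitly building the blur matrix H. The three-tap kernel below is made up for illustration, and noise is set to zero.

```python
import numpy as np

def blur_matrix(kernel, n):
    # Build the matrix H so that H @ i equals the (zero-padded)
    # sliding-window blur of a length-n signal i with the kernel h.
    k = len(kernel)
    r = k // 2
    H = np.zeros((n, n))
    for row in range(n):
        for t, kv in enumerate(kernel):
            col = row + t - r
            if 0 <= col < n:
                H[row, col] = kv
    return H

kernel = np.array([0.25, 0.5, 0.25])     # made-up 1D blur kernel h
i = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # latent sharp signal (a point)
H = blur_matrix(kernel, len(i))
n_noise = np.zeros(len(i))               # noiseless case, n = 0
b = H @ i + n_noise                      # Eq. (12): b = H i + n
print(b)
```

Applying H to a point source spreads its energy across neighbors, which is exactly the point-spread-function behavior the section describes; adding Gaussian or Poisson noise to `n_noise` yields the other degradation models.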
2.2 Motion blur
Motion blur commonly occurs when capturing fast-moving objects or when a long exposure time is needed. It is caused by the relative motion between the objects and the sensing system. When the object motion is fast relative to the exposure period, the motion blur can be modeled as a linear motion blur, represented by 1D averaging of the neighboring pixels,

$h(x, y) = \begin{cases} \frac{1}{d} & \text{if } \sqrt{x^2 + y^2} \le \frac{d}{2} \text{ and } \frac{y}{x} = \tan\theta \\ 0 & \text{otherwise} \end{cases}$

where $(x, y)$ are the coordinates from the center, $d$ is the moving distance, and $\theta$ is the direction of the object motion. Motion blur can be generated by this equation. However, in practice the moving objects occupy only a part of the image, and the blur generated by such motion is extremely complicated.
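A possible NumPy sketch of generating such a directional linear motion blur kernel follows; the rasterization of the line onto a discrete grid is one of several reasonable choices, not the paper's exact procedure.

```python
import numpy as np

def motion_kernel(d, theta):
    # Linear motion blur kernel: constant weight along a line of
    # length d in direction theta, rasterized onto a square grid
    # and normalized so the weights sum to one.
    size = d if d % 2 == 1 else d + 1
    k = np.zeros((size, size))
    c = size // 2
    for t in np.linspace(-(d - 1) / 2, (d - 1) / 2, d):
        x = int(np.rint(c + t * np.cos(theta)))
        y = int(np.rint(c + t * np.sin(theta)))
        k[y, x] = 1.0
    return k / k.sum()

horiz = motion_kernel(5, theta=0.0)
print(horiz[2])  # the center row carries the whole horizontal streak
```

Convolving a sharp image with kernels generated at varying `theta` is how direction-varying synthetic blur pairs, as described in the dataset section, can be produced.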
2.3 Camera shake blur
Camera shake blur is another kind of blur that commonly occurs in many real cases. Unlike motion blur, it is caused by the motion of the camera instead of the object, which results in degradation of information. Camera rotation causes the most complicated blur, as it involves in-plane and out-of-plane rotation with respect to the focal plane. In in-plane rotation, the blur kernel varies across the image with distance from the camera rotation axis, while in out-of-plane rotation, the degree of spatial variance across the image depends on the focal length of the camera. Camera shake also commonly occurs in low lighting conditions, where motion of the camera in irregular directions causes in-plane or out-of-plane motion. When capturing a distant object, the translation of the camera is spatially invariant and can hence be modeled as a linear motion blur. During camera translation, nearer objects undergo a larger shift; if different objects lie in different focal planes in a single scene, this causes a large degree of information degradation in the image.
2.4 Defocus blur
Defocus blur usually occurs when an image is improperly focused by the image sensing system. In scenes with varying depths, objects outside the focal plane suffer strongly from defocus blur. It may also occur in single-lens sensing systems when acquiring information from objects outside the depth of field. Defocus blur can be approximately modeled by a uniform circular model,

$h(x, y) = \begin{cases} \frac{1}{\pi r^2} & \text{if } \sqrt{x^2 + y^2} \le r \\ 0 & \text{otherwise} \end{cases}$

where $r$ is the radius of the circle.
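The uniform circular (pillbox) defocus model can be sketched as a discrete kernel in NumPy; the grid discretization and normalization choices here are assumptions for illustration.

```python
import numpy as np

def defocus_kernel(r):
    # Uniform circular (pillbox) model: constant weight inside the
    # disc of radius r, zero outside, normalized over the discrete
    # grid so the kernel sums to one.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    k = (xx ** 2 + yy ** 2 <= r ** 2).astype(float)
    return k / k.sum()

k = defocus_kernel(2)
print(k.shape)  # (5, 5)
print(k[0, 0])  # 0.0 -- the corners lie outside the disc
```

Unlike the directional motion kernel, the pillbox is isotropic: it spreads a point source equally in all directions, which matches the visual appearance of out-of-focus regions.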
3 Results and Discussion
The training of the GAN network to recover the image information is carried out using an NVIDIA GTX 980M GPU. Stochastic gradient descent with a batch size of 4 and the Adam optimizer are used to increase the learning speed and the convergence of the network toward the global minimum. The learning rate and the Adam parameters $\beta_1$ and $\beta_2$ are set accordingly. The network is trained for 240 epochs over a week. In order to capture the probability density of the real data distribution, kernels with a large receptive field are implemented in the first and second convolutional layers of the generator, and 128 kernels are used in the ResNet blocks. It is important to note that Instance Normalization is implemented after every convolution operator to prevent the discriminator loss from quickly approaching zero. LeakyReLU is used as the nonlinear activation function to avoid sparse gradients. In the upsampling module, PixelShuffle and transposed convolution with a stride of two are used, while in the downsampling module, average pooling and convolution with a stride of two are employed. Without these upsampling and downsampling modules, the generator suffers from the generation of undesirable pixel noise. Dropout is used after the convolution operations. To optimize the network, as in the image-to-image translation GAN, training is carried out to maximize $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$. Alternating gradient steps are taken between the generator and the discriminator. The images generated during training are shown in Figure 5: the first column shows the artificially created blurred images, the middle column the ground truth, and the final column the generated images. Over the course of training, the network generated increasingly realistic images. Figure 6 shows the generated images on test data: the left pairs of images are successful recoveries, while the right pairs are failure cases. During testing of the architecture, many cases resulted in failure. The network is capable of recovering the information if it is trained on the same family of degraded images; however, the network can be made to generalize better if it is trained on a wider variety of image pairs.
4 Conclusion
A cGAN-based framework for recovering the possible information from heavily blurred images is proposed. The training of the network is carried out using an adversarial loss and a perceptual loss function. The generated images from various experiments show that the addition of the upsampling and downsampling modules in the generator network helps to increase the performance of the network dramatically, as well as to recover the information. Since the primary objective is only to recover the information from blurred face images, only face images with blur are considered for training and testing; hence, no comparative study against state-of-the-art models is conducted. Another important conclusion from the experimentation is that the network is capable of fully recovering the information as long as it is trained on the same variety of degraded images. This research work is carried out as part of our IOP studio software development 'Facial recognition module'.
References
 [1] RC Puetter, TR Gosnell, and Amos Yahil. Digital image reconstruction: Deblurring and denoising. Annual Review of Astronomy and Astrophysics, 43, 2005.
 [2] Mariana SC Almeida and Luís B Almeida. Blind and semi-blind deblurring of natural images. IEEE Transactions on Image Processing, 19(1):36–52, 2010.
 [3] Bahadir Kursat Gunturk and Xin Li. Image restoration: fundamentals and advances. CRC Press, 2012.
 [4] Hui Ji and Kang Wang. Robust image deblurring with an inaccurate blur kernel. IEEE Transactions on Image processing, 21(4):1624–1634, 2012.

 [5] Marcelo Bertalmio, Luminita Vese, Guillermo Sapiro, and Stanley Osher. Simultaneous structure and texture image inpainting. IEEE Transactions on Image Processing, 12(8):882–889, 2003.
 [6] Sung Ha Kang, Tony F Chan, and Stefano Soatto. Inpainting from multiple views. In 3D Data Processing Visualization and Transmission, 2002. Proceedings. First International Symposium on, pages 622–625. IEEE, 2002.
 [7] Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. Image inpainting. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 417–424. ACM Press/Addison-Wesley Publishing Co., 2000.
 [8] Marcelo Bertalmio, Andrea L Bertozzi, and Guillermo Sapiro. Navier-Stokes, fluid dynamics, and image and video inpainting. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE, 2001.
 [9] Jianhong Shen and Tony F Chan. Mathematical models for local nontexture inpaintings. SIAM Journal on Applied Mathematics, 62(3):1019–1043, 2002.
 [10] Shantanu D Rane, Guillermo Sapiro, and Marcelo Bertalmio. Structure and texture fillingin of missing image blocks in wireless transmission and compression applications. IEEE transactions on image processing, 12(3):296–303, 2003.
 [11] Ruxin Wang and Dacheng Tao. Recent progress in image deblurring. arXiv preprint arXiv:1409.6838, 2014.
 [12] AN Tikhonov and VY Arsenin. Solutions of ill-posed problems. WH Winston, Washington DC, 1977.
 [13] William Hadley Richardson. Bayesian-based iterative method of image restoration. JOSA, 62(1):55–59, 1972.

 [14] Robert Grover Brown, Patrick YC Hwang, et al. Introduction to random signals and applied Kalman filtering, volume 3. Wiley New York, 1992.
 [15] Mario Bertero and Patrizia Boccacci. Introduction to inverse problems in imaging. CRC Press, 1998.
 [16] Pushparaja Murugan. Feed forward and backward run in deep convolution neural network. arXiv preprint arXiv:1711.03278, 2017.
 [17] Pushparaja Murugan and Shanmugasundaram Durairaj. Regularization and optimization strategies in deep convolutional neural network. arXiv preprint arXiv:1712.04711, 2017.
 [18] Pushparaja Murugan. Hyperparameters optimization in deep convolutional neural network/bayesian approach with gaussian process prior. arXiv preprint arXiv:1712.07233, 2017.
 [19] Pushparaja Murugan. Implementation of deep convolutional neural network in multiclass categorical image classification. arXiv preprint arXiv:1801.01397, 2018.
 [20] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
 [21] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
 [22] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
 [23] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein gan. arXiv preprint arXiv:1701.07875, 2017.
 [24] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photorealistic single image superresolution using a generative adversarial network. arXiv preprint, 2017.

 [25] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint, 2017.
 [26] Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, and Minh N Do. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539, 2016.
 [27] Sainandan Ramakrishnan, Shubham Pachori, Aalok Gangopadhyay, and Shanmuganathan Raman. Deep generative filter for motion deblurring. arXiv preprint arXiv:1709.03481, 2017.
 [28] Manya V Afonso, José M Bioucas-Dias, and Mário AT Figueiredo. Fast image recovery using variable splitting and constrained optimization. IEEE Transactions on Image Processing, 19(9):2345–2356, 2010.
 [29] Mauricio Delbracio, Pablo Musé, Andrés Almansa, and Jean-Michel Morel. The non-parametric sub-pixel local point spread function estimation is a well posed problem. International Journal of Computer Vision, 96(2):175–194, 2012.