Subsampled Turbulence Removal Network

07/12/2018 · Wai Ho Chak, et al. · The Chinese University of Hong Kong

We present a deep-learning approach to restore a sequence of video frames degraded by turbulent deformations and space-time varying blurs. Instead of requiring a massive training sample size for the deep network, we propose a training strategy based on a new data augmentation method that models turbulence from a relatively small dataset. We then introduce a subsampling method to enhance the restoration performance of the presented GAN model. The contribution of the paper is threefold: first, we introduce a simple but effective data augmentation algorithm to model real-life turbulence for training the deep network; second, we are the first to propose a Wasserstein GAN combined with an ℓ_1 cost for the successful restoration of turbulence-corrupted video sequences; third, we incorporate a subsampling algorithm to filter out strongly corrupted frames and generate a video sequence of better quality.


1 Introduction

The problem of image restoration from a sequence of frames under atmospheric turbulence is challenging due to the dramatic degradation in image quality caused by geometric distortions and space-time varying blurs. Multiple factors, such as temperature changes, turbulent air flow, densities of air particles, carbon dioxide levels and humidity, lead to the occurrence of several turbulence layers with varying changes in the refractive index hufnagel1964modulation roggemann2018imaging . These factors together explain the higher chance of obtaining corrupted video sequences in locations where the variation among these factors is large. In practice, either techniques in hardware-based adaptive optics pearson1976atmospheric tyson2010principles or methods in image processing shimizu2008super seitz2009filter li2007atmospheric vorontsov1999parallel hirsch2010efficient are employed to remove the turbulence distortion in images, but prevailing models from either approach can barely account for the majority of these factors.

Since atmospheric turbulence is difficult to model, a deep learning approach that does not heavily rely on underlying assumptions is a more reasonable way to tackle the problem than models relying on specific assumptions about the turbulence. We are thus motivated to investigate the possibility of removing geometric distortions and restoring a good-quality image using a generative model that does not explicitly take the above-mentioned factors into consideration. However, the unavailability of massive collections of turbulence-distorted video frames hinders the application of deep learning approaches to this problem.

In this paper, we introduce a simple yet effective data augmentation method to overcome the problem of data scarcity. The method models real turbulence with different deformations and different extents of blur in order to provide sufficient training data. Since the artificial turbulence is randomly generated with varying strengths of deformation and blur, a variety of turbulence-distorted videos can be produced from a single image. In general, the performance of image restoration is commensurate with the training sample size. Nevertheless, with the data augmentation method, the size requirement on the training data for our proposed deep network to restore turbulence-distorted images is not overly restrictive or demanding.

With the augmented training data, a deep network can be trained to solve the deturbulence problem. We propose a subsampled Wasserstein Generative Adversarial Network (WGAN) with multiframe input and an ℓ_1 cost to simultaneously remove geometric distortions and blurring effects from turbulence-distorted image sequences. WGAN is known for its effectiveness in generating a clear image from noise. Together with the ℓ_1 cost applied to the network, important features of the images can be restored even though they are corrupted. To gather enough information, it is natural to take multiple frames from the video as the input of the turbulence-removal network. Using multiple frames as input is essential to obtain a clear image from a turbulence-distorted video.

In the testing stage, we propose to incorporate a subsampling algorithm into the trained network for better performance. Usually, a turbulence-distorted video contains some frames that are sharp and only mildly distorted. The subsampling method extracts those frames in order to achieve an even better restoration result. We experimentally show that by incorporating the subsampling method, the performance of removing geometric distortions and blurs from the degraded images can be significantly improved.

1.1 Contributions

The main contributions of this paper are listed as follows:

  1. We propose a deep-learning approach, a WGAN model with multiframe input and an ℓ_1 loss, for the restoration of turbulence-distorted images. To the best of our knowledge, this is the first work to study the feasibility of applying a deep convolutional neural network to the deturbulence problem of simultaneously removing geometric distortions and space-time varying blurs.

  2. We propose a data augmentation method to generate geometrically distorted and blurry images for training. It overcomes the problem of data scarcity and makes it possible to apply deep learning approaches to the deturbulence problem without requiring a large dataset of real turbulence-distorted videos.

  3. We propose to incorporate a subsampling method into the trained network to obtain a better restored image. Experimental results demonstrate that the performance of the proposed model can be significantly improved with the subsampling strategy.

2 Related Work

2.1 Restoration of turbulence-distorted images

The main tasks in restoring turbulence-corrupted images are the removal of geometric distortions and of space-time varying blurs. It is in general challenging to discard both simultaneously. Several earlier works reconstruct a clean image through image fusion by registering the image frames to a good reference image. Meinhardt-Llopis and Micheli meinhardt2014implementation micheli2014linear proposed a reference extraction method by registering frames to a ‘centroid’ image. The basic idea is to warp each image frame by the average deformation field between it and the other images from the turbulence-degraded video. This method assumes that the deformation between the original image and the distorted frames is zero on average. However, the estimated movements of individual pixels can sometimes be much larger, and the mean displacement of each pixel may deviate significantly from zero in a real turbulence-distorted video. This poses a challenge for the centroid method to remove all the geometric distortions. Another approach to obtaining a clear reference image is to select a ‘lucky frame’, the sharpest frame of a distorted video vorontsov2001anisoplanatic . This method is motivated and supported by statistical arguments fried1978probability that show a high probability of obtaining video frames with sharp texture details given a sufficient number of frames. Nevertheless, in many situations, obtaining a frame that is entirely sharp everywhere is difficult. To alleviate this issue, the Lucky-Region method was proposed by Aubailly et al. aubailly2009automated , which chooses the sharpest patch from each frame and combines them afterwards. Motivated by this patch-wise sharpness selection method, another approach introduced by Anantrasirichai et al. anantrasirichai2013atmospheric performs frame selection prior to registration. A composite cost function was introduced, and the selection was done in one step by sorting. However, some of the selected frames may differ significantly in geometry from the reference image. Moreover, the cost function assumes that the reference image given by the temporal intensity mean over all frames accurately approximates the underlying true image, which is usually not the case. Similarly, a subsampling method introduced by Roggemann et al. roggemann1994image selects subsamples from images produced by adaptive-optics systems to generate a temporal mean with a higher signal-to-noise ratio.

To enhance the accuracy of the registration onto a reference image, a feasible approach is to stabilize the video and reduce the deformation between each frame and the reference image. The SGL method proposed by Lou et al. lou2013video incorporates the Sobolev gradient and the Laplacian to stabilize the video sequence, and finds the latent image by the Lucky-Region method.

Robust Principal Component Analysis (RPCA) candes2011robust is a more recent approach to the deturbulence problem. The low-rank decomposition method proposed by He et al. he2016atmospheric decomposes the video sequence into low-rank and sparse parts. A variational approach introduced by Xie et al. xie2016removing improves the initial reference image, taken as the low-rank image that captures the texture information and suppresses geometric distortions, although it usually looks blurry. Registration may still fail when there is a large deformation between the observed video frames and the reference image.

Another recent approach is the joint subsampling and reconstruction variational model proposed by Lau et al. lau2017variational . An advantage of the model is that there is no registration involved during the subsampling and reconstruction processes, and hence it is computationally efficient. Using the proposed energy model with various fidelity terms, restoration of turbulence-distorted images of different degrees of distortions can be achieved.

2.2 Generative Adversarial Networks

Generative adversarial networks (GANs), first proposed by Goodfellow et al. goodfellow2014generative , define two separate competitors: the generator G and the discriminator D. The generator is designed to produce samples from noise, while the discriminator is designed to distinguish real samples from generated samples. The main objective of the generator is to generate perceptually persuasive samples that are hard to distinguish from the real samples. The competition between the generator and the discriminator can be described by the following minimax objective:

min_G max_D  E_{x ∼ P_r}[log D(x)] + E_{x̃ ∼ P_g}[log(1 − D(x̃))]    (1)

where P_r is the data distribution and P_g is the generated distribution given by x̃ = G(z), with z sampled from a noise distribution. The advantage of GANs is the ability to generate clear samples with high perceptual quality. However, as described by Salimans et al. salimans2016improved , there are undesirable issues such as vanishing gradients and mode collapse during training. The difficulties can be explained by the fact that minimizing the objective function of GANs is equivalent to minimizing the Jensen-Shannon divergence between the data and model distributions, which is locally saturated and results in vanishing gradients.

Later, Arjovsky et al. arjovsky2017wasserstein addressed the vanishing gradient problem by introducing the weaker Wasserstein-1 distance, which gives usable gradients almost everywhere in the GAN model. The competition between the two networks is reformulated as the minimax optimization objective:

min_G max_{D ∈ 𝒟}  E_{x ∼ P_r}[D(x)] − E_{x̃ ∼ P_g}[D(x̃)]    (2)

where 𝒟 is the set of 1-Lipschitz functions D such that |D(x_1) − D(x_2)| ≤ ‖x_1 − x_2‖ for all x_1, x_2. The original Lipschitz constraint enforcement proposed by Arjovsky et al. arjovsky2017wasserstein is weight clipping of the critic weights to a small fixed interval. Another approach, proposed by Gulrajani et al. gulrajani2017improved , is to add the gradient penalty term

λ E_{x̂ ∼ P_x̂}[ (‖∇_x̂ D(x̂)‖_2 − 1)² ]    (3)

where x̂ is a random interpolation between a data sample and a generated sample. This approach does not require intensive hyperparameter tuning and is robust to the choice of generator architecture. In contrast to conventional convolutional neural networks, GANs can generate clearer images. WGAN-GP, as adopted by Kupyn et al. kupyn2017deblurgan in DeblurGAN, has been shown to be effective in image deblurring.
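To make the gradient penalty term in Eq. (3) concrete, the following is a minimal PyTorch sketch (our illustration, not code from the cited works); the critic interface, tensor shapes and the default weight are assumptions.

```python
# Sketch of the WGAN-GP gradient penalty of Eq. (3); assumes image tensors
# of shape (batch, channels, H, W) and a critic that maps them to scalars.
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Estimate lambda * E[(||grad_xhat D(xhat)||_2 - 1)^2] on one batch."""
    batch_size = real.size(0)
    # Random interpolation point between a real and a generated sample.
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    d_hat = critic(x_hat)
    grads = torch.autograd.grad(outputs=d_hat.sum(), inputs=x_hat,
                                create_graph=True)[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```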

3 TRN: Turbulence Removal Network

In this section, we describe our proposed method based on a deep convolutional neural network, namely the Turbulence Removal Network (TRN). Figure 3 shows the whole network architecture for both the generator network G and the critic network D.

3.1 Data Augmentation for Turbulence

The first step of our proposed algorithm is to synthesize sufficient turbulence-distorted training data. A large training sample is typically necessary for solving tasks with deep learning approaches, but unfortunately only limited turbulence-distorted data are available. This hinders the application of deep learning approaches to turbulence removal. To alleviate this issue, we introduce a new, simple but effective method for generating training data for deep learning from only a few clean images.

More specifically, a single (clean) frame I is transformed to another image with geometric distortions and blurs as follows. We first select N pixel positions {(x_i, y_i)}_{i=1}^N randomly from the image domain [1, W] × [1, H], where W and H are the width and height of the image respectively. At each randomly selected pixel position (x_i, y_i), we consider a local patch P_i of size p × p around the pixel. A motion vector field M_i is then generated in P_i. For each (x, y) ∈ P_i, the vector M_i(x, y) is sampled from a normal distribution, smoothed by a Gaussian kernel and entry-wise multiplied by a strength value of distortion. Mathematically, the vector field M_i can be written as:

M_i(x, y) = S · ( (G_σ ∗ u)(x, y), (G_σ ∗ v)(x, y) )    (4)

where G_σ is the Gaussian kernel with standard deviation σ, S is the strength value, and u(x, y), v(x, y) are randomly selected from a normal distribution. M_i is then extended to the whole image domain by setting it to zero outside P_i. It is then employed to warp the original image to get a transformed image. We repeat this process for each of the N selected positions, i.e. for N iterations.

Essentially, the overall motion vector field after N iterations is defined by fusing the vector patches together wherever they overlap. Mathematically,

M(x, y) = Σ_{(x_i, y_i) ∈ A} M_i(x, y)    (5)

where A is the collection of randomly selected pixel positions.

We denote the transformed image by I_D. The transformed image is further blurred by a Gaussian kernel G_b. The blur parameter b is sampled uniformly from [0.1, 1]. The final transformed image with geometric distortions and blurs is given by I_DB = G_b ∗ I_D. Using the proposed algorithm with randomized parameters, an original clean image is transformed into a sequence of images with geometric distortions and blurs for training. Figure 2 shows some of the transformed images with different strength values. Experimental results suggest that the proposed algorithm with random parameters can cover most possible deformations, and hence geometric distortions can be successfully learnt by the deep network.

The data augmentation algorithm to synthesize turbulence-distorted video frames for training is summarized in Algorithm 1. Figure 1 illustrates the overall procedure of the data augmentation algorithm.

Figure 1: The overall procedure of the data augmentation algorithm.
(a) S = 0.1 (b) S = 0.2 (c) S = 0.3 (d) S = 0.4
Figure 2: Generation of turbulence-distorted frames with different distortion strengths S and a fixed blur constant b.

3.2 WGAN with Multiframe Input

The proposed turbulence removal network (TRN) is a multiframe subsampled WGAN with an ℓ_1 cost incorporated into the model. Multiframe input is adopted in TRN to absorb sufficient information about the turbulence deformation of the original image. TRN is then trained to remove geometric distortions and blurs with the WGAN architecture. The additional ℓ_1 cost attempts to retain the important textures of the original image.

3.2.1 Multiframe Input

We first discuss the input of our proposed TRN. The conventional input for GANs is a noise vector randomly generated according to the normal distribution; the noise vector is then transformed into the desired output through the generator. Our network is similar to DeblurGAN kupyn2017deblurgan , whose architecture takes a blurred image as input and produces a deblurred image. Although blur is one of the consequences of turbulence observed in the frames, a single frame from a turbulence-distorted video used as the input is experimentally shown to be ineffective in recovering the original image. Therefore, the original DeblurGAN architecture is insufficient to remove undesirable effects such as geometric distortions. Motivated by this observation, the input to our network is a turbulence-distorted multiframe stack originating from a clear image. Thus, our improved architecture takes multiple frames as the input. Instead of taking the whole video sequence as the input, subsampled frames are selected.

1:Parameters:
2: N - number of iterations
3: p - patch size
4: μ - mean of the Gaussian kernel
5: σ - standard deviation of the Gaussian kernel
6: S - distortion strength, uniform from [0.1, 0.4]
7: b - blur constant, uniform from [0.1, 1]
8:procedure DistortBlur(Img, N, p, μ, σ, S, b)
9:     Create a Gaussian kernel G_σ from the Normal CDF with mean μ and standard deviation σ
10:     for i = 1, …, N do
11:         Randomly select a pixel position (x_i, y_i) and form the p × p patch P_i around it
12:         Sample the components u(x, y) and v(x, y) from a normal distribution for (x, y) ∈ P_i
13:         Convolve the vector fields u and v with the kernel G_σ
14:         Multiply the smoothed fields entry-wise by the strength value S to obtain M_i as in Eq. (4)
15:     Fuse the patch-wise fields M_i into the overall motion vector field M as in Eq. (5)
16:     Warp the image with the vector field M
17:     Blur the image by convolving it with a Gaussian smoothing window of width b
18:return Distorted Video Frames
Algorithm 1 Distortion and Blur Generation
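For illustration, the following is a rough NumPy/SciPy sketch of Algorithm 1 using the notation of Section 3.1; the function name, default parameter values and implementation details (grayscale input, boundary handling, warping via map_coordinates) are our own assumptions rather than the Matlab implementation used in the experiments.

```python
# Sketch of Algorithm 1: patch-wise random motion fields, fused and used to
# warp the image, followed by a global Gaussian blur (a simplification of the
# space-time varying blur described in the text).
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def distort_blur(img, N=30, p=32, sigma=4.0, S=0.2, b=0.5, rng=None):
    rng = rng or np.random.default_rng()
    H, W = img.shape                              # grayscale image assumed
    M = np.zeros((2, H, W))                       # overall motion vector field
    for _ in range(N):
        y0 = rng.integers(0, H - p)
        x0 = rng.integers(0, W - p)
        # Random normal displacements, smoothed and scaled by the strength S;
        # the effective pixel displacement also depends on kernel normalization.
        u = S * gaussian_filter(rng.standard_normal((p, p)), sigma)
        v = S * gaussian_filter(rng.standard_normal((p, p)), sigma)
        M[0, y0:y0 + p, x0:x0 + p] += u           # fuse overlapping patches
        M[1, y0:y0 + p, x0:x0 + p] += v
    # Warp the image with the accumulated vector field.
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    coords = np.stack([yy + M[0], xx + M[1]])
    warped = map_coordinates(img, coords, order=1, mode='reflect')
    # Blur the warped image with a Gaussian window of width b.
    return gaussian_filter(warped, b)
```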

With the data augmentation method described in the last subsection, the training data consist of multiple frames {I_k}_{k=1}^n transformed from the original clean image I of size W × H. In TRN, the input is a set of subsampled frames selected from {I_k}_{k=1}^n. Instead of using the whole sequence of frames as the input, in the training stage we randomly select frames from the whole video as the input to the generator of the GAN model. In the testing stage, we incorporate a subsampling method lau2017variational to select the most useful frames as the input. Incorporating the subsampling method into the network is shown to be effective in obtaining a significantly better restored image.

We now describe the subsampling method incorporated into the network in detail. Given turbulence-distorted video frames {I_k}_{k=1}^n, we consider a variational model to obtain an optimal subsample of sharp and mildly distorted images. Let J ⊆ {1, …, n} be the index set of the subsample {I_k}_{k∈J}, where |J| is the number of chosen video frames in the subsample. Simultaneously, we obtain a reference image R from the subsample. The variational model is formulated in the following form:

min_{R, J}  (1/|J|) Σ_{k∈J} ( F(R, I_k) + λ Q(I_k) ) − η φ(|J|)    (6)

The fidelity term F(R, I_k) measures the discrepancy between the reference image and the video frames. In our model, we define F(R, I_k) = ‖R − I_k‖²_2 for measuring the distance between the reference image R and the subsampled video frames. The quality term Q(I_k) for each video frame is based on a normalized version of ‖I_k ∗ Δ‖_2:

Q(I_k) = 1 − ‖I_k ∗ Δ‖_2 / max_{1≤j≤n} ‖I_j ∗ Δ‖_2    (7)

The term I_k ∗ Δ is the convolution of I_k with the Laplacian kernel Δ, which specifically highlights the edges and features of objects in the image I_k. The sharper the image I_k, the higher the magnitude of ‖I_k ∗ Δ‖_2. As a consequence, the normalized quality measure Q(I_k) is smaller when I_k is sharp. The parameter λ in the energy model is a positive constant quantifying the importance of the sharpness of the frame I_k. The regularization term is a concave increasing function φ of the subsample size, where η is a constant quantifying the importance of the number of selected frames. A concave increasing φ is chosen so that additional video frames contribute more information, while the marginal effect on the quality of the reference image diminishes as the subsample size increases. The detailed formulation of the variational model is described in lau2017variational . An alternating minimization strategy can be used to solve the model, which is described in Algorithm 2.

1:Parameters:
2: λ - sharpness parameter
3: η - subsample size parameter
4: ρ - subsample decay rate parameter
5:procedure ImageSubsampling({I_k}_{k=1}^n, λ, η, ρ)
6:     Compute I_k ∗ Δ for each video frame I_k
7:     Compute the quality measure Q(I_k) for each video frame
8:     repeat
9:         Given the current reference image R. Fixing R, solve for J:
10:         Compute the cost ‖R − I_k‖²_2 + λ Q(I_k) for each k and arrange in ascending order.
11:         Compute the regularized average cost in (6) for each candidate subsample size and arrange in ascending order.
12:         Update J with the minimizing subsample.
13:         Fixing J, solve for R:
14:         Update R as the minimizer of the fidelity term over the subsample {I_k}_{k∈J}.
15:     until convergence
16:return subsampled image sequence {I_k}_{k∈J}
Algorithm 2 Image Subsampling
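As a rough illustration of the alternating scheme in Algorithm 2, the sketch below uses stand-in cost expressions consistent with the description above (squared ℓ_2 fidelity, Laplacian-based quality measure, logarithmic subsample regularization); the exact energy follows lau2017variational and may differ in detail.

```python
# Simplified alternating minimization for joint subsampling and reference
# estimation; the logarithmic regularizer and the subsample-mean update for
# the reference image are illustrative choices, not the authors' formulation.
import numpy as np
from scipy.ndimage import laplace

def subsample_frames(frames, lam=0.5, eta=1.0, max_iter=10):
    frames = np.asarray(frames, dtype=float)      # shape (n, H, W)
    n = len(frames)
    # Quality measure: smaller for sharper frames (normalized Laplacian energy).
    sharp = np.array([np.linalg.norm(laplace(f)) for f in frames])
    Q = 1.0 - sharp / sharp.max()
    ref = frames.mean(axis=0)                     # initial reference image
    idx = np.arange(n)
    for _ in range(max_iter):
        # Fix the reference; rank frames by fidelity + sharpness cost and keep
        # the prefix that minimizes the regularized average cost.
        cost = np.array([np.mean((ref - f) ** 2) for f in frames]) + lam * Q
        order = np.argsort(cost)
        partial = np.cumsum(cost[order])
        sizes = np.arange(1, n + 1)
        obj = partial / sizes - eta * np.log(sizes)
        idx = order[:sizes[np.argmin(obj)]]
        # Fix the subsample; the minimizing reference is the subsample mean.
        new_ref = frames[idx].mean(axis=0)
        if np.allclose(new_ref, ref):
            break
        ref = new_ref
    return idx, ref
```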

3.2.2 U-Net Architecture for Generator Network

The generator network we use is the U-Net ronneberger2015u , which consists of five types of layers: convolutional layers, deconvolutional layers, max-pooling layers, Randomized Leaky ReLU activation layers (RReLU) xu2015empirical and instance normalization layers ulyanovinstance . U-Net is known to involve a contracting path for contextual preservation and a symmetric expanding path for localization, and hence has been particularly successful in image segmentation, denoising and super-resolution. Thus, U-Net is used as the main architecture for our generator network G.

The subsampled turbulence-distorted multiframe input passes through 7 blocks of convolutional layers and 6 blocks of deconvolutional layers to generate a clear image. The first 7 blocks contain convolutional layers, followed by the 6 remaining blocks consisting of deconvolutional layers. Each block contains convolutional layers, non-linear activation layers and instance normalization layers. The features extracted in each block are down-sampled by max-pooling, except for the features of the last convolutional block. The features of the last two convolutional blocks are concatenated before passing through the first deconvolutional block, in order to retain the deep features without too much information loss. The feature collected in each earlier convolutional block is then concatenated with the feature of the corresponding deconvolutional block to output the feature of the next deconvolutional block. Repeating this process, we obtain a clear image of the same size as the original undistorted image.
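A compact sketch of such a U-Net-style generator is given below; the number of blocks, channel widths and output channels are placeholders chosen for brevity and do not reproduce the exact 13-block architecture described above.

```python
# Minimal U-Net-style generator skeleton: the subsampled frames enter as
# channels, and encoder features are concatenated with decoder features via
# skip connections. Layer counts and widths are illustrative only.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.InstanceNorm2d(cout), nn.RReLU())

class TRNGenerator(nn.Module):
    def __init__(self, in_channels=20, out_channels=3, base=64):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)   # skip link from enc2
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)       # skip link from enc1
        self.out = nn.Conv2d(base, out_channels, 1)

    def forward(self, x):                            # x: (B, in_channels, H, W)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out(d1)                          # restored image
```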

The generator network G is not pre-trained, since the input of our architecture differs from the conventional one: a conventional model takes the three channels of a single image as input, whereas in our case the input is a subsample of turbulence-distorted video frames, chosen randomly in the training stage and selected by the subsampling method introduced in the last subsection in the testing stage. The generator network G is trained after the critic network D has been trained multiple times, so as to produce a clearer image. The loss function of the generator network for removing geometric distortions and blurs is defined by

L_G = −E_{x̃ ∼ P_g}[D(x̃)] + λ_{ℓ_1} ‖G({I_k}_{k∈J}) − I‖_1    (8)

The first term in the loss function is the adversarial loss, which encourages solutions residing on the manifold of natural images. In order to retain the textures inherited from the turbulence-distorted video frames, we further incorporate the ℓ_1 loss into the loss function L_G. The combination of the pixel-wise error term with the adversarial loss has an advantage: it has been suggested that minimizing a loss function containing only a pixel-wise error term, such as the ℓ_1 or ℓ_2 error, is insufficient to produce a clear image ledig2017photo . Moreover, the ℓ_2 error term often causes image blur. Thus, we employ the ℓ_1 error instead of the ℓ_2 error in our loss function to make the image much less blurry. Experimental results demonstrate that the combination of the two terms in the loss function can effectively remove geometric distortions and undesirable artifacts such as image blur.
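A minimal PyTorch sketch of the generator objective in Eq. (8) is given below; the weight lambda_l1 is a placeholder, not a value reported here.

```python
# Generator loss: Wasserstein adversarial term plus an l1 term against the
# clean target, as in Eq. (8).
import torch.nn.functional as F

def generator_loss(critic, restored, clean, lambda_l1=100.0):
    adversarial = -critic(restored).mean()   # encourage "real" critic scores
    l1 = F.l1_loss(restored, clean)          # preserve textures, avoid l2 blur
    return adversarial + lambda_l1 * l1
```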

3.2.3 Critic Network

The critic network D in the WGAN arjovsky2017wasserstein is a deep CNN comprising convolutional layers, a fully-connected layer, ReLU activation layers nair2010rectified and instance normalization layers ulyanovinstance . We denote the first 6 convolutional layers by D_c and the last fully connected layer by D_f. The restored image G({I_k}_{k∈J}) and the original image I are passed into the critic network to output the Wasserstein-1 distance

W(P_r, P_g) = E_{x ∼ P_r}[D(x)] − E_{x̃ ∼ P_g}[D(x̃)]    (9)

The critic network D is trained until optimality before updating the generator network G. The loss function of the critic network in the training process is given as follows:

L_D = E_{x̃ ∼ P_g}[D(x̃)] − E_{x ∼ P_r}[D(x)] + λ E_{x̂ ∼ P_x̂}[ (‖∇_x̂ D(x̂)‖_2 − 1)² ]    (10)

where x̂ = εx + (1 − ε)x̃ and ε is a randomly generated number from the uniform distribution U[0, 1]. Since there is no pre-trained model involved in the generator network G, the whole training takes a long time and the loss can blow up. In order to further enforce the 1-Lipschitz assumption on the critic network D, we additionally impose a weight constraint: each weight is clipped to a fixed interval [−c, c]. Together with this weight constraint, the training becomes more stable.
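The critic objective in Eq. (10), together with the additional weight clipping, can be sketched in PyTorch as follows; the penalty weight and the clipping bound c are placeholders standing in for the unspecified values.

```python
# Critic loss of Eq. (10): Wasserstein estimate plus a gradient penalty on
# random interpolates, and a separate weight-clipping step.
import torch

def critic_loss(critic, real, fake, lambda_gp=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake.detach()).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return critic(fake.detach()).mean() - critic(real).mean() + lambda_gp * penalty

def clip_critic_weights(critic, c=0.01):
    # Additional weight clipping used to further enforce the Lipschitz constraint.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)
```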

Figure 3: The generator network G has the U-Net architecture, and the critic network D is a conventional convolutional neural network. The subsampled frames are concatenated before passing through the generator network.

4 Experiments

4.1 Dataset

The dataset for training is collected from Flickr. It consists of 1500 images of buildings and 1000 images of chimneys. All collected images are resized to 256×256 and are synthetically deformed by our data augmentation algorithm. More specifically, each image is deformed to produce 100 deformed video sequences; the whole dataset is therefore enlarged by a factor of 100. We test the trained network on more than 400 testing sequences, which are different from the training dataset. The testing dataset consists of simulated video sequences as well as real turbulence-distorted video sequences.

4.2 Training Details

The experiments are conducted in PyTorch paszke2017automatic with a CUDA-enabled GPU. The data augmentation algorithm is carried out in Matlab before the deep learning process is conducted. The strength value of distortion S and the blurring parameter b are randomly sampled from [0.1, 0.4] and [0.1, 1] respectively. The Adam solver kingma2014adam is used for gradient descent, with a fixed learning rate for both the generator G and the critic D. We perform 3 gradient descent steps on D and then 1 step on G. We also apply instance normalization and dropout to improve the training. In addition to the gradient penalty term gulrajani2017improved , we clip the critic parameters to a fixed interval. For each epoch, we train both networks with a batch size of 1. Furthermore, we randomly select 20 frames from the video sequence as our input. The whole training process for 40 epochs takes around 3 days. Figure 4 demonstrates that the restoration performance gradually improves during training. The network first discards the geometric distortion caused by the turbulence in the first few epochs, and then attempts to deblur and preserve the texture of the original image in the remaining epochs.
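Schematically, the training schedule described above can be written as the following loop, reusing the loss sketches from Sections 3.2.2 and 3.2.3; the optimizer settings are left at library defaults since the exact hyperparameters are not reproduced here.

```python
# Schematic training loop: 3 critic updates per generator update, batch size 1.
# `loader` yields (subsampled frame stack, clean target) pairs; generator_loss,
# critic_loss and clip_critic_weights are the sketches given earlier.
import torch

def train(generator, critic, loader, epochs=40, device="cuda"):
    opt_g = torch.optim.Adam(generator.parameters())
    opt_d = torch.optim.Adam(critic.parameters())
    for _ in range(epochs):
        for frames, clean in loader:
            frames, clean = frames.to(device), clean.to(device)
            for _ in range(3):                    # 3 critic steps ...
                opt_d.zero_grad()
                restored = generator(frames)
                critic_loss(critic, clean, restored).backward()
                opt_d.step()
                clip_critic_weights(critic)
            opt_g.zero_grad()                     # ... then 1 generator step
            generator_loss(critic, generator(frames), clean).backward()
            opt_g.step()
```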

Figure 4: Training progress from the 1st epoch (left) to the 9th epoch (right). Each displayed image, except the first, is generated 1-2 epochs after the image to its left. The training performance gradually improves.

5 Result

After the TRN is trained, we test its performance on more than 400 testing sequences consisting of simulated and real turbulence-distorted videos. The testing data are different from the training data. In this section, we report some of the experimental results.

5.1 Restoration of simulated turbulence-distorted videos

Figures 5 and 6 show the restoration results of simulated turbulence-distorted image sequences capturing different buildings. The first column shows an observed frame from each turbulence-distorted image sequence, degraded by both geometric distortions and blurs. The middle column shows the restoration results using TRN without the subsampling method. Note that most geometric distortions and blurs are removed, although some distortion can still be observed. The right column shows the restoration results using TRN with the subsampling method. With subsampling, the geometric distortions and blurs are removed more successfully, and the restoration results are more satisfactory compared to those without subsampling. This demonstrates that incorporating the subsampling method into the deep network is beneficial.

      (a) Observed             (b) TRN (no sub)           (c) TRN (with sub)
Figure 5: Restoration of turbulence-distorted ’building’ images. Column (a) shows the observed frames from each video. Column (b) shows the restoration results using the proposed TRN without subsampling. Column (c) shows the restoration results using TRN with subsampling.
      (a) Observed             (b) TRN (no sub)           (c) TRN (with sub)
Figure 6: Restoration of another set of turbulence-distorted ’building’ images. Column (a) shows the observed frames from each video. Column (b) shows the restoration results using the proposed TRN without subsampling. Column (c) shows the restoration results using TRN with subsampling.

We have also tested our deep network on ’chimney’ image sequences. Figures 7 and 8 show the restoration results of simulated turbulence-distorted image sequences capturing different chimneys. Again, the first column shows an observed frame from each turbulence-distorted image sequence, degraded by both geometric distortions and blurs. The middle column shows the restoration results using TRN without the subsampling method, and the right column shows the restoration results using TRN with the subsampling method. Again, with the subsampling method, the geometric distortions and blurs are removed more successfully, and the restoration results are more satisfactory compared to those without subsampling. This again demonstrates the benefit of incorporating the subsampling method into the deep network.

      (a) Observed             (b) TRN (no sub)           (c) TRN (with sub)
Figure 7: Restoration of turbulence-distorted ’chimney’ images. Column (a) shows the observed frames from each video. Column (b) shows the restoration results using the proposed TRN without subsampling. Column (c) shows the restoration results using TRN with subsampling.
      (a) Observed             (b) TRN (no sub)           (c) TRN (with sub)
Figure 8: Restoration of another set of turbulence-distorted ’chimney’ images. Column (a) shows the observed frames from each video. Column (b) shows the restoration results using the proposed TRN without subsampling. Column (c) shows the restoration results using TRN with subsampling.

To test the ability of our trained network to handle general large deformations, we randomly generate geometric distortions of an original image using large quasi-conformal deformations. More specifically, we randomly select some pixel positions in the image domain, and a patch-wise triangular mesh is formed with each chosen position as its center. The artificial turbulence is generated by deformation using the Laplace-Beltrami solver (LBS) lam2014landmark . We assign to each face a Beltrami coefficient μ = a + bi, which is a measure of non-conformality, where a and b are numbers randomly chosen within prescribed ranges for the real and imaginary parts respectively. We then obtain the deformation field using the LBS solver and warp the image. By introducing image blurs to each deformed image, we obtain an image sequence with large geometric distortions and blurs. Note that quasi-conformal deformations have never been seen in the training process; our aim is to investigate whether the trained deep network can deal with general deformations. The experimental results are shown in Figure 9. In Figure 9, the first column shows the observed frames from each distorted image sequence with large quasi-conformal deformations, and the second column shows the restored images using TRN. The geometric distortions and blurs are successfully removed. These results show that TRN can effectively handle general large deformations.

(a) Observed      (b) Restored by TRN
Figure 9: Restoration of image sequence distorted by large quasi-conformal deformations. Column (a) shows the observed frames from each video. Column (b) shows the restoration results using the proposed TRN.

5.2 Comparison with other methods

We also compare our proposed deep-learning based algorithm with existing methods, namely the SGL method lou2013video and the IRIS method lau2017variational . Some experimental results are shown in Figure 10. In Figure 10, the first column shows the restoration results of turbulence-distorted image sequences using TRN, the second column shows the restoration results using SGL, and the last column shows the restoration results using IRIS. The restoration results using SGL are generally blurry and geometrically distorted. The results restored by IRIS have fewer geometric distortions and blurs, although some geometric deformation is still visible. Overall, TRN gives the best restoration results with the least geometric distortion and blur. These visual results are also validated quantitatively using PSNR and SSIM, as reported in Table 1.

(a) Restored by TRN       (b) Restored by SGL     (c) Restored by IRIS
Figure 10: Comparison between TRN, SGL and IRIS on ’building 1’, ’building 2’ and ’building 3’. (a) shows the restoration results by TRN. (b) shows the restoration results by SGL. (c) shows the restoration results by IRIS.

              PSNR                     SSIM
              TRN    SGL    IRIS       TRN     SGL     IRIS
building 1    21.3   19.3   20.1       0.838   0.697   0.749
building 2    23.7   22.1   23.0       0.808   0.714   0.740
building 3    24.7   23.8   24.5       0.836   0.770   0.793
Table 1: PSNR and SSIM of the restored images using different deturbulence models.

5.3 Restoration of real turbulence-distorted videos

We also test the TRN on real turbulence-distorted videos that do not have a clear ground-truth image. Figure 11 shows the restoration results of a real ‘chimney’ turbulence-distorted image sequence. (a) shows an observed frame from the image sequence. (b) shows the restoration results using TRN without subsampling. Most geometric distortions and blurs are suppressed. (c) shows the restoration results using TRN with subsampling. With subsampling, the results are more satisfactory compared to those without subsampling. It again demonstrates the effectiveness of incorporating the subsampling model into the deep network.

Figure 12 shows the restoration results of another real turbulence-distorted image sequence capturing a building. Again, (a) shows an observed frame from the image sequence. (b) shows the restoration results using TRN without subsampling. (c) shows the restoration results using TRN with subsampling. As before, with subsampling, the results are more satisfactory than those without subsampling.

(a) Observed          (b) TRN (no sub)      (c) TRN (with sub)
Figure 11: Restoration of real turbulence-distorted image sequence capturing a chimney. (a) shows an observed frame from the image sequence. (b) shows the restored image using TRN without subsampling. (c) shows the restored image using TRN with subsampling.
(a) Observed           (b) TRN (no sub)      (c) TRN (with sub)
Figure 12: Restoration of real turbulence-distorted image sequence capturing a building. (a) shows an observed frame from the image sequence. (b) shows the restored image using TRN without subsampling. (c) shows the restored image using TRN with subsampling.

6 Conclusion

We introduce the turbulence removal network (TRN), a Generative Adversarial Network (GAN) with an ℓ_1 objective function incorporated, to suppress geometric distortions and remove blurs from image sequences distorted by turbulence. Although only a limited amount of data corrupted by real turbulence is available, we propose a data augmentation method to synthetically generate turbulence-distorted image frames for training. A subsampling method is further incorporated into the trained network to obtain an improved restoration result. Extensive experiments have been carried out to test the deep network, demonstrating the effectiveness of the proposed model in restoring turbulence-distorted images. In the future, we will explore the possibility of developing a turbulence removal network to restore turbulence-distorted videos with moving objects.

Acknowledgment

We would like to thank Mr. M. Hirsch and Dr. S. Harmeling from the Max Planck Institute for Biological Cybernetics for sharing the real chimney and building video sequences. Lok Ming Lui is supported by HKRGC GRF (Project ID: 402413).

References


  • (1) R. Hufnagel, N. Stanley, Modulation transfer function associated with image transmission through turbulent media, JOSA 54 (1) (1964) 52–61.
  • (2) M. C. Roggemann, B. M. Welsh, B. R. Hunt, Imaging through turbulence, CRC press, 2018.
  • (3) J. E. Pearson, Atmospheric turbulence compensation using coherent optical adaptive techniques, Applied optics 15 (3) (1976) 622–631.
  • (4) R. Tyson, Principles of adaptive optics, CRC press, 2010.
  • (5) M. Shimizu, S. Yoshimura, M. Tanaka, M. Okutomi, Super-resolution from image sequence under influence of hot-air optical turbulence, in: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, 2008, pp. 1–8.

  • (6) S. M. Seitz, S. Baker, Filter flow, in: Computer Vision, 2009 IEEE 12th International Conference on, IEEE, 2009, pp. 143–150.
  • (7) D. Li, R. M. Mersereau, S. Simske, Atmospheric turbulence-degraded image restoration using principal components analysis, IEEE Geoscience and Remote Sensing Letters 4 (3) (2007) 340–344.

  • (8) M. A. Vorontsov, Parallel image processing based on an evolution equation with anisotropic gain: integrated optoelectronic architectures, JOSA A 16 (7) (1999) 1623–1637.
  • (9) M. Hirsch, S. Sra, B. Schölkopf, S. Harmeling, Efficient filter flow for space-variant multiframe blind deconvolution, in: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, IEEE, 2010, pp. 607–614.
  • (10) E. Meinhardt-Llopis, M. Micheli, Implementation of the centroid method for the correction of turbulence, Image Processing On Line 4 (2014) 187–195.
  • (11) M. Micheli, Y. Lou, S. Soatto, A. L. Bertozzi, A linear systems approach to imaging through turbulence, Journal of mathematical imaging and vision 48 (1) (2014) 185–201.
  • (12) M. A. Vorontsov, G. W. Carhart, Anisoplanatic imaging through turbulent media: image recovery by local information fusion from a set of short-exposure images, JOSA A 18 (6) (2001) 1312–1324.
  • (13) D. L. Fried, Probability of getting a lucky short-exposure image through turbulence, JOSA 68 (12) (1978) 1651–1658.
  • (14) M. Aubailly, M. A. Vorontsov, G. W. Carhart, M. T. Valley, Automated video enhancement from a stream of atmospherically-distorted images: the lucky-region fusion approach, in: Atmospheric Optics: Models, Measurements, and Target-in-the-Loop Propagation III, Vol. 7463, International Society for Optics and Photonics, 2009, p. 74630C.
  • (15) N. Anantrasirichai, A. Achim, N. G. Kingsbury, D. R. Bull, Atmospheric turbulence mitigation using complex wavelet-based fusion, IEEE Transactions on Image Processing 22 (6) (2013) 2398–2408.
  • (16) M. C. Roggemann, C. A. Stoudt, B. M. Welsh, Image-spectrum signal-to-noise-ratio improvements by statistical frame selection for adaptive-optics imaging through atmospheric turbulence, Optical Engineering 33 (10) (1994) 3254–3265.
  • (17) Y. Lou, S. H. Kang, S. Soatto, A. L. Bertozzi, Video stabilization of atmospheric turbulence distortion, Inverse Probl. Imaging 7 (3) (2013) 839–861.
  • (18) E. J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis?, Journal of the ACM (JACM) 58 (3) (2011) 11.
  • (19) R. He, Z. Wang, Y. Fan, D. Feng, Atmospheric turbulence mitigation based on turbulence extraction, in: Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, IEEE, 2016, pp. 1442–1446.
  • (20) Y. Xie, W. Zhang, D. Tao, W. Hu, Y. Qu, H. Wang, Removing turbulence effect via hybrid total variation and deformation-guided kernel regression, IEEE Transactions on Image Processing 25 (10) (2016) 4943–4958.
  • (21) C. P. Lau, Y. H. Lai, L. M. Lui, Variational models for joint subsampling and reconstruction of turbulence-degraded images, arXiv preprint arXiv:1712.03825.
  • (22) I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural information processing systems, 2014, pp. 2672–2680.
  • (23) T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training gans, in: Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
  • (24) M. Arjovsky, S. Chintala, L. Bottou, Wasserstein gan, arXiv preprint arXiv:1701.07875.
  • (25) I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. C. Courville, Improved training of wasserstein gans, in: Advances in Neural Information Processing Systems, 2017, pp. 5769–5779.
  • (26) O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, J. Matas, Deblurgan: Blind motion deblurring using conditional adversarial networks, arXiv preprint arXiv:1711.07064.
  • (27) O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241.
  • (28) B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv preprint arXiv:1505.00853.
  • (29) D. Ulyanov, A. Vedaldi, V. Lempitsky, Instance normalization: The missing ingredient for fast stylization, arXiv preprint arXiv:1607.08022 (2016).
  • (30) C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, arXiv preprint.
  • (31) V. Nair, G. E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.

  • (32) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in pytorch, in: NIPS-W, 2017.
  • (33) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
  • (34) K. C. Lam, L. M. Lui, Landmark-and intensity-based registration with large deformations via quasi-conformal maps, SIAM Journal on Imaging Sciences 7 (4) (2014) 2364–2392.