Haze is a traditional atmospheric phenomenon in which dust, smoke and other dry particles obscure the clarity of the atmosphere. In this age of ubiquitous smartphone usage, images captured by smartphone cameras under hazy weather conditions suffer degradations that drastically reduce their visual quality and make them unsuitable for sharing and further use. Haze also dramatically degrades the visibility of outdoor images captured in inclement weather and affects high-level computer vision tasks, such as image classification, as well as other computer vision applications, such as autonomous driving, aerial photography and remote sensing.
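Hazy image formation is commonly described by the atmospheric scattering model:
$$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \qquad (1)$$
where $x$ is the pixel coordinates, $I(x)$ is the observed hazy image, $J(x)$ is the original clear image and $A$ is the global atmospheric light. $t(x)$ is the medium transmission map and it is distance-dependent:
$$t(x) = e^{-\beta d(x)}, \qquad (2)$$
where $\beta$ is the atmospheric scattering coefficient and $d(x)$ is the scene depth. The goal of image dehazing is to recover the clear image $J(x)$ from the hazy image $I(x)$.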
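The atmospheric scattering model can be made concrete with a small sketch. The code below is illustrative only and is not the pipeline used in this paper: it works on a single-channel image stored as nested lists, and the function names, the airlight value 0.8, the scattering coefficient 1.0 and the clamp `t_min` are all assumptions chosen for the example. In practice the same arithmetic would be applied per color channel with an array library.

```python
import math

def transmission(depth, beta=1.0):
    """Medium transmission t = exp(-beta * d) for a single pixel."""
    return math.exp(-beta * depth)

def add_haze(clear, depth, airlight=0.8, beta=1.0):
    """Synthesize a hazy image with I = J * t + A * (1 - t), pixel by pixel."""
    hazy = []
    for clear_row, depth_row in zip(clear, depth):
        row = []
        for j, d in zip(clear_row, depth_row):
            t = transmission(d, beta)
            row.append(j * t + airlight * (1.0 - t))
        hazy.append(row)
    return hazy

def remove_haze(hazy, depth, airlight=0.8, beta=1.0, t_min=0.05):
    """Invert the model when A and t are known: J = (I - A) / t + A."""
    clear = []
    for hazy_row, depth_row in zip(hazy, depth):
        row = []
        for i, d in zip(hazy_row, depth_row):
            # Clamp t (hypothetical safeguard) to avoid dividing by a value near zero.
            t = max(transmission(d, beta), t_min)
            row.append((i - airlight) / t + airlight)
        clear.append(row)
    return clear

# Toy 2x2 image with intensities in [0, 1] and a made-up depth map.
clear = [[0.2, 0.5], [0.9, 0.4]]
depth = [[0.0, 0.5], [1.0, 2.0]]
hazy = add_haze(clear, depth)
restored = remove_haze(hazy, depth)
```

A pixel at depth zero is left unchanged, while distant pixels are pulled toward the atmospheric light. Single image dehazing is ill-posed precisely because, unlike in this sketch, neither the depth map nor the atmospheric light is known at test time.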
Single image dehazing is an ill-posed problem, and some methods use visual cues to capture deterministic and statistical properties of hazy images [27, 8, 37, 1, 5]. In recent years, we have witnessed significant advances in image dehazing, mainly due to emerging CNN-based dehazing methods [22, 2, 16, 23, 34, 36]. Some works [22, 2, 16] remove haze based on the atmospheric scattering model, while others [23, 34, 36] train an end-to-end model to obtain a clear image.
However, these methods never consider information related to image classification. We find that this leads to dehazed images that perform well on dehazing evaluation metrics, such as PSNR and the Structural Similarity Index (SSIM), but poorly on classification accuracy, or vice versa. The usual purpose of image dehazing is to support further usage such as image classification, not just to improve visual effects. Pei et al. show that image dehazing achieves higher PSNR and SSIM values but does not improve image classification much. Therefore, it is important to develop a dehazing method that not only performs better on dehazing evaluation metrics (e.g., PSNR and SSIM) but also yields higher classification performance.
A common approach in computer vision is to separate low-level vision tasks (e.g., image dehazing) from high-level vision tasks (e.g., image classification) and solve them independently. In this paper, we propose a unified method that considers both the image dehazing and the classification task. We jointly minimize the image dehazing loss and the classification loss. With the guidance of image classification, the dehazing network is able to further improve visual quality, generate more visually appealing outputs and achieve better classification accuracy, which demonstrates the importance of high-level information for image dehazing. We achieve this by forcing the dehazing sub-network to adaptively learn the features that lead to improved visual appeal and image classification performance.
The main contributions of this paper are as follows:
We are the first to propose an end-to-end unified CNN architecture that combines dehazing and classification for image dehazing, and the architecture can be optimized jointly.
Instead of using a general CGAN, we use a classification-driven CGAN sub-network and a classification sub-network to improve the dehazing and classification performance of the dehazed images simultaneously.
We conduct extensive experiments on synthesized hazy images, which show that our method achieves the best performance both on image dehazing metrics (PSNR and SSIM) and on the classification accuracies of AlexNet, VGGNet and ResNet. In addition, we test our model on real hazy images, where it produces visually appealing results, which indicates the effectiveness of our proposed model.
II Related Work
In this section, we briefly review the most closely related work: image dehazing, image classification and generative adversarial networks.
II-A Image Dehazing
Single image dehazing is an extremely ill-posed and challenging problem. Single image haze removal has made significant progress recently, due to the use of better assumptions and priors [27, 8, 5, 37, 1]. Specifically, Tan et al. propose a local contrast maximizing method based on a Markov random field for haze removal, under the assumption that the local contrast of a haze-free image is much higher than that of a hazy image. Although the contrast maximizing approach is able to achieve impressive results, it tends to produce over-saturated images. Inspired by the dark-object subtraction technique, He et al. propose a dehazing method based on the dark channel prior: in most non-haze patches, at least one color channel has some pixels with very low intensities. Meng et al. propose an effective regularization dehazing method that restores the haze-free image by exploring the inherent boundary constraint. Tang et al. combine four types of haze-relevant features with random forests to estimate the transmission. The four types of haze-relevant features are dark channel, local max contrast, hue disparity and local max saturation. Fattal proposes a dehazing method relying on a generic regularity in natural images in which pixels of small image patches exhibit one-dimensional distributions in RGB space, known as color-lines. Zhu et al. present a single image haze removal algorithm using the color attenuation prior, creating a linear model of the scene depth of the hazy image under this prior. Berman et al. introduce a haze removal method based on a non-local prior, assuming that the colors of a haze-free image are well approximated by a few hundred distinct colors that form tight clusters in RGB space. In a hazy image, these tight color clusters change due to haze and form lines in RGB space that pass through the airlight coordinate.
CNNs have achieved prevailing success in computer vision tasks and have recently been introduced to haze removal [22, 2, 16, 23, 34, 36]. Ren et al. propose a multi-scale deep neural network for haze removal, which consists of a coarse-scale sub-network for a holistic transmission map and a fine-scale sub-network for local refinement. Cai et al. adopt a CNN-based deep architecture whose layers are specially designed to embody the established priors in image dehazing; it is constructed from three convolution layers, a max-pooling layer, a Maxout unit and a BReLU activation function. Li et al. propose a light-weight CNN design based on a re-formulated atmospheric scattering model. Instead of estimating the transmission matrix and the atmospheric light separately as most previous models did, Ren et al. propose an end-to-end trainable neural network that consists of an encoder and a decoder. The encoder captures the context of the derived input images, which are obtained by White Balance, Contrast Enhancing and Gamma Correction, while the decoder estimates the contribution of each input to the final dehazed result. Zhang and Patel directly embed the atmospheric scattering model into the network and propose a new edge-preserving densely connected encoder-decoder structure with a multi-level pyramid pooling module for estimating the transmission map; this network is optimized using a newly introduced edge-preserving loss function. Zhang proposes a dehazing method based on a conditional generative adversarial network, where the clear image is estimated by an end-to-end trainable neural network.
II-B Image Classification
In recent years, image classification has made significant progress, partly due to the creation of large-scale hand-labeled datasets such as ImageNet and the development of deep convolutional neural networks. Current state-of-the-art image classification methods focus on training feed-forward convolutional neural networks with “very deep” structures [24, 26, 9]. VGGNet, Inception and residual learning have been proposed to train very deep neural networks, resulting in excellent image classification performance on clear natural images. Liu et al. propose a cross-convolutional-layer pooling method for image classification. Wang et al. combine CNNs with recurrent neural networks (RNNs) to improve image classification performance. Durand et al. study three important visual recognition tasks, namely image classification, weakly supervised point-wise object localization and semantic segmentation, in an integrative way. Wang et al. develop a convolutional neural network with an attention mechanism for image classification. Hu et al. propose an architectural unit based on the channel relationship, which adaptively recalibrates the channel-wise feature responses by explicitly modeling interdependencies between channels.
II-C Generative Adversarial Network
Generative Adversarial Networks (GANs) have become increasingly popular recently. Goodfellow et al. first propose the GAN to synthesize realistic images by learning the distribution of training images. Initially, the training of GANs was unstable, which often resulted in artifacts in the synthesized images. Incorporating conditional information into a GAN results in more effective learning. The additional information carried by the conditioning variables increases the stability of the learning process and improves the representation capability of the generator. Different from the original GAN, the CGAN algorithm learns to generate a clear image from an input image and random noise by optimizing the objective function. CGANs have made great progress in image processing fields such as super-resolution [32] and style transfer. Yeh et al. propose a semantic image inpainting algorithm using a CGAN. In image super-resolution, Ledig et al. modify the GAN formulation by introducing a pixel-wise content loss and a perceptual loss to generate high-quality images. Zhang et al. use the pixel-wise content loss and the perceptual loss in a CGAN to solve the image deraining problem. Based on CGAN, Zhang also proposes an architecture for image dehazing.
III Our Method
Directly learning a mapping from an input hazy image to a dehazed image with the MSE loss tends to generate dehazed images that perform well only in terms of the PSNR and SSIM metrics. Instead, we aim to generate dehazed images that perform well both on dehazing metrics and on image classification accuracy. To this end, we introduce the classification-driven CGAN sub-network and the classification sub-network. The proposed network is composed of three important parts, each serving a distinct purpose: the dehazing sub-network, the classification-driven CGAN sub-network and the classification sub-network. In this section, we first introduce the architecture of the proposed network. Then we describe each part in detail, as well as the loss function.
We propose a unified network that can be used not only for image dehazing but also for image classification: it takes a hazy image as input and outputs the dehazed image as well as the image category. The proposed network is composed of three parts: the image dehazing sub-network (DNet), the image classification-driven CGAN sub-network (CCGAN) and the image classification sub-network (CNet). An overview of our method is shown in Fig. 1. For DNet, we use the commonly used MSE loss so that the dehazed image is visually appealing. For CCGAN, we use the GAN loss so that the dehazed image has better classification performance. For CNet, we use the Cross Entropy loss to further improve the classification performance of the dehazed image.
III-B Dehazing Sub-network
The purpose of the dehazing sub-network is to generate a clear image from an input hazy image. Therefore, it should not only preserve the structure and detail information of the input image but also remove as much haze as possible. Motivated by “ResNet” and “U-Net”, we introduce skip connections between symmetric layers to break through the information bottleneck in the decoding process. The details of the generator structure and parameter settings are shown in Table I. Each layer of the encoding process consists of convolution, batch normalization and LeakyReLU. Each layer of the decoding process is composed of deconvolution, batch normalization and ReLU. The size of the input and output in the generator is set to be. The size of the input in the discriminator is set to be and the size of its output is .
III-C Classification-driven CGAN Sub-network
To give the generated image better classification performance, we introduce the classification-driven CGAN sub-network. To learn a good generator G that fools the learned discriminator D, and to make the discriminator D good enough to distinguish real from fake, the proposed method alternately updates G and D. Given an input hazy image and a random noise vector, the conditional GAN aims to learn a mapping function that generates a dehazed image by solving the following optimization problem:
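For orientation, the conditional GAN objective in the form commonly used in the CGAN literature can be written as follows. The notation is an assumption made here for illustration: $I$ is the input hazy image, $J$ the corresponding clear image, $z$ the random noise vector, $G$ the generator and $D$ the discriminator. The exact formulation used by the proposed method may differ.

$$\min_{G}\max_{D}\; \mathbb{E}_{I,J}\bigl[\log D(I, J)\bigr] + \mathbb{E}_{I,z}\bigl[\log\bigl(1 - D(I, G(I, z))\bigr)\bigr]$$

The discriminator maximizes this value by scoring real pairs close to 1 and generated pairs close to 0, while the generator minimizes it by producing outputs that the discriminator scores as real.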
Instead of generating a good dehazed image as in a common CGAN, the generator in this paper generates good features of an image. As shown in Fig. 1, we feed the clear image and the dehazed image to the generator and obtain the features of the two images, respectively. Then, we use the discriminator to decide which features come from the clear image and which come from the dehazed image. The generator uses the VGGNet structure with the fully connected layers removed. Because the size of the dehazed image is , the size of the features in the last layer is , instead of . Note that other network structures can also be used.
The discriminator is used to distinguish whether the features come from a clear image (real) or a dehazed image (fake). Therefore, we use a network with two fully connected layers. In the final layer of the discriminator, we apply a sigmoid function to the feature maps so that the probability score is normalized into [0,1].
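The following is a minimal sketch of such a feature discriminator in plain Python, meant only to illustrate the data flow. The class name, the layer widths, the random initialization and the ReLU between the two fully connected layers are assumptions; the text above only specifies two fully connected layers followed by a sigmoid. The loss shown is the usual binary cross-entropy discriminator loss, which is also an assumption about the training objective.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

class FeatureDiscriminator:
    """Two fully connected layers followed by a sigmoid (hypothetical sizes)."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]]
        self.b2 = [0.0]

    def __call__(self, features):
        # ReLU between the two layers is an assumption, not stated in the text.
        hidden = [max(0.0, h) for h in linear(features, self.w1, self.b1)]
        # Sigmoid maps the final score into (0, 1): probability of "clear image".
        return sigmoid(linear(hidden, self.w2, self.b2)[0])

def discriminator_loss(p_real, p_fake, eps=1e-12):
    """Binary cross-entropy: real features should score 1, fake features 0."""
    return -(math.log(p_real + eps) + math.log(1.0 - p_fake + eps))

disc = FeatureDiscriminator(in_dim=8, hidden_dim=4)
clear_features = [0.5] * 8      # stand-in for features of a clear image
dehazed_features = [0.4] * 8    # stand-in for features of a dehazed image
p_real = disc(clear_features)
p_fake = disc(dehazed_features)
loss = discriminator_loss(p_real, p_fake)
```

In a real implementation the feature vectors would come from the VGGNet-based feature extractor, and the weights would be learned by alternating updates of the generator and the discriminator.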
III-D Classification Sub-network
To further improve the classification performance, we introduce a classification sub-network. We jointly train the dehazing sub-network, the classification-driven CGAN sub-network and the classification sub-network to achieve better performance not only on PSNR and SSIM but also on classification accuracy. The predicted output image (dehazed image) of the dehazing sub-network is fed as input to the classification sub-network. The classification sub-network helps the dehazing sub-network generate clearer dehazed images that have better classification performance.
|Kernel size|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|4×4|
|Stride|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|2×2|
|Padding|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|1×1|
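The effect of these layer settings on the spatial resolution can be checked with the standard output-size formulas for convolution and transposed convolution. The sketch below is only an illustration: the input size of 256 and the split of the sixteen layers into eight encoding and eight decoding layers are assumptions, since the table above lists only the kernel size, stride and padding.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (deconvolution) layer."""
    return (size - 1) * stride - 2 * padding + kernel

def trace_sizes(input_size=256, depth=8):
    """Sizes after each of `depth` encoding and `depth` decoding layers."""
    sizes = [input_size]
    for _ in range(depth):
        sizes.append(conv_out(sizes[-1]))
    for _ in range(depth):
        sizes.append(deconv_out(sizes[-1]))
    return sizes

sizes = trace_sizes()
# Encoder halves the size at every layer: 256, 128, 64, 32, 16, 8, 4, 2, 1
# Decoder doubles it again:               2, 4, 8, 16, 32, 64, 128, 256
```

With a 4×4 kernel, stride 2 and padding 1, every encoding layer halves the resolution and every decoding layer doubles it, so symmetric layers have matching sizes and can be joined by skip connections.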
III-E Loss Function
Let $I$ and $J$ denote the hazy images and the corresponding clear images. A straightforward way to train the dehazing network is to directly use the MSE loss $L_{MSE}$, which is given by:
$$L_{MSE} = \frac{1}{N}\,\lVert \hat{J} - J \rVert_2^2,$$
where $\hat{J}$ is the dehazed image and $N$ is the size of $J$. However, we find that a method trained with this loss alone cannot make the dehazed image perform well on PSNR, SSIM and classification accuracy at the same time.
In order to recover realistic images, we introduce the CCGAN, the loss of which is given by:
In addition, to improve the image classification performance, we introduce the Cross Entropy loss, where the output of the last fully-connected layer of CNet is fed to a $C$-way softmax function and $C$ is the number of classes.
Finally, we combine the MSE loss, the GAN loss and the Cross Entropy loss to regularize the proposed network, which is defined as
We learn all parameters of the network jointly in an end-to-end fashion.
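How the three terms could be combined is sketched below in plain Python. This is an illustration under stated assumptions rather than the paper's implementation: the weights `w_mse`, `w_gan` and `w_ce` are hypothetical placeholders (the values used in the experiments are not reproduced here), the GAN term uses the common non-saturating generator form, and images are flattened to lists of intensities.

```python
import math

def mse_loss(dehazed, clear):
    """Mean squared error between two flattened images."""
    return sum((d - c) ** 2 for d, c in zip(dehazed, clear)) / len(clear)

def generator_gan_loss(p_fake, eps=1e-12):
    """Non-saturating GAN term (assumed form): push D's score on fakes toward 1."""
    return -math.log(p_fake + eps)

def cross_entropy_loss(logits, label):
    """Softmax cross-entropy computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def total_loss(dehazed, clear, p_fake, logits, label,
               w_mse=1.0, w_gan=1.0, w_ce=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_mse * mse_loss(dehazed, clear)
            + w_gan * generator_gan_loss(p_fake)
            + w_ce * cross_entropy_loss(logits, label))

# Toy example: two pixels, a discriminator score of 0.5 and two classes.
loss = total_loss(dehazed=[0.5, 0.5], clear=[0.5, 0.7],
                  p_fake=0.5, logits=[0.0, 0.0], label=0)
```

Because all three terms depend on the dehazed image, gradients from the classification-driven terms flow back into the dehazing sub-network, which is what joint end-to-end training relies on.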
In this section, we first briefly introduce the datasets, experimental details and evaluation metrics. Then we quantitatively and qualitatively evaluate our method against several state-of-the-art algorithms on synthetic and real-world hazy images.
In this section, we evaluate various image dehazing methods on the hazy images synthesized from CUB-200-2011  dataset and on the hazy images synthesized from Caltech-256  dataset, which have been widely used for evaluating image classification algorithms. We synthesize hazy images following .
The CUB-200-2011 dataset contains 11,788 images from 200 classes, split into 5,994 training images and 5,794 testing images. Among the training images, 20% are used as a validation set. The Caltech-256 dataset contains 30,607 images from 257 classes. In Caltech-256, we select 60 images from each class as training images and use the rest as test images. Among the training images, 20% per class are used as a validation set. We follow the same split for the synthetic hazy image data: a hazy image is in the training set if it is synthesized from an image in the training set, and in the testing set otherwise.
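The resulting split sizes can be worked out with a few lines of arithmetic. The sketch below uses only the figures stated above; rounding the 20% validation share to the nearest integer is an assumption, and it only matters for CUB-200-2011, where 20% of 5,994 is not a whole number.

```python
def split_counts(n_train, val_percent=20):
    """Return (train, validation) counts; validation is rounded to nearest."""
    n_val = (n_train * val_percent + 50) // 100
    return n_train - n_val, n_val

# CUB-200-2011: 5,994 training and 5,794 testing images.
cub_total, cub_train_all, cub_test = 11788, 5994, 5794
cub_train, cub_val = split_counts(cub_train_all)          # 4795 and 1199

# Caltech-256: 60 training images per class over 257 classes.
caltech_total, caltech_classes, per_class_all = 30607, 257, 60
caltech_train_all = caltech_classes * per_class_all        # 15420
caltech_test = caltech_total - caltech_train_all           # 15187
per_class_train, per_class_val = split_counts(per_class_all)  # 48 and 12
caltech_train = caltech_classes * per_class_train          # 12336
caltech_val = caltech_classes * per_class_val              # 3084
```

Under these assumptions Caltech-256 has roughly as many test images as training images, whereas CUB-200-2011 keeps fewer than 30 training images per class on average, which is part of what makes fine-grained classification on it demanding.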
IV-B Experimental Details and Evaluation Metrics
In training process, we empirically set , and . The learning rate is set to be 0.0002. We use the Adam optimization method  to train our network. While the proposed CNet can use ResNet-50, ResNet-101, VGGNet or other models, for convenience, we use ResNet-50 in this paper. We set the parameter in Eq. 2.
We quantitatively evaluate our dehazing method on the synthetic datasets and compare it with several state-of-the-art single image dehazing methods, using not only PSNR and SSIM, which are widely used for evaluating image dehazing when the ground-truth haze-free image is available, but also the classification accuracy of AlexNet, VGGNet-16 and ResNet-50. The AlexNet, VGGNet-16 and ResNet-50 architectures are pre-trained on the ImageNet dataset, which consists of 1,000 classes with 1.2 million training images. For a fair and comprehensive comparison, we use two strategies. First, we fine-tune AlexNet, VGGNet-16 and ResNet-50 on the original clear images of the CUB-200-2011 and Caltech-256 datasets, respectively. Note that we change the number of channels in the last fully connected layer from 1,000 to $C$, where $C$ is the number of classes in our datasets. We use the fine-tuned models as classifiers to test the dehazed images of our method and of other state-of-the-art methods. Second, we use the CNet in our network structure as a classifier to classify the dehazed images of our method and of the state-of-the-art dehazing methods.
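As a reference for the metrics, the sketch below computes PSNR and classification accuracy in plain Python. It is a minimal illustration, not the evaluation code behind the reported tables: images are nested lists with intensities in [0, 1], the function names are hypothetical, and SSIM is omitted because it requires local windowed statistics.

```python
import math

def psnr(dehazed, clear, max_value=1.0):
    """Peak signal-to-noise ratio in dB between a dehazed and a clear image."""
    pairs = [(d, c)
             for d_row, c_row in zip(dehazed, clear)
             for d, c in zip(d_row, c_row)]
    mse = sum((d - c) ** 2 for d, c in pairs) / len(pairs)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def accuracy_percent(predicted, labels):
    """Share of correctly classified images, in percent."""
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return 100.0 * correct / len(labels)

# Every pixel is off by 0.1, so the MSE is 0.01 and the PSNR is about 20 dB.
example_psnr = psnr([[0.6, 0.3]], [[0.5, 0.4]])
# Three of four predictions match the labels.
example_acc = accuracy_percent([1, 2, 3, 4], [1, 2, 0, 4])
```

PSNR depends only on the pixel-wise error, which is why a method can score well on it while still removing the cues a classifier relies on; reporting accuracy alongside it captures that second aspect.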
IV-C Quantitative and Qualitative Comparison on Synthetic Hazy Images
We compare our proposed method with nine state-of-the-art dehazing methods: Dark Channel Prior (DCP), Boundary Constrained Context Regularization (BCCR), Color Attenuation Prior (CAP), Non-local Image Dehazing (NLD), DehazeNet, Multi-Scale Convolutional Neural Networks (MSCNN), All-in-One Dehazing Network (AOD), Gated Fusion Network (GFN) and Single Image Dehazing via Conditional Generative Adversarial Network (ID-CGAN). We compare the performance of the different methods on the test images from the synthetic datasets both quantitatively and qualitatively. As the ground truth is available for these test hazy images, we can calculate quantitative measures such as PSNR and SSIM. In addition, to test whether the dehazed images improve image classification performance, we also calculate the classification accuracies (%) of AlexNet, VGGNet-16 and ResNet-50. The quantitative results are shown in Table II and Table III. The proposed method achieves superior quantitative performance. Our proposed network structure is used not only for image dehazing but also for image classification. We use our CNet as a classifier to test the dehazed images of our dehazing method and of the other state-of-the-art dehazing methods; the results are shown in the last column of Table II and Table III. CNet improves the classification performance significantly, especially for the fine-grained image classification shown in the last column of Table II. These experiments show that our method is useful for both image dehazing and classification.
To visually demonstrate the improvements obtained by the proposed method on the synthetic dataset, we sample some dehazing results, as shown in Fig. 2. While DCP, BCCR, CAP and NLD are able to remove the haze, they remove it excessively (e.g., the dehazed images in the first row, third and sixth columns, and the dehazed image in the fifth row, third column) and introduce color distortion (e.g., the dehazed image in the fourth row, sixth column, and the dehazed images in the sixth row, third to sixth columns). The CNN-based methods are able to either reduce the intensity of the haze or remove it in parts, but they fail to remove it completely. GFN removes haze excessively (e.g., the first row, tenth column) and the ID-CGAN dehazing method leads to color distortion (e.g., the third row, eleventh column). In contrast to the other methods, our proposed method successfully removes the majority of the haze without introducing color distortion, and its dehazed images are closest to the ground truth images, as shown in the last column of Fig. 2.
IV-D Ablation Study
To better demonstrate the effectiveness of each part of our proposed method, we conduct detailed ablation experiments that consider combinations of three factors: the dehazing sub-network, the classification-driven CGAN sub-network and the classification sub-network. The results are shown in Table IV and Table V. DNet refers to using the dehazing sub-network only, DNet+CCGAN refers to using the dehazing sub-network and the classification-driven CGAN, DNet+CNet refers to using the dehazing sub-network and the classification sub-network, and DNet+CCGAN+CNet refers to using all parts.
We can see that DNet+CCGAN+CNet achieves the best performance in PSNR, SSIM and classification accuracy. Compared with DNet alone, adding either the classification sub-network (DNet+CNet) or the classification-driven CGAN (DNet+CCGAN) improves both the dehazing performance and the classification accuracies. This ablation study demonstrates that the classification-driven CGAN sub-network and the classification sub-network are effective for image dehazing.
Fig. 3 shows some dehazed images produced with different parts of the network. When only DNet is used, some haze remains in the dehazed image. When CCGAN or CNet is added, the dehazed images are clearer. When CCGAN and CNet are added simultaneously, the generated images are the clearest and the closest to the corresponding clear images.
IV-E Qualitative Comparison on Real Hazy Images
Although the proposed network is trained on synthetic hazy images, we show that it generalizes to real-world hazy images. Fig. 4 shows real hazy images and the corresponding dehazing results generated by state-of-the-art dehazing methods and by our method. Although the non-CNN-based dehazing methods are able to remove haze, they remove it excessively, as in the third row, fifth column. The CNN-based dehazing methods do not remove haze excessively, but they leave some haze in the images, as in the second row, sixth and eighth columns. In contrast, the images generated by our method, shown in the last column, are much clearer than those of the other methods.
In this paper, we propose a unified CNN architecture that aims to improve the performance of both image dehazing and image classification in an end-to-end learning approach. In contrast to existing approaches, we investigate the use of class information for synthesizing the dehazed image from a given input hazy image. We evaluate our framework on two benchmark datasets: CUB-200-2011 and Caltech-256. Detailed experiments and comparisons are performed on both synthetic and real-world hazy images to demonstrate that the proposed method significantly outperforms many recent state-of-the-art methods. Additionally, the proposed method is compared against baseline configurations to illustrate the performance gains obtained by introducing the classification sub-network into the framework.
- [1] (2016) Non-local image dehazing. In CVPR.
- [2] (2016) DehazeNet: an end-to-end system for single image haze removal. IEEE Transactions on Image Processing.
- [3] (2009) ImageNet: a large-scale hierarchical image database. In CVPR.
- [4] (2017) WILDCAT: weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation. In CVPR.
- [5] (2014) Dehazing using color-lines. ACM Transactions on Graphics.
- [6] (2014) Generative adversarial nets. In NIPS.
- [7] (2007) Caltech-256 object category dataset. California Institute of Technology.
- [8] (2011) Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- [9] (2016) Deep residual learning for image recognition. In CVPR.
- [10] (2017) Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507.
- [11] (2016) Perceptual losses for real-time style transfer and super-resolution. In ECCV.
- [12] (2015) Adam: a method for stochastic optimization. In ICLR.
- [13] (1924) Theorie der horizontalen Sichtweite. Beiträge zur Physik der freien Atmosphäre.
- [14] (2012) ImageNet classification with deep convolutional neural networks. In NIPS.
- [15] (2017) Photo-realistic single image super-resolution using a generative adversarial network. In CVPR.
- [16] (2017) AOD-Net: all-in-one dehazing network. In ICCV.
- [17] (2015) The treasure beneath convolutional layers: cross-convolutional-layer pooling for image classification. In CVPR.
- [18] (2013) Efficient image dehazing with boundary constraint and contextual regularization. In ICCV.
- [19] (2003) Contrast restoration of weather degraded images. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- [20] (1999) Vision in bad weather. In ICCV.
- [21] (2018) Does haze removal help CNN-based image classification?. In ECCV.
- [22] (2016) Single image dehazing via multi-scale convolutional neural networks. In ECCV.
- [23] (2018) Gated fusion network for single image dehazing. In CVPR.
- [24] (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- [25] (2015) Learning structured output representation using deep conditional generative models. In NIPS.
- [26] (2015) Going deeper with convolutions. In CVPR.
- [27] (2008) Visibility in bad weather from a single image. In CVPR.
- [28] (2014) Investigating haze-relevant features in a learning framework for image dehazing. In CVPR.
- [29] (2011) The Caltech-UCSD Birds-200-2011 dataset. California Institute of Technology.
- [30] (2017) Residual attention network for image classification. arXiv preprint arXiv:1704.06904.
- [31] (2016) CNN-RNN: a unified framework for multi-label image classification. In CVPR.
- [32] (2017) Learning to super-resolve blurry face and text images. In ICCV.
- [33] (2016) Semantic image inpainting with deep generative models. In CVPR.
- [34] (2018) Densely connected pyramid dehazing network. In CVPR.
- [35] (2017) Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957.
- [36] (2018) Image de-raining using a conditional generative adversarial network. In CVPR.
- [37] (2015) A fast single image haze removal algorithm using color attenuation prior. IEEE Transactions on Image Processing.