Many computer vision applications, such as image classification and object detection, are trained on datasets composed mostly of pristine imagery. However, to be dependable in real-world environments, computer vision algorithms must perform consistently under varying levels of visual degradation. One primary source of image degradation is haze, which introduces challenging, nonlinear noise to a scene. Haze is caused by particulates in the atmosphere, such as dust, fumes, and mist, that absorb and scatter light. Image degradation from haze can adversely affect computer vision algorithms, making it a principal concern for future systems that incorporate visual information into their decision-making processes. Previous works [20, 21] have established the negative impact of haze on object detection and recognition tasks and have furthermore shown the benefit of introducing image dehazing as a preprocessing step for computer vision tasks. Introducing image enhancement algorithms such as image dehazing may prove to be an important step in creating reliable vision-based systems.
1.1 Single Image Dehazing
Single image dehazing is commonly formulated using the atmospheric scattering model:
$$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr) \qquad (1)$$
where $I(x)$ is the captured hazy image, $J(x)$ is the haze-free image, $A$ is the global atmospheric light, and $t(x)$ is the transmission map. Consequently, by estimating the global atmospheric light and transmission map for a captured hazy image, the haze-free image can be recovered. This formulation has been the basis of several successful approaches [6, 10, 11, 15, 17, 23, 34]. More recently, neural network approaches have also been proposed to estimate these scene properties.
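As a minimal sketch of how a haze-free image could be recovered once the atmospheric light and transmission map are known, the atmospheric scattering model (hazy = clean × transmission + atmospheric light × (1 − transmission)) can be inverted per pixel. The function name, the flat list-of-intensities representation, and the lower bound `t_min` on the transmission are illustrative assumptions, not part of the method described in this paper.

```python
def recover_haze_free(hazy, transmission, atmospheric_light, t_min=0.1):
    """Invert I = J*t + A*(1 - t) per pixel to estimate J.

    hazy, transmission: flat lists of floats in [0, 1] (one value per pixel).
    atmospheric_light: scalar global atmospheric light A.
    t_min: assumed lower bound on t to avoid division by values near zero.
    """
    restored = []
    for i_val, t_val in zip(hazy, transmission):
        t_safe = max(t_val, t_min)
        j_val = (i_val - atmospheric_light * (1.0 - t_safe)) / t_safe
        restored.append(min(max(j_val, 0.0), 1.0))  # keep intensities in [0, 1]
    return restored


# Round trip on toy values: apply the forward model, then invert it.
clean = [0.2, 0.5, 0.9]
t = [0.8, 0.5, 0.3]
A = 0.7
hazy = [j * tv + A * (1.0 - tv) for j, tv in zip(clean, t)]
restored = recover_haze_free(hazy, t, A)
```

In dense haze the transmission approaches zero, so the division amplifies noise; the clamp is one simple, assumed way to keep the inversion stable.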
To evaluate the performance of these algorithms, peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) are commonly used to quantify dehazed image restoration quality. PSNR (measured in decibels) is an absolute-error metric computed from the mean squared error (MSE) between two images relative to the maximum possible pixel value. Alternatively, SSIM attempts to improve upon absolute-error metrics by more closely aligning with human perception, under the assumption that human visual systems are highly attuned to extracting structural information. Nevertheless, these metrics do not always agree with human assessment of the similarity of images, and qualitative assessment remains an important component in evaluating performance.
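The two metrics can be illustrated with a short sketch. PSNR follows its usual definition, 10·log10(MAX² / MSE). The SSIM function below is a deliberately simplified single-window variant that computes the statistics over the whole image at once; the standard metric averages the same expression over local windows, so values from this sketch should not be compared with the results reported in this paper. Function names and the flat-list image representation are assumptions for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in decibels."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def global_ssim(a, b, max_value=1.0):
    """Single-window SSIM (simplification: no local sliding window)."""
    c1 = (0.01 * max_value) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * max_value) ** 2
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


reference = [0.1, 0.4, 0.7, 0.9]
estimate = [0.2, 0.4, 0.6, 0.9]
print(psnr(reference, estimate), global_ssim(reference, estimate))
```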
In this paper, we present a family of fully convolutional neural network architectures for single image dehazing capable of being deployed on edge GPUs. First, we present two network variants, dubbed Small and Big FastNet, where Small and Big refer to the widths of the networks. Second, we present a neural network based on the atmospheric scattering model that estimates the transmission map and atmospheric light of a scene. We utilize these networks to study change in accuracy as a function of total network parameters, as well as to assess the benefits of estimating a scene’s transmission map and atmospheric light. In this paper, we loosely define efficiency as model performance relative to parameter count. All of the proposed networks utilize an encoder–decoder structure adapted from efficient image segmentation networks and a fully connected pyramid pooling network for output image refinement. Finally, we show benchmarks on reference hardware for varying pixel counts to examine the feasibility of incorporating these algorithms in real-time systems.
The paper makes the following contributions:
A novel neural network architecture that efficiently achieves state-of-the-art performance in single image dehazing on the NYU Depth dataset.
A scaled-down architecture capable of running on super-resolution imagery without the need for cropping, which is a common requirement for previous approaches.
An empirical evaluation of the impact of loss function on restoration quality.
A discussion of the value of utilizing the atmospheric scattering model when designing neural network image dehazing models.
A discussion on the challenges of using deep learning methods for haze removal, such as the effects from overfitting.
Timing benchmarks for running our architectures on desktop and edge GPUs.
2 Related Work
Although single image dehazing without neural networks has seen, and continues to see, considerable success, many recent state-of-the-art techniques utilize deep learning frameworks [7, 9, 20, 32]. These approaches generally incorporate neural network building blocks originally proposed for image segmentation, style transfer, object detection, and other computer vision tasks. For example, U-Nets, feature pyramid networks, and residual networks were all utilized as part of the 2018 NTIRE Image Dehazing Challenge.
2.1 Atmospheric Model Learning
Several successful techniques leverage hand-engineered features to estimate the transmission map for image dehazing [10, 11, 30]. In contrast to these approaches, Cai et al. proposed an end-to-end network that learns features useful for estimating a transmission map. However, this method and similar transmission estimation methods do not address estimating the atmospheric light within a scene. Zhang and Patel addressed this issue by estimating both the atmospheric light and transmission map within a generative adversarial learning framework. In this approach, the unknown variables from the atmospheric scattering model are modeled using independent neural network architectures: a U-Net is used to learn the atmospheric light and a densely connected network is used to learn a transmission map estimation. Additionally, Li et al. showed that the atmospheric scattering model, described in Equation 1, could be reformulated via a linear transform to a single variable and bias.
This formulation fits naturally within a deep learning framework and hints at the effectiveness of purely convolutional approaches.
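For reference, the reformulation can be written out as follows. This is a sketch of the derivation as it appears in Li et al.'s AOD-Net work; the symbols $K(x)$ and $b$ follow that paper's notation and are not otherwise used in this text. Here $I(x)$ is the hazy image, $J(x)$ the haze-free image, $A$ the atmospheric light, and $t(x)$ the transmission map.

```latex
\begin{align}
J(x) &= \frac{1}{t(x)}\, I(x) - \frac{A}{t(x)} + A \\
     &= K(x)\, I(x) - K(x) + b, \\
K(x) &= \frac{\dfrac{1}{t(x)}\bigl(I(x) - A\bigr) + (A - b)}{I(x) - 1}.
\end{align}
```

Substituting $K(x)$ back into the second line recovers the first, so $t(x)$ and $A$ are folded into the single learned quantity $K(x)$, with $b$ a constant bias.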
2.2 Style Transfer and Segmentation Networks
Generative adversarial networks (GANs) for image style transfer have become increasingly popular in recent years with algorithms such as Pix2Pix and CycleGAN. Haze removal can also be thought of from a style transfer perspective: transferring images from the hazy domain to the haze-free domain. This approach was attempted by Engin et al., who combined cycle consistency and perceptual losses in a CycleGAN framework.
Additionally, approaches from semantic image segmentation, such as feature pyramid networks, have proven to be effective in image dehazing applications. Image segmentation networks often utilize encoder–decoder pairs to learn embedded representations of inputs that take into account multi-scale features. Chaurasia and Culurciello proposed an efficient semantic segmentation architecture based on a fully convolutional encoder–decoder framework. Their encoder uses a ResNet18 model for feature encoding and avoids a loss of spatial information by reintroducing residuals from each encoder to the output of its corresponding decoder.
2.3 Super-Resolution Imagery
One challenge in using neural networks for single image dehazing is processing high-resolution input. Several techniques in the 2018 NTIRE Image Dehazing Challenge handled the relatively high input resolution of the I-HAZE and O-HAZE datasets by cropping input imagery into many smaller frames or by downsampling the input imagery and resizing the final outputs. These approaches are limited by total GPU memory rather than GPU processing power; therefore, models with fewer parameters are capable of accepting higher-resolution input imagery.
3 Proposed Method
3.1 Network Architecture
Our proposed fully convolutional neural networks (CNNs) build upon past work in efficient image segmentation and deep learning-based image dehazing. For our Small FastNet, we adapted the LinkNet architecture by removing the final softmax and prediction layers in order to pass features directly into a pyramid pooling network at the full input spatial resolution. LinkNet uses layers from a pretrained ResNet18 model for its encoder modules. For our Big FastNet, we replaced the ResNet18 encoder of the original architecture with ResNet50; we observe that the increased model width (achieved with the deeper ResNet encoder) leads to improved restoration quality at a small cost in speed. Both of these models use a single encoder–decoder to learn features of the image, followed by an image refinement pyramid pooling network. The pyramid pooling network helps preserve multi-scale features when forming the final output image by progressively embedding inputs at multiple scales and then resizing all scaled embeddings to the output resolution.
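The multi-scale idea behind pyramid pooling can be sketched in a few lines. The example below pools a single-channel feature map at several scales, resizes each pooled map back to the input resolution, and stacks the results as channels. It is an illustration only: the pooling sizes, nearest-neighbour resizing, and the absence of learned convolutions are simplifying assumptions and do not describe the networks evaluated in this paper.

```python
def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling on a 2D list of floats."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, w - k + 1, k)
        ]
        for r in range(0, h - k + 1, k)
    ]


def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2D list to (out_h, out_w)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [fmap[r * h // out_h][c * w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def pyramid_pool(fmap, pool_sizes=(2, 4)):
    """Return the input plus one resized embedding per pooling scale."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for k in pool_sizes:
        channels.append(upsample_nearest(avg_pool(fmap, k), h, w))
    return channels


feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
stacked = pyramid_pool(feature_map)
```

Each entry of `stacked` has the same spatial size as the input, so coarse context (large pooling windows) and fine detail (the original map) are available at every pixel when the output image is formed.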
In addition to the two single-encoder models, we introduce DualFastNet, which is inspired by past work in atmospheric model networks, notably by Zhang and Patel. Rather than using a single encoder–decoder, our DualFastNet approach uses two separate encoder–decoder models to learn atmospheric light and transmission map estimations. These estimations are then used as input to calculate a dehazed image using the formulation described in Equation 1. This approach was used in our submission to the 2019 NTIRE Image Dehazing Challenge; however, as described in later sections, further studies indicate that Big FastNet yields better performance on larger datasets. Our single encoder–decoder FastNet variant and double encoder–decoder DualFastNet variant are both shown in Figure 3.1.
3.2 Implementation and Training Details
We utilized several loss functions and data augmentation techniques described further in subsequent sections. Our implementation was developed in PyTorch and all results can be generated using our provided code (https://github.com/pmm09c/ntire-dehazing). We utilized Adam as an optimizer for training with an initial learning rate of
. During training, validation was performed every epoch and models with improved validation loss were saved. Early stopping was used: training ended once the validation loss converged, in order to prevent overfitting. Each model was initially trained using MSE as the loss function. However, as described in later sections, some models were fine-tuned using a secondary loss function. When validating models based on SSIM and PSNR, we chose to report the model with the highest SSIM, even if the corresponding PSNR was not the highest of all models trained. This means that for all results presented in this paper, models with higher PSNR may be achievable, but at the cost of lower SSIM.
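The checkpointing and early-stopping logic described above can be sketched as follows. The `patience` criterion (stop after a fixed number of epochs without improvement) is an assumed stand-in for "convergence in validation loss"; the exact stopping rule is not specified in the text, and the list of validation losses here replaces a real training loop.

```python
def train_with_early_stopping(val_losses, patience=3):
    """Track the best validation loss and stop when it stops improving.

    val_losses: per-epoch validation losses (stand-in for a real training loop).
    patience: assumed number of non-improving epochs tolerated before stopping.
    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = None
    last_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            # A real loop would save a model checkpoint here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss, last_epoch


result = train_with_early_stopping([0.9, 0.6, 0.5, 0.52, 0.51, 0.55, 0.4])
```

Note that in this toy run the loop stops before reaching the final, lower loss; that is the usual trade-off of a patience-based rule.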
We trained and evaluated our proposed methods on four datasets. First, we leveraged the NYU Depth dataset V2 as prepared by Zhang and Patel (https://github.com/hezhangsprinter/DCPDN) and demonstrate an improvement over previous state-of-the-art approaches. This dataset contains 1,000 unique training examples from the NYU Depth dataset V2 and 4,000 total training samples (each example has four variations with varying levels of haze). These images are synthesized with varying values of $A$ and $\beta$, where $A$ is the atmospheric light and $\beta$ is the scattering coefficient. Each training sample consists of a hazy image, an atmospheric light image, a transmission map image, and a dehazed ground truth image. Four hundred test examples are generated in a similar fashion from the NYU Depth dataset V2.
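To make the synthesis step concrete, the sketch below generates a hazy sample from a clean image and a depth map. It assumes the standard relation between transmission and depth, t(x) = exp(−β·d(x)), and then applies the atmospheric scattering model. The specific values of the atmospheric light and β, and the flat-list image representation, are hypothetical and chosen only for illustration.

```python
import math


def synthesize_haze(clean, depth, atmospheric_light, beta):
    """Create a hazy image and its transmission map from depth.

    clean: flat list of haze-free intensities in [0, 1].
    depth: flat list of scene depths (same length as clean).
    atmospheric_light: scalar A; beta: scattering coefficient.
    """
    hazy, transmission = [], []
    for j_val, d_val in zip(clean, depth):
        t_val = math.exp(-beta * d_val)  # transmission decays with depth
        transmission.append(t_val)
        hazy.append(j_val * t_val + atmospheric_light * (1.0 - t_val))
    return hazy, transmission


clean = [0.2, 0.5, 0.9]
depth = [0.0, 1.0, 5.0]
hazy, transmission = synthesize_haze(clean, depth, atmospheric_light=0.8, beta=1.0)
```

Pixels at zero depth are unchanged, while distant pixels converge to the atmospheric light, which is why dense haze leaves little recoverable scene content.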
Additionally, we evaluated our approach on the more challenging Dense-Haze dataset and the high-resolution O- and I-HAZE datasets. These datasets provide real-world imagery that can be used to evaluate the generalizability of our models, the usability of our models on high-resolution data, and the overall performance of our models in various conditions. For the NTIRE 2019 Image Dehazing Challenge, our models were trained exclusively on the Dense-Haze dataset with randomly initialized weights. Models used to evaluate the O/I-HAZE datasets were trained on O/I-HAZE data, starting from weights pretrained on the NYU Depth dataset.
4.2 Architecture Comparison
We studied three variants of our fully convolutional neural network: Small FastNet, Big FastNet, and DualFastNet. As a result of studying these models, we present empirical evidence of the benefits of increasing model width and show the capability of fully convolutional methods to generalize image dehazing mechanisms without the need for an explicit atmospheric model. In later sections, we use the Dense-Haze dataset to show that introducing model priors through the atmospheric scattering model, as is done in the DualFastNet architecture, can benefit training when limited training samples are available.
Each model was trained and tested on the NYU Depth dataset with MSE loss only, and we report the resulting PSNR and SSIM. MSE loss was enforced on the output image of both Small and Big FastNet. For DualFastNet, we examined three ways to train our model. First, following Zhang and Patel, we employed a step-wise learning technique in which the atmospheric light, transmission map, and image formation networks are trained separately to speed up convergence, after which the entire model is fine-tuned. Although this approach was found to be effective, it burdens the training process. We denote this step-wise learning technique as DualFastNet. We also explored whether our DualFastNet model can be trained wholly from scratch, both with MSE loss enforced on the atmospheric light, the transmission map, the dehazed image, and the refined output image (DualFastNet), and with MSE loss enforced only on the refined output image (DualFastNet). Results are summarized in Table 1.
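The difference between the two from-scratch training setups can be illustrated with a small loss function. In this sketch the dictionary keys, the equal weighting of the four MSE terms, and the flat-list tensors are all assumptions made for illustration; the text does not state how the individual terms are weighted.

```python
def mse(pred, target):
    """Mean squared error between two equally sized flat lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_loss(outputs, targets, supervise_intermediate=True):
    """Sum MSE over all four outputs, or over the refined image only.

    outputs / targets: dicts with the hypothetical keys
    'atmospheric_light', 'transmission', 'dehazed', 'refined'.
    """
    if supervise_intermediate:
        keys = ["atmospheric_light", "transmission", "dehazed", "refined"]
    else:
        keys = ["refined"]
    return sum(mse(outputs[k], targets[k]) for k in keys)


outputs = {
    "atmospheric_light": [0.5, 0.5],
    "transmission": [0.4, 0.6],
    "dehazed": [0.2, 0.8],
    "refined": [0.2, 0.8],
}
targets = {
    "atmospheric_light": [0.5, 0.5],
    "transmission": [0.5, 0.5],
    "dehazed": [0.2, 0.8],
    "refined": [0.3, 0.7],
}
full = dual_loss(outputs, targets)
refined_only = dual_loss(outputs, targets, supervise_intermediate=False)
```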
Model performance and parameter count appear to be related; models with higher parameter count yield higher performance. In addition, the step-wise learning technique is the most effective for training the atmospheric scattering model based DualFastNet. The widest architecture, Big FastNet, performs the best of our proposed architectures in both PSNR and SSIM, indicating that using a wider network is a viable alternative to incorporating an atmospheric model prior into the neural network architecture.
4.3 Loss Functions and Fine Tuning
We investigated the impact of loss function selection when optimizing our model on the NYU Depth dataset. Specifically, we first fitted our models using either a least absolute deviations (L1) loss or an MSE loss as a baseline, and then further trained with a second refinement loss function. The refinement functions considered were content loss, L1 loss, MSE loss, and SSIM loss. To limit training time, we trained with our smallest model, Small FastNet. Results from this study are summarized in Table 2. Results indicate that training with L1 loss followed by MSE refinement generates images with the highest PSNR, whereas training with MSE loss followed by SSIM refinement generates images with the highest SSIM. Images generated with any of the loss functions studied are qualitatively similar, as shown in Figure 3.
| Loss | PSNR (dB) | SSIM |
|---|---|---|
| MSE Content Loss | 22.12 | 0.8559 |
Figure 3 panels, left to right: Input, L1, L1 MSE, L1 SSIM, MSE, MSE L1, MSE SSIM, Truth.
4.4 Timing Benchmarks
We performed timing benchmarks to help assess the feasibility of introducing our method as a preprocessing step for computer vision algorithms in real-time systems. The average timing over 20 runs is presented on both the Titan RTX desktop GPU and Tegra Xavier edge GPU. We progressively increased input resolution until we could no longer process a given input batch size due to GPU memory limitations. Timing results are given in frames per second for both 32-bit and 16-bit floating point. Full timing results are presented in Table 6. Unsurprisingly, the biggest timing gains come from utilizing a batch size greater than 1 and operating at 16-bit floating point. For real-time applications, batching trades additional latency for higher throughput.
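A benchmark of this kind can be sketched as below: the forward pass is timed over 20 runs and the mean is converted to frames per second. The warm-up runs, the function names, and the CPU stand-in workload are assumptions for illustration. When timing a GPU, one would typically also synchronize the device before reading the clock (for example with `torch.cuda.synchronize()` in PyTorch), since kernel launches are asynchronous.

```python
import time


def benchmark_fps(run_batch, batch_size, runs=20, warmup=2):
    """Return (frames per second, mean seconds per batch) for run_batch.

    run_batch: zero-argument callable that processes one batch.
    warmup: assumed number of untimed runs to exclude one-time setup costs.
    """
    for _ in range(warmup):
        run_batch()
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        run_batch()
        elapsed.append(time.perf_counter() - start)
    mean_seconds = sum(elapsed) / runs
    return batch_size / mean_seconds, mean_seconds


def fake_model_forward():
    """Stand-in workload in place of a real network forward pass."""
    return sum(i * i for i in range(20000))


fps, latency = benchmark_fps(fake_model_forward, batch_size=4)
```

The two return values make the trade-off explicit: a larger batch raises frames per second, but every frame in the batch waits for the full batch time before its result is available.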
4.5 Comparison with State-of-the-Art Methods
4.5.1 Results on NYU Depth Dataset
For the NYU Depth dataset, we show state-of-the-art performance using our Big FastNet model trained with MSE loss and SSIM loss as refinement. Additionally, the model performs efficiently relative to its parameter count. Our model width can also be scaled down at some cost in SSIM and PSNR. For instance, Small FastNet has roughly 11 million parameters, about 6x fewer than Zhang and Patel’s approach, and still performs competitively. Results for our method and other approaches are summarized in Table 4.
| Method | PSNR (dB) | SSIM | Parameters (millions) |
|---|---|---|---|
| He et al. (CVPR’09) | - | 0.86 | - |
| Zhu et al. (TIP’15) | - | 0.86 | - |
| Ren et al. (ECCV’16) | - | 0.82 | 0.0084 |
| Berman et al. (CVPR’16) | 16.92 | 0.80 | - |
| Li et al. (ICCV’17) | - | 0.88 | 0.018 |
| Small FastNet (Ours) | 25.18 | 0.94 | 11.55 |
| Zhang & Patel (CVPR’18) | 29.28 | 0.96 | 66.89 |
| Big FastNet (Ours) | 30.37 | 0.97 | 28.78 |
4.5.2 Results on High-Resolution O/I-HAZE Dataset
We evaluated our method on the benchmark high-resolution O/I-HAZE datasets [4, 5]. Because of limitations in GPU memory, we used our Small FastNet model in this evaluation. Because this model has fewer parameters than other models studied, we were able to perform inference on the native full-resolution test imagery with a Titan RTX GPU. Past approaches typically use one of two methods: (1) forward pass patches of the test image and stitch the final output, or (2) operate on lower-resolution input imagery and rescale the output. Two models were trained separately using MSE loss, one on the O-HAZE dataset and one on the I-HAZE dataset. Each model was initialized from the NYU Depth pretrained model generated in earlier experiments rather than from randomly initialized weights. Training loss converged after only a few epochs, indicating that features learned from the NYU Depth dataset transfer well to other datasets, such as the O/I-HAZE datasets. To train our models, we augmented the dataset by extracting multi-scale patches resized to our training input size.
Our Small FastNet results are competitive with results from the 2018 NTIRE Image Dehazing Challenge, Indoor and Outdoor tracks. We achieve an SSIM of 0.8089 and a PSNR of 18.56 on the I-HAZE test dataset and an SSIM of 0.7459 and a PSNR of 22.07 on the O-HAZE test dataset. Each metric is ranked within the top 10 for its category with respect to the 2018 NTIRE Image Dehazing Challenge.
Figure 5 shows several images generated with our approach, including results from early training and results from the end of training when top SSIM has been reached. Although SSIM and PSNR continue to improve in later epochs of training, artifacts in imagery commonly seen in neural network approaches for image generation become noticeable. This indicates that it is important to not only maximize SSIM and PSNR, but also to conduct thorough qualitative analysis when evaluating top models for image dehazing. Because pixel values within areas of continuous dense haze are likely unrecoverable, the neural network learns to minimize its loss by using the average pixel value learned from similar areas in training data when it encounters dense haze. This causes the artifacts observed in Figure 5. In short, areas with unrecoverable pixel values are substituted with random training artifacts, which are likely to be strong indicators of overfitting.
Figure 5 panels, left to right: Input, Early Training Output, Top SSIM Output.
| Image Size | Batch Size | Small-32 RTX | Small-16 RTX | Big-32 RTX | Big-16 RTX | Small-32 Xavier | Small-16 Xavier | Big-32 Xavier | Big-16 Xavier |
4.5.3 Results on Dense-Haze Dataset
The 2019 NTIRE Image Dehazing Challenge introduces a novel dataset containing challenging, dense haze imagery, called Dense-Haze. The dataset contains 45 training images, 5 validation images, and 5 test images, each with a resolution of 1600 × 1200 pixels. We trained our DualFastNet model on all 45 training images, as well as 2 validation images, leaving 3 images for validation and 5 for testing. Training started with randomly initialized weights and data were randomly cropped and rotated throughout. Our model produced results that were competitive with other models in the challenge, achieving a PSNR score of 16.37 and an SSIM score of 0.569 on test images. Examples of the images generated are shown in Figure 1.
Although Big FastNet outperforms our other models in earlier experiments, those experiments did not use models trained on sparse datasets with randomly initialized weights, as was done in this challenge. We have observed that when limited to fewer training samples, DualFastNet can generate superior results, indicating that the atmospheric scattering model can be a helpful prior in certain conditions. Specifically, on the Dense-Haze dataset, Big FastNet achieved the highest SSIM and DualFastNet achieved the highest PSNR. A qualitative study of the output images informed the decision to use DualFastNet in our challenge submission.
This paper proposes a family of novel neural network architectures for single image dehazing and presents both quantitative and qualitative evaluations of these architectures and their loss functions. On the NYU Depth dataset, Big FastNet, our largest model, outperforms its smaller variant and our architecture based on the atmospheric scattering model, DualFastNet. Additionally, this approach outperforms other state-of-the-art neural networks on the NYU Depth dataset in both performance and efficiency. However, our experimental results indicate that the atmospheric scattering model is a useful prior for a neural network architecture when training data is limited. Our architectures can be run as part of real-time systems on edge GPUs and have been benchmarked on multiple input image sizes. Finally, we discuss our results and the challenges of working with the O-HAZE, I-HAZE, and Dense-Haze datasets. In the 2019 NTIRE Image Dehazing Challenge, our efficient, atmospheric scattering model-based neural network architecture, DualFastNet, achieved competitive results, obtaining a PSNR score of 16.37 and an SSIM score of 0.569 on test images.
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. This material is based upon work supported under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the U.S. Air Force.
-  Cosmin Ancuti, Codruta O Ancuti, and Radu Timofte. Ntire 2018 challenge on image dehazing: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 891–901, 2018.
-  Codruta O. Ancuti, Cosmin Ancuti, and Radu Timofte et al. Ntire 2019 challenge on image dehazing: Methods and results. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
-  Codruta O. Ancuti, Cosmin Ancuti, Mateu Sbert, and Radu Timofte. Dense haze: A benchmark for image dehazing with dense-haze and haze-free images. 2019.
-  Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, and Christophe De Vleeschouwer. I-haze: a dehazing benchmark with real hazy and haze-free indoor images. In arXiv:1804.05091v1, 2018.
-  Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, and Christophe De Vleeschouwer. O-haze: a dehazing benchmark with real hazy and haze-free outdoor images. In IEEE Conference on Computer Vision and Pattern Recognition, NTIRE Workshop, NTIRE CVPR’18, 2018.
-  D. Berman, T. Treibitz, and S. Avidan. Non-local image dehazing. In IEEE Conf. CVPR, 2016.
-  Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, and Dacheng Tao. Dehazenet: An end-to-end system for single image haze removal. CoRR, abs/1601.07661, 2016.
-  Abhishek Chaurasia and Eugenio Culurciello. Linknet: Exploiting encoder representations for efficient semantic segmentation. In 2017 IEEE Visual Communications and Image Processing (VCIP), pages 1–4. IEEE, 2017.
-  Deniz Engin, Anıl Genç, and Hazım Kemal Ekenel. Cycle-dehaze: Enhanced cyclegan for single image dehazing. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018.
-  Raanan Fattal. Dehazing using color-lines. ACM Trans. Graph., 34(1):13:1–13:14, Dec. 2014.
-  Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. IEEE transactions on pattern analysis and machine intelligence, 33(12):2341–2353, 2011.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, abs/1406.4729, 2014.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
-  Yutong Jiang, Changming Sun, Yu Zhao, and Li Yang. Image dehazing using adaptive bi-channel priors on superpixels. Computer Vision and Image Understanding, 165:17 – 32, 2017.
-  Justin Johnson, Alexandre Alahi, and Fei-Fei Li. Perceptual losses for real-time style transfer and super-resolution. CoRR, abs/1603.08155, 2016.
-  Mingye Ju, Zhenfei Gu, and Dengyin Zhang. Single image haze removal based on the improved atmospheric scattering model. Neurocomputing, 260:180 – 191, 2017.
-  Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
-  Harold Koschmieder. Theorie der horizontalen sichtweite. In Beiträge zur Physik der Freien Atmosphäre, 1924.
-  Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. Aod-net: All-in-one dehazing network. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
-  Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie. Feature pyramid networks for object detection. CoRR, abs/1612.03144, 2016.
-  Kede Ma, Wentao Liu, and Zhou Wang. Perceptual evaluation of single image dehazing algorithms. In 2015 IEEE International Conference on Image Processing (ICIP), pages 3600–3604. IEEE, 2015.
-  G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan. Efficient image dehazing with boundary constraint and contextual regularization. In 2013 IEEE International Conference on Computer Vision, pages 617–624, Dec 2013.
-  William Edgar Knowles Middleton. Vision through the atmosphere. University of Toronto Press, 1952.
-  Srinivasa G Narasimhan and Shree K Nayar. Chromatic framework for vision in bad weather. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No. PR00662), volume 1, pages 598–605. IEEE, 2000.
-  Srinivasa G Narasimhan and Shree K Nayar. Vision and the atmosphere. International journal of computer vision, 48(3):233–254, 2002.
-  Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. Single image dehazing via multi-scale convolutional neural networks. In European conference on computer vision, pages 154–169. Springer, 2016.
-  Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015.
-  Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from rgbd images. In European Conference on Computer Vision, pages 746–760. Springer, 2012.
-  R. T. Tan. Visibility in bad weather from a single image. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, June 2008.
-  Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. Trans. Img. Proc., 13(4):600–612, Apr. 2004.
-  He Zhang and Vishal M Patel. Densely connected pyramid dehazing network. In CVPR, 2018.
-  Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.
-  Qingsong Zhu, Jiaming Mai, and Ling Shao. A fast single image haze removal algorithm using color attenuation prior. IEEE Transactions on Image Processing, 24(11):3522–3533, 2015.