Hazy weather easily degrades image quality, which raises the risk of invalidating various outdoor computer vision applications, including object detection systems [1, 2, 3], recognition systems and so on. To overcome this issue, many methods have been proposed to remove the haze from a single image.
Traditional methods typically build on the atmospheric scattering model, which formulates the haze-free image as an analytic solution with three variables, i.e., the hazy image, the transmission map and the atmospheric light. Then, various prior assumptions are proposed to estimate the transmission map or the atmospheric light, such as the dark channel, color attenuation, boundary constraint and so on. Limited by these specific assumptions, prior-based methods hardly perform well under diverse hazy scenes.
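For reference, the atmospheric scattering model is commonly written as I = J * t + A * (1 - t), where I is the hazy image, J the haze-free image, t the transmission map and A the atmospheric light, so that J = (I - A) / t + A. The following is a minimal NumPy sketch of this inversion; the function names, the lower bound `t_min` and the toy values are illustrative assumptions and do not belong to any of the cited methods.

```python
import numpy as np

def synthesize_haze(clear, transmission, airlight):
    """Forward model: I = J * t + A * (1 - t)."""
    t = transmission[..., None]          # broadcast (H, W) over the color channels
    return clear * t + airlight * (1.0 - t)

def recover_scene(hazy, transmission, airlight, t_min=0.1):
    """Analytic inversion: J = (I - A) / max(t, t_min) + A."""
    # Clamping t is an assumed safeguard against division by values near zero.
    t = np.maximum(transmission, t_min)[..., None]
    return (hazy - airlight) / t + airlight

rng = np.random.default_rng(0)
clear = rng.uniform(0.0, 1.0, size=(4, 4, 3))       # toy haze-free image J
transmission = rng.uniform(0.3, 1.0, size=(4, 4))   # toy transmission map t
airlight = np.array([0.9, 0.9, 0.9])                # toy atmospheric light A

hazy = synthesize_haze(clear, transmission, airlight)
restored = recover_scene(hazy, transmission, airlight)
```

With a known transmission map and atmospheric light the inversion is exact, which is why prior-based methods concentrate on estimating these two quantities.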
Recently, with the rise of deep learning, many data-driven methods have achieved impressive dehazing performance. Some methods follow the atmospheric scattering model and derive a more accurate transmission map and atmospheric light by designing convolutional neural networks with BReLU, coarse-scale and fine-scale nets, and a densely connected pyramid structure.
Meanwhile, to reduce the error accumulation, many methods directly regress the haze-free image. For example, Li et al. apply a light-weight CNN, Ren et al. and Chen et al. employ a gated fusion module, Liu et al. adopt a dual residual connection structure, Liu et al. introduce a multi-scale network with attention units, Dong et al. develop a fusion discriminator, and Dong et al. utilize a dense feature fusion module.
Although different network architectures are explored, these deep learning based methods share two common issues, i.e., limited utilization of scene prior and of contextual information. Firstly, only the original version of a hazy image is fed to the dehazing network. This ignores the underlying scene prior of the input image, which could be captured from its multiple versions under different exposure conditions. Secondly, existing methods mainly cope with non-homogeneous haze by selectively preserving the features of each local region with the guidance of transmission or attention maps, which fails to explicitly utilize the information from other distant regions, even when they contain important scene prior. As shown in Fig. 1, the scattering model based deep network DCPDN is unable to remove the haze, while the fully end-to-end deep network GridNet has limited ability to restore details and colors for non-homogeneous haze.
To address the aforementioned issues, our previous work develops the artificial multiple shots and the multidimensional context network (MCN) for single image dehazing. The artificial multiple shots use exponential and logarithmic transformations to convert an input image into multiple preprocessed versions to enrich the scene prior. Meanwhile, we further develop the spatial and channel graph reasoning modules (SGR, CGR) to perform non-local filtering along the spatial and channel dimensions of feature maps, which models their long-range dependency and propagates the natural scene prior between the well-preserved nodes and the nodes contaminated by haze. In this paper, we further extend our previous exploration with a Non-Homogeneous Haze Removal Network (NHRN), and improve the dehazing performance in the following four aspects:
Artificial Scene Prior: We replace the exponential and logarithmic transformations with iterative gamma corrections, which helps reduce the number of parameters and the redundancy between different artificial shots. Meanwhile, in view of the potential noise, a gated fusion module is applied to selectively incorporate these artificial shots into the input features.
Spatial Projection Matrix: For the SGR module, we derive a dynamic spatial projection matrix from the similarity between multiple predefined anchors and pixel-wise features, rather than from a convolutional layer, which generates each node by aggregating pixels with the same filter parameters regardless of changes in the visual content. In this way, we can adaptively enhance the consistency of all pixels within each node and reduce the risk of noise propagation when a node contains both clear and contaminated pixels.
Unique Adjacency Matrix: In our previous work, the graph reasoning is conducted via two 1×1 convolutions, whose parameters become constant after training. As a result, different images share the same adjacency matrix. In this paper, we derive a unique adjacency matrix for each image based on the pairwise similarity of all its nodes, which improves the adaptability of SGR and CGR in coping with diverse hazy scenes.
Comprehensive Verification: We conduct more extensive experiments on both synthetic and real-world hazy images, which verify the effectiveness of the proposed method using specifically designed image dehazing quality assessment metrics. Meanwhile, beyond perceptual quality evaluations, we conduct experiments to investigate the effect of dehazing methods on the detection and semantic segmentation tasks under hazy scenes.
II Related Work
II-A Single image dehazing
Various prior-based methods have been proposed for image dehazing. DCP is an effective strategy to restore the transmission map using the statistical prior of the lowest pixel values in the color channels. Meng et al. estimate the transmission map by adding a boundary constraint and contextual regularization. CAP applies the relationship between brightness and saturation to estimate the transmission map. IDE introduces a light absorption coefficient to the atmospheric scattering model. Wu et al. propose an interleaved cascade of shrinkage fields to reduce noise while recovering the transmission map and the scene radiance. Further, learning-based methods are becoming popular, thanks to the strong nonlinear capabilities of deep networks. DehazeNet, MSCNN and DCPDN design CNNs to restore the haze-free images by using the atmospheric scattering model. Furthermore, AOD-Net applies a light-weight CNN to restore the clear images. GCANet leverages a gated-fusion subnetwork to fuse the middle features extracted from different convolutional layers. DuRN introduces a novel style of dual residual connection for image dehazing. GridNet proposes an attention-based multi-scale dehazing network. FD-GAN develops a fully end-to-end generative adversarial network with a fusion discriminator for image dehazing. MSBDN designs a dense feature fusion module, which remedies the missing spatial information from high-resolution features. RDN designs a deep retinex dehazing network, which jointly estimates the residual illumination map and the haze-free image. Meanwhile, some methods propose to enrich the scene prior information of the input by various preprocessing operations. AMEF applies a multi-scale Laplacian pyramid to fuse a series of gamma-corrected images, which compensate each other with valid information. GFN uses a gated network to fuse three enhanced versions of the input obtained by preprocessing, i.e., white balanced, contrast enhanced and gamma corrected. However, these methods ignore that real-world haze usually exhibits a non-homogeneous distribution, so that well-preserved regions could provide many valuable clues. Moreover, convolution operations fail to explicitly utilize the information from other distant regions, even when they contain important scene prior.
II-B Graph-based method
Recently, graph-based methods have become very popular; they aim to model long-range dependencies through graph-structured data. In particular, an image could be regarded as regular graph-structured data. CRFs are an effective graph model used as post-processing for image segmentation. Besides, Graph Convolutional Networks (GCNs) exhibit superior performance on relational reasoning tasks. Specifically, Kipf et al. encode the graph structure to reason about the relations between graph nodes for semi-supervised classification. Wang et al. exploit GCNs to capture relations between objects in video recognition tasks. Xu et al. propose to use GCNs for large-scale object detection, which discovers and incorporates key semantic and spatial relationships for reasoning over each object. Furthermore, Chen et al. adopt the reasoning power of graph convolutions to reason over disjoint and distant regions without extra annotations for semantic segmentation.
III Proposed Method
The proposed network architecture is illustrated in Fig. 2. The hazy image I is firstly preprocessed via the AMS module, which generates artificial multiple shots and connects them along the channel dimension to obtain , i.e.,
Meanwhile, the hazy input intermediate layer features is obtained through the convolutional operation , which is formulated by
Moreover, we apply to get scene prior features by extracting the corresponding features from . Meanwhile, we incorporate it into to obtain the transformed features , which preserves the regions with important recovery information and filters extra noise, i.e.,
We further restore the clear images via an encoder-decoder network. We apply the SGR and CGR module to conduct non-local filtering between different spatial regions and channel dimensions in the encoder layers, which model long-range dependency and propagate the natural scene prior between the well-preserved nodes and the nodes contaminated by haze, i.e.,
where denotes the encoder convolutional layers, denotes the spatial reasoning features, and denotes the channel reasoning features.
In the following, we feed and to the remaining encoder layers and a decoder network to generate the final dehazed image .
III-A Artificial Scene Prior
III-A1 Artificial Multiple Shots
The AMS module is a learnable preprocessing operation, which compensates the high-frequency components of the input image to multiple degrees. Inspired by the traditional image enhancement process, we attempt to obtain multiple high-frequency compensated versions of the input hazy image using gamma correction. Meanwhile, the parameters of high-frequency compensation are adaptively determined by the input image. The first-level artificial shot is formulated as follows:
where and are the trainable enhancement parameters, which adjust the magnitude and also control the high-frequency compensation level.
In view of the diversity of the haze distribution, can be applied iteratively to generate artificial shots in different levels, which simulates the images captured under different exposure conditions, i.e.,
where , is the number of iteration, and represents the - level artificial shot. Meanwhile, is equivalent to when is 1. As shown in Fig. 3, to learn the mapping between an input image and its suitable multiple sets of high-frequency parameters and , we feed the input image to two
Finally, we combine multiple artificial shots to adapt to different enhancement requirements, which is formulated as:
where is the total number of artificial shots, and denotes the concatenation operation.
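The iterative generation and concatenation of artificial shots can be illustrated with a minimal NumPy sketch. It assumes the standard gamma correction form `alpha * x ** gamma` for one level and uses fixed, hand-picked parameters, whereas in the proposed method the parameters are predicted from the input image; the function name `artificial_shots` and all values are hypothetical.

```python
import numpy as np

def artificial_shots(image, alphas, gammas):
    """Apply gamma corrections iteratively and concatenate the results.

    Assumed form of one level: shot_k = alpha_k * shot_{k-1} ** gamma_k,
    where shot_0 is the hazy input with values in [0, 1].
    """
    shots = []
    current = image
    for alpha, gamma in zip(alphas, gammas):
        current = np.clip(alpha * np.power(current, gamma), 0.0, 1.0)
        shots.append(current)
    # Concatenate all levels along the channel dimension.
    return np.concatenate(shots, axis=-1)

rng = np.random.default_rng(0)
hazy = rng.uniform(0.0, 1.0, size=(8, 8, 3))
# Fixed illustrative values; the real parameters are learned per image.
alphas = [1.0, 1.0, 1.0, 1.0]
gammas = [1.5, 1.5, 1.5, 1.5]
ams = artificial_shots(hazy, alphas, gammas)   # shape (8, 8, 12)
```

Because each level is computed from the previous one, the effective exponent compounds (1.5, 2.25, ...), which mimics progressively different exposure conditions with only two parameters per level.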
III-A2 Gated Fusion module
We regard artificial multiple shots as the scene prior of the hazy image. Since each and are used for all pixels, it is a global adaptive high-frequency compensation. Notably, the generated artificial multiple shots may introduce noise interference for some local regions while using global mapping.
To address this problem, we apply the GF module to preserve pivotal prior for better dehazing optimization. Firstly, GF applies the convolution operation to obtain by extracting the prior features from artificial multiple shots, which is formulated as:
Second, GF adopts a gated fusion operation to incorporate the features of artificial multiple shots into the features of the hazy input. The outputs of the gated fusion operation are two different importance weights (), which correspond to each feature level respectively.
where represents the gated operation, which consists of two convolutional layers with kernel size 3×3 and a Sigmoid function. Meanwhile, the channel dimension of and is one, respectively.
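A minimal sketch of such a gated fusion is given below. It assumes that the fused features are the weighted sum of the two inputs with one single-channel weight map each, and it replaces the two 3×3 convolutional layers of the gate with a single per-pixel linear map for brevity; `gated_fusion` and `gate_weight` are hypothetical names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(hazy_feat, prior_feat, gate_weight):
    """Fuse two (H, W, C) feature maps with per-pixel importance weights.

    gate_weight: (2C, 2) matrix acting as a 1x1 convolution stand-in
    for the gate (an assumption made to keep the sketch short).
    """
    stacked = np.concatenate([hazy_feat, prior_feat], axis=-1)   # (H, W, 2C)
    gates = sigmoid(stacked @ gate_weight)                       # (H, W, 2)
    w_hazy, w_prior = gates[..., :1], gates[..., 1:]             # one channel each
    return w_hazy * hazy_feat + w_prior * prior_feat, gates

rng = np.random.default_rng(0)
hazy_feat = rng.normal(size=(8, 8, 16))
prior_feat = rng.normal(size=(8, 8, 16))
gate_weight = rng.normal(scale=0.1, size=(32, 2))
fused, gates = gated_fusion(hazy_feat, prior_feat, gate_weight)
```

Since each weight map has a single channel, it is broadcast over all feature channels, so a noisy local region of the artificial shots can be suppressed at that spatial location only.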
III-B Graph Reasoning module
We employ the pretrained ResNet-50 as the encoder network. The decoder consists of five consecutive deconvolutional layers to restore the original resolution. Considering that the haze distribution is non-homogeneous across spatial locations, we aim to build long-range interactions between different regions with similar structure. Meanwhile, it is critical to capture the relationship between channels. Inspired by the graph convolutional network, we develop the Spatial Graph Reasoning module and the Channel Graph Reasoning module to model the long-range dependency between regions and channels on a graph, respectively.
In the Res3 encoder layers, we obtain a feature tensor, where denotes locations, and denotes feature dimension. We project X into a graph structure , where V and E denote the nodes and edges, respectively. In particular, represents a set of nodes for a graph, where denotes the number of nodes, and is the dimension for each node representation. Meanwhile, represents an associated set of edges, where represents the adjacency weight from to , . In general, our proposed two Graph Reasoning modules describe the graph relationship for X across different spatial regions and channel dimensions.
III-B1 Spatial Graph Reasoning module
As illustrated in Fig. 4 (a), the Spatial Graph Reasoning module consists of three operations: Spatial Graph Projection, Spatial Graph Reasoning and Spatial Graph Reprojection.
Spatial Graph Projection: For an input feature tensor X, we aim to use to project X into a set of nodes in a graph. Specifically, we first adopt a convolution as a linear embedding, resulting in . We then reshape to . Finally, we multiply and to obtain nodes in a graph. The features of nodes are , which is formulated as:
where each node is represented by
Considering the pixel features consistency within each node, we derive a dynamic , which could group pixels into coherent regions. In particular, we capture the similarity of pixel-wise features to guide the allocation for each pixel. However, the complexity of computing is large with increasing number of pixels . To address this issue, we uniformly divide the spatial feature map into grids as shown in Fig. 5 (a), and use the average pooling of each grid to represent its anchor. In addition to reducing computational complexity, the average pooling could promote the compact representation over local features to remove redundancy. Then, the similarity is measured between all anchors and the pixel-wise features.
Specifically, to generate a dynamic projection matrix , we first apply a convolution as a linear embedding to obtain . Second, we build anchors of nodes by an average pooling operation , which reduces the spatial dimensions of from () to () of . We then flatten to obtain the anchor features , . Meanwhile, we reshape and transpose to obtain the pixel-wise features , . Finally, we derive via the similarity of multiple predefined anchors and pixel-wise features . In particular, we take the multiplication of and to obtain . Additionally, we choose the Softmax function for normalization in the node direction, i.e.,
where denotes the Hadamard product, and .
Fig. 5 (f) shows the projection process of forming spatial graph nodes. Fig. 5 (g), (h) and (i) are the weights for these projection maps of (i.e., Node-24, 118 and 210). Node-24 aggregates pixels with features similar to Anchor-24, which focuses on the pavilion structure in clear regions. Node-118 and 210 aggregate grass structures from dense haze and clear regions, respectively, which are similar to Anchor-118 and 210. In other words, aggregates pixels with features similar to each anchor into one node. In particular, each node represents an arbitrary region with similar features. Finally, we use to obtain the node features , with each node represented by -dimensional vectors.
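The anchor-based construction of the dynamic projection matrix can be sketched as follows. This is an illustrative NumPy version under simplifying assumptions: the linear embeddings are omitted, the similarity is a plain dot product, and the Hadamard-product term mentioned above is not reproduced because its exact role cannot be recovered here. The names `spatial_projection` and `grid` are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_projection(feat, grid):
    """Build a dynamic projection matrix from anchor/pixel similarity.

    feat: (H, W, C) features; grid: number of grid cells per side,
    with H and W assumed divisible by grid.
    Returns P of shape (N, H*W) with N = grid * grid, and nodes (N, C).
    """
    h, w, c = feat.shape
    gh, gw = h // grid, w // grid
    # Average pooling over each grid cell gives one anchor per cell.
    anchors = feat.reshape(grid, gh, grid, gw, c).mean(axis=(1, 3)).reshape(-1, c)
    pixels = feat.reshape(-1, c)                  # (H*W, C) pixel-wise features
    similarity = anchors @ pixels.T               # (N, H*W)
    proj = softmax(similarity, axis=0)            # normalize along the node direction
    nodes = proj @ pixels                         # aggregate pixels into node features
    return proj, nodes

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))
proj, nodes = spatial_projection(feat, grid=4)    # 16 nodes over 64 pixels
```

Because the softmax runs along the node direction, every pixel distributes a total weight of one over the nodes, and pixels are assigned mainly to the anchors they resemble, wherever those anchors are located.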
Moreover, we construct the edges in the graph. Taking into account the difference in dependency between the nodes, we choose to construct a directed graph. Specifically, we construct the directed adjacency matrix , where each matrix element is represented by . For two node features , , the pairwise adjacency weight is defined as:
where and denote two linear embeddings, which are both dimensional weights. We adopt a Softmax function for normalization.
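One possible reading of this definition is sketched below: the two linear embeddings are applied to the node features, their dot product gives the pairwise similarity, and a row-wise Softmax normalizes it. The dot-product form and the Softmax axis are assumptions, and `directed_adjacency`, `w_theta` and `w_phi` are hypothetical names.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def directed_adjacency(nodes, w_theta, w_phi):
    """A[i, j] = softmax over j of theta(v_i) . phi(v_j) for node features v."""
    theta = nodes @ w_theta          # (N, D) first linear embedding
    phi = nodes @ w_phi              # (N, D) second linear embedding
    return softmax(theta @ phi.T, axis=1)

rng = np.random.default_rng(0)
nodes = rng.normal(size=(16, 8))
w_theta = rng.normal(scale=0.5, size=(8, 8))
w_phi = rng.normal(scale=0.5, size=(8, 8))
adjacency = directed_adjacency(nodes, w_theta, w_phi)
```

Using two different embeddings makes the matrix asymmetric in general, which matches the directed graph described above, and computing it from the node features yields a different adjacency matrix for each input image.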
Spatial Graph Reasoning: These disjoint graph nodes contain corresponding feature descriptors that need to be propagated. Specifically, we make use of graph convolution to update the feature information of each node with information obtained from the other nodes, i.e.,
where denotes the activation function ReLU, is the weight matrix, and denotes the updated node features after diffusing information across nodes.
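As a small worked example, one graph convolution step of the assumed form ReLU(A V W) can be written as follows, with a hand-made 3-node adjacency matrix A, node features V and an identity weight matrix W chosen purely for illustration.

```python
import numpy as np

def graph_reasoning(adjacency, nodes, weight):
    """One graph-convolution step: ReLU(A @ V @ W)."""
    return np.maximum(adjacency @ nodes @ weight, 0.0)

adjacency = np.array([[0.5, 0.5, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.2, 0.3, 0.5]])
nodes = np.array([[1.0, -1.0],
                  [2.0, 0.0],
                  [0.0, 4.0]])
weight = np.eye(2)

updated = graph_reasoning(adjacency, nodes, weight)
# updated is [[1.5, 0.0], [2.0, 0.0], [0.8, 1.8]]
```

The first node mixes its own features with those of the second node in equal parts, while the second node only keeps its own features, which shows how the adjacency weights control the direction of information diffusion.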
Spatial Graph Reprojection: In the following, we perform the feature reprojection from graph space to pixel space via spatial graph reprojection. For simplicity, we reuse the spatial projection matrix . In particular, we obtain the spatial reprojection matrix by transposing . We further project the graph node features to the pixel-wise features by the spatial projection matrix , i.e.,
where is . We then reshape the size of to to get the residual value for . Finally, we add the input and the residual to obtain
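The reprojection and residual connection can be sketched as below, assuming that the node features already have the same channel dimension as the input so that no extra convolution is needed; `spatial_reprojection` is a hypothetical helper name.

```python
import numpy as np

def spatial_reprojection(feat, proj, updated_nodes):
    """Map node features back to pixels and add them as a residual.

    feat: (H, W, C) input features; proj: (N, H*W) projection matrix;
    updated_nodes: (N, C) node features after graph reasoning.
    """
    h, w, c = feat.shape
    # Transposed projection matrix: (H*W, N) @ (N, C) -> (H*W, C).
    residual = (proj.T @ updated_nodes).reshape(h, w, c)
    return feat + residual

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 3))
proj = rng.uniform(size=(2, 16))
proj = proj / proj.sum(axis=0, keepdims=True)   # each pixel's weights sum to one
updated_nodes = rng.normal(size=(2, 3))
out = spatial_reprojection(feat, proj, updated_nodes)
```

Each pixel receives a mixture of the updated node features, weighted by how strongly it was assigned to each node during projection.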
III-B2 Channel Graph Reasoning module
Similar to SGR, the Channel Graph Reasoning module consists of the following three operations: Channel Graph Projection, Channel Graph Reasoning and Channel Graph Reprojection. The structure of CGR is illustrated in Fig. 4 (b).
Channel Graph Projection: We perform the features projection on the channel dimension. In particular, we project each channel feature into a channel graph node. We first feed into a convolution layer to generate , where determines the number of nodes. Considering that is too large, we utilize an average pooling to process the features to reduce the scale for and :
where , denotes the output scale after the pooling operation. Then, we reshape to () and obtain the transpose of it, i.e., . represents the node features of the input X projected in the channel dimension, with each node represented by -dimensional vectors. Fig. 5 (b) shows the projection process of forming channel nodes. And Fig. 5 (c),(d) and (e) are the feature maps before pooling of channel Node-1, 2 and 3. Similarly, we construct the edges in the graph. We exploit to generate a directed adjacency matrix in the graph.
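The channel graph projection described above can be illustrated with a short NumPy sketch. It skips the initial convolution layer and only shows the pooling, reshaping and transposition that turn every channel into one node; `channel_projection` and `pooled` are hypothetical names.

```python
import numpy as np

def channel_projection(feat, pooled):
    """Turn each channel of an (H, W, K) map into a graph node.

    Each channel is average-pooled to (pooled, pooled) and flattened,
    giving K nodes with pooled * pooled dimensions each.
    H and W are assumed divisible by pooled.
    """
    h, w, k = feat.shape
    ph, pw = h // pooled, w // pooled
    small = feat.reshape(pooled, ph, pooled, pw, k).mean(axis=(1, 3))  # (s, s, K)
    return small.reshape(pooled * pooled, k).T                        # (K, s*s)

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 6))
channel_nodes = channel_projection(feat, pooled=2)   # 6 nodes, 4 dimensions each
```

Pooling before flattening keeps the node dimension small, so the subsequent adjacency computation between channels stays cheap.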
|Method||SOTS indoor (PSNR/SSIM/LPIPS)||SOTS outdoor (PSNR/SSIM/LPIPS)||TestA-DCPDN (PSNR/SSIM/LPIPS)||NH-HAZE (PSNR/SSIM/LPIPS)|
|Meng et al.||23.49/0.936/0.136||15.51/0.797/0.186||24.33/0.904/0.172||13.12/0.492/0.499|
Channel Graph Reasoning: Similarly, we exploit to reason the graph relationship to acquire the new node features , i.e.,
where is the weight matrix, and denotes the updated node features after diffusing information across channel nodes.
Channel Graph Reprojection: We perform the feature reprojection from graph space to pixel space via channel graph reprojection. Considering that projecting the channels to the nodes could cause some information to be lost, we relearn a mapping function, which transforms the node features to . In particular, we generate from by a linear projection as follows:
Then we reshape and transpose as the channel reprojection matrix . Furthermore, we reproject the channel node features to obtain pixel-wise features, i.e.,
where is the pixel-wise features. Furthermore, we reshape it to obtain . Additionally, we use another convolution layer to obtain the residual value from , which changes to to match the input dimension for . Finally, we add the input and the residual to obtain .
IV Loss Function
The final dehazing result is . denotes the clear image, and denotes the number of pixels in an image. The image content loss and the SSIM loss are incorporated for training the proposed network, i.e.,
Hence, the total loss can be expressed as:
is a parameter used to balance two loss functions and is set to 0.1 by default.
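A minimal sketch of such a combined objective is given below. Several points are assumptions made for illustration: the content loss is taken as the mean absolute error, the SSIM is computed once over the whole image instead of with a sliding window, and the balancing parameter (named `lam` here) is applied to the SSIM term.

```python
import numpy as np

def content_loss(pred, target):
    """Mean absolute error over all pixels (assumed L1 form)."""
    return np.abs(pred - target).mean()

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once over the whole image (no sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def total_loss(pred, target, lam=0.1):
    """Content loss plus lam times the SSIM loss (1 - SSIM)."""
    return content_loss(pred, target) + lam * (1.0 - global_ssim(pred, target))

rng = np.random.default_rng(0)
clear = rng.uniform(0.0, 1.0, size=(8, 8, 3))
dehazed = np.clip(clear + 0.1 * rng.normal(size=clear.shape), 0.0, 1.0)

loss_same = total_loss(clear, clear)      # identical images
loss_diff = total_loss(dehazed, clear)    # perturbed prediction
```

The loss vanishes for a perfect prediction and grows with both the pixel-wise error and the loss of structural similarity.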
V Experimental Results
In this part, we first describe the benchmark databases, which contain both synthetic and real-world hazy images. Second, we present the implementation details of our method. Next, we conduct quantitative and qualitative analyses to compare our dehazing network against state-of-the-art methods. Moreover, we demonstrate the effectiveness of our proposed modules through a series of analyses and ablation studies.
V-A Training and Testing Datasets
We evaluate our method on the synthetic datasets RESIDE, TestA-DCPDN and Foggy Cityscapes-DBF, followed by the non-homogeneous realistic dataset NH-HAZE and the real-world dataset DHQ.
The RESIDE dataset consists of paired synthetic indoor and outdoor hazy images. Specifically, we choose the RESIDE ITS and OTS- as the training sets, respectively. The 13,990 indoor hazy images of ITS are generated from 1,399 clear images, and we evaluate the performance on 500 indoor images of RESIDE SOTS indoor. The RESIDE OTS- consists of 69,510 hazy images generated from 1,986 clear images. We evaluate the performance on 500 outdoor images of RESIDE SOTS outdoor.
The real-world NH-HAZE dataset  has a total of 55 pairs of non-homogeneous hazy and clear images of the real outdoor scenes. We train and evaluate our method according to the NH-HAZE 2020 dehazing challenge  requirements.
The DHQ dataset  contains 250 high-quality real-world hazy images. Moreover, the dataset provides a no-reference quality evaluation method NR-IQA  that could be used to evaluate the performance of various dehazing methods in the real-world scenes.
The Foggy Cityscapes-DBF  dataset has the corresponding clear images with bounding box and segmentation masks. We strictly follow  to synthesize more kinds of haze density images. Specifically, we set nine kinds of , where the value of ranges from 0.004 to 0.02, and the interval is 0.002. Notably, it contains image pairs for training. Meanwhile, we evaluate on image pairs.
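The nine haze densities can be enumerated with a few lines of Python. The sketch assumes that the varied quantity is the attenuation coefficient of the scattering model, called `beta` here, and that the transmission follows t = exp(-beta * d) for a scene depth d in meters; the 100 m depth is only an example value.

```python
import math

# Nine attenuation coefficients from 0.004 to 0.02 in steps of 0.002.
betas = [round(0.004 + 0.002 * k, 3) for k in range(9)]

def transmission(depth_m, beta):
    """Transmission of a scene point at depth_m meters: t = exp(-beta * d)."""
    return math.exp(-beta * depth_m)

# Transmission at 100 m for the lightest and the densest setting.
t_light = transmission(100.0, betas[0])
t_dense = transmission(100.0, betas[-1])
```

A larger coefficient gives a smaller transmission at the same depth, i.e., denser haze, which is why the nine settings cover a range from light to heavy haze.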
|Metric||Dataset||DCP||CAP||Meng et al.||AMEF||DehazeNet||AOD-Net||GFN||DCPDN||GCANet||DuRN||GridNet||FD-GAN||MSBDN||MCN||NHRN|
|FR-IQA||SOTS indoor||49.95||74.02||76.87||84.64||76.87||71.50||84.30||73.03||94.11||95.63||95.02||89.07||96.66||97.22||97.58|
V-B Implementation Detail
The network is trained in an end-to-end manner. Specifically, we train the network with the Adam optimizer, where the exponential decay rates β1 and β2 take 0.9 and 0.999, respectively. The initial learning rate is set to 0.0001. We employ a mini-batch size of 16 to train our network. The hazy images are randomly cropped and used as the input to our network during the training phase. Moreover, we train the network for 120 epochs on the ITS, TrainA-DCPDN and NH-HAZE datasets, and decay the learning rate by a factor of 0.5 after 40 epochs. As for OTS-, considering the large size of the dataset, we train for 20 epochs and multiply the learning rate by 0.5 every 10 epochs. Moreover, for Foggy Cityscapes-DBF, we train for 20 epochs with a learning rate of 0.0001. The training is carried out on a PC with an NVIDIA GeForce GTX TITAN XP.
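The step-wise learning rate schedules can be sketched in plain Python as follows. The helper `step_lr` is a hypothetical name, and reading the 120-epoch setting as a halving every 40 epochs is an assumption, since the text only states that the rate is decayed after 40 epochs.

```python
def step_lr(epoch, base_lr=1e-4, factor=0.5, step=40):
    """Learning rate at a given epoch when it is multiplied by factor every step epochs."""
    return base_lr * factor ** (epoch // step)

# ITS / TrainA-DCPDN / NH-HAZE style schedule: 120 epochs, halved every 40 (assumed reading).
its_schedule = [step_lr(e, step=40) for e in range(120)]
# OTS style schedule: 20 epochs, halved every 10.
ots_schedule = [step_lr(e, step=10) for e in range(20)]
```

Under this reading, the 120-epoch schedule ends at a quarter of the initial learning rate, while the 20-epoch schedule ends at half of it.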
V-C Quantitative Analysis with Objective Metrics
Our comparison methods include the traditional dehazing algorithms, e.g., DCP, CAP, Meng et al. and AMEF, as well as the CNN-based methods, e.g., DehazeNet, AOD-Net, GFN, DCPDN, GCANet, DuRN, GridNet, FD-GAN and MSBDN. The objective quality metrics PSNR and SSIM (the higher, the better) and the perceptual quality metric LPIPS (the lower, the better) are used to measure the dehazing performance. TABLE I reports the quantitative results on the synthetic datasets SOTS indoor, SOTS outdoor and TestA-DCPDN as well as the real-world dataset NH-HAZE, from which we can clearly observe that our proposed method performs best in terms of the PSNR, SSIM and LPIPS metrics. Compared with the latest methods on each dataset, the PSNR, SSIM and LPIPS values of our previous method are better than those of the second-place method by significant margins on SOTS indoor, SOTS outdoor, TestA-DCPDN and NH-HAZE. Furthermore, our improved method NHRN performs better than our previous method in terms of the PSNR, SSIM and LPIPS metrics on all four datasets, especially on the NH-HAZE dataset.
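For readers unfamiliar with the first metric, PSNR is defined as 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error. The following short NumPy sketch assumes images scaled to [0, 1]; the toy images are purely illustrative.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

target = np.full((4, 4, 3), 0.5)
pred = target + 0.1            # constant error of 0.1, so the MSE is 0.01
score = psnr(pred, target)     # about 20 dB
```

Since the metric is logarithmic, halving the root mean squared error raises the PSNR by roughly 6 dB.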
V-D Quantitative Analysis with IQA Metrics
In this part, we introduce the IQA metrics to compare different dehazing methods as a complement to three metrics, PSNR, SSIM and LPIPS. Specifically, we adopt FR-IQA evaluation algorithm  to evaluate SOTS indoor, SOTS outdoor, TestA-DCPDN and NH-HAZE as a full-reference quality assessment. As a further experiment supplement, we apply NR-IQA  as a no-reference quality assessment to evaluate the dehazing effect in real-world hazy images.
As seen in TABLE II, for the FR-IQA metric, our method shows clear advantages on different datasets. In particular, our method achieves the best performance, with scores of 97.58 on SOTS indoor, 95.87 on SOTS outdoor, 96.45 on TestA-DCPDN and 78.68 on the NH-HAZE dataset. Moreover, we present the results of NR-IQA for the DHQ dataset. Our method achieves the best score of 49.46, with a gap of 0.1 over our previous method.
V-E Qualitative Evaluation
We further show the dehazing results of the state-of-the-art dehazing methods on the real-world NH-HAZE and DHQ datasets in Fig. 6 and Fig. 7 for qualitative comparisons. It is observed that our method achieves desirable dehazing results on these datasets, which shows the robustness of our method.
From the visual results, we can observe that DCP, CAP and Meng et al. easily suffer from color distortion, which makes several areas relatively dark. AMEF, DehazeNet, AOD-Net and DCPDN still leave residual haze in heavily hazy scenes. GFN, GCANet and FD-GAN produce unnatural high-frequency details. Although DuRN, GridNet and MSBDN look good in some cases, there are flaws in detail restoration. Our method achieves the best visual effect, preserving the image details and colors while removing as much haze as possible from the input. This is because our method could provide suitable scene prior. Meanwhile, our method conducts non-local filtering in the spatial and channel dimensions to model long-range dependency and propagate information between graph nodes for better dehazing optimization.
|mAP and AP on Foggy Cityscapes-DBF test set (mAP/AP)|
|CAP||35.6/57.9||34.2/56.4||32.6/53.8||31.2/51.1||29.8/49.2||28.2/45.9||26.4/41.7||25.5/40.3||24.2/37.9|
|Meng et al.||36.1/58.7||35.4/57.2||34.4/55.2||33.4/53.7||32.1/52.3||31.4/51.1||30.2/49.5||29.3/48.1||28.4/46.2|
|AMEF||33.9/56.0||32.6/54.1||31.2/51.3||29.8/48.9||28.7/47.0||27.2/44.6||26.3/42.9||25.1/40.2||24.0/38.1|
|mIoU and mAcc on Foggy Cityscapes-DBF test set (mIoU/mAcc)|
|Meng et al.||66.05/77.81||65.54/77.03||65.11/76.29||64.56/75.47||63.94/74.58||63.26/73.69||62.60/72.84||61.92/72.05||61.32/71.32|
V-F Effect of Haze Scene Detection and Segmentation
In addition to visual effectiveness, we further compare the performance of these dehazing algorithms in detection and segmentation tasks using the Foggy Cityscapes-DBF dataset. For the object detection task, we adopt Faster-RCNN as our detector, which is pretrained on the Cityscapes dataset. We employ mAP and AP to measure the detection effect. As seen in TABLE III, our algorithm achieves the highest detection precision on the dehazing results of all nine haze concentrations. More intuitively, Fig. 8 shows that our method has fewer missed detections and is closer to the detection result under the haze-free scene compared with other CNN-based methods. Meanwhile, we conduct similar experiments on the semantic segmentation task, whose metrics consist of mIoU and mAcc. We exploit PSPNet as the semantic segmentation algorithm, pretrained on the Cityscapes dataset. The comparison results are shown in TABLE IV. As revealed in Fig. 8, it can be observed that our method leads to semantic information more consistent with the ground truth in the qualitative segmentation results. Note that conducting object detection and segmentation directly on hazy images and haze-free images is adopted as the baseline and the ideal case, respectively.
V-G Analysis and Visualization
We consider that our method mainly relies on the proposed SGR and CGR modules, which are able to capture the underlying non-local contextual information.
Thus, we first visualize the projection weights and the corresponding adjacency weights learned by the SGR module (see Fig. 9). In particular, we set spatial graph nodes in total. The projection weights (i.e., in ) could reflect the irregular region formed by the feature aggregation of the spatial graph nodes. The adjacency weight diffuses information across nodes, where a higher weight represents a stronger dependency from the node to the current reference node, while a lower weight indicates a weak dependency. We show the spatial graph Node-105 and Node-76 in the first and second rows, respectively. We can see that Node-105 tends to focus on the features of grass under the dense haze, while Node-76 tends to aggregate the features of the parterre under the dense haze. Column (d) shows the adjacency weights of Node-105 and 76. Besides, columns (e) and (f) show the projection weights of the response nodes that have the highest and lowest adjacency weights for Node-105 and 76, respectively. We observe that Node-105 is able to interact with long-range nodes that aggregate the clear regions of grass. Meanwhile, Node-76 captures the non-local contextual information, focusing on the pavilion in the clearer region with similar structures.
For the CGR module, we provide the corresponding adjacency weights overlaid on each channel feature map (see Fig. 10). We could observe that channel Node-1 and 32 could interact with non-local channel nodes. Meanwhile, each reference node could learn an adaptive adjacency matrix according to its dependencies with other nodes. Therefore, these visualizations demonstrate that the SGR and CGR modules could build long-range dependencies and propagate the natural scene prior between the well-preserved nodes and the nodes contaminated by haze.
V-H Ablation Study
V-H1 Effect of the components
We perform ablation studies to verify the major components of our network. As shown in TABLE VI, overall, the PSNR and SSIM gradually rise as AMS, GF, SGR and CGR are added. From the PSNR and SSIM indicators, we can see that these two metrics increase when adding the AMS. When adding the SGR and CGR, PSNR and SSIM are significantly improved relative to the basic network. This confirms the benefit of reasoning across different spatial regions and of multiple artificial shots for single image dehazing.
V-H2 The number of AMS
We perform the experiment by considering different numbers of artificial shots on the basis of our baseline network with the GF module, where the GF module is a compensation for AMS. As shown in Fig. 11, overall, the PSNR and SSIM gradually rise when the number of artificial shots increases from 0 to 4. From the charts of the PSNR and SSIM indicators, it can be seen that there is a slight increase when adding one artificial shot. When the number is increased from 2 to 4, PSNR and SSIM are significantly improved relative to no artificial shot. However, when we continue to increase the number of shots, the PSNR and SSIM decrease slightly. That is because continuing to feed in high-frequency compensation brings unnecessary noise interference. This confirms the benefit of appropriately enriching the high-frequency input for single image dehazing.
V-H3 The number of spatial graph nodes
We perform ablation studies to analyze the effect of different numbers of spatial graph nodes on the results. As shown in TABLE V, the PSNR and SSIM rise to the best performance when setting the number of nodes to . Meanwhile, further increasing the number of nodes does not bring performance improvement, because more detailed anchors affect the overall feature representation.
VI Conclusion
In this paper, we propose a Non-Homogeneous Haze Removal Network (NHRN) via artificial scene prior and bidimensional graph reasoning. Specifically, we enrich the underlying scene prior by applying gamma correction iteratively to generate artificial multiple shots under different exposure conditions. Moreover, we develop a bidimensional graph reasoning module to model long-range dependency in the spatial and channel dimensions. By doing so, the network can propagate the natural scene prior between the well-preserved nodes and the nodes contaminated by haze. Extensive experiments show the superiority of the NHRN over various state-of-the-art methods.
-  B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “End-to-end united video dehazing and detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
-  Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. Van Gool, “Domain adaptive faster r-cnn for object detection in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 3339–3348.
-  V. A. Sindagi, P. Oza, R. Yasarla, and V. M. Patel, “Prior-based domain adaptive object detection for hazy and rainy conditions,” in European Conference on Computer Vision. Springer, 2020, pp. 763–780.
-  L. Wang, G. Hua, J. Xue, Z. Gao, and N. Zheng, “Joint segmentation and recognition of categorized objects from noisy web image collection,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4070–4086, 2014.
-  S. G. Narasimhan and S. K. Nayar, “Chromatic framework for vision in bad weather,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2000, pp. 598–605.
-  R. Fattal, “Single image dehazing,” ACM transactions on graphics (TOG), vol. 27, no. 3, p. 72, 2008.
-  S. G. Narasimhan and S. K. Nayar, “Vision and the atmosphere,” International journal of computer vision, vol. 48, no. 3, pp. 233–254, 2002.
-  R. T. Tan, “Visibility in bad weather from a single image,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
-  K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 12, pp. 2341–2353, 2010.
-  Q. Zhu, J. Mai, and L. Shao, “A fast single image haze removal algorithm using color attenuation prior,” IEEE transactions on image processing, vol. 24, no. 11, pp. 3522–3533, 2015.
-  G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan, “Efficient image dehazing with boundary constraint and contextual regularization,” in IEEE International Conference on Computer Vision, 2013, pp. 617–624.
-  B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: An end-to-end system for single image haze removal,” IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5187–5198, 2016.
-  W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.-H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” in European Conference on Computer Vision, 2016, pp. 154–169.
-  H. Zhang and V. M. Patel, “Densely connected pyramid dehazing network,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3194–3203.
-  B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “Aod-net: All-in-one dehazing network,” in IEEE International Conference on Computer Vision, 2017, pp. 4770–4778.
-  W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, and M.-H. Yang, “Gated fusion network for single image dehazing,” in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3253–3261.
-  D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, and G. Hua, “Gated context aggregation network for image dehazing and deraining,” in IEEE Winter Conference on Applications of Computer Vision, 2019, pp. 1375–1383.
-  X. Liu, M. Suganuma, Z. Sun, and T. Okatani, “Dual residual networks leveraging the potential of paired operations for image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 7007–7016.
-  X. Liu, Y. Ma, Z. Shi, and J. Chen, “Griddehazenet: Attention-based multi-scale network for image dehazing,” in IEEE International Conference on Computer Vision, 2019, pp. 7314–7323.
-  Y. Dong, Y. Liu, H. Zhang, S. Chen, and Y. Qiao, “Fd-gan: Generative adversarial networks with fusion-discriminator for single image dehazing,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 10 729–10 736.
-  H. Dong, J. Pan, L. Xiang, Z. Hu, X. Zhang, F. Wang, and M.-H. Yang, “Multi-scale boosted dehazing network with dense feature fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2157–2167.
-  H. Wei, Q. Wu, H. Li, K. N. Ngan, H. Li, and F. Meng, “Single image dehazing via artificial multiple shots and multidimensional context,” in 2020 IEEE International Conference on Image Processing. IEEE, 2020, pp. 1023–1027.
-  M. Ju, C. Ding, W. Ren, Y. Yang, D. Zhang, and Y. J. Guo, “Ide: Image dehazing and exposure using an enhanced atmospheric scattering model,” IEEE Transactions on Image Processing, vol. 30, pp. 2180–2192, 2021.
-  Q. Wu, W. Ren, and X. Cao, “Learning interleaved cascade of shrinkage fields for joint image dehazing and denoising,” IEEE Transactions on Image Processing, vol. 29, pp. 1788–1801, 2019.
-  P. Li, J. Tian, Y. Tang, G. Wang, and C. Wu, “Deep retinex network for single image dehazing,” IEEE Transactions on Image Processing, vol. 30, pp. 1100–1115, 2020.
-  A. Galdran, “Image dehazing by artificial multiple-exposure image fusion,” Signal Processing, vol. 149, pp. 135–147, 2018.
-  S. Chandra, N. Usunier, and I. Kokkinos, “Dense and low-rank gaussian crfs using deep embeddings,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5103–5112.
-  T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
-  X. Wang and A. Gupta, “Videos as space-time region graphs,” in Proceedings of the European conference on computer vision, 2018, pp. 399–417.
-  H. Xu, C. Jiang, X. Liang, and Z. Li, “Spatial-aware graph relation network for large-scale object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9298–9307.
-  Y. Chen, M. Rohrbach, Z. Yan, Y. Shuicheng, J. Feng, and Y. Kalantidis, “Graph-based global reasoning networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 433–442.
-  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
-  M. D. Zeiler, G. W. Taylor, R. Fergus et al., “Adaptive deconvolutional networks for mid and high level feature learning.” in IEEE International Conference on Computer Vision, vol. 1, no. 2, 2011, p. 6.
-  B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, “Benchmarking single-image dehazing and beyond,” IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 492–505, 2018.
-  C. Sakaridis, D. Dai, S. Hecker, and L. Van Gool, “Model adaptation with synthetic and real data for semantic dense foggy scene understanding,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 687–704.
-  C. O. Ancuti, C. Ancuti, and R. Timofte, “Nh-haze: An image dehazing benchmark with non-homogeneous hazy and haze-free images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 444–445.
-  X. Min, G. Zhai, K. Gu, Y. Zhu, J. Zhou, G. Guo, X. Yang, X. Guan, and W. Zhang, “Quality evaluation of image dehazing methods using synthetic hazy images,” IEEE Transactions on Multimedia, vol. 21, no. 9, pp. 2319–2333, 2019.
-  C. O. Ancuti, C. Ancuti, F.-A. Vasluianu, and R. Timofte, “Ntire 2020 challenge on nonhomogeneous dehazing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 490–491.
-  C. Sakaridis, D. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” International Journal of Computer Vision, vol. 126, no. 9, pp. 973–992, 2018.
-  R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 586–595.
-  X. Min, G. Zhai, K. Gu, X. Yang, and X. Guan, “Objective quality evaluation of dehazed images,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 8, pp. 2879–2892, 2018.
-  S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 6, pp. 1137–1149, 2016.
-  M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 3213–3223.
-  H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2881–2890.