I Introduction
Images captured in hazy conditions often suffer from absorption and scattering effects caused by floating atmospheric particles such as dust, mist, and fumes, which can result in low-contrast, blurry, and noisy images. This degraded image quality can hamper many subsequent high-level computer vision tasks, e.g., object detection [1, 2, 3] and segmentation [4, 5, 6]. Therefore, removing haze and improving image quality benefits these applications, making image dehazing a subject of intense research and practical focus.
Specifically, image haze removal, or dehazing, refers to a technique that restores a haze-free image from one or several observed hazy images. Many dehazing approaches have been proposed, which can be categorized by their model input into those that: 1) use auxiliary information such as scene depth [7] and polarization [8]; 2) use a sequence of captured images [9]; and 3) use a single hazy image [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. Of these, single image dehazing, which needs no additional information, is of most practical benefit. However, as a typical ill-posed problem, single image dehazing remains challenging and requires further refinement.
The presence of haze leads to the combination of an attenuation term corresponding to the absorbing effect and a scattering term corresponding to the scattering effect that occur during imaging. Both terms are related to an intermediate variable, the transmission, which depends on scene depth. One feasible haze removal solution is to estimate the transmission and then recover the clear image by reversing the attenuation and scattering. Many single image dehazing methods have been proposed [13, 26, 14, 15, 16, 17, 18, 21], which use either handcrafted features (e.g., different image priors) or learning-based features to estimate the haze transmission. For example, He et al. [14] proposed a simple and effective dark channel prior for single image dehazing, which assumes that the minimum of all the spectral channels in clear images (the “dark channel”) is close to zero. The method effectively estimates the haze transmission. However, the dark channel prior may not work for some particular scenes, such as white objects that are similar to the atmospheric light, because it underestimates the transmission and leads to over-dehazed artifacts. Zhu et al. [16] proposed a color attenuation prior that assumes a positive correlation between the scene depth and the haze concentration, which is represented by the difference between scene brightness and saturation. The scene depth and haze transmission are then easily estimated by a linear model regressed on this prior. Recently, Berman et al. [17] proposed a non-local prior based on the assumption that colors in a clear image can be approximated by a few distinct colors that cluster tightly in RGB space. When affected by haze, each cluster becomes a line in RGB space (a haze-line) due to the varying transmission coefficients of the clustered pixels. Consequently, the transmission and clear image are estimated according to these haze-lines. Though prior-based methods are usually simple and effective for many scenes, they share the common limitation of describing specific statistics, which may not hold for some images.
Learning-based methods adopt a data-driven approach to learn a linear/nonlinear mapping between features and transmission and so overcome the limitations of specific priors. For example, Tang et al. [15] proposed learning a regression model based on random forests from haze-relevant features including the dark channel, local max contrast, hue disparity, and local max saturation. They trained the model using a synthetic dataset and tested it on both synthetic and real-world hazy images, which then became common practice in subsequent learning-based methods. The learning-based idea for dehazing has subsequently been extended in three ways: 1) more powerful learning models; 2) more effective synthetic methods and larger datasets; and 3) end-to-end modeling/training.
Deep neural networks have now been successfully applied to many computer vision tasks including object recognition, detection, and semantic segmentation. By leveraging their powerful representation capacity and end-to-end learning, many deep convolutional neural network (CNN)-based approaches have been proposed for image dehazing [18, 19, 20, 21, 22, 23, 24]. For example, Cai et al. [18] proposed an end-to-end trainable deep CNN model called DehazeNet to directly learn the transmission from hazy images, which is superior to contemporary prior-based methods and random forest models [15]. Ren et al. [19] proposed a multiscale CNN (MSCNN) to learn the transmission map in a fully convolutional manner and explored a multiscale architecture for coarse-to-fine regression. Despite the effectiveness of these CNN-based approaches, a separate step is still needed to estimate the atmospheric light. Recently, Zhang et al. [23] proposed an end-to-end densely connected pyramid dehazing network (DCPDN) to jointly learn the transmission map, atmospheric light, and dehazing. They adopted an encoder-decoder architecture with a multilevel pyramid pooling module to learn multiscale features. They also utilized an adversarial loss based on a generative adversarial network [27] to supervise the dehazing network. Rather than estimating the intermediate transmission, Li et al. [20] proposed an end-to-end CNN model called the all-in-one dehazing network (AODNet) to learn the clear image from a hazy one. They integrated the transmission and atmospheric light into a single variable by reformulating the hazy imaging model. Ren et al. [22] proposed a gated fusion network (GFN) by adopting an encoder-decoder architecture, while Li et al. [24] also designed an encoder-decoder architecture but based it on a conditional generative adversarial network (cGAN) to learn the dehazed image end-to-end. Though cGAN and DCPDN have achieved good dehazing results, they contain dozens of convolutional layers and are about 200 MB in size, making them unwieldy and unlikely to be applicable in resource-constrained computer vision systems.
In this paper, we aim to develop a fast and accurate deep CNN model for single image dehazing. We use a fully convolutional and end-to-end training/testing approach to efficiently process hazy images of arbitrary size. To this end, we propose a fast and accurate multiscale dehazing network called FAMEDNet, which comprises encoders at three scales and a fusion module to directly learn the haze-free image. Each encoder consists of cascaded pointwise convolutional layers and pooling layers connected via a dense connection mechanism. Since no large convolutional kernels are used and features are reused layer by layer, FAMEDNet is lightweight and computationally efficient. Thorough empirical studies on public synthetic datasets and real-world hazy images demonstrate the superiority of FAMEDNet over representative state-of-the-art models with respect to model complexity, computational efficiency, restoration accuracy, and cross-set generalization. The code will be made publicly available at https://github.com/chaimi2013/FAMEDNet.
The main contributions of this paper can be summarized as follows:
We devise a novel multiscale end-to-end dehazing network called FAMEDNet, which implicitly learns efficient statistical image priors for fast and accurate haze removal from a single image.
FAMEDNet leverages fully pointwise convolutions as the basic unit to construct the encoder-decoder architecture, which has a small model size and is computationally efficient.
FAMEDNet outperforms state-of-the-art models on both synthetic benchmarks and real-world hazy images.
II Related Work
II-A Atmospheric Scattering Model
Images captured in hazy conditions can be mathematically formulated as [28, 29, 14]:
$$I^{c}(x) = J^{c}(x)\,t(x) + A^{c}\,\bigl(1 - t(x)\bigr), \tag{1}$$
where $I^{c}(x)$ is the observed hazy image, $J^{c}(x)$ is the scene radiance, $A^{c}$ is the atmospheric light assumed to be a global constant, $t(x)$ is the haze transmission, $x$ denotes pixel location, and $c$ denotes the spectral channel, i.e., $c \in \{r, g, b\}$. The first term, called the attenuation term, represents the haze absorbing effect on scene radiance, while the second term, called the scattering term, represents the haze scattering effect on ambient light. $t(x)$ describes the fraction of scene radiance reaching the camera sensor, hence the name “transmission”, and it depends on scene depth. Under the homogeneous haze assumption, the transmission can be expressed as:
$$t(x) = e^{-\beta d(x)}, \tag{2}$$
where $\beta$ denotes the medium attenuation coefficient and $d(x)$ is the scene depth.
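The scattering model above can be illustrated with a minimal per-pixel sketch. It synthesizes a hazy pixel from a clear one and then inverts the model when the transmission and atmospheric light are known. The function names, the example values, and the lower bound `t_min` on the transmission are illustrative assumptions, not part of the original method.

```python
import math

def transmission(depth, beta=1.0):
    """Transmission under homogeneous haze: t = exp(-beta * d)."""
    return math.exp(-beta * depth)

def add_haze(radiance, depth, airlight, beta=1.0):
    """Hazy pixel per channel: I = J * t + A * (1 - t)."""
    t = transmission(depth, beta)
    return [j * t + a * (1.0 - t) for j, a in zip(radiance, airlight)]

def recover_radiance(hazy, t, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / t + A.
    t is clamped from below (an assumed safeguard) to avoid amplifying noise."""
    t = max(t, t_min)
    return [(i - a) / t + a for i, a in zip(hazy, airlight)]

clear = [0.2, 0.5, 0.8]   # hypothetical RGB radiance in [0, 1]
A = [0.9, 0.9, 0.9]       # hypothetical global atmospheric light
hazy = add_haze(clear, depth=1.5, airlight=A, beta=1.0)
restored = recover_radiance(hazy, transmission(1.5, 1.0), A)
```

With zero depth the hazy pixel equals the clear pixel, and as depth grows the pixel converges to the atmospheric light, which is why distant regions look uniformly washed out.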
II-B Prior-based and Learning-based Image Dehazing Methods
As can be seen from the atmospheric scattering model in Eq. (1), given an observed hazy image $I$, recovering the scene radiance $J$ is ill-posed. Different image priors have been proposed to constrain the haze-free image and make the estimate tractable, including the dark channel prior [14], color attenuation prior [16], and non-local prior [17]. As defined in [14], each pixel value of the dark channel is the minimum pixel value, over all channels, within the patch centered at that pixel position. Figure 1 shows an example of the dark channels of both clear and hazy images. As can be seen, the dark channel of a clear image is dark almost everywhere except for the bright sky region, while the dark channel of a hazy image reveals the haze veil due to the haze scattering effect (corresponding to the second term in Eq. (1)). Based on the dark channel prior, the transmission can be efficiently estimated from the dark channel map. It is noteworthy that the pixel value of the dark channel reveals the haze density (which is related to scene depth) even though it is calculated locally in a sliding window manner (see the regions and corresponding values indicated by the red boxes). This can be explained as follows: 1) the haze effects of both attenuation and scattering, which are directly related to scene depth, can be described by the atmospheric scattering model as a pixel-to-pixel (i.e., local) mapping from clear pixel to hazy pixel; 2) the dark channel prior reveals an intrinsic local statistical property of clear images. Similar to [14], our approach also solves the dehazing problem in a local manner and implicitly learns a statistical image prior, as will be demonstrated in Section IV-C4.
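The dark channel described above can be sketched as a sliding-window minimum over both channels and pixels. The transmission estimate follows the form used in [14], $t = 1 - \omega \cdot \mathrm{dark}(I / A)$; the value $\omega = 0.95$, the window radius, the border handling, and the toy image are assumptions made for illustration.

```python
def dark_channel(image, radius=1):
    """image: H x W nested list of (r, g, b) tuples.
    Returns the minimum over channels and over a (2*radius+1)^2 window,
    clipping the window at the image border."""
    h, w = len(image), len(image[0])
    per_pixel_min = [[min(px) for px in row] for row in image]
    dark = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row.append(min(per_pixel_min[yy][xx] for yy in ys for xx in xs))
        dark.append(row)
    return dark

def estimate_transmission(hazy, airlight, radius=1, omega=0.95):
    """Dark-channel transmission estimate: t = 1 - omega * dark(I / A)."""
    normalized = [[tuple(c / a for c, a in zip(px, airlight)) for px in row]
                  for row in hazy]
    return [[1.0 - omega * d for d in row]
            for row in dark_channel(normalized, radius)]

# Toy clear image: every pixel has at least one zero channel.
clear = [[(0.0, 0.4, 0.6), (0.3, 0.0, 0.5)],
         [(0.7, 0.2, 0.0), (0.1, 0.6, 0.0)]]
A = (1.0, 1.0, 1.0)
t_true = 0.5
hazy = [[tuple(c * t_true + a * (1.0 - t_true) for c, a in zip(px, A))
         for px in row] for row in clear]
t_est = estimate_transmission(hazy, A)
```

In this toy case the dark channel of the clear image is zero everywhere, while the dark channel of the hazy image equals the haze veil $A(1 - t)$, so the estimated transmission is close to the true value.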
To overcome the limitations of prior-based methods, many deep CNN-based data-driven dehazing models have been proposed since Cai et al. [18] proposed DehazeNet, including MSCNN [19], AODNet [20], FPCNet [21], DCPDN [23], GFN [22], cGAN [24] and proximal DehazeNet [30]. These can be categorized into those that: 1) estimate the transmission $t$ using a CNN [18, 19, 21, 23, 30]; and 2) directly learn the scene radiance end-to-end [20, 23, 22, 24, 30]. Our proposed method falls into the latter category and is partly inspired by AODNet [20] and FPCNet [21]. In contrast to AODNet, we propose a fully pointwise CNN to regress the reformulated variable $K$ [20], which produces a stronger representation capacity. In contrast to FPCNet, we propose: 1) an end-to-end model to regress the scene radiance directly; 2) a multiscale architecture to handle the scale variance, which achieves much better results than FPCNet while maintaining low model complexity and high computational efficiency; and 3) a new training/testing strategy that negates the need for a preprocessing shuffling step. Compared to MSCNN, in which coarse-scale predictions are used as part of the input for the finer scale, the proposed method adopts a Gaussian pyramid architecture and follows a late fusion strategy. It produces better dehazing results than MSCNN and runs faster. Compared to the recently proposed DCPDN and cGAN, our model is much more compact, i.e., less than 90 KB vs. about 200 MB, while retaining high restoration accuracy and computational efficiency.
II-C Multiscale Pyramid Architecture
The pyramid structure is a basic idea used for both multiresolution image representation and multiscale feature representation in computer vision, for example, the Gaussian pyramid, Laplacian pyramid, wavelets [31], and SIFT [32]. Leveraging this classical idea, CNNs produce a feature pyramid through stacked convolutional layers and spatial pooling layers. Recently, different multiscale image or feature pyramid architectures have been devised for both low- and high-level computer vision applications, including deep Laplacian pyramid networks for image super-resolution [33], DeepExposure using Laplacian pyramid decomposition [34], deep generative image models [35], the Laplacian pyramid reconstructive adversarial network [36], Deeplab using an image pyramid for semantic segmentation [37], and feature pyramid networks for object detection [38]. Our approach also adopts the Gaussian/Laplacian pyramid architectures for multiscale fusion (see Figure 3(a) and Figure 3(b)). In contrast to the above methods, the proposed FAMEDNet is specifically devised for single image dehazing. Moreover, it leverages fully pointwise convolutions instead of convolutions with large kernels to construct a lightweight and computationally efficient network.
II-D Deep Supervision
Adding auxiliary supervision on intermediate layers within a deep neural network, also known as deep supervision, was originally proposed by Xie and Tu in the seminal work [39, 40]. This technique facilitates multiscale and multilevel feature learning by allowing error information to backpropagate along multiple paths and by alleviating the problem of vanishing gradients in deep neural networks. Deep supervision has been widely adopted in subsequent work in different areas, such as Deeplab for semantic segmentation [37], MSCNN for image dehazing [19], and LapSRN for image super-resolution [33]. We also add supervision on the dehazed image at each scale by leveraging the deep supervision idea.
III FAMEDNet for Single Image Dehazing
III-A A Probabilistic View to Solving the Ill-posed Dehazing Problem
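Eq. (1) and Eq. (4) can be rewritten as:
$$I(x) - A = t(x)\,\bigl(J(x) - A\bigr), \tag{5}$$
$$I(x) - 1 = \frac{1}{K(x)}\,\bigl(J(x) - b\bigr), \tag{6}$$
where $K$ and $b$ are the reformulated variable and the constant bias in Eq. (3) and Eq. (4) (see [20]). Applying a logarithmic operation to both sides of the above equations produces the following general form:
$$\tilde{I} = \tilde{J} + \tilde{\phi}, \tag{7}$$
where $\tilde{I}$ is the observed degraded image, $\tilde{J}$ is the ground truth haze-free image, and $\tilde{\phi}$ is the intermediate variable related to the degrading process. $\tilde{J}$ and $\tilde{\phi}$ can be estimated using maximum a posteriori (MAP) estimation, i.e.,
$$\bigl(\tilde{J}^{*}, \tilde{\phi}^{*}\bigr) = \arg\max_{\tilde{J}, \tilde{\phi}}\; p\bigl(\tilde{J}, \tilde{\phi} \mid \tilde{I}\bigr) = \arg\max_{\tilde{J}, \tilde{\phi}}\; p\bigl(\tilde{I} \mid \tilde{J}, \tilde{\phi}\bigr)\, p\bigl(\tilde{\phi} \mid \tilde{J}\bigr)\, p\bigl(\tilde{J}\bigr). \tag{8}$$
$p(\tilde{I} \mid \tilde{J}, \tilde{\phi})$ is the data likelihood, which corresponds to the data fidelity term measuring the reconstruction error. Using the L2 loss to supervise network training implicitly assumes a normally distributed reconstruction error (see Section III-B and the yellow circle in Figure 2). The L1 loss can also be used to enforce a sparse constraint. $p(\tilde{\phi} \mid \tilde{J})$ is the conditional distribution of $\tilde{\phi}$ conditioned on the clear haze-free image. For example, DCP [14] assumes that the dark channel of the clear image concentrates on zero. As with DehazeNet [18] and AODNet [20], the networks can implicitly learn $p(\tilde{\phi} \mid \tilde{J})$ and $p(\tilde{J})$, as we will show in Section IV-C4. $p(\tilde{J})$ is the prior distribution of $\tilde{J}$, usually assumed to be long-tailed due to the spatial continuity in natural images (locally smooth regions and sparse abrupt edges) [41]. Markov Random Fields or simple filters such as the guided filter are used to model the spatial continuity [42].
Based on the above analysis, the key is to construct a model that can effectively learn statistical regularities. As shown in [21], statistical regularities in natural images can be efficiently learned by pointwise convolutions, which are compact and resist overfitting. Partly inspired by [21], we devise a novel end-to-end fully pointwise CNN for single image dehazing.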
III-B The Single-scale FAMEDNet: FAMEDNetSS
As shown in Figure 2, the network is designed to learn the reformulated variable $K$ in Eq. (3) and recover the scene radiance according to Eq. (4) (see [20]). There are five pointwise convolutional layers, of which the first four form the K-encoder and the last forms the decoder. Features corresponding to different receptive fields are reused via dense connections (see black arcs and cubes in Figure 2). Mathematically, this can be formulated as:
$$f_{l} = \varphi_{l}\Bigl(\mathrm{concat}\bigl\{f_{i} : i \in \Omega_{l}\bigr\}\Bigr), \quad l = 1, \dots, 5, \tag{9}$$
where $f_{l}$ represents the learned features from the $l$-th block. We denote the input as the $0$-th block, i.e., the hazy image $I$ of size $H \times W \times 3$ as $f_{0}$, and the decoded features in the fifth block as $K$, i.e., $K = f_{5}$. $\Omega_{l}$ denotes the index set, which indexes the feature maps used by the $l$-th block via dense connections in the proposed network. $\varphi_{l}$ denotes the mapping function in the $l$-th block, learned by a combination of a convolutional layer, a batch normalization layer, a ReLU layer and a pooling layer.
We leverage pooling layers of different kernel sizes ($k \times k$) after each convolutional layer to aggregate multilevel statistics (features) within the receptive fields, e.g., $3 \times 3$, $5 \times 5$, and $7 \times 7$ (see Table I). It is noteworthy that by using a combination of a pointwise convolutional layer and a $k \times k$ pooling layer, the output node has a receptive field of $k \times k$, which is equivalent to the one obtained using a $k \times k$ convolutional layer alone. In this way, we retain the representation capacity of the neural network for statistical modeling while using fewer parameters, leading to a more compact architecture. Further, no pooling layer or batch normalization layer is used in the final block. Since pooling with a $1 \times 1$ kernel is trivial, it is omitted. Strides in both the convolutional and pooling layers are set to 1 to retain the feature map size. The number of output feature channels in the K-encoder is kept at 32 (see blue cubes in Figure 2). Then, the decoded $K$ map is used to recover the scene radiance according to Eq. (4) (see the yellow circle in Figure 2). This structure is denoted FAMEDNetSS, where “SS” stands for single scale.
We use the L2 loss to supervise the network during training:
$$\mathcal{L}(\Theta) = \bigl\|\hat{J} - J\bigr\|_{2}^{2} + \lambda\,\|\Theta\|_{2}^{2}, \tag{10}$$
where $\hat{J}$ is the estimated scene radiance, $J$ is the ground truth, $\Theta$ represents the learnable parameters of the network, and $\lambda$ is the weight decay factor in the regularization term.
III-C The Multiscale Variants of FAMEDNet: FAMEDNetGP and FAMEDNetLP
Objects at distinct distances are of different sizes in the captured images, leading to variably sized homogeneous regions in the transmission map or $K$ map. To handle these multiscale characteristics, we extend the proposed network to multiple scales by adopting a Gaussian pyramid architecture, as shown in Figure 3(a). We downsample the input hazy image to another two scales, i.e., the 1/2 scale and the 1/4 scale. Then, we construct a K-encoder for each scale without sharing weights. Further, the estimated $K$ maps from the coarse scales are interpolated to the original scale and concatenated as:
$$K_{c} = \mathrm{concat}\bigl(K_{1},\, K_{2}^{\uparrow},\, K_{3}^{\uparrow}\bigr), \tag{11}$$
where $K_{1}$ is the estimate at the original scale, and $K_{2}^{\uparrow}$ and $K_{3}^{\uparrow}$ denote the interpolated maps from the 1/2 and 1/4 scales. Bilinear interpolation is used for both downsampling and upsampling. Then, we introduce a fusion module to fuse the multiscale estimates into a more reliable one, which is again implemented by a convolutional layer and a ReLU layer as:
$$K_{f} = \mathrm{ReLU}\bigl(W_{f} * K_{c} + b_{f}\bigr), \tag{12}$$
where $W_{f}$ and $b_{f}$ are the weights and bias of the fusion convolution. Finally, $K_{f}$ is used to recover the scene radiance according to Eq. (4). This structure is denoted FAMEDNetGP, where “GP” stands for Gaussian pyramid.
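The L2 loss is used to supervise the network:
$$\mathcal{L} = \sum_{s=1}^{3} w_{s}\,\bigl\|\hat{J}_{s} - J_{s}\bigr\|_{2}^{2} + w_{f}\,\bigl\|\hat{J}_{f} - J\bigr\|_{2}^{2}, \tag{13}$$
where $J_{s}$ and $\hat{J}_{s}$ represent the ground truth and the estimated scene radiance at each scale $s$, and $\hat{J}_{f}$ represents the estimated scene radiance from the fusion module. $w_{s}$ and $w_{f}$ are loss weights, which are set to 1.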
In addition to the Gaussian pyramid architecture, we also adopt a Laplacian pyramid architecture for comparison. As shown in Figure 3(b), the estimated $K$ map at the coarse scale is interpolated and added to the K-encoder output at the finer scale. Mathematically, it can be formulated as:
$$K_{s} = R_{s} + K_{s+1}^{\uparrow}, \tag{14}$$
where $R_{s}$ is the K-encoder output at scale $s$ and $K_{s+1}^{\uparrow}$ is the interpolated estimate from the next coarser scale. Therefore, it enforces the K-encoder at the finer scale to learn a residual $R_{s}$. The other parts are kept the same as in the Gaussian pyramid architecture. This structure is denoted FAMEDNetLP, where “LP” stands for Laplacian pyramid. It is noteworthy that the receptive field of FAMEDNetSS is $13 \times 13$, which is similar to the local window sizes used in prior-based dehazing methods, e.g., those in DCP [26] and MRP [43]. As for FAMEDNetGP and FAMEDNetLP, their receptive fields become larger, which enables the network to learn more effective statistical regularities.
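The receptive field discussed above can be checked with a small sketch using the standard recurrence for a chain of layers. It assumes that the longest path through FAMEDNetSS passes through all three pooling layers listed in Table I; the factor-of-four scaling for the coarsest branch is only a rough approximation that ignores the footprint of bilinear interpolation.

```python
def receptive_field(kernels, strides=None):
    """One-sided receptive field of a sequential chain of layers.
    Each layer adds (k - 1) * jump, where jump is the product of earlier strides."""
    strides = strides or [1] * len(kernels)
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

# Conv1, Conv2, Pool2, Conv3, Pool3, Conv4, Pool4, Conv5 (kernel sizes from Table I)
ss_kernels = [1, 1, 3, 1, 5, 1, 7, 1]
rf_single_scale = receptive_field(ss_kernels)

# Rough footprint of the 1/4-scale branch in original-resolution pixels (approximation).
rf_quarter_scale_approx = rf_single_scale * 4
```

Because the pointwise convolutions contribute nothing to the receptive field, only the stride-1 pooling layers enlarge it, which is how the network gains spatial context without large convolutional kernels.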
III-D Model Complexity Analysis
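Table I: Details of the FAMEDNet architecture.
Network           Type          Input Size     Num  Filter  Pad  Stride
FAMEDNetSS        Conv1         128x128x3      32   1x1     0    1
FAMEDNetSS        Conv2         128x128x32     32   1x1     0    1
FAMEDNetSS        Pool2         128x128x32     -    3x3     1    1
FAMEDNetSS        Concat2       128x128x64     -    -       -    -
FAMEDNetSS        Conv3         128x128x64     32   1x1     0    1
FAMEDNetSS        Pool3         128x128x32     -    5x5     2    1
FAMEDNetSS        Concat3       128x128x64     -    -       -    -
FAMEDNetSS        Conv4         128x128x64     32   1x1     0    1
FAMEDNetSS        Pool4         128x128x32     -    7x7     3    1
FAMEDNetSS        Concat4       128x128x128    -    -       -    -
FAMEDNetSS        Conv5         128x128x128    3    1x1     0    1
FAMEDNetSS        Params        5,987
FAMEDNetSS        Complexity^1  9.39x10^7
FAMEDNet (GP/LP)  Params        17,991
FAMEDNet (GP/LP)  Complexity    1.24x10^8
Note 1: Complexity is evaluated with FLOPs, i.e., the number of floating-point multiplication-adds.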
The details of FAMEDNet are shown in Table I. It can be seen that FAMEDNet is very lightweight and compact thanks to the fully pointwise convolutions. For example, FAMEDNetSS only contains 5,987 learnable parameters and requires $9.39 \times 10^{7}$ FLOPs for a $128 \times 128$ input. The number of learnable parameters increases threefold in FAMEDNetGP, while the FLOPs only increase by about 30%. FAMEDNet can process hazy images of arbitrary size due to its fully convolutional structure, with the computational cost increasing linearly with the image size.
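The parameter counts in Table I can be reproduced with a short sketch. It assumes that every convolution has a bias, that each batch normalization layer contributes one scale and one shift per channel, that multiply-adds are counted once per convolution weight and bias at every output pixel, and that the fusion module is a $1 \times 1$ convolution over the nine concatenated channels. These assumptions are ours and were chosen because they match the reported totals.

```python
def conv_params(c_in, c_out, k=1, bias=True):
    """Weights (and optional biases) of a k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def bn_params(channels):
    """Assumed: one learnable scale and one shift per channel."""
    return 2 * channels

# (input channels, output channels, followed by batch norm) for Conv1..Conv5 in Table I
SS_LAYERS = [(3, 32, True), (32, 32, True), (64, 32, True),
             (64, 32, True), (128, 3, False)]

def ss_params():
    return sum(conv_params(i, o) + (bn_params(o) if bn else 0)
               for i, o, bn in SS_LAYERS)

def ss_madds(height, width):
    """Assumed: one multiply-add per conv weight and bias at every pixel."""
    return sum(conv_params(i, o) for i, o, _ in SS_LAYERS) * height * width

def gp_params():
    """Three unshared encoders plus an assumed 1x1 fusion conv (9 -> 3 channels)."""
    return 3 * ss_params() + conv_params(9, 3)

def gp_madds(height, width):
    scales = [(height, width), (height // 2, width // 2), (height // 4, width // 4)]
    return sum(ss_madds(h, w) for h, w in scales) + conv_params(9, 3) * height * width

flops_ratio = gp_madds(128, 128) / ss_madds(128, 128)
```

Under these assumptions the half- and quarter-resolution branches add only a quarter and a sixteenth of the full-resolution cost, which is why the multiscale variant triples the parameters but adds only about 30% more multiply-adds.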
To reduce the required FLOPs for large images, we propose a fixed-size testing strategy. First, we resize the hazy image so that its longest side is 360 pixels and input it into the network. Then, we resize the estimated $K$ map from the fusion module back to the original size using bilinear interpolation. Further, we use the fast guided filter [44] to refine the interpolated map. The fast guided filter is about $s^{2}$ times faster than the original guided filter [42], with almost no visible degradation, where $s$ is the downsampling ratio (refer to [44] for details). Finally, the scene radiance is recovered according to Eq. (4). In this way, we can process hazy images of arbitrary size at an almost fixed computational cost. We present our comparisons with state-of-the-art models in Table II, including parameters, model size, and runtime. These comparisons clearly show that FAMEDNet is lightweight and computationally efficient. More details can be found in Section IV-C5.
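The resizing step of the fixed-size testing strategy can be sketched as follows. The helper names and the rounding rule are illustrative assumptions; only the target of 360 pixels for the longest side comes from the text.

```python
def fixed_size_shape(height, width, longest_side=360):
    """Shape after resizing so that the longest side equals `longest_side`,
    keeping the aspect ratio (rounded to whole pixels)."""
    scale = longest_side / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))

def pixel_ratio(height, width, longest_side=360):
    """Fraction of the original pixel count that the network processes.
    For a fully convolutional network the cost is proportional to this."""
    new_h, new_w = fixed_size_shape(height, width, longest_side)
    return (new_h * new_w) / (height * width)

shape = fixed_size_shape(720, 1080)
ratio = pixel_ratio(720, 1080)
```

Since the network cost grows linearly with the number of pixels, an input of 720x1080 is processed at roughly one ninth of the full-resolution cost, and larger inputs converge to the same nearly fixed budget.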
Table II: Comparison of parameters, model size, and runtime with state-of-the-art models (C: CPU; G: GPU).
Model            Param.    Size      Platform        Time (s)
DCP [26]         -         -         Matlab (C)      1.62
FVR [45]         -         -         Matlab (C)      6.79
BCCR [46]        -         -         Matlab (C)      2.85
GRM [47]         -         -         Matlab (C)      83.96
CAP [16]         -         -         Matlab (C)      0.95
NLD [17]         -         -         Matlab (C)      9.89
DehazeNet [18]   8,240     -         Matlab (C)      1.3399
MSCNN [19]       8,014     -         Matlab (C)      2.4840
FPCNet [21]      288       2.2 KB    MatCaffe (C/G)  0.1924/0.0016, 0.2046/0.0178
AODNet [20]      1,833     8.9 KB    MatCaffe (C/G)  0.3025/0.0043
GFN [22]         514,415   1.99 MB   MatCaffe (C/G)  9.9763/0.0490
cGAN [24]        1.23x     198.8 MB  Torch7 (G)      0.0520
DCPDN^2 [23]     6.69x     255.6 MB  Pytorch (G)     0.0417
FAMEDNet         17,991    86.3 KB   MatCaffe (C/G)  0.8894/0.0116, 0.9061/0.0285
Note 2: The number was calculated on 512x512 images since DCPDN requires a fixed-size input.
IV Experiments
To evaluate the performance of FAMEDNet, we compared it with state-of-the-art image prior-based methods including DCP [26], FVR [45], BCCR [46], GRM [47], CAP [16], and NLD [17], and deep CNN-based methods including DehazeNet [18], MSCNN [19], AODNet [20, 2], FPCNet [21], GFN [22], and DCPDN [23]. We adopted the recently proposed RESIDE [3] as the benchmark dataset due to its large scale and its diverse data sources and image contents. RESIDE contains 110,500 synthetic hazy indoor images (ITS) and 313,950 synthetic hazy outdoor images (OTS) in the training set. We report the PSNR and SSIM for each method on the SOTS test set, which includes both indoor and outdoor scenes (500 of each). We also compared the subjective visual effects on real-world hazy images used in the literature. Ablation studies were conducted on TestSetS, which contains 400 hazy indoor/outdoor images and was initially used in a challenge [48].
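For reference, the PSNR metric used throughout the experiments can be sketched as below. This is the standard definition for intensities in $[0, \text{peak}]$, written for flat lists of pixel values; the function names and the choice of `peak=1.0` are illustrative assumptions rather than the evaluation code used in the paper.

```python
import math

def mean_squared_error(restored, reference):
    """Mean squared error between two equally sized flat lists of intensities."""
    if len(restored) != len(reference):
        raise ValueError("inputs must have the same number of values")
    return sum((r - g) ** 2 for r, g in zip(restored, reference)) / len(restored)

def psnr(restored, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mean_squared_error(restored, reference)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)

score = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```

A uniform error of 0.1 on a unit intensity range gives 20 dB, so the gains of a few dB reported in the tables correspond to substantial reductions in mean squared error.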
FAMEDNet was trained for a total of 400,000 iterations on the combination of ITS and OTS in RESIDE. Patches of size 128x128 randomly cropped from the training images were used for training. Figure 4 shows the corresponding statistics of depth levels within the training patches. We quantized the depth maps into 10 uniform levels according to the maximum and minimum depth values. Then, we counted the number of unique depth levels within each patch and calculated the histogram and its corresponding cumulative distribution, as shown in Figure 4. As can be seen, almost 65% of the patches cover at least 3 depth levels and more than 40% of the patches cover at least 4 depth levels. It is noteworthy that since the sizes of training images from different scenes are around , each patch could cover diverse scene structures, as evidenced by the statistics. Consequently, there are different levels of haze in each patch, e.g., light and dense haze. This helps FAMEDNet learn effective feature representations within its receptive field while avoiding overfitting to plain structures.
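The depth-level statistics described above can be sketched as follows: quantize a depth map into 10 uniform levels between its minimum and maximum, count the unique levels inside each patch, and report the fraction of patches that cover at least a given number of levels. The function names and the toy depth map are hypothetical, and the handling of the maximum value (clamped into the top level) is an assumption.

```python
def quantize_depth(depth, levels=10):
    """Map each depth value to one of `levels` uniform bins between min and max."""
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant depth
    return [[min(levels - 1, int((d - lo) / span * levels)) for d in row]
            for row in depth]

def levels_in_patch(quantized, top, left, size):
    """Number of unique depth levels inside a size x size patch."""
    return len({quantized[y][x]
                for y in range(top, top + size)
                for x in range(left, left + size)})

def fraction_at_least(counts, max_levels=10):
    """Fraction of patches covering at least k levels, for k = 1..max_levels."""
    n = len(counts)
    return [sum(c >= k for c in counts) / n for k in range(1, max_levels + 1)]

toy_depth = [[0.0, 0.0],
             [5.0, 10.0]]
q = quantize_depth(toy_depth)
n_levels = levels_in_patch(q, top=0, left=0, size=2)
coverage = fraction_at_least([1, 3, 3, 4], max_levels=4)
```

Patches that span several depth levels contain several haze densities at once, which is the property the statistics in Figure 4 are meant to quantify.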
Hyperparameters were tuned on the validation set. The batch size was set to 48. The initial learning rate was set to 0.00001 and was decreased by a factor of 10 after 200,000 and 320,000 iterations. The momentum and weight decay were set to 0.9 and 0.0001, respectively. Average pooling was used unless otherwise specified. During testing, the kernel radius of the fast guided filter was set to 48, the regularization parameter epsilon was set to 0.0001, and the downsampling factor was set to 4. FAMEDNet was implemented in Caffe [49] and run on a workstation with a 3.5 GHz CPU, 32 GB RAM, and Nvidia Titan XP GPUs.
IV-A Ablation Experiments
Table III: Ablation results for the basic FAMEDNet architecture on TestSetS (PSNR in dB).
Model             BN  Feat. Dim  Scales  PSNR-Indoor  PSNR-Outdoor  PSNR-Average  SSIM-Indoor  SSIM-Outdoor  SSIM-Average
FAMEDNetNoBN      x   4          1       18.78        24.68         21.73         0.7117       0.8949        0.8033
FAMEDNetFD4       ✓   4          1       20.14        25.82         22.98         0.7488       0.9296        0.8392
FAMEDNetFD8       ✓   8          1       20.45        25.88         23.17         0.7785       0.9195        0.8490
FAMEDNetFD16      ✓   16         1       20.39        25.69         23.04         0.7905       0.9306        0.8605
FAMEDNetSS        ✓   32         1       20.71        25.71         23.21         0.7958       0.9307        0.8633
FAMEDNetGP2       ✓   32         2       20.66        26.19         23.42         0.7940       0.9312        0.8626
FAMEDNetGP        ✓   32         3       20.85        26.22         23.54         0.8051       0.9268        0.8660
FAMEDNetGPFD64    ✓   64         3       20.59        26.61         23.60         0.7929       0.9362        0.8646
Table IV: Ablation results for training data volume and training iterations on TestSetS (PSNR in dB).
Model         Training Data  Iterations  PSNR-Indoor  PSNR-Outdoor  PSNR-Average  SSIM-Indoor  SSIM-Outdoor  SSIM-Average
FAMEDNetGP    40,000         100,000     20.85        26.22         23.54         0.8051       0.9268        0.8660
FAMEDNetGP    40,000         400,000     20.89        26.67         23.78         0.8082       0.9357        0.8719
FAMEDNetGP    ALL (424,450)  400,000     23.42        27.94         25.68         0.8687       0.9483        0.9085
Table V: Results for variants of the multiscale architecture on TestSetS (PSNR in dB).
Model             3x3 Conv.  Training Data  Iterations  PSNR-Indoor  PSNR-Outdoor  PSNR-Average  SSIM-Indoor  SSIM-Outdoor  SSIM-Average
FAMEDNetGP3x3     ✓ (4)      40,000         100,000     20.62        26.83         23.73         0.7851       0.9427        0.8639
FAMEDNetGP3x3     ✓ (8)      40,000         100,000     21.07        26.53         23.80         0.8189       0.9445        0.8817
FAMEDNetGP3x3     ✓ (8)      ALL (424,450)  400,000     24.02        27.86         25.94         0.8840       0.9520        0.9180
FAMEDNetLP        x          ALL (424,450)  400,000     23.35        27.85         25.60         0.8724       0.9492        0.9108
FAMEDNetGPMaxP    x          ALL (424,450)  400,000     24.34        28.67         26.51         0.8797       0.9555        0.9176
IV-A1 Ablations on the Basic Architecture
First, we conducted ablations on the components of the basic FAMEDNet architecture. We sampled a total of 40,000 images evenly from ITS and OTS to form a training set for the ablations. The models were trained for a total of 100,000 iterations, and the learning rate was multiplied by 0.1 after 50,000 and 80,000 iterations. All other parameters were as described above. The results on TestSetS are listed in Table III.
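The step-wise learning rate schedules described for the full training run and for the ablations can be sketched with one small function. The function name and the convention that the drop takes effect at the milestone iteration itself are assumptions; the base rate, milestones, and factor come from the text.

```python
def step_lr(iteration, base_lr=1e-5, milestones=(200_000, 320_000), gamma=0.1):
    """Learning rate after multiplying by `gamma` at each milestone already reached."""
    drops = sum(iteration >= m for m in milestones)
    return base_lr * gamma ** drops

# Full training: 400,000 iterations with drops at 200,000 and 320,000.
full_schedule = [step_lr(i) for i in (0, 200_000, 320_000)]

# Ablations: 100,000 iterations with drops at 50,000 and 80,000.
ablation_schedule = [step_lr(i, milestones=(50_000, 80_000))
                     for i in (0, 50_000, 80_000)]
```

Both schedules end at one hundredth of the initial rate, so the ablation runs follow the same decay profile as the full run on a shorter time scale.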
The dehazing results of FAMEDNetFD4 with batch normalization were much better than those of FAMEDNetNoBN. FAMEDNetFD4 was also found to converge faster than FAMEDNetNoBN. We also show the impact of the number of convolutional feature channels on the dehazing results. With more channels, the model tended to have a stronger representational capacity and achieved higher PSNR and SSIM scores. For example, FAMEDNetSS achieved a gain of 0.23 dB and 0.024 SSIM score over FAMEDNetFD4 and a gain of about 1.5 dB and 0.06 SSIM score over FAMEDNetNoBN. With respect to the multiscale architecture, adding one downscaled branch improved the PSNR score by 0.2 dB, while the SSIM score decreased only marginally. With all three scales, FAMEDNetGP was the best architecture. Finally, we increased the number of feature channels in FAMEDNetGP, but this only marginally improved the PSNR score and decreased the SSIM score. As a trade-off between accuracy and complexity, we chose FAMEDNetGP as the representative architecture.
IV-A2 Ablations on Training Data Volume and Training Iterations
We next investigated the impact of training data volume and training iterations. Specifically, we trained FAMEDNetGP for 400,000 iterations and with all the images in ITS and OTS, i.e., a total of 424,450 images. The results are listed in Table IV. It can be seen that FAMEDNetGP improved with more training iterations. Moreover, the PSNR and SSIM significantly improved when FAMEDNetGP was trained with all the images, producing a gain of 2.14 dB and 0.0425 SSIM score. Therefore, more training data benefits the deep neural network by exploiting its powerful representation capacity.
IV-B Variants of the Multiscale Architecture
IV-B1 Additional 3x3 Convolutions for Learning Structural Features
Due to its fully pointwise convolutional structure, FAMEDNetGP has limited ability to learn structural features. To see whether additional structural features benefit dehazing, we inserted additional 3x3 convolutional layers at the beginning of each scale in FAMEDNetGP (denoted FAMEDNetGP3x3). We tested different feature channel configurations, namely 4 and 8 channels. The results are shown in the first three rows in Table V.
Compared with FAMEDNetGP (see the first and last rows in Table IV), FAMEDNetGP3x3 performed better with the same training settings. With more 3x3 convolutional channels, FAMEDNetGP3x3 trained with all training images was the best architecture, i.e., 25.94 dB and 0.9180 SSIM score. Compared with its counterpart without 3x3 convolutional layers, gains of 0.26 dB and 0.01 SSIM score were achieved. However, this came at the cost of an additional 6.69% parameters (i.e., 1,152) and 6.66% FLOPs (i.e., $8.26 \times 10^{6}$).
IV-B2 Laplacian Pyramid Architectures
In Section III-C, we also presented a Laplacian pyramid architecture, FAMEDNetLP (see Figure 3(b)). Compared with the Gaussian pyramid architecture FAMEDNetGP (see the last row in Table IV), FAMEDNetLP achieved a marginally lower PSNR and a marginally higher SSIM. Generally, its performance was comparable to that of FAMEDNetGP. Since there was no evident benefit to using residual learning, FAMEDNetGP was used as our default multiscale architecture in the following experiments.
IV-B3 The Effectiveness of Max Pooling
For dehazing, effective local features are usually extracted from extreme pixel values, including the dark channel (the minimum value of all the channels within a local patch) [14], local max contrast and saturation [15], and the features learned using the maxout operation in DehazeNet [18]. Inspired by these studies, we hypothesized that max pooling may be more effective for aggregating local statistics and learning effective features for dehazing. To verify this hypothesis, we changed the average pooling operations in all the pooling layers to max pooling. This structure is denoted FAMEDNetGPMaxP and was trained using the same settings as FAMEDNetGP. The results are shown in the last row in Table V.
Compared with its counterpart using average pooling (last row in Table IV), FAMEDNetGPMaxP achieved a significant gain of 0.83 dB and 0.0091 SSIM score. It also outperformed FAMEDNetGP3x3 by 0.57 dB and achieved almost the same SSIM score. Therefore, we chose FAMEDNetGPMaxP as the representative model of the proposed architectures due to its light weight (a total of 17,991 parameters) and computational efficiency ($1.24 \times 10^{8}$ FLOPs). For simplicity, it is denoted FAMEDNet in the following sections.
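The difference between the two pooling choices can be illustrated with a stride-1 pooling sketch that keeps the feature map size, as the pooling layers in Table I do. Clipping the window at the border instead of zero-padding is an assumption made for simplicity, so border values may differ from a framework implementation.

```python
def pool2d(feature, k, mode="max"):
    """Stride-1 k x k pooling on a 2D nested list, output size equals input size.
    Windows are clipped at the border (assumed padding behavior)."""
    h, w = len(feature), len(feature[0])
    r = (k - 1) // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [feature[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

feature = [[0.0, 0.0, 0.0],
           [0.0, 9.0, 0.0],
           [0.0, 0.0, 0.0]]
max_pooled = pool2d(feature, 3, mode="max")
avg_pooled = pool2d(feature, 3, mode="avg")
```

Max pooling propagates the single extreme response to every position in its neighborhood, whereas average pooling dilutes it, which is consistent with the hypothesis that extreme values carry the haze-relevant statistics.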
IvC Comparison with Stateoftheart Methods
To evaluate the performance of FAMEDNet, we compared it with several stateoftheart methods including DCP [26], FVR [45], BCCR [46], GRM [47], CAP [16], NLD [17], DehazeNet [18], MSCNN [19], AODNet [20, 2], FPCNet [21], GFN [22] and DCPDN [23]
IV-C1 Results on RESIDE SOTS
The PSNR and SSIM scores of the different methods are listed in Table VI. Several observations can be made. 1) CNN-based methods [18, 21, 20, 2, 22] generally outperformed the image prior-based methods [26, 45, 46, 47, 16, 17]. By learning features in a data-driven manner, CNN-based dehazing models had stronger representative capacities than image prior-based models, which are usually limited to specific scenarios. 2) CNN architecture matters. For example, FPCNet achieved a significant gain over its counterpart DehazeNet by using a lightweight, fully pointwise convolutional architecture. It achieved the second-best SSIM score and even outperformed some complicated networks like AODNet, GFN, and DCPDN. Further, by integrating the imaging model into the network architecture, the end-to-end AODNet recovered the target haze-free image with higher accuracy than the non-end-to-end methods [18, 19]. 3) FAMEDNet was the best-performing method and improved both scores significantly: it surpassed the second-best methods by large margins of 3.6 dB in PSNR and 0.05 in SSIM.
Model  PSNR (dB)  SSIM 
DCP [26]  16.62  0.8179 
FVR [45]  15.72  0.7483 
BCCR [46]  16.88  0.7913 
GRM [47]  18.86  0.8553 
CAP [16]  19.05  0.8364 
NLD [17]  17.29  0.7489 
DehazeNet [18]  21.14  0.8472 
MSCNN [19]  17.57  0.8102 
FPCNet [21]  21.84 (20.92/22.75)  0.8872 (0.8729/0.9014) 
AODNet [20]  19.06  0.8504 
AODNet* [2]  23.43 (20.68/26.18)  0.8747 (0.8229/0.9266) 
GFN [22]  22.30  0.8800 
DCPDN [23]  20.81 (19.13/22.49)  0.8378 (0.8191/0.8565) 
FAMEDNet  27.01 (25.00/29.03)  0.9371 (0.9172/0.9570) 
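As a reminder of what the PSNR column above measures, the following is a minimal sketch of the standard PSNR definition. The function name `psnr`, the assumption that images are floating-point arrays scaled to [0, 1], and the toy inputs are illustrative choices and are not taken from the paper's evaluation code.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized images.

    Assumes both images share one intensity scale with maximum `peak`
    (1.0 here, an assumption; 255 would be used for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a uniform error of 0.1 gives an MSE of 0.01.
clear = np.full((4, 4, 3), 0.5)
dehazed = clear + 0.1
score = psnr(clear, dehazed)
```

Because PSNR is logarithmic in the mean squared error, a gap of a few dB between two methods corresponds to a substantial difference in reconstruction error.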
After carefully dissecting the proposed architecture of FAMEDNet and comparing it with state-of-the-art architectures, we can draw the following conclusions. First, pointwise convolution plays a key role in constructing a compact and lightweight dehazing network. Cascaded pointwise convolutional layers are very effective for tackling the ill-posed dehazing problem by aggregating local statistic-based features layer by layer. Second, modeling the dehazing task in an end-to-end manner is beneficial. Third, a carefully designed multiscale architecture can handle scale variance in complex scenes while only minimally increasing the computational cost. Finally, reusing features via dense connections like [20, 23, 50] leads to a better and more compact model.
IV-C2 Subjective Evaluation
Subjective comparisons on synthetic hazy images are presented in Figure 5. The dehazed results of MSCNN [19] on indoor images contain residual haze, as indicated by the red boxes. In addition, MSCNN tended to produce oversaturated results with color distortions, as indicated by the red arrows. Similar phenomena can also be found in the results of AODNet [20]. Although FPCNet [21] achieved better results, some residual haze and color distortions remain. Moreover, MSCNN and FPCNet produced noisy results due to incorrectly estimated transmission in the regions enclosed by the blue boxes. The proposed FAMEDNet successfully restores the clear images with higher color fidelity and less residual haze and noise, which demonstrates the fitting ability FAMEDNet learned from synthetic training images.
Next, we present results on real-world hazy images in Figure 6 to compare the generalization ability of the different methods. Close-up views in the red rectangles are also presented. It can be seen that DCP, MSCNN, and AODNet tended to produce oversaturated results, especially in sky regions. MSCNN also exhibits color artifacts, making the dehazed results unrealistic (see the first two images). Images dehazed by AODNet appear dimmer than the others. DehazeNet achieved better results, but still produced some color artifacts (see the middle part of the first image and the bluish artifact in the second image). FPCNet outperformed DehazeNet but retained some haze.
Using some enhanced results as input and a fusion strategy, GFN generated visually better results. However, the color distortions in the middle part of the first image and the oversaturated second image are visually unpleasant. DCPDN produced better and brighter dehazing results, but some details are missing due to overexposure-like artifacts. Generally, FAMEDNet produced better or at least comparable results to state-of-the-art methods, e.g., clear details with fewer color artifacts and high-fidelity sky regions. We also compared image enhancement for anti-halation using different methods in the last row, where FAMEDNet also produced visually pleasing results. More results can be found in the supplement.
IV-C3 Cross-set Generalization
We also compared the cross-set generalization of FAMEDNet with that of two recently proposed methods, GFN and DCPDN. We used RESIDE SOTS and TestA in [23] as two test sets. We used the pre-trained models of all three methods and did not fine-tune them. The results are listed in Table VII. FAMEDNet shows better generalization than GFN and DCPDN, which we ascribe to the large-scale training set and the effectiveness of the proposed architecture.
IV-C4 Analysis on the Learned Latent Statistical Regularities
Image prior-based methods, including DCP [26], CAP [16], and NLD [17], assume prior statistics on haze-free images, which are used to enforce statistical regularities when recovering the target dehazed results [41]. Learning-based methods also learn latent statistical regularities [18, 20, 21]. For example, DehazeNet and FPCNet, which regress the transmission $t$, should produce a transmission map of all 1s for a haze-free image. In other words, they should learn dark channel-like statistical priors, i.e., $1-t=0$. AODNet and FAMEDNet instead regress a latent variable $K$ implicitly. For a haze-free image, the atmospheric light is usually assumed to be white, i.e., $A=[1,1,1]^T$. The corresponding $K$ can then be deduced from Eq. (3), and it should also be a map of all 1s, i.e., $1-\bar{K}=0$, where $\bar{K}$ is the mean of $K$ across the three channels.
To compare the learned statistical regularities of different methods, we collected 100 haze-free images (two examples are shown in the first column of Figure 7). These images were resized such that the long side was 480 pixels and the short side ranged from 100 to 480 pixels. Then, we calculated the dark channel, $1-t$, and $1-\bar{K}$ within each local patch. Next, we split the range of pixel values into 20 uniform bins and counted the number of pixels falling into each bin over all images. Finally, we plotted the histograms of the dark channel, $1-t$, and $1-\bar{K}$ for DCP, FPCNet, AODNet, and FAMEDNet in Figure 8. FAMEDNet learned a much more effective statistical regularity than DCP, FPCNet, and AODNet. Besides, the statistics of AODNet are far from zero. In other words, the trained network implicitly assumes that there is haze to be removed even in haze-free images. This leads to over-dehazed artifacts, as seen in the third column, which is consistent with the visual results in Figure 6.
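The histogram procedure described above can be sketched for the dark channel case as follows. This is an illustration under stated assumptions rather than the authors' code: the function names, the 15x15 patch size, the edge padding, and the random stand-in images are all hypothetical, and images are assumed to be float RGB arrays in [0, 1]. The same accumulation would apply to any other per-pixel statistic expected to be near zero on haze-free images.

```python
import numpy as np

def dark_channel(image, patch=15):
    """Minimum over RGB, then minimum over a patch x patch neighborhood.

    `patch=15` is an assumed value for illustration only.
    """
    min_rgb = image.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")  # replicate border pixels
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def accumulate_histogram(stat_maps, bins=20):
    """Count pixels per uniform bin over [0, 1], summed over all maps."""
    counts = np.zeros(bins, dtype=np.int64)
    for stat in stat_maps:
        hist, _ = np.histogram(np.clip(stat, 0.0, 1.0),
                               bins=bins, range=(0.0, 1.0))
        counts += hist
    return counts

# Stand-in data: random arrays in place of real haze-free images.
rng = np.random.default_rng(0)
images = [rng.random((32, 48, 3)) for _ in range(2)]
counts = accumulate_histogram(dark_channel(img) for img in images)
```

A statistic that behaves like a good haze-free prior would concentrate its mass in the lowest bins of `counts`.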
IV-C5 Runtime Analysis
Following [3], we further compared the runtime of different methods on the indoor images in RESIDE SOTS. The results are listed in Table II in Section III-D. Results of the classical methods above the line and cGAN are from [3, 24]. The others were obtained on our workstation using the code released by the authors. For FPCNet and FAMEDNet, we report the runtime of both the network forward computation and the whole algorithm including fast guided filter refinement, as shown in separate rows in Table II. The numbers before/after the slash denote the runtime in CPU/GPU mode, i.e., C/G. FAMEDNet runs very fast, reaching 85 fps and 35 fps without/with fast guided filter refinement. In addition, we also list the number of parameters and model size of each CNN model. Compared with the recently proposed GFN, cGAN, and DCPDN, FAMEDNet is much more compact and lightweight.
IV-D Limitations and Discussions
As stated in Section II-B and demonstrated in Section IV-C4, the proposed FAMEDNet implicitly learns a local statistical regularity for dehazing, like many prior-based and learning-based methods [26, 16, 18, 19, 21, 20]. Although FAMEDNet outperforms these methods by leveraging a more efficient architecture, it still has some limitations. Some examples of transmission maps estimated by FAMEDNet are shown in the bottom row of Figure 9. As indicated by the blue polygons, the transmission in the sky regions is incorrect, leading to under-dehazed artifacts, as shown in Figure 6. This may be solved by incorporating high-level semantics into the dehazing network. However, doing so runs into the “chicken and egg” dilemma between low-level enhancement and high-level understanding of degraded images. We suppose that it could be resolved by jointly modeling the two correlated problems in a unified framework, which we leave as future work.
Besides, as evidenced by the low-light enhancement experiments in the supplement and the color constancy results in [21], pointwise convolutions could be used for statistical modeling of illumination, color cast, etc. Referring to the haze imaging model in [43], we will also explore FAMEDNet’s potential for haze removal in the presence of non-uniform atmospheric light, e.g., artificial ambient light in nighttime haze environments. Extending FAMEDNet to remove heterogeneous haze by investigating region-based techniques, e.g., haze density-aware segmentation, is also promising.
V Conclusions
In this paper, we introduce a novel fast and accurate multiscale end-to-end dehazing network called FAMEDNet to tackle the challenging single image dehazing problem. FAMEDNet comprises three encoders at different scales and a fusion module, which is able to efficiently learn the haze-free image directly. Each encoder consists of cascaded pointwise convolutional layers and pooling layers via a densely connected mechanism. By leveraging a fully pointwise structure, FAMEDNet is lightweight and computationally efficient. Extensive experiments on public benchmark datasets and real-world hazy images demonstrate the superiority of FAMEDNet over other top-performing models: it is a fast, lightweight, and accurate deep architecture for single image dehazing.
References
 [1] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “Endtoend united video dehazing and detection,” arXiv preprint arXiv:1709.03919, 2017.
 [2] Y. Liu, G. Zhao, B. Gong, Y. Li, R. Raj, N. Goel, S. Kesav, S. Gottimukkala, Z. Wang, W. Ren et al., “Improved techniques for learning to dehaze and beyond: A collective study,” arXiv preprint arXiv:1807.00202, 2018.
 [3] B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, “Benchmarking single image dehazing and beyond,” IEEE Transactions on Image Processing, 2018.
 [4] Z. Tu, X. Chen, A. L. Yuille, and S.C. Zhu, “Image parsing: Unifying segmentation, detection, and recognition,” International Journal of computer vision, vol. 63, no. 2, pp. 113–140, 2005.
 [5] J.P. Tarel, N. Hautiere, A. Cord, D. Gruyer, and H. Halmaoui, “Improved visibility of road scene images under heterogeneous fog,” in Intelligent Vehicles Symposium (IV), 2010 IEEE. Citeseer, 2010, pp. 478–485.

 [6] C. Sakaridis, D. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” International Journal of Computer Vision, pp. 1–20, 2018.
 [7] K. Tan and J. P. Oakley, “Enhancement of color images in poor visibility conditions,” in ICIP, vol. 2, 2000.
 [8] Y. Y. Schechner, S. G. Narasimhan, and S. K. Nayar, “Instant dehazing of images using polarization,” in Proc. Computer Vision & Pattern Recognition, vol. 1, 2001, pp. 325–332.
 [9] S. K. Nayar and S. G. Narasimhan, “Vision in bad weather,” in The IEEE International Conference on Computer Vision, vol. 2. IEEE, 1999, pp. 820–827.
 [10] Q. Liu, X. Gao, L. He, and W. Lu, “Single image dehazing with depthaware nonlocal total variation regularization,” IEEE Transactions on Image Processing, vol. 27, no. 10, pp. 5178–5191, 2018.
 [11] A. Wang, W. Wang, J. Liu, and N. Gu, “Aipnet: Imagetoimage single image dehazing with atmospheric illumination prior,” IEEE Transactions on Image Processing, 2018.
 [12] Z. Li and J. Zheng, “Single image dehazing using globally guided image filtering,” IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 442–450, 2018.
 [13] R. Fattal, “Single image dehazing,” ACM transactions on graphics (TOG), vol. 27, no. 3, p. 72, 2008.
 [14] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, no. 12, pp. 2341–2353, 2011.
 [15] K. Tang, J. Yang, and J. Wang, “Investigating hazerelevant features in a learning framework for image dehazing,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2995–3000.
 [16] Q. Zhu, J. Mai, and L. Shao, “A fast single image haze removal algorithm using color attenuation prior,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3522–3533, 2015.
 [17] D. Berman, S. Avidan et al., “Nonlocal image dehazing,” in The IEEE conference on computer vision and pattern recognition, 2016, pp. 1674–1682.
 [18] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: An endtoend system for single image haze removal,” IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5187–5198, 2016.
 [19] W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.H. Yang, “Single image dehazing via multiscale convolutional neural networks,” in European Conference on Computer Vision. Springer, 2016, pp. 154–169.
 [20] B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, “Aodnet: Allinone dehazing network,” in The IEEE International Conference on Computer Vision, vol. 1, no. 4, 2017, p. 7.
 [21] J. Zhang, Y. Cao, Y. Wang, C. Wen, and C. W. Chen, “Fully pointwise convolutional neural network for modeling statistical regularities in natural images,” in ACM Multimedia Conference, 2018.
 [22] W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, and M.H. Yang, “Gated fusion network for single image dehazing,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [23] H. Zhang and V. M. Patel, “Densely connected pyramid dehazing network,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
 [24] R. Li, J. Pan, Z. Li, and J. Tang, “Single image dehazing via conditional generative adversarial network,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [25] T. M. Bui and W. Kim, “Single image dehazing using color ellipsoid prior,” IEEE Transactions on Image Processing, vol. 27, no. 2, pp. 999–1009, 2018.
 [26] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” in The IEEE conference on computer vision and pattern recognition, 2009.
 [27] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
 [28] E. J. McCartney, “Optics of the atmosphere: scattering by molecules and particles,” New York, John Wiley and Sons, Inc., 1976. 421 p., 1976.
 [29] S. G. Narasimhan and S. K. Nayar, “Vision and the atmosphere,” International Journal of Computer Vision, vol. 48, no. 3, pp. 233–254, 2002.
 [30] D. Yang and J. Sun, “Proximal dehazenet: A prior learningbased deep network for single image dehazing,” in The European Conference on Computer Vision (ECCV), September 2018.
 [31] I. Daubechies, “The wavelet transform, timefrequency localization and signal analysis,” IEEE transactions on information theory, vol. 36, no. 5, pp. 961–1005, 1990.
 [32] D. G. Lowe, “Distinctive image features from scaleinvariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
 [33] W.S. Lai, J.B. Huang, N. Ahuja, and M.H. Yang, “Deep laplacian pyramid networks for fast and accurate superresolution,” in The IEEE conference on computer vision and pattern recognition, 2017, pp. 624–632.
 [34] R. Yu, W. Liu, Y. Zhang, Z. Qu, D. Zhao, and B. Zhang, “Deepexposure: Learning to expose photos with asynchronously reinforced adversarial learning,” in Advances in Neural Information Processing Systems, 2018, pp. 2149–2159.
 [35] E. L. Denton, S. Chintala, R. Fergus et al., “Deep generative image models using a laplacian pyramid of adversarial networks,” in Advances in neural information processing systems, 2015, pp. 1486–1494.
 [36] K. Xu, Z. Zhang, and F. Ren, “Lapran: A scalable laplacian pyramid reconstructive adversarial network for flexible compressive sensing reconstruction,” in The European Conference on Computer Vision (ECCV), 2018, pp. 485–500.
 [37] L.C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2018.
 [38] T.Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
 [39] S. Xie and Z. Tu, “Holisticallynested edge detection,” in The IEEE international conference on computer vision, 2015, pp. 1395–1403.
 [40] C.Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeplysupervised nets,” in Artificial Intelligence and Statistics, 2015, pp. 562–570.
 [41] A. Hyvärinen, J. Hurri, and P. O. Hoyer, Natural Image Statistics. SpringerVerlag London, 2009.
 [42] K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 6, pp. 1397–1409, 2013.
 [43] J. Zhang, Y. Cao, S. Fang, Y. Kang, and C. W. Chen, “Fast haze removal for nighttime image using maximum reflectance prior,” in The IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7418–7426.
 [44] K. He and J. Sun, “Fast guided filter,” arXiv preprint arXiv:1505.00996, 2015.
 [45] J.P. Tarel and N. Hautiere, “Fast visibility restoration from a single color or gray level image,” in 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009, pp. 2201–2208.
 [46] G. Meng, Y. Wang, J. Duan, S. Xiang, and C. Pan, “Efficient image dehazing with boundary constraint and contextual regularization,” in The IEEE international conference on computer vision, 2013, pp. 617–624.
 [47] C. Chen, M. N. Do, and J. Wang, “Robust image and video dehazing with visual artifact suppression via gradient residual minimization,” in European Conference on Computer Vision, 2016, pp. 576–591.
 [48] W. Ren, Z. Wang, Y. Guo, G. Meng, X. Fan, and J. Guo, “Chinamm18dehazing,” https://rwenqi.github.io/ChinaMM18dehazing/, 2018.
 [49] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in The 22nd ACM international conference on Multimedia. ACM, 2014, pp. 675–678.
 [50] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks.” in CVPR, vol. 1, no. 2, 2017, p. 3.
 [51] E. H. Land, “The retinex theory of color vision,” Scientific american, vol. 237, no. 6, pp. 108–129, 1977.
 [52] X. Guo, Y. Li, and H. Ling, “Lime: Lowlight image enhancement via illumination map estimation,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 982–993, 2017.
 [53] C. Wei, W. Wang, W. Yang, and J. Liu, “Deep retinex decomposition for lowlight enhancement,” in British Machine Vision Conference, 2018.
 [54] M. Elad, “Retinex by two bilateral filters,” in International Conference on ScaleSpace Theories in Computer Vision. Springer, 2005, pp. 217–229.
VI FAMEDNet: A Fast, Lightweight and Accurate Multiscale End-to-end Dehazing Network (Supplementary Material)
VI-A Illumination Balancing Network
VI-A1 Modification of FAMEDNet for Illumination Balancing
Since the scene radiance is usually not as bright as the atmospheric light, the recovered haze-free image looks dim [26], especially in dense haze regions and shaded regions. It is therefore better to balance the illumination, both to make the result visually pleasing and to facilitate subsequent high-level tasks. Consider the following imaging model used in the Retinex literature [51, 52, 53]:
$S = R \circ L$ (15)
where $S$ represents the observed image, the reflectance $R$ represents the intrinsic property of the captured objects, the illumination $L$ represents the varying lightness on objects, and $\circ$ denotes element-wise multiplication. Given an observed $S$, estimating $R$ and $L$ is ill-posed. Various smoothness constraints have been proposed to make it tractable [54, 52, 53]. Instead of estimating the reflectance, which typically looks unrealistic, we follow [54] and retain some amount of illumination so that the result has both the desired brightness and a natural appearance. To this end, we propose an illumination balancing network (IBNet) to estimate a balanced illumination map from an input image. We then replace the original, unevenly distributed illumination (approximated by the illumination channel in HSV color space) with the estimate. Specifically, we construct IBNet from FAMEDNet with minor modifications: 1) changing the 3-channel output of FAMEDNet to a one-channel illumination map; 2) omitting the recovery module depicted by the yellow circle. We used an L2 loss to supervise the estimated illumination map.
To prepare the training/test datasets, we applied a fitted nonlinear mapping to the illumination channel of each clear image in the RESIDE dataset (see Figure 10(a)) and used the result to replace the original channel, forming the illumination-unbalanced image (see Figure 10(b)). The nonlinear mapping was generated for each image individually by fitting a cubic curve to some randomly selected control points in the lower-right half-plane, as shown in Figure 10(c), together with four fixed control points, i.e., (0,0), (0.1,0.1), (0.9,0.9), and (1,1).
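The curve-fitting step above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the number of random control points, their sampling ranges, and the use of a least-squares cubic fit via `numpy.polyfit` are hypothetical choices, since the text does not specify them. The illumination channel is assumed to be a float array in [0, 1].

```python
import numpy as np

# Fixed control points stated in the text; they all lie on y = x.
FIXED_POINTS = np.array([(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0)])

def fit_unbalancing_curve(rng, num_random=3):
    """Fit a cubic tone curve to fixed plus random control points.

    Random points are drawn below the diagonal (y < x), i.e. in the
    lower-right half-plane. Their count and ranges are assumptions.
    """
    xs = rng.uniform(0.2, 0.8, size=num_random)
    ys = xs * rng.uniform(0.3, 0.9, size=num_random)  # guarantees y < x
    points = np.vstack([FIXED_POINTS, np.column_stack([xs, ys])])
    coeffs = np.polyfit(points[:, 0], points[:, 1], deg=3)  # least squares
    return np.poly1d(coeffs)

def unbalance_illumination(value_channel, curve):
    """Apply the curve to an illumination channel and clip to [0, 1]."""
    return np.clip(curve(value_channel), 0.0, 1.0)

rng = np.random.default_rng(0)
curve = fit_unbalancing_curve(rng)
v = np.linspace(0.0, 1.0, 11).reshape(1, 11)  # stand-in illumination channel
v_unbalanced = unbalance_illumination(v, curve)
```

With no random points, the four fixed points alone determine the identity mapping, so any distortion of the illumination comes from the randomly drawn points.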
Metric  Subset  Original  IBNet (FAMEDNet) 
PSNR (dB)  Indoor  15.55  28.44 
PSNR (dB)  Outdoor  16.13  26.10 
PSNR (dB)  Average  15.84  27.27 
SSIM  Indoor  0.7021  0.9316 
SSIM  Outdoor  0.7526  0.8959 
SSIM  Average  0.7273  0.9137 
VI-A2 Experimental Results
We evaluated the proposed IBNet for illumination balancing on RESIDE TestSetS generated according to Section VI-A1. The results are listed in Table VIII. As can be seen, IBNet, an incarnation of FAMEDNet, achieved good restoration accuracy by enhancing the unevenly distributed illumination. Some examples for subjective visual inspection are shown in Figure 11. The enhancement results of IBNet on the dehazed images are more visually pleasing, e.g., the illumination has been balanced and details are revealed. However, the results also exhibit a small amount of color distortion, constrained by the unrealistic synthetic mappings. In future work, we will collect a real-world low-light dataset to train a better model.
VI-B More Subjective Comparisons
More subjective comparisons of FAMEDNet and several state-of-the-art methods, including DCP [26], DehazeNet [18], MSCNN [19], AODNet [20], FPCNet [21], GFN [22], and DCPDN [23], on real-world hazy images are shown in Figure 12 and Figure 13. As can be seen, FAMEDNet produced better or at least comparable results to state-of-the-art methods, with clear details, fewer color artifacts, and high fidelity in sky regions.
More subjective comparisons of FAMEDNet with DCP [26], AODNet [20], and FPCNet [21] on haze-free images are shown in Figure 14. These results demonstrate that FAMEDNet learned a much more effective statistical regularity than DCP, FPCNet, and AODNet. Please refer to Section IV-C4 in the paper for more details.