Reconstruction Loss Minimized FCN for Single Image Dehazing

11/27/2018 · by Shirsendu Sukanta Halder, et al. · IIT Roorkee

Haze and fog reduce the visibility of outdoor scenes, as a veil-like semi-transparent layer appears over the objects. As a result, images captured under such conditions lack contrast. Image dehazing methods try to alleviate this problem by recovering a clear version of the image. In this paper, we propose a Fully Convolutional Neural Network based model to recover the clear scene radiance by estimating the environmental illumination and the scene transmittance jointly from a hazy image. The method uses a relaxed haze imaging model to allow for situations with non-uniform illumination. We train the network by minimizing a custom-defined loss that measures the error of reconstructing the hazy image in three different ways. Additionally, we use a multilevel approach to determine the scene transmittance and the environmental illumination in order to reduce the dependence of the estimates on image scale. Evaluations show that our model performs well compared to existing state-of-the-art methods and verify its potential in diverse situations and various lighting conditions.


1 Introduction

Images captured in outdoor scenarios are frequently affected by natural phenomena like haze or fog. The consequences include degradation of scene visibility and a colour shift in the image. These effects occur due to the presence of minuscule particles in the atmosphere, which hamper the passage of light by absorption and reflection Koschmieder (1924). In addition, these particles create a semi-transparent layer of light that affects the visibility. This layer is referred to as airlight, and it directly depends on the transmittance of the medium. The technique of reconstructing clear haze-free images by mitigating the deteriorating effects of haze is known as image dehazing (Fig. 1).

Figure 1: Removal of haze by estimating the scene transmittance and airlight map using our proposed method

The problem of image dehazing is ill-posed because the degradation due to haze depends on the scene depth, which is non-uniform and unknown at different positions of the image. The early works on image dehazing took the route of image contrast enhancement Stark (2000); Oakley and Bu (2007). Subsequently, diverse methods have been proposed in the literature that model statistical and physical cues to estimate the scene transmittance and the environmental illumination. Single-image dehazing methods Tan (2008); He et al. (2011); Fattal (2014); Berman et al. (2016) are receiving more attention than multiple-image-based ones Narasimhan and Nayar (2002, 2003) due to their practical significance. With the recent success of Convolutional Neural Networks (CNN) in Computer Vision Krizhevsky et al. (2012); Simonyan and Zisserman (2014); Dong et al. (2014), the problem of image dehazing has also been explored in the light of CNNs Cai et al. (2016); Li et al. (2017); Ren et al. (2016). The main advantage of using CNNs is their ability to learn features from a diverse and large set of data without any human intervention.

Existing dehazing methods mainly focus on the estimation of the scene transmittance and do not emphasize the correct estimation of the environmental illumination. So, in this work, we try to estimate both the scene transmittance and the environmental illumination from image patches. The contributions of our work can be summarized as follows:

  • Design of a two-way forked Fully Convolutional Network that simultaneously estimates scene transmittance and environmental illumination.

  • A novel custom-defined reconstruction loss that conforms to the atmospheric scattering model.

  • A multilevel approach for inferring the scene transmittance and the environmental illumination, to alleviate the problem of varying scale in the input image.

The rest of the paper is arranged as follows. Section 2 describes the related works present in the literature. The basics of image formation in the presence of a scattering medium are described in Section 3. Section 4 describes our scene illumination and transmittance estimation network, while Section 5 explains our dehazing method. The training data generation and experimental settings are reported in Section 6. In Section 7, we provide a comparative analysis (both quantitative and qualitative) of our proposed method. Section 8 consists of concluding remarks.

2 Related Work

Image dehazing is considered a challenging problem because the degradation depends on the depth. The first thing one observes in a hazy image is its reduced contrast. Hence, earlier methods approached the problem using image enhancement techniques like contrast enhancement Stark (2000); Oakley and Bu (2007). These methods fail to produce satisfactory results in practical scenarios, as they do not take into account the change in haze density with varying depth. Narasimhan and Nayar Narasimhan and Nayar (2002, 2003) took the help of multiple images taken under different weather conditions to estimate the depth, which is then used for dehazing. Single-image methods, which rely on additional priors for depth estimation, have been receiving a lot of attention recently. The method of He et al. He et al. (2011) is based on the observation that for clear outdoor images, most local patches contain some pixels with very low intensity in at least one of the color channels; under haze this value increases due to the added airlight. This prior, called the dark channel prior (DCP), is utilized to estimate the scene transmittance. Tang et al. Tang et al. (2014) utilized existing hand-crafted features like local max contrast, dark channel, hue disparity and local max saturation in patches to learn a scene transmittance regressor. Fattal Fattal (2014) based his work on the local color-line prior: for clear images, the colors in a patch form a line in RGB space that passes through the origin; under hazy conditions, this line gets shifted by the airlight depending on the amount of haze. This information is utilized to estimate the transmittance. The work of Berman et al. Berman et al. (2016) relies on the assumption that the colors of a natural haze-free image form a few hundred tight clusters in RGB space. Under haze, these clusters get elongated and form linear structures. These lines, termed haze-lines, are employed to estimate the transmittance factors. Although these methods produce good results for certain images, they fail when their assumptions are broken.

The recent success of Convolutional Neural Networks (CNN) in the domain of Computer Vision Dong et al. (2014); Krizhevsky et al. (2012) has encouraged their use in the problem of image dehazing Li et al. (2017); Song et al. (2017). CNN-based dehazing methods directly regress the transmittance by learning to extract features from data, instead of relying on hand-crafted features. Dehazenet, proposed by Cai et al. Cai et al. (2016), works on image patches similar to Tang et al. Tang et al. (2014), but employs a CNN to extract haze-relevant features. Ren et al. Ren et al. (2016) work with full images to estimate the transmittance using a multi-scale CNN. They use two networks: a coarse network to estimate the transmittance map and a fine network for refining the estimated transmittance. Li et al. Li et al. (2017) reformulated the atmospheric scattering model Koschmieder (1924) so that it contains a single parameter. This unified parameter integrates both scene transmittance and environmental illumination, and is regressed using a CNN named AOD-Net.

All the above-mentioned methods restrict themselves to daytime scenes where there is a single light source (the sun). Dehazing night-time scenes is more complicated due to the presence of non-uniform illumination. This is normally handled by using a spatially varying atmospheric light map. Li et al. Li et al. (2015) went a step further and proposed to add a glow term to the imaging model that accounts for the glow around light sources. The method of Santra and Chanda (2016) is a unified dehazing method that works for both day-time and night-time images using a relaxed atmospheric model.

Figure 2: Proposed network for the estimation of the scene transmittance and the environmental illumination

3 Imaging Model under Haze

Light that propagates through a medium gets attenuated due to scattering by the particles present in the medium. The image thus formed has lower contrast and a dull colour composition. This phenomenon is modelled by the following equations Koschmieder (1924),

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \qquad (1)$$
$$t(x) = e^{-\beta d(x)} \qquad (2)$$

Here, $I(x)$ is the observed intensity of the image in RGB, whereas $J(x)$ is the scene radiance in RGB without the effect of scattering. $A$ is the global environmental illumination. $t(x)$ denotes the scene transmittance of the light, which represents the amount of light that reaches the observer without getting scattered. $\beta$ is the scattering coefficient, and $d(x)$ is the scene depth. In Eq. 1, $A$ is taken as constant, assuming the image is taken during the day under an overcast sky, which is a common situation for haze and fog. However, that may not always be true. If the sunlight is dominant or if the image is taken during the night, this assumption is violated Narasimhan and Nayar (2002); He et al. (2011). Therefore, in order to tackle this issue of non-uniform illumination, we use a modified version of Eq. 1 for our purpose:

$$I(x) = J(x)\,t(x) + A(x)\,(1 - t(x)) \qquad (3)$$

Most of the existing methods work by taking small patches and assuming the transmittance to be constant within the patch; the environmental illumination is estimated separately. Our method also works on image patches. However, we aim to estimate the scene transmittance and the environmental illumination simultaneously. Estimating the environmental illumination in a small patch is difficult, as it is hard to differentiate whether the colors are due to the environmental illumination or the objects present in the patch. For this reason, we work with bigger patches. But in bigger patches the constant-transmittance assumption gets violated. So, for our method we assume the transmittance can vary within a patch while the environmental illumination remains constant. This allows for different illumination estimates for different patches, and this relaxation helps us to tackle the dehazing of night-time images, where we have non-uniform illumination due to artificial lights. Now, to dehaze an image, we need to estimate the constant airlight $A$ within a patch and a scene transmittance map $t$ of the same size as the input patch, since we have assumed the transmittance can vary within a patch. This is achieved by using a Fully Convolutional Network (FCN) that estimates both $A$ and $t$ from patches.
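For concreteness, the following is a minimal NumPy sketch of the imaging model in Eqs. (1)-(3). The function name and the array conventions (float RGB in [0, 1]) are ours; the same routine accepts either a global illumination vector (Eq. 1) or a spatially varying illumination map (Eq. 3).

```python
import numpy as np

def synthesize_haze(J, depth, beta, A):
    """Apply the haze imaging model of Eqs. (1)-(3).

    J     : clean image, float array of shape (H, W, 3) in [0, 1]
    depth : scene depth map, shape (H, W)
    beta  : scattering coefficient (scalar)
    A     : environmental illumination; a length-3 vector for the classical
            model (Eq. 1) or an (H, W, 3) map for the relaxed model (Eq. 3)
    """
    t = np.exp(-beta * depth)                  # Eq. (2): transmittance from depth
    t3 = t[..., None]                          # broadcast over colour channels
    I = J * t3 + np.asarray(A) * (1.0 - t3)    # attenuated radiance plus airlight
    return I, t
```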

4 Proposed Solution

In the following subsections, we describe the architecture of our proposed network and the loss function we have used to train the network.

4.1 Dehazing Network

For the joint estimation of the environmental illumination and the scene transmittance, we propose a two-way forked FCN (Fig. 2). The initial four convolution layers of this network extract features for both $t$ and $A$. This is done with the aim of capturing the interdependence between $t$ and $A$. The network then bifurcates into two different sections, one estimating $t$ and the other $A$. The number of transposed convolutional layers is kept the same as the number of convolutional layers in each path. Skip connections have also been added in each forked path; these help to compensate for the loss of fine-scale detail due to convolutions. We use the tanh activation function for all the layers, except for the last layer in both paths, where we use a sigmoid to squash the output values between 0 and 1. Every transposed convolutional layer is followed by a Batch-Normalization layer to reduce over-fitting.
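A minimal Keras sketch of such a two-way forked FCN is given below. The layer counts, activations, skip connections and batch normalization follow the description above, while the filter counts, kernel sizes, stride-2 downsampling and the default patch size are assumptions made purely for illustration.

```python
from tensorflow.keras import layers, Model

def build_dehazing_network(patch_size=64):        # input patch size is assumed
    inp = layers.Input(shape=(patch_size, patch_size, 3))

    # Shared front end: four convolutional layers extracting features for both
    # t and A (filter counts and stride-2 downsampling are assumptions).
    feats, x = [], inp
    for f in (32, 64, 64, 128):
        x = layers.Conv2D(f, 3, strides=2, padding='same', activation='tanh')(x)
        feats.append(x)

    def fork(x, channels, name):
        # One transposed convolution per front-end convolution, each followed
        # by batch normalization; skip connections re-inject fine-scale detail.
        skips = [feats[2], feats[1], feats[0], None]
        for f, skip in zip((128, 64, 64, 32), skips):
            x = layers.Conv2DTranspose(f, 3, strides=2, padding='same',
                                       activation='tanh')(x)
            x = layers.BatchNormalization()(x)
            if skip is not None:
                x = layers.Concatenate()([x, skip])
        # Last layer: sigmoid squashes the outputs to (0, 1).
        return layers.Conv2D(channels, 3, padding='same',
                             activation='sigmoid', name=name)(x)

    t_hat = fork(x, 1, name='t')   # scene transmittance map
    A_hat = fork(x, 3, name='A')   # environmental illumination
    return Model(inp, [t_hat, A_hat])
```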

4.2 Reconstruction Error Minimized Loss

To train a regressor, it is common to choose the mean squared error (MSE) as the loss function. But it is not a good choice for image dehazing: a small error in the estimated $t$ can have a substantial impact on the dehazed output. This becomes significant as the value of $t$ goes to 0, that is, in areas with dense haze. To address this problem we have formulated a custom loss function based on the imaging model (Eq. 1). We define the total loss as follows.

$$L = L_t + L_A + L_{At} \qquad (4)$$

where,

$$L_t = \frac{1}{N}\sum_{x}\left\| I(x) - \big(J(x)\,\hat{t}(x) + A\,(1-\hat{t}(x))\big) \right\|^2,$$

$$L_A = \frac{1}{N}\sum_{x}\left\| I(x) - \big(J(x)\,t(x) + \hat{A}\,(1-t(x))\big) \right\|^2,$$

$$L_{At} = \frac{1}{N}\sum_{x}\left\| I(x) - \big(J(x)\,\hat{t}(x) + \hat{A}\,(1-\hat{t}(x))\big) \right\|^2 \qquad (5)$$

Here $I$ and $J$ are the input hazy image and the ground-truth clean image respectively, $t$ and $A$ are the ground-truth transmittance map and environmental illumination, while $\hat{t}$ and $\hat{A}$ are the transmittance map and environmental illumination obtained from the network. $N$ denotes the number of pixels in the patch. With these three losses we minimize the error of reconstructing the hazy image from the ground-truth haze-free image in three different ways:

  1. $L_t$: with the predicted $\hat{t}$, but the ground-truth $A$

  2. $L_A$: with the predicted $\hat{A}$, but the ground-truth $t$

  3. $L_{At}$: with the predicted $\hat{t}$ and $\hat{A}$

Using only $L_{At}$, the network is likely to get stuck at trivial solutions such as $\hat{t} = 0$ with $\hat{A}$ equal to the hazy input. The addition of $L_t$ and $L_A$ prevents these situations; these two losses also guide the predictions towards the actual values. On the other hand, as the haze gets denser, the effect of $J$ in the image diminishes because $t$ goes towards zero, and the network can then learn to output an arbitrary $\hat{t}$. To address this issue we have reduced the importance of $L_t$ and $L_{At}$, the terms that use the predicted $\hat{t}$, with a transmittance-dependent weighting term. We do not do this for $L_A$, as it uses the ground-truth $t$. For our method, the parameter of this weighting term is taken to be 15.
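The following TensorFlow sketch shows how the three reconstruction losses can be computed from a batch of patches. The mean-squared form and the equal weighting of the three terms are simplifications; the transmittance-dependent down-weighting of $L_t$ and $L_{At}$ described above is not reproduced here.

```python
import tensorflow as tf

def reconstruction_losses(I, J, t_gt, A_gt, t_hat, A_hat):
    """Reconstruct the hazy patch I from the clean patch J in three ways
    (cf. Eq. 5); the mean-squared form and equal weights are assumptions."""
    def rebuild(t, A):
        return J * t + A * (1.0 - t)           # haze model, Eq. (3)

    L_t  = tf.reduce_mean(tf.square(I - rebuild(t_hat, A_gt)))   # predicted t, true A
    L_A  = tf.reduce_mean(tf.square(I - rebuild(t_gt, A_hat)))   # predicted A, true t
    L_At = tf.reduce_mean(tf.square(I - rebuild(t_hat, A_hat)))  # both predicted
    return L_t + L_A + L_At
```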

5 Dehazing Method

Our dehazing method consists of four main steps:

  1. Multilevel estimation of $t$ and $A$.

  2. $t$ and $A$ aggregation.

  3. Regularization and interpolation.

  4. Haze-free image recovery.

Each step is described in detail in the following subsections.

5.1 Multilevel Estimation of $t$ and $A$

We first estimate $t$ and $A$ from different patches of the input image using our dehazing network. For that, we consider only (overlapping) patches that are not smooth. Now, depending on the resolution of the image, a patch of the same size can cover a different amount of the scene. So, for a given input image, if we just take patches out of it and feed them to the network, the accuracy of the obtained estimates can vary with the resolution of the input image. For this reason, we estimate the two parameters ($t$ and $A$) at multiple levels by taking patches of different sizes from the input image.

For the multilevel estimation, we begin with the largest patch size, which is determined by the dimensions of the input image. These patches are resized to the fixed input size of the network and fed to the dehazing network. The obtained $\hat{t}$ and $\hat{A}$ for a patch are then resized back to the original patch size. For the next level we take smaller patches and estimate the two parameters in the same way. This procedure is repeated until the patch size falls below a minimum size; the number of levels is thus determined by the starting patch size and this lower bound.
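A possible implementation of the per-level estimation is sketched below. The halving schedule of the patch size, the 50% patch overlap, the variance-based smoothness test and the network input size are assumptions; only the overall multilevel strategy follows the text.

```python
import numpy as np
import cv2  # OpenCV, used only for resizing

def estimate_level(hazy, model, size, net_size=64, var_thresh=1e-3):
    """Estimate t and A for one level by sliding a `size`-sized window.
    The 50% overlap and the variance-based smoothness test are assumptions."""
    H, W = hazy.shape[:2]
    t_sum = np.zeros((H, W)); A_sum = np.zeros((H, W, 3)); cnt = np.zeros((H, W))
    step = max(size // 2, 1)
    for y in range(0, H - size + 1, step):
        for x in range(0, W - size + 1, step):
            patch = hazy[y:y + size, x:x + size]
            if patch.var() < var_thresh:            # ignore smooth patches
                continue
            inp = cv2.resize(patch, (net_size, net_size))[None].astype(np.float32)
            t_p, A_p = model.predict(inp, verbose=0)
            t_sum[y:y + size, x:x + size] += cv2.resize(t_p[0, :, :, 0], (size, size))
            A_sum[y:y + size, x:x + size] += cv2.resize(A_p[0], (size, size))
            cnt[y:y + size, x:x + size] += 1
    mask = cnt > 0
    t_lvl = np.divide(t_sum, cnt, out=np.zeros_like(t_sum), where=mask)
    A_lvl = np.divide(A_sum, cnt[..., None], out=np.zeros_like(A_sum),
                      where=mask[..., None])
    return t_lvl, A_lvl, mask

def multilevel_estimates(hazy, model, min_size=64):
    """Repeat the per-level estimation with shrinking patch sizes
    (a halving schedule is assumed) until the patch becomes too small."""
    size, levels = min(hazy.shape[:2]), []
    while size >= min_size:
        levels.append(estimate_level(hazy, model, size))
        size //= 2
    return levels
```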

5.2 $t$ and $A$ Aggregation

The estimates obtained from patches need to be aggregated to obtain the full-sized $t$ and $A$ maps before we are able to dehaze an image. Therefore, in each level we first aggregate the patch estimates by averaging the values to obtain full-sized maps. Then, the estimates obtained at each level are aggregated using a weighted average to get the overall estimates as follows:

$$\tilde{t}(x) = \frac{\sum_{l=1}^{n} w_l\, t_l(x)}{\sum_{l=1}^{n} w_l} \qquad (7)$$
$$\tilde{A}(x) = \frac{\sum_{l=1}^{n} w_l\, A_l(x)}{\sum_{l=1}^{n} w_l} \qquad (8)$$

Here, $t_l$ and $A_l$ represent the maps obtained at level $l$, while $n$ denotes the total number of levels. Note that, in our case, we have taken all the weights $w_l$ to be 1.
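The per-level maps can then be combined as in Eqs. (7)-(8). The sketch below normalizes, at each pixel, by the weights of the levels that actually produced an estimate there, which is one way to handle pixels that were skipped at some levels; it consumes the (t, A, mask) tuples produced by the multilevel sketch above.

```python
import numpy as np

def aggregate_levels(levels, weights=None):
    """Weighted average of per-level (t, A, mask) maps (cf. Eqs. 7-8);
    equal weights of 1 are the default, as in the text."""
    if weights is None:
        weights = [1.0] * len(levels)
    t_num = sum(w * t for w, (t, A, m) in zip(weights, levels))
    A_num = sum(w * A for w, (t, A, m) in zip(weights, levels))
    # Count only the weights of levels that contributed at each pixel.
    denom = sum(w * m for w, (t, A, m) in zip(weights, levels))
    mask = denom > 0
    t_map = np.divide(t_num, denom, out=np.zeros_like(t_num), where=mask)
    A_map = np.divide(A_num, denom[..., None], out=np.zeros_like(A_num),
                      where=mask[..., None])
    return t_map, A_map, mask
```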

5.3 Regularization and Interpolation

Due to the patch-based processing of the images, the overall estimated $t$ and $A$ maps contain halos at patch borders (Fig. 3). Also, we have not considered smooth patches in the estimation step, so at those pixels we do not have estimates. To alleviate this problem, we interpolate and regularize the overall estimates. This is done using a Laplacian-based regularizer similar to Fattal Fattal (2014), by maximizing the following Gauss-Markov random field model,

$$P(t) \propto \exp\!\Big(-\sum_{x} m(x)\,\big(t(x) - \tilde{t}(x)\big)^2 \;-\; \lambda \sum_{x}\sum_{y \in N_x} \frac{\big(t(x) - t(y)\big)^2}{\|I(x) - I(y)\|^2}\Big) \qquad (9)$$

where $\tilde{t}$ is the overall estimate, $N_x$ denotes the neighborhood of pixel $x$, $m(x)$ is 1 where an estimate is available and 0 otherwise, and $\lambda$ balances the data and smoothness terms. This regularizer smooths the estimates while retaining sharp profiles along the edges of the input image. Both the transmittance and the environmental illumination (each channel separately) are smoothed with this model.
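As an illustration of this kind of regularization, the sketch below minimizes a generic quadratic energy with a data term on the known pixels and an edge-aware smoothness term driven by the hazy image. It is not the authors' exact formulation; the weight lam and the small constant eps are assumed parameters. The transmittance map and each channel of the illumination map would be passed through such a routine separately.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def regularize(estimate, mask, guide, lam=1.0, eps=1e-4):
    """Interpolate and smooth a single-channel map by solving a sparse
    quadratic energy: data term where estimates exist, plus an edge-aware
    smoothness term weighted by the guide (hazy) image gradients."""
    H, W = estimate.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)
    rows, cols, vals = [], [], []
    b = (mask.astype(float) * estimate).ravel()
    diag = mask.astype(float).ravel()                       # data term
    # 4-connected neighbours; weights fall with the guide-image gradient.
    for dy, dx in ((0, 1), (1, 0)):
        d = np.sum((guide[dy:H, dx:W] - guide[0:H - dy, 0:W - dx]) ** 2, axis=-1)
        w = (lam / (d + eps)).ravel()
        i = idx[0:H - dy, 0:W - dx].ravel()
        j = idx[dy:H, dx:W].ravel()
        rows += [i, j, i, j]; cols += [j, i, i, j]
        vals += [-w, -w, w, w]
    M = sp.coo_matrix((np.concatenate(vals),
                       (np.concatenate(rows), np.concatenate(cols))), shape=(n, n))
    M = M.tocsr() + sp.diags(diag)
    return spla.spsolve(M, b).reshape(H, W)
```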

5.4 Haze-free Image Recovery

Using the smoothed $t$ and $A$ maps, we recover the dehazed image using the following equation,

$$J(x) = \frac{I(x) - A(x)\,\big(1 - t(x)\big)}{t(x)} \qquad (10)$$

Note that, to ensure the value of $J$ stays within the valid range, we clip the values of $t$ in the denominator.
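In code, the recovery step of Eq. (10) amounts to the following; the lower bound t_min used for clipping is an assumed value.

```python
import numpy as np

def recover_radiance(I, t, A, t_min=0.1):
    """Invert the haze model (Eq. 10). The lower bound t_min is assumed;
    the paper clips t in the denominator but does not state the bound here."""
    t = np.clip(t, t_min, 1.0)[..., None]          # avoid division by ~0
    J = (I - A * (1.0 - t)) / t
    return np.clip(J, 0.0, 1.0)                    # keep radiance in valid range
```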

(a) Without regularization
(b) With regularization
Figure 3: Effect of using the regularizer on the $t$ and $A$ maps
Image | Dehazenet Cai et al. (2016) | Berman et al. (2016) | AOD-Net Li et al. (2017) | MSCNN Ren et al. (2016) | Ours
Church 14.64/0.82/20.45 15.69/0.88/16.91 9.44/0.61/34.64 14.18/0.85/20.26 18.52/0.89/13.54
Couch 16.71/0.83/14.34 17.28/0.86/14.19 16.79/0.82/17.33 18.02/0.87/12.92 18.69/0.87/12.31
Flower1 19.82/0.94/16.72 12.15/0.71/21.00 12.21/0.79/29.42 9.08/0.42/24.65 15.95/0.72/15.45
Flower2 19.44/0.91/15.37 11.86/0.67/21.17 13.13/0.78/25.27 10.82/0.59/22.46 20.58/0.88/11.74
Lawn1 13.80/0.81/23.01 14.78/0.83/17.93 11.33/0.67/31.74 14.38/0.80/21.00 16.05/0.83/18.65
Lawn2 13.61/0.81/22.47 15.32/0.85/17.81 10.98/0.66/31.70 13.30/0.76/22.27 16.55/0.85/19.78
Mansion 17.39/0.84/17.42 17.34/0.87/15.84 14.23/0.69/24.01 17.70/0.87/17.53 20.71/0.93/12.08
Moebius 19.18/0.94/16.38 14.59/0.83/22.40 13.22/0.76/27.61 16.38/0.89/19.86 16.94/0.79/16.53
Raindeer 17.87/0.85/13.73 16.60/0.80/15.28 16.54/0.79/18.50 16.83/0.80/15.49 20.15/0.89/13.16
Road1 13.74/0.79/22.20 16.33/0.87/19.06 11.75/0.65/29.32 14.13/0.82/22.22 17.67/0.89/18.38
Road2 13.22/0.77/23.43 18.23/0.89/16.83 11.95/0.61/30.96 16.45/0.86/20.18 17.49/0.78/16.63
Average 16.31/0.84/18.68 15.47/0.82/18.03 12.87/0.71/27.31 14.66/0.77/19.89 18.18/ 0.84/15.79
Table 1: Quantitative comparison of PSNR/SSIM/CIEDE2000 values on images from Fattal’s dataset
(a) Hazy
(b) Dehazenet
(c) Berman et al.
(d) AOD-Net
(e) MSCNN
(f) Ours
(g) Ground truth
Figure 4: Visual comparison of images from Fattal’s dataset Fattal (2014): Church, Couch, Lawn1, Mansion, Road1

6 Data Generation and Settings

In this section we describe the procedure of the training data generation for our proposed network and the experimental settings under which we evaluate our results.

6.1 Data Generation

One of the primary hurdles that one faces while working on image dehazing is the absence of a relevant dataset, since it is logistically difficult to acquire pairs of clear and hazy images of the same scene. To circumvent this issue, we have synthesized images with known depth maps to create our training dataset. We utilize Eq. (2) to obtain the scene transmittance maps from the known depths and Eq. (3) to generate the hazy images using the scene transmittance and the environmental illumination. We have used the NYU Depth Dataset V2 Silberman et al. (2012) for this purpose. NYU-V2 contains 1449 indoor images along with their depth maps, captured using Microsoft Kinect. To generate the transmittance maps using Eq. 2, we have taken the scattering coefficient $\beta$ within a fixed interval; values beyond this range are avoided, because they result in either very thin or very thick haze. For the environmental illumination, the value of each channel is sampled from a fixed range. From the generated hazy images, we have extracted patches and kept only the ones whose variance exceeds a threshold, on the ground that smooth patches do not contain much information.
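The data synthesis described above can be sketched as follows. The numeric ranges for $\beta$ and $A$, the patch size and the variance threshold are placeholders, since the exact values are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_patches(img, depth, patch=64, n_patches=20,
                          beta_range=(0.5, 1.5), A_range=(0.7, 1.0),
                          var_thresh=1e-3):
    """Synthesize (hazy, clean, t, A) training tuples from one RGB-D pair.
    All numeric ranges and sizes are placeholders, not the paper's values."""
    beta = rng.uniform(*beta_range)
    A = rng.uniform(*A_range, size=3)                      # per-channel illumination
    t = np.exp(-beta * depth)                              # Eq. (2)
    hazy = img * t[..., None] + A * (1.0 - t[..., None])   # Eq. (3)
    H, W = depth.shape
    samples = []
    for _ in range(10 * n_patches):                        # bounded number of tries
        if len(samples) == n_patches:
            break
        y = rng.integers(0, H - patch); x = rng.integers(0, W - patch)
        hp = hazy[y:y + patch, x:x + patch]
        if hp.var() < var_thresh:                          # discard smooth patches
            continue
        samples.append((hp, img[y:y + patch, x:x + patch],
                        t[y:y + patch, x:x + patch], A))
    return samples
```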

6.2 Experimental Settings

All the results reported here are obtained on a machine with an Intel® Xeon® 3.1 GHz octa-core CPU, 64 GB of RAM and an Nvidia® Tesla™ C2075 GPU, running Ubuntu 16.04. The dehazing network is built using Keras, the Python deep learning library, with TensorFlow as the backend. We have used the Adagrad Duchi et al. (2011) optimizer. Under these settings, the network has been trained on fixed-size input patches for 150 epochs with a batch size of 32.
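Under these settings, the training setup can be wired together roughly as below, reusing the reconstruction_losses sketch from Section 4.2. The learning rate and the use of Keras add_loss are assumptions.

```python
from tensorflow.keras import layers, Model, optimizers
# reconstruction_losses: see the sketch in Section 4.2

def build_trainer(net, patch=64):
    """Wrap the dehazing network so the reconstruction loss of Section 4.2
    can be minimized directly; wiring it through add_loss is our assumption."""
    I   = layers.Input((patch, patch, 3))   # hazy patch
    J   = layers.Input((patch, patch, 3))   # clean patch
    t_g = layers.Input((patch, patch, 1))   # ground-truth transmittance
    A_g = layers.Input((patch, patch, 3))   # ground-truth illumination
    t_h, A_h = net(I)
    trainer = Model([I, J, t_g, A_g], [t_h, A_h])
    trainer.add_loss(reconstruction_losses(I, J, t_g, A_g, t_h, A_h))
    trainer.compile(optimizer=optimizers.Adagrad(learning_rate=0.01))  # lr assumed
    return trainer

# Example usage (settings from the text):
# trainer.fit([hazy, clean, t_gt, A_gt], epochs=150, batch_size=32)
```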

7 Results and Evaluations

In this section we evaluate the effectiveness of our method on quantitative as well as qualitative grounds. We report our results on synthetic images, real-world images and night-time hazy images. We compare the results with existing state-of-the-art methods: Dehazenet Cai et al. (2016), Berman et al. (2016), AOD-Net Li et al. (2017) and MSCNN Ren et al. (2016). Apart from Berman et al. (2016), the rest are CNN-based methods. The results reported here are generated using the code provided by the respective authors.

7.1 Synthetic Images

In order to evaluate our proposed method in qualitative and quantitative terms, we use two different datasets with synthetically generated hazy images: Fattal's dataset Fattal (2014) and the Middlebury part of the D-hazy dataset Ancuti et al. (2016). We do not use the NYU part of D-hazy, as the network has been trained on NYU images. Fattal's dataset Fattal (2014) consists of synthetic indoor and outdoor hazy images and their corresponding haze-free images. The Middlebury part of the D-hazy dataset contains high-resolution indoor images. We provide evaluations on some images from both datasets (Fattal (2014) & Scharstein et al. (2014)).

Image | Dehazenet Cai et al. (2016) | Berman et al. (2016) | AOD-Net Li et al. (2017) | MSCNN Ren et al. (2016) | Ours
Cable 8.14/0.64/29.46 9.94/0.60/24.11 6.95/0.6/32.64 7.65/0.62/29.44 7.88/0.6/32.13
Couch 11.49/0.62/19.01 13.77/0.68/16.50 10.56/0.61/21.11 10.13/0.60/23.16 12.13/0.66/19.97
Piano 15.75/0.78/15.62 15.07/0.76/15.146 13.89/0.74/13.93 12.39/0.70/17.34 15.79/0.75/15.07
Playroom 14.57/0.78/15.17 17.64/0.80/10.10 13.24/0.76/14.25 13.42/0.76/15.07 14.52/0.6/14.85
Shopvac 8.00/0.64/30.70 11.58/0.75/19.25 6.89/0.61/35.22 7.62/0.60/32.43 8.89/0.568/26.59
Average 11.59/0.69/21.99 13.60/0.72/17.02 10.31/0.67/23.43 10.24/0.60/23.49 11.81/0.67/21.75
Table 2: Quantitative comparison of PSNR/SSIM/CIEDE2000 values on images from Middlebury dataset
(a) Hazy
(b) DehazeNet
(c) Berman et al.
(d) AOD-Net
(e) MSCNN
(f) Ours
(g) Ground truth
Figure 5: Visual comparison of images from Middlebury dataset Scharstein et al. (2014): Cable, Couch, Piano, Playroom, Shopvac

To measure how well an image has been dehazed, we use the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM); higher values of these two indicate a better dehazed result. In addition, we use CIEDE2000 Sharma et al. (2005) to measure how well the colors have been restored; a low value indicates that the resulting colors are close to the actual ones.
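These metrics are available off the shelf, for example through scikit-image; averaging the per-pixel CIEDE2000 values is one common way to obtain a single number, though the exact evaluation protocol may differ.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.color import rgb2lab, deltaE_ciede2000

def evaluate(dehazed, reference):
    """PSNR, SSIM and mean CIEDE2000 between a dehazed image and its
    haze-free reference (both float RGB in [0, 1])."""
    psnr = peak_signal_noise_ratio(reference, dehazed, data_range=1.0)
    ssim = structural_similarity(reference, dehazed, data_range=1.0,
                                 channel_axis=-1)  # skimage >= 0.19
    ciede = deltaE_ciede2000(rgb2lab(reference), rgb2lab(dehazed)).mean()
    return psnr, ssim, ciede
```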

We demonstrate our results on Fattal's dataset in Fig. 4. We can observe from this figure that Dehazenet Cai et al. (2016) has not been able to properly dehaze the images, especially the outdoor ones (see Church, Lawn1 and Road1). Berman et al. (2016) is able to eliminate the haze to a certain extent, but it fails at removing dense haze (notice the background of Lawn1 and Road1). AOD-Net Li et al. (2017) tends to saturate the images. Ren et al. (2016) performs a little better but retains more haze compared to Berman et al. (2016) (see Lawn1 and Road1). Our method has not only been able to remove the haze efficiently from both the foreground and the background, but it also does not hallucinate any colours. This is mainly because we have estimated $A$ correctly. The competence of our method is visible from the visual comparison and is also clearly indicated by the quantitative results in Table 1.

For the Middlebury dataset, the comparisons are demonstrated in Fig. 5. We notice that both MSCNN Ren et al. (2016) and AOD-Net Li et al. (2017) have not been able to remove the haze fully; haze remains visible to a significant extent, especially at sharp edge discontinuities. Dehazenet Cai et al. (2016) removes haze a little better, but cannot eliminate it completely. The method of Berman et al. (2016) and our method are successful in alleviating the haze from the images. Compared to Berman et al. (2016), our method performs better when it comes to removing haze, as visible in Cable, Couch, Piano and Shopvac, but tends to saturate the images. Our method performs better than all the other CNN-based methods, as validated by Table 2, but the over-saturation accounts for the lower average PSNR and SSIM values when compared to Berman et al. (2016).

(a) Hazy
(b) Dehazenet
(c) Berman et al.
(d) AOD-Net
(e) MSCNN
(f) Ours
Figure 6: Visual comparison on real-world images: New York, Building and Tiananmen
(a) Haze
(b) Li et al.
(c) Santra and Chanda
(d) Ours
Figure 7: Comparison of night-time dehazing

7.2 Real World Images

In order to establish the efficacy of our model, we have qualitatively compared the results of dehazing real-world benchmark images. We have also included a comparison on night-time dehazing. Figure 6 shows the results on real-world hazy images: New York, Building and Tiananmen. Dehazenet Cai et al. (2016) and MSCNN Ren et al. (2016) are not able to clear the haze layer entirely due to under-estimation of the thickness of the haze. Because of this, the dehazing results from both methods tend to have a dull contrast (specifically visible in New York and Tiananmen). Berman et al. (2016) is able to mitigate the haze effectively and enhance the visibility. However, in the process, it over-saturates the contrast and tends to produce some colour distortions; see, for example, the sky region of New York, which appears whiter than it actually is and occludes the tops of the skyscrapers. Also, in Building, Berman et al. (2016) tends to hallucinate a light-purple colour for the sky. The results from AOD-Net Li et al. (2017) do not show any colour distortions or unwanted artifacts in the first two images but leave some haze, while in Tiananmen it envelopes the whole image in a yellowish layer. In contrast, our method produces images that are comparatively the least hazy while maintaining the clarity, colour and contrast composition and keeping the crisp details intact.

Our method has been designed taking into consideration the scenario of night-time dehazing. To establish the effectiveness of our method in this situation, we provide a qualitative comparison with the night-time dehazing methods Li et al. (2015); Santra and Chanda (2016). Fig. 7 exhibits some comparisons. Li et al. Li et al. (2015) tends to over-sharpen the images and creates grain-like noise, visible especially around the illuminated areas. The dehazed images from Santra and Chanda Santra and Chanda (2016) deviate from the normal colour composition through over-saturation and anomalous colour hallucination, as visible from the results: the light from the street lamp is yellow, but the result displays a green tinge. Our proposed method removes the haze without introducing artifacts or colour incoherence.

Dataset | MSE | $L_{At}$ | two-loss combinations (three columns) | $L_t + L_A + L_{At}$
Fattal 15.5/0.5/19.4 12.6/0.3/20.5 16.1/0.5/17.0 6.3/0.2/34.8 12.8/0.4/22.0 18.2/0.8/15.8
Middlebury 8.6/0.4/32.0 6.6/0.3/38.5 8.0/0.4/33.0 3.4/0.3/53.1 7.35/0.4/35.6 11.8/0.6/21.7
Table 3: Average PSNR/SSIM/CIEDE2000 values of different loss functions on Fattal's and the Middlebury dataset
(a) Hazy
(b) MSE
(c) $L_{At}$
(d)-(f) Two-loss combinations
(g) $L_t + L_A + L_{At}$
Figure 8: Visual comparison of Lawn2 from Fattal’s dataset and Piano from Middlebury dataset using different loss functions

7.3 Failure Cases

Figure 9: Failure on aerial (top) and cityscape (bottom) images. Left to right: input image, dehazed image, scene transmittance map, airlight map

Our proposed method performs well in diverse lighting conditions and in both indoor and outdoor situations, as is evident from our evaluations. However, there are some cases in which our model fails to produce satisfactory results. We demonstrate some examples in Fig. 9. In the aerial image, our method is able to estimate the scene transmittance correctly but changes the colour of the final dehazed image due to an incorrect airlight estimation. In the cityscape image, our method is not able to estimate the scene transmittance correctly: the estimated transmittance tends to stay constant beyond a certain distance, which is why the removal of haze is inadequate in the distant parts of the image. The airlight map is also anomalous, which accounts for the purple shade of the dehazed output.

7.4 Ablation Studies

As we have already stated, we use a custom-defined loss function to train our network. In this subsection, we quantitatively show the improvement we get from moving away from MSE. We also show that all three components of our loss are necessary for correct estimation. To compare, we train our network with each of the following losses independently, in addition to our original loss ($L_t$, $L_A$ and $L_{At}$): an MSE loss on $t$ and $A$, only $L_{At}$, and the three possible pairwise combinations of the reconstruction losses. Trained with each of these losses, we compute the PSNR, SSIM and CIEDE2000 values for both Fattal's and the Middlebury dataset. Table 3 shows the quantitative results. For visual comparison, we have included the results on two images in Fig. 8. As can be seen, the model trained without the supervision of the ground-truth transmittance tends to estimate very small values for the transmittance, and as a result its output is almost white. With the MSE loss the images are dehazed where the haze is thin, but it fails where the haze is thick. This validates our observation that MSE can fail at places where the value of $t$ is small (thick haze). The model with only $L_{At}$ achieves results similar to MSE, but the color is worse, as it has not been able to estimate the environmental illumination correctly; MSE performs better in this regard, as it had access to the ground-truth illumination during training. The network trained without the supervision of the ground-truth environmental illumination produces very bad results, especially in outdoor images, as can be seen in Lawn2, since it could not estimate the environmental illumination; this also shows the dependence between the two haze parameters. The network trained with $L_t$ + $L_A$ gives the second best result, as the network learns with ground-truth $A$ and $t$. But it could not match the combination of all three losses, as it did not capture the dependence between the two parameters.

8 Conclusions

In this paper, we have proposed to approach the problem of image dehazing by jointly estimating the scene transmittance and the environmental illumination map from image patches. Haze-relevant features are extracted using a two-way forked FCN that is trained by minimizing a novel loss function. The loss is based on the imaging model and therefore takes into consideration the relationship between the scene transmittance and the environmental illumination. We have also shown that all parts of the loss function are necessary for the correct estimation of the haze parameters. Although the method estimates the environmental illumination, the patch-based processing causes it to fail in some cases; using full images to estimate the environmental illumination could improve the results. Also, while FCNs can work independently of the input image size, the trained network still depends on the scale of the training data. This issue is yet to be solved.

References

  • Ancuti et al. [2016] C. Ancuti, C. O. Ancuti, and C. De Vleeschouwer. D-hazy: A dataset to evaluate quantitatively dehazing algorithms. In 2016 IEEE International Conference on Image Processing (ICIP), pages 2226–2230, Sept 2016.
  • Berman et al. [2016] D. Berman, T. Treibitz, and S. Avidan. Non-local image dehazing. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  • Cai et al. [2016] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao. Dehazenet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing, 25(11):5187–5198, Nov 2016. ISSN 1057-7149. doi: 10.1109/TIP.2016.2598681.
  • Dong et al. [2014] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In Computer Vision – ECCV 2014, pages 184–199, Cham, 2014. Springer International Publishing.
  • Duchi et al. [2011] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
  • Fattal [2014] Raanan Fattal. Dehazing using color-lines. ACM Trans. Graph., 2014.
  • He et al. [2011] K. He, J. Sun, and X. Tang. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.
  • Koschmieder [1924] Harald Koschmieder. Theorie der horizontalen sichtweite. Beitrage zur Physik der freien Atmosphare, pages 33–53, 1924.
  • Krizhevsky et al. [2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, 2012. URL http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
  • Li et al. [2017] Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. Aod-net: All-in-one dehazing network. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • Li et al. [2015] Y. Li, R. T. Tan, and M. S. Brown. Nighttime haze removal with glow and multiple light colors. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 226–234, Dec 2015. doi: 10.1109/ICCV.2015.34.
  • Narasimhan and Nayar [2003] S. G. Narasimhan and S. K. Nayar. Contrast restoration of weather degraded images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(6):713–724, June 2003. ISSN 0162-8828. doi: 10.1109/TPAMI.2003.1201821.
  • Narasimhan and Nayar [2002] Srinivasa G. Narasimhan and Shree K. Nayar. Vision and the atmosphere. International Journal of Computer Vision, 2002.
  • Oakley and Bu [2007] J. P. Oakley and H. Bu. Correction of simple contrast loss in color images. IEEE Transactions on Image Processing, 16(2):511–522, Feb 2007. ISSN 1057-7149. doi: 10.1109/TIP.2006.887736.
  • Ren et al. [2016] Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. Single image dehazing via multi-scale convolutional neural networks. In European Conference on Computer Vision, 2016.
  • Santra and Chanda [2016] S. Santra and B. Chanda. Day/night unconstrained image dehazing. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 1406–1411, Dec 2016.
  • Scharstein et al. [2014] Daniel Scharstein, Heiko Hirschmüller, York Kitajima, Greg Krathwohl, Nera Nesic, Xi Wang, and Porter Westling. High-resolution stereo datasets with subpixel-accurate ground truth. In GCPR, Lecture Notes in Computer Science. Springer, 2014. URL http://dblp.uni-trier.de/db/conf/dagm/gcpr2014.html#ScharsteinHKKNWW14.
  • Sharma et al. [2005] Gaurav Sharma, Wencheng Wu, and Edul N. Dalal. The ciede2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations. Color Research & Application, 2005. ISSN 1520-6378. URL http://dx.doi.org/10.1002/col.20070.
  • Silberman et al. [2012] Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from rgbd images. In Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, and Cordelia Schmid, editors, Computer Vision – ECCV 2012, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
  • Simonyan and Zisserman [2014] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, 2014.
  • Song et al. [2017] Y. Song, J. Li, X. Wang, and X. Chen. Single image dehazing using ranking convolutional neural network. IEEE Transactions on Multimedia, 2017.
  • Stark [2000] J. A. Stark. Adaptive image contrast enhancement using generalizations of histogram equalization. IEEE Transactions on Image Processing, 9(5):889–896, May 2000. ISSN 1057-7149. doi: 10.1109/83.841534.
  • Tan [2008] R. T. Tan. Visibility in bad weather from a single image. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, June 2008.
  • Tang et al. [2014] K. Tang, J. Yang, and J. Wang. Investigating haze-relevant features in a learning framework for image dehazing. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.