Progressive Retinex: Mutually Reinforced Illumination-Noise Perception Network for Low Light Image Enhancement

11/26/2019 · Yang Wang et al. · Shandong University, The University of Sydney, USTC

Contrast enhancement and noise removal are coupled problems in low-light image enhancement. Existing Retinex-based methods do not take this coupling relation into consideration, resulting in under- or over-smoothed enhanced images. To address this issue, this paper presents a novel progressive Retinex framework, in which the illumination and noise of a low-light image are perceived in a mutually reinforced manner, leading to noise-suppressed low-light enhancement results. Specifically, two fully pointwise convolutional neural networks are devised to model the statistical regularities of ambient light and image noise respectively, and to leverage them as constraints to facilitate the mutual learning process. The proposed method not only suppresses the interference caused by the ambiguity between tiny textures and image noise, but also greatly improves computational efficiency. Moreover, to solve the problem of insufficient training data, we propose an image synthesis strategy based on the camera imaging model, which generates color images corrupted by illumination-dependent noise. Experimental results on both synthetic and real low-light images demonstrate the superiority of our proposed approach against the State-Of-The-Art (SOTA) low-light enhancement methods.


1. Introduction

Insufficient illumination in image capturing can significantly degrade the quality of images in many aspects, such as low visibility, contrast degradation, and a high noise level. These degradations not only cause unpleasant visual perception, but also hurt the performance of many computer vision systems designed for normal-light images. Retinex, motivated by the human visual system (HVS), is an effective low-light image enhancement algorithm, providing color constancy and dynamic range compression. It assumes that an observed image can be decomposed into reflectance and illumination, denoted as $I = R \circ L$, where $R$ is the reflectance determined by the characteristics of objects, $L$ is the illumination at each pixel, which depends on the ambient light, and $\circ$ denotes element-wise multiplication.

Retinex decomposition is known to be a mathematically ill-posed problem. One feasible solution is to make assumptions about the statistics of the illumination and thus to devise specialized regularities for minimization. For example, some early heuristic Retinex algorithms work by assuming regularities in the colors of natural objects viewed under canonical illumination, e.g., Multi-scale Retinex (MSR) (Jobson et al., 1997) and multi-scale Retinex with color restoration (MSRCR) (Rahman et al., 2004). Another practicable approach is to learn the statistical regularities, formulating regression models of how the pixels related to illumination are distributed. These regression models thus reveal the statistics of the pixels and result in more general representations (Wei et al., 2018; Lore et al., 2017). Our method also falls into the latter category.

As shown in Fig. 1(a), image noise inevitably exists in low-light images due to dark current and shot noise in camera imaging (Liu et al., 2008)(Tsin et al., 2001). To this end, Li et al. (2018) introduce a noise term into the classic Retinex model to better formulate images captured under low-light conditions. Mathematically,

$$I = R \circ L + N \qquad (1)$$

where $N$ denotes the noise term.

Eq. (1) implies that Retinex-based low-light image enhancement in fact comprises two tasks: contrast enhancement (determined by the illumination map) and noise suppression (determined by the noise level). Existing methods usually treat these as two separate tasks and solve them successively. For example, a joint bilateral filter is applied to suppress the noise after enhancement (Zhang et al., 2012), and Guo et al. (2017) attempt to further improve the visual quality through post-processing with BM3D (Dabov et al., 2007). However, as shown in Fig. 2, there are coupling relations between the two tasks, which manifest as follows: 1) the noise level depends on the intensity of the illumination, and 2) the existence of noise significantly affects the statistical distribution of the illumination. Ignoring this coupling relationship may lead to inaccurate estimation of illumination or noise, resulting in under- or over-smoothed enhancement results (Fig. 1(b-e)).

To address this problem, this paper presents a novel progressive Retinex framework, in which the illumination and noise of a low-light image are perceived in a mutually reinforced manner to achieve Retinex-based image enhancement. Specifically, the statistical regularities of ambient light and image noise are modeled by two fully pointwise convolutional neural networks, and leveraged as constraints to facilitate the mutual learning process. The two modeling processes are performed progressively until stable results are obtained, and each modeling process benefits from the gradually improved results of the other. The final estimates of the illumination map and noise level are then used for Retinex-based image enhancement. Our proposed method not only suppresses the interference caused by the ambiguity between tiny textures and image noise, but also greatly improves computational efficiency. Moreover, to solve the problem of insufficient training data, we propose a new data generation strategy based on the camera imaging model. Experimental results on both synthetic and real low-light images demonstrate the superiority of our proposed method against the SOTA methods.

The contributions of this work can be summarized as:

We present a novel progressive Retinex framework, in which the illumination and noise of a low-light image are perceived in a mutually reinforced manner, leading to visually pleasing enhanced results.

We devise two fully pointwise convolutional neural networks to model the statistical regularities of ambient light and image noise respectively, and leverage them as constraints to facilitate the mutual learning process.

Experimental results on both synthetic and real data demonstrate the superiority of our method over existing Retinex methods and other SOTA low-light enhancement approaches.

Figure 3. Illustration of the pointwise convolution for statistical modeling in illumination estimation. The coordinates of the selected pixels are shown in (a), and the statistical distribution of these pixels is shown in (b). As the network goes deeper, the selected representative pixels are further refined, as shown in (c), and the low-confidence pixels are discarded, as shown in (d). The selected pixel sets are obtained by projecting the feature map of each pooling layer onto the input. $S_i$ denotes the coordinates of the pixels selected by each pointwise kernel $k_i$, and $S$ represents the coordinates of all selected pixels on each layer, i.e., the union of the $S_i$. Warm colors represent high confidence.

2. Related Work

Extensive research has been conducted on enhancing low-light images; existing work can be mainly divided into two categories: model-based methods and learning-based methods.

Model-based methods usually employ the model in Eq. (1) or its variants to decompose an image into reflectance and illumination. According to Eq. (1), the recovery of $R$ and $L$ from $I$ is an ill-posed inverse problem. A typical solution is to introduce a series of priors/assumptions on illumination and reflectance. In the PDE-based algorithm (Morel et al., 2010), this ill-posed decomposition problem is modeled as a Poisson problem by assuming that the reflectance changes at sharp edges while the illumination varies smoothly. MSR (Jobson et al., 1997) and MSRCR (Rahman et al., 2004) incorporate an illumination smoothness assumption and multi-scale information to obtain a robust illumination estimate. In addition, the statistics of neighborhoods are used in (Forsyth, 1988)(Funt and Shi, 2010)(Joze et al., 2012) for illumination estimation, where the illumination of each pixel is estimated as the mean or maximum value of all pixels within the neighborhood. Besides, LIME (Guo et al., 2017) estimates the illumination of each pixel as the maximum value of the R, G, and B channels.

Learning-based methods aim to learn the regularities of illumination and reflectance from training data and leverage them to improve the generalization of Retinex algorithms. For example, Shen et al. (2017) propose MSR-Net by combining MSR (Jobson et al., 1997) with a feedforward convolutional neural network. Park et al. (2018) construct a dual autoencoder network based on Retinex theory to learn the regularities of illumination and noise respectively. A deep Retinex-Net is proposed in (Wei et al., 2018) to learn the key constraints, including the smoothness of illumination and the consistent reflectance shared by paired low/normal-light images. Lore et al. (2017) propose LLNet to learn the contrast-enhancement and denoising regularities simultaneously.

Image noise inevitably exists in low-light images and is amplified after contrast enhancement. Li et al. (2015) improve the visual quality by segmenting the observed image into superpixels and adaptively denoising different segments via BM3D (Dabov et al., 2007). After removing illumination effects, BM3D (Dabov et al., 2007) is applied in (Guo et al., 2017)(Shen et al., 2017) to suppress the amplified noise in dark regions. Besides, Elad (2005) applies two bilateral filters to a modified Retinex model, casting illumination estimation and denoising as a progressive programming problem. However, the coupling relation between illumination and noise is neglected in these methods, which leads to inaccurate estimation of illumination or noise, resulting in under- or over-smoothed enhancement results.

Different from the existing methods, we propose a progressive framework to perceive the illumination and noise of a low-light image in a mutually reinforced manner. In particular, the perception models are built from fully pointwise convolutional units, which extract the representative pixels corresponding to illumination or noise from the input images. This modeling strategy can be interpreted as building unnatural representations of low-light images, which not only suppresses the interference caused by the ambiguity between tiny textures and image noise, but also improves computational efficiency.

3. Proposed Method

3.1. Motivation

Referring to (Elad, 2005), the low-light image enhancement task can be described as the following optimization problem:

$$\min_{L,\,R,\,N} \; \left\| I - R \circ L - N \right\|_2^2 + \lambda\, \rho(L, N) \qquad (2)$$

where $I$, $L$, and $R$ are the low-light image, illumination, and reflectance, and $\lambda$ is a weight parameter. One feasible solution is to decompose the optimization into two sub-tasks, i.e., illumination estimation and noise level estimation, which can be optimized alternately as follows:

$$L^{t+1} = \arg\min_{L} \; \left\| I - R^{t} \circ L - N^{t} \right\|_2^2 + \lambda\, \rho(L) \qquad (3)$$

$$N^{t+1} = \arg\min_{N} \; \left\| I - R^{t} \circ L^{t+1} - N \right\|_2^2 + \lambda\, \rho(N) \qquad (4)$$

where $\rho(\cdot)$ serves as a quadratic-form regularization constraint that represents the local statistical properties of ambient illumination and image noise. Thus, modeling the statistical regularities of an input low-light image plays an essential role in Retinex-based image enhancement.

Modeling statistical regularities for low-light image enhancement can be interpreted as selecting the most representative pixels for illumination and noise from input images, and thus extracting an unnatural representation of them. For low-light images, the representative pixels for illumination usually come from regions of high reflectance in the scene, including white (grey) or specular areas, e.g., sky, windows, and road surfaces, and thus have high intensity on all three color channels (Zhang et al., 2017). Moreover, the noise in low-light images is mainly caused by dark current, shot noise, and photon noise in camera imaging (Liu et al., 2008)(Healey and Kondepudy, 1994); it is colored noise whose intensity distribution depends on the ambient illumination. Therefore, the representative pixels for low-light noise should have intensity distributions across the three color channels that differ from those of their neighborhoods.

Based on the above observations, we find that the intensity distribution across color channels can be leveraged for the extraction of representative pixels from low-light images. Therefore, we devise a novel CNN model consisting of pointwise convolutional units to represent the statistics of the illumination and noise of a low-light image. The principle of a pointwise convolutional layer is illustrated in Fig. 3. The inherent correlations among the three color components of each pixel are first represented by pointwise convolution, and the statistically significant pixels are then selected in the representation space by a max pooling operation. The combined effect of multiple pointwise convolution layers is equivalent to multi-representation fusion and non-maximum suppression over different receptive fields. The statistics of the finally selected representative pixels are modeled as the regularity for Retinex decomposition.
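To make this concrete, below is a minimal PyTorch sketch of one such pointwise convolutional unit; the class name, channel count, and the ReLU nonlinearity are our assumptions (the paper's implementation is in Caffe):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointwiseUnit(nn.Module):
    """A 1x1 convolution maps each pixel's RGB intensities into a common
    representation space (no spatial structure involved); max pooling then
    keeps only the strongest responses, i.e., non-maximum suppression over
    the pooling window."""
    def __init__(self, in_ch, out_ch, pool_k, pool_s, pool_p=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # pointwise kernels
        self.pool = nn.MaxPool2d(pool_k, stride=pool_s, padding=pool_p)

    def forward(self, x):
        return self.pool(F.relu(self.conv(x)))  # select representative pixels

# e.g., 160 pointwise kernels followed by 16x16 non-maximum suppression
unit = PointwiseUnit(3, 160, pool_k=16, pool_s=16)
y = unit(torch.rand(1, 3, 32, 32))  # -> torch.Size([1, 160, 2, 2])
```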

Furthermore, due to the coupling relation between illumination and noise, there inevitably exists an intersection between the representative pixels selected for illumination and those selected for noise, which may introduce interference into the statistical modeling. To address this issue, we present a progressive learning mechanism, in which the representative pixels for illumination and noise are selected and statistically modeled in a mutually reinforced manner. The illumination map can be used as guidance to suppress the interference of the intersection pixels during noise level estimation, and vice versa. As shown in Fig. 4, the two processes are progressively performed until stable results are obtained, and each modeling process benefits from the gradually improved results of the other.

Figure 4. An overview of the proposed illumination-noise perception network for progressive Retinex framework.

3.2. Network Architecture

The proposed progressive framework consists of two subnetworks, IM-Net and NM-Net, for illumination estimation and noise level estimation respectively. Besides, we present a fully pointwise convolutional unit to select the representative pixels related to illumination or noise, and thus model the statistics of these pixels as the regularity in Eqs. (3) and (4). Note that our method is quite different from the pointwise CNNs in (Zhang and Tao, 2019)(Zhang et al., 2018a), which include a pixel scrambling step and aim at achieving a lightweight CNN.

Pointwise convolution  By performing 1×1 convolution on the RGB channels, a pointwise kernel maps all pixels to the same representation space. Since no spatial structure is involved, the feature responses are related only to the pixel intensities on the RGB channels (Zhang and Tao, 2019)(Zhang et al., 2018a). A pointwise kernel produces strong feature responses for pixels within a specific intensity distribution; as a result, these pixels can be selected by non-maximum suppression after a max pooling operation. A typical example of the pixel selection process during illumination estimation is shown in Fig. 3. As can be seen, every pointwise kernel selects a pixel set with a similar intensity distribution. For example, as shown in Fig. 3(a), the kernel $k_1$ mainly concentrates on the yellow pixels in the bottom-left corner of the image, while $k_2$ mainly selects the white pixels. By combining pointwise kernels together, the representative pixels for illumination can be selected completely, as shown in $S$ of Fig. 3(a). The pixel confidence is proportional to the feature response; pixels with lower confidence cannot accurately reflect the illuminant properties and are further refined in the second layer, as shown in Fig. 3(c). As the network goes deeper, the most representative pixels are gradually selected, and their statistical properties can be easily obtained.

The Structure of IM-Net  The goal of IM-Net is to estimate the illumination for a given low-light patch. The pixels with maximum intensity are most related to the ambient illumination. In order to select the most representative pixels, we present a fully pointwise CNN, i.e., IM-Net, to learn the representation of ambient illumination. As shown in Fig. 4, we introduce two parallel convolutional branches to learn multi-scale features. A single branch consists of two parallel convolutional units to extract the statistics at each scale; specifically, a convolutional unit contains a pointwise convolution followed by a max pooling operation. The features from the two branches are then combined as the input of the next convolutional unit. The details of IM-Net are shown in Table 1. Based on the selected representative pixels, IM-Net can accurately model the statistical regularity of the ambient illumination and estimate the illumination.

Layer        Input Size  Num  Filter  Stride  Pad
Conv-BP1     3x32x32     160  1x1     1       0
MaxPool-BP1  160x32x32   -    16x16   16      0
Conv-BP2     3x32x32     160  1x1     1       0
MaxPool-BP2  160x32x32   -    20x20   16      2
Conv-BP3     3x16x16     160  1x1     1       0
MaxPool-BP3  160x16x16   -    8x8     8       0
Conv-BP4     3x16x16     160  1x1     1       0
MaxPool-BP4  160x16x16   -    10x10   8       1
Concat       640x2x2     -    -       -       -
Conv-DR1     640x2x2     80   1x1     1       0
MaxPool-DR1  80x2x2      -    2x2     2       0
Conv6        80x1x1      1    1x1     1       0
Table 1. The details of each component of IM-Net.
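For readers who prefer code, here is a hedged PyTorch sketch of Table 1; the bilinear downsampling used to produce the 16×16 branch input and the ReLU are our assumptions, and the extra noise-map input channel used in the progressive loop is omitted for clarity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def unit(in_ch, out_ch, k, s, p=0):
    # pointwise convolution followed by max pooling (cf. Fig. 3)
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(),
                         nn.MaxPool2d(k, stride=s, padding=p))

class IMNet(nn.Module):
    def __init__(self):
        super().__init__()
        # scale-32 branch: two parallel units on the full 32x32 patch
        self.bp1 = unit(3, 160, 16, 16)     # -> 160x2x2
        self.bp2 = unit(3, 160, 20, 16, 2)  # -> 160x2x2
        # scale-16 branch: two parallel units on a half-resolution patch
        self.bp3 = unit(3, 160, 8, 8)       # -> 160x2x2
        self.bp4 = unit(3, 160, 10, 8, 1)   # -> 160x2x2
        self.dr1 = unit(640, 80, 2, 2)      # Conv-DR1 + MaxPool-DR1 after Concat
        self.out = nn.Conv2d(80, 1, 1)      # Conv6: scalar illumination estimate

    def forward(self, x):                   # x: Nx3x32x32
        x16 = F.interpolate(x, scale_factor=0.5, mode='bilinear',
                            align_corners=False)
        f = torch.cat([self.bp1(x), self.bp2(x),
                       self.bp3(x16), self.bp4(x16)], dim=1)  # Nx640x2x2
        return self.out(self.dr1(f))        # Nx1x1x1

print(IMNet()(torch.rand(4, 3, 32, 32)).shape)  # torch.Size([4, 1, 1, 1])
```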
Layer        Input Size  Num  Filter  Stride  Pad
Conv-NP1     3x32x32     160  1x1     1       0
MaxPool-NP1  160x32x32   -    4x4     4       0
Conv-NP2     160x8x8     160  1x1     1       0
MaxPool-NP2  160x8x8     -    4x4     4       0
Conv-DR2     160x2x2     80   1x1     1       0
MaxPool-DR2  80x2x2      -    2x2     2       0
Conv10       80x1x1      1    1x1     1       0
Table 2. The details of each component of NM-Net.

The Structure of NM-Net 

The goal of NM-Net is to estimate the noise variance for a given low-light patch. The variance of a local patch can be formulated as:

$$\sigma^2 = \sigma_n^2 + \sigma_t^2 \qquad (5)$$

where $\sigma^2$ is the variance of all pixels within the local patch, $\sigma_n^2$ is the noise variance, and $\sigma_t^2$ is the texture variance. Conventional algorithms often treat all pixels equally and use the variance of all pixels within the local patch to represent the noise variance, which often leads to an overestimation of the real noise level. In order to select the most representative pixels for noise level estimation, we propose a fully pointwise CNN, i.e., NM-Net. The details of NM-Net are shown in Table 2. Based on the selected representative pixels, NM-Net can accurately model the statistical regularity of the noise.
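A quick numerical illustration of Eq. (5), with arbitrary values: the raw patch standard deviation mixes texture and noise, so using it directly overestimates the noise level that NM-Net is meant to recover.

```python
import numpy as np

rng = np.random.default_rng(0)
texture = 0.1 * np.sin(np.linspace(0, 8 * np.pi, 4096))  # sigma_t ~= 0.071
noise = rng.normal(0.0, 0.05, 4096)                      # sigma_n  = 0.05
patch = 0.5 + texture + noise

print(f"patch std: {patch.std():.3f}")  # ~0.087 = sqrt(sigma_n^2 + sigma_t^2)
print(f"noise std: {noise.std():.3f}")  # ~0.050, the quantity NM-Net targets
```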

Progressive Mechanism  Because of the coupling relation between illumination and noise, there inevitably exists an intersection between the representative pixels selected for illumination and those selected for noise. In order to suppress the mutual effect between illumination and noise, we propose a progressive mechanism to perceive the inherent relation between them. As shown in Fig. 4, IM-Net serves as the first stage for illumination estimation. The input of IM-Net in the current iteration is the estimated noise level from the previous iteration, together with the original patch; in the first iteration, the noise level input of IM-Net is manually set to 0. The noise level map shares statistical properties with the noise-relevant pixels, and can therefore be used as a reference to suppress the responses of noise-relevant pixels in the first stage. Thus, IM-Net can learn a better representation of the illumination distribution. After the first stage is completed, the estimated illumination map is passed to the second stage, i.e., NM-Net, together with the original patch. The estimated illumination map can likewise be used as a reference to guide NM-Net in suppressing the responses of illumination-relevant pixels. Each of the two processes benefits from the gradually improved results of the other, and we perform them progressively until stable results are obtained.
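This alternation can be summarized in a few lines of PyTorch-style pseudocode; this is a sketch under our own naming, assuming both nets accept the patch concatenated with the other net's map and that the per-patch scalar outputs are broadcast back to full-size maps:

```python
import torch

def progressive_perception(patch, im_net, nm_net, n_iters=4):
    """Mutually reinforced illumination/noise estimation (cf. Fig. 4)."""
    n, _, h, w = patch.shape
    noise = patch.new_zeros(n, 1, h, w)  # noise level starts at 0 (iteration 1)
    for _ in range(n_iters):
        # stage 1: IM-Net, guided by the previous noise level estimate
        illum = im_net(torch.cat([patch, noise], dim=1)).expand(n, 1, h, w)
        # stage 2: NM-Net, guided by the current illumination estimate
        noise = nm_net(torch.cat([patch, illum], dim=1)).expand(n, 1, h, w)
    return illum, noise
```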

During the test phase, the input low-light image is passed through the network, which produces the illumination map and the noise level map. Both maps are resized to the original input size using bilinear interpolation. To further smooth the illumination map, guided image filtering (He et al., 2013) is used to suppress artifacts. Besides, to quantitatively evaluate the accuracy of the noise level estimation, we adopt the BM3D (Dabov et al., 2007) denoising algorithm, which has only one crucial parameter, the noise level, yet achieves SOTA performance. Under the guidance of the estimated noise level map, BM3D can separate the noise from the high-frequency details of the input image.
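Under those choices, the test phase could be sketched as follows; the guided-filter radius and eps and the denoiser hook are assumptions (OpenCV's contrib module provides guidedFilter, and any BM3D implementation can be plugged in for denoise):

```python
import numpy as np
import cv2  # needs opencv-contrib-python for cv2.ximgproc

def enhance(low, illum_map, noise_map, denoise=None):
    """low: HxWx3 float image in [0,1]; illum_map / noise_map: the smaller
    network outputs. Recovers reflectance via R = I / L after denoising I
    with the estimated noise level."""
    h, w = low.shape[:2]
    L = cv2.resize(illum_map, (w, h), interpolation=cv2.INTER_LINEAR)
    sigma = cv2.resize(noise_map, (w, h), interpolation=cv2.INTER_LINEAR)
    # smooth the upsampled illumination map with guided filtering
    L = cv2.ximgproc.guidedFilter(low.astype(np.float32),
                                  L.astype(np.float32), 8, 1e-3)
    if denoise is not None:          # e.g., a BM3D wrapper taking (img, sigma)
        low = denoise(low, sigma)
    L = np.clip(L, 1e-3, 1.0)[..., None]  # avoid division by zero
    return np.clip(low / L, 0.0, 1.0)
```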

3.3. Loss Function

We use the mean squared error to supervise the network, which can be written as:

$$\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} \left\| F(x_i) - y_i \right\|_2^2 \qquad (6)$$

where $N$ is the number of training samples, $F(x_i)$ is the output of the network for input $x_i$, and $y_i$ is the ground truth.
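In PyTorch terms, Eq. (6) corresponds to the built-in MSE loss (the symbol names in Eq. (6) are our reconstruction):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()  # Eq. (6): mean over samples of squared error
loss = criterion(torch.rand(8, 1), torch.rand(8, 1))  # output vs. ground truth
```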

4. Experiments

4.1. Experimental Setup

The IM-Net and NM-Net were trained for 50,000 iterations using stochastic gradient descent with momentum and weight decay, and a batch size of 128; the learning rate was halved at regular intervals during training. In both networks, the filter weights of each layer were initialized with the MSRA method (He et al., 2015). Both networks were implemented in Caffe (Jia et al., 2014).

4.2. Synthetic Data Generation

It is time-consuming and difficult to collect massive pairs of low-light and normal images of natural scenes (or pairs of low-light images and their associated illumination and noise level maps). Instead, we resort to synthesized training data by simulating the low-light image generation process in cameras. Referring to (Liu et al., 2008), the noise distribution in low-light images can be modeled as:

$$I = f(L + n_s + n_c) \qquad (7)$$

where $I$ is the captured image, $f(\cdot)$ is the camera response function (CRF), $n_s$ accounts for the illumination-dependent noise, and $n_c$ accounts for the independent noise. Besides, the real noise in low-light images is not white; spatial correlations are introduced by "demosaicing" (Ramanath et al., 2002). In order to generate high-fidelity low-light images, we propose a unified image generation model:

$$I = f\!\left(\mathrm{DM}\!\left(\mathrm{B}\!\left(f^{-1}(\gamma P)\right) + n_s + n_c\right)\right) \qquad (8)$$

where $\mathrm{B}$ and $\mathrm{DM}$ denote the Bayer pattern and inverse Bayer pattern (demosaicing) operations respectively, and $f^{-1}$ is the inverse camera response function.

The block diagram of the low-light image synthesis is shown in Fig. 5(a). Given a clear image patch $P$, the low-light condition is simulated by multiplying by a coefficient $\gamma \in [0, 1]$. The irradiance is obtained after applying $f^{-1}$ and $\mathrm{B}$ to $\gamma P$. Then, we successively add the illumination-dependent noise $n_s$ and the independent noise $n_c$ to the irradiance. After that, we apply $\mathrm{DM}$ and $f$ to obtain the synthesized low-light patch. We collect well-exposed images from the Internet and extract 25,000 image patches of size 32×32. Grossberg and Nayar (2004) provide 201 kinds of CRFs, and we exploit the most widely used CRF50 and CRF60 in our experiments. Besides, the noise variances $\sigma_s^2$ and $\sigma_c^2$ are uniformly sampled from the ranges [0, 0.16] and [0, 0.06], respectively. In total, we synthesize a large set of low-light patches, which is divided into a training set and a test set. The synthesized image for the test pattern (Fig. 5(b)) is shown in Fig. 5(c). Compared with the synthetic result using independent white noise in Fig. 5(d), Fig. 5(c) better reflects the dependence between noise and illumination.
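The pipeline of Fig. 5(a) can be sketched in numpy as below; the CRF/Bayer function handles are placeholders for the actual CRF50/CRF60 curves and (de)mosaicing routines, and the square-root dependence of n_s on the irradiance follows the signal-dependent variance model in (Liu et al., 2008):

```python
import numpy as np

def synthesize_low_light(P, gamma, sigma_s, sigma_c, f, f_inv, B, DM,
                         rng=np.random.default_rng()):
    """Eq. (8): darken a clean patch P, map to irradiance, add
    illumination-dependent noise n_s and independent noise n_c on the
    Bayer mosaic, then demosaic and re-apply the CRF."""
    irr = B(f_inv(gamma * P))                    # irradiance mosaic
    n_s = rng.normal(size=irr.shape) * np.sqrt(irr) * sigma_s  # Var = irr*sigma_s^2
    n_c = rng.normal(size=irr.shape) * sigma_c   # signal-independent noise
    return f(DM(np.clip(irr + n_s + n_c, 0.0, 1.0)))

# e.g., with identity placeholders for the camera-specific operations:
patch = synthesize_low_light(np.random.rand(32, 32, 3), gamma=0.3,
                             sigma_s=0.1, sigma_c=0.03,
                             f=lambda x: x, f_inv=lambda x: x,
                             B=lambda x: x, DM=lambda x: x)
```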

Figure 5. (a) The image synthesis process described in Eq. (8). (b) Test pattern. (c) The synthetic result of (a). (d) Independent white noise synthetic result.

4.3. Ablation Studies

We evaluate the proposed method under different settings on synthetic images, i.e., varying the number of iterations between IM-Net and NM-Net and the network input. These images do not overlap with the training and test sets. Referring to (Cai et al., 2018), we use the same 15-layer residual network as the base model.

Number of Iterations 

To examine the improvements induced by different numbers of iterations, we experimentally compare models with different iteration counts on both synthetic and real images. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) (Wang et al., 2004) between the ground truth and the enhanced results are shown in Table 3. Compared with IM-Net and NM-Net working independently (i.e., zero iterations), the introduction of the progressive mechanism effectively improves the accuracy of illumination and noise level estimation. As the number of iterations increases, PSNR and SSIM gradually increase and reach stability at the fourth iteration. This implies that the mutual effect between illumination and noise is progressively suppressed. The accuracy of the optimal iteration model is higher than that of the basic model, which demonstrates the effectiveness of the progressive mechanism for modeling the coupling relation between illumination and noise.

Figure 6 shows an example of a real scene. As shown in Fig. 6(b), the result generated by the one-iteration model is still dark and contains noise. In contrast, these artifacts are gradually reduced as the iterations increase, and the result stabilizes at the fourth iteration, which proves the practicability of our method for real scenes. The model with four iterations is used as our default model in the remainder of this paper.

Figure 6. Enhanced images of models with different iterations on low-light image. (a) The original low-light image. (b)-(g) Enhanced images of our models with one iteration to five iterations.
      Baseline  Iter 0  Iter 1  Iter 2  Iter 3  Iter 4  Iter 5
PSNR  22.85     19.86   21.78   22.20   22.78   23.13   23.12
SSIM  0.907     0.883   0.895   0.897   0.911   0.911   0.910
Table 3. The PSNR and SSIM of enhanced results on 50 synthetic low-light images for models with different numbers of iterations.
Model  Iter 1  Iter 2  Iter 3  Iter 4  Iter 5  Proposed
PSNR   20.69   17.56   16.46   15.87   15.50   23.13
SSIM   0.797   0.761   0.740   0.725   0.714   0.9111
Table 4. The PSNR and SSIM of enhanced results when using intermediate enhancements as the network input.
Noise Level  0      0.04   0.08   0.12   0.16   0.20   Proposed
PSNR         18.32  20.84  21.66  22.55  22.07  21.6   23.13
SSIM         0.760  0.770  0.825  0.868  0.846  0.821  0.9111
Table 5. The PSNR and SSIM of enhanced results when using a constant noise level to substitute for NM-Net.

Network Input  We perform an experiment on 50 synthetic images using the intermediate enhancements, i.e., the illumination-enhancement or denoising result of each iteration, as the network input. The results are shown in Table 4. Due to error accumulation, the test results become worse as the number of iterations increases, and are inferior to those of our proposed model. Thus, we use the estimated noise level (or illumination) from the previous iteration, together with the original patch, as the input of IM-Net (or NM-Net), and we perform the image enhancement and denoising only at the last iteration.

             SRIE   JIEP   LIME   HQEC   SRLL   NPE    Baseline  Ours
FNF38  PSNR  19.98  19.64  16.89  18.89  19.35  17.71  19.68     20.12
       SSIM  0.81   0.80   0.78   0.79   0.80   0.78   0.80      0.82
IP100  NIQE  3.66   3.58   4.20   3.85   3.45   4.42   3.44      3.34
Table 6. The NIQE results on the IP100 dataset and the PSNR/SSIM results on the FNF38 dataset.
            SRIE     JIEP     LIME     HQEC     SRLL     NPE      Ours
LOL   PSNR  11.8552  12.0466  17.1818  16.6241  13.8765  16.6972  18.8025
      SSIM  0.4979   0.5124   0.6349   0.6079   0.6577   0.5945   0.7215
NPE   NIQE  2.8912   2.9174   3.2606   3.075    3.8476   2.8955   2.7519
Table 7. The PSNR/SSIM results on the LOL dataset and the NIQE results on the NPE dataset.
Figure 7. The contrast enhancement results of state-of-the-art methods including SRIE (Fu et al., 2016), JIEP (Cai et al., 2017), LIME (Guo et al., 2017), NPE (Wang and Luo, 2018), SRLL (Li et al., 2018), HQEC (Zhang et al., 2018b), and the proposed method.

Noise Level  We also perform an experiment on 50 synthetic images using a constant noise level in place of NM-Net. Table 5 shows the results. Compared with using IM-Net only (i.e., a noise level of 0), the quantitative results are significantly improved after denoising. However, the denoising results guided by a constant noise level are inferior to those guided by NM-Net, which demonstrates the significance of the noise level estimation network.

4.4. Comparisons with SOTA Methods

To verify the superiority of our method, we compare it with the SOTA methods, including SRIE (Fu et al., 2016), LIME (Guo et al., 2017), JIEP (Cai et al., 2017), HQEC (Zhang et al., 2018b), SRLL (Li et al., 2018), NPE (Wang and Luo, 2018), and LDSE (Cai et al., 2018). We perform the experiments on two collected datasets (IP100 and FNF38) and three public datasets (MPI (Cai et al., 2018), LOL (Wei et al., 2018), and NPE (Wang and Luo, 2018)).

The IP100 dataset consists of two parts: ICI35 and P-65. ICI35 contains 35 identified challenging low-light images collected from previous works (Cai et al., 2017). P-65 includes the other 65 challenging images, which were captured with Huawei P20 smartphones. The ground truth for images in IP100 is unavailable, so we adopt a widely used blind image quality assessment measure, the natural image quality evaluator (NIQE) (Hautière et al., 2011), to evaluate the enhanced results; a lower NIQE value indicates higher image quality. The FNF38 dataset includes 38 ambient/flash illumination pairs selected from the FAID dataset (Aksoy et al., 2018). The two images of each pair are well aligned, and the ambient image can be used as the ground truth, so we use PSNR and SSIM to assess the enhanced results on this dataset. The MPI dataset (Cai et al., 2018) contains 589 high-resolution multi-exposure sequences with 4,413 images; the ground truth of this dataset is derived from several representative multi-exposure image fusion and stack-based high dynamic range imaging algorithms (Raman and Chaudhuri, 2009)(Shen et al., 2011)(Zhang and Cham, 2012)(Shen et al., 2014)(Kou et al., 2017)(Bruce, 2014).

Objective Comparisons  Table 6 lists the NIQE and PSNR/SSIM results of the different methods on the IP100 and FNF38 datasets. As can be seen, the proposed method achieves lower NIQE and higher PSNR/SSIM than all other methods on both datasets, which implies that our method generalizes better to various scenes than the other prior/assumption-based methods. Next, we compare our method with the CNN-based single image contrast enhancer (Cai et al., 2018) on the MPI dataset. For under-exposure image enhancement, we achieve comparable PSNR (19.77) and a higher FSIM (Zhang et al., 2011) score (0.9456 vs. 0.9347) than the method in (Cai et al., 2018), which demonstrates the superiority of our progressive network over the existing architectures. We also evaluate our method on the LOL and NPE datasets; the comparative results are listed in Table 7. Our method achieves the best performance on all three metrics.

Figure 8. The contrast enhancement and denoising results of state-of-the-art methods including SRIE (Fu et al., 2016), JIEP (Cai et al., 2017), LIME (Guo et al., 2017), NPE (Wang and Luo, 2018), SRLL (Li et al., 2018), HQEC (Zhang et al., 2018b), and the proposed method.

Subjective Comparisons  Figure 7 shows visual results on low-light images with little noise, for which we mainly concentrate on contrast enhancement. Although methods such as SRIE, JIEP, NPE, and HQEC improve the contrast to a certain extent, their enhanced results still appear somewhat dark. The LIME algorithm can efficiently remove the unfavorable illumination and improve the global contrast, but its enhanced results tend to exhibit over-exposure in some regions. SRLL improves the global contrast but introduces a color cast. We also compare with LDSE on the MPI dataset in Fig. 9: LDSE improves the overall visibility of the scenes but introduces color cast and blur. Compared with these methods, our method can efficiently remove the unfavorable illumination without introducing over-exposure or color cast, which demonstrates its superior capacity for illuminant statistical modeling.

Figure 9. The contrastive experiment with LDSE (Cai et al., 2018). (a) Input low-light image. (b) Results of LDSE (Cai et al., 2018). (c) Results of the proposed method.

Figure 8 shows the results on noisy low-light images captured in real scenes. Previous methods, such as HQEC and NPE, are mainly designed for contrast enhancement and do not include specific operations for noise removal; as the contrast improves, the noise is amplified in the enhanced result. The LIME algorithm uses BM3D to suppress the amplified noise in its enhancement result, but some noise remains, because a globally consistent noise level coefficient cannot fit all regions of an image. The SRLL algorithm introduces the noise term into a variational Retinex model to formulate low-light, noisy captures, but it can only suppress the noise to a certain degree and the enhanced result still contains residual noise. Compared with these methods, our method removes noise adequately while enhancing contrast, without introducing over- or under-denoising problems, which demonstrates its superior capacity for modeling the coupling relation between illumination and noise level.

Computational Efficiency  Moreover, we compare the computational efficiency with the CNN-based method (Cai et al., 2018). Our progressive framework is more than 50× faster: it processes a 129×129×3 image in 0.46s on a CPU, compared to 26.47s for (Cai et al., 2018). The processing time can be further reduced to 0.03s with GPU acceleration. This advantage is due to the fully pointwise convolutional structure, which can be implemented very efficiently.

5. Conclusion

In this paper, we present a progressive Retinex framework that improves the quality of low-light images in a mutually reinforced manner. The proposed framework is built on fully pointwise convolutions, which suppress the interference caused by the ambiguity between tiny textures and image noise and achieve high computational efficiency. Comprehensive evaluations on synthetic and real low-light images demonstrate that our method achieves superior performance over representative state-of-the-art low-light image enhancement methods.

A limitation of the proposed method is that it only captures the statistical distribution in the pixel space while neglecting structural properties. One feasible remedy is to design a multi-branch network that perceives both the inherent structural and statistical properties of low-light images. We leave this as future work.

Acknowledgments

This work was supported by the National Key R&D Program of China under Grant 2017YFB1300201, the National Natural Science Foundation of China (NSFC) under Grants 61622211, 61620106009, 61872327, 61472380 and 61806062, as well as the Fundamental Research Funds for the Central Universities under Grants WK2100100030 and WK2380000001.

References

  • Aksoy et al. (2018) Yagız Aksoy, Changil Kim, Petr Kellnhofer, Sylvain Paris, Mohamed Elgharib, Marc Pollefeys, and Wojciech Matusik. 2018. A Dataset of Flash and Ambient Illumination Pairs from the Crowd. In Proceedings of the European Conference on Computer Vision (ECCV). 634–649.
  • Bruce (2014) Neil DB Bruce. 2014. Expoblend: Information preserving exposure blending based on normalized log-domain entropy. Computers & Graphics 39 (2014), 12–23.
  • Cai et al. (2017) Bolun Cai, Xianming Xu, Kailing Guo, Kui Jia, Bin Hu, and Dacheng Tao. 2017. A Joint Intrinsic-Extrinsic Prior Model for Retinex. In Proceedings of the IEEE International Conference on Computer Vision. 4000–4009.
  • Cai et al. (2018) Jianrui Cai, Shuhang Gu, and Lei Zhang. 2018. Learning a deep single image contrast enhancer from multi-exposure images. IEEE Transactions on Image Processing 27, 4 (2018), 2049–2062.
  • Dabov et al. (2007) Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on image processing 16, 8 (2007), 2080–2095.
  • Elad (2005) Michael Elad. 2005. Retinex by two bilateral filters. In International Conference on Scale-Space Theories in Computer Vision. Springer, 217–229.
  • Forsyth (1988) David A Forsyth. 1988. A novel approach to colour constancy. In 1988 Second International Conference on Computer Vision. IEEE, 9–18.
  • Fu et al. (2016) Xueyang Fu, Delu Zeng, Yue Huang, Xiao-Ping Zhang, and Xinghao Ding. 2016. A weighted variational model for simultaneous reflectance and illumination estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2782–2790.
  • Funt and Shi (2010) Brian Funt and Lilong Shi. 2010. The rehabilitation of maxrgb. In Color and imaging conference, Vol. 2010. Society for Imaging Science and Technology, 256–259.
  • Grossberg and Nayar (2004) Michael D Grossberg and Shree K Nayar. 2004. Modeling the space of camera response functions. IEEE transactions on pattern analysis and machine intelligence 26, 10 (2004), 1272–1282.
  • Guo et al. (2017) Xiaojie Guo, Yu Li, and Haibin Ling. 2017. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing 26, 2 (2017), 982–993.
  • Hautière et al. (2011) Nicolas Hautière, Jean-Philippe Tarel, Didier Aubert, and Eric Dumont. 2011. Blind contrast enhancement assessment by gradient ratioing at visible edges. Image Analysis & Stereology 27, 2 (2011), 87–95.
  • He et al. (2013) Kaiming He, Jian Sun, and Xiaoou Tang. 2013. Guided image filtering. IEEE transactions on pattern analysis & machine intelligence 6 (2013), 1397–1409.
  • He et al. (2015) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision. 1026–1034.
  • Healey and Kondepudy (1994) Glenn E Healey and Raghava Kondepudy. 1994. Radiometric CCD camera calibration and noise estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 3 (1994), 267–276.
  • Jia et al. (2014) Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia. ACM, 675–678.
  • Jobson et al. (1997) Daniel J Jobson, Zia-ur Rahman, and Glenn A Woodell. 1997. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Transactions on Image processing 6, 7 (1997), 965–976.
  • Joze et al. (2012) Hamid Reza Vaezi Joze, Mark S Drew, Graham D Finlayson, and Perla Aurora Troncoso Rey. 2012. The role of bright pixels in illumination estimation. In Color and Imaging Conference, Vol. 2012. Society for Imaging Science and Technology, 41–46.
  • Kou et al. (2017) Fei Kou, Zhengguo Li, Changyun Wen, and Weihai Chen. 2017. Multi-scale exposure fusion via gradient domain guided image filtering. In 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1105–1110.
  • Li et al. (2015) Lin Li, Ronggang Wang, Wenmin Wang, and Wen Gao. 2015. A low-light image enhancement method for both denoising and contrast enlarging. In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 3730–3734.
  • Li et al. (2018) Mading Li, Jiaying Liu, Wenhan Yang, Xiaoyan Sun, and Zongming Guo. 2018. Structure-Revealing Low-Light Image Enhancement Via Robust Retinex Model. IEEE Transactions on Image Processing 27, 6 (2018), 2828–2841.
  • Liu et al. (2008) Ce Liu, Richard Szeliski, Sing Bing Kang, C Lawrence Zitnick, and William T Freeman. 2008. Automatic estimation and removal of noise from a single image. IEEE transactions on pattern analysis and machine intelligence 30, 2 (2008), 299–314.
  • Lore et al. (2017) Kin Gwn Lore, Adedotun Akintayo, and Soumik Sarkar. 2017. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition 61 (2017), 650–662.
  • Morel et al. (2010) Jean Michel Morel, Ana Belén Petro, and Catalina Sbert. 2010. A PDE formalization of Retinex theory. IEEE Transactions on Image Processing 19, 11 (2010), 2825–2837.
  • Park et al. (2018) Seonhee Park, Soohwan Yu, Minseo Kim, Kwanwoo Park, and Joonki Paik. 2018. Dual Autoencoder Network for Retinex-based Low-Light Image Enhancement. IEEE Access (2018).
  • Rahman et al. (2004) Zia-ur Rahman, Daniel J Jobson, and Glenn A Woodell. 2004. Retinex processing for automatic image enhancement. Journal of Electronic imaging 13, 1 (2004), 100–111.
  • Raman and Chaudhuri (2009) Shanmuganathan Raman and Subhasis Chaudhuri. 2009. Bilateral Filter Based Compositing for Variable Exposure Photography.. In Eurographics (short papers). 1–4.
  • Ramanath et al. (2002) Rajeev Ramanath, Wesley E Snyder, Griff L Bilbro, and William A Sander. 2002. Demosaicking methods for Bayer color arrays. Journal of Electronic imaging 11, 3 (2002), 306–316.
  • Shen et al. (2014) Jianbing Shen, Ying Zhao, Shuicheng Yan, Xuelong Li, et al. 2014. Exposure fusion using boosting Laplacian pyramid. IEEE Trans. Cybernetics 44, 9 (2014), 1579–1590.
  • Shen et al. (2017) Liang Shen, Zihan Yue, Fan Feng, Quan Chen, Shihao Liu, and Jie Ma. 2017. Msr-net: Low-light image enhancement using deep convolutional network. arXiv preprint arXiv:1711.02488 (2017).
  • Shen et al. (2011) Rui Shen, Irene Cheng, Jianbo Shi, and Anup Basu. 2011. Generalized random walks for fusion of multi-exposure images. IEEE Transactions on Image Processing 20, 12 (2011), 3634–3646.
  • Tsin et al. (2001) Yanghai Tsin, Visvanathan Ramesh, and Takeo Kanade. 2001. Statistical calibration of CCD imaging process. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 1. IEEE, 480–487.
  • Wang and Luo (2018) Shuhang Wang and Gang Luo. 2018. Naturalness Preserved Image Enhancement Using a Priori Multi-Layer Lightness Statistics. IEEE transactions on image processing: a publication of the IEEE Signal Processing Society 27, 2 (2018), 938–948.
  • Wang et al. (2004) Zhou Wang, Alan C Bovik, Hamid R Sheikh, Eero P Simoncelli, et al. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600–612.
  • Wei et al. (2018) Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. 2018. Deep retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560 (2018).
  • Zhang et al. (2017) Jing Zhang, Yang Cao, Shuai Fang, Yu Kang, and Chang Wen Chen. 2017. Fast Haze Removal for Nighttime Image Using Maximum Reflectance Prior. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Zhang et al. (2018a) Jing Zhang, Yang Cao, Yang Wang, Chenglin Wen, and Chang Wen Chen. 2018a. Fully Point-wise Convolutional Neural Network for Modeling Statistical Regularities in Natural Images. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 984–992.
  • Zhang and Tao (2019) Jing Zhang and Dacheng Tao. 2019. FAMED-Net: A Fast and Accurate Multi-scale End-to-end Dehazing Network. IEEE Transactions on Image Processing (2019), 1–13. https://doi.org/10.1109/TIP.2019.2922837
  • Zhang et al. (2011) Lin Zhang, Lei Zhang, Xuanqin Mou, and David Zhang. 2011. FSIM: A feature similarity index for image quality assessment. IEEE transactions on Image Processing 20, 8 (2011), 2378–2386.
  • Zhang et al. (2018b) Qing Zhang, Ganzhao Yuan, Chunxia Xiao, Lei Zhu, and Wei-Shi Zheng. 2018b. High-Quality Exposure Correction of Underexposed Photos. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 582–590.
  • Zhang and Cham (2012) Wei Zhang and Wai-Kuen Cham. 2012. Gradient-directed multiexposure composition. IEEE Transactions on Image Processing 21, 4 (2012), 2318–2323.
  • Zhang et al. (2012) Xiangdong Zhang, Peiyi Shen, Lingli Luo, Liang Zhang, and Juan Song. 2012. Enhancement and noise reduction of very low light level images. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). IEEE, 2034–2037.