Enhancing Underexposed Photos using Perceptually Bidirectional Similarity

Qing Zhang, et al., Wuhan University. July 25, 2019.

This paper addresses the problem of enhancing underexposed photos. Existing methods have tackled this problem from many different perspectives and achieved remarkable progress. However, they may fail to produce satisfactory results due to visual artifacts such as color distortion, loss of detail and uneven exposure. To obtain high-quality results free of these artifacts, we present a novel underexposed photo enhancement approach in this paper. Our main observation is that existing methods induce these artifacts because they break a perceptual consistency between the input and the enhanced output. Based on this observation, we propose an effective criterion, called perceptually bidirectional similarity (PBS), for preserving the perceptual consistency during enhancement. Specifically, we cast underexposed photo enhancement as PBS-constrained illumination estimation optimization, where PBS is defined as three constraints for estimating illumination that can recover enhancement results with normal exposure, distinct contrast, clear details and vivid color. To make our method more efficient and scalable to high-resolution images, we introduce a sampling-based strategy for accelerating the illumination estimation. Moreover, we extend our method to handle underexposed videos. Qualitative and quantitative comparisons as well as a user study demonstrate the superiority of our method over the state-of-the-art methods.


1 Introduction

With the popularization of readily available cameras on cell phones, photo sharing on social networks (e.g., Instagram and Facebook) has become a trendy lifestyle. However, capturing well-exposed photos under complex lighting conditions (e.g., low light and backlight) remains a challenge for casual photographers, so underexposed photos are inevitably created (see Fig. 1(a) for an example). Because of their low detail visibility and dull colors, these photos not only look unpleasing but also fail to capture what the user desires. Underexposed photo enhancement is thus usually required to improve the detail visibility and visual appeal of such photos.

Fig. 1: An example underexposed photo enhanced by existing tools and our approach. (a) Input; (b) Auto Enhance on iPhone; (c) Auto Tone in Lightroom; (d) our result. In comparison, we recover clearer details, more distinct contrast and more vivid color, producing a visually compelling result.

Underexposed photo enhancement is a challenging task, since it is highly non-linear and subjective. Commercial software such as Adobe Lightroom and Photoshop allows users to interactively retouch photos, but remains largely inscrutable to non-experts and typically requires a tedious process of balancing multiple controls (e.g., brightness, contrast, sharpness and saturation). Easier-to-use alternatives, such as the Auto Enhance feature on iPhone and the Auto Tone feature in Lightroom, enhance underexposed photos with a single click. However, they may fail to produce high-quality results due to the inherent difficulty of automatically balancing all assorted factors in the adjustment, as shown in Fig. 1(b) and (c).

Fig. 2: Comparison between our method and the state-of-the-art methods on enhancing a challenging underexposed photo. (a) Input; (b) NPE [1]; (c) WVM [2]; (d) JieP [3]; (e) LIME [4]; (f) HDRNet [5]; (g) DPE [6]; (h) ours. Photo courtesy of Flickr user Tony Kazmierski. This figure is best viewed in the electronic version.

Researchers have also developed numerous algorithms to tackle this problem. Early approaches work by performing histogram equalization [7, 8, 9] or by designing intensity mapping functions [10, 11, 12, 13], while many subsequent approaches [1, 2, 3, 4] rely on the Retinex model [14] to enhance photos. Others learn data-driven photo adjustment using either traditional machine learning techniques [15, 16, 17] or deep neural networks [5, 18, 6]. However, as demonstrated in Fig. 2, previous methods still have respective limitations, e.g., unclear details, local overexposure and color distortion, making them fail to produce visually pleasing results.

To address the limitations of previous methods, this paper presents a novel method for enhancing underexposed photos. Our key observation is that the main reason why previous methods may produce visually unpleasing results is that they break the perceptual consistency between the input image and the enhancement result. Based on this observation, we propose a simple yet effective criterion, called perceptually bidirectional similarity (PBS), for explicitly describing how to preserve the perceptual consistency. With the proposed PBS, we adopt the Retinex theory and formulate photo enhancement as PBS-constrained illumination estimation optimization, where we solve for the illumination under three constraints characterized by PBS, so as to recover high-quality results free of the artifacts encountered by existing methods. A sampling-based strategy is also described to accelerate the illumination estimation and allow more efficient and scalable photo enhancement. Moreover, we adapt our method to handle underexposed videos by enforcing the illumination to transition smoothly among neighboring frames.

The major contributions of this work are as follows:

  • First, we propose PBS and design PBS-constrained illumination estimation optimization for enhancing underexposed photos, which allows robustly producing high-quality results.

  • Second, we introduce a sampling-based strategy for accelerating the illumination estimation.

  • Third, we extend our method to enhance underexposed videos.

  • Last, we evaluate our method on six datasets and compare it with various state-of-the-art methods. Results show that our method outperforms previous methods, both qualitatively and quantitatively.

2 Related Work

Our work mainly relates to photo enhancement, a long-standing problem with an immense literature. In this section, we focus on discussing the related work from the following four aspects rather than trying to be exhaustive.

Histogram-based methods. The histogram is an important representation for photo enhancement. One of the most widely adopted techniques is histogram equalization (HE), which enlarges image contrast by spreading the intensity histogram over the entire dynamic range. However, it tends to yield unrealistic results because it ignores the relationship between neighboring pixels. Various variants of HE have since been developed to improve the results [7, 8, 19, 9, 20, 21], which basically follow the idea of dividing the global histogram into local histograms. As the optimal histogram partition strategy varies with images and is typically unpredictable, they may also produce unsatisfactory results.

Sigmoid-mapping-based methods. Mapping pixel intensities with sigmoid functions is another commonly used way to enhance photos. A well-known representative is Gamma correction, which expands the dynamic range via a power-law function. As globally applying a sigmoid mapping may generate visually distorted results, existing methods usually perform locally adaptive mapping. For instance, Bennett and McMillan [11] decomposed the input image into a base layer and a detail layer, and applied different mappings to the two layers to preserve image details, while Yuan and Sun [13] segmented the image into subregions and computed a luminance-aware, detail-preserving mapping for each subregion. Zhang et al. [22] created multiple tone-mapped versions of the input image and fused them into a well-exposed image. Since finding locally optimal sigmoid mappings and ensuring globally smooth transitions are difficult, these methods often fail on complex images.

Retinex-based methods. Methods of this kind are built upon the assumption that an underexposed image is the pixel-wise product of the expected enhancement result and a single-channel illumination map. In this fashion, the enhancement problem can be treated as an illumination estimation problem. Jobson et al. [23] made an early attempt at this problem, but their results often look unnatural due to frequently appearing artifacts such as loss of detail, color distortion and uneven exposure. Subsequent methods in this category focus on improving the results [1, 24, 2, 3, 4]. However, they may also fail, especially for non-uniformly illuminated underexposed images. Our method also belongs to this category. However, by maintaining the proposed PBS, our method is able to robustly generate visually pleasing results free of the visual artifacts encountered by previous methods (see Fig. 4, Fig. 13 and Fig. 14).

Learning-based methods. An increasing amount of effort has focused on learning-based enhancement methods since the pioneering work of Bychkovsky et al. [15], which provides the MIT-Adobe FiveK dataset, the first and largest dataset of input/output image pairs for tone adjustment. Yan et al. [17] achieved automatic color enhancement by tackling a learning-to-rank problem, while Yan et al. [25] enabled semantic-aware image enhancement. Recently, Lore et al. [26] presented a deep autoencoder-based approach for enhancing low-light images. Gharbi et al. [5] proposed bilateral learning to enable real-time image enhancement, while Chen et al. [6] designed an unpaired learning model for image enhancement based on a two-way generative adversarial network (GAN). The main limitation of learning-based methods is that they typically do not generalize well to images unlike those in the training datasets.

A preliminary version of this work appeared in [27]. In this paper, we have significantly extended the earlier conference version in five aspects. First, we present a sampling-based strategy for accelerating the illumination estimation. Second, we introduce an optional image denoising operation for removing possible noise in the enhanced image. Third, we extend our method to enhance underexposed videos. Fourth, we provide deeper analysis of the proposed PBS's properties and potential, and of the effect of the regularization parameters. Fifth, we have conducted extensive experiments to evaluate the advantages of our method, including further comparisons with more recent methods and evaluation on an additional dataset. Our code will be made publicly available at http://zhangqing-home.net/.

3 Methodology

This section presents the proposed underexposed photo enhancement approach. For completeness, we first summarize the background knowledge of Retinex-based image enhancement in Section 3.1. Section 3.2 introduces the proposed PBS and analyzes how to characterize it as specific constraints on the illumination, while Section 3.3 formulates the enhancement problem as PBS-constrained illumination estimation and derives an ADMM-based procedure for solving the involved non-convex optimization problem. The implementation details and parameter settings are given in Section 3.4. Finally, we describe a sampling-based strategy for accelerating the illumination estimation in Section 3.5, and an optional image denoising step for removing possible noise in the enhanced image in Section 3.6.

Fig. 3: Difference between Retinex-based image enhancement and intrinsic image decomposition. (a) Input; (b) reflectance; (c) enhanced image. Images (a) and (b) are from the MIT intrinsic images dataset [28].
Fig. 4: Issues (indicated by red arrows in (c)-(g)) encountered by existing methods. (a) Input; (b) CLAHE [8]; (c) Bennett and McMillan [11]; (d) Yuan and Sun [13]; (e) NPE [1]; (f) WVM [2]; (g) LIME [4]; (h) ours. Photo from Bychkovsky et al. [15].

3.1 Background

Retinex-based image enhancement is built upon the following image formation model, which assumes that an underexposed image $\mathbf{I}$ (normalized to $[0,1]$) is the pixel-wise product of the desired enhanced image $\tilde{\mathbf{I}}$ and a single-channel illumination map $\mathbf{L}$:

$\mathbf{I} = \tilde{\mathbf{I}} \circ \mathbf{L},$  (1)

where $\circ$ denotes pixel-wise multiplication. With this model, the enhancement problem reduces to an illumination estimation problem, since once $\mathbf{L}$ is known, we can recover the enhanced image by $\tilde{\mathbf{I}} = \mathbf{I} / \mathbf{L}$, where the division is pixel-wise. Note that the model in Eqn. (1) is essentially different from that of intrinsic image decomposition [29, 30], which has a similar formulation but aims to separate an image into the pixel-wise product of a reflectance/albedo component and an illumination/shading component. As demonstrated in Fig. 3, the reflectance component describes the inherent material property and usually loses visual realism, while our desired enhanced image $\tilde{\mathbf{I}}$ in Eqn. (1) is a natural image with improved detail visibility and visual appeal, as shown in Fig. 3(c).
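To make the recovery step concrete, the following NumPy sketch implements $\tilde{\mathbf{I}} = \mathbf{I}/\mathbf{L}$ from Eqn. (1); it is our illustration (the function name and the clipping guard are ours), not the released implementation:

```python
import numpy as np

def enhance_with_illumination(I, L, eps=1e-6):
    """Recover the enhanced image via Eqn. (1): I_enh = I / L.

    I: H x W x 3 underexposed image with values in [0, 1].
    L: H x W single-channel illumination map with values in (0, 1].
    """
    L = np.clip(L, eps, 1.0)           # guard against division by zero
    I_enh = I / L[..., None]           # pixel-wise division, broadcast over RGB
    return np.clip(I_enh, 0.0, 1.0)    # keep the result in the displayable range
```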

3.2 Perceptually Bidirectional Similarity (PBS)

Here, we introduce PBS and elaborate on how we characterize it as constraints on the illumination map $\mathbf{L}$ in Eqn. (1).

Before introducing PBS, we first summarize the common issues encountered by existing underexposed photo enhancement methods, which inspired the proposal of PBS. As shown in Fig. 4(b)-(g), color distortion, uneven exposure and loss of detail are the three main issues. More concretely, CLAHE [8] and NPE [1] distort the skin color and mistakenly turn the girl's face and arms gray, giving rise to a color family mismatch between the input and the output. Yuan and Sun [13] and WVM [2] induce exposure inconsistency around the arms and the body, although these regions have consistent exposure in the input. Bennett and McMillan [11] and LIME [4] overexpose the background, leading to significant loss of detail.

Key observation. From the above analysis, we make an important observation: the main reason why existing methods fail to produce visually pleasing results is that they break the bidirectional perceptual consistency of color, detail and local exposure distribution between the input and the enhanced output. Intuitively, this observation suggests that a good enhanced image should not only recover clear details from the underexposed regions, but also satisfy two properties: 1) it should contain all the visual information in the input image; 2) it should not introduce new visual artifacts that were not in the input image.

PBS definition. Based on this observation, we propose PBS, which more specifically characterizes the aforementioned two requirements for the expected enhanced image $\tilde{\mathbf{I}}$ of the input underexposed image $\mathbf{I}$: 1) colors and details in underexposed regions of $\mathbf{I}$ should all exist in $\tilde{\mathbf{I}}$ as properly enhanced versions, and regions in $\mathbf{I}$ with consistent exposure should also have consistent exposure in $\tilde{\mathbf{I}}$; 2) $\tilde{\mathbf{I}}$ should not contain distorted colors, additional details or local exposure inconsistencies that do not exist in $\mathbf{I}$. To utilize PBS, we define it as three numerical constraints on the illumination below, which ensure the bidirectional consistency of color, detail and exposure distribution between $\mathbf{I}$ and $\tilde{\mathbf{I}}$, respectively.

Color consistency. To preserve color consistency, we enforce that each pixel's colors in $\mathbf{I}$ and $\tilde{\mathbf{I}}$ are in the same color family by imposing a range constraint on $\mathbf{L}$. Since $\mathbf{I} = \tilde{\mathbf{I}} \circ \mathbf{L}$ and $\mathbf{I}$ is normalized to $[0,1]$, a small (large) $\mathbf{L}$ yields $\tilde{\mathbf{I}}$ with high (low) RGB values. Intuitively, color inconsistency may appear as mismatched colors in $\tilde{\mathbf{I}}$ caused by naive color truncation, when $L_p$ is too small to guarantee that each RGB color channel of the enhanced image remains in the color gamut $[0,1]$. Hence, we bound $L_p$ at each pixel $p$ to be no less than the value $\underline{L}_p$ that enlarges the maximum RGB color channel of the corresponding pixel in $\mathbf{I}$ to 1 through Eqn. (1), which is expressed as

$\mathcal{G}\big(\max_{c} I_p^{c} / \underline{L}_p\big) = 1,$  (2)

where $I_p^{c}$ is color channel $c \in \{r, g, b\}$ at pixel $p$, and $\mathcal{G}(\cdot)$ is the Gamma function with $\gamma = 2.2$, an optional operation for further illumination adjustment. From Eqn. (2), we can easily obtain $\underline{L}_p = \max_{c} I_p^{c}$. To avoid mistakenly darkening the input image, we set the upper bound of $\mathbf{L}$ to 1, in which case the input is directly taken as the output. Overall, for each pixel $p$, the constraint for color consistency can be defined as $\underline{L}_p \le L_p \le 1$.
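For illustration, the per-pixel bounds derived above can be computed as follows; this NumPy sketch is ours (with the optional Gamma adjustment omitted), not the authors' code:

```python
import numpy as np

def color_consistency_bounds(I):
    """Per-pixel illumination bounds for the color consistency constraint.

    Lower bound: the maximum over the RGB channels (the value derived from
    Eqn. (2)), so that no channel of I / L exceeds 1; upper bound: 1
    (no darkening). I: H x W x 3 image in [0, 1].
    """
    L_low = I.max(axis=2)          # lower bound: max_c I_p^c
    L_high = np.ones_like(L_low)   # upper bound: L_p <= 1
    return L_low, L_high
```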

Fig. 5: Two challenging underexposed photos enhanced by our method. (a) Input; (b) initial illumination; (c) refined illumination; (d) final enhanced image. The grayscale illumination maps are visualized with the hot colormap. Photos from Flickr user Jane Wei (top) and Bychkovsky et al. [15] (bottom).

Detail consistency. To facilitate understanding, we reformulate the detail consistency described by PBS from the perspective of edge consistency as follows: 1) if $\mathbf{I}$ is smooth at pixel $p$, then $\tilde{\mathbf{I}}$ should also be smooth at $p$; 2) if $\mathbf{I}$ has an edge at pixel $p$, then $\tilde{\mathbf{I}}$ should have a stronger, or at least equivalent, edge at $p$. By associating edges with gradients and directional derivatives, the above two cases can be characterized as the following constraint:

$|\nabla \tilde{I}_p| = 0 \;\text{ if } |\nabla I_p| \le \epsilon; \qquad |\partial_d \tilde{I}_p| \ge |\partial_d I_p| \;\text{ otherwise},$  (3)

where $\nabla$ denotes the gradient operator, $\partial_d$ is the first-order derivative along the horizontal ($d = x$) or vertical ($d = y$) direction, and $\epsilon$ is a small constant (typically 1e-5) for determining whether there is an edge at a pixel in the input image $\mathbf{I}$. Note that Eqn. (3) can also be expressed in terms of $\mathbf{L}$ by replacing $\tilde{\mathbf{I}}$ with $\mathbf{I} / \mathbf{L}$.
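Our method enforces Eqn. (3) as a hard constraint during optimization; purely for illustration, the sketch below checks the two edge-consistency conditions on the horizontal derivatives after the fact (a hypothetical helper of ours, not part of the method):

```python
import numpy as np

def violates_detail_consistency(I_gray, E_gray, eps=1e-5):
    """Check the two edge-consistency conditions of Eqn. (3) along x.

    I_gray, E_gray: 2-D grayscale input and enhanced images in [0, 1].
    Returns True if either condition is broken: a smooth input pixel became
    an edge, or an input edge was weakened in the enhanced image.
    """
    dI = np.diff(I_gray, axis=1)   # forward differences (horizontal derivative)
    dE = np.diff(E_gray, axis=1)
    smooth = np.abs(dI) <= eps
    new_edges = np.any(np.abs(dE[smooth]) > eps)
    weakened = np.any(np.abs(dE[~smooth]) < np.abs(dI[~smooth]))
    return bool(new_edges or weakened)
```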

Exposure distribution consistency. According to Eqn. (1), the key to preserving the exposure distribution consistency is to ensure that $\mathbf{L}$ is locally smooth over regions with similar illumination in the input. To this end, we adopt the relative total variation (RTV) measure [31] as the smoothness regularizer, which yields piecewise smooth illumination while maintaining prominent illumination discontinuities across regions. This regularizer also helps enhance image contrast, because when adjacent pixels $p$ and $q$ have similar illumination values ($L_p \approx L_q$), their contrast in the enhanced image can be estimated as $|\tilde{I}_p - \tilde{I}_q| \approx |I_p - I_q| / L_p$, which is definitely enlarged, since $0 < L_p \le 1$. Note that other edge-aware smoothness regularizers [32, 33, 34] can also work with our approach. Formally, the RTV measure is defined as

$\mathcal{R}(\mathbf{L}) = \mathcal{R}_x(\mathbf{L}) + \mathcal{R}_y(\mathbf{L}),$  (4)

where $\mathcal{R}_x$ and $\mathcal{R}_y$ denote the $x$- and $y$-direction RTV measures, respectively. Specifically, the $x$-direction measure is written as

$\mathcal{R}_x(\mathbf{L}) = \sum_p \dfrac{\mathcal{D}_x(p)}{\mathcal{L}_x(p) + \varepsilon},$  (5)

where $\mathcal{D}_x(p) = \sum_{q \in \omega(p)} g_{p,q}\, |\partial_x L_q|$ and $\mathcal{L}_x(p) = \big|\sum_{q \in \omega(p)} g_{p,q}\, \partial_x L_q\big|$ are the windowed total variation and the windowed inherent variation, $\omega(p)$ denotes a window centered at pixel $p$, and $g_{p,q}$ denotes the Gaussian kernel with standard deviation $\sigma$ (equivalently, $\mathcal{D}_x$ and $\mathcal{L}_x$ can be computed as $G_\sigma * |\partial_x \mathbf{L}|$ and $|G_\sigma * \partial_x \mathbf{L}|$, where $*$ is the convolution operator). $\varepsilon$ is a small constant fixed to 1e-3 to prevent division by zero. The $y$-direction measure is defined similarly, so we do not give its definition separately.
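As a concrete reference, the $x$-direction RTV measure can be computed with Gaussian convolutions as in the sketch below; this is our illustrative reading of Eqn. (5) following Xu et al. [31], not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rtv_x(L, sigma=3.0, eps=1e-3):
    """x-direction RTV measure of an illumination map L (Eqn. (5)).

    The Gaussian-weighted window sums are computed by convolution:
    D_x = G_sigma * |dL/dx| (windowed total variation) and
    |L_x| = |G_sigma * dL/dx| (windowed inherent variation).
    """
    dx = np.diff(L, axis=1, append=L[:, -1:])   # forward difference along x
    D = gaussian_filter(np.abs(dx), sigma)      # windowed total variation
    Lx = np.abs(gaussian_filter(dx, sigma))     # windowed inherent variation
    return float(np.sum(D / (Lx + eps)))
```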

3.3 PBS-constrained Illumination Estimation

This section illustrates how we cast underexposed photo enhancement as a PBS-constrained illumination estimation problem. We first introduce how to obtain an initial illumination map from the input image. Then, we adopt the PBS constraints and design an optimization framework for refining the initial illumination map, so as to obtain the desired PBS-satisfying illumination. Finally, we describe an ADMM-based solver for the optimization.

Fig. 6: Initial illumination and its limitation in photo enhancement. (a) Input image; (b) initial illumination map (in hot colormap); (c) enhanced image recovered from the initial illumination; (d) our result recovered from the estimated PBS-constrained illumination.

Initial illumination extraction. Intuitively, the brightness of different areas in an underexposed image roughly reflects the magnitude of the illumination. Hence, inspired by [14], we compute the initial illumination map $\hat{\mathbf{L}}$ by treating the maximum value among the RGB color channels of the input as the illumination value at each pixel:

$\hat{L}_p = \max_{c \in \{r, g, b\}} I_p^{c}.$  (6)

As analyzed in [4], the initial illumination obtained this way can better model the global illumination distribution, and it ensures that the enhanced image will be less saturated, since recovering it by $\tilde{\mathbf{I}} = \mathbf{I} / \hat{\mathbf{L}}$ avoids pushing any color channel of $\tilde{\mathbf{I}}$ beyond 1. However, though the initial illumination map can act as a robust estimate of the global illumination distribution, naively recovering the enhanced image from it typically produces unrealistic results, as shown in Fig. 6(c). Hence, we further devise a PBS-constrained illumination estimation optimization for refining the initial illumination, so that we can recover a visually compelling result from the refined illumination.
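A minimal sketch of Eqn. (6) (ours, for illustration):

```python
import numpy as np

def initial_illumination(I):
    """Initial illumination map (Eqn. (6)): per-pixel maximum over RGB.

    I: H x W x 3 image in [0, 1]. By construction L_hat >= each channel
    of I, so the naive recovery I / L_hat stays within [0, 1].
    """
    return I.max(axis=2)
```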

Objective function. Intuitively, the ideal illumination should simultaneously preserve the global illumination distribution characterized by $\hat{\mathbf{L}}$ and satisfy the PBS constraints. Hence, we define the following objective function for estimating the desired illumination $\mathbf{L}$:

$\min_{\mathbf{L}} \; \sum_p \big(L_p - \hat{L}_p\big)^2 + \lambda\, \mathcal{R}(\mathbf{L}), \quad \text{s.t. the constraints in Eqns. (2) and (3)},$  (7)

where $\lambda$ is the balancing weight. The first term forces the target illumination $\mathbf{L}$ to be close to the initial illumination $\hat{\mathbf{L}}$ in structure, while the second term and the other two constraints are the PBS constraints.

ADMM solver. The objective function in Eqn. (7) involves an intractable non-convex energy minimization. To obtain its solution, we derive a solver based on the alternating direction method of multipliers (ADMM) [35].

Fig. 7: Effectiveness of each PBS constraint. (a) Input image; (b)-(d) enhanced images without the color, detail and exposure distribution consistency constraints, respectively; (e) result with all three PBS constraints.

Before describing the details, we first convert the image formation model in Eqn. (1) to the log domain, so that the division in Eqn. (7) reduces to a more tractable subtraction. Let $\mathbf{i} = \log \mathbf{I}$, $\tilde{\mathbf{i}} = \log \tilde{\mathbf{I}}$ and $\mathbf{l} = \log \mathbf{L}$. Eqn. (1) is then written as $\mathbf{i} = \tilde{\mathbf{i}} + \mathbf{l}$. The color and detail consistency constraints can accordingly be expressed as $\log \underline{L}_p \le l_p \le 0$ and

$|\partial_d(i_p - l_p)| = 0 \;\text{ if } |\partial_d i_p| \le \epsilon; \qquad \dfrac{|\partial_d (i_p - l_p)|}{|\partial_d i_p|} \ge 1 \;\text{ otherwise}.$  (8)

For the exposure distribution consistency constraint, we simply replace $\mathbf{L}$ with $\mathbf{l}$ in Eqn. (4). Since $\partial_d \log x = \partial_d x / x$ for any variable $x$, similar to [2], we multiply the numerator and denominator in the second line of Eqn. (8) by $\tilde{I}_p$ and $I_p$ to eliminate the impact of the scaling weight. Note that $\tilde{I}_p$ is treated as a constant here, since it can be estimated from the previous iteration. With the log transformation, the objective function in Eqn. (7) can be written in matrix form as follows:

$\min_{\mathbf{l}} \; \|\mathbf{l} - \hat{\mathbf{l}}\|_2^2 + \lambda\,\mathbf{l}^{T}\big(\mathbf{D}_x^{T}\mathbf{W}_x\mathbf{D}_x + \mathbf{D}_y^{T}\mathbf{W}_y\mathbf{D}_y\big)\,\mathbf{l}, \;\; \text{s.t.} \;\; \log \underline{\mathbf{L}} \le \mathbf{l} \le \mathbf{0}, \;\; \mathbf{M}_d\,\mathbf{C}_d\mathbf{D}_d(\mathbf{i} - \mathbf{l}) \ge \mathbf{M}_d\mathbf{1}, \;\; (\mathbf{I}_n - \mathbf{M}_d)\,\mathbf{D}_d(\mathbf{i} - \mathbf{l}) = \mathbf{0}, \;\; d \in \{x, y\},$  (9)

where $\mathbf{l}$, $\mathbf{i}$ and $\hat{\mathbf{l}}$ are the vector representations of $\log \mathbf{L}$, $\log \mathbf{I}$ and $\log \hat{\mathbf{L}}$, respectively. $\mathbf{W}_x$ and $\mathbf{W}_y$ are diagonal matrices containing the RTV weights in the $x$- and $y$-directions, and $\mathbf{D}_x$ and $\mathbf{D}_y$ are the Toeplitz matrices of the discrete gradient operators with forward difference. $\mathbf{M}_d$ is a binary diagonal matrix indicating whether a pixel in the input satisfies $|\partial_d i_p| > \epsilon$, $\mathbf{I}_n$ is the identity matrix, and $\mathbf{C}_x$ and $\mathbf{C}_y$ are diagonal matrices consisting of the constant parts in the second line of Eqn. (8).

To apply ADMM, we rewrite the minimization problem in Eqn. (9) in the following equivalent form:

$\min_{\mathbf{l},\, \mathbf{u},\, \mathbf{v}_x,\, \mathbf{v}_y} \; \|\mathbf{l} - \hat{\mathbf{l}}\|_2^2 + \lambda\,\mathbf{l}^{T}\big(\mathbf{D}_x^{T}\mathbf{W}_x\mathbf{D}_x + \mathbf{D}_y^{T}\mathbf{W}_y\mathbf{D}_y\big)\,\mathbf{l}, \quad \text{s.t.} \;\; \mathbf{u} = \mathbf{l}, \;\; \mathbf{v}_x = \mathbf{C}_x\mathbf{D}_x(\mathbf{i} - \mathbf{l}), \;\; \mathbf{v}_y = \mathbf{C}_y\mathbf{D}_y(\mathbf{i} - \mathbf{l}),$  (10)

where $\mathbf{u}$, $\mathbf{v}_x$ and $\mathbf{v}_y$ are auxiliary variables for making the original problem separable: $\mathbf{u}$ carries the range (color consistency) constraint $\log \underline{\mathbf{L}} \le \mathbf{u} \le \mathbf{0}$, while $\mathbf{v}_x$ and $\mathbf{v}_y$ carry the detail consistency constraints ($\mathbf{v}_y$ and $\mathbf{C}_y$ are defined similarly to their $x$-direction counterparts). The augmented Lagrangian function of Eqn. (10) is then written as

$\mathcal{L}_{\rho} = \|\mathbf{l} - \hat{\mathbf{l}}\|_2^2 + \lambda\,\mathbf{l}^{T}\big(\mathbf{D}_x^{T}\mathbf{W}_x\mathbf{D}_x + \mathbf{D}_y^{T}\mathbf{W}_y\mathbf{D}_y\big)\,\mathbf{l} + \langle \mathbf{z}_1,\, \mathbf{l} - \mathbf{u} \rangle + \dfrac{\rho}{2}\,\|\mathbf{l} - \mathbf{u}\|_2^2 + \sum_{d \in \{x,y\}} \Big[ \langle \mathbf{z}_d,\, \mathbf{C}_d\mathbf{D}_d(\mathbf{i} - \mathbf{l}) - \mathbf{v}_d \rangle + \dfrac{\rho}{2}\,\|\mathbf{C}_d\mathbf{D}_d(\mathbf{i} - \mathbf{l}) - \mathbf{v}_d\|_2^2 \Big],$  (11)

where $\mathbf{z}_1$, $\mathbf{z}_x$ and $\mathbf{z}_y$ are the Lagrangian multipliers, $\rho$ is the penalty parameter, and $\langle \cdot, \cdot \rangle$ computes the standard inner product. The problem in Eqn. (11) can be further divided into the following subproblems with respect to $\mathbf{l}$, $\mathbf{u}$, $\mathbf{v}_x$ and $\mathbf{v}_y$, respectively:

$\mathbf{l}^{k+1} = \arg\min_{\mathbf{l}} \mathcal{L}_{\rho}(\mathbf{l}, \mathbf{u}^{k}, \mathbf{v}_x^{k}, \mathbf{v}_y^{k}),$  (12a)
$\mathbf{u}^{k+1} = \arg\min_{\mathbf{u}} \mathcal{L}_{\rho}(\mathbf{l}^{k+1}, \mathbf{u}, \mathbf{v}_x^{k}, \mathbf{v}_y^{k}),$  (12b)
$\mathbf{v}_x^{k+1} = \arg\min_{\mathbf{v}_x} \mathcal{L}_{\rho}(\mathbf{l}^{k+1}, \mathbf{u}^{k+1}, \mathbf{v}_x, \mathbf{v}_y^{k}),$  (12c)
$\mathbf{v}_y^{k+1} = \arg\min_{\mathbf{v}_y} \mathcal{L}_{\rho}(\mathbf{l}^{k+1}, \mathbf{u}^{k+1}, \mathbf{v}_x^{k+1}, \mathbf{v}_y),$  (12d)

where $k$ denotes the $k$-th iteration. By iteratively solving each subproblem while fixing the others until convergence, we obtain the solution to Eqn. (10). Specifically, we first obtain $\mathbf{l}^{k+1}$ by solving the subproblem in Eqn. (12a). With $\mathbf{l}^{k+1}$, we then compute $\mathbf{u}^{k+1}$, $\mathbf{v}_x^{k+1}$ and $\mathbf{v}_y^{k+1}$ by:

$\mathbf{u}^{k+1} = \min\!\big(\max(\mathbf{l}^{k+1} + \mathbf{z}_1^{k}/\rho,\; \log \underline{\mathbf{L}}),\; \mathbf{0}\big), \qquad \mathbf{v}_d^{k+1} = \mathcal{P}\big(\mathbf{C}_d\mathbf{D}_d(\mathbf{i} - \mathbf{l}^{k+1}) + \mathbf{z}_d^{k}/\rho\big), \;\; d \in \{x, y\},$  (13)

where $\mathcal{P}(\cdot)$ projects entries at pixels that satisfy $|\partial_d i_p| \le \epsilon$ to zero, and ensures that the other entries (where $|\partial_d i_p| > \epsilon$) are no less than 1. The Lagrangian multipliers are updated by:

$\mathbf{z}_1^{k+1} = \mathbf{z}_1^{k} + \rho\,(\mathbf{l}^{k+1} - \mathbf{u}^{k+1}), \qquad \mathbf{z}_d^{k+1} = \mathbf{z}_d^{k} + \rho\,\big(\mathbf{C}_d\mathbf{D}_d(\mathbf{i} - \mathbf{l}^{k+1}) - \mathbf{v}_d^{k+1}\big), \;\; d \in \{x, y\},$  (14)

after which the penalty is increased by $\rho^{k+1} = \mu\rho^{k}$, where $\mu$ is the relaxation parameter. The whole ADMM procedure for the illumination estimation optimization is summarized in Algorithm 1.

Fig. 8: Effect of varying $\lambda$ and $\gamma$. (a) Image 1, with (b)-(d) showing how $\lambda$ affects the enhanced images; (e) Image 2, with (f)-(h) showing how $\gamma$ affects them.
Input: input image $\mathbf{I}$, parameter $\lambda$
1:  Initialization: compute the initial illumination $\hat{\mathbf{L}}$ (Eqn. (6)); set $\mathbf{u}$, $\mathbf{v}_x$, $\mathbf{v}_y$, $\mathbf{z}_1$, $\mathbf{z}_x$, $\mathbf{z}_y$ to zero matrices; set the penalty parameter $\rho$, the relaxation parameter $\mu$, and $k = 0$
2:  while not converged do
3:   Solve for $\mathbf{l}^{k+1}$ in Eqn. (12a)
4:   Solve for $\mathbf{u}^{k+1}$, $\mathbf{v}_x^{k+1}$ and $\mathbf{v}_y^{k+1}$ via Eqn. (13)
5:   Update $\mathbf{z}_1$, $\mathbf{z}_x$ and $\mathbf{z}_y$ via Eqn. (14)
6:   $\rho \leftarrow \mu\rho$ and $k \leftarrow k + 1$
7:  end while
Output: the estimated illumination map $\mathbf{L} = \exp(\mathbf{l})$
Algorithm 1 Illumination Estimation Optimization
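For readers who prefer code, the sketch below mirrors the control flow of Algorithm 1 with hypothetical callbacks (`solve_l`, `project_u`, `project_v`) standing in for the subproblems of Eqns. (12a) and (13); for brevity it ties both auxiliary variables directly to $\mathbf{l}$, whereas in our formulation the detail-consistency variables live in the gradient domain:

```python
import numpy as np

def admm_illumination(l_init, solve_l, project_u, project_v,
                      rho=1.0, mu=1.05, tol=1e-3, max_iter=20):
    """Simplified skeleton of Algorithm 1 (our sketch, not the released code)."""
    l = l_init.copy()
    u = np.zeros_like(l)                 # auxiliary variable (range constraint)
    v = np.zeros_like(l)                 # auxiliary variable (detail constraint)
    z_u = np.zeros_like(l)               # Lagrangian multipliers
    z_v = np.zeros_like(l)
    for k in range(max_iter):
        l_prev = l
        l = solve_l(u, v, z_u, z_v, rho)       # l-subproblem, Eqn. (12a)
        u = project_u(l + z_u / rho)           # box projection, Eqn. (13)
        v = project_v(l, z_v, rho)             # detail projection, Eqn. (13)
        z_u = z_u + rho * (l - u)              # dual updates, Eqn. (14)
        z_v = z_v + rho * (l - v)
        rho = mu * rho                         # grow the penalty parameter
        err = np.linalg.norm(l - l_prev) / max(np.linalg.norm(l_prev), 1e-12)
        if err < tol:                          # stopping criterion (Sec. 3.4)
            break
    return l
```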

3.4 Implementation and Parameter Setting

We employ the projected gradient descent method [36] to solve the subproblem in Eqn. (12a). The key parameter of our approach is $\lambda$, which determines the smoothness level of the estimated illumination map; in general, we set $\lambda$ large for highly textured images. $\gamma$ is another parameter that affects the result. In all our experiments, we empirically fix $\lambda$ and $\gamma$ (the latter at 2.2, as noted in Section 3.2), which produces reasonably good results for our testing images. The final enhanced image is computed by $\tilde{\mathbf{I}} = \mathbf{I} / \mathbf{L}$, optionally followed by the Gamma adjustment $\mathcal{G}$. Fig. 5 shows two examples.

Effectiveness of each PBS constraint. Fig. 7 demonstrates the effectiveness of each PBS constraint. We can see that the skin color of the girl is obviously distorted when we remove the color consistency constraint (Fig. 7(b)), while removing the detail consistency constraint overexposes the grass and the girl's face and arms (Fig. 7(c)). Without the exposure distribution consistency constraint, the enhanced image shows disturbing exposure inconsistency around the girl's body (Fig. 7(d)), although these regions have similar exposure levels in the input image. Finally, by combining all three PBS constraints, we achieve the visually best result, with clear details, vivid color, distinct contrast and normal exposure distribution, as shown in Fig. 7(e).

Effect of varying parameters. Fig. 8 evaluates the effect of varying $\lambda$ and $\gamma$. As shown in the first row, a larger $\lambda$ produces results with stronger local contrast, though this effect becomes less obvious beyond a certain point. As a larger $\lambda$ typically requires more iterations to converge, we fix $\lambda$ as a trade-off between quality and efficiency. The second row of Fig. 8 shows how $\gamma$ affects the results. We can see that the result without Gamma mapping is also satisfactory, but too bright to be consistent with the image aesthetics. Decreasing the overall brightness via $\gamma$ comes at the cost of lowering the overall visibility. To obtain better visual results, we set $\gamma = 2.2$ for all our tested images.

Convergence analysis. Algorithm 1 stops iterating when: (i) the difference between two consecutive solutions is less than a small threshold (1e-3), or (ii) the maximum number of iterations (empirically set to 20) is reached. We have experimentally found that our algorithm has a good convergence rate under the parameter setting described above, and usually converges within 5 to 10 iterations. Fig. 9 plots the convergence curve of our algorithm for an example image. As shown, the illumination estimation optimization converges after 7 iterations, and more iterations barely improve the result.

Fig. 9: Convergence curve of our illumination estimation optimization for an example image. The ordinate axis indicates the iterative error of the solutions. The enhanced images after different numbers of iterations (2, 7, 10) are also shown.
Fig. 10: Comparison of enhanced images with and without image denoising. (a) Input; (b) without denoising; (c) with denoising. Please zoom in to compare the results.
Fig. 11: Comparison of results with and without the acceleration. (a) Input; (b) without acceleration (3 sec); (c) with acceleration (0.3 sec). The two enhanced images are visually indistinguishable, while the accelerated version takes only 0.3 seconds, an order of magnitude faster than the direct implementation.

3.5 Acceleration

The illumination estimation optimization described in Section 3.3 is relatively fast, but naively applying it to underexposed photo enhancement would be computationally expensive for high-resolution images. This section describes a sampling-based strategy that enables more efficient photo enhancement, even for high-resolution images.

Due to the piecewise smooth nature of illumination in natural images, the main idea behind our acceleration method is to estimate the illumination on a downsampled input image and then upsample the estimated low-resolution illumination to full resolution for photo enhancement. Specifically, we first downsample the source image so that its larger dimension (width or height) is no more than 400 pixels, and perform the illumination estimation optimization on the downsampled image. Then, we employ joint bilateral upsampling (JBU) [37] to upsample the obtained low-resolution illumination map to the full resolution in an edge-aware manner, which is expressed as

$L_p = \dfrac{1}{K_p} \sum_{q_{\downarrow} \in \omega(p_{\downarrow})} L^{\downarrow}_{q_{\downarrow}} \, f\big(\|p_{\downarrow} - q_{\downarrow}\|\big) \, g\big(|\hat{L}_p - \hat{L}_q|\big),$  (15)

where $\hat{\mathbf{L}}$ is the full-resolution initial illumination obtained from Eqn. (6), $p$ and $q$ denote coordinates of pixels in $\mathbf{L}$ and $\hat{\mathbf{L}}$, while $p_{\downarrow}$ and $q_{\downarrow}$ denote the corresponding coordinates in the low-resolution solution $\mathbf{L}^{\downarrow}$. $f$ and $g$ are the spatial and range filter kernels, truncated Gaussians (see [38] for details) with standard deviations $\sigma_s$ and $\sigma_r$, respectively. $\omega(p_{\downarrow})$ denotes a window centered at pixel $p_{\downarrow}$, and $K_p$ is the normalizing factor that sums the filter weights $f \cdot g$.
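A direct (unoptimized) reference sketch of the JBU step in Eqn. (15) is given below; it is our illustration of Kopf et al.'s scheme [37], not the implementation used for the reported timings:

```python
import numpy as np

def joint_bilateral_upsample(L_low, G_full, sigma_s=2.0, sigma_r=0.1, radius=2):
    """Joint bilateral upsampling of a low-resolution illumination map.

    L_low:  h x w low-resolution illumination solution.
    G_full: H x W full-resolution guidance (the initial illumination of Eqn. (6)).
    """
    H, W = G_full.shape
    h, w = L_low.shape
    sy, sx = H / h, W / w                  # upsampling factors per axis
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            cy, cx = y / sy, x / sx        # position of p in the low-res grid
            acc = norm = 0.0
            for qy in range(max(0, int(cy) - radius), min(h, int(cy) + radius + 1)):
                for qx in range(max(0, int(cx) - radius), min(w, int(cx) + radius + 1)):
                    # spatial weight in low-resolution coordinates
                    f = np.exp(-((cy - qy) ** 2 + (cx - qx) ** 2) / (2 * sigma_s ** 2))
                    # range weight compares guidance at p and at q's full-res location
                    gy, gx = min(H - 1, int(qy * sy)), min(W - 1, int(qx * sx))
                    g = np.exp(-(G_full[y, x] - G_full[gy, gx]) ** 2 / (2 * sigma_r ** 2))
                    acc += L_low[qy, qx] * f * g
                    norm += f * g
            out[y, x] = acc / max(norm, 1e-12)   # normalize by the summed weights
    return out
```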

With the above acceleration, the runtime for enhancing a 685 × 1024 image (Fig. 11(a)) drops from 3 seconds to 0.3 seconds on a PC with a Core i5-7400 CPU, while the enhanced image is visually indistinguishable from that of the direct implementation without acceleration, as shown in Fig. 11.

3.6 Image Denoising

While the proposed method can robustly enhance underexposed photos, it may also amplify the underlying noise, as shown in Fig. 10(b). To further improve the visual quality, we introduce an image denoising operation as post-processing. For the sake of performance and runtime efficiency, we adopt CBM3D [39] for this task, though any other color image denoising algorithm would also work with our method. By performing image denoising, we are able to remove noise and produce visually more compelling results, as shown in Fig. 10(c). It is worth mentioning that the denoising operation is optional, since not every underexposed photo contains noise.
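As a sketch of this post-processing step, the snippet below substitutes OpenCV's fast non-local means denoiser for CBM3D [39] as a readily available stand-in; the substitution is our choice for illustration only:

```python
import cv2
import numpy as np

def denoise_enhanced(I_enh, strength=6):
    """Optional post-processing denoising (Section 3.6).

    The paper uses CBM3D [39]; here we apply OpenCV's fast non-local means
    for color images instead (a stand-in, not the authors' choice).
    """
    img8 = (np.clip(I_enh, 0.0, 1.0) * 255.0).astype(np.uint8)
    den = cv2.fastNlMeansDenoisingColored(img8, None, strength, strength, 7, 21)
    return den.astype(np.float32) / 255.0
```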

Fig. 12: Underexposed video enhancement. Top: the 1st, 16th, 42nd and 68th frames of the original underexposed video. Bottom: the corresponding frames of our enhanced video. Source video from Zhang et al. [22].

4 Extension to Video

Since video inherently involves dynamic content, naively applying our illumination estimation optimization to each video frame tends to produce enhanced video with temporal inconsistencies in the form of jittering artifacts. Hence, we propose to estimate temporally coherent illumination for enhancing underexposed videos.

Let $\{\mathbf{I}^t\}_{t=1}^{T}$ denote the frames of an input underexposed video. We first estimate the illumination map $\mathbf{L}^1$ for the first frame by minimizing Eqn. (7). For each subsequent frame $t$ ($t > 1$), we design the following objective function to estimate its illumination $\mathbf{L}^t$:

$\min_{\mathbf{L}^t} \; \Psi(\mathbf{L}^t) + \eta \sum_p \big(L_p^t - L_p^{t-1}\big)^2,$  (16)

where $\Psi(\mathbf{L}^t)$ denotes the main body of the objective function in Eqn. (7), while the second term enforces the illumination to have temporally smooth transitions by constraining the current frame and its previous frame to have similar illumination values at the same spatial positions. Note that $\mathbf{L}^{t-1}$ is assumed to be known in Eqn. (16), and $\eta$ is a parameter balancing the contributions of the two parts. By this means, we are able to obtain a temporally coherent illumination sequence, which not only allows us to recover enhanced video with clear details, distinct contrast and vivid color (see Fig. 12), but also helps avoid jittering artifacts (see the supplementary material for our enhanced video and the result produced by per-frame application of our illumination estimation optimization).
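The per-frame loop can be sketched as follows, with a hypothetical `estimate_illumination` callback assumed to minimize Eqn. (7) for the first frame and Eqn. (16), with temporal weight $\eta$, afterwards (our illustration):

```python
import numpy as np

def enhance_video(frames, estimate_illumination, eta=0.5):
    """Temporally coherent video enhancement (Section 4, sketch).

    estimate_illumination(frame, L_prev, eta) is a hypothetical callback:
    when L_prev is None it minimizes Eqn. (7); otherwise it minimizes
    Eqn. (16) with the temporal-smoothness term weighted by eta.
    """
    enhanced, L_prev = [], None
    for I in frames:
        L = estimate_illumination(I, L_prev, eta)       # Eqn. (7) or (16)
        L_safe = np.clip(L, 1e-6, 1.0)                  # avoid division by zero
        enhanced.append(np.clip(I / L_safe[..., None], 0.0, 1.0))
        L_prev = L                                      # propagate coherence
    return enhanced
```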

5 Experiment

In this section, we present experiments to evaluate the performance of our underexposed photo enhancement method by comparing it with various state-of-the-art methods.

5.1 Datasets and Evaluation Metrics

Benchmark datasets. We employ six benchmark datasets to evaluate our method: the NPE dataset [1], the MEF dataset [40], the MF dataset [24], the LIME dataset [4], the VV dataset (https://sites.google.com/site/vonikakis/datasets) and the FiveK dataset [15]. Note that, for the FiveK dataset, we randomly select 100 underexposed images for evaluation, while the remaining 4900 images are used for training the HDRNet method [5] to be compared.

Evaluation metrics. We employ two commonly used metrics to quantitatively evaluate the enhancement performance. The first is DE (discrete entropy) [41], which measures the performance of detail/contrast enhancement. The second is NIQE (natural image quality evaluator) [42], a learned model for assessing the overall naturalness of images. In general, a high DE value of the enhanced image means that the detail visibility of the original underexposed image is better improved, while a low NIQE value indicates that the enhanced image has good naturalness. Although not absolutely reliable, high DE and low NIQE values usually indicate reasonably good results.
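As an illustration of the DE metric, the following sketch computes the Shannon entropy of an image's intensity histogram, which is our reading of discrete entropy [41] (NIQE is a learned model and is omitted here):

```python
import numpy as np

def discrete_entropy(gray):
    """Discrete entropy (DE) of an image in bits.

    gray: 2-D array with values in [0, 1]; the entropy is computed over
    the 256-bin intensity histogram.
    """
    levels = (np.clip(gray, 0.0, 1.0) * 255.0).astype(np.uint8)
    hist = np.bincount(levels.ravel(), minlength=256)
    p = hist / hist.sum()          # normalized histogram (probabilities)
    p = p[p > 0]                   # drop empty bins before taking the log
    return float(-(p * np.log2(p)).sum())
```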

5.2 Comparison with State-of-the-art Methods

We compare our method with six recent photo enhancement methods: NPE [1], WVM [2], JieP [3], LIME [4], HDRNet [5] and DPE [6]. The first four are Retinex-based methods, while the last two are deep-learning-based methods. For fair comparison, we obtain the results of the compared methods either from online demo programs or by producing them with implementations provided by the authors under the recommended parameter settings. Moreover, the image denoising in our method is not performed. In the following, we conduct the comparison in three aspects: visual comparison, quantitative comparison and a user study.

Fig. 13: Visual comparison with state-of-the-art methods on a test image from the MEF dataset [40]. (a) Input; (b) NPE [1]; (c) WVM [2]; (d) JieP [3]; (e) LIME [4]; (f) HDRNet [5]; (g) DPE [6]; (h) ours.
Fig. 14: Visual comparison with state-of-the-art methods on a test image from the FiveK dataset [15]. (a) Input; (b) NPE [1]; (c) WVM [2]; (d) JieP [3]; (e) LIME [4]; (f) HDRNet [5]; (g) DPE [6]; (h) ours.

Visual comparison. We first show visual comparisons in Figs. 13 and 14 on two challenging cases from the employed datasets: (i) a non-uniformly exposed photo with dim candlelight and imperceptible scene details (from the MEF dataset), and (ii) a uniformly underexposed photo with little portrait detail of the crawling baby (from the FiveK dataset). Comparing the results, we can see that our method outperforms the compared methods and has two advantages. First, it recovers more details and better contrast in the underexposed regions without degrading other parts of the image. Second, it reveals more vivid and natural colors, making our enhanced images look more realistic. Please see the supplementary material for more visual comparison results.

Quantitative comparison. Second, we quantitatively evaluate the performance of our method by comparing it with the other methods in terms of the DE and NIQE metrics. Table I reports the quantitative comparison results; the average DE and NIQE values of the original images in each dataset are also shown for reference. As can be seen, all methods increase the DE value due to detail/contrast enhancement and reduce the NIQE value by lightening the underexposed regions. Our method achieves higher DE and lower NIQE than the compared methods on almost all the datasets, which demonstrates that it can not only recover clearer details and more distinct contrast, but also better preserve the overall naturalness and photorealism of the enhanced images.

User study. Since evaluating the visual quality of enhanced images involves judgments of personal preference, we further conducted a user study to compare the results. To this end, we enhanced each test image in the six employed datasets using our method and the six compared methods, and recruited 100 subjects via Amazon Mechanical Turk to rate the results. Specifically, for each test image, each subject was asked to rate seven different enhancement results (ours and those of the six compared methods) on a Likert scale from 1 (least favorite) to 7 (most favorite), according to the following common requirements for desired enhancement results: (i) clear details and distinct contrast in originally underexposed regions, (ii) natural and vivid colors, (iii) no loss of detail or overexposure, and (iv) no degradation of the overall photorealism. To avoid possible subjective bias, the subjects were shown anonymized results in random order. After the subjects finished rating all the results, we computed the average rating obtained by each method on each dataset.

Fig. 15 summarizes the ratings by dataset, where we can see that our method receives higher ratings than the others, demonstrating that results generated by our algorithm are preferred by human subjects on average. We also performed a statistical analysis of the ratings by conducting paired t-tests between our method and each of the others; all differences are statistically significant.

Dataset | Original | NPE [1] | WVM [2] | JieP [3] | LIME [4] | HDRNet [5] | DPE [6] | Ours
NPE | 6.56 / 3.89 | 7.22 / 3.18 | 7.03 / 3.55 | 7.34 / 3.11 | 7.54 / 3.31 | 7.33 / 3.51 | 7.13 / 3.62 | 7.64 / 3.02
MEF | 6.07 / 4.27 | 7.14 / 3.59 | 6.89 / 3.84 | 7.29 / 3.51 | 7.32 / 3.71 | 7.16 / 3.63 | 7.08 / 3.76 | 7.56 / 3.37
MF | 6.36 / 3.35 | 7.11 / 3.02 | 7.14 / 3.25 | 7.23 / 3.17 | 7.49 / 3.12 | 7.19 / 3.26 | 7.03 / 3.41 | 7.74 / 2.81
LIME | 6.02 / 4.47 | 6.91 / 4.09 | 6.82 / 4.29 | 6.98 / 3.87 | 7.39 / 4.10 | 7.18 / 3.95 | 6.87 / 4.31 | 7.45 / 3.57
VV | 6.63 / 3.38 | 7.43 / 2.73 | 7.32 / 2.97 | 7.48 / 2.81 | 7.53 / 2.89 | 7.62 / 2.92 | 7.46 / 3.17 | 7.81 / 2.75
FiveK | 6.45 / 3.29 | 7.09 / 2.93 | 7.03 / 3.12 | 7.16 / 2.82 | 7.21 / 2.88 | 7.11 / 2.79 | 6.93 / 3.17 | 7.25 / 2.68
TABLE I: Quantitative comparison between our method and the state-of-the-art methods on the six employed datasets. Each cell reports DE / NIQE; higher DE and lower NIQE are better.
Fig. 15: Ratings of different methods on the six employed datasets in the user study. The ordinate axis shows the average ratings received by the methods from the subjects on each dataset. Higher ratings indicate better results.

5.3 More Analysis

Time performance. Thanks to the acceleration, our method is scalable and fast. Without image denoising, our current unoptimized Matlab implementation takes about 0.5 seconds to enhance an 800 × 1200 image, slightly slower than HDRNet [5], which enables real-time photo enhancement, but faster than most existing methods. The optional image denoising takes about 20 seconds for a 1-megapixel image.

Fig. 16: Our method allows enhancing underexposed images while preserving color constancy. (a) Input images; (b) and (c) enhancement results derived from single-channel and three-channel illumination maps, respectively.

Relationship to color constancy. Since our approach is built upon the Retinex theory, it can also be extended to preserve color constancy while enhancing underexposed photos. As shown in Fig. 16, by performing our algorithm separately on each RGB channel of the source underexposed image (effectively estimating a three-channel illumination map) and then recombining the three single-channel enhanced images, we obtain visually compelling results that preserve color constancy.

Limitations. Our method has limitations. As shown in Fig. 18, both our method and the state-of-the-art methods fail to produce visually compelling results for the test image in Fig. 18(a), since the bodies of the knight and the horse are almost black and barely have any texture or detail. Another limitation is that our method relies on users to judge whether the enhanced image requires image denoising, which may introduce extra difficulty and time cost for users to obtain the final results.

Fig. 17: More enhancement results produced by our method. Top: input underexposed images. Bottom: our results.

5.4 Additional Results

Fig. 17 shows more results produced by our method on diverse underexposed images involving various lighting conditions, including: (i) a nighttime outdoor image with an irregular light source in the center (1st column), (ii) an indoor low-light image with the objects on the desk underexposed (2nd column), (iii) an unevenly exposed image with the sky normally exposed and the building underexposed (3rd column), and (iv) an evenly underexposed image with few visible details of the dog and the grassland (4th column). As can be seen, for all these challenging cases, our method still produces reasonably good results.

6 Conclusion

We have presented a novel approach for enhancing underexposed photos. Our approach is inspired by the observation that the reason why existing methods fail to produce visually compelling results is that they break the perceptual consistency between the input image and the corresponding enhancement result. Based on this observation, we propose a simple yet effective criterion, perceptually bidirectional similarity (PBS), to explicitly characterize the perceptual consistency. With PBS and the Retinex theory, we cast underexposed photo enhancement as PBS-constrained illumination estimation optimization, where we define PBS as three constraints on the illumination, so as to obtain PBS-satisfying illumination that can recover the desired enhancement results. To enable more efficient and scalable photo enhancement, we introduce a sampling-based strategy to accelerate the illumination estimation. Moreover, we extend our method to handle videos. We have performed extensive experiments on six benchmark datasets and compared our method with various state-of-the-art methods, showing the superiority of our approach in terms of visual comparison, quantitative comparison and a user study.

In the future, we will explore the possibility of adopting techniques from scene semantic analysis and photographic image synthesis to handle mostly black regions. Another direction is to unify enhancement and denoising by incorporating noise removal into the illumination estimation, such that the estimated illumination can directly recover enhancement results with noise suppressed.

Fig. 18: Failure case. (a) Input; (b) LIME [4]; (c) DPE [6]; (d) ours. Our method, like the other state-of-the-art methods, fails to handle purely black regions.

References

  • [1] S. Wang, J. Zheng, H.-M. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3538–3548, 2013.
  • [2] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” in

    Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit.

    , 2016, pp. 2782–2790.
  • [3] B. Cai, X. Xu, K. Guo, K. Jia, B. Hu, and D. Tao, “A joint intrinsic-extrinsic prior model for retinex,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 4000–4009.
  • [4] X. Guo, Y. Li, and H. Ling, “Lime: Low-light image enhancement via illumination map estimation,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 982–993, 2017.
  • [5] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, “Deep bilateral learning for real-time image enhancement,” ACM Trans. Graph., vol. 36, no. 4, p. 118, 2017.
  • [6] Y.-S. Chen, Y.-C. Wang, M.-H. Kao, and Y.-Y. Chuang, “Deep photo enhancer: Unpaired learning for image enhancement from photographs with gans,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6306–6314.
  • [7] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Comput. Vis., Graph., Image Process., vol. 39, no. 3, pp. 355–368, 1987.
  • [8] K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics gems IV, 1994, pp. 474–485.
  • [9] M. Abdullah-Al-Wadud, M. H. Kabir, M. A. A. Dewan, and O. Chae, “A dynamic histogram equalization for image contrast enhancement,” IEEE Trans. Consum. Electron., vol. 53, no. 2, 2007.
  • [10] F. Drago, K. Myszkowski, T. Annen, and N. Chiba, “Adaptive logarithmic mapping for displaying high contrast scenes,” Comput. Graph. Forum, vol. 22, no. 3, pp. 419–426, 2003.
  • [11] E. P. Bennett and L. McMillan, “Video enhancement using per-pixel virtual exposures,” ACM Trans. Graph., vol. 24, no. 3, pp. 845–852, 2005.
  • [12] Q. Shan, J. Jia, and M. S. Brown, “Globally optimized linear windowed tone mapping,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 4, pp. 663–675, 2010.
  • [13] L. Yuan and J. Sun, “Automatic exposure correction of consumer photographs,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 771–785.
  • [14] E. H. Land, “The retinex theory of color vision,” Sci. Am., vol. 237, no. 6, pp. 108–129, 1977.
  • [15] V. Bychkovsky, S. Paris, E. Chan, and F. Durand, “Learning photographic global tonal adjustment with a database of input/output image pairs,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2011, pp. 97–104.
  • [16] S. J. Hwang, A. Kapoor, and S. B. Kang, “Context-based automatic local image enhancement,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 569–582.
  • [17] J. Yan, S. Lin, S. Bing Kang, and X. Tang, “A learning-to-rank approach for image color enhancement,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2014, pp. 2987–2994.
  • [18] Y. Hu, H. He, C. Xu, B. Wang, and S. Lin, “Exposure: A white-box photo post-processing framework,” ACM Trans. Graph., vol. 37, no. 2, p. 26, 2018.
  • [19] J. A. Stark, “Adaptive image contrast enhancement using generalizations of histogram equalization,” IEEE Trans. Image Process., vol. 9, no. 5, pp. 889–896, 2000.
  • [20] T. Celik and T. Tjahjadi, “Contextual and variational contrast enhancement,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3431–3441, 2011.
  • [21] C. Lee, C. Lee, and C.-S. Kim, “Contrast enhancement based on layered difference representation of 2d histograms,” IEEE Trans. Image Process., vol. 22, no. 12, pp. 5372–5384, 2013.
  • [22] Q. Zhang, Y. Nie, L. Zhang, and C. Xiao, “Underexposed video enhancement via perception-driven progressive fusion,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 6, pp. 1773–1785, 2016.
  • [23] D. J. Jobson, Z.-u. Rahman, and G. A. Woodell, “A multiscale retinex for bridging the gap between color images and the human observation of scenes,” IEEE Trans. Image Process., vol. 6, no. 7, pp. 965–976, 1997.
  • [24] X. Fu, D. Zeng, Y. Huang, Y. Liao, X. Ding, and J. Paisley, “A fusion-based enhancing method for weakly illuminated images,” Signal Process., vol. 129, pp. 82–96, 2016.
  • [25] Z. Yan, H. Zhang, B. Wang, S. Paris, and Y. Yu, “Automatic photo adjustment using deep neural networks,” ACM Trans. Graph., vol. 35, no. 2, p. 11, 2016.
  • [26] K. G. Lore, A. Akintayo, and S. Sarkar, “Llnet: A deep autoencoder approach to natural low-light image enhancement,” Pattern Recogn., vol. 61, pp. 650–662, 2017.
  • [27] Q. Zhang, G. Yuan, C. Xiao, L. Zhu, and W.-S. Zheng, “High-quality exposure correction of underexposed photos,” in Proc. ACM Int. Conf. Multimedia, 2018, pp. 582–590.
  • [28] R. Grosse, M. K. Johnson, E. H. Adelson, and W. T. Freeman, “Ground truth dataset and baseline evaluations for intrinsic image algorithms,” in Proc. IEEE Int. Conf. Comput. Vis., 2009, pp. 2335–2342.
  • [29] J. Shen, X. Yang, Y. Jia, and X. Li, “Intrinsic images using optimization,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2011, pp. 3481–3487.
  • [30] P.-Y. Laffont, A. Bousseau, and G. Drettakis, “Rich intrinsic image decomposition of outdoor scenes from multiple views,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 2, pp. 210–224, 2013.
  • [31] L. Xu, Q. Yan, Y. Xia, and J. Jia, “Structure extraction from texture via relative total variation,” ACM Trans. Graph., vol. 31, no. 6, p. 139, 2012.
  • [32] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” ACM Trans. Graph., vol. 27, no. 3, p. 67, 2008.
  • [33] L. Xu, C. Lu, Y. Xu, and J. Jia, “Image smoothing via l0 gradient minimization,” ACM Trans. Graph., vol. 30, no. 6, p. 174, 2011.
  • [34] S. Bi, X. Han, and Y. Yu, “An l1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition,” ACM Trans. Graph., vol. 34, no. 4, p. 78, 2015.
  • [35] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
  • [36] C.-J. Lin, “Projected gradient methods for nonnegative matrix factorization,” Neural Comput., vol. 19, no. 10, pp. 2756–2779, 2007.
  • [37] J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele, “Joint bilateral upsampling,” ACM Trans. Graph., vol. 26, no. 3, p. 96, 2007.
  • [38] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images.” in Proc. IEEE Int. Conf. Comput. Vis., vol. 98, no. 1, 1998, p. 2.
  • [39] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Color image denoising via sparse 3d collaborative filtering with grouping constraint in luminance-chrominance space,” in Proc. IEEE Int. Conf. Image Process., 2007, pp. 313–316.
  • [40] K. Ma, K. Zeng, and Z. Wang, “Perceptual quality assessment for multi-exposure image fusion,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3345–3356, 2015.
  • [41] Z. Ye, H. Mohamadian, and Y. Ye, “Discrete entropy and relative entropy study on nonlinear clustering of underwater and arial images,” in Proc. IEEE Int. Conf. Control Appl., 2007, pp. 313–318.
  • [42] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, 2013.