1 Introduction
With the popularization of readily available cameras on cell phones, photo sharing on social networks (e.g., Instagram and Facebook) has become a trendy lifestyle. However, capturing well-exposed photos under complex lighting conditions (e.g., low light and backlight) remains a challenge for casual photographers. Hence, underexposed photos are inevitably created (see Fig. 1(a) for an example). Because of their low detail visibility and dull colors, these photos not only look unpleasing, but also fail to capture what the user intended. Thus, underexposed photo enhancement is usually required to improve the detail visibility and visual appeal of such photos.
Underexposed photo enhancement is a challenging task, since it is highly nonlinear and subjective. Commercial software such as Adobe Lightroom and Photoshop allows users to interactively retouch photos, but it remains largely inscrutable to non-experts and typically requires a tedious process of balancing multiple controls (e.g., brightness, contrast, sharpness and saturation). Easier-to-use alternatives such as the Auto Enhance feature on iPhone and the Auto Tone feature in Lightroom enhance underexposed photos with a single click. However, they may fail to produce high-quality results due to the inherent difficulty of automatically balancing all the factors involved in the adjustment, as shown in Fig. 1(b) and (c).
Researchers have also developed numerous algorithms to tackle this problem. Early approaches work by performing histogram equalization [7, 8, 9], or by designing intensity mapping functions [10, 11, 12, 13], while many subsequent approaches [1, 2, 3, 4] rely on the Retinex model [14] to enhance photos. Others learn data-driven photo adjustment by utilizing either traditional machine learning techniques [15, 16, 17] or deep neural networks [5, 18, 6]. However, as demonstrated in Fig. 2, previous methods still have their respective limitations, e.g., unclear details, local overexposure and color distortion, which cause them to fail to produce visually pleasing results.
To address the limitations of previous methods, this paper presents a novel method for enhancing underexposed photos. Our key observation is that previous methods may produce visually unpleasing results mainly because they break the perceptual consistency between the input image and the enhancement result. Based on this observation, we propose a simple yet effective criterion, called perceptually bidirectional similarity (PBS), for explicitly describing how to preserve the perceptual consistency. With the proposed PBS, we adopt the Retinex theory and formulate photo enhancement as a PBS-constrained illumination estimation optimization, where we solve for the illumination under three constraints characterized by the PBS, so as to recover high-quality results free of the artifacts encountered by existing methods. A sampling-based strategy is also described to accelerate the illumination estimation and allow more efficient and scalable photo enhancement. Moreover, we adapt our method to handle underexposed videos by enforcing the illumination to transition smoothly over time among neighboring frames.
The major contributions of this work are as follows:

First, we propose PBS and design a PBS-constrained illumination estimation optimization for enhancing underexposed photos, which robustly produces high-quality results.

Second, we introduce a sampling-based strategy for accelerating the illumination estimation.

Third, we extend our method to enhance underexposed videos.

Last, we evaluate our method on six datasets and compare it with various state-of-the-art methods. Results show that our method outperforms previous methods, both qualitatively and quantitatively.
2 Related Work
Our work mainly relates to photo enhancement, which is a long-standing problem with an immense literature. In this section, we focus on discussing the related work from the following four aspects rather than trying to be exhaustive.
Histogram-based methods. The histogram is an important representation for photo enhancement. One of the most widely adopted techniques is histogram equalization (HE), which enlarges image contrast by spreading the intensity histogram over the entire dynamic range. However, it tends to yield unrealistic results because it ignores the relationship between neighboring pixels. Later, many variants of HE were developed to improve the results [7, 8, 19, 9, 20, 21], which basically follow the idea of dividing the global histogram into local histograms. As the optimal histogram partition strategy varies with images and is typically unpredictable, they may also produce unsatisfactory results.
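As a point of reference for the discussion above, global HE can be sketched in a few lines. This is a minimal, dependency-free illustration of the textbook technique and not the implementation of any of the cited methods; the function name `equalize_histogram` and the flat-list image representation are assumptions made for this example.

```python
def equalize_histogram(pixels, levels=256):
    """Global histogram equalization for a flat list of integer intensities.

    Illustrative sketch only: `pixels` holds values in [0, levels - 1].
    """
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # Cumulative distribution of the intensity histogram.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to spread out.
        return list(pixels)

    # Map each intensity so that the CDF becomes approximately linear.
    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[v] - cdf_min) * scale) for v in pixels]


dark = [50, 50, 51, 52]
stretched = equalize_histogram(dark)
```

Because every pixel is remapped from the global histogram alone, two pixels with the same intensity always receive the same output regardless of their neighborhoods, which is the weakness noted above.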
Sigmoid-mapping-based methods. Mapping pixel intensities with sigmoid functions is another commonly used way to enhance photos. A well-known representative is Gamma Correction, which expands the dynamic range via a power-law function. As globally applying sigmoid mapping may generate visually distorted results, existing methods usually perform locally adaptive mapping. For instance, Bennett and McMillan [11] decomposed the input image into base and detail layers, and applied different mappings to the two layers to preserve image details, while Yuan and Sun [13] segmented the image into subregions and computed a luminance-aware, detail-preserving mapping for each subregion. Zhang et al. [22] created multiple tone-mapped versions of the input image and fused them into a well-exposed image. Since finding locally optimal sigmoid mappings and ensuring globally smooth transitions are difficult, these methods often fail on complex images.
Retinex-based methods. This kind of method is built upon the assumption that an underexposed image is the pixelwise product of the expected enhancement result and a single-channel illumination map. In this fashion, the enhancement problem can be treated as an illumination estimation problem. Jobson et al. [23] made an early attempt at this problem, but their results often look unnatural due to frequently occurring artifacts such as loss of details, color distortion, and uneven exposure. Subsequent methods in this category focus on improving the results [1, 24, 2, 3, 4]. However, they may also fail, especially for non-uniformly illuminated underexposed images. Our method also belongs to this category. However, by maintaining the proposed PBS, our method is able to robustly generate visually pleasing results free of the visual artifacts encountered by previous methods (see Fig. 4, Fig. 13 and Fig. 14).
Learning-based methods. An increasing amount of effort has been devoted to learning-based enhancement methods since the pioneering work of Bychkovsky et al. [15], which provides the MIT-Adobe FiveK dataset, the first and largest dataset consisting of input/output image pairs for tone adjustment. Yan et al. [17] achieved automatic color enhancement by tackling a learning-to-rank problem, while Yan et al. [25] enabled semantic-aware image enhancement. Recently, Lore et al. [26] presented a deep autoencoder-based approach for enhancing low-light images. Gharbi et al. [5] proposed bilateral learning to enable real-time image enhancement, while Chen et al. [6] designed an unpaired learning model for image enhancement based on a two-way generative adversarial network (GAN). The main limitation of learning-based methods is that they typically do not generalize well to images unlike those in the training datasets.
A preliminary version of this work appeared in [27]. In this paper, we have significantly extended the earlier conference version in five aspects. First, we present a sampling-based strategy for accelerating the illumination estimation. Second, we introduce an optional image denoising operation for removing possible noise in the enhanced image. Third, we extend our method to enhance underexposed videos. Fourth, we provide a deeper analysis of the properties and potential of the proposed PBS, and of the effect of the regularization parameters. Fifth, we have conducted extensive experiments to evaluate the advantages of our method, including further comparisons with more recent methods and evaluations on an additional dataset. Our code will be made publicly available at http://zhangqinghome.net/.
3 Methodology
This section presents the proposed underexposed photo enhancement approach. For completeness, we first summarize the background knowledge for Retinex-based image enhancement in Section 3.1. Section 3.2 introduces the proposed PBS and analyzes how to characterize it as specific constraints on illumination, while Section 3.3 formulates the enhancement problem as PBS-constrained illumination estimation and derives an ADMM-based procedure for solving the resulting non-convex optimization problem. Then, the implementation details and parameter settings are given in Section 3.4. Next, we describe a sampling-based strategy for accelerating the illumination estimation in Section 3.5, and an optional image denoising step for removing possible noise in the enhanced image in Section 3.6.



3.1 Background
Retinex-based image enhancement is built upon the following image formation model, which assumes that an underexposed image $I$ (normalized to [0,1]) is the pixelwise product of the desired enhanced image $I'$ and a single-channel illumination map $L$:
$$ I = I' \odot L \qquad (1) $$
where $\odot$ denotes pixelwise multiplication. With this model, the enhancement problem can be reduced to an illumination estimation problem, since once $L$ is known, we can recover the enhanced image by $I' = I / L$, where the division is pixelwise. Note that the model in Eqn. (1) is essentially different from that of intrinsic image decomposition [29, 30], which has a similar formulation and aims to separate an image into the pixelwise product of a reflectance/albedo component and an illumination/shading component. As demonstrated in Fig. 3, the reflectance component describes the inherent material property and usually loses visual realism, while our desired enhanced image in Eqn. (1) is a natural image with improved detail visibility and visual appeal, as shown in Fig. 3(c).
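The recovery step implied by the image formation model can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `recover_enhanced`, the nested-list image layout, the small floor `eps` on the illumination, and the final clipping to [0,1] are all choices made for this example.

```python
def recover_enhanced(image, illumination, eps=1e-3):
    """Divide an RGB image by a single-channel illumination map, pixelwise.

    image:        H x W x 3 nested lists with values in [0, 1]
    illumination: H x W nested lists with values in (0, 1]
    eps:          assumed floor that guards against division by zero
    """
    enhanced = []
    for img_row, ill_row in zip(image, illumination):
        out_row = []
        for pixel, ill in zip(img_row, ill_row):
            denom = max(ill, eps)
            # Clipping to 1.0 is an assumption of this sketch.
            out_row.append([min(c / denom, 1.0) for c in pixel])
        enhanced.append(out_row)
    return enhanced


dark_image = [[[0.1, 0.2, 0.05]]]   # a single dark pixel
illum_map = [[0.25]]                # its estimated illumination
bright_image = recover_enhanced(dark_image, illum_map)
```

Since the same illumination value divides all three channels of a pixel, the ratios between the channels are preserved as long as no channel is clipped.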
3.2 Perceptually Bidirectional Similarity (PBS)
Here, we introduce the PBS and elaborate how we characterize it as constraints on the illumination map in Eqn. (1).
Before introducing PBS, we first summarize the common issues encountered by existing underexposed photo enhancement methods, which inspired the proposal of PBS. As shown in Fig. 4(b)-(g), color distortion, uneven exposure and loss of detail are the three main issues. More concretely, CLAHE [8] and NPE [1] distort the skin color and mistakenly make the girl's face and arms gray, giving rise to a color family mismatch between the input and the output. Yuan and Sun [13] and WVM [2] induce exposure inconsistency around the arms and the body, although these regions have consistent exposure in the input. Bennett and McMillan [11] and LIME [4] overexpose the background and lead to significant loss of detail.
Key observation. From the above analysis, we make an important observation: existing methods fail to produce visually pleasing results mainly because they break the bidirectional perceptual consistency in color, detail and local exposure distribution between the input and the enhanced output. Intuitively, this observation suggests that a good enhanced image should not only recover clear details from the underexposed regions, but also satisfy two properties: 1) it should contain all the visual information in the input image; 2) it should not introduce new visual artifacts that were not in the input image.
PBS definition. Based on the observation, we propose PBS, which more specifically characterizes the aforementioned two requirements for the expected enhanced image of the input underexposed image : 1) colors and details in underexposed regions of should all exist in as properly enhanced versions, and regions in with consistent exposure should also have consistent exposure in ; 2) should not contain distorted colors, additional details and local exposure inconsistencies that originally do not exist in . To utilize the PBS, we define it as three numerical constraints on illumination below, which help ensure the bidirectional consistency of color, detail and exposure distribution between and , respectively.
Color consistency. To preserve the color consistency, we enforce each pixel’s color in and are in the same color family by imposing a range constraint on . Since and is normalized to [0,1], small (large) yields with high (low) RGB values. Intuitively, color inconsistency may appear in terms of mismatched colors in derived from naive color truncation, when is too small to guarantee that each RGB color channel in the enhanced image remains in the color gamut [0,1]. Hence, we bound at each pixel is no less than a value that can enlarge the maximum RGB color channel of the corresponding pixel in to 1 through , which is expressed as
(2) 
where is a color channel at pixel . is the Gamma function with , which is an optional operation for further illumination adjustment. From Eqn. (2), we can easily obtain . To avoid mistakenly darkening the input image, we set the upper bound of to 1, in which case the input is directly taken as the output. Overall, for each pixel , the constraint for color consistency can be defined as .
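The range constraint described above can be illustrated with a short sketch. It assumes that the Gamma mapping raises the illumination to a power `gamma` before the division, so that keeping every channel inside [0,1] requires the illumination at a pixel to be at least the maximum RGB channel raised to `1 / gamma`; this reading of Eqn. (2), as well as the names `illumination_bounds` and `clamp_illumination`, are assumptions of this example rather than the paper's exact formulation.

```python
def illumination_bounds(pixel, gamma=1.0):
    """Return (lower, upper) bounds for the illumination at one pixel.

    Assumed model: enhanced_c = pixel_c / illumination ** gamma, which stays
    within [0, 1] whenever illumination >= max(pixel) ** (1 / gamma).
    """
    lower = max(pixel) ** (1.0 / gamma)
    upper = 1.0  # an illumination of 1 leaves the input unchanged
    return lower, upper


def clamp_illumination(value, pixel, gamma=1.0):
    """Project a candidate illumination value onto the allowed range."""
    lower, upper = illumination_bounds(pixel, gamma)
    return min(max(value, lower), upper)


pixel = (0.2, 0.5, 0.1)
too_small = clamp_illumination(0.3, pixel)   # raised to the lower bound
too_large = clamp_illumination(1.4, pixel)   # lowered to the upper bound
```

A value below the lower bound would push the largest channel past 1 and force truncation, which is the source of the mismatched colors discussed above.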




Detail consistency. To facilitate understanding, we reformulate the detail consistency described by PBS from a perspective of edge consistency as follows: 1) If is smooth at pixel , then should also be smooth at ; 2) If has an edge at pixel , then should have a stronger, or at least equivalent edge at . By associating edge with gradient and directional derivative, the above two cases can be characterized as the following constraint:
(3) 
where denotes the gradient operator. is the first-order derivative along the horizontal (x) or vertical (y) direction. is a small constant (typically 1e-5) for determining whether there is an edge at a pixel in the input image . Note that Eqn. (3) can also be expressed in terms of by replacing with .
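The two cases of the detail consistency requirement can be checked numerically along a single scanline. The sketch below is a hypothetical helper, not part of the paper: it uses forward differences, and it assumes that the same threshold `eps` decides smoothness in both the input and the output.

```python
def detail_violations(input_row, output_row, eps=1e-5):
    """Return positions where the output breaks the edge-consistency rules.

    Rule 1: where the input is smooth (|d_in| < eps), the output must be too.
    Rule 2: where the input has an edge, the output edge must be at least
            as strong (|d_out| >= |d_in|).
    """
    violations = []
    for i in range(len(input_row) - 1):
        d_in = abs(input_row[i + 1] - input_row[i])
        d_out = abs(output_row[i + 1] - output_row[i])
        if d_in < eps:
            ok = d_out < eps
        else:
            ok = d_out >= d_in
        if not ok:
            violations.append(i)
    return violations


scanline = [0.1, 0.1, 0.2]
good = detail_violations(scanline, [0.3, 0.3, 0.6])    # edges amplified
bad = detail_violations(scanline, [0.3, 0.4, 0.45])    # new edge, weak edge
```

In the second call the output introduces an edge where the input was flat and weakens the existing edge, so both positions are reported.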
Exposure distribution consistency. According to Eqn. (1), the key to preserving the exposure distribution consistency is to ensure that is locally smooth for regions with similar illumination in the input. To this end, we alternatively adopt the relative total variation (RTV) measure [31] as the smoothness regularizer for obtaining piecewise smooth illumination, while maintaining the prominent illumination discontinuities across regions. Adopting this regularizer can also help enhance image contrast, because when adjacent pixels and have similar illumination values (), their contrast in the enhanced image can be estimated as , which will be definitely enlarged, since . Note that other edgeaware smoothness regularizers [32, 33, 34] can also work with our approach. Formally, the RTV measure is defined as
(4) 
where and denote the x- and y-direction RTV measures, respectively. Specifically, the x-direction measure is written as
(5) 
where denotes a window centered at pixel . and , where denotes the Gaussian kernel with standard deviation , is the convolution operator, and is a small constant fixed to 1e-3 for preventing division by zero. The y-direction measure is defined similarly, so we do not give its definition separately.
3.3 PBS-constrained Illumination Estimation
This section illustrates how we cast underexposed photo enhancement as a PBS-constrained illumination estimation problem. We first introduce how to obtain an initial illumination map from the input image. Then, we adopt the PBS constraints and design an optimization framework for refining the initial illumination map, so that we can obtain the desired PBS-satisfying illumination. Finally, we describe an ADMM-based solver for the optimization.
Initial illumination extraction. Intuitively, the brightness of different areas in an underexposed image roughly reflects the magnitude of illumination. Hence, inspired by [14], we compute the initial illumination map $\hat{L}$ by treating the maximum value among the RGB color channels of the input $I$ at each pixel $p$ as the illumination value, which is expressed as
$$ \hat{L}(p) = \max_{c \in \{r, g, b\}} I_c(p) \qquad (6) $$
As analyzed by [4], by this means, the initial illumination can better model the global illumination distribution, and ensures that the enhanced image will be less saturated since it is recovered by , avoiding enlarging color channels in to one. However, though the initial illumination map can act as a robust estimate of the global illumination distribution, naively recovering the enhanced image from it typically produces unrealistic results, as shown in Fig. 6(c). Hence, we further devise a PBS-constrained illumination estimation optimization for refining the initial illumination, so that we can recover visually compelling results from the refined illumination.
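The initial illumination extraction can be sketched directly from its description: the per-pixel maximum over the RGB channels. The helper name `initial_illumination` and the nested-list image layout are assumptions of this example.

```python
def initial_illumination(image):
    """Per-pixel maximum over the RGB channels.

    image: H x W x 3 nested lists with values in [0, 1]
    returns: H x W nested lists holding the initial illumination map
    """
    return [[max(pixel) for pixel in row] for row in image]


photo = [
    [[0.10, 0.20, 0.05], [0.30, 0.10, 0.10]],
    [[0.02, 0.02, 0.04], [0.50, 0.60, 0.40]],
]
illum0 = initial_illumination(photo)

# Dividing a pixel by its own maximum channel drives that channel to 1,
# which hints at why using this map without refinement looks unrealistic.
naive = [c / illum0[0][0] for c in photo[0][0]]
```

Here `naive` has its largest channel equal to 1 for the chosen pixel, and the same would hold for every non-black pixel of the image.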
Objective function. Intuitively, the ideal illumination map should simultaneously preserve the global illumination distribution characterized by and satisfy the PBS constraints. Hence, we define the following objective function for estimating the desired illumination :
(7) 
where is the balancing weight. The first term forces the target illumination to be close to the initial illumination in structure, while the second term and the other two constraints are the PBS constraints.
ADMM solver. The objective function in Eqn. (7) involves an intractable nonconvex energy minimization. To obtain its solution, we derive a solver based on the alternating direction method of multipliers (ADMM) technique [35].
Before describing the details, we first convert the image formation model in Eqn. (1) to the log domain, so that we can reduce the division operation in Eqn. (7) to a more tractable subtraction. Let , , and . Eqn. (1) is then written as . The color and detail consistency constraints can be accordingly expressed as and
(8) 
For the exposure distribution consistency constraint, we can simply replace with in Eqn. (4). Since for any variable , similar to [2], we multiply the numerator and denominator in the second line of Eqn. (8) by and to eliminate the impact of the scaling weight. Note that, is seen as a constant here, since it can be estimated from previous iteration. With the logtransformation, the objective function in Eqn. (7) can be written in a matrix form as follows
(9) 
where , and are vector representations of , and . and are diagonal matrices with and . and are the Toeplitz matrices from the discrete gradient operators with forward difference. is a binary matrix indicating whether a pixel in the input satisfies . and are diagonal matrices consisting of the constant parts in the second line of Eqn. (8).
To apply ADMM, we rewrite the minimization problem in Eqn. (9) as the following equivalent form:
(10) 
where , and . and are defined similarly. , and are auxiliary variables for making the original problem separable. The augmented Lagrangian function of Eqn. (10) is then written as
(11) 
where , and are the Lagrangian multipliers, is the penalty parameter. computes the standard inner product. The problem in Eqn. (11) can be further divided into the following subproblems with respect to , , and , respectively:
(12a)  
(12b)  
(12c)  
(12d) 
where denotes the th iteration. By iteratively solving each subproblem while fixing others until convergence, we can obtain the solution to Eqn. (10). Specifically, we first obtain by solving the subproblem in Eqn. (12a). With , we then compute , and by:
(13) 
where projects entries that satisfy to zero. ensures that other entries () are no less than 1. The Lagrangian multipliers are updated by:
(14) 
where is the relaxation parameter. The whole ADMM procedure for the illumination estimation optimization is summarized in Algorithm 1.
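To make the structure of such an iteration concrete, the sketch below runs ADMM on a deliberately simple toy problem: a least-squares fit to a target vector under box constraints, split into a smooth subproblem, a projection, and a multiplier update. It is not the solver of Eqn. (10); the toy objective, the scaled-multiplier form, the omission of the relaxation parameter, and the name `admm_box_fit` are all assumptions of this example. The stopping rule reuses the threshold of 1e-3 and the cap of 20 iterations reported in Section 3.4.

```python
def admm_box_fit(a, lo, hi, rho=1.0, tol=1e-3, max_iter=20):
    """Toy ADMM: minimize 0.5 * ||x - a||^2 subject to lo <= x <= hi.

    The constraint is moved onto an auxiliary variable z with x = z, and u
    is the scaled Lagrangian multiplier.
    """
    n = len(a)
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(max_iter):
        # x-update: closed-form minimizer of the quadratic subproblem.
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: projection onto the box constraint.
        z_new = [min(max(x[i] + u[i], lo), hi) for i in range(n)]
        # Multiplier update (scaled form).
        u = [u[i] + x[i] - z_new[i] for i in range(n)]
        # Stop when consecutive solutions barely differ.
        change = max(abs(z_new[i] - z[i]) for i in range(n))
        z = z_new
        if change < tol:
            break
    return z


solution = admm_box_fit([2.0, 0.5, -1.0], lo=0.0, hi=1.0)
```

For this toy problem the exact answer is the target clipped to the box, which makes it easy to confirm that the iteration behaves as expected before moving to a harder objective.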
3.4 Implementation and Parameter Setting
We employ the projected gradient descent method [36] to solve the subproblem in Eqn. (12a). The key parameter of our approach is , which determines the smoothness level of the estimated illumination map. In general, we set large for highly textured images. is another parameter that affects the result. In all our experiments, we empirically set and , which produce reasonably good results for our test images. The final enhanced image is computed by . Fig. 5 shows two examples.
Effectiveness of each PBS constraint. Fig. 7 demonstrates the effectiveness of each PBS constraint. We can see that the skin color of the girl is obviously distorted when we remove the color consistency constraint (Fig. 7(b)), while removing the detail consistency constraint overexposes the grass and the face and arm of the girl (Fig. 7(c)). Without the exposure distribution consistency constraint, the enhanced image shows disturbing exposure inconsistency around the body of the girl (Fig. 7(d)), while these regions have similar exposure level in the input image. Last, by combining all the three PBS constraints, we achieve the visually best result with clear details, vivid color, distinct contrast and normal exposure distribution, as shown in Fig. 7(e).
Effect of varying parameters. Fig. 8 evaluates the effect of varying and . As shown in the first row, larger produces result with stronger local contrast. However, this effect becomes less obvious when . As larger typically requires more iterations to converge, we fix as a tradeoff. The second row of Fig. 8 shows how affects the results. We can see that the result without Gamma mapping (namely ) is also satisfactory, but too bright to be consistent with the image aesthetics. Decreasing can reduce the overall brightness, but at the cost of lowering the overall visibility. To obtain better visual results, we set for all our tested images.
Convergence analysis. Algorithm 1 stops iterating when: (i) the difference between two consecutive solutions is less than a small threshold (1e-3), or (ii) the maximum number of iterations (empirically set to 20) is reached. We have experimentally found that our algorithm converges quickly when and , usually within 5-10 iterations. Fig. 9 plots the convergence curve of our algorithm for an example image. As shown, the illumination estimation optimization converges after 7 iterations, and more iterations barely improve the result.
3.5 Acceleration
The illumination estimation optimization described in Section 3.3 is relatively fast, but its naive application to underexposed photo enhancement would be computationally expensive for high-resolution images. This section describes a sampling-based strategy to enable more efficient photo enhancement, even for high-resolution images.
Because the illumination in natural images is piecewise smooth, the main idea behind our acceleration method is to estimate the illumination on a low-resolution version of the input image, and then upsample the estimated low-resolution illumination to full resolution for photo enhancement. To this end, we first downsample the source image so that its larger dimension (width or height) is no more than 400 pixels, and perform illumination estimation optimization on the downsampled image. Then, we employ joint bilateral upsampling (JBU) [37] to upsample the obtained low-resolution illumination map to a full-resolution version in an edge-aware manner, which is expressed as
(15) 
where is the initial illumination (full-resolution) obtained from Eqn. (6). and denote coordinates of pixels in and , while and denote coordinates of pixels in the low-resolution solution . and are spatial and range filter kernels in terms of truncated Gaussian (see [38] for details) with standard deviation and , respectively. denotes a window centered at pixel . is the normalizing factor that sums the filter weight .
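The sampling strategy can be sketched in two parts: choosing the low-resolution size, and upsampling the low-resolution illumination under the guidance of the full-resolution initial illumination. The code below is a simplified illustration under several assumptions: plain (untruncated) Gaussian kernels, nearest-sample correspondence between resolutions, hypothetical default values for `radius`, `sigma_s` and `sigma_r`, and invented function names. It is not the implementation of [37] or of this paper.

```python
import math


def target_size(width, height, max_dim=400):
    """Scale the image so that its larger side is at most max_dim pixels."""
    scale = min(1.0, max_dim / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def joint_bilateral_upsample(low, guide, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Upsample `low` (h x w) to the size of `guide` (H x W), edge-aware.

    guide: full-resolution single-channel guidance with values in [0, 1].
    """
    h, w = len(low), len(low[0])
    H, W = len(guide), len(guide[0])
    sy, sx = H / h, W / w
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            ly, lx = y / sy, x / sx          # position in low-res coordinates
            cy, cx = int(ly), int(lx)
            num = den = 0.0
            for qy in range(max(0, cy - radius), min(h, cy + radius + 1)):
                for qx in range(max(0, cx - radius), min(w, cx + radius + 1)):
                    # Full-resolution pixel matching this low-res sample.
                    gy = min(H - 1, int(qy * sy))
                    gx = min(W - 1, int(qx * sx))
                    d2 = (qy - ly) ** 2 + (qx - lx) ** 2
                    r = guide[y][x] - guide[gy][gx]
                    weight = (math.exp(-d2 / (2 * sigma_s ** 2))
                              * math.exp(-r * r / (2 * sigma_r ** 2)))
                    num += weight * low[qy][qx]
                    den += weight
            out[y][x] = num / den            # normalize by the weight sum
    return out


low_res = [[0.2, 0.8]]
guidance = [[0.2, 0.2, 0.8, 0.8]]
full_res = joint_bilateral_upsample(low_res, guidance)
```

In the small example the guidance has a sharp step in the middle, so the range kernel suppresses samples from the other side of the step and the upsampled map keeps the discontinuity instead of blurring across it.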
3.6 Image Denoising
While the proposed method can robustly enhance underexposed photos, it may also amplify the underlying noise, as shown in Fig. 10(b). To further improve the visual quality, we introduce an image denoising operation as post-processing. For the sake of performance and runtime efficiency, we adopt CBM3D [39] for this task, though any other color image denoising algorithm would also work with our method. By performing image denoising, we are able to remove noise and produce visually more compelling results, as shown in Fig. 10(c). It is worth mentioning that the denoising operation is optional, since not every underexposed photo contains noise.




4 Extension to Video
Since videos involve dynamic content, naively applying our illumination estimation optimization to each video frame independently tends to produce an enhanced video with temporal inconsistencies in the form of jittering artifacts. Hence, we propose to estimate temporally coherent illumination for enhancing underexposed videos.
Let denote the frames of an input underexposed video. We first estimate the illumination map for the first frame by minimizing Eqn. (7). For each subsequent frame (), we design the following objective function to estimate its illumination :
(16) 
where denotes the main body of the objective function in Eqn. (7), while the second term enforces temporally smooth illumination transitions by constraining the current frame and its previous frame to have similar illumination values at the same spatial position. Note that the illumination of the previous frame is assumed to be known in Eqn. (16). is a parameter for balancing the contribution of the two parts in Eqn. (16). By this means, we are able to obtain a temporally coherent illumination sequence, which not only allows us to recover enhanced video with clear details, distinct contrast and vivid color (see Fig. 12), but also helps avoid jittering artifacts (see the supplementary material for our enhanced video and the result produced by the per-frame implementation of our illumination estimation optimization).
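The role of the temporal term can be illustrated with a small sketch. It assumes that the term is a sum of squared differences between the illumination of the current frame and that of the previous frame; the exact form of Eqn. (16) is not reproduced here, and the names `temporal_penalty`, `video_objective` and `weight` are hypothetical.

```python
def temporal_penalty(curr, prev):
    """Sum of squared illumination differences at the same pixel positions."""
    return sum(
        (c - p) ** 2
        for curr_row, prev_row in zip(curr, prev)
        for c, p in zip(curr_row, prev_row)
    )


def video_objective(frame_energy, curr, prev, weight):
    """Per-frame energy plus the weighted temporal smoothness term.

    frame_energy: value of the single-image objective for `curr`
    weight:       assumed balancing parameter between the two parts
    """
    return frame_energy + weight * temporal_penalty(curr, prev)


prev_illum = [[0.5, 0.4]]
curr_illum = [[0.5, 0.6]]
total = video_objective(1.0, curr_illum, prev_illum, weight=2.0)
```

A candidate illumination that flickers relative to the previous frame receives a higher total energy, so the optimizer is steered toward the temporally coherent alternative.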
5 Experiment
In this section, we present experiments to evaluate the performance of our underexposed photo enhancement method by comparing it with various state-of-the-art methods.
5.1 Datasets and Evaluation Metrics
Benchmark datasets. We employ six benchmark datasets to evaluate our method: the NPE dataset [1], the MEF dataset [40], the MF dataset [24], the LIME dataset [4], the VV dataset (https://sites.google.com/site/vonikakis/datasets) and the FiveK dataset [15]. Note that, for the FiveK dataset, we randomly select 100 underexposed images for evaluation, while the remaining 4900 images are used for training the HDRNet method [5] included in the comparison.
Evaluation metrics. We employ two commonly used metrics to quantitatively evaluate the enhancement performance. The first one is DE (discrete entropy) [41], which measures the performance of detail/contrast enhancement. The second one is NIQE (natural image quality evaluator) [42], which is a learned model for assessing the overall naturalness of images. In general, high DE values of the enhanced images mean that the detail visibility of the original underexposed images is better improved, while low NIQE values indicate that the enhanced images have good naturalness. Although this does not always hold, high DE and low NIQE values usually indicate reasonably good results.
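For intuition about the DE metric, the sketch below computes the Shannon entropy of an intensity histogram, which is the usual definition of discrete entropy. Whether [41] applies it to a grayscale conversion or to individual channels is not stated here, so the flat list of 8-bit intensities and the name `discrete_entropy` are assumptions of this example.

```python
import math


def discrete_entropy(pixels, levels=256):
    """Shannon entropy (in bits) of the intensity histogram.

    pixels: flat list of integer intensities in [0, levels - 1]
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    n = len(pixels)
    # Empty bins contribute nothing to the entropy.
    return -sum((c / n) * math.log2(c / n) for c in hist if c)


flat_image = [30] * 16                # one intensity only
two_tone = [0, 255] * 8               # two equally likely intensities
full_range = list(range(256))         # every 8-bit level used once
entropies = [discrete_entropy(p) for p in (flat_image, two_tone, full_range)]
```

For 8-bit data the value ranges from 0 for a constant image to 8 when all 256 levels are equally populated, so an enhancement that spreads intensities over more levels raises DE.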
5.2 Comparison with Stateoftheart Methods
We compare our method with six recent photo enhancement methods: NPE [1], WVM [2], JieP [3], LIME [4], HDRNet [5] and DPE [6]. The first four are Retinex-based methods, while the last two are deep-learning-based methods. For a fair comparison, we obtain the results of the compared methods either from the online demo programs or by running the implementations provided by the authors with the recommended parameter settings. Moreover, the image denoising step of our method is not performed. In the following, we conduct the comparison in three aspects: visual comparison, quantitative comparison, and a user study.
Visual comparison. We first show visual comparisons in Fig. 13 and Fig. 14 on two challenging cases from the employed datasets: (i) a non-uniformly exposed photo with dim candlelight and imperceptible scene details (from the MEF dataset), and (ii) a uniformly underexposed photo with little visible detail of the crawling baby (from the FiveK dataset). Comparing the results, we can see that our method outperforms the compared methods and has the following two advantages. First, it is able to recover more details and better contrast in the underexposed regions, without degrading other parts of the image. Second, it can reveal more vivid and natural colors, which makes our enhanced images look more realistic. Please see the supplementary material for more visual comparison results.
Quantitative comparison. Second, we quantitatively evaluate the performance of our method by comparing it with other methods in terms of the DE and NIQE metrics. Table I reports the quantitative comparison results. Note that the original average DE and NIQE values for each dataset are also shown for reference. As can be seen, all methods increase the DE value due to the detail/contrast enhancement, and reduce the NIQE value because they lighten the underexposed regions. Among them, our method achieves higher DE and lower NIQE than the other compared methods on almost all the datasets, which indicates that our method can not only recover clearer details and more distinct contrast, but also better preserve the overall naturalness and photorealism of the enhanced images.
User study. Since evaluating the visual quality of the enhanced images involves personal preference, we further conducted a user study to compare the results. To this end, we enhanced each test image in the six employed datasets using our method and the six compared methods, and recruited 100 subjects via Amazon Mechanical Turk to rate the results. Specifically, for each test image, each subject was asked to rate seven different enhancement results (ours and those of the six other methods) using a Likert scale from 1 (least favorite) to 7 (most favorite), according to the following common requirements for the desired enhancement results: (i) clear details and distinct contrast in originally underexposed regions, (ii) natural and vivid colors, (iii) no loss of detail or overexposure, (iv) no degradation of the overall photorealism. To avoid possible subjective bias, the results were anonymized and presented to the subjects in random order. After the subjects finished rating all the results, we computed the average ratings obtained by each method on the different datasets.
Fig. 15 summarizes the ratings by dataset, where we can see that our method receives higher ratings than the others, demonstrating that results generated by our algorithm are preferred by human subjects on average. We also performed a statistical analysis on the ratings by conducting paired t-tests between our method and the others. All results are statistically significant.

Dataset | Original    | NPE [1]     | WVM [2]     | JieP [3]    | LIME [4]    | HDRNet [5]  | DPE [6]     | Ours
        | DE   / NIQE | DE   / NIQE | DE   / NIQE | DE   / NIQE | DE   / NIQE | DE   / NIQE | DE   / NIQE | DE   / NIQE
NPE     | 6.56 / 3.89 | 7.22 / 3.18 | 7.03 / 3.55 | 7.34 / 3.11 | 7.54 / 3.31 | 7.33 / 3.51 | 7.13 / 3.62 | 7.64 / 3.02
MEF     | 6.07 / 4.27 | 7.14 / 3.59 | 6.89 / 3.84 | 7.29 / 3.51 | 7.32 / 3.71 | 7.16 / 3.63 | 7.08 / 3.76 | 7.56 / 3.37
MF      | 6.36 / 3.35 | 7.11 / 3.02 | 7.14 / 3.25 | 7.23 / 3.17 | 7.49 / 3.12 | 7.19 / 3.26 | 7.03 / 3.41 | 7.74 / 2.81
LIME    | 6.02 / 4.47 | 6.91 / 4.09 | 6.82 / 4.29 | 6.98 / 3.87 | 7.39 / 4.10 | 7.18 / 3.95 | 6.87 / 4.31 | 7.45 / 3.57
VV      | 6.63 / 3.38 | 7.43 / 2.73 | 7.32 / 2.97 | 7.48 / 2.81 | 7.53 / 2.89 | 7.62 / 2.92 | 7.46 / 3.17 | 7.81 / 2.75
FiveK   | 6.45 / 3.29 | 7.09 / 2.93 | 7.03 / 3.12 | 7.16 / 2.82 | 7.21 / 2.88 | 7.11 / 2.79 | 6.93 / 3.17 | 7.25 / 2.68
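The paired t-test used to analyze the user-study ratings can be sketched as follows. The ratings in the example are made-up numbers for illustration only, not data from the study, and the helper name `paired_t_statistic` is an assumption; the sketch returns the test statistic and the degrees of freedom, leaving the p-value lookup to a statistics package.

```python
import math
import statistics


def paired_t_statistic(ratings_a, ratings_b):
    """Paired t-test statistic for two rating lists over the same items.

    Returns (t, degrees_of_freedom). Assumes the paired differences are
    not all identical, so their standard deviation is non-zero.
    """
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings of two methods on the same four images.
method_a = [5, 6, 7, 6]
method_b = [4, 4, 6, 5]
t_value, dof = paired_t_statistic(method_a, method_b)
```

The test is paired because each subject rates the outputs of both methods on the same image, so the analysis works on per-image differences rather than on two independent samples.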
5.3 More Analysis
Time performance. Thanks to the acceleration, our method is scalable and fast. Without image denoising, our current unoptimized Matlab implementation takes about 0.5 seconds to enhance an 800×1200 image, which is slightly slower than [5], which enables real-time photo enhancement, but faster than most existing methods. The optional image denoising takes about 20 seconds for a 1-megapixel image.



Relationship to color constancy. Since our approach is built upon the Retinex theory, it can also be extended to preserve color constancy while enhancing underexposed photos. As shown in Fig. 16, by running our algorithm separately on each RGB channel of the source underexposed image and then recombining the three single-channel enhanced images, we obtain visually compelling results that exhibit color constancy.
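The split, enhance, and recombine structure described above can be sketched as follows. The single-channel step is only a stand-in: it divides each pixel by a crude illumination estimate (the larger of the pixel value and its 3 × 3 mean) and is not the PBS-constrained illumination estimation of our method. The function names and the parameters `gamma` and `eps` are hypothetical.

```python
def box_mean(channel, y, x):
    """Mean of the 3x3 neighbourhood around (y, x), clipped at the borders."""
    h, w = len(channel), len(channel[0])
    vals = [channel[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]
    return sum(vals) / len(vals)


def enhance_channel(channel, gamma=0.6, eps=1e-3):
    """Stand-in Retinex step: out = I / L**gamma with a crude illumination L.

    Values are expected in [0, 1]. L is NOT the PBS-constrained estimate;
    gamma and eps are hypothetical parameters chosen for illustration.
    """
    out = []
    for y, row in enumerate(channel):
        new_row = []
        for x, value in enumerate(row):
            # L >= I keeps the result at or below 1; eps avoids division by 0.
            illum = max(value, box_mean(channel, y, x), eps)
            new_row.append(min(1.0, value / illum ** gamma))
        out.append(new_row)
    return out


def enhance_per_channel(image):
    """Enhance R, G and B independently, then recombine them into one image."""
    channels = [[[pixel[c] for pixel in row] for row in image] for c in range(3)]
    r, g, b = (enhance_channel(ch) for ch in channels)
    return [[(r[y][x], g[y][x], b[y][x]) for x in range(len(image[0]))]
            for y in range(len(image))]


# Tiny hypothetical underexposed RGB image with values in [0, 1].
image = [[(0.10, 0.05, 0.02), (0.20, 0.10, 0.04)],
         [(0.05, 0.02, 0.01), (0.15, 0.08, 0.03)]]
enhanced = enhance_per_channel(image)
```

Because each channel gets its own illumination estimate, a change in one channel of the input does not affect the output of the other two.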
Limitations. Our method has limitations. As shown in Fig. 18, both our method and the state-of-the-art methods fail to produce visually compelling results for the test image in Fig. 18(a), since the bodies of the knight and the horse are almost black and contain barely any texture or detail. Another limitation is that our method relies on users to judge whether the enhanced image requires denoising, which may add difficulty and time cost to obtaining the final enhancement result.
5.4 Additional Results
Fig. 17 shows more results produced by our method, where the underexposed images are diverse and involve various lighting conditions, including: (i) a nighttime outdoor image with an irregular light source in the center of the image (1st column), (ii) an indoor low-light image with underexposed objects on the desk (2nd column), (iii) an unevenly exposed image in which the sky is normally exposed while the building is underexposed (3rd column), and (iv) an evenly exposed image with few visible details of the dog and the grassland (4th column). As can be seen, our method still produces reasonably good results in all of these challenging cases.
6 Conclusion
We have presented a novel approach for enhancing underexposed photos. Our approach is inspired by the observation that existing methods fail to produce visually compelling results because they break the perceptual consistency between the input image and the corresponding enhancement result. Based on this observation, we propose a simple yet effective criterion, perceptually bidirectional similarity (PBS), to explicitly characterize the perceptual consistency. With the PBS and the Retinex theory, we cast underexposed photo enhancement as a PBS-constrained illumination estimation optimization, in which we express the PBS as three constraints on the illumination, so as to obtain a PBS-satisfying illumination that recovers the desired enhancement result. To enable more efficient and scalable photo enhancement, we introduce a sampling-based strategy to accelerate the illumination estimation. Moreover, we extend our method to handle videos. We have performed extensive experiments on six benchmark datasets and compared our method with various state-of-the-art methods, showing the superiority of our approach in visual comparison, quantitative comparison, and a user study.
In the future, we will explore the possibility of adopting techniques in scene semantic analysis and photographic image synthesis to handle mostly black regions. Another direction is to unify enhancement and the denoising process by incorporating noise removal into the illumination estimation, such that the estimated illumination can directly recover enhancement results with noise suppressed.
References
 [1] S. Wang, J. Zheng, H.-M. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3538–3548, 2013.
 [2] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2782–2790.
 [3] B. Cai, X. Xu, K. Guo, K. Jia, B. Hu, and D. Tao, “A joint intrinsic-extrinsic prior model for retinex,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 4000–4009.
 [4] X. Guo, Y. Li, and H. Ling, “Lime: Low-light image enhancement via illumination map estimation,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 982–993, 2017.
 [5] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, “Deep bilateral learning for real-time image enhancement,” ACM Trans. Graph., vol. 36, no. 4, p. 118, 2017.
 [6] Y.-S. Chen, Y.-C. Wang, M.-H. Kao, and Y.-Y. Chuang, “Deep photo enhancer: Unpaired learning for image enhancement from photographs with gans,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6306–6314.
 [7] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Comput. Vis., Graph., Image Process., vol. 39, no. 3, pp. 355–368, 1987.
 [8] K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics gems IV, 1994, pp. 474–485.
 [9] M. Abdullah-Al-Wadud, M. H. Kabir, M. A. A. Dewan, and O. Chae, “A dynamic histogram equalization for image contrast enhancement,” IEEE Trans. Consum. Electron., vol. 53, no. 2, 2007.
 [10] F. Drago, K. Myszkowski, T. Annen, and N. Chiba, “Adaptive logarithmic mapping for displaying high contrast scenes,” Comput. Graph. Forum, vol. 22, no. 3, pp. 419–426, 2003.
 [11] E. P. Bennett and L. McMillan, “Video enhancement using per-pixel virtual exposures,” ACM Trans. Graph., vol. 24, no. 3, pp. 845–852, 2005.
 [12] Q. Shan, J. Jia, and M. S. Brown, “Globally optimized linear windowed tone mapping,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 4, pp. 663–675, 2010.
 [13] L. Yuan and J. Sun, “Automatic exposure correction of consumer photographs,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 771–785.
 [14] E. H. Land, “The retinex theory of color vision,” Sci. Am., vol. 237, no. 6, pp. 108–129, 1977.
 [15] V. Bychkovsky, S. Paris, E. Chan, and F. Durand, “Learning photographic global tonal adjustment with a database of input/output image pairs,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2011, pp. 97–104.
 [16] S. J. Hwang, A. Kapoor, and S. B. Kang, “Context-based automatic local image enhancement,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 569–582.
 [17] J. Yan, S. Lin, S. Bing Kang, and X. Tang, “A learning-to-rank approach for image color enhancement,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2014, pp. 2987–2994.
 [18] Y. Hu, H. He, C. Xu, B. Wang, and S. Lin, “Exposure: A white-box photo post-processing framework,” ACM Trans. Graph., vol. 37, no. 2, p. 26, 2018.
 [19] J. A. Stark, “Adaptive image contrast enhancement using generalizations of histogram equalization,” IEEE Trans. Image Process., vol. 9, no. 5, pp. 889–896, 2000.
 [20] T. Celik and T. Tjahjadi, “Contextual and variational contrast enhancement,” IEEE Trans. Image Process., vol. 20, no. 12, pp. 3431–3441, 2011.
 [21] C. Lee, C. Lee, and C.-S. Kim, “Contrast enhancement based on layered difference representation of 2d histograms,” IEEE Trans. Image Process., vol. 22, no. 12, pp. 5372–5384, 2013.
 [22] Q. Zhang, Y. Nie, L. Zhang, and C. Xiao, “Underexposed video enhancement via perception-driven progressive fusion,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 6, pp. 1773–1785, 2016.
 [23] D. J. Jobson, Z.-u. Rahman, and G. A. Woodell, “A multiscale retinex for bridging the gap between color images and the human observation of scenes,” IEEE Trans. Image Process., vol. 6, no. 7, pp. 965–976, 1997.
 [24] X. Fu, D. Zeng, Y. Huang, Y. Liao, X. Ding, and J. Paisley, “A fusion-based enhancing method for weakly illuminated images,” Signal Process., vol. 129, pp. 82–96, 2016.
 [25] Z. Yan, H. Zhang, B. Wang, S. Paris, and Y. Yu, “Automatic photo adjustment using deep neural networks,” ACM Trans. Graph., vol. 35, no. 2, p. 11, 2016.
 [26] K. G. Lore, A. Akintayo, and S. Sarkar, “Llnet: A deep autoencoder approach to natural low-light image enhancement,” Pattern Recogn., vol. 61, pp. 650–662, 2017.
 [27] Q. Zhang, G. Yuan, C. Xiao, L. Zhu, and W.-S. Zheng, “High-quality exposure correction of underexposed photos,” in Proc. ACM Int. Conf. Multimedia, 2018, pp. 582–590.
 [28] R. Grosse, M. K. Johnson, E. H. Adelson, and W. T. Freeman, “Ground truth dataset and baseline evaluations for intrinsic image algorithms,” in Proc. IEEE Int. Conf. Comput. Vis., 2009, pp. 2335–2342.
 [29] J. Shen, X. Yang, Y. Jia, and X. Li, “Intrinsic images using optimization,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2011, pp. 3481–3487.
 [30] P.-Y. Laffont, A. Bousseau, and G. Drettakis, “Rich intrinsic image decomposition of outdoor scenes from multiple views,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 2, pp. 210–224, 2013.
 [31] L. Xu, Q. Yan, Y. Xia, and J. Jia, “Structure extraction from texture via relative total variation,” ACM Trans. Graph., vol. 31, no. 6, p. 139, 2012.
 [32] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, “Edge-preserving decompositions for multi-scale tone and detail manipulation,” ACM Trans. Graph., vol. 27, no. 3, p. 67, 2008.
 [33] L. Xu, C. Lu, Y. Xu, and J. Jia, “Image smoothing via l0 gradient minimization,” ACM Trans. Graph., vol. 30, no. 6, p. 174, 2011.
 [34] S. Bi, X. Han, and Y. Yu, “An l1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition,” ACM Trans. Graph., vol. 34, no. 4, p. 78, 2015.
 [35] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
 [36] C.-J. Lin, “Projected gradient methods for nonnegative matrix factorization,” Neural Comput., vol. 19, no. 10, pp. 2756–2779, 2007.
 [37] J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele, “Joint bilateral upsampling,” ACM Trans. Graph., vol. 26, no. 3, p. 96, 2007.
 [38] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. IEEE Int. Conf. Comput. Vis., vol. 98, no. 1, 1998, p. 2.
 [39] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Color image denoising via sparse 3d collaborative filtering with grouping constraint in luminance-chrominance space,” in Proc. IEEE Int. Conf. Image Process., 2007, pp. 313–316.
 [40] K. Ma, K. Zeng, and Z. Wang, “Perceptual quality assessment for multi-exposure image fusion,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3345–3356, 2015.
 [41] Z. Ye, H. Mohamadian, and Y. Ye, “Discrete entropy and relative entropy study on nonlinear clustering of underwater and arial images,” in Proc. IEEE Int. Conf. Control Appl., 2007, pp. 313–318.
 [42] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, 2013.