The ability to swiftly capture high-quality images with modest computation has led to the widespread proliferation of digital images. These advantages are, however, limited to good lighting conditions; achieving similar results under low light is still a significant challenge. While much of the work in this direction has focused on enhancing weakly illuminated images [Kim(1997), Pizer et al.(1987)Pizer, Amburn, Austin, Cromartie, Geselowitz, Greer, ter Haar Romeny, Zimmerman, and Zuiderveld, Pisano et al.(1998)Pisano, Zong, Hemminger, DeLuca, Johnston, Muller, Braeuning, and Pizer, Li et al.(2018b)Li, Liu, Yang, Sun, and Guo, Li et al.(2011)Li, Wang, and Geng, Wei et al.(2018)Wei, Wang, Yang, and Liu, Guo et al.(2016)Guo, Li, and Ling], enhancement of extremely dark images has received comparatively little attention.
Recently, however, a landmark paper by Chen et al. [Chen et al.(2018)Chen, Chen, Xu, and Koltun] has shown that it is possible to restore extremely dark images captured under near-zero lux conditions. Following this work, several modifications have been proposed to improve the reconstruction quality. These include the incorporation of attention units [Ai and Kwon(2020)] and recurrent units [Cai and Kintak(2019)], the adoption of a multi-scale approach [Gu et al.(2019)Gu, Li, Gool, and Timofte, Malik and Soundararajan(2019)] and the use of deeper networks [Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li]. With this added complexity, these methods are constrained to run on desktop GPUs such as the NVIDIA RTX 2080Ti with 12GB of memory. However, real-world applications require image enhancement algorithms to run on embedded systems and edge devices with limited CPU RAM or minimal GPU capacity. One possible solution is to process the images at VGA resolution [Li et al.(2018b)Li, Liu, Yang, Sun, and Guo, Guo et al.(2016)Guo, Li, and Ling, Wei et al.(2018)Wei, Wang, Yang, and Liu, Zhang et al.(2019b)Zhang, Zhang, and Guo, Lore et al.(2017)Lore, Akintayo, and Sarkar, Wang et al.(2013)Wang, Zheng, Hu, and Li]. However, this runs counter to the current trend of capturing and processing high-definition images. Consequently, we aim to design a deep network that can restore a single extreme low-light high-definition image with minimal CPU latency and a low memory footprint, while still achieving competitive restoration quality.
We propose a deep neural network, called Low-Light Packing Network (LLPackNet), which is faster and computationally cheaper than existing solutions. Recognizing that a neural network's complexity increases quadratically with the spatial dimensions [Wang et al.(2017)Wang, Liu, and Foroosh], we perform the bulk of the computations at a much lower resolution by means of aggressive down/up-sampling operations. This is in contrast with much of the existing literature, which down/up-samples the feature maps in gradations [Zhang et al.(2018)Zhang, Tian, Kong, Zhong, and Fu, Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee, Chen et al.(2018)Chen, Chen, Xu, and Koltun, Zhang et al.(2019a)Zhang, Lin, and Sheng, Szegedy et al.(2016)Szegedy, Vanhoucke, Ioffe, Shlens, and Wojna]. For such aggressive downsampling, conventional operations such as max-pooling and strided convolution [Dumoulin and Visin(2016)] cannot be used, as they would cause a large loss of information. We therefore propose the Pack downsampling operation, which rearranges the pixels so that the spatial dimensions are reduced by a factor of α while the number of channels is increased by a factor of α², see Fig. 2. We show that the Pack operation bestows LLPackNet with an enormous receptive field, which is not trivially possible when operating directly in the HR space. We also propose the UnPack operation, which complements the Pack operation to perform large upsampling. This operation is much faster than the usual transposed convolution layer [Dumoulin and Visin(2016)] and has no learnable parameters. For upsampling, PixelShuffle [Shi et al.(2016a)Shi, Caballero, Huszár, Totz, Aitken, Bishop, Rueckert, and Wang] is another viable option, but it lacks proper correlation between the color channels and hence results in a heavy color cast in the restored image, as shown in Fig. 5. Altogether, the proposed Pack and UnPack operations allow us to operate in a much lower resolution space for computational advantages, without significantly affecting the restoration quality. See Fig. 1 for a qualitative comparison with state-of-the-art algorithms.
State-of-the-art deep learning solutions for extreme low-light image enhancement need to pre-amplify dark images before processing them [Chen et al.(2018)Chen, Chen, Xu, and Koltun, Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li, Ai and Kwon(2020), Gu et al.(2019)Gu, Li, Gool, and Timofte]. However, these methods use ground-truth knowledge to predict the amplification factor. In a real-world setting, ground-truth (GT) knowledge is unavailable, so the amplification factor cannot be estimated properly, which degrades performance. We therefore equip the proposed LLPackNet with an amplifier module that estimates the amplification factor directly from the input image histogram.
To summarize, the main contributions of this paper are as follows: 1) We propose a deep neural network architecture, called LLPackNet, that enhances a single extremely dark image at high resolution, even on a CPU, with very low latency and computational resources. 2) We propose Pack and UnPack operations for better color restoration. 3) LLPackNet estimates the amplification factor directly from the input image, without relying on ground-truth information, making it practical for real-world applications. 4) Our experiments show that, compared to existing solutions, we are able to restore high-definition, extreme low-light RAW images with 2–7× fewer model parameters, 2–3× lower memory and a 5–20× speed-up, with competitive restoration quality. Our code is available at https://github.com/MohitLamba94/LLPackNet.
2 Related Work
Low-light enhancement methods chiefly comprise histogram equalization [Kim(1997), Pizer et al.(1987)Pizer, Amburn, Austin, Cromartie, Geselowitz, Greer, ter Haar Romeny, Zimmerman, and Zuiderveld, Pisano et al.(1998)Pisano, Zong, Hemminger, DeLuca, Johnston, Muller, Braeuning, and Pizer], Retinex-based decomposition [Guo et al.(2016)Guo, Li, and Ling, Li et al.(2018b)Li, Liu, Yang, Sun, and Guo, Yu and Zhu(2017), Park et al.(2017)Park, Yu, Moon, Ko, and Paik, Li et al.(2011)Li, Wang, and Geng, Ghosh and Chaudhury(2019)] and deep learning based methods [Lee et al.(2020)Lee, Sohn, and Min, Ren et al.(2019)Ren, Liu, Ma, Xu, Xu, Cao, Du, and Yang, Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia, Cheng et al.(2019)Cheng, Yan, and Wang, Lore et al.(2017)Lore, Akintayo, and Sarkar, Wang et al.(2018)Wang, Wei, Yang, and Liu, Wei et al.(2018)Wei, Wang, Yang, and Liu, Shen et al.(2017)Shen, Yue, Feng, Chen, Liu, and Ma, Li et al.(2018a)Li, Guo, Porikli, and Pang, Zhang et al.(2020)Zhang, Liu, Ma, Zhong, Fan, and Luo]. Most of them, however, do not target extreme low-light conditions or high-resolution images. More recently, Chen et al. [Chen et al.(2018)Chen, Chen, Xu, and Koltun] proposed an end-to-end pipeline to restore extreme low-light high-definition RAW images, which has spurred several other works in this direction [Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li, Ai and Kwon(2020), Gu et al.(2019)Gu, Li, Gool, and Timofte, Malik and Soundararajan(2019), Cai and Kintak(2019), Jenicek and Chum(2019)]. Most of these methods, however, have a large processing time and memory footprint. As noted in Sec. 1, many of them also require GT information for image pre-amplification.
However, other image amplification techniques, based on the camera response function (CRF) [Ren et al.(2018)Ren, Ying, Li, and Li, Ying et al.(2017)Ying, Li, Ren, Wang, and Wang], the image histogram [Kim(1997), Pizer et al.(1987)Pizer, Amburn, Austin, Cromartie, Geselowitz, Greer, ter Haar Romeny, Zimmerman, and Zuiderveld] or other assumptions [Yang et al.(2019)Yang, Zhang, and Li, Guo et al.(2016)Guo, Li, and Ling], have been used in traditional image enhancement methods to estimate the amplification using only the input image. Borrowing from these ideas, we develop an amplifier module that uses the histogram of the input dark image to predict the amplification factor automatically, without relying on GT information. To the best of our knowledge, this has not been attempted before for deep learning based dark image enhancement.
Fast and efficient CNN models have been explored in other areas, especially image classification, but this is mostly achieved by either approximating [Rastegari et al.(2016)Rastegari, Ordonez, Redmon, and Farhadi, Wu et al.(2016)Wu, Leng, Wang, Hu, and Cheng] or pruning the learned weights [Frankle and Carbin(2019)]. In contrast, we propose a network that is inherently fast and efficient without using such weight-approximation or pruning approaches.
3 Low-Light Packing Network (LLPackNet)
We propose Low-Light Packing Network (LLPackNet) for enhancing extremely dark high-resolution single images with low time–memory complexity. We first describe the network architecture, shown in Fig. 2, in Sec. 3.1, and then analyze the important components of our network, the Pack and UnPack operations, in Sec. 3.2.
3.1 Network architecture
Figure 2: (a) LLPackNet architecture. (b) Pack/UnPack operations.
Image amplification: In general, dark images need to be pre-amplified before being enhanced. We estimate the amplification factor from the incoming RAW image by constructing a 64-bin histogram whose bins are equidistant in the log domain. This provides a finer binning resolution for lower intensities and a coarser binning resolution for higher intensities. A multilayer perceptron with just one hidden layer takes this histogram as input and estimates the amplification factor.
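The amplifier can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization of intensities to [0, 1], the floor `eps` that avoids log(0), the hidden width of 32, the ReLU activation and the exponential output (to keep the factor positive) are all hypothetical choices. The weights are random rather than trained, so the predicted factor is meaningless until the MLP is trained.

```python
import math
import random

def log_histogram(pixels, n_bins=64, eps=1e-4):
    """Normalized histogram with bin edges equally spaced in the log domain.

    Assumes intensities normalized to [0, 1]; eps is a hypothetical floor that
    avoids log(0), so values at or below eps land in the first bin.
    """
    lo, hi = math.log(eps), 0.0            # log(eps) ... log(1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for p in pixels:
        v = math.log(max(p, eps))
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp p >= 1 into the last bin
        counts[idx] += 1
    return [c / len(pixels) for c in counts]

def init_mlp(n_in=64, n_hidden=32, seed=0):
    """Random (untrained) weights for a one-hidden-layer MLP; n_hidden is an assumption."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0

def predict_amplification(hist, params):
    """Histogram -> ReLU hidden layer -> scalar; exp() keeps the factor positive."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * h for w, h in zip(row, hist)) + b) for row, b in zip(w1, b1)]
    return math.exp(sum(w * h for w, h in zip(w2, hidden)) + b2)

# Usage on a synthetic "dark" image: every pixel is below 1% of full scale.
rng = random.Random(1)
dark = [0.01 * rng.random() for _ in range(10_000)]
hist = log_histogram(dark)
amp = predict_amplification(hist, init_mlp())
amplified = [min(p * amp, 1.0) for p in dark]
```

With log-spaced bins, all of these very dark pixels still spread over roughly the lower half of the 64 bins, whereas a linearly spaced histogram would put almost all of them into the first bin; this is the finer low-intensity resolution the text refers to.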
Fast and light-weight enhancement: As discussed in Sec. 1, we want to perform most of the processing in LR space. Hence, our first step is to downsample the input image without losing any information. For this purpose, we propose the Pack α operation, which downsamples the image by a factor of α along the spatial dimensions while increasing the number of channels by a factor of α². This is illustrated in Fig. 2 (b), and pseudocode is provided in Algorithm 1. Our goal is to perform 16× downsampling, which we do in two stages. In the first stage, the Pack 2 operation separates out the red, green and blue color components lying in the 2×2 Bayer pattern [Hirakawa and Parks(2005)] of the amplified image. This halves the spatial dimensions and increases the number of channels from 1 to 4 (= 2²). Once the colors are separated into these channels, a subsequent Pack 8 operation is applied individually to each color channel, further reducing the spatial dimensions by a factor of 8 while increasing the number of channels of each color component from 1 to 64 (= 8²). Next, using a 3×3 convolution kernel, the channel dimension of each color component is reduced such that, on concatenation, the resulting feature map has only 60 channels. This channel reduction is essential to prevent parameter and memory explosion in the downstream operations. The downsampled representation is then processed by a series of convolution operations. For this purpose, we use the Residual Dense Network (RDN) [Zhang et al.(2018)Zhang, Tian, Kong, Zhong, and Fu], which consists of 3 residual dense blocks, each with 6 convolutional layers and a growth rate of 32. RDN does not perform any down/up-sampling operation, nor does it change the channel dimension of its output. The output of the RDN then needs to be upsampled, for which we use the proposed UnPack α operation, the inverse of Pack α.
UnPack 2, however, reduces the number of channels from 60 to 15 (= 60/2²), and this needs to be increased to 192 (= 3 × 8²) to allow the final upsampling using UnPack 8. For this, we use another set of convolutions. Apart from these convolutions, which operate at 8× lower resolution, all the computations are done at 16× lower resolution. We finally perform the UnPack 8 operation to obtain the restored image.
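The shape bookkeeping of this pipeline can be checked with a few lines of Python. The sketch below tracks (channels, height, width) shapes, an assumed layout, through each stage described above; the convolution stages are represented only by their stated output channel counts (60 and 192), not by actual layers. The input size 2848 × 4256 is that of the SID images used in Sec. 4.

```python
def pack_shape(shape, alpha):
    """Pack alpha: (C, H, W) -> (C * alpha^2, H / alpha, W / alpha)."""
    c, h, w = shape
    if h % alpha or w % alpha:
        raise ValueError("spatial size must be divisible by alpha")
    return (c * alpha ** 2, h // alpha, w // alpha)

def unpack_shape(shape, alpha):
    """UnPack alpha: (C, H, W) -> (C / alpha^2, H * alpha, W * alpha)."""
    c, h, w = shape
    if c % alpha ** 2:
        raise ValueError("channels must be divisible by alpha^2")
    return (c // alpha ** 2, h * alpha, w * alpha)

def llpacknet_shapes(height, width):
    """Return (stage, shape) pairs for the data flow described in the text."""
    trace = []
    s = pack_shape((1, height, width), 2)        # Bayer RAW -> 4 color channels
    trace.append(("Pack 2", s))
    per_color = pack_shape((1, s[1], s[2]), 8)   # Pack 8 on each color channel separately
    s = (s[0] * per_color[0], per_color[1], per_color[2])
    trace.append(("Pack 8 per color", s))
    s = (60, s[1], s[2])                         # 3x3 convs + concatenation -> 60 channels
    trace.append(("channel reduction", s))
    trace.append(("RDN", s))                     # RDN leaves the shape unchanged
    s = unpack_shape(s, 2)                       # 60 -> 15 channels, 8x below input size
    trace.append(("UnPack 2", s))
    s = (192, s[1], s[2])                        # convolutions raise 15 -> 192 = 3 * 8^2
    trace.append(("convs to 192", s))
    s = unpack_shape(s, 8)                       # 192 -> 3 channels at full resolution
    trace.append(("UnPack 8", s))
    return trace

for stage, shape in llpacknet_shapes(2848, 4256):
    print(f"{stage:18s} {shape}")
```

For a SID image the RDN runs on a 178 × 266 map, i.e. 1/256 of the input pixel count, which is where the computational savings of the 16× packing come from.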
Loss function: Similar to Ignatov et al. [Ignatov et al.(2017)Ignatov, Kobyshev, Timofte, Vanhoey, and Van Gool], we compute the color loss, content loss and total variation (TV) loss on the restored image to train the network. Specifically, we use,
where is a feature map of VGG-19, performs Gaussian smoothing and denotes the network weights. VGG-19 features are obtained right after the final 3 max-pool layers.
3.2 Pack/UnPack operation for better color restoration
The last section discussed LLPackNet from the vantage point of network complexity. In this section we analyze the network from the standpoint of reconstruction quality.
Improving color correlation with UnPack: Making abrupt transitions between LR and HR spaces introduces several distortions in the restored image. To minimize these, we propose the novel Pack and UnPack operations. To understand these operations, it is crucial to analyze PixelShuffle [Shi et al.(2016a)Shi, Caballero, Huszár, Totz, Aitken, Bishop, Rueckert, and Wang], a fast and effective upsampling method on which they are based. Using an analysis similar to [Aitken et al.(2017)Aitken, Ledig, Theis, Caballero, Wang, and Shi, Shi et al.(2016b)Shi, Caballero, Theis, Huszar, Aitken, Ledig, and Wang, Shi et al.(2016a)Shi, Caballero, Huszár, Totz, Aitken, Bishop, Rueckert, and Wang], we show that the Pack/UnPack operations lead to better color correlation than PixelShuffle.
First, we analyze the PixelShuffle operation. In Fig. 3 a), the penultimate feature map is upsampled with zero padding and then convolved with a kernel to obtain the restored image. We now explain the color coding used in the figure. When the kernel is convolved with the upsampled feature map, for each shifted position of the kernel, only the weights of one set of colors contribute to an output pixel of the restored image; we label that output pixel with the same color. Performing this convolution in HR is computationally expensive. However, an equivalent operation can be performed in the LR space, as shown in Fig. 3 b). This involves decomposing the kernel into smaller kernels, which are convolved with the LR feature map to produce an intermediate LR output; PixelShuffle then rearranges this output into the restored image. However, in this scheme, each of the smaller kernels has a monopoly on one of the red, green or blue color channels of the restored image, see Fig. 3 c). Thus, restoring images using PixelShuffle causes weak correlation among the color channels of the restored image, leading to the color artifacts shown in Fig. 5.
The goal of the UnPack operation is to enhance the correlation among the color channels of the restored image. For this purpose, along with the upsampling, zero-padding and convolution operations, we introduce a re-grouping step, as shown in Fig. 3 d). This may appear to be a complicated two-stage operation, but using our UnPack operation we can easily perform an equivalent operation in the LR space, as shown in Fig. 3 e). Note that this operation has the same time complexity as the operation shown in Fig. 3 b). Here, we decompose the kernel into smaller kernels and then apply UnPack. From Fig. 3 f), we see that all the smaller kernels are collectively responsible for all the colors in the restored image. Thus, the UnPack operation leads to better color correlation than PixelShuffle.
The effectiveness of the proposed UnPack operation can also be understood intuitively in the LR space by comparing Fig. 3 b) and Fig. 3 e). UnPack preserves the RGB ordering in LR, whereas PixelShuffle breaks this ordering, especially for large upsampling factors. For example, for a given spatial location in HR, PixelShuffle separates the Red and Blue pixels by 2α² channels in LR for α× upsampling. UnPack, however, always separates them by only one Green pixel, for any upsampling factor. This matters because in CNNs nearby features are known to correlate more strongly than spaced-out ones [Szegedy et al.(2016)Szegedy, Vanhoucke, Ioffe, Shlens, and Wojna]. Thus, even though UnPack does not introduce any new parametrization, its arrangement favors better color restoration. Accordingly, in Fig. 5, PixelShuffle's restored image is heavily affected by color cast, whereas no such distortion is observed for UnPack.
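The channel-separation argument can be checked numerically. The sketch below assumes the channel layout of PyTorch's `torch.nn.PixelShuffle`, in which the α² input channels feeding one output color are contiguous (input channel c·α² + k for sub-pixel position k), and the UnPack layout implied by the worked example in Sec. 6.2, in which colors are interleaved (input channel k·3 + c). The function names are illustrative only.

```python
N_COLORS = 3  # R, G, B

def pixelshuffle_channel(color, k, alpha):
    """LR channel feeding output `color` at sub-pixel position k (PyTorch PixelShuffle layout)."""
    return color * alpha ** 2 + k

def unpack_channel(color, k):
    """LR channel feeding output `color` at sub-pixel position k (UnPack layout)."""
    return k * N_COLORS + color

def red_blue_gap(alpha, k=0):
    """Distance in LR channels between the Red and Blue values of one HR location."""
    red, blue = 0, 2
    return (pixelshuffle_channel(blue, k, alpha) - pixelshuffle_channel(red, k, alpha),
            unpack_channel(blue, k) - unpack_channel(red, k))

for alpha in (2, 4, 8):
    print(alpha, red_blue_gap(alpha))
# 2 (8, 2)
# 4 (32, 2)
# 8 (128, 2)
```

For the final 8× upsampling in LLPackNet, Red and Blue sit 128 channels apart under PixelShuffle but remain adjacent to a single Green channel under UnPack, independently of α.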
Increasing receptive field with Pack: A large receptive field is essential for capturing the contextual information in an image. Downsampling the incoming feature map using the novel Pack operation equips LLPackNet with a large receptive field. To illustrate this, consider a large feature map of spatial size H × W that is downsampled to H/10 × W/10 using the Pack 10 operation, which multiplies the number of channels by 100. Neighboring pixels in the packed map are actually 10 pixels apart in the original map, and the pixels along the channel dimension of the packed map come from a 10 × 10 neighborhood in the original map. Thus, even a 3 × 3 convolution kernel applied to the packed map with a stride of 1 has a receptive field of 30 × 30 = 900 pixels in the original map. In contrast, a similar operation performed directly on the original map requires a 30 × 30 kernel with a stride of 10, which is impractical.
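The receptive-field arithmetic can be verified by enumerating which HR pixels a single output of a 3 × 3, stride-1 convolution on the packed map depends on. The sketch assumes a single-channel HR input and a Pack layout in which LR position (i, j), channel di·α + dj holds HR pixel (α·i + di, α·j + dj); the helper `hr_pixels_seen` is hypothetical.

```python
def hr_pixels_seen(kernel, alpha, center=(5, 5)):
    """HR pixels feeding one output of a kernel x kernel, stride-1 convolution
    applied to the Pack-alpha map of a single-channel HR image.

    Assumed layout: LR pixel (i, j), channel di * alpha + dj holds HR pixel
    (alpha * i + di, alpha * j + dj). `kernel` should be odd.
    """
    ci, cj = center
    r = kernel // 2
    seen = set()
    for i in range(ci - r, ci + r + 1):
        for j in range(cj - r, cj + r + 1):
            for ch in range(alpha * alpha):   # a convolution mixes every input channel
                di, dj = divmod(ch, alpha)
                seen.add((alpha * i + di, alpha * j + dj))
    return seen

seen = hr_pixels_seen(kernel=3, alpha=10)
rows = [p[0] for p in seen]
print(len(seen))                   # 900
print(max(rows) - min(rows) + 1)   # 30, i.e. a contiguous 30 x 30 HR window
```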
Figure 3: a) The usual upsampling operation in HR. b) Implementing (a) in LR using PixelShuffle. c) Less color correlation. d) Upsampling in HR followed by regrouping for better color correlation. e) Implementing (d) in LR using UnPack. f) Better color correlation.
|Model|Processing Time (s)|Memory (GB)|Parameters (million)|PSNR (dB) / SSIM, w/o GT exposure|PSNR (dB) / SSIM, using GT exposure|
|---|---|---|---|---|---|
|Maharjan et al. [Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li]| | | | / |28.41 / 0.81|
|Gu et al. [Gu et al.(2019)Gu, Li, Gool, and Timofte]| | | | / 0.59|28.53 / 0.81|
|Chen et al. [Chen et al.(2018)Chen, Chen, Xu, and Koltun]| | | |22.93 / 0.70|28.30 / 0.79|
|Chen et al. [Chen et al.(2018)Chen, Chen, Xu, and Koltun] + Our Amplifier| | | |22.98 / 0.71|28.30 / 0.79|
|LLPackNet (Ours)|3|3|1.1|23.27 / 0.69|27.83 / 0.75|
4.1 Experimental settings
For extreme low-light single-image enhancement, we compare with Chen et al. [Chen et al.(2018)Chen, Chen, Xu, and Koltun], Gu et al. [Gu et al.(2019)Gu, Li, Gool, and Timofte] and Maharjan et al. [Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li]. We also tried conventional techniques such as LIME [Guo et al.(2016)Guo, Li, and Ling] and Li et al. [Li et al.(2018b)Li, Liu, Yang, Sun, and Guo], but they did not work well for dark images. The publicly available training and test codes of these methods have been used for the comparisons. For experiments on dark images, we use the See-in-the-Dark (SID) dataset [Chen et al.(2018)Chen, Chen, Xu, and Koltun], captured with the high-definition full-frame Bayer sensor of a Sony α7S II. Unlike some methods that collect their dataset by simulating pairs of low-light and GT images [Lore et al.(2017)Lore, Akintayo, and Sarkar, Park et al.(2017)Park, Yu, Moon, Ko, and Paik, Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia, Ren et al.(2019)Ren, Liu, Ma, Xu, Xu, Cao, Du, and Yang, Li et al.(2018a)Li, Guo, Porikli, and Pang, Wei et al.(2018)Wei, Wang, Yang, and Liu], SID provides physically captured extreme low-light RAW images with a resolution of 2848 × 4256. We additionally show comparisons on the LOL dataset [Wei et al.(2018)Wei, Wang, Yang, and Liu] to evaluate the performance of LLPackNet on a notably distinct test set-up. In contrast to SID, LOL has weakly illuminated, PNG-compressed images at VGA resolution. Moreover, SID comes with GT and low-light exposure information, which can be used for estimating the pre-amplification factor, but LOL has no such information.
We use the train/test split given in the datasets. For LLPackNet, image patches are used for training and full-resolution images for testing. For benchmarking, we use the PyTorch [Paszke et al.(2019)Paszke, Gross, Massa, Lerer, Bradbury, Chanan, Killeen, Lin, Gimelshein, Antiga, et al.] framework on an Intel Xeon E5-1620V4 @ 3.50 GHz CPU with 64 GB RAM. We use the default Adam optimizer of PyTorch with a fixed learning rate. All convolutions use He initialization [He et al.(2015)He, Zhang, Ren, and Sun]. The network was trained for 400,000 iterations.
4.2 Restoration results for extreme low-light images
We compare our network with Chen et al. [Chen et al.(2018)Chen, Chen, Xu, and Koltun], Gu et al. [Gu et al.(2019)Gu, Li, Gool, and Timofte] and Maharjan et al. [Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li] on the SID dataset, see Table 1 and Fig. 4. These methods use the ratio of the GT exposure to that of the input dark image, available in the SID dataset, to pre-amplify the images. The corresponding results are shown under the label 'using GT exposure' in Table 1 and Fig. 4. However, since GT information will not be readily available in a real-world setting, we additionally show results in the absence of GT information, under the heading 'w/o GT exposure'. We also show results for 'Chen et al. + Our Amplifier', in which our proposed amplifier is added to their algorithm. We chose Chen et al. because their method has the lowest time and memory complexity among the existing methods. All the methods are appropriately retrained before evaluation.
Figure 5 panels (two examples): PixelShuffle, Our UnPacking, GT.
Figure 6 panels (two examples): No Amplifier, With Amplifier, GT.
Network speed and memory utilization: As shown in Table 1, LLPackNet is faster, uses less memory and has fewer model parameters. We achieve this because we perform the bulk of the operations at 16× lower resolution. In contrast, Maharjan et al. [Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li] do not perform any downsampling, so the feature maps propagating through their network are huge, resulting in very high latency and memory consumption. Gu et al. [Gu et al.(2019)Gu, Li, Gool, and Timofte] adopt a multi-scale approach that requires feature map propagation at 2× and 4× lower resolution, but this marginal downsampling is not sufficient to contain the latency and memory consumption. Chen et al. [Chen et al.(2018)Chen, Chen, Xu, and Koltun] achieve relatively better metrics by performing up to 32× downsampling. However, this is done only in steps of 2×, requiring five downsampling and five upsampling operations. Further, four of the five upsampling operations use transposed convolution [Dumoulin and Visin(2016)], which is much slower than the proposed UnPack operation. Thus, Chen et al. have a moderately high processing time and memory utilization. See the supplementary material for more details.
Restoration quality: All methods perform notably well when the GT exposure is available. But in a practical setting, where the GT exposure is not readily available, all methods except LLPackNet struggle to restore proper colors. Results for this practical setting are also shown in Fig. 1. Adding our amplifier module to Chen et al. improves their performance to some extent, but the restored images still exhibit noisy patches and color cast. This is because amplification is not the only factor that determines the performance of a network: a large receptive field, which provides more contextual information, and better correlation among the color channels matter more than the correct amplification factor. In support of these claims, the ablation studies in Sec. 4.3 show that LLPackNet continues to give structurally consistent results even when the amplifier is removed.
4.3 Ablation studies on LLPackNet
We now show ablation studies on LLPackNet to better understand the contribution of individual components. For each ablation study the network is appropriately retrained.
UnPack vs. PixelShuffle: In the first ablation study, we replace the UnPack operation in the proposed LLPackNet with the PixelShuffle operation [Shi et al.(2016a)Shi, Caballero, Huszár, Totz, Aitken, Bishop, Rueckert, and Wang]; the results are shown in Fig. 5. The images restored using PixelShuffle are affected by a heavy color cast. Using UnPack instead of PixelShuffle improves the PSNR/SSIM from 22.72 dB/0.68 to 23.27 dB/0.69. Thus, the UnPack operation favors better color restoration.
In Sec. 3.2, we ascribed the better color restoration of Pack/UnPack over PixelShuffle to the fact that PixelShuffle breaks the RGB ordering in LR space, especially for large upsampling factors, whereas UnPack preserves it for any factor. To further test this hypothesis, we changed the image downsampling factor from 16× to 8×, so that the final UnPack/PixelShuffle operation performs 4× upsampling instead of 8×. This increases the time and computational complexity of LLPackNet, but it reduces the separation between the Red and Blue channels in PixelShuffle. The performance of UnPack (23.29 dB) is almost the same as in the 16× case, whereas the performance of PixelShuffle improves to 23.28 dB. Thus, UnPack gains about 0.6 dB over PixelShuffle for 8× upsampling, but only 0.01 dB for 4× upsampling. This supports our hypothesis that Pack/UnPack performs better because it preserves the RGB ordering for any upsampling factor.
Estimating proper amplification: Fig. 6 shows the restoration results of LLPackNet with and without the amplifier. Similar to scotopic vision [Chen and Perona(2017), Westheimer(1965)], the restoration has faded colors without the amplifier. However, removing the amplifier does not induce the annoying artifacts seen in the restorations of Chen et al. (see Fig. 4 (B)). This can be attributed to the large receptive field provided by the Pack operation. With the amplifier, the performance of the network improves from 22.53 dB/0.66 to 23.27 dB/0.69.
Overall, the combined effect of using the proposed Pack/UnPack operations instead of PixelShuffle and of estimating proper amplification increases the average PSNR/SSIM from 21.35 dB / 0.60 to 23.27 dB / 0.69.
|Model||Processing Time||PSNR (dB)||SSIM|
|Chen et al[Chen et al.(2018)Chen, Chen, Xu, and Koltun]||sec.|
|LIME [Guo et al.(2016)Guo, Li, and Ling]||0.19 sec.|
|Li et al[Li et al.(2018b)Li, Liu, Yang, Sun, and Guo]||sec.|
|Gu et al[Gu et al.(2019)Gu, Li, Gool, and Timofte]||sec.||0.75|
|LLPackNet-8 (Proposed)||0.06 sec.||19.61|
4.4 LLPackNet for low-resolution images
The SID dataset contains high-definition images, which allows us to choose a large downsampling factor of 16. This raises the question: can LLPackNet also work for LR images? When LR images are downsampled by a large factor, the intra-channel correlation in the downsampled image is reduced, which negatively impacts the restoration. To investigate this, we conducted experiments on the LOL dataset [Wei et al.(2018)Wei, Wang, Yang, and Liu], which contains weakly illuminated images at the VGA resolution of 400 × 600. As the images in the LOL dataset are already in the compressed PNG format, the 2× downsampling at the beginning of LLPackNet that separates out the Bayer pattern is not required. Thus, the effective downsampling is only 8×, and we denote this network by LLPackNet-8. The results are shown in Table 2. Once again, LLPackNet has the lowest processing time. We further observe that the large receptive field of LLPackNet enhances its denoising and color restoration capabilities, but it also introduces a slight blur. To verify that the blur is caused by the large downsampling, we retrain LLPackNet on the LOL dataset with 4× downsampling, which we denote as LLPackNet-4. With LLPackNet-4, we obtain sharper results with higher SSIM values.
5 Conclusion
In this paper, a fast and light-weight extreme low-light image enhancement network (LLPackNet) has been presented. LLPackNet performs aggressive down/up-sampling using the proposed Pack/UnPack operations to obtain a large receptive field and better color restoration. The network also uses a novel amplifier module that amplifies the input image without relying on ground-truth information. Overall, LLPackNet is 5–20× faster and 2–3× lighter, and yet maintains competitive restoration quality compared to state-of-the-art algorithms.
6 Supplementary Material
Figure 7 panels: Maharjan et al., Gu et al., Chen et al., Ours, GT.
6.1 More qualitative results
We show more qualitative results comparing the performance of the proposed method with existing methods. Fig. 7 shows results for enhancing extremely dark images in the practical scenario in which the ratio of GT to input image exposure is not available. Refer to Fig. 4 (B) and Table 1 in the main paper for more information about this setting.
Fig. 8 shows qualitative results for the LOL dataset corresponding to Table 2 in the main paper.
Figure 8 panels: LIME, Chen et al., Gu et al., LLPackNet-8, LLPackNet-4, GT.
6.2 Worked out example of Pack/UnPack operation
Pack/UnPack operators intermix pixels for better color correlation, as shown in Fig. 2 of the main paper. To further illustrate how Pack/UnPack shuffle the pixels in LR, we give a worked-out example below. Consider an input tensor of spatial resolution 2 × 2 with 12 channels, as shown below.
Channel 1: [ 1, 2] [ 3, 4]
Channel 2: [ 5, 6] [ 7, 8]
Channel 3: [ 9, 10] [11, 12]
…
Channel 12: [45, 46] [47, 48]
Applying the UnPack 2 operation then gives a tensor of spatial resolution 4 × 4 with 3 channels, as shown below.
Red channel (first channel):
[ 1, 13, 2, 14]
[25, 37, 26, 38]
[ 3, 15, 4, 16]
[27, 39, 28, 40]
Green channel (second channel):
[ 5, 17, 6, 18]
[29, 41, 30, 42]
[ 7, 19, 8, 20]
[31, 43, 32, 44]
Blue channel (third channel):
[ 9, 21, 10, 22]
[33, 45, 34, 46]
[11, 23, 12, 24]
[35, 47, 36, 48]
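The worked example can be reproduced with a small pure-Python implementation. The channel layout below is inferred from the example itself: output channel c at HR pixel (α·i + di, α·j + dj) is taken from LR channel (di·α + dj)·C + c at position (i, j), where C is the number of output channels, and Pack is implemented as its exact inverse. Tensors are nested lists in (channels, height, width) order; this is an illustrative sketch, not the authors' code.

```python
def unpack(x, alpha):
    """UnPack alpha: (alpha^2 * C) x h x w  ->  C x (alpha * h) x (alpha * w)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (alpha * alpha):
        raise ValueError("number of channels must be divisible by alpha^2")
    c_out = c_in // (alpha * alpha)
    y = [[[0] * (w * alpha) for _ in range(h * alpha)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h):
            for j in range(w):
                for di in range(alpha):
                    for dj in range(alpha):
                        # colors are interleaved: R, G, B, R, G, B, ... along the channels
                        src = (di * alpha + dj) * c_out + c
                        y[c][alpha * i + di][alpha * j + dj] = x[src][i][j]
    return y

def pack(y, alpha):
    """Pack alpha: exact inverse of unpack (no information is discarded)."""
    c_out, H, W = len(y), len(y[0]), len(y[0][0])
    if H % alpha or W % alpha:
        raise ValueError("spatial size must be divisible by alpha")
    h, w = H // alpha, W // alpha
    x = [[[0] * w for _ in range(h)] for _ in range(c_out * alpha * alpha)]
    for c in range(c_out):
        for i in range(h):
            for j in range(w):
                for di in range(alpha):
                    for dj in range(alpha):
                        x[(di * alpha + dj) * c_out + c][i][j] = y[c][alpha * i + di][alpha * j + dj]
    return x

# Worked example: 12 channels of size 2 x 2 holding the values 1..48 in order.
x = [[[4 * k + 1, 4 * k + 2], [4 * k + 3, 4 * k + 4]] for k in range(12)]
y = unpack(x, 2)
for name, channel in zip(("Red", "Green", "Blue"), y):
    print(name, channel)
# Red [[1, 13, 2, 14], [25, 37, 26, 38], [3, 15, 4, 16], [27, 39, 28, 40]]
assert pack(y, 2) == x  # Pack undoes UnPack exactly
```

In PyTorch, the same UnPack can be obtained by reordering the input channels so that index k·C + c moves to c·α² + k and then calling `torch.nn.functional.pixel_shuffle`; Pack correspondingly maps to `torch.nn.functional.pixel_unshuffle` followed by the inverse reordering.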
|HW; Channels||Execution Time in Seconds||Number of Learnable Parameters|
|; 32 -> ; 8||0.18||0.05||0.13||1032|
|; 128 -> ; 32||0.04||0.01||0.04||16416|
|; 512 -> ; 128||0.0025||0.0006||0.0025||262272|
6.3 Comparing Pack/UnPack with other popular down/up sampling operations
Downsampling: Max-pooling is the most popular technique for downsampling feature maps and has been used in many deep learning methods, including Chen et al.'s network. But for a large downsampling factor it causes a huge loss of information. For example, for 8× downsampling, max-pooling keeps only a single element from each 8 × 8 block.
Another popular downsampling technique is strided convolution, usually done with small kernels. For a large downsampling factor, say 8, a stride of 8 is required, and a kernel smaller than the stride never sees many of the input pixels, leading to a loss of information. To alleviate these issues, we use the novel Pack operation to downsample feature maps without any loss of information.
Upsampling: We have already shown the effectiveness of the UnPack operation over PixelShuffle in the main paper. Here, we compare it with two other popular approaches: transposed convolution, as used by Chen et al., and the interpolation suggested by Odena et al. [Odena et al.(2016)Odena, Dumoulin, and Olah]. Transposed convolution is very slow compared to UnPack because it has to slide the convolution kernel over the entire feature map; moreover, it increases the parameter count of the network. The interpolation technique suggested by Odena et al., on the other hand, has no learnable parameters but is still slower than UnPack. This can be seen in Table 3.
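The parameter counts in Table 3 are consistent with a transposed convolution performing 2× upsampling with a 2 × 2 kernel and a bias term; the kernel size is our inference from the numbers, not something stated in the text. The count is C_in · C_out · k² weights plus C_out biases, whereas UnPack only rearranges values and has no parameters:

```python
def transposed_conv_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a 2D transposed convolution (weights + optional bias)."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)

UNPACK_PARAMS = 0  # UnPack is a fixed rearrangement of values

for c_in, c_out in [(32, 8), (128, 32), (512, 128)]:
    n = transposed_conv_params(c_in, c_out, kernel=2)
    print(f"{c_in:>3} -> {c_out:>3}: transposed conv {n}, UnPack {UNPACK_PARAMS}")
# 32 -> 8: 1032, 128 -> 32: 16416, 512 -> 128: 262272
```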
- [Ai and Kwon(2020)] Sophy Ai and Jangwoo Kwon. Extreme low-light image enhancement for surveillance cameras using attention u-net. Sensors, 20(2):495, 2020.
- [Aitken et al.(2017)Aitken, Ledig, Theis, Caballero, Wang, and Shi] Andrew Aitken, Christian Ledig, Lucas Theis, Jose Caballero, Zehan Wang, and Wenzhe Shi. Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize. arXiv preprint arXiv:1707.02937, 2017.
- [Cai and Kintak(2019)] Yuantian Cai and U Kintak. Low-light image enhancement based on modified u-net. In 2019 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR), pages 1–7. IEEE, 2019.
- [Chen and Perona(2017)] Bo Chen and Pietro Perona. Seeing into darkness: Scotopic visual recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3826–3835, 2017.
- [Chen et al.(2018)Chen, Chen, Xu, and Koltun] Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. Learning to see in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3291–3300, 2018.
- [Cheng et al.(2019)Cheng, Yan, and Wang] Yu Cheng, Jia Yan, and Zhou Wang. Enhancement of weakly illuminated images by deep fusion networks. In 2019 IEEE International Conference on Image Processing (ICIP), pages 924–928. IEEE, 2019.
- [Dumoulin and Visin(2016)] Vincent Dumoulin and Francesco Visin. A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285, 2016.
- [Frankle and Carbin(2019)] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, 2019.
- [Ghosh and Chaudhury(2019)] Sanjay Ghosh and Kunal N Chaudhury. Fast bright-pass bilateral filtering for low-light enhancement. In 2019 IEEE International Conference on Image Processing (ICIP), pages 205–209. IEEE, 2019.
- [Gu et al.(2019)Gu, Li, Gool, and Timofte] Shuhang Gu, Yawei Li, Luc Van Gool, and Radu Timofte. Self-guided network for fast image denoising. In Proceedings of the IEEE International Conference on Computer Vision, pages 2511–2520, 2019.
- [Guo et al.(2016)Guo, Li, and Ling] Xiaojie Guo, Yu Li, and Haibin Ling. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing, 26(2):982–993, 2016.
- [He et al.(2015)He, Zhang, Ren, and Sun] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
- [Hirakawa and Parks(2005)] Keigo Hirakawa and Thomas W Parks. Adaptive homogeneity-directed demosaicing algorithm. IEEE Transactions on Image Processing, 14(3):360–369, 2005.
- [Ignatov et al.(2017)Ignatov, Kobyshev, Timofte, Vanhoey, and Van Gool] Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. DSLR-quality photos on mobile devices with deep convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 3277–3285, 2017.
- [Jenicek and Chum(2019)] Tomas Jenicek and Ondrej Chum. No fear of the dark: Image retrieval under varying illumination conditions. In Proceedings of the IEEE International Conference on Computer Vision, pages 9696–9704, 2019.
- [Kim(1997)] Yeong-Taeg Kim. Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Transactions on Consumer Electronics, 43(1):1–8, 1997.
- [Lee et al.(2020)Lee, Sohn, and Min] Hunsang Lee, Kwanghoon Sohn, and Dongbo Min. Unsupervised low-light image enhancement using bright channel prior. IEEE Signal Processing Letters, 27:251–255, 2020.
- [Li et al.(2011)Li, Wang, and Geng] Bo Li, Shuhang Wang, and Yanbing Geng. Image enhancement based on retinex and lightness decomposition. In 2011 18th IEEE International Conference on Image Processing, pages 3417–3420. IEEE, 2011.
- [Li et al.(2018a)Li, Guo, Porikli, and Pang] Chongyi Li, Jichang Guo, Fatih Porikli, and Yanwei Pang. LightenNet: A convolutional neural network for weakly illuminated image enhancement. Pattern Recognition Letters, 104:15–22, 2018a.
- [Li et al.(2018b)Li, Liu, Yang, Sun, and Guo] Mading Li, Jiaying Liu, Wenhan Yang, Xiaoyan Sun, and Zongming Guo. Structure-revealing low-light image enhancement via robust retinex model. IEEE Transactions on Image Processing, 27(6):2828–2841, 2018b.
- [Lim et al.(2017)Lim, Son, Kim, Nah, and Mu Lee] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 136–144, 2017.
- [Lore et al.(2017)Lore, Akintayo, and Sarkar] Kin Gwn Lore, Adedotun Akintayo, and Soumik Sarkar. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61:650–662, 2017.
- [Maharjan et al.(2019)Maharjan, Li, Li, Xu, Ma, and Li] Paras Maharjan, Li Li, Zhu Li, Ning Xu, Chongyang Ma, and Yue Li. Improving extreme low-light image denoising via residual learning. In 2019 IEEE International Conference on Multimedia and Expo (ICME), pages 916–921. IEEE, 2019.
- [Malik and Soundararajan(2019)] Sameer Malik and Rajiv Soundararajan. LLRNet: A multiscale subband learning approach for low light image restoration. In 2019 IEEE International Conference on Image Processing (ICIP), pages 779–783. IEEE, 2019.
- [Odena et al.(2016)Odena, Dumoulin, and Olah] Augustus Odena, Vincent Dumoulin, and Chris Olah. Deconvolution and checkerboard artifacts. Distill, 2016. URL http://distill.pub/2016/deconv-checkerboard.
- [Park et al.(2017)Park, Yu, Moon, Ko, and Paik] Seonhee Park, Soohwan Yu, Byeongho Moon, Seungyong Ko, and Joonki Paik. Low-light image enhancement using variational optimization-based retinex model. IEEE Transactions on Consumer Electronics, 63(2):178–184, 2017.
- [Paszke et al.(2019)Paszke, Gross, Massa, Lerer, Bradbury, Chanan, Killeen, Lin, Gimelshein, Antiga, et al.] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems, pages 8026–8037, 2019.
- [Pisano et al.(1998)Pisano, Zong, Hemminger, DeLuca, Johnston, Muller, Braeuning, and Pizer] Etta D Pisano, Shuquan Zong, Bradley M Hemminger, Marla DeLuca, R Eugene Johnston, Keith Muller, M Patricia Braeuning, and Stephen M Pizer. Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms. Journal of Digital Imaging, 11(4):193, 1998.
- [Pizer et al.(1987)Pizer, Amburn, Austin, Cromartie, Geselowitz, Greer, ter Haar Romeny, Zimmerman, and Zuiderveld] Stephen M Pizer, E Philip Amburn, John D Austin, Robert Cromartie, Ari Geselowitz, Trey Greer, Bart ter Haar Romeny, John B Zimmerman, and Karel Zuiderveld. Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing, 39(3):355–368, 1987.
- [Rastegari et al.(2016)Rastegari, Ordonez, Redmon, and Farhadi] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision, pages 525–542. Springer, 2016.
- [Ren et al.(2019)Ren, Liu, Ma, Xu, Xu, Cao, Du, and Yang] Wenqi Ren, Sifei Liu, Lin Ma, Qianqian Xu, Xiangyu Xu, Xiaochun Cao, Junping Du, and Ming-Hsuan Yang. Low-light image enhancement via a deep hybrid network. IEEE Transactions on Image Processing, 28(9):4364–4375, 2019.
- [Ren et al.(2018)Ren, Ying, Li, and Li] Yurui Ren, Zhenqiang Ying, Thomas H Li, and Ge Li. LECARM: Low-light image enhancement using the camera response model. IEEE Transactions on Circuits and Systems for Video Technology, 29(4):968–981, 2018.
- [Shen et al.(2017)Shen, Yue, Feng, Chen, Liu, and Ma] Liang Shen, Zihan Yue, Fan Feng, Quan Chen, Shihao Liu, and Jie Ma. MSR-net: Low-light image enhancement using deep convolutional network. arXiv preprint arXiv:1711.02488, 2017.
- [Shi et al.(2016a)Shi, Caballero, Huszár, Totz, Aitken, Bishop, Rueckert, and Wang] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1874–1883, 2016a.
- [Shi et al.(2016b)Shi, Caballero, Theis, Huszar, Aitken, Ledig, and Wang] Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszar, Andrew Aitken, Christian Ledig, and Zehan Wang. Is the deconvolution layer the same as a convolutional layer? arXiv preprint arXiv:1609.07009, 2016b.
- [Szegedy et al.(2016)Szegedy, Vanhoucke, Ioffe, Shlens, and Wojna] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
- [Wang et al.(2017)Wang, Liu, and Foroosh] Min Wang, Baoyuan Liu, and Hassan Foroosh. Factorized convolutional neural networks. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 545–553, 2017.
- [Wang et al.(2019)Wang, Zhang, Fu, Shen, Zheng, and Jia] Ruixing Wang, Qing Zhang, Chi-Wing Fu, Xiaoyong Shen, Wei-Shi Zheng, and Jiaya Jia. Underexposed photo enhancement using deep illumination estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6849–6857, 2019.
- [Wang et al.(2013)Wang, Zheng, Hu, and Li] Shuhang Wang, Jin Zheng, Hai-Miao Hu, and Bo Li. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Transactions on Image Processing, 22(9):3538–3548, 2013.
- [Wang et al.(2018)Wang, Wei, Yang, and Liu] Wenjing Wang, Chen Wei, Wenhan Yang, and Jiaying Liu. GLADNet: Low-light enhancement network with global awareness. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 751–755. IEEE, 2018.
- [Wei et al.(2018)Wei, Wang, Yang, and Liu] Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep retinex decomposition for low-light enhancement. In British Machine Vision Conference, 2018.
- [Westheimer(1965)] G Westheimer. Spatial interaction in the human retina during scotopic vision. The Journal of Physiology, 181(4):881–894, 1965.
- [Wu et al.(2016)Wu, Leng, Wang, Hu, and Cheng] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4820–4828, 2016.
- [Yang et al.(2019)Yang, Zhang, and Li] Kai-Fu Yang, Xian-Shi Zhang, and Yong-Jie Li. A biological vision inspired framework for image enhancement in poor visibility conditions. IEEE Transactions on Image Processing, 29:1493–1506, 2019.
- [Ying et al.(2017)Ying, Li, Ren, Wang, and Wang] Zhenqiang Ying, Ge Li, Yurui Ren, Ronggang Wang, and Wenmin Wang. A new low-light image enhancement algorithm using camera response model. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 3015–3022, 2017.
- [Yu and Zhu(2017)] Shun-Yuan Yu and Hong Zhu. Low-illumination image enhancement algorithm based on a physical lighting model. IEEE Transactions on Circuits and Systems for Video Technology, 29(1):28–37, 2017.
- [Zhang et al.(2020)Zhang, Liu, Ma, Zhong, Fan, and Luo] J. Zhang, R. Liu, L. Ma, W. Zhong, X. Fan, and Z. Luo. Principle-inspired multi-scale aggregation network for extremely low-light image enhancement. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2638–2642, 2020.
- [Zhang et al.(2019a)Zhang, Lin, and Sheng] Shuo Zhang, Youfang Lin, and Hao Sheng. Residual networks for light field image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11046–11055, 2019a.
- [Zhang et al.(2019b)Zhang, Zhang, and Guo] Yonghua Zhang, Jiawan Zhang, and Xiaojie Guo. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, pages 1632–1640, 2019b.
- [Zhang et al.(2018)Zhang, Tian, Kong, Zhong, and Fu] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu. Residual dense network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2472–2481, 2018.