A publicly available database that contains real noisy and clean color images.
Filtering real-world color images is challenging because the noise is complex and cannot be described by a single known distribution. Meanwhile, the rapid development of camera lenses poses greater demands on image denoising in terms of both efficiency and effectiveness. Currently, the most widely accepted framework combines transform domain techniques with the nonlocal similarity characteristics of natural images. Based on this framework, many competitive methods model the correlation of the R, G and B channels with pre-defined or adaptively learned transforms. In this chapter, a brief review of related methods and publicly available datasets is presented; moreover, a new dataset that includes more natural outdoor scenes is introduced. Extensive experiments are performed, and a discussion on visual effect enhancement is included.
Images are inevitably contaminated by noise during acquisition. Real-world noise arising in the in-camera process [2, 3, 4] is signal dependent and stems mainly from five sources: photon shot noise, fixed pattern noise, dark current, readout noise and quantization noise. In the presence of noise, subsequent image processing tasks such as video processing, image analysis and tracking are adversely affected. Therefore, image denoising plays an important role in modern image processing systems.
The past few decades have witnessed great achievements in the field of image denoising. The non-local self-similarity of natural images, as illustrated in Fig. 1, is widely exploited in designing image denoising methods. Furthermore, transform domain techniques are introduced under the assumption that the signal is sparsely distributed in the transform domain. The combination of the non-local similarity property and transform domain techniques forms a simple and effective framework, which can be roughly divided into three consecutive steps: grouping, collaborative filtering and aggregation. A more detailed description is given in Algorithm 1. Existing methods that follow this idea can be categorized by several interconnected criteria, and the techniques adopted by some representative methods are summarized in Table I.
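The three steps can be illustrated with a deliberately simplified grayscale sketch. It is not the implementation of any method in Table I: the function name and all parameters are hypothetical, a 3D FFT stands in for the DCT or learned transforms used by the actual methods, and the threshold factor 2.7 is only a typical choice.

```python
import numpy as np

def denoise_nonlocal(img, sigma, patch=8, stride=4, search=12, k=8, thr=2.7):
    """Toy denoiser: grouping -> collaborative filtering -> aggregation.

    Assumes a 2D float image and stride <= patch so every pixel is covered.
    """
    h, w = img.shape
    acc = np.zeros_like(img, dtype=float)     # sum of filtered patch estimates
    weight = np.zeros_like(img, dtype=float)  # number of estimates per pixel
    # Reference positions; the last row/column position is always included.
    ys = sorted(set(range(0, h - patch + 1, stride)) | {h - patch})
    xs = sorted(set(range(0, w - patch + 1, stride)) | {w - patch})
    for y in ys:
        for x in xs:
            ref = img[y:y + patch, x:x + patch]
            # 1) Grouping: rank patches in a local window by squared distance.
            cands = []
            for yy in range(max(0, y - search), min(h - patch, y + search) + 1):
                for xx in range(max(0, x - search), min(w - patch, x + search) + 1):
                    cand = img[yy:yy + patch, xx:xx + patch]
                    cands.append((float(np.sum((cand - ref) ** 2)), yy, xx))
            cands.sort(key=lambda c: c[0])
            # The reference patch always belongs to its own group.
            others = [(yy, xx) for _, yy, xx in cands if (yy, xx) != (y, x)]
            pos = [(y, x)] + others[:k - 1]
            group = np.stack([img[yy:yy + patch, xx:xx + patch] for yy, xx in pos])
            # 2) Collaborative filtering: hard-threshold the 3D spectrum.
            coef = np.fft.fftn(group, norm="ortho")
            coef[np.abs(coef) < thr * sigma] = 0
            filt = np.real(np.fft.ifftn(coef, norm="ortho"))
            # 3) Aggregation: return patches to their positions, average overlaps.
            for (yy, xx), p in zip(pos, filt):
                acc[yy:yy + patch, xx:xx + patch] += p
                weight[yy:yy + patch, xx:xx + patch] += 1
    return acc / weight
```

Practical methods differ mainly in step 2, where the fixed FFT is replaced by a predefined DCT, a learned PCA or HOSVD basis, or a low-rank model.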
Table I: Techniques adopted by representative methods. PCA: principal component analysis, LR: low rank, RN: residual network, TF: tensor factorization, SC: sparse coding, HT: hard-thresholding, WF: Wiener filter.
|Method||LSCD||LLRT||DnCNN||TNRD||MCWNNM||GID||TWSC||4DHOSVD1||CBM3D||MS-TSVD|
|Technique||Color Line + PCA||LR + TF||RN||External Prior||Weighted LR||External + Internal Prior||Weighted SC||HT + TF||HT + WF||HT + TF|
First, numerous methods are devised based on the choice of transform. Early works rely on bilateral filters, wavelet and curvelet transforms. The well-known color block-matching 3D (CBM3D) method then combines the discrete cosine transform (DCT) with an opponent color mode transform and produces state-of-the-art performance. Instead of using predefined transforms, some recently proposed methods adopt adaptively learned transforms. MCWNNM uses singular value decomposition (SVD) and designs a weighted nuclear norm minimization (WNNM) strategy. 4DHOSVD uses the idea of tensors and trains the corresponding mode transforms with 4D higher-order singular value decomposition (4DHOSVD). Most recently, MS-TSVD utilizes tensor-SVD and characterizes each color image with a block circulant representation.
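In this paper, tensors are denoted by calligraphic letters, e.g., $\mathcal{A}$; matrices by boldface capital letters, e.g., $\mathbf{A}$; and vectors by lowercase boldface letters, e.g., $\mathbf{a}$. The $i$th entry of a vector $\mathbf{a}$ is denoted by $a_i$, element $(i, j)$ of a matrix $\mathbf{A}$ by $a_{ij}$, and element $(i, j, k)$ of a third-order tensor $\mathcal{A}$ by $a_{ijk}$. An $N$th-order tensor is denoted as $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. The $n$-mode product of a tensor $\mathcal{A}$ by a matrix $\mathbf{U} \in \mathbb{R}^{P_n \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{U}$, is also a tensor. The Frobenius norm of a tensor $\mathcal{A}$ is defined as $\|\mathcal{A}\|_F = \sqrt{\sum_{i_1, \ldots, i_N} a_{i_1 \cdots i_N}^2}$. The mode-$n$ matricization or unfolding of $\mathcal{A}$, denoted by $\mathbf{A}_{(n)}$, maps tensor element $(i_1, i_2, \ldots, i_N)$ to matrix element $(i_n, j)$, where $j = 1 + \sum_{k=1, k \neq n}^{N} (i_k - 1) J_k$, with $J_k = \prod_{m=1, m \neq n}^{k-1} I_m$.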
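The higher-order SVD (HOSVD) of an $N$th-order tensor $\mathcal{A}$ can be written as
$$\mathcal{A} = \mathcal{G} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N, \qquad (1)$$
where $\mathcal{G}$ is the core tensor and $\mathbf{U}_n$ represents the orthogonal factor matrix of mode $n$; the matricized version of (1) is
$$\mathbf{A}_{(n)} = \mathbf{U}_n \mathbf{G}_{(n)} \left(\mathbf{U}_N \otimes \cdots \otimes \mathbf{U}_{n+1} \otimes \mathbf{U}_{n-1} \otimes \cdots \otimes \mathbf{U}_1\right)^T, \qquad (2)$$
where $\otimes$ denotes the Kronecker product.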
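The notation above can be made concrete with a short NumPy sketch of the unfolding, the n-mode product and the HOSVD. The function names are illustrative, and the unfolding is assumed to order the remaining indices with the lowest mode varying fastest, which is the ordering under which the Kronecker form of the matricized decomposition holds.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization; remaining indices run with the lowest mode fastest."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.reshape(moved, (tensor.shape[mode], -1), order="F")

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply every mode-n fiber of `tensor` by `matrix`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthogonal factor matrix per mode."""
    factors = [np.linalg.svd(unfold(tensor, n), full_matrices=False)[0]
               for n in range(tensor.ndim)]
    core = tensor
    for n, u in enumerate(factors):
        core = mode_product(core, u.T, n)  # project onto the learned basis
    return core, factors
```

Multiplying the core tensor by the factor matrices along each mode recovers the input, and for a third-order tensor the mode-1 unfolding equals `u1 @ unfold(core, 0) @ np.kron(u3, u2).T`.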
Regardless of the categorization criteria, almost all related methods under the nonlocal and transform domain framework employ the idea of multiway filtering and apply different regularization to filter a group of patches along each dimension. Vectors and tensors are the two commonly used representations for modeling each color image patch, as illustrated in Fig. 2. Methods based on vector representation, such as TWSC, MCWNNM and LSCD, can benefit from the good and extensively studied statistical properties of singular value decomposition (SVD) or principal component analysis (PCA). However, each vectorized patch can be a lengthy vector, which increases the computational burden. To alleviate this issue, many recently proposed methods such as 4DHOSVD, LLRT and MS-TSVD exploit tensor representation. In fact, the pioneering work is CBM3D, although the tensor representation is not explicitly stated.
where the matrices in (3) are the corresponding mode transform matrices along each dimension of the patch group. For CBM3D, all transforms in (3) are predefined, and its opponent color mode transform matrix is
Specifically, once a patch is mapped to the new color mode, its first slice can be regarded as the luminance channel, while the remaining two slices are chrominance channels. CBM3D is very efficient because it does not have to train local transforms, and its grouping step is performed only on the luminance channel. For 4DHOSVD, all mode transform matrices are learned by solving
where $\mathbf{I}$ is the identity matrix and each mode transform matrix can be obtained via SVD of the corresponding unfolding matrix. Unlike CBM3D and 4DHOSVD, which directly fold and unfold the tensor, MS-TSVD adopts the idea of tensor-SVD (t-SVD) and characterizes each color image patch as a block circulant matrix, which can be represented by
where the three blocks are the R, G and B color channels of the patch. Fig. 3 illustrates a group of color image patches in block circulant representation. Then, according to the tensor decomposition in (1), the reconstruction problem of MS-TSVD can be written as
The factor matrices of (7) can be obtained by solving
Directly solving (8) is time-consuming because it involves the decomposition of a large block circulant matrix. Thus, the authors first perform a fast Fourier transform (FFT) along the third mode of each patch to convert the block circulant representation into a block diagonal representation in the Fourier domain. Each patch in the Fourier domain can be obtained by a mode-3 product with the FFT matrix, which is defined as
then problem (8) can be reformulated as
Furthermore, (9) reveals two interesting features of the FFT and, correspondingly, of a patch in the Fourier domain. First, the second and third slices of the transformed patch are complex conjugates of each other. Second, the first slice of the patch in the Fourier domain can be regarded as the luminance channel of (4). Therefore, to train the group-wise transform more effectively, an alternative to (10) is to use only the luminance channel information.
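Both properties, as well as the block diagonalization that motivates the FFT step, can be checked numerically. The sketch below assumes the usual t-SVD convention in which block (i, j) of the block circulant matrix holds channel (i - j) mod 3; the function names are illustrative and not taken from the MS-TSVD code.

```python
import numpy as np

def bcirc(patch):
    """Block circulant matrix of an (h, w, c) patch (assumed t-SVD convention)."""
    h, w, c = patch.shape
    out = np.zeros((h * c, w * c), dtype=patch.dtype)
    for i in range(c):
        for j in range(c):
            out[i * h:(i + 1) * h, j * w:(j + 1) * w] = patch[:, :, (i - j) % c]
    return out

def fourier_slices(patch):
    """FFT along the third (color) mode of the patch."""
    return np.fft.fft(patch, axis=2)

def block_diagonalize(patch):
    """Apply the DFT matrix blockwise on both sides of the block circulant matrix."""
    h, w, c = patch.shape
    f = np.fft.fft(np.eye(c))  # c x c DFT matrix
    left = np.kron(f, np.eye(h))
    right = np.kron(np.linalg.inv(f), np.eye(w))
    return left @ bcirc(patch) @ right
```

The diagonal blocks of `block_diagonalize(patch)` coincide with the slices returned by `fourier_slices(patch)`. The first slice is the sum of the R, G and B channels, which is proportional to their mean, and the second and third slices are complex conjugates, so only two slices need to be processed.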
Indeed MS-TSVD can be regarded as a generalization of both CBM3D and 4DHOSVD because it uses a predefined color mode transform and applies HOSVD to every new channel in the Fourier domain.
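The role of a predefined color mode transform can be illustrated with the opponent transform of CBM3D. The matrix entries below are an assumption: they follow the unnormalized form found in publicly available BM3D code, a normalized variant also exists, and the function names are illustrative. What matters for the discussion is that the first row averages the R, G and B channels.

```python
import numpy as np

# Assumed opponent color matrix (unnormalized variant).
OPP = np.array([[1 / 3, 1 / 3, 1 / 3],
                [1 / 2, 0.0, -1 / 2],
                [1 / 4, -1 / 2, 1 / 4]])

def to_opponent(patch_rgb):
    """Map an (h, w, 3) RGB patch to one luminance and two chrominance slices."""
    return patch_rgb @ OPP.T

def from_opponent(patch_opp):
    """Invert the color transform to return to RGB."""
    return patch_opp @ np.linalg.inv(OPP).T
```

Up to a factor of 3, the first row of `OPP` equals the first row of the 3 x 3 DFT matrix, which is why the first Fourier slice of a patch plays the role of the luminance channel.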
Assume that the number of image pixels is , that the average time to compute similar patches per reference patch is , that the average number of patches similar to the reference patch is , and that the size of the patch is (). According to , the time complexity of 4DHOSVD and CBM3D are and , respectively. The computational burden of MS-TSVD mainly lies in the PCA transform and patch level t-SVD transform , leading to a total complexity of
. Since MS-TSVD is a one-step algorithm, it is competitive in terms of efficiency: in real cases the input noise level, modeled as AWGN, must be carefully tuned for the best visual effect, and intermediate results such as the grouping indices and local PCA can be saved to avoid re-calculation.
Currently, there are four publicly available datasets used to evaluate denoising methods on real-world noisy images. In this section, these existing real datasets are briefly described and a newly constructed dataset is introduced. Information of all included datasets is given in Table II. Examples of each dataset are respectively illustrated in Fig. 4 to Fig. 8. More detailed analysis of existing datasets can be found in .
To the best of our knowledge, the RENOIR dataset is the first attempt to construct a real-world color image dataset with noisy and "ground-truth" image pairs. Three cameras are used to take photos of static indoor scenes with different ISO values. However, some image pairs exhibit misalignment and clear color differences because of the less refined post-processing step.
For the Nam dataset, three different cameras are used to take photos of 11 static scenes. Each "ground-truth" image is generated by capturing the same unchanged scene approximately 500 times and computing the mean image. The major problem with this dataset is that the images are mostly confined to printed scenes, whose statistical properties are similar.
Four cameras are used to construct the DND dataset. Unlike the previous two datasets, the authors use Tobit regression to estimate the parameters of the noise process from only two images. A careful post-processing step is conducted to correct the misalignment and remove residual low-frequency bias. The "ground-truth" images of this dataset are currently not available, but objective results of denoising methods can be obtained by submitting filtered images to the authors' website.
To address the limitations of the above datasets, the Xu benchmark uses five different cameras and includes a broader variety of indoor scenes with more camera settings. As in the Nam dataset, each "ground-truth" image is obtained by averaging static images captured of the same scene. In addition, three volunteers are invited to manually remove outlier images that show clear misalignment or different illumination.
The limitation of the previously mentioned datasets is that they are based mainly on static indoor scenes where the lighting conditions can be manually controlled. In real cases, however, many photos are captured in outdoor scenes where the objects and light sources are more complicated. In our dataset, six different cameras are used to take photos of both indoor and outdoor scenes. The same averaging strategy as in the Nam and Xu datasets is employed to generate "ground-truth" images because of its simplicity. Images taken outdoors should be treated more carefully, however, because natural objects such as flowers and trees are not completely motionless, and the lighting conditions can vary intensely with the movement of clouds. Therefore, the shutter speed of the cameras should be fast enough, and images that show obvious misalignment or illumination differences are discarded. Each outdoor "ground-truth" image is obtained by averaging 30 to 60 images of the same scene. Several examples of the new dataset are illustrated in Fig. 8.
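The averaging strategy can be sketched as follows. The rejection rule is an assumption made purely for illustration: frames are dropped when their mean brightness deviates from the median by more than a hypothetical relative tolerance `max_rel_dev`, and misalignment is not checked. The datasets described above rely on visual inspection instead.

```python
import numpy as np

def average_ground_truth(frames, max_rel_dev=0.02):
    """Average repeated shots of one scene after dropping brightness outliers.

    frames: array-like of shape (n, h, w, 3) with positive pixel values.
    Returns the mean image and a boolean mask of the frames that were kept.
    """
    frames = np.asarray(frames, dtype=float)
    brightness = frames.mean(axis=(1, 2, 3))  # one value per frame
    ref = np.median(brightness)               # robust reference level
    keep = np.abs(brightness - ref) <= max_rel_dev * ref
    return frames[keep].mean(axis=0), keep
```

For independent zero-mean noise, averaging n retained frames reduces the noise variance by a factor of n, which is why 30 to 60 shots already give a usable reference image.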
|Dataset||Camera Brand||Number of Images||Sensor Size (mm)|
|RENOIR||CANON S90||40||7.4 × 5.6|
|RENOIR||CANON T3i||40||22.3 × 14.9|
|RENOIR||XIAOMI Mi3||40||4.7 × 3.5|
|Nam||CANON 5D MARK III||3||36.0 × 24.0|
|Nam||NIKON D600||3||36.0 × 24.0|
|Nam||NIKON D800||9||36.0 × 24.0|
|DND||SONY A7R||13||36.0 × 24.0|
|DND||OLYMPUS E-M10||13||17.3 × 13.0|
|DND||SONY RX100 IV||12||13.2 × 8.8|
|DND||HUAWEI NEXUS 6P||12||6.2 × 4.6|
|Xu||CANON 5D||10||36.0 × 24.0|
|Xu||CANON 80D||6||22.5 × 15.0|
|Xu||CANON 600D||5||22.3 × 14.9|
|Xu||NIKON D800||12||36.0 × 24.0|
|Xu||SONY A7II||7||35.8 × 23.9|
|Ours||HUAWEI HONOR 6X||18||4.9 × 3.7|
|Ours||IPHONE 5S||18||4.8 × 3.6|
|Ours||IPHONE 6S||36||4.8 × 3.6|
|Ours||CANON 100D||26||22.3 × 14.9|
|Ours||CANON 600D||23||22.3 × 14.9|
|Ours||SONY A6500||18||15.6 × 23.5|
According to the description and analysis in Section III, the RENOIR and DND datasets are not used in our objective evaluations, because image pairs of the former exhibit some misalignment, while the "ground-truth" images of the latter are not yet openly accessible. They are nevertheless included in our discussion on visual evaluation, since in real cases "ground-truth" images are not available. In addition, the images of the other three datasets are too large for some methods, so four sub-datasets of cropped images are used; a brief description of these sub-datasets is given in Table III. All sub-datasets are available at https://github.com/ZhaomingKong.
A comprehensive evaluation is conducted on several state-of-the-art methods, including MS-TSVD, CBM3D, 4DHOSVD1, MCWNNM, GID, TWSC, LLRT, LSCD, DnCNN and TNRD. The well-known commercial software Neat Image (NI) is included in our discussion on visual evaluation. In real cases, the parameters of every competing method should be specified for each corrupted image, but this is impractical because of the high computational complexity of some compared methods. To better understand the effectiveness of CBM3D, however, we carefully tune its input noise level for each image and term this implementation 'CBM3D_best'. In our experiments, the code and implementations provided by the authors are used, and the average best results on each dataset are reported. All experiments are performed in MATLAB 2017a on a moderate laptop equipped with an Intel i5-8250U CPU (1.8 GHz) and 8 GB of RAM.
Several objective indexes [45, 46] are used to evaluate image quality by comparing the filtered image with the "ground-truth" image. In our experiments, the two most commonly used indexes, Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM), are adopted. PSNR is a pixel-by-pixel comparison strategy that can be obtained by:
$$\mathrm{PSNR}(\mathbf{X}, \mathbf{Y}) = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}(\mathbf{X}, \mathbf{Y})},$$
where $\mathbf{X}$ and $\mathbf{Y}$ are the two compared images, $\mathrm{MSE}$ is their mean squared error, and $\mathrm{MAX}$ is the maximum possible pixel value (255 for 8-bit images). Typically, the higher the PSNR and SSIM values, the better the results. In our experiments, however, we will show that visual effects do not always correspond well with the objective indexes, partially because the human visual system (HVS) is not always sensitive to the presence of noise.
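For reference, PSNR can be computed in a few lines. This is a minimal sketch that assumes the standard definition based on the mean squared error and 8-bit images with a peak value of 255; the function name and the `peak` parameter are illustrative.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mse = np.mean((x - y) ** 2)  # pixel-by-pixel mean squared error
    if mse == 0:
        return float("inf")      # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR depends only on the mean squared error, two results with the same PSNR can differ noticeably in appearance, which is one reason a perceptual index such as SSIM is reported alongside it.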
To show the robustness of all compared methods, the PSNR value of each image is listed in Table IV. The average computational time is also provided. Apart from CBM3D, LSCD and TNRD, which use C++ mex-functions with parallelization, the compared methods are implemented purely in MATLAB. For MS-TSVD, the authors also provide a C++ implementation (https://github.com/ZhaomingKong/color_image_denoising) that reduces its computational time to less than 8 seconds. From Table IV, one can see that the simple MS-TSVD method produces very competitive performance in terms of both effectiveness and efficiency. Fig. 9 and Fig. 10 show the denoising results for images captured by CANON D600 at ISO = 3200 and NIKON D800 at ISO = 1600, respectively. The visual evaluation shows that the representative low-rank based method LLRT and the sparse coding scheme TWSC produce better results in homogeneous regions, because the underlying clean patches share similar features and can thus be approximated by solving a low-rank or sparse coding problem. However, as illustrated in Fig. 10, when the ground-truth image contains more details, the low-rank approximation strategy may be risky, and the clearly over-smoothed result contradicts the improvement in PSNR value. Compared with CBM3D and MS-TSVD, which utilize global patch representation, the local 4DHOSVD transform is more easily affected by noise, which is incorporated into the training process of the color mode transform.
|ISO = 3200||35.52||35.93||36.45||34.09||36.92||35.15||36.48||36.98||37.15||36.55||37.01|
|ISO = 3200||38.24||40.58||39.49||35.41||38.68||41.13||39.56||40.13||40.45||39.78||40.93|
|ISO = 1600||38.32||37.78||38.17||35.48||39.20||38.81||39.54||39.40||39.59||39.11||39.54|
|ISO = 3200||37.73||41.03||38.21||33.31||37.64||42.09||39.42||39.47||40.38||39.47||40.05|
|ISO = 6400||32.62||34.11||32.29||30.09||32.96||33.93||33.97||33.85||34.16||34.01||34.01|
In Table V, we list the results of the most competitive methods on Dataset 2 and Dataset 3. On these two datasets, MS-TSVD again achieves the most competitive performance, and its results are close to the upper bound of the effectiveness of CBM3D. To further support the observations made on Dataset 1, two images, one containing rich details and one containing large homogeneous regions, are chosen from Dataset 2 and Dataset 3, respectively. Visual evaluations are given in Fig. 11 and Fig. 12.
|HUAWEI HONOR 6X||30||PSNR||39.54||39.52||39.71||39.46||39.82||39.97||40.48||40.08|
The cropped images of this dataset are four times as large as those of the above datasets, so several efficient and competitive methods are chosen according to our previous experiments. Average PSNR and SSIM values are listed in Table VI, while visual evaluations are given in Fig. 13, Fig. 14 and Fig. 15. Comparing Table VI with Table IV and Table V, it can be seen that no competing method consistently outperforms the state-of-the-art CBM3D, especially when the images are large and contain more natural outdoor scenes. Fig. 14 shows the slight over-smoothing effect of CBM3D, largely because its pre-defined transform is less adaptive, but this drawback is offset by its robustness to local variation, as illustrated in Fig. 15. Interestingly, we observe in both Fig. 13 and Fig. 15 that when the noise level is high, the commercial software NI seems to employ a pre-defined pattern to smooth out noise and avoid artifacts.
Our comprehensive experiments show that CBM3D and MS-TSVD deliver the most competitive performance among all compared methods. However, Fig. 9 shows that they also produce annoying artifacts in severely corrupted homogeneous regions, mainly because of the strong noise present in the grouping and training steps. Although the strategy of NI risks sacrificing details, it produces satisfactorily smooth visual effects. The implementation of NI is not publicly available, but, following LLRT and TWSC, one plausible solution is to incorporate low-rank approximation and sparse coding techniques into MS-TSVD; choosing proper ranks, however, is not easy. Considering that the parameters must be carefully tuned in real cases, an efficient and effective strategy is needed. In this subsection, we use some challenging images from the RENOIR and DND datasets to demonstrate how to effectively produce smooth effects using the current state-of-the-art framework.
Recently, it has been shown by building an image pyramid that, in the downsampled version of a noisy observation, patches tend to be noiseless and share more similar patterns with the underlying clean patches than the corresponding noisy ones do. This observation is illustrated in Fig. 16 with images from the RENOIR dataset. Therefore, instead of directly filtering the noisy observation, an alternative is to first handle the downsized image and then upscale the denoised image back to its original size with an effective image super-resolution algorithm [48, 49]. In this chapter, we use the simplest built-in bicubic function of MATLAB. Fig. 17 and Fig. 18 compare the visual effects of results produced by MS-TSVD with and without this resize strategy. The "ground-truth" images of the DND dataset are not available, but smoother results with fewer color and claw artifacts can be clearly seen. Another straightforward benefit of the resize strategy is efficiency, since the downsampled image is much smaller than the original one. In real cases, however, this strategy should be used very carefully, because it trades details for smoothness.
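The resize strategy can be sketched as a thin wrapper around any denoiser. The sketch makes two simplifying assumptions that differ from the experiments above: downsampling is done by averaging 2 x 2 blocks, and upscaling uses nearest-neighbour repetition as a crude stand-in for bicubic interpolation. All function names are illustrative.

```python
import numpy as np

def downsample2(img):
    """Average non-overlapping 2x2 blocks (height and width must be even)."""
    h, w = img.shape[:2]
    blocks = img.reshape(h // 2, 2, w // 2, 2, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

def upsample2(img):
    """Nearest-neighbour upscaling by 2 (stand-in for bicubic interpolation)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def denoise_with_resize(noisy, denoiser):
    """Denoise the downsized image, then bring the result back to full size."""
    return upsample2(denoiser(downsample2(noisy)))
```

For independent noise, averaging four pixels halves the noise standard deviation, which is consistent with the observation that patches in the downsampled image are closer to the clean ones. The denoiser also processes only a quarter of the pixels, at the cost of fine details that the upscaling step cannot recover.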
In this paper, we present a brief review of real-world color image denoising framework and methodology. We describe several publicly available real-world color image datasets and introduce a newly constructed dataset for more comprehensive evaluation. Our experiments give an objective view on the effectiveness and efficiency of competing methods. Challenges and potentials of improving visual effects in real cases are also discussed.
Future work includes incorporating external priors [50, 51] and new grouping strategies into the current best methods.
The authors would like to thank all authors of related methods for providing their code and software package.
S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2392–2399.