A Brief Review of Real-World Color Image Denoising

09/10/2018 ∙ by Zhaoming Kong, et al. ∙ South China University of Technology International Student Union 6

Filtering real-world color images is challenging because the noise is complex and cannot be described by a single distribution. At the same time, the rapid development of camera lenses poses greater demands on image denoising in terms of both efficiency and effectiveness. Currently, the most widely accepted framework combines transform domain techniques with the nonlocal similarity characteristics of natural images. Based on this framework, many competitive methods model the correlation of the R, G, B channels with pre-defined or adaptively learned transforms. In this chapter, a brief review of related methods and publicly available datasets is presented, and a new dataset that includes more natural outdoor scenes is introduced. Extensive experiments are performed, and a discussion on visual effect enhancement is included.







I Introduction

Images are inevitably contaminated by noise during acquisition. According to [1], the real-world noise arising in the in-camera process [2, 3, 4] is signal dependent and stems mainly from five sources: photon shot, fixed pattern, dark current, readout and quantization noise. Noise adversely affects subsequent image processing tasks such as video processing, image analysis and tracking. Therefore, image denoising plays an important role in modern image processing systems.
The past few decades have witnessed great achievements in the field of image denoising [5]. The nonlocal similarity [6] of natural images, illustrated in Fig. 1, is widely exploited in the design of image denoising methods. Furthermore, transform domain [7] techniques are introduced under the assumption that the signal is sparsely distributed in the transform domain. The combination of the nonlocal similarity property and transform domain techniques is a simple and effective framework, which can be roughly divided into three consecutive steps: grouping, collaborative filtering and aggregation. A more detailed description is given in Algorithm 1. Within this framework, existing methods can be categorized by several interconnected criteria, and the techniques adopted by some representative methods are summarized in Table I.


TABLE I: Techniques adopted by representative methods. PCA: principal component analysis, LR: low rank, RN: residual network, TF: tensor factorization, SC: sparse coding, HT: hard-thresholding, WF: Wiener filter.

Method          Technique
LSCD [8]        Color Line + PCA
LLRT [9]        LR + TF
DnCNN [10]      RN
TNRD [11]       External Prior
MCWNNM [12]     Weighted LR
GID [13]        External + Internal Prior
TWSC [14]       Weighted SC
4DHOSVD1 [15]   HT + TF
CBM3D [16]      HT + WF
MS-TSVD [17]    HT + TF
Fig. 1: Illustration of non-local similarity feature of natural images.

First, numerous methods are devised based on the choice of transform. Early works build on bilateral filters [18], wavelet [19] and curvelet [20] transforms. The well-known color block-matching 3D (CBM3D) [16] then combines the discrete cosine transform (DCT) with an opponent color mode transform, and produces state-of-the-art performance. Instead of using predefined transforms, some recently proposed methods adopt adaptively learned transforms. [21] uses singular value decomposition (SVD) and designs a weighted nuclear norm minimization (WNNM) strategy. [15] uses the idea of tensors [22] and trains the corresponding mode transforms with 4D higher-order singular value decomposition (4DHOSVD). Most recently, [17] utilizes tensor-SVD [23] and characterizes a color image with block circulant representation [24].
The second difference lies in the modeling of noise. Most existing methods [19, 20, 16, 21, 15, 17, 25, 26, 27] consider the simplest additive white Gaussian noise (AWGN), and aim to recover the clean image $\mathcal{X}$ from its noisy observation $\mathcal{Y} = \mathcal{X} + \mathcal{N}$, where $\mathcal{N}$ is AWGN. Typically, a norm-based penalty is combined with different regularization to enforce sparsity in the transform domain. Efforts beyond the AWGN model are also made. [28] and [29] propose Poisson noise reduction models, [30] and [31] consider mixed Poisson-Gaussian noise, while [32] and [33] handle mixed Gaussian and impulsive noise. Besides, there are also some methods [34, 35, 36] and the well-known software toolbox Neat Image (NI) (Neatlab ABSoft, https://ni.neatvideo.com/home) developed for real-world noisy image denoising.
Last but not least, many methods vary according to how the correlation among the R, G, B channels is characterized. [37] proposes a multichannel nonlocal fusion (MNLF) approach, which constructs and fuses multiple NLM across the three channels. [38] introduces the color line to model the correlation among neighbouring pixels as well as among the RGB channels, and [12] extends WNNM to handle color images by using a weight matrix to balance the noise in the RGB channels. Moreover, some methods apply a de-correlation transform to the RGB color space so that grayscale image filtering techniques can be directly utilized. [39] proposes to suppress noise in the hue-saturation-value space due to the good continuation feature of the hue space. The representative CBM3D first transforms the RGB channels into a luminance-chrominance color space, then performs noise removal with BM3D [16] on every component in the new space separately.
In order to verify the effectiveness and efficiency of the above methods, several real-world color image datasets [40, 2, 41, 42] are constructed, where each scene includes noisy and "ground-truth" image pairs. According to [42], a reasonable way to obtain a "ground-truth" image is to capture the same unchanged scene many times and compute the mean image: for each pixel the noise is assumed to take both positive and negative values, and thus can be greatly suppressed by averaging many samples of the same pixel. A major problem with existing datasets is that they mainly consist of indoor scenes and artificial objects. To comprehensively understand the effectiveness of competing methods, a new dataset that includes many natural outdoor scenes is constructed. Also, compared with indoor scenes, the lighting conditions and illumination of outdoor scenes are more complicated, thus making denoising more challenging.
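The difference between AWGN and signal-dependent noise can be made concrete with a small simulation. The sketch below is only illustrative: the function names, the `gain` and `sigma_read` parameters, and the use of a Gaussian approximation (variance growing linearly with intensity) in place of true Poisson sampling are all assumptions, not models taken from the cited works.

```python
import random

def add_awgn(pixels, sigma, rng):
    """Additive white Gaussian noise: the same variance for every pixel."""
    return [x + rng.gauss(0.0, sigma) for x in pixels]

def add_signal_dependent(pixels, gain, sigma_read, rng):
    """Gaussian approximation of mixed Poisson-Gaussian noise (assumed model):
    the noise variance is gain * intensity + sigma_read ** 2."""
    return [x + rng.gauss(0.0, (gain * x + sigma_read ** 2) ** 0.5) for x in pixels]

def sample_variance(noisy, clean_value):
    """Empirical noise variance around a known constant intensity."""
    return sum((v - clean_value) ** 2 for v in noisy) / len(noisy)

rng = random.Random(0)
dark, bright = [10.0] * 5000, [200.0] * 5000
# AWGN: dark and bright regions are equally noisy.
print(sample_variance(add_awgn(dark, 5.0, rng), 10.0),
      sample_variance(add_awgn(bright, 5.0, rng), 200.0))
# Signal-dependent noise: bright regions are noisier than dark ones.
print(sample_variance(add_signal_dependent(dark, 0.5, 2.0, rng), 10.0),
      sample_variance(add_signal_dependent(bright, 0.5, 2.0, rng), 200.0))
```

Under this assumed model, a denoiser tuned with a single noise level is matched to only one intensity range, which is one reason AWGN-based methods need careful tuning on real images.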
In this paper, background knowledge is introduced in Section II. A brief description of existing and newly constructed datasets is included in Section III. Extensive experiments and discussion are presented in Section IV. Conclusion is given in Section V.

Input: Color image, transform T, inverse transform, patch size, local search window size, number of similar patches, number of pixels between two adjacent reference patches.
Output: Filtered image.
Step 1 (Grouping): For a given reference patch, use a certain criterion (often nearest neighbours) to stack similar patches in a group.
Step 2 (Collaborative filtering):
     (1) Apply transform T to the group and obtain its transform-domain coefficients.
     (2) Apply a threshold technique (often low-rank or hard-threshold) to the coefficients in the transform domain.
     (3) Apply the inverse transform to the thresholded coefficients and obtain the filtered group.
Step 3 (Aggregation): Write back (often by averaging) all image patches in the filtered group to their original locations and obtain the filtered image.

Algorithm 1 Nonlocal and transform-domain framework
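
The three steps of Algorithm 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of any method reviewed here: it handles a single-channel image stored as nested lists, applies an orthonormal DCT only along the group dimension, uses hard-thresholding, and all function and parameter names (`p`, `window`, `k`, `step`, `tau`) are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its transpose is its inverse."""
    mat = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        mat.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n)])
    return mat

def patch_at(img, r, c, p):
    """Flatten the p x p patch whose top-left corner is (r, c)."""
    return [img[r + i][c + j] for i in range(p) for j in range(p)]

def nonlocal_transform_denoise(img, p=4, window=4, k=8, step=2, tau=1.0):
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]  # sum of filtered estimates per pixel
    cnt = [[0] * w for _ in range(h)]    # number of estimates per pixel
    for r0 in range(0, h - p + 1, step):
        for c0 in range(0, w - p + 1, step):
            ref = patch_at(img, r0, c0, p)
            # Step 1 (grouping): keep the k patches in the window closest to ref.
            cands = []
            for r in range(max(0, r0 - window), min(h - p, r0 + window) + 1):
                for c in range(max(0, c0 - window), min(w - p, c0 + window) + 1):
                    q = patch_at(img, r, c, p)
                    dist = sum((a - b) ** 2 for a, b in zip(ref, q))
                    cands.append((dist, r, c, q))
            # On ties, the reference patch itself is ranked first.
            cands.sort(key=lambda t: (t[0], (t[1], t[2]) != (r0, c0)))
            group = cands[:k]
            n = len(group)
            T = dct_matrix(n)
            filtered = [[0.0] * (p * p) for _ in range(n)]
            # Step 2 (collaborative filtering) along the group dimension.
            for pix in range(p * p):
                vec = [group[m][3][pix] for m in range(n)]
                coef = [sum(T[a][m] * vec[m] for m in range(n)) for a in range(n)]
                coef = [x if abs(x) >= tau else 0.0 for x in coef]  # hard threshold
                for m in range(n):
                    filtered[m][pix] = sum(T[a][m] * coef[a] for a in range(n))
            # Step 3 (aggregation): accumulate filtered patches at their locations.
            for m in range(n):
                r, c = group[m][1], group[m][2]
                for i in range(p):
                    for j in range(p):
                        acc[r + i][c + j] += filtered[m][i * p + j]
                        cnt[r + i][c + j] += 1
    # Average overlapping estimates; uncovered pixels keep their input value.
    return [[acc[i][j] / cnt[i][j] if cnt[i][j] else img[i][j]
             for j in range(w)] for i in range(h)]
```

The methods discussed in this paper differ mainly in Step 2: they apply transforms along the spatial and color dimensions as well, and some replace hard-thresholding with low-rank approximation or Wiener filtering.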

II Background Knowledge

II-A Notations and Preliminaries

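In this paper, tensors are denoted by calligraphic letters, e.g., $\mathcal{A}$; matrices by boldface capital letters, e.g., $\mathbf{A}$; and vectors by lowercase boldface letters, e.g., $\mathbf{a}$. The $i$th entry of a vector $\mathbf{a}$ is denoted by $\mathbf{a}_i$, element $(i, j)$ of a matrix $\mathbf{A}$ by $\mathbf{A}_{ij}$, and element $(i, j, k)$ of a third-order tensor $\mathcal{A}$ by $\mathcal{A}_{ijk}$. An $N$th-order tensor is denoted as $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. The $n$-mode product of a tensor $\mathcal{A}$ by a matrix $\mathbf{U} \in \mathbb{R}^{P \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{U}$, is also a tensor. The Frobenius norm of a tensor $\mathcal{A}$ is defined as $\|\mathcal{A}\|_F = \sqrt{\sum_{i_1, \ldots, i_N} \mathcal{A}_{i_1 \cdots i_N}^2}$. The mode-$n$ matricization or unfolding of $\mathcal{A}$, denoted by $\mathbf{A}_{(n)}$, maps tensor element $(i_1, i_2, \ldots, i_N)$ to matrix element $(i_n, j)$, where $j = 1 + \sum_{k=1, k \neq n}^{N} (i_k - 1) J_k$, with $J_k = \prod_{m=1, m \neq n}^{k-1} I_m$.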
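Given a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, if $\mathcal{A}$ can be expressed as the product

$$\mathcal{A} = \mathcal{S} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N, \quad (1)$$

where $\mathcal{S}$ is the core tensor and $\mathbf{U}_n$ represents the orthogonal factor matrix of mode $n$, then the matricized version of (1) is

$$\mathbf{A}_{(n)} = \mathbf{U}_n \mathbf{S}_{(n)} \left(\mathbf{U}_N \otimes \cdots \otimes \mathbf{U}_{n+1} \otimes \mathbf{U}_{n-1} \otimes \cdots \otimes \mathbf{U}_1\right)^T, \quad (2)$$

where $\otimes$ denotes the Kronecker product.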
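
To make the notation concrete, the following sketch implements the unfolding, the n-mode product and the Frobenius norm for a third-order tensor stored as nested lists. It is a minimal illustration: the helper names are hypothetical, modes are numbered from 0 rather than 1, and the column ordering of the unfolding (lower remaining index varies fastest) is an assumed convention.

```python
import math

def _column(idx, dims, mode):
    """Column index of element idx in the mode unfolding (lower index varies fastest)."""
    first, second = [d for d in range(3) if d != mode]
    return idx[first] + dims[first] * idx[second]

def unfold(tensor, mode):
    """Mode-n unfolding (mode = 0, 1 or 2) of a third-order tensor."""
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    cols = dims[0] * dims[1] * dims[2] // dims[mode]
    mat = [[0.0] * cols for _ in range(dims[mode])]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                mat[idx[mode]][_column(idx, dims, mode)] = tensor[i][j][k]
    return mat

def fold(mat, mode, dims):
    """Inverse of unfold for a tensor of shape dims."""
    t = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                t[i][j][k] = mat[idx[mode]][_column(idx, dims, mode)]
    return t

def mode_product(tensor, U, mode):
    """n-mode product: multiply every mode-n fiber of the tensor by the matrix U."""
    dims = [len(tensor), len(tensor[0]), len(tensor[0][0])]
    A = unfold(tensor, mode)
    B = [[sum(U[r][m] * A[m][c] for m in range(len(A))) for c in range(len(A[0]))]
         for r in range(len(U))]
    dims[mode] = len(U)
    return fold(B, mode, dims)

def frobenius_norm(tensor):
    return math.sqrt(sum(x * x for mat in tensor for row in mat for x in row))

# A small 2 x 3 x 2 tensor with entries 0..11.
A = [[[6 * i + 2 * j + k for k in range(2)] for j in range(3)] for i in range(2)]
print(unfold(A, 0))
print(mode_product(A, [[1, 1]], 0))  # sums the two slices along the first mode
```

Computing the n-mode product through the unfolding, as done in `mode_product`, mirrors the matricized form in (2): the factor matrix multiplies the unfolded tensor from the left.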

II-B Related Methods

(a) Vector representation
(b) Tensor representation
Fig. 2: Vector and tensor representation of patches.

Regardless of the categorization criteria, almost all related methods under the nonlocal and transform domain framework employ the idea of multiway filtering [43], and apply different regularization to filter a group of patches along each dimension. Vector and tensor are two commonly used representations to model each color image patch, as illustrated in Fig. 2. Vector representation based methods such as TWSC [14], MCWNNM [12] and LSCD [8] can benefit from the good and extensively studied statistical properties of singular value decomposition (SVD) or principal component analysis (PCA). However, each vectorized patch can be a lengthy vector, which increases the computational burden. To alleviate this issue, many recently proposed methods such as 4DHOSVD, LLRT [9] and MS-TSVD exploit tensor representation. In fact, the pioneering work is CBM3D, although the tensor is not explicitly stated.
Currently, CBM3D, 4DHOSVD and MS-TSVD demonstrate the most competitive performance in terms of both effectiveness and efficiency, and since a tensor can be regarded as an extension of a vector to higher dimensions, we mainly introduce these three state-of-the-art methods and compare their similarities and differences. A more detailed analysis is given in [17].
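For a group of patches $\mathcal{G}$, the reconstruction problem of both CBM3D and 4DHOSVD can be represented by the fourth-order tensor decomposition

$$\mathcal{G} = \mathcal{S} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \times_3 \mathbf{U}_3 \times_4 \mathbf{U}_4, \quad (3)$$

where $\mathbf{U}_1, \ldots, \mathbf{U}_4$ are the corresponding mode transform matrices along each dimension of $\mathcal{G}$. For CBM3D, all transforms in (3) are predefined, and its color mode transform is the opponent color transform matrix (4). Specifically, after the color mode transform is applied to $\mathcal{G}$, the first slice in the new color mode can be regarded as the luminance channel, while the remaining two slices are chrominance channels. CBM3D is very efficient because it does not have to train local transforms, and its grouping step is performed only on the luminance channel. For 4DHOSVD, all mode transform matrices are learned by solving

$$\min_{\mathcal{S}, \mathbf{U}_1, \ldots, \mathbf{U}_4} \left\| \mathcal{G} - \mathcal{S} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \times_3 \mathbf{U}_3 \times_4 \mathbf{U}_4 \right\|_F^2 \quad \text{s.t.} \quad \mathbf{U}_n^T \mathbf{U}_n = \mathbf{I}, \; n = 1, \ldots, 4, \quad (5)$$

where $\mathbf{I}$ is the identity matrix and $\mathbf{U}_n$ can be obtained via SVD of the corresponding unfolding matrix $\mathbf{G}_{(n)}$. Different from CBM3D and 4DHOSVD, which use direct folding and unfolding of the tensor, MS-TSVD adopts the idea of tensor-SVD (t-SVD), and characterizes each color image patch $\mathcal{P}$ as a block circulant matrix, which can be represented by

$$\mathrm{bcirc}(\mathcal{P}) = \begin{pmatrix} \mathbf{P}^{(1)} & \mathbf{P}^{(3)} & \mathbf{P}^{(2)} \\ \mathbf{P}^{(2)} & \mathbf{P}^{(1)} & \mathbf{P}^{(3)} \\ \mathbf{P}^{(3)} & \mathbf{P}^{(2)} & \mathbf{P}^{(1)} \end{pmatrix}, \quad (6)$$

where $\mathbf{P}^{(1)}, \mathbf{P}^{(2)}, \mathbf{P}^{(3)}$ are the three color channels of patch $\mathcal{P}$. Fig. 3 illustrates a group of color image patches that use block circulant representation. Then, according to the tensor decomposition in (1), the reconstruction problem of MS-TSVD can be written in the form (7), whose factor matrices can be obtained by solving problem (8).

Fig. 3: Block circulant representation of a group of similar patches.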


Directly solving (8) is time-consuming, because it includes the decomposition of a large block circulant matrix. Thus the authors first perform the fast Fourier transform (FFT) [44] along the third mode of $\mathcal{G}$ to convert the block circulant representation into a diagonal representation in the Fourier domain. Each patch in the Fourier domain can be obtained by $\hat{\mathcal{P}} = \mathcal{P} \times_3 \mathbf{F}$, where $\mathbf{F}$ is the FFT matrix defined as

$$\mathbf{F} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega^4 \end{pmatrix}, \quad \omega = e^{-2\pi i / 3}, \quad (9)$$

then problem (8) can be reformulated in the Fourier domain as problem (10), whose terms are defined and analyzed in [17]. Furthermore, from (9), two interesting features of the FFT, and correspondingly of the patch $\hat{\mathcal{P}}$ in the Fourier domain, can be seen. First, the second and third slices of $\hat{\mathcal{P}}$ are conjugate. Second, the first slice of $\hat{\mathcal{P}}$ in the Fourier domain can be regarded as the luminance channel of (4). Therefore, to train the group-wise transform more effectively, an alternative to (10) is to use only the luminance channel information. Indeed, MS-TSVD can be regarded as a generalization of both CBM3D and 4DHOSVD, because it uses a predefined color mode transform and applies HOSVD to every new channel in the Fourier domain.
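The two predefined color mode transforms discussed above can be compared on a single pixel. In the sketch below, the opponent matrix is an assumed, commonly used choice (its rows give one luminance and two chrominance channels) and is not taken from the text; the length-3 DFT follows the standard definition. All names are hypothetical.

```python
import cmath

# Assumed opponent color matrix (rows: luminance, chrominance 1, chrominance 2).
OPPONENT = [[1 / 3, 1 / 3, 1 / 3],
            [1 / 2, 0.0, -1 / 2],
            [1 / 4, -1 / 2, 1 / 4]]

def opponent_transform(rgb):
    """Map one RGB pixel to (luminance, chrominance 1, chrominance 2)."""
    return [sum(OPPONENT[i][j] * rgb[j] for j in range(3)) for i in range(3)]

def dft3(rgb):
    """Length-3 discrete Fourier transform across the color channels of one pixel."""
    omega = cmath.exp(-2j * cmath.pi / 3)
    return [sum(rgb[n] * omega ** (k * n) for n in range(3)) for k in range(3)]

pixel = [200.0, 120.0, 40.0]
lum, c1, c2 = opponent_transform(pixel)
f0, f1, f2 = dft3(pixel)
print(lum, f0.real / 3)          # both equal the mean of the three channels
print(abs(f1 - f2.conjugate()))  # close to 0: the last two coefficients are conjugate
```

The first DFT coefficient is the channel sum, so it matches the assumed luminance row up to a scale factor, and for real-valued input the remaining two coefficients carry the same information because they are complex conjugates.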
The time complexity of each method can be expressed in terms of the number of image pixels, the average time to compute similar patches per reference patch, the average number of patches similar to the reference patch, and the patch size. The corresponding expressions for 4DHOSVD and CBM3D are given in [15]. The computational burden of MS-TSVD mainly lies in the PCA transform and the patch-level t-SVD transform. Considering that MS-TSVD is a one-step algorithm, it is competitive in terms of efficiency, because in real cases the input estimate of AWGN should be carefully tuned for best visual effects, and some intermediate results such as grouping indices and local PCA can be saved to avoid re-calculation.

III Real Datasets

Currently, there are four publicly available datasets used to evaluate denoising methods on real-world noisy images. In this section, these existing real datasets are briefly described and a newly constructed dataset is introduced. Information of all included datasets is given in Table II. Examples of each dataset are respectively illustrated in Fig. 4 to Fig. 8. More detailed analysis of existing datasets can be found in [42].

III-A Existing Datasets

III-A1 RENOIR Dataset [40]

To the best of our knowledge, the RENOIR dataset is the first attempt to construct a real-world color image dataset with noisy and "ground-truth" image pairs. Three cameras are used to take photos of static indoor scenes with different ISO values. The limitation of this dataset is that some image pairs exhibit misalignment and clear color differences due to the less refined post-processing step [42].

III-A2 Nam Dataset [2]

Three different cameras are used to take photos of 11 static scenes. Each "ground-truth" image is generated by capturing the same unchanged scene approximately 500 times and computing the mean image. The major problem with this dataset is that the images are mostly confined to printed scenes, whose statistical properties are similar [42].

III-A3 DND Dataset [41]

Four cameras are used to construct the DND dataset. Different from the previous two datasets, the authors utilize Tobit regression to estimate the parameters of the noise process from only two images. A careful post-processing step is conducted to correct the misalignment and remove residual low-frequency bias. The "ground-truth" images of this dataset are currently not available, but objective results of denoising methods can be obtained by submitting filtered images to the authors' website.

III-A4 Xu Dataset [42]

In order to address the limitations of the above datasets, this new benchmark uses 5 different cameras and includes a broader variety of indoor scenes with more camera settings. Similar to [2], each "ground-truth" image is also obtained by averaging static images captured of the same scene. Besides, three volunteers are invited to manually remove outlier images that show clear misalignment or different illuminations.

III-B The Proposed Dataset

The limitation of the previously mentioned datasets is that they are based mainly on static indoor scenes where the lighting conditions can be manually controlled. In real cases, however, many photos are captured in outdoor scenes where the objects and light sources are more complicated. In our dataset, six different cameras are used to take photos of both indoor and outdoor scenes. The same strategy as in [2] and [42] is employed to generate "ground-truth" images due to its simplicity. Images taken outdoors should be treated more carefully, because natural objects such as flowers and trees are not completely motionless, and the lighting conditions can vary strongly due to the movement of clouds. Therefore, the shutter speed of the cameras should be fast enough, and images that show very obvious misalignment and illumination differences are discarded. Each outdoor "ground-truth" image is obtained by averaging 30 to 60 images of the same scene. Several examples of the new dataset are illustrated in Fig. 8.
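
The averaging strategy used to build "ground-truth" images can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not claims about the actual capture pipeline: the scene is perfectly static, the noise is zero-mean Gaussian with a fixed standard deviation, and the names `simulate_captures`, `mean_image` and `rmse` are hypothetical.

```python
import math
import random

def simulate_captures(clean, n_captures, sigma, rng):
    """Simulate repeated captures of a static scene with zero-mean Gaussian noise."""
    return [[v + rng.gauss(0.0, sigma) for v in clean] for _ in range(n_captures)]

def mean_image(captures):
    """Pixel-wise mean over all captures."""
    n = len(captures)
    return [sum(samples) / n for samples in zip(*captures)]

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

rng = random.Random(0)
clean = [rng.uniform(0.0, 255.0) for _ in range(2000)]  # flattened toy scene
captures = simulate_captures(clean, 50, 10.0, rng)
print("single capture RMSE:", rmse(captures[0], clean))
print("mean of 50 captures RMSE:", rmse(mean_image(captures), clean))
```

With independent zero-mean noise, the standard deviation of the mean falls roughly with the square root of the number of captures, so averaging 30 to 60 images suppresses most of the noise. Motion or illumination changes between captures break the static-scene assumption, which is why such frames have to be discarded.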

Dataset  Camera Brand        Number of Images  Sensor Size (mm)
RENOIR   CANON S90           40                7.4 × 5.6
         CANON T3i           40                22.3 × 14.9
         XIAOMI Mi3          40                4.7 × 3.5
Nam      CANON 5D MARK III   3                 36.0 × 24.0
         NIKON D600          3                 36.0 × 24.0
         NIKON D800          9                 36.0 × 24.0
DND      SONY A7R            13                36.0 × 24.0
         OLYMPUS E-M10       13                17.3 × 13.0
         SONY RX100 IV       12                13.2 × 8.8
         HUAWEI NEXUS 6P     12                6.2 × 4.6
Xu       CANON 5D            10                36.0 × 24.0
         CANON 80D           6                 22.5 × 15.0
         CANON 600D          5                 22.3 × 14.9
         NIKON D800          12                36.0 × 24.0
         SONY A7II           7                 35.8 × 23.9
Ours     HUAWEI HONOR 6X     18                4.9 × 3.7
         IPHONE 5S           18                4.8 × 3.6
         IPHONE 6S           36                4.8 × 3.6
         CANON 100D          26                22.3 × 14.9
         CANON 600D          23                22.3 × 14.9
         SONY A6500          18                15.6 × 23.5
TABLE II: Information of Five Real-World Color Image Datasets.
Fig. 4: Some sample real noisy images of RENOIR dataset. Panels: (a) CANON S90, (b) CANON T3i, (c) XIAOMI Mi3.
Fig. 5: Some sample real noisy images of Nam dataset. Panels: (b) NIKON D600, (c) NIKON D800.
Fig. 6: Some sample real noisy images of DND dataset. Panels: (a) NEXUS 6P, (c) SONY A7R.
Fig. 7: Some sample real noisy images of Xu dataset. Panels: (a) CANON 5D, (b) CANON 80D, (c) NIKON D800.
Fig. 8: Some sample real noisy images of our newly constructed dataset. Panels: (c) CANON 600D.

IV Experiments

IV-A Benchmark Datasets

According to the description and analysis in Section III, the RENOIR and DND datasets are not used in our objective evaluations, because image pairs of the former exhibit some misalignment, while the "ground-truth" images of the latter are not yet openly accessible. They are nevertheless included in our discussion on visual evaluations, since in real cases "ground-truth" images are not available. Besides, the images of the other three datasets are too large for some methods, thus four sub-datasets of cropped images are used, and a brief description of these sub-datasets is given in Table III. All sub-datasets are available at https://github.com/ZhaomingKong.

Subdataset Dataset # of Cropped Images Image Size
Dataset 1 Nam [2] 15
Dataset 2 Nam [2] 60
Dataset 3 Xu [42] 100
Dataset 4 Ours [17] 298
TABLE III: Information of four sub-datasets.

IV-B Comparison Methods

A comprehensive evaluation is conducted on several state-of-the-art methods, including MS-TSVD [17], CBM3D [16], 4DHOSVD1 [15], MCWNNM [12], GID [13], TWSC [14], LLRT [9], LSCD [8], DnCNN [10] and TNRD [11]. The well-known commercial software Neat Image (NI) is included in our discussion on visual evaluation. In real cases, the parameters of all competing methods should be specified for each corrupted image, but this is impractical because of the high computational complexity of some compared methods. To better understand the effectiveness of CBM3D, however, we carefully tune its input noise level for each image, and term this implementation 'CBM3D_best'. In our experiments, the code and implementations provided by the authors are used, and the average best results on each dataset are reported. All experiments are performed under MATLAB 2017a on a moderate laptop equipped with an Intel i5-8250U CPU at 1.8 GHz and 8 GB RAM.
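
The per-image tuning behind 'CBM3D_best' amounts to a search over candidate input noise levels against the "ground-truth" image. The sketch below shows one way such a search could look; it is not the procedure used in the experiments. The `denoise` and `score` callables, the candidate list and the toy moving-average denoiser are hypothetical stand-ins.

```python
def best_noise_level(noisy, reference, denoise, sigmas, score):
    """Return the candidate noise level whose denoised output scores highest."""
    return max(sigmas, key=lambda s: score(denoise(noisy, s), reference))

def neg_mse(a, b):
    """Stand-in quality index (higher is better); PSNR or SSIM could be used instead."""
    return -sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def toy_denoise(signal, sigma):
    """Hypothetical denoiser: a moving average whose radius grows with sigma."""
    radius = int(sigma // 10)
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

reference = [100.0] * 20
noisy = [105.0 if i % 2 == 0 else 95.0 for i in range(20)]
print(best_noise_level(noisy, reference, toy_denoise, [0, 10, 20], neg_mse))
```

Because the search needs the reference image, it gives an upper bound on what a method can reach with a single tuned parameter rather than a procedure that can be applied to images without "ground truth".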

IV-C Objective Index

Several objective indexes [45, 46] are applied to evaluate image quality by comparing the filtered image with the "ground-truth" image. In our experiments, the two most commonly used indexes, Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) [45], are adopted. PSNR is a pixel-by-pixel comparison strategy that can be obtained by:

$$\mathrm{PSNR}(\mathcal{X}, \mathcal{Y}) = 10 \log_{10} \frac{255^2}{\mathrm{MSE}(\mathcal{X}, \mathcal{Y})},$$

where $\mathcal{X}$ and $\mathcal{Y}$ are two compared images and $\mathrm{MSE}$ denotes their mean squared error over all pixel values. Typically, the higher the PSNR and SSIM values, the better the results. But in our experiments, we will show that visual effects do not always correspond well with the objective index, partially because the human visual system (HVS) is not particularly sensitive to the presence of noise.
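
A minimal sketch of the PSNR computation is given below. It assumes 8-bit images (peak value 255) passed as flat sequences of pixel values; the function name and the `peak` parameter are illustrative choices.

```python
import math

def psnr(img_a, img_b, peak=255.0):
    """PSNR in dB between two equally sized images given as flat pixel sequences."""
    if len(img_a) != len(img_b):
        raise ValueError("images must contain the same number of values")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    # Identical images have zero error, so PSNR is unbounded.
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Every pixel off by 2 gray levels gives MSE = 4, i.e. about 42.11 dB.
print(round(psnr([10, 20, 30], [12, 22, 32]), 2))
```

Since PSNR depends only on the mean squared error, two results with the same PSNR can look very different, which is consistent with the gap between objective indexes and visual quality discussed in this paper.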

IV-D Experiment Results

IV-D1 Results on Dataset 1

To show the robustness of all compared methods, the PSNR value of each image is listed in Table IV. The average computational time is also provided. Apart from CBM3D, LSCD and TNRD, which use C++ mex-functions with parallelization, the compared methods are implemented purely in MATLAB. For MS-TSVD, the authors also provide a C++ implementation (https://github.com/ZhaomingKong/color_image_denoising) that can reduce its computational time to less than 8 seconds. From Table IV, one can see that the simple MS-TSVD method is able to produce very competitive performance in terms of both effectiveness and efficiency. Fig. 9 and Fig. 10 show the denoised images captured by NIKON D600 at ISO = 3200 and NIKON D800 at ISO = 1600, respectively. The visual evaluation shows that the representative low-rank based method LLRT and the sparse coding scheme TWSC produce better results in homogeneous regions, because the underlying clean patches share similar features, and thus can be approximated by solving a low-rank or sparse coding problem. However, as illustrated in Fig. 10, when the ground-truth image contains more details, it may be risky to employ the low-rank approximation strategy, and the clearly over-smoothed result contradicts the improvement in PSNR value. Compared with CBM3D and MS-TSVD, which utilize global patch representation, the local 4DHOSVD transform is more easily affected by the presence of noise, which is incorporated in the training process of the color mode transform.

TABLE IV: PSNR (dB) results and speed comparison of different methods on Dataset 1. Three best results are bolded.
CANON 5D, ISO = 3200     37.86  39.23  39.51  37.26  40.82  40.55  41.20  40.77  40.96  40.22  40.79
CANON 5D, ISO = 3200     36.21  36.31  36.47  34.13  37.19  35.92  37.25  37.31  37.31  36.97  37.37
CANON 5D, ISO = 3200     35.52  35.93  36.45  34.09  36.92  35.15  36.48  36.98  37.15  36.55  37.01
NIKON D600, ISO = 3200   34.65  34.74  34.79  33.62  35.32  35.36  35.54  35.21  35.38  35.02  35.29
NIKON D600, ISO = 3200   36.26  36.83  36.37  34.48  36.62  37.09  37.03  36.76  36.81  36.60  36.95
NIKON D600, ISO = 3200   38.24  40.58  39.49  35.41  38.68  41.13  39.56  40.13  40.45  39.78  40.93
NIKON D800, ISO = 1600   37.90  37.39  38.11  35.79  38.88  39.36  39.26  39.02  39.25  38.85  39.21
NIKON D800, ISO = 1600   38.88  40.27  40.52  36.08  40.66  41.91  41.45  41.65  41.65  41.35  41.98
NIKON D800, ISO = 1600   38.32  37.78  38.17  35.48  39.20  38.81  39.54  39.40  39.59  39.11  39.54
NIKON D800, ISO = 3200   37.45  39.79  37.69  34.08  37.92  40.27  38.94  39.59  39.86  39.24  39.98
NIKON D800, ISO = 3200   36.49  37.34  35.90  33.70  36.62  37.22  37.40  37.49  37.54  37.28  37.65
NIKON D800, ISO = 3200   37.73  41.03  38.21  33.31  37.64  42.09  39.42  39.47  40.38  39.47  40.05
NIKON D800, ISO = 6400   32.33  35.09  32.81  29.83  33.01  35.53  34.85  34.13  34.85  34.40  34.50
NIKON D800, ISO = 6400   32.55  34.05  32.33  30.55  32.93  34.15  33.97  33.73  33.92  33.81  33.93
NIKON D800, ISO = 6400   32.62  34.11  32.29  30.09  32.96  33.93  33.97  33.85  34.16  34.01  34.01
Average                  36.20  37.36  36.61  33.86  37.02  37.90  37.72  37.70  37.95  37.51  37.95
Time (s)                 9.8    1168.9 5.6    98.6   55.6   498.8  298.8  6.8    6.8    130.8  98.8
Fig. 9: Denoised images of compared methods on Dataset 1. The camera is NIKON D600 with ISO = 3200. Panels: (a) Clean, (b) Noisy, (c) LSCD, (d) LLRT, (e) GID, (f) TWSC, (h) 4DHOSVD1, (i) CBM3D. Please zoom in for a better view.
Fig. 10: Denoised images of compared methods on Dataset 1. The camera is NIKON D800 with ISO = 1600. Panels: (a) Clean, (b) Noisy, (c) LSCD, (d) LLRT, (e) GID, (f) TWSC, (h) 4DHOSVD1, (i) CBM3D. Please zoom in for a better view.

IV-D2 Results on Dataset 2 and Dataset 3

Dataset 2  PSNR  38.51   38.41   39.66   39.03   39.15   39.40   39.68   39.75
Dataset 2  SSIM  0.9636  0.9633  0.9759  0.9698  0.9729  0.9740  0.9775  0.9756
Dataset 3  PSNR  38.51   38.37   38.62   38.51   38.51   38.69   38.81   38.82
Dataset 3  SSIM  0.9707  0.9675  0.9674  0.9671  0.9673  0.9694  0.9712  0.9694
TABLE V: Average PSNR (dB) and SSIM of different denoising methods on Dataset 2 and Dataset 3. The best results are bolded.
Fig. 11: Denoised images of compared methods on Dataset 2. The camera is NIKON D800 with ISO = 1600. Panels: (a) Clean, (b) Noisy, (c) NI, (d) LLRT, (e) GID, (f) TWSC, (h) 4DHOSVD1, (i) CBM3D. Please zoom in for a better view.
Fig. 12: Denoised images of compared methods on Dataset 3. The camera is CANON 5D with ISO = 6400. Panels: (a) Clean, (b) Noisy, (c) NI, (d) LLRT, (e) GID, (f) TWSC, (h) 4DHOSVD1, (i) CBM3D. Please zoom in for a better view.

In Table V, we list the results of several of the most competitive methods on Dataset 2 and Dataset 3. It can be seen that on these two datasets, MS-TSVD also achieves the most competitive performance, which is almost the upper bound of the effectiveness of CBM3D. To further demonstrate the observations made in our experiments on Dataset 1, two images that contain rich details and large homogeneous regions are chosen from Dataset 2 and Dataset 3, respectively. Visual evaluations are given in Fig. 11 and Fig. 12.

IV-D3 Results on Dataset 4

TABLE VI: Average PSNR (dB) and SSIM of different denoising methods on our newly constructed dataset. The best results are bolded.
Camera           Cropped Images  Index  LLRT    GID     TWSC    MCWNNM  4DHOSVD1  CBM3D   CBM3D_best  MS-TSVD
HUAWEI HONOR 6X  30              PSNR   39.54   39.52   39.71   39.46   39.82     39.97   40.48       40.08
                                 SSIM   0.9669  0.9653  0.9651  0.9610  0.9658    0.9669  0.9740      0.9674
IPHONE 5S        36              PSNR   40.02   40.12   40.27   39.87   40.68     40.77   41.25       40.84
                                 SSIM   0.9676  0.9642  0.9617  0.9567  0.9664    0.9668  0.9758      0.9668
IPHONE 6S        67              PSNR   39.72   40.16   40.12   40.18   40.36     40.55   41.16       40.53
                                 SSIM   0.9663  0.9670  0.9619  0.9628  0.9671    0.9693  0.9783      0.9674
CANON 100D       55              PSNR   41.84   40.86   41.65   41.47   41.41     41.69   42.08       41.99
                                 SSIM   0.9784  0.9743  0.9767  0.9774  0.9771    0.9780  0.9808      0.9794
CANON 600D       25              PSNR   42.53   41.60   42.52   42.07   42.14     42.54   42.89       42.75
                                 SSIM   0.9816  0.9790  0.9824  0.9795  0.9810    0.9836  0.9851      0.9840
SONY A6500       36              PSNR   45.71   44.94   45.48   45.37   45.56     45.7    45.81       45.89
                                 SSIM   0.9899  0.9887  0.9896  0.9894  0.9901    0.9902  0.9904      0.9903
Fig. 13: Denoised images of compared methods on Dataset 4. The camera is HUAWEI HONOR 6X with auto mode. Panels: (a) Clean, (b) Noisy, (c) LLRT, (d) NI, (e) GID, (f) TWSC, (h) 4DHOSVD1, (i) CBM3D. Please zoom in for a better view.
Fig. 14: Denoised images of compared methods on Dataset 4. The camera is IPHONE 6S with auto mode. Panels: (a) Clean, (b) Noisy, (c) LLRT, (d) NI, (e) GID, (f) TWSC, (h) 4DHOSVD1, (i) CBM3D. Please zoom in for a better view.
Fig. 15: Denoised images of compared methods on Dataset 4. The camera is IPHONE 6S with auto mode. Panels: (a) Clean, (b) Noisy, (c) LLRT, (d) NI, (e) GID, (f) TWSC, (h) 4DHOSVD1, (i) CBM3D. Please zoom in for a better view.

The cropped images of this dataset are four times as large as those of the above datasets, thus several efficient and competitive methods are chosen according to our previous experiments. Average PSNR and SSIM values are listed in Table VI, while visual evaluations are given in Fig. 13, Fig. 14 and Fig. 15. Comparing Table VI with Table IV and Table V, it can be seen that no competing method consistently outperforms the state-of-the-art CBM3D, especially when the images are large and contain more natural outdoor scenes. Fig. 14 shows the slight over-smoothing effect of CBM3D, largely because its pre-defined transform is less adaptive, but this drawback is offset by its robustness to local variation, as illustrated in Fig. 15. Interestingly, we observe in both Fig. 13 and Fig. 15 that when the noise level is high, the commercial software NI seems to employ a pre-defined pattern to smooth out noise and avoid artifacts.

IV-D4 Discussion

Our comprehensive experiments show that CBM3D and MS-TSVD demonstrate the most competitive performance of all compared methods. However, Fig. 9 shows that they also produce annoying artifacts in severely corrupted homogeneous regions, mainly due to the presence of strong noise in the grouping and training steps. Although the strategy of NI risks sacrificing details, it shows satisfactorily smooth visual effects. The implementation of NI is not publicly available, but, similar to LLRT and TWSC, one plausible solution is to incorporate low-rank approximation and sparse coding techniques into MS-TSVD; however, choosing proper ranks is not easy. Considering that in real cases the parameters should be carefully tuned, an efficient and effective strategy should be utilized. In this subsection, we use some challenging images from the RENOIR and DND datasets to demonstrate how to effectively produce smooth effects using the current state-of-the-art framework.
Recently, [47] builds a pyramid and shows that, in the downsampled image of a noisy observation, patches tend to be noiseless and share more similar patterns with the original underlying clean patches than the corresponding noisy ones. This observation is illustrated in Fig. 16 with images from the RENOIR dataset. Therefore, instead of directly filtering the noisy observation, an alternative is to first handle the downsized image, and then upscale the denoised image back to its original size with an effective image super-resolution algorithm [48, 49]. In this chapter, we use the simplest built-in bicubic function of MATLAB. Fig. 17 and Fig. 18 compare the visual effects of results produced by MS-TSVD with and without this resize strategy. The "ground-truth" images of the DND dataset are not available, but the obviously smoother effects with fewer color and claw artifacts can be clearly seen. Another straightforward benefit of the resize strategy is efficiency, since the downsampled image is much smaller than the original one. But in real cases, this strategy should be used very carefully, because it trades details for smoothness.
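
The resize strategy can be sketched as a three-stage pipeline. The code below is only an illustration and deliberately swaps in simpler operations than the bicubic MATLAB function used in this chapter: downsampling is done by averaging 2 x 2 blocks and upsampling by pixel replication. The `denoise` argument is a hypothetical callable standing in for any of the reviewed methods.

```python
def downsample2x(img):
    """Halve each dimension by averaging non-overlapping 2 x 2 blocks."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)] for r in range(0, h, 2)]

def upsample2x(img):
    """Double each dimension by pixel replication (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def resize_denoise(noisy, denoise):
    """Downsample, denoise the smaller image, then upsample to the original scale."""
    return upsample2x(denoise(downsample2x(noisy)))

noisy = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 6, 6],
         [2, 2, 6, 6]]
# With an identity "denoiser", the output shows the smoothing caused by resizing alone.
print(resize_denoise(noisy, lambda small: small))
```

Averaging four pixels already reduces independent noise before any denoiser runs, and the denoiser then works on a quarter of the pixels. The replication step cannot restore the fine detail removed by downsampling, which reflects the trade-off between details and smoothness noted above.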

Fig. 16: Visual comparison of clean, noisy and resized images of the same scene from RENOIR dataset. The camera is CANON T3i. Panels: (a) Clean, (b) Noisy, (c) Resized, (d) Clean, (e) Noisy, (f) Resized. The resized images are generated by MATLAB with 'Resized = imresize(Noisy,0.5)'. Please zoom in for a better view.
Fig. 17: Visual evaluation of filtered images of DND dataset. From left to right: Noisy observation, MS-TSVD and resized MS-TSVD. Please zoom-in for better view.
Fig. 18: Visual evaluation of filtered images of RENOIR dataset. From left to right: Clean observation, Noisy observation, and resized MS-TSVD. Please zoom-in for better view.

V Conclusion

In this paper, we present a brief review of real-world color image denoising framework and methodology. We describe several publicly available real-world color image datasets and introduce a newly constructed dataset for more comprehensive evaluation. Our experiments give an objective view on the effectiveness and efficiency of competing methods. Challenges and potentials of improving visual effects in real cases are also discussed.
Future work includes incorporating external priors [50, 51] and new grouping strategies [52] into the current best methods.


The authors would like to thank all authors of related methods for providing their code and software package.


  • [1] G. Healey and R. Kondepudy, “Radiometric ccd camera calibration and noise estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 3, pp. 267–276, 1994.
  • [2] S. Nam, Y. Hwang, Y. Matsushita, and S. J. Kim, “A holistic approach to cross-channel image noise modeling and its application to image denoising,” in Computer Vision and Pattern Recognition, 2016, pp. 1683–1691.
  • [3] Y. Tsin, V. Ramesh, and T. Kanade, “Statistical calibration of ccd imaging process,” in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, 2001, pp. 480–487 vol.1.
  • [4] S. J. Kim, H. T. Lin, Z. Lu, S. Süsstrunk, S. Lin, and M. S. Brown, “A new in-camera imaging model for color computer vision and its application.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 12, pp. 2289–302, 2012.
  • [5] P. Milanfar, “A tour of modern image filtering: New insights and methods, both practical and theoretical,” Signal Processing Magazine IEEE, vol. 30, no. 1, pp. 106–128, 2013.
  • [6] A. Buades, “A review of image denoising algorithms, with a new one,” Siam Journal on Multiscale Modeling and Simulation, vol. 4, no. 2, pp. 490–530, 2005.
  • [7] L. P. Yaroslavsky, K. O. Egiazarian, and J. T. Astola, “Transform domain image restoration methods: review, comparison, and interpretation,” Proceedings of SPIE - The International Society for Optical Engineering, pp. 155–169, 2001.
  • [8] M. Rizkinia, T. Baba, K. Shirai, and M. Okuda, “Local spectral component decomposition for multi-channel image denoising,” IEEE Transactions on Image Processing, vol. 25, no. 7, pp. 3208–3218, 2016.
  • [9] Y. Chang, L. Yan, and S. Zhong, “Hyper-laplacian regularized unidirectional low-rank tensor recovery for multispectral image denoising,” in Proc. IEEE Conf. CVPR, 2017, pp. 4260–4268.
  • [10] S. Harmeling, “Image denoising: Can plain neural networks compete with bm3d?” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2392–2399.
  • [11] Y. Chen, W. Yu, and T. Pock, “On learning optimized reaction diffusion processes for effective image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5261–5269.
  • [12] J. Xu, L. Zhang, D. Zhang, and X. Feng, “Multi-channel weighted nuclear norm minimization for real color image denoising,” arXiv preprint arXiv:1705.09912, 2017.
  • [13] J. Xu, L. Zhang, and D. Zhang, “External prior guided internal prior learning for real-world noisy image denoising,” IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 2996–3010, 2018.
  • [14] ——, “A trilateral weighted sparse coding scheme for real-world image denoising,” CoRR, vol. abs/1807.04364, 2018. [Online]. Available: http://arxiv.org/abs/1807.04364
  • [15] A. Rajwade, A. Rangarajan, and A. Banerjee, “Image denoising using the higher order singular value decomposition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 4, pp. 849–862, 2013.
  • [16] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
  • [17] Z. Kong and X. Yang, “Color image and multispectral image denoising using block circulant representation,” Unpublished draft, 2018.
  • [18] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), Jan 1998, pp. 839–846.
  • [19] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing, vol. 9, no. 9, pp. 1532–1546, 2002.
  • [20] J. L. Starck, E. J. Candès, and D. L. Donoho, “The curvelet transform for image denoising,” IEEE Transactions on Image Processing, vol. 11, no. 6, pp. 670–684, 2002.
  • [21] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.
  • [22] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009.
  • [23] M. E. Kilmer, K. Braman, N. Hao, and R. C. Hoover, “Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging,” SIAM J. Matrix Anal. Appl., vol. 34, no. 1, pp. 148–172, 2013.
  • [24] T. D. Mazancourt and D. Gerlic, “The inverse of a block-circulant matrix,” IEEE Transactions on Antennas and Propagation, vol. 31, no. 5, pp. 808–810, 1983.
  • [25] M. Lebrun, A. Buades, and J. M. Morel, “A nonlocal bayesian image denoising algorithm,” Siam Journal on Imaging Sciences, vol. 6, no. 3, pp. 1665–1688, 2013.
  • [26] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in 2011 International Conference on Computer Vision, Nov 2011, pp. 479–486.
  • [27] J. Xu, L. Zhang, W. Zuo, D. Zhang, and X. Feng, “Patch group based nonlocal self-similarity prior learning for image denoising,” in IEEE International Conference on Computer Vision, 2015, pp. 244–252.
  • [28] B. Zhang, J. M. Fadili, and J. L. Starck, “Wavelets, ridgelets, and curvelets for poisson noise removal,” IEEE Trans Image Process, vol. 17, no. 7, pp. 1093–1108, 2008.
  • [29] J. Salmon, C. A. Deledalle, R. Willett, and Z. Harmany, “Poisson noise reduction with non-local pca,” Journal of Mathematical Imaging and Vision, vol. 48, no. 2, pp. 279–294, 2014.
  • [30] F. Luisier, T. Blu, and M. Unser, “Image denoising in mixed poisson–gaussian noise,” IEEE Transactions on Image Processing, vol. 20, no. 3, pp. 696–708, 2011.
  • [31] M. Y. Le, E. D. Angelini, and J. C. Olivomarin, “An unbiased risk estimator for image denoising in the presence of mixed poisson-gaussian noise,” IEEE Trans Image Process, vol. 23, no. 3, pp. 1255–1268, 2014.
  • [32] J. Jiang, L. Zhang, and J. Yang, “Mixed noise removal by weighted encoding with sparse nonlocal regularization,” Image Processing IEEE Transactions on, vol. 23, no. 6, pp. 2651–2662, 2014.
  • [33] J. Xu, D. Ren, L. Zhang, and D. Zhang, “Patch group based bayesian learning for blind image denoising,” in Asian Conference on Computer Vision. Springer, 2016, pp. 79–95.
  • [34] C. Liu, R. Szeliski, S. B. Kang, C. L. Zitnick, and W. T. Freeman, “Automatic estimation and removal of noise from a single image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, p. 299, 2008.
  • [35] M. Lebrun, M. Colom, and J. M. Morel, “The noise clinic: a blind image denoising algorithm,” Image Processing on Line, vol. 5, pp. 1–54, 2015.
  • [36] ——, “Multiscale image blind denoising,” IEEE Transactions on Image Processing, vol. 24, no. 10, p. 3149, 2015.
  • [37] J. Dai, O. C. Au, L. Fang, C. Pang, F. Zou, and J. Li, “Multichannel nonlocal means fusion for color image denoising,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 11, pp. 1873–1886, 2013.
  • [38] W.-C. Tu, C.-L. Tsai, and S.-Y. Chien, “Collaborative noise reduction using color-line model,” in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 2465–2469.
  • [39] O. Ben-Shahar and S. W. Zucker, “Hue fields and color curvatures: a perceptual organization approach to color image denoising,” in 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., vol. 2, June 2003, pp. II–II.
  • [40] J. Anaya and A. Barbu, “Renoir – a dataset for real low-light image noise reduction,” Journal of Visual Communication and Image Representation, vol. 51, pp. 144–154, 2014.
  • [41] T. Plötz and S. Roth, “Benchmarking denoising algorithms with real photographs,” in Computer Vision and Pattern Recognition, 2017, pp. 2750–2759.
  • [42] J. Xu, H. Li, Z. Liang, D. Zhang, and L. Zhang, “Real-world Noisy Image Denoising: A New Benchmark,” ArXiv e-prints, Apr. 2018.
  • [43] D. Muti, S. Bourennane, and J. Marot, “Lower-rank tensor approximation and multiway filtering,” SIAM journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 1172–1204, 2008.
  • [44] L. J. Yang, B. H. Zhang, and X. Z. Ye, “Fast fourier transform and its applications,” Opto-electronic Engineering, vol. 31, pp. 303–350, 2004.
  • [45] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  • [46] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “Fsim: A feature similarity index for image quality assessment,” IEEE transactions on Image Processing, vol. 20, no. 8, pp. 2378–2386, 2011.
  • [47] M. Zontak, I. Mosseri, and M. Irani, “Separating signal from noise using patch recurrence across scales,” in Proc. IEEE Conf. Comput. Vision Pattern Recog. IEEE, 2013, pp. 1195–1202.
  • [48] Y. Tang and L. Shao, “Pairwise operator learning for patch-based single-image super-resolution,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 994–1003, 2017.
  • [49] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2016.
  • [50] S. Roth and M. J. Black, “Fields of experts,” International Journal of Computer Vision, vol. 82, no. 2, p. 205, 2009.
  • [51] E. Luo, S. H. Chan, and T. Q. Nguyen, “Adaptive image denoising by targeted databases,” IEEE Transactions on Image Processing, vol. 24, no. 7, pp. 2167–2181, 2015.
  • [52] A. Foi and G. Boracchi, “Foveated nonlocal self-similarity,” International Journal of Computer Vision, vol. 120, no. 1, pp. 78–110, 2016.