Deep Graph Laplacian Regularization

07/31/2018 ∙ by Jin Zeng, et al.

We propose to combine the robustness of model-based approaches with the learning power of data-driven approaches for image restoration. Specifically, by integrating graph Laplacian regularization as a trainable module into a deep learning framework, we are less susceptible to overfitting than pure CNN-based approaches, achieving higher robustness to small datasets and cross-domain denoising. First, a sparse neighborhood graph is built from the output of a convolutional neural network (CNN). Then the image is restored by solving an unconstrained quadratic programming problem, using the corresponding graph Laplacian regularizer as a prior term. The proposed restoration pipeline is fully differentiable and hence can be trained end-to-end. Experimental results demonstrate that our method avoids overfitting given small training data. It is also endowed with strong cross-domain generalization power, outperforming state-of-the-art approaches by a remarkable margin.


1 Introduction

Image restoration is a class of inverse problems that seek the original image given one or more observations degraded by corruption, e.g., noise, down-sampling, blurring, or lost components (either in the spatial or frequency domain). Such problems are inherently under-determined. To regularize these ill-posed problems into well-posed ones, a large body of work adopts signal priors. By adopting a certain image model, one assumes that the original image should induce a small value for a given model-based signal prior. Well-known priors in the literature include the total variation (TV) prior [1], the sparsity prior [2], and the graph Laplacian regularizer [3].

Recent developments in deep learning have revolutionized the aforementioned model-based paradigm in image restoration. Thanks to the strong learning capacity of convolutional neural networks (CNNs) in capturing image characteristics, CNN-based approaches have achieved state-of-the-art performance in a wide range of image restoration problems, such as image denoising [4, 5], super-resolution [6, 7] and colorization [8]. Unlike model-based approaches, CNN-based approaches are data-driven. As a result, their restoration performance heavily relies on the sufficiency of the training data in describing the corruption process, which is needed to tune the huge number of model parameters [9]. Unfortunately, it can be infeasible to collect an adequate amount of labelled data in practice. For instance, to learn a CNN for real image noise removal, thousands of noisy images and their noise-free versions are required to characterize the correspondence between the corrupted images and the ground-truths [10]. However, acquiring the noise-free images is non-trivial [11], leading to a limited amount of training data. In this case, a purely data-driven approach is likely to overfit to the particular characteristics of the training data, and fail on test images with statistics different from the training images [12].

In contrast, a model-based approach relies on basic assumptions about the original images, which “encode” assumed image characteristics. Without the notion of training, model-based approaches are generally more robust than data-driven approaches when facing the heterogeneity of natural images. However, the assumed characteristics may not perfectly hold in the real world, limiting their performance and flexibility in practice.

To alleviate the aforementioned problems, in this paper we combine the robustness of model-based approaches with the powerful learning capacity of data-driven approaches. We achieve this goal by incorporating the graph Laplacian regularizer, a simple yet effective image prior for restoration tasks, into a deep learning framework. Specifically, we train a CNN which takes a corrupted image as input and outputs a set of feature maps. Subsequently, a neighborhood graph is built from the output features. The image is then recovered by solving an unconstrained quadratic programming (QP) problem, assuming that the underlying true image induces a small value of the graph Laplacian regularizer. To verify the effectiveness of our hybrid framework, we focus on the basic task of image denoising, since a good denoising engine can serve as an important module in any image restoration problem within an ADMM plug-and-play framework [2, 13]. The contributions of our work are as follows:

  1. We are the first in the literature to incorporate the widely used graph Laplacian regularizer into deep neural networks as a fully-differentiable layer, extracting underlying features of the input corrupted images and boosting the performance of the subsequent restoration.

  2. By coupling the strong graph Laplacian regularization layer with a light-weight CNN for pre-filtering, our approach is less susceptible to overfitting. Moreover, by localizing the graph construction and constraining the regularization weight to avoid an ill-conditioned system, our pipeline is guaranteed to be numerically stable.

  3. Experiments show that, given small training data (real low-light image denoising with 5 images), our proposal outperforms CNN-based approaches by avoiding overfitting; at the same time, it exhibits strong cross-domain generalization ability. On the other hand, given sufficient data (in the case of Gaussian noise removal), we perform on par with state-of-the-art CNN-based approaches.

We call our proposal deep graph Laplacian regularization, or DeepGLR for short. This paper is organized as follows. Related works are reviewed in Section 2. We then present our DeepGLR framework combining CNN and a differentiable graph Laplacian regularization layer in Section 3. Section 4 presents the experimentation and Section 5 concludes our work.

2 Related Works

We first review several image restoration approaches based on CNNs. We then turn to related works on graph Laplacian regularization and works combining graph and learning.

CNN-based image restoration: CNN-based approaches were first popularized in high-level vision tasks, e.g., classification [14] and detection [15], then gradually penetrated low-level restoration tasks such as image denoising [4], super-resolution [6], and non-blind deblurring [16]. We discuss several related works tackling image denoising with CNNs. The work [4] by Zhang et al. utilizes residual learning and batch normalization to build a deep architecture for denoising, which provides state-of-the-art results. In [17], Jain et al. propose a simple network for natural image denoising and relate it to Markov random field (MRF) methods. To build a CNN capable of handling several noise levels, Vemulapalli et al. [5] employ a conditional random field (CRF) for regularization. Other related works on denoising with CNNs include [18, 19, 20]. Despite their good performance, these approaches depend strongly on the training data. Our DeepGLR, in contrast, enhances the robustness of the denoising pipeline, which prevents overfitting to the training data.

Graph Laplacian regularization: graph Laplacian regularization is a popular image prior in the literature, e.g., [3, 21, 22]. Despite its simplicity, it performs reasonably well for many restoration tasks [23]. It assumes that the original image, denoted as x, is smooth with respect to an appropriately chosen graph G. Specifically, it imposes that the value of x^T L x, i.e., the graph Laplacian regularizer, should be small for the original image x, where L is the Laplacian matrix of graph G. Typically, the graph Laplacian regularizer is employed in a quadratic programming (QP) formulation [3, 24, 25]. Nevertheless, choosing a proper graph for image restoration remains an open question. In [21, 25], the authors build their graphs from the corrupted image with simple ad-hoc rules, while in [3], Pang et al. derive sophisticated principles for building graphs under strict conditions. Different from existing works, our DeepGLR framework builds neighborhood graphs from the CNN outputs, i.e., our graphs are built in a data-driven manner, which learns the appropriate graph connectivity for restoration directly. In [26, 27, 28], the authors also formulate graph Laplacian regularization in a deep learning pipeline; yet unlike ours, their graph constructions are fixed functions, i.e., they are not data-driven.

Learning with graphs: there exist a few works combining tools of graph theory with data-driven approaches. In [29, 30] and subsequent works, the authors study the notion of convolution on graphs, which enables CNNs to be applied to irregular graph-structured data. In [31], Turaga et al. let a CNN directly output edge weights for fixed graphs, while Egilmez et al. [32] learn graph Laplacian matrices with a maximum a posteriori (MAP) formulation. Our work also learns the graph structure. Different from existing works, we build the graphs from the learned features of a CNN for subsequent regularization.

3 Our Framework

We now present our DeepGLR framework integrating graph Laplacian regularization into CNN. A graph Laplacian regularization layer is composed of two modules: a graph construction module [3] and a QP solver [33]. We first present the typical work-flow of adopting graph Laplacian regularization [3, 21, 24] for restoration, then present its encapsulation as a layer in CNN.

3.1 Formulation with Graph Laplacian Regularization

As mentioned in Section 1, we focus on the problem of denoising. It has the following image formation model:

y = x + n,    (1)

where x is the original image or image patch (in vector form) with N pixels, n is an additive noise term, and y is the noisy observation. Given an appropriate neighborhood graph G with N vertices representing the pixels, graph Laplacian regularization assumes the original image x is smooth with respect to G [34]. Denoting the edge weight connecting pixels i and j as w_ij, the adjacency matrix W of graph G is an N-by-N matrix whose (i,j)-th entry is w_ij. The degree matrix D of G is a diagonal matrix whose i-th diagonal entry is d_i = Σ_j w_ij. Then the (combinatorial) graph Laplacian matrix is a positive semidefinite (PSD) matrix given by L = D − W, which induces the graph Laplacian regularizer x^T L x [34].
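As a toy illustration (our sketch, not the authors' code), the construction L = D − W and the regularizer x^T L x take only a few lines of NumPy; the 4-pixel graph and its edge weights below are hypothetical:

```python
import numpy as np

# Hypothetical 4-pixel graph: symmetric adjacency matrix with
# non-negative edge weights w_ij and zero diagonal.
W = np.array([[0.0, 0.9, 0.0, 0.1],
              [0.9, 0.0, 0.8, 0.0],
              [0.0, 0.8, 0.0, 0.7],
              [0.1, 0.0, 0.7, 0.0]])

D = np.diag(W.sum(axis=1))  # degree matrix: d_i = sum_j w_ij
L = D - W                   # combinatorial graph Laplacian (PSD)

# The regularizer x^T L x equals sum over edges of w_ij * (x_i - x_j)^2:
# it is 0 for a constant (graph-smooth) signal and large for an
# oscillating one.
x_smooth = np.ones(4)
x_rough = np.array([1.0, -1.0, 1.0, -1.0])
r_smooth = x_smooth @ L @ x_smooth  # 0.0
r_rough = x_rough @ L @ x_rough     # 10.0
```

The quadratic form thus makes the smoothness assumption explicit: it penalizes signal differences across strongly-weighted edges.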

To recover x with graph Laplacian regularization, one can formulate a maximum a posteriori (MAP) problem as follows:

x* = argmin_x ||y − x||_2^2 + μ · x^T L x,    (2)

where the first term is a fidelity term (negative log-likelihood) computing the difference between the observation y and the recovered signal x, and the second term is the graph Laplacian regularizer (negative log signal prior); μ is a weighting parameter. For effective regularization, one needs an appropriate graph G reflecting the image structure of the ground-truth x. In most works such as [3, 24, 35], it is derived from the noisy y or a pre-filtered version of y.

For illustration, we define a matrix-valued function F(y), whose i-th column is denoted as f_i ∈ R^K, 1 ≤ i ≤ N, where N is the number of pixels. Hence, applying F to the observation y maps it to a set of N length-K vectors {f_i}. Using the same terminology as in [3], the f_i's are called exemplars. Then the edge weight w_ij (connecting pixels i and j) is computed by:

w_ij = exp( −d_ij^2 / (2 ε^2) ),    (3)

where

d_ij = ( Σ_{k=1}^{K} (f_i(k) − f_j(k))^2 )^{1/2}.    (4)

Here f_i(k) denotes the k-th element of f_i, and ε is a scaling parameter; (4) is the Euclidean distance between pixels i and j in the K-dimensional feature space defined by F. In practice, the f_i's should reflect the characteristics of the ground-truth image for effective restoration. Though different works use different schemes to build a similarity graph, most of them differ only in the choice of exemplars (the f_i's). In [25, 36], the authors restrict the graph structure to a 4-connected grid and simply let the exemplar of each pixel be its (noisy) intensity. In [24], Hu et al. operate on overlapping patches and let the exemplars be noisy patches similar to the target patch. Pang et al. [3] interpret the f_i's as samples on a high-dimensional Riemannian manifold and derive the optimal weights under certain assumptions.
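A minimal sketch of the edge-weight computation in (3)-(4), assuming toy 2-dimensional exemplars and an illustrative kernel parameter eps (both hypothetical):

```python
import numpy as np

def edge_weight(f_i, f_j, eps=1.0):
    # Squared Euclidean distance between the two exemplars (Eq. (4)) ...
    d2 = float(np.sum((np.asarray(f_i) - np.asarray(f_j)) ** 2))
    # ... mapped through a Gaussian kernel (Eq. (3)).
    return np.exp(-d2 / (2.0 * eps ** 2))

# Pixels with similar exemplars get weights near 1 (strong edge);
# dissimilar pixels get weights near 0 (weak edge).
w_close = edge_weight([0.5, 0.2], [0.5, 0.2])  # identical features
w_far = edge_weight([0.5, 0.2], [3.0, -2.0])   # distant features
```

The choice of eps controls how quickly the affinity decays with feature distance.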

3.2 Graph Laplacian Regularization Layer

In contrast to existing works, we deploy graph Laplacian regularization as a layer in a deep learning pipeline, by implementing the function F with a CNN. In other words, the corrupted observation y is fed to a CNN, which outputs the exemplars (or feature maps) F(y).

Specifically, we perform denoising on a patch-by-patch basis, as similarly done in [3, 24, 25]. Suppose the observed noisy image, denoted as Y, is divided into overlapping patches. Instead of naïvely feeding each patch to the CNN individually and then performing optimization, we feed the whole noisy image to it, leading to a set of exemplar images of the same size as Y. By doing so, each pixel of an exemplar image is influenced by all the pixels of Y that fall within the receptive field of the CNN. As a result, for a larger receptive field, the exemplars effectively take into account more non-local information for denoising, resembling the notion of non-local means (NLM) in the classic works [37, 38].

Figure 1: Block diagram of the proposed GLRNet which employs a graph Laplacian regularization layer for image denoising.

With the exemplar images, we simply divide each of them into overlapping patches in the same manner as Y. To denoise a noisy patch, denoted y_m, we build a graph G_m from its corresponding exemplar patches in the way described in Section 3.1, leading to the graph Laplacian matrix L_m. Rather than a fully connected graph, we choose the 8-connected pixel adjacency structure, i.e., in the graph G_m, every pixel is only connected to its 8 neighboring pixels. Hence, the graph Laplacian L_m is sparse with a fixed sparsity pattern. The graph Laplacian L_m, together with the patch y_m, is passed to the QP solver, which solves problem (2) and outputs the denoised patch. By equally aggregating the denoised patches, we arrive at the denoised image.

Apart from the aforementioned procedure, for practical restoration with the graph Laplacian regularization layer, the following ingredients are also adopted.

  1. Generation of μ: in (2), μ trades off the importance between the fidelity term and the graph Laplacian regularizer. To generate appropriate values of μ for regularization, we build a light-weight CNN. Based on the corrupted image Y, it produces one weighting parameter μ_m for each patch y_m.

  2. Pre-filtering: in much of the denoising literature (e.g., [23, 39, 40]), it is popular to apply a pre-filtering operation to the noisy image before optimization. We borrow this idea and implement a pre-filtering step with another light-weight CNN. It operates on the image Y and outputs a filtered image. Hence, instead of the patches of Y, we employ the patches of the pre-filtered image in the data term of problem (2).

We call the presented architecture, which performs restoration with a graph Laplacian regularization layer, GLRNet. Fig. 1 shows its block diagram, where the graph Laplacian regularization layer is composed of a graph construction module generating graph Laplacian matrices, and a QP solver producing denoised patches. The denoised image is obtained by aggregating the denoised patches. Since the graph construction process involves only elementary functions such as exponentials, powers and arithmetic operations, it is differentiable. Furthermore, from [33] the QP solver is also differentiable with respect to its inputs. Hence, the graph Laplacian regularization layer is fully differentiable, and our denoising pipeline can be trained end-to-end. The backward computation of the proposed graph Laplacian regularization layer is derived in the Appendix.
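For intuition, the forward pass of the QP solver on a single patch can be sketched as a dense linear solve; the path-graph Laplacian, the patch values and μ below are all illustrative, and a real implementation would exploit the fixed sparsity of the Laplacian (see Section 3.4 for the underlying linear system):

```python
import numpy as np

def glr_denoise_patch(y, L, mu):
    """Solve min_x ||y - x||^2 + mu * x^T L x, i.e., (I + mu*L) x = y."""
    return np.linalg.solve(np.eye(y.size) + mu * L, y)

# Toy 4-pixel "patch" on a path graph with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

y = np.array([1.0, 0.0, 1.0, 0.0])    # oscillating (noisy) patch
x = glr_denoise_patch(y, L, mu=10.0)  # strong regularization

# Since the rows of L sum to zero, the filter preserves the patch mean;
# strong regularization pulls the output toward a graph-smooth signal.
```

Because the solve consists of differentiable matrix operations, gradients can flow back through it to the graph weights, which is what makes the layer trainable.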

3.3 Iterative Filtering

Figure 2: Block diagram of the overall DeepGLR framework.

To achieve effective denoising, classic works, e.g., [2, 23, 38], filter the noisy image iteratively to gradually enhance the image quality. Similarly, we implement such an iterative filtering mechanism by cascading T blocks of GLRNet (each block has a graph Laplacian regularization layer), leading to the overall DeepGLR framework. Similar to [5], all the GLRNet modules in our work have the same structure and share the same CNN parameters. Hence, to obtain the denoised image, the same denoising filter is iteratively applied to the noisy image T times. Fig. 2 shows the block diagram of our DeepGLR. In Fig. 2 and the following presentation, we omit the cascade-index superscript from the recovered image for simplicity. We employ 2 or 3 cascades of GLRNet in our experiments.
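Because the cascades share parameters, the iterative filtering amounts to applying the same learned map repeatedly; a schematic sketch, where the stand-in block is a toy function rather than a real GLRNet:

```python
def deep_glr(y, glrnet_block, num_cascades):
    """Apply one shared-weight GLRNet block repeatedly (cf. Fig. 2)."""
    x = y
    for _ in range(num_cascades):
        x = glrnet_block(x)  # each cascade reuses the same parameters
    return x

# Toy stand-in block: a contraction toward zero (NOT a denoiser).
halve = lambda x: 0.5 * x
out = deep_glr(8.0, halve, num_cascades=3)  # 8 -> 4 -> 2 -> 1
```

Weight sharing keeps the parameter count of the cascade identical to that of a single block.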

To effectively train the proposed DeepGLR framework, we adopt a loss penalizing differences between the recovered image and the ground-truth. Given the noisy image Y, its corresponding ground-truth image Z and the restoration result X, our loss function is defined as the mean-square-error (MSE) between X and Z, i.e.,

ℓ(X, Z) = (1 / (H · W)) · Σ_i (X(i) − Z(i))^2,    (5)

where H and W are the height and width of the images, respectively, and X(i) is the i-th pixel of X, the same for Z. Note that in our experiments, the restoration loss is only applied to the output of the last cascade, i.e., only the final restoration result is supervised.

3.4 Numerical Stability

We hereby analyze the stability of the QP solver tackling problem (2). Firstly, (2) essentially boils down to solving a system of linear equations

(I + μL) x = y,    (6)

where I is an identity matrix, L is the graph Laplacian and μ is the weighting parameter in (2). It admits a closed-form solution x* = (I + μL)^{-1} y. Thus, one can interpret x* as a filtered version of the noisy input y with linear filter (I + μL)^{-1}. As a combinatorial graph Laplacian, L is positive semidefinite and its smallest eigenvalue is 0 [34]. Therefore, with μ > 0, the matrix I + μL is always invertible, with smallest eigenvalue 1. However, the linear system becomes unstable for a numerical solver if I + μL has a large condition number κ, the ratio between the largest and the smallest eigenvalues for a normal matrix, assuming an ℓ2-norm [41]. Using eigen-analysis, we have the following theorem regarding κ.

Theorem 3.1

The condition number κ of I + μL satisfies

κ ≤ 1 + 2 · μ · d_max,    (7)

where d_max is the maximum degree of the vertices in G.

Proof

As discussed, we know λ_min(I + μL) = 1. By applying the Gershgorin circle theorem [42], λ_max(I + μL) can be upper-bounded as follows. First, the i-th Gershgorin disc of I + μL has radius μ · d_i, and the center of this disc is 1 + μ · d_i. From the Gershgorin circle theorem, the eigenvalues of I + μL have to reside in the union of all Gershgorin discs. Hence, λ_max ≤ max_i (1 + 2 · μ · d_i) = 1 + 2 · μ · d_max, leading to κ = λ_max / λ_min ≤ 1 + 2 · μ · d_max. ∎

Thus, by constraining the value of the weighting parameter μ, we can suppress the condition number and ensure a stable denoising filter. Denote the maximum allowable condition number as C_max; we impose 1 + 2 · μ · d_max ≤ C_max, leading to

μ ≤ (C_max − 1) / (2 · d_max).    (8)

Hence, if the network generates a value of μ no greater than this bound, then μ stays unchanged; otherwise it is truncated to (C_max − 1) / (2 · d_max). We fix C_max empirically for both training and testing to guarantee the stability of our framework.
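The safeguard of (8) amounts to clipping μ; in the sketch below, the values of C_max and the degree bound are illustrative, not the paper's settings:

```python
import numpy as np

def clip_mu(mu, d_max, c_max):
    """Truncate mu so that kappa(I + mu*L) <= 1 + 2*mu*d_max <= c_max."""
    return min(mu, (c_max - 1.0) / (2.0 * d_max))

# With weights in [0, 1], an 8-connected patch graph has degree <= 8.
mu_clipped = clip_mu(mu=100.0, d_max=8.0, c_max=50.0)  # -> 49/16

# Empirically check the Gershgorin bound of Theorem 3.1 on a
# random symmetric weight matrix (zero diagonal).
rng = np.random.default_rng(0)
W = np.triu(rng.random((6, 6)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W
mu = 0.5
kappa = np.linalg.cond(np.eye(6) + mu * L)   # 2-norm condition number
bound = 1.0 + 2.0 * mu * W.sum(axis=1).max()
```

The clip keeps the linear system in (6) well-conditioned regardless of what value the network proposes.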

4 Experimentation

Figure 3: Network architectures of the three CNNs (for exemplar generation, pre-filtering, and μ estimation) in the experiments. Data produced by the decoder of the exemplar-generating CNN is colored in orange.
Figure 4: The 12 test images for evaluation on Gaussian noise removal.

Extensive experiments are presented in this section. We first describe our designed CNN architectures. Then we apply our proposal, DeepGLR, to the classic problem of Gaussian noise removal, and test it on real low-light image denoising. Finally, we apply our model trained for Gaussian noise removal to the task of low-light image denoising, affirming its strong cross-domain generalization ability.

4.1 Network Architectures

Our framework does not limit the choice of network architectures. Hence, one has the freedom to design the specifications of the three CNNs. In our experimentation, we choose the networks shown in Fig. 3. Specifically,

  1. Exemplar generation: To generate the exemplars, we adopt the popular hour-glass structure, which has an encoder and a decoder with skip-connections [43]. Similar to [3], we use the exemplars to build the graphs.

  2. Pre-filtering: The pre-filtered image is simply generated by 4 convolution layers using a residual learning structure [44].

  3. Estimation of μ: The weighting parameter μ is estimated on a patch-by-patch basis, using a fixed patch size. Starting from a noisy patch, the input undergoes 4 convolution layers with max pooling and 2 fully-connected layers, producing the parameter μ.

Except for the last convolution layers of the pre-filtering and μ-estimation networks, and the two deconvolution layers of the exemplar-generation network, all the remaining layers shown in Fig. 3 are followed by an activation function. Note that the input image can have different sizes as long as each dimension is a multiple of 4. For illustration, Fig. 3 shows one particular input size.

Noise | Metric | BM3D   | WNNM   | OGLR   | DnCNN-S | DnCNN-B | DeepGLR-S | DeepGLR-B
25    | PSNR   | 29.95  | 30.28  | 29.78  | 30.41   | 30.33   | 30.26     | 30.21
25    | SSIM   | 0.8496 | 0.8554 | 0.8463 | 0.8609  | 0.8594  | 0.8599    | 0.8557
40    | PSNR   | 27.62  | 28.08  | 27.68  | 28.10   | 28.13   | 28.16     | 28.04
40    | SSIM   | 0.7920 | 0.8018 | 0.7949 | 0.8080  | 0.8091  | 0.8125    | 0.8063
50    | PSNR   | 26.69  | 27.08  | 26.58  | 27.15   | 27.18   | 27.25     | 27.12
50    | SSIM   | 0.7651 | 0.7769 | 0.7539 | 0.7809  | 0.7811  | 0.7852    | 0.7807
Table 1: Average PSNR (dB) and SSIM values of different methods for Gaussian noise removal. The best results for each metric are highlighted in boldface.

4.2 Synthetic Gaussian Noise Removal

We start our experiments with the removal of independent and identically distributed (i.i.d.) additive white Gaussian noise (AWGN), where we train the proposed DeepGLR for both denoising with known specific noise level and blind denoising with unknown noise level.

We use the dataset (400 gray-scale images) provided by Zhang et al. [4] for training. The denoising performance is evaluated on 12 commonly used test images, as similarly done in [4]. Thumbnails of these 12 images are shown in Fig. 4. For objective evaluation, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [45] are employed. During the training phase, the noisy images, accompanied by their noise-free versions, are fed to the network. For both training and testing, overlapping patches are used, where neighboring patches are a stride of 22 apart. We let the batch size be 4 and train the model for 200 epochs. A multi-step learning-rate decay policy is used, where the learning rate decreases at pre-defined epochs. We implement our work with the TensorFlow framework [46] on an Nvidia GeForce GTX Titan X GPU.

For denoising with a specific noise level, we train three different models, each with noisy images corrupted by one specific noise level (25, 40 or 50; cf. Table 1). For this case, we use three cascades of GLRNet, and the resulting models are collectively referred to as DeepGLR-S. For blind AWGN removal, two different networks are trained: one for low noise levels with two cascades of GLRNet, and the other for high noise levels with three cascades of GLRNet. (When using three cascades for low-noise data, we do not observe sufficient gain at the third cascade, hence it is removed for simplicity.) The resulting models are referred to as DeepGLR-B.

(a) Ground-truth
(b) Noisy (0.255)
(c) BM3D (0.818)
(d) WNNM (0.839)
(e) DnCNN-S (0.845)
(f) DnCNN-B (0.846)
(g) DeepGLR-S  (0.851)
(h) DeepGLR-B (0.846)
Figure 5: Gaussian noise removal for the image Butterfly, where the original image is corrupted by AWGN. SSIM values of the images are also shown.
(a) Ground-truth
(b) Noisy (0.116)
(c) BM3D (0.795)
(d) WNNM (0.808)
(e) DnCNN-S (0.807)
(f) DnCNN-B (0.807)
(g) DeepGLR-S  (0.814)
(h) DeepGLR-B (0.810)
Figure 6: Gaussian noise removal for the image Lena, where the original image is corrupted by AWGN. SSIM values of the images are also shown.
Figure 7: The 10 scenes of the RENOIR dataset [47] used for real low-light image denoising. The 5 images on the left belong to the first fold while the rest belong to the second.

The proposed DeepGLR is compared with several state-of-the-art image denoising approaches, including:

  1. Two methods using non-local self-similarity: BM3D [38] and WNNM [48];

  2. A learning-based method: DnCNN [4], with DnCNN-S for known (specific) noise level denoising and DnCNN-B for blind denoising covering the same noise range as our DeepGLR-B;

  3. A method with graph Laplacian regularization: OGLR [3].

We use the code released by the respective authors in our experimentation.

Table 1 shows the average PSNR and SSIM values of different methods on the test images. We observe that DeepGLR-S and DeepGLR-B provide performance comparable to the state-of-the-art DnCNN-S and DnCNN-B, which affirms the representation ability of our hybrid framework despite its much smaller number of network parameters. Fig. 5 shows the image Butterfly, where the original and noisy versions, accompanied by the denoised results of different methods, are presented for comparison. It can be seen that our method not only effectively recovers the sharp edges but also provides the most natural appearance. Fig. 6 shows the denoising results of Lena, where the fragments in the red rectangles are enlarged for better display. Again, compared to the other methods, our result looks the most pleasing.

(a) Ground-truth
(b) Noisy
(c) CBM3D
(d) MC-WNNM
(e) CDnCNN
(f) CDeepGLR
Figure 8: Low-light image noise removal for image 4 of the RENOIR dataset.
Metric   | Noisy  | CBM3D  | MC-WNNM | CDnCNN (train) | CDnCNN | CDeepGLR (train) | CDeepGLR
PSNR     | 20.36  | 26.08  | 26.23   | 33.43          | 31.26  | 32.31            | 31.60
SSIM (Y) | 0.5198 | 0.8698 | 0.8531  | 0.9138         | 0.8978 | 0.9013           | 0.9028
SSIM (R) | 0.2270 | 0.6293 | 0.5746  | 0.8538         | 0.8218 | 0.8372           | 0.8297
SSIM (G) | 0.4073 | 0.8252 | 0.7566  | 0.8979         | 0.8828 | 0.8840           | 0.8854
SSIM (B) | 0.1823 | 0.5633 | 0.5570  | 0.8294         | 0.7812 | 0.8138           | 0.7997
Table 2: Evaluation of different methods for low-light image denoising. The best results for each metric, excluding those measured on the training set, are highlighted in boldface.

4.3 Real Low-light Image Denoising

In this experiment, we consider the problem of real low-light image denoising. In low-light environments, cameras and smart-phones increase the light sensitivity (i.e., ISO) to capture plausible images. Due to limited sensor size and insufficient exposure time, the captured images suffer from noticeable chromatic noise, which severely affects visual quality, e.g., Fig. 9. According to [10], this kind of real-world noise exhibits much more complex behavior than homogeneous Gaussian noise. However, the majority of denoising algorithms are developed specifically for Gaussian noise removal and may fail in this practical setup. Moreover, as discussed in Section 1, it is troublesome to collect the noise-free version corresponding to a noisy image. The difficulty of acquiring ground-truth images limits the amount of training data, making purely data-driven approaches prone to overfitting. In contrast, as will be seen, our proposed DeepGLR provides superior performance when trained on a small dataset.

(a) Ground-truth
(b) Noisy
(c) CBM3D
(d) MC-WNNM
(e) CDnCNN
(f) CDeepGLR
Figure 9: Low-light image noise removal for image 35 of the RENOIR dataset.

We employ the RENOIR [47] dataset, which consists of low-light noisy images with corresponding (almost) noise-free versions. Specifically, its subset of 40 scenes collected with a Xiaomi Mi3 smart-phone is used in our experiments. Since some of the scenes have very low intensities while some of the given ground-truth images are still noisy, we remove the scenes whose ground-truths have: (a) average intensities lower than 0.3 (assuming the intensity ranges from 0 to 1); and (b) estimated PSNRs (provided by [47]) lower than 36 dB, leading to 10 valid image pairs. Thumbnails of their ground-truth images are shown in Fig. 7. We adopt a two-fold cross-validation scheme to verify the effectiveness of our approach. In each of the two trials, we perform training on one fold and testing on the other, then evaluate the performance by averaging the results of both trials.

To adapt our framework to color images, the first layers of the CNNs are changed to take 3-channel inputs, and the loss function is computed over all 3 channels. Moreover, in the graph Laplacian regularization layer, the 3 channels share the same graph to exploit inter-color correlation, while the QP solver solves three separate systems of linear equations and then outputs a color image. We use the same training settings as presented in Section 4.2, and a cascade of two GLRNets is adopted. The resulting model is referred to as CDeepGLR. Our CDeepGLR is compared with CBM3D, dedicated to Gaussian noise removal on color images [38], MC-WNNM, dedicated to real image noise removal [11], and CDnCNN [4], a data-driven approach trained with the same dataset as ours. For testing with CBM3D, we estimate the equivalent noise variances using the ground-truth and the noisy images. We also evaluate CDnCNN and the proposed CDeepGLR on the training data, i.e., the training and testing are performed on the same fold of data, then the evaluation metrics are averaged.

Table 2 shows the average PSNR values, and the SSIM values of the luminance (Y) channel and the three color channels of different schemes. For all evaluation metrics, our CDeepGLR provides the best results on the test set.

(a) Ground-truth
(b) Noisy
(c) CBM3D
(d) Noise Clinic
(e) DnCNN
(f) DeepGLR
Figure 10: The proposed DeepGLR trained for AWGN denoising generalizes well to low-light image denoising.
Metric   | Noisy  | CBM3D  | Noise Clinic | CDnCNN | CDeepGLR
PSNR     | 20.36  | 26.08  | 28.01        | 24.36  | 29.88
SSIM (Y) | 0.5198 | 0.8698 | 0.7863       | 0.6562 | 0.8826
SSIM (R) | 0.2270 | 0.6293 | 0.6505       | 0.4640 | 0.7872
SSIM (G) | 0.4073 | 0.8252 | 0.6836       | 0.6651 | 0.8690
SSIM (B) | 0.1823 | 0.5633 | 0.5826       | 0.4328 | 0.7195
Table 3: Evaluation of cross-domain generalization for low-light image denoising. The best results are highlighted in boldface.

Particularly, CDeepGLR achieves an average PSNR 0.34 dB higher than CDnCNN, the state-of-the-art approach; interestingly, CDnCNN performs better on the training set (see the columns “CDnCNN (train)” and “CDeepGLR (train)” in Table 2). That is because:

  1. Only 5 images are available for training in this experiment, letting CDnCNN strongly overfit to the training data. However, our CDeepGLR is less sensitive to the deficiency of the training data.

  2. While CDnCNN is most suitable for Gaussian noise removal (as stated in [4]), our CDeepGLR adaptively learns the suitable graphs capturing the intrinsic structures of the original images, which weakens the impact of the complex real noise statistics.

Hence, our DeepGLR combining CNN and graph Laplacian regularization retains the high flexibility of data-driven approaches while manifesting the robustness of model-based approaches.

(a) Ground-truth
(b) Noisy
(c) CBM3D
(d) Noise Clinic
(e) DnCNN
(f) DeepGLR
Figure 11: The proposed DeepGLR trained for AWGN denoising generalizes well to low-light image denoising.

Fig. 8 and Fig. 9 show fragments of two different images from the RENOIR dataset. We see that CDnCNN fails to fully remove the noise, while the results of CBM3D and MC-WNNM contain even more severe chromatic distortions. In contrast, our CDeepGLR produces results with a natural appearance that look closest to the ground-truth.

4.4 Cross-Domain Generalization

In this last experiment, we evaluate the robustness of our approach in terms of its cross-domain generalization ability. Specifically, we evaluate on the RENOIR dataset with CDeepGLR and CDnCNN trained for blind AWGN denoising. For comparison, we also include Noise Clinic [49], designed specifically for real noise removal, and CBM3D as a baseline method. Objective performance, in terms of PSNR and SSIM, is listed in Table 3. We see that CDeepGLR achieves a PSNR of 29.88 dB, outperforming CDnCNN by 5.52 dB and Noise Clinic by 1.87 dB. Hence, while CDnCNN strongly overfits to the case of Gaussian noise removal and fails to generalize to real noise, CDeepGLR still provides satisfactory denoising results. Fig. 10 and Fig. 11 show the denoising results of two image fragments from the RENOIR dataset. Again, our CDeepGLR provides the best visual quality, while the competing methods fail to fully remove the noise.

5 Conclusion

In this work, we incorporate graph Laplacian regularization into a deep learning framework. Given a corrupted image, it is first fed to a CNN; neighborhood graphs are then constructed from the CNN outputs. Using graph Laplacian regularization, the image is recovered on a patch-by-patch basis. Both the graph construction and the recovery are fully differentiable, so the overall pipeline can be trained end-to-end. We apply the proposed framework, DeepGLR, to the task of image denoising. Experiments verify that our work not only achieves state-of-the-art denoising performance, but also demonstrates higher immunity to overfitting and strong cross-domain generalization ability.
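The recovery step of the pipeline above can be sketched as follows for a single patch. This is a minimal illustration, assuming a precomputed nonnegative adjacency matrix `W` (produced from CNN features in the actual pipeline) and a scalar regularization weight `mu`; it is not the trained implementation.

```python
import numpy as np

def glr_denoise_patch(y, W, mu):
    """Recover a patch via graph Laplacian regularization:
    x* = argmin_x ||y - x||^2 + mu * x^T L x  =>  (I + mu*L) x* = y.
    y  : flattened noisy patch, shape (K,)
    W  : symmetric nonnegative adjacency matrix, shape (K, K)
    mu : regularization weight (> 0)
    """
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    return np.linalg.solve(np.eye(len(y)) + mu * L, y)

# Toy usage: a 4-pixel path graph smooths an oscillating noisy patch.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 1.0, 0.0])
x = glr_denoise_patch(y, W, mu=1.0)
print(x)  # values pulled toward each other; the patch mean is preserved
```

Since the Laplacian annihilates constant vectors (L1 = 0), the recovery preserves the patch mean while attenuating high-frequency components, which is exactly the smoothing behavior the prior encodes.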

References

  • [1] Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60(1-4) (1992) 259–268
  • [2] Elad, M., Aharon, M.: Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image processing 15(12) (2006) 3736–3745
  • [3] Pang, J., Cheung, G.: Graph Laplacian regularization for image denoising: Analysis in the continuous domain. IEEE Transactions on Image Processing 26(4) (2017) 1770–1785
  • [4] Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing 26(7) (2017) 3142–3155
  • [5] Vemulapalli, R., Tuzel, O., Liu, M.Y.: Deep Gaussian conditional random field network: A model-based deep network for discriminative denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 4801–4809
  • [6] Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: European Conference on Computer Vision, Springer (2014) 184–199
  • [7] Kim, J., Kwon Lee, J., Mu Lee, K.: Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 1646–1654
  • [8] Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. In: European Conference on Computer Vision (ECCV), Springer (2016) 649–666
  • [9] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553) (2015) 436–444
  • [10] Zhu, F., Chen, G., Heng, P.A.: From noise modeling to blind image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 420–429
  • [11] Xu, J., Zhang, L., Zhang, D., Feng, X.: Multi-channel weighted nuclear norm minimization for real color image denoising. In: The IEEE International Conference on Computer Vision (ICCV). (Oct 2017)
  • [12] McCann, M.T., Jin, K.H., Unser, M.: Convolutional neural networks for inverse problems in imaging: A review. IEEE Signal Processing Magazine 34(6) (2017) 85–95
  • [13] Romano, Y., Elad, M., Milanfar, P.: The little engine that could: Regularization by denoising (RED). SIAM Journal on Imaging Sciences 10(4) (2017) 1804–1844
  • [14] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. (2012) 1097–1105
  • [15] Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). (2013) 2056–2063
  • [16] Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems. (2014) 1790–1798
  • [17] Jain, V., Seung, S.: Natural image denoising with convolutional networks. In: Advances in Neural Information Processing Systems. (2009) 769–776
  • [18] Mao, X., Shen, C., Yang, Y.B.: Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In: Advances in Neural Information Processing Systems. (2016) 2802–2810
  • [19] Tai, Y., Yang, J., Liu, X., Xu, C.: Memnet: A persistent memory network for image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) 4539–4547
  • [20] Chen, C., Chen, Q., Xu, J., Koltun, V.: Learning to see in the dark. arXiv preprint arXiv:1805.01934 (2018)
  • [21] Elmoataz, A., Lezoray, O., Bougleux, S.: Nonlocal discrete regularization on weighted graphs: A framework for image and manifold processing. IEEE Transactions on Image Processing 17(7) (2008) 1047–1060
  • [22] Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. Multiscale Modeling & Simulation 6(2) (2007) 595–630
  • [23] Milanfar, P.: A tour of modern image filtering: New insights and methods, both practical and theoretical. IEEE Signal Processing Magazine 30(1) (2013) 106–128
  • [24] Hu, W., Cheung, G., Kazui, M.: Graph-based dequantization of block-compressed piecewise smooth images. IEEE Signal Processing Letters 23(2) (2016) 242–246
  • [25] Liu, X., Zhai, D., Zhao, D., Zhai, G., Gao, W.: Progressive image denoising through hybrid graph Laplacian regularization: A unified framework. IEEE Transactions on image processing 23(4) (2014) 1491–1503
  • [26] Shen, X., Tao, X., Gao, H., Zhou, C., Jia, J.: Deep automatic portrait matting. In: European Conference on Computer Vision, Springer (2016) 92–107
  • [27] Barron, J.T., Poole, B.: The fast bilateral solver. In: European Conference on Computer Vision, Springer (2016) 617–632
  • [28] Pang, J., Sun, W., Yang, C., Ren, J.S., Xiao, R., Zeng, J., Lin, L.: Zoom and learn: Generalizing deep stereo matching to novel domains. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2018)
  • [29] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (ICLR). (2017)
  • [30] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems. (2016) 3844–3852
  • [31] Turaga, S.C., Murray, J.F., Jain, V., Roth, F., Helmstaedter, M., Briggman, K., Denk, W., Seung, H.S.: Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation 22(2) (2010) 511–538
  • [32] Egilmez, H.E., Pavez, E., Ortega, A.: Graph learning from data under Laplacian and structural constraints. IEEE Journal of Selected Topics in Signal Processing 11(6) (2017) 825–841
  • [33] Amos, B., Kolter, J.Z.: OptNet: Differentiable optimization as a layer in neural networks. In: Proceedings of International Conference on Machine Learning (ICML), PMLR (2017) 136–145
  • [34] Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P.: The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine 30(3) (2013) 83–98
  • [35] Osher, S., Shi, Z., Zhu, W.: Low dimensional manifold model for image processing. SIAM Journal on Imaging Sciences 10(4) (2017) 1669–1690
  • [36] Kheradmand, A., Milanfar, P.: A general framework for regularized, similarity-based image restoration. IEEE Transactions on Image Processing 23(12) (2014) 5136–5151
  • [37] Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Volume 2., IEEE (2005) 60–65
  • [38] Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on image processing 16(8) (2007) 2080–2095
  • [39] Chatterjee, P., Milanfar, P.: Patch-based near-optimal image denoising. IEEE Transactions on Image Processing 21(4) (2012) 1635–1649
  • [40] Pang, J., Cheung, G., Ortega, A., Au, O.C.: Optimal graph Laplacian regularization for natural image denoising. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (2015) 2294–2298
  • [41] Horn, R.A., Johnson, C.R.: Matrix analysis. Cambridge University Press (1990)
  • [42] Varga, R.S.: Geršgorin and his circles. Volume 36. Springer Science & Business Media (2010)
  • [43] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2015) 234–241
  • [44] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2016) 770–778
  • [45] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13(4) (2004) 600–612
  • [46] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: Tensorflow: A system for large-scale machine learning. In: OSDI. Volume 16. (2016) 265–283
  • [47] Anaya, J., Barbu, A.: Renoir—A dataset for real low-light noise image reduction. Journal of Visual Communication and Image Representation 51 (2018) 144 – 154
  • [48] Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2014) 2862–2869
  • [49] Lebrun, M., Colom, M., Morel, J.M.: The noise clinic: A blind image denoising algorithm. Image Processing On Line 5 (2015) 1–54

Appendix

We hereby derive the backward computation of the proposed graph Laplacian regularization layer, which consists of the graph construction module and the QP solver. Suppose for a noisy patch $\mathbf{y}$, its corresponding recovered patch is $\hat{\mathbf{x}}$ while the underlying ground-truth is $\mathbf{z}$, where $\mathbf{y}, \hat{\mathbf{x}}, \mathbf{z} \in \mathbb{R}^{K}$. For simplicity, we consider a loss function defined on a per-patch basis, which computes the weighted Euclidean distance between $\hat{\mathbf{x}}$ and the ground-truth $\mathbf{z}$, i.e.,

$$\ell(\hat{\mathbf{x}}) = (\hat{\mathbf{x}} - \mathbf{z})^{\top} \mathbf{W} \, (\hat{\mathbf{x}} - \mathbf{z}), \qquad (9)$$

where $\mathbf{W} = \mathrm{diag}(w_1, \ldots, w_K)$ is a diagonal matrix and $w_i$ represents the weight of the $i$-th pixel. Consequently, the loss function in our paper can be regarded as the summation of a series of patch-based losses (9) with respective matrices $\mathbf{W}$'s.
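The patch-based loss can be evaluated directly; a minimal sketch, with toy values chosen purely for illustration:

```python
import numpy as np

def patch_loss(x_hat, z, w):
    """Weighted Euclidean patch loss (x_hat - z)^T W (x_hat - z),
    where W = diag(w) holds the per-pixel weights."""
    d = x_hat - z
    return d @ (w * d)

# Toy patch of 3 pixels: errors [0, 1, 2] with weights [1, 0.5, 2]
x_hat = np.array([1.0, 2.0, 3.0])
z = np.array([1.0, 1.0, 1.0])
w = np.array([1.0, 0.5, 2.0])
print(patch_loss(x_hat, z, w))  # 0*1 + 1*0.5 + 4*2 = 8.5
```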

QP solver: we first consider the backward pass of the QP solver, i.e., we derive the error propagation through the weighting parameter $\mu$ and the graph Laplacian matrix $\mathbf{L}$. Since the solver computes $\hat{\mathbf{x}} = (\mathbf{I} + \mu\mathbf{L})^{-1}\mathbf{y}$, differentiating through the linear system gives

$$\frac{\partial \ell}{\partial \mu} = -2\,(\hat{\mathbf{x}} - \mathbf{z})^{\top} \mathbf{W} \,(\mathbf{I} + \mu\mathbf{L})^{-1} \mathbf{L}\, \hat{\mathbf{x}}. \qquad (10)$$

We denote by $\mathbf{e}_i$ the indication vector whose $i$-th entry is 1 while the rest are zeros; then

$$\frac{\partial \ell}{\partial L_{ij}} = -2\mu\,(\hat{\mathbf{x}} - \mathbf{z})^{\top} \mathbf{W} \,(\mathbf{I} + \mu\mathbf{L})^{-1} \mathbf{e}_i \mathbf{e}_j^{\top} \hat{\mathbf{x}}, \qquad (11)$$

where $L_{ij}$ is the $(i,j)$-th entry of $\mathbf{L}$; analogous notation is used for the entries of the other matrices and vectors.
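The chain-rule gradient of the loss with respect to $\mu$, obtained by differentiating through the solve $\hat{\mathbf{x}} = (\mathbf{I} + \mu\mathbf{L})^{-1}\mathbf{y}$, can be checked against a central finite difference. This sketch uses random toy data and is only a numerical sanity check, not part of the trained pipeline.

```python
import numpy as np

# Forward model: x_hat = (I + mu*L)^{-1} y;
# loss = (x_hat - z)^T W (x_hat - z) with diagonal W.
rng = np.random.default_rng(1)
K = 5
A = rng.random((K, K))
adj = (A + A.T) / 2                 # symmetric nonnegative adjacency
np.fill_diagonal(adj, 0)
L = np.diag(adj.sum(axis=1)) - adj  # combinatorial graph Laplacian
y, z = rng.random(K), rng.random(K)
Wd = np.diag(rng.random(K))         # per-pixel loss weights
mu = 0.7

def solve(m):
    return np.linalg.solve(np.eye(K) + m * L, y)

x_hat = solve(mu)
# Analytic: d(x_hat)/d(mu) = -(I + mu*L)^{-1} L x_hat  (implicit differentiation)
dx_dmu = -np.linalg.solve(np.eye(K) + mu * L, L @ x_hat)
grad_analytic = 2 * (x_hat - z) @ Wd @ dx_dmu

# Numerical: central finite difference on the scalar loss
loss = lambda m: (solve(m) - z) @ Wd @ (solve(m) - z)
eps = 1e-6
grad_numeric = (loss(mu + eps) - loss(mu - eps)) / (2 * eps)
print(abs(grad_analytic - grad_numeric))  # close to zero
```

The same check extends entrywise to the gradient with respect to $\mathbf{L}$, perturbing one symmetric pair of entries at a time.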

Graph construction: we hereby derive the partial derivative of the graph Laplacian matrix $\mathbf{L}$ with respect to the $i$-th entry $f_i$ of the exemplar $\mathbf{f}$, where $1 \le i \le K$. From the definition of the graph Laplacian matrix and (3)(4) of our paper,

$$\frac{\partial \mathbf{L}}{\partial f_i} = \sum_{j \in \mathcal{N}_i} \frac{\partial w_{ij}}{\partial f_i} \left( \mathbf{e}_i\mathbf{e}_i^{\top} + \mathbf{e}_j\mathbf{e}_j^{\top} - \mathbf{e}_i\mathbf{e}_j^{\top} - \mathbf{e}_j\mathbf{e}_i^{\top} \right), \qquad (12)$$

where $\mathcal{N}_i$ denotes the 8 neighboring pixels of pixel $i$.
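The forward graph construction that (12) differentiates through can be sketched as below. For simplicity this sketch uses a scalar exemplar value per pixel and a Gaussian edge kernel with an assumed bandwidth `sigma`; the actual layer operates on CNN feature vectors per the equations of the paper.

```python
import numpy as np

def grid_laplacian(feat, sigma=1.0):
    """Build a combinatorial graph Laplacian over an H x W pixel grid,
    connecting each pixel to its 8 neighbors with Gaussian edge weights
    w_ij = exp(-(f_i - f_j)^2 / (2 sigma^2)).  `feat` has shape (H, W)."""
    H, W = feat.shape
    K = H * W
    adj = np.zeros((K, K))
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for r in range(H):
        for c in range(W):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < H and 0 <= cc < W:
                    i, j = r * W + c, rr * W + cc
                    adj[i, j] = np.exp(-(feat[r, c] - feat[rr, cc]) ** 2
                                       / (2 * sigma ** 2))
    # L = D - W: degree matrix minus adjacency
    return np.diag(adj.sum(axis=1)) - adj

# Sanity check on a 3x3 exemplar: L is symmetric with zero row sums,
# as required for a valid combinatorial Laplacian.
L = grid_laplacian(np.arange(9.0).reshape(3, 3))
print(np.allclose(L, L.T), np.allclose(L.sum(axis=1), 0))  # True True
```

Because each edge weight is a smooth function of the exemplar entries, the sum in (12) runs only over the 8-connected neighborhood, which keeps the backward pass sparse and cheap.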