Proximal Splitting Networks for Image Restoration

03/17/2019 ∙ by Raied Aljadaany, et al. ∙ Carnegie Mellon University

Image restoration problems are typically ill-posed, requiring the design of suitable priors. These priors are typically hand-designed and are fully instantiated throughout the process. In this paper, we introduce a novel framework for handling inverse problems related to image restoration based on elements from the half quadratic splitting method and proximal operators. Modeling the proximal operator as a convolutional network, we define an implicit prior on the image space as a function class during training. This is in contrast to the common practice in the literature of keeping the prior fixed and fully instantiated even during training. Further, we allow this proximal operator to be tuned differently for each iteration, which greatly increases modeling capacity and allows us to reduce the number of iterations by an order of magnitude compared to other approaches. Our final network is an end-to-end one whose run time matches the previous fastest algorithms while outperforming them in recovery fidelity on two image restoration tasks. Indeed, we find our approach achieves state-of-the-art results on benchmarks in image denoising and image super resolution while recovering more complex and finer details.


1 Introduction

Single image restoration aims to reconstruct a clear image from corrupted measurements. Assume a corrupted image $y$ can be generated by convolving a clear image $x$ with a known linear space-invariant blur kernel $k$. This can be written as:

$y = k \circledast x + n$ (1)

where $n$ is additive zero-mean white Gaussian noise and $\circledast$ is the convolution operation. The problem of recovering the clean image is an ill-posed inverse problem. One approach to solving it is to assume some prior (or a set of priors) on the image space. The clean image can then be approximated by solving the following optimization problem

$\hat{x} = \arg\min_x \frac{1}{2}\|y - k \circledast x\|_2^2 + \Phi(x)$ (2)

where $\|\cdot\|_2$ is the $\ell_2$ norm and $\Phi(\cdot)$ is an operator that defines some prior (e.g. the $\ell_1$ norm is used to promote sparsity). A good prior is important to recover a feasible and high-quality solution. Indeed, priors are common in signal and image processing tasks such as inverse problems mallat2008wavelet ; kim2010single and these communities have spent considerable effort in hand designing suitable priors for signals antonini1992image ; rubinstein2010dictionaries ; sun2008image .
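The degradation model of Eq. 1 (blur followed by additive Gaussian noise) can be simulated in a few lines. The following is a minimal sketch on a 1-D signal, assuming circular boundary handling; the function names `circular_convolve` and `degrade` are illustrative and do not come from the paper.

```python
import random

def circular_convolve(signal, kernel):
    """Circular 1-D convolution: out[i] = sum_j kernel[j] * signal[(i - j) % n]."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]

def degrade(clean, kernel, sigma, seed=0):
    """Simulate a corrupted signal: blur with `kernel`, then add zero-mean
    Gaussian noise with standard deviation `sigma` (assumed i.i.d.)."""
    rng = random.Random(seed)
    blurred = circular_convolve(clean, kernel)
    return [b + rng.gauss(0.0, sigma) for b in blurred]

clean = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
blur = [0.25, 0.5, 0.25]          # hypothetical blur kernel
noisy_blurred = degrade(clean, blur, sigma=0.05)
```

With a single-tap kernel `[1.0]` the blur disappears and the model reduces to pure denoising, which is the special case used in Section 4.1.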

In this work, we build a framework where the image prior is a function class during training, rather than a specific instantiation. The parameters of this function are learned, and the function then acts as a fully instantiated prior during testing. This allows for a far more flexible prior which is learned and tuned according to the data. This is somewhat in contrast to the concept of a prior in the general machine learning setting, where it is usually a specific function (rather than a function class). In our application for image restoration, the prior function class is defined to be a deep convolutional network. Such networks have exceptional capacity at modelling complex functions while capturing natural structures in images such as spatial reciprocity through convolutions. A large function class for the prior allows the optimization to model richer statistics within the image space, which leads to better reconstruction performance (as we find in our experiments).

Our reconstruction network takes only two inputs, the corrupted image and the kernel, and reconstructs the clear image with a single forward pass. The network architecture is designed following a recovery algorithm based on the half quadratic splitting method involving the proximal operator (see Section 3). Proximal operators have been successfully applied in many image processing tasks (e.g. nikolova2005analysis ). They are powerful, can work under more general conditions (e.g. they do not require differentiability) and are simple to implement and study. Theoretically, the reconstruction process is driven primarily by the half-quadratic splitting method with no trainable parameters. The only need for training arises when the proximal operators in this architecture are modelled using deep networks. This training only helps the network to learn parameters such that the overall pipeline is effective. Our overall framework is flexible and can be applied to almost any image based inverse problem, though in this work we focus on image denoising and image super-resolution.

Contributions. We propose a novel framework for image restoration tasks called Proximal Splitting Networks (PSN). Our network architecture is theoretically motivated through the half quadratic splitting method and the proximal operator. Further, we model the proximal operators as deep convolutional networks that result in more flexible signal priors tuned to the data, while requiring a number of iterations an order of magnitude less compared to some previous works. Finally, we demonstrate state-of-the-art results on two image restoration tasks, namely image denoising and image super resolution on multiple standard benchmarks.

Figure 1: De-noising results of “Fly” (Set12) with Gaussian noise added with . Numbers indicate PSNR. Our algorithm called Proximal-Splitting Network (PSN, U denotes unknown noise level) produces sharper edges in various orientations, an artifact of a powerful learned prior. For illustration, we compare against BM3D dabov2007image .

2 Prior Art

Priors in image restoration.

The design of priors for solving inverse problems has enjoyed a rich history. Linear transforms have been proposed as priors, which also assume low energy or sparsity hoeher1997two ; beck2009fast ; altintacc1988 . However, it has been shown that these approaches fail when the solution space invoked by the assumed prior does not contain good approximations of the real data elad2006image . There have also been other signal processing methods such as BM3D dabov2007image , and those leveraging total variation wang2008new and dictionary learning rubinstein2010dictionaries ; elad2006image ; tosic2011dictionary techniques that have been successful in several of these tasks. Proximal operators parikh2014proximal and half quadratic splitting wang2008new methods have been useful in a few image recovery algorithms such as beck2009fast . These methods assume hand-designed priors approximated via careful choice of the norm. Although this has provided much success, they are limited in the expressive capacity of the prior, which ultimately limits the quality of the solution. In our approach, we learn the prior from a function class for our problem during the optimization. Thus our algorithm utilizes a more expressive prior that is informed by data.

Deep learning approaches and our generalization. Deep learning approaches have proven successful in modelling high level image statistics and thus have excelled at image restoration problems. Some example applications include blind de-convolution nah2017deep , super-resolution kim2016accurate and de-noising zhang2017beyond . Though these methods generalize well, it remains unclear how the architecture relates to the prior used. In this work, however, the network is clearly motivated by a combination of the proximal operator and half quadratic splitting methods. Further, we show in the supplementary that our network is a generalization of the approaches in kim2016accurate ; zhang2017beyond .

Figure 2: Super-resolution results of “Zebra” (Set14) with a down-sampling (scale) factor of X4. Numbers indicate PSNR. Proximal-Splitting Network (PSN) produces much sharper and well-defined edges in various orientations, an artifact of a powerful learned prior. For illustration, we compare against VDSR kim2016accurate .

Deep learning approaches which learn the prior. It is worth mentioning that several approaches have used a proximal gradient descent algorithm meinhardt2017learning , the ADMM algorithm rick2017one or a gradient descent method bigdeli2017deep to recover an image where the prior is computed via a deep learning network. Although these approaches perform well in terms of reconstruction quality, they inherit important limitations in terms of computational efficiency. Proximal gradient descent, ADMM and gradient descent, being first order iterative methods with linear or sub-linear convergence rates, typically require many tens of iterations for convergence. Each iteration consists of a forward pass through the network, which emerges as a considerable bottleneck. Our approach addresses this problem by allowing different proximal operators to be learned at every ‘iteration’. This increases the modelling capacity of the overall network and allows for far fewer iterations (an order of magnitude fewer in our case).

Deep learning structure based on theoretical approaches. Several approaches have employed CNNs for image restoration schmidt2014shrinkage ; jin2017noise ; chen2017trainable where the structure of the network is derived from a theoretical model for image recovery. In schmidt2014shrinkage , the authors proposed the cascade of shrinkage fields (CSF) for image restoration. CSF can be seen as a ConvNet whose architecture is a cascade of Gaussian conditional random fields schmidt2013discriminative . chen2017trainable proposed trainable nonlinear reaction diffusion (TNRD), which is a ConvNet whose structure is based on nonlinear diffusion models perona1990scale . In jin2017noise , the authors proposed GradNet applied to noise-blind deblurring. The architecture of GradNet is motivated by the Majorization-Minimization (MM) algorithm hunter2004tutorial . However, these approaches assume that the prior term is derived from or approximated by a Gaussian mixture model zoran2011learning which is represented by a ConvNet. Our method is free of this assumption.

3 Proximal Splitting in Deep Networks

Our main goal is the design of a feed forward neural network for image restoration. For the architecture, we take inspiration from two tools in optimization. The first is the proximal operator, which allows the solution to a problem to be part of some predefined solution space. The second is the half quadratic splitting technique, which allows a sum of objective functions to be solved in alternating sequence using proximal operators. We briefly describe these two components and then utilize them to design our system architecture.

Figure 3: Architecture of a single Proximal Block, the fundamental component of the Proximal Splitting Network (PSN). This block implements a single iteration of the half quadratic splitting method for the constraints in the PSN optimization problem (Eq. 12). The block has 10 layers, each being a combination of Convolution, ReLU and Batch Normalization ioffe2015batch . All intermediate layers have 64 channels with convolutions, except the first/last Conv layer with 3/1 channels (RGB/grey-scale).

3.1 Proximal Operator

Let $f$ be a function. The proximal operator of the function $f$ with the parameter $\beta$ is defined as

$\mathrm{prox}_{f,\beta}(v) = \arg\min_x f(x) + \frac{\beta}{2}\|x - v\|_2^2$ (3)

If the function $f$ is strongly convex and twice differentiable with respect to $x$, and $\beta$ is large, the proximal operator of $f$ converges to a gradient descent step (a proof of this known result is presented in the supplementary). In this case the proximal operator can be approximated as:

$\mathrm{prox}_{f,\beta}(v) \approx v - \frac{1}{\beta}\nabla f(v)$ (4)
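The gradient-step approximation can be checked numerically on a case where the proximal operator has a closed form. The sketch below assumes the convention in which the parameter (called `beta` here) weights the quadratic proximity term, and uses a hypothetical scalar quadratic f(x) = 0.5 * a * (x - c)^2 chosen purely for illustration.

```python
def prox_quadratic(v, a, c, beta):
    """Exact minimiser over x of 0.5*a*(x - c)**2 + 0.5*beta*(x - v)**2."""
    return (a * c + beta * v) / (a + beta)

def gradient_step(v, a, c, beta):
    """One gradient step of size 1/beta on f(x) = 0.5*a*(x - c)**2, starting at v."""
    return v - (a * (v - c)) / beta

v, a, c = 3.0, 2.0, 1.0
# Gap between the exact proximal point and the gradient step for growing beta.
gaps = {
    beta: abs(prox_quadratic(v, a, c, beta) - gradient_step(v, a, c, beta))
    for beta in (10.0, 100.0, 1000.0)
}
```

For this quadratic the gap equals a^2 (v - c) / (beta (a + beta)), so it shrinks roughly as 1/beta^2, which is why the approximation is only stated for large values of the parameter.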

3.2 Half Quadratic Splitting

Now note that the image recovery optimization problem (Eq. 2) can be rewritten as:

$\min_x f(x) + g(x)$ (5)

where $f$ is the data fidelity term and $g$ is a function that represents the prior. Depending on this prior $g$, Eq. 5 might be hard to optimize, especially when the prior function is not convex. The half quadratic splitting method wang2008new restructures this problem (Eq. 5) into a constrained optimization problem by introducing an auxiliary variable $z$. Under this approach, the optimization problem in Eq. 5 is reformulated as:

$\min_{x,z} f(x) + g(z) \quad \text{s.t.} \quad x = z$ (6)

The next step is to convert the equality constraint into its Lagrangian.

$\min_{x,z} f(x) + g(z) + \frac{\beta}{2}\|x - z\|_2^2$ (7)

where $\beta$ is a penalty parameter. As $\beta$ approaches infinity, the solution of Eq. 7 is equivalent to that of Eq. 5, and Eq. 7 can be solved iteratively by fixing one variable and updating the other, and vice versa. Using the proximal operator, these update steps become

$x^{t} = \mathrm{prox}_{f,\beta}(z^{t-1}), \qquad z^{t} = \mathrm{prox}_{g,\beta}(x^{t})$ (8)

When the image $x$ is fixed, the optimum $z$ can be found through the proximal operator involving $g$ and $\beta$. Clearly, this depends on the prior, which is the function $g$. For instance, if $g$ is the $\ell_1$ norm, the prox operator will be a soft threshold operator, which forces the signal to be sparse beck2009fast . However, for real-world image data, the optimal class of functions for $g$ is not known, which by extension makes the prox operator sub-optimal for recovery. In the following subsection, we will propose an approach to optimize for the prox operator within a predefined search space.
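The soft threshold mentioned above has a simple element-wise form. The following sketch assumes the prior is the plain (unweighted) l1 norm and that the proximal parameter, called `beta` here, weights the quadratic term, so the threshold is 1/beta; the function name is illustrative.

```python
def soft_threshold(v, beta):
    """Element-wise minimiser of |x| + 0.5*beta*(x - v)**2.

    Entries with magnitude at most 1/beta are set to zero; the rest are
    shrunk towards zero by 1/beta.
    """
    t = 1.0 / beta
    out = []
    for vi in v:
        if vi > t:
            out.append(vi - t)
        elif vi < -t:
            out.append(vi + t)
        else:
            out.append(0.0)
    return out

shrunk = soft_threshold([2.0, -0.3, 0.5, -1.5], beta=2.0)
```

Small entries are zeroed out, which is how this hand-designed prior enforces sparsity; a learned proximal operator replaces this fixed rule with a trainable mapping.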

As a final note, recall that since the added noise is assumed to be Gaussian, $f$ is the Euclidean distance between the corrupted image and the clean image convolved with the kernel. Thus, $f(x) = \frac{1}{2}\|y - k \circledast x\|_2^2$, which is convex and twice differentiable. This allows the update step in Eq. 8 for $x$ to be approximated via gradient descent, using the approximation of the proximal operator from Eq. 4:

$x^{t} = z^{t-1} - \frac{1}{\beta} K^\top (K z^{t-1} - y)$ (9)

where $K$ is the matrix form of the convolution operation with $k$ and $K^\top$ is its transpose.
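The data-fidelity update can be sketched without forming the convolution matrix explicitly. The example below works on 1-D signals with circular boundaries (an assumption made for brevity), applies the convolution and its transpose as functions, and takes one gradient step of size 1/beta on the squared residual. All names are illustrative.

```python
def conv_circ(x, k):
    """Apply the convolution operator: out[i] = sum_j k[j] * x[(i - j) % n]."""
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(len(k))) for i in range(n)]

def conv_circ_transpose(x, k):
    """Apply the transpose (adjoint) operator: out[i] = sum_j k[j] * x[(i + j) % n]."""
    n = len(x)
    return [sum(k[j] * x[(i + j) % n] for j in range(len(k))) for i in range(n)]

def data_step(z, y, k, beta):
    """One gradient step on 0.5*||conv(z) - y||^2 with step size 1/beta."""
    residual = [a - b for a, b in zip(conv_circ(z, k), y)]
    grad = conv_circ_transpose(residual, k)
    return [zi - g / beta for zi, g in zip(z, grad)]

# Denoising case: the kernel is a delta, so the step moves z towards y.
updated = data_step([0.0, 2.0], [2.0, 0.0], [1.0], beta=2.0)
```

In the denoising case the step is simply a weighted average of the current estimate and the noisy observation, with `beta` controlling how far the estimate moves.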

Figure 4: The multi-scale Proximal Splitting Network (PSN) architecture for image restoration tasks. Each Up-sampling block does so by a factor of 2 whereas each Proximal Block has 10 layers (see Fig. 3). The two Conv layers have 1 or 3 channels each (Grey-scale vs. RGB image space) with kernels.

3.3 Proximal Splitting Networks

We now develop the core optimization problem which will then yield the Proximal Splitting Network architecture. Our main approach for image recovery is to use the half quadratic splitting method, which alternately updates the image and an auxiliary variable as in Eq. 8. Thus, for $T$ iterations the optimization procedure becomes

$x^{t} = z^{t-1} - \frac{1}{\beta} K^\top (K z^{t-1} - y), \qquad z^{t} = \mathrm{prox}_{g,\beta}(x^{t}), \qquad t = 1, \dots, T$ (10)

Note that the update for $z$ still contains a proximal operator depending on the prior $g$. There have been studies such as meinhardt2017learning , where the authors replace the proximal operator with a Deep Denoising Network (DnCNN) zhang2017beyond . Similarly, the authors in heide2014flexisp use BM3D or the NLM denoiser rather than the proximal operator to update the value of an image. It is also important to note that these studies utilized these denoisers in an iterative fashion, i.e. the same proximal operator with the same parameters was used through multiple iterations. Considering that the number of iterations in these studies was high (about 30 for both meinhardt2017learning and heide2014flexisp ) and that every iteration requires a forward pass through a deep network, these methods incur a large computational bottleneck.

Although these methods work well, there is much to gain from defining a more flexible proximal operator in two ways. First, defining a larger solution space for the proximal operator would allow the algorithm to choose more fitting operators. Secondly, allowing the proximal operator networks at different stages (iterations) to maintain separate weights allows each operator to be tuned to the statistics of the estimated image at that stage. Due to the larger modelling capacity, this also allows us to keep the number of iterations or stages very small in comparison (3 in our experiments, which is an order of magnitude fewer than previous studies meinhardt2017learning ; heide2014flexisp ). Keeping these in mind, we choose the model for the proximal operator in our formulation to be a deep convolutional network, which introduces desirable inductive biases. These biases themselves act as our ‘prior’ while providing the optimization with a large enough function search space to choose from. The rest of the prior (i.e. the actual parameters of the convolutional network) is tuned according to the data. Under this modification, the update step for $z$ becomes

$z^{t} = \Gamma_{\theta_t}(x^{t})$ (11)

where $\Gamma_{\theta_t}$ is a convolutional network with parameters $\theta_t$ for the $t$-th iteration. Note that for every iteration, there is a separate such network. Defining the proximal operator (and the image prior) to be different for every iteration, the final optimization problem becomes

$\min_{\Theta} \; \mathcal{L}(x_{gt}, z^{T}) \quad \text{s.t.} \quad z^{t} = \Gamma_{\theta_t}(x^{t}), \quad x^{t} = z^{t-1} - \frac{1}{\beta} K^\top (K z^{t-1} - y), \quad t = 1, \dots, T$ (12)

where $x_{gt}$ is the ground truth clean image, $z^{T}$ is the final estimated image, $T$ is the number of stages (iterations) and $z^{0}$ is the initial input image. Note that the minimization in this formulation is only over $\Theta = \{\theta_1, \dots, \theta_T\}$, i.e. the parameters of the set of proximal networks $\{\Gamma_{\theta_t}\}_{t=1}^{T}$. The loss function $\mathcal{L}$ here can be any suitable function, though we minimize the Euclidean error for this study, assuming Gaussian noise. It is important to note a subtle point regarding the recovery framework. The minimization in Eq. 12 only tunes the network towards the desired task based on the data. However, the core algorithm for reconstruction is still based on Eq. 10, i.e. iterations of the half quadratic splitting based reconstruction. It is also useful to observe the interplay between the objective function and the constraints. The first constraint and the loss objective in Eq. 12 work to project the recovered image onto the image space, while the second constraint pushes the recovered image to be as close as possible to the corrupted input image. A single iteration over these constraints according to half quadratic splitting, together with the proximal network, results in what we call the Proximal Block, as shown in Fig. 3. The Proximal Block is the fundamental component from which the overall network is built (as we describe soon).
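The unrolled reconstruction described above alternates a data-fidelity gradient step with a stage-specific proximal mapping. The sketch below illustrates only this control flow on 1-D signals: the learned convolutional networks are replaced by hypothetical stand-in functions (`smooth`, `identity`), boundaries are circular, and none of the names come from the paper.

```python
def conv_circ(x, k):
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(len(k))) for i in range(n)]

def conv_circ_transpose(x, k):
    n = len(x)
    return [sum(k[j] * x[(i + j) % n] for j in range(len(k))) for i in range(n)]

def data_step(z, y, k, beta):
    """Gradient step of size 1/beta on the data-fidelity term."""
    residual = [a - b for a, b in zip(conv_circ(z, k), y)]
    grad = conv_circ_transpose(residual, k)
    return [zi - g / beta for zi, g in zip(z, grad)]

def smooth(x):
    """Stand-in for a learned proximal network: 3-tap circular average."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]

def identity(x):
    """Stand-in proximal mapping that leaves its input unchanged."""
    return list(x)

def unrolled_restore(y, k, stages, beta, z0=None):
    """Run one block per entry of `stages`; each stage has its own mapping."""
    z = list(y) if z0 is None else list(z0)
    for prox in stages:
        x = data_step(z, y, k, beta)   # stay close to the corrupted input
        z = prox(x)                    # project towards the image prior
    return z

noisy = [0.1, 0.9, 1.1, 0.0, -0.1, 1.0]
restored = unrolled_restore(noisy, [1.0], [smooth, smooth, identity], beta=1.0)
```

Passing a list of distinct callables mirrors the design choice of keeping separate parameters per stage; in the actual method each entry would be a trained convolutional network rather than a fixed filter.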

Multi-scale Proximal Splitting Network. Multi-scale decomposition has been widely applied in many applications, such as edge-aware filtering paris2011local , image blending burt1987laplacian and semantic segmentation ghiasi2016laplacian . Multi-scale architecture extensions have also emerged as a standard technique to further improve the performance of deep learning approaches to image recovery tasks such as image de-convolution nah2017deep and image super-resolution lai2017deep . We find multi-scaling useful and incorporate it into the Proximal Splitting Network algorithm. These approaches usually require that the output of each intermediate scale stage be the cleaned/processed image at that scale. Complying with this, multi-scale PSN networks are designed such that the intermediate outputs form a Gaussian pyramid of cleaned images. For better performance, we apply reconstruction fidelity loss functions at each level of the pyramid. This also helps provide stronger gradients for the entire network pipeline, especially the first few layers, which are typically harder to train.
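A loss applied at every pyramid level can be sketched as follows. This is an illustration under stated assumptions: signals are 1-D, down-sampling is done by averaging neighbouring pairs (a simple stand-in for the bicubic down-sampling used in the paper), and the per-level losses are summed with equal weights, which the text does not specify.

```python
def downsample_by_2(signal):
    """Average neighbouring pairs (stand-in for bicubic down-sampling)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]

def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def build_pyramid(signal, levels):
    """Return `levels` versions of the signal, finest scale first."""
    pyramid = [signal]
    for _ in range(levels - 1):
        pyramid.append(downsample_by_2(pyramid[-1]))
    return pyramid

def multiscale_loss(outputs, target):
    """Sum of per-level errors; `outputs` holds one estimate per scale, finest first."""
    targets = build_pyramid(target, len(outputs))
    return sum(mse(o, t) for o, t in zip(outputs, targets))

target = [1.0, 3.0, 5.0, 7.0]
loss = multiscale_loss([[1.0, 3.0, 5.0, 7.0], [3.0, 7.0]], target)
```

Supervising the coarse outputs directly gives the early blocks their own error signal instead of relying only on gradients propagated back from the finest scale.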

Proximal Splitting Network Architecture for Image Restoration. Finally, we implement Eq. 12 to arrive at the PSN architecture in Fig. 4, which utilizes multiple Proximal Blocks. The number of Proximal Blocks (from Fig. 3) equals the number of stages or iterations for the half-quadratic splitting method (Eq. 12), which we set to 3. Recall that this is an order of magnitude fewer than in some previous works meinhardt2017learning ; heide2014flexisp . In Fig. 4, the input image is the corrupted image convolved with the (e.g. the input image is the noisy image for image denoising and the image up-sampled via bicubic interpolation for image super-resolution in the experiments). Down-sampling is achieved via bicubic down-sampling and up-sampling by a de-convolution layer noh2015learning . Through a preliminary grid search, we find that works satisfactorily.

Method                        15      25      50
BM3D dabov2007image           31.07   28.57   25.62
WNNM gu2014weighted           31.37   28.83   25.87
EPLL zoran2011learning        31.21   28.68   25.67
MLP burger2012image           -       28.96   26.03
CSF schmidt2014shrinkage      31.24   28.74   -
TNRD chen2017trainable        31.42   28.92   25.97
DnCNN zhang2017beyond         31.61   29.16   26.23
PSN-K (Ours)                  31.70   29.27   26.32
PSN-U (Ours)                  31.60   29.17   26.30
Table 1: Denoising PSNR test results of several algorithms on BSD68 with noise levels of 15, 25 and 50. Bold numbers denote the highest performing model, whereas italics denote the second highest. PSN outperforms the previous state of the art when the noise level is known, and matches it when the noise level is unknown.

4 Empirical Evaluation on Image Restoration

We evaluate our proposed approach against state-of-the-art algorithms on standard benchmarks for the tasks of image denoising and image super resolution. For training we use Adam kingma2014adam for 50 epochs with a batch size of 128 for all models. Run times for the evaluated PSN networks are on par with the fastest algorithms, while PSN outperforms the previous state of the art (run times are provided in the supplementary).

4.1 Image De-noising

Our first task is image denoising where, given a noisy image (with a known or unknown level of noise), the task is to output a noiseless version of the image. Image denoising is considered a special case of Eq. 1 where $k$ is a delta function with no shift.

Experiment: We train on 400 images of size from the Berkeley Segmentation Dataset (BSD) arbelaez2007berkeley . We set the patch size as , and crop about one million random patches to train. We train four models as described in zhang2017beyond . Three of these models are trained on images with three different levels of Gaussian noise, i.e. $\sigma$ = 15, 25 and 50. We refer to these models as PSN-K (Proximal Split Net-Known noise level). The fourth model is trained for blind Gaussian denoising, where no level of $\sigma$ is assumed. For blind Gaussian denoising, we train a single model and set the range of the noise level in the training images to be [0, 60]. We refer to this model as PSN-U (Unknown noise level). We test on two well known datasets, the Berkeley Segmentation Dataset (BSD68) roth2009fields containing a total of 68 images and Set12 dabov2007image with 12 images, with no overlap with the training images. We compare our approach with several state-of-the-art methods such as BM3D dabov2007image , WNNM gu2014weighted , TNRD chen2017trainable , EPLL zoran2011learning , DnCNN zhang2017beyond , MLP burger2012image and CSF schmidt2014shrinkage .
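The difference between the known-noise and blind training setups comes down to how the noise level is chosen for each training patch. The sketch below assumes the blind model draws the level uniformly from the stated range [0, 60]; the sampling distribution is not given in the text, and the function names are illustrative.

```python
import random

def add_gaussian_noise(patch, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each pixel."""
    return [p + rng.gauss(0.0, sigma) for p in patch]

def make_training_pair(clean_patch, rng, sigma=None, blind_range=(0.0, 60.0)):
    """Return (noisy, clean, sigma_used).

    A fixed `sigma` corresponds to a known-noise model; `sigma=None`
    draws a level from `blind_range` (uniform sampling is an assumption).
    """
    if sigma is None:
        sigma = rng.uniform(*blind_range)
    return add_gaussian_noise(clean_patch, sigma, rng), clean_patch, sigma

rng = random.Random(0)
patch = [128.0] * 16                       # hypothetical flattened patch
_, _, known_sigma = make_training_pair(patch, rng, sigma=25.0)
noisy, _, blind_sigma = make_training_pair(patch, rng)
```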

Results: Table 1 shows the testing PSNR results on BSD68. We observe that PSN-K outperforms all other algorithms to obtain a new state of the art on BSD68. The noise-blind version (PSN-U) very closely matches the previous state of the art and in some cases outperforms it. Table 2 shows the testing PSNRs for Set12. We find that for most images, PSN-K achieves a new state of the art. The noise-blind model PSN-U also beats the previous state of the art on many images and, in some cases, even PSN-K. PSN-U performs particularly well at high levels of noise. Fig. 1 and Fig. 5 present some qualitative results illustrating the high level of detail PSN recovers. More results are presented in the supplementary.
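For reference, the PSNR values reported in the tables follow the standard definition, 10 log10(peak^2 / MSE). A minimal sketch, assuming 8-bit images with a peak value of 255 (the peak value is not stated in the text):

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [0.0, 0.0, 0.0, 0.0]
estimate = [25.5, 25.5, 25.5, 25.5]   # uniform error of 10% of the peak value
value = psnr(reference, estimate)
```

Because the scale is logarithmic, a gain of 0.1 dB corresponds to roughly a 2.3% reduction in mean squared error, which helps put the table differences in perspective.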

C.Man House Pepp Starf. Fly Airpl. Parrot Lena Barb. Boat Man Couple
15
BM3D dabov2007image 31.91 34.93 32.69 31.14 31.85 31.07 31.37 34.26 33.10 32.13 31.92 32.10
CSF schmidt2014shrinkage 31.95 34.39 32.85 31.55 32.33 31.33 31.37 34.06 31.92 32.01 32.08 31.98
EPLL zoran2011learning 31.85 34.17 32.64 31.13 32.10 31.19 31.42 33.93 31.38 31.93 32.00 31.93
WNNM gu2014weighted 32.17 35.13 32.99 31.82 32.71 31.39 31.62 34.27 33.60 32.27 32.11 32.17
TNRD chen2017trainable 32.19 34.53 33.04 31.75 32.56 31.46 31.63 34.24 32.13 32.14 32.23 32.11
DnCNN zhang2017beyond 32.10 34.93 33.15 32.02 32.94 31.56 31.63 34.56 32.09 32.35 32.41 32.41
PSN-K (Ours) 32.58 35.04 33.23 32.17 33.11 31.75 31.89 34.62 32.64 32.52 32.39 32.43
PSN-U (Ours) 32.04 35.03 33.21 31.94 32.93 31.61 31.62 34.56 32.49 32.41 32.37 32.43
25
BM3D dabov2007image 29.47 32.99 30.29 28.57 29.32 28.49 28.97 32.03 30.73 29.88 29.59 29.70
CSF schmidt2014shrinkage 29.51 32.41 30.32 28.87 29.69 28.80 28.91 31.87 28.99 29.75 29.68 29.50
EPLL zoran2011learning 29.21 32.14 30.12 28.48 29.35 28.66 28.96 31.58 28.53 29.64 29.57 29.46
WNNM gu2014weighted 29.63 33.22 30.55 29.09 29.98 28.81 29.13 32.24 31.28 29.98 29.74 29.80
TNRD chen2017trainable 29.72 32.53 30.57 29.09 29.85 28.88 29.18 32.00 29.41 29.91 29.87 29.71
DnCNN zhang2017beyond 29.94 33.05 30.84 29.34 30.25 29.09 29.35 32.42 29.69 30.20 30.09 30.10
PSN-K (Ours) 30.28 33.26 31.01 29.57 30.30 29.28 29.38 32.57 30.17 30.31 30.10 30.18
PSN-U (Ours) 29.79 33.23 30.90 29.30 30.17 29.06 29.25 32.45 29.94 30.25 30.05 30.12
50
BM3D dabov2007image 26.13 29.69 26.68 25.04 25.82 25.10 25.90 29.05 27.22 26.78 26.81 26.46
MLP burger2012image 26.37 29.64 26.68 25.43 26.26 25.56 26.12 29.32 25.24 27.03 27.07 26.67
WNNM gu2014weighted 26.45 30.33 26.95 25.44 26.32 25.42 26.14 29.25 27.79 26.97 26.95 26.64
TNRD chen2017trainable 26.62 29.48 27.10 25.42 26.31 25.59 26.16 28.93 25.70 26.94 26.98 26.50
DnCNN zhang2017beyond 27.03 30.02 27.39 25.72 26.83 25.89 26.48 29.38 26.38 27.23 27.23 26.09
PSN-K (Ours) 27.10 30.34 27.40 25.84 26.92 25.90 26.56 29.54 26.45 27.20 27.21 27.09
PSN-U (Ours) 27.21 30.21 27.53 25.63 26.93 25.89 26.62 29.54 26.56 27.27 27.23 27.04
Table 2: Denoising PSNR results of several algorithms on Set12 with noise levels of 15, 25 and 50. Bold numbers denote the highest performing model, whereas italics denote the second highest. PSN outperforms the state of the art for many images.

4.2 Image Super-Resolution

Our second task aims to reconstruct a high-resolution image from a single low-resolution image. Image super-resolution is considered a special case of Eq. 1 where $k$ is a bicubic down-sampling filter and no noise is added.

Experiment: For training, we use the DIV2K dataset. The dataset consists of 800 training images (2K resolution). The dataset is augmented with random horizontal flips and rotations. We set the high resolution patch size to be . The low resolution patches are generated via bicubic down-sampling of the high resolution patches. We train a single model for each of three different scales, i.e. 2X, 3X and 4X. We test our algorithm on four benchmark datasets: Set5 bevilacqua2012low , Set14 zeyde2010single , BSDS100 arbelaez2011contour and URBAN100 huang2015single . We compare our approach with several state-of-the-art methods such as A+ timofte2014a+ , RFL schulter2015fast , SelfExSR huang2015single , SRCNN dong2016image , FSRCNN dong2016accelerating , SCN wang2015deep , DRCN kim2016deeply , LapSRN lai2017deep and VDSR kim2016accurate in terms of the PSNR and SSIM metrics as in kim2016accurate .

Results: From Table 3, it is clear that PSN achieves state-of-the-art results both in terms of PSNR and SSIM on all four benchmarks and at all scales (except for the 3X case on Urban100). This demonstrates the efficacy of the algorithm in application to the image super-resolution problem. Fig. 2 and Fig. 5 present some qualitative results. Notice that PSN recovers complex structures more clearly. More results are presented in the supplementary.

Scale  Algorithm                      SET5            SET14           BSDS100         URBAN100
                                      PSNR / SSIM     PSNR / SSIM     PSNR / SSIM     PSNR / SSIM
2X     Bicubic                        33.69 / 0.931   30.25 / 0.870   29.57 / 0.844   26.89 / 0.841
2X     FSRCNN dong2016accelerating    37.05 / 0.956   32.66 / 0.909   31.53 / 0.892   29.88 / 0.902
2X     DRCN kim2016deeply             37.63 / 0.959   33.06 / 0.912   31.85 / 0.895   30.76 / 0.914
2X     LapSRN lai2017deep             37.52 / 0.959   33.08 / 0.913   31.80 / 0.895   30.41 / 0.910
2X     DRRN tai2017image              37.74 / 0.959   33.23 / 0.914   32.05 / 0.897   31.23 / 0.919
2X     VDSR kim2016accurate           37.53 / 0.959   33.05 / 0.913   31.90 / 0.896   30.77 / 0.914
2X     PSN (Ours)                     38.09 / 0.960   33.68 / 0.919   32.33 / 0.901   31.97 / 0.921
3X     Bicubic                        30.41 / 0.869   27.55 / 0.775   27.22 / 0.741   24.47 / 0.737
3X     FSRCNN dong2016accelerating    33.18 / 0.914   29.37 / 0.824   28.53 / 0.791   26.43 / 0.808
3X     DRCN kim2016deeply             33.83 / 0.922   29.77 / 0.832   28.80 / 0.797   27.15 / 0.828
3X     LapSRN lai2017deep             33.82 / 0.922   29.87 / 0.832   28.82 / 0.798   27.07 / 0.828
3X     DRRN tai2017image              34.03 / 0.924   29.96 / 0.835   28.95 / 0.800   27.53 / 0.764
3X     VDSR kim2016accurate           33.67 / 0.921   29.78 / 0.832   28.83 / 0.799   27.14 / 0.829
3X     PSN (Ours)                     34.56 / 0.927   30.14 / 0.845   29.26 / 0.809   27.43 / 0.757
4X     Bicubic                        28.43 / 0.811   26.01 / 0.704   25.97 / 0.670   23.15 / 0.660
4X     FSRCNN dong2016accelerating    30.72 / 0.866   27.61 / 0.755   26.98 / 0.715   24.62 / 0.728
4X     DRCN kim2016deeply             31.54 / 0.884   28.03 / 0.768   27.24 / 0.725   25.14 / 0.752
4X     LapSRN lai2017deep             31.54 / 0.885   28.19 / 0.772   27.32 / 0.727   25.21 / 0.756
4X     DRRN tai2017image              31.68 / 0.888   28.21 / 0.772   27.38 / 0.728   25.44 / 0.764
4X     VDSR kim2016accurate           31.35 / 0.883   28.02 / 0.768   27.29 / 0.726   25.18 / 0.754
4X     PSN (Ours)                     32.36 / 0.896   28.40 / 0.786   27.73 / 0.742   25.63 / 0.768
Table 3: The PSNR and SSIM results of several algorithms on four image super resolution benchmarks. PSN outperforms all previous algorithms significantly and consistently (except for the 3X case on Urban100). PSN also outperforms the works of timofte2014a+ ; schulter2015fast ; huang2015single ; dong2016image ; wang2015deep on all four benchmarks, whose specific results we present in the supplementary due to space constraints.
Figure 5: The first row shows the de-noising results of “Airpl.” (Set12) compared against BM3D dabov2007image with added Gaussian noise (). The second row illustrates the super-resolution output of “Comic”(Set14) compared against VDSR kim2016accurate with a down-sampling (scale) factor of 4. PSN recovers finer details and more complex structures than the baselines.

4.3 Conclusion

We proposed a theoretically motivated novel deep architecture for image recovery, inspired by the half quadratic splitting algorithm and utilizing the proximal operator. Extensive experiments in image denoising and image super resolution demonstrated that the proposed Proximal Splitting Network is effective and achieves a new state of the art on both tasks. Furthermore, the proposed framework is flexible and can potentially be applied to other tasks such as image in-painting and compressed sensing, which are left for future work.

5 Appendix: Algorithms as Special Cases of the Proximal Splitting Network Optimization Problem

VDSR kim2016accurate as a special case. We now describe the relationship of Proximal Splitting Networks to some of the other deep learning approaches. The authors in kim2016accurate present a single-image super-resolution method called VDSR. In this work, they use a very deep convolutional network with residual learning. We find that VDSR is a special case of our formulation. VDSR can be modelled by modifying Eq. 12 (from the paper). We set , to be the bi-cubic up-sampling filter, to be 2 and where is the low-resolution image and the low-resolution image up-sampled via bi-cubic interpolation. Further, with the last convolution layer of being a filter that can be represented by a matrix with weights equivalent to and the loss objective being the loss, we obtain the following optimization problem:

(13)

Here, is modelled to be a deep CNN that consists of 20 layers. This formulation is the exact formulation of the VDSR method.

DnCNN zhang2017beyond as a special case. Similarly, the Denoising Convolutional Neural Network (DnCNN zhang2017beyond ) can also be modelled as a special case of the PSN optimization problem (Eq. 12 from the paper). In this work the authors propose a CNN for image denoising. The DnCNN model has the ability to recover images when the noise level is unknown. DnCNN can be represented by the formula in Eq. 13 if is a deep convolutional neural network, is the noisy image and being the identity matrix. The formulation then describes the architecture of DnCNN.

Thus, we find that some previous approaches can be modelled as special cases of our formulation (Eq. 12 from the paper). We find that our approach not only generalizes these methods theoretically, but also outperforms them in practice on two image restoration tasks.

Furthermore, the deep multi-scale convolutional neural network for dynamic scene deblurring nah2017deep is another special case of our approach. In this work, the authors proposed a blind deblurring method with a CNN. The proposed network is a multi-scale convolutional neural network that recovers sharp images where the blur is caused by several motion filters. To show that this approach is a special case of our method, we first need to present the formulation that combines PSN with a multi-scale architecture. The optimization problem for this combination is:

(14)
s.t

where is a de-convolution filter noh2015learning , is a sub-sampling matrix that reduces the size of the vector it multiplies, is the identity matrix, and is a down-sampling matrix by .

To show that the approach proposed in nah2017deep is a special case of PSN, we need to adjust the values of and in Eq. 14, since the filter is unknown. Thus, it can be written as:

(15)
s.t

Optimizing this function is exactly equivalent to the approach in nah2017deep when is a deep residual network he2016deep .

By applying the same methodology that we used in the previous three cases, we can show that our approach also generalizes the methods of lai2017deep ; xu2014deep .

6 Proof of Eq. 4

In Eq. 3 (from the paper), can be approximated by its second-order Taylor expansion since it is twice differentiable. Thus, the optimization problem in Eq. 3 (from the paper) becomes

(16)

Eq. 16 is convex. Therefore, its minimizer can be found by taking the first derivative and setting it to zero. The result is:

(17)

When is large, the proximal operator of function can be approximated by

(18)
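The large-parameter approximation can be checked numerically on a case where the proximal operator has a closed form. The Python sketch below assumes the standard definition prox(z) = argmin_x f(x) + (beta/2)||x - z||^2, with `beta` as our own name for the penalty weight, and uses a quadratic f(x) = 0.5 x^T A x - b^T x chosen purely for illustration. It compares the exact proximal point with the single gradient step z - grad f(z) / beta.

```python
import numpy as np

def prox_quadratic(z, A, b, beta):
    """Exact prox of f(x) = 0.5 x^T A x - b^T x with penalty (beta/2)||x - z||^2.

    Setting the gradient A x - b + beta (x - z) to zero gives
    (A + beta I) x = beta z + b.
    """
    n = len(z)
    return np.linalg.solve(A + beta * np.eye(n), beta * z + b)

def gradient_step(z, A, b, beta):
    """First-order approximation of the prox: z - grad f(z) / beta."""
    return z - (A @ z - b) / beta

# Illustrative problem data (assumed, not taken from the paper).
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])
z = np.array([0.3, 0.7])

betas = (1.0, 10.0, 100.0, 1000.0)
errors = {
    beta: float(np.linalg.norm(prox_quadratic(z, A, b, beta) - gradient_step(z, A, b, beta)))
    for beta in betas
}
```

The gap between the two points shrinks as `beta` grows, which is the behaviour the approximation relies on.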

7 Complexity

Table 4 shows the run times of different methods for denoising on input images of three different sizes. The two versions of the PSN network, PSN-K and PSN-U, are among the fastest algorithms, taking less than 0.1 seconds on the two smaller image sizes. Though PSN is slightly slower than TNRD chen2017trainable and DnCNN zhang2017beyond , it still processes an image of the largest size in under 0.5 seconds.

| Image size | BM3D dabov2007image | WNNM gu2014weighted | EPLL zoran2011learning | MLP burger2012image | CSF schmidt2014shrinkage | TNRD chen2017trainable | DnCNN zhang2017beyond | PSN-K | PSN-U |
|---|---|---|---|---|---|---|---|---|---|
| | 0.65 | 203.1 | 25.4 | 1.42 | 2.11 | 0.010 | 0.016 | 0.017 | 0.018 |
| | 2.85 | 773.2 | 45.5 | 5.51 | 5.67 | 0.032 | 0.060 | 0.072 | 0.081 |
| | 11.89 | 2536.4 | 422.1 | 19.4 | 40.8 | 0.116 | 0.235 | 0.345 | 0.378 |

Table 4: Run time in seconds for three different image sizes
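To put the "slightly slower" comparison in numbers, the short Python sketch below computes the run-time ratio of each PSN variant relative to DnCNN and TNRD from the values in Table 4. It assumes the three table rows are ordered from the smallest to the largest image size; the helper name `slowdown` is ours.

```python
# Run times in seconds copied from Table 4, one entry per image size
# (assumed to be ordered from smallest to largest).
run_times = {
    "TNRD": [0.010, 0.032, 0.116],
    "DnCNN": [0.016, 0.060, 0.235],
    "PSN-K": [0.017, 0.072, 0.345],
    "PSN-U": [0.018, 0.081, 0.378],
}

def slowdown(method, baseline):
    """Per-size ratio of a method's run time to a baseline's run time."""
    return [t / b for t, b in zip(run_times[method], run_times[baseline])]

for method in ("PSN-K", "PSN-U"):
    for baseline in ("DnCNN", "TNRD"):
        ratios = ", ".join(f"{r:.2f}x" for r in slowdown(method, baseline))
        print(f"{method} vs {baseline}: {ratios}")
```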

8 Complete Tabular Results for Super-resolution

Table 5 shows the full version of Table 3 from the main paper. This is the complete result for the super-resolution experiments. We find that PSN still achieves state-of-the-art results on most benchmarks and settings.

| Algorithm | Scale | SET5 (PSNR / SSIM) | SET14 (PSNR / SSIM) | BSDS100 (PSNR / SSIM) | URBAN100 (PSNR / SSIM) |
|---|---|---|---|---|---|
| Bicubic | 2X | 33.69 / 0.931 | 30.25 / 0.870 | 29.57 / 0.844 | 26.89 / 0.841 |
| A+ timofte2014a+ | 2X | 36.60 / 0.955 | 32.32 / 0.906 | 31.24 / 0.887 | 29.25 / 0.895 |
| RFL schulter2015fast | 2X | 36.59 / 0.954 | 32.29 / 0.905 | 31.18 / 0.885 | 29.14 / 0.891 |
| SelfExSR huang2015single | 2X | 36.60 / 0.955 | 32.24 / 0.904 | 31.20 / 0.887 | 29.55 / 0.898 |
| SRCNN dong2016image | 2X | 36.72 / 0.955 | 32.51 / 0.908 | 31.38 / 0.889 | 29.53 / 0.896 |
| FSRCNN dong2016accelerating | 2X | 37.05 / 0.956 | 32.66 / 0.909 | 31.53 / 0.892 | 29.88 / 0.902 |
| SCN wang2015deep | 2X | 36.58 / 0.954 | 32.35 / 0.905 | 31.26 / 0.885 | 29.52 / 0.897 |
| DRCN kim2016deeply | 2X | 37.63 / 0.959 | 33.06 / 0.912 | 31.85 / 0.895 | 30.76 / 0.914 |
| LapSRN lai2017deep | 2X | 37.52 / 0.959 | 33.08 / 0.913 | 31.80 / 0.895 | 30.41 / 0.910 |
| DRRN tai2017image | 2X | 37.74 / 0.959 | 33.23 / 0.914 | 32.05 / 0.897 | 31.23 / 0.919 |
| VDSR kim2016accurate | 2X | 37.53 / 0.959 | 33.05 / 0.913 | 31.90 / 0.896 | 30.77 / 0.914 |
| PSN (Ours) | 2X | 38.09 / 0.960 | 33.68 / 0.919 | 32.33 / 0.901 | 31.97 / 0.921 |
| Bicubic | 3X | 30.41 / 0.869 | 27.55 / 0.775 | 27.22 / 0.741 | 24.47 / 0.737 |
| A+ timofte2014a+ | 3X | 32.62 / 0.909 | 29.15 / 0.820 | 28.31 / 0.785 | 26.05 / 0.799 |
| RFL schulter2015fast | 3X | 32.47 / 0.906 | 29.07 / 0.818 | 28.23 / 0.782 | 25.88 / 0.792 |
| SelfExSR huang2015single | 3X | 32.66 / 0.910 | 29.18 / 0.821 | 28.30 / 0.786 | 26.45 / 0.810 |
| SRCNN dong2016image | 3X | 32.78 / 0.909 | 29.32 / 0.823 | 28.42 / 0.788 | 26.25 / 0.801 |
| FSRCNN dong2016accelerating | 3X | 33.18 / 0.914 | 29.37 / 0.824 | 28.53 / 0.791 | 26.43 / 0.808 |
| SCN wang2015deep | 3X | 32.62 / 0.908 | 29.16 / 0.818 | 28.33 / 0.783 | 26.21 / 0.801 |
| DRCN kim2016deeply | 3X | 33.83 / 0.922 | 29.77 / 0.832 | 28.80 / 0.797 | 27.15 / 0.828 |
| LapSRN lai2017deep | 3X | 33.82 / 0.922 | 29.87 / 0.832 | 28.82 / 0.798 | 27.07 / 0.828 |
| DRRN tai2017image | 3X | 34.03 / 0.924 | 29.96 / 0.835 | 28.95 / 0.800 | 27.53 / 0.764 |
| VDSR kim2016accurate | 3X | 33.67 / 0.921 | 29.78 / 0.832 | 28.83 / 0.799 | 27.14 / 0.829 |
| PSN (Ours) | 3X | 34.56 / 0.927 | 30.14 / 0.845 | 29.26 / 0.809 | 27.43 / 0.757 |
| Bicubic | 4X | 28.43 / 0.811 | 26.01 / 0.704 | 25.97 / 0.670 | 23.15 / 0.660 |
| A+ timofte2014a+ | 4X | 30.32 / 0.860 | 27.34 / 0.751 | 26.83 / 0.711 | 24.34 / 0.721 |
| RFL schulter2015fast | 4X | 30.17 / 0.855 | 27.24 / 0.747 | 26.76 / 0.708 | 24.20 / 0.712 |
| SelfExSR huang2015single | 4X | 30.34 / 0.862 | 27.41 / 0.753 | 26.84 / 0.713 | 24.83 / 0.740 |
| SRCNN dong2016image | 4X | 30.50 / 0.863 | 27.52 / 0.753 | 26.91 / 0.712 | 24.53 / 0.725 |
| FSRCNN dong2016accelerating | 4X | 30.72 / 0.866 | 27.61 / 0.755 | 26.98 / 0.715 | 24.62 / 0.728 |
| SCN wang2015deep | 4X | 30.41 / 0.863 | 27.39 / 0.751 | 26.88 / 0.711 | 24.52 / 0.726 |
| DRCN kim2016deeply | 4X | 31.54 / 0.884 | 28.03 / 0.768 | 27.24 / 0.725 | 25.14 / 0.752 |
| LapSRN lai2017deep | 4X | 31.54 / 0.885 | 28.19 / 0.772 | 27.32 / 0.727 | 25.21 / 0.756 |
| DRRN tai2017image | 4X | 31.68 / 0.888 | 28.21 / 0.772 | 27.38 / 0.728 | 25.44 / 0.764 |
| VDSR kim2016accurate | 4X | 31.35 / 0.883 | 28.02 / 0.768 | 27.29 / 0.726 | 25.18 / 0.754 |
| PSN (Ours) | 4X | 32.36 / 0.896 | 28.40 / 0.786 | 27.73 / 0.742 | 25.63 / 0.768 |

Table 5: PSNR and SSIM results of several algorithms for super-resolution
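For readers unfamiliar with the PSNR metric reported in Table 5, the following Python sketch shows the standard definition, PSNR = 10 log10(peak^2 / MSE). The evaluation protocol behind the table (colour channel used, border cropping, peak value) is not stated in this section, so the 8-bit peak of 255 and the function name `psnr` are assumptions.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: every pixel of the estimate is off by exactly one grey level.
reference = np.zeros((8, 8), dtype=np.uint8)
estimate = np.ones((8, 8), dtype=np.uint8)
value = psnr(reference, estimate)
```

Because the scale is logarithmic, a gain of a few tenths of a dB between two methods corresponds to a noticeable reduction in mean squared error.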

References

  • (1) A. Altintaç, E. E. Altshuler, J. B. Andersen, M. Ando, E. Arvas, R. Raird, L. A. Baker, B. B. Balslcy, W. L. Ecklund, D. A. Bathker, et al. 1988 index ieee transactions on antennas and propagation.
  • (2) M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using wavelet transform. IEEE Transactions on image processing, 1(2):205–220, 1992.
  • (3) P. Arbelaez, C. Fowlkes, and D. Martin. The berkeley segmentation dataset and benchmark. see http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds, 2007.
  • (4) P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE transactions on pattern analysis and machine intelligence, 33(5):898–916, 2011.
  • (5) A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.
  • (6) M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. 2012.
  • (7) S. A. Bigdeli, M. Zwicker, P. Favaro, and M. Jin. Deep mean-shift priors for image restoration. In Advances in Neural Information Processing Systems, pages 763–772, 2017.
  • (8) H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with bm3d? In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2392–2399. IEEE, 2012.
  • (9) P. J. Burt and E. H. Adelson. The laplacian pyramid as a compact image code. In Readings in Computer Vision, pages 671–679. Elsevier, 1987.
  • (10) Y. Chen and T. Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence, 39(6):1256–1272, 2017.
  • (11) K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image processing, 16(8):2080–2095, 2007.
  • (12) C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2016.
  • (13) C. Dong, C. C. Loy, and X. Tang. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision, pages 391–407. Springer, 2016.
  • (14) M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image processing, 15(12):3736–3745, 2006.
  • (15) G. Ghiasi and C. C. Fowlkes. Laplacian pyramid reconstruction and refinement for semantic segmentation. In European Conference on Computer Vision, pages 519–534. Springer, 2016.
  • (16) S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
  • (17) K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • (18) F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pająk, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian, et al. Flexisp: A flexible camera image processing framework. ACM Transactions on Graphics (TOG), 33(6):231, 2014.
  • (19) P. Hoeher, S. Kaiser, and P. Robertson. Two-dimensional pilot-symbol-aided channel estimation by wiener filtering. In Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, volume 3, pages 1845–1848. IEEE, 1997.
  • (20) J.-B. Huang, A. Singh, and N. Ahuja. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5197–5206, 2015.
  • (21) D. R. Hunter and K. Lange. A tutorial on mm algorithms. The American Statistician, 58(1):30–37, 2004.
  • (22) S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • (23) M. Jin, S. Roth, and P. Favaro. Noise-blind image deblurring. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017.
  • (24) J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1646–1654, 2016.
  • (25) J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1637–1645, 2016.
  • (26) K. I. Kim and Y. Kwon. Single-image super-resolution using sparse regression and natural image prior. IEEE transactions on pattern analysis and machine intelligence, 32(6):1127–1133, 2010.
  • (27) D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • (28) W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 624–632, 2017.
  • (29) S. Mallat. A wavelet tour of signal processing: the sparse way. Academic press, 2008.
  • (30) T. Meinhardt, M. Möller, C. Hazirbas, and D. Cremers. Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. ArXiv e-prints, Apr, 2017.
  • (31) S. Nah, T. H. Kim, and K. M. Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, 2017.
  • (32) M. Nikolova and M. K. Ng. Analysis of half-quadratic minimization methods for signal and image recovery. SIAM Journal on Scientific computing, 27(3):937–966, 2005.
  • (33) H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1520–1528, 2015.
  • (34) N. Parikh, S. Boyd, et al. Proximal algorithms. Foundations and Trends® in Optimization, 1(3):127–239, 2014.
  • (35) S. Paris, S. W. Hasinoff, and J. Kautz. Local laplacian filters: Edge-aware image processing with a laplacian pyramid. ACM Trans. Graph., 30(4):68–1, 2011.
  • (36) P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on pattern analysis and machine intelligence, 12(7):629–639, 1990.
  • (37) J. Rick Chang, C.-L. Li, B. Poczos, B. Vijaya Kumar, and A. C. Sankaranarayanan. One network to solve them all–solving linear inverse problems using deep projection models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5888–5897, 2017.
  • (38) S. Roth and M. J. Black. Fields of experts. International Journal of Computer Vision, 82(2):205, 2009.
  • (39) R. Rubinstein, A. M. Bruckstein, and M. Elad. Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6):1045–1057, 2010.
  • (40) U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2774–2781, 2014.
  • (41) U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth. Discriminative non-blind deblurring. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 604–611. IEEE, 2013.
  • (42) S. Schulter, C. Leistner, and H. Bischof. Fast and accurate image upscaling with super-resolution forests. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3791–3799, 2015.
  • (43) J. Sun, Z. Xu, and H.-Y. Shum. Image super-resolution using gradient profile prior. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
  • (44) Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep recursive residual network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, 2017.
  • (45) R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision, pages 111–126. Springer, 2014.
  • (46) I. Tosic and P. Frossard. Dictionary learning. IEEE Signal Processing Magazine, 28(2):27–38, 2011.
  • (47) Y. Wang, J. Yang, W. Yin, and Y. Zhang. A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences, 1(3):248–272, 2008.
  • (48) Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang. Deep networks for image super-resolution with sparse prior. In Proceedings of the IEEE International Conference on Computer Vision, pages 370–378, 2015.
  • (49) L. Xu, J. S. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems, pages 1790–1798, 2014.
  • (50) R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In International conference on curves and surfaces, pages 711–730. Springer, 2010.
  • (51) K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • (52) D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 479–486. IEEE, 2011.