DeepRED: Deep Image Prior Powered by RED
Inverse problems in imaging are extensively studied, with a variety of strategies, tools, and theory accumulated over the years. Recently, this field has been immensely influenced by the emergence of deep-learning techniques. One such contribution, which is the focus of this paper, is the Deep Image Prior (DIP) work by Ulyanov, Vedaldi, and Lempitsky (2018). DIP offers a new approach to the regularization of inverse problems, obtained by forcing the recovered image to be synthesized from a given deep architecture. While DIP has been shown to be effective, its results fall short when compared to state-of-the-art alternatives. In this work, we aim to boost DIP by adding an explicit prior, which enriches the overall regularization effect and leads to better-recovered images. More specifically, we propose to bring in the concept of Regularization by Denoising (RED), which leverages existing denoisers for regularizing inverse problems. Our work shows how the two can be merged into a highly effective recovery process (DeepRED) that avoids the need to differentiate the chosen denoiser, with its benefit demonstrated on several inverse problems.
Inverse problems in imaging center around the recovery of an unknown image $x$ based on a given corrupted measurement $y$. These problems are typically posed as energy minimization tasks, drawing their mathematical formulation from a statistical (Bayesian) modeling of the posterior distribution, $P(x|y)$. As inverse problems tend to be ill-posed, a key to the success of the recovery process is the choice of the regularization, which serves as the image prior that stabilizes the degradation inversion, and directs the outcome towards a more plausible image.
The broad field of inverse problems in imaging has been extensively explored in the past several decades. This vast work has covered various aspects, ranging from the formulation of such problems, through the introduction of diverse ways to pose and use the regularization, and all the way to optimization techniques for minimizing the obtained energy function. This massive research effort has led to one of the most prominent fields in the broad arena of imaging sciences, and to many success stories in applications, treating problems such as denoising, deblurring, inpainting, super-resolution, tomographic reconstruction, and more.
The emergence of deep-learning a decade ago brought a revolution to the way machine learning is practiced. At first, this feat mostly focused on supervised classification tasks, leading to state-of-the-art results in challenging recognition applications. However, this revolution found its way quite rapidly to inverse problems in imaging, due to the ability to consider these as specific regression problems. The practiced rationale in such schemes is as follows: given many examples of pairs of an original image and its corrupted version, one could learn a deep network to match the degraded image to its source. This became a commonly suggested and very effective alternative to the above-described classical Bayesian path, see e.g., [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12].

The recent work by Ulyanov et al. [13, 14] is an exceptional contribution in the intersection between inverse problems and deep-learning. This work presents the Deep Image Prior (DIP) method, a new strategy for handling the regularization task in inverse problems. Rather than taking the supervised avenue, as most earlier methods do, DIP suggests using the deep network itself as the regularizer of the inverse problem. More specifically, DIP removes the explicit regularization, and replaces it by the assumption that the unknown image should be generated by a learned network. DIP fits the network's parameters to the corrupted image, thereby adapting it to each image to be treated. Our special interest in this work stems from the brilliant idea of implicitly using the structure of a network (and possibly the optimization strategy as well) to obtain a regularization effect in recovering $x$.
While DIP has been shown to be quite effective, and demonstrated successfully on several inverse problems (denoising, JPEG artifact removal, inpainting, and super-resolution), its results still fall short of the state of the art. This brings up the idea of offering an extra boost to DIP by returning the explicit regularization, so as to enrich the implicit one, and this way lead to better recovered images. The natural question is, of course, which regularization to use, as there are so many options available. Interestingly, the need to bring back an extra regularization came up quite recently in the work reported in [15], where Total-Variation [16] has been used and shown to lead to improved recovery results.
Contributions: In this work we propose to bring in the recently introduced concept of Regularization by Denoising (RED) [17] and merge it with DIP. The special appeal of RED is its superiority to many other regularization schemes, and its elegant reliance on existing denoising algorithms for defining the regularization term (even deep-learning based denoisers, e.g. [1, 2, 3, 4], can be used). A special challenge in our work is finding a way to train the new compound objective, DIP+RED, while avoiding an explicit differentiation of the denoising function. This is achieved using the Alternating Direction Method of Multipliers (ADMM) [18], which enjoys an extra side-benefit: a stable recovery with respect to the stopping rule employed. The proposed scheme, termed DeepRED, is tested on image denoising, single image super-resolution, and image deblurring, showing the clear benefit that RED provides. The obtained results exhibit a marked improvement, both with respect to the native RED as reported in [17], and DIP itself. Indeed, DeepRED gets closer to supervised solvers of inverse problems, despite being an unsupervised method.
This paper is organized as follows: The next section presents background material on the inverse problems we target in this work, as well as describing DIP and RED, the two pillars of this work. In Section 3 we present the combined DeepRED scheme, and develop the ADMM algorithm for its training. Section 4 presents our experimental results, validating the benefits of the additional explicit regularization on a series of inverse problems. We conclude the paper in Section 5 by summarizing its message and results and proposing potential future research directions.
In this section, we give more details on the inverse problems we target, and briefly present both the Deep Image Prior (DIP) approach and the concept of Regularization by Denoising (RED).
Within the broad field of inverse problems, our work considers the case where the measurement $y$ is given by $y = Hx + v$, where $H$ is any known linear degradation matrix, and $v$ is Additive White Gaussian Noise (AWGN). The recovery of $x$ from $y$ could be obtained by solving

$$\min_x \; \frac{1}{2}\|Hx - y\|_2^2 + \lambda \rho(x), \qquad (1)$$

where $\rho(x)$ serves as the chosen regularization term. By modifying the operator $H$, we can switch between several popular problems in image recovery:
Denoising is obtained for $H = I$,
Deblurring (deconvolution) assumes that $H$ is a convolution filter,
Inpainting is obtained when $H$ is built as the identity matrix with missing rows referring to missing samples,
Super-resolution refers to a matrix $H$ that represents both a blur followed by a sub-sampling, and
Tomographic reconstruction assumes that $H$ applies the Radon projections or a portion thereof.
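To make the role of the degradation operator concrete, the following is a minimal NumPy sketch of how several of the operators listed above could be built as explicit matrices for a toy one-dimensional signal. The signal length, blur kernel, missing-sample pattern, and noise level are arbitrary choices made for illustration and are not taken from the paper; tomographic reconstruction is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                      # toy 1-D "image" with n samples (illustrative)
x = rng.random(n)          # stand-in for the unknown clean signal
sigma = 0.05               # AWGN standard deviation (arbitrary choice)

# Denoising: H is the identity.
H_denoise = np.eye(n)

# Deblurring: H is a circular convolution with a 3-tap box filter.
kernel = np.array([1.0, 1.0, 1.0]) / 3.0
H_blur = np.zeros((n, n))
for i in range(n):
    for offset, weight in zip((-1, 0, 1), kernel):
        H_blur[i, (i + offset) % n] += weight

# Inpainting: identity matrix with the rows of the missing samples removed.
keep = np.array([0, 1, 3, 4, 6, 7])      # samples 2 and 5 are missing
H_inpaint = np.eye(n)[keep]

# Super-resolution: blur followed by sub-sampling by a factor of 2.
H_sr = H_blur[::2]

def measure(H, x, sigma, rng):
    """Return y = H x + v, with v drawn as white Gaussian noise."""
    return H @ x + sigma * rng.standard_normal(H.shape[0])

y_inpaint = measure(H_inpaint, x, sigma, rng)
y_sr = measure(H_sr, x, sigma, rng)
```

For real images these matrices are never formed explicitly; the same operators would be applied as functions (convolution, masking, strided sampling).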
We stress that the paradigm presented in this paper could be easily extended to handle other types of noise (e.g., Laplace, Poisson, Gamma, or other models). This should be done by replacing the expression $\frac{1}{2}\|Hx - y\|_2^2$ by the minus log-likelihood of the appropriate distribution, as done in [19] in the context of Poisson noise. Note that our view on this matter is somewhat different from the view of the authors of [13], who suggest handling other types of noise while still using the $\ell_2$ penalty.
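As an illustration of this replacement, consider Poisson noise under the assumption (made here for the example, and not taken from [19]) that each measurement $y_i$ is drawn independently from a Poisson distribution with mean $(Hx)_i$. The minus log-likelihood then reads:

```latex
-\log P(y \mid x) = \sum_i \left[ (Hx)_i - y_i \log (Hx)_i + \log (y_i!) \right]
```

The last term does not depend on $x$, so under this assumption the quadratic data term would be replaced by $\sum_i \left[ (Hx)_i - y_i \log (Hx)_i \right]$ while the regularization term stays unchanged.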
DIP embarks from the formulation posed in Equation (1), and starts by removing the regularization term $\lambda \rho(x)$. The idea is to find the minimizer of the first term, $\frac{1}{2}\|Hx - y\|_2^2$. However, this amounts to the Maximum-Likelihood Estimate (MLE), which is known to perform very poorly for the inverse problem cases considered in this paper. DIP overcomes this weakness by assuming that the unknown, $x$, should be the output of a deep network, $x = T_\Theta(z)$, where $z$ is a fixed random vector, and $\Theta$ stands for the network's parameters to be learned. Thus, DIP suggests solving

$$\min_\Theta \; \frac{1}{2}\|H T_\Theta(z) - y\|_2^2, \qquad (2)$$

and presents $\hat{x} = T_{\hat{\Theta}}(z)$, with $\hat{\Theta}$ the obtained parameters, as the recovered image.
Observe that the training of $\Theta$ itself serves also as the inference, i.e., this training is the recovery process, and this should be done for each input image separately and independently. This procedure is "unsupervised" in the sense that no ideal outcome (label) is presented to the learning. Rather, the training is guided by the attempt to best match the output of the network to the measured and corrupted image. Over-fitting in this case amounts to a recovery of $x$ that minimizes the above expression while being of poor visual quality. This is avoided due to the implicit regularization imposed by the architecture of the network and the early stopping (the number of iterations is bounded so as to avoid overfitting). Indeed, the fact that DIP operates well and recovers high quality images could be perceived as a manifestation of the "correctness" of the chosen architecture to represent image synthesis.
In practice, DIP performs very well. The work in [13] reports several sets of experiments on (i) image denoising, leading to performance that is a little weaker than CBM3D [20] and better than NLM [21]; (ii) Single Image Super-Resolution, leading to substantially better results than bicubic interpolation and TV-based restoration, but inferior to the learning-based methods [22, 23]; and (iii) inpainting, in which the results are shown to be much better than CSC-based ones [24].

The quest for an effective regularization for inverse problems in imaging has played a central role in the vast progress of this field. Various ideas were brought to serve the construction of $\rho(x)$ in Equation (1), all aiming to identify sources of inner structure in visual data. These may rely on piecewise spatial smoothness (e.g., [16]), self-similarity across different positions and scales (e.g., [21]), sparsity with respect to a properly chosen transform or representation (e.g., [20]), and more.
Among the various inverse problems mentioned above, denoising has gained a unique position due to its relative simplicity. This problem has become the de-facto testbed for exploring new regularization ideas. As a consequence, many highly effective and trustworthy denoising algorithms were developed in the past two decades. This brought a surprising twist in the evolution of regularizers, turning the table and seeking a way to construct a regularization by using denoising algorithms. The plug-and-play-prior [25] and the Regularization by Denoising (RED) [17] are two prime such techniques for turning a denoiser into a regularization. RED suggests to use the following as the regularization function:
$$\rho(x) = \frac{1}{2} x^T \left( x - f(x) \right), \qquad (3)$$

where $f(\cdot)$ is a denoiser of choice. We will not dwell on the rationale of this expression, beyond stating its close resemblance to a spatial smoothness term. Amazingly, under mild conditions on $f(\cdot)$ (the function should be differentiable, have a symmetric Jacobian, satisfy a local homogeneity condition, and be passive), two key and highly beneficial properties are obtained: (i) the gradient of $\rho(x)$ w.r.t. $x$ is simple and given by $x - f(x)$, which avoids differentiating the denoiser function; and (ii) $\rho(x)$ is a convex functional. The work reported in [17] introduced the concept of RED and showed how to leverage these two properties in order to obtain an effective regularization for various inverse problems. Our goal in this work is to bring this method to DIP, with the hope of boosting its performance.
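The gradient property can be checked numerically on a toy case. The sketch below takes the RED functional as $\rho(x) = \frac{1}{2} x^T (x - f(x))$ and uses a symmetric linear smoothing filter as a stand-in denoiser; this stand-in is an assumption made only so that the RED conditions hold exactly, and it is not one of the denoisers used in the paper. The helper names are hypothetical.

```python
import numpy as np

def smoothing_matrix(n):
    """Symmetric circular smoothing filter with taps [1/4, 1/2, 1/4]."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def red_value(x, f):
    """RED functional: rho(x) = 0.5 * x^T (x - f(x))."""
    return 0.5 * x @ (x - f(x))

def red_gradient(x, f):
    """Claimed gradient of rho: the denoising residual x - f(x)."""
    return x - f(x)

n = 16
W = smoothing_matrix(n)
f = lambda v: W @ v            # stand-in linear denoiser (assumption)

rng = np.random.default_rng(0)
x = rng.standard_normal(n)

# Central finite differences of rho along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (red_value(x + eps * e, f) - red_value(x - eps * e, f)) / (2 * eps)
    for e in np.eye(n)
])
max_err = np.max(np.abs(numeric - red_gradient(x, f)))

# Convexity for this stand-in: the Hessian of rho is I - W, whose
# eigenvalues should all be non-negative.
min_eig = np.min(np.linalg.eigvalsh(np.eye(n) - W))
```

For practical denoisers such as NLM or BM3D the conditions hold only approximately, so the residual is then an approximation of the gradient rather than an exact one.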
Merging DIP and RED, our objective function becomes

$$\min_{x, \Theta} \; \frac{1}{2}\|H T_\Theta(z) - y\|_2^2 + \frac{\lambda}{2} x^T \left( x - f(x) \right) \quad \text{s.t.} \quad x = T_\Theta(z). \qquad (4)$$

All the derivations and algorithms proposed in this paper are applicable just as well to Deep-Decoder [26], an appealing follow-up work to DIP that promotes a simpler architecture for $T_\Theta(z)$.

Note that a simple strategy is to avoid the use of $x$ and define the whole optimization w.r.t. the unknowns $\Theta$. This calls for solving

$$\min_\Theta \; \frac{1}{2}\|H T_\Theta(z) - y\|_2^2 + \frac{\lambda}{2} \left[ T_\Theta(z) \right]^T \left( T_\Theta(z) - f(T_\Theta(z)) \right).$$

While this may seem simpler, it in fact leads to a near dead-end, since back-propagating over $T_\Theta(z)$ calls for the differentiation of the denoising function $f(\cdot)$. For most denoisers this would be a daunting task that must be avoided. As we have explained above, under mild conditions, RED enjoys the benefit of avoiding such a direct differentiation, and we would like to leverage this property here.
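The remedy to this problem comes in the form of the Alternating Direction Method of Multipliers (ADMM) [18]. Starting with Equation (4), we turn the constraint into a penalty using the Augmented Lagrangian (AL) [27]:

$$\min_{x, \Theta} \; \frac{1}{2}\|H T_\Theta(z) - y\|_2^2 + \frac{\lambda}{2} x^T \left( x - f(x) \right) + \frac{\mu}{2}\|x - T_\Theta(z)\|_2^2 - \mu u^T \left( x - T_\Theta(z) \right). \qquad (5)$$

In this expression $u$ stands for the Lagrange multipliers vector for the set of equality constraints, and $\mu$ is a free parameter to be chosen. Merging the last two terms, we get the scaled form of the AL [27],

$$\min_{x, \Theta} \; \frac{1}{2}\|H T_\Theta(z) - y\|_2^2 + \frac{\lambda}{2} x^T \left( x - f(x) \right) + \frac{\mu}{2}\|x - T_\Theta(z) - u\|_2^2. \qquad (6)$$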
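The merging of the last two terms can be verified by completing the square. The short derivation below uses the symbols of this section and writes $r = x - T_\Theta(z)$ for the constraint residual, a shorthand introduced here only for brevity:

```latex
\frac{\mu}{2}\|r - u\|_2^2
  = \frac{\mu}{2}\|r\|_2^2 - \mu u^T r + \frac{\mu}{2}\|u\|_2^2
```

The term $\frac{\mu}{2}\|u\|_2^2$ does not depend on $x$ or $\Theta$, so it has no effect on the minimization over these two unknowns, and the penalty-plus-multiplier form and the scaled form lead to the same updates for them.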
The ADMM algorithm amounts to a sequential update of the three unknowns in this expression: $\Theta$, $x$, and $u$. Fixing $x$ and $u$, the update of $\Theta$ is done by solving

$$\min_\Theta \; \frac{1}{2}\|H T_\Theta(z) - y\|_2^2 + \frac{\mu}{2}\|x - T_\Theta(z) - u\|_2^2, \qquad (7)$$

which is very close in spirit to the optimization done in DIP (using back-propagation), modified by a proximity regularization that forces $T_\Theta(z)$ to be close to $x - u$. This proximity term provides an additional stabilizing and robustifying effect to the DIP minimization.
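Fixing $\Theta$ and $u$, $x$ should be updated by solving

$$\min_x \; \frac{\lambda}{2} x^T \left( x - f(x) \right) + \frac{\mu}{2}\|x - T_\Theta(z) - u\|_2^2. \qquad (8)$$

This is a classic RED objective [17], representing a denoising of the image $T_\Theta(z) + u$, and we suggest solving it in one of two ways. The first option is the fixed-point strategy, obtained by zeroing the derivative of the above w.r.t. $x$ and exploiting the fact that $\nabla_x \rho(x) = x - f(x)$. This leads to

$$\lambda \left( x - f(x) \right) + \mu \left( x - T_\Theta(z) - u \right) = 0. \qquad (9)$$

Assigning indices to the above equation,

$$\lambda \left( x_{j+1} - f(x_j) \right) + \mu \left( x_{j+1} - T_\Theta(z) - u \right) = 0, \qquad (10)$$

leads to the update formula

$$x_{j+1} = \frac{1}{\lambda + \mu} \left( \lambda f(x_j) + \mu \left( T_\Theta(z) + u \right) \right). \qquad (11)$$

Applying this iterative update several times provides the needed update for $x$. An alternative approach for updating $x$ is a simpler steepest-descent, using the above-described gradient. Thus, the update equation would be

$$x_{j+1} = x_j - c \left[ \lambda \left( x_j - f(x_j) \right) + \mu \left( x_j - T_\Theta(z) - u \right) \right], \qquad (12)$$

and $c$ should be chosen so as to guarantee a descent.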
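The two options for updating $x$ can be sketched as follows. This is a toy illustration under stated assumptions: the denoiser is replaced by a symmetric linear smoothing filter so that a closed-form minimizer exists for comparison, the vector `t` stands for $T_\Theta(z) + u$ and is drawn at random, and the values of $\lambda$, $\mu$, and the step size $c$ are arbitrary. The function names are hypothetical.

```python
import numpy as np

def smoothing_matrix(n):
    """Symmetric circular smoothing filter with taps [1/4, 1/2, 1/4]."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def x_update_fixed_point(x, t, f, lam, mu, num_steps):
    """Fixed-point iteration of Eq. (11), with t = T_Theta(z) + u."""
    for _ in range(num_steps):
        x = (lam * f(x) + mu * t) / (lam + mu)
    return x

def x_update_steepest_descent(x, t, f, lam, mu, step, num_steps):
    """Steepest-descent iteration of Eq. (12) with step size c = step."""
    for _ in range(num_steps):
        grad = lam * (x - f(x)) + mu * (x - t)
        x = x - step * grad
    return x

n = 16
W = smoothing_matrix(n)
f = lambda v: W @ v              # stand-in linear denoiser (assumption)
lam, mu = 0.5, 0.5               # arbitrary illustrative values

rng = np.random.default_rng(0)
t = rng.standard_normal(n)       # plays the role of T_Theta(z) + u

x_fp = x_update_fixed_point(t.copy(), t, f, lam, mu, num_steps=50)
x_sd = x_update_steepest_descent(t.copy(), t, f, lam, mu,
                                 step=0.5, num_steps=200)

# For a linear denoiser the stationarity condition of Eq. (9) is a linear
# system: ((lam + mu) I - lam W) x = mu t.
x_star = np.linalg.solve((lam + mu) * np.eye(n) - lam * W, mu * t)
```

Neither routine differentiates the denoiser; each step costs one denoiser activation, which is the property the derivation above relies on.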
The original DIP algorithm [13] offers three features that influence the output quality of the restored images. The first is an early stopping, which prevents the network from overfitting to the measurements. The second is a smoothing applied on the outcome of the last iterations, and the third is an averaging over separate runs with a different random vector $z$. Our tests implement all these as well, but we emphasize that the early stopping is relevant in our DeepRED scheme only for saving computations, as the explicit regularization robustifies the recovery against the risk of overfitting.
Due to the involvement of a highly non-linear system in our overall optimization, no convergence guarantees can be provided. In addition, when using denoisers that violate the conditions posed in [17], the denoising residual is no longer the exact derivative of the RED prior. Nevertheless, as we show in the experimental results, a tendency toward consistent descent and convergence is obtained empirically.
In our tests we have chosen $J = 1$ inner iterations for the update of $x$, which means that the denoiser is applied once in each ADMM round of updates. The heaviest loads in our algorithm are the update of $\Theta$ and the activation of the denoiser. Fortunately, we can speed up the overall run of the algorithm by adopting the following two measures: (i) the denoiser and the update of $\Theta$ can be run in parallel, as shown in Figure 1; and (ii) we apply the denoiser once every few outer iterations of the ADMM in order to save run-time.
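The overall structure of one ADMM round (update of $\Theta$, a single fixed-point step for $x$, then the multiplier) can be sketched on a toy denoising problem with $H = I$. Several simplifications are assumed here and none of them reflect the paper's actual setup: the network is replaced by a trivial stand-in in which the parameters are the image itself, so the $\Theta$ update has a closed form instead of requiring back-propagation; the denoiser is a symmetric linear smoothing filter; and the noise perturbation is omitted. The multiplier step shown is the standard dual update for the scaled form of the AL, which is not spelled out in the text above.

```python
import numpy as np

def smoothing_matrix(n):
    """Symmetric circular smoothing filter with taps [1/4, 1/2, 1/4]."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def deepred_admm_toy(y, f, lam, mu, num_rounds):
    """Toy ADMM loop for denoising (H = I) with a stand-in generator."""
    theta = y.copy()          # stand-in for the network output T_Theta(z)
    x = y.copy()
    u = np.zeros_like(y)
    gaps = []
    for _ in range(num_rounds):
        # Theta update, Eq. (7) with H = I; closed form for the stand-in.
        theta = (y + mu * (x - u)) / (1.0 + mu)
        # x update: a single fixed-point step of Eq. (11), i.e. J = 1.
        x = (lam * f(x) + mu * (theta + u)) / (lam + mu)
        # Multiplier update (standard scaled-ADMM dual step, assumed here).
        u = u - (x - theta)
        gaps.append(np.linalg.norm(x - theta))
    return x, theta, gaps

n = 32
W = smoothing_matrix(n)
f = lambda v: W @ v                       # stand-in linear denoiser
lam, mu = 0.5, 0.5                        # arbitrary illustrative values

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * np.arange(n) / n)
y = clean + 0.1 * rng.standard_normal(n)  # noisy measurement

x_hat, theta_hat, gaps = deepred_admm_toy(y, f, lam, mu, num_rounds=200)

# With the stand-in generator the problem is a convex quadratic whose
# minimizer is known in closed form: (I + lam (I - W)) x = y.
x_star = np.linalg.solve(np.eye(n) + lam * (np.eye(n) - W), y)
```

In the actual scheme the closed-form `theta` line would be replaced by a few back-propagation steps on the network parameters, and that step and the denoiser call are the two parts that could run in parallel.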
We now present a series of experiments in which we test the proposed DeepRED scheme. We consider three applications: image denoising and Single Image Super-Resolution (SISR), which were also studied in [13], and image deblurring, following the experiment reported in [17]. Our aim in all these experiments is to show that
DeepRED is better than DIP,
DeepRED is better than RED,
For denoising only: DeepRED is better than the denoiser that RED uses,
DeepRED is better than DIP+TV [15],
DeepRED behaves well numerically, and
DeepRED is on par with supervised learning solutions.
In all the reported tests the same network as in [13] is used with an i.i.d. Gaussian random input tensor whose size is defined in terms of the size of the output image to synthesize. Table 1 summarizes the various parameters used for each application. These include the additional noise perturbation standard-deviation, the learning rate (LR), the employed denoiser and the noise level fed to it, the values of $\lambda$ and $\mu$ (see Algorithm 1), and the number of iterations. All the reported results for DIP are obtained by directly running the released code. We note that there are slight changes between the values we get and the ones reported in [13].

When using DeepRED, we employ the Fixed-Point Strategy as described in Algorithm 1, and apply the denoiser once ($J = 1$) every few iterations. Following [13], in the deblurring and super-resolution experiments, the results are compared on the luminance channel, whereas the denoising results are evaluated with all three channels.
Table 1: Parameters used for each application.
Application | Noise perturbation std | LR | Denoiser | Denoiser noise level | $\lambda$ | $\mu$ | Iterations |
---|---|---|---|---|---|---|---|
Denoising | 0.033 | 0.01 | NLM | 3 | 0.5 | 0.5 | 5000 |
SISR x4 | 0.02 | 0.001 | BM3D | 5 | 0.05 | 0.06 | 2000 |
SISR x8 | 0.02 | 0.001 | BM3D | 5 | 0.05 | 0.06 | 4000 |
Deblur (Uniform) | 0.005 | 0.003 | NLM | 3 | 0.01 | 0.02 | 15000 |
Deblur (Gauss.) | 0.005 | 0.002 | NLM | 2 | 0.015 | 0.02 | 20000 |
In this experiment, which follows the one in [13], the goal is to remove a white additive Gaussian noise with from the given images. We evaluate our results on color images (available at http://www.cs.tut.fi/~foi/GCF-BM3D/). The regularization denoiser we use is Python's scikit-image fast version of Non-Local-Means [21]. The average PSNR (Peak Signal-to-Noise Ratio) of this NLM filter stands on dB. When plugged into RED, the performance improves to dB. Turning to DIP and its boosted version, DIP's best result is obtained using both averaging strategies (sliding window and average over two runs) getting to dB, whereas DeepRED obtains dB – a improvement.
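For reference, PSNR between a clean image and its estimate can be computed as in the minimal sketch below. The peak value of 255 assumes 8-bit intensity images; the paper does not state the exact evaluation code, so the function name and defaults are illustrative only.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(peak^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: an estimate that is off by exactly one gray level everywhere
# has MSE = 1, so its PSNR equals 20 * log10(255).
reference = np.zeros((4, 4, 3))
estimate = np.ones((4, 4, 3))
value = psnr(reference, estimate)
```

For color images, as in this experiment, the mean is taken over all pixels and all three channels.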
Comparing our results to the ones in [15] poses some difficulties, since their performance is given in SNR and not PSNR. Also, we suspect that DIP is poorly functioning in their tests due to the excessive number of iterations used. Disregarding these reservations, we may state that [15] reports of an dB improvement over DIP in image denoising with , whereas our gain stands on dB.
We use this experiment to briefly discuss run-time of the involved algorithms. Both DIP and DeepRED are quite demanding optimization processes. When used with the same number of iterations, DeepRED is clearly slower due to the additional denoising computations. In this case, the average run-time of DIP on the test images is minutes per image, whereas DeepRED requires minutes. All the reported simulations are run on an Intel(R) Core(TM) i7-5930K CPU@3.50GHz with a TITAN Xp GPU.
This experiment follows [13] as well. Given a low-resolution image, the goal is to recover its scaled-up version. We test scaling factors of 4 and 8 and compare our results to both DIP [13] and RED [17] on two datasets. These results are summarized in Tables 2 and 3. As can be seen, DeepRED attains a higher average PSNR than both DIP and RED alone on both datasets and for both scaling factors, although individual images may favor one of the alternatives. Figure 2 presents two visual results taken from these experiments to illustrate the recovery obtained.
Interestingly, DeepRED gets close to the recent supervised SISR methods reported in [22, 23]. Table 4 presents these average results, and as can be seen, DeepRED is on par with [22] for a scale factor of 8. We should note that the DIP approach (with or without RED) has an important advantage over supervised regression methods: whereas the latter aim for a Minimum-Mean-Squared-Error estimation, DIP(+RED) is a Maximum A Posteriori (MAP) estimate by definition, a fact that implies a better expected perceptual quality at the cost of a reduced PSNR. This adds to the appeal of the DeepRED solution developed in this work, and explains the PSNR gap to the results in [23].

Table 2: Set5 Super-Resolution Results (PSNR in dB).
Scale | Algorithm | baby | bird | btrfly | head | woman | average |
---|---|---|---|---|---|---|---|
4:1 | DeepRED | 33.08 | 32.62 | 26.33 | 32.46 | 29.11 | 30.72 |
4:1 | RED [FP-BM3D] | 33.38 | 32.66 | 24.03 | 32.62 | 28.46 | 30.23 |
4:1 | DIP [Our Run] | 31.65 | 31.90 | 26.01 | 31.53 | 28.65 | 29.95 |
8:1 | DeepRED | 28.93 | 27.05 | 20.04 | 30.06 | 24.09 | 26.04 |
8:1 | RED [FP-BM3D] | 28.44 | 26.74 | 18.96 | 30.00 | 23.68 | 25.56 |
8:1 | DIP [Our Run] | 28.36 | 27.01 | 20.10 | 29.85 | 23.89 | 25.84 |
Table 3: Set14 Super-Resolution Results (PSNR in dB). We have used the 12 color images from this data-set.
Scale | Algorithm | bbn | barb. | c.grd | comic | face | flwrs | fr.man | lenna | mnrc | pepper | ppt3 | zebra | avg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4:1 | DeepRED | 22.51 | 25.76 | 26.00 | 22.74 | 32.37 | 27.29 | 29.70 | 31.62 | 30.76 | 31.10 | 24.97 | 26.78 | 27.63 |
4:1 | RED [FP-BM3D] | 22.55 | 25.76 | 25.88 | 22.57 | 32.60 | 26.96 | 29.38 | 31.56 | 29.33 | 31.05 | 24.50 | 26.17 | 27.36 |
4:1 | DIP [Our Run] | 22.21 | 25.53 | 25.82 | 22.46 | 31.48 | 26.55 | 29.38 | 30.86 | 30.27 | 30.52 | 24.75 | 26.04 | 27.16 |
8:1 | DeepRED | 21.33 | 24.02 | 23.98 | 20.05 | 29.95 | 23.51 | 25.38 | 28.12 | 25.34 | 27.91 | 20.69 | 21.03 | 24.28 |
8:1 | RED [FP-BM3D] | 21.29 | 23.94 | 23.51 | 19.84 | 29.90 | 23.19 | 24.62 | 27.69 | 24.39 | 27.45 | 20.23 | 20.61 | 23.89 |
8:1 | DIP [Our Run] | 21.18 | 24.01 | 23.74 | 19.95 | 29.65 | 23.32 | 25.00 | 27.92 | 24.85 | 27.99 | 20.59 | 20.98 | 24.10 |
We use this experiment to have a closer look at the numerical behavior of the proposed algorithm. For the image head from Set5, we present in Figure 3 the loss of DeepRED as given in Equation (4) as a function of the iteration number. As can be seen, there is a consistent descent. However, notice in the zoomed-in version of this graph the small fluctuations around this general descent behavior, which are due to the additional noise injected in each iteration. The same figure also shows the ADMM equality constraint gap (again, see Equation (4)). Clearly, this gap is narrowing, getting very close to the satisfaction of the constraint $x = T_\Theta(z)$. The last graph shows the PSNR of the output image over the iterations. RED's regularization tends to robustify the overall recovery algorithm against overfitting, which stands in contrast to the behavior of DIP alone. Similar qualitative graphs are obtained for various other images and applications, showing the same tendencies, and thus are omitted.
Table 4: Average Super-Resolution Results (PSNR in dB).
Scale | Algorithm | Set5 | Set14 |
---|---|---|---|
4:1 | DIP | 29.95 | 27.16 |
4:1 | DeepRED | 30.72 | 27.63 |
4:1 | Lap | 31.58 | 28.43 |
4:1 | SRR | 32.10 | 28.87 |
8:1 | DIP | 25.84 | 24.10 |
8:1 | DeepRED | 26.04 | 24.28 |
8:1 | Lap | 26.1 | 24.49 |
8:1 | SRR | - | - |
Table 5: Deblurring Results (PSNR in dB).
Blur | Algorithm | Butterfly | Leaves | Parrots | Starfish | Average |
---|---|---|---|---|---|---|
Uniform | DeepRED | 31.22 | 31.12 | 31.55 | 31.00 | 31.22 |
Uniform | DIP | 30.26 | 30.38 | 31.00 | 30.42 | 30.51 |
Uniform | RED FP-TNRD | 30.41 | 30.13 | 31.83 | 30.57 | 30.74 |
Uniform | NCSR Deblur | 29.68 | 29.98 | 31.95 | 30.28 | 30.47 |
Uniform | Blurred | 19.07 | 18.28 | 23.87 | 22.56 | 20.94 |
Gaussian | DeepRED | 31.91 | 31.87 | 32.87 | 32.56 | 32.30 |
Gaussian | DIP | 31.21 | 31.51 | 31.91 | 31.83 | 31.62 |
Gaussian | RED FP-TNRD | 31.66 | 31.93 | 33.33 | 32.49 | 32.35 |
Gaussian | NCSR Deblur | 30.84 | 31.57 | 33.39 | 32.27 | 32.02 |
Gaussian | Blurred | 22.81 | 22.12 | 26.96 | 25.83 | 24.43 |
This experiment follows a similar one in [17], in which we are given a blurred and noisy image with a known degradation operator $H$, and the goal is to restore the original image. We consider two cases: (i) A uniform blur, and (ii) A Gaussian blur of width . In both cases, the blurry image is further contaminated by white additive Gaussian noise with . We compare DeepRED results on color images (available at http://www4.comp.polyu.edu.hk/~cslzhang/NCSR.htm) with DIP [13], RED [17] and NCSR Deblur [28]. The results are shown in Table 5, and in addition, Figures 4, 5 and 6 present three sets of results from this experiment, showing clearly the benefit of the RED regularization effect. Looking at Table 5, while DeepRED performs very well and competitively, we draw the reader's attention to the Gaussian blur results, where RED performs slightly better on average. Note that whereas DeepRED employs a weak denoiser (NLM), RED is applied here with one of the strongest denoisers available, the TNRD [29].
DIP is a deep-learning-based unsupervised restoration algorithm of great appeal. This work offers a way to further boost its performance. Our solution relies on RED - the concept of regularizing inverse problems using an existing denoising algorithm. As demonstrated in this paper, DeepRED is a very effective machine for handling various inverse problems.
Further work is required in order to better understand and improve this scheme: (i) Both DIP and DeepRED should be sped-up in order to make them more practical and appealing. This may be within reach with alternative optimization strategies; (ii) Incorporating better denoisers within the RED scheme (perhaps deep-learning based ones) may lead to further boost in performance; (iii) A more thorough study of the regularization effect that DIP introduces may help in devising a complementary explicit regularization to add via RED, thereby getting a stronger effect and better performance; and (iv) Could we claim that DIP leads to a MAP estimation? Could the same be claimed about DeepRED? A more in-depth study of this riddle is central to the understanding of both these methods.
In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 3204–3213, 2018.
Deep multi-scale convolutional neural network for dynamic scene deblurring. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pages 3883–3891, 2017.