Image reconstruction is one of the most widely studied problems in computational imaging. Since the problem is often ill-posed, reconstruction is traditionally regularized by constraining the solutions to be consistent with prior knowledge about the image. Traditional imaging priors include nonnegativity, transform-domain sparsity, and self-similarity [1, 2, 3, 4]. Recently, however, attention in the field has been shifting towards new imaging formulations based on deep learning [5].
The most common deep-learning approach is based on end-to-end training of a convolutional neural network (CNN) to reproduce the desired image from its noisy measurements [6, 7, 8, 9, 10]. A popular alternative trains a CNN as an image denoiser and uses it within an iterative reconstruction algorithm [11, 12, 13, 14]. Recently, however, it was also shown that a CNN can by itself regularize image reconstruction without data-driven training [15]. This deep image prior (DIP) framework naturally regularizes reconstruction by optimizing the weights of a CNN so that it synthesizes the measurements from a given random input vector. The intuition behind DIP is that natural images can be well represented by CNNs, which is not the case for random noise and certain other image degradations. DIP was shown to achieve remarkable performance on a number of image reconstruction tasks [15, 16].
In this paper, we propose to further improve DIP by combining the implicit CNN regularization with an explicit TV penalty. The idea of our DIP-TV approach is simple: by including an additional TV term in the objective function, we restrict the solutions synthesized by the CNN to those that are piecewise smooth. We experimentally show that our DIP-TV method outperforms the traditional formulations of DIP and TV, and performs on par with other state-of-the-art image restoration methods such as BM3D [17] and IRCNN [12].
Consider restoration as a linear inverse problem
y = Hx + e,  (1)
where the goal is to reconstruct an unknown image x ∈ ℝ^N from the measurements y ∈ ℝ^M. Here, H ∈ ℝ^{M×N} is a degradation matrix and e ∈ ℝ^M corresponds to the measurement noise, which is assumed to be additive white Gaussian (AWGN) of variance σ².
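The forward model above can be sketched numerically; the following minimal numpy toy uses hypothetical sizes (N = 64 unknowns, M = 48 measurements) purely to fix the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 64, 48                                  # hypothetical toy sizes
x = rng.random(n)                              # unknown (vectorized) image
H = rng.standard_normal((m, n)) / np.sqrt(n)   # degradation matrix
sigma = 0.05                                   # AWGN standard deviation
e = sigma * rng.standard_normal(m)             # measurement noise

y = H @ x + e                                  # measurements: y = Hx + e
```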
As practical inverse problems are often ill-posed, it is common to regularize the task by constraining the solution according to some prior knowledge. In practice, the reconstruction often relies on the regularized least-squares formulation
x̂ = argmin_x { (1/2)‖y − Hx‖² + τ ρ(x) },  (2)
where the data-fidelity term ‖y − Hx‖² ensures consistency with the measurements, and the regularizer ρ constrains the solution to the desired image class. The parameter τ > 0 controls the strength of the regularization.
Total variation (TV) is one of the most widely used image priors; it promotes sparsity in image gradients [18]. It has been shown to be effective in a number of applications [19, 20, 21]. The ℓ1-based anisotropic TV is given by
TV(x) = ‖D₁x‖₁ + ‖D₂x‖₁,  (3)
where D₁ and D₂ denote the finite-difference operators along the first and second dimensions of a two-dimensional (2D) image, with appropriate boundary conditions.
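As a concrete illustration, here is a minimal numpy sketch of the ℓ1-based anisotropic TV using forward finite differences (one of several common boundary conventions, under which the last row/column differences are omitted):

```python
import numpy as np

def anisotropic_tv(img):
    """Sum of absolute forward finite differences along both image dimensions."""
    dy = np.diff(img, axis=0)  # differences along the first dimension
    dx = np.diff(img, axis=1)  # differences along the second dimension
    return np.abs(dy).sum() + np.abs(dx).sum()

flat = np.ones((8, 8))
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0
print(anisotropic_tv(flat))  # 0.0: constant images have zero TV
print(anisotropic_tv(edge))  # 8.0: one unit-height edge crossed by 8 rows
```

Piecewise-constant images thus incur a small penalty, while noisy ones do not, which is exactly the behavior the prior is meant to promote.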
The most common strategy trains a CNN end-to-end on a representative set of measurement–image pairs {(yᵢ, xᵢ)},
θ̂ = argmin_θ Σᵢ L(f_θ(yᵢ), xᵢ),  (4)
where x̂ = f_θ̂(y) is the restored image, f_θ represents the CNN parametrized by θ, and L denotes the loss function. In practice, L is commonly set to the Euclidean loss, and the parameters are optimized with stochastic gradient methods such as Adam [25].
Recently, Ulyanov et al. [15] proposed to use CNNs in an alternative way. They discovered that the architecture of deep CNN models is well suited for representing natural images, but not random noise: given a random input vector, a CNN can reproduce a clean image without supervised training on a large dataset. In the context of image restoration, the associated optimization for DIP can be formulated as
θ* = argmin_θ ‖H f_θ(z) − y‖²,  x̂ = f_{θ*}(z),  (5)
where z denotes the random input vector. The CNN generator is initialized with random weights, and these weights are iteratively optimized so that the output of the network is as close to the target measurements as possible.
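To make the structure of the DIP optimization concrete, here is a minimal numpy sketch in which a toy linear generator f_θ(z) = θz stands in for the CNN (the actual method uses a deep network; all sizes, the learning rate, and the iteration count here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 32, 24, 16                           # toy image/measurement/input sizes

x_true = rng.random(n)                         # ground-truth image (toy)
H = rng.standard_normal((m, n)) / np.sqrt(n)   # degradation matrix
y = H @ x_true + 0.01 * rng.standard_normal(m)

z = rng.standard_normal(k)                     # fixed random input vector
theta = 0.01 * rng.standard_normal((n, k))     # generator weights, random init

lr = 0.01
for _ in range(4000):
    r = H @ (theta @ z) - y                    # residual of ||H f_theta(z) - y||^2
    theta -= lr * np.outer(H.T @ r, z)         # gradient step w.r.t. theta

x_hat = theta @ z                              # restored image f_{theta*}(z)
```

With a real CNN the same loop is run via backpropagation; the implicit regularization comes from the network architecture, which the linear toy of course does not capture.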
3 Proposed Method
The goal of DIP-TV is to use TV regularization to improve the basic DIP approach. Consider the regularized formulation in (2) and the objective function of DIP in (5). The term ‖H f_θ(z) − y‖² in (5) corresponds to the data-fidelity term in (2), with the unknown image replaced by the network output. Thus, we can replace (5) with the optimization problem
θ* = argmin_θ { ‖H f_θ(z) − y‖² + τ TV(f_θ(z)) },  x̂ = f_{θ*}(z).  (6)
The optimization in (6) is similar to the training of a CNN, and one can rely on any standard optimization algorithm.
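A sketch of the composite objective in (6), again with a toy linear generator standing in for the CNN (τ, the shapes, and all names are illustrative):

```python
import numpy as np

def dip_tv_objective(theta, z, H, y, tau, shape):
    """Data fidelity plus tau times anisotropic TV of the generator output."""
    img = (theta @ z).reshape(shape)           # generator output as a 2D image
    fidelity = np.sum((H @ img.ravel() - y) ** 2)
    tv = np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()
    return fidelity + tau * tv
```

Setting τ = 0 recovers the original DIP objective (5); when the objective is implemented in an autodiff framework, the gradient of the TV term is handled automatically.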
Figure 3 illustrates the CNN architecture used in this paper, adapted from [15]. In particular, the popular U-net architecture [26] is modified so that the skip connections contain a convolutional layer. The encoder–decoder uses a down-sampling and up-sampling (contracting–expanding) structure, which makes the effective receptive field of the network grow as the input goes deeper into the network [27]. In addition, the skip connections enable the later layers to reconstruct feature maps with both local details and global texture. The input is initialized with uniform noise and can be further optimized. The proposed framework can handle both grayscale and color images; for color images, the anisotropic TV jointly regularizes all three channels.
4 Experiments
We now present experimental results on image denoising and deblurring. We consider 14 grayscale images and 8 standard color images from Set12, Set14, and BSD68 as our testing images. The grayscale images are shown in Figure 1, while the color images are Monarch, Parrots, House, Lena, Peppers, Baby, and Jet.
[Table 1 fragment: rows for input SNR of 5, 10, 15, 20, and 25 dB]
4.1 Image Denoising
In this subsection, we analyze the performance of the DIP-TV method on image denoising. The CNN architecture in Figure 3 is used for both color and grayscale images. All algorithmic hyperparameters were optimized in each experiment for the best signal-to-noise ratio (SNR) with respect to the ground-truth test image. Both DIP-TV and DIP were run for 5000 optimization steps. We use the average SNR to denote SNR values averaged over the associated set of test images.
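The SNR used throughout is the standard ratio of signal power to error power in decibels; a small sketch:

```python
import numpy as np

def snr_db(x_true, x_hat):
    """SNR (dB) of a reconstruction x_hat against the ground truth x_true."""
    err = np.sum((x_true - x_hat) ** 2)
    return 10.0 * np.log10(np.sum(x_true ** 2) / err)

x = np.ones(100)
print(snr_db(x, x + 0.1))  # ~20 dB: the error power is 1% of the signal power
```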
We first present the results on grayscale images, where we compared DIP-TV with EPLL [29], BM3D [17], TV [30], and DIP [15]. In order to directly evaluate the range of noise levels over which DIP-TV performs better, the input-SNR to output-SNR relationships are presented in Table 1. The grayscale images were corrupted by AWGN corresponding to input SNRs of 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In particular, DIP-TV outperforms the original DIP by around 0.5 dB over a wide range of noise levels, from 5 dB to 20 dB. Note that the proposed method also bridges the gap between DIP and the state-of-the-art methods at high noise levels. Figure 4 illustrates visual comparisons for the grayscale images Tower and Jet under two different noise levels. DIP-TV significantly improves the denoising performance of DIP in terms of both visual quality and SNR: the noise is effectively filtered out while image details are preserved thanks to the TV regularization. For instance, DIP-TV improves the SNR on Tower by 1.06 dB over DIP, and outperforms BM3D by 0.35 dB. Visually, the highlighted door in Tower is clearly restored, while the other methods introduce serious distortions.
In color image denoising, we compared our method with CBM3D [17] and NLM [31], as well as with DIP itself. We considered AWGN with noise levels σ from 25 to 75. Figure 2 compares the SNR performance of CBM3D, DIP, and DIP-TV on the image Monarch. Table 2 summarizes the average SNR of the different methods. Overall, DIP-TV exceeds DIP by at least 0.2 dB on the testing images. Moreover, DIP-TV outperforms CBM3D as the noise level increases. Considering that the whole DIP-TV and DIP procedure is image-agnostic and no prior information is learned from other images, it is notable that DIP-TV achieves performance comparable to the state of the art at high noise levels.
4.2 Image Deblurring
In image deblurring, one is given a blurry image synthesized by first applying a blur kernel and then adding AWGN of noise level σ; the goal is to restore the image from the degraded one. We tested DIP and DIP-TV with the network architecture illustrated in Figure 3.
[Table 2 fragment: column headers for the compared methods at σ = 25, 35, 45, 55, 65, and 75]
Both DIP and DIP-TV were run for 5500 optimization steps. Taking advantage of recent progress in CNN frameworks and the merits of GPU computation, we implemented the blur as a convolution. As a baseline, we compared our method with IRCNN [12] and DIP itself on the same set of images used for denoising. Two blur kernels were applied: a Gaussian kernel with standard deviation 1.6 and a realistic kernel defined in [28]. AWGN of different levels was added in each experiment.
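The blur forward model can be sketched directly in numpy; the kernel builder below follows the stated Gaussian with standard deviation 1.6, while the direct "same"-size convolution is a toy stand-in for the GPU convolution used in the experiments (the kernel size and boundary handling are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(size=9, std=1.6):
    """Normalized 2D Gaussian blur kernel."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * std ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img, kernel):
    """Direct 'same'-size 2D convolution with edge-replicated boundaries."""
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    flipped = kernel[::-1, ::-1]               # flip for true convolution
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * flipped)
    return out
```

A degraded test image is then blur(img, gaussian_kernel()) plus AWGN of the chosen level.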
Figure 5 shows the visual results for Peppers obtained by the different methods. All methods can effectively remove the blur and noise from the image. In particular, our method further enhances the piecewise smoothness and mitigates the noise of the image, and thus increases the peak signal-to-noise ratio (PSNR) by over 0.45 dB compared to DIP. Also note that, with the aid of TV regularization, DIP even outperforms IRCNN by 0.15 dB on Peppers. Table 3 reports the average PSNR comparison with IRCNN and DIP on color and grayscale images, respectively.
[Table 3 fragment: rows for Gaussian blur with standard deviation 1.6 and for Kernel 1]
In general, TV regularization improves DIP by at least 0.54 dB in terms of PSNR and makes the DIP framework more competitive with IRCNN. For example, DIP-TV is only 0.01 dB below IRCNN in average PSNR on color images with the Gaussian blur kernel.
This work has presented a simple method, namely DIP-TV, to improve the deep image prior framework, leading to promising performance, equivalent to and sometimes surpassing recently published alternatives such as BM3D [17] and IRCNN [12]. The proposed method builds on the recent idea that a CNN model can by itself act as a prior on images, and combines it with an explicit sparsity-promoting prior via the ℓ1-norm penalty on the image gradients. The results on image denoising and deblurring demonstrate that TV regularization can further improve DIP and provide high-quality results.
-  L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, no. 1–4, pp. 259–268, November 1992.
-  M. A. T. Figueiredo and R. D. Nowak, “Wavelet-based image estimation: An empirical bayes approach using Jeffreys’ noninformative prior,” IEEE Trans. Image Process., vol. 10, no. 9, pp. 1322–1331, September 2001.
-  M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, December 2006.
-  A. Danielyan, V. Katkovnik, and K. Egiazarian, “BM3D frames and variational image deblurring,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1715–1728, April 2012.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 28, 2015.
-  A. Mousavi, A. B. Patel, and R. G. Baraniuk, “A deep learning approach to structured signal recovery,” in Proc. Allerton Conf. Communication, Control, and Computing, Allerton Park, IL, USA, September 30-October 2, 2015, pp. 1336–1343.
-  K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, Sept. 2017.
-  Y. S. Han, J. Yoo, and J. C. Ye, “Deep learning with domain adaptation for accelerated projection reconstruction MR,” 2017, arXiv:1703.01135 [cs.CV].
-  Y. Sun, Z. Xia, and U. S. Kamilov, “Efficient and accurate inversion of multiple scattering with deep learning,” Opt. Express, vol. 26, no. 11, pp. 14678–14688, May 2018.
-  D. Lee, J. Yoo, S. Tak, and J. C. Ye, “Deep residual learning for accelerated MRI using magnitude and phase networks,” IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 1985–1995, Sept. 2018.
-  T. Meinhardt, M. Moeller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using denoising networks for regularizing inverse imaging problems,” in Proc. IEEE Int. Conf. Comp. Vis. (ICCV), Venice, Italy, October 22-29, 2017, pp. 1799–1808.
-  K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in , 2017.
-  Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” SIAM J. Imaging Sci., vol. 10, no. 4, pp. 1804–1844, 2017.
-  Y. Sun, B. Wohlberg, and U. S. Kamilov, “An online plug-and-play algorithm for regularized image reconstruction,” 2018, arXiv:1809.04693 [cs.CV].
-  D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 18-22, 2018, pp. 9446–9454.
-  D. Van Veen, A. Jalal, E. Price, S. Vishwanath, and A. G. Dimakis, “Compressed sensing with deep image prior and learned regularization,” 2018, arXiv:1806.06438 [stat.ML].
-  K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, August 2007.
-  L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: nonlinear phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
-  M. Persson, D. Bone, and H. Elmqvist, “Total variation norm for three-dimensional iterative reconstruction in limited view angle tomography,” Phys. Med. Biol., vol. 46, no. 3, pp. 853–866, 2001.
-  M. Lustig, D. L. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magn. Reson. Med., vol. 58, no. 6, pp. 1182–1195, December 2007.
-  U. S. Kamilov, I. N. Papadopoulos, M. H. Shoreh, A. Goy, C. Vonesch, M. Unser, and D. Psaltis, “Optical tomographic image reconstruction based on beam propagation and sparse regularization,” IEEE Transactions on Computational Imaging, vol. 2, no. 1, pp. 59–70, 2016.
-  M. Egmont-Petersen, D. de Ridder, and H. Handels, “Image processing with neural networks—a review,” Pattern recognition, vol. 35, no. 10, pp. 2279–2301, 2002.
-  J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Advances in neural information processing systems, 2012, pp. 341–349.
-  K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, July 2017.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
-  O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
-  K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4509–4522, 2017.
-  A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
-  D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 479–486.
-  A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2419–2434, 2009.
-  A. Buades, B. Coll, and J. Morel, “A non-local algorithm for image denoising,” in Computer Vision and Pattern Recognition, (CVPR), 2005. IEEE Computer Society Conference on. IEEE, 2005, vol. 2, pp. 60–65.