Model-blind Video Denoising Via Frame-to-frame Training

by   Thibaud Ehret, et al.

Modeling the processing chain that has produced a video is a difficult reverse engineering task, even when the camera is available. This makes model-based video processing still more complex. In this paper we propose a fully blind video denoising method with two versions, off-line and on-line. This is achieved by fine-tuning a pre-trained AWGN denoising network to the video with a novel frame-to-frame training strategy. Our denoiser can be used without knowledge of the origin of the video or burst, or of the post-processing steps applied after the camera sensor. The on-line process only requires a couple of frames before achieving visually pleasing results for a wide range of perturbations. It nonetheless reaches state-of-the-art performance for standard Gaussian noise, and can be used off-line with still better performance.




1 Introduction

Denoising is a fundamental image and video processing problem. While the performance of denoising methods and imaging sensors has steadily improved over decades of research, new challenges have also appeared. High-end cameras still acquire noisy images in low lighting conditions. High-speed video cameras use short exposure times, reducing the SNR of the captured frames. Cheaper, lower quality sensors are used extensively, for example in mobile phones or surveillance cameras, and require denoising even with good scene illumination.

A plethora of approaches have been proposed for image and video denoising: PDE and variational methods [36, 7], bilateral filters [41], domain transform methods [31, 33], and non-local patch-based methods [3]. In the last decade, most research focused on modeling image patches [51, 45, 15] or groups of similar patches [13, 27, 22, 17, 5]. Recently the focus has shifted towards neural networks.

The first neural network with results competitive with patch-based methods was introduced in [5], and consisted of a fully connected network trained to denoise image patches. More recently, [47] proposed DnCNN, a deep CNN with 17 to 20 convolutional layers with $3 \times 3$ filters, and reported a significant improvement over the state of the art. The authors also trained a blind denoising network that can denoise an image with an unknown noise level $\sigma$, and a multi-task network that can blindly handle three types of noise. A lighter version of DnCNN, FFDNet, was proposed in [49]; it allows a spatially variant noise variance by taking the noise variance map as an additional input. The architectures of DnCNN and FFDNet keep the image size throughout the network. Other networks have been proposed [30, 37, 8] that use pooling and up-convolutional layers in a U-shaped architecture [35]. Other works proposed neural networks with an architecture obtained by unrolling optimization algorithms such as those used for MAP inference with MRF probabilistic models [2, 38, 11, 43]. For textures formed by repetitive patterns, non-local patch-based methods still perform better than “local” CNNs. To remedy this, some attempts have been made to include the non-local patch similarity in a CNN framework [34, 11, 24, 44, 12].

The most widely adopted assumption in the literature is that of additive white Gaussian noise (AWGN). This is justified by the fact that the noise generated by the photon count process at the imaging sensor can be modeled as Poisson noise, which in turn can be approximated by AWGN after a variance stabilizing transform (VST) [1, 29, 28]. However, in many practical applications the data available is not the raw data straight from the sensor. The camera output is the result of a processing pipeline, which can include quantization, demosaicking, gamma correction, compression, etc. The noise at the end of the pipeline is spatially correlated and signal dependent, and it is difficult to model. Furthermore, the details of the processes undergone by an image or video are usually unknown. To make things even more difficult, a large number of images and videos are generated by mobile phone applications which apply their own processing to the data (for example compression, or filters and effects selected by the user). The specifics of this processing are unknown, and might change with different releases.
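The variance-stabilization argument can be checked numerically. The sketch below assumes the classical Anscombe transform $x \mapsto 2\sqrt{x + 3/8}$ of [1] as the VST and uses hypothetical photon rates of 10 and 40; the Poisson sampler (Knuth's multiplication method) and all function names are illustrative choices, not part of the paper.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def anscombe(x):
    """Anscombe variance-stabilizing transform."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)
raw_variance = {}
stabilized_variance = {}
for lam in (10.0, 40.0):  # hypothetical photon rates
    counts = [sample_poisson(lam, rng) for _ in range(20000)]
    raw_variance[lam] = variance(counts)  # grows with the signal
    stabilized_variance[lam] = variance([anscombe(c) for c in counts])  # close to 1
```

Before the transform the variance follows the signal level; after it, the variance is roughly constant, which is what allows an AWGN denoiser to be applied to raw sensor data.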

The literature addressing this case is much more limited. The works [23, 16] address the denoising of noisy compressed images. RF3D [26] handles correlated noise in infrared videos. Data-driven approaches provide an interesting alternative when modelling is challenging. CNNs have been applied successfully to denoise images with non-Gaussian noise [47, 9, 18]. In applications in which the noise type is unknown, one could use model-blind networks such as DnCNN-3 [48], trained to denoise several types of noise, or the blind denoiser of [18]. These however have two important limitations. First, the performance of such model-blind networks drops with respect to model-specific networks [48]. Second, training the network requires a dataset of images corrupted with each type of noise that we wish to remove (or the ability to generate it synthetically [18]). Generating ground truth data for real photographs is not straightforward [32, 9]. Furthermore, on many occasions we do not have access to the camera, and a single image or a video is all that we have.

In this work we show that, for certain kinds of noise, in the context of video denoising one video is enough: a network can be trained from a single noisy video by considering the video itself as a dataset. Our approach is inspired by two works: the one-shot object segmentation method [6] and the noise-to-noise training proposed in the context of denoising by [25].

The aim of one-shot learning is to train a classifier network to classify a new class with only a very limited amount of labeled examples. Recently Caelles et al. [6] suggested a one-shot framework for object segmentation in video, where an object is manually segmented on the first frame and the objective is to segment it in the rest of the frames. Their main contribution is the use of a pre-trained classification network, which is fine-tuned to a manual segmentation of the first frame. This fine-tuned network is then able to segment the object in the rest of the frames. This generalizes the one-shot principle from classification to other types of problems. Borrowing the concept from [6], our work can be interpreted as a one-shot blind video denoising method: a network can denoise an unseen noise type by fine-tuning it to a single video. In our case however, we do not require “labels” (i.e. the ground truth images without noise). Instead, we benefit from the noise-to-noise training proposed by [25]: a denoising network can be trained by penalizing the loss between the output predicted from a noisy image and a second noisy version of the same image, with an independent realization of the noise. We benefit from the temporal redundancy of videos and use the noise-to-noise training between adjacent frames to fine-tune a pre-trained denoising network. That is, the network is trained by minimizing the error between the predicted frame and the past (or future) frame. The noise used to pre-train the network can be very different from the type of noise in the video.

In Section 2 we present the tools needed to derive our refined model, namely one of the state-of-the-art denoising networks, DnCNN [48], and a training principle for denoising called noise2noise [25]. We present our truly blind denoising principle in Section 3. We compare the quality of our blind denoiser to the state of the art in Section 4. Finally we conclude and open new perspectives for this type of denoising in Section 5.

2 Preliminaries

The proposed model-blind denoiser builds upon DnCNN and the noise-to-noise training. In this section we provide a brief review of these works, plus some other related work.

2.1 DnCNN

DnCNN [48] was the first neural network to report a significant improvement over patch-based methods such as BM3D [13] and WNNM [17]. It has a simple architecture inspired by the VGG network [39], consisting of 17 convolutional layers. The first layer consists of 64 $3 \times 3$ convolutions followed by ReLU activations and outputs 64 feature maps. The next 15 layers also compute 64 $3 \times 3$ convolutions, followed by batch normalization [19] and ReLU. The output layer is simply a $3 \times 3$ convolutional layer.
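To make the layer layout concrete, the following sketch lists the 17 layers as plain tuples and derives two quantities from them: the receptive field and the number of convolution weights. It assumes single-channel (grayscale) input and $3 \times 3$ kernels in every layer; the function names are hypothetical, and biases and batch-normalization parameters are deliberately left out of the count.

```python
def dncnn_layers(depth=17, features=64, channels=1, kernel=3):
    """Return (name, in_channels, out_channels, kernel_size) for each layer."""
    layers = [("conv+relu", channels, features)]
    layers += [("conv+bn+relu", features, features)] * (depth - 2)
    layers.append(("conv", features, channels))
    return [(name, cin, cout, kernel) for name, cin, cout in layers]

def receptive_field(layers):
    """Each stride-1 k x k convolution widens the receptive field by k - 1."""
    return 1 + sum(k - 1 for _, _, _, k in layers)

def conv_weight_count(layers):
    """Number of convolution weights (biases and batch-norm terms excluded)."""
    return sum(k * k * cin * cout for _, cin, cout, k in layers)

layers = dncnn_layers()
field = receptive_field(layers)
weights = conv_weight_count(layers)
```

Under these assumptions each output pixel depends on a 35 x 35 input neighbourhood, which is why the text later calls such networks “local”.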

To improve training, in addition to the batch normalization layers, DnCNN uses residual learning, which means that the network is trained to predict the noise in the input image instead of the clean image. The intuition behind this is that if the mapping from the noisy input $f$ to the clean target $u$ is close to the identity function, then it is easier for the network to learn the residual mapping, $f \mapsto f - u$.

DnCNN provides state-of-the-art image denoising for Gaussian noise with a rather simple architecture. For this reason we will use it for all our experiments.

2.2 Noise-to-noise training

The usual approach for training a neural network for denoising (or other image restoration problems) is to synthesize a degraded image $f_i$ from a clean one $u_i$ according to a noise model. Training is then achieved by minimizing the empirical risk, which penalizes the loss between the network prediction $F_\theta(f_i)$ and the clean target $u_i$. This method cannot be applied in many practical cases where the noise model is not known. In these settings, noise cannot be synthetically added to a clean image. One can generate noisy data by acquiring it (for example by taking pictures with a camera), but the corresponding clean targets are unknown, or are hard to acquire [10, 32].

Lehtinen et al. [25] recently pointed out that for certain types of noise it is possible to train a denoising network from pairs of noisy images corresponding to the same clean underlying data and independent noise realizations, thus eliminating the need for clean data. This makes it possible to learn networks for noise that cannot be easily modeled (an appropriate choice of the loss is still necessary, though, so that the network converges to a good denoiser).

Assume that the pairs $(f, u)$ of noisy and clean images are distributed according to $p(f, u) = p(u \mid f)\,p(f)$. For a dataset of infinite size, the empirical risk of an estimator $F$ converges to the Bayesian risk, i.e. the expected loss: $\mathcal{R}(F) = \mathbb{E}_{f,u}\{\ell(F(f), u)\}$. The optimal estimator $F^*$ depends on the choice of the loss. From Bayesian estimation theory [20] we know that:

$$\ell = L_2 \;\Rightarrow\; F^*(f) = \mathbb{E}\{u \mid f\}$$
$$\ell = L_1 \;\Rightarrow\; F^*(f) = \operatorname{median}\{u \mid f\}$$
$$\ell = L_0 \;\Rightarrow\; F^*(f) \approx \operatorname{mode}\{u \mid f\}$$

(The median and mode are taken element-wise. For a continuous random variable the $L_0$-loss is defined as a limit. See [20] and [25].)

Here $\mathbb{E}\{u \mid f\}$ denotes the expectation of the posterior distribution $p(u \mid f)$ given the noisy observation $f$. During training, the network learns to approximate the mapping $f \mapsto F^*(f)$.

The key observation leading to noise-to-noise training is that the same optimal estimators apply when the loss is computed between $F(f)$ and $g$, a second noisy version of $u$. In this case we obtain the mean, median and mode of the posterior $p(g \mid f)$. Then, for example, if the noise is such that $\mathbb{E}\{g \mid f\} = \mathbb{E}\{u \mid f\}$, the network can be trained by minimizing the MSE loss between $F(f)$ and a second noisy observation $g$. If the median (resp. the mode) is preserved by the noise, then the $L_1$ (resp. the $L_0$) loss can be used.
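A toy experiment illustrates why noisy targets suffice. The sketch below replaces the network by a single constant estimate and searches a grid for the value that minimizes the squared and absolute losses against noisy observations only. The clean value 0.6, the noise level 0.1 and the grid are hypothetical, and the zero-mean symmetric Gaussian noise is an assumption chosen so that both the mean and the median are preserved.

```python
import random
import statistics

rng = random.Random(0)
clean_value = 0.6  # hypothetical clean pixel value, never shown to the estimator
noisy_targets = [clean_value + rng.gauss(0.0, 0.1) for _ in range(5000)]

def best_constant(loss, targets, candidates):
    """Return the candidate with the smallest total loss against noisy targets."""
    return min(candidates, key=lambda c: sum(loss(c, g) for g in targets))

candidates = [0.5 + 0.005 * i for i in range(41)]  # grid over [0.5, 0.7]

l2_estimate = best_constant(lambda c, g: (c - g) ** 2, noisy_targets, candidates)
l1_estimate = best_constant(lambda c, g: abs(c - g), noisy_targets, candidates)

target_mean = statistics.mean(noisy_targets)      # what the squared loss recovers
target_median = statistics.median(noisy_targets)  # what the absolute loss recovers
```

Both estimates land next to the clean value although no clean sample was used, because this noise preserves the mean and the median. For noise that shifts the mean but not the median, only the absolute-loss estimate would remain unbiased.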

3 Model-blind video denoising

In this section we show how one can use a pre-trained denoising network learned for an arbitrary noise and fine-tune it to other target noise types using a single video sequence, attaining the same performance as a network trained specifically for the target noise. This fine tuning can be done off-line (using the whole video as a dataset) or on-line, i.e. frame-by-frame, depending on the application and the computational resources at hand.

Our approach is inspired by the one-shot video object segmentation approach of [6], where a classification network is fine-tuned using the manually segmented first frame, and then applied to the other frames. As opposed to the segmentation problem, we do not assume that we have a ground truth (clean frames). Instead, we adapt the noise-to-noise training to a single video.

We need pairs of independent noisy observations of the same underlying clean image. For that we take advantage of the temporal redundancy in videos: we consider consecutive frames as observations of the same underlying clean signal transformed by the motion in the scene. To account for the motion we need to estimate it and warp one frame to the other. We estimate the motion using an optical flow. We use the TV-L1 optical flow [46] with an implementation available in [40]. This method is reasonably fast and is quite robust to noise when the flow is computed at a coarser scale.

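Let us denote by $v_t$ the optical flow from frame $f_t$ to frame $f_{t-1}$. The warped $f_{t-1}$ is then $f^w_{t-1}(x) = f_{t-1}(x + v_t(x))$ (we use bicubic interpolation). Similarly, we define the warped clean frame $u^w_{t-1}$. We assume

  1. that the warped clean frame $u^w_{t-1}$ matches $u_t$, i.e. $u^w_{t-1}(x) \approx u_t(x)$, and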

  2. that the noise of consecutive frames is independent.
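The warping step can be sketched in a few lines. This version samples the previous frame at $x + v(x)$ for every pixel $x$, with two simplifications that are assumptions of the sketch rather than the paper's choices: bilinear interpolation is swapped in for the bicubic interpolation mentioned above, and positions outside the image are clamped to the border. Frames are nested lists and the flow stores one hypothetical `(dx, dy)` pair per pixel.

```python
import math

def warp_bilinear(frame, flow):
    """Sample `frame` at (x + dx, y + dy) for every pixel, with border clamping."""
    h, w = len(frame), len(frame[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
            bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
            warped[y][x] = (1 - ay) * top + ay * bottom
    return warped

previous = [[0.0, 1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0, 7.0]]
shift_one = [[(1.0, 0.0)] * 4 for _ in range(2)]   # every pixel looks one pixel right
shift_half = [[(0.5, 0.0)] * 4 for _ in range(2)]  # sub-pixel displacement

warped_one = warp_bilinear(previous, shift_one)
warped_half = warp_bilinear(previous, shift_half)
```

The clamped last column of `warped_one` is exactly the kind of false correspondence that an occlusion mask has to exclude from the loss.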

Occluded pixels in the backward flow from frame $f_t$ to the previous frame $f_{t-1}$ do not have a correspondence in frame $f_{t-1}$. Nevertheless, the optical flow assigns them a value. We use a simple occlusion detector to eliminate these false correspondences from our loss. A simple way to detect occlusions is to determine regions where the divergence of the optical flow is large [4]. We therefore define a binary occlusion mask as

$$\kappa_t(x) = \begin{cases} 0 & \text{if } |\operatorname{div} v_t(x)| > \tau, \\ 1 & \text{otherwise,} \end{cases} \qquad (4)$$

where $v_t$ is the optical flow from $f_t$ to $f_{t-1}$ and $\tau$ is a threshold. Pixels with an optical flow that points out of the image domain are considered occluded. In practice, we compute a more conservative occlusion mask by dilating the result of Eq. (4).
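A minimal sketch of such an occlusion detector follows. It marks a pixel as occluded (mask value 0) when the absolute divergence of the flow exceeds a threshold or when the flow points outside the image, then grows the occluded region by one pixel. The threshold of 0.5, the central-difference divergence and the 3 x 3 dilation are assumptions of this sketch, not values taken from the paper.

```python
def occlusion_mask(flow, threshold=0.5):
    """Return 1 for valid pixels and 0 for pixels flagged as occluded."""
    h, w = len(flow), len(flow[0])
    mask = [[1] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            # Divergence by finite differences (one-sided at the borders).
            ddx = (flow[y][xr][0] - flow[y][xl][0]) / max(xr - xl, 1)
            ddy = (flow[yd][x][1] - flow[yu][x][1]) / max(yd - yu, 1)
            outside = not (0 <= x + dx <= w - 1 and 0 <= y + dy <= h - 1)
            if abs(ddx + ddy) > threshold or outside:
                mask[y][x] = 0
    return mask

def dilate_occlusions(mask):
    """Grow the occluded (zero) region by one pixel in every direction."""
    h, w = len(mask), len(mask[0])
    dilated = [[1] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if mask[ny][nx] == 0:
                        dilated[y][x] = 0
    return dilated

# Left half static, right half moving two pixels to the left: the flow
# converges along the boundary between the two halves.
flow = [[(0.0, 0.0) if x < 3 else (-2.0, 0.0) for x in range(6)] for _ in range(3)]
mask = occlusion_mask(flow)
conservative_mask = dilate_occlusions(mask)
leaving_mask = occlusion_mask([[(10.0, 0.0)] * 6 for _ in range(3)])
```

In this example only the columns around the motion boundary are flagged, and the dilation widens that band by one pixel on each side.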

We then compute the loss masking out occluded pixels. For example, for the $L_1$ loss we have:

$$\ell_1^{\kappa}(f, g) = \sum_x \kappa(x)\,|f(x) - g(x)|,$$

where $\kappa$ is the binary occlusion mask (equal to 0 at occluded pixels) and $f$, $g$ are the two images being compared.

Similarly one can define masked versions of other losses. For all the experiments shown we used the masked $L_1$ loss since it has better training properties than the $L_2$ loss (as has been demonstrated in [50]). In the noise-to-noise setting, the choice of the loss depends on the properties of the noise [25]. All the noise types considered in this work preserve the median of the posterior distribution, which justifies the use of an $L_1$ loss.
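The masked loss is short enough to write out directly. The sketch below sums absolute differences over the pixels where the mask equals 1; the unnormalized sum and the nested-list image format are assumptions made for illustration.

```python
def masked_l1_loss(prediction, target, mask):
    """Sum of |prediction - target| over the pixels where mask is 1."""
    total = 0.0
    for pred_row, target_row, mask_row in zip(prediction, target, mask):
        for p, t, k in zip(pred_row, target_row, mask_row):
            total += k * abs(p - t)
    return total

prediction = [[1.0, 2.0], [3.0, 4.0]]
target = [[0.0, 2.0], [3.0, 0.0]]
all_valid = [[1, 1], [1, 1]]
last_occluded = [[1, 1], [1, 0]]  # bottom-right pixel is ignored

full_loss = masked_l1_loss(prediction, target, all_valid)
masked_loss = masked_l1_loss(prediction, target, last_occluded)
```

With the last pixel masked out, its large error no longer contributes, so a wrong correspondence at an occlusion cannot pull the network towards it.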

We now have pairs of images (a frame $f_t$ and the warped previous frame $f^w_{t-1}$) and the corresponding occlusion masks $\kappa_t$, and we apply the noise-to-noise principle to fine-tune the network on this dataset. In order to increase the number of training samples the symmetric warping can also be done, i.e. warping $f_{t+1}$ to $f_t$ using the forward optical flow from $f_t$ to $f_{t+1}$. This doubles the amount of data used for the fine-tuning. We consider two settings: off-line and on-line training.

Off-line fine-tuning.

We denote the network as a parametrized function $F_\theta$, where $\theta$ is the parameter vector. In the off-line setting we fine-tune the network parameters $\theta$ by doing a fixed number $N$ of steps of the minimization of the masked loss over all frames in the video:

$$\theta^{\text{ft}} = \operatorname{optim}^{N}_{\theta}\Big[\sum_{t} \ell_1^{\kappa_t}\big(F_\theta(f_t), f^w_{t-1}\big);\ \theta_0\Big],$$

where by $\operatorname{optim}^{N}_{\theta}[E; \theta_0]$ we denote an operator which does $N$ optimization steps of function $E$ starting from $\theta_0$ and following a given optimization algorithm (for instance gradient descent, Adam [21], etc.). The initial condition for the optimization, $\theta_0$, is the parameter vector of the pre-trained network. The fine-tuned network is then applied to the rest of the video.

On-line fine-tuning.

In the on-line setting we train the network in a frame-by-frame fashion. As a consequence we denoise each frame $f_t$ with a different parameter vector $\theta_t$. At frame $t$ we compute $\theta_t$ by doing $N$ optimization steps corresponding to the minimization of the loss between frames $f_t$ and $f_{t-1}$:

$$\theta_t = \operatorname{optim}^{N}_{\theta}\Big[\ell_1^{\kappa_t}\big(F_\theta(f_t), f^w_{t-1}\big);\ \theta_{t-1}\Big],$$

where $f^w_{t-1}$ denotes the previous frame warped onto $f_t$. The initial condition for this iteration is given by the fine-tuned parameter vector at the previous frame, $\theta_{t-1}$. The first frame is denoised using the pre-trained network. The fine-tuning starts at the second frame. A reasonable concern is that the network overfits the given realization of the noise and the frame at each step. This is indeed the case if we use a large number of optimization iterations at a single frame. A similar behavior is reported in [42], which trains a network to minimize the loss on a single data point. We prevent this from happening by using a small number of iterations $N$. We have observed that the parameters fine-tuned at frame $t$ can be applied to denoise any other frame without any significant drop in performance.
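The two schedules differ only in what is summed and where each optimization starts. The sketch below makes that explicit on a deliberately tiny stand-in: the "network" is a single gain `theta` applied to every pixel, optimized by plain subgradient descent on the masked absolute loss. The scalar model, the step size, the step count and the function names are all hypothetical; the paper fine-tunes a DnCNN with Adam.

```python
def optim(grad, theta0, n_steps, lr):
    """Run n_steps of gradient descent on one parameter, starting from theta0."""
    theta = theta0
    for _ in range(n_steps):
        theta -= lr * grad(theta)
    return theta

def masked_l1_grad(theta, frame, target, mask):
    """Subgradient of sum_x mask * |theta * frame - target| with respect to theta."""
    total = 0.0
    for f, g, k in zip(frame, target, mask):
        residual = theta * f - g
        total += k * f * ((residual > 0) - (residual < 0))
    return total

def finetune_offline(frames, warped_prev, masks, theta0, n_steps, lr):
    """One optimization over all frame pairs, started from the pre-trained theta0."""
    def grad(theta):
        return sum(masked_l1_grad(theta, frames[t], warped_prev[t], masks[t])
                   for t in range(1, len(frames)))
    return optim(grad, theta0, n_steps, lr)

def finetune_online(frames, warped_prev, masks, theta0, n_steps, lr):
    """One short optimization per frame, each started from the previous result."""
    thetas = [theta0]  # the first frame is denoised with the pre-trained parameters
    for t in range(1, len(frames)):
        def grad(theta, t=t):
            return masked_l1_grad(theta, frames[t], warped_prev[t], masks[t])
        thetas.append(optim(grad, thetas[-1], n_steps, lr))
    return thetas

# Flattened toy frames; warped_prev[t] stands for the warped frame t-1.
frames = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
warped_prev = [None, [0.5, 1.0], [0.5, 1.0]]
masks = [[1, 1], [1, 1], [1, 1]]

theta_offline = finetune_offline(frames, warped_prev, masks, 1.0, n_steps=5, lr=0.01)
thetas_online = finetune_online(frames, warped_prev, masks, 1.0, n_steps=5, lr=0.01)
```

Keeping `n_steps` small in the on-line loop mirrors the remedy described above: each frame nudges the parameters instead of fitting one noise realization.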

4 Experiments

In this section we demonstrate the flexibility of the proposed fine-tuning blind denoising approach with several experimental results. For all these experiments the starting point for the fine-tuning process is a DnCNN network trained for an additive white Gaussian noise of standard deviation $\sigma = 25$. In all cases we use the same hyper-parameters for the fine-tuning: the same learning rate and the same number of iterations of the Adam optimizer. For the off-line case we use the entire video. The videos used in this section come from Derf’s database. They have been converted to grayscale by averaging the three color channels and downscaled by a factor of two in each direction to ensure that they contain little to no noise. The code and data to reproduce the results presented in this section are available online.

To the best of our knowledge there is no other blind video denoising method in the literature. We will compare with state-of-the-art methods on different types of noise. Most methods have been crafted (or trained) for a specific noise model and often a specific noise level. We will also compare with an image denoising method proposed by Lebrun et al. [23], which assumes a Gaussian noise model with variance depending on the intensity and the local frequency of the image. This model was proposed for denoising of compressed noisy images. We cannot compare with some more recent blind denoising methods, such as [10], because there is no code available. We will compare with DnCNN [48] and VBM3D [14]. VBM3D is a video denoising algorithm. All the other methods are image denoising methods applied frame-by-frame (perspectives for videos are mentioned in Section 5).

The first experiment checks that our fine-tuning does not deteriorate a well-trained network (for example by overfitting). We applied the proposed learning process to a sequence contaminated with AWGN with standard deviation $\sigma = 25$, which is precisely the type of noise the network was trained on. The per-frame PSNR is presented in Figure 2. The off-line fine-tuning performs on par with the pre-trained network. The PSNR of the on-line process has a higher variance, with some significant drops for some frames.
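For reference, the PSNR used throughout the experiments follows the standard definition $10 \log_{10}(\text{peak}^2 / \text{MSE})$. The sketch below assumes images stored as nested lists with values in [0, 1], so the peak value defaults to 1; the function name is illustrative.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = [[0.5, 0.5], [0.5, 0.5]]
estimate = [[0.6, 0.4], [0.6, 0.4]]  # every pixel is off by 0.1
value = psnr(reference, estimate)    # MSE = 0.01, i.e. 20 dB
```

Since the scale is logarithmic, halving the root-mean-square error raises the PSNR by about 6 dB, which helps when reading the gaps between methods in the tables.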

Figure 2: The fine-tuning process is done on a sequence corrupted by an additive Gaussian noise of standard deviation $\sigma = 25$; this is the noise that the network DnCNN 25 was trained on. The process doesn’t reduce the performance in this case.

In Figure 3 we show the results obtained still with Gaussian noise, but with $\sigma = 50$. The main point of this experiment is to be able to compare with a reference, namely a DnCNN network trained with $\sigma = 50$. First, we can see that both fine-tuned networks perform better than the pre-trained network for $\sigma = 25$; in fact their performance is as good as that of the DnCNN network trained specifically for $\sigma = 50$ (the off-line trained network actually performs slightly better than the reference network). Our process also outperforms the “noise clinic” of [23].

Figure 3: The fine-tuning process is applied on a sequence corrupted by an additive Gaussian noise with standard deviation $\sigma = 50$. The fine-tuned network (both online and batch) performs as well as a network trained specifically for this noise!

We have also tested the proposed fine-tuning on other types of noise. Figure 4 shows the results for multiplicative Gaussian noise, where the Gaussian noise is scaled by the pixel intensity (the images are within the range [0,1]). With this model, the noise variance depends on the pixel intensity. Results with correlated Gaussian noise (obtained by convolving an additive white Gaussian noise with a disk kernel) are shown in Figure 5. We also show results (Figure 6) with the salt and pepper uniform noise used in [25], obtained by replacing, with a given probability, the value of a pixel with a value sampled uniformly at random. Finally we show in Figure 7 results for JPEG compressed Gaussian noise, obtained by compressing with JPEG an image corrupted by AWGN. The last one is particularly interesting because it is a realistic use case for which the noise model is hard to estimate. While in this case the noise can be generated synthetically for training a network over a dataset, this is not possible with other compression tools (for example for proprietary technologies). We can see the effectiveness of the fine-tuning in all examples. The off-line training is more stable (smaller variance) and gives slightly better results, although the difference is small.
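Two of these degradations are easy to simulate. The sketch below assumes one common form of multiplicative noise, $f = u + u \cdot n$ with Gaussian $n$, and replaces pixels by uniform samples in [0, 1) for the impulse noise. The exact formulas and parameter values used in the experiments are not recoverable from this text, so `sigma`, `prob` and the function names are hypothetical.

```python
import random

def multiplicative_gaussian(image, sigma, rng):
    """f = u + u * n with n ~ N(0, sigma^2): the noise scales with intensity."""
    return [[u + u * rng.gauss(0.0, sigma) for u in row] for row in image]

def uniform_impulse(image, prob, rng):
    """With probability prob, replace a pixel by a uniform sample in [0, 1)."""
    return [[rng.random() if rng.random() < prob else u for u in row]
            for row in image]

rng = random.Random(0)

dark = [[0.0] * 8 for _ in range(8)]
noisy_dark = multiplicative_gaussian(dark, 0.3, rng)  # black pixels stay noise-free

# A constant value outside [0, 1) makes every replaced pixel detectable.
marker = [[2.0] * 100 for _ in range(100)]
corrupted = uniform_impulse(marker, 0.25, rng)
replaced = sum(1 for row in corrupted for v in row if v != 2.0) / 10000.0
```

The impulse model moves the mean of each pixel towards the mean of the uniform distribution, but as long as fewer than half of the pixels are replaced the median stays at the clean value, which is consistent with the use of an absolute-value loss.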

A visual comparison with other methods is shown in Figure 8 for JPEG compressed noise and in Figure 9 for AWGN. The result of the fine-tuned network has no visible artifacts and is visually pleasing even though the network has never seen this type of noise before the fine-tuning.

In Tables 1 and 2 we show the PSNR of the results obtained on 4 sequences for AWGN of $\sigma = 50$ and for JPEG compressed AWGN. For the case of AWGN the fine-tuned networks attain the performance of the DnCNN trained for that specific noise. For JPEG compressed Gaussian noise, the fine-tuned networks are on average about 0.5 dB above the pre-trained network (Table 2).

Figure 4: The fine-tuning process is applied on a sequence corrupted by a multiplicative Gaussian noise. The fine-tuned network (both online and batch) outperforms the original network and the noise clinic by almost 4dB on average!
Figure 5: The fine-tuning process is applied on a sequence corrupted by a correlated Gaussian noise. The fine-tuned network (both online and batch) performs better than the noise clinic and the original network. On this example the online trained network performs slightly worse than the batch trained one.
Figure 6: The fine-tuning process is applied on a sequence corrupted by a uniform salt and pepper noise (see text for the exact definition). The fine-tuned network (both online and batch) performs better than the original network by about 2.5dB on average. On this example the online trained network has a high variance while the batch trained one is very stable.
Figure 7: The fine-tuning process is applied on a sequence corrupted by an additive Gaussian noise which was then compressed using JPEG. The fine-tuned network (both online and batch) performs better than the original network by almost 1dB on average. The noise clinic has difficulties estimating the noise in the frames.
Figure 8: Example of denoising of an image corrupted by a JPEG compressed Gaussian noise. The fine-tuned network doesn’t produce any visible artifacts, contrary to the original DnCNN used for the fine-tuning process. From left to right, top to bottom: noisy, fine-tuned, VBM3D, ground truth, DnCNN trained for a Gaussian noise, noise clinic.
Figure 9: Example of denoising of an image corrupted by a Gaussian noise of standard deviation . The fine-tuned network doesn’t produce any visible artifact, the results are comparable to a DnCNN trained for this particular type of noise. From left to right, top to bottom: Noisy, fine-tuned, DnCNN trained for a Gaussian noise with , VBM3D, ground truth, noise clinic, DnCNN trained for a Gaussian noise with .
| Method | pedestrian area | crowd run | touchdown pass | station | Average |
|---|---|---|---|---|---|
| DnCNN 25 | 28.06 | 28.07 | 28.05 | 28.04 | 28.06 |
| DnCNN 50 | 32.81 | 30.51 | 33.23 | 32.07 | 32.16 |
| Online fine-tuned | 32.77 | 30.47 | 33.15 | 32.01 | 32.10 |
| Batch fine-tuned | 32.89 | 30.54 | 33.24 | 32.26 | 32.23 |
| VBM3D | 29.96 | 25.35 | 30.24 | 29.35 | 28.73 |
| Noise Clinic | 29.67 | 29.17 | 29.17 | 29.70 | 29.43 |

Table 1: PSNR values (in dB) for 4 sequences with AWGN of standard deviation $\sigma = 50$.
| Method | pedestrian area | crowd run | touchdown pass | station | Average |
|---|---|---|---|---|---|
| DnCNN 25 | 33.60 | 30.76 | 33.46 | 32.65 | 32.62 |
| Online fine-tuned | 34.14 | 30.86 | 34.15 | 33.09 | 33.06 |
| Batch fine-tuned | 34.40 | 30.88 | 34.05 | 33.25 | 33.15 |
| VBM3D | 34.16 | 28.95 | 33.83 | 33.53 | 32.62 |
| Noise Clinic | 30.63 | 29.73 | 30.46 | 30.24 | 30.27 |

Table 2: PSNR values (in dB) for 4 sequences with JPEG compressed AWGN.
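The average gains over the pre-trained network can be recomputed from the per-sequence values of Table 2. The short script below only re-derives numbers already in the table; the dictionary layout and variable names are illustrative.

```python
# Per-sequence PSNR values (dB) copied from Table 2, in the order:
# pedestrian area, crowd run, touchdown pass, station.
table2 = {
    "DnCNN 25": [33.60, 30.76, 33.46, 32.65],
    "Online fine-tuned": [34.14, 30.86, 34.15, 33.09],
    "Batch fine-tuned": [34.40, 30.88, 34.05, 33.25],
    "VBM3D": [34.16, 28.95, 33.83, 33.53],
    "Noise Clinic": [30.63, 29.73, 30.46, 30.24],
}

def mean(values):
    return sum(values) / len(values)

baseline = mean(table2["DnCNN 25"])
# Average gain in dB of each method over the pre-trained DnCNN 25.
gains = {name: mean(values) - baseline for name, values in table2.items()}
```

The on-line and batch versions gain roughly 0.44 dB and 0.53 dB on average, while VBM3D matches the pre-trained network on average despite a large deficit on the crowd run sequence.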

Figure 10 shows the impact of stopping the on-line fine-tuning at a frame $t_0$, and using the parameters $\theta_{t_0}$ obtained at that frame to process the remaining frames. We can see that the more frames are used for the fine-tuning, the better the performance.

Figure 10: The online fine-tuning process is stopped after a specific number of frames. The more frames, the better the performance, but even just a couple of frames already improve over the pre-trained network. This experiment was done using the salt and pepper noise.

5 Discussion and perspectives

Denoising methods based on deep learning often require large datasets to achieve state-of-the-art performance. Lehtinen et al. [25] pointed out that in many cases the clean ground truth images are not necessary, thus simplifying the acquisition of the training datasets. With the framework presented in this paper we take a step further and show that a single video is often enough, removing the need for a dataset of images. By applying a simple frame-to-frame training on a generic pre-trained network (for example a DnCNN network trained for additive Gaussian noise with fixed standard deviation), we successfully denoise a wide range of different noise models even though the network has never seen the video nor the noise model before its fine-tuning. This opens the possibility to easily process data from any unknown origin.

We think that the current fine-tuning process can still be improved. First, given that the application is video denoising, it is expected that better results will be achieved by a video denoising network (the DnCNN network processes each frame independently of the others). Using the temporal information could improve the denoising quality, just like video denoising methods improve over frame-by-frame image denoising methods, and might also stabilize the variance of the result for the on-line fine-tuning.


  • [1] F. J. Anscombe. The transformation of poisson, binomial and negative-binomial data. Biometrika, 35(3/4):246–254, 1948.
  • [2] A. Barbu. Training an active random field for real-time image denoising. IEEE Transactions on Image Processing, 18(11):2451–2462, Nov 2009.
  • [3] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 2, pages 60–65. IEEE, 2005.
  • [4] A. Buades, J.-L. Lisani, and M. Miladinović. Patch-based video denoising with optical flow estimation. IEEE Transactions on Image Processing, 25(6):2573–2586, June 2016.
  • [5] H. C. Burger, C. J. Schuler, and S. Harmeling. Image denoising: Can plain neural networks compete with bm3d? In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2392–2399, June 2012.
  • [6] S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool. One-shot video object segmentation. In Computer Vision and Pattern Recognition (CVPR), 2017.
  • [7] A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related problems. Numerische Mathematik, 76(2):167–188, 1997.
  • [8] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [9] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. arXiv preprint arXiv:1805.01934, 2018.
  • [10] J. Chen, J. Chen, H. Chao, and M. Yang. Image blind denoising with generative adversarial network based noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3155–3164, 2018.
  • [11] Y. Chen and T. Pock. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1256–1272, 6 2017.
  • [12] C. Cruz, A. Foi, V. Katkovnik, and K. Egiazarian. Nonlocality-reinforced convolutional neural networks for image denoising. IEEE Signal Processing Letters, 25(8):1216–1220, Aug 2018.
  • [13] K. Dabov and A. Foi. Image denoising with block-matching and 3D filtering. Electronic …, 6064:1–12, 2006.
  • [14] K. Dabov, A. Foi, and K. Egiazarian. Video denoising by sparse 3D transform-domain collaborative filtering. In EUSIPCO, 2007.
  • [15] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image processing, 15(12):3736–3745, 2006.
  • [16] M. Gonzalez, J. Preciozzi, P. Muse, and A. Almansa. Joint denoising and decompression using cnn regularization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2018.
  • [17] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2862–2869, 2014.
  • [18] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward convolutional blind denoising of real photographs. arXiv preprint arXiv:1807.04686, 2018.
  • [19] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
  • [20] S. Kay. Fundamentals of statistical processing, volume i: Estimation theory: Estimation theory v. 1, 1993.
  • [21] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [22] M. Lebrun, A. Buades, and J.-M. Morel. A nonlocal bayesian image denoising algorithm. SIAM Journal on Imaging Sciences, 2013.
  • [23] M. Lebrun, M. Colom, and J.-M. Morel. The noise clinic: a blind image denoising algorithm. Image Processing On Line, 5:1–54, 2015.
  • [24] S. Lefkimmiatis. Non-local color image denoising with convolutional neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5882–5891, July 2017.
  • [25] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila. Noise2noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189, 2018.
  • [26] M. Maggioni, E. Sánchez-Monge, and A. Foi. Joint removal of random and fixed-pattern noise through spatiotemporal video filtering. IEEE Transactions on Image Processing, 23(10):4282–4296, 2014.
  • [27] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local sparse models for image restoration. In Computer Vision, 2009 IEEE 12th International Conference on, pages 2272–2279. IEEE, 2009.
  • [28] M. Makitalo and A. Foi. A closed-form approximation of the exact unbiased inverse of the anscombe variance-stabilizing transformation. IEEE transactions on image processing, 20(9):2697–2698, 2011.
  • [29] M. Makitalo and A. Foi. Optimal inversion of the anscombe transformation in low-count poisson image denoising. IEEE transactions on Image Processing, 20(1):99–109, 2011.
  • [30] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2802–2810. Curran Associates, Inc., 2016.
  • [31] P. Moulin and J. Liu. Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors. Information Theory, IEEE Transactions on, 45(3):909–919, Apr 1999.
  • [32] T. Plötz and S. Roth. Benchmarking Denoising Algorithms with Real Photographs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2750–2759. IEEE, July 2017.
  • [33] J. Portilla, V. Strela, M. Wainwright, and E. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. Image Processing, IEEE Transactions on, 12(11):1338–1351, Nov 2003.
  • [34] P. Qiao, Y. Dou, W. Feng, R. Li, and Y. Chen. Learning non-local image diffusion for image denoising. In Proceedings of the 25th ACM International Conference on Multimedia, MM ’17, pages 1847–1855, New York, NY, USA, 2017. ACM.
  • [35] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, pages 234–241, 2015.
  • [36] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.
  • [37] V. Santhanam, V. I. Morariu, and L. S. Davis. Generalized deep image to image regression. CoRR, abs/1612.03268, 2016.
  • [38] U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 2774–2781, June 2014.
  • [39] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [40] J. Sánchez Pérez, E. Meinhardt-Llopis, and G. Facciolo. TV-L1 Optical Flow Estimation. Image Processing On Line, 2013.
  • [41] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Computer Vision, 1998. Sixth International Conference on, pages 839–846. IEEE, 1998.
  • [42] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [43] R. Vemulapalli, O. Tuzel, and M. Liu. Deep Gaussian conditional random field network: A model-based deep network for discriminative denoising. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4801–4809, June 2016.
  • [44] D. Yang and J. Sun. BM3D-Net: A convolutional neural network for transform-domain collaborative filtering. IEEE Signal Processing Letters, 25(1):55–59, Jan 2018.
  • [45] G. Yu, G. Sapiro, and S. Mallat. Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity. Image Processing, IEEE Transactions on, 21(5):2481–2499, May 2012.
  • [46] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In Joint Pattern Recognition Symposium. Springer, 2007.
  • [47] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • [48] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • [49] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising. CoRR, abs/1710.0, 2017.
  • [50] H. Zhao, O. Gallo, I. Frosio, and J. Kautz. Loss Functions for Image Restoration With Neural Networks. IEEE Transactions on Computational Imaging, 3(X):47–57, 3 2017.
  • [51] D. Zoran and Y. Weiss. From learning models of natural image patches to whole image restoration. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 479–486, Nov 2011.