Towards Adversarially Robust Deep Image Denoising

by   Hanshu Yan, et al.
National University of Singapore

This work systematically investigates the adversarial robustness of deep image denoisers (DIDs), i.e., how well DIDs can recover the ground truth from noisy observations degraded by adversarial perturbations. First, to evaluate DIDs' robustness, we propose a novel adversarial attack, namely Observation-based Zero-mean Attack (ObsAtk), to craft adversarial zero-mean perturbations on given noisy images. We find that existing DIDs are vulnerable to the adversarial noise generated by ObsAtk. Second, to robustify DIDs, we propose an adversarial training strategy, hybrid adversarial training (HAT), that jointly trains DIDs with adversarial and non-adversarial noisy data to ensure that the reconstruction quality is high and the denoisers around non-adversarial data are locally smooth. The resultant DIDs can effectively remove various types of synthetic and adversarial noise. We also uncover that the robustness of DIDs benefits their generalization capability on unseen real-world noise. Indeed, HAT-trained DIDs can recover high-quality clean images from real-world noise even without training on real noisy data. Extensive experiments on benchmark datasets, including Set68, PolyU, and SIDD, corroborate the effectiveness of ObsAtk and HAT.



1 Introduction

Image denoising, which aims to reconstruct clean images from their noisy observations, is a vital part of image processing systems. The noisy observations are usually modeled as the sum of ground-truth images and zero-mean noise maps [6, 25]. Recently, deep learning-based methods have made significant advances in denoising tasks [25, 2] and have been applied in many areas, including medical imaging [8] and photography [1]. Despite the success of deep denoisers in recovering high-quality images from a certain type of noisy images, we still lack knowledge about their robustness against adversarial perturbations, which may cause severe safety hazards in high-stakes applications like medical diagnosis. To address this problem, the first step should be developing attack methods dedicated to denoising to evaluate the robustness of denoisers. In contrast to the attacks for classification [9, 13], attacks for denoising should consider not only the adversarial budget but also some assumptions of natural noise, such as zero-mean, because certain perturbations, such as adding a constant value, do not necessarily result in visual artifacts. Although Choi et al. [choi_deep_2021, choi_evaluating_2019] studied the vulnerability of various deep image processing models, they directly applied the attack from classification. To the best of our knowledge, no attack has so far been truly dedicated to the denoising task.

To this end, we propose the observation-based zero-mean attack (ObsAtk), which crafts a worst-case zero-mean perturbation for a noisy observation by maximizing the distance between the output and the ground-truth. To ensure that the perturbation satisfies the adversarial budget and the zero-mean constraints, we utilize the classical projected-gradient-descent (PGD) [13] method for optimization, and develop a two-step operation to project the perturbation back into the feasible region. Specifically, in each iteration, we first project the perturbation onto the zero-mean hyperplane. Then, we linearly rescale the perturbation so that its norm is less than or equal to the adversarial budget. We examine the effectiveness of ObsAtk on several benchmark datasets and find that deep image denoisers are indeed susceptible to ObsAtk: the denoisers cannot remove adversarial noise completely and even yield atypical artifacts, as shown in Figure 2(g).

To robustify deep denoisers against adversarial perturbations, we propose an effective adversarial training strategy, namely hybrid adversarial training (HAT), to train denoisers by using adversarially noisy images and non-adversarial noisy images together. The loss function of HAT consists of two terms. The first term ensures the reconstruction performance on common non-adversarial noisy images, and the second term ensures that the reconstructions from non-adversarial and adversarial images are close to each other. Thus, we can obtain denoisers that perform well on both non-adversarial noisy images and their adversarially perturbed versions. Extensive experiments on benchmark datasets verify the effectiveness of HAT.

Moreover, we reveal that adversarial robustness benefits the generalization capability to unseen types of noise, i.e., HAT can train denoisers for real-world noise removal only with synthetic noise sampled from common distributions like Gaussians. That is because ObsAtk searches for the worst-case perturbations around different levels of noisy images, and training with adversarial data ensures the denoising performance on various types of noise. In contrast, other reasonable methods for real-world denoising [10, 12] mostly require a large number of real-world noisy data for the training, which are unfortunately not available in some applications like medical radiology. We conduct experiments on several real-world datasets. Numerical and visual results demonstrate the effectiveness of HAT for real-world noise removal.

In summary, there are three main contributions in this work: 1) We propose a novel attack, ObsAtk, to generate adversarial examples for noisy observations, which facilitates the evaluation of the robustness of deep image denoisers. 2) We propose an effective adversarial training strategy, HAT, for robustifying deep image denoisers. 3) We build a connection between adversarial robustness and the generalization to unseen noise, and show that HAT serves as a promising framework for training generalizable deep image denoisers.

2 Notation and Background

Adversarial robustness and adversarial training

Consider a deep neural network (DNN) $f_\theta$ mapping an input $y$ to a target $x$. The model is trained to minimize a certain loss function that is measured by a particular distance $d(\cdot, \cdot)$ between the output $f_\theta(y)$ and the target $x$. In high-stakes applications, the DNN should resist small perturbations on the input data and map the perturbed input to a result close to the target. The notion of robustness has been proposed to measure the resistance of DNNs against slight changes of the input [15, 9]. The robustness is characterized by the distance $d(f_\theta(y'), x)$ between $f_\theta(y')$ and the target $x$, where the worst-case perturbed input $y'$ is located within a small neighborhood of the original input $y$ and maximizes the distance between its output and the target $x$:

$$y' = \arg\max_{z:\, \|z - y\| \le \rho} d(f_\theta(z), x). \tag{1}$$

The worst-case perturbation can be approximated via many adversarial attack methods, such as FGSM [9], I-FGSM [11], and PGD [13], which solve (1) via gradient descent methods. The distance $d(f_\theta(y'), x)$ is an indication of the robustness of $f_\theta$ around $y$: a small distance implies strong robustness and vice versa. In terms of image classification, the $\rho$-neighborhood is usually defined by the $\ell_\infty$-norm and the distance is measured by the cross-entropy loss [13] or a margin loss [5]. For image restoration, the distance between images is usually measured by the $\ell_2$-norm [25].
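As a minimal sketch of how a one-step attack such as FGSM [9] approximates the worst-case input in (1), the snippet below uses a toy linear model with a squared-error loss and an $\ell_\infty$ budget. The model, the helper names (`predict`, `loss`, `fgsm`) and all numbers are illustrative assumptions rather than anything used in the experiments of this paper.

```python
def predict(w, z):
    """Toy linear model: inner product of weights and input (assumption)."""
    return sum(wi * zi for wi, zi in zip(w, z))


def loss(w, z, target):
    """Squared error between the model output and the target."""
    return (predict(w, z) - target) ** 2


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, z, target, rho):
    """One signed-gradient step of size rho in the l_inf ball around z."""
    residual = predict(w, z) - target
    grad = [2.0 * residual * wi for wi in w]  # d loss / d z for the linear model
    return [zi + rho * sign(gi) for zi, gi in zip(z, grad)]


w = [1.0, -2.0, 0.5]      # hypothetical weights
z = [0.2, 0.1, 0.4]       # hypothetical input
target = 1.0
z_adv = fgsm(w, z, target, rho=0.1)
print(loss(w, z, target), loss(w, z_adv, target))
```

Every coordinate moves by exactly the budget in the direction that increases the loss, which is why FGSM is only an approximation of the maximizer in (1) for non-linear models.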

In most cases, deep learning models have been shown to be vulnerable to adversarial attacks under normal training (NT) [16, 20]. To robustify DNNs, Madry et al. [13] proposed the PGD adversarial training (AT) method, which trains DNNs with adversarial examples of the original data. AT is formally formulated as the following min-max optimization problem:

$$\min_\theta \; \mathbb{E}_{(y, x)} \Big[ \max_{y':\, \|y' - y\| \le \rho} d(f_\theta(y'), x) \Big], \tag{2}$$

where $(y, x)$ is an input-target pair, $f_\theta$ the DNN, $d$ the distance, and $\rho$ the perturbation budget. Its effectiveness has been verified by extensive empirical and theoretical results [21, 7]. For further improvement, many variants of PGD-based AT have been proposed in terms of robustness enhancement [22], generalization to non-adversarial data [23], and computational efficiency [14].

Deep image denoising

During image capturing, unknown types of noise may be induced by physical sensors, data compression, and transmission. Noisy observations are usually modeled as the sum of the ground-truth images and certain zero-mean noise [6, 27], i.e., $y = x + v$ with $\mathbb{E}[v_i] = 0$, where $v_i$ is the $i$-th element of $v$. The random vector $x \in \mathbb{R}^m$ with distribution $P$ denotes a random clean image, and the noise $v \in \mathbb{R}^m$ with distribution $Q$ satisfies the zero-mean constraint. Denoising techniques aim to recover clean images from their noisy observations [25, 6]. Suppose we are given a training set $S = \{(y_j, x_j)\}_{j=1}^{N}$ of noisy and clean image pairs, with noise and clean images sampled from distributions $Q$ and $P$ respectively. We can then train a DNN to effectively remove the noise induced by distribution $Q$ from the noisy observations. A series of DNNs have been developed for denoising in recent years, including DnCNN [25], FFDNet [26], and RIDNet [2].

In real-world applications [1, 19], the noise distribution is usually unknown due to the complexity of the image capturing procedures; besides, collecting a large number of image pairs (clean/noisy or noisy/noisy) for training may sometimes be unrealistic in safety-critical domains such as medical radiology [27]. To overcome these difficulties, researchers have developed denoising techniques that approximate real noise with common distributions like Gaussian or Poisson [6, 27]. To train denoisers that can deal with different levels of noise, where the noise level is measured by the energy-density of the noise, the training set may consist of noisy images sampled from a variety of noise distributions [25], whose expected energy-densities range from zero to a certain budget $\epsilon^2$ (the expected $\ell_2$-norms range from zero to $\epsilon\sqrt{m}$). For example, $S^\epsilon = \{(y_j, x_j)\}_{j=1}^{N}$, where $y_j = x_j + v_j$, $x_j$ and $v_j$ are sampled from $P$ and $Q$ respectively, and $Q$ is randomly selected from a set of Gaussian distributions $\mathcal{Q}^\epsilon = \{\mathcal{N}(0, \sigma^2 I) \mid \sigma \in [0, \epsilon]\}$. The denoiser trained with $S^\epsilon$ is termed an $\epsilon$-denoiser.
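The construction of such a multi-level training pair can be sketched as follows, assuming a Gaussian family whose standard deviation is drawn uniformly between zero and a bound `eps`. The function names, the flat gray test image and the value `25 / 255` are hypothetical choices made only to keep the example self-contained.

```python
import random


def make_noisy_pair(clean, eps, rng):
    """Draw sigma uniformly from [0, eps], then add i.i.d. zero-mean Gaussian noise."""
    sigma = rng.uniform(0.0, eps)  # picks one member of the Gaussian family
    noisy = [p + rng.gauss(0.0, sigma) for p in clean]
    return noisy, sigma


def energy_density(noise):
    """Mean squared value of a noise vector, i.e. energy per pixel."""
    return sum(n * n for n in noise) / len(noise)


rng = random.Random(0)
eps = 25 / 255                 # illustrative bound only
clean = [0.5] * 4096           # flattened 64x64 flat gray image (toy data)
noisy, sigma = make_noisy_pair(clean, eps, rng)
noise = [a - b for a, b in zip(noisy, clean)]
print(sigma, energy_density(noise))
```

For a long vector the printed energy-density is close to `sigma ** 2`, which is what ties the noise level of a distribution to its expected energy-density.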

On robustness of deep image denoisers

In practice, data storage and transmission may induce imperceptible perturbations on the original data so that the perturbed noise may be statistically slightly different from the noise sampled from the specific original distribution. Although an $\epsilon$-denoiser can successfully remove noise sampled from the training noise distributions $\mathcal{Q}^\epsilon$, the performance of noise removal on the perturbed data is not guaranteed. Thus, we propose a novel attack method, ObsAtk, to assess the adversarial robustness of DIDs in Section 3. To robustify DIDs, we propose an adversarial training strategy, HAT, in Section 4. HAT-trained DIDs can effectively denoise adversarially perturbed noisy images and preserve good performance on non-adversarial data.

Besides the adversarial robustness issue, it has been shown that $\epsilon$-denoisers trained with $\mathcal{Q}^\epsilon$ cannot generalize well to unseen real-world noise [12, 3]. Several methods have been proposed for real-world noise removal, but most of them require a large amount of real noisy data for training, e.g., CBDNet (clean/noisy pairs) [10] and Noise2Noise (noisy pairs) [12], which is sometimes impractical. In Section 4.3, we show that HAT-trained DIDs can generalize well to unseen real noise without the need to use real noisy images for training.

3 ObsAtk for Robustness Evaluation

In this section, we propose a novel adversarial attack, Observation-based Zero-mean Attack (ObsAtk), to evaluate the robustness of DIDs. We also conduct experiments on benchmark datasets to demonstrate that normally-trained DIDs are vulnerable to adversarial perturbations.

3.1 Observation-based Zero-mean Attack

Figure 1: Illustration of ObsAtk. Left: We perturb a noisy observation $y$ of the ground-truth $x$ with an adversarial budget $\rho$ in the $\ell_2$-norm. For an $\epsilon$-denoiser, we choose a proper value of $\rho$ to ensure that the norm of the total noise is bounded by $\epsilon\sqrt{m}$, where $m$ denotes the image size. Right: The perturbation $\delta$ is projected via the two-step operation onto the region defined by the zero-mean and $\rho$-ball constraints.

An $\epsilon$-denoiser $f_\theta$ can generate a high-quality reconstruction $f_\theta(y)$ close to the ground-truth $x$ from a noisy observation $y$. To evaluate the robustness of $f_\theta$ with respect to a perturbation on $y$, we develop an attack to search for the worst perturbation $\delta^*$ that degrades the recovered image $f_\theta(y + \delta^*)$ as much as possible. Formally, we need to solve the problem stated in Eq. (3). The optimization problem is subject to two constraints: the first constraint requires the norm of $\delta$ to be bounded by a small adversarial budget $\rho$. The second constraint restricts the mean of all elements in $\delta$ to be zero. This corresponds to the zero-mean assumption of noise in real-world applications because a small mean-shift does not necessarily result in visual noise. For example, a mean-shift in gray-scale images implies a slight change of brightness. Since the zero-mean perturbation is added to a noisy observation $y$, we term the proposed attack the Observation-based Zero-mean Attack (ObsAtk).

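$$\delta^* = \arg\max_{\delta \in \mathbb{R}^m} \|f_\theta(y + \delta) - x\|_2^2 \tag{3a}$$
$$\text{s.t.} \quad \|\delta\|_2 \le \rho, \quad \mathbf{1}^\top \delta = 0. \tag{3b}$$
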
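Algorithm 1 ObsAtk
1: Input: Denoiser $f_\theta$, ground-truth $x$, noisy observation $y$, adversarial budget $\rho$, #iterations $T$, step-size $\eta$, minimum pixel value $p_{\min}$, maximum pixel value $p_{\max}$
2: Output: Adversarial perturbation $\delta$
4: for $t = 1$ to $T$ do
5:      $\delta \leftarrow \delta + \eta \nabla_\delta \|f_\theta(y + \delta) - x\|_2^2$;
6:      $\delta \leftarrow \delta - \frac{\delta^\top n}{\|n\|_2^2}\, n$, where $n$ is in (4a);
7:      $\delta \leftarrow \min(\rho / \|\delta\|_2,\, 1)\, \delta$;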

We solve the constrained optimization problem in Eq. (3) by using the classical projected-gradient-descent (PGD) method. PGD-like methods update optimization variables iteratively via gradient descent and ensure that the constraints are satisfied by projecting the variables back to the feasible region at the end of each iteration. To deal with the $\ell_2$-norm and zero-mean constraints, we develop a two-step operation in Eq. (4) that first projects the perturbation back to the zero-mean hyperplane and then projects the result onto the $\rho$-neighborhood.


$$\delta' = \delta - \frac{\delta^\top n}{\|n\|_2^2}\, n, \quad n = \mathbf{1}, \tag{4a}$$
$$\delta'' = \min\Big(\frac{\rho}{\|\delta'\|_2},\, 1\Big)\, \delta'. \tag{4b}$$

In each iteration, as shown in Figure 1, the first step involves projecting the perturbation $\delta$ onto the zero-mean hyperplane. The zero-mean hyperplane consists of all the vectors whose elements have zero mean, i.e., $\{z \in \mathbb{R}^m : \mathbf{1}^\top z = 0\}$, where $\mathbf{1}$ is the length-$m$ all-ones vector. Thus, $\mathbf{1}$ is a normal of the zero-mean plane. We can project any vector onto the zero-mean plane via (4a). The vector $\delta$ is first projected along the direction of $\mathbf{1}$; its projection $\delta'$ onto the zero-mean plane then equals $\delta$ itself minus its projection onto $\mathbf{1}$. The second step involves further projecting $\delta'$ back to the $\rho$-ball via linear scaling. If $\delta'$ is already within the $\rho$-ball, we keep $\delta'$ unchanged. Otherwise, the final projection $\delta''$ is obtained by scaling $\delta'$ with a factor $\rho / \|\delta'\|_2$. For any two sets $A$ and $B$, although the projection onto $A \cap B$ is, in general, not equal to the result obtained by first projecting onto $A$, then onto $B$, surprisingly, the following holds for the two sets in (3b).

Theorem 1 (Informal)

Given any vector $\delta \in \mathbb{R}^m$, the projection of $\delta$ via the two-step operation in (4) satisfies the two constraints in (3b), and the two-step projection is equivalent to the exact projection onto the set defined by (3b).
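One informal way to see why the two steps suffice is sketched below. It is only a sketch under the stated notation and does not replace the formal argument; the symbols $H$, $B$, $P_H$ and $\Pi_B$ are introduced here for illustration, and $P_H\delta \neq 0$ is assumed in the last line.

```latex
% H = { z : 1^T z = 0 } is a linear subspace, B = { z : ||z||_2 <= rho } is a ball centered at 0.
% P_H is the orthogonal projector onto H, Pi_B the Euclidean projection onto B.
\begin{align*}
\|z - \delta\|_2^2
  &= \|z - P_H\delta\|_2^2 + \|P_H\delta - \delta\|_2^2
  && \text{for all } z \in H, \text{ since } (z - P_H\delta) \perp (P_H\delta - \delta), \\
\arg\min_{z \in H \cap B} \|z - \delta\|_2
  &= \arg\min_{z \in H \cap B} \|z - P_H\delta\|_2 , \\
\Pi_B(P_H\delta)
  &= \min\!\big(\rho / \|P_H\delta\|_2,\, 1\big)\, P_H\delta \;\in\; H \cap B .
\end{align*}
```

The last line is the closest point to $P_H\delta$ within all of $B$, and it stays in $H$ because it is a scalar multiple of $P_H\delta$. It is therefore also the closest point within the smaller set $H \cap B$, which by the second line is the exact projection of $\delta$.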

The formal statement and the proof of Theorem 1 are provided in Appendix A. The complete procedure of ObsAtk is summarized in Algorithm 1.
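The attack loop described in this section can be sketched in a few lines of plain Python. This is a hedged illustration, not the authors' implementation: the zero initialization, the unnormalized gradient step, the omission of pixel-range clipping, and the toy shrinkage denoiser with its hand-written gradient are all assumptions, as are the function names.

```python
import math


def project_zero_mean(delta):
    """First step: remove the component along the all-ones vector."""
    mean = sum(delta) / len(delta)
    return [d - mean for d in delta]


def project_l2_ball(delta, rho):
    """Second step: rescale only when the l2 norm exceeds rho."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(rho / norm, 1.0) if norm > 0 else 1.0
    return [scale * d for d in delta]


def obs_atk(grad_fn, x, y, rho, steps, step_size):
    """Projected gradient ascent on the reconstruction error (sketch)."""
    delta = [0.0] * len(y)  # assumed zero initialization
    for _ in range(steps):
        y_adv = [yi + di for yi, di in zip(y, delta)]
        grad = grad_fn(y_adv, x)  # gradient of ||f(y + delta) - x||^2 w.r.t. delta
        delta = [di + step_size * gi for di, gi in zip(delta, grad)]
        delta = project_l2_ball(project_zero_mean(delta), rho)
    return delta


A = 0.8  # toy "denoiser": shrink every pixel by a fixed factor (assumption)


def toy_denoiser(y):
    return [A * yi for yi in y]


def toy_loss(y, x):
    return sum((fi - xi) ** 2 for fi, xi in zip(toy_denoiser(y), x))


def toy_grad(y_adv, x):
    return [2.0 * A * (fi - xi) for fi, xi in zip(toy_denoiser(y_adv), x)]


x = [0.2, 0.4, 0.6, 0.8]          # toy ground-truth
y = [0.25, 0.35, 0.65, 0.75]      # toy noisy observation
delta = obs_atk(toy_grad, x, y, rho=0.05, steps=5, step_size=0.1)
y_adv = [yi + di for yi, di in zip(y, delta)]
print(toy_loss(y, x), toy_loss(y_adv, x))
```

With a real network, `grad_fn` would come from automatic differentiation; the projection helpers stay the same because they only depend on the perturbation.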

Figure 2: Given a normally-trained denoiser, from left to right are (a) the ground-truth image, (b) Gaussian noise, (c) the Gaussian noisy image, (d) the reconstruction from the Gaussian noisy image, (e) adversarial noise, (f) the adversarially noisy image, and (g) the reconstruction from the adversarially noisy image. Comparing (a), (d) and (g), we observe that the denoiser can effectively remove Gaussian noise but its performance is degraded when dealing with the adversarial noise (noise remains on the roof and strange contours appear in the sky).

3.2 Robustness Evaluation via ObsAtk

We use ObsAtk to evaluate the adversarial robustness of $\epsilon$-denoisers on several gray-scale and RGB benchmark datasets, including Set12, Set68, BSD68, and Kodak24. For gray-scale image denoising, we use Train400 to train a DnCNN-B [25] model, which consists of 20 convolutional layers. We follow the training setting in Zhang et al. [25] and randomly crop patches in size of . Noisy and clean image pairs are constructed by injecting different levels of white Gaussian noise into clean patches. The noise levels are uniformly randomly selected from with . For RGB image denoising, we use BSD432 (BSD500 excluding images in BSD68) to train a DnCNN-C model with the same number of layers as DnCNN-B but with three input and output channels. Other settings follow those of the training of DnCNN-B.

We evaluate the denoising capability of the $\epsilon$-denoiser on Gaussian noisy images and their adversarially perturbed versions. The image quality of a reconstruction is measured via the peak signal-to-noise ratio (PSNR) metric. A large PSNR between the reconstruction and the ground-truth implies good denoising performance. We denote the energy-density of the noise in test images as and consider three levels of noise, i.e., , , and . For Gaussian noise removal, we add white Gaussian noise with to clean images. For Uniform noise removal, we generate noise from . For denoising adversarial noisy images, the norm budgets of adversarial perturbation are set to be and respectively, where $m$ equals the size of test images. We perturb noisy observations whose noise is generated from , so that the $\ell_2$-norms of the total noise in adversarial images are still bounded by and the energy-density is thus bounded by . We use Atk- to denote the adversarially perturbed noisy images in the size of with adversarial budget . The number of iterations of PGD in ObsAtk is set to be five.
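For reference, the PSNR metric used throughout the evaluation can be computed as in the sketch below. It follows the standard definition, $10 \log_{10}(\text{MAX}^2 / \text{MSE})$; the function name and the default peak value of `1.0` (pixels scaled to the unit interval) are assumptions made for this example.

```python
import math


def psnr(reconstruction, ground_truth, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((r - g) ** 2 for r, g in zip(reconstruction, ground_truth)) / len(ground_truth)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([0.5, 0.5], [0.4, 0.6]))
```

Because the mean squared error of an undenoised image equals the energy-density of its noise, a lower noise level directly translates into a higher PSNR before any denoising is applied.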

Dataset    Gaussian      Uniform       Atk-          Atk-
Set68      29.16/0.02    29.15/0.01    24.26/0.12    23.12/0.10
           31.68/0.00    31.68/0.00    26.66/0.04    26.08/0.02
Set12      30.39/0.01    30.41/0.01    24.32/0.18    22.96/0.13
           32.78/0.00    32.81/0.00    26.91/0.05    26.25/0.01
BSD68      31.25/0.11    31.17/0.11    27.44/0.08    26.08/0.06
           33.98/0.11    33.93/0.12    29.31/0.08    27.84/0.04
Kodak24    32.20/0.13    32.13/0.14    27.87/0.08    26.37/0.07
           34.77/0.13    34.73/0.14    29.55/0.07    28.00/0.04
Table 1: The average PSNR (in dB) results of DnCNN denoisers on the gray-scale and RGB datasets. Four types of noise are used for evaluation, viz. Gaussian and Uniform random noise, and ObsAtk with two different adversarial budgets. The energy-density of noise is bounded by .

From Table 1, we observe that ObsAtk clearly degrades the reconstruction performance of DIDs. In comparison to Gaussian or Uniform noisy images with the same noise levels, the results recovered from adversarial images are much worse in terms of PSNR. For example, when denoising images in Set12, the average PSNR of reconstructions from Gaussian noise reaches 32.78 dB, whereas the PSNR drops to 26.25 dB when dealing with Atk- adversarial images. The visual results in Figure 2 show the same phenomenon: a normally-trained denoiser cannot effectively remove adversarial noise.

4 Robust and Generalizable Denoising via HAT

The previous section shows that existing deep denoisers are vulnerable to adversarial perturbations. To improve the adversarial robustness of deep denoisers, we propose an adversarial training method, hybrid adversarial training (HAT), that uses original noisy images and their adversarial versions for training. Furthermore, we build a connection between the adversarial robustness of deep denoisers and their generalization capability to unseen types of noise. We show that HAT-trained denoisers can effectively remove real-world noise without the need to leverage the real-world noisy data.

4.1 Hybrid Adversarial Training

AT has been proved to be a successful and universally applicable technique for robustifying deep neural networks. Most variants of AT are developed for the classification task specifically, such as TRADES [22] and GAIRAT [24]. Here, we propose an AT strategy, HAT, for robust image denoising:


where $y = x + v$ and $y' = y + \delta^*$. Note that $\delta^*$ is the adversarial perturbation obtained by solving ObsAtk in Eq. (3).

As shown in Eq. (5), the loss function consists of two terms. The first term measures the distance between ground-truth images $x$ and reconstructions $f_\theta(y)$ from non-adversarial noisy images $y$, where $y$ contains noise $v$ sampled from a certain common distribution $Q$, such as Gaussian. This term encourages a good reconstruction performance of $f_\theta$ from common distributions. The second term is the distance between $f_\theta(y)$ and the reconstruction $f_\theta(y')$ from the adversarially perturbed version $y'$ of $y$. This term ensures that the reconstructions from any two noisy observations within a small neighborhood of $y$ have similar image qualities. Minimizing these two terms at the same time controls the worst-case reconstruction performance $d(f_\theta(y'), x)$.

The coefficient $\alpha$ balances the trade-off between reconstruction from common noise and the local continuity of $f_\theta$. When $\alpha$ equals zero, HAT degenerates to normal training on common noise. The obtained denoisers fail to resist adversarial perturbations, as shown in Section 3. When $\alpha$ is very large, the optimization gradually ignores the first term and completely aims for local smoothness. This may yield a trivial solution that always outputs a constant vector for any input. A proper value of $\alpha$ thus ensures a denoiser that performs well for common noise and the worst-case adversarial perturbations simultaneously. We perform an ablation study on the effect of $\alpha$ for the robustness enhancement and unseen noise removal in Appendix C.
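The two-term structure and the role of the trade-off coefficient can be made concrete with a small sketch. It assumes a mean squared distance and the weights $1/(1+\alpha)$ and $\alpha/(1+\alpha)$; both are assumptions chosen so that a zero coefficient recovers normal training and a very large one emphasizes smoothness, and the function names and toy denoisers are hypothetical.

```python
def sq_dist(a, b):
    """Mean squared distance between two flattened images."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def hat_loss(denoiser, x, y, y_adv, alpha):
    """Weighted sum of a reconstruction term and a local-smoothness term (sketch)."""
    out_clean = denoiser(y)
    out_adv = denoiser(y_adv)
    recon = sq_dist(out_clean, x)          # reconstruction from non-adversarial noise
    smooth = sq_dist(out_adv, out_clean)   # closeness of the two reconstructions
    return (recon + alpha * smooth) / (1.0 + alpha)  # assumed weighting


def shrink(y):
    """Toy denoiser that scales every pixel (assumption)."""
    return [0.8 * v for v in y]


def constant(y):
    """Degenerate denoiser that ignores its input."""
    return [0.5 for _ in y]


x = [0.2, 0.4, 0.6, 0.8]
y = [0.25, 0.35, 0.65, 0.75]
y_adv = [0.28, 0.34, 0.66, 0.72]   # hypothetical zero-mean perturbation of y
print(hat_loss(shrink, x, y, y_adv, alpha=1.0))
print(hat_loss(constant, x, y, y_adv, alpha=1e6))
```

The second call shows the trivial solution mentioned above: a constant output has zero smoothness penalty, so with a huge coefficient its loss becomes tiny even though its reconstruction is poor.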

To train a denoiser applicable to different levels of noise with an energy-density bounded by $\epsilon^2$, we randomly select a noise distribution $Q$ from a family of common distributions $\mathcal{Q}^\epsilon$. $\mathcal{Q}^\epsilon$ includes a variety of zero-mean distributions whose variances are bounded by $\epsilon^2$. For example, we define $\mathcal{Q}^\epsilon = \{\mathcal{N}(0, \sigma^2 I) \mid \sigma \in [0, \epsilon]\}$ for the experiments in the remainder of this paper.

4.2 Robustness Enhancement via HAT

Figure 3: From left to right are (a) the ground-truth, (b) the reconstruction of a normally-trained (NT) denoiser against attack, and (c) the reconstruction of a HAT-trained denoiser against attack.

We follow the same settings as those in Section 3 for training and evaluating $\epsilon$-denoisers. The highest level of noise used for training is set to be . Noise is sampled from a set of Gaussian distributions . We train deep denoisers with the HAT strategy and set $\alpha$ to be , and use one-step Atk- to generate adversarially noisy images for training. We compare HAT with normal training (NT) and the vanilla adversarial training (vAT) used in Choi et al. [choi_deep_2021], which trains denoisers only with adversarial data. The results on Set68 and BSD68 are provided in this section. More results on Set12 and Kodak24 (in Tables B.1 and B.2) are provided in Appendix B.

Method    Gaussian      Atk-          Atk-          Atk-
NT        29.16/0.02    26.20/0.07    24.26/0.12    23.12/0.10
          31.68/0.00    27.98/0.05    26.66/0.04    26.08/0.02
vAT       29.05/0.07    27.02/0.15    25.51/0.32    24.34/0.34
          31.53/0.09    28.74/0.16    27.43/0.19    26.68/0.15
HAT       28.88/0.04    27.48/0.10    26.40/0.16    25.32/0.17
          31.36/0.03    29.52/0.01    28.34/0.03    27.34/0.03
Table 2: The average PSNR (in dB) results of DnCNN-B denoisers on the gray-scale Set68 dataset. NT and HAT are compared in terms of the removal of Gaussian noise and adversarial noise. We repeat the training three times and report the mean and standard deviation (mean/std).

From Tables 2 and 3, we observe that HAT obviously improves the reconstruction performance from adversarial noise in comparison to normal training. For example, on the Set68 dataset (Table 2), when dealing with -level noise, the normally-trained denoiser achieves 31.68 dB for Gaussian noise removal, but the PSNR drops to 26.08 dB against Atk-. In contrast, the HAT-trained denoiser achieves a PSNR of 27.34 dB (1.26 dB higher) against Atk- and maintains a PSNR of 31.36 dB for Gaussian noise removal. In Figure 3, we can see that when dealing with adversarially noisy images, the HAT-trained denoiser can recover high-quality images while the normally-trained denoiser preserves noise patterns in the output. Besides, we observe that, similar to image classification tasks [22], AT-based methods (HAT and vAT) robustify deep denoisers at the expense of the performance on non-adversarial data (Gaussian denoising). Nevertheless, the degraded reconstructions are still reasonably good in terms of the PSNR.

Method    Gaussian      Atk-          Atk-          Atk-
NT        31.25/0.11    28.93/0.08    27.44/0.08    26.08/0.06
          33.98/0.11    31.09/0.10    29.31/0.08    27.84/0.04
vAT       30.64/0.02    28.81/0.03    27.67/0.01    26.64/0.03
          33.45/0.06    31.10/0.05    29.79/0.02    28.63/0.08
HAT       30.98/0.03    29.18/0.03    28.02/0.02    26.93/0.04
          33.67/0.04    31.38/0.04    30.03/0.02    28.80/0.01
Table 3: The average PSNR (in dB) results of DnCNN-C denoisers on the RGB BSD68 dataset.
Dataset BM3D DIP N2S(1) NT vAT HAT N2C
PolyU 37.40 / 0.00 36.08 / 0.01 35.37 / 0.15 35.86 / 0.01 36.77 / 0.00 37.82 / 0.04 – / –
CC 35.19 / 0.00 34.64 / 0.06 34.33 / 0.14 33.56 / 0.01 34.49 / 0.10 36.26 / 0.06 – / –
SIDD 25.65 / 0.00 26.89 / 0.02 26.51 / 0.03 27.20 / 0.70 27.08 / 0.28 33.44 / 0.02 33.50 / 0.03
Table 4: Comparison of different methods for denoising real-world noisy images in terms of PSNR (dB). We repeat the experiments of each denoising method for three times and report the mean/standard deviation of PSNR values.

4.3 Robustness Benefits Generalization to Unseen Noise

It has been shown that denoisers that are normally trained on common synthetic noise fail to remove real-world noise induced by standard imaging procedures [19, 1]. To train denoisers that can handle real-world noise, researchers have proposed several methods which can be roughly divided into two categories, namely dataset-based denoising methods and single-image-based denoising methods. High-performance dataset-based methods require a set of real noisy data for training, e.g., CBDNet requiring pairs of clean and noisy images [10] and Noise2Noise requiring multiple noisy observations of every single image [12]. However, a large number of paired data are not available in some applications, such as medical radiology and high-speed photography. To address this, single-image-based methods are proposed to remove noise by exploiting the correlation between signals across pixels and the independence between noise. This category of methods, such as DIP [17] and N2S [3], are adapted to various types of signal-independent noise, but they optimize the deep denoiser on each test image. The test-time optimization is extremely time-consuming, e.g., N2S needs to update a denoiser for thousands of iterations to achieve good reconstruction performance.

Here, we point out that HAT is a promising framework to train a generalizable deep denoiser only with synthetic noise. The resultant denoiser can be directly applied to denoise unseen noisy images in real time. During training, HAT first samples noise from common distributions (Gaussian) with noise levels from low to high. ObsAtk then explores the $\rho$-neighborhood of each noisy image to search for a particular type of noise that degrades the denoiser the most. By ensuring the denoising performance on the worst-case noise, the resultant denoiser can deal with other unknown types of noise within the $\rho$-neighborhood as well. To train a robust denoiser that generalizes well to real-world noise, we need to choose a proper adversarial budget $\rho$. When $\rho$ is very small and close to zero, HAT reduces to normal training. When $\rho$ is much larger than the norm of the basic noise $v$, the adversarially noisy image may be visually unnatural because the adversarial perturbation $\delta$ only satisfies the zero-mean constraint and is not guaranteed to be spatially uniformly distributed as other types of natural noise are. In practice, we set the value of $\rho$ of ObsAtk to be , where $m$ denotes the size of image patches. The value of $\alpha$ of HAT is kept unchanged as .

Experimental Settings

We evaluate the generalization capability of HAT on several real-world noisy datasets, including PolyU [18], CC [19], and SIDD [1]. PolyU, CC, and SIDD contain RGB images of common scenes in daily life. These images are captured by different brands of digital cameras and smartphones, and they contain various levels of noise obtained by adjusting the ISO values. For PolyU and CC, we use the clean images in BSD500 for training an adversarially robust $\epsilon$-denoiser with . We sample Gaussian noise from a set of distributions and add the noise to clean images to craft noisy observations. HAT trains the denoiser jointly with Gaussian noisy images and their adversarial versions. For SIDD, we use clean images in the SIDD-small set for training and test the denoisers on the SIDD-val set. The highest level of noise used for HAT is set to be . In each case, we only use clean images for training denoisers, without the need for real noisy images.


We compare HAT-trained denoisers with the NT and vAT-trained ones. From Table 4, we observe that HAT performs much better than both competitors. For example, on the SIDD-val dataset, the HAT-trained denoiser achieves an average PSNR value of 33.44 dB, which is 6.24 dB higher than the NT-trained one. We also compare HAT-trained denoisers with single-image-based methods, including DIP, N2S, and the classical BM3D [6]. For DIP and N2S (the officially released code of each is used here), the numbers of iterations for each image are set to 2,000 and 1,000, respectively. N2S works in two modes, namely single-image-based denoising and dataset-based denoising. Here, we use N2S in the single-image-based mode, denoted as N2S(1), due to the assumption that no real noisy data are available for training. We observe that HAT-trained denoisers consistently outperform these baselines. Visual comparisons are provided in Appendix D. Besides, since SIDD-small provides a set of real noisy and ground-truth pairs, we train a denoiser, denoted as Noise2Clean (N2C), with these paired data and use the N2C denoiser as the oracle for comparison. We observe that HAT-trained denoisers are comparable to the N2C one for denoising images in SIDD-val (a PSNR of 33.44 dB vs. 33.50 dB).

5 Conclusion

Normally-trained deep denoisers are vulnerable to adversarial attacks. HAT can effectively robustify deep denoisers and boost their generalization capability to unseen real-world noise. In the future, we will extend the adversarial-training framework to other image restoration tasks, such as deblurring. We aim to develop a generic AT-based robust optimization framework to train deep models that can recover clean images from unseen types of degradation.

6 Acknowledgments

HY and VYFT are funded by a Singapore National Research Foundation (NRF) Fellowship (R-263-000-D02-281).
JZ was supported by JST ACT-X Grant Number JPMJAX21AF.
MS was supported by JST CREST Grant Number JPMJCR18A2.


  • [1] A. Abdelhamed, S. Lin, and M. S. Brown (2018) A High-Quality Denoising Dataset for Smartphone Cameras. In CVPR. ISBN 978-1-5386-6420-9.
  • [2] S. Anwar and N. Barnes (2019) Real Image Denoising with Feature Attention. In ICCV.
  • [3] J. Batson and L. Royer (2019) Noise2Self: Blind Denoising by Self-Supervision. In ICML.
  • [4] S. Boyd and L. Vandenberghe (2004) Convex Optimization. Cambridge University Press.
  • [5] N. Carlini and D. Wagner (2017) Towards Evaluating the Robustness of Neural Networks. In IEEE Symposium on Security and Privacy (SP).
  • [6] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian (2007) Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE TIP 16. ISSN 1941-0042.
  • [7] R. Gao, T. Cai, H. Li, C. Hsieh, L. Wang, and J. D. Lee (2019) Convergence of Adversarial Training in Overparametrized Neural Networks. In NeurIPS.
  • [8] L. Gondara (2016) Medical Image Denoising Using Convolutional Denoising Autoencoders. In ICDMW.
  • [9] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and Harnessing Adversarial Examples. arXiv:1412.6572 [cs, stat].
  • [10] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang (2019) Toward Convolutional Blind Denoising of Real Photographs. In CVPR. ISBN 978-1-72813-293-8.
  • [11] A. Kurakin, I. Goodfellow, and S. Bengio (2017) Adversarial Examples in the Physical World. arXiv:1607.02533 [cs, stat].
  • [12] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila (2018) Noise2Noise: Learning Image Restoration without Clean Data. In ICML.
  • [13] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards Deep Learning Models Resistant to Adversarial Attacks. In ICLR.
  • [14] A. Shafahi, M. Najibi, A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein (2019) Adversarial Training for Free!. In NeurIPS.
  • [15] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing Properties of Neural Networks. In ICLR.
  • [16] F. Tramer, N. Carlini, W. Brendel, and A. Madry (2020) On Adaptive Attacks to Adversarial Example Defenses. arXiv:2002.08347 [cs, stat].
  • [17] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2018) Deep Image Prior. In CVPR.
  • [18] J. Xu, H. Li, Z. Liang, D. Zhang, and L. Zhang (2018) Real-world Noisy Image Denoising: A New Benchmark. arXiv:1804.02603 [cs].
  • [19] J. Xu, L. Zhang, D. Zhang, and X. Feng (2017) Multi-channel Weighted Nuclear Norm Minimization for Real Color Image Denoising. In ICCV.
  • [20] H. Yan, J. Du, V. Tan, and J. Feng (2019) On Robustness of Neural Ordinary Differential Equations. In ICLR.
  • [21] H. Yan, J. Zhang, G. Niu, J. Feng, V. Tan, and M. Sugiyama (2021) CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection. In ICML.
  • [22] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan (2019) Theoretically Principled Trade-off between Robustness and Accuracy. In ICML.
  • [23] J. Zhang, X. Xu, B. Han, G. Niu, L. Cui, M. Sugiyama, and M. Kankanhalli (2020) Attacks Which Do Not Kill Training Make Adversarial Learning Stronger. In ICML.
  • [24] J. Zhang, J. Zhu, G. Niu, B. Han, M. Sugiyama, and M. Kankanhalli (2020) Geometry-aware Instance-reweighted Adversarial Training. In ICLR.
  • [25] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE TIP. ISSN 1057-7149, 1941-0042.
  • [26] K. Zhang, W. Zuo, and L. Zhang (2018) FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising. IEEE TIP. ISSN 1057-7149, 1941-0042.
  • [27] Y. Zhang, Y. Zhu, E. Nichols, Q. Wang, S. Zhang, C. Smith, and S. Howard (2019) A Poisson-Gaussian Denoising Dataset with Real Fluorescence Microscopy Images. In CVPR.

Appendix A Two-step Projection

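Theorem 1

For any arbitrary vector $\delta \in \mathbb{R}^m$, its projection onto the region defined by the intersection of the norm-bounded and zero-mean constraints is equivalent to the projection first onto the zero-mean hyperplane $\mathcal{C}$ followed by the projection onto the $\ell_2$-ball $\mathcal{B}$, i.e.,

$$\Pi_{\mathcal{B}\cap\mathcal{C}}(\delta) = \Pi_{\mathcal{B}}\big(\Pi_{\mathcal{C}}(\delta)\big), \tag{6}$$

where $\mathcal{B} = \{z \in \mathbb{R}^m : \|z\|_2 \le \rho\}$ with $\rho > 0$, $\mathcal{C} = \{z \in \mathbb{R}^m : \mathbf{1}^\top z = 0\}$, and $\mathbf{1}$ is the all-ones vector in $\mathbb{R}^m$.

Figure A.1: Illustration of Theorem 1. The red dotted lines show the perturbation $\delta$ being projected sequentially onto the zero-mean hyperplane and then onto the $\ell_2$-ball. The blue dotted line shows the exact projection of $\delta$ onto $\mathcal{B}\cap\mathcal{C}$.

Let us consider the RHS of Eq. (6) first. It is easy to derive the projections onto $\mathcal{C}$ and $\mathcal{B}$ separately:

$$\Pi_{\mathcal{C}}(\delta) = \delta - \frac{\mathbf{1}^\top \delta}{m}\,\mathbf{1}, \tag{7}$$

$$\Pi_{\mathcal{B}}(\delta) = \begin{cases} \delta, & \|\delta\|_2 \le \rho, \\ \rho\,\delta / \|\delta\|_2, & \|\delta\|_2 > \rho. \end{cases} \tag{8}$$

Thus, we have

$$\Pi_{\mathcal{B}}\big(\Pi_{\mathcal{C}}(\delta)\big) = \begin{cases} \Pi_{\mathcal{C}}(\delta), & \|\Pi_{\mathcal{C}}(\delta)\|_2 \le \rho, \\ \rho\,\Pi_{\mathcal{C}}(\delta) / \|\Pi_{\mathcal{C}}(\delta)\|_2, & \|\Pi_{\mathcal{C}}(\delta)\|_2 > \rho. \end{cases} \tag{9}$$

Now let us consider the LHS of Eq. (6). The projection onto $\mathcal{B}\cap\mathcal{C}$ can be formulated as the solution of the following convex optimization problem:

$$\min_{z}\ \tfrac{1}{2}\|z - \delta\|_2^2 \quad \text{s.t.} \quad \tfrac{1}{2}\big(\|z\|_2^2 - \rho^2\big) \le 0, \quad \mathbf{1}^\top z = 0, \tag{10}$$

where $z \in \mathbb{R}^m$ is the optimization variable. We can write the Lagrangian, $\mathcal{L}$, associated with the problem (10) as

$$\mathcal{L}(z, \lambda, \mu) = \tfrac{1}{2}\|z - \delta\|_2^2 + \tfrac{\lambda}{2}\big(\|z\|_2^2 - \rho^2\big) + \mu\,\mathbf{1}^\top z. \tag{11}$$

Since there exists a $z$, e.g., $z = \mathbf{0}$, such that $\|z\|_2^2 < \rho^2$ and $\mathbf{1}^\top z = 0$, the problem (10) is strictly feasible, i.e., it satisfies Slater's condition [4]. Moreover, the problem is convex and the objective and the constraints are all differentiable; thus the KKT conditions in Eq. (12) are necessary and sufficient for optimality.

\begin{align}
(1 + \lambda)\,z - \delta + \mu\,\mathbf{1} &= 0, \tag{12a} \\
\|z\|_2^2 - \rho^2 &\le 0, \tag{12b} \\
\mathbf{1}^\top z &= 0, \tag{12c} \\
\lambda &\ge 0, \tag{12d} \\
\lambda\big(\|z\|_2^2 - \rho^2\big) &= 0. \tag{12e}
\end{align}

We obtain the optimal solution by considering the following two cases separately, i.e., $\|\Pi_{\mathcal{C}}(\delta)\|_2 \le \rho$ and $\|\Pi_{\mathcal{C}}(\delta)\|_2 > \rho$.

Case-(1): $\|\Pi_{\mathcal{C}}(\delta)\|_2 \le \rho$.
If $\lambda = 0$, then Eq. (12) reduces to the following equations:

$$z - \delta + \mu\,\mathbf{1} = 0, \qquad \mathbf{1}^\top z = 0, \qquad \|z\|_2 \le \rho. \tag{13}$$

We can easily solve these equations: multiplying the first one by $\mathbf{1}^\top$ gives $\mu = \mathbf{1}^\top\delta / m$, and hence

$$z = \delta - \frac{\mathbf{1}^\top \delta}{m}\,\mathbf{1} = \Pi_{\mathcal{C}}(\delta), \tag{14}$$

which satisfies $\|z\|_2 \le \rho$ by the assumption of this case.

If $\lambda > 0$, then Eq. (12) reduces to the following set of equations:

\begin{align}
(1 + \lambda)\,z - \delta + \mu\,\mathbf{1} &= 0, \tag{15a} \\
\mathbf{1}^\top z &= 0, \tag{15b} \\
\|z\|_2 &= \rho. \tag{15c}
\end{align}

According to (15a) and (15b), we obtain that $\Pi_{\mathcal{C}}(\delta) = (1 + \lambda)\,z$, which by (15c) has a norm of $(1 + \lambda)\rho$, strictly larger than $\rho$. This contradicts the constraint $\|\Pi_{\mathcal{C}}(\delta)\|_2 \le \rho$. Thus, for the case of $\|\Pi_{\mathcal{C}}(\delta)\|_2 \le \rho$, we have that $\Pi_{\mathcal{B}\cap\mathcal{C}}(\delta) = \Pi_{\mathcal{C}}(\delta)$, which is equal to $\Pi_{\mathcal{B}}(\Pi_{\mathcal{C}}(\delta))$ in Eq. (9).

Case-(2): $\|\Pi_{\mathcal{C}}(\delta)\|_2 > \rho$.
Let $\hat{z} = \Pi_{\mathcal{B}}(\Pi_{\mathcal{C}}(\delta)) = \rho\,\Pi_{\mathcal{C}}(\delta) / \|\Pi_{\mathcal{C}}(\delta)\|_2$. Since $\hat{z} \in \mathcal{C}$ and $\|\hat{z}\|_2 = \rho$, we have $\hat{z} \in \mathcal{B}\cap\mathcal{C}$. For any other point $z \in \mathcal{B}\cap\mathcal{C}$ with $z \ne \hat{z}$, we have

$$\|z - \delta\|_2^2 = \|z - \Pi_{\mathcal{C}}(\delta)\|_2^2 + \|\Pi_{\mathcal{C}}(\delta) - \delta\|_2^2 > \|\hat{z} - \Pi_{\mathcal{C}}(\delta)\|_2^2 + \|\Pi_{\mathcal{C}}(\delta) - \delta\|_2^2 = \|\hat{z} - \delta\|_2^2,$$

where the two equalities hold because $\mathcal{C}$ is the set of points from a hyperplane, so that $\delta - \Pi_{\mathcal{C}}(\delta)$ is orthogonal to $z - \Pi_{\mathcal{C}}(\delta)$ for every $z \in \mathcal{C}$, and the strict inequality holds because $\hat{z}$ is the unique closest point to $\Pi_{\mathcal{C}}(\delta)$ in $\mathcal{B}$. Thus, $z$ is not the projection $\Pi_{\mathcal{B}\cap\mathcal{C}}(\delta)$. Therefore, $\Pi_{\mathcal{B}\cap\mathcal{C}}(\delta) = \hat{z} = \Pi_{\mathcal{B}}(\Pi_{\mathcal{C}}(\delta))$.

In summary, we show that $\Pi_{\mathcal{B}\cap\mathcal{C}}(\delta) = \Pi_{\mathcal{B}}(\Pi_{\mathcal{C}}(\delta))$ for any arbitrary $\delta \in \mathbb{R}^m$.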
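The two-step projection of Theorem 1 (subtract the mean, then rescale onto the ball) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names, the list-based vectors, and the radius name `rho` are assumptions made here. The helper `no_feasible_point_is_closer` is a randomized sanity check only; it samples zero-mean points inside the ball and confirms that none is closer to the input than the two-step result, and it is not a proof.

```python
import math
import random


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def project_zero_mean(v):
    """Projection onto the hyperplane of zero-mean vectors."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def project_l2_ball(v, rho):
    """Projection onto the l2-ball of radius rho centred at the origin."""
    norm = l2_norm(v)
    if norm <= rho:
        return list(v)
    return [rho * x / norm for x in v]


def two_step_projection(v, rho):
    """Zero-mean projection followed by l2-ball projection (Theorem 1)."""
    return project_l2_ball(project_zero_mean(v), rho)


def no_feasible_point_is_closer(v, rho, trials=1000, seed=0, tol=1e-9):
    """Randomized check: no sampled zero-mean point in the ball beats the result."""
    rng = random.Random(seed)
    candidate = two_step_projection(v, rho)
    best = l2_norm([a - b for a, b in zip(candidate, v)])
    for _ in range(trials):
        # Draw a random zero-mean direction and a random radius in [0, rho).
        z = project_zero_mean([rng.gauss(0.0, 1.0) for _ in v])
        norm = l2_norm(z)
        if norm == 0.0:
            continue
        radius = rho * rng.random()
        z = [radius * x / norm for x in z]
        if l2_norm([a - b for a, b in zip(z, v)]) < best - tol:
            return False
    return True
```

For example, `two_step_projection([3.0, 1.0], 10.0)` removes the mean and leaves the result inside the ball untouched, whereas a smaller `rho` additionally rescales the zero-mean vector onto the ball's boundary.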

Appendix B Experiments of Robustness Enhancement on Set12 and Kodak24

We compare the robustness of deep denoisers trained via three strategies, i.e., NT, vAT, and HAT. The results on Set12 and Kodak24 are provided in Table B.1 and Table B.2, respectively. We observe that HAT can effectively robustify deep denoisers. The reconstruction quality of HAT-trained denoisers on adversarially noisy images is clearly better than that of the NT- and vAT-trained ones.

Training                  Atk-          Atk-          Atk-
NT         30.39/0.01     26.51/0.14    24.32/0.18    22.96/0.13
           32.78/0.00     28.50/0.08    26.91/0.05    26.25/0.01
vAT        30.25/0.08     27.56/0.06    25.82/0.04    24.33/0.04
           32.63/0.09     29.37/0.17    27.83/0.15    26.91/0.08
HAT        30.01/0.06     27.96/0.15    26.46/0.20    25.13/0.19
           32.47/0.04     29.95/0.03    28.45/0.04    27.20/0.03
Table B.1: The average PSNR (in dB) results of DnCNN-B denoisers on the gray-scale Set12 dataset.
Training                  Atk-          Atk-          Atk-
NT         32.20/0.13     29.57/0.09    27.87/0.08    26.37/0.07
           34.77/0.13     31.54/0.11    29.55/0.07    28.00/0.04
vAT        31.44/0.01     29.41/0.05    28.13/0.06    26.98/0.02
           34.14/0.08     31.53/0.11    30.06/0.08    28.78/0.06
HAT        31.83/0.04     29.85/0.02    28.56/0.02    27.34/0.05
           34.36/0.06     31.84/0.05    30.37/0.02    29.05/0.01
Table B.2: The average PSNR (in dB) results of DnCNN-C denoisers on the RGB Kodak24 dataset.

Appendix C Ablation study

C.1 Effect of the Trade-off Coefficient on Robustness Enhancement and Generalization to Real-world Noise

Here, we evaluate the effect of the trade-off coefficient in HAT on the adversarial robustness and the generalization capability to real-world noise. We train deep denoisers on the RGB BSD500 dataset (excluding the 68 images held out for testing). The obtained denoisers are tested on the BSD68 dataset for Gaussian and adversarial noise removal. The generalization capability is evaluated on two datasets of real-world noisy images, i.e., PolyU and CC. Experimental settings follow those in Section 4.2.

Figure C.1 corroborates the analysis in Section 4.1 that the coefficient balances the trade-off between reconstruction from common noise and adversarial robustness. We also find that the generalization capability to real-world noise is correlated with the adversarial robustness. Specifically, good adversarial robustness usually implies good generalization to real-world noise. In Figure C.1, the best robustness and the best performance on real-world noise appear at intermediate values of the coefficient. When the coefficient is too large or too small, the robustness and generalization worsen. For noise sampled from Gaussian distributions, increasing the coefficient degrades the denoising performance. In summary, we choose the coefficient from this intermediate range to achieve a good balance between the denoising performance on common noise and the adversarial robustness as well as real-world generalization.

Figure C.1: Ablation study on the effect of the trade-off coefficient in HAT. Green lines show the denoising results on non-adversarial noise sampled from common distributions, here Gaussian noise with a bounded energy density. Blue lines show the denoising results on adversarially perturbed noisy images, where the adversarial noise is crafted by ObsAtk-5 with a bounded energy density. Red lines show the denoising results on real-world noisy images.

C.2 Effect of the Adversarial Budget on Generalization to Real-world Noise

Here, we evaluate the effect of the adversarial budget used in HAT on the generalization capability to real-world noise. We train deep denoisers on the RGB BSD500 dataset (excluding the 68 images held out for testing) and evaluate the generalization capability on two real-world datasets, namely PolyU and CC. The trade-off coefficient is kept fixed. The adversarial budget of ObsAtk, which generates the adversarially noisy images for HAT, is varied over a set of values defined relative to the size of the images. Other experimental settings follow those in Section 4.2.

Figure C.2 corroborates the analysis in Section 4.3. When the adversarial budget is very small and close to zero, HAT reduces to normal training, and the resultant denoisers cannot effectively remove real-world noise. When the budget is much larger than the norm of the basic noise, the statistics of the adversarial noise may become very unnatural, because the adversarial perturbation might concentrate on certain regions, such as edges or textures, instead of being spatially uniformly distributed like other types of natural noise. We can see that, beyond a certain budget, the denoising performance on real-world datasets starts to decrease. In practice, we therefore choose a moderate budget for ObsAtk to train generalizable denoisers.

Figure C.2: Ablation study on the effect of the adversarial budget in HAT.

Appendix D Visual results of real-world noise removal

We show the denoising results on the SIDD-val set in Figure D.1. We observe that the HAT-trained denoiser can effectively remove the real-world noise, while the normally-trained one retains much noise in its reconstructions. Moreover, the HAT-trained denoiser outperforms the other baseline methods and produces much cleaner results.

(a) Noisy
(b) BM3D
(c) DIP
(d) N2S
(e) NT
(f) vAT
(g) HAT
(h) Ground-truth
Figure D.1: Comparison of different denoisers on the SIDD-val set. From left to right are the input noisy image, the reconstructions of different denoisers, including BM3D, DIP, N2S, NT-trained DnCNN, vAT-trained DnCNN, and HAT-trained DnCNN, and the ground truth. We can see that the HAT-trained denoiser performs the best among the compared methods.