1 Introduction
Image denoising, which aims to reconstruct clean images from their noisy observations, is a vital part of image processing systems. The noisy observations are usually modeled as the addition of ground-truth images and zero-mean noise maps [6, 25]. Recently, deep learning-based methods have made significant advances in denoising tasks [25, 2] and have been applied in many areas, including medical imaging [8] and photography [1]. Despite the success of deep denoisers in recovering high-quality images from certain types of noisy images, we still lack knowledge about their robustness against adversarial perturbations, which may cause severe safety hazards in high-stakes applications such as medical diagnosis. To address this problem, the first step should be to develop attack methods dedicated to denoising so as to evaluate the robustness of denoisers. In contrast to attacks on classification [9, 13], attacks on denoising should consider not only the adversarial budget but also certain assumptions about natural noise, such as the zero-mean property, because some perturbations, such as adding a constant value, do not necessarily result in visual artifacts. Although Choi et al. [choi_deep_2021, choi_evaluating_2019] studied the vulnerability of various deep image processing models, they directly applied attacks from classification. To the best of our knowledge, no attacks have been truly dedicated to the denoising task until now. To this end, we propose the observation-based zero-mean attack (ObsAtk), which crafts a worst-case zero-mean perturbation for a noisy observation by maximizing the distance between the output and the ground-truth. To ensure that the perturbation satisfies the adversarial-budget and zero-mean constraints, we utilize the classical projected-gradient-descent (PGD) [13]
method for optimization, and develop a two-step operation to project the perturbation back into the feasible region. Specifically, in each iteration, we first project the perturbation onto the zero-mean hyperplane. Then, we linearly rescale the perturbation so that its norm is less than or equal to the adversarial budget. We examine the effectiveness of
ObsAtk on several benchmark datasets and find that deep image denoisers are indeed susceptible to ObsAtk: the denoisers cannot remove adversarial noise completely and even yield atypical artifacts, as shown in Figure 1(g). To robustify deep denoisers against adversarial perturbations, we propose an effective adversarial training strategy, namely hybrid adversarial training (HAT
), to train denoisers using adversarially noisy images and non-adversarial noisy images together. The loss function of HAT consists of two terms. The first term ensures good reconstruction performance on common non-adversarial noisy images, and the second term encourages the reconstructions from non-adversarial and adversarial images to be close to each other. Thus, we can obtain denoisers that perform well on both non-adversarial noisy images and their adversarially perturbed versions. Extensive experiments on benchmark datasets verify the effectiveness of HAT. Moreover, we reveal that adversarial robustness benefits the generalization capability to unseen types of noise, i.e., HAT can train denoisers for real-world noise removal using only synthetic noise sampled from common distributions like Gaussians. This is because ObsAtk searches for the worst-case perturbations around noisy images of different levels, and training with adversarial data ensures the denoising performance on various types of noise. In contrast, other methods for real-world denoising [10, 12] mostly require a large number of real-world noisy data for training, which are unfortunately not available in some applications such as medical radiology. We conduct experiments on several real-world datasets. Numerical and visual results demonstrate the effectiveness of HAT for real-world noise removal.
In summary, there are three main contributions in this work: 1) We propose a novel attack, ObsAtk, to generate adversarial examples for noisy observations, which facilitates the evaluation of the robustness of deep image denoisers. 2) We propose an effective adversarial training strategy, HAT, for robustifying deep image denoisers. 3) We build a connection between adversarial robustness and the generalization to unseen noise, and show that HAT serves as a promising framework for training generalizable deep image denoisers.
2 Notation and Background
Adversarial robustness and adversarial training
Consider a deep neural network (DNN)
mapping an input to a target. The model is trained to minimize a loss function given by a particular distance between the output and the target. In high-stakes applications, the DNN should resist small perturbations of the input data and map the perturbed input to a result close to the target. The notion of robustness has been proposed to measure the resistance of DNNs to slight changes of the input [15, 9]. The robustness is characterized by the distance between the output of the worst-case perturbed input and the target, where the worst-case perturbed input lies within a small neighborhood of the original input and maximizes the distance between its output and the target:
max_{x′ ∈ N(x)} D(f(x′), t),   (1)
where f denotes the DNN, x the original input, N(x) its small neighborhood, t the target, and D the distance.
The worst-case perturbation can be approximated via many adversarial attack methods, such as FGSM [9], I-FGSM [11], and PGD [13], which solve (1) via gradient-based methods. The resulting distance is an indication of the robustness of the model around the input: a small distance implies strong robustness and vice versa. For image classification, the neighborhood is usually defined by the norm and the distance is measured by the cross-entropy loss [13] or a margin loss [5]. For image restoration, the distance between images is usually measured by the norm [25].
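Such attacks can be sketched as a generic ℓ∞ PGD loop. The toy quadratic loss, step size, and iteration count below are illustrative assumptions, not the paper's configuration; for classification, `grad_fn` would instead return the gradient of the cross-entropy loss with respect to the input.

```python
import numpy as np

def pgd_attack(x0, grad_fn, eps=0.1, step=0.02, iters=10):
    """Generic l-infinity PGD sketch: ascend the loss via signed
    gradients and project back into the eps-box around x0."""
    x = x0.copy()
    for _ in range(iters):
        x = x + step * np.sign(grad_fn(x))  # gradient-ascent step
        x = np.clip(x, x0 - eps, x0 + eps)  # projection onto the box
    return x

# Toy example: maximize ||x - t||^2 around x0 = 0 with target t = 1,
# so the attack pushes x to the boundary farthest from t.
t = np.ones(3)
x_adv = pgd_attack(np.zeros(3), lambda x: 2.0 * (x - t), eps=0.1)
```

The `np.sign`/`np.clip` pair is the standard ℓ∞ variant; an ℓ2 variant would normalize the gradient and rescale into a ball instead.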
In most cases, deep learning models obtained via normal training (NT) have been shown to be vulnerable to adversarial attacks [16, 20]. To robustify DNNs, Madry et al. [13] proposed the PGD adversarial training (AT) method, which trains DNNs with adversarial examples of the original data. AT is formally formulated as the following min-max optimization problem:
min_θ E_{(x,t)} max_{x′ ∈ N(x)} D(f_θ(x′), t),   (2)
where f_θ denotes the DNN with parameters θ, (x, t) an input-target pair, D the distance, and N(x) a small neighborhood of x.
Its effectiveness has been verified by extensive empirical and theoretical results [21, 7]. For further improvement, many variants of PGD-based AT have been proposed in terms of robustness enhancement [22], generalization to non-adversarial data [23], and computational efficiency [14].
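As an illustration of the min-max formulation in (2), the following toy AT loop trains a scalar linear model against worst-case input shifts. The model, data, and hyperparameters are all illustrative assumptions; for this one-dimensional quadratic loss, the inner maximization is solved exactly by checking the two boundary shifts rather than running PGD.

```python
import numpy as np

def adversarial_training(xs, ts, eps=0.1, lr=0.05, epochs=50):
    """Toy AT loop for a scalar linear model f(x) = w * x: each epoch
    finds the worst-case input shift in [-eps, eps] for every sample,
    then takes a gradient-descent step on the loss at that worst case."""
    w = 0.0
    for _ in range(epochs):
        g = 0.0
        for x, t in zip(xs, ts):
            # Inner maximization: the quadratic loss is maximized at a
            # boundary of the interval [x - eps, x + eps].
            x_adv = max([x - eps, x + eps], key=lambda v: (w * v - t) ** 2)
            g += 2.0 * (w * x_adv - t) * x_adv  # d/dw of the worst-case loss
        w -= lr * g / len(xs)                   # outer minimization step
    return w

# Data generated by w = 2; AT recovers a value close to 2, slightly
# biased by the adversarial shifts.
w_robust = adversarial_training([1.0, 2.0], [2.0, 4.0])
```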
Deep image denoising
During image capturing, unknown types of noise may be induced by physical sensors, data compression, and transmission. Noisy observations are usually modeled as the addition of the ground-truth image and certain zero-mean noise [6, 27]. The random clean image follows some distribution, and the noise distribution satisfies the zero-mean constraint. Denoising techniques aim to recover clean images from their noisy observations [25, 6]. Given a training set of noisy and clean image pairs, we can train a DNN to effectively remove the noise from the noisy observations. A series of DNNs have been developed for denoising in recent years, including DnCNN [25], FFDNet [26], and RIDNet [2]. In real-world applications [1, 19], the noise distribution is usually unknown due to the complexity of the image capturing procedure; besides, collecting a large number of image pairs (clean/noisy or noisy/noisy) for training may be unrealistic in safety-critical domains such as medical radiology [27]. To overcome these issues, researchers have developed denoising techniques that approximate real noise with common distributions such as Gaussian or Poisson [6, 27]. To train denoisers that can deal with different levels of noise, where the noise level is measured by the energy-density of the noise, the training set may consist of noisy images sampled from a variety of noise distributions [25], whose expected energy-densities range from zero to a certain budget. For example, the noise of each training image may be drawn from a Gaussian distribution that is randomly selected from a set of Gaussian distributions. The denoiser trained with such a training set is termed an denoiser.
On robustness of deep image denoisers
In practice, data storage and transmission may induce imperceptible perturbations on the original data, so that the perturbed noise may be statistically slightly different from noise sampled from the original distribution. Although an denoiser can successfully remove noise sampled from its training distributions, its performance on perturbed data is not guaranteed. Thus, we propose a novel attack method, ObsAtk, to assess the adversarial robustness of deep image denoisers (DIDs) in Section 3. To robustify DIDs, we propose an adversarial training strategy, HAT, in Section 4. HAT-trained DIDs can effectively denoise adversarially perturbed noisy images and preserve good performance on non-adversarial data.
Besides the adversarial robustness issue, it has been shown that denoisers trained in this manner cannot generalize well to unseen real-world noise [12, 3]. Several methods have been proposed for real-world noise removal, but most of them require a large number of real noisy data for training, e.g., CBDNet (clean/noisy pairs) [10] and Noise2Noise (noisy pairs) [12], which is sometimes impractical. In Section 4.3, we show that HAT-trained DIDs can generalize well to unseen real noise without the need to utilize real noisy images for training.
3 ObsAtk for Robustness Evaluation
In this section, we propose a novel adversarial attack, the Observation-based Zero-mean Attack (ObsAtk), to evaluate the robustness of DIDs. We also conduct experiments on benchmark datasets to demonstrate that normally-trained DIDs are vulnerable to adversarial perturbations.
3.1 Observation-based Zero-mean Attack
An denoiser can generate a high-quality reconstruction close to the ground-truth from a noisy observation. To evaluate the robustness of the denoiser with respect to a perturbation of the observation, we develop an attack that searches for the worst-case perturbation degrading the recovered image as much as possible. Formally, we solve the problem stated in Eq. (3). The optimization is subject to two constraints: the first requires the norm of the perturbation to be bounded by a small adversarial budget; the second restricts the mean of all elements of the perturbation to be zero. This corresponds to the zero-mean assumption on noise in real-world applications, because a small mean-shift does not necessarily result in visual noise. For example, a mean-shift in grayscale images implies a slight change of brightness. Since the zero-mean perturbation is added to a noisy observation, we term the proposed attack the Observation-based Zero-mean Attack (ObsAtk).
max_δ ‖f(y + δ) − x‖₂²   (3a)
s.t.  ‖δ‖₂ ≤ ε,  1ᵀδ = 0,   (3b)
where f denotes the denoiser, y the noisy observation, x the ground-truth, δ the perturbation, and ε the adversarial budget.
We solve the constrained optimization problem in Eq. (3) by using the classical projected-gradient-descent (PGD) method. PGD-like methods update the optimization variables iteratively via gradient descent and ensure that the constraints are satisfied by projecting the variables back onto the feasible region at the end of each iteration. To deal with the norm and zero-mean constraints, we develop a two-step operation in Eq. (4), which first projects the perturbation onto the zero-mean hyperplane and then projects the result onto the neighborhood.
δ ← δ − (1ᵀδ / n) 1   (4a)
δ ← δ · min(1, ε / ‖δ‖₂)   (4b)
where δ is the perturbation, ε the adversarial budget, n the dimension of δ, and 1 the all-ones vector.
In each iteration, as shown in Figure 1, the first step projects the perturbation onto the zero-mean hyperplane. The zero-mean hyperplane consists of all vectors whose elements have zero mean, i.e., all vectors orthogonal to the all-ones vector; the all-ones vector is thus a normal of the zero-mean hyperplane. We can project any vector onto the zero-mean hyperplane via (4a): the projection equals the vector minus its component along the all-ones direction. The second step further projects the result back onto the ball via linear scaling. If the perturbation is already within the ball, we keep it unchanged; otherwise, the final projection is obtained by rescaling it so that its norm equals the budget. For two arbitrary sets, the projection onto their intersection is, in general, not equal to the result obtained by first projecting onto one set and then onto the other; surprisingly, however, this sequential projection is exact for the two sets in (3b).
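The two-step projection can be written in a few lines of NumPy; this is a sketch of the operation described above, with function and variable names of our own choosing.

```python
import numpy as np

def project_zero_mean(v):
    """Project v onto the zero-mean hyperplane {u : sum(u) = 0} by
    removing its component along the all-ones direction."""
    return v - v.mean()

def project_l2_ball(v, eps):
    """Rescale v into the l2 ball of radius eps if it lies outside."""
    norm = np.linalg.norm(v)
    return v if norm <= eps else v * (eps / norm)

def two_step_project(v, eps):
    """Zero-mean projection first, then the ball projection; linear
    rescaling preserves the zero mean, so both constraints hold."""
    return project_l2_ball(project_zero_mean(v), eps)
```

Note the order matters: rescaling a zero-mean vector keeps it zero-mean, whereas subtracting the mean after rescaling could push the norm back above the budget.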
Theorem 1 (Informal)
For any vector, sequentially projecting it onto the zero-mean hyperplane and then onto the norm ball yields exactly its projection onto the intersection of the two constraint sets in (3b).
The formal statement and the proof of Theorem 1 are provided in Appendix A. The complete procedure of ObsAtk is summarized in Algorithm 1.
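Putting the pieces together, the attack can be sketched as PGD ascent with the two-step projection applied in every iteration. Since the paper's denoiser is a DNN, we use a linear smoothing operator `A` as a stand-in so that the gradient of ‖A(y + d) − x‖² with respect to d, namely 2 Aᵀ(A(y + d) − x), is available in closed form; `A`, the step size, and the iteration count are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def obs_atk(y, x, A, eps, step=0.05, iters=5):
    """Sketch of ObsAtk for a linear toy 'denoiser' f(v) = A @ v:
    gradient ascent on ||f(y + d) - x||^2, followed in each
    iteration by the zero-mean and l2-ball projections."""
    d = np.zeros_like(y)
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ (y + d) - x)  # closed-form gradient
        d = d + step * grad                   # ascent step
        d = d - d.mean()                      # zero-mean projection
        n = np.linalg.norm(d)
        if n > eps:
            d = d * (eps / n)                 # l2-ball rescaling
    return d
```

With a deep denoiser, `grad` would instead be obtained by backpropagation through the network.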
3.2 Robustness Evaluation via ObsAtk
We use ObsAtk to evaluate the adversarial robustness of denoisers on several grayscale and RGB benchmark datasets, including Set12, Set68, BSD68, and Kodak24. For grayscale image denoising, we use Train400 to train a DnCNN-B [25] model, which consists of 20 convolutional layers. We follow the training settings in Zhang et al. [25] and randomly crop patches from the training images. Noisy and clean image pairs are constructed by injecting different levels of white Gaussian noise into the clean patches, with the noise levels selected uniformly at random. For RGB image denoising, we use BSD432 (BSD500 excluding the images in BSD68) to train a DnCNN-C model with the same number of layers as DnCNN-B but with three input and output channels. Other settings follow those used for training DnCNN-B.
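The construction of multi-level noisy/clean training pairs can be sketched as follows; the patch shape and the maximum noise level are illustrative assumptions, not the exact training configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pairs(clean_patches, sigma_max=25.0 / 255.0):
    """For each clean patch, draw a noise level uniformly from
    [0, sigma_max] and add white Gaussian noise of that level,
    yielding (noisy, clean) training pairs."""
    pairs = []
    for x in clean_patches:
        sigma = rng.uniform(0.0, sigma_max)
        y = x + rng.normal(0.0, sigma, size=x.shape)  # noisy observation
        pairs.append((y, x))
    return pairs

pairs = make_training_pairs([np.zeros((8, 8)) for _ in range(4)])
```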
We evaluate the denoising capability of the denoiser on Gaussian noisy images and their adversarially perturbed versions. The image quality of a reconstruction is measured via the peak signal-to-noise ratio (PSNR): a large PSNR between the reconstruction and the ground-truth implies good denoising performance. We consider three levels of noise in the test images, measured by their energy-density. For Gaussian noise removal, we add white Gaussian noise to the clean images. For uniform noise removal, we generate noise from a uniform distribution. For denoising adversarially noisy images, two norm budgets are used for the adversarial perturbation. We perturb noisy observations whose noise is generated from the Gaussian distributions, so that the norm of the total noise in the adversarial images is still bounded and the energy-density is thus bounded as well. We use Atk to denote the adversarially perturbed noisy images under a given adversarial budget. The number of PGD iterations in ObsAtk is set to five.
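PSNR, as used throughout the evaluation, can be computed as below; the peak value of 1.0 assumes pixel intensities normalized to [0, 1].

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio (in dB) between two images with
    intensities in [0, peak]; higher means a better reconstruction."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

For 8-bit images one would pass `peak=255`; note the value diverges as the two images become identical (MSE approaches zero).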
Dataset  Gaussian  Uniform  Atk  Atk  

Set68  29.16/0.02  29.15/0.01  24.26/0.12  23.12/0.10  
31.68/0.00  31.68/0.00  26.66/0.04  26.08/0.02  
Set12  30.39/0.01  30.41/0.01  24.32/0.18  22.96/0.13  
32.78/0.00  32.81/0.00  26.91/0.05  26.25/0.01  
BSD68  31.25/0.11  31.17/0.11  27.44/0.08  26.08/0.06  
33.98/0.11  33.93/0.12  29.31/0.08  27.84/0.04  
Kodak24  32.20/0.13  32.13/0.14  27.87/0.08  26.37/0.07  
34.77/0.13  34.73/0.14  29.55/0.07  28.00/0.04 
From Table 1, we observe that ObsAtk clearly degrades the reconstruction performance of DIDs. In comparison to Gaussian or uniform noisy images with the same noise levels, the results recovered from adversarial images are much worse in terms of PSNR. For example, when denoising images from Set12, the average PSNR of reconstructions from Gaussian noise reaches 32.78 dB, whereas the PSNR drops to 26.25 dB when dealing with adversarial images. The visual results in Figure 2 consistently show that a normally-trained denoiser cannot effectively remove adversarial noise.
4 Robust and Generalizable Denoising via HAT
The previous section shows that existing deep denoisers are vulnerable to adversarial perturbations. To improve their adversarial robustness, we propose an adversarial training method, hybrid adversarial training (HAT), that uses original noisy images and their adversarial versions together for training. Furthermore, we build a connection between the adversarial robustness of deep denoisers and their generalization capability to unseen types of noise. We show that HAT-trained denoisers can effectively remove real-world noise without the need to leverage real-world noisy data.
4.1 Hybrid Adversarial Training
AT has proven to be a successful and universally applicable technique for robustifying deep neural networks. Most variants of AT are developed specifically for the classification task, such as TRADES [22] and GAIRAT [24]. Here, we propose an AT strategy, HAT, for robust image denoising:
min_θ E [ ‖f_θ(y) − x‖₂² + λ ‖f_θ(y + δ*) − f_θ(y)‖₂² ],   (5)
where x is the ground-truth, y the non-adversarial noisy observation, λ ≥ 0 a trade-off coefficient, and δ* the adversarial perturbation obtained by solving ObsAtk in Eq. (3).
As shown in Eq. (5), the loss function consists of two terms. The first term measures the distance between the ground-truth image and the reconstruction from the non-adversarial noisy image, which contains noise sampled from a common distribution, such as a Gaussian. This term encourages good reconstruction performance on noise from common distributions. The second term is the distance between this reconstruction and the reconstruction from the adversarially perturbed version of the noisy image. It ensures that the reconstructions from any two noisy observations within a small neighborhood of each other have similar image qualities. Minimizing both terms simultaneously controls the worst-case reconstruction performance.
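The two-term objective can be sketched for a single image pair as follows; `f` stands for the denoiser and `lam` for the trade-off coefficient, both placeholder names of our own choosing.

```python
import numpy as np

def hat_loss(f, x, y, y_adv, lam=1.0):
    """Sketch of the HAT objective for one image: reconstruction
    fidelity on the non-adversarial noisy image y, plus lam times
    the distance between the reconstructions from y and y_adv."""
    rec, rec_adv = f(y), f(y_adv)
    fidelity = np.sum((rec - x) ** 2)           # first term
    consistency = np.sum((rec_adv - rec) ** 2)  # second term
    return fidelity + lam * consistency
```

In practice `y_adv = y + δ*` would be produced by running ObsAtk on `y` before each loss evaluation, and the loss would be averaged over a mini-batch.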
The coefficient balances the trade-off between reconstruction from common noise and the local continuity of the denoiser. When the coefficient equals zero, HAT degenerates to normal training on common noise, and the obtained denoisers fail to resist adversarial perturbations, as shown in Section 3. When the coefficient is very large, the optimization gradually ignores the first term and aims solely for local smoothness; this may yield a trivial solution that outputs a constant vector for any input. A proper value thus ensures a denoiser that performs well on common noise and on worst-case adversarial perturbations simultaneously. We perform an ablation study on its effect on robustness enhancement and unseen noise removal in Appendix C.
To train a denoiser applicable to different levels of noise with bounded energy-density, we randomly select a noise distribution from a family of common distributions. This family includes a variety of zero-mean distributions with bounded variance; for example, we use a set of Gaussian distributions for the experiments in the remainder of this paper.
4.2 Robustness Enhancement via HAT
We follow the same settings as in Section 3 for training and evaluating deep denoisers. Noise is sampled from a set of Gaussian distributions up to a highest noise level. We train deep denoisers with the HAT strategy and use one-step ObsAtk to generate adversarially noisy images for training. We compare HAT with normal training (NT) and the vanilla adversarial training (vAT) used by Choi et al. [choi_deep_2021], which trains denoisers only with adversarial data. The results on Set68 and BSD68 are provided in this section. More results on Set12 and Kodak24 are provided in Appendix B (Tables B.1 and B.2).
Method  Gaussian  Atk  Atk  Atk  

NT  29.16/0.02  26.20/0.07  24.26/0.12  23.12/0.10  
31.68/0.00  27.98/0.05  26.66/0.04  26.08/0.02  
vAT  29.05/0.07  27.02/0.15  25.51/0.32  24.34/0.34  
31.53/0.09  28.74/0.16  27.43/0.19  26.68/0.15  
HAT  28.88/0.04  27.48/0.10  26.40/0.16  25.32/0.17  
31.36/0.03  29.52/0.01  28.34/0.03  27.34/0.03 
Denoisers trained via NT, vAT, and HAT are compared in terms of Gaussian and adversarial noise removal. We repeat the training three times and report the mean and standard deviation (mean/std).
From Tables 2 and 3, we observe that HAT markedly improves the reconstruction performance on adversarial noise in comparison to normal training. For example, on the Set68 dataset (Table 2), the normally-trained denoiser achieves 31.68 dB for Gaussian noise removal, but the PSNR drops to 26.08 dB under attack. In contrast, the HAT-trained denoiser achieves a PSNR of 27.34 dB (1.26 dB higher) under the same attack and maintains a PSNR of 31.36 dB for Gaussian noise removal. In Figure 3, we can see that, when dealing with adversarially noisy images, the HAT-trained denoiser recovers high-quality images, while the normally-trained denoiser preserves noise patterns in its output. Besides, we observe that, similar to image classification tasks [22], AT-based methods (HAT and vAT) robustify deep denoisers at the expense of performance on non-adversarial data (Gaussian denoising). Nevertheless, the degraded reconstructions are still reasonably good in terms of PSNR.
Method  Gaussian  Atk  Atk  Atk  

NT  31.25/0.11  28.93/0.08  27.44/0.08  26.08/0.06  
33.98/0.11  31.09/0.10  29.31/0.08  27.84/0.04  
vAT  30.64/0.02  28.81/0.03  27.67/0.01  26.64/0.03  
33.45/0.06  31.10/0.05  29.79/0.02  28.63/0.08  
HAT  30.98/0.03  29.18/0.03  28.02/0.02  26.93/0.04  
33.67/0.04  31.38/0.04  30.03/0.02  28.80/0.01 
Dataset  BM3D  DIP  N2S(1)  NT  vAT  HAT  N2C 

PolyU  37.40 / 0.00  36.08 / 0.01  35.37 / 0.15  35.86 / 0.01  36.77 / 0.00  37.82 / 0.04  – / – 
CC  35.19 / 0.00  34.64 / 0.06  34.33 / 0.14  33.56 / 0.01  34.49 / 0.10  36.26 / 0.06  – / – 
SIDD  25.65 / 0.00  26.89 / 0.02  26.51 / 0.03  27.20 / 0.70  27.08 / 0.28  33.44 / 0.02  33.50 / 0.03 
4.3 Robustness Benefits Generalization to Unseen Noise
It has been shown that denoisers normally trained on common synthetic noise fail to remove real-world noise induced by standard imaging procedures [19, 1]. To train denoisers that can handle real-world noise, researchers have proposed several methods, which can be roughly divided into two categories: dataset-based denoising methods and single-image-based denoising methods. High-performance dataset-based methods require a set of real noisy data for training, e.g., CBDNet requires pairs of clean and noisy images [10] and Noise2Noise requires multiple noisy observations of every single image [12]. However, a large number of paired data are not available in some applications, such as medical radiology and high-speed photography. To address this, single-image-based methods remove noise by exploiting the correlation of the signal across pixels and the independence of the noise. This category of methods, such as DIP [17] and N2S [3], adapts to various types of signal-independent noise, but these methods optimize the deep denoiser on each test image. This test-time optimization is extremely time-consuming; e.g., N2S needs to update a denoiser for thousands of iterations to achieve good reconstruction performance.
Here, we point out that HAT is a promising framework for training a generalizable deep denoiser with synthetic noise only. The resultant denoiser can be directly applied to unseen noisy images in real-time. During training, HAT first samples noise from common (Gaussian) distributions with noise levels ranging from low to high. ObsAtk then explores the neighborhood of each noisy image to search for the particular type of noise that degrades the denoiser the most. By ensuring good denoising performance on the worst-case noise, the resultant denoiser can also deal with other unknown types of noise within the neighborhood. To train a robust denoiser that generalizes well to real-world noise, we need to choose a proper adversarial budget. When the budget is very small and close to zero, HAT reduces to normal training. When the budget is much larger than the norm of the basic noise, the adversarially noisy image may be visually unnatural, because the adversarial perturbation only satisfies the zero-mean constraint and is not guaranteed to be spatially uniformly distributed as natural noise typically is. In practice, we set the adversarial budget of ObsAtk in proportion to the size of the image patches, and the trade-off coefficient of HAT is kept unchanged.
Experimental Settings
We evaluate the generalization capability of HAT on several real-world noisy datasets, including PolyU [18], CC [19], and SIDD [1]. These datasets contain RGB images of common scenes in daily life, captured by different brands of digital cameras and smartphones, with various levels of noise obtained by adjusting the ISO values. For PolyU and CC, we use the clean images in BSD500 to train an adversarially robust denoiser. We sample Gaussian noise from a set of distributions and add it to the clean images to craft noisy observations. HAT trains the denoiser jointly on the Gaussian noisy images and their adversarial versions. For SIDD, we use the clean images in the SIDD-small set for training and test the denoisers on the SIDD-val set. In each case, we only use clean images for training the denoisers, without the need for real noisy images.
Results
We compare HAT-trained denoisers with NT- and vAT-trained ones. From Table 4, we observe that HAT performs much better than both competitors. For example, on the SIDD-val dataset, the HAT-trained denoiser achieves an average PSNR of 33.44 dB, which is 6.24 dB higher than the NT-trained one. We also compare HAT-trained denoisers with single-image-based methods, including DIP, N2S, and the classical BM3D [6]. For DIP and N2S (using their officially released code), the numbers of iterations for each image are set to 2,000 and 1,000, respectively. N2S works in two modes, namely single-image-based denoising and dataset-based denoising. Here, we use N2S in the single-image-based mode, denoted as N2S(1), due to the assumption that no real noisy data are available for training. We observe that HAT-trained denoisers consistently outperform these baselines. Visual comparisons are provided in Appendix D. Besides, since SIDD-small provides a set of real noisy and ground-truth pairs, we train a denoiser, denoted Noise2Clean (N2C), with these paired data and use it as the oracle for comparison. We observe that HAT-trained denoisers are comparable to the N2C one when denoising images from SIDD-val (a PSNR of 33.44 dB vs. 33.50 dB).
5 Conclusion
Normally-trained deep denoisers are vulnerable to adversarial attacks. HAT can effectively robustify deep denoisers and boost their generalization capability to unseen real-world noise. In the future, we will extend the adversarial-training framework to other image restoration tasks, such as deblurring. We aim to develop a generic AT-based robust optimization framework to train deep models that can recover clean images from unseen types of degradation.
6 Acknowledgments
HY and VYFT are funded by a Singapore National Research Foundation (NRF) Fellowship (R263000D02281).
JZ was supported by JST ACT-X Grant Number JPMJAX21AF.
MS was supported by JST CREST Grant Number JPMJCR18A2.
References
 [1] (2018) A High-Quality Denoising Dataset for Smartphone Cameras. In CVPR, (en). External Links: ISBN 9781538664209, Link, Document Cited by: §1, §2, §4.3.
 [2] (2019) Real Image Denoising with Feature Attention. In ICCV, External Links: Link Cited by: §1, §2.
 [3] (2019) Noise2Self: Blind Denoising by SelfSupervision. In ICML, (en). Cited by: §2, §4.3.
 [4] (2004) Convex optimization. Cambridge university press. Cited by: Appendix A.
 [5] (2017) Towards Evaluating the Robustness of Neural Networks. In IEEE Symposium on Security and Privacy (SP), External Links: Document Cited by: §2.
 [6] (2007) Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE TIP 16. External Links: ISSN 19410042, Document Cited by: §1, §2, §4.3.
 [7] (2019) Convergence of Adversarial Training in Overparametrized Neural Networks. In NeurIPS, (en). External Links: Link Cited by: §2.

 [8] (2016) Medical image denoising using convolutional denoising autoencoders. In ICDMW, External Links: Link, Document Cited by: §1.
 [9] (2015) Explaining and Harnessing Adversarial Examples. arXiv:1412.6572 [cs, stat] (en). External Links: Link Cited by: §1, §2.
 [10] (2019) Toward Convolutional Blind Denoising of Real Photographs. In CVPR, (en). External Links: ISBN 9781728132938, Link, Document Cited by: §1, §2, §4.3.
 [11] (2017) Adversarial examples in the physical world. arXiv:1607.02533 [cs, stat]. External Links: Link Cited by: §2.
 [12] (2018) Noise2Noise: Learning Image Restoration without Clean Data. In ICML, External Links: Link Cited by: §1, §2, §4.3.
 [13] (2018) Towards Deep Learning Models Resistant to Adversarial Attacks. In ICLR, (en). External Links: Link Cited by: §1, §1, §2.
 [14] (2019) Adversarial Training for Free!. In NeurIPS, External Links: Link Cited by: §2.
 [15] (2014) Intriguing properties of neural networks. In ICLR, External Links: Link Cited by: §2.
 [16] (2020) On Adaptive Attacks to Adversarial Example Defenses. arXiv:2002.08347 [cs, stat]. External Links: Link Cited by: §2.
 [17] (2018) Deep Image Prior. In CVPR, External Links: Link Cited by: §4.3.
 [18] (2018) Real-world Noisy Image Denoising: A New Benchmark. arXiv:1804.02603 [cs]. External Links: Link Cited by: §4.3.
 [19] (2017) Multichannel Weighted Nuclear Norm Minimization for Real Color Image Denoising. In ICCV, External Links: Link Cited by: §2, §4.3, §4.3.

 [20] (2019) On Robustness of Neural Ordinary Differential Equations. In ICLR, (en). External Links: Link Cited by: §2.
 [21] (2021) CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection. In ICML, (en). External Links: Link Cited by: §2.
 [22] (2019) Theoretically Principled Trade-off between Robustness and Accuracy. In ICML, External Links: Link Cited by: §2, §4.1, §4.2.
 [23] (2020) Attacks Which Do Not Kill Training Make Adversarial Learning Stronger. In ICML, External Links: Link Cited by: §2.
 [24] (2020) Geometryaware Instancereweighted Adversarial Training. In ICLR, (en). External Links: Link Cited by: §4.1.
 [25] (2017) Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE TIP. External Links: ISSN 10577149, 19410042, Link, Document Cited by: §1, §2, §2, §2, §3.2.
 [26] (2018) FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising. IEEE TIP. External Links: ISSN 10577149, 19410042, Link, Document Cited by: §2.
 [27] (2019) A Poisson-Gaussian Denoising Dataset with Real Fluorescence Microscopy Images. In CVPR, External Links: Link Cited by: §2.
Appendix A Two-step Projection
Theorem 1
For any arbitrary vector, its projection onto the region defined by the intersection of the norm-bounded and zero-mean constraints is equivalent to the projection first onto the zero-mean hyperplane followed by the projection onto the ball, i.e.,
(6) 
where
(7a)  
(7b) 
and .
Proof
Let us consider the RHS of Eq. (6) first. It is easy to derive the two projections separately:
(8a)  
(8b) 
Thus, we have
(9) 
Now let us consider the LHS of Eq. (6). The projection onto can be formulated as the solution of the following convex optimization problem:
(10) 
We can write the Lagrangian associated with the problem (10) as
(11) 
Since there exists a strictly feasible point, the problem (10) satisfies Slater's condition [4]. Besides, the objective and the constraints are all differentiable; thus, the KKT conditions in Eq. (12) provide necessary and sufficient conditions for optimality.
(12a)  
(12b)  
(12c)  
(12d)  
(12e) 
We obtain the optimal solution by considering the following two cases separately.
Case(1): .
If , then Eq. (12) reduces to the following equation:
(13a)  
(13b)  
(13c) 
We can easily solve these equations and obtain that
(14a)  
(14b)  
(14c) 
If , then Eq. (12) reduces to the following set of equations:
(15a)  
(15b)  
(15c) 
According to (15b) and (15c), we would obtain a solution whose norm is strictly larger than the budget, which contradicts the norm constraint. Thus, in this case, the solution equals that in Eq. (9).
Case(2): .
Since and , we have . For any other point and , we have , where the strict inequality holds because is the set of points from a hyperplane. Thus, is not the . Therefore, .
In summary, we show that for any arbitrary .
Appendix B Experiments of Robustness Enhancement on Set12 and Kodak24
We compare the robustness of deep denoisers trained via three strategies, i.e., NT, vAT, and HAT. The results on Set12 and Kodak24 are provided in Tables B.1 and B.2, respectively. We observe that HAT can effectively robustify deep denoisers. The reconstruction quality of HAT-trained denoisers on adversarially noisy images is clearly better than that of the NT- and vAT-trained ones.
Training  Gaussian  Atk  Atk  Atk  

NT  30.39/0.01  26.51/0.14  24.32/0.18  22.96/0.13  
32.78/0.00  28.50/0.08  26.91/0.05  26.25/0.01  
vAT  30.25/0.08  27.56/0.06  25.82/0.04  24.33/0.04  
32.63/0.09  29.37/0.17  27.83/0.15  26.91/0.08  
HAT  30.01/0.06  27.96/0.15  26.46/0.20  25.13/0.19  
32.47/0.04  29.95/0.03  28.45/0.04  27.20/0.03 
Training  Gaussian  Atk  Atk  Atk  

NT  32.20/0.13  29.57/0.09  27.87/0.08  26.37/0.07  
34.77/0.13  31.54/0.11  29.55/0.07  28.00/0.04  
vAT  31.44/0.01  29.41/0.05  28.13/0.06  26.98/0.02  
34.14/0.08  31.53/0.11  30.06/0.08  28.78/0.06  
HAT  31.83/0.04  29.85/0.02  28.56/0.02  27.34/0.05  
34.36/0.06  31.84/0.05  30.37/0.02  29.05/0.01 
Appendix C Ablation study
C.1 Effect of the Trade-off Coefficient on Robustness Enhancement and Generalization to Real-world Noise
Here, we evaluate the effect of the trade-off coefficient in HAT on the adversarial robustness and on the generalization capability to real-world noise. We train deep denoisers on the RGB BSD500 dataset (excluding the 68 test images). The obtained denoisers are tested on the BSD68 dataset for Gaussian and adversarial noise removal. The generalization capability is evaluated on two datasets of real-world noisy images, PolyU and CC. Other experimental settings follow those in Section 4.2.
Figure C.1 corroborates the analysis in Section 4.1 that the coefficient balances the trade-off between reconstruction from common noise and adversarial robustness. We also find that the generalization capability to real-world noise is correlated with adversarial robustness: good adversarial robustness usually implies good generalization to real-world noise. In Figure C.1, the best robustness and the best performance on real-world noise appear at intermediate values of the coefficient. When the coefficient is too large or too small, both robustness and generalization worsen. For noise sampled from Gaussian distributions, increasing the coefficient degrades the denoising performance. In summary, we choose a moderate value of the coefficient to achieve a good balance between the denoising performance on common noise, the adversarial robustness, and the real-world generalization.
C.2 Effect of the Adversarial Budget on Generalization to Real-world Noise
Here, we evaluate the effect of the adversarial budget used in HAT on the generalization capability to real-world noise. We train deep denoisers on the RGB BSD500 dataset (excluding the 68 test images) and evaluate the generalization capability on two real-world datasets, PolyU and CC. The adversarial budget of ObsAtk, which generates adversarially noisy images for HAT, is varied over a range of values for comparison. Other experimental settings follow those in Section 4.2.
Figure C.2 corroborates the analysis in Section 4.3. When the budget is very small and close to zero, HAT reduces to normal training, and the resultant denoisers cannot effectively remove real-world noise. When the budget is much larger than the norm of the basic noise, the statistics of the adversarial noise may be very unnatural, because the adversarial perturbation might concentrate on certain regions, such as edges or texture, rather than being spatially uniformly distributed as natural noise typically is. We can see that, beyond a certain budget, the denoising performance on the real-world datasets starts to decrease. In practice, we choose the adversarial budget of ObsAtk accordingly to train generalizable denoisers.
Appendix D Visual Results of Real-world Noise Removal
We show the denoising results on the SIDD-val set in Figure D.1. We observe that the HAT-trained denoiser can effectively remove real-world noise, while the normally-trained one retains much noise in its reconstructions. Besides, the HAT-trained denoiser outperforms the other baseline methods and produces much cleaner results.