1 Introduction
Deep neural networks (DNNs) [1, 2, 3, 4, 5] are known to be susceptible to adversarial attacks, i.e., examples crafted intentionally by adding slight noise to the input [6, 7, 8, 9]. The interesting aspect is that such adversarial noise cannot be recognized by humans, yet it considerably decreases the accuracy of a well-trained DNN [10, 11].
Adversarial attacks can be broadly classified into blackbox and whitebox attacks [7, 12, 13, 14]. In blackbox attacks, the attacker is assumed to have access to the classifier outputs only. Here, the attacker can first train a substitute network from the data and classifier outputs, and then modify the original data samples by gradient descent in order to maximize the classifier output of the substitute network for a wrong label. In whitebox attacks, the attacker is assumed to have full knowledge of the target classification system, including the architecture and the weights of the DNNs and the respective defense strategy. In this scenario, attackers can omit the training procedure of the substitute network, and directly craft the adversarial noise.
For defense, a number of strategies have been proposed, namely (i) incorporation of adversarial examples in the training phase [15, 16, 17, 18, 19], (ii) preprocessing such as compression and decompression, which helps destroy the elaborate spatial coherence hidden in the adversarial samples [20, 21, 22], and (iii) projection of the adversarial examples onto the estimated data manifold [23, 24, 19].
In this paper, we propose a novel defense strategy: adversarial examples are relaxed towards the estimated high density area of the data manifold. Figure 1 gives an overview of the approach. The adversarial sample (red circle) is created by moving the original sample (black circle) away from the data manifold for which the learning machine to be attacked was trained. It is assumed that this sample is placed into a low density area of the training data, where the DNN is not well trained, but still close to the original high density area. Under this assumption, a counterstrike to the adversarial attack is to relax the sample from the low density area (high energy state) to the closest high density area (low energy state), as this makes the classifier more confident and helps to remove the adversarial noise.
To efficiently relax adversarial samples, we use the Metropolis-adjusted Langevin algorithm (Mala) [25, 26], an efficient Markov chain Monte Carlo (MCMC) sampling method. Mala requires the gradient of the energy function, which corresponds to the gradient of the (negative) log probability or the score function. An option for estimating this gradient of the log probability of the input is to use a denoising autoencoder (DAE) [27, 28]. Naively applying Mala with DAE would, however, have an apparent drawback: if there exist high density regions (clusters) where the labels of samples are mixed, then Mala with DAE could drive the input sample to an area with a wrong label (see green line in Fig. 1), which degrades the classification accuracy. To overcome this drawback, we propose to replace DAE with its supervised variant, called supervised DAE (sDAE). sDAE can be trained by minimizing the sum of the reconstruction loss and the cross entropy loss of the classifier outputs. Mala with sDAE, which we call Malade, applied on the input/feature space of the classifier thus drives the adversarial samples towards high density regions of the data generating distribution (see blue line in Fig. 1), where the classifier is well trained to predict the correct label. Figure 2 compares Mala and Malade on a real image from the Mnist dataset. Since Malade uses prior knowledge on the target labels, it provides a much better projection to the manifold than Mala (with unsupervised DAE).
Conceptually, our novel strategy is inspired by a method where the adversarial samples are projected to the closest point in the data manifold [23, 24, 18, 19] through an optimization procedure. Unlike those methods, ours does not require optimization for each input, and is therefore much faster than existing state-of-the-art methods. We demonstrate the high robustness of Malade against blackbox and whitebox attacks on various benchmark datasets and find better or comparable performance to the state-of-the-art defense methods at significantly less computational cost.
2 Attack and Defense Strategies
2.1 Attacking Strategies
We introduce the three most popular attacking strategies for whitebox attacks. For blackbox attacks, the same strategies can be applied after a substitute network is trained to mimic the classifier outputs.
Fast gradient sign method
Fast gradient sign method (FGSM) [6] is one of the simplest and fastest attack algorithms. Given an original sample $x$ and its corresponding label $y$ in the 1-of-$K$ expression, FGSM takes a signed gradient step that moves the sample away from the true label:
$$x' = x + \varepsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y)\right), \tag{1}$$
where $J(x, y)$ is the cross entropy loss of the classifier output for the true label $y$, and $\varepsilon$ is the step size controlling the distance from the original sample. Eq.(1) corresponds to the untargeted attacks, where the goal of the attacker is to make the classifier give a wrong label. By replacing the second term with $-\varepsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y')\right)$, Eq.(1) gives the targeted attacks, where the attacker tries to make the classifier give a specific target label $y'$. The gradient step of FGSM can be applied iteratively, which naturally strengthens the attacks, while the adversarial noise gets easier for a human to detect.
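The FGSM step described above can be illustrated on a toy model. The following is a minimal sketch, assuming a linear softmax classifier with randomly drawn weights `W` and `b` (both hypothetical, chosen only so that the input gradient has a closed form); it is not the classifier used in the experiments.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(W, b, x, y_onehot):
    # Cross entropy loss of a linear softmax classifier for a one-hot label.
    p = softmax(W @ x + b)
    return -np.sum(y_onehot * np.log(p + 1e-12))

def grad_x_cross_entropy(W, b, x, y_onehot):
    # For logits W x + b the input gradient of the loss is W^T (p - y).
    p = softmax(W @ x + b)
    return W.T @ (p - y_onehot)

def fgsm(W, b, x, y_onehot, eps, targeted=False):
    # Untargeted: step up the loss of the true label.
    # Targeted: step down the loss of the target label passed as y_onehot.
    g = grad_x_cross_entropy(W, b, x, y_onehot)
    step = -eps * np.sign(g) if targeted else eps * np.sign(g)
    return x + step

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))           # hypothetical toy classifier
b = np.zeros(3)
x = rng.normal(size=5)
y = np.eye(3)[int(np.argmax(W @ x + b))]  # treat the prediction as the true label
x_adv = fgsm(W, b, x, y, eps=0.1)
```

Every coordinate of the input moves by exactly the step size, and because the loss of this toy model is convex in the input, the untargeted step cannot decrease it. Applying `fgsm` repeatedly gives the iterative variant mentioned in the text.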
R+FGSM
Carlini-Wagner
The Carlini-Wagner attack (CW) optimizes the noise $\tau$ by solving
$$\min_{\tau} \; \|\tau\|_p + c \cdot F(x + \tau), \tag{3}$$
where $\|\cdot\|_p$ denotes the $\ell_p$ norm, $F$ is an objective function that causes the sample to be misclassified, and $c$ is a tradeoff parameter balancing the distance from the original sample and the strength of the attack. This method provides crafted adversarial examples with minimum distance from the original sample, while making the classification accuracy low.
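To make the tradeoff in the CW objective concrete, the sketch below minimizes a simplified version by plain gradient descent. Several choices here are assumptions made for illustration: the distance term is the squared $\ell_2$ norm, the misclassification term is a hinge on the logit margin (one of the objective functions discussed in [14]), the classifier is a hypothetical two-class linear model, and plain gradient descent stands in for the optimizer and change of variables used in the original attack.

```python
import numpy as np

def cw_objective(tau, x, y, W, b, c):
    """Squared l2 norm of the noise plus c times a hinge on the logit margin."""
    logits = W @ (x + tau) + b
    others = np.delete(logits, y)
    margin = logits[y] - others.max()   # > 0 while x + tau is still classified as y
    return float(tau @ tau + c * max(margin, 0.0))

def cw_attack(x, y, W, b, c=10.0, lr=0.01, steps=200):
    tau = np.zeros_like(x)
    best_tau, best_obj = tau.copy(), cw_objective(tau, x, y, W, b, c)
    for _ in range(steps):
        logits = W @ (x + tau) + b
        masked = logits.copy()
        masked[y] = -np.inf
        j = int(np.argmax(masked))      # strongest competing class
        grad = 2.0 * tau                # gradient of the distance term
        if logits[y] - logits[j] > 0:   # hinge is active: push the margin down
            grad = grad + c * (W[y] - W[j])
        tau = tau - lr * grad
        obj = cw_objective(tau, x, y, W, b, c)
        if obj < best_obj:              # keep the best iterate seen so far
            best_tau, best_obj = tau.copy(), obj
    return best_tau, best_obj

W = np.eye(2)                           # hypothetical toy classifier
b = np.zeros(2)
x = np.array([1.0, 0.0])
y = 0
tau, obj = cw_attack(x, y, W, b)
```

In this toy problem the objective starts at 10 (zero noise, margin 1) and the smallest attainable value is 0.5, reached on the decision boundary. The iterates oscillate around that boundary, which is why the sketch tracks the best iterate rather than returning the last one.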
2.2 Defense Strategies
Adversarial training
In this strategy, adversarial samples are generated by a few known attacking strategies and added to the training data, to make the classifier robust against those attacks [7, 18, 29, 19]. A natural drawback is that one cannot foresee and provide adversarial examples of all kinds of attacks before training the classifier, which leaves the classifier open to new or unknown attacking strategies. A distinct strategy is to distill the knowledge from a classifier to a student network [16]. It was reported that the student network trained on the classifier outputs (probabilities) tends to generalize well to samples outside but close to the data manifold, and is therefore more robust against adversarial attacks than the original classifier.
Input preprocessing methods
A simple but effective method to defend against adversarial noise is to preprocess the input. It was reported that image transformations (e.g., bit depth reduction, JPEG compression and decompression) can destroy the elaborate spatial coherence hidden in the adversarial noise [20]. Denoising autoencoders (DAE) have been used for the same purpose [21, 22]. [18] pointed out that a DAE trained with the reconstruction error along with the classification error can be considered as a stack of networks, and adversarial examples crafted for such networks tend to contain less adversarial noise.
Generative methods
Fortified networks [19] reconstruct the feature space using a DAE. While the DAE is effectively used to model the data distribution, this method is orthogonal to the method proposed here. The authors considered the DAE to be part of the classifier model, and the DAE is applied on the feature space. The combined classifier and DAE parameters are trained jointly using the classification loss, the mean squared error of the DAE's reconstruction, and an additional adversarial loss. [24] proposed an untrained generative network as a deep image prior to reconstruct the image, thereby removing the adversarial noise. A generator pretrained in an adversarial fashion is used to reconstruct the image in [23, 24]. At test time, the generator is optimized with a reconstruction error. This corresponds to searching the latent space for an image close to the learned manifold of images.
3 Sampling and Denoising
Metropolis-adjusted Langevin Algorithm
The Metropolis-adjusted Langevin algorithm (Mala) is an efficient Markov chain Monte Carlo (MCMC) sampling method which uses the gradient of the energy (negative log-probability $E(x) = -\log p(x)$). Sampling is performed sequentially by
$$x_{t+1} = x_t + \alpha \nabla_x \log p(x_t) + \kappa_t, \tag{4}$$
where $\alpha$ is the step size, and $\kappa_t$ is a random perturbation subject to $\mathcal{N}(0, \delta^2 I)$. By appropriately controlling the step size $\alpha$ and the noise variance $\delta^2$, the sequence is known to converge to the distribution $p(x)$. For convergence, a rejection step after Eq.(4) is required. However, it was observed that a variant without the rejection step, called Mala-approx [30], gives a reasonable sequence for moderate step sizes. We use Mala-approx in our proposed method. [30] successfully generated realistic artificial images that follow the natural image distribution with the gradient estimated by denoising autoencoders.
Denoising Autoencoders
A denoising autoencoder (DAE) [27, 28] is trained so that data samples contaminated with artificial noise are cleaned. More specifically, it minimizes the reconstruction error:
$$g(r) = \mathbb{E}_{p'(x)\,p'(\nu)}\left[\|r(x + \nu) - x\|^2\right], \tag{5}$$
where $\mathbb{E}_{p}[\cdot]$ denotes the expectation over the distribution $p$, $x$ is a training sample subject to a distribution $p(x)$, and $\nu \sim \mathcal{N}(0, \sigma^2 I)$ is artificial Gaussian noise with mean zero and variance $\sigma^2$. $p'(\cdot)$ denotes an empirical (training) distribution of the distribution $p(\cdot)$, namely, $p'(x) = \frac{1}{N}\sum_{n=1}^{N} \delta(x - x_n)$, where $\{x_n\}_{n=1}^{N}$ are the training samples. [31] discussed the relation between DAEs and contractive autoencoders (CAEs), and proved the following useful property of DAEs:
Proposition ([31]). Under the assumption that $r(x) = x + o(1)$ (this assumption is not essential, as we show in the proof in Appendix A), the minimizer of the DAE objective (5) satisfies
$$r(x) - x = \sigma^2 \nabla_x \log p(x) + o(\sigma^2) \tag{6}$$
as $\sigma^2 \to 0$. This proposition states that a DAE trained with a small $\sigma^2$ can be used to estimate the gradient of the log probability. In a blog [32], it was proved that the residual is proportional to the score function of the noisy input distribution for any $\sigma^2$, i.e.,
$$r(x) - x = \sigma^2 \nabla_x \log \int \mathcal{N}(x; x', \sigma^2 I)\, p(x')\, dx'. \tag{7}$$
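The relation between the DAE residual and the score of the noisy input distribution can be checked numerically in a case where everything is known in closed form. The sketch below assumes one-dimensional Gaussian data and a linear DAE fitted by least squares; the mean, standard deviations and sample size are hypothetical values picked for the illustration. For Gaussian data the optimal denoiser is linear, so the fitted residual should match the noise variance times the score of the smoothed density.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, s, sigma, n = 2.0, 1.0, 0.5, 200_000   # hypothetical toy settings

x = rng.normal(mu, s, size=n)              # clean samples
x_noisy = x + rng.normal(0.0, sigma, n)    # corrupted inputs

# Linear DAE r(z) = a * z + b, fitted by least squares to map noisy to clean.
A = np.stack([x_noisy, np.ones(n)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, x, rcond=None)

def dae_residual(z):
    return a * z + b - z

def smoothed_score(z):
    # d/dz log N(z; mu, s^2 + sigma^2): score of the noisy input distribution.
    return -(z - mu) / (s**2 + sigma**2)

z = np.linspace(-1.0, 5.0, 7)
max_gap = np.max(np.abs(dae_residual(z) - sigma**2 * smoothed_score(z)))
```

The fitted slope approaches `s**2 / (s**2 + sigma**2)`, and `max_gap` shrinks as the sample size grows. Note that the match is with the score of the noise-smoothed density, not of the clean density; the two coincide only in the limit of small noise.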
4 Proposed Method
In this section, we propose our novel strategy for defense against adversarial attacks. We first introduce a supervised variant of DAE, and then propose our defense strategy.
4.1 Supervised Denoising Autoencoders (sDAE)
We propose a supervised variant of DAE, called the supervised denoising autoencoder (sDAE), which is trained by minimizing the following functional with respect to the function $r$:
$$g(r) = \mathbb{E}_{p'(x, y)\,p'(\nu)}\left[\|r(x + \nu) - x\|^2 + 2\sigma^2 J(r(x + \nu), y)\right]. \tag{8}$$
The difference from the DAE objective (5) is in the second term, which is proportional to the cross entropy loss. With this additional term, sDAE provides the gradient estimator of the log-joint-probability averaged over the training (conditional) distribution.
Theorem 4.1. Assume that the classifier output accurately reflects the conditional probability $p(y|x)$ of the training data. Then the minimizer of the sDAE objective (8) satisfies
$$r(x) - x = \sigma^2 \, \mathbb{E}_{p(y|x)}\left[\nabla_x \log p(x, y)\right] + o(\sigma^2). \tag{9}$$
(Sketch of proof) Similarly to the analysis in [31], we first Taylor expand $r(x + \nu)$ around $x$, and write the sDAE objective in a form similar to the CAE objective (the objective contains a higher order term than in [31], since we do not assume that $r(x) = x + o(1)$). After that, applying the second order Euler-Lagrange equation gives Eq.(9) as a stationary condition. The complete proof is given in Appendix A.
Since , if the label distribution is flat (or equivalently the number of training samples for all classes is the same), i.e., , the residual of sDAE gives
The first term is the gradient of the log-conditional-distribution on the label, where the label is estimated from the prior knowledge (the expectation is taken over the training distribution of the label, given ). If the number of training samples is nonuniform over the classes, the weight (or the step size) should be adjusted so that all classes contribute equally to the DAE training.
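As an illustration of the sDAE training criterion (reconstruction loss plus a term proportional to the cross entropy loss of the classifier on the reconstruction), the following sketch evaluates such a loss on a batch. The weight `lam`, the stand-in denoiser, the linear classifier `W` and the noise level are all hypothetical and chosen only to keep the example self-contained; in practice `r` would be a trainable network and the classifier would be pretrained and kept fixed.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def sdae_loss(r, classifier_logits, x, y_onehot, noise, lam):
    """Mean over the batch of ||r(x + noise) - x||^2 + lam * CE(classifier(r(x + noise)), y)."""
    x_rec = r(x + noise)
    recon = np.sum((x_rec - x) ** 2, axis=1)
    ce = -np.sum(y_onehot * log_softmax(classifier_logits(x_rec)), axis=1)
    return float(np.mean(recon + lam * ce))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                 # toy batch of 4 samples
y = np.eye(2)[[0, 1, 0, 1]]                 # one-hot labels for 2 classes
W = rng.normal(size=(3, 2))                 # hypothetical fixed linear classifier
noise = rng.normal(0.0, 0.3, size=x.shape)  # hypothetical noise level

# Identity map as a stand-in denoiser, only to exercise the loss function.
loss = sdae_loss(lambda z: z, lambda z: z @ W, x, y, noise, lam=0.1)
```

With the identity denoiser the reconstruction term equals the squared norm of the injected noise, so any useful `r` has to do better than that while also keeping the classifier confident on the reconstructed sample.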
4.2 Mala with sDAE (Malade)
By using the sDAE, we perform Mala on the joint distribution, to relax the input samples:
$$x_{t+1} = x_t + \alpha \, \mathbb{E}_{p(y|x_t)}\left[\nabla_x \log p(x_t, y)\right] + \kappa_t. \tag{10}$$
Malade generates samples at every step using the score function provided by sDAE. $\alpha$ is the step size, which describes the stride to be taken at every step, and $\kappa_t$ is the noise term.
While [19] use a DAE to denoise the features based on the property shown in [31], the DAE and classifier models are not independently trained. Malade, on the other hand, can be trained under supervision from any pretrained classifier model, and it learns the clustering accordingly. DefenseGan [23] and Invert-and-Classify [24] perform steps of optimization to reconstruct the input; a larger number of steps can prove detrimental, as the generator can reconstruct the adversarial example as well. Malade is based on sampling using Langevin dynamics. The gradient flow driving the samples becomes close to zero as they approach the data manifold, effectively assuring data fidelity. MagNet [22] uses autoencoders to reconstruct the image as a preprocessing step. Hence, it can be considered as a special case of Mala sampling (guided by unsupervised DAE) with the number of steps equal to 1 and the step size equal to $\sigma^2$.
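The relaxation idea can be sketched on a two-dimensional toy problem. The example below runs Langevin updates without a rejection step, driven by the analytic score of a two-cluster Gaussian mixture. This analytic score is a stand-in for the sDAE residual, and the cluster means, step size, noise scale and number of steps are hypothetical values, not those used in the experiments.

```python
import numpy as np

def mixture_score(x, means, s):
    """Gradient of log p(x) for an equal-weight mixture of isotropic Gaussians."""
    d2 = np.sum((means - x) ** 2, axis=1)
    logw = -d2 / (2.0 * s**2)
    w = np.exp(logw - logw.max())
    w /= w.sum()                               # cluster responsibilities
    return (w[:, None] * (means - x)).sum(axis=0) / s**2

def langevin_relax(x0, score_fn, alpha, delta, n_steps, rng):
    # Langevin updates without the accept/reject step.
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        kappa = rng.normal(0.0, delta, size=x.shape)
        x = x + alpha * score_fn(x) + kappa
    return x

rng = np.random.default_rng(0)
means = np.array([[-3.0, 0.0], [3.0, 0.0]])    # two high density regions
s = 0.5
x_adv = np.array([-1.5, 1.0])                  # sample pushed off the left cluster
x_relaxed = langevin_relax(x_adv, lambda x: mixture_score(x, means, s),
                           alpha=0.05, delta=0.01, n_steps=50, rng=rng)
dist_before = np.linalg.norm(x_adv - means[0])
dist_after = np.linalg.norm(x_relaxed - means[0])
```

The perturbed point is pulled back to the nearest cluster and then fluctuates around it with the scale of the injected noise. With an unsupervised score the nearest cluster is chosen regardless of its label, which is the failure mode the supervised variant is meant to address.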
5 Experiments
In this section, we report on the empirical performance of the proposed method in comparison with state-of-the-art baselines. We conducted experiments on the Mnist and Cifar10 datasets.
5.1 Blackbox Attacks
We first evaluate the robustness against blackbox attacks. We trained classifier models with different architectures on the Mnist dataset. The list of models and their architectures can be found in Appendix B. Table 1 summarizes the results. As in previous work on simulating blackbox attacks [7], we allow the attacker to train a substitute model to imitate the classifier's output with samples kept aside from the test set. The adversarial examples crafted by the substitute are then used to attack the classifier. In the blackbox setting, the defense is also considered to be part of the blackbox. So, in effect, the substitute is trained from the output of the defense and the classifier.
For the FGSM attack with a fixed step size, the attacker succeeds in bringing down the accuracy of the classifier. For example, for classifier model D and an attacker with architecture A (row D/A in Table 1), the accuracy of the classifier on the test samples drops from 98.34 to 27.74. For Malade, an sDAE was trained for each classifier model under supervision. For Mala, on the other hand, one unsupervised DAE was trained and applied to all the classifiers. For a given classifier, the number of steps and the step sizes are fixed for defense. The selection of the step size is crucial for defense and is discussed in Section 5.3.
We present the results of Mala and Malade and compare them with two state-of-the-art defenses, DefenseGan [23] and MagNet [22], in Table 1. We reproduced the DefenseGan results using the publicly available code (https://github.com/kabkabm/defensegan). For MagNet, we cleaned samples with the same unsupervised DAE used for Mala. In addition to our experiments on the Mnist dataset, we evaluate the performance of Malade on Cifar10 and report the results in Table 4. In Appendix D.1 and D.2, success and failure cases of Malade are displayed using images from the Cifar10 dataset.
Classifier  Accuracy  No Defense  Mala  Malade  DefenseGan  MagNet 
A/B  99.17  61.52  93.66  95.27  85.56  94.75 
A/C  99.17  54.38  90.55  94.11  86.43  97.63 
B/A  99.26  68.77  93.15  95.21  89.72  93.77 
B/D  99.26  45.04  92.23  94.16  88.75  97.93 
C/B  99.27  61.54  93.81  95.75  86.32  94.41 
C/D  99.27  59.30  95.64  97.18  89.62  96.72 
D/A  98.34  27.74  89.78  91.32  84.21  90.31 
D/D  98.34  25.23  86.74  92.82  87.83  92.59 
5.2 Whitebox Attacks
In whitebox settings, the attacker is assumed to have knowledge of the classifier and the defense in addition to the model parameters. We evaluate Mala and Malade on FGSM, R+FGSM and CW attacks. While the perturbations caused by the FGSM and R+FGSM attacks visibly affect the image, the adversarial examples crafted by the CW attack are visually as good as real images.
Table 2 provides our results for the whitebox settings along with the baseline methods. MagNet, which performed well in blackbox settings, suffers severely in whitebox settings. DefenseGan, on the other hand, performs very well against the whitebox attacks due to the optimization procedure for each input.
Since there is randomness at each step of the sampling due to the noise term in Eq.(10), Malade is robust in the whitebox setting as well. Although Malade performs slightly worse than DefenseGan under whitebox attacks, it is worth noting that DefenseGan performs 200 steps of optimization with 10 different random initial seeds to get these results. The score function for Malade is obtained from a pretrained sDAE and hence is computationally much less expensive. To compute the results in all our experiments, Malade required only 10 steps. Computation time is compared in Section 5.6.
More importantly, for real world systems, the attacker is at liberty to choose the attacking strategy. Hence, it is vital that a defense mechanism be robust to both blackbox and whitebox attacks. Assuming that the attacker can choose the best strategies (taking the minimum accuracy over the blackbox and the whitebox strategies), our proposed Malade outperforms both baseline methods.
Attack | Classifier | Accuracy | No Defense | Mala | Malade | DefenseGan | MagNet
FGSM | A | 99.17 | 18.36 | 81.77 | 86.16 | 97.03 | 82.17
FGSM | B | 99.26 | 06.07 | 86.96 | 95.50 | 97.14 | 85.35
FGSM | C | 99.21 | 09.54 | 81.71 | 96.86 | 97.07 | 79.01
FGSM | D | 98.34 | 22.24 | 81.45 | 94.74 | 96.43 | 78.83
R+FGSM | A | 99.17 | 12.90 | 85.09 | 88.30 | 97.18 | 86.19
R+FGSM | B | 99.26 | 06.03 | 88.59 | 96.27 | 97.29 | 88.50
R+FGSM | C | 99.21 | 05.14 | 84.22 | 97.26 | 97.32 | 82.66
R+FGSM | D | 98.34 | 21.91 | 83.57 | 95.50 | 94.98 | 81.69
CW | A | 99.17 | 00.00 | 67.57 | 71.60 | 98.90 | 00.00
CW | B | 99.26 | 00.00 | 67.54 | 91.86 | 91.60 | 00.00
CW | C | 99.21 | 28.04 | 69.49 | 96.15 | 98.90 | 00.00
CW | D | 98.34 | 00.00 | 67.53 | 90.25 | 98.30 | 00.03
FGSM step size | Accuracy | No Defense | Malade
0.10 | 99.26 | 98.94 | 98.98
0.15 | 99.26 | 97.97 | 98.71
0.20 | 99.26 | 93.90 | 98.59
0.25 | 99.26 | 82.42 | 97.95
0.30 | 99.26 | 61.54 | 97.18
0.35 | 99.26 | 42.45 | 94.61
0.40 | 99.26 | 30.33 | 86.16

Attack | Method | Accuracy | No Defense | Mala | Malade
Blackbox | FGSM | 83.89 | 54.32 | 66.47 | 68.21
5.3 Selection of step size
The score function provided by Malade drives the generated sample towards high density regions of the data generating distribution. While the score function provides the direction, the step size controls the distance to move. With a large step size, there is a possibility of jumping out of the data manifold. Empirically, we found that annealing the step size and the noise level with an offset provided the best results.
5.4 Effect of the step size for FGSM attacks
Table 3 provides further results on the blackbox FGSM attacks for classifier model C with the FGSM step size varied from 0.10 to 0.40. An attacker has several strategies at their disposal to weaken the classifier. A good defense should be robust to different attacks as well as to different parameters of the attacks. While Malade is robust to varying the step size in FGSM attacks, higher values also produce perturbations that are visible to the human eye.
5.5 Effect of the noise while training sDAE
The score function provided by a DAE depends on the noise added to the input while training the DAE. While too small noise levels make the score function highly unstable, too large noise levels blur the score. The same is true for the score function provided by Malade. In all our experiments, we trained the DAE as well as the sDAE with a comparatively large noise level, which is beneficial for reliable estimation of the score function. We report in Table 5 how the classification accuracy under a blackbox FGSM attack changes with the amount of noise with which the sDAE was trained.
5.6 Time complexity
In Table 6, we report the time taken by Malade as a defense. It includes computing the score function and generating a sample after N steps. Even as the number of steps N increases, the computational cost (measured as the elapsed time in seconds) remains small. A single NVIDIA GeForce GTX TITAN V GPU was used to perform this analysis. For DefenseGan, we show the results corresponding to a single random start (referred to as R=1 in the authors' work). Note that DefenseGan requires 10 random starts (and hence 10 times more computation) to achieve the accuracy reported in Tables 1 and 2.
Noise level | 0.01 | 0.1 | 0.2 | 0.3 | 0.4
Accuracy | 87.08 | 91.98 | 96.91 | 97.18 | 96.30

N | Malade | DefenseGan
10 | 0.008 ± 0.003 | 0.043 ± 0.027
25 | 0.016 ± 0.007 | 0.070 ± 0.003
50 | 0.028 ± 0.013 | 0.137 ± 0.004
6 Conclusion
Adversarial attacks on deep learning models change a sample in a way that human perception cannot detect, yet the classifier is compromised and yields a false prediction. Common practice for defense uses a denoising step [22, 20, 23, 24] to alleviate this effect. In this work we have proposed Malade, which uses the Metropolis-adjusted Langevin algorithm guided by a supervised DAE. This framework drives the adversarial samples towards the underlying data manifold, and thus towards the high density regions of the data generating distribution which were originally used for training the nonlinear learning machine. This two-step procedure of (1) relaxing and (2) classifying gives rise to a high generalization performance that significantly reduces the effect of adversarial attacks. We empirically show that Malade is robust against different attacks in blackbox and whitebox settings. Malade outperforms the state-of-the-art method (DefenseGan), assuming that the attacker has several strategies at their disposal to weaken the classifier. In addition, Malade is more computationally efficient than the other methods. Future work includes fine tuning of our strategy, e.g., a majority vote of the classifier outputs from a collection of the generated samples after burn-in, analyzing the attacks and defenses using interpretation methods [34], and applying the supervised DAE to other applications such as federated or distributed learning [35, 36].
Acknowledgments
This work was supported by the Fraunhofer Society under the MPI-FhG collaboration project “Theory & Practice for Reduced Learning Machines”. This work was also supported by the German Research Foundation (GRK 1589/1) and by the Federal Ministry of Education and Research (BMBF) under the project Berlin Big Data Center (FKZ 01IS14013A).
References

 [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
 [2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
 [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 1–9.
 [4] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
 [5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
 [6] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
 [7] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 2017, pp. 506–519.
 [8] J. Bruna, C. Szegedy, I. Sutskever, I. Goodfellow, W. Zaremba, R. Fergus, and D. Erhan, “Intriguing properties of neural networks,” 2013.
 [9] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 427–436.
 [10] I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, “Robust physicalworld attacks on machine learning models,” arXiv preprint arXiv:1707.08945, 2017.
 [11] A. Athalye and I. Sutskever, “Synthesizing robust adversarial examples,” arXiv preprint arXiv:1707.07397, 2017.
 [12] N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: from phenomena to blackbox attacks using adversarial samples,” arXiv preprint arXiv:1605.07277, 2016.
 [13] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Security and Privacy (EuroS&P), 2016 IEEE European Symposium on. IEEE, 2016, pp. 372–387.
 [14] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP), 2017 IEEE Symposium on. IEEE, 2017, pp. 39–57.
 [15] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel, “Ensemble adversarial training: Attacks and defenses,” arXiv preprint arXiv:1705.07204, 2017.
 [16] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016, pp. 582–597.
 [17] T. Strauss, M. Hanselmann, A. Junginger, and H. Ulmer, “Ensemble methods as a defense to adversarial perturbations against deep neural networks,” arXiv preprint arXiv:1709.03423, 2017.
 [18] S. Gu and L. Rigazio, “Towards deep neural network architectures robust to adversarial examples,” arXiv preprint arXiv:1412.5068, 2014.
 [19] A. Lamb, J. Binas, A. Goyal, D. Serdyuk, S. Subramanian, I. Mitliagkas, and Y. Bengio, “Fortified networks: Improving the robustness of deep networks by modeling the manifold of hidden representations,” arXiv preprint arXiv:1804.02485, 2018.
 [20] C. Guo, M. Rana, M. Cissé, and L. van der Maaten, “Countering adversarial images using input transformations,” arXiv preprint arXiv:1711.00117, 2017.
 [21] F. Liao, M. Liang, Y. Dong, T. Pang, J. Zhu, and X. Hu, “Defense against adversarial attacks using highlevel representation guided denoiser,” arXiv preprint arXiv:1712.02976, 2017.
 [22] D. Meng and H. Chen, “Magnet: a twopronged defense against adversarial examples,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 135–147.
 [23] P. Samangouei, M. Kabkab, and R. Chellappa, “Defensegan: Protecting classifiers against adversarial attacks using generative models,” in International Conference on Learning Representations, vol. 9, 2018.
 [24] A. Ilyas, A. Jalal, E. Asteri, C. Daskalakis, and A. G. Dimakis, “The robust manifold defense: Adversarial training using generative models,” arXiv preprint arXiv:1712.09196, 2017.
 [25] G. O. Roberts and J. S. Rosenthal, “Optimal scaling of discrete approximations to langevin diffusions,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 60, no. 1, pp. 255–268, 1998.
 [26] G. O. Roberts, R. L. Tweedie et al., “Exponential convergence of langevin distributions and their discrete approximations,” Bernoulli, vol. 2, no. 4, pp. 341–363, 1996.
 [27] P. Vincent, H. Larochelle, Y. Bengio, and P.A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 1096–1103.
 [28] Y. Bengio, L. Yao, G. Alain, and P. Vincent, “Generalized denoising autoencoders as generative models,” in Advances in Neural Information Processing Systems, 2013, pp. 899–907.
 [29] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” arXiv preprint arXiv:1611.01236, 2016.
 [30] A. Nguyen, J. Yosinski, Y. Bengio, A. Dosovitskiy, and J. Clune, “Plug & play generative networks: Conditional iterative generation of images in latent space,” arXiv preprint arXiv:1612.00005, 2016.
 [31] G. Alain and Y. Bengio, “What regularized autoencoders learn from the datagenerating distribution,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3563–3593, 2014.
 [32] Unknown, “Learning by denoising part 2. connection between data distribution and denoising function,” 2016. [Online]. Available: https://thecuriousaicompany.com/learningbydenoisingpart2connectionbetweendatadistributionanddenoisingfunction/
 [33] A. Krizhevsky, “Learning multiple layers of features from tiny images,” 2009.
 [34] G. Montavon, W. Samek, and K.R. Müller, “Methods for interpreting and understanding deep neural networks,” Digital Signal Processing, vol. 73, pp. 1–15, 2018.
 [35] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., “Communicationefficient learning of deep networks from decentralized data,” arXiv preprint arXiv:1602.05629, 2016.
 [36] F. Sattler, S. Wiedemann, K.R. Müller, and W. Samek, “Sparse binary compression: Towards distributed deep learning with minimal communication,” arXiv:1805.08768, 2018.
Appendix A Proof of Theorem 4.1
sDAE is trained so that the following functional is minimized with respect to the function :
(11) 
which is a finite sample approximation to the true objective
(12) 
We assume that and are analytic functions with respect to . For small , the Taylor expansion of the th component of around gives
where is the Hessian of a function . Substituting this into Eq.(12), we have
(13) 
We can find the optimal function minimizing the functional (14) by using calculus of variations. The optimal function satisfies the following EulerLagrange equation: for each ,
(16) 
where is the gradient (of with respect to ) and is the Hessian.
We have
and therefore
where is the Kronecker delta. Substituting the above into Eq.(16), we have
(17) 
and therefore
Appendix B Model Architecture
In this appendix, we summarize the architectures of the deep neural networks we used in all experiments. Appendix B.1 gives the architectures of the classifier models while Appendix B.2 gives the architecture of the DAE and sDAE models (both have the same architecture).
Conv represents convolution, with the format of Conv(number of output filter maps, kernel size, stride size). Linear represents a fully connected layer with the format of Linear(number of output neurons). Relu is a rectified linear unit while Tanh is a hyperbolic tangent function. Softmax is the normalized exponential function, which squashes the input tensor to real values in the range [0,1] that add up to 1. Conv_Transpose is the transpose of the convolution operation, sometimes called deconvolution, with the format of Conv_Transpose(number of output filter maps, kernel size, stride size).
B.1 Classifier Architecture
A  B  C  D 

Conv()  Conv()  Conv()  Linear() 
Relu()  Relu()  Relu()  Relu() 
Conv()  Conv()  Conv()  Linear() 
Relu()  Relu()  Relu()  Relu() 
Linear()  Conv()  Linear()  Linear() 
Relu()  Relu()  Relu()  Softmax() 
Linear()  Linear()  Linear()  
Softmax()  Softmax()  Softmax() 
B.2 DAE (sDAE) Architecture
Part | DAE, sDAE
Encoder | Conv()
Encoder | Tanh()
Encoder | Conv()
Encoder | Tanh()
Decoder | Conv_Transpose()
Decoder | Tanh()
Decoder | Conv_Transpose()
Decoder | Tanh()
Decoder | Conv()
Decoder | Tanh()