1 Introduction
Since the initial investigation by Szegedy et al. (2013), adversarial examples have drawn significant interest. Various methods for generating adversarial examples, as well as for protecting a classifier from them, have been proposed (see Secs. 3–4 for more details). Adversarial examples exist due to misbehaviors of a classifier in some regions of the input space and are often generated by finding a point in such a region using optimization.
According to Gilmer et al. (2018), adversarial examples can be categorized into those off the data manifold, which is defined as a manifold on which training examples lie, and those on the data manifold. Off-manifold adversarial examples occur because the classifier does not have a chance to observe any off-manifold examples during training, which is a natural consequence of the very definition of the data manifold. On-manifold adversarial examples, however, exist between training examples on the data manifold. There are two causes behind this phenomenon: (1) the sparsity of training examples and (2) the non-smooth behavior of the classifier on the data manifold.
In this paper, we propose to tackle both off- and on-manifold adversarial examples by incorporating an off-the-shelf retrieval mechanism which indexes a large set of examples, and training this combination of a deep neural network classifier and the retrieval engine to behave linearly on the data manifold using a novel variant of the recently proposed mixup algorithm (Zhang et al., 2017), which we refer to as “local mixup.”
The retrieval mechanism efficiently selects a subset of neighboring examples from a candidate set near the input. These neighboring examples are used as a local approximation to the data manifold in the form of a feature-space convex hull onto which the input is projected. The classifier then makes a decision based on this projected input. This addresses off-manifold adversarial examples. Within this feature-space convex hull, we encourage the classifier to behave linearly by using local mixup to further address on-manifold adversarial examples.
We evaluate the proposed approach, called a retrieval-augmented classifier, with a deep convolutional network (LeCun et al., 1998) on object recognition. We extensively test the retrieval-augmented convolutional network (RaCNN) on datasets with varying scales; CIFAR-10 (Krizhevsky & Hinton, 2009) and SVHN (Netzer et al., 2011) as well as ImageNet (Deng et al., 2009), against five readily-available adversarial attacks including both white-box (FGSM, iFGSM, DeepFool and L-BFGS) and black-box (Boundary) attacks. Our experiments reveal that the RaCNN is more robust to these five attacks than the vanilla convolutional network.
2 Retrieval-Augmented CNN
Gilmer et al. (2018) have recently demonstrated that adversarial examples exist both on and off the data manifold in a carefully controlled setting in which examples from two classes are placed on two disjoint spheres. This result suggests that it is necessary to tackle both types of adversarial examples to improve the robustness of a deep neural network based classifier to adversarial examples. In this section, we describe our approach toward building a more robust classifier by combining an off-the-shelf retrieval engine and a variant of the recently proposed mixup learning strategy.
2.1 Setup
Let $\mathcal{C}$ be a candidate set of examples. This set may be created as a subset of a training set or may be an entirely separate set. We use $\mathcal{C}$ as a proxy to the underlying data manifold.

Let $d(x, x')$ be a distance function that measures the dissimilarity between two inputs $x$ and $x'$. In order to facilitate the use of an off-the-shelf retrieval engine, we use

$$d(x, x') = \| \phi(x) - \phi(x') \|_2^2, \qquad (1)$$

where $\phi$ is a predefined, or pretrained, feature extractor. We assume the existence of a readily-available retrieval engine $R$ that takes $x$ as input and returns the $K$ nearest neighbors $\hat{x}_1, \ldots, \hat{x}_K$ in $\mathcal{C}$ according to $d$.
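As a concrete illustration, the feature-space nearest-neighbor lookup can be sketched in a few lines of NumPy. This is a minimal brute-force version, not the engine used in the paper; the function name and the identity feature extractor are our own illustrative choices:

```python
import numpy as np

def retrieve(x, candidates, phi, k=5):
    """Return indices of the k candidates nearest to x under the
    feature-space squared Euclidean distance of Eq. (1)."""
    q = phi(x)
    feats = np.stack([phi(c) for c in candidates])
    dists = ((feats - q) ** 2).sum(axis=1)   # d(x, x') for every candidate
    idx = np.argsort(dists)[:k]              # k smallest distances
    return idx, dists[idx]

# Toy example: identity feature extractor over 2-D points.
phi = lambda v: v
cands = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.1, 0.1]])
idx, d = retrieve(np.array([0.0, 0.0]), cands, phi, k=2)
```

In practice the pairwise distances would be served by an approximate index rather than recomputed per query, but the returned neighbor set plays the same role.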
We then have a deep neural network classifier composed of a feature extractor $f$ and a classifier $g$. This classifier is trained on a training set $\mathcal{D}$, taking into account the extra set $\mathcal{C}$ and the retrieval engine $R$.

2.2 Inference
We first describe the forward evaluation of the proposed network in this setup. This forward pass is designed to handle adversarial examples “off” the data manifold by projecting them onto the data manifold.
Local Characterization of Data Manifold
Given a new input $x$, we use the retrieval engine to retrieve the $K$ examples $\hat{x}_k$'s from $\mathcal{C}$ that are closest to $x$. We then build a feature-space convex hull

$$\mathcal{H}(x) = \Big\{ \sum_{k=1}^{K} \alpha_k f(\hat{x}_k) \;\Big|\; \alpha_k \ge 0,\ \sum_{k=1}^{K} \alpha_k = 1 \Big\}.$$
As observed earlier, linear interpolation of two input vectors in the feature space of a deep neural network often corresponds to a plausible input vector, unlike interpolation in the raw input space (see, e.g., Bengio et al., 2013; Kingma & Welling, 2013; Radford et al., 2015). Based on this observation, we consider the feature-space convex hull a reasonable local approximation to the underlying data manifold.

Trainable Projection
Exact projection of the input onto this convex hull requires expensive optimization, especially in a high-dimensional space. As we consider a deep neural network classifier, the dimension of the feature space could be hundreds or more, making this exact projection computationally infeasible. Instead, we propose to learn a goal-driven projection procedure based on the attention mechanism (Bahdanau et al., 2014).
We compare the input feature $f(x)$ against each retrieved feature $f(\hat{x}_k)$ and compute a score

$$\beta_k = f(x)^\top W f(\hat{x}_k),$$

where $W$ is a trainable weight matrix (Luong et al., 2015). These scores are then normalized to form a set of coefficients:

$$\alpha_k = \frac{\exp(\beta_k)}{\sum_{j=1}^{K} \exp(\beta_j)}.$$

These coefficients $\alpha_k$'s are then used to form a projection point $\hat{h}(x)$ of $x$ in the feature-space convex hull $\mathcal{H}(x)$:

$$\hat{h}(x) = \sum_{k=1}^{K} \alpha_k f(\hat{x}_k).$$
This trainable projection could be thought of as learning to project an off-manifold example onto the locally-approximated manifold so as to maximize the classification accuracy.
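A minimal sketch of this attention-based projection, assuming a bilinear score followed by softmax normalization (the helper name `project_onto_hull` is ours, not the paper's):

```python
import numpy as np

def project_onto_hull(h_x, H, W):
    """Project the input feature h_x onto the convex hull of the K
    retrieved features H (K x d), using bilinear attention scores
    with trainable weight matrix W."""
    scores = H @ W @ h_x                 # score for each retrieved feature
    scores = scores - scores.max()       # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax coefficients
    return alpha @ H, alpha              # convex combination of rows of H

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 retrieved features of dimension 8
W = np.eye(8)
h = rng.normal(size=8)
proj, alpha = project_onto_hull(h, H, W)
```

Because the coefficients are non-negative and sum to one, the output is guaranteed to lie inside the feature-space convex hull, whatever the learned weights are.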
Classification
The projected feature now represents the original input and is fed to the final classifier. In other words, we constrain the final classifier to work only with a point inside a feature-space convex hull of neighboring training examples. This constraint alleviates the issue of the classifier’s misbehavior in the region outside the data manifold up to a certain degree.¹

¹ The quality of the local approximation may not be uniformly high across the input space, and we do not claim that it solves the problem of off-manifold adversarial examples.
2.3 Training
The output of the classifier is almost fully differentiable with respect to the classifier, both feature extractors, and the attention weight matrix, except for the retrieval engine.² This allows us to train the entire pipeline in the previous section using backpropagation (Rumelhart et al., 1986) and gradient-based optimization.

² We believe the introduction of this non-differentiable, black-box retrieval engine further contributes to the increased robustness against white-box attacks.

Local Mixup
This is however not enough to ensure the robustness of the proposed approach to on-manifold adversarial examples. During training, the classifier only observes a very small subset of any feature-space convex hull. Especially in a high-dimensional space, this greatly increases the chance of the classifier’s misbehavior within these feature-space convex hulls, as also noted by Gilmer et al. (2018). In order to address this issue, we propose to augment learning with a local variant of the recently proposed mixup algorithm (Zhang et al., 2017).
The goal of the original mixup is to encourage a classifier to act linearly between any pair of training examples. This is done by linearly mixing two randomly-drawn training examples and creating a new linearly-interpolated example pair during training. Let two randomly-drawn pairs be $(x, y)$ and $(x', y')$, where $y$ and $y'$ are one-hot vectors in the case of classification. Mixup creates a new pair

$$(\tilde{x}, \tilde{y}) = \big(\lambda x + (1 - \lambda) x',\; \lambda y + (1 - \lambda) y'\big)$$

and uses it as a training example, where $\lambda \in [0, 1]$ is a random sample from a beta distribution. We call this original version global mixup, as it increases the linearity of the classifier between any pair of training examples.

It is however unnecessary for our purpose to use global mixup, as our goal is to make the classifier behave better (i.e., behave linearly) within a feature-space convex hull. Thus, we use a local mixup in which we uniformly sample the convex coefficients $\alpha_k$'s at random to create a new mixed example pair. We use the Kraemer algorithm (see Sec. 4.2 in Smith & Tromble, 2004).
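The Kraemer-style uniform sampling from the simplex (sorted uniform draws, then successive gaps) and the resulting local mixup of retrieved features and labels can be sketched as follows; the helper names are our own toy illustration:

```python
import numpy as np

def sample_simplex(k, rng):
    """Uniformly sample k convex coefficients from the (k-1)-simplex:
    sort k-1 uniforms and take the gaps between consecutive values."""
    u = np.sort(rng.uniform(size=k - 1))
    edges = np.concatenate(([0.0], u, [1.0]))
    return np.diff(edges)   # non-negative, sums to 1

def local_mixup(feats, labels, rng):
    """Mix K retrieved feature vectors (K x d) and their one-hot
    labels (K x c) with uniformly drawn convex coefficients."""
    alpha = sample_simplex(len(feats), rng)
    return alpha @ feats, alpha @ labels

rng = np.random.default_rng(0)
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = np.eye(3)   # one-hot labels for 3 retrieved examples
x_mix, y_mix = local_mixup(feats, labels, rng)
```

The mixed target is itself a valid probability vector, so the usual cross-entropy loss applies unchanged to the mixed pair.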
Overall
We use stochastic gradient descent (SGD) to train the proposed network. At each update, we perform a number of descent steps for the usual classification loss and a number of descent steps for the proposed local mixup loss.

2.4 Retrieval Engine
The proposed approach does not depend on the specifics of a retrieval engine. Any off-the-shelf retrieval engine that supports dense vector lookup could be used, enabling the use of a very large-scale candidate set with the latest fast dense vector lookup algorithms, such as FAISS (Johnson et al., 2017). In this work, we used a more rudimentary retrieval engine based on locality-sensitive hashing (LSH; see, e.g., Datar et al., 2004) with a reduced feature dimension using random projection (see, e.g., Bingham & Mannila, 2001, and references therein), as the candidate sets in our experiments contain approximately 1M examples or fewer. The feature extractor from Eq. (1) was chosen to be a pretrained deep neural network without the final fully-connected classifier layers (Krizhevsky et al., 2012; He et al., 2016).
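A random-hyperplane LSH index of the kind described here can be sketched as follows. This is a toy illustration under our own naming, not the engine used in the experiments:

```python
import numpy as np

def lsh_hash(feats, planes):
    """Hash feature vectors into compact binary codes given by the
    sign of random projections (random-hyperplane LSH)."""
    bits = (feats @ planes.T) > 0        # one bit per hyperplane
    return np.packbits(bits, axis=1)     # pack bits into bytes per example

rng = np.random.default_rng(0)
planes = rng.normal(size=(16, 64))       # 16 random hyperplanes over 64-d features
feats = rng.normal(size=(1000, 64))      # candidate-set features
codes = lsh_hash(feats, planes)

# Lookup: candidates sharing the query's code form the bucket,
# which would then be re-ranked with the exact distance.
query_code = lsh_hash(feats[:1], planes)
bucket = np.where((codes == query_code).all(axis=1))[0]
```

Nearby vectors tend to fall on the same side of most hyperplanes, so matching codes cheaply shortlist candidates before any exact distance computation.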
3 Adversarial Attack
3.1 Attack Scenarios
Scenario 1 (Direct Attack)
In this work, we consider the candidate set and the retrieval engine which indexes it to be “hidden” from the outside world. This property makes a usual white-box attack more of a gray-box attack in which the attacker has access to the entire system except for the retrieval part. This is our first attack scenario.
Scenario 2 (Retrieval Attack)
Despite the hidden nature of the retrieval engine and the candidate set, it is possible for the attacker to confuse the retrieval engine if they can access the feature extractor used for retrieval. We furthermore give the attacker access not only to this feature extractor but also to the original classifier that was tuned together with it. This allows the attacker to create an adversarial example on the retrieval feature extractor that could potentially disrupt the retrieval process, thereby fooling the proposed network. Although this is unlikely in practice, we test this second scenario to investigate the possibility of compromising the retrieval engine.
3.2 Attack Methods
Under each of these scenarios, we evaluate the robustness of the proposed approach against five widely used adversarial attack algorithms, including both white-box and black-box attacks: the fast gradient sign method (FGSM, Goodfellow et al., 2014b), its iterative variant (iFGSM, Kurakin et al., 2016), DeepFool (Moosavi-Dezfooli et al., 2016), L-BFGS (Tabacof & Valle, 2016) and Boundary (Brendel et al., 2017). We acknowledge that this is not an exhaustive list of attacks, but find it extensive enough to empirically evaluate the robustness of the proposed approach.
Fast Gradient Sign Method (FGSM)
FGSM creates an adversarial example by adding the scaled sign of the gradient of the loss function $\ell$, computed using a target class $y$, to the input:

$$\tilde{x} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \ell(x, y)\big),$$

where the scale $\epsilon$ controls the difference between the original input $x$ and its adversarial version $\tilde{x}$. This is a white-box attack, requiring the availability of the gradient of the loss function with respect to the input.
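Given a precomputed loss gradient, FGSM is a one-line perturbation; the sketch below assumes pixel inputs in [0, 1] and clips accordingly:

```python
import numpy as np

def fgsm(x, grad, eps):
    """Fast gradient sign method: step of size eps along the sign of
    the loss gradient, clipped to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

x = np.array([0.2, 0.5, 0.9])
grad = np.array([0.3, -1.2, 0.4])   # toy loss gradient at x
adv = fgsm(x, grad, eps=0.1)        # -> [0.3, 0.4, 1.0]
```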
Iterative FGSM (iFGSM)
iFGSM improves upon FGSM by iteratively modifying the original input for a fixed number of steps. At each step $t$,

$$x^{(t)} = x^{(t-1)} + \epsilon \cdot \mathrm{sign}\big(\nabla_x \ell(x^{(t-1)}, y)\big),$$

where $x^{(0)} = x$ and the final iterate is taken as the adversarial example $\tilde{x}$. Similarly to FGSM, iFGSM is a white-box attack.
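The iterative variant simply repeats the signed step, re-evaluating the gradient at the current iterate and clipping after each update (a sketch with a toy gradient function; names are ours):

```python
import numpy as np

def ifgsm(x, grad_fn, eps, steps):
    """Iterative FGSM: repeat a small signed gradient step, clipping
    to the valid pixel range [0, 1] after each update. grad_fn is
    assumed to return the loss gradient at the current iterate."""
    adv = x.copy()
    for _ in range(steps):
        adv = np.clip(adv + eps * np.sign(grad_fn(adv)), 0.0, 1.0)
    return adv

# Toy gradient that pushes every pixel upward.
adv = ifgsm(np.array([0.1, 0.95]), lambda z: np.ones_like(z),
            eps=0.02, steps=5)   # -> [0.2, 1.0] (second pixel saturates)
```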
DeepFool
Moosavi-Dezfooli et al. (2016) proposed to create an adversarial example by finding a residual vector with the minimum norm with the constraint that the output of a classifier must flip. They presented an efficient iterative procedure to find such a residual vector. Similarly to the FGSM and iFGSM, this approach relies on the gradient of the classifier’s output with respect to the input, and is hence a white-box attack.
L-BFGS
Tabacof & Valle (2016) proposed an optimization-based approach, similar to DeepFool above, but more explicitly constraining the input to lie inside a tight box defined by training examples. They use L-BFGS-B (Zhu et al., 1997) to solve this box-constrained optimization problem. This is also a white-box attack.
Boundary
Brendel et al. (2017) proposed a powerful black-box, or more specifically decision-based, attack that requires neither the gradient of a classifier nor its predictive distribution. It only requires the final decision of the classifier. Starting from an adversarial example, potentially far away from the original input, it iteratively searches for a next adversarial example that has a smaller difference to the original input. This procedure guarantees a monotonic reduction in the difference by rejecting any step that either fails to decrease the difference or makes the example no longer adversarial.
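The accept/reject logic of such a decision-based attack can be illustrated with a toy sketch. The proposal distribution, contraction factor and step size below are arbitrary illustrative choices, not the actual Boundary attack schedule:

```python
import numpy as np

def boundary_attack(x, x_adv, is_adversarial, steps, rng):
    """Minimal decision-based sketch: propose a random step contracted
    toward the original input x; keep the proposal only if it remains
    adversarial AND strictly reduces the distance to x."""
    for _ in range(steps):
        proposal = x_adv + 0.01 * rng.normal(size=x.shape)  # random step
        proposal = x + 0.9 * (proposal - x)                 # contract toward x
        if is_adversarial(proposal) and \
           np.linalg.norm(proposal - x) < np.linalg.norm(x_adv - x):
            x_adv = proposal
    return x_adv

# Toy decision: anything with first coordinate > 0.5 counts as adversarial.
rng = np.random.default_rng(0)
x = np.zeros(4)
adv = boundary_attack(x, np.full(4, 2.0), lambda z: z[0] > 0.5, 200, rng)
```

Only the classifier's binary decision is queried, which is exactly why this style of attack applies to deployed, gradient-free systems.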
Implementation
We use Foolbox³ released by Rauber et al. (2017). Whenever necessary for further analysis, such as the accuracy per amount of adversarial perturbation, we implement some of these attacks ourselves.

³ Available at http://foolbox.readthedocs.io/en/latest/, revision 2d468cb6.
4 Related Work
Since the phenomenon of adversarial examples was noticed by Szegedy et al. (2013), there has been a stream of attempts at making a deep neural network more robust. Most of the existing work is orthogonal to the approach proposed here and could be used together with it. We however detail it here to demonstrate the similarities to and contrasts with our approach.
4.1 Input Transformation
An off-manifold adversarial example can be avoided, if it could be projected onto the data manifold, characterized by training examples. This could be thought of as transforming an input. There have been two families of algorithms in this direction.
Data-Independent Transformation
The first family of defense mechanisms aims at reducing the input space so as to minimize regions that are off the data manifold. Dziugaite et al. (2016) demonstrated that JPEG-compressed images suffer less from adversarial attacks. Lu et al. (2017) suggest that trying various scalings of an image could overcome adversarial attacks, as they seem to be sensitive to the scaling of objects. Guo et al. (2017) use an idea from compressed sensing to transform an input image by reconstructing it from its lower-resolution version while minimizing the total variation (Rudin et al., 1992). More recently, Jacob Buckman (2018) proposed to discretize each input dimension using thermometer coding. These approaches are attractive due to their simplicity, but there is some work showing that they are often not enough to defend against sophisticated adversarial examples (see, e.g., Shin & Song, 2017).
Data-Dependent Transformation
On the other hand, various groups have tried using a data-dependent transformation, mostly relying on density estimation. Gu & Rigazio (2014) used a denoising autoencoder (Vincent et al., 2010) to push an input back toward the data manifold. Pouya Samangouei (2018) and Song et al. (2017) respectively use a PixelCNN (van den Oord et al., 2016) and a generative adversarial network (Goodfellow et al., 2014a) to replace an input image with a nearby, likely image. Instead of using a separately trained generative model, Guo et al. (2017) use a technique of image quilting (Efros & Freeman, 2001). These approaches are similar to our use of a retrieval engine over the candidate set. They however do not attempt to address the issue of misbehaviors of a classifier on the data manifold.

Table 1: Accuracy (%) on CIFAR-10 with varying strengths of white-box attacks when the feature extractor for the retrieval engine is attacked (Scenario 2). Column headers give the normalized perturbation size.

                 Clean   FGSM                    iFGSM                   DeepFool
                 0       1e-04  2e-04  4e-04    1e-05  2e-05  8e-05    1e-05  2e-05  8e-05
Baseline         85.15   14.05   7.50   4.22    55.20  26.17   2.59    26.04  11.72   0.34
RaCNN-K5         72.57   42.97  34.29  24.55    72.57  72.48  45.46    64.34  61.34  60.96
RaCNN-K5-mixup   75.60   46.37  37.90  28.11    74.89  74.89  48.12    66.96  63.84  63.55
RaCNN-K10        79.52   52.95  43.90  33.77    79.12  79.00  55.27    72.89  71.81  71.14
RaCNN-K10-mixup  80.80   53.00  44.01  33.47    79.87  79.72  54.36    73.63  72.35  71.26
4.2 Attack-Aware Learning
Another direction has been to modify the learning algorithm to make a classifier more robust to adversarial examples. As our approach relies on usual backpropagation with stochastic gradient descent, most of the approaches below, as well as those above, can readily be used together with ours.
Adversarial Training
Already early on, Goodfellow et al. (2014b) proposed a procedure of adversarial training, where a classifier is trained on both training examples and adversarial examples generated on-the-fly. Lee et al. (2017) extended this procedure by introducing a generative adversarial network (GAN, Goodfellow et al., 2014a) that learns to generate adversarial examples while simultaneously training a classifier. These approaches are generally applicable to any system that could be tuned frequently, and could be used to train the proposed model.
Robust Optimization
Instead of explicitly including adversarial examples during training, there have been attempts to modify a learning algorithm to induce robustness. Cisse et al. (2017) proposed Parseval training, which encourages the Lipschitz constant of each layer of a deep neural network classifier to be less than one. More recently, Aman Sinha (2018) proposed a tractable robust optimization algorithm for training a deep neural network classifier to be more robust to adversarial examples. This robust optimization algorithm ensures that the classifier behaves well in the neighborhood of each training point. It is highly relevant to the proposed local mixup, which also aims at making a classifier behave well between any pair of neighboring training examples.
4.3 RetrievalAugmented Neural Networks
The proposed approach tightly integrates an off-the-shelf retrieval engine into a deep neural network. This approach of retrieval-augmented deep learning has recently been proposed in various tasks.
Gu et al. (2017) use a text-based retrieval engine to efficiently retrieve relevant training translation pairs and let their non-parametric neural machine translation system seamlessly fuse an input sentence and the retrieved pairs for better translation. Wang et al. (2017) proposed a similar approach to text classification, and Guu et al. (2017) to language modeling. More recently, Sprechmann et al. (2018) applied this retrieval-based mechanism to online learning, similarly to the earlier work by Li et al. (2016) in the context of machine translation.

Table 2: Accuracy (%) on SVHN with varying strengths of white-box attacks when the feature extractor for the retrieval engine is attacked (Scenario 2). Column headers give the normalized perturbation size.

                 Clean   FGSM                    iFGSM                   DeepFool
                 0       2e-04  4e-04  8e-04    2e-05  8e-05  2e-04    2e-05  8e-05  2e-04
Baseline         95.48   42.09  30.95  21.61    70.41  35.53  11.17    51.10  16.00   4.28
RaCNN-K5         90.78   64.87  53.31  39.44    90.73  75.80  63.41    84.62  81.30  80.55
RaCNN-K5-mixup   91.64   68.31  57.20  43.73    91.55  77.74  65.75    86.18  83.20  82.43
RaCNN-K10        92.19   64.94  52.24  37.73    92.10  76.41  62.70    86.18  84.25  82.21
RaCNN-K10-mixup  92.49   68.72  57.30  43.49    92.45  78.26  65.50    87.33  84.73  84.10
5 Experiments
5.1 Settings
Datasets
We test the proposed approach (RaCNN) on three datasets of different scales. CIFAR-10 has 50k training and 10k test examples, with 10 classes. SVHN has 73k training and 26k test examples, with 10 classes. ImageNet has 1.3M training and 50k validation examples, with 1,000 classes. For CIFAR-10 and ImageNet, we use the original training set as the candidate set, while we use the extra set of 531k examples as the candidate set in the case of SVHN. The overall training process involves data augmentation on the training set but not on the candidate set.
Pretrained Feature Extractor
We train a deep convolutional network for each dataset, remove the final fully-connected layers and use the remaining stack as a feature extractor for retrieval. This feature extractor is fixed when used in the proposed RaCNN.
RaCNN: Feature Extractor and Classifier
We use the same convolutional network from above for the RaCNN as well (separated into the feature extractor and the classifier by the final average pooling) for each dataset. For CIFAR-10 and SVHN, we train both from scratch. For ImageNet, on the other hand, we fix the feature extractor from the pretrained ResNet-18 above and train only the classifier. We did the latter because we observed it greatly reduced training time in preliminary experiments.
Training
We use Adam (Kingma & Ba, 2014) as the optimizer. We investigate the influence of the newly introduced components (retrieval and local mixup) by varying the number of retrieved examples and the use of local mixup.
Evaluation
In addition to the accuracy on the clean test set, we look at the accuracy as a function of the amount of perturbation used to create adversarial examples. We use the default MeanSquaredDistance from the Foolbox library; this amount is computed as a normalized distance between the original example $x$ and its perturbed version $\tilde{x}$:

$$\epsilon = \frac{\| x - \tilde{x} \|_2^2}{N \cdot (x_{\max} - x_{\min})^2},$$

where $N$ is the input dimensionality and $[x_{\min}, x_{\max}]$ is the valid pixel range.
We note that our attacks are generally performed with clipping of out-of-bounds pixel values at each step.
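For concreteness, this normalized mean squared distance can be computed as follows. This is a sketch consistent with the definition above; Foolbox's own implementation may differ in details:

```python
import numpy as np

def normalized_mse(x, x_adv):
    """Mean squared distance between an input and its perturbed
    version, normalized by the squared value range of the input."""
    value_range = x.max() - x.min()
    return np.mean((x - x_adv) ** 2) / value_range ** 2

x = np.array([0.0, 0.5, 1.0])
adv = x + 0.1            # uniform perturbation of 0.1 per pixel
d = normalized_mse(x, adv)   # -> 0.01
```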
5.2 CIFAR-10
Model
In the CIFAR-10 experiments, our model contains 6 convolutional layers followed by 2 fully-connected layers. Every layer is followed by batch normalization (Ioffe & Szegedy, 2015) and ReLU. More details can be found in Appendix 7.

Scenario 1 (Direct Attack)
We present in Fig. 1 the effect of adversarial attacks of varying strengths (measured in the normalized distance) on both the vanilla convolutional network (Baseline) and the proposed RaCNNs with various settings. Across all five adversarial attacks, it is clear that the proposed RaCNN is more robust to adversarial examples than the vanilla classifier. The proposed local mixup improves the robustness further, especially when the number of retrieved examples is small. We conjecture that this is because the number of pairs local mixup must take care of grows quadratically with the number of retrieved examples.
Scenario 2 (Retrieval Attack)
In Table 1, we present the accuracies of both the baseline and the RaCNNs with varying strengths of white-box attacks, when the feature extractor for the retrieval engine is attacked. We observe that it is indeed possible to fool the proposed RaCNN by attacking the retrieval process. Comparing Fig. 1 and Table 1, we however notice that the performance degradation is much less severe in this second scenario.
Table 3: Accuracy@5 (%) on ImageNet with varying strengths of white-box attacks. Column headers give the normalized perturbation size.

                 Clean   FGSM                    iFGSM                   DeepFool
                 0       1e-04  2e-04  4e-04    1e-05  2e-05  4e-05    1e-05  2e-05  4e-05
Baseline         88.98   15.00  13.12  11.65     9.59   3.57   1.82     0.29   0.17   0.16
RaCNN-K10-mixup  77.68   20.17  17.40  14.70    77.28  64.97  17.67    35.74  35.72  35.71
5.3 SVHN
Model
We use the same architecture and hyperparameter setting as in the CIFAR-10 experiments.
Scenario 1 (Direct Attack)
On SVHN, we observe a similar trend to CIFAR-10. The proposed RaCNN is more robust against all the adversarial attacks than the vanilla convolutional network. Similarly to CIFAR-10, the proposed approach is most robust to DeepFool and Boundary, while it is most susceptible to L-BFGS. We notice however that the impact of local mixup is larger with SVHN than it was with CIFAR-10.
Another noticeable difference is the impact of the number of retrieved examples on the classification accuracy. In the case of CIFAR-10, the accuracies on the clean test examples (the first column in Table 1) differ significantly between using 5 and 10 retrieved examples, while this is much less so with SVHN (the first column in Table 2). We conjecture that this is due to a lower level of variation among input examples in SVHN, which are pictures of house numbers taken from streets, compared to those in CIFAR-10, which are pictures of general objects.
Scenario 2 (Retrieval Attack)
5.4 ImageNet
Model
We use ResNet-18 (He et al., 2016). We pretrain it as a standalone classifier on ImageNet and use its feature extractor part for retrieval. We use the same feature extractor for the RaCNN without updating it. The classifier is initialized from the pretrained network and tuned during training. In the case of ImageNet, we only try 10 retrieved examples with local mixup. Due to the high computational cost of the L-BFGS and Boundary attacks, we evaluate both the vanilla classifier and the RaCNN against these two attacks on 200 images drawn uniformly at random from the validation set. We use Accuracy@5, which is a standard metric for ImageNet.
Scenario 1 (Direct Attack)
The general trend with ImageNet is similar to that with either CIFAR-10 or SVHN, as can be seen in Fig. 3. The proposed RaCNN is more robust to adversarial attacks. We do however observe some differences. First, iFGSM is better at compromising both the baseline and the RaCNN than L-BFGS is in this case. Second, DeepFool is much more successful at fooling the baseline convolutional network on ImageNet than on the other two datasets, but is much less so at fooling the proposed RaCNN.
Scenario 2 (Retrieval Attack)
Unlike with CIFAR-10 and SVHN, we have observed that the retrieval attack is sometimes more effective than the direct attack in the case of ImageNet. For instance, FGSM can compromise the retrieval feature extractor to decrease the accuracy from 77.68 down to 0.20. We observed a similar behavior with DeepFool, but not with iFGSM.
5.5 Discussion
In summary, we have observed that the proposed RaCNN, when trained with local mixup, is more robust to adversarial attacks, at least the five considered in the experiments, than the vanilla convolutional network. More specifically, the RaCNN was most robust to the black-box, decision-based attack (Brendel et al., 2017), while it was more easily compromised by white-box attacks, especially by the L-BFGS attack (Tabacof & Valle, 2016), which relies on a strong quasi-Newton optimizer. This suggests that the RaCNN could be an attractive alternative to the vanilla convolutional network when deployed, for instance, in a cloud-based environment.
In Fig. 4, we show retrieval results given a query image from ImageNet. Although the adversarial attack did indeed alter the retrieval engine’s behavior, we see that the semantics of the original query image could still be maintained in the sets of retrieved images, suggesting two insights. First, the robustness of the RaCNN is largely due to the robustness of the retrieval engine to small perturbations of the input. Even when the retrieval quality degrades, we observe that a majority of retrieved examples are of the same, or a similar, class. Second, we could further improve the robustness by designing the feature extractor for the retrieval engine more carefully. For instance, an identity function would correspond to retrieval based on the raw pixels, which would make the retrieval engine extremely robust to any adversarial attack imperceptible to humans. This may however result in a lower accuracy on clean examples, which is a trade-off that needs to be determined per task.
As has been observed with the existing input-transformation-based defense strategies, the robustness of the proposed RaCNN comes at the expense of the generalization performance on clean input examples. We have observed however that this degradation can be controlled, at the expense of computational overhead, by varying the number of retrieved examples per input. This controllability could be an important feature when deploying such a model in production.
6 Conclusion
In this paper, we proposed a novel retrieval-augmented convolutional network classifier (RaCNN) that integrates an off-the-shelf retrieval engine to counter adversarial attacks. The RaCNN was designed to tackle both off- and on-manifold adversarial examples; to do so, we use a retrieval engine to locally characterize the data manifold as a feature-space convex hull and an attention mechanism to project the input onto this convex hull. The entire model, composed of the retrieval engine and a deep convolutional network, is trained jointly, and we introduced the local mixup learning strategy to encourage the classifier to behave linearly on the feature-space convex hull.
We have evaluated the proposed approach on three standard object recognition benchmarks (CIFAR-10, SVHN and ImageNet) against four white-box adversarial attacks and one black-box, decision-based attack. The experiments have revealed that the proposed approach is indeed more robust than the vanilla convolutional network in all the cases. The RaCNN was found to be especially robust to the black-box, decision-based attack, suggesting its potential for cloud-based deployment scenarios.
The proposed approach consists of three major components: (1) local characterization of the data manifold, (2) projection onto the data manifold and (3) regularized learning on the manifold. There is ample room for improvement in each of these components. For instance, the feature-space convex hull may be replaced with a more sophisticated kernel estimator. Projection onto the convex hull could be done better, and a learning algorithm better than local mixup could further improve the robustness against on-manifold adversarial examples. We leave these possibilities as future work.
References
Aman Sinha (2018) Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
 Bahdanau et al. (2014) Bahdanau, Dzmitry, Cho, Kyunghyun, and Bengio, Yoshua. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
Bengio et al. (2013) Bengio, Yoshua, Mesnil, Grégoire, Dauphin, Yann, and Rifai, Salah. Better mixing via deep representations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013.
 Bingham & Mannila (2001) Bingham, Ella and Mannila, Heikki. Random projection in dimensionality reduction: applications to image and text data. In KDD. ACM, 2001.
Brendel et al. (2017) Brendel, Wieland, Rauber, Jonas, and Bethge, Matthias. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
 Cisse et al. (2017) Cisse, Moustapha, Bojanowski, Piotr, Grave, Edouard, Dauphin, Yann, and Usunier, Nicolas. Parseval networks: Improving robustness to adversarial examples. In International Conference on Machine Learning, 2017.
Datar et al. (2004) Datar, Mayur, Immorlica, Nicole, Indyk, Piotr, and Mirrokni, Vahab S. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry. ACM, 2004.
Deng et al. (2009) Deng, Jia, Dong, Wei, Socher, Richard, Li, Li-Jia, Li, Kai, and Fei-Fei, Li. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009.
 Dziugaite et al. (2016) Dziugaite, Gintare Karolina, Ghahramani, Zoubin, and Roy, Daniel M. A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853, 2016.
 Efros & Freeman (2001) Efros, Alexei A and Freeman, William T. Image quilting for texture synthesis and transfer. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques. ACM, 2001.
 Gilmer et al. (2018) Gilmer, Justin, Metz, Luke, Faghri, Fartash, Schoenholz, Samuel S, Raghu, Maithra, Wattenberg, Martin, and Goodfellow, Ian. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018.
Goodfellow et al. (2014a) Goodfellow, Ian, Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron, and Bengio, Yoshua. Generative adversarial nets. In Advances in neural information processing systems, 2014a.
 Goodfellow et al. (2014b) Goodfellow, Ian J, Shlens, Jonathon, and Szegedy, Christian. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014b.
 Gu et al. (2017) Gu, Jiatao, Wang, Yong, Cho, Kyunghyun, and Li, Victor OK. Search engine guided nonparametric neural machine translation. arXiv preprint arXiv:1705.07267, 2017.
 Gu & Rigazio (2014) Gu, Shixiang and Rigazio, Luca. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.
 Guo et al. (2017) Guo, Chuan, Rana, Mayank, Cisse, Moustapha, and van der Maaten, Laurens. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.
 Guu et al. (2017) Guu, Kelvin, Hashimoto, Tatsunori B, Oren, Yonatan, and Liang, Percy. Generating sentences by editing prototypes. arXiv preprint arXiv:1709.08878, 2017.
 He et al. (2016) He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.
 Ioffe & Szegedy (2015) Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, 2015.
 Buckman et al. (2018) Buckman, Jacob, Roy, Aurko, Raffel, Colin, and Goodfellow, Ian. Thermometer encoding: One hot way to resist adversarial examples. International Conference on Learning Representations, 2018.
 Johnson et al. (2017) Johnson, Jeff, Douze, Matthijs, and Jégou, Hervé. Billion-scale similarity search with gpus. arXiv preprint arXiv:1702.08734, 2017.
 Kingma & Ba (2014) Kingma, Diederik P and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Kingma & Welling (2013) Kingma, Diederik P and Welling, Max. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
 Krizhevsky & Hinton (2009) Krizhevsky, Alex and Hinton, Geoffrey. Learning multiple layers of features from tiny images. Technical report, 2009.
 Krizhevsky et al. (2012) Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, 2012.
 Kurakin et al. (2016) Kurakin, Alexey, Goodfellow, Ian, and Bengio, Samy. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
 LeCun et al. (1998) LeCun, Yann, Bottou, Léon, Bengio, Yoshua, and Haffner, Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
 Lee et al. (2017) Lee, Hyeungill, Han, Sungyeob, and Lee, Jungwoo. Generative adversarial trainer: Defense to adversarial perturbations with gan. arXiv preprint arXiv:1705.03387, 2017.
 Li et al. (2016) Li, Xiaoqing, Zhang, Jiajun, and Zong, Chengqing. One sentence one model for neural machine translation. arXiv preprint arXiv:1609.06490, 2016.
 Lu et al. (2017) Lu, Jiajun, Sibai, Hussein, Fabry, Evan, and Forsyth, David. No need to worry about adversarial examples in object detection in autonomous vehicles. arXiv preprint arXiv:1707.03501, 2017.
 Luong et al. (2015) Luong, Minh-Thang, Pham, Hieu, and Manning, Christopher D. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
 Moosavi-Dezfooli et al. (2016) Moosavi-Dezfooli, Seyed-Mohsen, Fawzi, Alhussein, and Frossard, Pascal. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
 Netzer et al. (2011) Netzer, Yuval, Wang, Tao, Coates, Adam, Bissacco, Alessandro, Wu, Bo, and Ng, Andrew Y. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, 2011.
 Samangouei et al. (2018) Samangouei, Pouya, Kabkab, Maya, and Chellappa, Rama. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. International Conference on Learning Representations, 2018.
 Radford et al. (2015) Radford, Alec, Metz, Luke, and Chintala, Soumith. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
 Rauber et al. (2017) Rauber, Jonas, Brendel, Wieland, and Bethge, Matthias. Foolbox v0.8.0: A python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017.
 Rudin et al. (1992) Rudin, Leonid I, Osher, Stanley, and Fatemi, Emad. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 1992.
 Rumelhart et al. (1986) Rumelhart, David E, Hinton, Geoffrey E, and Williams, Ronald J. Learning representations by back-propagating errors. Nature, 1986.
 Shin & Song (2017) Shin, R and Song, D. JPEG-resistant adversarial images. In Machine Learning and Computer Security Workshop, 2017.
 Smith & Tromble (2004) Smith, Noah A and Tromble, Roy W. Sampling uniformly from the unit simplex. Johns Hopkins University, Tech. Rep, 2004.
 Song et al. (2017) Song, Yang, Kim, Taesup, Nowozin, Sebastian, Ermon, Stefano, and Kushman, Nate. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017.
 Sprechmann et al. (2018) Sprechmann, Pablo, Jayakumar, Siddhant, Rae, Jack, Pritzel, Alexander, Badia, Adria Puigdomenech, Uria, Benigno, Vinyals, Oriol, Hassabis, Demis, Pascanu, Razvan, and Blundell, Charles. Memory-based parameter adaptation. International Conference on Learning Representations, 2018.
 Szegedy et al. (2013) Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian, and Fergus, Rob. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Tabacof & Valle (2016) Tabacof, Pedro and Valle, Eduardo. Exploring the space of adversarial images. In Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016.
 van den Oord et al. (2016) van den Oord, Aaron, Kalchbrenner, Nal, Espeholt, Lasse, Vinyals, Oriol, Graves, Alex, et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, 2016.
 Vincent et al. (2010) Vincent, Pascal, Larochelle, Hugo, Lajoie, Isabelle, Bengio, Yoshua, and Manzagol, Pierre-Antoine. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 2010.
 Wang et al. (2017) Wang, Zhiguo, Hamza, Wael, and Song, Linfeng. k-nearest neighbor augmented neural networks for text classification. arXiv preprint arXiv:1708.07863, 2017.
 Zhang et al. (2017) Zhang, Hongyi, Cisse, Moustapha, Dauphin, Yann N, and LopezPaz, David. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
 Zhu et al. (1997) Zhu, Ciyou, Byrd, Richard H, Lu, Peihuang, and Nocedal, Jorge. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software (TOMS), 1997.
7 Appendix: Model details
The specification of our CIFAR-10 and SVHN model is listed in the following table.
Stage | Architecture | Size
Feature extractor | 96 3x3 convolution | 96 x 30 x 30
| batch normalization |
| 96 3x3 convolution | 96 x 28 x 28
| batch normalization & ReLU |
| 96 3x3 convolution with stride 2x2 | 96 x 13 x 13
| batch normalization & ReLU |
| 192 3x3 convolution | 192 x 11 x 11
| batch normalization & ReLU |
| 192 3x3 convolution with stride 2x2 | 192 x 4 x 4
| batch normalization |
Attention | 256 4x4 convolution | 256
Convex sum (with attention mechanism or local mixup) | | 256
Classification | fully-connected layer 256 x 64 | 64
| batch normalization & ReLU |
| fully-connected layer 64 x 10 | 10
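The spatial sizes in the feature-extractor column follow the standard valid (unpadded) convolution arithmetic, out = floor((in - k) / s) + 1, starting from a 32x32 input. As a sanity check, a small helper (the name `conv_out` is ours, not from the paper) traces the first few stages:

```python
def conv_out(size, kernel, stride=1):
    """Spatial output size of a valid (unpadded) convolution."""
    return (size - kernel) // stride + 1

# Trace the feature extractor from a 32x32 CIFAR-10/SVHN input.
s = conv_out(32, 3)            # 96 3x3 conv            -> 30
s = conv_out(s, 3)             # 96 3x3 conv            -> 28
s = conv_out(s, 3, stride=2)   # 96 3x3 conv, stride 2  -> 13
s = conv_out(s, 3)             # 192 3x3 conv           -> 11
print(s)  # 11
```

Note that the final 192 x 4 x 4 map is consistent with the 3072-unit (192 * 4 * 4) fully-connected input in the retrieval network's classification head below.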
The pretrained network used to build the retrieval index is listed as follows:
Stage | Architecture | Size
Feature extractor | 96 3x3 convolution | 96 x 30 x 30
| batch normalization & ReLU |
| 96 3x3 convolution | 96 x 28 x 28
| batch normalization & ReLU |
| 96 3x3 convolution with stride 2x2 | 96 x 13 x 13
| batch normalization & ReLU |
| 192 3x3 convolution | 192 x 11 x 11
| batch normalization & ReLU |
| 192 3x3 convolution with stride 2x2 | 192 x 4 x 4
| batch normalization |
Classification (used only in Scenario 2) | fully-connected layer 3072 x 512 | 512
| batch normalization & ReLU |
| fully-connected layer 512 x 128 | 128
| batch normalization & ReLU |
| fully-connected layer 128 x 10 | 10
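For scale, the classification head of this network holds roughly 1.6M parameters, assuming each fully-connected layer carries a bias vector (the table does not state this explicitly). A quick arithmetic sketch:

```python
def fc_params(n_in, n_out):
    # Weight matrix plus bias vector; bias is an assumption not stated in the table.
    return n_in * n_out + n_out

# The three fully-connected layers of the classification head.
head = fc_params(3072, 512) + fc_params(512, 128) + fc_params(128, 10)
print(head)  # 1640330
```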