Retrieval-Augmented Convolutional Neural Networks for Improved Robustness against Adversarial Examples

02/26/2018, by Jake Zhao et al.

We propose a retrieval-augmented convolutional network and train it with local mixup, a novel variant of the recently proposed mixup algorithm. The proposed hybrid architecture, which combines a convolutional network with an off-the-shelf retrieval engine, is designed to mitigate the adverse effect of off-manifold adversarial examples, while the proposed local mixup addresses on-manifold ones by explicitly encouraging the classifier to locally behave linearly on the data manifold. Our evaluation of the proposed approach against five readily-available adversarial attacks on three datasets (CIFAR-10, SVHN and ImageNet) demonstrates improved robustness compared to the vanilla convolutional network.


1 Introduction

Since the initial investigation by Szegedy et al. (2013), adversarial examples have drawn large interest. Various methods for both generating adversarial examples and protecting a classifier against them have been proposed (see Sec. 4 for more details). Adversarial examples exist due to misbehaviors of a classifier in some regions of the input space and are often generated by finding a point in such a region using optimization.

According to Gilmer et al. (2018), adversarial examples can be categorized into those off the data manifold, defined as the manifold on which training examples lie, and those on the data manifold. Off-manifold adversarial examples occur because the classifier never observes any off-manifold examples during training, a natural consequence of the very definition of the data manifold. On-manifold adversarial examples, however, exist between training examples on the data manifold. There are two causes behind this phenomenon: (1) the sparsity of training examples and (2) the non-smooth behavior of the classifier on the data manifold.

In this paper, we propose to tackle both off- and on-manifold adversarial examples by incorporating an off-the-shelf retrieval mechanism that indexes a large set of examples, and by training this combination of a deep neural network classifier and the retrieval engine to behave linearly on the data manifold using a novel variant of the recently proposed mixup algorithm (Zhang et al., 2017), which we refer to as “local mixup.”

The retrieval mechanism efficiently selects a subset of neighboring examples from a candidate set near the input. These neighboring examples are used as a local approximation to the data manifold in the form of a feature-space convex hull onto which the input is projected. The classifier then makes a decision based on this projected input. This addresses off-manifold adversarial examples. Within this feature-space convex hull, we encourage the classifier to behave linearly by using local mixup to further address on-manifold adversarial examples.

We evaluate the proposed approach, called a retrieval-augmented classifier, with a deep convolutional network (LeCun et al., 1998) on object recognition. We extensively test the retrieval-augmented convolutional network (RaCNN) on datasets of varying scales: CIFAR-10 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011) and ImageNet (Deng et al., 2009), against five readily-available adversarial attacks, including both white-box (FGSM, iFGSM, DeepFool and L-BFGS) and black-box (Boundary) attacks. Our experiments reveal that the RaCNN is more robust to these five attacks than the vanilla convolutional network.

2 Retrieval-Augmented CNN

Gilmer et al. (2018) have recently demonstrated that adversarial examples exist both on and off the data manifold in a carefully controlled setting in which examples from two classes are placed on two disjoint spheres. This result suggests that it is necessary to tackle both types of adversarial examples to improve the robustness of a deep neural network based classifier to adversarial examples. In this section, we describe our approach toward building a more robust classifier by combining an off-the-shelf retrieval engine and a variant of the recently proposed mix-up learning strategy.

2.1 Setup

Let $\mathcal{C}$ be a candidate set of examples. This set may be created as a subset of a training set or may be an entirely separate set. We use $\mathcal{C}$ as a proxy to the underlying data manifold.

Let $d(x, x')$ be a distance function that measures the dissimilarity between two inputs $x$ and $x'$. In order to facilitate the use of an off-the-shelf retrieval engine, we use

$d(x, x') = \| \phi_r(x) - \phi_r(x') \|_2$,   (1)

where $\phi_r$ is a predefined, or pretrained, feature extractor. We assume the existence of a readily-available retrieval engine $\mathcal{R}$ that takes $x$ as input and returns the $K$ nearest neighbors in $\mathcal{C}$ according to $d$.

We then have a deep neural network classifier composed of a feature extractor $\phi$ and a classifier $g$. This classifier is trained on a training set $\mathcal{D}$, taking into account the extra set $\mathcal{C}$ and the retrieval engine $\mathcal{R}$.

2.2 Inference

In this setup, we first describe the forward evaluation of the proposed network. This forward pass is designed to handle adversarial examples “off” the data manifold by projecting them onto the data manifold.

Local Characterization of Data Manifold

Given a new input $x$, we use the retrieval engine $\mathcal{R}$ to retrieve the $K$ examples $x_1, \ldots, x_K$ from $\mathcal{C}$ that are closest to $x$. We then build a feature-space convex hull

$\mathcal{H}(x) = \left\{ \sum_{k=1}^{K} \alpha_k \phi(x_k) \;\middle|\; \alpha_k \geq 0, \; \sum_{k=1}^{K} \alpha_k = 1 \right\}.$

As observed earlier, linear interpolation of two input vectors in the feature space of a deep neural network often corresponds to a plausible input vector, unlike interpolation done in the raw input space (see, e.g., Bengio et al., 2013; Kingma & Welling, 2013; Radford et al., 2015). Based on this observation, we consider the feature-space convex hull as a reasonable local approximation to the underlying data manifold.

Trainable Projection

Exact projection of the input onto this convex hull requires expensive optimization, especially in the high-dimensional space. As we consider a deep neural network classifier, the dimension of the feature space could be hundreds or more, making this exact projection computationally infeasible. Instead, we propose to learn a goal-driven projection procedure based on the attention mechanism (Bahdanau et al., 2014).

We compare the input feature $\phi(x)$ against each retrieved example and compute a score

$e_k = \phi(x)^\top W \phi(x_k),$

where $W$ is a trainable weight matrix (Luong et al., 2015). These scores are then normalized to form a set of convex coefficients

$\alpha_k = \frac{\exp(e_k)}{\sum_{k'=1}^{K} \exp(e_{k'})}.$

These coefficients $\alpha_k$'s are then used to form the projection point of $x$ in the feature-space convex hull $\mathcal{H}(x)$:

$\hat{\phi}(x) = \sum_{k=1}^{K} \alpha_k \phi(x_k).$

This trainable projection could be thought of as learning to project an off-manifold example on the locally-approximated manifold to maximize the classification accuracy.
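To make this concrete, the following is a minimal PyTorch sketch of such an attention-based projection. The class name AttentiveProjection, the bilinear scoring form and the tensor shapes are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveProjection(nn.Module):
    """Project an input feature onto the convex hull spanned by K retrieved
    neighbor features, using a bilinear (Luong-style) attention score."""

    def __init__(self, feat_dim):
        super().__init__()
        # Trainable weight matrix W of the bilinear score.
        self.W = nn.Parameter(torch.randn(feat_dim, feat_dim) * 0.01)

    def forward(self, h_x, h_neighbors):
        # h_x:         (batch, feat_dim)     feature of the input
        # h_neighbors: (batch, K, feat_dim)  features of the K retrieved examples
        scores = torch.einsum('bd,de,bke->bk', h_x, self.W, h_neighbors)
        alphas = F.softmax(scores, dim=-1)              # convex coefficients, sum to 1
        h_proj = torch.einsum('bk,bkd->bd', alphas, h_neighbors)  # convex combination
        return h_proj, alphas

# Tiny usage example with random features (feat_dim and K chosen arbitrarily).
proj = AttentiveProjection(feat_dim=256)
h = torch.randn(4, 256)
neighbors = torch.randn(4, 10, 256)
h_hat, alphas = proj(h, neighbors)
print(h_hat.shape, alphas.sum(dim=-1))  # torch.Size([4, 256]), all ones
```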

Classification

The projected feature $\hat{\phi}(x)$ now represents the original input and is fed to the final classifier $g$. In other words, we constrain the final classifier to work only with points inside a feature-space convex hull of neighboring training examples. This constraint alleviates, up to a certain degree, the issue of the classifier's misbehavior in the region outside the data manifold. (The quality of the local approximation may not be uniformly high across the input space, and we do not claim that it solves the problem of off-manifold adversarial examples.)

2.3 Training

The output of the classifier is almost fully differentiable with respect to the classifier $g$, both of the feature extractors ($\phi$ and $\phi_r$) and the attention weight matrix $W$; the only exception is the retrieval engine $\mathcal{R}$. (We believe the introduction of this non-differentiable, black-box retrieval engine further contributes to the increased robustness against white-box attacks.) This allows us to train the entire pipeline described in the previous section using backpropagation (Rumelhart et al., 1986) and gradient-based optimization.

Local Mixup

This is, however, not enough to ensure the robustness of the proposed approach to on-manifold adversarial examples. During training, the classifier only observes a very small subset of any feature-space convex hull. Especially in a high-dimensional space, this greatly increases the chance of the classifier's misbehavior within these feature-space convex hulls, as also noted by Gilmer et al. (2018). In order to address this issue, we propose to augment learning with a local variant of the recently proposed mixup algorithm (Zhang et al., 2017).

The goal of the original mixup is to encourage a classifier to act linearly between any pair of training examples. This is done by linearly mixing two randomly-drawn training examples and using the resulting interpolated example during training. Let two randomly-drawn pairs be $(x_1, y_1)$ and $(x_2, y_2)$, where $y_1$ and $y_2$ are one-hot vectors in the case of classification. Mixup creates a new pair

$(\tilde{x}, \tilde{y}) = \big(\lambda x_1 + (1-\lambda) x_2,\; \lambda y_1 + (1-\lambda) y_2\big)$

and uses it as a training example, where $\lambda$ is a random sample from a beta distribution. We call this original version global mixup, as it increases the linearity of the classifier between any pair of training examples.
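A minimal PyTorch sketch of global mixup follows; the beta-distribution parameter alpha=0.2 and the tensor shapes are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

def global_mixup(x1, y1, x2, y2, alpha=0.2):
    """Original (global) mixup: linearly interpolate a random pair of training
    examples and their one-hot labels with lambda ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2   # y1, y2 are one-hot label vectors
    return x_mix, y_mix

# Usage with two random images and one-hot labels over 10 classes.
x1, x2 = torch.rand(3, 32, 32), torch.rand(3, 32, 32)
y1 = F.one_hot(torch.tensor(3), 10).float()
y2 = F.one_hot(torch.tensor(7), 10).float()
x_mix, y_mix = global_mixup(x1, y1, x2, y2)
```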

It is, however, unnecessary for our purpose to use global mixup, as our goal is to make the classifier behave better (i.e., behave linearly) within the feature-space convex hull $\mathcal{H}(x)$. Thus, we use local mixup, in which we sample the convex coefficients $\alpha_k$'s uniformly at random to create a new mixed example pair $\big(\sum_k \alpha_k \phi(x_k), \sum_k \alpha_k y_k\big)$. We sample uniformly from the simplex using the Kraemer Algorithm (see Sec. 4.2 in Smith & Tromble, 2004).
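The sketch below illustrates local mixup under this reading: coefficients are drawn uniformly from the simplex (Kraemer-style, by sorting uniforms and taking consecutive differences) and used to mix the retrieved neighbors' features and one-hot labels. Function names and shapes are illustrative assumptions.

```python
import torch

def sample_simplex(batch, k):
    """Sample convex coefficients uniformly from the (k-1)-simplex:
    sort k-1 uniforms, pad with 0 and 1, and take consecutive differences."""
    u, _ = torch.sort(torch.rand(batch, k - 1), dim=-1)
    padded = torch.cat([torch.zeros(batch, 1), u, torch.ones(batch, 1)], dim=-1)
    return torch.diff(padded, dim=-1)                     # (batch, k), rows sum to 1

def local_mixup(h_neighbors, y_neighbors):
    """Local mixup: mix the K retrieved neighbor features (and their one-hot
    labels) with simplex-uniform coefficients, producing a synthetic point
    inside the feature-space convex hull."""
    batch, k, _ = h_neighbors.shape
    alphas = sample_simplex(batch, k)
    h_mix = torch.einsum('bk,bkd->bd', alphas, h_neighbors)
    y_mix = torch.einsum('bk,bkc->bc', alphas, y_neighbors)
    return h_mix, y_mix
```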

Overall

We use stochastic gradient descent (SGD) to train the proposed network. At each update, we perform a number of descent steps on the usual classification loss, followed by a number of descent steps on the proposed local mixup loss.
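A sketch of one such update is given below, building on the AttentiveProjection and local_mixup sketches above. The step counts n_cls and n_mix, and the methods model.extract and model.classify, are hypothetical names introduced for illustration.

```python
import torch
import torch.nn.functional as F

def training_step(model, proj, x, y_onehot, h_neighbors, y_neighbors, optimizer,
                  n_cls=1, n_mix=1):
    """One update interleaving the usual classification loss with the
    local-mixup loss (step counts are assumed hyperparameters)."""
    for _ in range(n_cls):
        optimizer.zero_grad()
        h_proj, _ = proj(model.extract(x), h_neighbors)   # project onto the hull
        loss = F.cross_entropy(model.classify(h_proj), y_onehot.argmax(dim=-1))
        loss.backward()
        optimizer.step()
    for _ in range(n_mix):
        optimizer.zero_grad()
        h_mix, y_mix = local_mixup(h_neighbors, y_neighbors)
        log_probs = F.log_softmax(model.classify(h_mix), dim=-1)
        loss = -(y_mix * log_probs).sum(dim=-1).mean()    # soft-label cross-entropy
        loss.backward()
        optimizer.step()
```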

2.4 Retrieval Engine

The proposed approach does not depend on the specifics of the retrieval engine $\mathcal{R}$. Any off-the-shelf retrieval engine that supports dense vector lookup could be used, enabling the use of a very large-scale candidate set $\mathcal{C}$ with the latest fast dense vector lookup algorithms, such as FAISS (Johnson et al., 2017). In this work, we used a more rudimentary retrieval engine based on locality-sensitive hashing (LSH; see, e.g., Datar et al., 2004) with a feature dimension reduced by random projection (see, e.g., Bingham & Mannila, 2001, and references therein), as the candidate sets in the experiments contain approximately 1M examples or fewer. The retrieval key $\phi_r$ from Eq. (1) was chosen to be a pretrained deep neural network without its final fully-connected classifier layers (Krizhevsky et al., 2012; He et al., 2016).
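For illustration, a rudimentary random-projection LSH index over dense features might look as follows. This NumPy sketch (class and parameter names are our own, and the fallback-to-brute-force behavior is a simplification) is not the engine used in the experiments.

```python
import numpy as np

class RandomProjectionLSH:
    """Toy LSH index: random signed projections define hash buckets; a query
    searches its own bucket and falls back to brute force when it is too small."""

    def __init__(self, features, n_bits=16, seed=0):
        rng = np.random.RandomState(seed)
        self.features = features                             # (N, D) candidate features
        self.planes = rng.randn(features.shape[1], n_bits)   # random projection matrix
        self.buckets = {}
        for idx, code in enumerate(self._hash(features)):
            self.buckets.setdefault(code, []).append(idx)

    def _hash(self, x):
        bits = (x @ self.planes) > 0                         # sign pattern per row
        return [hash(row.tobytes()) for row in bits]

    def query(self, q, k=10):
        candidates = self.buckets.get(self._hash(q[None, :])[0], [])
        if len(candidates) < k:                              # fallback: brute force
            candidates = range(len(self.features))
        cand = np.asarray(list(candidates))
        dists = np.linalg.norm(self.features[cand] - q, axis=1)
        return cand[np.argsort(dists)[:k]]

# Usage: index 1000 random 128-d features and retrieve 10 neighbors.
feats = np.random.randn(1000, 128).astype(np.float32)
index = RandomProjectionLSH(feats)
print(index.query(feats[0], k=10))
```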

3 Adversarial Attack

3.1 Attack Scenarios

Scenario 1 (Direct Attack)

In this work, we consider the candidate set and the retrieval engine which indexes it to be “hidden” from the outside world. This property makes a usual white-box attack more of a gray-box attack in which the attacker has access to the entire system except for the retrieval part. This is our first attack scenario.

Scenario 2 (Retrieval Attack)

Despite the hidden nature of the retrieval engine and the candidate set, it is possible for the attacker to confuse the retrieval engine if she/he can access the feature extractor $\phi_r$. We furthermore give the attacker access not only to $\phi_r$ but also to the original classifier that was tuned together with $\phi_r$. This allows the attacker to create an adversarial example on $\phi_r$ that could potentially disrupt the retrieval process, thereby fooling the proposed network. Although this is unlikely in practice, we test this second scenario to investigate the possibility of compromising the retrieval engine.

3.2 Attack Methods

Under each of these scenarios, we evaluate the robustness of the proposed approach against five widely used adversarial attack algorithms, including both white-box and black-box attacks: the fast gradient sign method (FGSM, Goodfellow et al., 2014b), its iterative variant (iFGSM, Kurakin et al., 2016), DeepFool (Moosavi-Dezfooli et al., 2016), L-BFGS (Tabacof & Valle, 2016) and Boundary (Brendel et al., 2017). We acknowledge that this is not an exhaustive list of attacks, but find it extensive enough to empirically evaluate the robustness of the proposed approach.

Fast Gradient Sign Method (FGSM)

FGSM creates an adversarial example by adding to the input the scaled sign of the gradient of the loss function $\mathcal{L}$ computed using a target class $y$:

$\tilde{x} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}(x, y)\big),$

where the scale $\epsilon$ controls the difference between the original input $x$ and its adversarial version $\tilde{x}$. This is a white-box attack, requiring the availability of the gradient of the loss function with respect to the input.
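A minimal untargeted FGSM sketch in PyTorch is shown below, assuming inputs in [0, 1] and using the true label to compute the loss; a targeted variant would step against the gradient of the loss for the target class instead.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """Fast gradient sign method: perturb the input by epsilon times the sign
    of the loss gradient, then clip back to the valid pixel range [0, 1]."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```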

Iterative FGSM (iFGSM)

iFGSM improves upon FGSM by iteratively modifying the original input for a fixed number of steps. At each step $t$,

$\tilde{x}^{(t)} = \tilde{x}^{(t-1)} + \epsilon \cdot \mathrm{sign}\big(\nabla_{\tilde{x}^{(t-1)}} \mathcal{L}(\tilde{x}^{(t-1)}, y)\big),$

where $\tilde{x}^{(0)} = x$ and $t = 1, \ldots, T$. Similarly to FGSM, iFGSM is a white-box attack.
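A corresponding iFGSM sketch follows; the per-step size epsilon/steps and the projection back into the epsilon-ball are common conventions assumed here, not necessarily the exact settings used in the experiments.

```python
import torch
import torch.nn.functional as F

def ifgsm(model, x, y, epsilon, steps=10):
    """Iterative FGSM: apply a small FGSM step repeatedly, clipping the running
    perturbation to the epsilon-ball around the original input and the pixel
    values to [0, 1] after every step."""
    x_orig = x.clone().detach()
    x_adv = x.clone().detach()
    alpha = epsilon / steps                                   # assumed step size
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```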

DeepFool

Moosavi-Dezfooli et al. (2016) proposed to create an adversarial example by finding the residual vector with the minimum $\ell_2$-norm under the constraint that the output of the classifier must flip. They presented an efficient iterative procedure to find such a residual vector. Similarly to FGSM and iFGSM, this approach relies on the gradient of the classifier's output with respect to the input, and is hence a white-box attack.

L-BFGS

Tabacof & Valle (2016) proposed an optimization-based approach, similar to DeepFool above, but more explicitly constraining the input to lie inside a tight box defined by training examples. They use L-BFGS-B (Zhu et al., 1997) to solve this box-constrained optimization problem. This is also a white-box attack.

Boundary

Brendel et al. (2017) proposed a powerful black-box, or more specifically decision-based, attack that requires neither the gradient of the classifier nor its predictive distribution; it only requires the final decision of the classifier. Starting from an adversarial example, potentially far away from the original input, it iteratively searches for the next adversarial example with a smaller difference from the original input. This procedure guarantees a reduction in the difference by rejecting any step that either fails to decrease the difference or makes the example no longer adversarial.
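The toy sketch below conveys only the accept/reject idea of a decision-based random walk; it is a heavily simplified illustration, not Brendel et al.'s actual algorithm, and the step sizes are arbitrary.

```python
import numpy as np

def boundary_style_attack(is_adversarial, x_orig, x_start, steps=1000, seed=0):
    """Simplified decision-based walk: starting from an adversarial point,
    propose small random steps biased toward the original input and accept a
    step only if the candidate stays adversarial and gets strictly closer."""
    rng = np.random.RandomState(seed)
    x_adv = x_start.copy()
    for _ in range(steps):
        direction = x_orig - x_adv
        noise = rng.randn(*x_adv.shape) * 0.01 * np.linalg.norm(direction)
        candidate = np.clip(x_adv + 0.1 * direction + noise, 0.0, 1.0)
        closer = np.linalg.norm(candidate - x_orig) < np.linalg.norm(x_adv - x_orig)
        if closer and is_adversarial(candidate):   # only the decision is queried
            x_adv = candidate
    return x_adv
```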

Implementation

We use Foolbox (Rauber et al., 2017), available at http://foolbox.readthedocs.io/en/latest/ (revision 2d468cb6). Whenever necessary for further analysis, such as measuring accuracy as a function of the amount of adversarial perturbation, we implement some of these attacks ourselves.
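A minimal usage sketch in the Foolbox 1.x style is shown below; the exact signatures may differ across Foolbox versions (including the pinned revision), and the toy model and input are placeholders.

```python
import numpy as np
import torch
import torch.nn as nn
import foolbox

# Toy stand-in classifier; in practice this would be the trained baseline or RaCNN.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

# Foolbox 1.x-style wrapper and attack (API assumed; check against the pinned version).
fmodel = foolbox.models.PyTorchModel(model, bounds=(0, 1), num_classes=10)
attack = foolbox.attacks.FGSM(fmodel)

image = np.random.rand(3, 32, 32).astype(np.float32)   # placeholder CIFAR-10-sized input
label = 3
adversarial = attack(image, label)                      # numpy array, or None on failure
if adversarial is not None:
    print(np.mean((adversarial - image) ** 2))          # mean squared perturbation
```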

4 Related Work

Since the phenomenon of adversarial examples was noticed by Szegedy et al. (2013), there has been a stream of attempts at making deep neural networks more robust. Most of the existing work is orthogonal to the approach proposed here and could be used together with it. We nevertheless review it here to highlight similarities and contrasts with our approach.

4.1 Input Transformation

An off-manifold adversarial example can be avoided if it is projected back onto the data manifold, which is characterized by the training examples. This can be thought of as transforming the input. There have been two families of algorithms in this direction.

Data-Independent Transformation

The first family of defense mechanisms aims at reducing the input space so as to minimize the regions that are off the data manifold. Dziugaite et al. (2016) demonstrated that JPEG-compressed images suffer less from adversarial attacks. Lu et al. (2017) suggest that trying various scalings of the image size can overcome adversarial attacks, as they appear sensitive to the scaling of objects. Guo et al. (2017) use an idea from compressed sensing to transform an input image by reconstructing it from a lower-resolution version while minimizing the total variation (Rudin et al., 1992). More recently, Buckman et al. (2018) proposed to discretize each input dimension using thermometer coding. These approaches are attractive due to their simplicity, but there has been work showing that they are often not enough to defend against sophisticated adversarial examples (see, e.g., Shin & Song, 2017).

Data-Dependent Transformation

On the other hand, various groups have tried using a data-dependent transformation mostly relying on density estimation.

Gu & Rigazio (2014) used a denoising autoencoder (Vincent et al., 2010) to push an input back toward the data manifold. Samangouei et al. (2018) and Song et al. (2017) respectively use a generative adversarial network (Goodfellow et al., 2014a) and a PixelCNN (van den Oord et al., 2016) to replace an input image with a nearby, likely image. Instead of using a separately trained generative model, Guo et al. (2017) use a technique of image quilting (Efros & Freeman, 2001). These approaches are similar to our use of a retrieval engine over the candidate set. They, however, do not attempt to address the issue of misbehaviors of a classifier on the data manifold.

                   Clean   FGSM                      iFGSM                     DeepFool
                   0       1e-04   2e-04   4e-04     1e-05   2e-05   8e-05     1e-05   2e-05   8e-05
Baseline           85.15   14.05   7.5     4.22      55.2    26.17   2.59      26.04   11.72   0.34
RaCNN-K5           72.57   42.97   34.29   24.55     72.57   72.48   45.46     64.34   61.34   60.96
RaCNN-K5-mixup     75.6    46.37   37.9    28.11     74.89   74.89   48.12     66.96   63.84   63.55
RaCNN-K10          79.52   52.95   43.9    33.77     79.12   79      55.27     72.89   71.81   71.14
RaCNN-K10-mixup    80.80   53      44.01   33.47     79.87   79.72   54.36     73.63   72.35   71.26

Table 1: The CIFAR-10 classifiers' robustness to the adversarial attacks in Scenario 2 (Retrieval Attack). Column headers indicate the attack strength.

4.2 Attack-Aware Learning

Another direction has been to modify the learning algorithm to make a classifier more robust to adversarial examples. As our approach relies on ordinary backpropagation with stochastic gradient descent, most of the approaches below, as well as those above, can readily be used together with it.

Adversarial Training

Already early on, Goodfellow et al. (2014b) proposed a procedure of adversarial training, where a classifier is trained on both training examples and adversarial examples generated on-the-fly. Lee et al. (2017) extended this procedure by introducing a generative adversarial network (GAN, Goodfellow et al., 2014a) that learns to generate adversarial examples while simultaneously training a classifier. These approaches are generally applicable to any system that could be tuned frequently, and could be used to train the proposed model.

Robust Optimization

Instead of explicitly including adversarial examples during training, there have been attempts to modify the learning algorithm to induce robustness. Cisse et al. (2017) proposed Parseval training, which encourages the Lipschitz constant of each layer of a deep neural network classifier to be less than one. More recently, Sinha et al. (2018) proposed a tractable robust optimization algorithm for training a deep neural network classifier to be more robust to adversarial examples. This robust optimization algorithm ensures that the classifier behaves well in the neighborhood of each training point. It is highly relevant to the proposed local mixup, which also aims at making a classifier behave well between any pair of neighboring training examples.

4.3 Retrieval-Augmented Neural Networks

The proposed approach tightly integrates an off-the-shelf retrieval engine into a deep neural network. This approach of retrieval-augmented deep learning has recently been proposed in various tasks.

Gu et al. (2017) use a text-based retrieval engine to efficiently retrieve relevant training translation pairs and let their non-parametric neural machine translation system seamlessly fuse an input sentence and the retrieved pairs for better translation.

Wang et al. (2017) proposed a similar approach to text classification, and Guu et al. (2017) to language modeling. More recently, Sprechmann et al. (2018) applied this retrieval-based mechanism for online learning, similarly to the earlier work by Li et al. (2016) in the context of machine translation.

Figure 1: The CIFAR-10 classifiers' robustness to the adversarial attacks in Scenario 1 (Direct Attack). Panels, from left to right: FGSM, iFGSM, DeepFool, L-BFGS and Boundary. The x-axis indicates the strength of the attack in terms of the normalized distance; the y-axis corresponds to the accuracy.
                   Clean   FGSM                      iFGSM                     DeepFool
                   0       2e-04   4e-04   8e-04     2e-05   8e-05   2e-04     2e-05   8e-05   2e-04
Baseline           95.48   42.09   30.95   21.61     70.41   35.53   11.17     51.10   16.00   4.28
RaCNN-K5           90.78   64.87   53.31   39.44     90.73   75.80   63.41     84.62   81.30   80.55
RaCNN-K5-mixup     91.64   68.31   57.20   43.73     91.55   77.74   65.75     86.18   83.20   82.43
RaCNN-K10          92.19   64.94   52.24   37.73     92.10   76.41   62.70     86.18   84.25   82.21
RaCNN-K10-mixup    92.49   68.72   57.30   43.49     92.45   78.26   65.50     87.33   84.73   84.10

Table 2: The SVHN classifiers' robustness to the adversarial attacks in Scenario 2 (Retrieval Attack). Column headers indicate the attack strength.

5 Experiments

5.1 Settings

Datasets

We test the proposed approach (RaCNN) on three datasets of different scales. CIFAR-10 has 50k training and 10k test examples, with 10 classes. SVHN has 73k training and 26k test examples, with 10 classes. ImageNet has 1.3M training and 50k validation examples, with 1,000 classes. For CIFAR-10 and ImageNet, we use the original training set as the candidate set, i.e., $\mathcal{C} = \mathcal{D}$, while for SVHN we use the extra set of 531k examples as the candidate set. The overall training process involves data augmentation on $\mathcal{D}$ but not on $\mathcal{C}$.

Pretrained Feature Extractor

We train a deep convolutional network for each dataset, remove the final fully-connected layers and use the remaining stack as a feature extractor for retrieval. This feature extractor is fixed when used in the proposed RaCNN.

RaCNN: Feature Extractor and Classifier

We use the same convolutional network from above for the RaCNN as well (separated into $\phi$ and $g$ by the final average pooling) for each dataset. For CIFAR-10 and SVHN, we train $\phi$ and $g$ from scratch. For ImageNet, on the other hand, we fix $\phi$ and initialize $g$ from the pretrained ResNet-18 above, tuning only $g$. The latter was done as we observed it greatly reduced training time in the preliminary experiments.

Training

We use Adam (Kingma & Ba, 2014) as the optimizer. We investigate the influence of the newly introduced components (retrieval and local mixup) by varying the number of retrieved examples $K$ and whether local mixup is used.

Evaluation

In addition to the accuracy on the clean test set, we look at accuracy as a function of the amount of perturbation used to create adversarial examples. We use the default MeanSquaredDistance from the Foolbox library; this amount is computed as a normalized distance between the original example $x$ and its perturbed version $\tilde{x}$, i.e., the mean squared difference of their pixel values. We further note that the attacks are generally performed with clipping of out-of-bound pixel values at each step.
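A simplified version of this distance is sketched below, assuming pixel values in [0, 1]; Foolbox's MeanSquaredDistance additionally normalizes by the value range, which is a no-op under that assumption.

```python
import numpy as np

def normalized_mse(x, x_adv):
    """Mean squared distance between the original and perturbed inputs,
    averaged over all pixels (values assumed to lie in [0, 1])."""
    x = np.asarray(x, dtype=np.float64)
    x_adv = np.asarray(x_adv, dtype=np.float64)
    return np.mean((x - x_adv) ** 2)

# e.g. a uniform perturbation of 0.01 on a 3x32x32 image gives roughly 1e-04
x = np.random.rand(3, 32, 32)
print(normalized_mse(x, np.clip(x + 0.01, 0, 1)))
```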

5.2 Cifar-10

Model

In the CIFAR-10 experiments, our model contains 6 convolutional layers followed by 2 fully-connected layers. Every layer is followed by batch normalization (Ioffe & Szegedy, 2015) and a ReLU. More details can be found in Appendix 7.

Scenario 1 (Direct Attack)

We present in Fig. 1 the effect of adversarial attacks of varying strengths (measured in the normalized distance) on both the vanilla convolutional network (Baseline) and the proposed RaCNN under various settings. Across all five adversarial attacks, it is clear that the proposed RaCNN is more robust to adversarial examples than the vanilla classifier. The proposed local mixup improves the robustness further, especially when the number of retrieved examples is small, i.e., $K=5$. We conjecture that this is because the number of pairs that local mixup must take care of grows quadratically, i.e., $O(K^2)$, with respect to $K$.

Scenario 2 (Retrieval Attack)

In Table 1, we present the accuracies of both the baseline and the RaCNNs under white-box attacks of varying strengths, when the feature extractor for the retrieval engine is attacked. We observe that it is indeed possible to fool the proposed RaCNN by attacking the retrieval process. Comparing Fig. 1 and Table 1, we however notice that the performance degradation is much less severe in this second scenario.

Figure 2: The SVHN classifiers' robustness to the adversarial attacks in Scenario 1 (Direct Attack). Panels, from left to right: FGSM, iFGSM, DeepFool, L-BFGS and Boundary. The x-axis indicates the strength of the attack in terms of the normalized distance; the y-axis corresponds to the accuracy.
                   Clean   FGSM                      iFGSM                     DeepFool
                   0       1e-04   2e-04   4e-04     1e-05   2e-05   4e-05     1e-05   2e-05   4e-05
Baseline           88.98   15      13.12   11.65     9.59    3.57    1.82      0.29    0.17    0.16
RaCNN-K10-mixup    77.68   20.17   17.40   14.70     77.28   64.97   17.67     35.74   35.72   35.71

Table 3: The ImageNet classifiers' robustness to the adversarial attacks in Scenario 2 (Retrieval Attack). Column headers indicate the attack strength.

5.3 Svhn

Model

We use the same architecture and hyper-parameter setting as in the CIFAR-10 experiments.

Scenario 1 (Direct Attack)

On SVHN, we observe a trend similar to CIFAR-10. The proposed RaCNN is more robust against all the adversarial attacks compared to the vanilla convolutional network. Similarly to CIFAR-10, the proposed approach is most robust to DeepFool and Boundary, while it is most susceptible to L-BFGS. We notice, however, that the impact of local mixup is larger with SVHN than it was with CIFAR-10.

Another noticeable difference is the impact of the number of retrieved examples on the classification accuracy. In the case of CIFAR-10, the accuracies on the clean test examples (the first column in Table 1) differ significantly between using 5 and 10 retrieved examples, while this is much less so with SVHN (the first column in Table 2). We conjecture that this is due to a lower level of variation among input examples in SVHN, which are pictures of house numbers taken from streets, compared to those in CIFAR-10, which are pictures of general objects.

Scenario 2 (Retrieval Attack)

We observe a similar trend between CIFAR-10 and SVHN when the feature extractor for retrieval is attacked, as shown in Tables 1 and 2.

Figure 3: The ImageNet classifiers' robustness to the adversarial attacks in Scenario 1 (Direct Attack). Panels, from left to right: FGSM, iFGSM, DeepFool, L-BFGS and Boundary. The x-axis indicates the strength of the attack in terms of the normalized distance; the y-axis corresponds to the accuracy. The adversary uses top-5 accuracy for attacks.

5.4 ImageNet

Model

We use ResNet-18 (He et al., 2016). We pretrain it as a standalone classifier on ImageNet and use its feature extractor part for retrieval. We use the same feature extractor for the RaCNN without updating it. The classifier $g$ is initialized from the pretrained network and tuned during training. In the case of ImageNet, we only try $K=10$ retrieved examples with local mixup. Due to the high computational cost of the L-BFGS and Boundary attacks, we evaluate both the vanilla classifier and the RaCNN against these two attacks on 200 images drawn uniformly at random from the validation set. We report top-5 accuracy (Accuracy@5), a standard metric for ImageNet.

Scenario 1 (Direct Attack)

A general trend with ImageNet is similar to that with either CIFAR-10 or SVHN, as can be seen in Fig. 3. The proposed RaCNN is more robust to adversarial attacks. We however do observe some differences. First, iFGSM is better at compromising both the baseline and RaCNN than L-BFGS is, in this case. Second, DeepFool is much more successful at fooling the baseline convolutional network on ImageNet than on the other two datasets, but is much less so at fooling the proposed RaCNN.

Scenario 2 (Retrieval Attack)

Unlike with CIFAR-10 and SVHN, we have observed that the retrieval attack is sometimes more effective than the direct attack in the case of ImageNet. For instance, FGSM can compromise the retrieval feature extractor and decrease the accuracy from 77.68 down to 0.20. We observed similar behavior with DeepFool, but not with iFGSM.

Figure 4: The left-most column shows the query image, and the next ten images were retrieved by $\phi_r$. We show the retrieval results using the original image and the adversarial images one row at a time: the clean query, followed by iFGSM under Scenario 1 (Direct Attack) and Scenario 2 (Retrieval Attack) at two perturbation strengths. With the amount of injected noise high enough to fool any vanilla convolutional network, the behavior of the retrieval engine changes but largely maintains the semantics of the query image; that is, most of the retrieved images contain fish, although the specific species may change.

5.5 Discussion

In summary, we have observed that the proposed RaCNN, when trained with local mixup, is more robust to adversarial attacks, at least the five considered in the experiments, than the vanilla convolutional network. More specifically, the RaCNN was most robust to the black-box, decision-based attack (Brendel et al., 2017), while it was more easily compromised by white-box attacks, especially by the L-BFGS attack (Tabacof & Valle, 2016), which relies on a strong quasi-Newton optimizer. This suggests that the RaCNN could be an attractive alternative to the vanilla convolutional network when deployed, for instance, in a cloud-based environment.

In Fig. 4, we show retrieval results given a query image from ImageNet. Although the adversarial attack did alter the retrieval engine's behavior, we see that the semantics of the original query image are still largely maintained in the sets of retrieved images, suggesting two insights. First, the robustness of the RaCNN is largely due to the robustness of the retrieval engine to small perturbations of the input. Even when the retrieval quality degrades, we observe that a majority of retrieved examples are of the same, or a similar, class. Second, we could further improve the robustness by designing the feature extractor for the retrieval engine more carefully. For instance, an identity function would correspond to retrieval based on raw pixels, which would make the retrieval engine extremely robust to any adversarial attack imperceptible to humans. This may, however, result in a lower accuracy on clean examples, a trade-off that needs to be determined per task.

As has been observed with existing input-transformation-based defense strategies, the robustness of the proposed RaCNN comes at the expense of generalization performance on clean input examples. We have observed, however, that this degradation can be controlled, at the cost of computational overhead, by varying the number of retrieved examples per input. This controllability could be an important feature when deploying such a model in production.

6 Conclusion

In this paper, we proposed a novel retrieval-augmented convolutional network classifier (RaCNN) that integrates an off-the-shelf retrieval engine to counter adversarial attacks. The RaCNN was designed to tackle both off- and on-manifold adversarial examples, and to do so, we use a retrieval engine to locally characterize the data manifold as a feature-space convex hull and the attention mechanism to project the input onto this convex hull. The entire model, composed of the retrieval engine and a deep convolutional network, is trained jointly, and we introduced the local mixup learning strategy to encourage the classifier to behave linearly on the feature-space convex hull.

We have evaluated the proposed approach on three standard object recognition benchmarks (CIFAR-10, SVHN and ImageNet) against four white-box adversarial attacks and one black-box, decision-based attack. The experiments have revealed that the proposed approach is indeed more robust than the vanilla convolutional network in all cases. The RaCNN was found to be especially robust to the black-box, decision-based attack, suggesting its potential for cloud-based deployment scenarios.

The proposed approach consists of three major components: (1) local characterization of the data manifold, (2) projection onto the data manifold and (3) regularized learning on the manifold. There is large room for improvement in each of these components. For instance, the feature-space convex hull may be replaced with a more sophisticated kernel estimator, the projection onto the convex hull could be done better, and a learning algorithm better than local mixup could further improve the robustness against on-manifold adversarial examples. We leave these possibilities as future work.

References

7 Appendix: Model details

Our CIFAR-10 and SVHN model specification is listed in the following table.

Stage              Architecture                                           Size
Feature extractor  96 3x3 convolution                                     96 x 30 x 30
                   batch normalization
                   96 3x3 convolution                                     96 x 28 x 28
                   batch normalization & ReLU
                   96 3x3 convolution with stride 2x2                     96 x 13 x 13
                   batch normalization & ReLU
                   192 3x3 convolution                                    192 x 11 x 11
                   batch normalization & ReLU
                   192 3x3 convolution with stride 2x2                    192 x 4 x 4
                   batch normalization
Attention          256 4x4 convolution                                    256
Convex-sum         (with attention mechanism or local mixup)              256
Classification     fully-connected layer 256 x 64                         64
                   batch normalization & ReLU
                   fully-connected layer 64 x 10                          10

The pretrained network used for building the retrieval index is listed as follows:

Stage              Architecture                                           Size
Feature extractor  96 3x3 convolution                                     96 x 30 x 30
                   batch normalization & ReLU
                   96 3x3 convolution                                     96 x 28 x 28
                   batch normalization & ReLU
                   96 3x3 convolution with stride 2x2                     96 x 13 x 13
                   batch normalization & ReLU
                   192 3x3 convolution                                    192 x 11 x 11
                   batch normalization & ReLU
                   192 3x3 convolution with stride 2x2                    192 x 4 x 4
                   batch normalization
Classification     fully-connected layer 3072 x 512                       512
(used only in      batch normalization & ReLU
Scenario 2)        fully-connected layer 512 x 128                        128
                   batch normalization & ReLU
                   fully-connected layer 128 x 10                         10