
VisionGuard: Runtime Detection of Adversarial Inputs to Perception Systems

by   Yiannis Kantaros, et al.

Deep neural network (DNN) models have proven to be vulnerable to adversarial attacks. In this paper, we propose VisionGuard, a novel attack- and dataset-agnostic and computationally-light defense mechanism for adversarial inputs to DNN-based perception systems. In particular, VisionGuard relies on the observation that adversarial images are sensitive to lossy compression transformations. Specifically, to determine if an image is adversarial, VisionGuard checks if the output of the target classifier on a given input image changes significantly after feeding it a transformed version of the image under investigation. Moreover, we show that VisionGuard is computationally-light both at runtime and design-time which makes it suitable for real-time applications that may also involve large-scale image domains. To highlight this, we demonstrate the efficiency of VisionGuard on ImageNet, a task that is computationally challenging for the majority of relevant defenses. Finally, we include extensive comparative experiments on the MNIST, CIFAR10, and ImageNet datasets that show that VisionGuard outperforms existing defenses in terms of scalability and detection performance.



I Introduction

Deep neural networks (DNNs) have been deployed in multiple safety-critical systems, such as medical imaging, autonomous cars, and surveillance systems. At the same time, DNNs have been shown to be vulnerable to adversarial examples [1], i.e., inputs which have deliberately been modified to cause either misclassification or desired incorrect prediction that would benefit an attacker. Adversarial examples in the literature can be divided into two sub-classes depending on how the attack is executed. One augments the physical environment to induce misclassification (e.g., adding a sticker to a stop sign) [2], while the other adds a small perturbation to the classifier input data. In this work, when we refer to adversarial examples, we only refer to the latter subclass (i.e., small perturbation attacks). Adversarial examples (especially in the case of image classification) have received increased research attention due to the following properties. First, the difference between legitimate and adversarial inputs can be imperceptible, making adversarial detection a very challenging task [1]. Second, the transferability of adversarial samples between different models allows for black-box attacks [3, 4]. Third, adversarial samples are often misclassified with high confidence, implying that DNNs fail to discriminate between adversarial and legitimate inputs [5].

To establish reliability and security of DNN-based perception systems against adversarial input images, we propose VisionGuard, a novel attack- and dataset-agnostic detection framework. VisionGuard does not modify the target classifier and does not rely on building separate classifiers. Instead, VisionGuard relies on the observation that adversaries may be successful at fooling DNNs due to the large feature space over which they can look for adversarial inputs. This is also validated in our experiments: the larger the input space (i.e., image dimensions), the easier it is to fool the target classifier. Motivated by this, the proposed defense aims to shrink the feature space available to adversaries. In particular, to determine if an image is adversarial, VisionGuard checks if the softmax output of the target classifier on a given input image changes significantly after feeding it a ‘refined’ version of that image. To refine images, i.e., to squeeze out possibly unnecessary features of the input image, we apply lossy compression algorithms (e.g., JPEG) with high compression quality. Then, we measure the similarity of the corresponding softmax outputs using the Kullback-Leibler (K-L) divergence metric. If this metric is above a threshold, the image is classified as adversarial; otherwise, it is classified as clean.

I-A Related Works

Similar defenses that rely on image transformations have also been proposed in the image purification domain. For instance, [6, 7] apply JPEG compression, bit depth reduction, and crop ensemble to remove noise and possible adversarial components from images. However, purification is applied to all images, whether they are adversarial or not, which compromises the accuracy of the network on clean images [8]. In contrast, VisionGuard does not affect the accuracy of the target classifier. Image transformations have also been employed in [9] to detect adversarial inputs, but in a completely different way than the proposed one. In particular, [9] relies on building a DNN-based detector that takes as input $k \cdot m$ features, where $k$ is the number of applied transformations (e.g., rotation and translation) and $m$ is the number of logits/classes. For instance, MNIST has $m = 10$ classes, while for ImageNet there are $m = 1000$ logits. Similar to VisionGuard, MagNet [10] checks if an input image is adversarial by applying a single image transformation and examining the corresponding softmax output. The difference is that instead of compression algorithms, MagNet employs auto-encoders to generate new images that are reconstructed from the original ones. Related defenses that complement VisionGuard, such as robust/adversarial training [1, 11] and image purification approaches [12], are discussed in Section III.

I-B Evaluation: Scalability & Detection Performance

We evaluate VisionGuard on the MNIST, CIFAR10, and ImageNet datasets and we show that, unlike relevant works, it is very computationally light in terms of runtime and memory requirements, even when it is applied to large-scale datasets, such as ImageNet; therefore, it can be employed in real-time applications that may also involve large-scale image spaces. We provide extensive comparisons that show that VisionGuard outperforms similar detectors [10, 13, 14] both in terms of scalability and detection performance.


In particular, VisionGuard attains comparable performance to these defenses on small datasets, such as MNIST and CIFAR10, but is more computationally efficient in terms of memory requirements. Due to its low computational cost at both runtime and design-time, VisionGuard is suitable for real-time applications that may involve large-scale image domains, as highlighted by our experiment on ImageNet images. In contrast, application of [10, 13, 14] to ImageNet is computationally challenging. Specifically, the defense in [10] requires the training of additional auto-encoders. In fact, we were unable to train an auto-encoder for ImageNet within two weeks of training. The detector in [13, 14], on the other hand, requires extracting and storing the last hidden layer output for all training images, which is a time-consuming process and may not be possible on all platforms (e.g., lightweight IoT cameras) due to excessive memory requirements. These embeddings are used at runtime to check if an image is adversarial. The same limitation also holds for [15], which employs the KDE detector [13, 14] but additionally requires a specific training process for the target classifier. Similarly, [9] requires training a new DNN detector. Additionally, [9] is significantly more computationally expensive than VisionGuard, since at runtime it requires the application of image transformations and the last hidden layer output of the target classifier for each transformed image.

Detection Performance

Also, we provide extensive experiments that show that VisionGuard attains high detection performance on the MNIST, CIFAR10, and ImageNet datasets. The detectors in [13, 14, 10] attain similar performance to VisionGuard on MNIST and CIFAR10, but their performance drops significantly on more complex datasets such as ImageNet. In fact, their performance is comparable to a random detector. Moreover, we would like to emphasize that the defenses proposed in [10, 9, 13, 14, 15] are dataset-specific, as they heavily rely on training sets directly and/or building separate DNN models based on these training sets. Particularly, in these works, a new auto-encoder and a new DNN-based detector need to be built and new embeddings need to be extracted and stored for each dataset. Therefore, it is unclear how these methods perform when they are deployed in real-world environments for which datasets do not exist. In contrast, VisionGuard does not rely on training sets, or on the training process of the target DNN, or on building separate DNN classifiers. For instance, we show through extensive experiments that VisionGuard is dataset-agnostic in the sense that the same transformation (JPEG with compression quality 92) yields high detection performance across datasets that differ both in content and in image dimensions.

Finally, we would like to highlight that several white-box attacks have been proposed to bypass existing defenses under the assumption that the structure of the defense mechanism is known to the attacker [16, 17, 18, 19]. Designing defenses against white-box attacks is out of the scope of this paper. Instead, our goal is to address an equally important issue which is to develop defense mechanisms that scale to large image domains, a task that is particularly challenging for existing defenses both at design- and run-time as shown in our experiments. Nevertheless, we include preliminary results showing that VisionGuard is robust to a certain class of white-box attacks by introducing randomness in the image transformations.

I-C Contribution

The contributions of this paper can be summarized as follows. First, we introduce VisionGuard, a new attack-agnostic and dataset-agnostic detection technique for defense against adversarial examples. Second, we show that VisionGuard is more computationally efficient, both at run-time and design-time, than defenses that rely on training sets or building DNN-based detectors. This allows us to apply VisionGuard to real-time applications that may also involve large-scale image domains, as illustrated by experiments on ImageNet. Third, we provide extensive comparative experiments on MNIST, CIFAR10, and ImageNet that show that VisionGuard outperforms similar defenses in terms of scalability and detection performance.

II Adversarial Attacks

Fig. 1: Examples of almost imperceptible adversarial images from ImageNet after CW attack.

Consider a classifier $f: \mathcal{X} \rightarrow \mathcal{Y}$, where $\mathcal{X}$ is the set of images $x \in \mathbb{R}^n$, where $n$ is the number of pixels, and $\mathcal{Y}$ is the set of labels. Let $y^*(x)$ and $f(x)$ denote the true and the predicted label of image $x$, respectively. Then, the goal of an attacker is to perturb an image $x$ by $\delta$ so that the difference between the perturbed and the original image is imperceptible and the perturbed image is misclassified, i.e., $f(x + \delta) \neq y^*(x)$. In what follows, we provide a summary of existing adversarial attacks that can generate such perturbations $\delta$.

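Fast Gradient Sign Method (FGSM): The Fast Gradient Sign Method (FGSM) [3] creates adversarial examples by perturbing the images $x$ in the direction of the gradient of the loss function by magnitude $\epsilon$, where $\epsilon$ determines the perturbation size, i.e.,

$$x' = x + \epsilon \, \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right), \tag{1}$$

where $\mathrm{sign}(\cdot)$ is the sign function and $J(\theta, x, y)$ is the model’s loss function with parameters $\theta$ and labels $y$.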
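As an illustration only, the FGSM step can be sketched on a toy linear softmax classifier whose input gradient is available in closed form. The model (`W`, `b`), the helper names, and the final clipping of pixels to [0, 1] are assumptions made for this sketch; they are not taken from the experiments of this paper.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return z.index(max(z))

def loss_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x (toy linear model)."""
    p = softmax(logits(W, b, x))
    # dJ/dz_k = p_k - 1[k == y]; the chain rule through z = W x + b gives W^T (p - e_y)
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(W, b, x, y, eps):
    """One signed-gradient step of size eps; pixels are kept in [0, 1] (an assumption)."""
    grad = loss_gradient(W, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

# Hypothetical two-pixel, two-class example.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x = [0.6, 0.4]
x_adv = fgsm(W, b, x, y=0, eps=0.2)
print(predict(W, b, x), predict(W, b, x_adv))
```

For a DNN the gradient would come from automatic differentiation rather than from a closed-form expression; the update rule itself is unchanged.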

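Projected Gradient Descent (PGD): The Projected Gradient Descent (PGD) method is a straightforward extension of FGSM. Specifically, it applies adversarial noise many times iteratively, giving rise to the following recursive formula:

$$x^{(t+1)} = \mathrm{Clip}_{x,\epsilon}\left\{ x^{(t)} + \alpha \, \mathrm{sign}\left(\nabla_x J(\theta, x^{(t)}, y)\right) \right\}, \tag{2}$$

where $x^{(0)} = x$, $\alpha$ is the step size, and $\mathrm{Clip}_{x,\epsilon}\{\cdot\}$ represents a clipping of the values of a sample so that it is within the $\epsilon$-neighborhood of $x$. Compared to FGSM, this approach allows for extra control over the attack.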
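The iteration can be sketched as follows, again on a toy linear softmax classifier. The model, the step-size name `alpha`, and the number of steps are assumptions of this sketch; the clipping keeps every coordinate within `eps` of the original image.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return z.index(max(z))

def loss_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x (toy linear model)."""
    p = softmax(logits(W, b, x))
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][i] for k in range(len(W))) for i in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd(W, b, x, y, eps, alpha, steps):
    """Repeat small signed-gradient steps, clipping back into the eps-neighborhood of x."""
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(W, b, x_adv, y)
        stepped = [xi + alpha * sign(g) for xi, g in zip(x_adv, grad)]
        x_adv = [min(x0 + eps, max(x0 - eps, v)) for x0, v in zip(x, stepped)]
    return x_adv

# Hypothetical two-pixel, two-class example.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x = [0.6, 0.4]
x_adv = pgd(W, b, x, y=0, eps=0.15, alpha=0.05, steps=10)
print(predict(W, b, x), predict(W, b, x_adv))
```

The extra control mentioned above shows up as two separate knobs: `eps` bounds the total perturbation, while `alpha` and `steps` govern how the bound is approached.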

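Jacobian Saliency Map Attack (JSMA): An iterative method for targeted misclassification is proposed in [20]. Specifically, an adversarial saliency map is constructed based on the forward derivative, as this gives the adversary the information required to make the neural network misclassify a given sample. For an input $x$ and a neural network $F$, the DNN output associated with the class $j$ is denoted by $F_j(x)$. To achieve a target class $t$, $F_t(x)$ must increase while the probabilities $F_j(x)$, where $j \neq t$, must decrease, until $t = \arg\max_j F_j(x)$. The adversary can accomplish this by increasing input features using the following saliency map

$$S(x,t)[i] = \begin{cases} 0, & \text{if } \dfrac{\partial F_t(x)}{\partial x_i} < 0 \text{ or } \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i} > 0, \\[2ex] \dfrac{\partial F_t(x)}{\partial x_i} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i} \right|, & \text{otherwise,} \end{cases}$$

where $\partial F_j(x) / \partial x_i$ is the forward derivative of output $j$ with respect to $x_i$ and $i$ is an input feature. High values of $S(x,t)[i]$ correspond to input features that will either increase the target class, or decrease other classes significantly, or both. Thus, the goal is to find input features $i$ that maximize $S(x,t)[i]$ and perturb these features by a fixed amount. This process is repeated iteratively until the target misclassification is achieved.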
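A minimal sketch of the saliency computation is given below, assuming the forward derivative is already available as a matrix `jacobian[j][i]` holding the derivative of output `j` with respect to input feature `i`. The function name and the numeric Jacobian are hypothetical and serve only to illustrate the rule described above.

```python
def saliency_map(jacobian, target):
    """Score each input feature for increasing the target class (JSMA-style rule)."""
    n_features = len(jacobian[0])
    scores = []
    for i in range(n_features):
        d_target = jacobian[target][i]
        d_others = sum(row[i] for j, row in enumerate(jacobian) if j != target)
        # A feature is rejected if it lowers the target output or raises the others.
        if d_target < 0 or d_others > 0:
            scores.append(0.0)
        else:
            scores.append(d_target * abs(d_others))
    return scores

# Hypothetical forward derivative for 3 classes and 3 input features.
jacobian = [
    [0.5, -0.2, 0.1],
    [-0.3, 0.4, 0.2],
    [-0.1, -0.1, -0.4],
]
scores = saliency_map(jacobian, target=0)
best_feature = scores.index(max(scores))
print(scores, best_feature)
```

In the iterative attack, the selected feature would be perturbed, the forward derivative recomputed, and the procedure repeated until the target class is reached.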

Carlini-Wagner (CW): An iterative (targeted or untargeted) attack to generate adversarial examples with small perturbations is proposed in [21]. The perturbation $\delta$ is selected by solving the following optimization problem:

$$\min_{\delta} \; \|\delta\|_2^2 + c \cdot g(x + \delta), \tag{3}$$

where $c$ is a suitably chosen constant and $g$ depends on the softmax output of the neural network and is selected so that $g(x + \delta) \leq 0$ if the perturbed image is misclassified or gets a desired label, and $g(x + \delta) > 0$ otherwise.
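To make the trade-off in this objective concrete, the sketch below evaluates it for a candidate perturbation in the untargeted case. Two assumptions are made here that the text does not fix: the perturbation size is measured by the squared Euclidean norm, and the misclassification term is taken as the softmax probability of the true label minus the largest other probability, which is negative exactly when another class has a strictly larger probability. The toy classifier is hypothetical.

```python
import math

def toy_classifier(x):
    """Hypothetical two-class softmax model on a two-pixel input."""
    z = [x[0] - x[1], x[1] - x[0]]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def misclassification_term(probs, true_label):
    """Assumed g: negative iff some other class beats the true label."""
    best_other = max(p for j, p in enumerate(probs) if j != true_label)
    return probs[true_label] - best_other

def cw_objective(x, delta, classifier, true_label, c):
    """Perturbation size plus c times the misclassification term."""
    x_adv = [xi + di for xi, di in zip(x, delta)]
    g = misclassification_term(classifier(x_adv), true_label)
    size = sum(d * d for d in delta)
    return size + c * g, g

x = [0.6, 0.4]
no_attack = cw_objective(x, [0.0, 0.0], toy_classifier, true_label=0, c=1.0)
attack = cw_objective(x, [-0.2, 0.2], toy_classifier, true_label=0, c=1.0)
print(no_attack, attack)
```

An actual attack would minimize this quantity over the perturbation with a gradient-based optimizer and search over the constant `c`; the sketch only evaluates the objective.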

III Existing Defenses Against Adversarial Inputs

To establish the reliability and security for DNN systems, several techniques have been proposed that range from adversarial and robust training to building separate classifiers that detect adversarial inputs. A recent summary of existing defenses can also be found in [22].

A common defense mechanism against adversarial examples relies on augmenting the training set with adversarial examples and incorporating an adversarial component as a regularizer in the classification objective; see e.g., [23, 24] and the references therein. However, these adversarial training methods are effective only on adversarial examples that are crafted using the parameters of the neural network, while adversarial inputs are transferable between different models [3, 4].

Training methods that improve the robustness of neural networks, independently of the attack, have also been proposed. For instance, distillation, initially proposed to reduce the size of deep neural networks [11], can be used as a defensive mechanism that makes crafting of adversarial examples difficult [25]. A computationally-efficient robust training method is proposed in [4] that relies on Gaussian data augmentation during training and requires the BReLU activation function.

Image purification methods have also been proposed that rely on training neural networks that act as ‘denoisers’ [26, 12]. The goal of the denoisers is to purify the images, aiming to remove any adversarial components, before they are fed to the classifier. Another idea of defense is to use separate DNNs to detect adversarial examples; see e.g., [27, 10, 28, 29] and the references therein. Common limitations of these approaches are that (i) they compromise the accuracy of the network on clean images [8], since purification is applied to all images, whether they are adversarial or not, and (ii) they are often sensitive to attack-specific parameters or dependent on the targeted classifier [30].

An alternative research direction for defenses against adversarial attacks focuses on designing adversarial detectors that do not depend on the training process of the targeted classifier and do not rely on training new classifiers. Works along this direction are presented in [13, 14, 31, 6, 7]. VisionGuard also lies in this category and complements existing robust/adversarial training approaches and purification-based defenses. In [13, 14], the kernel density estimation (KDE) detector is proposed that selects thresholds on the likelihood of an image. This likelihood is computed using the outputs of the last hidden layer of the classifier for the image under investigation and for all training images. The work in [31] shows that adversarial images have abnormal coefficients in the lower-ranked principal components obtained by Principal Component Analysis (PCA) that can potentially be exploited for defense against adversarial inputs. Additionally, the purification-based approaches proposed in [6, 7] apply transformations (such as JPEG compression, bit depth reduction, and crop ensemble) to the input images before feeding them to the classifier, aiming to remove any possible adversarial components; however, these approaches affect the accuracy of the classifier on clean images [8]. The majority of the defenses discussed above have only been evaluated on smaller datasets such as MNIST and CIFAR10; therefore, their applicability to realistic, large-scale image domains is questionable, as also discussed in [18]. Recently, detectors that operate directly on images, independent from the targeted classifier, have been proposed that rely on steganalysis methods [32]. However, the defense in [32] is attack-specific, since separate detectors must be designed for each type of attack. Note that [32] is not effective on small-scale images, as they do not provide enough samples to construct efficient features.

IV VisionGuard: A New Image Defense Framework

Our goal is to build a detector $d: \mathcal{X} \rightarrow \{0, 1\}$, such that (i) $d(x) = 0$ if the image $x$ is a legitimate image and (ii) $d(x) = 1$ if $x$ is an adversarial input, i.e., if it has been manipulated/perturbed. In what follows, we propose VisionGuard, an attack-agnostic compression-based detector; see also Algorithm 1.

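Input: Input image $x$, Target classifier $f$;
Output: $d(x) = 1$ if $x$ is an adversarial input, and $d(x) = 0$ otherwise;
1 $d(x) = 0$;
2 Feed $x$ to the DNN model and get softmax output $p(x)$;
3 Apply lossy compression with high compression quality to $x$ and get image $\hat{x}$;
4 Feed $\hat{x}$ to the DNN model and get softmax output $p(\hat{x})$;
5 Compute $D_{\mathrm{KL}}(p(x) \,\|\, p(\hat{x}))$;
6 if $D_{\mathrm{KL}}(p(x) \,\|\, p(\hat{x})) > \tau$ then
7       $d(x) = 1$;
Algorithm 1 VisionGuard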

The proposed detector relies on the observation that classifiers are robust to certain small transformations applied to the inputs that squeeze out features that may be unnecessary for correct classification. In particular, here, as a transformation, we employ lossy compression with high compression quality (e.g., JPEG compression). VisionGuard comprises four steps to detect whether an input image $x$ is adversarial or not. First, the input image $x$ is fed to the classifier to get the softmax output denoted by $p(x)$. Second, lossy compression with high compression quality is applied to $x$ to get an image denoted by $\hat{x}$. Third, $\hat{x}$ is fed to the classifier to get the softmax output denoted by $p(\hat{x})$. Fourth, $x$ is classified as adversarial if the softmax outputs $p(x)$ and $p(\hat{x})$ are significantly different. Formally, we measure similarity between $p(x)$ and $p(\hat{x})$ using the K-L divergence measure, denoted by $D_{\mathrm{KL}}(p(x) \,\|\, p(\hat{x}))$. Specifically, if $D_{\mathrm{KL}}(p(x) \,\|\, p(\hat{x}))$ is greater than a threshold $\tau$, then $x$ is considered an adversarial input, i.e., $d(x) = 1$; otherwise, $x$ is classified as a legitimate image, i.e., $d(x) = 0$.
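The four steps can be sketched in a few lines. In this sketch the classifier and the transformation are passed in as functions, so that a JPEG encode/decode round trip at a fixed quality could be plugged in. The toy classifier and the coarse pixel quantization used below are hypothetical stand-ins chosen only to keep the example self-contained; they are not the DNNs or the JPEG transformation evaluated in this paper, and the order of the arguments in the divergence is also an assumption.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """K-L divergence between two discrete distributions (eps avoids log(0))."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def visionguard_detect(image, classifier, transform, threshold):
    """Return (flag, score): flag is 1 if the image is declared adversarial, else 0."""
    p_original = classifier(image)            # step 1: softmax output on the input
    refined = transform(image)                # step 2: lossy 'refinement' of the input
    p_refined = classifier(refined)           # step 3: softmax output on the refined input
    score = kl_divergence(p_original, p_refined)
    return (1 if score > threshold else 0), score  # step 4: threshold the divergence

def quantize(image, levels=8):
    """Stand-in lossy transform: snap every pixel to a coarse grid."""
    return [round(v * levels) / levels for v in image]

def toy_classifier(image):
    """Stand-in two-class softmax model that is very sensitive to the pixel sum."""
    z = 40.0 * (sum(image) - 2.1)
    p0 = 1.0 / (1.0 + math.exp(-z))
    return [p0, 1.0 - p0]

clean = [0.5, 0.25, 0.75, 0.5]
perturbed = [v + 0.05 for v in clean]  # small perturbation that flips the toy model
print(visionguard_detect(clean, toy_classifier, quantize, threshold=0.1))
print(visionguard_detect(perturbed, toy_classifier, quantize, threshold=0.1))
```

Because the detector only wraps two forward passes and one transformation, it leaves the target classifier and its accuracy on clean images untouched.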

Detection Thresholds: To determine the detection threshold $\tau$ we use Receiver Operating Characteristics (ROC) graphs that are constructed as follows. First, given a set of clean images $\mathcal{X}_c$ we construct the corresponding set of adversarial images, denoted by $\mathcal{X}_a$, using any attack or possibly a mixture of attacks. Next, recall that our detection mechanism maps each image to $0$ (legitimate input) or $1$ (adversarial input). Hereafter, we call the class of adversarial images ‘positives’ and the class of clean images ‘negatives’. Then, given a threshold $\tau$, we estimate the true positive (TP) rate as the number of true positives (i.e., the number of adversarial inputs classified as adversarial inputs) divided by the total number of positives, i.e., the total number of adversarial images. Similarly, we estimate the false positive (FP) rate as the number of false positives (i.e., the number of legitimate inputs classified as adversarial inputs) divided by the total number of negatives, i.e., the total number of clean images. Then, ROC graphs can be constructed by plotting the TP rate on the Y axis and the FP rate on the X axis for various thresholds $\tau$. Given an ROC graph, we select the threshold that returns the closest point to $(0, 1)$, since this point corresponds to perfect attack detection.
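The threshold selection can be sketched as follows, assuming the detector scores (K-L divergences) of the clean and adversarial images have already been computed. The score values below are made up for illustration. The sketch also includes the area under the ROC graph (AUC), computed here through its standard pairwise-ranking form, i.e., the fraction of (adversarial, clean) pairs in which the adversarial image receives the larger score.

```python
import math

def roc_points(scores_clean, scores_adv):
    """Return (threshold, FP rate, TP rate) for every candidate threshold."""
    points = []
    for tau in sorted(set(scores_clean) | set(scores_adv)):
        tpr = sum(s > tau for s in scores_adv) / len(scores_adv)
        fpr = sum(s > tau for s in scores_clean) / len(scores_clean)
        points.append((tau, fpr, tpr))
    return points

def select_threshold(points):
    """Pick the threshold whose ROC point is closest to (FP, TP) = (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[1], p[2] - 1.0))[0]

def auc(scores_clean, scores_adv):
    """AUC as the probability that an adversarial score exceeds a clean one."""
    wins = 0.0
    for a in scores_adv:
        for c in scores_clean:
            wins += 1.0 if a > c else (0.5 if a == c else 0.0)
    return wins / (len(scores_adv) * len(scores_clean))

# Hypothetical detector scores.
scores_clean = [0.01, 0.02, 0.05, 0.10, 0.50]
scores_adv = [0.40, 0.80, 1.50, 2.00]
tau = select_threshold(roc_points(scores_clean, scores_adv))
print(tau, auc(scores_clean, scores_adv))
```

Only observed scores are tried as thresholds, since the TP and FP rates change only at those values.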

V Experiments

(a) FGSM, PGD, and JSMA
(b) CW
Fig. 2: Accuracy of the target DNN classifier on test sets of adversarial images generated by FGSM, PGD, JSMA, and CW for various attack-specific parameters on the MNIST, CIFAR-10, and ImageNet datasets.
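| Attack | MNIST | CIFAR10 | ImageNet |
|--------|---------|---------|----------|
| FGSM | 0.00005 | 0.0017 | 0.1728 |
| PGD | 0.005 | 1.11 | 8.46 |
| JSMA | 1.75 | 19.67 | DNF |
| CW | 0.005 | 1.01 | 16.39 |

TABLE I: Runtime (secs) of Attack Algorithms per Image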
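
| Attack | VG | MagNet | KDE |
|--------|-------|--------|-------|
| FGSM | 61.6% | 58.4% | 56.2% |
| FGSM | 75.9% | 71.8% | 76.5% |
| FGSM | 71.3% | 72.9% | 84.5% |
| FGSM | 57.1% | 51.6% | 92.0% |
| PGD | 93.7% | 87.5% | 83.7% |
| PGD | 90.0% | 85.9% | 81.7% |
| PGD | 79.8% | 76.7% | 85.5% |
| JSMA | 71.7% | 71.3% | 65.7% |
| JSMA | 83.4% | 80.2% | 77.8% |
| JSMA | 91.7% | 85.5% | 85.8% |
| CW | 96.3% | 95.7% | 88.1% |
| CW | 96.1% | 96.0% | 89.9% |
| CW | 94.7% | 96.0% | 87.3% |

TABLE II: MNIST: Comparative experiments in terms of AUC against MagNet and KDE.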

| CIFAR10 | VG+JPEG92 | MagNet | KDE |
|---------|-----------|--------|-------|
| FGSM | 84.1% | 84.3% | 80.8% |
| FGSM | 82.9% | 89.8% | 89.2% |
| FGSM | 78.3% | 89.7% | 90.4% |
| FGSM | 76.9% | 85.5% | 89.7% |
| PGD | 93.7% | 93.9% | 96.9% |
| PGD | 91.1% | 93.7% | 96.4% |
| PGD | 89.2% | 93.4% | 95.9% |
| JSMA | 93.9% | 93.4% | 95.0% |
| JSMA | 94.2% | 93.7% | 95.5% |
| JSMA | 94.2% | 93.7% | 95.6% |
| CW | 87.2% | 90.0% | 89.6% |
| CW | 85.8% | 89.5% | 89.0% |
| CW | 84.8% | 88.9% | 87.1% |

TABLE III: CIFAR10: Comparative experiments in terms of AUC against MagNet and KDE.

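| ImageNet | VG+JPEG75 | VG+JPEG92 | VG+JPEG98 | VG+Median3 | MagNet | KDE |
|----------|-----------|-----------|-----------|------------|--------|-------|
| FGSM | 84.0% | 89.7% | 86.6% | 85.0% | DNF (could not train auto-encoders) | 47.1% |
| FGSM | 85.3% | 94.7% | 86.7% | 88.9% | DNF | 47.9% |
| FGSM | 88.9% | 98.7% | 88.8% | 93.3% | DNF | 46.0% |
| FGSM | 91.0% | 99.8% | 82.2% | 94.0% | DNF | 47.4% |
| PGD | 89.6% | 94.5% | 87.4% | 86.7% | DNF | 43.2% |
| PGD | 89.8% | 98.7% | 93.4% | 87.8% | DNF | 46.8% |
| PGD | 94.3% | 98.4% | 91.2% | 80.1% | DNF | 47.6% |
| CW | 93.2% | 97.9% | 82.6% | 82.4% | DNF | 54.9% |
| CW | 87.2% | 91.9% | 88.3% | 88.6% | DNF | 47.4% |
| CW | 80.9% | 85.6% | 85.0% | 87.9% | DNF | 47.2% |

TABLE IV: ImageNet: Comparative experiments in terms of AUC against MagNet and KDE.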

We tested VisionGuard (VG) against the state-of-the-art attacks, FGSM, PGD, JSMA, and CW, on three standard machine learning datasets: MNIST, CIFAR10, and ImageNet. MNIST contains $28 \times 28$ grayscale images divided into 60,000 training samples and 10,000 test samples with 10 classes. CIFAR10 contains $32 \times 32$ RGB images divided into 50,000 training samples and 10,000 test samples with 10 classes, as well. ImageNet (ILSVRC2012) contains RGB images divided into about 1.2 million training images, 50,000 validation images, and 100,000 testing images. For the purposes of experimentation, we treat the validation set as the training set due to the availability of labels. For the MNIST classification task, we consider a simple, fully-connected neural network with two hidden layers and neurons per layer that achieves accuracy on the test set. As for the CIFAR10 and ImageNet datasets, we consider convolutional neural networks with residual blocks (ResNet-56 and ResNet-50, respectively) [33]. The accuracy of the trained ResNet-56 and ResNet-50 is and , respectively.

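| Detector | MNIST | CIFAR10 | ImageNet |
|----------|-------|---------|----------|
| VG | 0 | 0 | 0 |
| MagNet | 24KB | 16KB | DNF (> 2 weeks) |
| KDE | 5.1MB | 4.2MB | 4.8GB |

TABLE V: Disk-space requirements of VisionGuard, MagNet, and KDE per dataset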

Evaluation of Attacks: We apply the FGSM, PGD, JSMA, and CW attacks, for various attack-specific parameters, on the MNIST, CIFAR10, and ImageNet test sets. The accuracy of the classifier on the resulting adversarial test sets is depicted in Figure 2. Observe in this figure that as the magnitude of the perturbation increases, the accuracy of the neural network decreases. Moreover, observe that as the image dimensions increase, it is easier to fool the target classifier. The reason is that adversaries can search for adversarial inputs over larger input feature spaces. In Table I, we also report the average runtime required to generate a single adversarial image on MNIST, CIFAR10, and ImageNet using the FGSM, PGD, JSMA, and CW attacks. Observe that the most computationally-light attack is FGSM while the most computationally-expensive is JSMA. In fact, JSMA failed to generate an adversarial ImageNet image due to memory constraints within the attack method. Also, note that there is a trade-off between computational efficiency and effectiveness of the attack. Specifically, observe in Figure 2 that, e.g., on CIFAR10, FGSM and JSMA are the most and least effective attacks, respectively.

Evaluation of VisionGuard & Comparative Experiments: In what follows, we evaluate the efficacy of VG using ROC graphs. For the construction of ROC graphs, we call ‘positives’ (i) the images that have been attacked (even if the attack fails, i.e., it does not cause misclassification) and (ii) the clean images that are misclassified, as they can also be seen as ‘adversarial’ inputs. All other images (i.e., clean images that are correctly classified) are called ‘negatives’. We examine the performance of VG when it is integrated with JPEG compression with various compression qualities and with median filters. Note that VG along with rotation transformations, such as the ones used in [9], or bit-depth reduction transformations, as used in [6, 7], yields poor detection performance and, therefore, such results are omitted. Finally, we provide comparisons against MagNet [10] and the KDE detector that was originally proposed in [13] and later also presented in [14]. To compare against MagNet and KDE, we leverage the code provided by the authors.

MNIST: Table II presents the area under the ROC graphs (AUC) when VG is applied using JPEG compression with compression quality. Similar performance was seen for compression qualities , , , and median filter. The additional results are omitted due to space limitations. Observe in Table II that VG outperforms both MagNet and KDE in almost all attacks. Also, note that VG and MagNet fail to detect FGSM-generated adversarial inputs. Finally, VG using JPEG compression requires secs on average to check if an input image is adversarial.

CIFAR10: The respective AUC comparison for CIFAR10 is presented in Table III. VG, MagNet, and KDE have comparable AUC-based performance on adversarial images generated using the PGD, JSMA, and CW attacks, while both KDE and MagNet outperform VG on FGSM-based adversarial inputs, especially for large values of the attack parameter $\epsilon$. Note that [18] states that the KDE detector gives poor performance on CIFAR10, which contradicts our results. Finally, VG using JPEG compression requires 0.0187 secs on average to check if an input image is adversarial.

ImageNet: To evaluate VG on ImageNet, we have randomly sampled images; the results are summarized in Table IV. Observe that VG attains high AUC-based performance () for almost all attacks and any attack parameters. MagNet requires a new auto-encoder for each new dataset it is applied to. An auto-encoder for ImageNet is not provided by the authors in [10] while training such an auto-encoder did not finish within two weeks; therefore, comparisons on ImageNet are not available. Furthermore, extracting the embeddings for million ImageNet images, as required in [13, 14, 15], required hours approximately. In Table IV we report the performance of the KDE detector which is comparable to the performance of a random detector. Finally, note that VG using JPEG compression requires on average secs to check if an ImageNet image is adversarial. As the dimensions of the input images increase, the runtime of VG increases slightly as well. This is expected as the JPEG compression/decompression time complexity is , where is the number of pixels [34].


Recall that MagNet requires a new auto-encoder for each new dataset it is applied to. Similarly, the KDE-based detectors [13, 14, 15] rely on extracting embeddings from training sets. As a result, it is questionable if these detectors achieve high detection performance when they are deployed in real-world environments for which datasets may not exist. Notice in Tables II-IV that VisionGuard is dataset-agnostic, as the same transformation, JPEG with compression quality 92, yields high detection performance across the MNIST, CIFAR10, and ImageNet datasets that differ both in content and image dimensions.

Disk-space Requirements: In Table V, we report the disk-space requirements for VG, MagNet, and KDE. MagNet requires storing the auto-encoders that are used at runtime. The auto-encoders used in [10] are stored as keras models and require 24 KB and 16 KB for MNIST and CIFAR10, respectively. KDE requires disk-space to store the last hidden layer output of the DNN for all training images. In particular, KDE requires 5.1 MB, 4.2 MB, and 4.8 GB for MNIST, CIFAR10, and ImageNet images, respectively. In contrast, VG does not have any disk-space requirements, as it does not rely on training sets or building new DNNs and performs the image transformations in memory.

Robustness to Random Noise: As JPEG compression is sensitive to noise, we also evaluated the robustness of VG on random noise. Specifically, first we add random Gaussian noise to the original clean images. The accuracy of the classifiers on the generated noisy MNIST, CIFAR10, and ImageNet datasets dropped by on average across all datasets indicating that the generated noise caused misclassifications. Then, we compute the ROC curves where positives are the original clean and correctly classified images and negatives are the resulting noisy and correctly classified images. The AUC is , , and on MNIST, CIFAR10, and ImageNet, respectively. This shows that VG cannot distinguish between clean and noisy images and, therefore, it will not raise false alarms due to random noise e.g., dust in the camera lens.

Robustness to White-box Attacks: The majority of existing defenses, including [10, 13, 14], can be easily bypassed, as shown recently, especially in white-box settings, where the attacker knows the structure of the detector. In particular, MagNet [10] is sensitive to adversarial inputs as it relies on training auto-encoders. Similarly, KDE [13, 14] can be bypassed in a two-step attack derived from the CW attack [18]. First, the CW attack is applied to an input image yielding an image . Second, a modified version of the CW attack is applied to which yields an image . This secondary attack extends the CW attack by incorporating the KDE of the adversarial image generated in the first step in the objective function (3). In particular, the objective function of the secondary attack is , where . In a similar way, by incorporating the KL-based metric into the objective function of (3), VisionGuard may be bypassed. However, even in this case, we can introduce randomness in terms of the compression quality or the employed transformation to make it harder to fool the detector. Particularly, recall that the performance of VisionGuard does not change significantly on CW-based adversarial images as the image transformations change; see Table IV. However, changes significantly across different transformations. For instance, using the CW attack with

on ImageNet, the mean and variance of

across images is (i) and for JPEG75; (ii) and for JPEG92; (iii) and for JPEG98; and (iv) and for Median . By applying random transformations once an input image is received (i.e., in this case the attacker cannot know the parameters of the transformation), we allow (which is also part of the CW white-box attack) to take values that may be completely different from what the attacker thinks. For instance, when implementing this CW white-box attack we observed that if the value of used by the attacker is significantly different from the true one, the AUC decreases only by depending on the value of the attack parameter within the secondary CW attack. A similar observation is also made in [9]. Nevertheless, this white-box attack significantly distorts the input images, since the CW attack is applied twice to the input image. Note that introducing randomness to the KDE detector is impossible by its construction.

Moreover, we would like to emphasize that existing attack algorithms are quite computationally expensive and, therefore, they can hardly be applied to real-time applications such as autonomous driving. For instance, recall from Table I that the CW attack requires on average more than 16 seconds to craft a single adversarial ImageNet image, while this runtime significantly increases in the white-box setting. As a result, to bypass a detector in a white-box setting, the attacker not only needs to know the structure of the detector but also needs a sufficient amount of time to perform the attack.

VI Conclusions

In this paper, we proposed VisionGuard, an attack- and dataset-agnostic defense against adversarial inputs to perception systems that scales to large-scale image domains. To determine whether an image is adversarial or not, VisionGuard checks if the output of the classifier remains consistent under JPEG transformations. Future work will focus on designing defenses against physical adversarial attacks.