Code for "PatchCleanser: Certifiably Robust Defense against Adversarial Patches for Any Image Classifier"
The adversarial patch attack against image classification models aims to inject adversarially crafted pixels within a localized restricted image region (i.e., a patch) for inducing model misclassification. This attack can be realized in the physical world by printing and attaching the patch to the victim object and thus imposes a real-world threat to computer vision systems. To counter this threat, we propose PatchCleanser as a certifiably robust defense against adversarial patches that is compatible with any image classifier. In PatchCleanser, we perform two rounds of pixel masking on the input image to neutralize the effect of the adversarial patch. In the first round of masking, we apply a set of carefully generated masks to the input image and evaluate the model prediction on every masked image. If model predictions on all one-masked images reach a unanimous agreement, we output the agreed prediction label. Otherwise, we perform a second round of masking to settle the disagreement, in which we evaluate model predictions on two-masked images to robustly recover the correct prediction label. Notably, we can prove that our defense will always make correct predictions on certain images against any adaptive white-box attacker within our threat model, achieving certified robustness. We extensively evaluate our defense on the ImageNet, ImageNette, CIFAR-10, CIFAR-100, SVHN, and Flowers-102 datasets and demonstrate that our defense achieves similar clean accuracy as state-of-the-art classification models and also significantly improves certified robustness over prior works. Notably, our defense can achieve 83.8% certified robust accuracy against a 2%-pixel patch on the 1000-class ImageNet dataset.
The adversarial patch attack [brown2017adversarial, karmon2018lavan] against image classification models aims to induce test-time misclassification. A patch attacker injects adversarially crafted pixels within a localized restricted region (i.e., a patch) and can realize a physical-world attack by printing and attaching the patch to the victim object. The physically realizable nature of patch attacks imposes a significant threat to real-world computer vision systems. To secure the deployment of critical computer vision systems, there has been an active research thread on certifiably robust defenses against adversarial patches [chiang2020certified, zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient, cropping]. These defenses aim to provide a certifiable guarantee on making correct predictions on certain images, even in the presence of an adaptive white-box attacker. This strong robustness property provides a pathway towards ending the arms-race between attackers and defenders.
Limitation of prior works: the dependence on specific model architectures. While prior works have made significant contributions to certifiable robustness, they are hindered by their dependence on specific model architectures, which limits their performance. State-of-the-art certifiably robust image classifiers against patch attacks [zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient, cropping] all rely on the use of small receptive fields (i.e., the region of the input image that an extracted feature is looking at, or affected by). The small receptive field bounds the number of features that can be corrupted by the adversarial patch but also limits the information received by each feature. As a result, these defenses are limited in their classification accuracy: for example, the best top-1 clean accuracy on ImageNet [deng2009imagenet] achieved by prior certifiably robust defenses is around 55% [xiang2021patchguard, cropping] while state-of-the-art undefended classification models can attain an accuracy of 75%-90% [resnet, bit, vit, deit, resmlp]. The poor clean accuracy discourages the real-world deployment of proposed defenses and also limits the achievable robustness against adversarial patches (since the robust accuracy can be no higher than the clean accuracy).[1]

[1] Chiang et al. [chiang2020certified] proposed the first certifiably robust defense against adversarial patches via Interval Bound Propagation [gowal2018effectiveness, mirman2018differentiable]. This defense does not rely on small receptive fields but requires extremely expensive model training. As a result, it is only applicable to small classification models (with limited performance) and low-resolution images.
To the best of our knowledge, the only prior defense that achieves certifiable robustness without any assumption on the model architecture is Minority Reports (MR) [mccoyd2020minority]. However, the MR defense has a weaker security guarantee of being able to only detect a patch attack, and an attacker can force the model to always alert and abstain from making a prediction. In applications where human fallback is unavailable (e.g., level-5 autonomous vehicles without human drivers), the inability of a model to make a prediction can compromise functionality. In this paper, we focus on the harder task of performing robust image classification in the presence of a patch attack.
PatchCleanser: certifiably robust image classification (without abstention) that is compatible with any state-of-the-art image classifier. In order to overcome the limitation of prior works, we propose PatchCleanser as an architecture-agnostic defense for robust image classification (without any abstention). The PatchCleanser defense operates on the input images and is thus agnostic to the model architecture. PatchCleanser aims to remove/mask all adversarial pixels, such that the attacker has no influence on the model prediction and we can obtain accurate predictions from any state-of-the-art image classifier (with any architecture).
However, the key question is: How can we mask out the patch if the patch location is unknown? An intuitive idea is to place a large mask over all possible locations on an image and evaluate model predictions on every masked image. If the mask is large enough, then at least one masked image is benign (i.e., no adversarial pixels) and is likely to give a correct prediction (a similar intuition is used in MR for attack detection [mccoyd2020minority]). Unfortunately, despite the existence of one benign (and usually correct) masked prediction, an adaptive attacker who knows about our defense strategy can carefully manipulate other masked predictions to make it hard for us to distinguish the benign prediction from others (e.g., by injecting multiple different masked prediction labels).
Main contribution: a double-masking defense with certifiable robustness. In order to robustly recover the correct classification label, we propose a double-masking algorithm that involves two rounds of masking. We provide a defense overview in Figure 1. In the first round of masking (left of the figure), we apply every mask from a mask set to the input image and evaluate model predictions on one-masked images. The mask set is constructed in a way such that at least one mask can remove the entire patch (regardless of the patch location) and give a benign (and usually correct) masked prediction. If all first-round one-mask predictions reach a unanimous agreement, we output the agreed label (clean image; top of Figure 1). On the other hand, if the first round of masking has a disagreement (adversarial image; left bottom of Figure 1), we need to distinguish the benign prediction (the one with all adversarial pixels masked) from malicious ones (the ones with adversarial pixels left). Towards this end, we apply two masks in the second round of masking and use inconsistencies in model predictions on a set of two-masked images to filter out all malicious one-mask predictions (right bottom of the figure). We will present the details of our double-masking defense in Section III-C and analyze its certifiable robustness in Section III-D. Our theoretical results in Theorem 1 (Section III-D) demonstrate that, given a clean image, if model predictions on all two-masked images are correct, PatchCleanser with this model has certified robustness for this image against any patch attacker within our threat model.
Defense implementation: adaptive mask set generation and masked model training. In our defense implementation, we further adopt two techniques to boost defense performance. First, we use an adaptive mask set generation technique to generate mask sets with a variable number of masks while maintaining the security guarantee (at least one mask removes the entire patch). This technique enables us to optimize for computational constraints (by using fewer masks) at a small cost of classification accuracy. Second, we propose to apply two masks to the training images to mimic our double-masking defense. This approach can enhance model predictions on masked images and improve both clean accuracy and certifiable robustness of our defense.
Evaluation: state-of-the-art clean accuracy and certified robust accuracy. We instantiate PatchCleanser with three representative state-of-the-art architectures for image classification: ResNet [resnet], Vision Transformer (ViT) [vit], and ResMLP [resmlp]. We evaluate our defense performance on six image datasets: ImageNet [deng2009imagenet], ImageNette [imagenette], CIFAR-10 [cifar], CIFAR-100 [cifar], SVHN [svhn], and Flowers-102 [flowers]. We demonstrate that PatchCleanser achieves state-of-the-art (clean) classification accuracy and also greatly improves the certified robust accuracy from prior works [chiang2020certified, zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient]. In Figure 2, we plot the clean accuracy and certified robust accuracy of different defenses on the ImageNet dataset [deng2009imagenet] to visualize our significant performance improvement over prior works.
Our contributions can be summarized as follows:
We present PatchCleanser’s double-masking defense that is compatible with any image classifier to mitigate the threat of adversarial patch attacks.
We formally prove the certifiable robustness of PatchCleanser for certain images against any adaptive white-box attacker within our threat model.
We evaluate PatchCleanser on three state-of-the-art classification models and six benchmark datasets and demonstrate the significant improvements in clean accuracy and certified robust accuracy (e.g., Figure 2).
In this section, we formulate image classification models, adversarial patch attacks, and our defense objectives.
In this paper, we focus on the image classification problem. We use X ⊆ [0,1]^{W×H×C} to denote the image space, where each image has width W, height H, and C channels, and all pixels are re-scaled to [0,1]. We further denote the label space as Y. An image classification model is denoted as F: X → Y, which takes an image x ∈ X as input and predicts a class label F(x) ∈ Y.
We do not make any assumption on the architecture of the image classification model . Our defense is compatible with any popular model such as ResNet [resnet], Vision Transformer [vit], and ResMLP [resmlp].
Attack objective. We focus on test-time evasion attacks. Given a model F, an image x, and its true class label y, the attacker aims to find an image x′ satisfying a constraint x′ ∈ A(x) ⊆ X such that F(x′) ≠ y. The constraint set A is determined by the attacker’s threat model, which we discuss next.
Attacker capability. The localized patch attacker has arbitrary control over the image pixels in a localized restricted region, and this region can be anywhere on the image. Formally, we use a binary tensor r ∈ {0,1}^{W×H} to represent the restricted region, where the pixels within the region are set to 0 and all others are set to 1. We further use R to denote a set of such regions (i.e., a set of patches at different locations). Then, we can express the patch attacker’s constraint set as A_R(x) = { r ⊙ x + (1 − r) ⊙ x′′ | r ∈ R, x′′ ∈ X }, where ⊙ refers to the element-wise multiplication operator (broadcast across the channel dimension). When clear from the context, we drop R and use A instead of A_R.
We note that three parameters determine the region set R: the number of patches, the patch shapes, and the patch sizes. In our presentation and experiments, we primarily focus on the case where R represents one square region, because a high-performance certifiably robust defense against even a single patch is currently an open problem. This also follows existing works [chiang2020certified, zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient] and enables a fair comparison (Section IV). Nevertheless, our defense is applicable to alternative patch shapes and multiple patches; we provide quantitative discussions in Section V. Furthermore, similar to prior works [xiang2021patchguard, mccoyd2020minority, xiang2021patchguardpp], we assume that the defender has a conservative estimation of the patch size. We will analyze the implication of an overly conservative patch size estimation in Section IV-C.
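To make the threat model concrete, here is a minimal sketch (our own toy example, not the paper’s code) of crafting an adversarial image x′ = r ⊙ x + (1 − r) ⊙ x′′ on a 6×6 single-channel image with a 2×2 patch region:

```python
def apply_patch(x, r, x_adv):
    """x' = r*x + (1-r)*x'': keep pixels where r == 1; pixels where
    r == 0 (the patch region) are replaced with arbitrary content."""
    return [[ri * xi + (1 - ri) * ai for ri, xi, ai in zip(rrow, xrow, arow)]
            for rrow, xrow, arow in zip(r, x, x_adv)]

x = [[0.5] * 6 for _ in range(6)]   # toy 6x6 single-channel image
r = [[1] * 6 for _ in range(6)]
for i in (1, 2):
    for j in (2, 3):
        r[i][j] = 0                 # 2x2 patch region: 0 inside, 1 outside

# the attacker fills the patch region with arbitrary values (here, 1.0)
x_prime = apply_patch(x, r, [[1.0] * 6 for _ in range(6)])
assert x_prime[1][2] == 1.0 and x_prime[0][0] == 0.5
assert sum(row.count(1.0) for row in x_prime) == 4   # exactly 4 pixels changed
```

The attacker can choose any content x′′ and any region r ∈ R, but cannot touch pixels outside the region.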
We design PatchCleanser with four major objectives.
Robust classification. We aim to build a defended model F̂ for robust classification. That is, we want F̂(x′) = y for a clean data point (x, y) and any adversarial example x′ ∈ A(x). Note that we aim to recover the correct prediction without any abstention, which is harder than merely detecting an attack (e.g., Minority Reports [mccoyd2020minority]).
Certifiable robustness. We aim to design defenses with certifiable robustness: given a clean data point (x, y), the defended model F̂ can always make a correct prediction for any adversarial example within the threat model, i.e., F̂(x′) = y, ∀ x′ ∈ A(x).[2] We will design a robustness certification procedure, which takes a clean data point (x, y) and threat model A as inputs, to check if the robustness can be certified. The certification procedure should account for all possible attackers within the threat model A, who could have full knowledge of our defense and full access to our model parameters; thus, it lifts the burden of designing adaptive attacks for evaluating robustness. Finally, we calculate the fraction of clean images in a dataset for which we can certify robustness. We call this fraction the “certified robust accuracy” and take it as the metric for robustness evaluation.

[2] Heuristic-based defenses like Digital Watermarking [hayes2018visible] and Local Gradient Smoothing [naseer2019local] have been shown to be broken against an adaptive white-box attacker who has full knowledge of the defense setup [chiang2020certified]. Therefore, it is important to design defenses that have certifiable security guarantees.
Compatibility with any model architecture. As discussed in Section I, prior works [chiang2020certified, zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient] on certifiably robust image classification suffer from their dependence on the model architecture (e.g., small receptive fields). Such dependence limits the model performance and hinders the practical deployment of the defense. In PatchCleanser, we aim to design a defense that is compatible with any state-of-the-art model architecture to achieve high defense performance (recall Figure 2) and benefit from any progress in image classification research.
Scalability to high-resolution images. Finally, we aim to design our defense to be efficient enough to scale to high-resolution images. Since state-of-the-art models usually use high-resolution images for better classification performance [bit, vit, resmlp], defenses that only work for low-resolution images [chiang2020certified] have limited applicability.
In this section, we introduce the details of PatchCleanser design and implementation. We will also discuss how to certify the robustness of PatchCleanser for certain images against any adaptive white-box attacker within our threat model.
Architecture-agnostic defenses via pixel masking. In PatchCleanser, we aim to design a robust image classification defense that is compatible with any model architecture. We achieve this by performing our defense on the input image. Our high-level idea is to move a mask over the image and evaluate model prediction on every masked image. Intuitively, if the mask is large enough to cover the entire patch, then at least for one mask location, all adversarial pixels can be removed, and the corresponding masked prediction is benign and likely to be correct. However, an adaptive attacker can adversarially inject different masked prediction labels to make it hard for us to distinguish the benign prediction from others (recall the bottom left of Figure 1, which has three different one-mask prediction labels “cat”, “dog”, and “fox”). Therefore, we need a defense that is resilient to any white-box adaptive attacker and is able to perform robust classification without abstention.
PatchCleanser with a double-masking algorithm. In PatchCleanser, we propose a double-masking defense that uses two rounds of masking for robust prediction (Section III-C). We can certify/prove the robustness of our double-masking defense for certain images against any adaptive white-box attacker within our threat model (Section III-D). In our implementation, we further adopt two techniques to boost defense performance (adaptive mask set generation for efficiency and masked model training for accuracy; Section III-E).
In the following subsections, we formulate the mask set used for pixel masking and then discuss the details of our defense modules. We summarize a list of important notation in Table I.
PatchCleanser aims to mask out the entire patch on the image and obtain accurate predictions from any state-of-the-art classification model. In this subsection, we introduce the concept of a mask set used in our masking operations.
Mask set formulation. We represent each mask m as a binary tensor in the same shape as the images; the elements within the mask take the value 0, and all others are 1. We further denote a set of masks as M (similar to the definitions of the patch region r and region set R). We require the mask set to have the guarantee that at least one of its masks removes the entire patch (regardless of the patch location). We formally define this property as R-covering.
A mask set M is R-covering if, for every patch region in R, at least one mask from M covers the entire patch:

∀ r ∈ R, ∃ m ∈ M s.t. m ⪯ r

where m ⪯ r denotes that m is element-wise no larger than r in all coordinates (i.e., every zero pixel of r is also zero in m).
One straightforward way to generate an R-covering mask set is to use a mask of the same shape/size as the estimated patch and place it over every possible image location. However, we note that this mask set can be extremely large (e.g., up to 40k possible locations for a 24×24 mask on a 224×224 image). We will discuss an optimized approach that can generate an R-covering mask set with a tunable number of masks to meet different computational constraints (Section III-E). In the next subsection, we introduce how to perform our double-masking defense with an R-covering mask set.
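As a sketch of this straightforward generation strategy, the following pure-Python snippet (1-D for brevity; the function names are ours, not from the paper’s code) builds the stride-1 mask set and brute-force checks the R-covering property:

```python
def naive_mask_set_1d(n, p):
    """All width-p masks at stride 1: mask i zeroes out indices [i, i+p-1]."""
    masks = []
    for i in range(n - p + 1):
        m = [1] * n
        m[i:i + p] = [0] * p
        masks.append(m)
    return masks

def is_r_covering_1d(masks, n, p):
    """True iff every possible width-p patch is fully zeroed by some mask."""
    return all(
        any(all(m[j] == 0 for j in range(t, t + p)) for m in masks)
        for t in range(n - p + 1)
    )

masks = naive_mask_set_1d(10, 3)
assert len(masks) == 8                          # n - p + 1 possible locations
assert is_r_covering_1d(masks, 10, 3)           # every patch location is covered
assert not is_r_covering_1d(masks[:4], 10, 3)   # dropping masks breaks covering
```

On a 2-D image, the same enumeration runs over both axes, which is what makes the naive set so large.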
The double-masking algorithm is the core algorithm of PatchCleanser; it performs two rounds of masking with an R-covering mask set to robustly recover the correct prediction label. The defense intuition is as follows: (1) for clean images, model predictions are generally invariant to (multiple) masks; (2) for adversarial images, the model prediction usually changes when a mask removes the patch; (3) when the entire patch on the adversarial image is masked, the masked image turns “clean”, and its predictions are generally invariant to a second mask. We first introduce the high-level defense design and then explain the algorithm details.
Table I: Summary of important notation.
|Undefended model F||Estimated patch size p|
|Input image x||Actual patch size p′|
|Class label y||Budget of #masks k|
|Pixel mask m||Mask size w|
|Masked prediction set P||Image size n|
First-round masking: detecting a prediction disagreement. Recall that Figure 1 gives an overview of our double-masking algorithm. In the first round of masking, we apply every mask from the R-covering mask set to the input image and evaluate all one-mask predictions (left of Figure 1). In the clean setting, all one-mask predictions are likely to reach a unanimous agreement on the correct label (Intuition 1), and we will output the agreed prediction (top of Figure 1). In the adversarial setting, at least one mask will remove all adversarial pixels; thus, at least one one-mask prediction is benign and likely to be correct (bottom left of Figure 1). In this case, we will detect a disagreement in the one-mask predictions (benign versus malicious; Intuition 2); we will then perform a second round of masking to settle this disagreement.
Second-round masking: settling the prediction disagreement. We first divide all one-mask prediction labels into two groups: the majority prediction (the prediction label with the highest occurrence) and the disagreer predictions (other labels that disagree with the majority). We need to decide which prediction label to trust (i.e., the majority or one of the disagreers). To solve this problem, we iterate over every disagreer prediction and its corresponding first-round mask and add a second mask from our mask set to compute a set of two-mask predictions (right of Figure 1). If the first-round disagreer mask removes the patch, every second-round mask is applied to a “clean” image, and thus all two-mask predictions (evaluated with one first-round mask and different second-round masks) are likely to have a unanimous agreement (Intuition 3). We can trust and return this agreed prediction. On the other hand, if the first-round disagreer mask does not remove the patch, the one-masked image is still “adversarial”, and the second-round mask will cause a disagreement in two-mask predictions (when one of the second masks covers the patch). In this case, we discard this one-mask disagreer. Finally, if we try all one-mask disagreer predictions and no prediction label is returned, we trust and return the one-mask majority prediction as the default exit case.
Algorithm details. We provide the defense pseudocode in Algorithm 1. The defense procedure takes an image x, an undefended model F, and the mask set M as inputs, and aims to output a robust prediction. Lines 4-7 illustrate the first-round masking; Lines 8-13 demonstrate the second-round masking.
Details of first-round masking. In Algorithm 1, we first call the masking sub-procedure with the mask set M (Line 4). The mask set needs to ensure that at least one mask can remove the entire patch (i.e., R-covering); we will discuss the mask set generation approach in Section III-E.

In the masking sub-procedure, we aim to collect all masked predictions and determine the majority prediction label (i.e., the label with the highest occurrence) as well as the disagreer predictions (i.e., all other predictions). We first generate a set P for holding all mask-prediction pairs. Next, for each mask m in the mask set M, we evaluate the masked prediction F(x ⊙ m); here ⊙ is the element-wise multiplication operator. We then add the mask-prediction pair (m, F(x ⊙ m)) to the set P. After gathering all masked predictions, we identify the label ȳ with the highest prediction occurrence as the majority prediction (Line 23). Furthermore, we construct a disagreer prediction set P_dis, whose elements are the disagreer mask-prediction pairs (Line 24). Finally, we return the majority prediction label ȳ and the disagreer prediction set P_dis.
After the first call of the masking sub-procedure, we check if the one-mask predictions reach a unanimous agreement (i.e., the disagreer prediction set P_dis is empty; Line 5). If P_dis is empty, we consider the input image likely to be a clean image and return the agreed/majority prediction (Case I: agreed prediction; Line 6). On the other hand, a non-empty disagreer set implies a first-round prediction disagreement, and the algorithm proceeds to the second-round masking to settle the disagreement.
Details of second-round masking. The pseudocode of the second-round masking is in Lines 8-13. We look into every one-mask disagreer prediction in P_dis (Line 8). For each disagreer pair (m1, y1), we apply the disagreer mask m1 to the image and feed the masked image x ⊙ m1 to the masking sub-procedure for the second-round masking (Line 9). If all two-mask predictions agree with the one-mask prediction y1 (i.e., the second-round masking returns majority label y1 and an empty disagreer set), we consider that the first-round mask m1 has already removed the adversarial perturbations; our algorithm returns this disagreer prediction y1 (Case II: disagreer prediction; Line 11). On the other hand, if the two-mask predictions disagree (i.e., the second-round disagreer set is non-empty), we consider that the disagreer mask has not removed the patch. In this case, we discard this one-mask disagreer prediction and move to the next one. In the end, if no prediction is returned in the second-round masking, we trust and return the majority one-mask prediction (Case III: majority prediction; Line 14).
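The control flow above can be sketched in a few lines of Python (a toy 1-D re-implementation under our own naming, not the paper’s code; the toy classifier is fooled by a one-pixel “patch” of value 9, and the mask set consists of all one-pixel masks, which is R-covering for one-pixel patches):

```python
from collections import Counter

def apply_mask(x, m):
    return tuple(xi * mi for xi, mi in zip(x, m))

def double_masking(model, x, masks):
    # First round: evaluate all one-mask predictions.
    pairs = [(m, model(apply_mask(x, m))) for m in masks]
    majority = Counter(y for _, y in pairs).most_common(1)[0][0]
    disagreers = [(m, y) for m, y in pairs if y != majority]
    if not disagreers:
        return majority                    # Case I: unanimous agreement
    # Second round: add a second mask on top of each disagreer mask.
    for m1, y1 in disagreers:
        x1 = apply_mask(x, m1)
        if all(model(apply_mask(x1, m2)) == y1 for m2 in masks):
            return y1                      # Case II: agreed disagreer
    return majority                        # Case III: fall back to majority

masks = [tuple(0 if j == i else 1 for j in range(5)) for i in range(5)]
model = lambda x: "dog" if x[2] == 9 else "cat"   # fooled by the trigger pixel
assert double_masking(model, (1, 1, 1, 1, 1), masks) == "cat"  # clean: Case I
assert double_masking(model, (1, 1, 9, 1, 1), masks) == "cat"  # attacked: Case II
```

On the attacked image, only the mask covering the trigger yields “cat”; it becomes the lone disagreer, and its two-mask predictions agree unanimously, so Case II recovers the correct label.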
Remark: defense complexity. When the number of disagreer predictions is bounded by a small constant (which is the usual case in the clean setting), the defense complexity is O(|M|). However, its worst-case complexity is O(|M|²) (evaluating all two-mask predictions). In Appendix -C, we discuss and evaluate another defense algorithm that has the same robustness guarantee and a better worst-case inference complexity, at the cost of a drop in clean accuracy.
In this subsection, we discuss how to certify the robustness of our double-masking algorithm. Recall that we consider our defense certifiably robust for a given image if our model prediction is always correct against any adaptive white-box attacker within our threat model A.
First, we define a concept of two-mask correctness, which we claim is a sufficient condition for certified robustness.
A model F has two-mask correctness for a mask set M and a clean data point (x, y) if

F(x ⊙ m0 ⊙ m1) = y, ∀ m0, m1 ∈ M
Next, we present our theorem stating that two-mask correctness for a clean image (with an R-covering mask set) implies the robustness of our defense to adversarial patches on that image.
Given a clean data point (x, y), a classification model F, a mask set M, and the threat model A, if M is R-covering and F has two-mask correctness for M and (x, y), then our defense always makes a correct prediction: F̂(x′) = y, ∀ x′ ∈ A(x).
To provide an intuition and aid the presentation of our proof, we provide a visual example in Figure 3, which uses a small mask set M and label set Y. Next, we present our formal proof.
Since M is R-covering, at least one one-masked image in the first-round masking is benign (no adversarial pixels). Furthermore, given the two-mask correctness, the model prediction on any benign one-masked image is correct (when the two masks are at the same location, two-mask correctness reduces to one-mask correctness). Therefore, we have the guarantee that at least one one-mask prediction in the first-round masking is correct as y.
The correct first-round prediction can be either the majority prediction or the disagreer prediction. We will show that our defense will return the correct label in either scenario.
Scenario 1: ȳ = y (i.e., the correct one-mask prediction is the majority prediction). There are two sub-scenarios, and we show that both of them will return the correct label y. First, if the disagreer set is empty, the algorithm returns the agreed majority prediction ȳ = y via Case I. Second, if disagreers exist, let us look into the second-round masking (Lines 8-13 of Algorithm 1). For every disagreer prediction y′ ≠ y and its corresponding mask m1, two-mask correctness and R-covering imply that at least one two-mask prediction is correct as y (when the second mask removes the patch). Since y′ ≠ y, we will see a two-mask prediction disagreement; thus, the algorithm will not return any incorrect label y′ via Case II, and it eventually returns the correct majority prediction ȳ = y via Case III. ∎
Scenario 2: ȳ ≠ y (i.e., the correct one-mask prediction is one of the disagreer predictions). In this case, the correct one-mask prediction disagrees with the majority prediction ȳ. As a result, the disagreer set P_dis is non-empty, and the algorithm proceeds to the second-round masking. We show that the second-round masking will always return a label (via Case II), and the returned label is always correct.
Claim 2.1: the return of a label. From R-covering, we know that at least one first-round mask removes the entire patch; since that mask’s prediction is y ≠ ȳ, it is a disagreer mask. With all adversarial pixels removed, this one-mask prediction and all its two-mask predictions are correct as y (due to two-mask correctness). Therefore, all its two-mask predictions agree, and the algorithm can return this disagreer via Case II. Note that the algorithm iterates over all disagreer masks; thus, our algorithm will definitely output a label. This is depicted in the third row of Figure 3.

Claim 2.2: the correctness of the returned label. Consider any disagreer prediction y′ ≠ y. By R-covering and two-mask correctness, at least one of its two-mask predictions is correct as y ≠ y′, which causes a two-mask disagreement; thus, Case II can never return an incorrect label, and the label returned in this scenario must be y.
In summary, using Scenario 1 and Scenario 2, we demonstrate that Algorithm 1 will always return a correct prediction if the model has two-mask correctness. ∎
Robustness certification procedure. From Theorem 1, we can easily certify the robustness of our defense on a clean/test image by checking if our model has two-mask correctness on that image. The pseudocode for our robustness certification is presented in Algorithm 2. It takes the clean data point (x, y), the vanilla prediction model F, the mask set M, and the threat model A as inputs, and aims to determine if our defense (Algorithm 1) can always predict the correct label y.[3] The certification procedure first checks if the mask set M is R-covering (Line 4). Next, it evaluates all possible two-mask predictions (Lines 7-12). If any of the two-mask predictions is incorrect, the algorithm returns False (possibly vulnerable). On the other hand, if all two-mask predictions match the ground truth y, we have certified robustness for this image, and the algorithm returns True.

[3] We note that we only need the ground-truth label y in our robustness certification procedure (Algorithm 2) to check prediction correctness, but not in our defense (Algorithm 1).
We reiterate that our notion of certified robustness is strong: it captures (1) all possible patch locations on an image and (2) all possible attack strategies within our threat model, including adaptive white-box attacks. In Section IV, we will use Algorithm 2 to evaluate the certifiable robustness of our defense. We report certified robust accuracy as the fraction of test images in a dataset for which Algorithm 2 returns True.
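The core check of the certification procedure (two-mask correctness) can be sketched as follows (our own toy instantiation; the R-covering check of the mask set is assumed to have been performed separately):

```python
def apply_mask(x, m):
    return tuple(xi * mi for xi, mi in zip(x, m))

def certify_two_mask_correctness(model, x, y, masks):
    """True iff predictions on all two-masked images equal the label y
    (the precondition of Theorem 1; `masks` must be R-covering)."""
    return all(model(apply_mask(apply_mask(x, m1), m2)) == y
               for m1 in masks for m2 in masks)

masks = [tuple(0 if j == i else 1 for j in range(5)) for i in range(5)]
x, y = (1, 1, 1, 1, 1), "cat"
robust = lambda x: "cat" if sum(x) >= 3 else "dog"  # survives two masked pixels
frail = lambda x: "cat" if sum(x) >= 4 else "dog"   # breaks under two masks
assert certify_two_mask_correctness(robust, x, y, masks)
assert not certify_two_mask_correctness(frail, x, y, masks)
```

The “frail” model is correct on every one-masked image but not on every two-masked image, so its robustness cannot be certified; this mirrors why the certification must enumerate mask pairs rather than single masks.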
In this subsection, we present two implementation techniques that can improve the defense efficiency and model robustness, and finally provide a complete view of our end-to-end PatchCleanser defense.
Adaptive mask set generation. The first implementation technique aims to enhance defense efficiency. In practice, a defense needs to operate within the constraints of available computational resources, which imposes a bound on the number of masked model predictions that we can evaluate. Therefore, we need to carefully generate a mask set that meets the computation budget (i.e., the number of masks) while maintaining the security guarantee (i.e., R-covering). We note that the reduction in computation overhead does not come for free, but at the cost of classification accuracy (due to the larger masks required for R-covering); our mask set generation approach allows us to balance this trade-off between efficiency and accuracy in real-world deployment. We first present our approach for 1-D “images” (Figure 4 provides two visual examples) and then generalize it to 2-D.
Adjusting the mask set size. We generate a mask set by moving a mask over the input image. Consider a mask of width w over a 1-D “image” of size n: we first place the mask at coordinate 0 (so that the mask covers indices 0 to w − 1). Next, we move the mask with a stride of s across the image and gather the set of mask locations {0, s, 2s, …}. Finally, we place the last mask at index n − w in case the mask at the last stride location cannot cover the last few pixels. We can define the mask set as:

M_{w,s,n} = { m_u | u ∈ {0, s, 2s, …} ∪ {n − w} }  (1)

where the mask m_u has zeros exactly on indices [u, u + w − 1] and ones elsewhere.
Furthermore, we can compute the mask set size as:

|M_{w,s,n}| = ⌈(n − w)/s⌉ + 1  (2)
This equation shows that we can adjust the mask set size via the mask stride s. In the example of Figure 4, we can reduce the mask set size by increasing the mask stride. Next, we discuss the security property (i.e., R-covering) of the mask set M_{w,s,n}.
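The size formula can be checked by enumeration. The snippet below assumes our reconstruction of Equations 1 and 2 (a stride-s sweep of a width-w mask plus a final mask at index n − w, with |M| = ⌈(n − w)/s⌉ + 1):

```python
import math

def mask_locations(n, w, s):
    """Equation 1 sketch: stride-s locations plus a final mask at n - w."""
    locs = list(range(0, n - w + 1, s))
    if locs[-1] != n - w:
        locs.append(n - w)   # last mask covers the image boundary
    return locs

def mask_set_size(n, w, s):
    """Equation 2 sketch: |M| = ceil((n - w) / s) + 1."""
    return math.ceil((n - w) / s) + 1

# the closed-form size matches the enumerated set for several (n, w, s)
for n, w, s in [(10, 3, 1), (10, 3, 2), (224, 64, 33)]:
    assert len(mask_locations(n, w, s)) == mask_set_size(n, w, s)
```

Doubling the stride roughly halves the number of masks, which is exactly the efficiency knob discussed above.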
Ensuring the security guarantee. Using a large mask stride s might leave “gaps” between two adjacent masks; therefore, we need to choose a proper mask width m that covers these gaps and ensures that the mask set is R-covering.
Lemma 2. The mask set M_{m,s,n} is R-covering for any patch that is no larger than m − s + 1 pixels.
Proof. Without loss of generality, we consider the first two adjacent masks in the 1-D scenario, whose pixel index ranges are [0, m−1] and [s, s+m−1], respectively. Now let us consider an adversarial patch of size p ≤ m − s + 1. In order to avoid being completely masked by the first mask, the smallest pixel index of the patch has to be no smaller than m − p + 1. However, we find that the second mask starts from index s ≤ m − p + 1 and extends to index s + m − 1, so the patch that evades the first mask will be captured by the second mask; repeating this argument for every pair of adjacent masks completes the proof. ∎
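The covering condition in the lemma above can also be checked exhaustively in 1-D. The following sketch (function names and example sizes are ours) enumerates every patch placement and verifies that some mask fully contains it:

```python
def mask_starts(n, m, s):
    """Start indices of masks of width m with stride s over a 1-D image of size n."""
    pos = list(range(0, n - m + 1, s))
    if pos[-1] != n - m:
        pos.append(n - m)  # place the last mask flush with the image edge
    return pos

def is_covering(n, m, s, p):
    """True iff every length-p patch [j, j+p-1] lies fully inside some mask [i, i+m-1]."""
    starts = mask_starts(n, m, s)
    return all(
        any(i <= j and j + p - 1 <= i + m - 1 for i in starts)
        for j in range(n - p + 1)
    )

# The lemma's bound is tight for these example sizes:
assert is_covering(n=24, m=8, s=3, p=6)        # p = m - s + 1: always covered
assert not is_covering(n=24, m=8, s=3, p=7)    # one pixel larger: a gap exists
```

The failing second check illustrates why the mask width must grow with the stride: with m = 8 and s = 3, a 7-pixel patch can straddle two adjacent masks without being fully inside either.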
Mask set generation. Armed with the ability to adjust the mask set size and to ensure the security guarantee (as discussed above), we now present our complete mask set generation approach, which takes the computation budget as one of its inputs. Given the image size n, the estimated patch size p, and the computation budget k (i.e., the number of masks), we can derive the mask stride s and mask width m using Lemma 2 (m = p + s − 1) and Equation 2 (k = ⌈(n−m)/s⌉ + 1) as follows:

s = ⌈(n − p + 1)/k⌉,  m = p + s − 1    (3)
We then generate the mask set M_{m,s,n} via Equation 1 accordingly.
Generalizing to 2-D images. We can easily generalize the 1-D mask set to 2-D by separately applying Equation 1 and Equation 3 to each of the two axes of the image. For n_0×n_1 images, p_0×p_1 patches, and a budget of k_0×k_1 masks, we can calculate s_i, m_i with Equation 3 and obtain M_{m_i,s_i,n_i} with Equation 1 for each axis i ∈ {0, 1}. The 2-D mask set then becomes M = M_{m_0,s_0,n_0} × M_{m_1,s_1,n_1}, i.e., every combination of a mask location along each axis.
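The generation procedure described by Equations 1–3 can be sketched in a few lines of Python (function names are ours; we assume square images, patches, and per-axis budgets for brevity):

```python
import math

def mask_params_1d(n, p, k):
    """Equation 3: stride s and mask width m for image size n, patch size p,
    and a budget of k masks along one axis (m = p + s - 1 from Lemma 2)."""
    s = math.ceil((n - p + 1) / k)
    m = p + s - 1
    return s, m

def mask_positions_1d(n, m, s):
    """Equation 1's index set: mask start indices 0, s, 2s, ..., plus a final
    mask placed flush with the image edge at n - m."""
    pos = list(range(0, n - m + 1, s))
    if pos[-1] != n - m:
        pos.append(n - m)
    return pos

def mask_set_2d(n, p, k):
    """2-D mask set: the Cartesian product of the per-axis 1-D mask sets."""
    s, m = mask_params_1d(n, p, k)
    pos = mask_positions_1d(n, m, s)
    return [(top, left, m) for top in pos for left in pos]  # (top, left, width)
```

For example, with n = 224, p = 32, and k = 6 per axis, the formulas give s = 33 and 64×64 masks at 6 locations per axis (36 masks in total), and one can verify Equation 2: ⌈(224−64)/33⌉ + 1 = 6.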
Remark: trade-off between efficiency and accuracy. As shown in Equation 3, if we want to improve efficiency (by using a smaller number of masks k), we will have to use a larger stride s and a larger mask width m. Intuitively, the model prediction is less invariant to a larger mask; thus, the improvement in efficiency can come at the cost of model accuracy. We will study this trade-off between efficiency and accuracy in Section IV-C.
Masked model training. The second implementation technique tailors the training procedure to further boost model robustness. From Theorem 1, we can see that the key to certifiable robustness is the model’s two-mask correctness. Therefore, we aim to improve the prediction invariance of the model to pixel masking in order to achieve high certifiable robustness. Towards this goal, we add two masks at random locations to training images and teach the model to predict correctly on masked images. This strategy has been previously used as a data augmentation approach in conventional model training (e.g., Cutout [cutout]); we use it to enhance certifiable robustness against adversarial patches. We will analyze the effect of masked model training in Section IV-C.
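The augmentation itself is straightforward; a minimal 2-D sketch (function name, mask sizes, and the list-of-rows image representation are illustrative — in practice the masks are applied per training image inside the data-loading pipeline, e.g., as tensors):

```python
import random

def add_random_masks(img, num_masks=2, mask_size=3, fill=0.0, rng=random):
    """Cutout-style masked-training augmentation: overwrite `num_masks` square
    regions of a 2-D image (a list of rows) with a constant fill value,
    mirroring the two random masks applied to training images."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the original image stays intact
    for _ in range(num_masks):
        top = rng.randint(0, h - mask_size)
        left = rng.randint(0, w - mask_size)
        for i in range(top, top + mask_size):
            for j in range(left, left + mask_size):
                out[i][j] = fill
    return out
```

Because the two masks may overlap, the number of masked pixels ranges from the area of one mask up to the combined area of both.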
End-to-end PatchCleanser pipeline. In summary, the end-to-end PatchCleanser pipeline is as follows:
First, we perform adaptive mask set generation to obtain a secure R-covering mask set M that satisfies a given computation budget (the number of masks k).
Second, we pick and train a model (with any architecture). Optionally, we can use masked model training to enhance its prediction invariance to pixel masking (the masks used for training can be different from those in M).
Third, most importantly, we perform double-masking (Algorithm 1) with the trained model and the mask set M for robust classification.
Finally, we can use our certification procedure (Algorithm 2) to certify the robustness of PatchCleanser on a given image against any adaptive white-box attacker within the threat model. We will evaluate the fraction of test images that can be certified across multiple datasets in the next section.
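The double-masking step at the heart of this pipeline can be sketched as follows. This is a simplified sketch of the two-round structure only — the exact ordering and fallback behavior are specified by Algorithm 1 in the paper — and `predict`, `make_mask`, and the toy dict-based image encoding are illustrative assumptions of ours:

```python
from collections import Counter

def double_masking(predict, image, masks):
    """Simplified two-round masking. `predict` maps an image to a label;
    each element of `masks` is a callable returning a masked copy of the image."""
    first = [predict(m(image)) for m in masks]
    majority = Counter(first).most_common(1)[0][0]
    if all(lbl == majority for lbl in first):
        return majority  # first round: unanimous agreement
    # second round: re-check every disagreeing one-mask prediction
    for m1, lbl in zip(masks, first):
        if lbl == majority:
            continue
        if all(predict(m2(m1(image))) == lbl for m2 in masks):
            return lbl   # two-masked predictions unanimously agree
    return majority

# Toy setup: an "image" records only the patch location; a mask removes the
# patch if it covers it; the model is fooled iff the patch survives masking.
def make_mask(start, width=3):
    def apply(img):
        out = dict(img)
        if out["patch"] is not None and start <= out["patch"] < start + width:
            out["patch"] = None
        return out
    return apply

def predict(img):
    return "dog" if img["patch"] is not None else "cat"

masks = [make_mask(s) for s in (0, 3, 6)]  # together cover pixel indices 0..8
```

On the toy patched image {"patch": 4}, the first round disagrees (only the second mask removes the patch), and the second round recovers the correct label "cat"; on a clean image the first round is unanimous.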
|Dataset||ImageNette [imagenette]||ImageNet [deng2009imagenet]||CIFAR-10 [cifar]|
|Patch size||1% pixels||2% pixels||3% pixels||1% pixels||2% pixels||3% pixels||0.4% pixels||2.4% pixels|
|IBP [chiang2020certified]||computationally infeasible||computationally infeasible||65.8 / 51.9 (clean / certified robust)||47.8 / 30.8|
The BagCert numbers are provided by the authors [metzen2021efficient] through personal communication since the source code is unavailable; results for ImageNette are not provided.
In this section, we instantiate PatchCleanser with three different classification models, and extensively evaluate the defense using three different datasets (we have results for another three datasets in Appendix -B). We will demonstrate state-of-the-art clean accuracy and certified robust accuracy of PatchCleanser compared with prior works [chiang2020certified, zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient] and provide detailed analysis of our defense under different settings.
In this subsection, we briefly introduce our evaluation setup. We provide additional details in Appendix -A. We will make our code publicly available to enable reproducibility.
Datasets. We choose three popular image classification benchmark datasets for evaluation: ImageNet [deng2009imagenet], ImageNette [imagenette], CIFAR-10 [cifar]. In Appendix -B, we also include results for three additional datasets (Flowers-102 [flowers], CIFAR-100 [cifar], and SVHN [svhn]).
ImageNet and ImageNette. ImageNet [deng2009imagenet] is a challenging image classification dataset which has 1.3M training images and 50k validation images from 1000 classes. ImageNette [imagenette] is a 10-class subset of ImageNet with 9469 training images and 3925 validation images. ImageNet/ImageNette images have a high resolution, and we resize and crop them to 224×224 before feeding them to the models.
CIFAR-10. CIFAR-10 [cifar] is a benchmark dataset for low-resolution image classification. CIFAR-10 has 50k training images and 10k test images from 10 classes. Each image has a resolution of 32×32. We resize images to 224×224 via bicubic interpolation for better classification performance.
Models. We choose three representative image classification models to build PatchCleanser.
ResNet. ResNet [resnet] is a classic Convolutional Neural Network (CNN) model. It uses layers of convolution filters and residual blocks to extract features for image classification. We use ResNetV2-50x1 and its publicly available weights trained on ImageNet [deng2009imagenet]. We finetune the model for the other datasets used in our evaluation.
Vision Transformer (ViT). ViT [vit] is adapted from NLP Transformer [vaswani2017attention] for the image classification task. It divides an image into disjoint pixel blocks, and uses self-attention architecture to extract features across different pixel blocks for classification. We use ViT-B16-224 [vit] trained for ImageNet and finetune it on other datasets.
Multi-layer Perceptron (MLP). There have been recent advances in leveraging MLP-only architectures for image classification (e.g., MLP-Mixer [mlpmixer], ResMLP [resmlp]). These architectures take pixel blocks as input and “mix” features/pixels across locations and channels for predictions. We choose ResMLP-S24-224 [resmlp] in our evaluation. We take the model pre-trained on ImageNet and finetune it for other datasets.
Adversarial patches. Following prior works [chiang2020certified, levine2020randomized, xiang2021patchguard, metzen2021efficient], we report defense performance against a square patch that takes 1%, 2%, and 3% of input image pixels for ImageNet/ImageNette and a square patch with 0.4% and 2.4% pixels for CIFAR-10 images. We note that these patches can be anywhere on the image. In Section IV-C, we also report results for larger patch sizes to understand the limit of our defense. In Section V, we quantitatively discuss the implications of alternative patch shapes and multiple patches.
Defenses. We build three defense instances, PC-ResNet, PC-ViT, and PC-MLP, using the three vanilla models ResNet, ViT, and MLP. In our default defense setup, we set the number of masks to k = 6 per axis (i.e., a mask set of 6×6 = 36 masks) for the 224×224 images. We then generate the R-covering mask set as discussed in Section III-E (assuming a perfect estimation of the adversarial patch size, which follows the setup of PatchGuard [xiang2021patchguard]). We further apply two 128×128 masks at random locations to the training images for masked model training. In Section IV-C, we will analyze the effect of different defense setups and discuss the implications of conservatively using an over-estimated patch size.
We also report defense performance of prior works Interval Bound Propagation based defense (IBP) [chiang2020certified], Clipped BagNet (CBN) [zhang2020clipped], De-randomized Smoothing (DS) [levine2020randomized], PatchGuard (PG) [xiang2021patchguard], and BagCert [metzen2021efficient] for comparison. We use the optimal defense settings stated in their respective papers.
We report clean accuracy and certified robust accuracy as our main evaluation metrics. The clean accuracy is defined as the fraction of clean test images that are correctly classified by our defended model. The certified robust accuracy is the fraction of test images for which Algorithm 2 returns True (i.e., certifies that no adaptive white-box attacker can bypass our defense on this image). In addition to these accuracy metrics, we also use the per-example inference time to analyze the computational overhead.
We report our main evaluation results for PatchCleanser in Table II and compare defense performance with prior works [chiang2020certified, zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient]. For each evaluation setting (each column in the table), we mark the best result for PatchCleanser models and the best result for prior works in bold.
State-of-the-art clean accuracy. As shown in Table II, PatchCleanser achieves high clean accuracy. Taking PC-ViT as an example, PatchCleanser achieves 99.5+% clean accuracy for 10-class ImageNette, 83.8+% for 1000-class ImageNet, and 98.7+% for CIFAR-10. We further report the accuracy of state-of-the-art vanilla classification models in Table III. From these two tables, we can see that the clean accuracy of our defended models is very close to that of state-of-the-art undefended models (the difference is smaller than 1%). The high clean accuracy can foster real-world deployment of our defense.⁴

⁴ Similar to prior works like PatchGuard [xiang2021patchguard], the clean accuracy for different patch sizes varies slightly due to the use of different mask sets (based on knowledge of patch sizes). We will show a similarly high performance of PatchCleanser when we over-estimate the patch size (Section IV-C).
High certified robustness. In addition to state-of-the-art clean accuracy achieved by our defense, we can see from Table II that PatchCleanser also has very high certified robust accuracy. For ImageNette, our PC-ViT has a certified robust accuracy of 97.5% against a 1% square patch. That is, for 97.5% of test images, no strong adaptive white-box attacker who uses a 1%-pixel square patch anywhere on the image can induce misclassification of our defended model. Furthermore, we can also see high certified robust accuracy for ImageNet and CIFAR-10, e.g., 65.1% certified robust accuracy for a 1%-pixel patch on ImageNet and 94.3% certified robust accuracy for a 0.4%-pixel patch on CIFAR-10.
Significant improvements in clean accuracy and certified robust accuracy from prior works. We compare our defense performance with all prior certifiably robust defenses. From Table II, we can see that all our defense instances (i.e., PC-ResNet, PC-ViT, and PC-MLP) significantly outperform all prior works in terms of both clean accuracy and certified robust accuracy. Notably, for a 2%-pixel patch on ImageNet, PC-ViT improves the clean accuracy from 54.6% to 83.8% (29.2% gain in top-1 accuracy) and boosts the certified robust accuracy from 26.0% to 60.4% (the accuracy gain is 34.4%; the improvement is more than 2 times). Moreover, we can see that the certified robust accuracy of PC-ViT is even higher than the clean accuracy of all prior works. These significant improvements are due to PatchCleanser’s compatibility with state-of-the-art classification models, while previous works are fundamentally incompatible with them (recall that PG [xiang2021patchguard], DS [levine2020randomized], BagCert [metzen2021efficient], CBN [zhang2020clipped] are all limited to models with a small receptive field).
We can also see large improvements across datasets including ImageNette and CIFAR-10. For a 2%-pixel patch on ImageNette, PC-ViT improves clean accuracy from 95.0% to 99.6% and certified robust accuracy from 86.7% to 96.4%. For a 2.4%-pixel patch on CIFAR-10, PC-ViT improves clean accuracy from 86.0% to 98.7% (12.7% gain) and certified robust accuracy from 60.0% to 89.1% (29.1% gain).
Takeaways. In this subsection, we demonstrate that PatchCleanser has similar clean accuracy as vanilla state-of-the-art models, as well as high certified robust accuracy. In comparison with prior certifiably robust defenses, we demonstrate a significant improvement in both clean accuracy and certified robust accuracy. These improvements showcase the strength of defenses that are compatible with any state-of-the-art model.
In this subsection, we provide a detailed analysis of PatchCleanser models. We will analyze the effects of different model architectures and masked model training on the defense performance, discuss the trade-off between defense performance and defense efficiency, study the implications of over-estimated patch sizes, and finally explore the limit of the PatchCleanser defense. In Appendix -D, we analyze the defense performance on images with different object sizes and classes.
The Vision Transformer (ViT) based defense has the best performance, and masked model training can significantly improve model accuracy. In this analysis, we study the defense performance of different models (ResNet [resnet], ViT [vit], and MLP [resmlp]) and analyze the effect of masked model training (discussed in Section III-E). We report the clean accuracy and certified robust accuracy of different models on different datasets in Table IV. For ImageNette/ImageNet, we report certified robust accuracy for a 2%-pixel patch; for CIFAR-10, we report certified robust accuracy for a 2.4%-pixel patch.
First, we can see that despite the similar clean accuracy of the vanilla models (recall Table III), PC-ViT has significantly higher clean accuracy and certified robust accuracy than PC-ResNet and PC-MLP. We believe this is because ViT has been found to be more robust to pixel masking [paul2021vision].
Second, we can see that the masked model training has a large impact on the defense performance. On ImageNette and CIFAR-10 datasets, the masked training (denoted by the suffix “masked”) significantly improves both clean accuracy and certified robust accuracy. For example, the clean accuracy of PC-ResNet on CIFAR-10 improves from 94.3% to 97.8% and the robust accuracy improves from 39.4% to 78.8%. However, we find that the masked model training exhibits a different behavior on ImageNet. For ResNet, the masked model training significantly improves the certified robust accuracy. However, for PC-ViT and PC-MLP, the masked model training on ImageNet only has a slight impact. We partially attribute this to the challenging nature of the ImageNet model training.
There is a trade-off between defense performance and defense efficiency. In this analysis, we use PC-ViT against a 2%-pixel patch on 5000 randomly selected ImageNet test images to study the trade-off between defense performance and efficiency (similar analyses for ImageNette and CIFAR-10 are available in Appendix -B). In Figure 7, we report the clean accuracy, certified robust accuracy, and per-example inference time (evaluated using a batch size of one) for PC-ViT configured with different computation budgets (i.e., different numbers of masks). As shown in the figure, as we increase the number of masks, the certified robust accuracy first improves significantly and then gradually saturates. This is because a larger mask budget allows a smaller mask stride and hence a smaller mask size, which makes robustness certification easier. However, we also observe that the per-example inference time greatly increases as we use a larger number of masks. Therefore, we need to carefully choose a proper mask set size to balance the trade-off between defense performance and defense efficiency. In our default setting, we prioritize the defense performance and use a mask set size of 36.
We further visualize the defense overhead and defense performance (in terms of clean accuracy and certified robust accuracy) of different defenses in Figure 8. As shown in the figure, CBN [zhang2020clipped] (12.0ms), PG-BN [xiang2021patchguard] (44.2ms), and BagCert [metzen2021efficient] (14.0ms) have a very small runtime since they only require a single feed-forward inference pass. For PC-ViT, we report the performance trade-off under different mask set sizes ranging from 4 to 36 (we omit PC-ResNet and PC-MLP for simplicity; additional results for them are available in Appendix -B). We can see that when PC-ViT is optimized for classification accuracy, we achieve 83.5% clean accuracy and 60.5% certified robust accuracy with a moderate defense overhead (672.4ms). On the other hand, when PC-ViT is optimized for defense efficiency, we achieve a small per-example inference time (78.8ms) while still significantly outperforming prior works in terms of clean accuracy (82.7%) and certified robust accuracy (41.6%). Furthermore, we note that prior works such as DS [levine2020randomized] and PG-DS [xiang2021patchguard] have a much larger defense overhead on ImageNet (4740.0ms and 4918.0ms, respectively).
From this analysis, we demonstrate that there is a trade-off between defense strength and defense efficiency. In PatchCleanser, we can tune the mask set size to balance this trade-off. In contrast, while prior works like PG-BN [xiang2021patchguard] and BagCert [metzen2021efficient] have a smaller inference time, they cannot further improve their defense performance regardless of additionally available computation resources. Finally, we argue that our defense can be applied to time-sensitive applications like video analysis by performing the defense on a subset of frames. We also note that we can significantly reduce the empirical inference time by running the masked prediction evaluations of Algorithm 1 in parallel when multiple GPUs are available. With the improvement in computation resources and the development of high-performance lightweight models, we expect the computational cost to be further mitigated in the future.
Over-estimation of patch sizes has a small impact on the defense performance. In PatchCleanser, we need to estimate the patch size to generate a proper mask set (the dependence on patch size estimation is similar to PatchGuard [xiang2021patchguard]). In this analysis, we study the defense performance when we over-estimate the patch size. In Figure 7, we plot the defense performance as a function of the estimated patch area (i.e., the number of pixels) on the ImageNet dataset (the actual patch has 32×32 pixels on the 224×224 image); results for other datasets are in Appendix -B. The x-axis denotes the ratio of the estimated patch area to the actual patch area; 100% implies no over-estimation. As shown in the figure, as the over-estimation increases, the clean accuracy of PC-ViT is barely affected while the certified robust accuracy gradually drops. We note that even when the estimated patch area is conservatively set to 4 times the actual area of the patch, PC-ViT still significantly outperforms all prior works in terms of clean accuracy and certified robust accuracy.
Understanding the limit of our defense. In Figure 7, we report the defense performance of PC-ViT against different patch sizes on the 224×224 ImageNet test images (results for other datasets are in Appendix -B). This analysis helps us understand the limit of PatchCleanser when facing extremely large adversarial patches. The figure shows that, as we increase the patch size, the clean accuracy of PC-ViT slowly decreases. For example, even when the patch size is 112×112 (on the 224×224 image), the clean accuracy is still above 80%. The clean accuracy finally deteriorates to 0.1% (random guessing) when the patch is as large as 192×192. The certified robust accuracy also decreases as a larger patch is used. For a large patch of 64×64, we have 43.7% certified robust accuracy; when the patch is as large as 112×112 (half of the image width and height), we still have a non-trivial top-1 certified robust accuracy of 18.6% for 1000-class classification.
In this section, we quantitatively discuss the implications of different patch shapes and multiple patches, followed by the limitations and future work directions of PatchCleanser.
In our evaluation, we primarily focus on the scenario of a square patch. In this subsection, we discuss the compatibility of our defense with different patch shapes.
A key requirement of our defense algorithm is that at least one mask in the mask set can remove the entire patch (i.e., the mask set is R-covering). Therefore, if we have prior knowledge of the patch shape, we can generate a mask set that satisfies the security guarantee. For example, if we consider a rectangular patch, we can use a set of rectangular masks whose height and width can be determined by Equation 3. In Table V, we report evaluation results for patches of different rectangular shapes (but with a similar number of corrupted pixels). For simplicity, we set the mask strides of both axes to 32px. As shown in the table, PatchCleanser generalizes well across patch shapes; it has similar defense performance for different rectangular patches with a similar number of corrupted pixels.
If prior knowledge of the patch shape/size is not available, we can use a mask set that includes masks of different shapes/sizes. In the last row of Table V, we provide a proof of concept by applying a mask set that contains all shapes used in the above analysis to 500 randomly selected test images. As shown in the table, our defense still achieves high performance. We note that the reported certified robust accuracy accounts for a much stronger attacker who can use any of the 7 different patch shapes listed in the table, which explains the small drop in certified robustness. Finally, we note that considering multiple patch shapes can result in a large mask set and high computational cost. How to adaptively generate a small mask set for multiple shapes is an interesting direction to study. One possible idea is to group similar shapes together and use larger masks that cover each group, reducing the mask set size.
|Patch shape||ImageNette (clean / robust)||ImageNet (clean / robust)||CIFAR-10 (clean / robust)|
|All above shapes||99.0 / 91.0||81.8 / 47.2||99.2 / 82.0|
Proof of concept: evaluated on 500 randomly selected test images.
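The multi-shape mask set can be built by taking the union of per-shape mask sets, each sized by applying m = p + s − 1 independently to each axis. A minimal sketch (function names and the example shapes are our assumptions; the 32px stride follows the setup above):

```python
def mask_starts(n, m, s):
    """Start indices of masks of width m with stride s over one axis of size n."""
    pos = list(range(0, n - m + 1, s))
    if pos[-1] != n - m:
        pos.append(n - m)  # final mask flush with the image edge
    return pos

def masks_for_shape(n, patch_h, patch_w, stride=32):
    """Rectangular masks sized to cover a patch_h x patch_w patch
    (m = p + s - 1 applied independently to each axis)."""
    mh, mw = patch_h + stride - 1, patch_w + stride - 1
    return {(top, left, mh, mw)
            for top in mask_starts(n, mh, stride)
            for left in mask_starts(n, mw, stride)}

def union_mask_set(n, shapes, stride=32):
    """One mask set that is covering for every candidate patch shape."""
    masks = set()
    for ph, pw in shapes:
        masks |= masks_for_shape(n, ph, pw, stride)
    return masks
```

Since masks for different shapes have different dimensions, the union grows roughly linearly with the number of candidate shapes, which is the computational cost noted above.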
|Attacker||ImageNette (clean / robust)||ImageNet (clean / robust)||CIFAR-10 (clean / robust)|
|Two 1%-pixel patches||98.8 / 89.2||82.8 / 43.8||98.6 / 76.6|
|One 2%-pixel patch||99.2 / 95.2||82.2 / 61.0||98.8 / 91.2|
In this paper, we primarily focus on the one-patch setting (similar to prior works [chiang2020certified, xiang2021patchguard, zhang2020clipped, metzen2021efficient, levine2020randomized, cropping]) because a high-performance certifiably robust defense against even a single patch remains an open problem. In this subsection, we discuss a preliminary approach for extending our defense to multiple patches.
Let us suppose that there are T adversarial patches. We can generate a mask set in which every element is a combination of T masks, at least one of which can remove all T patches. We then apply our double-masking algorithm with this mask set for robust image classification. In order to certify the robustness on a given image, we need to check if the model predictions are correct for all pairs of such T-mask combinations.
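Constructing the combined mask set is a one-liner over a single-patch mask set (names are ours; `base_masks` stands for any single-patch R-covering mask set):

```python
from itertools import combinations

def multi_patch_mask_set(base_masks, num_patches):
    """Mask set for T = num_patches patches: every unordered combination of T
    base masks. If each patch is small enough to be covered by some base mask,
    at least one combination removes all T patches simultaneously."""
    return [frozenset(c) for c in combinations(base_masks, num_patches)]
```

The combination count C(|base_masks|, T) grows quickly with T (e.g., a 36-mask base set yields 630 two-mask combinations), which is why a fixed computation budget forces a larger stride and mask size in the multi-patch setting.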
In Table VI, we provide a proof of concept for our multiple-patch defense. We select 500 random test images from each dataset and report defense performance against two 1%-pixel patches. As shown in the table, our defense achieves non-trivial defense performance against two patches. Moreover, we compare against the defense performance for a single 2%-pixel patch (which has the same number of adversarial pixels as two 1%-pixel patches). We can see that our defense performance for two patches is reduced compared to that for one patch (despite the accuracy drop, the numbers are still much higher than those of prior works against a 2%-pixel patch in Table II). One major cause of this performance drop is that we have to use a large mask stride and mask size to reduce the number of masks due to computation constraints (note that the number of T-mask combinations can grow exponentially in the number of patches T). How to further improve the efficiency and performance of our defense against multiple patches is an interesting future work direction.
Finally, we note that if multiple patches are restricted to be close to each other, we can merge them into a larger single patch (the defense performance can then be computed via our analysis in Figure 7). On the other hand, if there are a large number of patches that are far away from each other, the attack becomes less localized and is more similar to a global attack, which has been studied in prior works [levine2020lzero, jia2020almost, DBLP:conf/nips/LeeYCJ19].
In this subsection, we discuss the limitations and future work directions of our defense.
Improving inference efficiency. Compared to some prior works [zhang2020clipped, xiang2021patchguard, metzen2021efficient], our defense achieves better performance at the cost of efficiency (recall Figure 8). In Section IV-C (Figure 7), we also see a trade-off between defense performance and efficiency. How to improve the efficiency of the underlying model (e.g., EfficientNet [efficientnet]) and the algorithm (e.g., our alternative inference algorithm in Appendix -C) is interesting to study. We note that we can also improve runtime by evaluating masked predictions in parallel with multiple GPUs.
Improving classification models’ prediction invariance to pixel masking. PatchCleanser is compatible with any classification model. Therefore, any improvement in image classification can benefit PatchCleanser. Notably, improving model prediction invariance to masking is crucial to the model’s robustness against adversarial patches. Relevant research questions include: (1) How to train a model to be robust to pixel masking (e.g., Cutout [cutout], CutMix [yun2019cutmix])? (2) How to design model architectures that are inherently robust to pixel masking (e.g., ViT [vit], CompNet [kortylewski2020compositional])? (3) How to leverage ideas like image inpainting [liu2018image] for better prediction invariance?
Relaxing the prior estimation of the patch shape and patch size. PatchCleanser requires an estimation of the patch shape/size to generate the mask set. This limitation is shared by all masking-based defenses [mccoyd2020minority, xiang2021patchguard, xiang2021patchguardpp]: an underestimated patch size/shape will undermine the robustness. How to relax the dependence on this prior knowledge is important to study (we discussed one possible idea in Section V-A by considering a mask set that includes multiple possible shapes).
Extending to other defense tasks. PatchCleanser demonstrates that we can achieve high certifiable robustness for any image classifier. Our defense insights can be helpful for other vision tasks such as object detection. For example, we can plug PatchCleanser into DetectorGuard [xiang2021detectorguard], a certifiably robust object detection algorithm that uses robust image classifiers for robust object detection, to improve its defense performance. Moreover, PatchCleanser can also have applicability to training-time model poisoning attacks [gu2017badnets]. For example, PatchCleanser can be used to filter out training images that are poisoned with localized triggers.
The adversarial patch attack was first introduced by Brown et al. [brown2017adversarial]; this attack focused on generating universal adversarial patches to induce model misclassification. Brown et al. [brown2017adversarial] demonstrated that the patch attacker can realize a physical-world attack by printing and attaching the patch to the victim objects. A concurrent paper on the Localized and Visible Adversarial Noise (LaVAN) attack [karmon2018lavan] aimed at inducing misclassification in the digital domain. Both of these papers operated in the white-box threat model, with access to the internals of the classifier under attack. PatchAttack [yang2020patchattack], on the other hand, proposed a reinforcement-learning-based attack for generating adversarial patches in the black-box setting.
There have been adversarial patch attacks proposed in other domains such as object detection [liu2018dpatch], semantic segmentation [sehwag2018not], and network traffic analysis [shan2021real]. In this paper, we focus on test-time attacks against image classification models.
To counter the threat of adversarial patches, two heuristic empirical defenses, namely Digital Watermark (DW) [hayes2018visible] and Local Gradient Smoothing (LGS) [naseer2019local], were first proposed. However, Chiang et al. [chiang2020certified] showed that these defenses are ineffective against an adaptive attacker with knowledge of the defense algorithm and model parameters.
The ineffectiveness of empirical defenses has inspired many certifiably robust defenses. Chiang et al. [chiang2020certified] proposed the first certifiably robust defense against adversarial patches via Interval Bound Propagation (IBP) [gowal2018effectiveness, mirman2018differentiable], which conservatively bounded the activation values of neurons to derive a robustness certificate. This defense requires expensive training and does not scale to large models and high-resolution images. Zhang et al. [zhang2020clipped] proposed Clipped BagNet (CBN), which clips the features of BagNet (a classification model with small receptive fields) for certified robustness. Levine et al. [levine2020randomized] proposed De-randomized Smoothing (DS), which feeds small image regions to a classification model and uses predictions on different regions to vote for the final prediction. Xiang et al. [xiang2021patchguard] proposed PatchGuard as a general defense framework with two key ideas: the use of small receptive fields and secure feature aggregation. Notably, PatchGuard subsumes a number of recent works [zhang2020clipped, levine2020randomized, metzen2021efficient, cropping]. Metzen et al. [metzen2021efficient] proposed BagCert, a variant of BagNet with majority voting, for certified robustness. Lin et al. [cropping] proposed Randomized Cropping, in which model predictions on different cropped images perform majority voting for the final robust prediction. This approach only offers probabilistic certified robustness, so we did not compare its performance with the approaches that offer deterministic certified robustness in Section IV.
A key takeaway from our paper is that the dependence of prior works on specific model architectures (e.g., small receptive fields [zhang2020clipped, levine2020randomized, xiang2021patchguard, metzen2021efficient, cropping]) greatly limits the defense performance; in contrast, the compatibility of PatchCleanser with any model architecture leads to state-of-the-art defense performance in terms of clean accuracy and certified robust accuracy, outperforming all prior works by a large margin.
Another line of certifiably robust research focuses on attack detection. The Minority Reports (MR) defense [mccoyd2020minority] places a mask at all image locations and uses inconsistencies in the masked prediction voting grid as an attack indicator. PatchGuard++ [xiang2021patchguardpp] proposed to use models with small receptive fields to extract features and perform efficient feature-space masking for attack detection. We note that the first-round masking of PatchCleanser is similar to the masking operation of MR. However, the one-round masking of MR can only achieve robustness for attack detection, and an attacker can exploit this weaker notion of robustness to force the model to abstain from making a prediction. In contrast, PatchCleanser’s double-masking algorithm focuses on the harder task of recovering correct predictions without abstention.⁵

⁵ One concurrent work [tramer2021detecting] discusses a theoretical but computationally impractical way to build robust classification defenses using robust attack detection techniques for global attacks. Studying its implications for patch defenses is an interesting direction for future work.
Some other recent defenses focus on adversarial patch training [wu2019defending, rao2020adversarial], but they lack certifiable robustness guarantees. In other domains like object detection, empirical defenses [saha2020role, ji2021adversarial, liang2021we] and certifiably robust defenses [xiang2021detectorguard] have also been proposed. We omit a detailed discussion since PatchCleanser focuses on certifiably robust image classification.
In addition to adversarial patch attacks and defenses, there is a significant body of work on adversarial examples. Conventional adversarial attacks [szegedy2013intriguing, goodfellow2014explaining, papernot2016limitations, carlini2017towards] aim to introduce a small global perturbation to the image to induce model misclassification. Empirical defenses [papernot2016distillation, xu2017feature, meng2017magnet, metzen2017detecting] were first proposed to mitigate the threat of adversarial examples, but were later found vulnerable to strong adaptive attackers with knowledge of the defense setup [carlini2017adversarial, athalye2018obfuscated, tramer2020adaptive]. The fragility of these heuristic-based defenses inspired a new research thread on developing certifiably robust defenses [raghunathan2018certified, wong2017provable, lecuyer2019certified, cohen2019certified, salman2019provably, gowal2018effectiveness, mirman2018differentiable]. In contrast, we focus on adversarial patch attacks, whose perturbations are localized and thus realizable in the physical world.
In this paper, we propose PatchCleanser for certifiably robust image classification against adversarial patch attacks. Notably, PatchCleanser is compatible with any state-of-the-art classification model (including ones with large receptive fields). PatchCleanser uses a double-masking algorithm to remove all adversarial pixels and recover the correct prediction (without abstention). PatchCleanser also uses an adaptive mask set generation technique to improve defense efficiency, as well as masked model training to boost certified robust accuracy. Our evaluation shows that PatchCleanser outperforms all prior works by a large margin: it is the first certifiably robust defense that achieves clean accuracy comparable to state-of-the-art vanilla models while simultaneously achieving high certified robust accuracy. PatchCleanser thus represents a promising new direction in our quest for secure computer vision systems.
We would like to thank Jan Hendrik Metzen for providing additional evaluation results of the BagCert defense [metzen2021efficient]. We are also grateful to Ashwinee Panda, Sihui Dai, Alexander Valtchanov, Xiangyu Qi, and Tong Wu for their valuable feedback on the paper.