
Going Grayscale: The Road to Understanding and Improving Unlearnable Examples

Recent work has shown that imperceptible perturbations can be applied to craft unlearnable examples (ULEs), i.e. images whose content cannot be used to improve a classifier during training. In this paper, we reveal the road that researchers should follow for understanding ULEs and improving ULEs as they were originally formulated (ULEOs). The paper makes four contributions. First, we show that ULEOs exploit color and, consequently, their effects can be mitigated by simple grayscale pre-filtering, without resorting to adversarial training. Second, we propose an extension to ULEOs, called ULEO-GrayAugs, that forces the generated ULEs away from channel-wise color perturbations by making use of grayscale knowledge and data augmentations during optimization. Third, we show that ULEOs generated using Multi-Layer Perceptrons (MLPs) are effective in the case of complex Convolutional Neural Network (CNN) classifiers, suggesting that CNNs suffer from a specific vulnerability to ULEs. Fourth, we demonstrate that when a classifier is trained on ULEOs, adversarial training will prevent a drop in accuracy measured both on clean images and on adversarial images. Taken together, our contributions represent a substantial advance in the state of the art of unlearnable examples, but also reveal important characteristics of their behavior that must be better understood in order to achieve further improvements.



1 Introduction

The ever-growing amount of easily available online data has enabled the massive progress neural networks have achieved [schmidhuber2015deep, lecun2015deep]. However, online data is often personal or even sensitive, raising concerns about privacy and unauthorized use. Several widely used data sets have been collected without user consent [hill2020secretive, birhane2021large] and there is an urgent need for approaches that protect users’ privacy and allow users to retain control over their own data.

This need is addressed by benign data poisoning, i.e., approaches that protect users’ data from being used to train a classifier, but do not actively attempt to harm the classifier, as is the case with malicious data poisoning. Key examples are TensorClog [shen2019tensorclog], which causes gradient vanishing, and adversarial poisoning [fowl2021adversarial], which adds conflicting information to the image data. Both [shen2019tensorclog, fowl2021adversarial] mention the usefulness of their approaches for data protection. Recently, [huang2021unlearnable] introduced an approach to crafting UnLearnable Examples (ULEs), which has the aim of allowing a defender to create examples that are unusable for an exploiter to train Deep Neural Networks (DNNs). We refer to the specific ULE method proposed by [huang2021unlearnable] as UnLearnable Examples-Original (ULEO), and call the protected examples it generates ULEOs. The ULEO method carries out benign data poisoning by learning sample-wise, error-minimizing perturbations that are imperceptible to the human eye.

The goal of this paper is to reveal the road that research should follow in order to understand and improve the state of the art of ULE. Until now, only adversarial training [madry2018deep] has been shown to be effective against ULEOs [huang2021unlearnable, fowl2021adversarial]. Adversarial training has the disadvantage of being computationally expensive and it also trades off model accuracy compared to regular training on clean images. In this paper, we show that a simple pre-filtering method can also effectively defeat ULEOs. By examining the visual characteristics of ULEO perturbations, as shown in Figure 1, we observe that they contain few spatial changes but many changes over three color channels. We refer to such perturbations as channel-wise perturbations.

Building on the insight of Figure 1, we conjecture that ULEOs mainly exploit channel-wise perturbations. To test this conjecture, we demonstrate that suppressing channel-wise perturbations will allow classifiers to defeat ULEOs. We suppress channel-wise perturbations in two ways. First, we test simple grayscale pre-filtering as a ULEO countermeasure and find that this grayscale exploiter is even more effective than adversarial training in mitigating the effects of ULEOs. Note that the grayscale exploiter has the same architecture as its corresponding original exploiter but only the input images are pre-filtered to become three (RGB)-channel grayscale images (see Section 4.1 for technical details). Second, we apply a radical bit-depth reduction [xu2018feature] and find that this transformation is also quite effective at mitigating ULEOs.

Based on our insight about the importance of channel-wise perturbations, we propose a grayscale defender to generate stronger ULEs by first incorporating grayscale knowledge to prevent channel-wise perturbations, and then promoting spatial patterns via applying standard data augmentations on clean images (see Section 4.2 for technical details). This method is an extension of ULEO that we call ULEO-GrayAugs and that is effective independently of whether grayscale pre-filtering is applied before the classifier is trained. Further, we show that perturbations generated using simple Multi-Layer Perceptrons (MLPs) [rumelhart1985learning] do have complex spatial patterns, and are able to fool complex CNNs. In this way, we demonstrate the potential benefits of spatial perturbations over channel-wise perturbations.

These observations are the groundwork for further experiments that reveal what we still do not understand about how ULEOs work. Specifically, we investigate ULEOs with adversarially trained classifiers and discuss the impact of ULEOs on the adversarial robust accuracy of models, going beyond previous work, which only looks at accuracy on clean images. We also investigate training on mixed ULEOs and clean data, and moving ULEOs to ImageNet data. Overall, the aim of our analysis is to provide a better understanding of unlearnable examples and inspire future evaluation in more realistic scenarios. An overview of our work is shown in Figure 2. In sum, this paper makes the following contributions:

  • We show that ULEOs, the unlearnable examples originally proposed by [huang2021unlearnable], mainly exploit color channels and that their effects can be mitigated using simple grayscale pre-filtering, without resorting to adversarial training.

  • Building on this insight, we propose ULEO-GrayAugs, which improves upon ULEO by generating grayscale perturbations and using data augmentations for further promoting spatial changes. ULEO-GrayAugs is effective against classifiers with and without grayscale pre-filtering.

  • We, for the first time, generate ULEOs using simple Multi-Layer Perceptrons (MLPs) [rumelhart1985learning] and find that the resulting perturbations can transfer to CNNs but not the other way around. This suggests that CNNs are more vulnerable to ULEOs and generating ULEOs using MLPs serves as an efficient alternative to the current ULEO.

  • We find that the mitigating effect of adversarial training prevents a drop in classifier accuracy both on clean test images and, surprisingly, on adversarial test images. This observation suggests that more work is necessary to achieve generally effective ULEs, and that future work should consider both clean accuracy and adversarial robust accuracy.

2 Related Work

Data Poisoning. The aim of data poisoning can be either permitting/forbidding certain data samples during test time (i.e., integrity poisoning) or decreasing the general model performance (i.e., availability poisoning) [barreno2010security]. Early studies focus on classical machine learning algorithms, such as linear models and SVMs [biggio2011support, xiao2015feature, koh2017understanding], where the poisoning is formulated as a bi-level optimization problem. Later, data poisoning against deep learning models was also explored. Backdoor data poisoning [chen2017targeted, gu2019badnets], as a type of integrity poisoning, implants specific (trigger) information into a learned model by manipulating training data, in order to cause abnormal model behavior on test samples that have specific triggers. In contrast, availability data poisoning aims to degrade a model’s performance by only manipulating the training data [shen2019tensorclog, huang2021unlearnable, fowl2021adversarial]. To this end, existing work has relied on gradient vanishing [shen2019tensorclog], model error minimization [huang2021unlearnable] or adversarial examples [fowl2021adversarial]. Our work will specifically focus on improving the method of [huang2021unlearnable], in terms of both poisoning performance and threat models.

Adversarial Examples. Adversarial examples aim to fool models at test time by adding imperceptible perturbations to clean test images [szegedy2014intriguing, goodfellow2015explaining, carlini2017towards]. Similar ideas of using grayscale transformation have also been used in [Laidlaw2019functional, zhao2020adversarial] for mitigating color-based adversarial examples. Adversarial training [goodfellow2015explaining, madry2018deep] is currently considered the only empirically strong technique for defeating adversarial examples. In the context of data poisoning, adversarial training has also been demonstrated to be very effective, i.e., adversarially training a model on poisoned data can secure high enough clean test accuracy [huang2021unlearnable, fowl2021adversarial]. However, previous work only focused on the clean test accuracy without exploring the adversarial robustness (i.e., robust accuracy on adversarial examples around clean test data). In Section 6.2, we, for the first time, discuss the impact of ULEs on the adversarial robustness of adversarially trained models.

Privacy Protection. In addition to data poisoning approaches that can be used to make user data unexploitable during training [shen2019tensorclog, huang2021unlearnable, fowl2021adversarial, shan2020fawkes], approaches based on adversarial machine learning have also been developed to protect privacy by misleading the machines during test time, for instance, in person-related recognition [oh2016faceless, oh2017adversarial, rajabi2021practicality] and social media mining [larson2018pixel, liu2019pixel]. Privacy attributes in images were analyzed in depth by [orekondy2017towards, sattar2020body].

3 Threat Model

In this section, we describe the threat model we use in this paper. Following the ULEO work [huang2021unlearnable], we introduce two parties: the data defender and exploiter. The defender’s goal is to make its uploaded data unlearnable to the exploiter by manipulating them, and the exploiter then trains a classifier from scratch on these uploaded data. The defender’s success is measured by the accuracy of the classifier on clean test data. The lower the clean test accuracy, the more successful the defender is considered to be.

3.1 Defender’s and Exploiter’s Knowledge

We consider the following two scenarios that specify the knowledge of both Defender and Exploiter:

S1: Reactive Exploiter: The exploiter is aware of ULEs, and so reactively applies specific techniques during model training to mitigate ULEs.

S2: Adaptive Defender: The defender is aware that the exploiter applies specific techniques for mitigation, and so adapts itself by incorporating the knowledge of such specific techniques into the ULE optimization.

Specifically, S2 is a new, challenging scenario that has not been explored in the ULEO work [huang2021unlearnable].

3.2 Defender’s and Exploiter’s Capability

The defender is only allowed to manipulate the uploaded data but not the training process of the exploiter. The defender should ensure the stealthiness and maintain the general utility of the manipulated data. For example, if the data are images posted on some social platform, they should look like normal images. To ensure stealthiness and maintain general utility, related work has manipulated the image data by adding imperceptible perturbations that are normally restricted by some norm [huang2021unlearnable, fowl2021adversarial]. Note that, if the original images are three-channel (RGB) images, the resulting perturbed images need also to be three-channel (RGB) images even with grayscale perturbations.

The exploiter trains its model on collected data until convergence without specific constraints on resources. The exploiter is assumed to take as input only three-channel (RGB) images. In this paper, we focus on the case in which the exploiter does not have access to other clean data, and so cannot identify the existence of ULEs in its training data before finally deploying the model. We also discuss the case in which a clean validation set is available, and point out that in this case, the state-of-the-art adversarial poisoning [fowl2021adversarial] could be substantially mitigated by early stopping the training process (see Section 5.2 for more details).

3.3 Unlearnable Example Optimization

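Formally stated, unlearnable examples [huang2021unlearnable] aim to make a classifier $F$ generalize poorly on the clean image distribution $\mathcal{D}$, from which the clean training set $\mathcal{D}_c$ is sampled:

$$\max_{\delta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \left[ \mathcal{L}\big(F(x;\theta(\delta)),\, y\big) \right] \tag{1}$$

$$\text{s.t.} \quad \theta(\delta) = \arg\min_{\theta} \sum_{(x_i,y_i)\in\mathcal{D}_c} \mathcal{L}\big(F(x_i+\delta_i;\theta),\, y_i\big), \tag{2}$$

where $\theta$ represents the parameters of the model $F$, and $\mathcal{L}$ is the cross-entropy loss, which takes as input a pair of model output $F(x_i;\theta)$ and the corresponding label $y_i$. $\delta$ denotes the set of the additive perturbations $\delta_i$.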

In order to achieve the above objective, error-minimizing perturbations [huang2021unlearnable] have been proposed to solve the following min-min bi-level optimization problem:

$$\arg\min_{\theta'} \sum_{(x_i,y_i)\in\mathcal{D}_c} \left[ \min_{\delta_i} \mathcal{L}\big(F'(x_i+\delta_i;\theta'),\, y_i\big) \right] \quad \text{s.t.} \quad \|\delta_i\|_p \le \epsilon, \tag{3}$$

where $F'$, with parameters $\theta'$, denotes the source model used for perturbation optimization. This optimization can prevent the classifier in Eq. 1 from being penalized by the objective function during training, and as a result can fool the classifier into believing there is “nothing” to learn from each perturbed training image $x_i+\delta_i$ [huang2021unlearnable]. In order to ensure imperceptibility, the perturbations are normally constrained by an $L_p$ norm bound $\epsilon$ [huang2021unlearnable, fowl2021adversarial].

The inner optimization aims to find the perturbations by minimizing the model’s classification loss, while the outer optimization updates model parameters by training on the perturbed images obtained in the inner optimization. Note that the inner and outer optimization share the same objective, and the model training starts from scratch in each round while the perturbations accumulate through the whole optimization. When the model parameters $\theta'$ are being updated, the perturbations $\delta$ are frozen, and vice versa. The inner and outer optimization are carried out alternately and terminated when a pre-defined training accuracy of the outer model is met. The training steps in the outer optimization should be limited compared to standard model training [franceschi2018bilevel, shaban2019truncated, huang2020metapoison, huang2021unlearnable]. The detailed pipeline is described in Algorithm 1 in Appendix B. We focus on the setting of sample-wise perturbations since they are harder to expose in practice and the qualities of ULEOs over other methods are mainly recognized in this setting [huang2021unlearnable].
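The alternating procedure described above can be illustrated with a small sketch. It is not the implementation of Algorithm 1: a linear softmax classifier stands in for the source model, an $L_\infty$ bound with signed gradient steps is assumed for the inner problem, a fixed number of rounds replaces the accuracy-based stopping rule, and all names and hyper-parameter values (`error_minimizing_noise`, `eps=0.03`, step sizes) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(W, x, y):
    """Mean cross-entropy of a linear softmax model, plus gradients w.r.t. W and x."""
    n = x.shape[0]
    p = softmax(x @ W)
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    g = p.copy()
    g[np.arange(n), y] -= 1.0            # gradient of each sample's loss w.r.t. its logits
    return loss, x.T @ g / n, g @ W.T    # loss, grad w.r.t. W, per-sample grad w.r.t. x

def error_minimizing_noise(x, y, num_classes, eps=0.03, rounds=3,
                           outer_steps=10, inner_steps=10, lr=0.5, seed=0):
    """Alternate model training (delta frozen) and perturbation updates (model frozen)."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)             # perturbations accumulate across rounds
    step = eps / 4                       # assumed inner step size
    for _ in range(rounds):
        # Outer optimization: train a fresh model on the current perturbed data.
        W = rng.normal(scale=0.01, size=(x.shape[1], num_classes))
        for _ in range(outer_steps):
            _, grad_W, _ = loss_and_grads(W, x + delta, y)
            W -= lr * grad_W
        # Inner optimization: move delta to *minimize* the loss, then project.
        for _ in range(inner_steps):
            _, _, grad_x = loss_and_grads(W, x + delta, y)
            delta = np.clip(delta - step * np.sign(grad_x), -eps, eps)
            delta = np.clip(x + delta, 0.0, 1.0) - x   # keep pixels in [0, 1]
    return delta, W
```

Note the sign of the inner update: unlike an adversarial attack, the perturbation descends the loss, so the perturbed samples look "already learned" to the model.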

4 Improving ULEs in Our Two Scenarios

In this section, we first present how to use grayscale pre-filtering to help the exploiter defeat ULEOs [huang2021unlearnable], and then discuss how to improve upon ULEOs against our grayscale exploiter by proposing a grayscale defender.

4.1 Grayscale Exploiters against ULEOs

As mentioned before, adversarial training has been recognized to be the only effective technique for defeating unlearnable examples [huang2021unlearnable, fowl2021adversarial]. However, it is known to be computationally expensive and also sacrifices the model performance on clean data [madry2018deep]. As we have argued in Section 1, ULEO perturbations mainly exploit channel-wise perturbations. As a result, we propose to use simple grayscale pre-filtering during model training to allow the classifier to mitigate the perturbations of ULEOs. In this case, the optimization as formulated in Eq. 1 can be modified into:

$$\theta(\delta) = \arg\min_{\theta} \sum_{(x_i,y_i)\in\mathcal{D}_c} \mathcal{L}\big(F(G(x_i+\delta_i);\theta),\, y_i\big), \tag{4}$$

where $G$ denotes the grayscale filtering (we implement the grayscale filtering using torchvision) applied on each training input image, $x_i+\delta_i$. By doing this, the channel-wise changes in the ULEO perturbations are completely removed before the perturbed images are used for model training. Note that when the classifier is trained on clean data, the grayscale filtering only leads to a small accuracy drop (see Table 1). This is consistent with a previous finding on ImageNet that color information makes little difference to the model accuracy [xie2018pre]. The diagram showing the working pipeline can be found in Appendix C.
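As a minimal sketch of such a pre-filter, the function below maps an RGB image to a three-channel grayscale image in NumPy. The luma weights (0.299, 0.587, 0.114) and the function name are assumptions made for illustration; the paper only states that torchvision is used.

```python
import numpy as np

# Assumed ITU-R BT.601 luma weights, as used by many grayscale conversions.
LUMA = np.array([0.299, 0.587, 0.114])

def to_three_channel_gray(img):
    """Map an (H, W, 3) RGB image to an (H, W, 3) image with identical channels."""
    gray = img @ LUMA                              # weighted channel sum, shape (H, W)
    return np.repeat(gray[..., None], 3, axis=2)   # keep the 3-channel input format
```

Because this filter is linear, a perturbation survives only through its luma-weighted channel sum: any difference between channels is removed, and a perturbation whose weighted sum is zero disappears entirely.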

4.2 ULEO-GrayAugs against Grayscale Exploiters

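The defender can be adapted to be stronger if it knows that the exploiter has applied grayscale pre-filtering. A direct way to achieve this is by incorporating grayscale knowledge into the ULEO optimization. To this end, we propose ULEO-Gray, which adapts the ULEO optimization in Eq. 3 to:

$$\arg\min_{\theta'} \sum_{(x_i,y_i)\in\mathcal{D}_c} \left[ \min_{\delta_i^{g}} \mathcal{L}\big(F'(x_i+\delta_i^{g};\theta'),\, y_i\big) \right] \quad \text{s.t.} \quad \|\delta_i^{g}\|_p \le \epsilon, \tag{5}$$

where both the grayscale perturbation $\delta_i^{g}$ and the perturbed image $x_i+\delta_i^{g}$ still have three (RGB) channels. Each pixel in $\delta_i^{g}$ is restricted to have the same values in all three channels, in order to prevent the perturbations from exploiting channel-wise changes.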
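One simple way to enforce the equal-channel restriction is sketched below. It is an assumption rather than the authors' procedure: the perturbation is stored as a single channel and broadcast to three, an $L_\infty$ bound is assumed, and an existing RGB perturbation is projected by channel averaging. The function names are hypothetical.

```python
import numpy as np

def gray_perturbation(delta_single, eps):
    """Broadcast a single-channel (H, W) perturbation to (H, W, 3) under an assumed L-inf bound."""
    d = np.clip(delta_single, -eps, eps)
    return np.repeat(d[..., None], 3, axis=2)

def project_to_gray(delta_rgb, eps):
    """Project an (H, W, 3) perturbation onto equal-channel perturbations by channel averaging."""
    return gray_perturbation(delta_rgb.mean(axis=2), eps)
```

Either function yields a three-channel perturbation, so the perturbed image remains a valid RGB input for the exploiter.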

This new optimization can substantially improve upon ULEO against classifiers with grayscale pre-filtering, but cannot reach performance comparable to that on standard classifiers (see Table 2 for detailed results). We argue that this is because the current perturbation optimization only operates on the fixed original image throughout the whole optimization procedure. In order to encourage ULEO to learn spatial perturbations, we further propose ULEO-GrayAugs, which applies standard (spatial) data augmentations (here, random crop and random horizontal flip) to clean images:

$$\arg\min_{\theta'} \sum_{(x_i,y_i)\in\mathcal{D}_c} \left[ \min_{\delta_i^{g}} \mathcal{L}\big(F'(T(x_i)+\delta_i^{g};\theta'),\, y_i\big) \right] \quad \text{s.t.} \quad \|\delta_i^{g}\|_p \le \epsilon, \tag{6}$$

where $T$ denotes the data augmentations. ULEO-GrayAugs allows perturbations to bypass the grayscale pre-filtering and to have more spatial changes.
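The augmentation step can be sketched as follows, assuming that the augmentations are applied to the clean image before the (fixed-position) grayscale perturbation is added. The zero padding of 4 pixels for the random crop is a common CIFAR-10 convention and is an assumption here, as are the function names.

```python
import numpy as np

def random_crop_flip(img, rng, pad=4):
    """Random crop (after zero padding) and random horizontal flip of an (H, W, C) image."""
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + h, left:left + w]
    if rng.random() < 0.5:
        out = out[:, ::-1]               # flip along the width axis
    return out

def augmented_perturbed_input(x, delta_gray, rng):
    """Augment the clean image, then add the grayscale perturbation and clip to [0, 1]."""
    return np.clip(random_crop_flip(x, rng) + delta_gray, 0.0, 1.0)
```

Since the clean content shifts between optimization steps while the perturbation stays in place, the perturbation cannot rely on a fixed alignment with the underlying image.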

5 Experiments

In this section, we validate the effectiveness of the proposed grayscale pre-filtering in defeating ULEOs, and then demonstrate the strong performance of our ULEO-GrayAugs, which uses grayscale perturbations and data augmentations to allow classifiers to learn spatial changes over channel-wise changes.

5.1 Experimental Settings

Following earlier work [huang2021unlearnable, fowl2021adversarial], the standard setup has the whole training data consist of either ULEs or clean images. In specific experiments we allow the training data to be a mix of clean and ULE training data. Following [huang2021unlearnable], we mainly conduct experiments on CIFAR-10 [krizhevsky2009learning], which consists of 50,000 training images and 10,000 test images of size 32×32 from 10 classes. We also discuss ULEs on an ImageNet subset containing the first 100 classes as in [huang2021unlearnable]. If not mentioned specifically, we follow [huang2021unlearnable] to restrict the ULE perturbations by a norm bound $\epsilon$, with separate values for CIFAR-10 and for ImageNet ($\epsilon = 16$ for the ImageNet subset; see Table 5).

5.2 Grayscale Exploiters against ULEOs

Figure 3: Clean test accuracy (%) of the standard (left) and grayscale (right) exploiters. Exploiters are trained on clean data, ULEOs, or adversarial images by the state-of-the-art poisoning method, Adversarial Poisoning (AdvPois) [fowl2021adversarial].

| Def\Exp | w/o | Mixup | BDR-2 | AT | Gray |
|---|---|---|---|---|---|
| Clean | 94.58 | 93.32 | 89.12 | 85.30 | 93.04 |
| ULEOs | 24.47 | 51.01 | 41.65 | 84.75 | 90.06 |

Table 1: Classification accuracy (%) on CIFAR-10 clean data (for reference) and CIFAR-10 ULEO data mitigated with different methods: Mixup [zhang2018mixup] (state of the art as per [huang2021unlearnable, fowl2021adversarial]), Bit-Depth Reduction to 2 bits (BDR-2) [xu2018feature], Adversarial Training (AT), and our own grayscale exploiter (Gray), which transforms ULEs to grayscale before classifying.

Figure 3 shows the learning curves of both the standard exploiter and our grayscale exploiter trained on different types of data: clean, ULEOs, and adversarial poisoning examples [fowl2021adversarial]. As can be seen, using the proposed grayscale pre-filtering allows the exploiter to effectively mitigate the effects of ULEOs and achieve performance close to the case with clean training data. We can also observe that Adversarial Poisoning (AdvPois) [fowl2021adversarial] is effective on both exploiters when only looking at the final accuracy results. However, it takes multiple training epochs for the model to exploit the perturbations made by AdvPois, making it possible to substantially mitigate AdvPois by early stopping when the exploiter can monitor the model performance on a set of clean validation data (from another reliable source), which is feasible in practice. The ULEO work [huang2021unlearnable] also found the same property on random and error-maximizing ULE perturbations. Note that here we do not discuss TensorClog [shen2019tensorclog] because it has been shown to be much less effective [fowl2021adversarial].
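The early-stopping mitigation mentioned above amounts to tracking accuracy on a clean validation set and keeping the best epoch. A minimal sketch follows; the helper name, the `patience` parameter and the validation curve in the usage example are hypothetical and not taken from the experiments.

```python
def early_stop_epoch(val_acc, patience=3):
    """Return (best_epoch, best_accuracy), stopping after `patience` epochs without improvement."""
    best_epoch, best_acc, wait = 0, float("-inf"), 0
    for epoch, acc in enumerate(val_acc):
        if acc > best_acc:
            best_epoch, best_acc, wait = epoch, acc, 0
        else:
            wait += 1
            if wait >= patience:
                break                    # validation accuracy stopped improving
    return best_epoch, best_acc

# Made-up curve: accuracy rises first, then collapses as the poison is exploited.
curve = [40.0, 62.0, 71.0, 70.0, 55.0, 30.0, 12.0]
print(early_stop_epoch(curve))           # (2, 71.0)
```

This only helps an exploiter that has clean validation data from a trusted source, which is the additional assumption made in this scenario.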

We further compare our grayscale pre-filtering with other techniques for defeating ULEOs. As can be seen from Table 1, grayscale pre-filtering surpasses the technique previously thought to be the best pre-filtering approach, Mixup, by 39.05 percentage points (90.06% vs. 51.01%). It also substantially outperforms the current state of the art, adversarial training, and at the same time better maintains the model accuracy when the training data are clean.

It is worth noting that another pre-filtering technique, bit-depth reduction, which can also reduce channel-wise differences, achieves substantial performance against ULEOs. This result supports our hypothesis that ULEOs mainly exploit channel-wise perturbations. More detailed results of bit-depth reduction with other bit depths than 2-bit can be found in Appendix A.
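For illustration, bit-depth reduction can be written as a uniform quantization of pixel values in [0, 1]. This is a sketch of the common formulation; the function name and the rounding convention are assumptions.

```python
import numpy as np

def reduce_bit_depth(img, bits=2):
    """Quantize pixel values in [0, 1] to 2**bits uniformly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels
```

With 2 bits, each channel keeps only four levels (0, 1/3, 2/3, 1), so a small perturbation that does not push a pixel across a quantization boundary is erased.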

| Def\Exp | w/o Gray | w/ Gray |
|---|---|---|
| Clean | 94.58 | 93.04 |
| ULEOs | 24.47 ± 4.41 | 90.06 ± 3.51 |
| ULEO-Augs | 23.42 ± 7.48 | 91.47 ± 1.85 |
| ULEO-Grays | 43.34 ± 11.49 | 44.67 ± 10.47 |
| ULEO-GrayAugs | 25.98 ± 7.38 | 24.98 ± 5.84 |

Table 2: Clean test accuracy (%) of the standard and grayscale exploiters against ULEs (ULEOs and our ULEOs with grayscale perturbations and/or data augmentations) on CIFAR-10. All results are averaged over 5 runs.

Figure 4: Learning curves of grayscale exploiters trained on clean data, ULEOs, or ULEO-GrayAugs.

5.3 ULEO-GrayAugs against Grayscale Exploiters

As we have shown that the ULEOs are vulnerable to grayscale exploiters, here we demonstrate the effectiveness of our ULEO-GrayAugs, which improve upon ULEOs by generating perturbations with a larger spatial variation. As can be seen from Table 2, our ULEO-GrayAugs still yield a low accuracy (24.98%) on the grayscale exploiter. This confirms ULEO-GrayAugs exploit spatial perturbations to bypass the grayscale pre-filtering. More detailed evidence with the training curves of ULEOs vs. ULEO-GrayAugs can be found in Figure 4. More interestingly, our ULEO-GrayAugs method also yields similarly good performance on the standard exploiter, making it a generally stronger ULE solution than the ULEO method. We also find that solely using data augmentations (ULEO-Augs) will make no difference from ULEOs and solely using grayscale (ULEO-Grays) also does not lead to optimal results.

The above experiments are based on the assumption that the defender can use the same model as the exploiter to generate ULEs. However, this might not be the case in practical scenarios with unknown target models. To test how the effectiveness of ULEO-GrayAugs generalizes to practical scenarios, we test their cross-model transferability. As can be seen from Table 3, ULEO-GrayAugs can be successfully generated independently of the model architecture. In addition, they maintain their strong effects well when transferred from one model to another.

| Def\Exp | RN-18 | DN-121 | VN-11 |
|---|---|---|---|
| RN-18 | 28.35/28.97 | 25.13/23.99 | 33.07/28.78 |
| DN-121 | 28.85/30.01 | 28.94/25.79 | 30.80/25.00 |
| VN-11 | 11.47/11.14 | 14.28/13.26 | 14.50/13.23 |

Table 3: Clean test accuracy (%) of exploiters in the columns when defenders choose a model in the rows for generating ULEO-GrayAugs. We consider three architectures: ResNet-18 (RN-18), DenseNet-121 (DN-121), and VGGNet-11 (VN-11), and report results for standard/grayscale exploiters.
| Ori | Add | Gray | Clean | ULEOs | ULEO-GrayAugs |
|---|---|---|---|---|---|
| 5% | - | w/o | 63.89 | - | - |
| 5% | 10% | w/o | 80.83 | -13.55 | -4.61 |
| 5% | 30% | w/o | 89.26 | -22.12 | -10.48 |
| 5% | - | w/ | 61.99 | - | - |
| 5% | 10% | w/ | 81.61 | -1.54 | -6.11 |
| 5% | 30% | w/ | 87.95 | -0.54 | -15.84 |

Table 4: Clean test accuracy (%) of a model trained by the exploiter (with or without grayscale pre-filtering) on an original (Ori) training set containing 5% of the official training data of CIFAR-10 with additional (Add) training data of Clean, ULEOs, or ULEO-GrayAugs. For each case with all training data being clean, we directly report the model accuracy, while for each case with partial clean and ULEs, we report its accuracy difference from the case with the same amount of clean training data. Note that after adding 30% the model accuracy starts to saturate. Similar results have been found for the cases with Ori as 10% (see Table 9 in Appendix A).

5.4 ULEs Mixed with Clean Training Data

So far, our experiments have been conducted in the setting in which the whole training data are either ULEs or clean images. However, it is realistic that exploiters may have access to other clean training data from another source. Here we test the effectiveness of ULEs (ULEOs and our ULEO-GrayAugs) when different proportions of clean and perturbed data are used to train the classifier. In this case, ULEs are considered effective if adding them leads to a significantly smaller accuracy gain than adding the same amount of clean data.

Table 4 compares the accuracy before and after adding clean data or ULEs to the original training set. As can be seen, in all cases, adding extra data improves model accuracy. This is consistent with previous findings [huang2021unlearnable, fowl2021adversarial]. However, compared with adding clean data, adding the same amount of ULEs leads to less improvement. Specifically, for standard exploiters, ULEOs are stronger and ULEO-GrayAugs also yield a substantial accuracy drop. For grayscale exploiters, in contrast, ULEO-GrayAugs are stronger while ULEOs yield only a small accuracy drop.

| Def\Exp | w/o Gray | w/ Gray |
|---|---|---|
| Clean | 59.66 | 58.18 |
| ULEOs | 4.88 | 8.52 |
| ULEO-GrayAugs | 7.04 | 6.46 |

Table 5: Clean test accuracy (%) of standard and grayscale exploiters against ULEs (ULEOs and our ULEO-GrayAugs) on the ImageNet subset ($\epsilon = 16$). The perturbations were class-wise instead of sample-wise.

5.5 ULEs on ImageNet

In order to verify the generalizability of ULEs (ULEOs and our ULEO-GrayAugs) to larger datasets, we test on ImageNet [deng2009imagenet] images. Table 5 shows the results achieved by directly using the official training code of [huang2021unlearnable]. Note that we remove the “color jitter” data augmentation during the model training because it naturally affects channel-wise perturbations, making the usefulness of grayscale pre-filtering difficult to validate. As can be seen, the improvement of our grayscale exploiter and ULEO-GrayAugs still holds. However, the improvement is limited compared to the reported results on CIFAR-10. By carefully checking the official code, we identify an implementation error that mistakenly trains classifiers on (incorrect) class-wise perturbations [huang2021unlearnable], which make ULEOs achieve strong results much more easily on exploiters with/without grayscale pre-filtering. (In personal communication, the authors acknowledge the error and are planning to address it in order to support further research.) After clearing up this error, we find that the model training part of the bi-level ULEO optimization cannot converge to a pre-defined high accuracy after a small-scale hyper-parameter search. We suspect this might be because ImageNet has more classes and images with higher visual complexity, while having fewer training samples per class. For this reason, we also try generating ULEOs on ImageNet with only 10 classes and find that the optimization can successfully converge, but the resulting perturbations have very limited effects. The perturbations for both experiments are shown in Appendix F. There, we also visualize perturbations generated for upsampled (224×224) CIFAR-10 images to show that increasing the image size alone does not change the channel-wise patterns of perturbations. We leave further exploration for future work.

6 Additional Practical Insights into ULEs

In this section, we provide additional practical insights into ULEs that have not been discussed in previous work [huang2021unlearnable, fowl2021adversarial]. We discuss generating ULEs while using MLPs to fool CNNs from the defender perspective. We also explore from the exploiter perspective the effectiveness of adversarial training in maintaining adversarial robust accuracy of models against ULEs.

Figure 5: Perturbations of ULEOs generated using MLPs for the same three CIFAR-10 images shown in Figure 1.

| Def\Exp | MLP | RN-18 | DN-121 | VN-11 |
|---|---|---|---|---|
| MLP | 48.66/47.50 | 94.58/21.55 | 95.19/15.84 | 91.66/81.88 |

Table 6: Clean test accuracy (%) of MLP vs. CNNs that are trained on clean images/MLP-generated ULEOs.

6.1 ULEs on MLPs

The exploration of ULEOs in the original work [huang2021unlearnable] and in our work has so far been limited to Convolutional Neural Networks (CNNs). However, it remains unclear whether the underlying mechanism of ULEOs is specific to CNNs or also generalizes to other machine learning algorithms. To address this question, we also try generating ULEOs on a simple machine learning algorithm, Multi-Layer Perceptrons (MLPs) [rumelhart1985learning]. Specifically, we use flattened ULEOs or clean images as inputs to MLPs.
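To make the input format concrete, the following sketch flattens a batch of images and passes it through a two-layer MLP. The architecture (one hidden layer with ReLU, 512 hidden units in the usage example) is a hypothetical choice for illustration; the MLP configuration used in the experiments is not specified here.

```python
import numpy as np

def mlp_forward(images, W1, b1, W2, b2):
    """Forward pass of a two-layer MLP on flattened images; returns class logits."""
    x = images.reshape(images.shape[0], -1)   # e.g. (N, 32, 32, 3) -> (N, 3072)
    h = np.maximum(x @ W1 + b1, 0.0)          # ReLU hidden layer
    return h @ W2 + b2

# Usage with random weights and a hypothetical hidden size of 512.
rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3))
W1, b1 = rng.normal(scale=0.01, size=(3072, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.01, size=(512, 10)), np.zeros(10)
logits = mlp_forward(batch, W1, b1, W2, b2)   # shape (8, 10)
```

Flattening discards the spatial layout that convolutions rely on, so every pixel of every channel is treated as an independent input feature.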

As can be seen from Table 6, the ULEOs generated using MLPs are not able to succeed against MLP exploiters. However, surprisingly, they can be very strong against CNN exploiters, especially ResNet-18 and DenseNet-121. This observation provides a generally more efficient way to generate successful ULEOs on CNNs, i.e., using a simple MLP as the defender. A closer look at the training curves (see Figure 6 in Appendix A) of the CNNs on MLP-generated ULEOs suggests that MLP-generated ULEOs, similar to those generated by AdvPois [fowl2021adversarial], can also be substantially mitigated by early stopping. We further try the reverse direction by transferring ULEOs from CNNs to MLPs, but find that it does not work. This suggests that CNNs are generally much more vulnerable to ULEOs than MLPs.

In order to shed light on the specific properties of ULEOs on MLPs vs. CNNs, we visualize the perturbations of the ULEOs generated on MLPs in Figure 5. As can be seen, different from the perturbations of ULEOs that are generated using CNNs (as shown in Figure 1), these perturbations on MLPs naturally contain many spatial changes. This difference suggests that the underlying working principles of these two types of ULEOs might be different. We leave more detailed exploration of this difference for future work.

AT-\Test Clean FGSM PGD
Clean 85.30 54.01 46.59
ULEOs 84.75 52.42 40.34
ULEO-GrayAugs 84.38 53.07 43.80
Table 7: Clean and adversarial robust (against FGSM and PGD with 20 steps) test accuracy (%) of an exploiter model that is adversarially trained on clean images, ULEOs, or our ULEO-GrayAugs. All models are adversarially trained using PGD with 7 steps.

6.2 ULEs Meet Adversarial Examples

Existing work has demonstrated that models adversarially trained on ULEs (AT-ULE models) can still achieve high clean test accuracy [huang2021unlearnable, fowl2021adversarial]. This is explainable because an AT-ULE model is expected to maintain normal model behavior on any test example that falls into the ε-ball of a ULE. Since clean test examples obviously fall into this ball, the clean test accuracy of an AT-ULE model should be comparable to its training accuracy.
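
To make the notion of an ε-ball concrete, the sketch below implements L∞-bounded PGD, with FGSM as its single-step special case, against a toy logistic model. The linear model, the budget ε = 8/255, the step size 2/255, and all names are assumptions made for illustration; they are not the configuration used for Table 7, which attacks deep networks.

```python
import math


def logit(x, weights, bias):
    return sum(w * v for w, v in zip(weights, x)) + bias


def loss_gradient_wrt_input(x, y, weights, bias):
    """Gradient of the logistic loss w.r.t. the input x, with label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-logit(x, weights, bias)))
    return [(p - y) * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x, y, weights, bias, epsilon, step_size, steps):
    """L-infinity PGD: ascend the loss, then project back into the epsilon-ball."""
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_gradient_wrt_input(x_adv, y, weights, bias)
        x_adv = [v + step_size * sign(g) for v, g in zip(x_adv, grad)]
        # Projection: stay within epsilon of x and inside the valid pixel range.
        x_adv = [min(max(v, c - epsilon), c + epsilon) for v, c in zip(x_adv, x)]
        x_adv = [min(max(v, 0.0), 1.0) for v in x_adv]
    return x_adv


def fgsm_attack(x, y, weights, bias, epsilon):
    """FGSM as the one-step special case of PGD with step size epsilon."""
    return pgd_attack(x, y, weights, bias, epsilon, epsilon, steps=1)


x, y = [0.2, 0.8, 0.5], 1
weights, bias = [1.0, -2.0, 0.5], 0.1
epsilon = 8 / 255  # assumed budget, a common choice for CIFAR-10
x_pgd = pgd_attack(x, y, weights, bias, epsilon, 2 / 255, steps=20)
x_fgsm = fgsm_attack(x, y, weights, bias, epsilon)
```

Adversarial training replaces each training input with such a worst-case point inside its ε-ball, which is why a clean image lying within ε of a ULE is effectively covered during training.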

Adversarial training thus prevents ULEs from decreasing the clean test accuracy. We find that, surprisingly, it also prevents a drop in adversarial robust accuracy (i.e., accuracy on adversarial examples around clean test data), a question that had so far remained open. This question is relevant to a realistic scenario in which an exploiter has implemented computationally expensive adversarial training to achieve high robust accuracy around clean data, but might not know that its training data are actually ULEs (rather than clean data). In this scenario, the resulting AT-ULE models would not be expected to achieve high adversarial robust accuracy around clean test data, because they have only been trained on ULEs and are therefore only expected to secure adversarial robust accuracy around ULE test data.

However, as can be seen from Table 7, AT-ULE models trained on the current ULE methods (ULEOs and ULEO-GrayAugs) yield only slightly lower robust accuracy than AT models trained on clean data: at most 1.59 percentage points lower under FGSM and at most 6.25 percentage points lower under PGD. This suggests that, in practice, the current ULE methods are vulnerable to adversarial training not only in terms of clean test accuracy but also in terms of adversarial robust accuracy. This unexpected phenomenon is worth exploring in future research, in order to achieve comprehensively stronger ULEs against adversarial training.

7 Discussion

In this section, we first provide general insight into understanding ULEs. Then we discuss limitations of our work and potential ways to address them in future work.

7.1 ULEO Principles

In this paper, we have shown ULEOs mainly exploit simple, channel-wise perturbations. However, the reason why ULEOs have this “lazy” behavior remains unclear. In the following, we provide two related perspectives.

Model architecture matters. We have shown that generating ULEOs on MLPs does meet our expectation of exploiting complex, spatial perturbations rather than simple, channel-wise perturbations, and that these perturbations are also effective against CNNs. This implies that the “lazy” behavior might be related to the fact that CNNs tend to use shortcut features [geirhos2020shortcut]. Since convolutions have a channel dependency and are translation-invariant, exploiting channel-wise perturbations could be simpler than exploiting more complex, spatial perturbations.

Lack of data may not be relevant. If lack of data were the reason for the “lazy” behavior, we would expect data augmentations to alleviate it. However, our results suggest that ULEO-Augs, which incorporate data augmentations during optimization, still exploit channel-wise perturbations, as shown in Figure 7 of Appendix A. Also, ULEO-Augs achieve almost the same performance as ULEOs. Our results for ULEO-GrayAugs show that data augmentations might play a role on top of grayscale perturbations. Specifically, we find that ULEO-GrayAugs promote the use of complex, spatial perturbations (see Figure 2) and lead to much stronger results than ULEOs.
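
The following toy sketch illustrates why a purely channel-wise perturbation is fragile under grayscale pre-filtering. It assumes the ITU-R BT.601 luma weights, which may differ from the grayscale conversion used in our experiments, and it uses a hypothetical perturbation that adds one constant offset per color channel. After conversion, the three offsets collapse into a single, much smaller luminance shift, and the information carried by their relative signs is lost.

```python
# Assumed ITU-R BT.601 luma weights; the conversion used in the experiments may differ.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(image):
    """Convert a 3 x H x W nested list (RGB in [0, 1]) into an H x W luminance map."""
    red, green, blue = image
    return [
        [sum(w * c for w, c in zip(LUMA_WEIGHTS, pixel))
         for pixel in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


def add_channel_offsets(image, offsets):
    """Add one constant offset per color channel (a purely channel-wise perturbation)."""
    return [[[v + off for v in row] for row in channel]
            for channel, off in zip(image, offsets)]


epsilon = 8 / 255  # assumed perturbation budget
clean = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(3)]
perturbed = add_channel_offsets(clean, (epsilon, -epsilon, epsilon))

# Luminance shift caused by the perturbation at one pixel after grayscale conversion.
shift = to_grayscale(perturbed)[0][0] - to_grayscale(clean)[0][0]
```

In this hypothetical case the shift equals ε · (0.299 − 0.587 + 0.114), i.e., less than a fifth of the per-channel budget. Spatial perturbations, in contrast, vary across pixel positions and are not averaged away by a per-pixel channel mixture.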

7.2 Limitations and Outlook

In this section, we sketch the road ahead for research on unlearnable examples by describing the remaining limitations of our investigation and providing an outlook onto important open points that should be addressed by related work.

Threat model. We have seen that extending the threat model to include more realistic scenarios yields deeper insight into ULEs. Future work should continue to expand the threat model. First, the threat model should explicitly specify the constraints on available resources. For the exploiter, adversarial training may be too computationally expensive to be a viable solution. For the defender, ULE generation must be simple and efficient so that it remains feasible for as many users as possible to apply it in order to protect their data. We have demonstrated promising results for ULEs generated using MLPs, a simple model, but more research is necessary to determine which other simple models are also effective and to understand why.

The threat model should also explicitly specify that the intent of the defender is benign. Our experiments show that ULEs are not particularly dangerous. Clean data is more useful to the exploiter than ULEs (ULEOs or ULEO-GrayAugs), but adding ULEs does not hurt the model and may even improve performance. Future ULEs must continue to provide users with the confidence that their ULE-protected data will simply fail to improve the classifier, rather than negatively impacting its performance. Users who intend to protect their own data do not want to be mistaken for malicious attackers, which could result in, e.g., being banned from the platform (or worse) for attempting to actively harm the performance of a classifier.

Adversarial Training. Our work has shown that adversarial training mitigates the effect of ULEs (both ULEOs and our ULEO-GrayAugs) and the resulting classifiers are unexpectedly robust to adversarial examples around clean data. Moving forward, researchers should measure the strength of ULEs against adversarially trained models on both original data and adversarial examples. Future work should seek to understand this surprising finding, and strive for ULEs that remain effective in the face of adversarial training.

Discouraging Shortcuts. The strength of ULEO perturbations, as discussed above, might be related to the extent to which ULEO approaches exploit shortcut features. Future work should further explore limiting the possibilities of shortcuts during ULE generation. Such research promises to lead to more robust data protection and also better theoretical understanding of ULEs.

Data variety. We have carried out the evaluation with data comparable to that used by [huang2021unlearnable]. Future work on ULEs should test on a wider range of data sets. First, larger data sets with more training data and/or more competing classes could be relevant. Second, data sets with very high or very low visual complexity could also yield additional insight into how and why ULEs work.

8 Conclusion

In this paper, we have advanced the state of the art by extending unlearnable examples as originally studied by [huang2021unlearnable] (ULEOs) to a more realistic scenario including a grayscale exploiter. This scenario revealed the dependence of ULEOs on color perturbations and led us to propose ULEO-GrayAugs, a novel version of ULEO that remains robust under the more realistic scenario. We demonstrate, for the first time, that ULEOs generated using simple MLPs can have a strong impact on CNN-based classifiers and provide an efficient way to generate successful ULEOs. We also present evidence, from tests of adversarial robust accuracy, that ULEO-GrayAugs leave room for future improvement. Finally, we have discussed several directions for future work, for which our paper lays a foundation.


This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative.


Appendix A Additional Experimental Results

Training data    w/o      BDR-2    BDR-4    BDR-6
Clean            94.58    89.12    93.98    94.48
ULEOs            24.40    47.10    22.92    21.39
Table 8: Clean test accuracy (%) on CIFAR-10 with clean training data (for reference) or other poisoned training data (ULEOs, AdvPois, or our ULEO-GrayAugs) mitigated with Bit-Depth Reduction to different bits [xu2018feature].
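
For reference, Bit-Depth Reduction can be sketched as a uniform quantization of each pixel value. The snippet below assumes pixel values in [0, 1] and the multiply, round, divide formulation commonly associated with feature squeezing [xu2018feature]; the function name and the example values are hypothetical.

```python
def reduce_bit_depth(image, bits):
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [[[round(v * levels) / levels for v in row] for row in channel]
            for channel in image]


epsilon = 8 / 255  # assumed perturbation budget
# One channel, one row, three pixels: a clean value and its two extreme perturbations.
pixels = [[[0.4, 0.4 + epsilon, 0.4 - epsilon]]]
squeezed = reduce_bit_depth(pixels, bits=2)
```

With 2 bits, all three values in this example map to the same level (1/3), so the perturbation is erased at this pixel, at the cost of also discarding image detail. This trade-off is consistent with the lower clean accuracy of BDR-2 in Table 8.
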
Ori Add Gray Clean ULEOs ULEO-GrayAugs
10% - 73.48 - -
10% 10% 83.30 -3.64 -1.23
10% 30% 89.85 -11.22 -5.61
10% - 74.23 - -
10% 10% 82.72 -0.05 -2.30
10% 30% 88.46 -0.18 -7.56
Table 9: Clean test accuracy (%) of a model trained by the exploiter (with or without grayscale) on an original (Ori) training set containing 10% of the official training data of CIFAR-10 plus additional (Add) training data of Clean, ULEOs, or ULEO-GrayAugs. For each case with all training data being clean, we directly report the model accuracy, while for each case with partial clean and ULEs, we report its accuracy difference from the case with the same amount of clean training data. Note that after adding 30% the model accuracy starts getting saturated.
Figure 6: Learning curves of the CNNs trained on ULEOs generated using MLPs.
Figure 7: Perturbations generated with only data augmentations (ULEO-Augs) for the three clean images in Figure 1. Like the perturbations of ULEOs, they still exploit few spatial patterns.

Appendix B Implementation details

Input: Initialized model weights θ, random perturbations δ, perturbation constraints ε, clean training data (x, y), stop error λ, model training epochs M, iteration count T.
repeat
     for i in 1, ..., M do
          θ ← Optimize(x + δ, y, θ)    (Eq. 3, outer)
     end for
     δ ← δ updated for T iterations with θ fixed    (Eq. 3, inner)
     δ ← Clip(δ, −ε, ε)
     Error ← Eval(x + δ, y, θ)
until Error < λ
Algorithm 1 Sample-wise error-minimizing perturbations

Appendix C ULEO Pipeline

The pipeline of bi-level optimization for generating ULEOs is described in Figure 8. First, as shown in the first row, the randomly initialized perturbations are added to clean images, and the resulting perturbed images are used to train a DNN model for multiple epochs in the outer optimization. Then, as shown in the second row, given the trained DNN model from the outer optimization, the perturbations are optimized for multiple iterations. Finally, the optimized perturbations are added to clean images for model training in the next round of outer optimization. The whole process is repeated until the pre-defined training accuracy is met.

Figure 8: The pipeline of bi-level optimization for generating ULEOs. The dashed lines represent the information exchange between inner and outer optimization.
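
The alternation described above can be sketched on a toy problem. In the snippet below, a logistic regression model stands in for the DNN, the inner step uses signed gradient descent on the perturbations, and the learning rate, step size, round limit, and all names are hypothetical. It is meant to show the control flow of the bi-level loop, not to reproduce our implementation.

```python
import math
import random


def predict(x, theta):
    """Logistic model: theta[0] is the bias, theta[1:] are the weights."""
    z = theta[0] + sum(w * v for w, v in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def perturb(x, delta):
    return [a + b for a, b in zip(x, delta)]


def sign(v):
    return (v > 0) - (v < 0)


def mean_loss(data, deltas, labels, theta):
    total = 0.0
    for x, d, y in zip(data, deltas, labels):
        p = predict(perturb(x, d), theta)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)


def generate_perturbations(data, labels, epsilon, stop_error,
                           train_epochs, noise_iters, max_rounds=50, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    theta = [0.0] * (dim + 1)
    deltas = [[rng.uniform(-epsilon, epsilon) for _ in range(dim)] for _ in data]
    learning_rate, step_size = 0.5, epsilon / 4  # hypothetical values
    error = 1.0
    for _ in range(max_rounds):
        # Outer optimization: train the model on the perturbed data.
        for _ in range(train_epochs):
            grad = [0.0] * (dim + 1)
            for x, d, y in zip(data, deltas, labels):
                xp = perturb(x, d)
                residual = predict(xp, theta) - y
                grad[0] += residual
                for j, v in enumerate(xp):
                    grad[j + 1] += residual * v
            theta = [t - learning_rate * g / len(data) for t, g in zip(theta, grad)]
        # Inner optimization: lower each sample's loss with the model fixed,
        # clipping every perturbation back into [-epsilon, epsilon].
        for _ in range(noise_iters):
            for i, (x, y) in enumerate(zip(data, labels)):
                residual = predict(perturb(x, deltas[i]), theta) - y
                deltas[i] = [min(epsilon, max(-epsilon, d - step_size * sign(residual * w)))
                             for d, w in zip(deltas[i], theta[1:])]
        # Stop once the training error on the perturbed data is below the threshold.
        wrong = sum((predict(perturb(x, d), theta) >= 0.5) != (y == 1)
                    for x, d, y in zip(data, deltas, labels))
        error = wrong / len(data)
        if error < stop_error:
            break
    return deltas, theta, error


data = [[0.2, 0.7], [0.3, 0.6], [0.6, 0.3], [0.7, 0.2], [0.45, 0.5], [0.5, 0.45]]
labels = [0, 0, 1, 1, 1, 0]
deltas, theta, error = generate_perturbations(
    data, labels, epsilon=0.1, stop_error=0.01, train_epochs=5, noise_iters=10)
```

Because the inner step minimizes rather than maximizes the loss, the perturbed samples end up easier to fit than the clean ones under the returned model, which is the property that makes error-minimizing perturbations uninformative for training.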

Appendix D Additional Examples of ULEOs

Figure 9: Additional ULEO Examples on CIFAR-10: Original images (left), ULEO perturbations (middle), and perturbed images (right).

Appendix E Additional Examples of ULEO-GrayAugs

Figure 10: Additional ULEO-GrayAug Examples on CIFAR-10: Original images (left), ULEO-GrayAug perturbations (middle), and perturbed images (right). Examples for the same clean images as in Figure 9 are shown.

Appendix F ULEOs on ImageNet (100 and 10 classes) and on Upsampled CIFAR-10

Figure 11: Class-wise ULEO Perturbations (224×224) on ImageNet subset with 100 classes used for the (incorrect) experiments reported in Table 5.
Figure 12: ULEO Examples (224×224) on ImageNet subset with 10 classes: Original images (left), ULEO perturbations (middle), and perturbed images (right).
Figure 13: ULEO Examples on upsampled CIFAR-10 (224×224): Original images (left), ULEO perturbations (middle), and perturbed images (right).