Adversarial Examples that Fool both Human and Computer Vision

02/22/2018 · Gamaleldin F. Elsayed et al. · Google

Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we create the first adversarial examples designed to fool humans, by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by modifying models to more closely match the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.


1 Introduction

Machine learning models are easily fooled by adversarial examples: inputs optimized by an adversary to produce an incorrect model classification (szegedy2013intriguing; Biggio13). In computer vision, an adversarial example is usually an image formed by making small perturbations to an example image. Many algorithms for constructing adversarial examples (szegedy2013intriguing; goodfellow2014explaining; papernot2015limitations; kurakin17physical; madry2017towards) rely on access to both the architecture and the parameters of the model to perform gradient-based optimization on the input. Without similar access to the brain, these methods do not seem applicable to constructing adversarial examples for humans.

One interesting phenomenon is that adversarial examples often transfer from one model to another, making it possible to attack models to which an attacker has no access (szegedy2013intriguing; liu2016delving). This naturally raises the question of whether humans are susceptible to these adversarial examples. Clearly, humans are prone to many cognitive biases and optical illusions (hillis2002combining), but these generally do not resemble small perturbations of natural images, nor are they currently generated by optimizing an ML loss function. Thus the current understanding is that this class of transferable adversarial examples has no effect on human visual perception, but no thorough empirical investigation has been performed.

A rigorous investigation of the above question creates an opportunity both for machine learning to gain knowledge from neuroscience, and for neuroscience to gain knowledge from machine learning. Neuroscience has often provided existence proofs for machine learning: before we had working object recognition algorithms, we hypothesized it should be possible to build them because the human brain can recognize objects. See Hassabis et al. (hassabis2017neuroscience) for a review of the influence of neuroscience on artificial intelligence. If we knew conclusively that the human brain could resist a certain class of adversarial examples, this would provide an existence proof for a similar mechanism in machine learning security. If we knew conclusively that the brain can be fooled by adversarial examples, then machine learning security research should perhaps shift its focus from designing models that are robust to adversarial examples (szegedy2013intriguing; goodfellow2014explaining; papernot2016distillation; xu2017feature; ensemble_training; madry2017towards; kolter2017provable; buckman2018thermometer) to designing systems that are secure despite including non-robust machine learning components. Likewise, if adversarial examples developed for computer vision affect the brain, this phenomenon discovered in the context of machine learning could lead to a better understanding of brain function.

In this work, we construct adversarial examples that transfer from computer vision models to the human visual system. In order to successfully construct these examples and observe their effect, we leverage three key ideas from machine learning, neuroscience, and psychophysics. First, we use recent black box adversarial example construction techniques that create adversarial examples for a target model without access to the model's architecture or parameters. Second, we adapt machine learning models to mimic the initial visual processing of humans, making it more likely that adversarial examples will transfer from the model to a human observer. Third, we evaluate the classification decisions of human observers in a time-limited setting, so that even subtle effects on human perception are detectable. By making image presentation sufficiently brief, humans are unable to achieve perfect accuracy even on clean images, and small changes in performance lead to more measurable changes in accuracy. Additionally, a brief image presentation limits the time in which the brain can utilize recurrent and top-down processing pathways (potter2014detecting), and is believed to make the processing in the brain more closely resemble that in a feedforward artificial neural network.

We find that adversarial examples that transfer across computer vision models do successfully influence the perception of human observers, thus uncovering a new class of illusions that are shared between computer vision models and the human brain.

2 Background and Related Work

2.1 Adversarial Examples

Goodfellow et al. (goodfellow2017) define adversarial examples as “inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake.” In the context of visual object recognition, adversarial examples are images usually formed by applying a small perturbation to a naturally occurring image in a way that breaks the predictions made by a machine learning classifier. See Figure 1a for a canonical example where adding a small perturbation to an image of a panda causes it to be misclassified as a gibbon. This perturbation is small enough to be imperceptible (i.e., it cannot be stored in a standard 8-bit png file because the perturbation is smaller than the quantization step of the pixel dynamic range). This perturbation relies on carefully chosen structure based on the parameters of the neural network; when magnified to be perceptible, human observers cannot recognize any meaningful structure in it. Note that adversarial examples also exist in other domains, such as malware detection (grosse17), but we focus here on image classification tasks.
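
For reference, the attack behind Figure 1a is the fast gradient sign method of goodfellow2014explaining, which perturbs the image in the direction of the sign of the gradient of the training loss with respect to the input:

\tilde{x} = x + \epsilon \cdot \mathrm{sign}\left( \nabla_{x} J(\theta, x, y) \right)

where x is the clean image, y its label, J the training loss of a network with parameters \theta, and \epsilon the (small) perturbation magnitude.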

Two aspects of the definition of adversarial examples are particularly important for this work:

  1. Adversarial examples are designed to cause a mistake. They are not (as is commonly misunderstood) defined to differ from human judgment. If adversarial examples were defined by deviation from human output, it would by definition be impossible to make adversarial examples for humans. On some tasks, like predicting whether input numbers are prime, there is a clear, objectively correct answer, and we would like the model to get the correct answer, not the answer provided by humans (time-limited humans are probably not very good at guessing whether numbers are prime). It is challenging to define what constitutes a mistake for visual object recognition. After a perturbation is added to an image, the image likely no longer corresponds to a photograph of a real physical scene, and it is philosophically difficult to define the real object class for an image that is not a picture of a real object. In this work, we assume that an adversarial image is misclassified if the output label differs from the human-provided label of the clean image that was used as the starting point for the adversarial image. We make small adversarial perturbations, and we assume that these small perturbations are insufficient to change the true class.

  2. Adversarial examples are not (as is commonly misunderstood) defined to be imperceptible. If this were the case, it would be impossible by definition to make adversarial examples for humans, because changing the human’s classification would constitute a change in what the human perceives (e.g., see Figure 1b,c).

2.1.1 Clues that Transfer to Humans is Possible

Some observations give clues that transfer to humans may be possible. Adversarial examples are known to transfer across machine learning models, which suggests that these adversarial perturbations may carry information about target adversarial classes. Adversarial examples that fool one model often fool another model with a different architecture (szegedy2013intriguing), a model that was trained on a different training set (szegedy2013intriguing), or even a model trained with a different algorithm (papernot2016transferability); for example, adversarial examples designed to fool a convolutional neural network may also fool a decision tree. The transfer effect makes it possible to perform black box attacks, where adversarial examples fool models that an attacker does not have access to (szegedy2013intriguing; papernot2017practical). Kurakin et al. (kurakin17physical) found that adversarial examples transfer from the digital to the physical world, despite many transformations such as lighting and camera effects that modify their appearance when they are photographed in the physical world. Liu et al. (liu2016delving) showed that the transferability of an adversarial example can be greatly improved by optimizing it to fool many machine learning models rather than one model: an adversarial example that fools five models used in the optimization process is more likely to fool an arbitrary sixth model.

Moreover, recent studies on stronger adversarial examples that transfer across multiple settings have sometimes produced adversarial examples that appear more meaningful to human observers. For instance, a cat adversarially perturbed to resemble a computer (athalye2017synthesizing) while transferring across geometric transformations develops features that appear computer-like (Figure 1b), and the ‘adversarial toaster’ from Brown et al. (brown2017adversarial) possesses features that seem toaster-like (Figure 1c). This development of human-meaningful features is consistent with the adversarial example carrying true feature information and thus coming closer to fooling humans, once the notable differences between human visual processing and computer vision models are accounted for (see Section 2.2.2).

Figure 1: Adversarial examples optimized on more models or viewpoints sometimes appear more meaningful to humans. This observation is a clue that machine-to-human transfer may be possible. (a) A canonical example of an adversarial image, reproduced from goodfellow2014explaining. This adversarial attack has moderate but limited ability to fool the model after geometric transformations or to fool models other than the model used to generate the image. (b) An adversarial attack causing a cat image to be labeled as a computer while being robust to geometric transformations, adapted from athalye2017blog. Unlike the attack in (a), the image contains features that seem semantically computer-like to humans. (c) An adversarial patch that causes images to be labeled as a toaster, optimized to cause misclassification from multiple viewpoints, reproduced from brown2017adversarial. Similar to (b), the patch contains features that appear toaster-like to a human.

2.2 Biological and Artificial Vision

2.2.1 Similarities

Recent research has found similarities in representation and behavior between deep convolutional neural networks (CNNs) and the primate visual system (cadieu2014deep). This further motivates the possibility that adversarial examples may transfer from computer vision models to humans. Activity in deeper CNN layers has been observed to be predictive of activity recorded in the visual pathway of primates (cadieu2014deep; Yamins2016UsingGD). Riesenhuber and Poggio (riesenhuber1999hierarchical) developed a model of object recognition in cortex that closely resembles many aspects of modern CNNs. Kümmerer et al. (kummerer2014deep; kummerer2017deepgaze) showed that CNNs are predictive of human gaze fixation. Style transfer (gatys2015neural) demonstrated that intermediate layers of a CNN capture notions of artistic style which are meaningful to humans. Freeman et al. (freeman2011metamers) used representations in a CNN-like model to develop psychophysical metamers, which are indistinguishable to humans when viewed briefly and with carefully controlled fixation. Psychophysics experiments have also compared the pattern of errors made by humans to that made by neural network classifiers (geirhos2017comparing; Rajalingham240614).

2.2.2 Notable Differences

Differences between machine and human vision arise early in the visual system. Images are typically presented to CNNs as a static rectangular pixel grid with constant spatial resolution. The primate eye, on the other hand, has an eccentricity-dependent spatial resolution: resolution is high in the fovea, the central region of the visual field, but falls off linearly with increasing eccentricity (retinal). A perturbation which requires high acuity in the periphery of an image, as might occur as part of an adversarial example, would be undetectable by the eye and thus would have no impact on human perception. Further differences include the sensitivity of the eye to temporal as well as spatial features, and non-uniform color sensitivity (land2012animal). Modeling the early visual system continues to be an area of active study (olshausen201320; mcintosh2016deep). As we describe in Section 3.1.2, we mitigate some of these differences by using a biologically inspired image input layer.

Beyond early visual processing, there are more major computational differences between CNNs and the human brain. All the CNNs we consider are fully feedforward architectures, while the visual cortex has many times more feedback than feedforward connections, as well as extensive recurrent dynamics (olshausen201320). Possibly due to these differences in architecture, humans have been found experimentally to make classification mistakes that are qualitatively different from those made by deep networks (eckstein2017humans). Additionally, the brain does not treat a scene as a single static image, but actively explores it with saccades (ibbotson2011visual). As is common in psychophysics experiments (kovacs1995cortical), we mitigate these differences in processing by limiting both the way in which the image is presented and the time the subject has to process it, as described in Section 3.2.

3 Methods

Section 3.1 details our machine learning vision pipeline. Section 3.2 describes our psychophysics experiment to evaluate the impact of adversarial images on human subjects.

3.1 The Machine Learning Vision Pipeline

3.1.1 Dataset

In our experiment, we used images from ImageNet (deng2009imagenet). ImageNet contains 1,000 highly specific classes that typical people may not be able to identify, such as “Chesapeake Bay retriever”. We therefore combined some of these fine classes to form six coarse classes that we were confident would be familiar to our experiment subjects (dog, cat, broccoli, cabbage, spider, snake). We then grouped these six classes into the following groups: (i) Pets group (dog and cat images); (ii) Hazard group (spider and snake images); (iii) Vegetables group (broccoli and cabbage images).
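
As a minimal illustration of this grouping (the dictionary below is a sketch for exposition only; the actual assignment of fine ImageNet labels to coarse classes is determined by the image lists in Appendix F):

# Sketch of the coarse classes and experiment groups described above.
GROUPS = {
    "pets":       ("dog", "cat"),
    "hazard":     ("spider", "snake"),
    "vegetables": ("broccoli", "cabbage"),
}

def group_of(coarse_class):
    """Return the experiment group containing a coarse class, e.g. 'dog' -> 'pets'."""
    for group, classes in GROUPS.items():
        if coarse_class in classes:
            return group
    raise ValueError(f"unknown coarse class: {coarse_class}")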

3.1.2 Ensemble of Models

We constructed an ensemble of 10 CNN models trained on ImageNet. Each model is an instance of one of these architectures: Inception V3, Inception V4, Inception ResNet V2, ResNet V2 50, ResNet V2 101, and ResNet V2 152 (inceptionv3; inceptionresnet; resnet). To better match the initial processing of the human visual system, we prepend each model with a retinal layer, which pre-processes the input to incorporate some of the transformations performed by the human eye. In this layer, we perform an eccentricity-dependent blurring of the image to approximate the input received by the visual cortex of human subjects through their retinal lattice. The details of this retinal layer are described in Appendix B. We use eccentricity-dependent spatial resolution measurements (based on the macaque visual system) (retinal), along with the known geometry of the viewer and the screen, to determine the degree of spatial blurring at each image location. This limits the CNN to information which is also available to the human visual system. The layer is fully differentiable, allowing gradients to backpropagate through the network when running adversarial attacks. Further details of the models and their classification performance are provided in Appendix E.
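
The following sketch shows how such a pipeline can be assembled in principle; the class and function names are hypothetical, the actual models are TensorFlow-Slim classifiers, and the retinal layer here is only an identity placeholder standing in for the differentiable blur described in Appendix B.

def retinal_blur(image):
    """Placeholder for the eccentricity-dependent blur of Appendix B.
    In the paper this layer is differentiable so that attack gradients can
    flow through it; here it is an identity stand-in for illustration."""
    return image

class RetinaPrependedModel:
    """Wrap a CNN so that every input first passes through the retinal layer,
    limiting the model to information also available to a human observer."""
    def __init__(self, cnn):
        self.cnn = cnn  # callable: image -> vector of class probabilities

    def __call__(self, image):
        return self.cnn(retinal_blur(image))

def build_ensemble(cnn_models):
    """Prepend the retinal layer to every architecture in the ensemble."""
    return [RetinaPrependedModel(m) for m in cnn_models]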

3.1.3 Generating Adversarial Images

For a given image group, we wish to generate targeted adversarial examples that strongly transfer across models. That is, for a class pair within a group (e.g., cats and dogs), we generate adversarial perturbations such that models classify perturbed images from the first class as the second class, and similarly perturb images from the second class so that they are classified as the first. A different perturbation is constructed for each image; however, the norm of every perturbation is constrained to equal a fixed budget.

Formally: given a classifier that assigns a probability to each coarse class for an input image, a specified target class, and a maximum perturbation size, we want to find the adversarial image that minimizes the loss for the target class, subject to the constraint that the perturbation's norm does not exceed the maximum. See Appendix C for details on computing the coarse class probabilities. With access to the classifier's parameters, we can perform iterated gradient descent on this objective in order to generate the adversarial image (see Appendix D). This iterative approach is commonly employed to generate adversarial images (kurakin17physical).
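
A minimal NumPy sketch of one such iterative attack is shown below. The gradient function, step size, and budget here are hypothetical placeholders; the paper's exact loss, norm, and hyperparameters are specified in Appendices C and D.

import numpy as np

def iterative_targeted_attack(x, grad_target_loss, eps=0.06, alpha=0.005, steps=20):
    """Iteratively perturb image `x` (values in [0, 1]) toward a target class.

    grad_target_loss: callable returning the gradient of the target-class
                      loss with respect to the current adversarial image.
    eps:              maximum per-pixel perturbation magnitude.
    alpha:            step size per iteration.
    """
    x_adv = np.array(x, dtype=np.float64)
    for _ in range(steps):
        g = grad_target_loss(x_adv)
        x_adv = x_adv - alpha * np.sign(g)        # descend the target-class loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the perturbation budget
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid intensity range
    return x_adv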

Figure 2: Experiment setup and task. (a) Example images from the image, adv, and flip conditions. Top: adv targeting the broccoli class. Bottom: adv targeting the cat class. See the definition of the conditions in Section 3.2.2. (b) Example images from the false experiment condition. (c) Experiment setup and recording apparatus. (d) Task structure and timing. The subject is asked to repeatedly identify which of two classes (e.g., dog vs. cat) a briefly presented image belongs to. The image is either adversarial or belongs to one of several control conditions. See Section 3.2 for details.

3.2 Human Psychophysics Experiment

38 subjects with normal or corrected-to-normal vision participated in the experiment. Subjects gave informed consent to participate and were awarded reasonable compensation for their time and effort. (The study was granted an Institutional Review Board (IRB) exemption by an external, independent ethics board; Quorum review ID 33016.)

3.2.1 Experimental Setup

Subjects sat on a fixed chair at a fixed distance from a high refresh-rate computer screen (ViewSonic XG2530) in a room with dimmed light (Figure 2c). Subjects were asked to classify images that appeared on the screen into one of two classes (two-alternative forced choice) by pressing buttons on a response time box (LOBES v5/6:USTC) using two fingers of their right hand. The assignment of classes to buttons was randomized for each experiment session. Each trial started with a fixation cross displayed in the middle of the screen, instructing subjects to direct their gaze to the fixation cross (Figure 2d). After the fixation period, an image of fixed size (in visual angle) was presented briefly at the center of the screen; the exposure duration differed slightly between sessions. The image was followed by a sequence of ten high-contrast binary random masks, each displayed briefly (see example in Figure 2d). Subjects were asked to classify the object in the image (e.g., cat vs. dog) by pressing one of two buttons, starting at the image presentation time and lasting until a fixed interval after the mask was turned off. The waiting period before the next trial was of the same duration whether subjects responded quickly or slowly. Realized exposure durations differed by a small, measured amount from the nominal times, as determined by a photodiode and oscilloscope in a separate test experiment. Each subject's response time was recorded by the response time box relative to the image presentation time (monitored by a photodiode). In cases where a subject pressed more than one button in a trial, only the class corresponding to their first choice was considered. Each subject completed between 140 and 950 trials.

3.2.2 Experiment Conditions

Each experimental session included only one of the image groups (Pets, Vegetables or Hazard). For each group, images were presented in one of four conditions as follows:

  • image: images from the ImageNet training set, rescaled in intensity to avoid clipping when adversarial perturbations are added (see Figure 2a, left).

  • adv: we added an adversarial perturbation to image, crafted to cause machine learning models to misclassify adv as the opposite class in the group (e.g., if image was originally a cat, we perturbed it to be classified as a dog). We used a perturbation size large enough to be noticeable by humans on the computer screen but small with respect to the image intensity scale (see Figure 2a, middle). In other words, we chose the perturbation size to be large (to improve the chances that adversarial examples transfer to time-limited humans) but kept it small enough that the perturbations are class-preserving (as judged by a no-limit human).

  • flip: similar to adv, but the adversarial perturbation is flipped vertically before being added to image. This is a control condition, chosen to have nearly identical perturbation statistics to the adv condition (see Figure 2a, right). We include this condition because, if adversarial perturbations reduce the accuracy of human observers, this could simply be because the perturbations degrade image quality.

  • false: in this condition, subjects are forced to make a mistake. To show that adversarial perturbations actually control the chosen class, we include this condition in which neither of the two options available to the subject is correct, so their accuracy is always zero, and we test whether adversarial perturbations can influence which of the two wrong choices they make. We show a random image from an ImageNet class other than the two classes in the group and adversarially perturb it toward one of the two classes in the group. The subject must then choose between these two classes. For example, we might show an airplane adversarially perturbed toward the dog class while a subject is in a session classifying images as cats or dogs. We used a slightly larger perturbation in this condition (see Figure 2b).

The conditions (image, adv, flip) were ensured to have a balanced number of trials within a session, either by uniformly sampling the condition type in some sessions or by randomly shuffling a sequence with identical trial counts per condition in other sessions. The number of trials for each class in the group was also constrained to be equal. Similarly, for the false condition the number of trials adversarially perturbed toward class 1 and class 2 was balanced within each session. To prevent subjects from using strategies based on overall color or brightness differences between classes, we pre-filtered the dataset to remove images that showed an obvious effect of this nature. Notably, in the pets group we excluded images that contained large green lawns or fields, since in almost all cases these were photographs of dogs. See Appendix F for the images used in the experiment for each coarse class. For example images from each condition, see Figures Supp.2 through Supp.5.
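
For instance, a balanced trial sequence for one session can be produced by shuffling a list that contains an identical number of trials per condition; the sketch below illustrates the idea (session sizes and the exact sampling scheme varied, as described above).

import random

def balanced_trial_sequence(trials_per_condition, conditions=("image", "adv", "flip"), seed=None):
    """Return a shuffled sequence with identical trial counts per condition."""
    rng = random.Random(seed)
    sequence = [c for c in conditions for _ in range(trials_per_condition)]
    rng.shuffle(sequence)
    return sequence

# Example: a 90-trial session with 30 trials of each condition.
session = balanced_trial_sequence(30, seed=0)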

4 Results

4.1 Adversarial Examples Transfer to Computer Vision Models

We first assess the transfer of our constructed images to two test models that were not included in the ensemble used to generate adversarial examples. These test models are an adversarially trained Inception V3 model (kurakin2016mlatscale) and a ResNet V2 50 model. Both models perform well on clean images. Attacks in the adv and false conditions succeeded against the test models on a substantial fraction of images, depending on the image class and experimental condition, whereas the flip condition changed the test model predictions on only a small fraction of images in all conditions, validating its use as a control. See Tables Supp.3-Supp.6 for accuracy and attack success measurements on both train and test models for all experimental conditions.
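
A sketch of how attack success on a held-out model can be measured is given below; `predict_coarse` is a hypothetical callable mapping an image to the model's predicted coarse class.

import numpy as np

def attack_success_rate(adv_images, target_labels, predict_coarse):
    """Fraction of adversarial images classified as the adversarially
    targeted coarse class by a held-out model."""
    hits = [predict_coarse(x) == y for x, y in zip(adv_images, target_labels)]
    return float(np.mean(hits))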


Figure 3: Adversarial images transfer to humans. (a) By adding adversarial perturbations to an image, we are able to bias which of two incorrect choices subjects make. The plot shows the probability of choosing the adversarially targeted class when the true image class is not one of the choices that subjects can report (false condition), estimated by averaging the responses of all subjects (two-tailed t-test relative to chance level). (b) Adversarial images cause more mistakes than either clean images or images with the adversarial perturbation flipped vertically before being applied. The plot shows the probability of choosing the true image class, when this class is one of the choices that subjects can report, averaged across all subjects. Accuracy is significantly less than 1 even for clean images due to the brief image presentation time. (Error bars: SE; *, **, *** denote increasing levels of significance.) (c) A spider image that time-limited humans frequently perceived as a snake (top parentheses: number of subjects tested on this image). Right: accuracy on this adversarial image when presented briefly compared to when presented for a long time (the long-presentation result is based on a post-experiment email survey of 13 participants).

Figure 4: Adversarial images affect human response time. (a) Average response time to false images. (b) Average response time for the adv, image, and flip conditions (error bars: SE; * denotes significance in a two-sample two-tailed t-test). In all three stimulus groups, there was a trend toward slower response times in the adv condition than in either control condition. (c) Probability of choosing the adversarially targeted class in the false condition, estimated by averaging the responses of all subjects (two-tailed t-test relative to chance level; error bars: SE; *, **, *** denote increasing levels of significance). The probability of choosing the targeted label is computed by binning trials within percentile reaction time ranges (0-33, 33-67, and 67-100 percentiles). The bias relative to the chance level of 0.5 is significant when subjects reported their decision quickly (when they may have been more confident), but not when they reported their decision more slowly. As discussed in Section 4.2.2, the differing effect directions in (b) and (c) may be explained by adversarial perturbations decreasing decision confidence in the adv condition and increasing decision confidence in the false condition.

4.2 Adversarial Examples Transfer to Humans

We now show that adversarial examples transfer to time-limited humans. One could imagine that adversarial examples merely degrade image quality or discard information, thus increasing error rate. To rule out this possibility, we begin by showing that for a fixed error rate (in a setting where the human is forced to make a mistake), adversarial perturbations influence the human choice among two incorrect classes. Then, we demonstrate that adversarial examples increase the error rate.

4.2.1 Influencing the Choice between two Incorrect Classes

As described in Section 3.2.2, we used the false condition to test whether adversarial perturbations can influence which of two incorrect classes a subject chooses (see example images in Figure Supp.2).

We measured our effectiveness at changing the perception of subjects using the rate at which subjects reported the adversarially targeted class. If the adversarial perturbation were completely ineffective, we would expect the choice of targeted class to be uncorrelated with the subject's reported class, and the average rate of choosing the target class would be at the chance level of 0.5, since each false image can be perturbed toward class 1 or class 2 of the group with equal probability. Figure 3a shows the probability of choosing the target class averaged across all subjects for all three experiment groups. In all cases, the probability was significantly above the chance level of 0.5. This demonstrates that adversarial perturbations generated using CNNs biased human perception toward the targeted class. The effect was strongest for the hazard group, followed by the pets group and then the vegetables group, and this difference in probability among the class groups was significant (Pearson chi-squared GLM test).

We also observed a significant difference in mean response time between the class groups (one-way ANOVA; see Figure 4a). Interestingly, the response time pattern across image groups (Figure 4a) was inversely correlated with the perceptual bias pattern (Figure 3a) (two-tailed Pearson correlation test). In other words, subjects made quicker decisions for the hazard group, then the pets group, and then the vegetables group. This is consistent with subjects being more confident in their decision when the adversarial perturbation was more successful in biasing their perception. This inverse correlation between attack success and response time was observed within groups as well as between groups (Figure 4).
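
A sketch of the two simplest analyses mentioned here, using SciPy and hypothetical numbers (the paper additionally reports a Pearson chi-squared GLM test and a one-way ANOVA, not shown), might look as follows.

import numpy as np
from scipy import stats

# Hypothetical per-subject rates of choosing the adversarially targeted
# class in the false condition; chance level is 0.5.
target_choice_rate = np.array([0.58, 0.62, 0.55, 0.60, 0.57, 0.49, 0.63])
t_stat, p_value = stats.ttest_1samp(target_choice_rate, popmean=0.5)  # two-tailed

# Hypothetical per-group mean response times (ms) and perceptual bias,
# ordered as hazard, pets, vegetables.
mean_rt = np.array([610.0, 650.0, 700.0])
mean_bias = np.array([0.60, 0.57, 0.54])
r, p_corr = stats.pearsonr(mean_rt, mean_bias)  # two-tailed Pearson correlation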

4.2.2 Adversarial Examples Increase Human Error Rate

We demonstrated that we are able to bias human perception toward a target class when the true class of the image is not one of the options that subjects can choose. Now we show that adversarial perturbations can also cause the subject to choose an incorrect class even though the correct class is an available response. As described in Section 3.2.2, we presented images in the image, flip, and adv conditions.

Most subjects had lower accuracy in the adv condition than in the image condition (Table Supp.1). This is also reflected in the significantly lower average accuracy across all subjects for adv than for image (Figure 3b).

The above result could simply imply that the signal-to-noise ratio of the adversarial images is lower than that of clean images. While this concern is partially addressed by the false experiment results in Section 4.2.1, we additionally tested accuracy on flip images. This control uses perturbations with statistics identical to adv up to a flip of the vertical axis, but it breaks the pixel-to-pixel correspondence between the adversarial perturbation and the image. The majority of subjects had lower accuracy in the adv condition than in the flip condition (Table Supp.1). When averaging across all trials, this effect was highly significant for the pets and vegetables groups, and less significant for the hazard group (Figure 3b). These results suggest that the direction of the adversarial perturbation, in combination with the specific image, is perceptually relevant to features that the human visual system uses to classify objects. These findings thus give evidence that strong black box adversarial attacks can transfer from CNNs to humans, and reveal remarkable similarities between the failure cases of CNNs and those of human vision.

In all cases, the average response time was longer for the adv condition than for the other conditions (Figure 4b), though this result was statistically significant for only two comparisons. If this trend holds, it would seem to contradict the pattern observed when we presented false images (Figure 4a). One interpretation is that in the false case the transfer of adversarial features to humans was accompanied by greater confidence, whereas here the transfer was accompanied by less confidence, possibly due to competing adversarial and true class features in the adv condition.


Figure 5: Examples of the types of manipulations performed by the adversarial attack. See Figures Supp.3 through Supp.5 for additional examples of adversarial images. Also see Figure Supp.2 for adversarial examples from the false condition.

5 Discussion

Our results invite several questions that we discuss briefly.

5.1 Have we actually fooled human observers or did we change the true class?

One might naturally wonder whether we have fooled the human observer or whether we have simply replaced the input image with an image that actually belongs to a different class. In our work, the perturbations we made were small enough that they generally do not change the output class for a human who has no time limit (the reader may verify this by observing Figures 2a,b, 3c, and Supp.2 through Supp.5). We can thus be confident that we did not change the true class of the image, and that we really did fool the time-limited human. Future work aimed at fooling humans with no time limit will need to tackle the difficult problem of obtaining a better ground truth signal than visual labeling by humans.

5.2 How do the adversarial examples work?

We did not design controlled experiments to prove that the adversarial examples work in any specific way, but we informally observed a few apparent patterns illustrated in Figure 5: disrupting object edges, especially by mid-frequency modulations perpendicular to the edge; enhancing edges both by increasing contrast and creating texture boundaries; modifying texture; and taking advantage of dark regions in the image, where the perceptual magnitude of small perturbations can be larger.

5.3 What are the implications for machine learning security and society?

The fact that our transfer-based adversarial examples fool time-limited humans but not no-limit humans suggests that the lateral and top-down connections used by the no-limit human are relevant to human robustness to adversarial examples, and that machine learning security research should explore the significance of these connections further. One possible explanation for our observation is that no-limit humans are fundamentally more robust to adversarial examples and achieve this robustness via top-down or lateral connections. If this is the case, it could point the way to the development of more robust machine learning models. Another possible explanation is that no-limit humans remain highly vulnerable to adversarial examples, but that adversarial examples do not transfer from feedforward networks to no-limit humans because of these architectural differences.

Our results suggest that there is a risk that imagery could be manipulated to cause human observers to have unusual reactions; for example, perhaps a photo of a politician could be manipulated in a way that causes it to be perceived as unusually untrustworthy or unusually trustworthy in order to affect the outcome of an election.

5.4 Future Directions

In this study, we designed a procedure that, according to our hypothesis, would transfer adversarial examples to humans. An interesting set of questions relates to how sensitive that transfer is to different elements of our experimental design. For example: How does transfer depend on the perturbation size? Was model ensembling crucial to transfer? Can the retinal preprocessing layer be removed? We suspect that retinal preprocessing and ensembling are both important for transfer to humans, but that the perturbation size could be made smaller. See Figure Supp.1 for a preliminary exploration of these questions.

6 Conclusion

In this work, we showed that adversarial examples based on perceptible but class-preserving perturbations that fool multiple machine learning models also fool time-limited humans. Our findings demonstrate striking similarities between convolutional neural networks and the human visual system. We expect this observation to lead to advances in both neuroscience and machine learning research.

Acknowledgements

We are grateful to Ari Morcos, Bruno Olshausen, David Sussillo, Hanlin Tang, John Cunningham, Santani Teng, and Daniel Yamins for useful discussions. We also thank Dan Abolafia, Simon Kornblith, Katherine Lee, Niru Maheswaranathan, Catherine Olsson, David Sussillo, and Santani Teng for helpful feedback on the manuscript. We thank the Google Brain residents for useful feedback on the work. We also thank Deanna Chen, Leslie Philips, Sally Jesmonth, Phing Turner, Melissa Strader, Lily Peng, and Ricardo Prada for assistance with IRB and experiment setup.


Appendix A Supplementary Figures and Tables

Figure Supp.1: Intuition on factors contributing to transfer to humans. To give some intuition on the factors contributing to transfer, we examine a cat image from ImageNet ((a) left) that is already perceptually close to the target adversarial dog class, making the impact of subtle adversarial effects more obvious even on long observation ((a) right). Note that this image was not used in the experiment, and that typical images in the experiment did not fool unconstrained humans. (b) Adversarial images with a range of perturbation sizes. Even smaller perturbations make the adversarial image perceptually more similar to a dog, which suggests that transfer to humans may be robust to smaller perturbation sizes. (c) Investigation of the importance of matching initial visual processing. The adversarial image on the left is similar to a dog, while removing the retina layer leads to an image which is less similar to a dog. This suggests that matching initial processing is an important factor in transferring adversarial examples to humans. (d) Investigation of the importance of the number of models in the ensemble. We generated adversarial images using ensembles with different numbers of models. The adversarial perturbations become markedly less similar to the dog class as the number of models in the ensemble is reduced, supporting the importance of ensembling for the transfer of adversarial examples to humans.
Table Supp.1: Adversarial examples transfer to humans. Number of subjects whose mean accuracy in the adv condition was lower than their mean accuracy in the image condition (adv < image) and in the flip condition (adv < flip), together with the total number of subjects per group.

Group         adv < image   adv < flip   total
pets              29            22         35
hazard            19            16         24
vegetables        21            23         32

Table Supp.2: Accuracy of models on the ImageNet validation set. Some models were trained on ImageNet with the retina layer prepended and with training data augmented with intensity-rescaled images; one test model was trained with adversarial-example-augmented data. The first ten models were used in the ensemble for generating adversarial examples; the last two models were used to test the transferability of adversarial examples.

Model                      Top-1 accuracy
Resnet V2 101              0.77
Resnet V2 101              0.7205
Inception V4               0.802
Inception V4               0.7518
Inception Resnet V2        0.804
Inception Resnet V2        0.7662
Inception V3               0.78
Inception V3               0.7448
Resnet V2 152              0.778
Resnet V2 50               0.708
Resnet V2 50 (test)        0.756
Inception V3 (test)        0.776

Table Supp.3: Accuracy of the ensemble used to generate adversarial examples on images in different conditions. Some models were trained on ImageNet with the retina layer prepended and with training data augmented with intensity-rescaled images. Each number triplet gives accuracy (%) on images from the pets, hazard, and vegetables groups, respectively.

Train model                adv              flip
Resnet V2 101              0.0, 0.0, 0.0    95, 92, 91
Resnet V2 101              0.0, 0.0, 0.0    87, 87, 77
Inception V4               0.0, 0.0, 0.0    96, 95, 86
Inception V4               0.0, 0.0, 0.0    87, 87, 73
Inception Resnet V2        0.0, 0.0, 0.0    97, 95, 95
Inception Resnet V2        0.0, 0.0, 0.0    87, 83, 73
Inception V3               0.0, 0.0, 0.0    97, 94, 89
Inception V3               0.0, 0.0, 0.0    83, 86, 74
Resnet V2 152              0.0, 0.0, 0.0    96, 95, 91
Resnet V2 50               0.0, 0.0, 0.0    82, 85, 81

Table Supp.4: Accuracy of the test models on images in different conditions. One of the test models (Inception V3) was trained on both clean and adversarial images. Each number triplet gives accuracy (%) on the pets, hazard, and vegetables groups, respectively.

Model              adv              flip
Resnet V2 50       8.7, 9.4, 13     93, 91, 85
Inception V3       6.0, 6.9, 17     95, 92, 94

Table Supp.5: Attack success on the model ensemble. Same convention as Table Supp.3.

Model                      adv                flip
Resnet V2 101              100, 100, 100      2, 0, 0
Resnet V2 101              100, 100, 100      3, 0, 0
Inception V4               100, 100, 100      1, 0, 1
Inception V4               100, 100, 100      4, 1, 0
Inception Resnet V2        100, 100, 100      1, 0, 1
Inception Resnet V2        100, 100, 100      5, 2, 0
Inception V3               100, 100, 100      1, 0, 0
Inception V3               100, 100, 100      5, 1, 1
Resnet V2 152              100, 100, 100      1, 0, 0
Resnet V2 50               100, 100, 100      3, 1, 0

Table Supp.6: Attack success on the test models. One of the test models (Inception V3) was trained on both clean and adversarial images. Each number triplet gives attack success (%) on the pets, hazard, and vegetables groups, respectively.

Model              adv              flip
Resnet V2 50       87, 85, 57       1.3, 0.0, 0.0
Inception V3       89, 87, 74       1.5, 0.5, 0.0

Appendix B Details of retinal blurring layer

B.1 Computing the primate eccentricity map

Let the viewer's distance from the display and the height and width of the square image be given in meters. For every spatial position in the image (in meters), we compute the retinal eccentricity (in radians) as follows:

(1)

and turn this into a target resolution in units of radians

(2)

We then turn this target resolution into a target spatial resolution in the plane of the screen,

(3)
(4)

This spatial resolution for two-point discrimination is then converted into a corresponding low-pass cutoff frequency, in units of cycles per pixel,

(5)

where the numerator is 1/2 rather than 1, since the two-point discrimination distance is half the wavelength.

Finally, this target low-pass spatial frequency for each pixel is used to linearly interpolate each pixel value from the corresponding pixel in a set of low-pass filtered images, as described in Algorithm 1 below (all operations on matrices are assumed to be performed elementwise).

Algorithm 1: Applying retinal blur to an image. For each pixel, the target cutoff frequency computed above selects, by linear interpolation, between a set of low-pass filtered copies of the image.

We additionally cropped the image to a smaller width before use, to remove artifacts from the image edge.

Note that because the per-pixel blurring is performed using linear interpolation into images that were low-pass filtered in Fourier space, this transformation is both fast to compute and fully differentiable.
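
A rough NumPy sketch of this construction is shown below: it builds a small stack of progressively low-pass filtered copies of the image and, for each pixel, linearly interpolates between the two copies whose cutoffs bracket that pixel's target cutoff. The Gaussian pyramid and the cutoff-to-sigma heuristic are stand-ins for illustration; the paper filters in Fourier space and implements the layer differentiably.

import numpy as np
from scipy.ndimage import gaussian_filter

def retinal_blur(image, cutoff_map, num_levels=6):
    """Approximate eccentricity-dependent blur of a 2-D grayscale image.

    image:      array of shape (H, W), values in [0, 1]
    cutoff_map: per-pixel target low-pass cutoff in (0, 0.5] cycles/pixel,
                high in the simulated fovea and low in the periphery
    """
    # Stack of progressively blurred copies; level k stands in for a
    # low-pass filter with cutoff cutoffs[k].
    cutoffs = np.linspace(0.5, 0.05, num_levels)
    sigmas = 0.2 / cutoffs  # heuristic cutoff -> Gaussian width (assumption)
    stack = np.stack([gaussian_filter(image, s) for s in sigmas])

    # Fractional stack index for each pixel's target cutoff.
    idx = np.interp(cutoff_map, cutoffs[::-1], np.arange(num_levels)[::-1])
    lo = np.clip(np.floor(idx).astype(int), 0, num_levels - 1)
    hi = np.clip(lo + 1, 0, num_levels - 1)
    frac = idx - lo
    rows, cols = np.indices(image.shape)
    return (1.0 - frac) * stack[lo, rows, cols] + frac * stack[hi, rows, cols]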

Appendix C Calculating probability of coarse class

To calculate the probability a model assigns to a coarse class, we summed the probabilities assigned to the individual fine classes within the coarse class. Let the target set contain all individual labels in the target coarse class, and let the remainder set contain all other individual labels; together the two sets cover the 1,000 ImageNet labels. The probability a model assigns to the target coarse class given an input image is then

(6)

where the summed terms are the unnormalized probabilities assigned to the individual fine classes. The coarse logit of the model with respect to the target class is then defined in terms of these summed quantities.
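
A sketch of this aggregation, assuming the model's 1,000-way fine-class probabilities are available as a vector, is given below; the `coarse_logit` definition is a hypothetical stand-in, since the paper's exact coarse-logit expression is not reproduced here.

import numpy as np

def coarse_class_probability(fine_probs, coarse_indices):
    """Sum the fine-class probabilities over the labels in the coarse class.

    fine_probs:     length-1000 vector of ImageNet class probabilities
    coarse_indices: indices of the fine labels belonging to the coarse class
    """
    return float(np.sum(fine_probs[coarse_indices]))

def coarse_logit(fine_probs, coarse_indices, eps=1e-12):
    """Hypothetical coarse 'logit': log of the summed coarse probability."""
    return float(np.log(coarse_class_probability(fine_probs, coarse_indices) + eps))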

Appendix D Adversarial Image Generation

In the pipeline, an image is drawn from the source coarse class and perturbed so that it is classified as an image from the target coarse class. The attack method we use, the iterative targeted attack [kurakin17physical], is performed as

X^{adv}_{0} = X, \qquad X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{ X^{adv}_{N} - \alpha \cdot \mathrm{sign}\left( \nabla_{X} J(X^{adv}_{N}, y_{target}) \right) \right\} \quad (7)

where J is the cost function described below, y_{target} is the label of the target class, \alpha is the step size, X is the original clean image, X^{adv} is the final adversarial image, and \mathrm{Clip}_{X,\epsilon} clips the perturbed image so that it stays within \epsilon of X and within the valid intensity range. We used a fixed step size, and the maximum perturbation \epsilon is given per condition in Section 3.2.2. After optimization, any perturbation whose norm was less than \epsilon was scaled up to have norm exactly \epsilon, for consistency across all perturbations.

Our goal was to create adversarial examples that transferred across many ML models before assessing their transferability to humans. To accomplish this, we created an ensemble from the geometric mean of several image classifiers and performed the iterative attack on the ensemble loss [liu2016delving]:

p_{ens}(y \mid X) \propto \left( \prod_{k=1}^{K} p_{k}(y \mid X) \right)^{1/K} \quad (8)

J(X, y_{target}) = -\log p_{ens}(y_{target} \mid X) \quad (9)

where p_{k} denotes the coarse class probabilities from model k, and p_{ens} is the probability from the ensemble of K models. In practice, J is equivalent to a standard cross-entropy loss based on coarse logits averaged across models in the ensemble (see Appendix C for the coarse logit definition).

To encourage a high transfer rate, we retained only adversarial examples that were successful against all 10 models for the adv condition, and against at least a minimum number of models for the false condition (see Section 3.2.2 for condition definitions).
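
A sketch of the geometric-mean ensemble loss is given below; each element of `model_coarse_prob_fns` is a hypothetical callable returning one model's probability for the target coarse class, and the geometric mean is taken in log space (an average of log-probabilities).

import numpy as np

def ensemble_target_loss(image, target_class, model_coarse_prob_fns, eps=1e-12):
    """Negative log of the geometric mean, across models, of the probability
    assigned to the target coarse class. Minimizing this loss with the
    iterative attack drives all models toward the target class at once."""
    log_probs = [np.log(f(image, target_class) + eps) for f in model_coarse_prob_fns]
    return -float(np.mean(log_probs))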

Appendix E Convolutional Neural Network Models

Some of the models in our ensemble are publicly available pretrained checkpoints (https://github.com/tensorflow/models/tree/master/research/slim), and others are our own instances of the architectures, trained specifically for this experiment on ImageNet with the retinal layer prepended. To encourage invariance to image intensity scaling, we augmented each training batch with another batch containing the same images rescaled to a different intensity range. Table Supp.2 identifies all ten models used in the ensemble and shows their top-1 accuracies, along with the two holdout models that we used for evaluation.

Image removed due to file size constraints. See http://goo.gl/SJ8jpq for full Supplemental Material with all images.

Figure Supp.2: Adversarial Examples for false condition (a) pets group. (b) hazard group. (c) vegetables group.

Image removed due to file size constraints. See http://goo.gl/SJ8jpq for full Supplemental Material with all images.

Figure Supp.3: Adversarial Examples pets group

Image removed due to file size constraints. See http://goo.gl/SJ8jpq for full Supplemental Material with all images.

Figure Supp.4: Adversarial Examples hazard group

Image removed due to file size constraints. See http://goo.gl/SJ8jpq for full Supplemental Material with all images.

Figure Supp.5: Adversarial Examples vegetables group

Appendix F Image List from ImageNet

The specific ImageNet images used from each class in the experiments in this paper are as follows:

dog:

’n02106382564.JPEG’, ’n02110958598.JPEG’, ’n0210155613462.JPEG’, ’n021136247358.JPEG’, ’n021137992538.JPEG’, ’n0209163511576.JPEG’, ’n021063822781.JPEG’, ’n02112706105.JPEG’, ’n0209557010951.JPEG’, ’n020938595274.JPEG’, ’n0210952510825.JPEG’, ’n020962941400.JPEG’, ’n02086646241.JPEG’, ’n020982865642.JPEG’, ’n021063829015.JPEG’, ’n020903799754.JPEG’, ’n0210231810390.JPEG’, ’n020866464202.JPEG’, ’n020869105053.JPEG’, ’n021139783051.JPEG’, ’n020938593809.JPEG’, ’n021052512485.JPEG’, ’n0210952535418.JPEG’, ’n021089157834.JPEG’, ’n02113624430.JPEG’, ’n020932567467.JPEG’, ’n020870462701.JPEG’, ’n020903798849.JPEG’, ’n02093754717.JPEG’, ’n0208607915905.JPEG’, ’n021024804466.JPEG’, ’n021076835333.JPEG’, ’n021023188228.JPEG’, ’n02099712867.JPEG’, ’n020942581958.JPEG’, ’n0210904725075.JPEG’, ’n021136244304.JPEG’, ’n0209747410985.JPEG’, ’n020910323832.JPEG’, ’n02085620859.JPEG’, ’n02110806582.JPEG’, ’n020857828327.JPEG’, ’n020942585318.JPEG’, ’n020870465721.JPEG’, ’n02095570746.JPEG’, ’n020996013771.JPEG’, ’n0210248041.JPEG’, ’n020869101048.JPEG’, ’n020941147299.JPEG’, ’n0210855113160.JPEG’, ’n021101859847.JPEG’, ’n0209729813025.JPEG’, ’n0209729816751.JPEG’, ’n02091467555.JPEG’, ’n021137992504.JPEG’, ’n0208578214116.JPEG’, ’n0209747413885.JPEG’, ’n021052518108.JPEG’, ’n021137993415.JPEG’, ’n020955708170.JPEG’, ’n020882381543.JPEG’, ’n020970476.JPEG’, ’n021040295268.JPEG’, ’n0210058311473.JPEG’, ’n021139786888.JPEG’, ’n021043651737.JPEG’, ’n020961774779.JPEG’, ’n021076835303.JPEG’, ’n0210891511155.JPEG’, ’n020869101872.JPEG’, ’n021065508383.JPEG’, ’n020880942191.JPEG’, ’n0208562011897.JPEG’, ’n020960514802.JPEG’, ’n021007353641.JPEG’, ’n020910321389.JPEG’, ’n021063824671.JPEG’, ’n020972989059.JPEG’, ’n02107312280.JPEG’, ’n0211188986.JPEG’, ’n021139785397.JPEG’, ’n020972093461.JPEG’, ’n020898671115.JPEG’, ’n020976584987.JPEG’, ’n020941144125.JPEG’, ’n02100583130.JPEG’, ’n021121375859.JPEG’, ’n0211379919636.JPEG’, ’n020880945488.JPEG’, ’n02089078393.JPEG’, ’n020984131794.JPEG’, ’n021137991970.JPEG’, ’n020910323655.JPEG’, ’n0210585511127.JPEG’, ’n020962943025.JPEG’, ’n020941144831.JPEG’, ’n0211188910472.JPEG’, ’n021136249125.JPEG’, ’n020974749719.JPEG’, ’n020944332451.JPEG’, ’n020958896464.JPEG’, ’n02093256458.JPEG’, ’n020911342732.JPEG’, ’n020912442622.JPEG’, ’n020941142169.JPEG’, ’n020906222337.JPEG’, ’n021015566764.JPEG’, ’n020960511459.JPEG’, ’n020870469056.JPEG’, ’n020981058405.JPEG’, ’n021121375696.JPEG’, ’n021108067949.JPEG’, ’n020972982420.JPEG’, ’n020856206814.JPEG’, ’n021089151703.JPEG’, ’n0210087719273.JPEG’, ’n021065503765.JPEG’, ’n021073123524.JPEG’, ’n021118892963.JPEG’, ’n021136249129.JPEG’, ’n020970473200.JPEG’, ’n020932568365.JPEG’, ’n020939919420.JPEG’, ’n021121371635.JPEG’, ’n021111293530.JPEG’, ’n021010068123.JPEG’, ’n021020405033.JPEG’, ’n02113624437.JPEG’, ’n020906225866.JPEG’, ’n021108063711.JPEG’, ’n0211213714788.JPEG’, ’n021051627406.JPEG’, ’n020970475061.JPEG’, ’n0210842211587.JPEG’, ’n020914674265.JPEG’, ’n0209146712683.JPEG’, ’n021043653628.JPEG’, ’n020866463314.JPEG’, ’n02099849736.JPEG’, ’n021007358112.JPEG’, ’n0211201812764.JPEG’, ’n0209342811175.JPEG’, ’n021106279822.JPEG’, ’n0210714224318.JPEG’, ’n021051625489.JPEG’, ’n020937545904.JPEG’, ’n02110958215.JPEG’, ’n020953144027.JPEG’, ’n021099613250.JPEG’, ’n021085517343.JPEG’, ’n0211062710272.JPEG’, ’n020883643099.JPEG’, ’n021108062721.JPEG’, ’n020953142261.JPEG’, ’n021065509870.JPEG’, ’n021075743991.JPEG’, ’n020955703288.JPEG’, ’n0208607939042.JPEG’, ’n020962949416.JPEG’, ’n021108066528.JPEG’, ’n0208846611397.JPEG’, 
’n02092002996.JPEG’, ’n020984138605.JPEG’, ’n02085620712.JPEG’, ’n021002363011.JPEG’, ’n020866467788.JPEG’, ’n020856204661.JPEG’, ’n020981051746.JPEG’, ’n021136248608.JPEG’, ’n020974741168.JPEG’, ’n021076831496.JPEG’, ’n0211018512849.JPEG’, ’n0208562011946.JPEG’, ’n0208739416385.JPEG’, ’n0211080622671.JPEG’, ’n02113624526.JPEG’, ’n0209629412642.JPEG’, ’n021130237510.JPEG’, ’n0208836413285.JPEG’, ’n020958892977.JPEG’, ’n021050569215.JPEG’, ’n021023189744.JPEG’, ’n0209729811834.JPEG’, ’n0211127716201.JPEG’, ’n020857828518.JPEG’, ’n0211397811280.JPEG’, ’n0210638210700.JPEG’.

cat:

’n02123394661.JPEG’, ’n0212304511954.JPEG’, ’n021233943695.JPEG’, ’n021233942692.JPEG’, ’n0212359712166.JPEG’, ’n021230457014.JPEG’, ’n021231592777.JPEG’, ’n02123394684.JPEG’, ’n02124075543.JPEG’, ’n021235977557.JPEG’, ’n021240757857.JPEG’, ’n021235973770.JPEG’, ’n021240754986.JPEG’, ’n02123045568.JPEG’, ’n021233941541.JPEG’, ’n021235973498.JPEG’, ’n0212359710304.JPEG’, ’n021233942084.JPEG’, ’n021235975283.JPEG’, ’n0212359713807.JPEG’, ’n0212407512282.JPEG’, ’n021235978575.JPEG’, ’n0212304511787.JPEG’, ’n02123394888.JPEG’, ’n021230451815.JPEG’, ’n021233947614.JPEG’, ’n0212359727865.JPEG’, ’n021240751279.JPEG’, ’n021233944775.JPEG’, ’n02123394976.JPEG’, ’n021233948385.JPEG’, ’n0212359714791.JPEG’, ’n0212304510424.JPEG’, ’n021235977698.JPEG’, ’n021240758140.JPEG’, ’n021230453754.JPEG’, ’n021235971819.JPEG’, ’n02123597395.JPEG’, ’n02123394415.JPEG’, ’n021240759747.JPEG’, ’n021230459467.JPEG’, ’n021231596842.JPEG’, ’n021233949611.JPEG’, ’n021235977283.JPEG’, ’n0212359711799.JPEG’, ’n02123597660.JPEG’, ’n021230457511.JPEG’, ’n0212359710723.JPEG’, ’n021231597836.JPEG’, ’n0212359714530.JPEG’, ’n0212359728555.JPEG’, ’n021233946079.JPEG’, ’n021233946792.JPEG’, ’n0212359711564.JPEG’, ’n021235978916.JPEG’, ’n02124075123.JPEG’, ’n021230455150.JPEG’, ’n02124075353.JPEG’, ’n0212359712941.JPEG’, ’n0212304510095.JPEG’, ’n021235976533.JPEG’, ’n021230454611.JPEG’, ’n02123597754.JPEG’, ’n021233948561.JPEG’, ’n021235976409.JPEG’, ’n021231594909.JPEG’, ’n02123597564.JPEG’, ’n021233941633.JPEG’, ’n021233941196.JPEG’, ’n021233942787.JPEG’, ’n0212407510542.JPEG’, ’n021235976242.JPEG’, ’n021235973063.JPEG’, ’n0212359713164.JPEG’, ’n021230457449.JPEG’, ’n0212304513299.JPEG’, ’n021233948165.JPEG’, ’n021233941852.JPEG’, ’n021235978771.JPEG’, ’n021231596581.JPEG’, ’n021233945906.JPEG’, ’n021240752747.JPEG’, ’n0212407511383.JPEG’, ’n021235973919.JPEG’, ’n021233942514.JPEG’, ’n021240757423.JPEG’, ’n021233946968.JPEG’, ’n021230454850.JPEG’, ’n0212304510689.JPEG’, ’n0212407513539.JPEG’, ’n0212359713378.JPEG’, ’n021231594847.JPEG’, ’n021233941798.JPEG’, ’n0212359727951.JPEG’, ’n02123159587.JPEG’, ’n021235971825.JPEG’, ’n021231592200.JPEG’, ’n0212359712.JPEG’, ’n021235976778.JPEG’, ’n021235976693.JPEG’, ’n0212304511782.JPEG’, ’n0212359713706.JPEG’, ’n021233949032.JPEG’, ’n021240754459.JPEG’, ’n0212359713752.JPEG’, ’n021233942285.JPEG’, ’n021235971410.JPEG’, ’n021231596134.JPEG’, ’n0212359711290.JPEG’, ’n021235976347.JPEG’, ’n021233941789.JPEG’, ’n0212304511255.JPEG’, ’n021233946096.JPEG’, ’n021233944081.JPEG’, ’n021233945679.JPEG’, ’n021233942471.JPEG’, ’n021231595797.JPEG’, ’n0212359713894.JPEG’, ’n0212407510854.JPEG’, ’n021233948605.JPEG’, ’n021240758281.JPEG’, ’n0212359711724.JPEG’, ’n021233948242.JPEG’, ’n021233943569.JPEG’, ’n0212359710639.JPEG’, ’n021230453818.JPEG’, ’n021240756459.JPEG’, ’n02123394185.JPEG’, ’n021235978961.JPEG’, ’n021240759743.JPEG’, ’n021233941627.JPEG’, ’n0212359713175.JPEG’, ’n021230452694.JPEG’, ’n021235974537.JPEG’, ’n021235976400.JPEG’, ’n021230457423.JPEG’, ’n021235973004.JPEG’, ’n021233942988.JPEG’, ’n021240759512.JPEG’, ’n021233946318.JPEG’, ’n021235971843.JPEG’, ’n021240752053.JPEG’, ’n021235973828.JPEG’, ’n0212339414.JPEG’, ’n021233948141.JPEG’, ’n021240751624.JPEG’, ’n02123597459.JPEG’, ’n021240756405.JPEG’, ’n021230458595.JPEG’, ’n021231593226.JPEG’, ’n021240759141.JPEG’, ’n021235972031.JPEG’, ’n021230452354.JPEG’, ’n021235976710.JPEG’, ’n021235976613.JPEG’, ’n021231591895.JPEG’, ’n021233942953.JPEG’, ’n021233945846.JPEG’, ’n02123394513.JPEG’, ’n0212304516637.JPEG’, 
’n021233947848.JPEG’, ’n021233943229.JPEG’, ’n021230458881.JPEG’, ’n021233948250.JPEG’, ’n021240757651.JPEG’, ’n02123394200.JPEG’, ’n021233942814.JPEG’, ’n021230456445.JPEG’, ’n021233942467.JPEG’, ’n021230453317.JPEG’, ’n021235971422.JPEG’, ’n0212359713442.JPEG’, ’n021233948225.JPEG’, ’n021235979337.JPEG’, ’n0212339432.JPEG’, ’n021233942193.JPEG’, ’n021233941625.JPEG’, ’n021235978799.JPEG’, ’n0212359713241.JPEG’, ’n021235977681.JPEG’, ’n021235974550.JPEG’, ’n021235973896.JPEG’, ’n021233949554.JPEG’, ’n0212407513600.JPEG’, ’n02123394571.JPEG’, ’n0212359710886.JPEG’, ’n021230456741.JPEG’, ’n0212304510438.JPEG’, ’n021230459954.JPEG’.

spider:

’n01775062517.JPEG’, ’n0177475018017.JPEG’, ’n0177438413186.JPEG’, ’n017747503115.JPEG’, ’n017750625075.JPEG’, ’n017735491541.JPEG’, ’n017750624867.JPEG’, ’n017750628156.JPEG’, ’n017747507128.JPEG’, ’n017750624632.JPEG’, ’n017735498734.JPEG’, ’n017735492274.JPEG’, ’n0177354910298.JPEG’, ’n017743841811.JPEG’, ’n017747507498.JPEG’, ’n0177475010265.JPEG’, ’n017735491964.JPEG’, ’n017747503268.JPEG’, ’n017735496095.JPEG’, ’n017750628812.JPEG’, ’n0177475010919.JPEG’, ’n017750621180.JPEG’, ’n017735497275.JPEG’, ’n017735499346.JPEG’, ’n017735498243.JPEG’, ’n017750623127.JPEG’, ’n0177354910608.JPEG’, ’n017735493442.JPEG’, ’n017731571487.JPEG’, ’n017747507775.JPEG’, ’n01775062419.JPEG’, ’n017747507638.JPEG’, ’n01775062847.JPEG’, ’n017747503154.JPEG’, ’n017735491534.JPEG’, ’n017731571039.JPEG’, ’n017750625644.JPEG’, ’n017750628525.JPEG’, ’n01773797216.JPEG’, ’n01775062900.JPEG’, ’n017747508513.JPEG’, ’n017747503424.JPEG’, ’n017747503085.JPEG’, ’n017750623662.JPEG’, ’n0177438415681.JPEG’, ’n01774750326.JPEG’, ’n017731579503.JPEG’, ’n017747503332.JPEG’, ’n017747502799.JPEG’, ’n0177315710606.JPEG’, ’n017731571905.JPEG’, ’n01773549379.JPEG’, ’n01773797597.JPEG’, ’n017731573226.JPEG’, ’n017747507875.JPEG’, ’n0177438416102.JPEG’, ’n017735492832.JPEG’, ’n017750625072.JPEG’, ’n017735494278.JPEG’, ’n017735495854.JPEG’, ’n017743841998.JPEG’, ’n0177475013875.JPEG’, ’n017750628270.JPEG’, ’n017735492941.JPEG’, ’n017747505235.JPEG’, ’n017735494150.JPEG’, ’n017747506217.JPEG’, ’n017750623137.JPEG’, ’n017747505480.JPEG’, ’n0177438411955.JPEG’, ’n017750628376.JPEG’, ’n017731572688.JPEG’, ’n017735496825.JPEG’, ’n0177475010422.JPEG’, ’n0177438420786.JPEG’, ’n01773549398.JPEG’, ’n017735494965.JPEG’, ’n017747507470.JPEG’, ’n017750621379.JPEG’, ’n017743842399.JPEG’, ’n017735499799.JPEG’, ’n01775062305.JPEG’, ’n0177438415519.JPEG’, ’n017747503333.JPEG’, ’n017747502604.JPEG’, ’n017747503134.JPEG’, ’n017747504646.JPEG’, ’n017750625009.JPEG’, ’n0177475010200.JPEG’, ’n017750627964.JPEG’, ’n017743842458.JPEG’, ’n017737973333.JPEG’, ’n017747509987.JPEG’, ’n017735495790.JPEG’, ’n01773549854.JPEG’, ’n0177475011370.JPEG’, ’n0177475010698.JPEG’, ’n017747509287.JPEG’, ’n017737976703.JPEG’, ’n01773797931.JPEG’, ’n017735495280.JPEG’, ’n017737975385.JPEG’, ’n017737971098.JPEG’, ’n01774750436.JPEG’, ’n0177438413770.JPEG’, ’n017747509780.JPEG’, ’n017747508640.JPEG’, ’n01774750653.JPEG’, ’n0177438412554.JPEG’, ’n017747509716.JPEG’

snake:

’n017370217081.JPEG’, ’n0172857216119.JPEG’, ’n0173518910620.JPEG’, ’n017517483573.JPEG’, ’n017293226690.JPEG’, ’n0173518920703.JPEG’, ’n017344184792.JPEG’, ’n017499392784.JPEG’, ’n017299774113.JPEG’, ’n017562916505.JPEG’, ’n017421723003.JPEG’, ’n0172857219317.JPEG’, ’n017393815838.JPEG’, ’n017370211381.JPEG’, ’n017499394704.JPEG’, ’n0175558110792.JPEG’, ’n017299779474.JPEG’, ’n0174440111909.JPEG’, ’n0173938110303.JPEG’, ’n01749939820.JPEG’, ’n0172857227743.JPEG’, ’n0173441812057.JPEG’, ’n017421728636.JPEG’, ’n0172997714112.JPEG’, ’n017393816286.JPEG’, ’n01734418761.JPEG’, ’n0174013113437.JPEG’, ’n017289209571.JPEG’, ’n017534884234.JPEG’, ’n017499395712.JPEG’, ’n017393816072.JPEG’, ’n017393817683.JPEG’, ’n017293229202.JPEG’, ’n0175174813413.JPEG’, ’n017562914626.JPEG’, ’n017421729733.JPEG’, ’n0173702112610.JPEG’, ’n0173938187.JPEG’, ’n017299771134.JPEG’, ’n01753488637.JPEG’, ’n0174826418478.JPEG’, ’n0172857222360.JPEG’, ’n017370213386.JPEG’, ’n01751748560.JPEG’, ’n0175174818223.JPEG’, ’n017499395750.JPEG’, ’n017482647044.JPEG’, ’n017393811163.JPEG’, ’n01751748311.JPEG’, ’n017562919028.JPEG’, ’n0173938110473.JPEG’, ’n017285721415.JPEG’, ’n0172932210918.JPEG’, ’n01748264653.JPEG’, ’n0175348810957.JPEG’, ’n017562913990.JPEG’, ’n0175629111915.JPEG’, ’n017562916776.JPEG’, ’n0174013111661.JPEG’, ’n017299775715.JPEG’, ’n0173702116733.JPEG’, ’n0175348815197.JPEG’, ’n017444017248.JPEG’, ’n017285727661.JPEG’, ’n0174013113680.JPEG’, ’n017293225446.JPEG’, ’n017499396508.JPEG’, ’n017482642140.JPEG’, ’n0172997716782.JPEG’, ’n017482647602.JPEG’, ’n0175629117857.JPEG’, ’n01729977461.JPEG’, ’n0174217220552.JPEG’, ’n017351893258.JPEG’, ’n017289209265.JPEG’, ’n0174826418133.JPEG’, ’n0174826416699.JPEG’, ’n017393811006.JPEG’, ’n0175348810555.JPEG’, ’n017517483202.JPEG’, ’n017344183929.JPEG’, ’n017517485908.JPEG’, ’n017517488470.JPEG’, ’n017393813598.JPEG’, ’n01739381255.JPEG’, ’n0172997715657.JPEG’, ’n0174826421477.JPEG’, ’n017517482912.JPEG’, ’n017289209154.JPEG’, ’n0172857217552.JPEG’, ’n0174013114560.JPEG’, ’n017293225947.JPEG’.

broccoli:

’n077149908640.JPEG’, ’n077149905643.JPEG’, ’n077149907777.JPEG’, ’n07714990888.JPEG’, ’n077149903398.JPEG’, ’n077149904576.JPEG’, ’n077149908554.JPEG’, ’n077149901957.JPEG’, ’n077149904201.JPEG’, ’n077149903130.JPEG’, ’n077149904115.JPEG’, ’n07714990524.JPEG’, ’n077149906504.JPEG’, ’n077149903125.JPEG’, ’n077149905838.JPEG’, ’n077149901779.JPEG’, ’n077149906393.JPEG’, ’n077149901409.JPEG’, ’n077149904962.JPEG’, ’n077149907282.JPEG’, ’n077149907314.JPEG’, ’n0771499011933.JPEG’, ’n077149901202.JPEG’, ’n077149903626.JPEG’, ’n077149907873.JPEG’, ’n077149903325.JPEG’, ’n077149903635.JPEG’, ’n0771499012524.JPEG’, ’n0771499014952.JPEG’, ’n077149907048.JPEG’, ’n07714990500.JPEG’, ’n077149907950.JPEG’, ’n077149902445.JPEG’, ’n077149901294.JPEG’, ’n077149907336.JPEG’, ’n0771499014743.JPEG’, ’n077149901423.JPEG’, ’n077149902185.JPEG’, ’n077149906566.JPEG’, ’n07714990567.JPEG’, ’n077149901532.JPEG’, ’n077149905212.JPEG’, ’n077149908971.JPEG’, ’n077149906116.JPEG’, ’n077149905462.JPEG’, ’n077149907644.JPEG’, ’n077149908596.JPEG’, ’n077149901138.JPEG’, ’n0771499015078.JPEG’, ’n077149901602.JPEG’, ’n077149902460.JPEG’, ’n07714990159.JPEG’, ’n077149909445.JPEG’, ’n07714990471.JPEG’, ’n077149901777.JPEG’, ’n077149909760.JPEG’, ’n077149901528.JPEG’, ’n0771499012338.JPEG’, ’n077149902201.JPEG’, ’n077149906850.JPEG’, ’n077149904492.JPEG’, ’n077149907791.JPEG’, ’n077149909752.JPEG’, ’n077149901702.JPEG’, ’n077149903682.JPEG’, ’n0771499014342.JPEG’, ’n077149902661.JPEG’, ’n077149905467.JPEG’.

cabbage:

’n0771457114784.JPEG’, ’n077145714795.JPEG’, ’n0771457111969.JPEG’, ’n077145711394.JPEG’, ’n077145714155.JPEG’, ’n077145713624.JPEG’, ’n0771457113753.JPEG’, ’n077145717351.JPEG’, ’n0771457110316.JPEG’, ’n077145717235.JPEG’, ’n0771457117716.JPEG’, ’n077145711639.JPEG’, ’n077145715107.JPEG’, ’n077145714109.JPEG’, ’n0771457111878.JPEG’, ’n0771457115910.JPEG’, ’n0771457114401.JPEG’, ’n077145712741.JPEG’, ’n077145718576.JPEG’, ’n077145711624.JPEG’, ’n0771457113479.JPEG’, ’n077145712715.JPEG’, ’n077145713676.JPEG’, ’n0771457112371.JPEG’, ’n077145714829.JPEG’, ’n077145713922.JPEG’, ’n0771457110377.JPEG’, ’n077145718040.JPEG’, ’n077145718147.JPEG’, ’n0771457110377.JPEG’, ’n077145718040.JPEG’, ’n077145715730.JPEG’, ’n0771457116460.JPEG’, ’n077145718198.JPEG’, ’n077145711095.JPEG’, ’n077145713922.JPEG’, ’n077145717745.JPEG’, ’n077145716301.JPEG’.