The Attack Generator: A Systematic Approach Towards Constructing Adversarial Attacks

by Felix Assion et al.
neurocat GmbH

Most state-of-the-art machine learning (ML) classification systems are vulnerable to adversarial perturbations. As a consequence, adversarial robustness poses a significant challenge for the deployment of ML-based systems in safety- and security-critical environments like autonomous driving, disease detection or unmanned aerial vehicles. In the past years we have seen an impressive amount of publications presenting more and more new adversarial attacks. However, the attack research seems to be rather unstructured and new attacks often appear to be random selections from the unlimited set of possible adversarial attacks. With this publication, we present a structured analysis of the adversarial attack creation process. By detecting different building blocks of adversarial attacks, we outline the road to new sets of adversarial attacks. We call this the "attack generator". In the pursuit of this objective, we summarize and extend existing adversarial perturbation taxonomies. The resulting taxonomy is then linked to the application context of computer vision systems for autonomous vehicles, i.e. semantic segmentation and object detection. Finally, in order to prove the usefulness of the attack generator, we investigate existing semantic segmentation attacks with respect to the detected defining components of adversarial attacks.




1 Introduction

Recent advances in the field of machine learning have sparked interest in applying these techniques in safety- and security-critical application contexts. One example is the integration of convolutional neural network-based dense classifiers into autonomous cars [23, 16]. In this challenging domain, we require not only high accuracy on the true underlying data distribution, but also that the trained ML module can deal with maliciously crafted inputs.

Unfortunately, the last few years have shown that current state-of-the-art ML algorithms, in particular deep neural networks, are quite brittle. With the publications of Szegedy et al. [38] and Goodfellow et al. [18] as a starting point, adversarial examples have been recognized as significant weak points.

An adversarial example is an input data point that is slightly perturbed by an adversarial perturbation to cause misclassifications. These adversarial perturbations are created by an adversary with the help of an adversarial attack and are often hard to detect or even imperceptible to the human eye. The imperceptibility is not only challenging for the desired deployment in safety- and security-critical industries, but also hints at a crucial difference between the sensory information processing in humans and in artificial neural networks [7]. Since the discovery of this vulnerability, a lot of different adversarial attacks and defenses have been published, e.g. [11, 10, 37]. It has become an arms race between attackers and defenders [33].

The development of new adversarial attacks remains a key objective of adversarial robustness research. This is due to the fact that adversarial attacks play a central role in the context of robustifying ML systems, as well as during the evaluation of adversarial robustness. For example, adversarial attacks are often part of a defense strategy. Currently, there does not exist any defense mechanism that is fully satisfactory, although adversarial training shows promising results. Adversarial training integrates adversarial examples into the training procedure, i.e. the neural network is trained on a mixture of clean and adversarial data points [39, 21, 25]. Thus, this defense strongly depends on adversarial attacks, which can provide the needed adversarial perturbations. At the same time, adversarial attacks are also central for the evaluation of whether or not a deep neural network is robust. Ideally, the robustness evaluation process should be independent of concrete attacks and instead build on provable verification techniques, i.e. methods that can issue robustness guarantees [40, 13]. Unfortunately, these provable approaches are not yet scalable to complex tasks like semantic segmentation or object detection. As a consequence, one has to again rely on a set of adversarial attacks for the evaluation process.

This raises the question of how one can develop large, diverse sets of strong adversarial attacks, which can help with the hardening and the evaluation of neural networks. Although the attack research is flourishing, this question has not been answered. Even the leading software toolboxes, like the Adversarial Robustness Toolbox [29], CleverHans [31] or the Foolbox [34], still offer rather limited collections of benchmark attacks. This also implies that most of the defense proposals are only evaluated against a handful of arbitrarily selected attacks. Furthermore, we still lack broadly accepted attack-based benchmark challenges for safety- and security-critical tasks. These constraints are the result of the current modus operandi in the development of new attacks. Adversarial attacks are basically published one by one with a fixed threat model in mind. For example, the first wave of adversarial attacks was very much fixated on white-box, targeted attacks with $L_p$-imperceptibility constraints for simple classification tasks [43].

Up until now, we missed the chance to analyze adversarial attacks on a structural level. A structural analysis helps us understand the defining parts of an adversarial attack, i.e. see an adversarial attack as a composition of various elements. In this way, one could shift the research focus from arbitrarily assembled attacks to defining new potential elements of the detected building blocks of an attack. This view would directly increase the number of adversarial attacks significantly, since every new element implies a large number of new adversarial attacks, namely all potential combinations with other compatible building block elements. This modular structure of an attack has already been partially recognized by the research community within the discussion of imperceptibility metrics [24]. Every adversarial attack contains some kind of imperceptibility measure. Traditionally, there has been a strong focus on $L_p$-norms as the driver of imperceptibility to the human eye [36]. Recently, a lot of publications have suggested other quantifiers to measure perceptual similarity within an adversarial attack, e.g. [14, 41]. This is already significant progress, since every known adversarial attack can now be updated by exchanging $L_p$-balls with these newly proposed measures.

In this paper, we take a first step towards the detection of structural similarities between adversarial attacks. We acknowledge that adversarial attacks can be viewed as (constrained) optimization problems combined with optimization methods, which try to find a solution of the optimization problem. With the help of an adversarial perturbation taxonomy, we further define building blocks and various influencing factors of the optimization problem and optimization method of an adversarial attack. Finally, we test our conceptual ideas by analyzing prominent existing attacks.

In summary, our key contributions are:

  • We consolidate and extend existing adversarial perturbation taxonomy approaches. The different dimensions of the proposed taxonomy are then equipped with potential options for the adversary, which are loosely connected to the computer vision task for autonomous driving. However, the taxonomy can easily be applied to other domains by adjusting the options within the taxonomy dimensions.

  • We argue that adversarial attacks are a composition of different quantifiers / measures, which can be grouped and can be directly linked to the different dimensions and options of the taxonomy. We then suggest a deeper investigation of new measures linked to the taxonomy dimensions. In this way, we pave the way to the fast generation of new attack sets, i.e. outline the "attack generator".

  • We validate our conceptual ideas by investigating the semantic segmentation adversarial attacks introduced in [27]. Furthermore, we present first small experiments, where we deduce new attacks by exchanging various measures of the original attack formulations.

2 Taxonomy of Adversarial Perturbations

In this section we want to taxonomize adversarial perturbations along multiple dimensions, hence describe different classes of adversarial perturbations. These classes are helpful in a variety of contexts. Especially when considering adversarial robustness as a security issue, it becomes crucial to analyze essential properties of a realistic threat. In the past, publications were largely concerned with perturbation classes, which do not relate to specific security concerns [17]. Thus, there is an obvious need to further clarify realistic threat scenarios, in order to close the gap between the literature and the concerns related to the actual deployment of ML systems. It has to be noted that what constitutes a relevant, realistic threat is highly application-specific. However, a general taxonomy can provide the necessary structural framework for this risk evaluation.

Taxonomy approaches for adversarial perturbations, adversarial examples or adversarial attacks have already been presented in several publications, e.g. [35, 9, 43, 17, 30]. In the following, we consolidate and extend these taxonomy approaches. Additionally, we explore options within the different dimensions of the taxonomy. While the dimensions of the taxonomy are application independent, some of the options are motivated by the computer vision task for autonomous driving. The dimensions of this taxonomy proposal are inspired by the framework for empirical evaluation of classifier security presented in [4, 5].

In general, we recognize two central questions when classifying adversarial perturbations: Who created the adversarial perturbation? And which attack strategy led him to the perturbation at hand? As a consequence, the proposed taxonomy consists of the two high-level dimensions "threat model" and "attack strategy". The threat model summarizes the most important information about the adversary. Influenced by his goals, knowledge and constraints, the adversary then develops an attack strategy, which ultimately results in an adversarial attack and thus, the considered adversarial perturbation.

2.1 Threat Model

The threat model characterizes the attacker. It usually specifies his goals, knowledge and capabilities (constraints). Thus, we suggest to further decompose the threat model into these three sub-dimensions.

2.1.1 Adversary’s Goals

The overall objective of the adversary is to force the victim model to make mistakes with the help of an adversarial perturbation. But this rather broad goal can be further specified by discussing the type of output the adversary desires (specificity) and defining the scope in which the perturbation should be successful in harming the ML system (perturbation scope). Furthermore, one key premise of an adversarial perturbation is that it should be imperceptible or inconspicuous. Since imperceptibility is still a very abstract concept, the adversary usually has a more specific type of imperceptibility in mind (perturbation imperceptibility). We will now go through the different aspects of the adversary’s goals and equip them with suitable options for the adversary. It should be noted that options are not necessarily mutually exclusive. This will also be true for options presented in other dimensions of the taxonomy.

Specificity: What are the desired consequences of the adversarial perturbation?

  • Untargeted (Non-targeted): The goal is to craft a perturbation which results in as many misclassifications as possible. There is no preference concerning the appearing classes in the adversarial output [1].

  • Static Target: The perturbation should lead to a fixed classification output, which is essentially independent of the input point added to the perturbation [27]. For example, the perturbation always forces the victim model to output one fixed image of an empty street without any pedestrians or cars in sight.

  • Dynamic Target: This type of goal has also been introduced by Metzen et al. [27] in the context of attacking semantic image segmentation. Here, the adversarial perturbation aims at keeping the ML module’s output unchanged with the exception of removing certain target classes. The desired classification output depends on the input point which is combined with the crafted perturbation. Removing the pedestrian class in every possible traffic situation is an example of a dynamic target objective.

  • Confusing Target (Confusion): The adversarial perturbation should keep the classification output unchanged with the exception of changing the position or size of certain target classes. As in the dynamic target setting, the desired output is related to the considered input image. As an example, one can think of an adversarial perturbation that reduces the size of pedestrians and in this way leads to a false sense of distance.

Perturbation Scope: What is the desired application scope of the adversarial perturbation?

  • Individual Scope: The perturbation is crafted for one specific input image, i.e. one specific adversarial example is the target of the adversary. It is not necessary that the same perturbation fools the ML system on other data points.

  • Contextual Scope: The goal is to create a fixed image-agnostic perturbation that causes label changes for one or more specific contextual situations. For example, the perturbation works for traffic situations on snowy or rainy days and is then able to fool the victim model under the majority of angles, distances and lighting effects.

  • Universal Scope: The goal is to create a fixed image-agnostic perturbation that causes label changes for a significant part of the true data distribution with no explicit contextual dependencies. This scope was first proposed by Moosavi-Dezfooli et al. [28] and has been further analyzed in [27, 32].

Perturbation Imperceptibility: In which way should the perturbation be imperceptible?

  • $L_p$-based Imperceptibility: Due to small changes with respect to some $L_p$-norm, the human observer should not be able to detect the adversarial perturbation when applied to one or more input images.

  • Attention-based Imperceptibility: Due to unremarkable changes, the human observer should not be able to detect the adversarial perturbation when applied to one or more input data points. These unremarkable changes are not motivated by an $L_p$-norm, but are rather the result of other measures of perceptual similarity. Examples are perturbations based on rotations and translations [42], Wasserstein distance [41] or SSIM [36].

  • Output Imperceptibility: A human observer can not easily detect irregularities in the classification output whenever the adversarial perturbation is applied. For instance, adversarial examples still lead to plausible traffic situations and misclassifications are integrated unobtrusively into their environment.

  • Detector Imperceptibility: A predefined selection of software-based detection systems is not able to detect irregularities in the input, output or in the activation patterns of the ML module caused by the adversarial perturbation. Hence, the adversary tries not only to mislead the victim model, but also adversarial example detectors placed around the victim model [27, 26].

2.1.2 Adversary’s Knowledge

The knowledge of the adversary can be divided into "knowledge about the victim model and its parameters" and "knowledge about the training data set" [3]. The publications [20, 19] were used as a basis for the following list of options.

Model Knowledge: What does the adversary know about the ML model and its parameters?

  • White-box: The adversary has full knowledge of the model internals, hence is aware of the concrete architecture, all parameter / weight configurations and possibly even the training strategy.

  • Output-transparent Black-box: The adversary can not retrieve model parameters, but he can observe all or parts of the class probabilities or logits of the ML module’s output.

  • Query-limited Black-box: The adversary can not access relevant model parameters, but he can observe the full or parts of the module’s output on a limited number of inputs or with a limited frequency.

  • Label-only Black-box: The adversary can neither access relevant model parameters nor the class probabilities or logits, but he can observe the full or parts of the final classification decisions of the system, i.e. he only has access to the inferred label (argmax layer).

  • (Full) Black-box: The adversary can neither retrieve relevant model parameters nor can he directly observe the output of the ML system. As a consequence, adversarial perturbations have to be created without querying the victim model.

Data Knowledge: What does the adversary know about the data sets which have been used to train the ML system?

  • Training Data: The full or at least a significant part of the training data is available to the adversary.

  • Surrogate Data: There is no direct access to the original training data, but the adversary can collect data points from the relevant underlying data distribution of the victim model’s environment. In the case of computer vision for autonomous driving, this is the minimal degree of data knowledge, since the adversary can always easily gather images or videos of traffic situations.

2.1.3 Adversary’s Capabilities

Traditionally, this threat model characteristic clarifies the abilities and constraints of the adversary, thus outlines the attacker’s power during his attempt to attack the ML system [3]. In this taxonomy we only investigate attackers utilizing adversarial perturbations. Thus, the capabilities of the adversary are fully defined by his means of feeding perturbations to the victim model.

Input Constraints: How can the adversary feed malicious input to the victim model?

  • Digital Data Feed (Direct Data Feed): The attacker can directly feed digital input to the ML module. Hence, he can adjust specific float values of input images.

  • Physical Data Feed: The adversary can not directly feed digital input; instead, he creates physical perturbations, e.g. [2, 15]. He has to place these adversarial objects in the environment of the autonomous car, where they fool the module once they appear in the field of view of the camera.

  • Spatial Constraint: It is not possible to place a physical or digital perturbation over the entire input image. Instead, the adversary can only influence limited areas of the input data.
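The threat model dimensions listed in this section lend themselves to a simple data structure. The following Python sketch is an illustration only: the class and field names (`ThreatModel`, `Specificity`, and so on) are assumptions introduced here and do not belong to any existing toolbox. Each field holds a set, because the options are not necessarily mutually exclusive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Specificity(Enum):
    UNTARGETED = "untargeted"
    STATIC_TARGET = "static target"
    DYNAMIC_TARGET = "dynamic target"
    CONFUSING_TARGET = "confusing target"


class Scope(Enum):
    INDIVIDUAL = "individual"
    CONTEXTUAL = "contextual"
    UNIVERSAL = "universal"


class Imperceptibility(Enum):
    LP_BASED = "Lp-based"
    ATTENTION_BASED = "attention-based"
    OUTPUT = "output"
    DETECTOR = "detector"


class ModelKnowledge(Enum):
    WHITE_BOX = "white-box"
    OUTPUT_TRANSPARENT_BLACK_BOX = "output-transparent black-box"
    QUERY_LIMITED_BLACK_BOX = "query-limited black-box"
    LABEL_ONLY_BLACK_BOX = "label-only black-box"
    FULL_BLACK_BOX = "full black-box"


class DataKnowledge(Enum):
    TRAINING_DATA = "training data"
    SURROGATE_DATA = "surrogate data"


class InputConstraint(Enum):
    DIGITAL_DATA_FEED = "digital data feed"
    PHYSICAL_DATA_FEED = "physical data feed"
    SPATIAL_CONSTRAINT = "spatial constraint"


@dataclass(frozen=True)
class ThreatModel:
    """Goals, knowledge and capabilities of an adversary (Section 2.1)."""
    specificity: FrozenSet[Specificity]
    scope: FrozenSet[Scope]
    imperceptibility: FrozenSet[Imperceptibility]
    model_knowledge: FrozenSet[ModelKnowledge]
    data_knowledge: FrozenSet[DataKnowledge]
    input_constraints: FrozenSet[InputConstraint]


# Hypothetical adversary, chosen only to show how the options combine.
example = ThreatModel(
    specificity=frozenset({Specificity.DYNAMIC_TARGET}),
    scope=frozenset({Scope.UNIVERSAL}),
    imperceptibility=frozenset({Imperceptibility.LP_BASED}),
    model_knowledge=frozenset({ModelKnowledge.WHITE_BOX}),
    data_knowledge=frozenset({DataKnowledge.TRAINING_DATA}),
    input_constraints=frozenset({InputConstraint.DIGITAL_DATA_FEED}),
)
```

Adapting the taxonomy to another domain would then amount to changing the enum members while keeping the six fields.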

2.2 Attack Strategy

An adversarial perturbation is not fully characterized by the goals, knowledge and constraints of the adversary. One is still lacking a few fundamental decisions the adversary made on his way to the concrete formulation of the adversarial attack which in the end generated the perturbation. These decisions are always governed by the threat model. In other words, the taxonomy dimension "threat model" influences the decisions summarized in the "attack strategy".

The attack strategy should specify what kind of model and data basis is going to be handed to the attack. Additionally, the structure of an adversarial perturbation differs strongly with the central mathematical procedure used within the adversarial attack to search for perturbation candidates. We therefore propose the following decomposition of the attack strategy.

2.2.1 Attack Input

With an adversarial perturbation the attacker wants to force the victim model to make classification mistakes. But, this does not imply that an adversarial attack is necessarily taking the true victim model into account during the generation of the perturbation. Analogously, the attacker has to decide what kind of data he wants to give to the attack and this can again deviate from the set defined by his data knowledge (see: Section 2.1.2).

Model Basis: Which model is used by the adversarial attack?

  • Victim Model: The attack primarily utilizes the victim model in order to calculate adversarial perturbations.

  • Surrogate Model: The adversarial attack does not directly work with the victim model, but considers a surrogate model. This is often necessary if the adversary has only limited knowledge about the victim model or the victim model does not allow certain mathematical procedures [22].

Data Basis: Which data basis is used by the adversarial attack?

  • Training Data: Data points of the victim model’s original training data set are given to the adversarial attack.

  • Surrogate Data: The attack is primarily built on data that is related to the underlying data distribution of the task, but has not been previously used to train the ML system.

  • No Data: The adversary is not giving any task related data to the attack. Instead, the adversarial attack works with images that are not samples of the present data distribution [12].

2.2.2 Mathematical Procedure

With this dimension we try to summarize predominant mathematical tools that facilitate the detection of suitable adversarial perturbations. These tools are integrated into the adversarial attack itself.

Optimization Method: Which mathematical procedure is the key ingredient for the perturbation search of the attack?

  • First-order Methods: The adversarial attack tries to exploit perturbation directions given by exact or approximate (sub-)gradients.

  • Second-order Methods: The perturbation search is built on the calculation of the Hessian matrix or approximations of the Hessian matrix [38].

  • Evolution & Random Sampling: The adversarial attack generates possible perturbations by sampling distributions and combining promising candidates. One can often speed up these methods by integrating prior knowledge about the decision boundary of the ML module [8].
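To make the difference between the optimization methods concrete, the following Python sketch contrasts a first-order step with a random sampling search on a toy problem. Everything here is an assumption made for illustration: the linear `score` function stands in for a victim model, the perturbation is restricted to a box of radius `eps`, the gradient is approximated by central finite differences, and the first-order method is reduced to a single signed-gradient step.

```python
import random


def score(x, w, b):
    """Toy victim model: a linear score the adversary wants to push down."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def numerical_gradient(f, x, h=1e-5):
    """Approximate gradient of f at x by central finite differences."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def first_order_step(f, x, eps):
    """One signed-gradient step of size eps in the descent direction."""
    grad = numerical_gradient(f, x)
    return [-eps * (1 if g > 0 else -1 if g < 0 else 0) for g in grad]


def random_search(f, x, eps, n_samples, rng):
    """Sample perturbations in the eps-box and keep the best candidate."""
    best_delta, best_value = [0.0] * len(x), f(x)
    for _ in range(n_samples):
        delta = [rng.uniform(-eps, eps) for _ in x]
        value = f([xi + di for xi, di in zip(x, delta)])
        if value < best_value:
            best_delta, best_value = delta, value
    return best_delta


w, b = [0.5, -1.0, 2.0], 0.1
x, eps = [1.0, 0.5, 0.2], 0.3


def f(z):
    return score(z, w, b)


delta_fo = first_order_step(f, x, eps)
delta_rs = random_search(f, x, eps, n_samples=200, rng=random.Random(0))
value_clean = f(x)
value_fo = f([xi + di for xi, di in zip(x, delta_fo)])
value_rs = f([xi + di for xi, di in zip(x, delta_rs)])
```

The random search only calls `f` on candidate inputs, so it remains usable when gradients are unavailable, at the price of many more queries. For this linear toy model the signed-gradient step already reaches the minimum over the box, which is not true for general models.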

3 The Attack Generator

An adversarial attack consists of two parts: (1) A constrained optimization problem whose objective has to be minimized over admissible perturbations; (2) An optimization method that searches for approximate solutions of the constrained optimization problem. These two components of an attack are not always explicitly stated within an attack publication, but most of the time they are straightforward to derive. As input, the attack usually takes some kind of data set and a callable model. Potential choices with respect to the attack input have been discussed in the attack strategy dimension of the taxonomy (see: Section 2.2.1). On the other hand, the output of an adversarial attack is the desired adversarial perturbation or an adversarial example, i.e. a combination of the perturbation with a specific input data point (see: Figure 1).

Furthermore, the optimization method has also been introduced as a key part of the attack strategy. We presented various options of the adversary with "first-order methods" being the most common choice. Consequently, there is only one element of Figure 1 where we have not yet clarified its relation to the above presented taxonomy, namely the optimization problem of the attack.

Figure 1: Adversarial attacks can be viewed as an optimization problem together with an optimization method. It takes some model and data set as input in order to create the perturbation.

The optimization problem can abstractly be written as

$$\min_{\delta \in \Delta} J_{f,D}(\delta), \tag{1}$$

where $J_{f,D}$ is the objective function that takes a perturbation $\delta$ as input and maps it to some fitness value in $\mathbb{R}$. Additionally, the objective function depends on the attack input and, in turn, on the provided ML model $f$ and the data set $D$. Often we can not take any arbitrary perturbation $\delta$, but we are rather constrained as introduced in the taxonomy dimension "input constraints" (see: Section 2.1.3). Thus, the given input constraints define an admissible set $\Delta$, which contains all potential perturbation candidates.

Now, let us take a closer look at the objective function $J_{f,D}$: This function is the mathematical formalization of the goals of the adversary. For the adversary, minimizing the objective function is equivalent to achieving his goals with respect to specificity, perturbation imperceptibility and perturbation scope (see: Section 2.1.1). In order to arrive at this mathematical representation of his goals, the attacker has to initially define, directly or indirectly, quantifiers / measures that evaluate the level of specificity $\mu_{\mathrm{spec}}$, the level of imperceptibility $\mu_{\mathrm{imp}}$ and the level of scope $\mu_{\mathrm{scope}}$. These are again real-valued functions which take the perturbation $\delta$ as input and additionally depend on the attack input, hence depend on $f$ and the full or parts of the provided data set $D$. Thus, if one wants to be more thorough, one should rather write $\mu^{k}_{f,D}$ with $k \in \{\mathrm{spec}, \mathrm{imp}, \mathrm{scope}\}$. To make these abstract ideas a little bit more tangible, let us discuss a few examples for the different quantifiers which are frequently used in the adversarial attack literature:

As already mentioned in the introduction, perturbation imperceptibility has in the past often been measured with the help of an $L_p$-norm, mostly $L_2$ or $L_\infty$. In these cases one has $\mu_{\mathrm{imp}}(\delta) = \|\delta\|_p$. Please note that we in general do not pose any mathematical requirements on the real-valued maps $\mu^{k}_{f,D}$ with $k \in \{\mathrm{spec}, \mathrm{imp}, \mathrm{scope}\}$. If the adversary defines $\mu_{\mathrm{imp}}(\delta) = \|\delta\|_p$, then $\mu_{\mathrm{imp}}$ is a norm. But, we can also imagine situations where one might want to consider distance measures or imperceptibility quantifiers that do not fulfill the metric or norm axioms. For instance, if the adversary is interested in detector imperceptibility (see: Section 2.1.1), then imperceptibility of a perturbation is equivalent to a set of detectors not recognizing the attack. This imperceptibility measure does not follow the norm axioms, e.g. due to its binary output, the measure is not absolutely homogeneous. In general, it is of utmost importance that the attack research loses its strong focus on $L_p$-norms as imperceptibility measures, since one can not expect that an adversary will do the favor of sticking to this one option of the perturbation imperceptibility taxonomy dimension.

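As a specificity measure $\mu_{\mathrm{spec}}$, attack researchers often make use of the original loss function $\mathcal{L}$ of the ML model. They insert the desired adversarial outcome $y^{\mathrm{adv}}$ instead of the true label of data point $x$ and define $\mu_{\mathrm{spec}}(\delta; x) = \mathcal{L}(f(x + \delta), y^{\mathrm{adv}})$. In this example we see the usual dependence of $\mu_{\mathrm{spec}}$ on the input model $f$ and the input data set $D$.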

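In the majority of existing attacks, the perturbation scope quantifier $\mu_{\mathrm{scope}}$ is closely connected to $\mu_{\mathrm{spec}}$. As discussed in the taxonomy, the desired scope defines in which situations the adversarial perturbation should be successful in harming the ML system (see: Section 2.1.1). To evaluate this, the adversary often takes Monte Carlo estimates over $\mu_{\mathrm{spec}}$, thus

$$\mu_{\mathrm{scope}}(\delta) = \frac{1}{|D_{\mathrm{scope}}|} \sum_{x \in D_{\mathrm{scope}}} \mu_{\mathrm{spec}}(\delta; x), \tag{2}$$

with desired scope data set $D_{\mathrm{scope}}$ and $|D_{\mathrm{scope}}|$ being the cardinality of $D_{\mathrm{scope}}$.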

Finally, if one has determined these three goal measures, the objective function $J_{f,D}$ is just a composition of $\mu_{\mathrm{spec}}$, $\mu_{\mathrm{imp}}$ and $\mu_{\mathrm{scope}}$. In other words, the selection of these three measures essentially defines the optimization problem of the adversarial attack (see: Figure 2).

Figure 2: The optimization objective of an adversarial attack can be viewed as a composition of three adversary’s goal measures.

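Going back to our previous examples, a sample composition is

$$J_{f,D}(\delta) = \mu_{\mathrm{scope}}(\delta) + \lambda \, \mu_{\mathrm{imp}}(\delta) = \frac{1}{|D_{\mathrm{scope}}|} \sum_{x \in D_{\mathrm{scope}}} \mathcal{L}(f(x + \delta), y^{\mathrm{adv}}) + \lambda \, \|\delta\|_p, \tag{3}$$

where $\lambda$ is a weighting factor. This gives us the following attack optimization problem

$$\min_{\delta \in \Delta} \; \frac{1}{|D_{\mathrm{scope}}|} \sum_{x \in D_{\mathrm{scope}}} \mathcal{L}(f(x + \delta), y^{\mathrm{adv}}) + \lambda \, \|\delta\|_p. \tag{4}$$

A lot of the published attack optimization problems introduce $\mu_{\mathrm{imp}}$ as an additional constraint instead of penalizing it in the objective function. In the setting of our example, this would lead to the following optimization problem

$$\min_{\delta \in \Delta} \; \frac{1}{|D_{\mathrm{scope}}|} \sum_{x \in D_{\mathrm{scope}}} \mathcal{L}(f(x + \delta), y^{\mathrm{adv}}) \quad \text{s.t.} \quad \|\delta\|_p \leq \varepsilon, \tag{5}$$

with $\varepsilon$ imperceptibility constant. With an appropriate choice of the weighting constant $\lambda$, Equations (4) and (5) lead to similar, or sometimes even the same, solutions and therefore, this does not significantly undermine our perspective on the attack problem presented in Equation (1).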
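The relation between the penalized and the constrained formulation can be checked on a one-dimensional toy problem. The Python sketch below is purely illustrative: the quadratic `specificity` function, the grid of candidate perturbations and the values of `lam` and `eps` are all assumptions chosen so that the two formulations can be solved by brute force.

```python
def specificity(delta):
    """Hypothetical specificity measure, smallest at delta = 2."""
    return (delta - 2.0) ** 2


def imperceptibility(delta):
    """Size of the one-dimensional perturbation."""
    return abs(delta)


# Candidate perturbations on a grid over [-3, 3] with step 0.001.
grid = [i / 1000.0 for i in range(-3000, 3001)]


def solve_penalized(lam):
    """Minimize specificity plus lam times imperceptibility."""
    return min(grid, key=lambda d: specificity(d) + lam * imperceptibility(d))


def solve_constrained(eps):
    """Minimize specificity subject to imperceptibility <= eps."""
    admissible = [d for d in grid if imperceptibility(d) <= eps]
    return min(admissible, key=specificity)


delta_penalized = solve_penalized(lam=2.0)
delta_constrained = solve_constrained(eps=1.0)
```

In this toy setting, the weight `lam=2.0` and the budget `eps=1.0` select the same perturbation, 1.0. Lowering the weight to `lam=1.0` moves the penalized solution to 1.5, which corresponds to a larger budget in the constrained problem.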

Overall, this gives us the insight that an adversarial attack consists of various building blocks, which are all linked to dimensions and options of the adversarial perturbation taxonomy (see: Appendix A). Creating a new adversarial attack is now equivalent to assembling adversary’s goal measures to form an optimization objective and equipping this with a suitable optimization method. The choice of the optimization method has to acknowledge constraints given by the input model and input data as well as additional constraints on the perturbation.

This modular view on an adversarial attack also outlines the path to the creation of sets of adversarial attacks instead of publishing one attack at a time. We have seen that the specificity, imperceptibility and scope quantifiers crucially define the adversarial attack. Thus, by investigating new measures of these kinds, one implicitly provides a number of new adversarial objective functions, namely all possible combinations with other adversary’s goal measures. Finally, this results in a set of new adversarial attacks. In the context of computer vision systems for autonomous driving, researchers could therefore go through the options listed in Section 2.1.1 and assign suitable quantifiers. This approach also helps us derive new adversarial attacks from existing ones by inserting alternative adversary’s goal measures. In the following, we will underline the benefit of our conceptual ideas by experimenting with the attacks presented in [27].
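The combinatorial idea behind the attack generator can be sketched in a few lines of Python. The snippet is a minimal illustration under stated assumptions: `toy_model` is a hypothetical two-class victim, the untargeted measure is taken to be the negative loss on the true label, the static target is fixed to class 1, and the scope measure is a plain mean over a two-point data set. None of these names come from the paper or from an existing library.

```python
import itertools
import math


def toy_model(x):
    """Hypothetical two-class victim model returning softmax probabilities."""
    logits = [x[0] - x[1], x[1] - x[0]]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    return -math.log(probs[label])


def perturb(x, delta):
    return [xi + di for xi, di in zip(x, delta)]


# Specificity measures: smaller values are better for the adversary.
def untargeted(model, x, y_true, delta):
    # Assumption: raise the loss on the true label, i.e. minimize its negative.
    return -cross_entropy(model(perturb(x, delta)), y_true)


def static_target(model, x, y_true, delta):
    # Assumption: the fixed adversarial target is class 1 for every input.
    return cross_entropy(model(perturb(x, delta)), 1)


# Imperceptibility measures.
def l2_norm(delta):
    return math.sqrt(sum(d * d for d in delta))


def linf_norm(delta):
    return max(abs(d) for d in delta)


def compose(spec, imp, model, data, lam):
    """Scope as a mean of spec over data, plus lam times imp."""
    def objective(delta):
        scope = sum(spec(model, x, y, delta) for x, y in data) / len(data)
        return scope + lam * imp(delta)
    return objective


SPECIFICITY = {"untargeted": untargeted, "static": static_target}
IMPERCEPTIBILITY = {"l2": l2_norm, "linf": linf_norm}

data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
objectives = {
    (s, i): compose(SPECIFICITY[s], IMPERCEPTIBILITY[i], toy_model, data, lam=0.01)
    for s, i in itertools.product(SPECIFICITY, IMPERCEPTIBILITY)
}
```

Adding one further entry to either dictionary yields a new objective for every entry of the other, which is the multiplicative effect described above. Each objective still has to be paired with an optimization method before it becomes an attack.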

4 Experiments

Figure 3: Sample results of adapted semantic segmentation attacks on ICNet with single image as attack input (=1): (1) First column: Attack input image with original prediction of ICNet; (2) First row of right three columns - confusion: Attack enlarges pedestrian class ( = 15); (3) Second row of right three columns - attention-based imperceptibility: Attack removes pedestrian class with flow field perturbation.

We want to analyze two attacks introduced in [27] to further clarify the concepts presented in Section 3. Additionally, we show how the modular view facilitates the deduction of new adversarial attacks from existing ones.

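Metzen et al. [27] showed the existence of targeted, universal adversarial perturbations for state-of-the-art semantic segmentation neural networks. To generate these perturbations, Metzen et al. try to solve

$$\min_{\delta} \; \frac{1}{|D|} \sum_{x \in D} \mathcal{L}(f(x + \delta), y^{\mathrm{target}}(x)) \quad \text{s.t.} \quad \|\delta\|_\infty \leq \varepsilon, \tag{6}$$

where $D$ is the whole training data set of the victim model $f$. The model output $f(x)$ consists of class probability vectors for every pixel of the input image $x$. As in the example of Section 3, the function $\mathcal{L}$ denotes the loss function of the ML module, i.e. in this semantic segmentation setting

$$\mathcal{L}(f(x), y) = \frac{1}{|I|} \sum_{(i,j) \in I} \mathcal{L}_{\mathrm{cls}}(f(x)_{ij}, y_{ij}), \tag{7}$$

with $I$ denoting the spatial dimensions (pixel positions) of an image and $\mathcal{L}_{\mathrm{cls}}$ the cross entropy classification loss.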

To solve the optimization problem of Equation (6), they follow an iterative gradient descent scheme, i.e. they exploit white-box knowledge of the victim model by using a first-order optimization method. However, the key contribution of Metzen et al. is the proposed generation of the adversarial targets $y^x_{tar}$. As already mentioned in Section 2.1.1, they distinguish between a static and a dynamic specificity target. In the static target case, one specific target segmentation is chosen for all input images, i.e. $y^x_{tar} = y^{static}_{tar}$ for every image $x$ of the training set. For the dynamic target of removing a certain classification class, $y^x_{tar}$ is determined by applying a nearest-neighbor heuristic to the predicted classification decision of the network. To be more precise, one substitutes all one-hot vectors of the target class by one-hot vectors which encode the nearest alternative non-target class.
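
The nearest-neighbor heuristic for the dynamic target can be sketched as follows. The sketch works on integer label maps rather than one-hot vectors and uses a brute-force distance computation, which is only practical for small images. The function name `dynamic_target` and the Euclidean pixel distance are assumptions made for illustration, not details confirmed by [27].

```python
import numpy as np

def dynamic_target(pred, remove_class):
    """Replace every pixel of `remove_class` by the label of the
    spatially nearest pixel that belongs to a different class."""
    target = pred.copy()
    keep = np.argwhere(pred != remove_class)   # (K, 2) pixel coordinates
    drop = np.argwhere(pred == remove_class)   # (M, 2) pixel coordinates
    if len(keep) == 0 or len(drop) == 0:
        return target
    # Squared Euclidean distance between every drop/keep pair (brute force).
    d2 = ((drop[:, None, :] - keep[None, :, :]) ** 2).sum(axis=2)
    nearest = keep[d2.argmin(axis=1)]
    target[drop[:, 0], drop[:, 1]] = pred[nearest[:, 0], nearest[:, 1]]
    return target

# Class 2 (e.g. a pedestrian) sits between regions of class 0 and class 1.
pred = np.array([[0, 0, 2, 1, 1],
                 [0, 2, 2, 2, 1],
                 [0, 0, 2, 1, 1]])
tar = dynamic_target(pred, remove_class=2)
```

All pixels that did not belong to the removed class keep their predicted label, so the target differs from the prediction only where the class is erased.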

Now, let us look at the static and the dynamic attack from the attack generator perspective of Section 3. As the imperceptibility measure we clearly have $\|\delta\|_\infty$, i.e. imperceptibility is measured by the $L_\infty$-norm of the perturbation $\delta$. The two attacks differ in their specificity objective, namely

$$M^{static}_{sp}(x, \delta) = l\big(F(x+\delta),\, y^{static}_{tar}\big) \quad \text{and} \quad M^{dynamic}_{sp}(x, \delta) = l\big(F(x+\delta),\, y^{x,dynamic}_{tar}\big),$$

with adversarial targets $y^{static}_{tar}$ and $y^{x,dynamic}_{tar}$ generated as described above. The scope measure is identical for the static and the dynamic attack formulation. It is simply the Monte Carlo estimate of the chosen specificity measure over the whole training set $\mathcal{D}$, thus

$$\frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} M_{sp}(x, \delta)$$

with $M_{sp} \in \{M^{static}_{sp}, M^{dynamic}_{sp}\}$. We recognize again that the attack optimization objective (see Equation (6)) is a composition of the adversary’s goal measures just defined, and that we are in a setting analogous to the example of Section 3 (see Equations (4) and (5)).
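
The composition of specificity, scope and imperceptibility can be made concrete with a toy example. The sketch below is not the setup of [27]: the victim model is a hypothetical per-pixel softmax whose weight matrix is the identity, the data are random, and the imperceptibility constraint is assumed to be an $L_\infty$ bound enforced by clipping. All function names are illustrative.

```python
import numpy as np

H, W, C = 4, 4, 3
weights = np.eye(C)  # toy victim model: channel k acts as the score of class k

def model(x):
    """Per-pixel softmax over linear scores; x has shape (H, W, C)."""
    logits = x @ weights
    logits = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

def specificity(x, delta, y_tar):
    """Targeted loss of the perturbed image and its gradient w.r.t. delta."""
    p = model(x + delta)
    onehot = np.eye(C)[y_tar]
    loss = -np.log((p * onehot).sum(axis=-1) + 1e-12).mean()
    grad = ((p - onehot) / (H * W)) @ weights.T
    return loss, grad

def scope(data, targets, delta):
    """Monte Carlo estimate: average the specificity measure over the data."""
    results = [specificity(x, delta, y) for x, y in zip(data, targets)]
    loss = np.mean([r[0] for r in results])
    grad = np.mean([r[1] for r in results], axis=0)
    return loss, grad

def attack(data, targets, eps=0.5, alpha=0.05, steps=50):
    """Signed gradient descent with projection onto the L_inf ball."""
    delta = np.zeros((H, W, C))
    for _ in range(steps):
        _, grad = scope(data, targets, delta)
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return delta

rng = np.random.default_rng(0)
data = rng.uniform(size=(8, H, W, C))
targets = [np.zeros((H, W), dtype=int)] * len(data)  # static target: class 0
delta = attack(data, targets)
loss_before, _ = scope(data, targets, np.zeros((H, W, C)))
loss_after, _ = scope(data, targets, delta)
```

Because each measure is a separate function, replacing `specificity` or `scope` by another measure yields a different attack without touching the optimization loop.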

Having worked out the different building blocks of the attacks, one can now think about exchanging individual elements in order to derive new attacks on semantic segmentation modules. Recall that an adversarial attack consists of an optimization problem and an optimization method (see Figure 1). Thus, one potential adaptation is the selection of a different optimization method. Within the attack strategy taxonomy dimension we provided two options other than first-order methods. In particular, evolution & random sampling strategies might be beneficial, because they facilitate a perturbation search even if the adversary does not have full knowledge of the semantic segmentation module. However, we want to focus on changes concerning the attack optimization problem given by Equation (6). Changes here are essentially equivalent to exchanging one or more of the three adversary’s goal measures.
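
To illustrate what exchanging the optimization method could look like, the following sketch replaces the gradient step by a simple random search that only queries loss values. It is a generic illustration of the random sampling idea, not a method from the paper: the quadratic `black_box_loss` merely stands in for an attack objective whose gradients are unavailable, and all names and parameter values are hypothetical.

```python
import numpy as np

def random_search_attack(loss_fn, shape, eps=0.5, sigma=0.1, steps=200, seed=0):
    """Query-only search: keep a random proposal whenever it lowers the loss."""
    rng = np.random.default_rng(seed)
    delta = np.zeros(shape)
    best = loss_fn(delta)
    for _ in range(steps):
        # Propose a Gaussian step and project it back onto the L_inf ball.
        candidate = np.clip(delta + sigma * rng.normal(size=shape), -eps, eps)
        value = loss_fn(candidate)
        if value < best:  # accept only improvements
            delta, best = candidate, value
    return delta, best

# Toy black box: the attacker observes loss values but never gradients.
hidden_optimum = np.array([0.3, -0.2, 0.4])

def black_box_loss(d):
    return float(((d - hidden_optimum) ** 2).sum())

delta, best = random_search_attack(black_box_loss, shape=(3,))
start = black_box_loss(np.zeros(3))
```

The optimization problem is untouched here; only the search procedure changes, which mirrors the separation between problem and method described above.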

Metzen et al. discuss static and dynamic targets, but they do not address untargeted or confusion specificity objectives (see Section 2.1.1). A potential confusion goal could be to enlarge a target class, e.g. to increase the size of the pedestrian class. This can be achieved by substituting the original specificity measures by the very similar measure $l\big(F(x+\delta),\, y^{x,conf}_{tar}\big)$, where $y^{x,conf}_{tar}$ is the adversarial confusion target for input image $x$. The only difference is the generation of the target segmentation $y^{x,conf}_{tar}$. Inspired by the original versions of the attacks, we again use a nearest-neighbor heuristic together with the predicted classification decisions to craft $y^{x,conf}_{tar}$. However, this time we exchange the one-hot vectors of the nearest neighbors of our target class and always insert the one-hot vector of the target class. This automatically leads to a target segmentation with an enlarged target class. For the implementation of this target generation, we used the nearest-neighbor interpolation of the OpenCV resize method [6]. Keeping all other adversary’s goal quantifiers the same, this gives us a confusion semantic segmentation attack. Figure 3 shows a sample result of this adapted attack on a self-trained ICNet for real-time semantic segmentation [44] (see also Appendix B).
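
The idea of an enlarged target class can be sketched without OpenCV. Note that this is a simplified stand-in and not the resize-based implementation mentioned above: it grows the target class by a square dilation on an integer label map. The function name `enlarge_class` and the `radius` parameter are assumptions made for illustration.

```python
import numpy as np

def enlarge_class(pred, target_class, radius=1):
    """Assign `target_class` to every pixel within `radius` (Chebyshev
    distance) of a pixel that is already predicted as `target_class`."""
    mask = pred == target_class
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    grown = np.zeros_like(mask)
    # OR together all shifted copies of the mask (a square dilation).
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            grown |= padded[dy:dy + h, dx:dx + w]
    target = pred.copy()
    target[grown] = target_class
    return target

pred = np.zeros((5, 5), dtype=int)
pred[2, 2] = 7  # a single pixel of the class to be enlarged
tar = enlarge_class(pred, target_class=7, radius=1)
```

The resulting label map can be used as the confusion target in the same loss as before, since only the target generation differs from the static and dynamic cases.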

If one wants to keep the initial static and dynamic specificity measures, one could alternatively experiment with different imperceptibility quantifiers. Within the proposed taxonomy, we presented attention-based imperceptibility as an alternative option to $L_p$-based imperceptibility. This imperceptibility option contains a lot of interesting perturbation concepts, e.g. adversarial perturbations generated through spatial transformation [42]. In the spatial transformation setting, the adversarial perturbation $\delta$ is a flow field which summarizes the per-pixel transformations of an image $x$ in order to get to the adversarial example $x_{adv}$. Hence, $x_{adv} = \mathcal{T}(x, \delta)$ with $\mathcal{T}$ being the function that applies the transformations of $\delta$ to the original image $x$. As an imperceptibility measure, one can then consider the total variation of the flow field $\delta$:

$$TV(\delta) = \sum_{p} \sum_{q \in N(p)} \|\delta_p - \delta_q\|_2,$$

where $N(p)$ are the image coordinates of the 4-pixel neighbors of coordinate $p$. Note that $\delta$ is a flow field and hence $\delta_p \in \mathbb{R}^2$ for any spatial coordinate $p$. With this in mind, we can formulate an attention-based version of the presented semantic segmentation attacks:

$$\min_{\delta} \; \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} l\big(F(\mathcal{T}(x, \delta)),\, y^x_{tar}\big) + \lambda \, TV(\delta),$$

with weighting factor $\lambda$, attack input set $\mathcal{D}$, and $y^x_{tar}$ the static or dynamic target label of image $x$. In Figure 3, we present an exemplary result of this attack on a traffic situation containing a pedestrian (see also Appendix B).
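
The total variation of a flow field can be computed in a few lines. The sketch assumes the measure is the sum, over all pixels and their 4-pixel neighbors, of the Euclidean distance between the two displacement vectors, in the spirit of the flow loss in [42]. The array layout (H, W, 2) and the function name `flow_total_variation` are choices made for this example.

```python
import numpy as np

def flow_total_variation(flow):
    """Sum over all pixels p and their 4-neighbours q of ||flow[p] - flow[q]||_2.

    flow: (H, W, 2) array holding one 2-D displacement vector per pixel.
    """
    dy = flow[1:, :, :] - flow[:-1, :, :]   # vertical neighbour differences
    dx = flow[:, 1:, :] - flow[:, :-1, :]   # horizontal neighbour differences
    pair_sum = np.linalg.norm(dy, axis=-1).sum() + np.linalg.norm(dx, axis=-1).sum()
    # Every neighbour pair is counted twice in the double sum: (p, q) and (q, p).
    return float(2.0 * pair_sum)

# A constant flow shifts all pixels identically and has zero total variation.
constant = np.ones((4, 4, 2))
tv_constant = flow_total_variation(constant)

# A flow that moves only the right column differs across two neighbour pairs.
step = np.zeros((2, 2, 2))
step[:, 1, 0] = 1.0
tv_step = flow_total_variation(step)
```

A small value means neighboring pixels move almost identically, which is why this measure favors smooth, hard-to-notice deformations over noisy ones.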

5 Conclusion

In this paper, we present a comprehensive adversarial perturbation taxonomy together with options for the adversary linked to every taxonomy dimension. We describe adversarial attacks as a composition of various elements which are closely related to the options of the given taxonomy. In particular, we illustrate the crucial role of adversary’s goal measures in the creation of new adversarial attacks. This structured view on adversarial attacks facilitates the construction of sets of new attacks by investigating new specificity, perturbation imperceptibility and perturbation scope measures. Our experimental adaptations of existing semantic segmentation attacks demonstrate the benefits of this modular view on adversarial attacks.

We propose a change in the publication style of adversarial attacks. We are convinced that a stronger focus on the exploration of new potential attack building blocks, instead of on presenting fully assembled attacks, will help structure the adversarial attack research field and, furthermore, will accelerate the development of large, diverse sets of benchmark adversarial attacks.


  • [1] A. Arnab, O. Miksik, and P. Torr (2018) On the robustness of semantic segmentation models to adversarial attacks. IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 888–897. Cited by: 1st item.
  • [2] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok (2017) Synthesizing robust adversarial examples. PMLR 80, pp. 284–293. Cited by: Appendix A, 2nd item.
  • [3] B. Biggio, S. R. Bulo, I. Pillai, M. Mura, E. Z. Mequanint, M. Pelillo, and F. Roli (2014) Poisoning complete-linkage hierarchical clustering. SSPR 8621, pp. 42–52. Cited by: §2.1.2, §2.1.3.
  • [4] B. Biggio, G. Fumera, and F. Roli (2014) Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering 26. Cited by: §2.
  • [5] B. Biggio and F. Roli (2018) Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognition 84, pp. 317–331. Cited by: §2.
  • [6] G. Bradski (2000) The opencv library. Dr. Dobb’s Journal of Software Tools. Cited by: §4.
  • [7] W. Brendel, J. Rauber, and M. Bethge (2018) Decision-based adversarial attacks: reliable attacks against black-box machine learning models. ICLR. Cited by: Appendix A, §1.
  • [8] T. Brunner, F. Diehl, M. T. Le, and A. Knoll (2018) Guessing smart: biased sampling for efficient black-box adversarial attacks. arXiv: 1812.09803v2. Cited by: 3rd item.
  • [9] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, A. Mądry, and A. Kurakin (2019) On evaluating adversarial robustness. arXiv: 1902.06705v2. Cited by: §2.
  • [10] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv: 1607.02533v4. Cited by: §1.
  • [11] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. IEEE Symposium on Security and Privacy, pp. 39–57. Cited by: §1.
  • [12] Y. Du, M. Fang, J. Yi, J. Cheng, and D. Tao (2018) Towards query efficient black-box attacks: an input-free perspective. Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pp. 13–24. Cited by: 3rd item.
  • [13] K. Dvijotham, S. Gowal, R. Stanforth, R. Arandjelovic, B. O’Donoghue, J. Uesato, and P. Kohli (2018) Training verified learners with learned verifiers. arXiv preprint: 1805.10265v2. Cited by: §1.
  • [14] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry (2017) A rotation and a translation suffice: fooling cnns with simple transformations. arXiv: 1712.02779v3. Cited by: §1.
  • [15] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramer, A. Prakash, T. Kohno, and D. Song (2018) Physical adversarial examples for object detectors. WOOT’18 Proceedings of the 12th USENIX Conference on Offensive Technologies. Cited by: 2nd item.
  • [16] S. Gidaris and N. Komodakis (2015) Object detection via a multi-region & semantic segmentation-aware cnn model. IEEE International Conference on Computer Vision (ICCV), pp. 1134–1142. Cited by: §1.
  • [17] J. Gilmer, R. Adams, I. Goodfellow, D. Andersen, and G. Dahl (2018) Motivating the rules of the game for adversarial example research. arXiv: 1807.06732v2. Cited by: §2, §2.
  • [18] I. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv: 1412.6572v3. Cited by: Appendix A, §1.
  • [19] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin (2018) Black-box adversarial attacks with limited queries and information. PMLR 80, pp. 2137–2146. Cited by: §2.1.2.
  • [20] A. Kurakin, I. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang, T. Pang, J. Zhu, X. Hu, and C. X. et al. (2018) Adversarial attacks and defences competition. The NIPS ’17 Competition: Building Intelligent Systems. Cited by: §2.1.2.
  • [21] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial machine learning at scale. arXiv: 1611.01236v2. Cited by: §1.
  • [22] Y. Liu, X. Chen, C. Liu, and D. Song (2016) Delving into transferable adversarial examples and black-box attacks. arXiv: 1611.02770v3. Cited by: 2nd item.
  • [23] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440. Cited by: §1.
  • [24] B. Luo, Y. Liu, L. Wei, and Q. Xu (2018) Towards imperceptible and robust adversarial example attacks against neural networks. AAAI Publications, Thirty-Second AAAI Conference on Artificial Intelligence, pp. 1652–1659. Cited by: §1.
  • [25] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. ICLR. Cited by: Appendix A, §1.
  • [26] D. Meng and H. Chen (2017) MagNet: a two-pronged defense against adversarial examples. CCS, pp. 135–147. Cited by: 4th item.
  • [27] J. H. Metzen, M. Kumar, T. Brox, and V. Fischer (2017) Universal adversarial perturbations against semantic image segmentation. IEEE International Conference on Computer Vision (ICCV), pp. 2774–2783. Cited by: 3rd item, 2nd item, 3rd item, 3rd item, 4th item, §3, §4, §4.
  • [28] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard (2017) Universal adversarial perturbations. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 86–94. Cited by: 3rd item.
  • [29] M. Nicolae, M. Sinn, M. N. Tran, A. Rawat, M. Wistuba, V. Zantedeschi, N. Baracaldo, B. Chen, H. Ludwig, I. M. Molloy, and B. Edwards (2018) Adversarial robustness toolbox v0.4.0. arXiv: 1807.01069v3. Cited by: §1.
  • [30] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. Cited by: §2.
  • [31] N. Papernot, F. Faghri, N. Carlini, I. Goodfellow, R. Feinman, A. Kurakin, C. Xie, Y. Sharma, T. Brown, and A. R. et al. (2016) Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv: 1610.00768v6. Cited by: §1.
  • [32] J. Perolat, M. Malinowski, B. Piot, and O. Pietquin (2018) Playing the game of universal adversarial perturbations. arXiv: 1809.07802v2. Cited by: 3rd item.
  • [33] A. Raghunathan, J. Steinhardt, and P. Liang (2018) Semidefinite relaxations for certifying robustness to adversarial examples. NIPS. Cited by: §1.
  • [34] J. Rauber, W. Brendel, and M. Bethge (2017) Foolbox: a python toolbox to benchmark the robustness of machine learning models. arXiv: 1707.04131v3. Cited by: §1.
  • [35] A. Serban, E. Poll, and J. Visser (2018) Adversarial examples - a complete characterisation of the phenomenon. arXiv: 1810.01185v2. Cited by: §2.
  • [36] M. Sharif, L. Bauer, and M. K. Reiter (2018) On the suitability of lp-norms for creating and preventing adversarial examples. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1686–1688. Cited by: §1, 2nd item.
  • [37] J. Su, D. V. Vargas, and K. Sakurai (2019) One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation. Cited by: §1.
  • [38] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv: 1312.6199v4. Cited by: Appendix A, §1, 2nd item.
  • [39] F. Tramer, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel (2018) Ensemble adversarial training: attacks and defenses. ICLR. Cited by: §1.
  • [40] E. Wong and J. Kolter (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. Proceedings of the 35th International Conference on Machine Learning 80, pp. 5286–5295. Cited by: §1.
  • [41] E. Wong, F. R. Schmidt, and J. Z. Kolter (2019) Wasserstein adversarial examples via projected Sinkhorn iterations. arXiv: 1902.07906. Cited by: §1, 2nd item.
  • [42] C. Xiao, J. Zhu, B. Li, W. He, M. Liu, and D. Song (2018) Spatially transformed adversarial examples. ICLR. Cited by: 2nd item, §4.
  • [43] X. Yuan, P. He, Q. Zhu, and X. Li (2019) Adversarial examples: attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §1, §2.
  • [44] H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia (2018) ICNet for real-time semantic segmentation on high-resolution images. Computer Vision – ECCV, pp. 418–434. Cited by: §4.

Appendix A Analysis of Existing Attacks

| Attack Name | Opt. Problem | Opt. Method | Specificity | Scope | Imperceptibility |
| --- | --- | --- | --- | --- | --- |
| FGSM [18] | With $y^x$ the true label of image $x$ and $l$ the loss function of the neural network. | First Order Method. Use scaled sign of gradient. | Measure: $-l(F(x+\delta), y^x)$ | Individual Scope. Only considering a single image. Measure: $id(M_{sp}(\delta))$ | $L_p$-based Imperceptibility. Measure: $\lVert\delta\rVert_\infty$ |
| L-BFGS [38] | With $y^x_{tar}$ the adversarial target. | Second Order Method. Box-constrained L-BFGS is a quasi-Newton method. | $y^x_{tar}$ is the desired outcome. Measure: $l(F(x+\delta), y^x_{tar})$ | Individual Scope. Only considering a single image. Measure: $id(M_{sp}(\delta))$ | $L_p$-based Imperceptibility. Measure: $\lVert\delta\rVert_2$ |
| PGD [25] | With $y^x$ the true label of $x$ and $l$ the loss function of the neural network. | First Order Method. Multi-step gradient descent combined with projections. | Measure: $-l(F(x+\delta), y^x)$ | Individual Scope. Only considering a single image. Measure: $id(M_{sp}(\delta))$ | $L_p$-based Imperceptibility. Measure: $\lVert\delta\rVert_\infty$ |
| Boundary Attack [7] | With $c$ the adversarial criterion for data point $x$. | Evolution & Random Sampling. The method is initialized from an adversarial point and then performs a random walk along the decision boundary. | Untargeted / Targeted. Depends on the choice of adversarial criterion (and initialization point). Measure: $c(x+\delta)$ | Individual Scope. Only considering a single image. Measure: $id(M_{sp}(\delta))$ | $L_p$-based Imperceptibility. Measure: $\lVert\delta\rVert_2^2$ |
| EOT [2] | With $T$ the distribution of transformation functions, $d$ the distance function and $y^x_{tar}$ the target. | First Order Method. Use projected gradient descent. | Measure also takes the transformation $t$ as input. Measure: $-\log P(y^x_{tar} \mid t(x+\delta))$ | Consider one image $x$ and relevant transformations of $x$. Measure: $E_{t \sim T}[M_{sp}(t, \delta)]$ | $L_p$-based / Attention-based Imperceptibility. Depends on the choice of distance. Measure: $d(\cdot,\cdot)$ |

Appendix B Sample Results