Enhancing ML Robustness Using Physical-World Constraints

by Varun Chandrasekaran, et al.

Recent advances in Machine Learning (ML) have demonstrated that neural networks can exceed human performance in many tasks. While generalizing well over natural inputs, neural networks are vulnerable to adversarial inputs: inputs that are "similar" to the original input but misclassified by the model. Existing defenses focus on Lp-norm bounded adversaries that perturb ML inputs in the digital space. In the real world, however, attackers can generate adversarial perturbations that have a large Lp-norm in the digital space. Additionally, these defenses come at a cost to accuracy, making their applicability questionable in the real world. To defend models against such a powerful adversary, we leverage one constraint on its power: the perturbation should not change the human's perception of the physical information; the physical world places some constraints on the space of possible attacks. Two questions follow: how to extract and model these constraints, and how to design a classification paradigm that leverages these constraints to improve the robustness-accuracy trade-off? We observe that an ML model is typically a part of a larger system with access to different input modalities. Utilizing these modalities, we introduce invariants that limit the attacker's action space. We design a hierarchical classification paradigm that enforces these invariants at inference time. As a case study, we implement and evaluate our proposal in the context of road sign classification, a real-world application chosen for its relevance to autonomous driving. With access to different input modalities, such as LiDAR, camera, and location, we show how to extract invariants and develop a hierarchical classifier. Our results on the KITTI and GTSRB datasets show that we can improve the robustness against physical attacks at minimal harm to accuracy.


1 Introduction

Widespread adoption of Machine Learning (ML) for critical tasks brought along the question of trust: Are ML models robust in making correct decisions when safety is at risk? Despite the significant advances in deep learning, near-human performance on several tasks did not translate to robustness in adversarial settings [42]. Several ML models, especially Deep Neural Networks (DNNs), are found susceptible to perturbed inputs with grave safety implications [4, 21, 32]. An attacker can adversarially modify an input example, such as pixels in an image, so that a classifier would mislabel the input. Such perturbations leave semantics of the input unperturbed, to the point that humans are often unable to discern differences between the original input and the perturbed input.

While many defenses have been proposed to make ML models more robust, only a few have survived the onslaught of attacks [2]. Among these defenses are certification [34], randomized smoothing [11, 27], and adversarial training [30]. The common theme across these defenses is to ensure that the model’s output is stable within an $\ell_p$-norm ball around the input.

These defenses against adversarial examples have largely been detached from the real world. They focus on providing robustness guarantees within an $\ell_p$-norm ball around an input without considering the semantic and contextual properties of the classification problem. Further, these defenses suffer significant degradation in accuracy [43, 24, 11, 27]. The main problem is that $\ell_p$-norm bounds are incapable of capturing the geometry of decision boundaries in the task’s original input domain [24], which leads to the tension between accuracy and robustness. Recent results in the literature have demonstrated that there are settings where robustness and accuracy cannot be simultaneously achieved [43, 5, 1]. Thus, the design of efficient and robust DNN classifiers remains an open research problem [7].

How can we defend against powerful adversaries and avoid the shortcomings of previous defenses? Fortunately, there is one constraint on the power of the adversary: the perturbation should not change the human’s interpretation of the physical object; the physical world places some constraints on the space of possible attacks. These physical constraints, when enforced on the attacker, limit its attack space – making the classifier more robust. In this paper, we show that we can use physical constraints to improve robustness at no additional accuracy cost.

Figure 1: The LiDAR depth measurements are immune to physical perturbations, including changes in lighting, and any stickers placed on the US road sign.

Take the example of classifying a US road sign (Fig. 1). A US road sign classified as a stop sign has to obey shape, color, and location constraints. The shape of the road sign is an octagon, its color is mostly red, and its location must be at an intersection. If the defender has access to shape, color, and/or location information, then it can narrow down the set of possible labels. An attacker can no longer generate arbitrary perturbations; its perturbations have to obey the constraints that the defender imposes. Forcing the adversary to satisfy constraints from the physical world increases the attack’s cost relative to its resources, thereby satisfying Saltzer and Schroeder’s “work factor” principle for the design of secure systems [36].

This paper addresses three questions related to employing physical constraints. First, how can we extract and model these physical constraints? Second, can we design a classification regime that utilizes these constraints? Third, does the accuracy-robustness trade-off improve?

Modeling Constraints

We model physical world constraints as invariants or relations on the model’s input space. Two input samples abide by a constraint if they satisfy the invariant. We present different examples of invariants and show that they generalize previously proposed concepts, such as robust features [43, 23]. Enforcing these invariants on an attacker introduces additional constraints on its adversarial space, which we demonstrate improves robustness. In particular, we prove that applying invariants affects results from literature that present statistical [43] and computational [1] settings where accuracy and robustness fail to coexist (Sec. 3).

Hierarchical Classification

In the real world, it is useful to view the classification problem within the context of a larger system, with access to multi-modal or multi-sensor inputs. We consider invariants that utilize additional modalities to define high-level attributes. These attributes split the output space of the classifier into equivalence classes, thereby partitioning the classification problem into smaller pieces. Leveraging the invariants, we design a hierarchical classification scheme that mimics a decision tree. At the decision nodes of the hierarchy, we have classifiers that extract the high-level attributes. At the leaves of the hierarchy, we have classifiers that predict within an equivalence class of labels (Sec. 


Robustness-Accuracy Improvement

Employing the invariants within the hierarchical classifier provides robustness gains on two levels. At the high level, a sequence of invariants limits the prediction to an equivalence class of labels. Equivalence classes limit the adversary’s targeted attack capability; the adversary can only target labels within the equivalence class. At the lower level, we show that reducing the number of labels improves the robustness of the classifier compared to the original classifier predicting within the full set of labels. More importantly, we show that these gains in robustness do not harm accuracy (Sec. 4.2 and Sec. 4.3).

As a case study of the real-world application of the invariants and the proposed hierarchical classifier, we study the problem of road sign classification in the context of autonomous vehicles. This study is motivated by recent work which demonstrates that both state-of-the-art machine learning classifiers and detectors are susceptible to real-world physical perturbations [13, 12]. Existing defenses can do little to protect against these physical perturbations, which are characterized by large $\ell_p$-norms.

We demonstrate that invariants (consequently improved robustness) can be realized by a combination of additional modalities, such as LiDAR point clouds and GPS information. These invariants place physical, structural, and semantic constraints on the classification task. Fig. 1 shows an example of a perturbed STOP sign using adversarial patches from previous work [13]. This sign is misclassified as a speed limit sign. The depth map extracted from LiDAR shows that the shape of the sign is still an octagon, which could never refer to a speed limit sign. Sec. 5 further elaborates on using physical constraints to improve the robustness of the safety-critical task of road sign classification. Our results on the GTSRB [41] and KITTI [16] datasets show that:

  1. Without additional (re)training, the proposed hierarchical classification approach improves the robustness for the same model architecture while keeping the accuracy comparable. For triangular signs, our proposed approach yields an average increase in robustness by 60% (Sec. 5.3). The insight enabling this result is that state-of-the-art physically realizable adversarial attacks do not modify the robust features under consideration.

  2. Extending the hierarchy using multiple invariants further improves robustness. In our case study, we combined shape and location constraints to obtain an average increase of more than 80% for speed limit signs (circular signs at a given geographic location) (Sec. 5.3.3).

To recap, this paper introduces invariants to model the physical world constraints. Then, it provides the design and analysis of a hierarchical classification paradigm that utilizes these invariants. We evaluate the effectiveness of our proposed classification paradigm using road sign classification as a case study. While our case study is within the context of computer vision, the underlying ideas and techniques apply to other domains, such as audio classification.

2 Background

This section provides the background necessary for our discussion on invariants and hierarchical classifiers. We present the notation on classification, follow with a threat model, and outline recent defense mechanisms.

2.1 Classification and Notation

We represent a classification task as a function $F$ which accepts an input $x$, where $x \in \mathbb{R}^n$, and outputs $y = F(x)$. We consider classifiers where $F(x)$ is a probability distribution over a set of labels $\mathcal{L}$. We denote by $F(x)_l$ the probability that the input $x$ has the label $l$, such that $0 \le F(x)_l \le 1$ and $\sum_{l \in \mathcal{L}} F(x)_l = 1$. In neural networks, $F$ usually refers to the softmax layer of the network. The classifier assigns the input $x$ a label $c_A(x) = \arg\max_{l} F(x)_l$. The runner-up label is defined as $c_B(x) = \arg\max_{l \neq c_A(x)} F(x)_l$ and the true label as $c^*(x)$.

The data distribution $\mathcal{D}$ is a distribution over $\mathbb{R}^n \times \mathcal{L}$; this is the distribution from which the data is drawn.
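
As a small illustration of the notation in this subsection, the sketch below computes a softmax distribution over labels and reads off the top and runner-up labels. The function names and the example scores are assumptions made for illustration; they are not taken from the paper.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over labels."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_and_runner_up(probs):
    """Return (index of the top label, index of the runner-up label)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[0], order[1]

# Hypothetical scores for a three-label task.
probs = softmax([2.0, 0.5, 1.0])
top, runner_up = top_and_runner_up(probs)
```

The margin between the top and runner-up probabilities is the quantity that the certified defenses discussed in Sec. 2.3 build on.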

2.2 Threat Model

We consider attacks on the classification task, where an attacker applies a perturbation to the input to change the classification output. A targeted attacker generates a perturbation, $\delta$, that modifies the classification output to a predetermined label, $t$. Formally, the attacker aims to have $\arg\max_{l} F(x + \delta)_l = t$. On the other hand, an untargeted attacker seeks a perturbation $\delta$ such that $\arg\max_{l} F(x + \delta)_l \neq c^*(x)$, the true label of $x$. In this paper, we define natural accuracy as the model’s accuracy on unperturbed inputs, and adversarial accuracy as the model’s accuracy on adversarially perturbed inputs.

We consider a white-box attacker with full knowledge of the model’s architecture and parameters. Given this knowledge, the attacker can perform two types of attacks: digital-space and physical-space attacks. Digital-space attacks perturb the direct inputs to the ML model; such attacks are not necessarily realized in the physical world. These attacks typically involve a solution to an optimization problem: search for the smallest perturbation $\delta$ (measured in the $\ell_p$-norm) to be added to the input $x$, such that the perturbed instance $x + \delta$ is misclassified by the classifier:

$$\min_{\delta} \; \|\delta\|_p \quad \text{subject to} \quad \arg\max_{l} F(x + \delta)_l \neq c^*(x).$$

Attacks in the physical space move beyond digitally manipulated input images to craft realizable adversarial examples. Generating physically realizable adversarial examples involves modifying the optimization problem above to account for physical transformations. For example, these transformations manifest in varying lighting conditions, angles, and distances in the image domain [3], or in different acoustic channel models in the audio case [47]. Other physical attacks involve “semantic” adversarial examples [13, 12, 37, 38, 26] in which the adversary adds physical objects that change the classification output. This process involves constraining the optimization problem by introducing structure into adversarial examples (e.g., patches in images).

2.3 Robustness Certificates

Different defense approaches aim at providing a robustness certificate for a classifier [27, 28, 11, 34]. The robustness certificate (denoted as $R$) implies the following:

$$\arg\max_{l} F(x')_l = \arg\max_{l} F(x)_l \quad \text{for all } x' \text{ with } \|x - x'\|_p \le R.$$

The above means that the predicted label of $x$ does not change in the $\ell_p$-ball of radius $R$ around $x$. The $\ell_p$-ball is formally defined as: $\{x' : \|x - x'\|_p \le R\}$. There are several robust defenses in the literature. We focus on one representative approach: randomized smoothing, which scales to large networks but provides conservative robustness bounds. Other approaches [34] attempt to bound the local Lipschitz constant of the network around an input to provide robustness guarantees. These approaches do not scale to large networks.

Randomized Smoothing

This approach transforms a classifier to a smoothed one that can certify robustness of an input in the $\ell_p$-norm space [27, 28, 11]. Smoothing a classifier involves adding noise to the input, drawn from a zero-mean distribution, and training the network on these noisy inputs. The prediction of the smoothed classifier is taken by estimating the probability of each class’s decision region under the distribution of the added noise.

In the most recent approach of Cohen et al. [11], the smoothed classifier’s prediction is defined as the class whose decision region has the highest probability under $\mathcal{N}(x, \sigma^2 I)$. The intuition is that if the adversarial perturbation ($\delta$) is small enough, the probability measure of the decision regions should be similar between $\mathcal{N}(x, \sigma^2 I)$ and $\mathcal{N}(x + \delta, \sigma^2 I)$ [11]. The probability of each class’s decision region is empirically estimated through running Monte Carlo simulations under the distribution $\mathcal{N}(x, \sigma^2 I)$.

A lower bound on the robustness certificate of each input $x$ for the smoothed classifier is given as:

$$R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right),$$

where $\Phi^{-1}$ is the inverse of the standard Gaussian CDF, $\underline{p_A}$ is the lower bound of $p_A$ (the probability of the top label), $\overline{p_B}$ is the upper bound of $p_B$ (the probability of the runner-up label), and the noise is drawn from $\mathcal{N}(0, \sigma^2 I)$. Other certified smoothing approaches have similar robustness bounds that are functions of the margin between the top and runner-up labels in the base classifiers [27].
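
To make the dependence on the top/runner-up margin concrete, the following sketch evaluates the bound of Cohen et al. [11], sigma/2 times the difference of the inverse Gaussian CDF at the two probability bounds. The function name, the example probabilities, and the choice to return 0 when there is no margin are assumptions for illustration; estimating the probability bounds by Monte Carlo sampling is not shown.

```python
from statistics import NormalDist

def certified_radius(p_top_lower, p_runner_upper, sigma):
    """Radius sigma/2 * (Phi^-1(p_top_lower) - Phi^-1(p_runner_upper)).

    Both probabilities are assumed to lie strictly between 0 and 1.
    Returns 0.0 when the top label has no margin over the runner-up
    (an assumption of this sketch; a real system might abstain instead).
    """
    if p_top_lower <= p_runner_upper:
        return 0.0
    inv_cdf = NormalDist().inv_cdf  # inverse CDF of the standard Gaussian
    return 0.5 * sigma * (inv_cdf(p_top_lower) - inv_cdf(p_runner_upper))

# Hypothetical bounds: top label at least 0.9, runner-up at most 0.1.
radius_wide_margin = certified_radius(0.9, 0.1, sigma=0.5)
radius_narrow_margin = certified_radius(0.6, 0.4, sigma=0.5)
```

A wider margin between the two labels yields a larger certified radius, which is one reason why reducing the number of competing labels can help robustness.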

2.4 Adversarial Training

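Finally, adversarial training is an empirical defense strategy that builds on robust optimization. One such defense was proposed by Madry et al. [30]. This strategy casts the defender’s problem as a min-max formulation [45, 22]:

$$\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y) \right],$$

where $\theta$ denotes the model parameters, $L$ the training loss, and $\mathcal{S}$ the set of allowed perturbations. This approach trains a classifier that minimizes the adversarial loss under an $\ell_p$-norm bounded adversary. The inner max is solved using any attack that generates adversarial examples within $\mathcal{S}$. Madry et al. employ the Projected Gradient Descent (PGD) attack, which works well for $\ell_p$-norm bounded attackers.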
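
The inner maximization can be sketched with a projected-gradient loop. The example below is a minimal illustration on a hypothetical two-feature linear model with logistic loss and an infinity-norm budget; it is not the DNN setup of Madry et al., and all weights, step sizes, and names are assumptions.

```python
import math

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of log(1 + exp(-y * (w.x + b))) with respect to x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    coeff = -y / (1.0 + math.exp(y * score))
    return [coeff * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x0, y, eps, step, iters):
    """Ascend the loss with signed gradient steps, projecting back onto
    the infinity-norm ball of radius eps around x0 after every step."""
    x = list(x0)
    for _ in range(iters):
        grad = loss_grad_wrt_input(w, b, x, y)
        x = [xi + step * sign(gi) for xi, gi in zip(x, grad)]
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical model and input; the clean input is classified as +1.
w, b = [1.0, -2.0], 0.0
x0, y, eps = [1.0, -0.5], 1, 0.8
x_adv = pgd_linf(w, b, x0, y, eps=eps, step=0.2, iters=10)
```

In adversarial training, the perturbed input found by such a loop replaces the clean input in the outer minimization step.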

Another adversarial training technique is based on distributional robust optimization [39]. The min-max formulation is similar to the above with two exceptions. First, the inner max is taken over perturbing the input data distribution in a Wasserstein ball. Second, the inner max is formulated as maximizing the loss under the perturbed distribution subject to the Lagrangian penalty of the Wasserstein distance between the original and perturbed data distributions.

3 Invariants

In their quest to provide $\ell_p$-norm robustness bounds, defense approaches degrade accuracy on natural (unperturbed) examples [30, 11, 27]. While simple architectures on simple datasets (such as MNIST) maintain a relatively high natural accuracy (larger than 90%), other architectures suffer a significant drop in accuracy, reaching the low 40s [11]. Most realistic prediction tasks are not MNIST-like, and robust defenses come at a considerable cost in accuracy, even for modest robustness bounds.

In this paper, we propose enforcing real-world constraints to limit the adversary’s attack space. We show that constraining the attacker through additional information from the physical world improves the defender’s trade-off between accuracy and robustness. Three questions arise related to employing physical constraints: (1) how to extract and model these physical constraints? (2) how to design a classification technique that employs these constraints? and (3) how to quantify the resulting accuracy-robustness trade-off? The rest of this section addresses the first question, while Sec. 4 addresses the second and third questions.

3.1 Invariant Model

We model a physical-world constraint as an invariant $\mathcal{I} \subseteq \mathbb{R}^n \times \mathbb{R}^n$, a relation on the model’s input space. A physical constraint limits the adversary to the invariant so that it can perturb $x$ to $x'$ but has to maintain: $(x, x') \in \mathcal{I}$. This formulation allows us to express different forms of the physical invariants. Two examples of invariants are:

  • Suppose $f$ is a function on the input space (e.g., for intuition, think of $f(x)$ as the shape of image $x$). One formulation for the invariant is as follows:

    $$\mathcal{I}_f = \{(x, x') : f(x) = f(x')\}.$$

    In other words, the adversary needs to “preserve” the output of the function $f$. In our shape example, this invariant corresponds to the adversary not being able to change the shape. For example, if $f(x)$ corresponds to a rectangle, then the adversary is constrained to perturbed inputs whose shape is still a rectangle.

  • Let $\equiv$ be an equivalence relation on the set of labels $\mathcal{L}$. Let $c^*(x)$ be the true label of $x$. Consider the invariant as follows:

    $$\mathcal{I}_{\equiv} = \{(x, x') : \arg\max_{l} F(x')_l \equiv c^*(x)\}.$$

    In other words, the adversary is constrained to stay in an equivalence class of labels.

The first invariant partitions the input space of the classifier, but not necessarily its output space. Two input samples might have the same label but have different values of $f$. For example, two objects classified as the same class might have different colors (if $f$ is defined as the color of the object).

Another example of an invariant generalizes robust features [43, 23]. Suppose the input space is $\mathbb{R}^n$, and the adversary is not allowed to change the first $k$ dimensions (i.e., these are robust features). In this case, we can define the invariant as the following relation:

$$\mathcal{I}_k = \{(x, x') : \pi_k(x) = \pi_k(x')\},$$

where $\pi_k$ is the projection to the first $k$ dimensions.

These features are robust to adversarial perturbations, and any function on them is immune to adversarial perturbations as well, since its input is identical before and after the perturbation. There are different ways of extracting such robust features. A recently proposed method utilizes the very slow process of adversarial training to reveal robust features. This method, however, incurs a considerable accuracy cost for modest perturbations [23]. For example, training with robust features that are extracted from an adversarially trained model (with ) on CIFAR-10 yields a natural accuracy of less than 85% compared to 95.3% on the original model.
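
The three kinds of invariants described in this subsection can be sketched as predicates over pairs of an original and a perturbed input. This is only an illustrative encoding: the toy input layout (a shape code followed by a pixel value), the label names, and the helper names are all assumptions.

```python
def function_invariant(f):
    """Allow (x, x_adv) only if f takes the same value on both inputs."""
    return lambda x, x_adv: f(x) == f(x_adv)

def projection_invariant(k):
    """Allow (x, x_adv) only if the first k coordinates are unchanged."""
    return lambda x, x_adv: tuple(x[:k]) == tuple(x_adv[:k])

def label_class_invariant(predict, true_label, class_of):
    """Allow (x, x_adv) only if the predicted label of x_adv stays in the
    equivalence class of the true label of x."""
    return lambda x, x_adv: class_of[predict(x_adv)] == class_of[true_label(x)]

# Hypothetical toy inputs: (shape code, pixel value). The shape code stands
# in for a measurement that the attacker cannot easily change.
same_shape = function_invariant(lambda x: x[0])
first_coord_fixed = projection_invariant(1)

class_of = {"stop": "octagon", "speed_30": "circle", "speed_50": "circle"}
predict = lambda x: "speed_30" if x[1] > 0.5 else "stop"  # toy classifier
true_label = lambda x: "stop"
in_class = label_class_invariant(predict, true_label, class_of)

x = ("octagon", 0.2)
sticker_attack = ("octagon", 0.9)   # pixels changed, shape preserved
reshaped_attack = ("circle", 0.2)   # shape changed
```

Here the sticker attack satisfies the shape invariant but violates the label-class invariant, because it moves the toy classifier's prediction from an octagonal sign to a circular one.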

Another avenue for extracting these robust features is to view the classification problem within the context of a larger system. Classifiers are typically deployed in multi-modal or multi-sensor settings. Utilizing these extra modalities, which the attacker cannot easily perturb, provides an avenue for robust features. For example, consider the road sign classification problem, where smart vehicles have access to sensors such as LiDAR and localization. While features from these sensors might not be strongly predictive of the correct label, they can assist in identifying the road sign. These features enable the extraction of high-level semantics such as the speed limit or the shape of the road sign. Using LiDAR, we can identify the shape of the road sign with very high accuracy. A circular shape, for example, does not indicate the actual label, but it narrows down the classification task from 43 original labels to 23 labels.

3.2 Why do Invariants Help?

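Intuitively, invariants help in limiting the attacker’s actions. A typical attacker aims to minimize the size of the adversarial perturbation over the space of adversarial examples, $\mathcal{S}$. This space could refer to an $\ell_p$-norm ball around an input in the case of digital attacks, or a set of general transformations (e.g., adding patches, rotations) in the case of physical attacks. We can pose the attacker’s problem as:

$$\min_{x'} \; d(x, x') \quad \text{subject to} \quad \arg\max_{l} F(x')_l \neq c^*(x), \quad x' \in \mathcal{S},$$

where $d$ refers to the distance in some metric space. Enforcing an invariant $\mathcal{I}$ on the attacker changes the second condition from $x' \in \mathcal{S}$ to $x' \in \mathcal{S}_{\mathcal{I}}$, where $\mathcal{S}_{\mathcal{I}} = \{x' \in \mathcal{S} : (x, x') \in \mathcal{I}\}$. If $\mathcal{S}_{\mathcal{I}} \subseteq \mathcal{S}$, then, with both minima taken over misclassified $x'$:

$$\min_{x' \in \mathcal{S}_{\mathcal{I}}} d(x, x') \;\ge\; \min_{x' \in \mathcal{S}} d(x, x').$$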

This observation indicates that limiting the attacker’s space of perturbations through an invariant results in perturbations that are farther away from the original input. Forcing the attacker to generate larger perturbations improves the robustness of the model.
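
A brute-force toy example illustrates this observation. The sketch below searches a grid of candidate perturbations for the closest misclassified point, once without constraints and once under an invariant that freezes the first coordinate. The linear classifier, the grid, and the Euclidean distance are assumptions chosen only to keep the example small.

```python
import itertools
import math

def predict(x):
    """Hypothetical linear classifier on two features."""
    return 1 if 2.0 * x[0] + 1.0 * x[1] - 1.0 > 0 else -1

def min_attack_distance(x, candidates, allowed):
    """Smallest Euclidean perturbation that flips the prediction while
    satisfying the invariant predicate allowed(x, x_adv)."""
    best = math.inf
    for delta in candidates:
        x_adv = (x[0] + delta[0], x[1] + delta[1])
        if allowed(x, x_adv) and predict(x_adv) != predict(x):
            best = min(best, math.hypot(*delta))
    return best

grid = [i / 10 for i in range(-30, 31)]
candidates = list(itertools.product(grid, grid))
x = (1.0, 1.0)

unconstrained = min_attack_distance(x, candidates, lambda a, b: True)
# Invariant: the attacker may not change the first coordinate.
constrained = min_attack_distance(x, candidates, lambda a, b: a[0] == b[0])
```

Because the constrained search runs over a subset of the unconstrained candidates, its minimum can never be smaller; in this toy case the attacker's required perturbation roughly doubles.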

Recent results in literature have suggested that robustness and accuracy cannot be simultaneously achieved [43]. Other results show that while robust classifiers, with high natural accuracy, might exist, they are not computationally feasible to learn [5, 1]. Next, we describe how invariants affect these results by making robustness feasible from statistical and computational aspects.

3.2.1 Robustness and Accuracy no Longer at Odds

We consider the setting of Tsipras et al. [43, 23] of a binary classification task. In this setting, robustness and accuracy are at odds [43]. Here, we show, over the same setting, that imposing an invariant on the attacker improves the defender’s accuracy-robustness trade-off.

They work in a binary classification setting with $d + 1$ features. A sample $(x, y)$ from the data distribution $\mathcal{D}$ is generated as follows:

  • First, a label $y$ is sampled uniformly from $\{-1, +1\}$.

  • Feature $x_1$ takes on value $+y$ with probability $p$ and $-y$ with probability $1-p$.

  • Features $x_2, \ldots, x_{d+1}$ are sampled from the distribution $\mathcal{N}(\eta y, 1)$, where $\mathcal{N}(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and standard deviation $\sigma$.

This setting defines a binary classification task with two types of features: one robust feature $x_1$ that predicts the correct label with probability $p$, and a set of $d$ features that “moderately” predict the output label, sampled from $\mathcal{N}(\eta y, 1)$. A classifier that averages the features $x_2, \ldots, x_{d+1}$ has near perfect natural accuracy but fails under an $\ell_\infty$-bounded adversary that can perturb each feature by $2\eta$. Tsipras et al. formalize this observation by the following theorem (reproduced from their paper [43] with a simple change of notation to avoid a clash of notation for $\delta$):

Theorem 1.

Any classifier that attains at least $1 - \gamma$ natural accuracy on $\mathcal{D}$ has adversarial accuracy at most $\frac{p}{1-p}\gamma$ against an $\ell_\infty$-bounded adversary with $\epsilon \ge 2\eta$.

Theorem 1 shows that, in this setting, a trade-off between robustness and accuracy exists. For example, with $p = 0.95$, a classifier with natural accuracy of 99% has an adversarial accuracy of at most 19%. The robust feature $x_1$ provides a baseline natural and adversarial accuracy equal to $p$. To improve the natural accuracy beyond $p$, the classifier relies on the other features. An $\ell_\infty$-bounded adversary that can perturb each feature by $2\eta$ effectively flips the distribution of $x_2, \ldots, x_{d+1}$ from $\mathcal{N}(\eta y, 1)$ to $\mathcal{N}(-\eta y, 1)$. The result of a model trained on the original distribution is unreliable under this flipped distribution, which degrades the adversarial accuracy.

Imposing an invariant on the same $\ell_\infty$-bounded adversary prevents it from completely flipping the distributions of $x_2, \ldots, x_{d+1}$ from $\mathcal{N}(\eta y, 1)$ to $\mathcal{N}(-\eta y, 1)$. The invariant restricts the attacker’s actions within the $\ell_\infty$ ball; a more restrictive invariant improves the adversarial robustness. Consider an invariant where the adversary is not allowed to perturb $k$ of the moderately predictive features, say $x_2, \ldots, x_{k+1}$.

In such a case, one can construct a meta-feature $x_{meta}$ defined as the sign of the average of $x_2, \ldots, x_{k+1}$ such that: $x_{meta} = \mathrm{sign}(z)$, where $z = \frac{1}{k}\sum_{i=2}^{k+1} x_i \sim \mathcal{N}(\eta y, 1/k)$. The meta-feature predicts $y = +1$ when $z > 0$ and predicts $y = -1$ when $z < 0$.

This meta-feature is predictive of $y$ as follows:

$$p' = \Pr[x_{meta} = y] = \Pr[\mathcal{N}(\eta, 1/k) > 0] = \Phi(\eta \sqrt{k}).$$

Recall that $\Phi^{-1}$ is the inverse of the CDF $\Phi$ of the standard normal distribution. If $k$ is large enough such that $\eta\sqrt{k} \ge \Phi^{-1}(p)$, then $p' \ge p$. Recall that as long as $\epsilon \ge 2\eta$, this adversary is still the same $\ell_\infty$-bounded adversary with which we started the discussion. Then, $x_{meta}$ is a robust feature that is highly predictive of the output. This meta-feature was constructed by imposing a constraint on the adversary, unlike $x_1$, which is provided as part of the problem setup.

Given the result of Theorem 1, the robustness-accuracy trade-off can be reworded as: Any classifier that attains at least $1 - \gamma$ natural accuracy on $\mathcal{D}$ has adversarial accuracy at most $\frac{p'}{1-p'}\gamma$ against an $\ell_\infty$-bounded adversary with $\epsilon \ge 2\eta$, subject to the invariant that the adversary leaves the features $x_2, \ldots, x_{k+1}$ unperturbed. If the value of $k$ is large enough, the value of $p'$ can be much larger than that of $p$, so that both natural accuracy and adversarial accuracy can be close to 1.
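
The accuracy of such a meta-feature can be checked numerically. The sketch below assumes that each protected feature is drawn from a unit-variance Gaussian centered at eta times the label, so that the sign of the average of k protected features matches the label with probability Phi(eta * sqrt(k)). The parameter values, function names, and the simulation itself are illustrative assumptions, not the experiment behind Fig. 2.

```python
import math
import random
from statistics import NormalDist

def meta_feature_accuracy(eta, k):
    """Closed form: probability that the sign of the mean of k draws
    from N(eta * y, 1) equals the label y."""
    return NormalDist().cdf(eta * math.sqrt(k))

def simulate(eta, k, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = rng.choice((-1, 1))
        mean = sum(rng.gauss(eta * y, 1.0) for _ in range(k)) / k
        hits += (1 if mean > 0 else -1) == y
    return hits / trials

# Hypothetical parameters: weakly predictive features, 100 of them protected.
closed_form = meta_feature_accuracy(eta=0.2, k=100)
estimate = simulate(eta=0.2, k=100, trials=2000)
```

With these assumed values the meta-feature is correct in roughly 98% of cases, and the attacker cannot lower this figure because the invariant forbids perturbing the features it is built from.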

Figure 2: The improvement in adversarial accuracy as the invariant imposes a tighter constraint in the setting of Tsipras et al. [43].

Fig. 2 shows the adversarial accuracy for a toy example with and for different values of less than or equal to 1. For all the values of , the natural accuracy reaches nearly 100%. The value of controls the invariant. It is evident from the figure, that for small values of , even when is much smaller than , the adversarial accuracy exceeds 90%. The main takeaway is that imposing an invariant even on a relatively small number of features can improve the defender’s adversarial robustness.

3.2.2 Invariants Make Robustness Feasible

Researchers have started investigating the computational limitations of learning robust classifiers. This line of investigation started with Bubeck et al. [5] and continued with Degwekar and Vaikuntanathan [1]. We describe the construction of Degwekar and Vaikuntanathan [1] based on Pseudo-Random Functions (PRFs). This construction shows a scenario where a robust classifier exists but is hard to find in polynomial time.

Let be a PRF with a secret key . PRFs are an important building block in cryptography [25]. Let be an error-correcting code (ECC), where encodes a message and decodes a message which can “correct” a certain number of errors. Recall that there are excellent ECC in the literature. For example, the construction of Guruswami and Indyk [18] can tolerate constant fraction errors and still enable correct decoding.

Basic Construction

Consider the following two distributions:

where is drawn uniformly from , is the secret key, and ‘,’ is a concatenation operator.

Note that there exists a perfect classifier because the first bit of the sample indicates which distribution ( or ) it belongs to. This classifier provides perfect natural accuracy and is easy to learn. This “trick” was introduced by Bubeck et al. [5].

There exists a robust classifier : given an where and key , executes and obtains , and checks the last bit to see whether it is or (this can be done because has the secret key ). Due to the properties of the ECC, can tolerate a constant fraction of the errors. If the attacker flips the first bit of , a robust classifier is hard to learn. Without knowing the key , a robust classifier has to essentially predict the output of from only. Such a classifier cannot achieve better accuracy than (where is a negligible function of ). This result follows from the fact that is a PRF (i.e. essentially a probabilistic polynomial time adversary (PPTA) cannot distinguish between and a random bit) [1]. This setting demonstrates a situation where a robust classifier exists but cannot be found in polynomial time.

Next, we show an invariant which when imposed on the above-mentioned scenario negates the hardness result. Let . Now suppose there is an invariant such that implies that or the attacker is not allowed to change the first bit. In this case, a robust classifier exists because the classifier can just inspect the first bit and correctly classify the sample. The classification accuracy stays at because the adversary cannot perturb the first bit, which indicates which distribution the sample belongs to.

Extended Construction

One can extend the above construction to allow for higher robustness by considering the following two distributions.

Where is drawn uniformly from and is the secret key.

As above, a perfect, easy-to-find classifier exists by inspecting the first bit of the sample. Similarly, there exists a robust classifier by running over and then predicting the last bit. As before, if the attacker flips the first bit of , a robust classifier is hard to learn.

Now, suppose there is an invariant such that:

or the attacker is not allowed to flip more than one bit in the first three bits. In this case, a robust classifier exists because the classifier can just look at the majority of the three first bits and correctly classify the sample. The classification accuracy stays at because the adversary cannot change the majority of the bits, which indicates which distribution the sample belongs to.
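
The majority argument can be checked exhaustively on a toy encoding. The sketch below assumes that the distribution label is repeated in the first three bits of each sample and that the remaining bits are an arbitrary payload; this layout is an assumption made for illustration and is not taken from the construction in [1].

```python
from itertools import product

def majority_first_three(bits):
    """Classify a sample by the majority of its first three bits."""
    return 1 if sum(bits[:3]) >= 2 else 0

def survives_single_flips(label, payload):
    """Check that no single flip among the first three bits changes
    the majority-based prediction."""
    sample = [label] * 3 + list(payload)
    if majority_first_three(sample) != label:
        return False
    for i in range(3):
        tampered = list(sample)
        tampered[i] ^= 1  # the one flip the invariant permits
        if majority_first_three(tampered) != label:
            return False
    return True

# Exhaustive check over both labels and all 4-bit payloads.
all_robust = all(
    survives_single_flips(label, payload)
    for label in (0, 1)
    for payload in product((0, 1), repeat=4)
)
```

Under the invariant, the classifier never needs the secret key: the attacker's one permitted flip cannot overturn the majority.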

Both constructions above clearly show that invariants can have an effect even on scenarios meant to create “computationally hard” cases.

4 Hierarchical Classification

Having established that invariants limit the attacker’s attack space and potentially improve robustness, we look into how to leverage invariants in classification. In this paper, we consider a specific type of invariant as defined in the previous section. This invariant constrains the attacker to stay in an equivalence class of labels, such that:

$$\mathcal{I}_{\equiv} = \{(x, x') : \arg\max_{l} F(x')_l \equiv c^*(x)\}.$$

These invariants represent the grouping of labels into more general and semantically relevant attributes. Enforcing such an invariant means that each input sample falls within an equivalence class of labels. The final prediction has to be within the labels belonging to the equivalence class. An attacker cannot change the classification of an input to a label outside the equivalence class in which it falls.

It is natural to view these invariants as partitioning the classification problem into smaller pieces, which is the reason why we consider them in the first place. Formally, we consider invariants that partition the set of labels $\mathcal{L}$ into $q$ subsets of labels $\mathcal{L}_1, \ldots, \mathcal{L}_q$ (each representing an equivalence class of labels) such that: $\bigcup_{i} \mathcal{L}_i = \mathcal{L}$ and $\mathcal{L}_i \cap \mathcal{L}_j = \emptyset$ for $i \neq j$. We treat the classification within each subset of labels as a standalone classification problem.

We leverage these invariants to hierarchically classify input samples. Sequentially applying the invariants narrows down the classification task into predicting within smaller sets of labels. The outcome is a decision tree-like structure where each invariant splits the label space further. At the leaves of the decision tree, we design robust classifiers that predict within smaller label space. In summary, we have two types of classifiers: intermediate and leaf. The intermediate classifiers extract high-level features of the data that split the label space. The leaf classifiers make the final classification decision within the reduced set of labels.

For example, a two-level hierarchical classification would have an intermediate classifier deciding which equivalence class the input sample falls in. Then, one of the leaf classifiers decides the final prediction within the labels of that equivalence class. Sec. 5 shows a real-world example of the hierarchical classification.

One can draw parallels between this approach and the simple Principal Component Analysis (PCA) case. The intuition here is that on the original data, the first couple of components would discriminate different classes and intra-class variance would be left to the last components. In this approach, the intermediate classifiers extract features that model the inter-class variance and the leaf classifiers model the intra-class variance (discriminating labels in the reduced label set). An added benefit of this approach is explainability. As equivalence classes are derived from physical world constraints, typically comprehensible to humans, this approach provides some notion of explainability to the predictions, akin to small decision trees.


4.1 Structure of Hierarchical Classifier

Figure 3: High-level description of the hierarchical classifier. The thin arrows highlight data flows while thick arrows indicate decision paths. The original set of labels is . Each intermediate classifier splits the label set further. Each leaf classifier predicts within a reduced set of labels. For example, the left-most classifier assigns each label within a probability value while assigning the other labels a probability of 0.

The hierarchical classifier is a mapping from $\mathbb{R}^{n} \times \mathbb{R}^{d}$ to $\mathbb{R}^{k}$, where $n$ is the dimension of the original input, $d$ is the dimension of the additional features the defender has access to, and $k$ is the dimension of the original label set. Fig. 3 shows the high-level structure of the hierarchical classifier, including the input features, classifiers, and output vectors.

4.1.1 Intermediate Classifiers

We assume the defender has access to a set of robust features through adversarial training or from other sensors. Depending on the classification problem, one can define high-level attributes from these robust features. These attributes partition the label space into equivalence classes; each value of an attribute is associated with an equivalence class. The intermediate classifiers predict the values of user-defined attributes.

Given a label set, an intermediate classifier partitions it into equivalence classes. It maps an input from the robust features to a subset of the labels. Each class might have a leaf classifier or another intermediate classifier; we do not assume the hierarchy to be balanced (Fig. 3). An intermediate classifier need not be a DNN. For example, the location of a vehicle can be mapped to the type of road it is travelling on. The type of road, in turn, partitions the set of speed limit road signs.

4.1.2 Leaf Classifiers

Within a reduced set of labels, narrowed down through a sequence of intermediate classifiers, a leaf classifier makes the final classification output. A leaf classifier maps the input to a probability vector over this reduced set. To train this classifier, only the samples whose labels lie within the reduced set are needed. The overall inference is very similar to that of a decision tree; each intermediate classifier chooses the next one to be invoked until the inference reaches a leaf classifier. Only one leaf classifier is invoked per input. The leaf classifier returns a probability distribution over the reduced set. All other labels in the original label set are assigned a probability of 0.
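The inference procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names `Leaf` and `Intermediate`, the function `hierarchical_predict`, the label names, and the stub classifiers are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Union


class Leaf:
    """Predicts a distribution over its own reduced label set."""

    def __init__(self, labels: Sequence[str], predict: Callable[[object], List[float]]):
        self.labels = list(labels)
        self.predict = predict


class Intermediate:
    """Maps robust features to an attribute value that selects a child node."""

    def __init__(self, attribute: Callable[[object], str],
                 children: Dict[str, Union["Leaf", "Intermediate"]]):
        self.attribute = attribute
        self.children = children


def hierarchical_predict(root, image, robust_features, all_labels):
    node = root
    # Follow the decision path until a leaf is reached; exactly one leaf is invoked.
    while isinstance(node, Intermediate):
        node = node.children[node.attribute(robust_features)]
    # Labels outside the leaf's reduced set keep a probability of 0.
    probs = {label: 0.0 for label in all_labels}
    probs.update(zip(node.labels, node.predict(image)))
    return probs


# Hypothetical two-level hierarchy with stub classifiers (assumed values).
ALL_LABELS = ["speed_30", "speed_50", "yield", "stop"]
root = Intermediate(
    attribute=lambda feats: feats["shape"],
    children={
        "circle": Leaf(["speed_30", "speed_50"], lambda img: [0.8, 0.2]),
        "triangle": Leaf(["yield"], lambda img: [1.0]),
        "octagon": Leaf(["stop"], lambda img: [1.0]),
    },
)
prediction = hierarchical_predict(root, image=None,
                                  robust_features={"shape": "circle"},
                                  all_labels=ALL_LABELS)
```

Because the hierarchy is expressed as nested nodes, an unbalanced tree (a leaf under one branch, a further intermediate classifier under another) needs no special handling.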

The leaf classifiers can be made more robust by utilizing techniques such as certified smoothing or adversarial training, as indicated in Sec. 2. Below, we show that reducing the number of labels improves the robustness of the leaf classifiers relative to the original classifier.

4.2 Robustness Analysis

Figure 4: The robustness radius as function of the size of label sets in CIFAR-10, using the randomized smoothing approach of Cohen et al. [11].

The hierarchical classifier offers improved robustness at two levels. Forcing the attacker into an equivalence class of labels limits its targeted attack capability; an attacker cannot move the input outside its equivalence class. The leaf classifier, predicting within a reduced label set, improves the robustness certificate, making the classifier stable within a larger ball around the input. This property limits the attacker’s attack capability within the same equivalence class.

We find that classification on the reduced label sets improves robustness relative to the original classifier. Put differently, knowing that a certain input falls within an equivalence class, the robustness certificate of that input is larger on the leaf classifier than it is on the original classifier. This robustness property is subtle; it arises from the observation that reducing the labels increases the distance between the decision boundaries. We show that this property holds for any general classifier.

Let $F: \mathcal{X} \to \Delta(\mathcal{Y})$, where $\Delta(\mathcal{Y})$ is the set of all distributions over the label set $\mathcal{Y}$. For ease of notation, $F(x)_y$ (for $y \in \mathcal{Y}$) denotes the probability corresponding to $y$ in $F(x)$. Fix $c \in \mathcal{Y}$ and define $H(x, L)$ (where $L \subseteq \mathcal{Y}$ and $c \in L$) as follows:

$$H(x, L) = F(x)_c - \max_{y \in L \setminus \{c\}} F(x)_y$$

It is easy to see that if $L_1 \subseteq L_2$, then $H(x, L_1) \geq H(x, L_2)$; that is, $H$ is anti-monotonic in the second parameter. Recall that $H$ is similar to “hinge loss”. If we instantiate $F$ by the output of the softmax layer and use the argument of Hein and Andriushchenko for any classifier [20], we can immediately see that the robustness radius increases as the set of possible labels is decreased. A similar argument can be used for the smoothing approaches [11, 27]. For example, Fig. 4 shows the robustness radius for label subsets of different sizes from CIFAR-10. The robustness radius is computed using the randomized smoothing approach of Cohen et al. [11]. It is evident from the figure that as the subset size decreases, the average robustness per sample increases.
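The anti-monotonicity argument above can be checked numerically. The sketch below assumes the hinge-like quantity takes the form "probability of the fixed label minus the largest competing probability within the label set"; the function name `margin` and the probability values are illustrative assumptions.

```python
def margin(probs, fixed_label, label_set):
    """Probability of fixed_label minus the largest rival probability inside label_set."""
    rivals = [probs[y] for y in label_set if y != fixed_label]
    # With no rival left in the set, the margin is the fixed label's probability.
    return probs[fixed_label] - max(rivals, default=0.0)


# Hypothetical softmax output over three labels.
probs = {"speed_30": 0.5, "speed_50": 0.2, "yield": 0.3}

full = margin(probs, "speed_30", {"speed_30", "speed_50", "yield"})
reduced = margin(probs, "speed_30", {"speed_30", "speed_50"})
```

Dropping a label from the set can only remove a candidate from the maximum, so `reduced` is never smaller than `full`.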

4.3 Accuracy and Robustness Guarantees

The above analysis motivates the improvement in the trade-off between accuracy and robustness of employing the hierarchical classification versus the original classifier. In terms of accuracy, the error rate of the hierarchical classifier is a function of the error at the intermediate classifiers and the leaf classifiers. Given a leaf classifier that predicts from labels , the accuracy is:

The probability is taken over the samples ; empirically this probability is computed over some test set within . For simplicity, assume the data is drawn uniformly from the labels, so that the accuracy is . In the hierarchical classification, the overall accuracy is the sum of the accuracy of each leaf classifier weighted by the accuracy of the intermediate classifier correctly predicting the corresponding equivalence class.
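The weighted sum described above can be written as a small helper. This is a sketch under stated assumptions: each equivalence class is additionally weighted by its share of the data, which is an assumption of this example, and all numbers are hypothetical rather than results from the paper.

```python
def hierarchical_accuracy(branches):
    """branches: iterable of (data_share, root_accuracy, leaf_accuracy), one per equivalence class."""
    # A sample is classified correctly only if the intermediate classifier routes it
    # to the right equivalence class AND the leaf classifier then predicts its label.
    return sum(share * root_acc * leaf_acc for share, root_acc, leaf_acc in branches)


# Hypothetical two-class hierarchy: 60% of samples in the first class, 40% in the second.
branches = [
    (0.6, 0.98, 0.90),
    (0.4, 0.98, 0.80),
]
overall = hierarchical_accuracy(branches)
```

When the intermediate classifier is near-perfect, the overall figure is dominated by the leaf accuracies.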

When the intermediate classifiers predict the equivalence class very accurately (we show such examples in the next section), the total accuracy of the hierarchical classifier is lower-bounded by that of the base classifier. The lower bound is because the base classifier might assign an input example a label outside its correct equivalence class. When enforcing the correct equivalence class, the leaf classifier might correct the prediction result. In the construction scenario we show above, the leaf classifier will never misclassify the correctly classified examples from the base classifier.

The robustness guarantee of the hierarchical classifier depends on that of each individual classifier (the intermediate and the leaves). Given a set of classifiers composing the hierarchical classification, the attacker needs to attack only one of them to change the classification output. Thus, the robustness guarantee of the hierarchical classifier is the minimum of the guarantees of the composing classifiers (intermediate and leaf ones).

To see why the robustness guarantee of the hierarchical classifier is the minimum of the guarantees of the composing classifiers, consider the simple case of three classifiers: , , and which form a larger classifier . The hierarchy is such that is a binary classifier deciding between passing the input to or , which are the leaf classifiers. A white-box adversary aims to attack the larger classifier as usual (where C(x) is the predicted label):


Using the internal knowledge of the classifier, the adversary’s objective can be restated as:


Since only one of the constraints has to be satisfied, the problem can be broken down into smaller subproblems:



We can take the lower bound and by solving the less constrained problem of:


Finally, the lower bound of the needed perturbation to attack is the minimum of the perturbations needed to attack each network individually. In particular, . If each network has a robustness guarantee such that , , and , then . It is straightforward to generalize this example for multiple and non-binary intermediate classifiers.
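One way to write out the argument above is sketched below. The symbols are placeholders chosen for this sketch, not necessarily the authors' notation: $f$ is the binary intermediate classifier, $g$ and $h$ are the leaf classifiers, $C$ is the composed classifier, $\rho_k$ is the smallest perturbation norm that changes the output of classifier $k$, and $r_k$ is its robustness guarantee. The relaxation in the second line uses the fact that $C(x+\delta) \neq C(x)$ implies that at least one of $f$, $g$, $h$ changes its output.

```latex
\begin{align*}
\rho^\star
  &= \min_{\delta} \|\delta\|
     \quad \text{s.t. } C(x+\delta) \neq C(x) \\
  &\geq \min_{\delta} \|\delta\|
     \quad \text{s.t. } f(x+\delta) \neq f(x)
       \;\lor\; g(x+\delta) \neq g(x)
       \;\lor\; h(x+\delta) \neq h(x) \\
  &= \min(\rho_f, \rho_g, \rho_h),
     \qquad \rho_k = \min_{\delta} \|\delta\|
     \ \text{ s.t. } k(x+\delta) \neq k(x),\ k \in \{f, g, h\} \\
  &\geq \min(r_f, r_g, r_h)
     \qquad \text{whenever } \rho_k \geq r_k \text{ for all } k \in \{f, g, h\}.
\end{align*}
```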

5 Case Study: Road Sign Classification

We performed several experiments to answer a fundamental question: Does the proposed hierarchical classification improve the robustness vs. accuracy trade-off in real-world settings? Using a combination of the GTSRB [41] and KITTI [16] datasets, we analyze this question in the context of road sign classification (e.g. to be used by autonomous vehicles) aided by robust features (i.e. shapes) extracted from an auxiliary input source, i.e. LiDAR point clouds. While conventional computer vision techniques can be used to extract the shape of an object from raw pixel values, these are not fool-proof, i.e. the inputs are easy to perturb. Thus, we require an input that is hard to adversarially perturb in the real world. LiDAR point clouds satisfy this criterion, providing us robustness for free. It is imperative to ensure that the root is robust; errors at the root cascade through the hierarchy. Additionally, the presence of LiDARs in most autonomous vehicles further validates our choice.

Our experiments suggest that:

  1. The hierarchical classification approach improves the robustness vs. accuracy trade-off; for the same model architecture, while maintaining the same accuracy (if not better in some cases), the proposed approach produces an average increase in robustness (used interchangeably with the robustness certificate of Cohen et al. [11]) by .

  2. Extending the hierarchy using multiple invariants further improves robustness. In our case study, we combined shape and location constraints to obtain an average increase of for speed limit signs (which can be thought of as circular signs at a given geographic location).

We stress that this case study does not represent “the solution” to completely protect against adversarial attacks in the physical world. Instead, it demonstrates that invariants significantly raise the bar for launching specific types of adversarial attacks. It does so while improving robustness at no harm to accuracy.

5.1 Experimental Setup

Figure 5: The hierarchy over the road signs from the GTSRB dataset.

We use two datasets: the first is the German Traffic Sign Recognition Benchmark (GTSRB) [41] which contains 51,840 cropped images of German road signs which belong to 43 classes; Fig. 5 shows these classes. The second is the KITTI dataset which contains information over a five-day recording period from an instrumented vehicle on the roads of Karlsruhe, Germany. The KITTI dataset contains time-stamped location measurements, high-resolution images, and LiDAR scans. We post-processed this dataset to extract the cropped images of the included road signs as well as the corresponding LiDAR depth maps. To do so, we retrained a YOLOv3 object detector [35] to detect the German road signs using the German Traffic Sign Detection Benchmark (GTSDB) dataset. We obtained the bounding boxes of all the road signs, which we then used to obtain 3138 cropped images, their corresponding LiDAR depth maps, and their labels. We manually verified the validity of all these labels.

For all the results we describe, the experimental setup is as follows. The hierarchical classifier, as presented in Fig. 5, comprises a single root classifier which extracts shape from the LiDAR point cloud inputs. Based on the shape (circular, triangular, octagonal, inverse triangular, or rectangular), one of five different leaf classifiers is activated. Unless mentioned otherwise, the architecture for all classifiers in the hierarchy is a standard ResNet-20 model [19]. The leaf classifiers are trained using randomized smoothing [11], and the robustness certificate can be calculated as in Eq. 1. Since the root classifier operates on point cloud inputs, robustness is achieved at the root for free (footnote 2: this is under the assumption that the point cloud inputs are robust, and hard to perturb; we validate this empirically in Sec. 5.6). Adversarial attacks in the status quo do not impact the root classifier, as they are primarily targeted towards fooling classifiers that operate on pixel inputs. We discuss the case of a stronger adversary later in this section.

5.2 Feature Extraction

Recall that the proposed hierarchical approach works by breaking down the classification problem into different phases. An initial classification is made to obtain the robust feature (i.e. shape in our case study); observe that this classification partitions the (label) space of road signs, and each partition comprises road signs belonging to a particular shape. Over these reduced label spaces, a (per-shape) robust classifier is trained for the final road sign prediction.

How does one obtain the robust feature?

In our case study, we use domain knowledge of the classification problem to extract the shape using auxiliary inputs. To obtain shape predictions, we trained different classifiers (of varying size, architecture, and complexity) for 200 epochs. The training set comprised 2538 point cloud inputs, and the test set comprised 600 point cloud inputs. From our experiments, we observed that the ResNet-20 architecture trained for 200 epochs on point clouds achieves the best, near-perfect, test accuracy (98.42%). This motivates our selection of the ResNet-20 architecture for the leaf classifiers as well.

5.3 Improving the robustness vs. accuracy trade-off

We reiterate that the robust feature, shape, partitions the label space; labels belonging to a particular shape (e.g. all speed limit signs are circular) belong to a particular equivalence class. Based on the signs in the GTSRB dataset, we observe that the diamond, octagon, and inverse-triangle shapes have a single road sign belonging to each of their equivalence classes. Thus, for these shapes, we do not require a leaf classifier; classifying based on the robust feature is sufficient (footnote 3: consequently, their robustness certificate is $\infty$).

The baseline for our experiments is a smoothed classifier [11] trained to predict all 43 labels using images from the GTSRB dataset. For thoroughness, we trained this and the other leaf (circle, triangle) classifiers in our experiment with the noise parameter $\sigma$ set to different values, i.e. $\sigma \in \{0.25, 0.5, 1\}$ (refer to Equation 1).

5.3.1 Retraining vs. Renormalization

Intuitively, the root classifier places an input into its corresponding equivalence class. Assuming the root is accurate, the leaf classifiers only operate on inputs belonging to a specific equivalence class, i.e. labels belonging to a particular shape. Thus, to obtain such a leaf classifier, one could (a) utilize the baseline classifier trained on all 43 labels, discard the probability estimates of the labels not of interest (i.e. labels belonging to a different equivalence class), and renormalize the remaining probability estimates, i.e. use the renormalization approach, or (b) retrain a leaf classifier from scratch based on the labels belonging to that particular equivalence class (i.e. obtain 2 new leaf classifiers). Figure 6 highlights how both these approaches impact the robustness certificate. When evaluated on over 1000 distinct inputs for each equivalence class, it is clear that both these approaches increase the robustness certificate. While the renormalization approach can only increase the robustness certificate, the retraining approach can potentially decrease the robustness certificate for some inputs.
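Option (a), the renormalization approach, amounts to a few lines of post-processing on the baseline classifier's output. The sketch below is illustrative only; the function name `renormalize`, the label names, and the probability values are assumptions.

```python
def renormalize(probs, equivalence_class):
    """Keep only the estimates for labels in the equivalence class and rescale them to sum to 1."""
    kept = {label: p for label, p in probs.items() if label in equivalence_class}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("baseline assigns zero probability to every label in the class")
    return {label: p / total for label, p in kept.items()}


# Hypothetical baseline output over four labels; the root says the sign is circular.
baseline = {"speed_30": 0.5, "speed_50": 0.25, "yield": 0.125, "stop": 0.125}
circular = {"speed_30", "speed_50"}
leaf_output = renormalize(baseline, circular)
```

The ordering of the surviving labels is unchanged, which is why this approach reuses the baseline weights without any retraining.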

(a) (Retrained) Circles
(b) (Retrained) Triangles
(c) (Renormalized) Circles
(d) (Renormalized) Triangles
Figure 6: Percentage improvement of robustness certificate on leaf classifiers using retraining and renormalization.

Recall that the value of the robustness certificate is directly proportional to the margin $\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})$, where $\Phi^{-1}$ is the inverse of the standard Gaussian CDF, $\underline{p_A}$ is the lower bound of $p_A$ (the probability of the top label), $\overline{p_B}$ is the upper bound of $p_B$ (the probability of the runner-up label), and the smoothing noise is drawn from $\mathcal{N}(0, \sigma^2 I)$.

In the renormalized leaf classifiers, all estimates are the same as in the baseline classifier. However, depending on how the label space is partitioned, the runner-up might either be in the equivalence class under consideration, or in a different equivalence class. In the former scenario, renormalizing the probability estimates (corresponding to the equivalence class alone) will further widen the margin, and increase the certificate (footnote 4: from our experiments, we observed that the probability estimates of labels that do not belong to the equivalence class are oftentimes zero; in such scenarios, renormalization of the probability estimates that belong to the equivalence class does not widen the margin, but keeps it the same as before, and consequently the certificate also remains the same). In the latter scenario, the runner-up estimate within the equivalence class has to be less than or equal to $\overline{p_B}$. Consequently, the margin is at least $\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})$. On renormalization, this margin widens further.

In the retraining scenario, however, we do not have knowledge about the ordering of the probability estimates in the retrained classifier in comparison to the baseline classifier. It is possible that for a retrained classifier and for a given input, while the correct class’s estimate remains $\underline{p_A}$, the runner-up’s estimate can be greater than, less than, or equal to the original (baseline) estimate $\overline{p_B}$. Thus, the new robustness certificate can be lower than, greater than, or the same as in the baseline scenario. This problem is more fundamental: robustness is a local property relying on the local Lipschitz constant and on the structure of the decision spaces, and partial or incomplete knowledge of any of these can result in spurious selection of the runner-up label.
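The effect of the margin on the certificate can be illustrated with the radius formula of Cohen et al. [11], $R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right)$. The sketch below is a simplification: it plugs in fixed, hypothetical probability values instead of Monte Carlo bounds, and the function name `certified_radius` is an assumption of this example.

```python
from statistics import NormalDist


def certified_radius(p_top_lower, p_runner_upper, sigma):
    """Randomized-smoothing radius; both probabilities must lie strictly between 0 and 1."""
    inv_cdf = NormalDist().inv_cdf  # inverse of the standard Gaussian CDF
    return 0.5 * sigma * (inv_cdf(p_top_lower) - inv_cdf(p_runner_upper))


sigma = 0.25

# Hypothetical baseline estimates: top label 0.6, overall runner-up 0.3 lies in another
# equivalence class, and the best rival inside the top label's class has 0.1.
baseline_radius = certified_radius(0.6, 0.3, sigma)

# Renormalizing over the equivalence class (total mass 0.6 + 0.1 = 0.7).
renormalized_radius = certified_radius(0.6 / 0.7, 0.1 / 0.7, sigma)
```

In this example the runner-up falls outside the equivalence class, so the in-class rival is weaker and the margin widens on both counts.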

5.3.2 Certified Accuracy

While both approaches described above can improve the robustness certificate, this must not be at the expense of accuracy [43]. Thus, for a fixed set of inputs, we also measure the certified accuracy, i.e. the accuracy of classification while employing Monte Carlo trials (as described in Sec. 2.3). Observe in Table 1 that (a) the renormalized classifier has comparable accuracy to the baseline classifier (by construction, since it utilizes the same weights) and (b) the retrained classifiers have better certified accuracy than the renormalized classifiers. In summary, neither approach greatly impacts the accuracy compared to the baseline. Thus, we have empirically demonstrated that employing invariants can boost robustness without sacrificing accuracy.

Type                      σ      Circular Signs (%)   Triangular Signs (%)
Baseline & Renormalized   0.25   82.59                76.78
                          0.5    65.63                65.63
                          1      43.80                43.80
Retrained                 0.25   88.60                82.89
                          0.5    75.70                67.91
                          1      54.70                42.69
Table 1: The renormalized classifier has at least the same certified accuracy as the baseline classifier. The retrained classifier performs better.

Figure 7: The percentage improvement of the robustness certificate for SpeedLimit signs utilizing a 2 layer hierarchical classifier. These results are a significant improvement on those of Fig. 6.

5.3.3 Adding More Invariants

Using location information, one is able to partition the label space again. For example, a highway cannot have stop signs, and an intersection will not have a speed limit sign. To validate this hypothesis, we performed a small-scale proof-of-concept experiment to further constrain the space of labels that is obtained by splitting on the shape feature (i.e. to see if we can obtain a subset of the set of, say, circular labels). Using the location information from the KITTI dataset, and local map data from OpenStreetMap [10], for particular locations, we can further constrain the space of circular road signs to just SpeedLimit signs. From Fig. 7, we observe that increasing the number of robust features (to 2: shape and location) increases the robustness certificate further. The ordering of robust features in the hierarchy (e.g. shape followed by location vs. location followed by shape) to obtain the best increase in robustness is an open question, one which we wish to tackle in future work.
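Stacking invariants can be viewed as successively intersecting the sets of labels each invariant allows. The sketch below is a toy illustration; the function name `restrict` and the label names are hypothetical and do not reflect the actual GTSRB classes or the map lookup used in the experiment.

```python
def restrict(labels, *invariants):
    """Return the labels that are allowed by every invariant."""
    allowed = set(labels)
    for permitted in invariants:
        allowed &= set(permitted)
    return allowed


ALL_LABELS = {"speed_30", "speed_50", "no_entry", "yield", "stop"}

# Assumed invariants: shape from LiDAR, road type from a map lookup at the vehicle's location.
circular_signs = {"speed_30", "speed_50", "no_entry"}
allowed_at_location = {"speed_30", "speed_50", "yield"}

candidates = restrict(ALL_LABELS, circular_signs, allowed_at_location)
```

Set intersection is order-independent, so the final label set does not depend on which invariant is applied first; the ordering question raised above concerns the resulting classifiers and their robustness, not the label set itself.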

5.4 KITTI Dataset Analysis

We performed the same set of experiments described above on the KITTI dataset, where we are able to analyze the hierarchy end-to-end. As before, each input was placed into an equivalence class based on the prediction of the near-perfect root classifier. We only report the results for the circular and triangular leaf classifiers, tested with 176 and 972 inputs respectively (footnote 5: the skew in inputs is due to lack of diversity in the examples obtained from the KITTI dataset). Consistent with our experiments on GTSRB, through renormalization, we observe an average increase in the robustness certificate for inputs belonging to both equivalence classes.

Label Type   σ = 0.25   σ = 0.5   σ = 1
Circles      18.79%     19.07%    11.63%
Triangles    27.17%     1.90%     18.18%
Table 2: Improvement in robustness certificate for different equivalence classes

The certified accuracy improves in the hierarchical structure as well. To understand how this is the case, we first compute the certified accuracy of the baseline classifier. The results are summarized in Table 3. Unlike the earlier results, we suspect that the certified accuracy for inputs belonging to the circle equivalence class is high because of (a) imbalance in the training data used for the baseline, i.e. the classifier may have been overfit for inputs belonging to these classes, or (b) the small number of samples (only 176).

Label Type   σ = 0.25   σ = 0.5   σ = 1
All          94.34%     89.55%    85.01%
Circles      96.09%     93.5%     94.13%
Triangles    85.71%     69.89%    37.79%
Table 3: Certified accuracy for different label types for the baseline classifier

Observe that the certified accuracy for the hierarchical classifiers (refer to Table 4) is comparable to that of the baseline scenario. We compute the accuracy on the hierarchical classifier by passing the depth map through the root classifier, and then the corresponding image through the leaf classifier. The minor variance is caused by the error associated with the Monte Carlo trials during prediction at the leaf. Thus, the results obtained while analyzing the KITTI dataset are consistent with those in Sec. 5.3.

Label Type   σ = 0.25   σ = 0.5   σ = 1
Circles      95.16%     92.82%    93.49%
Triangles    83.19%     71.54%    42.90%
Table 4: Certified accuracy for different label types when the classifier is renormalized

5.5 Attacks on Point Clouds

Abundant prior work has demonstrated adversarial attacks on images; these techniques naturally transfer over to attacking the leaf classifier. But do they work on the root classifier? Based on our threat model defined in Sec. 2.2, attacks on the root classifier need to be manifested in the real world. We would like to stress that our threat model assumes a passive adversary, and this is consistent with several other works in this space; active adversaries are beyond the scope of this work. While the requirement for real-world realizability may seem very strict, we shall see how relaxing the condition does not provide the adversary much leverage.

How does one attack the root classifier?

In the real world, for an object under consideration, Kurakin et al. [26] generate an adversarial example in the digital space. They then print the adversarial example, and place it over the physical object. A similar approach is employed by Athalye et al. [3]. Following this procedure, we generated adversarial examples on the point cloud inputs, and verified whether these perturbations are realizable in the physical world. To generate adversarial perturbations, we used the algorithms proposed by Goodfellow et al. [17] (denoted FGSM), Carlini and Wagner [9] (denoted CW), and Madry et al. [30] (denoted PGD). For each attack, parameters were chosen so as to minimize (a) the -norm, and (b) the -norm of the perturbation. Based on the perturbations generated by these algorithms, we compute the real-world depth modifications made by these perturbations (RW; footnote 6: obtained from the data sheet of our LiDAR [44]), and record the fraction of points perturbed (n).

Analogous to pixels in images, points constitute the fundamental units of a point cloud; while pixel-perturbations are required to be imperceptible, real-world perturbations are required to be tolerable. For example, painting over a road sign is an adversarial attack, but such an attack is not tolerable. An analogous approach in the pixel space would be to mask the entire input image with random pixels. While adversarial, the new image obtained is easily distinguishable from the original image.

Recall that each point in the point cloud represents the depth of the object from the LiDAR sensor; any perturbation in the point cloud represents a change in depth of the portion of the physical object represented by that particular point. Intuitively, one may think of this as deforming the object in a strategic manner, or drilling holes through it. Thus, the larger the perturbation, the greater the change in depth. Consequently, the perturbation is more evident in the physical space. From Table 5, it is clear that none of the adversarial attacks in the status quo will result in physically realizable attacks which are tolerable. One can intuitively visualize this by combining the values for RW and the number of points modified; it is clear that some attacks require a sizable portion of the road sign to be modified (as discussed above). Doing so will render the road sign unrecognizable, and consequently unusable.
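The two quantities discussed above, the share of points perturbed and the size of the depth change, can be computed from a pair of depth maps as sketched below. This is an assumption-laden illustration: depths are taken to be already converted to centimetres, the average is taken over the perturbed points only, and the function name `perturbation_stats` and the sample values are hypothetical.

```python
def perturbation_stats(original_cm, perturbed_cm, tolerance=1e-6):
    """Return (percentage of points perturbed, average depth change, maximum depth change)."""
    if len(original_cm) != len(perturbed_cm) or not original_cm:
        raise ValueError("depth maps must be non-empty and of equal length")
    changes = [abs(a - b) for a, b in zip(original_cm, perturbed_cm)]
    # Only points whose depth moved by more than the tolerance count as perturbed.
    perturbed = [c for c in changes if c > tolerance]
    n_percent = 100.0 * len(perturbed) / len(changes)
    average = sum(perturbed) / len(perturbed) if perturbed else 0.0
    maximum = max(perturbed, default=0.0)
    return n_percent, average, maximum


# Hypothetical flattened depth maps (in cm) of a sign 100 cm from the sensor.
original = [100.0, 100.0, 100.0, 100.0]
adversarial = [100.0, 130.0, 100.0, 90.0]
n_percent, avg_change, max_change = perturbation_stats(original, adversarial)
```

A high percentage combined with a large depth change corresponds to reshaping a sizable portion of the sign, which is the intolerable case described above.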

              n (%)    RW (cm)    n (%)    RW (cm)
CW     Avg:   13.78    27.50      15.67    30.40
       Max:   71.48    285.47     77.73    235.23
FGSM   Avg:   5.84     19.94      29.53    74.10
       Max:   38.09    279.41     93.55    260.86
PGD    Avg:   5.95     23.83      24.47    58.08
       Max:   40.82    230.00     91.02    260.86
Table 5: Average and maximum distances of value perturbations from and norm bounded attacks. Most of these perturbations would be far from imperceptible to humans. We use the following abbreviations: (1) n = percentage of pixels perturbed, and (2) RW = real world.

5.6 Physically Realizable Noise

Figure 8: Adversarially generated stickers and markings cover the signs. Transparent plastic and glass is attached to (a) and (b) but are not detected by LiDAR. (b) and (c) have metallic shapes attached to the backs of the signs which alter the shapes seen in the depth PCs (Top) but have lower intensity as seen in the reflectivity PCs (Bottom).

One might argue that the above attacks are not designed to generate physically realizable adversarial examples, and thus the numbers reported in Table 5 are an exaggeration of the extent an adversary must go to. Instead, an adversary could use an algorithm tailored for physical perturbations, such as the one proposed by Eykholt et al. [14]. To this end, we implemented their algorithm to obtain physically realizable adversarial examples; we then printed the stickers generated by their algorithm, stuck them on road signs, and gathered the corresponding point cloud information using a real-time 3D Velodyne LiDAR Puck with 16 channels and a 100m range [44]. From Fig. 8(a), it is evident that stickers on the stop sign do not impact the depth point cloud in any capacity; though these stickers may impact the leaf classifiers, the robust feature (i.e. shape) is still easy to extract.

At a high level, the objective of the adversary is to generate an adversarial example that alters the shape without modifying the depth of the point cloud greatly (unlike Table 5). We implemented this strategy in three different ways: (a) using transparent material stuck behind the sign (Fig. 8(a)), (b) using a combination of transparent and opaque objects stuck behind the sign (Fig. 8(b)), and (c) using a larger opaque object stuck behind the sign (Fig. 8(c)). We make the following observations: (a) transparent objects do not impact the depth and reflectivity point clouds (footnote 7: the amplitude of the pulsed laser from the LiDAR can be used to obtain the reflectivity of the material it is reflecting off of); the laser beam generated by the LiDAR passes through these objects, and (b) while metallic objects impact the depth point clouds, their presence can be detected using the reflectivity point cloud (as evident from the blue regions around the stop sign in both Figs. 8(b) and 8(c)).

While an adversary can circumvent these issues by using a metallic object with the same reflectivity as the sign under consideration, such an approach substantially increases the adversarial budget in comparison to the attacks in the status quo. Additionally, such attacks are easily noticeable by a human-in-the-loop. Thus, the hierarchical classification both improves the robustness vs. accuracy trade-off, and increases the budget for an adversarial attack.

6 Related Work

Researchers have extensively studied the robustness of Machine Learning models through exploring new attack strategies and various defense mechanisms. These efforts are very well documented in literature [8]. In this section, we only discuss work related to the different components of our classification pipeline.

Hierarchical Classification

Recent research casts image classification as a visual recognition task [48, 33, 40]. The common observation is that these recognition tasks introduce a hierarchy; enforcing a hierarchical structure further improves the accuracy. Similar to our approach, Yan et al. [48] propose an HD-CNN that classifies input images into coarse categories, which are then passed to the corresponding leaf classifiers for fine-grained labeling. They perform spectral clustering on the confusion matrix of a baseline classifier to identify the clusters of categories. This approach is optimized for natural accuracy and uses the image data at all levels of the hierarchy. In contrast, we employ robust features from different modalities to construct more robust classifiers.

Srivastava et al. [40] show that leveraging the hierarchical structure can be very useful when there is limited access to inputs belonging to certain classes. They propose an iterative method which uses training data to optimize the model parameters and validation data to select the best tree starting from an initial pre-specified tree. This approach further motivates our tree-based hierarchy; in several settings, such as autonomous driving systems, a hierarchy is readily available (as displayed by our experiments with shape and location).

Physically Realizable Attacks

Extensive research is aimed at generating digital adversarial examples, and defenses corresponding to $L_p$-norm bounded perturbations to the original inputs [17, 31, 26, 30]. However, these studies fail to provide robustness guarantees for the attacks realizable in the physical world due to a variety of factors including view-point shifts, camera noise, domain adaptation, and other affine transformations.

The first results in this space were presented by Kurakin et al. [26]. The authors generate adversarial examples for an image, print them, and verify if the prints are adversarial or not. Sharif et al. developed a physical attack approach [37, 38] on face recognition systems using a printed pair of eyeglasses. Recent work with highway traffic signs demonstrates that both state-of-the-art machine learning classifiers, as well as detectors, are susceptible to real-world physical perturbations [13, 12]. Athalye et al. [3] provide an algorithm to generate 3D adversarial examples (with small norm), relying on various transformations (for different points of view).

LiDAR Attacks

Similar to our approach, Liu et al. [29] adapt the attacks and defense schemes from the 2D regime to 3D point cloud inputs. They show that even simple defenses, such as outlier removal and salient point removal, are effective in safeguarding point clouds. This observation further motivates our selection of point clouds as auxiliary inputs in the case study. However, Liu et al. [29] do not physically realize the generated perturbations. Other approaches consider active adversarial attacks against the LiDAR modality [6], which can be expensive to launch. In this paper, we focus on passive attacks (on sensors) through object perturbations.
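As an illustration of the outlier-removal style of defense mentioned above, the sketch below filters a point cloud with a common statistical rule: a point is dropped when its mean distance to its k nearest neighbours is far above the average over the cloud. This is not necessarily the exact procedure evaluated by Liu et al. [29]; the function name `remove_outliers`, the parameters `k` and `std_ratio`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def remove_outliers(points, k=4, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory; acceptable for a small sketch).
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Sort each row; column 0 is the zero distance of a point to itself.
    knn_mean = np.sort(dists, axis=1)[:, 1:k + 1].mean(axis=1)
    # Assumed rule: keep points within std_ratio standard deviations of the mean.
    threshold = knn_mean.mean() + std_ratio * knn_mean.std()
    keep = knn_mean <= threshold
    return pts[keep], keep

rng = np.random.default_rng(0)
# Hypothetical dense cluster standing in for points sampled from an object.
cloud = rng.normal(scale=0.1, size=(200, 3))
# Hypothetical injected point placed far away from the object.
injected = np.array([[5.0, 5.0, 5.0]])
filtered, keep = remove_outliers(np.vstack([cloud, injected]))
print(filtered.shape[0], "points kept;", "injected point kept:", bool(keep[-1]))
```

A filter of this kind only handles isolated injected points. Perturbations that shift existing points slightly, or that add points close to the object surface, would call for a different check.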

Xiang et al. [46] propose several algorithms to add adversarial perturbations to point clouds by generating new points or perturbing existing points. An attacker can generate an adversarial point cloud, but manifesting this point cloud in the physical world is a different matter. Several constraints need to be accounted for, such as the LiDAR's vertical and horizontal resolution and the scene's 3D layout. Even then, an attacker would need to attack more than one modality to cause a misclassification.

Robust Features

Ilyas et al. [23] and Tsipras et al. [43] distinguish robust features from non-robust features to explain the trade-off between adversarial robustness and natural accuracy. While the authors show an improved trade-off between standard accuracy and robust accuracy, it is achieved at the computational cost of generating a large, robust dataset through adversarial training [23]. We circumvent this computational overhead by adopting invariants (and consequently robust features) imposed by the constraints in the physical world.

7 Conclusion

In this paper, we discuss how robust features realized through invariants (obtained through domain knowledge or provided by real-world constraints), when imposed on a classification task, can be leveraged to improve adversarial robustness without impacting accuracy. Better still, this is achieved at minimal computational overhead. Through a new hierarchical classification approach, we validate our proposal on a real-world classification task, road sign classification, using two datasets (GTSRB and KITTI). We also show how some invariants can be used to safeguard this classification task from physically realizable adversarial examples. Through the course of our work, we identified two key questions that we hope to focus on in future research: (a) do robust features always exist? and (b) can these features be extracted efficiently?