I Introduction
In the past decade, the broad application of deep learning techniques has been among the most inspiring advancements of machine learning
[lecun2015deep]. Compared to early attempts on neural networks [williams1986learning, lecun1998gradient], modern deep learning models introduce more layers with complex structures and nonlinear transformations to model a high-level abstraction of data [bengio2013representation]. The ability to learn from examples makes deep learning particularly attractive for cognitive applications, such as image and speech recognition [krizhevsky2012imagenet, hinton2012deep], object detection [girshick2014rich, collobert2008unified], etc. Moreover, deep learning systems have demonstrated fascinating performance in many real-world applications and achieved near or even beyond human-level accuracy in solving classification problems in domains including handwritten digit recognition [lecun1998mnist], image classification [graham2014fractional], semantic scene understanding
[wu2016apesnet], etc.
The security industry has also adopted machine learning techniques in its practices [lian2007image, zhou2016image, zhang2015cross]. Many of these applications are based on the strong classification capability of the learning models, including surveillance [tao2006human], authentication [lian2007image, taigman2014deepface], vehicle detection [zhou2016image], and crowd behavior analysis [zhang2015cross].
However, recent research discovered that machine learning and neural network models are susceptible to adversarial attacks, which apply small perturbations to input samples to fool the models [szegedy2013intriguing]. Such attacks generally downgrade a model’s confidence in its inputs and can even result in misclassifications [goodfellow2014explaining]. The amplitude of the perturbation used in adversarial attacks (a.k.a. the adversarial strength) can be quite small or even imperceptible to human eyes. Furthermore, Papernot et al. discovered that an elaborately-perturbed example (i.e., an adversarial example) is transferable: an adversarial example crafted on a substitute model can deceive not only that model but also other models (e.g., victim models), even without knowledge of the victim models’ internal structures and parameters [papernot2017arxiva]. These properties raise severe concerns about the security of deep learning techniques.
Substantial research effort has been devoted to adversarial examples and adversarial attacks [yuan2017adversarial]. Most works focus on maximizing the classification error (for attacks) or accuracy (for defenses) while minimizing the difference/distance between adversarial examples and the original samples [carlini2017towards, szegedy2013intriguing, papernot2017practical, papernot2017arxiva]. As research goes further and deeper, the robustness of neural networks emerges as the new focus. Adversarial attacks are one type of evasion attack, which fools neural networks by introducing deliberately-modified examples at test time. The ultimate goal of improving model robustness is defending against not only adversarial attacks but any type of evasion attack that attackers may conduct. One possible solution and explanation is based on decision space analysis. We summarize the adversarial-based and decision-space-based related works in Section II.
Compared to the existing works on neural network security problems, especially on the adversarial attack and defense schemes, our major contributions in this work can be summarized as follows:


We prove the relation between model robustness and margins (the distances between samples and decision boundaries) in the decision space;

We analyze the model robustness in the decision space based on an existing work [he2018decision] and propose a set of distance-based criteria to evaluate it;

We find that inter-class inequality exists in all datasets discussed in this work and that class robustness can be utilized to improve the model’s overall robustness;

We propose a universal feedback learning method with inter-class inequality compensation to facilitate model retraining;

We show our method is effective in improving model accuracy and robustness against multiple types of evasion attacks with experiments on the MNIST and CIFAR-10 datasets.
The remainder of this paper is organized as follows: Section II summarizes the existing works on adversarial attacks and decision-space-based methods; Section III introduces the motivation of this work; Section IV presents the details of our proposed method together with a theoretical proof of its effectiveness; Section V discusses the experimental results; Finally, we conclude this work in Section VI.
II Related Works
II-A Adversarial-Related Topics
Kurakin et al. [Kurakin2017ICLR] showed that combining small batches of both adversarial examples and original data in adversarial training can make the model more resilient to adversarial attacks. Carlini and Wagner [carlini2017towards] demonstrated, by introducing three new attack algorithms, that defensive distillation does not significantly enhance the robustness of neural networks in some scenarios. Cisse et al. [cisse2017parseval] introduced a layer-wise regularization method to reduce a neural network’s sensitivity to small perturbations, which are difficult to catch visually.
These adversarial-based approaches are usually effective against one or several specific adversarial attacks/defenses, but remain vulnerable to different or new adversarial attack/defense methods. For example, Athalye et al. claimed that attacks based on their newly-proposed obfuscated gradients (a kind of gradient masking method) could circumvent 7 out of 9 non-certified white-box-secure defenses, all of which were accepted by the International Conference on Learning Representations (ICLR) 2018 [athalye2018obfuscated]. This shocking fact alerted researchers in the area to the importance of universality for both attack and defense methods.
II-B Decision-Related Topics
Adopted from the statistics community, a decision space (a.k.a. an input space) refers to a vector space where all input samples lie. Decision boundaries are hypersurfaces that partition the decision space into separate sets. During the learning of decision boundaries, neural networks attempt to minimize the empirical error, while support vector machines (SVMs) tend to maximize the empirical margin between the decision boundary and the input samples [boser1992training].
Besides the common Min-Max-based adversarial research, the latest studies shift the focus directly to the meaning of adversarial-related problems in decision spaces. Tramèr et al. [tramer2017space] introduced methods for finding multiple orthogonal adversarial directions and showed that these perturbations span a multi-dimensional contiguous space of misclassified points. They believe that the higher the dimensionality of adversarial perturbations is, the more likely the subspaces of two models will intersect. Brendel et al. introduced the boundary attack, a decision-based attack that starts from a large adversarial perturbation and then seeks to reduce the perturbation while staying adversarial [brendel2018decision].
Cao and Gong aimed to increase the robustness of neural networks [cao2017mitigating]. Instead of using only a test sample to determine which class it belongs to, hundreds of neighboring samples are generated in the surrounding hypercube and a voting algorithm is applied to decide the true label of the test sample. The results show that this straightforward defense method is effective in improving neural networks’ robustness against adversarial attacks without sacrificing the classification accuracy on legitimate examples. The method could substantially reduce the success rates of CW attacks [carlini2017towards] while other defense methods (such as adversarial training and distillation) could not.
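As a rough illustration, the region-based voting idea can be sketched as follows. This is a hedged sketch, not the authors’ implementation; `model_predict`, the radius, and the neighbor count are illustrative assumptions:

```python
import numpy as np

def region_based_predict(model_predict, x, radius=0.3, n_neighbors=100, rng=None):
    """Sketch of region-based classification: sample points uniformly from the
    hypercube (L-infinity ball) around x and take a majority vote over their
    predicted labels. `model_predict` maps a batch of inputs to integer labels."""
    rng = np.random.default_rng(rng)
    # Uniform perturbations within the hypercube of the given radius around x.
    neighbors = x + rng.uniform(-radius, radius, size=(n_neighbors,) + x.shape)
    labels = model_predict(neighbors)
    # Majority vote decides the label assigned to x.
    return np.bincount(labels).argmax()
```

A single adversarial example that barely crosses the boundary tends to be outvoted by its neighborhood, which is the intuition behind this defense.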
Inspired by this region-based classification method [cao2017mitigating], He et al. [he2018decision] moved from hypercubes to larger neighborhoods. They proposed an orthogonal-direction ensemble attack called OptMargin, which could evade the region-based classification defense mentioned in [cao2017mitigating]. They also analyzed margins (between samples and decision boundaries) and adjacent class information, then built a simple neural network to detect adversarial examples with these analysis results.
Another state-of-the-art decision-based defense was proposed in [madry2017towards]. Madry et al. found that projected gradient descent (PGD) is a universal adversary among first-order approaches. To guarantee the adversarial robustness of the model, they restarted PGD from many points in the balls around data points to generate adversarial examples for training. The idea is intriguing, but it cannot provide a concrete guarantee against non-adversarial evasion attacks. In [elsayed2018large], a novel loss function is introduced to impose a margin on any layer of a model. To the best of the authors’ knowledge, none of the existing works discovers or deals with the inter-class inequality discussed in Section III.
III Motivation
Inspired by Tramèr et al.’s work [tramer2017space], we form an assumption about how generated examples facilitate robustness improvement: generated examples can “push” decision boundaries towards other classes, which enlarges the margins between real samples and decision boundaries. By retraining with generated examples in different directions, the margins to the decision boundaries will be enlarged. An illustration of this assumption with adversarial examples is given in Fig. 1.
Essentially, neural networks minimize the empirical error, while SVMs maximize the empirical margin between the decision boundary and the training samples. These approaches have different objective functions but target a similar or even the same ultimate goal: classifying samples into the correct classes as accurately as possible. Since most attacks are distance-based, the objective of model training should include not only high accuracy but also large margins.
In much adversarial-related research [carlini2017towards, szegedy2013intriguing, papernot2017practical, papernot2017arxiva], the adversarial strength is limited to a certain range to guarantee that the distortions are imperceptible. This limitation is equivalent to restricting the distance between the original samples and the adversarial examples. By definition, adversarial examples are supposed to cross decision boundaries and be misclassified with the minimum moving effort. If the model’s decision boundaries can be trained to be far from every sample, then the generation of boundary-crossing examples (including adversarial examples) becomes more difficult. As a result, the model’s robustness against evasion attacks is improved. This assumption will be further proved in Section IV.
What’s more, this kind of defense method is more generalized: not only is it beneficial to adversarial defenses, but it can also alleviate the damage of any distance-related evasion attack. In real-world scenarios, images captured by self-driving cars, security cameras, and webcams suffer from large noise that could totally vitiate classifiers. If the margin between samples and decision boundaries can be greatly increased, the model’s robustness will be improved correspondingly.
When observing all the margins and adjacent/destination classes (the first predicted class other than the original class as a sample keeps moving in any fixed direction), we discover another remarkable fact: adjacent classes are not equally distributed, even though every class has an equal number of training samples (6,000 training samples per class for MNIST [lecun1998gradient] and 5,000 for CIFAR-10 [krizhevsky2009learning]). In Table I, the percentages mark the proportions of the total random searches that fall into the corresponding adjacent class. As can be seen, class 3 and Bird are the most robust (adjacent) classes in MNIST and CIFAR-10, respectively, while class 0 and Horse are the most vulnerable (adjacent) classes. We name this phenomenon inter-class inequality.
MNIST  0  1  2  3  4  5  6  7  8  9  N/A 

Adjacent  0.02%  0.60%  1.59%  5.31%  4.52%  2.30%  1.07%  0.54%  4.29%  0.82%  78.95% 
CIFAR-10  Airplane  Automobile  Bird  Cat  Deer  Dog  Frog  Horse  Ship  Truck  N/A 
Adjacent  0.20%  0.15%  44.61%  11.40%  7.51%  0.27%  25.62%  0.02%  2.47%  0.37%  7.39% 
The proportional differences between vulnerable classes and robust classes are so tremendous that they raise the question of whether each class has the same robustness level. One hypothesis is that different classes have different spatial occupancies, which affect the construction of decision boundaries. A robust class has a relatively large volume in the high-dimensional decision space, while a vulnerable class occupies a smaller region. From the results, we observe that vulnerable classes are adjacent to more classes, while robust ones are adjacent to fewer classes. If we take class robustness into account, the model robustness can also be improved via improving class robustness. This will be further discussed in Section IV.
IV Feedback Learning and Theoretical Proof
To alleviate the robustness problem and the inter-class inequality discussed in Section III, we refine the boundary search originally presented by He et al. [he2018decision] and propose feedback learning (F.L.). Its principle is to understand how well the model learns and to generate corresponding examples to facilitate the retraining process. The complete feedback learning method can be divided into five steps: training, sample selection, direction and margin calculation (the boundary search), example generation, and retraining and testing. Algorithm 1 gives a demonstrative realization of the feedback learning algorithm. To avoid confusion, we specify that “samples” are legitimate, authentic, genuine samples collected from the real world, while “examples” are generated or perturbed instances. The details of example generation are discussed in Section IV-B.
IV-A Robustness Measurement
First, we introduce the model’s mean margin matrix M, of which the i-th row, j-th column element is
M_{i,j} = (1 / N_{i,j}) Σ m(x, v)    (1) 
where m(x, v) and v respectively denote the margin and the unit direction vector between a sample and a decision boundary, and the sum runs over all samples x in class i and search directions v whose adjacent class is j. All distances and margins mentioned in this paper are based on the Euclidean metric. The numerator on the right-hand side of Equation (1) is the summed margin from all samples in class i to the decision boundary between class i and class j; N_{i,j} is the total number of margins added for the origin and adjacent class pair (i, j).
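Under our reading of Equation (1), the mean margin matrix can be assembled from boundary-search records as in the following sketch; the record format and names are illustrative assumptions:

```python
import numpy as np

def mean_margin_matrix(records, num_classes):
    """Assemble the mean margin matrix. Each record is a triple
    (origin_class, adjacent_class, margin): the distance from a sample in the
    origin class to the decision boundary toward the adjacent class along one
    search direction. mean[i, j] averages those margins; counts[i, j] is N[i, j]."""
    summed = np.zeros((num_classes, num_classes))
    counts = np.zeros((num_classes, num_classes))
    for i, j, margin in records:
        summed[i, j] += margin
        counts[i, j] += 1
    mean = summed / np.maximum(counts, 1)  # avoid division by zero for empty pairs
    return mean, summed, counts
```

The summed and count matrices are kept alongside the mean because the class robustness measure below relies on summed margins rather than means.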
As discussed in the previous section, we relate the model’s robustness to the margins between samples and decision boundaries. The relation can be formed as:
R = Σ_{i=1}^{K} Σ_{j=1, j≠i}^{K} M_{i,j}    (2) 
where K is the total number of classes and R denotes the robustness level of the model. A larger R means a bigger overall margin and better robustness for the model.
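A minimal sketch of the overall robustness level under our reading of Equation (2); the exact normalization used in the paper may differ:

```python
def model_robustness(mean_margin):
    """Overall robustness level: the sum of the mean margin matrix's off-diagonal
    entries. A larger value indicates bigger overall margins and a more robust
    model. `mean_margin` is a square nested list or array, M[i][j]."""
    k = len(mean_margin)
    return sum(mean_margin[i][j] for i in range(k) for j in range(k) if i != j)
```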
Moreover, we define class robustness by:
r_j = ( Σ_{i≠j} S_{i,j} ) / ( Σ_{i≠j} S_{j,i} ),  where S_{i,j} = N_{i,j} M_{i,j}    (3) 
Here we use the summed margin S_{i,j} to better express both the margin and the total number of traverses for each class, as there may exist cases with a large M_{i,j} and a small N_{i,j}, or a small M_{i,j} and a large N_{i,j}. In such cases, the mean margin cannot reflect how many traverses happen under the given experiment settings, while the summed margin can.
According to Equation (3), the class robustness r_j of any class j can be described as the ratio between the total margins with class j as the adjacent class and the total margins with class j as the origin class. The numerator measures the defensiveness of class j, and the denominator represents its offensiveness. If r_j is greater than 1.0, we believe class j is robust and mild. Otherwise, if r_j is less than 1.0, class j is vulnerable. Class robustness can be used to measure the model’s inter-class inequality and indicate which classes need further training.
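The class robustness ratio of Equation (3) can be sketched as follows; this is a hedged reading, and the summed margin matrix layout `S[i][j]` (origin class as row, adjacent class as column) is an assumption:

```python
def class_robustness(summed, j):
    """Class robustness of class j: total margins with j as the adjacent class
    (defensiveness) divided by total margins with j as the origin class
    (offensiveness). Values > 1.0 suggest a robust class; < 1.0 a vulnerable one."""
    k = len(summed)
    defensiveness = sum(summed[i][j] for i in range(k) if i != j)
    offensiveness = sum(summed[j][i] for i in range(k) if i != j)
    return defensiveness / offensiveness if offensiveness > 0 else float("inf")
```

Ranking classes by this ratio is what drives the selection tiers in Section IV-B.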
IV-B Example Generation Criteria
Considering the inter-class inequality among different classes, vulnerable classes have smaller margins and are prone to being encroached upon by other classes, while robust classes have relatively larger margins. Here, we utilize a straightforward approach to improve the class robustness of vulnerable classes: increasing the proportion of generated examples for vulnerable classes. To be specific, all classes are categorized into three robustness levels, with three different settings for generating examples. If a class’s robustness is among the top 20% of all classes, it has “high-level robustness” and only 20 of its samples are chosen for generating retraining examples (“minimum selection”). If it is among the bottom 50% of all classes, the class has “low-level robustness” and 150 samples will be utilized (“maximum selection”). Otherwise, the class has “medium-level robustness” and 100 samples will be utilized (“medium selection”).
For each chosen sample, we generate retraining examples in the 40 directions with the top-40 minimum margins. The generation strength is the margin we measured, which guarantees boundary crossing (a theoretical proof on boundary retraining is provided in Section IV-C). In this way, we construct a retraining dataset by shuffling the generated examples with all original training samples. Please note that all parameters mentioned here are empirical.
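The empirical selection tiers and the cross-boundary example generation described above can be sketched as follows; the function names and the small `slack` term (to ensure the generated point lands past the boundary) are illustrative assumptions:

```python
import numpy as np

def selection_size(rank_fraction):
    """Map a class's robustness rank, as a fraction in [0, 1) with 0 = most
    robust, to the number of samples selected, per the empirical tiers above."""
    if rank_fraction < 0.2:   # top 20%: high-level robustness
        return 20             # minimum selection
    if rank_fraction >= 0.5:  # bottom 50%: low-level robustness
        return 150            # maximum selection
    return 100                # medium-level robustness: medium selection

def generate_examples(x, directions, margins, k=40, slack=1e-3):
    """For one sample x, generate retraining examples along the k directions with
    the smallest measured margins; moving by the margin (plus slack) crosses the
    boundary. `directions` are unit vectors; `margins` are the measured distances."""
    order = np.argsort(margins)[:k]
    return np.stack([x + (margins[i] + slack) * directions[i] for i in order])
```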
IV-C Theoretical Derivation
To prove that retraining can positively influence decision boundaries, let us start with a binary classification problem, illustrated in Figure 2. Here, x is any n-dimensional input. The binary classifier can be described as f(x; W), where W is the parameter set. The decision boundary separates class A and class B. Assume a training sample x_0 in class A. The unit vector g is the gradient direction from x_0 towards the decision boundary, and the unit vector u is a random direction. The loss function is:
(4) 
Here, N is the total number of samples trained, y is the label of a sample, σ is the activation function, and ⊙ is the Hadamard product.
Lemma 1: If the projection of u onto g is positive, i.e., u·g > 0, then the direction u has a positive contribution to the gradient.
Proof: The vector u can be decomposed by vector resolution as u = (u·g)g + (u·t)t, where t is the tangent direction of the decision boundary, orthogonal to g. Thus:
(5) 
After retraining with a generated example x′ (the sample x_0 moved along u by its measured margin so that it crosses the boundary), with u·g > 0, we get a new boundary f′(x; W′).
Theorem 1: After training with x′, the new classifier f′ classifies x′ correctly; that is, the margin from x_0 along u is enlarged.
Proof: The loss introduced by x′ is:
(6) 
In backpropagation, weights and biases are updated as:
(7a)  
(7b) 
where η is the learning rate of the classifier and σ′ is the derivative of σ. Since x′ would be misclassified by the original boundary, its loss is nonzero, and so is the resulting gradient.
Suppose the binary classifier has only one layer, f(x) = σ(Wx + b). Then:
(8) 
Based on Equations (7a) and (7b), we can replace W and b in Equation (8) with the updated parameters W′ and b′:
(9) 
if σ is a nonlinear function. Here, L is the total number of layers of f, W_L and b_L are the parameters in the last layer, h is the representation produced by the first L−1 layers, and W denotes all parameters in f. Following similar steps as in Equation (9):
(10) 
Therefore, we can guarantee that this holds after retraining the binary classifier f with x′.
Now we consider multi-class classification problems. Learning directly from Moosavi-Dezfooli et al. [moosavi2016deepfool], we know that mapping multiple labels is (or can be approximated as) a one-vs-all classification scheme. Moreover, our feedback learning method only focuses on one sample and one direction at a time, so at most two classes can be involved for any sample-direction pair. Thus, it is always a binary classification problem in our case.
CCI  Model  0  1  2  3  4  5  6  7  8  9  Avg.  Std. 

Ori.  Ori.  239.567  49.505  252.513  65.123  53.959  26.399  155.302  31.591  50.915  56.208  98.1082  85.681 
Adv.  27.161  75.020  245.622  50.273  79.537  45.759  255  66.710  89.535  56.269  99.089  81.728  
F.L.  25.143  219.195  235.856  111.182  207.838  84.16  255  97.96  194.448  36.047  146.6829  85.323  
Reduced F.L.  28.888  142.801  251.285  95.455  230.259  76.429  255  111.232  198.483  33.115  142.2947  86.766  
F.L.  Ori.  26.615  49.505  248.739  37.857  53.959  16.805  82.651  26.325  47.715  39.919  63.009  67.783 
Adv.  89.823  75.020  233.650  52.883  79.537  67.908  254.547  57.907  83.937  98.880  109.409  72.486  
F.L.  150.723  219.195  255  132.132  207.838  117.963  255  107.617  206.296  190.074  184.1838  54.176  
Reduced F.L.  219.107  142.801  176.329  171.791  230.259  106.657  255  91.034  190.249  137.5  172.0727  53.460 
V Case Study
To prove the effectiveness of our proposed method, experiments were conducted with two models on two datasets: a Convolutional Neural Network (CNN) on MNIST and a ResNet [he2016deep] on CIFAR-10. For both settings, we randomly picked 1,500 samples from the training sets and performed boundary searches in orthogonal directions. We adopted the linear search method from He et al. [he2018decision] to find margins and adjacent classes. For MNIST, 784 random orthogonal directions were searched with step size 0.02 (the dimension of an MNIST sample is 784 and pixel values range from 0 to 1) in both positive and negative directions. For CIFAR-10, 1,000 random orthogonal directions were searched with step size 2.0 (the dimension of a CIFAR-10 sample is 3,072 and pixel values range from 0 to 255). We trained and retrained our models with parameter settings derived from Madry et al. [madry2017towards] on TensorFlow (v1.8.0) [tensorflow2015whitepaper]. Most of the attacks were performed using the Cleverhans library (v2.1.0) [papernot2018cleverhans].
In Figure 3 and the rest of this paper, we use the following abbreviations for the models: Ori. is the original model, Adv. was adversarially trained by the method of Madry et al. [madry2017towards], F.L. was retrained from the original model using feedback learning with compensation for inter-class inequality, and Reduced F.L. was retrained without considering inter-class inequality. The results show that feedback learning, with or without inequality compensation, achieves better results than the original training under FGSM attacks [goodfellow2014explaining]. For adversarial strengths less than 0.4, the state-of-the-art adversarial model is the most accurate, while feedback learning surpasses it as the perturbation further increases. It is widely accepted that an adversarial strength of 0.3 is large enough for any adversarial attack on the MNIST dataset. Benefiting from the randomness in direction selection, feedback learning can guarantee larger margins in more general directions and defend against severe attacks. The Adv. model performs even worse than the Ori. model at adversarial strength 0.5. In fact, the defense of the Adv. model degrades as the attack strength increases and when the attack and defense methods differ. Our proposed feedback learning, in contrast, can defend against adversarial-based attacks even though it does not use any adversarial-related information.
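The linear boundary search used throughout our experiments can be sketched as follows; this is a hedged sketch, not He et al.’s exact implementation, and `model_predict` and the parameter names are illustrative:

```python
import numpy as np

def boundary_search(model_predict, x, direction, step, max_dist):
    """Step outward from x along a unit direction until the predicted class
    changes; return the margin (distance traveled) and the adjacent class, or
    (None, None) if no boundary is found within max_dist."""
    origin = model_predict(x)
    dist = step
    while dist <= max_dist:
        label = model_predict(x + dist * direction)
        if label != origin:
            return dist, label  # margin and adjacent class
        dist += step
    return None, None           # counted as "N/A" in Table I
```

In our settings, `step` would be 0.02 for MNIST and 2.0 for CIFAR-10, with random orthogonal unit vectors as the search directions.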
V-A Margin Improvement and Inter-class Inequality Mitigation
In Table II, class center images (CCIs) are defined as the examples with the largest mean margins in each class. First, we searched over the 1,500-sample set to find one CCI for each class; two different CCI sets were found based on the two models (because the boundaries and margins differ between models). Then we applied a linear search to measure the margins from the CCIs to the decision boundaries of four models: Ori., Adv., F.L., and Reduced F.L.. The results are summarized as follows:


The Ori. model has the worst mean margin performance for both CCI sets, and the overall mean margins of the Adv. model are only marginally larger. The F.L. model has the highest overall mean margins and low standard deviations in both CCI cases, followed by the Reduced F.L. model with slightly worse performance. 
When comparing CCI sets, the F.L. CCI set has the largest mean margins (except on the Ori. model) and the smallest standard deviations on all four models.
The results demonstrate that retraining with cross-boundary examples always improves margin robustness, which is consistent with the proof in Section IV-C. Employing the feedback learning method additionally alleviates the inter-class inequality problem by increasing the margins in vulnerable directions.
Attacks  MNIST  CIFAR-10  

Ori.  Adv.  F.L.  Ori.  Adv.  F.L.  
FGSM  50.50%  97.57%  76.96%  36.60%  83.70%  56.20% 
CW-L2  39.40%  94.50%  51.63%  9.30%  54.20%  20.30% 
PGD  18.62%  98.03%  64.54%  38.90%  85.20%  60.80% 
BIM  15.36%  97.95%  56.39%  31.20%  85.10%  51.20% 
MIM  38.08%  97.94%  76.39%  44.20%  85.10%  60.30% 
DeepFool  32.92%  64.36%  44.75%  36.60%  83.70%  56.20% 
Random  78.19%  31.09%  91.07%  45.40%  84.20%  87.50% 
Gaussian  77.31%  34.77%  90.65%  79.14%  86.59%  90.58% 
V-B Robustness Improvement and Decision Space Analysis
Table III compares the accuracy of the Ori., Adv., and F.L. models under multiple evasion attack methods: the fast gradient sign method (FGSM) [goodfellow2014explaining], Carlini & Wagner (CW-L2) [carlini2017towards], projected gradient descent (PGD) [madry2017towards], the basic iterative method (BIM) [Kurakin2017ICLR], the momentum iterative method (MIM) [dong2017boosting], DeepFool [moosavi2016deepfool], random noise (with a uniform distribution), and Gaussian noise. Please note that all attack parameters were carefully tuned so that the effectiveness differences are clearly displayed (none of the models performs worse than a random guess). For adversarial-related attacks, the Adv. model is the most accurate overall, while feedback learning improves model robustness to a certain extent. As for random and Gaussian noise attacks, feedback learning clearly reaches the best accuracies on both datasets, while the Adv. model’s robustness can be destroyed by non-adversarial attacks (especially on low-resolution datasets such as MNIST).
The results show that our method improves accuracy on both adversarial-based and non-adversarial examples, compared to the Ori. model. The reason the F.L. model does not achieve better results than the Adv. model on adversarial-based examples is that the latter is trained specifically for these kinds of attacks. Correspondingly, the Adv. model performs worse than the F.L. model on non-adversarial examples.
Furthermore, we measured and calculated the margins and robustness of all four models. Please note that we omit the class “N/A” in Table I, because the margin is large enough to be ignored if no adjacent class is found. The heat maps of the mean margin matrices in Figure 4 give a visual comparison between the Ori. and F.L. models’ margin measurements. The mean margin matrices of the Adv. and Reduced F.L. models are not shown due to the page limit. Here, MNIST samples are scaled to [0, 1] and CIFAR-10 samples are kept in their original pixel-value format. For the MNIST dataset, since classes “3” and “4” fall into the “high-level robustness” category and classes “0” and “1” have “low-level robustness” in the Ori. model, more samples were generated for retraining in classes “0” and “1” than in classes “3” and “4”, as defined in Section IV-B. After retraining with feedback learning, the mean margins of classes “3” and “4” slightly decrease, while the other classes’ mean margins increase. We also note that more margins are beyond the searching range (which is half of the pixel value range: 0.5 for MNIST and 127.5 for CIFAR-10) after retraining. These changes result in overall improved summed and mean margins. For the CIFAR-10 dataset, the improvement is more significant.
Moreover, we calculate the robustness levels of the four models on MNIST and on CIFAR-10. Here, margins with no adjacent class are counted as the maximum searching range. The results show that our feedback learning method substantially increases the margins from samples to decision boundaries and also ensures that the models have larger overall margins compared with those obtained from retraining without considering inter-class inequality. More analyses of the causes of these results are given in Section V-C.
V-C Class Robustness Comparison
MNIST  0  1  2  3  4  5  6  7  8  9  Avg.  Std. 

0.005  0.149  0.955  7.237  6.662  2.135  0.661  0.187  3.075  0.176  2.124  2.73  
0.702  0.007  4.425  1.511  0.221  1.296  0.922  0.182  3.603  0.293  1.316  1.517  
29.316  31.268  15.408  0.009  0.007  1.856  10.387  2.318  5.858  25.966  12.24  12.48  
3.665  32.208  0.041  0.553  128.07  0.809  0.388  1.227  1.049  0.001  16.80  40.32  
CIFAR-10  0  1  2  3  4  5  6  7  8  9  Avg.  Std. 
0.032  0.011  14.005  0.888  0.565  0.021  1.110  0.002  0.151  0.029  1.681  4.349  
0.066  0.0074  1.768  0.104  1.240  0.013  25.46  0.299  0.031  0.497  2.948  7.93  
0.020  0.059  9.147  0.343  0.573  0.023  11.209  0.015  0.231  0.052  2.167  4.254  
0.721  0.894  14.628  0.057  0.128  0.121  1.559  0.470  0.197  0.066  1.884  4.503 
Table IV compares the class robustness of four models: Ori., Adv., F.L., and Reduced F.L.. The Adv. model controls neither model robustness nor class robustness; thus it has the most unstable performance among all. Feedback learning improves the overall class margins at an acceptable expense: several “high-level” and “medium-level robustness” classes may become more vulnerable (such as “3” and “4” for both datasets), while “low-level robustness” classes gain remarkably better robustness. Without considering inter-class inequality, the Reduced F.L. model still achieves better robustness than the Ori. model, but relatively worse than the F.L. model in terms of overall margins and standard deviations.
Why does the F.L. model achieve better results than the Reduced F.L. model, even though the latter employs more examples in retraining? This is because the total capacity of the decision space is constant, and decision boundaries are only borders that partition the given space. Retraining with any technique will only relocate boundaries and redistrict class regions. As the inter-class inequality problem exists, training each class with the same emphasis will not increase all classes’ robustness simultaneously, and may even result in a more severe maldistribution of class domains. This also explains why, on MNIST, retraining without inequality compensation can even yield a smaller robustness level. Feedback learning attempts to relax this inequality by allowing vulnerable classes to make more contributions to the model, so the boundary relocation is more controllable.
V-D Computation Complexity Analysis
The boundary search is the most computation-consuming part of our feedback learning method. As the dimension of the inputs increases, the computation cost goes up. The following two explanations/solutions can greatly alleviate this issue.


The boundary search happens before the retraining process, which means it is conducted only once for each model. What’s more, the margin information can be reused to further retrain the model multiple times until the model reaches the desired robustness performance;

The boundary search in each direction for each sample can be highly parallelized. This will not reduce the computation cost, but it will greatly shorten the search time. Here we assume the major concerns are model robustness and computation time, not computing power. Thus, our method can be deployed on datasets with high-dimensional inputs.
VI Conclusions
In this work, we first analyze model robustness in the decision space. Based on this analysis, we propose a feedback learning method to understand how well a model learns and to facilitate the model’s retraining to remedy its defects. A set of distance-based criteria for model robustness evaluation shows that our method can significantly improve model accuracy and robustness against different types of attacks. Moreover, we observe the existence of inter-class inequality, which can be compensated for by changing the proportions of examples generated in each class.