The robustness of (deep) neural networks (NNs) against adversarial attacks in classification tasks has become one of the most discussed topics in machine learning research since the phenomenon was discovered [1, 2]. By making almost imperceptible changes to the input of a NN, attackers are able to force a misclassification of the input or even to switch the prediction to any desired class. With machine learning taking a more important role within our society, the security of machine learning models in general is under more scrutiny than ever.
To define an adversarial example, we use a definition similar to that of [3]. Suppose we use a set of scoring functions $f_c\colon \mathbb{R}^n \to \mathbb{R}$ which assign a score to each class $c$ given an input $x$ of the data space $\mathbb{R}^n$. Moreover, the predicted class label $c^*(x)$ for $x$ is determined by the winner-takes-all rule $c^*(x) = \arg\max_c f_c(x)$, and we have access to a labeled data point $(x, y)$ which is correctly classified, i.e., $c^*(x) = y$. An adversarial example of the sample $x$ is defined via the minimal perturbation $\epsilon$ of $x$ required to find a point at the decision boundary or in the classification region of a different class than $y$, i.e.,

$$\min_{\epsilon} \|\epsilon\| \quad \text{subject to} \quad c^*(x + \epsilon) \neq y. \tag{1}$$

Note that the magnitude of the perturbation $\epsilon$ is measured with respect to a chosen norm $\|\cdot\|$. If the minimum in (1) is attained, an adversarial example close to the decision boundary is found. Thus, adversarials are also related to the analysis of the decision boundaries of a learned model. It is important to distinguish between the ability to generalize and the robustness of a model [4]. Assume a model trained on a finite number of data points drawn from an unknown data manifold in $\mathbb{R}^n$. Generalization refers to the property of correctly classifying an arbitrary point from the unknown data manifold (so-called on-manifold samples). The robustness of a model refers to the ability to correctly classify on-manifold samples that were arbitrarily disturbed, e.g., by injecting Gaussian noise. Depending on the kind of noise, these samples are on-manifold or off-manifold adversarials (the latter not located on the data manifold). Generalization and robustness have to be learned explicitly, because the one does not imply the other [4].
Although Learning Vector Quantization (LVQ), as originally suggested by T. Kohonen [5], is frequently claimed to be one of the most robust crisp classification approaches, its robustness has not been actively studied yet. This claim is based on the characteristic of LVQ methods to partition the data space into Voronoi cells (receptive fields) according to the best matching prototype vector. For Generalized LVQ (GLVQ) [6], considered as a differentiable cost-function-based variant of LVQ, robustness is theoretically anticipated because it maximizes the hypothesis margin in the input space [7]. This changes if the squared Euclidean distance in GLVQ is replaced by adaptive dissimilarity measures, as in Generalized Matrix LVQ (GMLVQ) [8] or Generalized Tangent LVQ (GTLVQ) [9]. These methods first apply a projection and measure the dissimilarity in the corresponding projection space, also denoted as feature space. A general robustness claim for these models is therefore considerably less clear.
The observations of this paper are: (1) GLVQ and GTLVQ are highly robust because of their hypothesis margin maximization in an appropriate space. (2) GMLVQ is susceptible to adversarial attacks; hypothesis margin maximization does not guarantee a robust model in general. (3) Increasing the number of prototypes improves both the robustness and the generalization ability of an LVQ model. (4) Adversarial examples generated for GLVQ and GTLVQ often make semantic sense by interpolating between digits.
2 Learning Vector Quantization
LVQ assumes a set $W = \{w_1, \dots, w_M\} \subset \mathbb{R}^n$ of prototypes to represent and classify the data regarding a chosen dissimilarity measure $d$. Each prototype $w_k$ is responsible for exactly one class $c(w_k)$, and each class is represented by at least one prototype. The training dataset is defined as a set of labeled data points $\{(x_i, y_i)\}_{i=1}^N$. The scoring function for the class $c$ yields $f_c(x) = -\min\{d(x, w_k) \mid c(w_k) = c\}$. Hence, the predicted class $c^*(x)$ is the class label of the closest prototype to $x$.
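As a minimal numpy sketch of this winner-takes-all rule (the prototype values below are hypothetical, not learned ones):

```python
import numpy as np

def lvq_predict(x, prototypes, prototype_labels):
    """Winner-takes-all LVQ prediction: return the class label of the
    prototype closest to x under the squared Euclidean distance."""
    d = np.sum((prototypes - x) ** 2, axis=1)  # dissimilarity to every prototype
    return prototype_labels[np.argmin(d)]

# Hypothetical toy setup: two classes, one prototype each.
W = np.array([[0.0, 0.0],
              [1.0, 1.0]])
c = np.array([0, 1])
pred = lvq_predict(np.array([0.2, 0.1]), W, c)  # closest prototype has class 0
```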
GLVQ is a cost-function-based variant of LVQ such that stochastic gradient descent learning (SGDL) can be performed as the optimization strategy. Given a training sample $(x, y)$, the two closest prototypes $w^+$ with the correct label $c(w^+) = y$ and $w^-$ with an incorrect label $c(w^-) \neq y$ are determined. The dissimilarity function is defined as the squared Euclidean distance $d(x, w) = \|x - w\|_2^2$. The cost function of GLVQ is

$$E = \sum_{i} \phi\left(\mu(x_i)\right)$$

with the local loss $\phi(\mu(x_i))$, where $\phi$ is a monotonically increasing differentiable activation function. The classifier function $\mu(x)$ is defined as

$$\mu(x) = \frac{d^+(x) - d^-(x)}{d^+(x) + d^-(x)} \in [-1, 1],$$

where $d^\pm(x) = d(x, w^\pm)$. Thus, $\mu(x)$ is negative for a correctly classified training sample and positive otherwise. Since $\mu$ is differentiable, the prototypes can be learned by an SGDL approach.
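The classifier function can be sketched directly in numpy (toy prototypes, not trained ones):

```python
import numpy as np

def glvq_mu(x, w_plus, w_minus):
    """GLVQ classifier function mu(x) = (d+ - d-) / (d+ + d-), where d+ (d-)
    is the squared Euclidean distance to the closest prototype with the
    correct (an incorrect) label. mu(x) lies in [-1, 1] and is negative
    exactly when x is classified correctly."""
    d_plus = np.sum((x - w_plus) ** 2)
    d_minus = np.sum((x - w_minus) ** 2)
    return (d_plus - d_minus) / (d_plus + d_minus)

# Sample close to the correct prototype at the origin -> negative score.
mu = glvq_mu(np.array([0.1, 0.0]), np.zeros(2), np.ones(2))
```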
Generalized Matrix LVQ:
By substituting the dissimilarity measure in GLVQ with the adaptive dissimilarity measure

$$d_\Omega(x, w) = \|\Omega (x - w)\|_2^2 = (x - w)^\top \Omega^\top \Omega \, (x - w),$$

GMLVQ is obtained [8]. The relevance matrix $\Omega \in \mathbb{R}^{m \times n}$ is learned during training in parallel to the prototypes. The parameter $m$ controls the projection dimension of $\Omega$ and must be defined in advance.
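A minimal numpy sketch of the adaptive dissimilarity; the toy matrix below is hypothetical and deliberately collapses one input dimension, which previews the robustness issue discussed later:

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """Adaptive dissimilarity d_Omega(x, w) = ||Omega (x - w)||_2^2."""
    diff = omega @ (x - w)
    return float(diff @ diff)

# A collapsed dimension: this Omega ignores the second input coordinate,
# so perturbations along that axis are invisible to the dissimilarity.
omega = np.array([[1.0, 0.0]])
d0 = gmlvq_distance(np.array([0.5, 0.0]), np.zeros(2), omega)
d1 = gmlvq_distance(np.array([0.5, 9.0]), np.zeros(2), omega)  # same distance
```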
Generalized Tangent LVQ:
In contrast to the previous methods, GTLVQ [9] defines the prototypes as affine subspaces in $\mathbb{R}^n$ instead of points. More precisely, the set of prototypes is defined as $\{(t_k, W_k)\}$, where $W_k \in \mathbb{R}^{n \times r}$ is the $r$-dimensional basis and $t_k \in \mathbb{R}^n$ is the translation vector of the affine subspace. Together with the parameter vector $\theta \in \mathbb{R}^r$, they form the prototype as the affine subspace $w_k(\theta) = t_k + W_k \theta$. The tangent distance is defined as

$$d_T\left(x, (t, W)\right) = \min_{\theta} \|x - (t + W \theta)\|_2^2,$$

where the subspace dimension $r$ is a hyperparameter. Substituting $d$ in GLVQ with $d_T$ and redefining the set of prototypes to $\{(t_k, W_k)\}$ yields GTLVQ. The affine subspaces defined by the $(t_k, W_k)$ are learned by SGDL.
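For an orthonormal basis $W$, the minimization over $\theta$ has a closed form (the orthogonal projection), which the following sketch with a hypothetical one-dimensional tangent illustrates:

```python
import numpy as np

def tangent_distance(x, t, W):
    """Squared Euclidean distance from x to the affine subspace
    {t + W @ theta}. For an orthonormal basis W the optimal parameter
    vector has the closed form theta* = W^T (x - t)."""
    theta = W.T @ (x - t)   # optimal coefficients
    proj = t + W @ theta    # orthogonal projection of x onto the subspace
    return float(np.sum((x - proj) ** 2))

# Hypothetical 1-dimensional tangent along the first axis in R^3.
t = np.zeros(3)
W = np.array([[1.0], [0.0], [0.0]])
d = tangent_distance(np.array([2.0, 3.0, 0.0]), t, W)  # -> 9.0
```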
3 Experimental Setup
In this section, adversarial attacks as well as robustness metrics are introduced, and the setup of the evaluation is explained. The setup used here follows the one presented in [10] with a few minor modifications for the study of LVQ methods. All experiments and models were implemented using the Keras framework in Python on top of TensorFlow (TensorFlow: www.tensorflow.org; Keras: www.keras.io). The Foolbox [11] implementations with default settings were used for the attacks. The evaluation was performed on the MNIST dataset, as it is one of the most widely used datasets for robust model evaluation in the literature. Despite being considered by many as a solved ‘toy’ dataset, with state-of-the-art (SOTA) deep learning models reaching close to perfect classification accuracy, defending against adversarial attacks on MNIST is still far from trivial. The dataset consists of handwritten digits in the data space $[0, 1]^n$ with $n = 784$. We trained our models on the 60K training images and evaluated all metrics and scores on the complete set of 10K test images.
3.1 Adversarial Attacks
Adversarial attacks can be grouped into two different approaches, white-box and black-box, distinguished by the amount of knowledge about the model available to the attacker. White-box or gradient-based attacks exploit the internal gradients of the NNs, while black-box attacks rely only on the output of the model, either the logits, the probabilities, or just the predicted discrete class labels. Each attack is designed to optimize the adversarial image regarding a given norm. Usually, the attacks are defined to optimize over $l_p$-norms with $p \in \{0, 2, \infty\}$ and are therefore called $l_p$-attacks.
In the evaluation, nine white-box and black-box attacks were compared. The white-box attacks are: Fast Gradient Sign Method (FGSM) [1], Fast Gradient Method (FGM), Basic Iterative Method (BIM) [12], Momentum Iterative Method (MIM) [13], and Deepfool [14]. The black-box attacks are: Gaussian blur, Salt-and-Pepper (S&P), Pointwise [10], and Boundary [15]. See Tab. 2 for the definition of each attack. Note that some of the attacks are defined for more than one norm.
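To illustrate the simplest of these, FGSM takes a single $l_\infty$-step of size eps along the sign of the loss gradient; the gradient below is a stand-in, since a real attack would backpropagate through the model:

```python
import numpy as np

def fgsm(x, loss_grad, eps):
    """Fast Gradient Sign Method: one l_inf step of size eps in the
    direction of the loss gradient's sign, clipped to the valid
    pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(loss_grad), 0.0, 1.0)

# Stand-in gradient; in practice it comes from the model's loss.
x = np.array([0.5, 0.9, 0.1])
g = np.array([1.0, 1.0, -1.0])
x_adv = fgsm(x, g, eps=0.3)  # -> [0.8, 1.0, 0.0]
```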
3.2 Robustness Metrics
The robustness of a model is measured by four different metrics, all based on the adversarial distances $\delta_A(x)$. Given a labeled test sample $(x, y)$ from a test set and an adversarial $l_p$-attack $A$, $\delta_A(x)$ is defined as: (1) zero if the data sample is misclassified, i.e., $c^*(x) \neq y$; (2) $\|\epsilon\|_p$ if $A$ found an adversary $x + \epsilon$ and $c^*(x) = y$; (3) $\infty$ if no adversary was found by $A$ and $c^*(x) = y$.
For each attack $A$, the median-$\delta_A$ score is defined as the median of $\delta_A(x)$ over the test set, describing an average that is robust to outliers. (Hence, the median-$\delta_A$ score can be $\infty$ if for over half of the samples no adversary was found.) The median-$\delta_p^*$ score is computed over all $l_p$-attacks as the median of $\delta_p^*(x)$, where $\delta_p^*(x)$ is defined as $\min_A \delta_A(x)$. This score is a worst-case evaluation of the median-$\delta_A$, assuming that each sample is disturbed by the respective worst-case attack (the attack with the smallest adversarial distance). Additionally, the threshold accuracies acc-$\epsilon_p$ and acc-$\epsilon_p^*$ of a model are defined as the percentage of samples for which no adversarial example with $\delta \le \epsilon_p$ was found, using either the given $l_p$-attack for all samples or the respective worst-case attack. This metric represents the remaining accuracy of the model when only adversaries under a given threshold are considered valid. We used one fixed threshold per norm ($l_0$, $l_2$, and $l_\infty$) in our evaluation.
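The metrics above can be sketched in a few lines of numpy; the per-sample distances below are hypothetical:

```python
import numpy as np

def median_delta(deltas):
    """median-delta score; np.inf marks samples where no adversary was
    found, 0.0 marks misclassified samples."""
    return float(np.median(deltas))

def acc_threshold(deltas, eps):
    """Remaining accuracy: fraction of samples whose adversarial
    distance exceeds the threshold eps."""
    return float(np.mean(np.asarray(deltas) > eps))

def worst_case(deltas_per_attack):
    """Per-sample worst case over several attacks: the smallest distance."""
    return np.min(np.stack(deltas_per_attack), axis=0)

# Hypothetical distances for four test samples under two attacks.
a1 = np.array([0.0, 0.4, 1.2, np.inf])
a2 = np.array([0.0, 0.6, 0.9, 2.0])
d_star = worst_case([a1, a2])          # [0.0, 0.4, 0.9, 2.0]
score = median_delta(d_star)           # 0.65
acc = acc_threshold(d_star, eps=0.5)   # 0.5
```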
3.3 Training Setup and Models
All models, except the Madry model, were trained with the Adam optimizer [16] for 150 epochs using basic data augmentation in the form of small random shifts and random rotations.
Two NNs are used as baseline models for the evaluation. The first model is a convolutional NN, denoted as CNN, with two convolutional layers and two fully connected layers. The convolutional layers have 32 and 64 filters with a stride of one and a kernel size of 3×3. Both are followed by max-pooling layers with a window size and stride of 2×2 each. None of the layers use padding. The first fully connected layer has 128 neurons and a dropout rate of 0.5. All layers use the ReLU activation function except for the final fully connected output layer, which uses a softmax function. The network was trained using the categorical cross-entropy loss with an initial learning rate that was decayed by a factor of 0.9 at plateaus.
The second baseline model is the current SOTA model for MNIST in terms of robustness, proposed in [17] and denoted as Madry. This model relies on a special kind of adversarial training by considering it as a min-max optimization game: before the loss function is minimized over a given training batch, the original images are partially substituted by perturbed images chosen such that the loss function is maximized over the given batch. The Madry model was downloaded from https://github.com/MadryLab/mnist_challenge.
All three LVQ models were trained using an initial learning rate of 0.01 with a decay of 0.5 at plateaus and with $\phi$ defined as the identity function. The prototypes (translation vectors) of all methods were class-wise initialized by k-means over the training dataset. For GMLVQ, we defined $\Omega \in \mathbb{R}^{n \times n}$ (i.e., $m = n$) and initialized it as a scaled identity matrix with Frobenius norm one. After each update step, $\Omega$ was normalized to again have Frobenius norm one. The basis matrices of GTLVQ were initialized by a singular value decomposition with respect to each initialized prototype $t_k$ over the set of training points of the corresponding class. The prototypes were not constrained to $[0, 1]^n$ (‘box constrained’) during the training, resulting in possibly non-interpretable prototypes, as they can be points in $\mathbb{R}^n$. (A restriction to $[0, 1]^n$ leads only to a marginal decrease in accuracy.)
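A sketch of how such an SVD-based initialization might look; the helper name and the random class data are hypothetical:

```python
import numpy as np

def init_tangent_basis(t, X_class, r):
    """Initialize an r-dimensional tangent basis for a translation
    vector t from the class samples X_class (one sample per row) via
    an SVD of the centered data."""
    U, _, _ = np.linalg.svd((X_class - t).T, full_matrices=False)
    return U[:, :r]  # leading r left-singular vectors, orthonormal columns

# Hypothetical class data in R^4; the basis columns come out orthonormal.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
t = X.mean(axis=0)
W = init_tangent_basis(t, X, r=2)
```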
Two versions of each LVQ model were trained: one with one prototype per class and one with multiple prototypes per class. For the latter, the numbers of prototypes were chosen such that all LVQ models have roughly 1M parameters. The chosen numbers of prototypes per class are given in Tab. 2 as #prototypes.
The results of the model robustness evaluation are presented in Tab. 2. Fig. 1 displays adversarial examples generated for each model. Below, the four most notable observations that can be made from the results are discussed.
Hypothesis margin maximization in the input space produces robust models (GLVQ and GTLVQ are highly robust):
Tab. 2 shows outstanding robustness against adversarial attacks for GLVQ and GTLVQ. GLVQ with multiple prototypes and GTLVQ with one or more prototypes per class outperform the NN models by a large margin for the $l_0$- and $l_2$-attacks, despite having a considerably lower clean accuracy. This is not only the case for individual black-box attacks but also for the worst-case scenarios. A possible explanation is that the robustness of GLVQ and GTLVQ is achieved due to the hypothesis margin maximization in the input space [7]. (Note that the results of [7] hold for GTLVQ as well, as it can be seen as a version of GLVQ with infinitely many prototypes forming the affine subspaces.) In [7] it was shown that the hypothesis margin is a lower bound for the sample margin, which, if defined in the input space, is the quantity used in the definition of adversarial examples (1). Hence, if we maximize the hypothesis margin in the input space, we guarantee a large sample margin and therefore a robust model. A first attempt to transfer this principle to NNs was made in [3] via a first-order approximation of the sample margin in the input space.
However, the Madry model still outperforms GLVQ and GTLVQ in the $l_\infty$-attacks, as expected. This result is easily explained using the manifold-based definition of adversarial examples and the adversarial training procedure of the Madry model, which optimizes the robustness against $l_\infty$ perturbations. Considering the manifold definition, one could say that Madry augmented the original MNIST manifold to include small $l_\infty$ perturbations. Doing so, Madry creates a new training manifold in addition to the original MNIST manifold. In other words, the robustness of the adversarially trained Madry model can be seen as its generalization on this new training manifold (this becomes clear if one considers its high acc-$\epsilon_\infty$ scores for the $l_\infty$-attacks). For this reason, the Madry model is only robust against off-manifold examples that lie on the generated training manifold. As soon as off-training-manifold examples are considered, the accuracy drops fast. This was also shown in [10], where the accuracy of the Madry model is significantly lower when a larger $l_\infty$ threshold is considered. (For future work, a more extensive evaluation should be considered, including not only the norm for which a single attack was optimized but rather a combination of all three norms. This would give a better insight into the characteristics of the attack and the defending model. The $l_0$ norm can be interpreted as the number of pixels that have to change, the $l_\infty$ norm as the maximum deviation of a pixel, and the $l_2$ norm as a kind of average pixel change. As attacks are optimized for a certain norm, considering only this norm might give a skewed impression of their attacking capability. Moreover, calculating a threshold accuracy including only adversaries that are below all three thresholds may give an interesting and more meaningful metric.)
Furthermore, the Madry model has outstanding robustness scores against gradient-based attacks in general. We attribute this effect to potential obfuscation of gradients as a side effect of the adversarial training procedure. While [18] was not able to find concrete evidence of gradient obfuscation due to adversarial training in the Madry model, it did list black-box attacks outperforming white-box attacks as a signal for its occurrence.
Hypothesis margin maximization in a space different from the input space does not necessarily produce robust models (GMLVQ is susceptible to adversarial attacks):
In contrast to GLVQ and GTLVQ, GMLVQ has the lowest robustness score across all attacks and all methods. Taking the strong relation of GTLVQ and GMLVQ into account, this is a remarkable result. (GTLVQ can be seen as a localized version of GMLVQ with the constraint that the matrices must be orthogonal projectors.) One potential reason is that GMLVQ maximizes the hypothesis margin in a projection space, which in general differs from the input space. The margin maximization in the projection space is used to construct a model with good generalization abilities, which is why GMLVQ usually outperforms GLVQ in terms of accuracy (see the clean accuracy for GLVQ and GMLVQ with one prototype per class). However, a large margin in the projection space does not guarantee a large margin in the input space. Thus, GMLVQ does not implicitly optimize the separation margin in the input space, as used in the definition of an adversarial example (1). Hence, GMLVQ is a good example showing that a model which generalizes well is not necessarily robust.
Another effect that explains the observed lack of robustness of GMLVQ is its tendency to oversimplify (to collapse data dimensions) when trained without regularization. Oversimplification may induce heavy distortions in the mapping between input and projection space, potentially creating directions in which a small perturbation in the input space is mapped to a large perturbation in the projection space. These directions can later be used to efficiently place the adversarial attack. This effect is closely related to results from metric learning, where collapsing was used in [19] to optimize a classifier that maximally collapses (concentrates) the classes to single points (related to the prototypes in GMLVQ). There it was empirically shown that this effect helps to achieve good generalization.
To improve the robustness of GMLVQ, penalizing the collapsing of dimensions may be a successful approach. A method to achieve this is to force the eigenvalue spectrum of the mapping to follow a uniform distribution, as proposed in [20]. This regularization technique would also strengthen the transferability between the margin in the projection space and the margin in the input space. Unfortunately, it requires the possibly numerically unstable computation of the derivative of the determinant of $\Omega \Omega^\top$, which has made it impossible so far to train an appropriate model for MNIST with this regularization. The fact that GTLVQ is a constrained version of GMLVQ gives additional reason to believe that regularizations / constraints are able to force a model to be more robust.
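A penalty in the spirit of the regularization by Schneider et al. can be sketched as follows (a minimal numpy version; in a real training run it would be combined with the GLVQ gradient updates, and `slogdet` sidesteps forming the possibly tiny determinant directly):

```python
import numpy as np

def eigenvalue_penalty(omega):
    """Penalty of the form -ln det(Omega Omega^T): it grows without
    bound as eigenvalues of Omega Omega^T approach zero, i.e. it
    penalizes collapsing dimensions."""
    sign, logdet = np.linalg.slogdet(omega @ omega.T)
    return -logdet

well_spread = np.eye(2)            # uniform spectrum -> penalty 0
collapsing = np.diag([1.0, 1e-3])  # nearly collapsed dimension -> large penalty
p0 = eigenvalue_penalty(well_spread)
p1 = eigenvalue_penalty(collapsing)
```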
Increasing the number of prototypes improves the ability to generalize and the robustness:
For all three LVQ models, the robustness improves if the number of prototypes per class increases. Additionally, increasing the number of prototypes leads to a better ability to generalize. This observation provides empirical evidence supporting the results of [4], where it was stated that generalization and robustness are not necessarily contradicting goals, a topic recently under discussion.
With multiple prototypes per class, the robustness of the GLVQ model improves by a significantly larger margin than that of GTLVQ. This can be explained by the high accuracy of GTLVQ with only one prototype per class, which indicates that the data manifold of MNIST is almost flat and can therefore be described by one tangent, so that introducing more prototypes does not improve the model's generalization ability. If we add more prototypes to GLVQ, the prototypes start to approximate the data manifold and with that, implicitly, the tangent prototypes used in GTLVQ. With more prototypes per class, the scores of GLVQ will therefore most likely converge towards those of GTLVQ.
GLVQ and GTLVQ require semantically correct adversarial examples:
Fig. 1 shows a large semantic difference between the adversarial examples generated for GLVQ / GTLVQ and those generated for the other models. A large portion of the adversarial examples generated for the GLVQ and GTLVQ models look like interpolations between the original digit and another digit. (A similar effect has been observed for k-NN models.) This effect is especially visible for the Deepfool, BIM, and Boundary attacks. In addition, the Pointwise attack is required to generate features from other digits to fool the models, e.g., the horizontal bar of a two in the case of GLVQ and the closed ring of a nine for GTLVQ (see digit four). In other words, for GLVQ and GTLVQ some of the attacks generate adversaries that more closely resemble on-manifold samples than off-manifold ones. For the other models, the adversaries are more like off-manifold samples (or, in the case of Madry, off-training-manifold samples).
In this paper, we extensively evaluated the robustness of LVQ models against adversarial attacks. Most notably, we have shown that there is a large difference in the robustness of the different LVQ models, even though they all perform a hypothesis margin maximization. GLVQ and GTLVQ show high robustness against adversarial attacks, while GMLVQ scores the lowest across all attacks and all models. The discussion of this observation has led to four important conclusions: (1) For (hypothesis) margin maximization to lead to robust models, the space in which the margin is maximized matters: it must be the same space in which the attack is placed. (2) Collapsed dimensions are beneficial for the generalization ability of a model, but they can be harmful to the model's robustness. (3) It is possible to derive a robust model by applying a fitting regularization / constraint. This can be seen in the relation between GTLVQ and GMLVQ and has also been studied for NNs [21]. (4) Our experimental results with an increased number of prototypes support the claim of [4] that the ability to generalize and robustness are in principle not contradicting goals.
In summary, the overall robustness of LVQ models is impressive. Using only one prototype per class and no purposefully designed adversarial training, GTLVQ is on par with SOTA robustness on MNIST. With further research, the robustness of LVQ models against adversarial attacks can be a valid reason to deploy them instead of NNs in security-critical applications.
[1] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
[2] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
[3] G. Elsayed, D. Krishnan, H. Mobahi, K. Regan, and S. Bengio. Large margin deep networks for classification. In Advances in Neural Information Processing Systems, pages 850–860, 2018.
[4] D. Stutz, M. Hein, and B. Schiele. Disentangling adversarial robustness and generalization. arXiv preprint arXiv:1812.00740, 2018.
[5] T. Kohonen. Learning Vector Quantization. Neural Networks, 1(Supplement 1), 1988.
[6] A. Sato and K. Yamada. Generalized Learning Vector Quantization. In Advances in Neural Information Processing Systems, pages 423–429, 1996.
[7] K. Crammer, R. Gilad-Bachrach, A. Navot, and N. Tishby. Margin analysis of the LVQ algorithm. In Advances in Neural Information Processing Systems, pages 479–486, 2003.
[8] P. Schneider, M. Biehl, and B. Hammer. Adaptive relevance matrices in Learning Vector Quantization. Neural Computation, 21(12):3532–3561, 2009.
[9] S. Saralajew and T. Villmann. Adaptive tangent distances in Generalized Learning Vector Quantization for transformation and distortion invariant classification learning. In Neural Networks (IJCNN), 2016 International Joint Conference on, pages 2672–2679. IEEE, 2016.
[10] L. Schott, J. Rauber, M. Bethge, and W. Brendel. Towards the first adversarially robust neural network model on MNIST. In International Conference on Learning Representations, 2019.
[11] J. Rauber, W. Brendel, and M. Bethge. Foolbox: A Python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017.
[12] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[13] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2018.
[14] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
[15] W. Brendel, J. Rauber, and M. Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In Proceedings of the 6th International Conference on Learning Representations, 2018.
[16] D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, pages 1–13, 2015.
[17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
[18] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, 2018.
[19] A. Globerson and S. Roweis. Metric learning by collapsing classes. In Advances in Neural Information Processing Systems, pages 451–458, 2006.
[20] P. Schneider, K. Bunte, H. Stiekema, B. Hammer, T. Villmann, and M. Biehl. Regularization in matrix relevance learning. IEEE Transactions on Neural Networks, 21(5):831–840, 2010.
[21] F. Croce, M. Andriushchenko, and M. Hein. Provable robustness of ReLU networks via maximization of linear regions. arXiv preprint arXiv:1810.07481, 2018.