I Introduction
Artificial deep neural networks have achieved state-of-the-art performance in many domains such as pattern recognition and even natural language processing
[1]. However, deep neural networks suffer from adversarial attacks [2, 3], i.e., they can make an incorrect classification with high confidence when the input image is slightly modified yet maintains its class label. In contrast, for humans and other animals, the decision-making systems in the brain are quite robust to imperceptible pixel perturbations in the sensory inputs [4]. This immediately raises a fundamental question: what is the origin of the adversarial vulnerability of artificial neural networks? To address this question, we can first gain some insights from recent experimental observations of biological neural networks.

A recent investigation of recorded population activity in the visual cortex of awake mice revealed a power-law behavior in the principal-component spectrum of the population responses [5], i.e., the $n$-th biggest principal component (PC) variance scales as $n^{-\alpha}$, where $\alpha$ is the exponent of the power law. In this analysis, the exponent is always slightly greater than one for all natural-image stimuli, reflecting an intrinsic property of smooth coding in biological neural networks. It can be proved that when the exponent is smaller than $1+2/d$, where $d$ is the dimension of the stimulus manifold, the neural coding manifold must be fractal [5], and thus slightly modified inputs may cause extensive changes in outputs. In other words, an encoding with a slow decay of population variances would capture fine details of sensory inputs, rather than an abstract concept summarizing the inputs. In the fast-decay case, the population coding occurs on a smooth and differentiable manifold, and the dominant variances in the eigenspectrum capture key features of the object identity; thus, the coding is robust, even under adversarial attacks. Inspired by this recent study, we ask whether the power-law behavior exists in the eigenspectrum of the correlated hidden neural activity in deep neural networks. Our goal is to clarify the possible fundamental relationship between classification accuracy, the decay rate of activity variances, manifold dimensionality, and adversarial attacks of different nature.

Considering the trade-off between biological reality and theoretical tractability, we consider a special type of deep neural network, trained with a local cost function at each layer [6].
Moreover, this kind of training offers us the opportunity to examine the aforementioned fundamental relationship at each layer. The input signal is transferred by trainable feedforward weights, while the error is propagated back to adjust the feedforward weights via quenched random weights connecting each layer to its classifier. The learning is therefore guided by the target at each layer, and layered representations are created by this hierarchical learning. These layered representations provide the neural activity space for the study of the above fundamental relationship.
II Model
The deep neural network under investigation is shown in Fig. 1. We train a network of $L$ layers to classify the MNIST handwritten digits [7]. The first layer serves as an input layer, receiving a 784-dimensional vector from the MNIST dataset. These sensory inputs are then transferred to the next layer in a feedforward manner. Each layer $\ell$ ($\ell\geq 2$) has $N_{\ell}$ neurons connected to a random classifier of a fixed size of ten units, corresponding to the number of digit classes. A random classifier means that the connection weights to the classifier are pre-sampled from a zero-mean Gaussian distribution. The depth $L$ and each layer's width $N_{\ell}$ are flexible according to our settings. The strength of the connection, or the weight, between neuron $j$ at layer $\ell-1$ and neuron $i$ at the next layer $\ell$ is denoted by $w_{ij}^{\ell}$, which is trainable. Similarly, at layer $\ell$, the weight between neuron $i$ at this layer and unit $k$ of the neighboring random classifier is specified by $v_{ki}^{\ell}$, which is quenched.

For the forward propagation at layer $\ell$, we have an input $\mathbf{x}^{\ell}$, and the pre-activation is given by $z_i^{\ell}=\sum_j w_{ij}^{\ell}x_j^{\ell}$, simply summing the input components according to the corresponding weights. To ensure that the pre-activation is of the order one, a scaling factor is often applied to the pre-activation; however, this scaling is not necessary in our task. A nonlinear transfer function is then applied to the pre-activation to obtain the activation $h_i^{\ell}=\phi(z_i^{\ell})$. Here, we use the rectified linear unit (ReLU) as the nonlinear function, i.e., $\phi(z)=\max(0,z)$. The input of the next layer is the activation of layer $\ell$, i.e., $\mathbf{x}^{\ell+1}=\mathbf{h}^{\ell}$, except for the first layer, where the input is the 784-dimensional vector characterizing one handwritten digit. Meanwhile, the activation of the current layer is also fed to the random classifier, resulting in a classification score vector whose components are $s_k^{\ell}=\sum_i v_{ki}^{\ell}h_i^{\ell}$. The score vector can be transformed into a classification probability by applying the softmax function, i.e., $p_k^{\ell}=e^{s_k^{\ell}}/\sum_{k'}e^{s_{k'}^{\ell}}$, where $p_k^{\ell}$ can be understood as the probability that the input belongs to class $k$. To specify the cost function, we first define $\mathbf{y}$ as the one-hot representation of the label of an image; more precisely, $y_k=\delta_{k,c}$, where $c$ is the digit class of the input image. Finally, the local error function at layer $\ell$ is given by $\mathcal{L}^{\ell}=-\sum_k y_k\ln p_k^{\ell}$, which is nothing but the cross entropy between $\mathbf{y}$ and $\mathbf{p}^{\ell}$. The forward propagation process can be summarized as follows:

$$z_i^{\ell}=\sum_j w_{ij}^{\ell}x_j^{\ell},\qquad h_i^{\ell}=\phi(z_i^{\ell}),\qquad s_k^{\ell}=\sum_i v_{ki}^{\ell}h_i^{\ell},\qquad p_k^{\ell}=\frac{e^{s_k^{\ell}}}{\sum_{k'}e^{s_{k'}^{\ell}}}. \qquad (1)$$
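To make the notation concrete, the following is a minimal NumPy sketch of the forward pass of a single layer with its local random classifier. It is an illustration under the definitions above, not the authors' implementation, and all function names are ours.

```python
import numpy as np

def relu(z):
    """ReLU transfer function phi(z) = max(0, z)."""
    return np.maximum(z, 0.0)

def softmax(s):
    """Softmax, shifted for numerical stability."""
    e = np.exp(s - s.max())
    return e / e.sum()

def layer_forward(x, W, V):
    """Forward pass of Eq. (1): pre-activation z, activation h,
    classifier score s, and classification probability p."""
    z = W @ x          # z_i = sum_j w_ij x_j (trainable weights W)
    h = relu(z)        # h_i = phi(z_i); also the input of the next layer
    s = V @ h          # score from the quenched random classifier V (10 classes)
    p = softmax(s)     # p_k = exp(s_k) / sum_k' exp(s_k')
    return z, h, s, p

def local_loss(p, y):
    """Local error: cross entropy between one-hot label y and prediction p."""
    return -np.sum(y * np.log(p + 1e-12))
```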
The local cost function is minimized when $p_k^{\ell}=y_k$ for every $k$. The minimization is achieved by the gradient descent method. The gradient of the local error with respect to a feedforward weight can be calculated by applying the chain rule:

$$\frac{\partial\mathcal{L}^{\ell}}{\partial w_{ij}^{\ell}}=\sum_k\left(p_k^{\ell}-y_k\right)v_{ki}^{\ell}\,\phi'(z_i^{\ell})\,x_j^{\ell}. \qquad (2)$$

Then, all trainable weights are updated following $w_{ij}^{\ell}(t+1)=w_{ij}^{\ell}(t)-\eta\,\partial\mathcal{L}^{\ell}/\partial w_{ij}^{\ell}$, where $t$ and $\eta$ indicate the learning step and the learning rate, respectively. In our settings, each classifier weight $v_{ki}^{\ell}$ is a prefixed random Gaussian variable.
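As an illustration only, reusing the hypothetical helpers from the sketch above, the local gradient of Eq. (2) and the plain gradient-descent update could look as follows:

```python
import numpy as np

def local_gradient(x, z, p, y, V):
    """Eq. (2): dL/dW_ij = sum_k (p_k - y_k) V_ki * phi'(z_i) * x_j for ReLU phi."""
    ds = p - y              # gradient of the cross entropy w.r.t. the classifier score
    dh = V.T @ ds           # backpropagated through the quenched classifier weights
    dz = dh * (z > 0)       # ReLU derivative phi'(z)
    return np.outer(dz, x)  # gradient with the same shape as W

def gradient_step(W, grad_W, lr=0.01):
    """Update only the feedforward weights W; the classifier V stays quenched."""
    return W - lr * grad_W
```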
In practice, we use the mini-batch-based stochastic gradient descent method. The entire training dataset is divided into hundreds of mini-batches; after sweeping through all mini-batches, an epoch of learning is completed. The learning is then tested on an unseen (test) dataset. For the standard MNIST dataset, the training set has 60,000 images and the test set has 10,000 images. We judge whether a test image is correctly classified at layer $\ell$ by comparing the position of the maximum component of the classifier score vector $\mathbf{s}^{\ell}$ with that of the one-hot label $\mathbf{y}$, i.e., the image is correctly classified at layer $\ell$ if $\arg\max_k s_k^{\ell}=\arg\max_k y_k$. The test accuracy is then estimated as the number of correctly classified images divided by the total number of images in the test set.
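A compact sketch of this layer-wise accuracy criterion, again reusing the illustrative forward helper defined above:

```python
import numpy as np

def test_accuracy(test_images, test_labels, W, V):
    """Fraction of test images whose classifier argmax matches the one-hot label."""
    correct = 0
    for x, y in zip(test_images, test_labels):
        _, _, s, _ = layer_forward(x, W, V)     # forward pass of this layer
        correct += int(np.argmax(s) == np.argmax(y))
    return correct / len(test_images)
```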
After learning, the input ensemble can be transferred throughout the network in a layer-wise manner. Then, at each layer, the activity statistics can be analyzed via the eigenspectrum of the correlation matrix (or covariance matrix). We use principal component analysis (PCA) to obtain the eigenspectrum, which gives variances along orthogonal directions in descending order. For each input image, the population output of the $N_{\ell}$ neurons at layer $\ell$ can be thought of as a point in the $N_{\ell}$-dimensional activation space. It then follows that, for an ensemble of input images, the outputs can be seen as a cloud of points. PCA first finds the direction with maximal variance of the cloud, then chooses the second direction orthogonal to the first one, and so on. Finally, PCA identifies $N_{\ell}$ orthogonal directions and the corresponding variances. In our current setting, the eigenvalues of the covariance matrix of the neural manifold are these variances. Arranging the eigenvalues in descending order leads to the eigenspectrum, whose behavior will be analyzed in the next section.
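For concreteness, a minimal sketch of how such an eigenspectrum could be computed from the layer activations (illustrative only, assuming the activations of $P$ test images are stacked into a $P\times N_{\ell}$ array):

```python
import numpy as np

def eigenspectrum(activations):
    """Eigenvalues of the covariance matrix of a (P, N) activation matrix,
    returned in descending order; these are the PC variances."""
    X = activations - activations.mean(axis=0)   # center the cloud of points
    cov = X.T @ X / (X.shape[0] - 1)             # N x N covariance matrix
    eigvals = np.linalg.eigvalsh(cov)            # ascending order for symmetric matrices
    return eigvals[::-1]                         # descending order
```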
To consider the effects of adversarial examples, we add perturbations to the original input $\mathbf{x}^{\ell}$ at each layer. The original input is obtained by the layer-wise propagation of an image from the test set. We consider two kinds of additive perturbations: one is Gaussian white noise and the other is the fast gradient sign method (FGSM) noise [8, 9], representing black-box attacks and white-box attacks, respectively. Each component of the white noise is an i.i.d. random number drawn from a zero-mean Gaussian distribution with different variances (the attack/noise strength), and the FGSM noise is taken from the gradient of the local cost function at each layer with respect to the immediate input of this layer. These perturbations are given as follows:

$$\tilde{\mathbf{x}}^{\ell}=\mathbf{x}^{\ell}+\epsilon\,\boldsymbol{\eta}, \qquad (3a)$$

$$\tilde{\mathbf{x}}^{\ell}=\mathbf{x}^{\ell}+\epsilon\,\mathrm{sign}\!\left(\nabla_{\mathbf{x}^{\ell}}\mathcal{L}^{\ell}\right), \qquad (3b)$$

where $\eta_i\sim\mathcal{N}(0,1)$, $\epsilon$ denotes the perturbation magnitude, and $\mathrm{sign}(\cdot)$ acts element-wise. In fact, the FGSM attack can be thought of as an $\ell_{\infty}$-norm ball attack around the original input image.
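Both perturbations are easy to sketch in code. The following is an illustration under the notation above, reusing the earlier hypothetical helpers; the FGSM gradient with respect to the layer input follows the same chain rule as Eq. (2).

```python
import numpy as np

def gaussian_attack(x, eps, rng=None):
    """Black-box attack, Eq. (3a): add i.i.d. zero-mean Gaussian noise of strength eps."""
    rng = rng or np.random.default_rng()
    return x + eps * rng.standard_normal(x.shape)

def fgsm_attack(x, y, W, V, eps):
    """White-box FGSM attack, Eq. (3b): shift each input component by eps in the
    direction of the sign of the gradient of the local loss w.r.t. the input."""
    z, h, s, p = layer_forward(x, W, V)
    ds = p - y                 # gradient w.r.t. the classifier score
    dz = (V.T @ ds) * (z > 0)  # back through the classifier and the ReLU
    grad_x = W.T @ dz          # gradient of the local loss w.r.t. the layer input
    return x + eps * np.sign(grad_x)
```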
III Results and Discussion
In this section, we apply our model to clarify the possible fundamental relationship between classification accuracy, the decay rate of activity variances, manifold dimensionality and adversarial attacks of different nature.
III.1 Test error decreases with depth
We first show that deep supervised learning in our current setting works. Fig. 2 shows that the training error decreases while the test accuracy increases (before early stopping) during training. As the layer goes deeper, the test accuracy grows until saturation, despite a slight deterioration. This behavior makes the model an ideal candidate for investigating the emergent properties of the layered intermediate representations after learning, with and without adversarial attacks. Next, we will study in detail how the test accuracy is related to the power-law exponent, how the test accuracy is related to the attack strength, and how the dimensionality of the layered representation changes with the exponent, under zero, weak, and strong adversarial attacks.

III.2 Power-law decay of dominant eigenvalues of the activity correlation matrix
A typical eigenspectrum of our deep learning model is given in Fig. 3. Since the eigenspectrum is displayed on a log-log scale, the magnitude of the slope of a linear fit to the spectrum gives the power-law exponent $\alpha$. We use only the first ten PC components to estimate $\alpha$, rather than all of them, for the following two reasons: (i) a waterfall phenomenon (a sudden drop) appears after the leading dimensions, which is more evident at higher layers; (ii) the first ten dimensions explain a dominant fraction of the total variance, and thus they capture the key information about the geometry of the representation manifold. The waterfall phenomenon in the eigenspectrum can occur multiple times, especially for deeper layers [Fig. 3 (a)], which is distinct from what is observed in biological neural networks [see the inset of Fig. 3 (a)]. This implies that artificial deep networks may capture fine details of stimuli in a hierarchical manner. A typical example of obtaining the power-law exponent is shown in Fig. 3 (b) for the fifth layer. When the number of stimuli is chosen to be large enough (as used throughout the paper), the fluctuation of the estimated exponent due to stimulus selection can be neglected.
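A minimal sketch of this estimate (our own illustration): fit a straight line to the first ten PC variances in log-log scale and read off $\alpha$ as the negative slope.

```python
import numpy as np

def powerlaw_exponent(eigvals, n_fit=10):
    """Estimate alpha from lambda_n ~ n^(-alpha) using the first n_fit PC variances."""
    n = np.arange(1, n_fit + 1)
    slope, _ = np.polyfit(np.log(n), np.log(eigvals[:n_fit]), 1)  # linear fit in log-log
    return -slope
```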
III.3 Effects of layer width on test accuracy and power-law exponent
We then explore the effects of the layer width on both the test accuracy and the power-law exponent. As shown in Fig. 4 (a), the test accuracy becomes more stable with increasing layer width; the smallest width considered shows large fluctuations of the test accuracy, especially at deeper layers. We conclude that a few hundred neurons at each layer are sufficient for accurate learning.
The power-law exponent shows a similar behavior: the estimated exponent fluctuates less as the layer width increases. This result also shows that the exponent grows with depth; the deeper the layer, the larger the exponent. A larger exponent suggests that the manifold is smoother, because the dominant variances decay fast, leaving little room for encoding irrelevant features of the stimulus ensemble. This may highlight that depth in hierarchical learning is important for capturing key characteristics of sensory inputs.
III.4 Relationship between test accuracy and power-law exponent
In the following simulations, we fix the number of neurons at each layer. The relationship between the test accuracy and the exponent $\alpha$ is plotted in Fig. 5, with error bars indicating the standard errors across independent training runs. The typical value of $\alpha$ grows with depth, implying that the coding manifold becomes smoother at higher layers. The clear separation of the $\alpha$ values between the earlier layers (e.g., the exponent of the first layer is less than one) and the deeper layers may indicate a transition from non-smooth coding (a slow decay of the eigenspectrum) to smooth coding. Interestingly, the test accuracy does not keep increasing with depth: it increases first and then drops by a relatively small margin at deeper layers. This suggests that too much smoothness can harm the generalization ability of the network, although the impact is small and thus limited.

III.5 Properties of the model under black-box attacks
We first consider the additive Gaussian white-noise perturbation to the input representation at each layer. This kind of perturbation is also called a black-box attack, because it does not require access to the training details of the deep learning model, including the architecture and the loss function. Under this attack, the test accuracy decreases as the perturbation magnitude increases. As expected from the above results on manifold smoothness at deeper layers, deeper layers become more robust against black-box attacks of increasing perturbation magnitude [Fig. 6 (a, b)]. The behavior of the power-law exponent shows that the adversarial robustness correlates with the manifold smoothness at deeper layers, which highlights the fundamental relationship between adversarial vulnerability and the fractal manifold [5].

Inspecting carefully the behavior of $\alpha$ as a function of the test accuracy, we can identify two interesting regimes separated by a turning point [Fig. 6 (c)]. Below the turning point, increasing the attack strength results in a clear decrease in both the test accuracy and $\alpha$; moreover, for the first three layers, a linear relationship can be identified, with a high linear-fitting coefficient. This observation is very interesting. The underlying mechanism is closely related to the monotonic behavior of the accuracy and the exponent as functions of the attack strength. Let us define these functions as $A=f(\epsilon)$ and $\alpha=g(\epsilon)$, respectively. A simple transformation then leads to $\alpha=g\!\left(f^{-1}(A)\right)$, which suggests that a particular choice of $f$ and $g$ allows for the observed linear relationship, e.g., when $g\circ f^{-1}$ is a linear function. Therefore, how the earlier layers are affected by adversarial examples of increasing strength plays a vital role in shaping the interesting linear regime. Above the turning point, the linear fitting fails and a nonlinear relationship takes over instead. One key reason is that deeper layers are more robust against adversarial examples in our current setting.
III.6 Properties of the model under white-box attacks
We then consider the white-box attack, namely the FGSM additive noise perturbation; the results are summarized in Fig. 7. The FGSM attack is much stronger than the black-box attack, as expected from the fact that the FGSM attack uses knowledge of both the loss function and the network architecture. The first few layers display evident adversarial vulnerability, or even completely fail to identify the correct class of the input images, while the deeper layers still show adversarial robustness to some extent. These deeper layers also maintain a relatively high value of $\alpha$, although strong adversarial examples markedly reduce the manifold smoothness, especially for layers adjacent to the earlier layers. Under the FGSM attack, the accuracy-$\alpha$ relationship becomes complicated, and the linear-nonlinear separation observed in the black-box attacks disappears. However, a high accuracy still implies a high value of $\alpha$. In particular, when the white-box perturbation is weak, the system reduces the manifold smoothness by a small margin to obtain a high accuracy [Fig. 7 (c)]. This can be interpreted as follows. Given the trained weights (without adversarial training), the manifold formed by the adversarial examples takes into account more details of those examples (or some special directions pointing towards the decision boundary). Under the FGSM attack, specific pixels with $\ell_{\infty}$-norm perturbations that strongly affect the loss function are precisely the ones used to flip the decision output. In this sense, a competition (or trade-off) between the accuracy and the manifold smoothness captured by $\alpha$ is present. This may explain why there exists a peak in Fig. 7 (c). A peak also appears in Fig. 7 (b), and the two peaks have a one-to-one correspondence. Note that the black-box attacks do not show such properties.
III.7 Relationship between manifold linear dimensionality and power-law exponent
The linear dimensionality of a manifold formed by data/representations can be thought of as a first approximation of the intrinsic geometry of the manifold [10, 11], defined as follows:

$$D=\frac{\left(\sum_{n=1}^{N}\lambda_n\right)^{2}}{\sum_{n=1}^{N}\lambda_n^{2}}, \qquad (4)$$

where $\{\lambda_n\}$ is the eigenspectrum of the covariance matrix. Supposing the eigenspectrum has a power-law decay, $\lambda_n\propto n^{-\alpha}$, as the PC dimension $n$ increases, we simplify the dimensionality as follows:

$$D=\frac{\left(\sum_{n=1}^{N}\lambda_n\right)^{2}}{\sum_{n=1}^{N}\lambda_n^{2}}\simeq\frac{\left(\sum_{n=1}^{N}n^{-\alpha}\right)^{2}}{\sum_{n=1}^{N}n^{-2\alpha}}, \qquad (5)$$

where $N$ denotes the layer width, and the power-law approximation is used to obtain the second equality. In the thermodynamic limit $N\to\infty$, $D\to\zeta(\alpha)^{2}/\zeta(2\alpha)$, where $\zeta(\cdot)$ is the Riemann zeta function. Note that for a small value of $\alpha$, the theoretical prediction of $D$ in Fig. 8 (b, c) is obtained by using the finite sum rather than the zeta-function limit.
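A small sketch of these two estimates (illustrative only; `scipy.special.zeta` provides the Riemann zeta function used for the large-$N$ limit):

```python
import numpy as np
from scipy.special import zeta

def participation_ratio(eigvals):
    """Eq. (4): D = (sum_n lambda_n)^2 / sum_n lambda_n^2."""
    return eigvals.sum() ** 2 / np.sum(eigvals ** 2)

def dimensionality_powerlaw(alpha, N):
    """Eq. (5): finite-width prediction assuming lambda_n ~ n^(-alpha)."""
    lam = np.arange(1, N + 1, dtype=float) ** (-alpha)
    return participation_ratio(lam)

def dimensionality_limit(alpha):
    """Thermodynamic limit D -> zeta(alpha)^2 / zeta(2*alpha), valid for alpha > 1."""
    return zeta(alpha) ** 2 / zeta(2 * alpha)
```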
Results are shown in Fig. 8. The theoretical prediction agrees roughly with simulations under zero, weak, and strong attacks of both black-box and white-box types. This shows that it is reasonable to use the power-law decay of the eigenspectrum over the first few dominant dimensions to study the relationship between the manifold geometry and the adversarial vulnerability of artificial neural networks, as also confirmed by the aforementioned nontrivial properties of this fundamental relationship. Note that when the network width increases, a deviation may be observed due to the waterfall phenomenon in the eigenspectrum (see Fig. 3).
IV Conclusion
In this work, we study the fundamental relationship between adversarial robustness and manifold smoothness, characterized by the power-law exponent $\alpha$ of the eigenspectrum. The eigenspectrum is obtained from the correlated neural activity on the representation manifold. We choose deep supervised learning with local errors as our target deep learning model, because of its nice property of allowing each layered representation to be analyzed in terms of both test accuracy and manifold smoothness. We then reveal that deeper layers have larger values of $\alpha$, thereby possessing adversarial robustness against both black-box and white-box attacks. In particular, we find a turning point in the accuracy-$\alpha$ curve under black-box attacks, separating linear and nonlinear relationships; this turning point also signals a qualitative change in the geometric property of the manifold. Under white-box attacks, the exponent-accuracy behavior becomes more complicated, as the isotropic properties of attacks such as Gaussian white-noise perturbations no longer hold; a trade-off between manifold smoothness and test accuracy must then be taken into account, given the normally trained weights (no adversarial training).
All in all, although our study does not provide precise mechanisms underlying adversarial vulnerability, this empirical work is expected to offer some intuitive arguments about the fundamental relationship between generalization capability and the intrinsic properties of representation manifolds inside deep neural networks, encouraging future mechanistic studies towards the final goal of aligning machine perception with human perception [4].
Acknowledgements.
This research was supported by the start-up budget 74130-18831109 of the 100-talent program of Sun Yat-sen University, and the NSFC (Grant No. 11805284).

References
 [1] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, Cambridge, MA, 2016.
 [2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57, 2017.

[3] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841, 2019.
[4] Zhenglong Zhou and Chaz Firestone. Humans can decipher adversarial images. Nature Communications, 10(1):1334, 2019.
[5] Carsen Stringer, Marius Pachitariu, Nicholas Steinmetz, Matteo Carandini, and Kenneth D. Harris. High-dimensional geometry of population responses in visual cortex. Nature, 571(7765):361–365, 2019.
 [6] Hesham Mostafa, Vishwajith Ramesh, and Gert Cauwenberghs. Deep supervised learning using local errors. Frontiers in Neuroscience, 12:608, 2018.
[7] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278–2324, 1998.
[8] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
[9] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
 [10] Haiping Huang. Mechanisms of dimensionality reduction and decorrelation in deep neural networks. Phys. Rev. E, 98:062313, 2018.
 [11] Jianwen Zhou and Haiping Huang. Weaklycorrelated synapses promote dimension reduction in deep neural networks. arXiv:2006.11569, 2020.