Vulnerability of Deep Learning

03/16/2018 ∙ by Richard Kenway, et al.

The Renormalisation Group (RG) provides a framework in which it is possible to assess whether a deep-learning network is sensitive to small changes in the input data and hence prone to error, or susceptible to adversarial attack. Distinct classification outputs are associated with different RG fixed points and sensitivity to small changes in the input data is due to the presence of relevant operators at a fixed point. A numerical scheme, based on Monte Carlo RG ideas, is proposed for identifying the existence of relevant operators and the corresponding directions of greatest sensitivity in the input data. Thus, a trained deep-learning network may be tested for its robustness and, if it is vulnerable to attack, dangerous perturbations of the input data identified.




I Introduction

I.1 The Threat to Deep Learning

Despite many successful applications, deep learning HintonOsinderoTeh2006 ; LeCunBengioHinton2015 ; Salakhutdinov2015 is poorly understood. The absence of an underlying theory means that we cannot be certain under what circumstances a given trained network will operate correctly and this uncertainty calls into question the use of such networks in safety-critical applications. There is also the possibility that slightly perturbed input data can be constructed that would fool the network, but not a human, making the network susceptible to adversarial attack, or simply cause it to misclassify and thereby conceal information in the data YuanHeZhuBhatLi2018 .

It is important to distinguish between ambiguous data, which any system, machine or human, would find difficult to classify and data that has been imperceptibly altered to cause the deep network to misclassify it. This paper introduces a theoretical framework, based on the Renormalisation Group (RG), which specifies the conditions under which the latter may occur and proposes a calculational method for testing a trained network for this vulnerability. Further, it determines the directions in the space of input data along which small perturbations are amplified by such a vulnerable deep network, causing it to generate a qualitatively incorrect output.

The theoretical framework captures the salient features of deep learning: the multi-layer structure, trained one layer at a time. It is demonstrated for the specific case in which each layer is a Restricted Boltzmann Machine (RBM) Salakhutdinov2015 and the network is trained to classify a data set $\{x\}$ in terms of outputs $\{y\}$, whose exact conditional probability distribution is $p(y|x)$. However, the approach may be extended straightforwardly to other deep networks and to data generation as well as classification.

I.2 Analogy with the Renormalisation Group

The formal analogy between deep learning and the RG has been noted before, although with some controversy about whether the context should be unsupervised, or supervised learning MehtaSchwab2014 ; LinTegmarkRolnick2017 ; SchwabMehta2016 . There are more fundamental issues.

The first is that the RG applied to critical phenomena describes how features of the microscopic degrees of freedom of a system (analogous to the input data $x$) depend on length scale, successively eliminating short-distance fluctuations and generating a sequence of effective Hamiltonians, which describe the surviving features on larger and larger length scales. A critical point is associated with scale-invariant fluctuations and convergence of the sequence of effective Hamiltonians to a fixed-point Hamiltonian which describes them. Although many applications of deep learning do involve data which exhibit structure on different length scales, e.g., data drawn from the physical world, and this appears to be captured by deep learning so that it is “cheap” LinTegmarkRolnick2017 , this need not be the case.

The second is that the RG transformation effected by each layer of the network may be different, reflecting the emergence of different types of features in the data, whereas in critical phenomena each application of the RG transformation changes the length scale by a fixed amount.

We will assume that training of the deep network converges to a conditional probability distribution that is a good approximation to $p(y|x)$, provided there is a sufficient number of layers. Thus, the sequence of RG transformations corresponding to applying the RBM in each layer to the output from the hidden nodes in the previous layer, starting with the input data, converges.

Presumably, some more generalised form of scaling takes place in deep learning and we will assume that this has the effect of successively scaling to smaller values all but a finite number of eigenvalues of the Fisher Information Matrix (FIM) of the trained deep network, so that its spectrum is sloppy MachtaChachraTranstrumSethna2013 . We will say more about this in Section IV.

In Section II we introduce the RG formalism in the context of a deep RBM network used for classification and, in particular, define the sequence of effective Hamiltonians whose fixed points and universality classes define distinct outputs. In Section III we develop the computational scheme, based on Monte Carlo RG Ma1976 ; Swendsen1979, for estimating the stability of these fixed points with respect to the depth of the network. Section IV relates the instability of a particular fixed point to direction(s) in the input data space, along which small changes in the data cause major changes in the output probability distribution and, hence, are sources of vulnerability.

II Renormalisation Group for Deep Learning

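Consider a layered network of RBMs with input nodes $x$, $N-1$ layers of hidden nodes $h^{(n)}$, $n = 1, \ldots, N-1$, and output nodes $y$. For example, the hidden nodes may be binary vectors of the same dimension as the input data $x$ and the outputs $y$, and the joint probability distribution for the $n$th layer is of the form Salakhutdinov2015

$$ t_n(h^{(n)}, h^{(n-1)}) = \frac{1}{Z_n} \exp\Big( \sum_{i,j} h^{(n)}_i W^{(n)}_{ij} h^{(n-1)}_j + \sum_j a^{(n)}_j h^{(n-1)}_j + \sum_i b^{(n)}_i h^{(n)}_i \Big), \tag{1} $$

where $h^{(0)} \equiv x$, $h^{(N)} \equiv y$, and $W^{(n)}$, $a^{(n)}$ and $b^{(n)}$ are weights determined by the layerwise training. We will not require the precise form of $t_n$.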

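The trained deep RBM network generates the probability distribution for the output $y$ given input data $x$, $q_N(y|x)$, iteratively for $n = 1, \ldots, N$ through

$$ q_n(h|x) = \sum_{h'} t_n(h|h')\, q_{n-1}(h'|x), \qquad q_0(h|x) = \delta_{h,x}, \tag{2} $$

where $h$ and $h'$ denote the nodes in layers $n$ and $n-1$ and, according to Bayes' theorem,

$$ t_n(h|h') = \frac{t_n(h, h')}{\sum_{h''} t_n(h'', h')}. \tag{3} $$

We define the effective Hamiltonian for the $n$th layer by

$$ q_n(h|x) = e^{-H_n(h|x)}. \tag{4} $$

Then, the sequence of Hamiltonians is generated by the RBMs in each layer via

$$ e^{-H_n(h|x)} = \sum_{h'} t_n(h|h')\, e^{-H_{n-1}(h'|x)}, \tag{5} $$

such that each Hamiltonian depends only on the previous one in the sequence. This has the same form as an RG transformation with kernel $t_n(h|h')$ MehtaSchwab2014 .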
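As an illustration of this layer-by-layer map, the following sketch enumerates every state of a small stack of binary RBMs and builds the effective Hamiltonians exactly. It assumes the normalised convention $q_n = e^{-H_n}$ and a point-mass starting distribution on the input. The layer width `d`, the random weights and the helper names (`states`, `rbm_conditional`, `effective_hamiltonians`) are hypothetical choices made for the example, not part of the original analysis.

```python
import itertools
import numpy as np

def states(d):
    """All binary vectors of length d, as rows of a (2**d, d) array."""
    return np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)

def rbm_conditional(W, b, d):
    """Matrix t[h, h'] = t(h | h') for a binary RBM with coupling W and bias b on h.

    The bias on the conditioning layer cancels in the conditional, so it is omitted.
    """
    S = states(d)
    logits = S @ W @ S.T + (S @ b)[:, None]      # h^T W h' + b.h for every pair
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    t = np.exp(logits)
    return t / t.sum(axis=0, keepdims=True)      # normalise over h for each h'

def effective_hamiltonians(x_index, layers, d):
    """Return [H_1, ..., H_N] with H_n[h] = -ln q_n(h | x)."""
    q = np.zeros(2 ** d)
    q[x_index] = 1.0                              # point mass on the input state
    hams = []
    for W, b in layers:
        q = rbm_conditional(W, b, d) @ q          # sum over the previous layer
        hams.append(-np.log(q))                   # q_n = exp(-H_n)
    return hams

rng = np.random.default_rng(0)
d = 3
layers = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(4)]
hams = effective_hamiltonians(x_index=5, layers=layers, d=d)
```

Exact enumeration costs $2^d$ per layer, so this is only feasible for toy widths; for realistic networks the same quantities would have to be estimated by sampling.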

At this point, it is worth noting that the input data determines the couplings in the space of effective Hamiltonians via the first layer,

$$ e^{-H_1(h|x)} = t_1(h|x), \tag{6} $$

and the couplings in each subsequent layer are functions solely of the couplings in the previous layer.

We parametrise the space of effective Hamiltonians in terms of a complete set of operators $O_\alpha(h)$, which, in the case of binary variables, consist of all possible products of the components of $h$, with couplings $g_\alpha$. Thus, the effective Hamiltonian for layer $n$ is

$$ H_n(h|x) = \sum_\alpha g^{(n)}_\alpha(x)\, O_\alpha(h). \tag{7} $$

Then the effect of the RG transformation in eq. (5) is to define $g^{(n)}$ solely in terms of $g^{(n-1)}$ and, thereby, to generate a flow in the coupling-constant space of the effective Hamiltonians.
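To make the operator expansion concrete, here is a minimal sketch for binary variables taking values in {0, 1}. It tabulates all products of components of $h$ (one per subset of components, with the empty product equal to 1) and solves the resulting linear system for the couplings of a given Hamiltonian. The function names and the numerical values of `H` are made up for illustration.

```python
import itertools
import numpy as np

def operator_table(d):
    """Return states S, subsets alpha, and A[s, alpha] = O_alpha(h_s)."""
    S = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
    subsets = [c for r in range(d + 1) for c in itertools.combinations(range(d), r)]
    # O_alpha(h) is the product of the components of h indexed by alpha
    A = np.array([[np.prod([h[i] for i in c]) for c in subsets] for h in S])
    return S, subsets, A

def couplings(H, A):
    """Solve H(h) = sum_alpha g_alpha O_alpha(h) for the couplings g."""
    return np.linalg.solve(A, H)

S, subsets, A = operator_table(2)
H = np.array([0.3, 1.2, -0.5, 2.0])   # arbitrary Hamiltonian on states 00, 01, 10, 11
g = couplings(H, A)                   # couplings for subsets (), (0,), (1,), (0, 1)
```

Because the table `A` is square and invertible, the couplings are unique, which is what "complete set" means in this finite setting.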

The key assumption we make is that the training of the deep RBM network converges, so that, for $N$ large enough, the sequence of effective Hamiltonians converges to a fixed point:

$$ \lim_{N \to \infty} H_N(y|x) = H^*(y|x), \tag{8} $$

such that

$$ p(y|x) = e^{-H^*(y|x)}. \tag{9} $$
The analogue of the RG universality class of the fixed point is the subset of the input data which are associated with the same output distribution. Thus, assuming the training correctly classifies the data,




so that there must be more than one fixed point. In any non-trivial classification problem, there is a distinct fixed point of the correctly trained deep network for every distinct output.

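Next, we consider the flow in coupling-constant space close to a fixed point and, in particular, the stability properties of the fixed point as the number of layers is increased. We assume that the deeper the network the more accurately it is able to learn the classification problem. Close to the fixed point $g^*$, the relationship between the couplings in adjacent layers is, to leading order,

$$ g^{(n)}_\alpha = g^{(n)}_\alpha(g^*) + \sum_\beta \left. \frac{\partial g^{(n)}_\alpha}{\partial g^{(n-1)}_\beta} \right|_{g^*} \big( g^{(n-1)}_\beta - g^*_\beta \big). \tag{12} $$

Since $g^{(n)}_\alpha(g^*) = g^*_\alpha$, this becomes

$$ g^{(n)}_\alpha - g^*_\alpha = \sum_\beta T_{\alpha\beta} \big( g^{(n-1)}_\beta - g^*_\beta \big), \tag{13} $$

where

$$ T_{\alpha\beta} = \left. \frac{\partial g^{(n)}_\alpha}{\partial g^{(n-1)}_\beta} \right|_{g^*} \tag{14} $$

is the stability matrix of the fixed point. The stability properties of a fixed point determine whether it becomes sensitive to small changes in the input data as the number of layers is increased.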

Define the eigenvalues and eigenvectors of the stability matrix by

$$ \sum_\beta T_{\alpha\beta}\, v^{(i)}_\beta = \lambda_i\, v^{(i)}_\alpha, \tag{15} $$

and express the couplings, $g_\alpha$, and the corresponding operators, $O_\alpha$, in the basis of eigenvectors of the stability matrix as

$$ g^{(n)}_\alpha = \sum_i u^{(n)}_i\, v^{(i)}_\alpha, \tag{16} $$

$$ \tilde{O}_i = \sum_\alpha v^{(i)}_\alpha\, O_\alpha. \tag{17} $$

Then the effective Hamiltonian in this basis is

$$ H_n(h|x) = \sum_i u^{(n)}_i(x)\, \tilde{O}_i(h), \tag{18} $$

and, close to the fixed point, from eqs (13) and (15),

$$ u^{(n)}_i - u^*_i = \lambda_i \big( u^{(n-1)}_i - u^*_i \big). \tag{19} $$

In the language of the RG, if $|\lambda_i| > 1$, $\tilde{O}_i$ is relevant and, if $|\lambda_i| < 1$, it is irrelevant. In the context of deep learning, the existence of a relevant operator at a fixed point creates a sensitivity to input data which generate a non-zero coupling to this operator. Such data will lead to a probability distribution for the output which increasingly deviates from the fixed-point distribution as the network is made deeper (more layers are added). This is the source of vulnerability of deep networks to tiny changes in the input data which excite a relevant operator, leading to the sequence of effective Hamiltonians converging to the wrong fixed point, i.e., misclassifying the data.
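The amplification of a relevant direction can be seen in a toy linearised flow. The sketch below takes a hypothetical 2×2 stability matrix with one eigenvalue above one and one below, applies it repeatedly to a small deviation from the fixed point, and tracks the components of the deviation along the eigenvectors. The matrix entries and the number of layers are invented for the example.

```python
import numpy as np

T = np.array([[1.3, 0.2],
              [0.0, 0.6]])            # hypothetical stability matrix
lam, V = np.linalg.eig(T)            # columns of V are the (right) eigenvectors

dg = np.array([1e-3, 1e-3])          # small initial deviation g - g*
u0 = np.linalg.solve(V, dg)          # components of the deviation in the eigenbasis
history = [u0]
for _ in range(20):                  # 20 further layers
    dg = T @ dg                      # linearised flow between adjacent layers
    history.append(np.linalg.solve(V, dg))
history = np.array(history)          # history[n, i] scales as lam[i] ** n
```

After 20 layers the component along the eigenvalue 1.3 has grown by a factor of about 190, while the component along 0.6 has shrunk by more than four orders of magnitude.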

As it stands, this provides an explanation of the potential vulnerability of deep networks, but doesn’t help to diagnose whether a particular network is susceptible. This is because, in practice, we only have access to the RBM weights, e.g., eq. (1), which enable us to sample the hidden nodes and output nodes for a given input, but don’t enable us to construct the sequence of effective Hamiltonians, or the fixed-point Hamiltonian. In the next section, we will explain how Monte Carlo RG methods may be used to estimate the stability matrix $T$, defined in eq. (14), and how its eigenvalues behave in deep networks close to the fixed point. This will be sufficient to determine whether a particular trained deep network has any fixed points with relevant operators.

It is worth noting that the existence of multiple fixed points (associated with a non-trivial classification problem) does not imply the existence of relevant operators at any of the fixed points. In the robust situation, where no fixed point has a relevant direction, data close to the boundary between two universality classes are simply ambiguous and do not represent a source of adversarial attack (because a human would be equally likely to misclassify them).

III Stability of Deep Learning

Having trained a deep network of RBMs on a classification problem whose exact solution is $p(y|x)$, we have available a sequence of RBMs given, for example, by eq. (1), which enables us to use the conditional probability distributions in eq. (2) to sample the hidden nodes in each layer and the output layer. Using this capability, we apply methods developed for Monte Carlo RG Ma1976 ; Swendsen1979 to estimate the stability matrix for each fixed point in the space of effective Hamiltonians.
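The sampling capability referred to here can be sketched as follows for binary RBMs, where the conditional distribution of each layer given the one below factorises over nodes. The sketch assumes the standard binary-RBM conditional $P(h_i = 1 \mid h') = \sigma(b_i + \sum_j W_{ij} h'_j)$. The weights, the input vector and the function name `sample_layers` are placeholders, not taken from a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_layers(x, layers, n_samples, rng):
    """Propagate n_samples copies of the binary input x up through the RBM stack.

    Returns a list whose n-th entry holds the samples of layer n + 1,
    as an (n_samples, width) array of 0s and 1s.
    """
    h = np.tile(np.asarray(x, dtype=float), (n_samples, 1))
    out = []
    for W, b in layers:
        p = sigmoid(h @ W.T + b)                     # P(h_i = 1 | layer below)
        h = (rng.random(p.shape) < p).astype(float)  # independent Bernoulli draws
        out.append(h)
    return out

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 3)), rng.normal(size=3)) for _ in range(3)]
x = [1, 0, 1]
samples = sample_layers(x, layers, 20000, rng)
```

Expectation values of operators in a given layer are then estimated as averages over the rows of the corresponding entry of `samples`.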

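Denote by $\langle O_\alpha \rangle_n$ the expectation value of the operator $O_\alpha$ with respect to the effective Hamiltonian in layer $n$, eq. (4), i.e.,

$$ \langle O_\alpha \rangle_n = \sum_h O_\alpha(h)\, e^{-H_n(h|x)}. \tag{20} $$

We start from the chain rule applied to $\langle O_\alpha \rangle_n$ close to a fixed point (i.e., for $n$ large),

$$ \frac{\partial \langle O_\alpha \rangle_n}{\partial g^{(n-1)}_\beta} = \sum_\gamma \frac{\partial g^{(n)}_\gamma}{\partial g^{(n-1)}_\beta}\, \frac{\partial \langle O_\alpha \rangle_n}{\partial g^{(n)}_\gamma} = \sum_\gamma T^{(n)}_{\gamma\beta}\, \frac{\partial \langle O_\alpha \rangle_n}{\partial g^{(n)}_\gamma}. \tag{21} $$

Here $T^{(n)}$ is the approximation to the stability matrix in eq. (14) obtained from the $n$th layer, which should converge to $T$ for large $n$. Differentiating eq. (20) and using eq. (7), we obtain for the term on the RHS of eq. (21)

$$ \frac{\partial \langle O_\alpha \rangle_n}{\partial g^{(n)}_\gamma} = -\langle O_\alpha O_\gamma \rangle_n. \tag{22} $$

To obtain a similar expression for the term on the LHS of eq. (21), we need to use $t_n$, the RBM for layer $n$, to relate $H_n$ to $H_{n-1}$, as in eq. (5). In Monte Carlo RG, this is where the RG blocking step enters. We obtain

$$ \frac{\partial \langle O_\alpha \rangle_n}{\partial g^{(n-1)}_\beta} = -\sum_{h, h'} O_\alpha(h)\, t_n(h|h')\, O_\beta(h')\, e^{-H_{n-1}(h'|x)} \equiv -\langle O_\alpha(h)\, O_\beta(h') \rangle_{n,n-1}. \tag{23} $$

Thus, with a suitable choice of operators, given a specific input $x$, by sampling the hidden nodes in the upper layers of the network (large $n$) and computing a set of expectation values of these operators, we can construct a set of linear equations for the elements of the stability matrix, $T^{(n)}_{\gamma\beta}$,

$$ \langle O_\alpha(h)\, O_\beta(h') \rangle_{n,n-1} = \sum_\gamma \langle O_\alpha O_\gamma \rangle_n\, T^{(n)}_{\gamma\beta}. \tag{24} $$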
Hence, we may obtain a set of successive approximations for the eigenvalues and eigenvectors in eq. (15).
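The linear-equation step can be checked on a toy layer where everything is computed by exact enumeration rather than sampling. The sketch below assumes the convention that layer weights are $e^{-H}$ with $H$ expanded in products of binary components, forms the same-layer and adjacent-layer operator correlators, solves for the layer Jacobian, and compares it with a finite-difference derivative of the coupling map. All weights and couplings are random placeholders, and the Jacobian is evaluated at an arbitrary point in coupling space rather than at a true fixed point.

```python
import itertools
import numpy as np

d = 2
S = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
subsets = [c for r in range(d + 1) for c in itertools.combinations(range(d), r)]
A = np.array([[np.prod([h[i] for i in c]) for c in subsets] for h in S])  # O_alpha(h)

rng = np.random.default_rng(1)
W, b = rng.normal(size=(d, d)), rng.normal(size=d)
logits = S @ W @ S.T + (S @ b)[:, None]
t = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)   # t(h | h')

def rg_step(g_prev):
    """Couplings of the next-layer Hamiltonian given those of the previous layer."""
    w_prev = np.exp(-A @ g_prev)                  # exp(-H_{n-1}(h'))
    return np.linalg.solve(A, -np.log(t @ w_prev))

g_prev = rng.normal(size=len(subsets))
w_prev = np.exp(-A @ g_prev)
w_next = t @ w_prev                               # exp(-H_n(h))

# Correlators that would be estimated by sampling adjacent layers.
# The overall normalisation is the same on both sides and cancels.
cross = A.T @ (t * w_prev[None, :]) @ A           # <O_alpha(h) O_beta(h')>
same = A.T @ (w_next[:, None] * A)                # <O_alpha(h) O_gamma(h)>
T_mcrg = np.linalg.solve(same, cross)             # solves same @ T = cross

# Reference: central finite differences of the coupling map
eps = 1e-6
T_fd = np.column_stack([
    (rg_step(g_prev + eps * e) - rg_step(g_prev - eps * e)) / (2 * eps)
    for e in np.eye(len(subsets))
])
```

With a complete operator set the two estimates agree to numerical precision; in practice only a truncated subset of operators is available and the correlators carry sampling noise, so agreement would be approximate.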

By varying the subset of operators used in eq. (24), we can test the convergence of the estimates of the largest eigenvalues of the stability matrix associated with a given layer of the network and, by computing the eigenvalues from successive layers, we can investigate whether increasing depth of the network is associated with scaling behaviour of the corresponding operators, as in eq. (19).

If this scaling behaviour is due to an eigenvalue of magnitude greater than one, then even a tiny component of the initial data, which creates a non-zero coupling to the corresponding eigenvector, will cause instability of the fixed point if the network is deep. In the next section, we will explain how to identify the specific perturbation of the data which does this.

IV Generation of Adversarial Data

We consider the exact conditional probability distribution, $p(y|x)$, and hence the trained network, $q_N(y|x)$, to be parametrised by the input data, $x$, via the fixed-point Hamiltonian in eqs (8) and (9). The Fisher Information Matrix (FIM) MachtaChachraTranstrumSethna2013 ; RajuMachtaSethna2017 is a metric which measures how a probability distribution changes along different directions in parameter space. If the fixed-point Hamiltonian has an unstable direction due to a relevant operator, then this will correspond to the direction in parameter space along which the probability distribution changes fastest, i.e., the stiffest direction. Hence, to generate adversarial data we need to compute the eigenvector associated with the largest eigenvalue of the Fisher information matrix for the fixed-point probability distribution. A tiny admixture of this component in the data will lead to erroneous classification if the network is too deep.

The FIM is defined in terms of the Kullback-Leibler divergence, $D_{KL}(p \| q)$, which is itself a measure of how distinguishable two probability distributions $p$ and $q$ are, using data sampled from $p$:

$$ D_{KL}(p \| q) = \sum_y p(y) \ln \frac{p(y)}{q(y)}. \tag{25} $$
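This enables us to measure the dependence of $p(y|x)$ on $x$ by considering two nearby data points, $x$ and $x + \delta x$, and expanding the Kullback-Leibler divergence as a Taylor series in their difference MachtaChachraTranstrumSethna2013 ,

$$ D_{KL}\big( p(y|x) \,\|\, p(y|x + \delta x) \big) = \frac{1}{2} \sum_{i,j} F_{ij}\, \delta x_i\, \delta x_j + O(\delta x^3), \tag{26} $$

where, using eqs (4), (7) and (20),

$$ F_{ij} = \sum_{\alpha, \beta} \frac{\partial g^*_\alpha}{\partial x_i}\, \frac{\partial g^*_\beta}{\partial x_j}\, \langle O_\alpha O_\beta \rangle_* \tag{27} $$

is the FIM.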
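The quadratic relation between the Kullback-Leibler divergence and the FIM can be verified numerically on a simple stand-in model. The sketch below uses a softmax classifier $p(y|x) = \mathrm{softmax}(Wx)$ with continuous inputs, which is an assumption made purely for illustration and is not the RBM network discussed here. For this model the FIM with respect to the input is $W^T (\mathrm{diag}(p) - p p^T) W$.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def fisher(W, x):
    """FIM of p(y|x) = softmax(W x) with respect to the input x."""
    p = softmax(W @ x)
    return W.T @ (np.diag(p) - np.outer(p, p)) @ W

rng = np.random.default_rng(2)
W = rng.normal(size=(3, 5))           # hypothetical classifier weights
x = rng.normal(size=5)
dx = 1e-3 * rng.normal(size=5)        # small perturbation of the input

exact = kl(softmax(W @ x), softmax(W @ (x + dx)))
quadratic = 0.5 * dx @ fisher(W, x) @ dx
```

For perturbations this small, `exact` and `quadratic` should agree up to corrections of third order in `dx`.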

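Having chosen a subspace of operators, $\{O_\alpha\}$, in which to express the effective Hamiltonians in each layer of the network (via eq. (7)), such that sufficiently precise estimates of the largest eigenvalues of the stability matrix associated with each layer, $T^{(n)}$, are obtained by the method in Section III, we can compute the FIM using the chain rule:

$$ \frac{\partial g^*_\alpha}{\partial x_i} \approx \frac{\partial g^{(N)}_\alpha}{\partial x_i} = \sum_\beta \big[ T^{(N)}\, T^{(N-1)} \cdots T^{(2)} \big]_{\alpha\beta}\, \frac{\partial g^{(1)}_\beta}{\partial x_i}. \tag{28} $$

The last derivative may be computed from eq. (6) and the explicit form of the RBM for the first layer, e.g., eq. (1), by expressing $t_1(h|x)$ in terms of $O_\alpha(h)$ to determine $g^{(1)}_\alpha(x)$. Alternatively, we may solve the system of linear equations in eq. (24) for the first layer: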


Thus, computing the FIM involves translating the input data into the chosen coupling-constant subspace via the RBM in the first layer, taking the product of the stability matrices associated with each layer in eq. (28), and computing a set of expectation values of operators in this subspace using the full network. The fact that the FIM is built from the product of stability matrices connects the scaling behaviour in eqs (13) and (19) with the sloppiness of the FIM spectrum observed in MachtaChachraTranstrumSethna2013 , i.e., all but a fixed number of eigenvalues of the FIM are scaled to very small values, resulting in a fixed number of stiff (relevant) directions and many sloppy (irrelevant) directions.
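The link between products of stability matrices and a sloppy spectrum can be illustrated with a deliberately simplified model. The sketch below assumes every layer has the same symmetric stability matrix with eigenvalues 1.5, 0.7 and 0.4, replaces the operator correlation matrix by the identity, and measures how many decades the resulting spectrum spans as the depth grows. These are all simplifying assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal basis
T = Q @ np.diag([1.5, 0.7, 0.4]) @ Q.T          # hypothetical symmetric stability matrix

depths = [2, 4, 8]
spreads = []
for N in depths:
    J = np.linalg.matrix_power(T, N)            # product of N identical layers
    F = J.T @ J                                 # correlation matrix set to the identity
    ev = np.linalg.eigvalsh(F)                  # ascending eigenvalues
    spreads.append(float(np.log10(ev[-1] / ev[0])))
```

In this toy model the spread in decades grows linearly with depth, as $2N \log_{10}(1.5/0.4)$: one direction stiffens while the others are scaled towards zero.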

We may then obtain the eigenvector corresponding to the largest eigenvalue of the FIM. For deep networks which have an unstable fixed point, small perturbations of the corresponding input data in the direction of this eigenvector of the FIM are likely to result in the data being misclassified.
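As a minimal illustration of picking out the most dangerous direction, the sketch below again uses a softmax classifier as a stand-in for the trained network (an assumption, since the FIM of a deep RBM stack would have to be assembled as described above). It diagonalises the FIM and compares the change in the output distribution caused by a step of equal size along the stiffest and the sloppiest eigenvector.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 5))            # hypothetical classifier weights
x = rng.normal(size=5)

p = softmax(W @ x)
F = W.T @ (np.diag(p) - np.outer(p, p)) @ W
evals, evecs = np.linalg.eigh(F)       # eigenvalues in ascending order
stiff, sloppy = evecs[:, -1], evecs[:, 0]

def kl_shift(direction, eps=0.05):
    """KL divergence between outputs at x and at x + eps * direction."""
    q = softmax(W @ (x + eps * direction))
    return float(np.sum(p * np.log(p / q)))

kl_stiff, kl_sloppy = kl_shift(stiff), kl_shift(sloppy)
```

A step along `stiff` changes the output distribution far more than an equally sized step along `sloppy`, which here leaves it essentially unchanged.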

V Conclusions

In this paper we have extended the formal analogy between deep learning and the RG. This enabled us to interpret the classification problem in terms of a sequence of effective Hamiltonians associated with the layers of the network, whose couplings are determined by the input data, and the convergence of this sequence to a distinct fixed point for each distinct output of the classification problem. Input data with the same classification should all result in a flow in the coupling-constant space of these effective Hamiltonians which converges to the same fixed point.

This exposed the possibility that a fixed point might have an unstable direction, associated with a relevant operator in the language of the RG. Small perturbations of the input data that create a non-zero overlap with such a relevant operator tend to be amplified exponentially by successive layers of the network, so that the flow diverges from the correct fixed point and the data may be misclassified. Fixed points with no relevant operators are not prone to this sort of error and will only misclassify data that is ambiguous even to a human.

Convergence of the sequence of effective Hamiltonians to a fixed point for a given subset of the input data is guaranteed by the training (assuming this converges to a good approximation to the exact conditional probability distribution). The existence of a relevant direction at the fixed point is associated with sensitivity to perturbations which take the data outside of this subset. The amplification factor for such a perturbation grows with the depth of the network. Hence, the vulnerability to misclassification is directly due to the depth of a network.

The key result of this paper is a method for diagnosing whether a given trained network is vulnerable to adversarial attack, or simply misclassification due to noise in the data, based only on computing expectation values of operators in the hidden and output layers. The computations are fairly demanding, but may be justified for a network that is intended for use in safety-critical applications.