The emerging understanding of deep learning HintonOsinderoTeh2006 ; LeCunBengioHinton2015 ; Salakhutdinov2015 , through its analogy with the Renormalisation Group (RG) MehtaSchwab2014 ; LinTegmarkRolnick2017 ; SchwabMehta2016 , reveals how its vulnerability to adversarial attack YuanHeZhuBhatLi2018 is directly related to the depth of the layered network Kenway2018 . Successive layers may amplify the sensitivity of the network to specific small perturbations in the input data, causing it to misclassify the input. Whether a deep network has this vulnerability depends on the problem which it is trained to solve and the number of layers. The RG provides a method for diagnosing if the vulnerability exists, using only the trained weights and input data Kenway2018 . However, the same information can also be used to construct the specific small changes to the input data that will confuse the network Kenway2018 . Thus, it becomes important to prevent a would-be attacker from acquiring the weights.
A possible strategy for an attacker with access to the inputs and outputs of the network would be to use them to train a clone of the network. The cloned weights could then be used to expose any vulnerability and to generate adversarial data. This paper shows how the RG can be used to poison imperceptibly the outputs of the original network, so that, if they are used to clone it, the resulting cloned weights will be wrong and the cloned network will misclassify some data. The attacker would be able to discover that the cloning attempt had failed, but would not be able to rectify it.
In the next section, we summarise the RG framework for deep learning in the context of a deep network in which each layer is a Restricted Boltzmann Machine (RBM) Salakhutdinov2015 and the network is trained to classify a data set in terms of outputs that follow an exact conditional probability distribution given the input data. In Section III we extend this framework to incorporate the layerwise training, which involves a second deep network, built from the same RBMs, that performs data generation according to the corresponding conditional distribution of the data given the outputs. It is the RG analysis of this data-generation network which is used in Section IV to determine how to poison the outputs of the original classification network, so that it cannot be cloned. Finally, in Section V, we discuss how applicable these RG-inspired methods are for safeguarding deep learning.
II Renormalisation Group for Classification
Consider a layered network of RBMs with input nodes, several layers of hidden nodes, and output nodes. For example, the hidden nodes may be binary vectors of the same dimension as the input data and the outputs, and the joint probability distribution for each layer is of the form Salakhutdinov2015
where the weights are determined by the layerwise training, discussed further in Section III.
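As an illustration of the kind of layer described above, the following is a minimal sketch of a binary RBM in its standard form, with energy E(v, h) = -v·W·h - a·v - b·h. The symbols `W`, `a`, `b`, the function names and the numerical weights are assumptions of this sketch, not notation taken from the text. The joint distribution is built by brute-force enumeration, which is only feasible for very small layers.

```python
import itertools
import math

def rbm_energy(v, h, W, a, b):
    """Energy of a binary RBM in the standard form (assumed here):
    E(v, h) = -sum_ij v_i W_ij h_j - sum_i a_i v_i - sum_j b_j h_j."""
    coupling = sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(ai * vi for ai, vi in zip(a, v))
    hidden_bias = sum(bj * hj for bj, hj in zip(b, h))
    return -(coupling + visible_bias + hidden_bias)

def rbm_joint(W, a, b):
    """Return {(v, h): p(v, h)} with p = exp(-E) / Z, by enumerating all states."""
    states = [(v, h)
              for v in itertools.product((0, 1), repeat=len(a))
              for h in itertools.product((0, 1), repeat=len(b))]
    weights = {s: math.exp(-rbm_energy(s[0], s[1], W, a, b)) for s in states}
    partition = sum(weights.values())
    return {s: w / partition for s, w in weights.items()}

def conditional_h_given_v(joint, v):
    """Bayes' theorem: p(h | v) = p(v, h) / sum_h p(v, h)."""
    marginal = sum(p for (v2, _), p in joint.items() if v2 == v)
    return {h: p / marginal for (v2, h), p in joint.items() if v2 == v}

# Hypothetical weights for a layer with 2 visible and 2 hidden nodes.
W = [[0.5, -1.0], [1.5, 0.2]]
a = [0.1, -0.3]
b = [0.0, 0.4]
joint = rbm_joint(W, a, b)
cond = conditional_h_given_v(joint, (1, 0))
```

For this form of the energy the conditional distribution factorises over the hidden nodes, so `cond[(1, 1)]` agrees with the product of two logistic functions of the summed inputs to each hidden node.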
The trained deep RBM network for the classification problem generates the probability distribution for the output, given the input data, iteratively from one layer to the next through the RG-like transformation
where, according to Bayes' theorem,
We define the effective Hamiltonian for each layer of this classification network by
and parametrise the space of effective Hamiltonians in terms of a complete set of operators, with a coupling for each operator. Thus, the effective Hamiltonian for each layer is
and the effect of the transformation in eq. (2) is to define the couplings in each layer solely in terms of those in the preceding layer. This generates a flow in the coupling-constant space of the effective Hamiltonians.
The stability matrix for each layer of the network is defined by
and it may be estimated using the method in Kenway2018 .
The key assumption we make is that the training of the deep RBM network converges so that, for a large enough number of layers, the sequence of effective Hamiltonians converges to a fixed point:
The stability properties of the fixed points determine whether the network becomes sensitive to small changes in the input data as the number of layers is increased Kenway2018 .
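The role of the stability properties can be illustrated with a minimal sketch in which the flow near a fixed point is linearised, so that a small deviation in the couplings is multiplied by the same stability matrix at every layer. The matrix `T`, its entries and the helper names are hypothetical, chosen so that one eigenvalue exceeds one (a relevant direction) and the other is smaller than one (an irrelevant direction). Power iteration stands in here for whatever eigen-solver would be used in practice.

```python
def mat_vec(T, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def norm(v):
    """Euclidean norm."""
    return sum(x * x for x in v) ** 0.5

def leading_eigenpair(T, iterations=200):
    """Power iteration: eigenvalue of largest magnitude and its unit eigenvector."""
    v = [1.0] * len(T)
    for _ in range(iterations):
        w = mat_vec(T, v)
        n = norm(w)
        v = [x / n for x in w]
    w = mat_vec(T, v)
    eigenvalue = sum(wi * vi for wi, vi in zip(w, v))  # Rayleigh quotient for unit v
    return eigenvalue, v

def propagate(T, delta, n_layers):
    """Apply the linearised flow n_layers times to a small deviation delta."""
    for _ in range(n_layers):
        delta = mat_vec(T, delta)
    return delta

# Hypothetical stability matrix with eigenvalues of about 1.24 and 0.46.
T = [[1.2, 0.3], [0.1, 0.5]]
eigenvalue, relevant_direction = leading_eigenpair(T)

# A deviation of size 1e-3 along the relevant direction, followed through 20 layers.
initial = [1e-3 * x for x in relevant_direction]
grown = propagate(T, initial, 20)
growth_factor = norm(grown) / norm(initial)
```

Under these assumptions `growth_factor` equals the relevant eigenvalue raised to the power of the number of layers, which is the sense in which additional layers amplify sensitivity to specific small perturbations.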
III Renormalisation Group for Data Generation
Using the individual RBMs making up the layers of the deep classification network in eq. (2), we can construct a deep generation network iteratively, layer by layer, by effectively reversing the transformation in eq. (2):
where, according to Bayes' theorem,
The effective Hamiltonians for the data-generation problem, eq. (11), are defined for each layer, with couplings to the operators defined by
and stability matrix
Layerwise training of the RBMs may be achieved by minimising the mutual information,
with respect to the weights in one layer, with the weights in the other layers held fixed, and iterating to convergence for all of the layers. Here the divergence is the Kullback-Leibler divergence, defined for two probability distributions by
For classification, the joint probability in eq. (14) is
which involves the probability distribution for the sample of classified data used for training. The joint probability for the data-generation problem is similar to eq. (16), with the roles of the inputs and outputs interchanged.
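For reference, the Kullback-Leibler divergence used in the training objective has the standard definition D(p || q) = sum over states of p ln(p / q). The sketch below evaluates it for discrete distributions stored as dictionaries. The function name and the two example distributions are illustrative assumptions; in the setting above the arguments would be the joint distributions for the training data and for the network.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) = sum_x p(x) ln(p(x) / q(x)).

    p and q are dicts mapping states to probabilities. Terms with p(x) = 0
    contribute zero, and the result is infinite if q(x) = 0 where p(x) > 0.
    """
    total = 0.0
    for state, p_x in p.items():
        if p_x == 0.0:
            continue
        q_x = q.get(state, 0.0)
        if q_x == 0.0:
            return math.inf
        total += p_x * math.log(p_x / q_x)
    return total

# Hypothetical two-state distributions.
target = {0: 0.5, 1: 0.5}
model = {0: 0.25, 1: 0.75}
divergence = kl_divergence(target, model)
```

The divergence vanishes only when the two distributions coincide and is not symmetric in its arguments, so the order of the data and model distributions matters when it is used as a training objective.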
The assumption that the network in eq. (2) is trained to perform the classification problem correctly ensures that, as the layer index decreases to 0, the sequence of effective Hamiltonians for the data-generation problem in eq. (11) also converges to a fixed point
If the stability matrix in eq. (13) has an eigenvalue bigger than one for small layer index, so that the fixed point has a relevant direction, then data generation is sensitive to small changes in the classification outputs and these may be utilised to poison the outputs of the classification network without affecting their validity for classification.
IV Poisoning the Network Outputs
The classification network may be protected from cloning by adding an imperceptibly small perturbation to each output, which excites an unstable direction of the corresponding data-generation fixed point if it is used for data generation, as it would be in training another network, i.e.,
The deeper the network, the smaller the perturbation can be to produce a significant discrepancy in eq. (19). This renders the mutual information in eq. (14) incorrect for small layer index, so that the resulting cloned weights are wrong and, if the perturbation is small enough, it does not affect the use of the poisoned output as the classifier.
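The trade-off between depth and perturbation size can be sketched under a simplifying assumption that is not stated in the text: that the perturbation grows by the same relevant eigenvalue at every layer of the data-generation network. The outputs are treated here as a vector of class scores, and the unstable direction, eigenvalue and target discrepancy are hypothetical numbers rather than values derived from a trained network.

```python
def minimal_admixture(relevant_eigenvalue, n_layers, target_discrepancy):
    """Smallest admixture that grows to target_discrepancy after n_layers.

    Assumes linearised growth by relevant_eigenvalue per layer, which is an
    assumption of this sketch.
    """
    if relevant_eigenvalue <= 1.0:
        raise ValueError("no relevant direction: perturbations do not grow")
    return target_discrepancy / relevant_eigenvalue ** n_layers

def poison_output(output, direction, epsilon):
    """Add a small admixture of the (hypothetical) unstable direction to an output."""
    return [o + epsilon * d for o, d in zip(output, direction)]

def predicted_label(output):
    """Index of the largest score, used here as the classification decision."""
    return max(range(len(output)), key=lambda i: output[i])

clean_output = [0.9, 0.1]          # hypothetical class scores
unstable_direction = [1.0, -1.0]   # hypothetical; in practice from the RG analysis

epsilon_shallow = minimal_admixture(1.24, 5, 0.05)
epsilon_deep = minimal_admixture(1.24, 20, 0.05)
poisoned_output = poison_output(clean_output, unstable_direction, epsilon_deep)
label_preserved = predicted_label(poisoned_output) == predicted_label(clean_output)
```

With these numbers the admixture needed for the 20-layer network is far smaller than for the 5-layer one, and the poisoned output still yields the same label, which is the property required for the poisoning to go unnoticed in normal use.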
The perturbation which is the strongest poison, i.e., produces the largest effect for a given small admixture, is proportional to the eigenvector of the Fisher Information Matrix (FIM) RajuMachtaSethna2017 for the data-generation distribution with the largest eigenvalue Kenway2018 . This FIM is given by
where the expectation value of each operator is taken with respect to the effective Hamiltonian in the corresponding layer, eq. (11), i.e.,
Having chosen a subspace of operators in which to express the effective Hamiltonians in each layer of the network (via eq. (12)), such that sufficiently precise estimates of the stability matrix associated with each layer in eq. (13) are obtained by the method in Kenway2018 , we can compute the FIM using the chain rule:
and the explicit form of the RBM for the relevant layer, e.g., using eq. (1) to express the variables of one layer in terms of those of the adjacent layer and so determine the remaining derivatives.
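The FIM calculation can be sketched for a toy model. For a Boltzmann distribution with Hamiltonian H = sum_a g_a O_a, the FIM with respect to the couplings is the connected correlation of the operators, F_ab = <O_a O_b> - <O_a><O_b>, and a change of variables acts on it through the chain rule with the Jacobian of the couplings. The two-spin system, the operator set, the couplings and the Jacobian `J` below are all hypothetical, and power iteration is used only as a simple way to find the leading eigenvector.

```python
import itertools
import math

def boltzmann(couplings, operators, states):
    """p(s) = exp(-H(s)) / Z with H(s) = sum_a g_a O_a(s)."""
    weights = [math.exp(-sum(g * op(s) for g, op in zip(couplings, operators)))
               for s in states]
    partition = sum(weights)
    return [w / partition for w in weights]

def fisher_information(couplings, operators, states):
    """F_ab = <O_a O_b> - <O_a><O_b> for the Boltzmann distribution above."""
    probs = boltzmann(couplings, operators, states)
    values = [[op(s) for op in operators] for s in states]
    n = len(operators)
    mean = [sum(p * v[a] for p, v in zip(probs, values)) for a in range(n)]
    return [[sum(p * v[a] * v[b] for p, v in zip(probs, values)) - mean[a] * mean[b]
             for b in range(n)] for a in range(n)]

def pull_back(F, J):
    """Chain rule: F'_ij = sum_ab J_ai F_ab J_bj, with J_ai = d g_a / d theta_i."""
    n_g, n_theta = len(F), len(J[0])
    return [[sum(J[a][i] * F[a][b] * J[b][j]
                 for a in range(n_g) for b in range(n_g))
             for j in range(n_theta)] for i in range(n_theta)]

def leading_eigenvector(M, iterations=500):
    """Power iteration for a symmetric matrix: largest eigenvalue and unit eigenvector."""
    v = [1.0 / (i + 1) for i in range(len(M))]
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
        n = math.sqrt(sum(x * x for x in w))
        v = [x / n for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
    return sum(a * b for a, b in zip(v, Mv)), v

# Hypothetical toy model: two spins, three operators, made-up couplings.
states = list(itertools.product((-1, 1), repeat=2))
operators = [lambda s: s[0], lambda s: s[1], lambda s: s[0] * s[1]]
couplings = [0.2, -0.1, 0.5]
F = fisher_information(couplings, operators, states)

# Hypothetical Jacobian of the three couplings with respect to two perturbation variables.
J = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
F_theta = pull_back(F, J)
top_eigenvalue, strongest_direction = leading_eigenvector(F_theta)
```

In this sketch `strongest_direction` plays the role of the most effective perturbation: for a fixed small admixture, moving along it changes the distribution more than moving along any other direction in the chosen variables.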
Thus, the same method used to attack a deep network with poisoned data Kenway2018 may be used by the network to protect itself from cloning, provided some of the data-generation fixed points have unstable directions.
In this paper we have shown how the increasing vulnerability of a deep-learning network to adversarial attack with increasing depth may also provide a means of defence. This applies to a class of deep networks each of which may be represented by layers of RBMs, trained individually so that the network solves a classification problem. The training utilises the same RBMs to construct a data-generation network, with the classifiers as inputs, and it is the potential sensitivity of this network to small changes in the classifiers that turns data poisoning into a means of defence.
Both the vulnerability to attack and the effectiveness of the defence increase with the depth of the network, and rely on the corresponding RG fixed points (for classification, or data generation) having a relevant, i.e., unstable, direction in the space of operators used to define effective Hamiltonians for each layer.
If the classification fixed points have no relevant directions, then the network is not susceptible to adversarial attack and no defence is required. This may be ascertained using the method in Kenway2018 . If the classification fixed point has a relevant direction, there is no guarantee that the associated data-generation fixed point also has a relevant direction, although this may be determined by the same method, which then also determines the most effective way to poison the classification outputs without affecting their validity as classifiers.
Use of the poisoned outputs to train a clone of the classification network would cause the cloned network to classify some data incorrectly. While this would be readily apparent to a would-be attacker, it could not be rectified. Unless it can be shown that instability of the classification fixed point implies instability of the associated data-generation fixed point, this protection against cloning is not always available. However, the method in Kenway2018 provides a means of checking both the inherent vulnerability of a given network and whether poisoning of its outputs is an effective defence, so that the safety of the network can be established.
- (1) G.E. Hinton, S. Osindero and Y-W. Teh, Neural Comput., 18, 1527 (2006).
- (2) Y. LeCun, Y. Bengio and G. Hinton, Nature, 521, 436 (2015).
- (3) R. Salakhutdinov, Annu. Rev. Stat. Appl., 2, 361 (2015).
- (4) P. Mehta and D.J. Schwab, An exact mapping between the Variational Renormalization Group and Deep Learning, arXiv:1410.3831.
- (5) H.W. Lin, M. Tegmark and D. Rolnick, J. Stat. Phys., 168, 1223 (2017).
- (6) D.J. Schwab and P. Mehta, Comment on “Why does deep and cheap learning work so well?”, arXiv:1609.03541.
- (7) X. Yuan, P. He, Q. Zhu, R.R. Bhat and X. Li, Adversarial Examples: Attacks and Defenses for Deep Learning, arXiv:1712.07107.
- (8) R. Kenway, Vulnerability of Deep Learning, arXiv:1803.06111.
- (9) A. Raju, B.B. Machta and J.P. Sethna, Information geometry and the renormalization group, arXiv:1710.05787.