# Protection against Cloning for Deep Learning

The susceptibility of deep learning to adversarial attack can be understood in the framework of the Renormalisation Group (RG) and the vulnerability of a specific network may be diagnosed provided the weights in each layer are known. An adversary with access to the inputs and outputs could train a second network to clone these weights and, having identified a weakness, use them to compute the perturbation of the input data which exploits it. However, the RG framework also provides a means to poison the outputs of the network imperceptibly, without affecting their legitimate use, so as to prevent such cloning of its weights and thereby foil the generation of adversarial data.

## I Introduction

The emerging understanding of deep learning HintonOsinderoTeh2006; LeCunBengioHinton2015; Salakhutdinov2015, through its analogy with the Renormalisation Group (RG) MehtaSchwab2014; LinTegmarkRolnick2017; SchwabMehta2016, reveals how its vulnerability to adversarial attack YuanHeZhuBhatLi2018 is directly related to the depth of the layered network Kenway2018. Successive layers may amplify the sensitivity of the network to specific small perturbations in the input data, causing it to misclassify the input. Whether a deep network has this vulnerability depends on the problem which it is trained to solve and the number of layers. The RG provides a method for diagnosing whether the vulnerability exists, using only the trained weights and input data Kenway2018. However, the same information can also be used to construct the specific small changes to the input data that will confuse the network Kenway2018. Thus, it becomes important to prevent a would-be attacker from acquiring the weights.

A possible strategy for an attacker with access to the inputs and outputs of the network would be to use them to train a clone of the network. The cloned weights could then be used to expose any vulnerability and to generate adversarial data. This paper shows how the RG can be used to poison the outputs of the original network imperceptibly, so that, if they are used to clone it, the resulting cloned weights will be wrong and the cloned network will misclassify some data. The attacker would be able to discover that the cloning attempt had failed, but would not be able to rectify it.

In the next section, we summarise the RG framework for deep learning in the context of a deep network in which each layer is a Restricted Boltzmann Machine (RBM) Salakhutdinov2015 and the network is trained to classify a data set $\{x\}$ in terms of outputs $\{y\}$, whose exact conditional probability distribution is $p(y|x)$. In Section III we extend this framework to incorporate the layerwise training, which involves a second deep network, built from the same RBMs, that performs data generation according to $p(x|y)$. It is the RG analysis of this data-generation network which is used in Section IV to determine how to poison the outputs of the original classification network, so that it cannot be cloned. Finally, in Section V, we discuss how applicable these RG-inspired methods are for safeguarding deep learning.

## II Renormalisation Group for Classification

Consider a layered network of RBMs with input nodes $x \equiv h_0$, layers of hidden nodes $h_k$, $k = 1, \ldots, N-1$, and output nodes $y \equiv h_N$. For example, the hidden nodes may be binary vectors of the same dimension as the input data $x$ and the outputs $y$, and the joint probability distribution for the $k$th layer is of the form Salakhutdinov2015

$$
t_k(h_k, h_{k-1}) = \frac{e^{h_k^T W_k h_{k-1} + a_k^T h_k + b_k^T h_{k-1}}}{z_k}, \qquad z_k = \mathrm{Tr}_{h_k, h_{k-1}}\, e^{h_k^T W_k h_{k-1} + a_k^T h_k + b_k^T h_{k-1}}, \tag{1}
$$

where $W_k$, $a_k$ and $b_k$ are weights determined by the layerwise training, discussed further in Section III.

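The trained deep RBM network for the classification problem, $\{t_k\}$, generates the probability distribution for the output $y$ given input data $x$, $q_N(y|x)$, iteratively for $k = 1, \ldots, N$ through the RG-like transformation

$$
q_k(h_k|x) = \mathrm{Tr}_{h_{k-1}}\, t_k(h_k|h_{k-1})\, q_{k-1}(h_{k-1}|x), \qquad q_0(h_0|x) = \delta_{h_0, x}, \qquad q_N(y|x) \approx p(y|x), \tag{2}
$$

where, according to Bayes' theorem,

$$
t_k(h_k|h_{k-1}) = \frac{t_k(h_k, h_{k-1})}{\mathrm{Tr}_{h_k}\, t_k(h_k, h_{k-1})}. \tag{3}
$$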
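For very small layers, eqs (1)-(3) can be evaluated exactly by enumerating every binary configuration. The following is a minimal sketch under that assumption; the helper names (`binary_states`, `rbm_joint`, `conditional_up`, `classify`) and the random weights are illustrative and not taken from the paper, and distributions are stored as vectors indexed by configuration number.

```python
import itertools
import numpy as np

def binary_states(n):
    """All 2**n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def rbm_joint(W, a, b):
    """Joint table t[i, j] = t_k(h_k = state i, h_{k-1} = state j), eq. (1)."""
    upper = binary_states(W.shape[0])   # configurations of h_k
    lower = binary_states(W.shape[1])   # configurations of h_{k-1}
    exponent = upper @ W @ lower.T + (upper @ a)[:, None] + (lower @ b)[None, :]
    t = np.exp(exponent)
    return t / t.sum()                  # division by z_k

def conditional_up(t):
    """t_k(h_k | h_{k-1}), eq. (3): normalise each column over h_k."""
    return t / t.sum(axis=0, keepdims=True)

def classify(layers, x_index):
    """Iterate eq. (2), starting from q_0(h_0|x) = delta_{h_0, x}."""
    n_in = layers[0][0].shape[1]
    q = np.zeros(2 ** n_in)
    q[x_index] = 1.0
    for W, a, b in layers:
        q = conditional_up(rbm_joint(W, a, b)) @ q
    return q

# Hypothetical toy network: 3 input nodes, 3 hidden nodes, 2 output nodes.
rng = np.random.default_rng(0)
shapes = [(3, 3), (3, 3), (2, 3)]       # (nodes in h_k, nodes in h_{k-1})
layers = [(rng.normal(size=s), rng.normal(size=s[0]), rng.normal(size=s[1]))
          for s in shapes]
q_N = classify(layers, x_index=5)       # distribution over the 4 output states
```

Exact enumeration scales as $2^n$ in the layer width, so this is only a way to inspect the transformation on toy problems, not a practical implementation.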

We define the effective Hamiltonian for the $k$th layer of this classification network by

$$
q_k(h|x) = \frac{e^{-H^{(k)}_x(h)}}{Z^{(k)}_x}, \qquad Z^{(k)}_x = \mathrm{Tr}_h\, e^{-H^{(k)}_x(h)} \tag{4}
$$

and parametrise the space of effective Hamiltonians in terms of a complete set of operators $O_\alpha(h)$, with couplings $g^{(k)}_\alpha$. Thus, the effective Hamiltonian for layer $k$ is

$$
H^{(k)}_x(h) = \sum_\alpha g^{(k)}_\alpha O_\alpha(h) \tag{5}
$$

and the effect of the transformation in eq. (2) is to define $g^{(k)}_\alpha$ solely in terms of $g^{(k-1)}_\alpha$, for $k = 2, \ldots, N$. This generates a flow in the coupling-constant space of the effective Hamiltonians.

The stability matrix for each layer of the network is defined by

$$
T^{(k)}_{\alpha\beta} = \frac{\partial g^{(k)}_\alpha}{\partial g^{(k-1)}_\beta}, \qquad k = 2, \ldots, N \tag{6}
$$

and it may be estimated using the method in Kenway2018.

The key assumption we make is that the training of the deep RBM network converges so that, for $N$ large enough, the sequence of effective Hamiltonians converges to a fixed point:

$$
H^*_x(h) = \sum_\alpha g^*_\alpha O_\alpha(h), \tag{7}
$$

such that

$$
\frac{e^{-H^*_x(y)}}{Z^*_x} = q_N(y|x) \approx p(y|x). \tag{8}
$$

The stability properties of the fixed points determine whether the network becomes sensitive to small changes in the input data as the number of layers is increased Kenway2018.

## III Renormalisation Group for Data Generation

Using the individual RBMs, $t_k$, making up the layers in the deep classification network in eq. (2), we can construct a deep generation network iteratively for $k = N, \ldots, 1$ by effectively reversing the transformation in eq. (2):

$$
\tilde q_{k-1}(h_{k-1}|y) = \mathrm{Tr}_{h_k}\, t_k(h_{k-1}|h_k)\, \tilde q_k(h_k|y), \qquad \tilde q_N(h_N|y) = \delta_{h_N, y}, \qquad \tilde q_0(x|y) \approx p(x|y), \tag{9}
$$

where, according to Bayes' theorem,

$$
t_k(h_{k-1}|h_k) = \frac{t_k(h_k, h_{k-1})}{\mathrm{Tr}_{h_{k-1}}\, t_k(h_k, h_{k-1})}. \tag{10}
$$
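The reversed transformation of eqs (9) and (10) can be sketched in the same exact-enumeration style. The only differences from the classification direction are that the joint table of eq. (1) is normalised over $h_{k-1}$ rather than $h_k$, and that the layers are traversed from $k = N$ down to $k = 1$. The function names and the random weights below are illustrative assumptions, not the paper's code.

```python
import itertools
import numpy as np

def binary_states(n):
    """All 2**n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def rbm_joint(W, a, b):
    """Joint table t[i, j] = t_k(h_k = state i, h_{k-1} = state j), eq. (1)."""
    upper, lower = binary_states(W.shape[0]), binary_states(W.shape[1])
    exponent = upper @ W @ lower.T + (upper @ a)[:, None] + (lower @ b)[None, :]
    t = np.exp(exponent)
    return t / t.sum()

def conditional_down(t):
    """t_k(h_{k-1} | h_k), eq. (10), returned as table[h_{k-1}, h_k]."""
    return (t / t.sum(axis=1, keepdims=True)).T

def generate(layers, y_index):
    """Iterate eq. (9) from k = N down to 1, starting from a delta at y."""
    n_out = layers[-1][0].shape[0]
    q = np.zeros(2 ** n_out)
    q[y_index] = 1.0                    # q~_N(h_N|y) = delta_{h_N, y}
    for W, a, b in reversed(layers):
        q = conditional_down(rbm_joint(W, a, b)) @ q
    return q

# Hypothetical toy network: 3 input nodes, 3 hidden nodes, 2 output nodes.
rng = np.random.default_rng(0)
shapes = [(3, 3), (3, 3), (2, 3)]       # (nodes in h_k, nodes in h_{k-1})
layers = [(rng.normal(size=s), rng.normal(size=s[0]), rng.normal(size=s[1]))
          for s in shapes]
q_0 = generate(layers, y_index=2)       # distribution over the 8 input states
```

Both directions use the same tables $t_k(h_k, h_{k-1})$, which is what allows the data-generation network to be built from the RBMs of the classification network.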

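In analogy with eqs (4)-(6), we can define a sequence of effective Hamiltonians for data generation as follows:

$$
\tilde q_k(h|y) = \frac{e^{-\tilde H^{(k)}_y(h)}}{\tilde Z^{(k)}_y}, \qquad \tilde Z^{(k)}_y = \mathrm{Tr}_h\, e^{-\tilde H^{(k)}_y(h)}, \tag{11}
$$

for $k = N-1, \ldots, 0$, with couplings $\tilde g^{(k)}_\alpha$ to the operators $O_\alpha(h)$ defined by

$$
\tilde H^{(k)}_y(h) = \sum_\alpha \tilde g^{(k)}_\alpha O_\alpha(h) \tag{12}
$$

and stability matrix

$$
\tilde T^{(k)}_{\alpha\beta} = \frac{\partial \tilde g^{(k-1)}_\alpha}{\partial \tilde g^{(k)}_\beta}, \qquad k = N-1, \ldots, 1. \tag{13}
$$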

Layerwise training of the RBMs $t_k$, $k = 1, \ldots, N$, may be achieved by minimising the mutual information,

$$
D_{KL}\Big(\mathrm{Tr}_{x,y}\, \tilde q_k(h_k|y)\, q_{k-1}(h_{k-1}|x)\, p(x,y) \,\Big\|\, t_k(h_k, h_{k-1})\Big) \tag{14}
$$

with respect to the weights in the $k$th layer, with the weights in the other layers held fixed, and iterating to convergence for all of the layers. Here $D_{KL}$ is the Kullback-Leibler divergence defined for the two probability distributions $p_1$ and $p_2$ by

$$
D_{KL}(p_1 \| p_2) = \mathrm{Tr}_x\, p_1(x) \log\left[\frac{p_1(x)}{p_2(x)}\right]. \tag{15}
$$

For classification, the joint probability in eq. (14) is

$$
p(x,y) = p(y|x)\, p_{\mathrm{training}}(x), \tag{16}
$$

where $p_{\mathrm{training}}(x)$ is the probability distribution for the sample of classified data used for training. The joint probability for the data-generation problem is similar to eq. (16) with $x$ and $y$ interchanged.
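The layerwise objective of eqs (14)-(16) can be written compactly once all distributions are stored as tables over enumerated configurations. The sketch below uses random stand-in tables purely to show the index bookkeeping; the names `layer_target` and `random_conditional`, the table sizes, and the convention that `p_xy` is indexed as `[x, y]` are assumptions made here, not part of the paper.

```python
import numpy as np

def kl_divergence(p1, p2):
    """Eq. (15): sum of p1 * log(p1 / p2), skipping terms where p1 == 0."""
    p1 = np.asarray(p1, dtype=float).ravel()
    p2 = np.asarray(p2, dtype=float).ravel()
    mask = p1 > 0
    return float(np.sum(p1[mask] * np.log(p1[mask] / p2[mask])))

def layer_target(q_down, q_up, p_xy):
    """First argument of eq. (14), a joint table over (h_k, h_{k-1}).

    q_down[i, y] = q~_k(h_k = i | y)
    q_up[j, x]   = q_{k-1}(h_{k-1} = j | x)
    p_xy[x, y]   = p(x, y)
    """
    return np.einsum("iy,jx,xy->ij", q_down, q_up, p_xy)

def random_conditional(rng, n_out, n_in):
    """Random table whose columns are normalised conditional distributions."""
    table = rng.random((n_out, n_in))
    return table / table.sum(axis=0, keepdims=True)

# Hypothetical sizes: 4 input states, 2 output states, 4 states per hidden layer.
rng = np.random.default_rng(2)
n_x, n_y, n_h = 4, 2, 4
p_train = rng.dirichlet(np.ones(n_x))             # p_training(x)
p_y_given_x = random_conditional(rng, n_y, n_x)   # p(y|x), indexed [y, x]
p_xy = (p_y_given_x * p_train).T                  # eq. (16), indexed [x, y]
q_down = random_conditional(rng, n_h, n_y)
q_up = random_conditional(rng, n_h, n_x)
target = layer_target(q_down, q_up, p_xy)
t_k = rng.dirichlet(np.ones(n_h * n_h)).reshape(n_h, n_h)
loss = kl_divergence(target, t_k)                 # quantity minimised over layer-k weights
```

In an actual training loop `t_k` would be the RBM table of eq. (1), and `loss` would be minimised with respect to its weights while the other layers are held fixed.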

The assumption that the network in eq. (2) is trained to perform the classification problem correctly ensures that, as $k$ decreases to 0, the sequence of effective Hamiltonians for the data-generation problem in eq. (11) also converges to a fixed point

$$
\tilde H^*_y(h) = \sum_\alpha \tilde g^*_\alpha O_\alpha(h), \tag{17}
$$

such that

$$
\frac{e^{-\tilde H^*_y(x)}}{\tilde Z^*_y} = \tilde q_0(x|y) \approx p(x|y). \tag{18}
$$

The stability properties of this fixed point can also be analysed using the method in Kenway2018. If the stability matrix in eq. (13) has an eigenvalue bigger than one for small $k$, so that the fixed point has a relevant direction, then data generation is sensitive to small changes in $y$ and these may be utilised to poison the outputs of the classification network without affecting their validity for classification.
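The role of a relevant direction can be illustrated with a toy stability matrix. In the sketch below the same hypothetical $2 \times 2$ matrix, with eigenvalues 1.3 and 0.7, is applied at every layer; a perturbation of the couplings along the relevant eigenvector grows geometrically with depth, while one along the irrelevant eigenvector decays. The matrix and the function names are assumptions chosen for illustration, not estimates from a trained network.

```python
import numpy as np

def has_relevant_direction(T):
    """True if the stability matrix has an eigenvalue of modulus above one."""
    return bool(np.max(np.abs(np.linalg.eigvals(T))) > 1.0)

def amplification(T_sequence, direction):
    """Growth factor of a unit coupling perturbation after applying each
    stability matrix in T_sequence, in the order given."""
    v = np.asarray(direction, dtype=float)
    v = v / np.linalg.norm(v)
    for T in T_sequence:
        v = T @ v
    return float(np.linalg.norm(v))

# Hypothetical stability matrix with eigenvalues 1.3 and 0.7.
T = np.array([[1.0, 0.3],
              [0.3, 1.0]])
relevant = np.array([1.0, 1.0])      # eigenvector for eigenvalue 1.3
irrelevant = np.array([1.0, -1.0])   # eigenvector for eigenvalue 0.7
depth = 10
growth = amplification([T] * depth, relevant)    # equals 1.3 ** depth
decay = amplification([T] * depth, irrelevant)   # equals 0.7 ** depth
```

In a real network the stability matrices differ from layer to layer and are only close to their fixed-point form for small $k$, so the geometric growth shown here is an idealisation.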

## IV Poisoning the Network Outputs

The classification network may be protected from cloning by adding an imperceptibly small perturbation $\delta y$ to each output $y$, which excites an unstable direction of the corresponding data-generation fixed point if it is used for data generation, as it would be in training another network, i.e.,

$$
\tilde q_0(x|y + \delta y) \neq \tilde q_0(x|y). \tag{19}
$$

The deeper the network, the smaller $\delta y$ can be to produce a significant discrepancy in eq. (19). This renders the mutual information in eq. (14) incorrect for small $k$, so that the resulting cloned weights are wrong, while, if $\delta y$ is small enough, the perturbation does not affect the use of $y + \delta y$ as the classifier.
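The discrepancy in eq. (19) can be measured directly on a toy network by generating data from a clean and from a perturbed output and comparing the two distributions with the Kullback-Leibler divergence. The sketch below assumes that a real-valued $y$ enters only through the top RBM, using eq. (1) with $h_N = y$; the weights are random rather than trained, so it demonstrates the computation only and says nothing about how large the effect is in a trained network. All names are illustrative.

```python
import itertools
import numpy as np

def binary_states(n):
    """All 2**n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def top_layer_conditional(W_N, b_N, y):
    """t_N(h_{N-1}|y) for a real-valued y, from eq. (1) with h_N = y."""
    states = binary_states(W_N.shape[1])
    logits = states @ (W_N.T @ y + b_N)
    p = np.exp(logits - logits.max())   # subtract max for numerical stability
    return p / p.sum()

def conditional_down(W, a, b):
    """Table[h_{k-1}, h_k] of t_k(h_{k-1}|h_k) for binary nodes, eq. (10)."""
    upper, lower = binary_states(W.shape[0]), binary_states(W.shape[1])
    t = np.exp(upper @ W @ lower.T + (upper @ a)[:, None] + (lower @ b)[None, :])
    return (t / t.sum(axis=1, keepdims=True)).T

def generate(lower_layers, W_N, b_N, y):
    """q~_0(x|y): top layer conditioned on y, then eq. (9) down to k = 1."""
    q = top_layer_conditional(W_N, b_N, y)
    for W, a, b in reversed(lower_layers):
        q = conditional_down(W, a, b) @ q
    return q

def kl(p1, p2):
    """Eq. (15) for strictly positive distributions."""
    return float(np.sum(p1 * np.log(p1 / p2)))

# Hypothetical toy network with random (untrained) weights.
rng = np.random.default_rng(3)
lower_layers = [(rng.normal(size=(3, 3)), rng.normal(size=3), rng.normal(size=3))
                for _ in range(3)]
W_N, b_N = rng.normal(size=(2, 3)), rng.normal(size=3)
y = np.array([1.0, 0.0])
delta_y = 1e-2 * np.array([1.0, -1.0])
q_clean = generate(lower_layers, W_N, b_N, y)
q_poisoned = generate(lower_layers, W_N, b_N, y + delta_y)
discrepancy = kl(q_clean, q_poisoned)
```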

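The perturbation $\delta y$ which is the strongest poison, i.e., produces the largest effect for a given small admixture, is proportional to the eigenvector of the Fisher Information Matrix (FIM) RajuMachtaSethna2017 for $\tilde q_0(x|y)$ with the largest eigenvalue Kenway2018. This FIM is given by

$$
\begin{aligned}
\tilde F^{(0)}_{ij} &= \left.\frac{\partial^2}{\partial y'_i\, \partial y'_j} D_{KL}\big(\tilde q_0(x|y) \,\big\|\, \tilde q_0(x|y')\big)\right|_{y'=y} \\
&= \sum_{\alpha\alpha'} \frac{\partial \tilde g^{(0)}_\alpha}{\partial y_i} \frac{\partial \tilde g^{(0)}_{\alpha'}}{\partial y_j} \left[\langle O_\alpha O_{\alpha'}\rangle^{(0)} - \langle O_\alpha\rangle^{(0)} \langle O_{\alpha'}\rangle^{(0)}\right],
\end{aligned} \tag{20}
$$

where $\langle O_\gamma\rangle^{(k)}$ is the expectation value of the operator $O_\gamma$ with respect to the effective Hamiltonian in layer $k$, eq. (11), i.e.,

$$
\langle O_\gamma\rangle^{(k)} = \mathrm{Tr}_h\, O_\gamma(h)\, \tilde q_k(h|y) = \mathrm{Tr}_h\, O_\gamma(h)\, \frac{e^{-\tilde H^{(k)}_y(h)}}{\tilde Z^{(k)}_y}. \tag{21}
$$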

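Having chosen a subspace of operators, $\{O_\alpha\}$, in which to express the effective Hamiltonians in each layer of the network (via eq. (12)), such that sufficiently precise estimates of the stability matrix associated with each layer, $\tilde T^{(k)}$, in eq. (13) are obtained by the method in Kenway2018, we can compute the FIM using the chain rule:

$$
\frac{\partial \tilde g^{(0)}_\alpha}{\partial y_i} = \sum_\beta \left[\tilde T^{(1)} \cdots \tilde T^{(N-1)}\right]_{\alpha\beta} \frac{\partial \tilde g^{(N-1)}_\beta}{\partial y_i}. \tag{22}
$$

Using eqs (9), (11) and (12), the last derivative may be computed from

$$
\frac{e^{-\tilde H^{(N-1)}_y(h_{N-1})}}{\tilde Z^{(N-1)}_y} = t_N(h_{N-1}|y), \tag{23}
$$

and the explicit form of the RBM for the $N$th layer, e.g., using eq. (1) to express $\tilde H^{(N-1)}_y$ in terms of the operators $O_\alpha$ to determine $\tilde g^{(N-1)}_\beta$.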
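As a worked illustration of this last step, suppose the $N$th layer has exactly the RBM form of eq. (1) and the operator basis is restricted to the single-node operators $O_i(h) = h_i$. This choice of basis is an assumption made here for simplicity and need not coincide with the basis used in Kenway2018. Conditioning eq. (1) on $h_N = y$, the term $a_N^T y$ cancels between numerator and denominator, and the effective Hamiltonian is fixed up to an additive constant that is absorbed into $\tilde Z^{(N-1)}_y$:

```latex
\begin{align*}
t_N(h_{N-1}|y) &= \frac{e^{\,y^T W_N h_{N-1} + b_N^T h_{N-1}}}
                       {\mathrm{Tr}_{h_{N-1}}\, e^{\,y^T W_N h_{N-1} + b_N^T h_{N-1}}}, \\
\tilde H^{(N-1)}_y(h) &= -\sum_i \left(W_N^T y + b_N\right)_i h_i, \\
\tilde g^{(N-1)}_i &= -\left(W_N^T y + b_N\right)_i,
\qquad
\frac{\partial \tilde g^{(N-1)}_i}{\partial y_j} = -\left(W_N\right)_{ji}.
\end{align*}
```

Under these assumptions the last derivative in eq. (22) is simply minus the transpose of the top-layer weight matrix, independent of $y$.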

Thus, the same method used to attack a deep network with poisoned data Kenway2018 may be used by the network to protect itself from cloning, provided some of the data-generation fixed points have unstable directions.
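The computation described by eqs (20)-(22) can be sketched as a few matrix products followed by an eigendecomposition. The sketch assumes that the stability matrices $\tilde T^{(1)}, \ldots, \tilde T^{(N-1)}$ have already been estimated (here they are random stand-ins), that the operators are the single-node operators $O_i(h) = h_i$, and that the top layer has the RBM form of eq. (1), for which $\partial \tilde g^{(N-1)}_i / \partial y_j = -(W_N)_{ji}$. All names and numerical inputs are hypothetical.

```python
import itertools
import numpy as np

def operator_covariance(probs, ops):
    """Connected correlations <O_a O_b> - <O_a><O_b> under distribution probs.

    probs[s] is the probability of configuration s; ops[s, a] = O_a(config s).
    """
    mean = probs @ ops
    second = ops.T @ (probs[:, None] * ops)
    return second - np.outer(mean, mean)

def fisher_information(T_list, J_last, cov):
    """Eq. (22) followed by eq. (20).

    T_list = [T~(1), ..., T~(N-1)], J_last[b, i] = d g~(N-1)_b / d y_i.
    """
    J = J_last
    for T in reversed(T_list):      # T~(N-1) acts first, T~(1) last
        J = T @ J
    return J.T @ cov @ J

def strongest_poison(F):
    """Largest FIM eigenvalue and its unit eigenvector (direction of delta y)."""
    eigvals, eigvecs = np.linalg.eigh(F)   # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]

# Hypothetical toy inputs, not taken from the paper.
rng = np.random.default_rng(1)
states = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
probs = rng.dirichlet(np.ones(len(states)))   # stand-in for q~_0(x|y)
T_list = [np.eye(3) + 0.3 * rng.normal(size=(3, 3)) for _ in range(4)]
W_N = rng.normal(size=(2, 3))                 # top-layer weights, shape (y, h)
J_last = -W_N.T                               # single-node operator assumption
cov = operator_covariance(probs, states)      # O_i(h) = h_i, so ops = states
F = fisher_information(T_list, J_last, cov)
lam, direction = strongest_poison(F)
```

The poison added to each output would then be a small multiple of `direction`, with the multiple chosen small enough that classification is unaffected.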

## V Conclusions

In this paper we have shown how the increasing vulnerability of a deep-learning network to adversarial attack with increasing depth may also provide a means of defence. This applies to a class of deep networks each of which may be represented by layers of RBMs, trained individually so that the network solves a classification problem. The training utilises the same RBMs to construct a data-generation network, with the classifiers as inputs, and it is the potential sensitivity of this network to small changes in the classifiers that turns data poisoning into a means of defence.

Both the vulnerability to attack and the effectiveness of the defence grow with the depth of the network, and are reliant on the corresponding RG fixed points (for classification, or data generation) having a relevant, i.e., unstable, direction in the space of operators used to define effective Hamiltonians for each layer.

If the classification fixed points have no relevant directions, then the network is not susceptible to adversarial attack and no defence is required. This may be ascertained using the method in Kenway2018. If the classification fixed point has a relevant direction, there is no guarantee that the associated data-generation fixed point also has a relevant direction, although this may be determined by the same method, which then also determines the most effective way to poison the classification outputs without affecting their validity as classifiers.

Use of the poisoned outputs to train a clone of the classification network would cause the cloned network to classify some data incorrectly. While this would be readily apparent to a would-be attacker, it could not be rectified. Unless it can be shown that instability of the classification fixed point implies instability of the associated data-generation fixed point, this protection against cloning is not always available. However, the method in Kenway2018 provides a means of checking both the inherent vulnerability of a given network and whether poisoning of its outputs is an effective defence, so that the safety of the network can be established.