Connecting Lyapunov Control Theory to Adversarial Attacks

07/17/2019 ∙ by Arash Rahnama, et al. ∙ Booz Allen Hamilton Inc.

Significant work is being done to develop the math and tools necessary to build provable defenses, or at least bounds, against adversarial attacks on neural networks. In this work, we argue that tools from control theory could be leveraged to aid in defending against such attacks. We do this by example, building a provable defense against a weaker adversary. This is done so we can focus on the mechanisms of control theory and illuminate its intrinsic value.


1. Introduction

Adversarial Machine Learning has been a research area for over a decade (Lowd and Meek, 2005), but has only recently gained increased attention due to the successful application of adversarial attacks to deep learning networks (Goodfellow et al., 2015). If we define the DNN as a model f, which produces an output f(x) given some input x, then we are interested in the types of adversarial attacks which can perturb x by some change δ such that the model produces an incorrect decision (i.e., f(x + δ) ≠ f(x)). Typically, it is assumed that the attack parameter δ is bounded by some norm, such that ‖δ‖ ≤ ε. Adversarial attacks have been successful in degrading the performance of DNNs across many domains and algorithms, even with very small values of ε (Biggio and Roli, 2018). For instance, it has been shown that by altering only one pixel in the input image, a standard convolutional neural network can be fooled into making the wrong decision (Su et al., 2017). This points to the innate vulnerability of DNNs and its negative impact on public trust in the reliability and safety of machine learning systems. As the use of DNNs in safety-critical environments such as autonomous vehicles (Eykholt et al., 2018) and medicine (Fredrikson et al., 2014) increases, so does the need for provable defensive solutions against adversarial attacks. This also motivates our interest in designing robust deep learning models.
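To make the threat model concrete, the following sketch checks whether a norm-bounded perturbation changes a model's decision. The toy linear classifier, the ℓ∞ norm, and the specific ε are illustrative assumptions, not choices made in this paper.

```python
# Minimal sketch of the threat model: does f(x + delta) != f(x) for some
# ||delta||_inf <= eps?  The linear "model", the l-infinity norm, and eps
# are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 10))           # toy 3-class linear model
f = lambda x: int(np.argmax(W @ x))        # decision function

x = rng.standard_normal(10)
eps = 0.25
delta = eps * np.sign(rng.standard_normal(10))   # a corner of the eps-ball
assert np.linalg.norm(delta, np.inf) <= eps + 1e-12

print("clean prediction:    ", f(x))
print("perturbed prediction:", f(x + delta))     # the attack succeeds if these differ
```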

With increased interest from the community, many heuristic defensive designs have been proposed (Yuan et al., 2017; Kannan et al., 2018; Li and Li, 2017; Guo et al., 2018; Xie et al., 2018; Samangouei et al., 2018). Yet when carefully evaluated, it has routinely been found that these defensive approaches are not effective against their intended adversary (Athalye et al., 2018; Athalye and Carlini, 2018). Another avenue of research has focused on building provably robust defensive designs against adversarial attacks. These have so far focused on the careful application of more sophisticated optimization techniques to prove that everything within an ε-ball of the training data will produce the same output (Abbasi and Vision, 2018; Gowal et al., 2018; Wong et al., 2018; Dvijotham et al., 2018; Wong and Kolter, 2018). While the motivation behind these optimization-based approaches is intuitive (the model must be robust if its response is consistent), none of these methods can yet scale to large datasets. Given the shocking ease with which most defensive designs have been defeated (Carlini and Wagner, 2017), many have begun the work of building a new theory which can help explain and resolve these issues. This work has been done "from the ground up," and attempts to build new mathematical tools and results to understand the problem. Early works provided grounding for the intuitive connection between a model's accuracy, the dimensionality of the feature space, and the model's susceptibility to attack (Gilmer et al., 2018). Wang et al. (2017) developed foundations to compare a trained model to an oracle, and provided a connection between extraneous features and susceptibility to attack. Demontis et al. ([n.d.]) showed connections between the norm of the input gradients and the ability to transfer attacks against one model to a second unknown model.

In this work, we make an important connection between the field of control theory and the design of robust DNNs. In section 2, we show how to use the Lyapunov theory of stability to model neural networks as dynamical nonlinear systems, and bound the perturbation's effect against a simple adversary for all possible inputs. We briefly discuss related work in section 3. An abridged review of the needed control theory, and empirical validation of our theory, are available in the appendix.

2. Main Results

Using results from the field of control theory (reviewed in Appendix A), we show how to develop a regularization technique that provides provable bounds for a DNN with n layers. The primary result is given in (1), where Δu is the change to the input of the DNN, constrained to be a constant perturbation bounded in norm by ε, and Δy_i is the resulting perturbation to the activations of each subsequent layer of the network. The constants appearing in (1) are design parameters that can be chosen almost arbitrarily, so long as the denominator remains positive; a further parameter must satisfy a positivity condition given in the proof. This shows that the deviation of the network's final activation (Δy_n) is bounded by a ratio of the input perturbation size (Δu) for all possible inputs.

(1)

Our proof strategy begins by treating each layer of the DNN as a nonlinear dynamical system. For each layer, we show the conditions under which the layer obtains the Incrementally Input Feed-Forward Passive (IIFP) property (see (Zames, 1966), or Definition 3). The control theory view allows us to consider each layer independently in our analysis, where the input to one layer is the output of the previous. Using a sequence of IIFP layers, we then show that their sequential combination, under certain conditions, maintains the Incrementally Output Feedback Passive (IOFP) property (see (Zames, 1966), or Definition 2). Having the IOFP property allows us to derive the global bound given in Equation 1, producing a DNN which is provably robust against an adversary that can alter the input by any constant perturbation. This result can be used both to understand robustness for classification problems and for the less studied regression case (Nguyen and Raff, 2019).

2.1. Proving Robustness for the Cascade

Our aim is to find a relationship between the distortions introduced by adversarial examples and robustness in DNNs. Here, we characterize a measure of robustness which can be used to certify a minimum performance index against adversarial attacks on a neural network. Given a DNN, we are interested in characterizing the (local) robustness of an arbitrary natural example x by ensuring that all of its neighborhood has the same inference outcome. The neighborhood of x is characterized by an ε-ball centered at x. Geometrically speaking, the minimum distance from a misclassified nearby example to x is the least adversary strength required to alter the target model's prediction, which is also the largest possible robustness certificate for x. We aim to utilize the IIFP and IOFP properties of the activation functions and their relationship with the Lyapunov stability properties of nonlinear systems to find such a robustness measure (Zames, 1966). The injected perturbations by the adversary are defined as follows.

Definition 1.

Consider the input u to a layer, of size n. The perturbed input signal is u + δ, where δ is the attack vector, with all positive or all negative entries of the same size. The perturbed input vector u + δ is within an ε-bounded norm-ball centered at u, i.e., ‖δ‖ ≤ ε.

Here, we consider constant variations, i.e., attack vectors δ whose entries all equal the same constant, which are injected by the adversary into the initial input or into the signals traveling from one hidden layer to another. A system is defined as a layer inside the DNN which accepts an input of size n (the output of the previous layer) and produces an output of size m (what is produced after the activation transformation). We suggest that the DNN's parameters should be trained so that the output variations are small for small variations in the input. We treat each layer of the DNN as a nonlinear dynamical system. We show the conditions under which a layer is IIFP from its input to its output. Then, we prove that the interconnection of IIFP layers, under certain conditions, is IOFP with a negative passivity index ρ, and as a result we find a bounded, stable, and robust relationship between the input and output of the entire DNN; i.e., we show that bounded changes applied to the input produce bounded changes in the output, which are upper-bounded by the changes in the input.
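As a concrete illustration of Definition 1 and of treating a single layer as a system, the sketch below injects a constant attack vector δ = c·1 with |c| ≤ ε into a layer's input and measures the incremental output. The layer sizes, the budget ε, and the Leaky ReLU slope α (which anticipates Section 2.2) are assumed values.

```python
# A layer viewed as a "system": input u of size n (the output of the previous
# layer), output of size m (after the activation).  The adversary of
# Definition 1 adds a constant vector delta = c * 1 with |c| <= eps.
import numpy as np

rng = np.random.default_rng(1)
n, m, eps, alpha = 8, 6, 0.05, 0.1         # sizes, budget, Leaky ReLU slope (assumed)

def layer(u, W, b):
    z = W @ u + b
    return np.where(z > 0, z, alpha * z)   # Leaky ReLU activation

W, b = rng.standard_normal((m, n)), rng.standard_normal(m)
u = rng.standard_normal(n)
delta_u = np.full(n, eps)                  # constant attack vector, |c| = eps

delta_y = layer(u + delta_u, W, b) - layer(u, W, b)   # incremental output of the system
print(np.linalg.norm(delta_u), np.linalg.norm(delta_y))
```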

Theorem 2.

Consider the cascade interconnection of n nonlinear systems. If each sub-system i, for i = 1, …, n, is instantaneously Incrementally Input Feed-Forward Passive (IIFP) with a storage function and a positive incremental input passivity index ν_i satisfying the required condition, then there exists a positive constant for which the entire cascade interconnection of nonlinear systems admits a storage function of the form

(2)

satisfying

(3)

for some positive constants, where Δu_i and Δy_i are the incremental inputs and outputs of system i.

Proof: The proof is to show that the cascade interconnection of the systems is IOFP with a negative passivity index ρ. We need to show that the storage function given in (2) satisfies (3), where Δu_i is the incremental difference between any two input signals to layer (system) i, and Δy_i is the incremental difference between their respective outputs for layer (system) i. Δu_1 is the incremental input to the first layer (system), Δy_n is the final incremental output of the cascade interconnection, and Δy is a vector of size n with incremental output entries Δy_i. The relationship given in (3) holds if,

(4)

By collecting the incremental input and output terms into matrix form, the left-hand side of (4) can be expressed compactly. According to Theorem 6, if the secant criterion holds, then there exists a diagonal matrix D > 0 satisfying the relation given in Theorem 5. Hence, it can be shown that the left-hand side of (4) is negative, and thus the desired inequality follows.

We have formulated the storage function, given in (3), for the cascade of systems. As mentioned before, each system represents a layer inside the DNN, and the entire cascade interconnection represents the entire DNN. Now, we can characterize a relationship between the incremental inputs fed into the first layer of the DNN and the respective incremental outputs at the final layer of the DNN. This means that we can effectively characterize the changes caused by adversarial attacks by quantifying their effects on the output of the DNN. We can prove an upper bound for the changes occurring at the output layer of the DNN given the respective input differences fed into the network; these input differences are, of course, caused by adversarial attacks. Consequently, we characterize a measure of robustness for the entire DNN. As shown in the next corollary, if the loss function is designed such that each hidden layer of the DNN is encouraged to behave as an IIFP nonlinear system, then the changes in the output caused by the attack vector δ injected by the adversary into the input of a layer are bounded and limited by the changes in the input signal itself (the norm of the attack parameter). Correspondingly, the adversary's ability to change the output behavior is limited to the use of larger attack parameters, which in turn are easier to detect.

Corollary 3.

Consider a cascade of nonlinear systems organized as the feed-forward layers of a neural network with n layers. If each sub-system is instantaneously Incrementally Input Feed-Forward Passive (IIFP) with a positive incremental input passivity index ν_i, then the entire cascade of systems is instantaneously Incrementally Output Feedback Passive (IOFP) with a passivity index ρ, which meets a condition given in the proof, and a storage function of the form (2). One can show that the variations in the final output of the entire network (Δy_n) are upper-bounded (limited) by the variations in the input signal (Δu_1); a tighter bound additionally covers the output variations at all of the layers.

Proof: Given Theorem 2 and the definitions of the incremental inputs and outputs above, we obtain the storage-function inequality in (3), where the ν_i are design parameters. Moving the appropriate terms to the left-hand side of the resulting inequalities yields the stated bound, and, further, the tighter bound over all layers, provided the required positivity condition on the design parameters holds.

2.2. Proving Bounds Against Perturbations

A DNN can be represented as a cascade of systems. One can model the DNN layer-wise as y_i = φ(W_i y_{i−1} + b_i), for i = 1, …, n, where y_{i−1} is the input feature of the i-th layer, φ is a (non-linear) activation function, and W_i and b_i are respectively the layer-wise weight matrix and bias vector applied to the flow of information from layer i−1 to layer i; the dimensions of W_i are given by the numbers of neurons in layers i−1 and i. For a set of parameters θ, we denote the function representing the entire DNN as f_θ. Given the training data, consisting of input–target pairs, the loss function is defined as the sum of a per-example loss over the training data, usually selected to be the cross-entropy or the squared ℓ2-distance for classification and regression tasks, respectively. The model parameter to be learned is θ. We consider how we can obtain a model that is insensitive to perturbations of the input. The goal is to obtain a model f_θ such that the norm of the incremental change f_θ(x + δ) − f_θ(x) is small, where x is an arbitrary input vector and δ is a perturbation vector with a small norm. Most DNNs exhibit nonlinearity only due to the activation functions, such as ReLU, maxout, and max-pooling. In such cases, the function f_θ is a piece-wise linear function. Hence, if we consider a small neighborhood of x, we can regard f_θ as a linear function; in other words, we can represent it by an affine map using a matrix and a vector which depend on θ and x. It is important to note that, because of Theorem 6, the number of layers in the DNN under consideration should be larger than two. This does not limit our results, as any DNN with a smaller number of layers would only consist of an input and an output layer.

We suggest that the model parameter θ should be trained so that the output variations are small for small variations in the input x. To further investigate the property of f_θ, we assume that each activation function is a modified version of the element-wise ReLU called the Leaky ReLU: φ(z) = z for z ≥ 0 and φ(z) = αz for z < 0, where 0 < α < 1. It follows that, to bound the variations in the output of the neural network by the variations in the input, it suffices to bound these variations for each layer. Here, we consider that the variations are injected by the adversary into the initial input or into the signals traveling from one hidden layer to another. This motivates us to consider a new form of regularization scheme, which is described in the following. As mentioned before, a system is defined as a layer inside the DNN which accepts an input (the output of the previous layer) and produces an output (what is produced after the Leaky ReLU transformation).
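The cascade view maps directly onto code. The sketch below builds a DNN as a cascade of affine-plus-Leaky-ReLU systems of the kind described above; the depth, layer sizes, slope α, and the single regression output are assumptions chosen to mirror the experiments in Appendix B rather than anything prescribed by the theory.

```python
# The DNN as a cascade of systems: y_i = LeakyReLU(W_i y_{i-1} + b_i).
# Depth, layer sizes, and the slope alpha below are illustrative assumptions.
import torch
import torch.nn as nn

class CascadeDNN(nn.Module):
    def __init__(self, in_dim=10, hidden=10, depth=6, alpha=0.1):
        super().__init__()
        dims = [in_dim] + [hidden] * depth
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(depth))
        self.act = nn.LeakyReLU(negative_slope=alpha)
        self.out = nn.Linear(hidden, 1)    # single regression output

    def forward(self, x):
        for lin in self.layers:            # each (affine, Leaky ReLU) pair is one "system"
            x = self.act(lin(x))
        return self.out(x)

model = CascadeDNN()
print(model(torch.randn(4, 10)).shape)     # torch.Size([4, 1])
```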

The transformation between two consecutive layers i−1 and i can be divided into two sub-transformations, which respectively represent the sets of row operations on the input signal that produce positive or negative outputs on the other side of the Leaky ReLU activation function. The positive transformation includes the rows whose pre-activations are positive, and the negative transformation includes the remaining rows. Below, if the consecutive layers i−1 and i are of different sizes, the appropriate matrices and vectors are padded with zeros: the weight matrix is padded with rows or columns of zeros as needed, and the corresponding identity matrix and bias vector are padded to match. These changes do not affect our results and are done only for mathematical tractability. We first need to show the conditions under which the non-linear transformations inside the DNN are IIFP, each with a positive input passivity index ν_i. The input passivity index ν_i for layer i is a positive design parameter representing the extent to which we want to encourage this behavior in that layer. These parameters will re-appear in the loss function for the entire system to encourage this behavior at the network level.

Given Definition 3, and the fact that what happens at the output level constitutes a linear transformation, we have,

(5)

for some positive ν_i. The diagonal matrix appearing in (5) is defined as follows: the first block of diagonal entries is equal to 1, the next block is equal to the Leaky ReLU slope α, and the rest of the diagonal entries are zeros. The relationship given in (5) can be further simplified.

The simplified relation holds, and as a result the above transformations are IIFP, if the summation of the weights (the entries of the layer's weight matrix) is greater than a layer-dependent constant, made explicit in (6) below.

As a result, a regularization scheme that encourages the layers to behave as IIFP nonlinear systems should encourage the following relation for each layer,

(6)

where the relevant quantities are the number of neurons in the previous layer, the number of neurons in the next layer, and the number of hidden layers. This regularization rule on the weights acts at the layer level, independently of the other layers, unlike the Ridge or LASSO regularization rules. A simple regularization term added to the loss function that encourages the behavior given in (6) will maintain the IIFP property for each layer and the IOFP property with a negative ρ for the entire DNN, as defined in Theorem 2. Finally, the variations in the final output of the entire network (Δy_n) are upper-bounded (limited) by the variations in the input signal (Δu_1); a tighter bound, covering the output variations at all of the layers, is given in (7).

We point out that the view of a DNN as a non-linear system does not depend on any special properties of the network, or even on recognizing that the network has a final layer. As such, the bounds apply to all layers simultaneously, bounding an attack initiated at any individual layer as well as the response of any hidden layer. This type of result has not been shown previously, and comes for free with control theory.

(7)
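A sketch of how one might verify the layer-wise condition in (6) on a trained network is given below. The exact layer constant of (6) is not reproduced in this text, so `layer_constant` is a hypothetical placeholder for that expression; only the shape of the check (the sum of a layer's weights exceeding a constant determined by ν_i, α, and the layer size) is taken from the discussion above.

```python
# Hypothetical check of condition (6): for each layer, the sum of the weight
# entries should exceed a constant determined by the desired passivity index
# nu, the Leaky ReLU slope alpha, and the layer size.  layer_constant() is a
# placeholder, NOT the paper's exact formula.
import torch.nn as nn

def layer_constant(nu, alpha, fan_in):     # placeholder expression
    return nu * alpha * fan_in

def check_iifp_condition(model, nu=1.0, alpha=0.1):
    ok = True
    for lin in (m for m in model.modules() if isinstance(m, nn.Linear)):
        weight_sum = lin.weight.sum().item()
        ok = ok and (weight_sum > layer_constant(nu, alpha, lin.in_features))
    return ok
```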

3. Related Work

While we are not aware of any prior work that has shown the direct applicability of control theory to adversarial attacks, we make note of two types of connections to prior work.

First, Zantedeschi et al. (2017) developed a bounded (or "clamped") version of the ReLU activation function as a method of bounding the perturbation of the network independent of the learned network's weights. In doing so, they show a bound whose constant is the product of the Lipschitz constants of each layer. This can be seen as a special case of our results in which the parameters are selected in a particular way.
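For reference, a clamped ReLU of the kind used by Zantedeschi et al. (2017) can be written as below; the clamp value is arbitrary here. Capping the activation bounds each layer's output regardless of the learned weights.

```python
import torch

def clamped_relu(x, clip=1.0):
    # ReLU whose output is additionally capped at `clip` (value chosen arbitrarily here)
    return torch.clamp(x, min=0.0, max=clip)
```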

Second, we note as an example the valuable work of Zhang et al. (2018), who developed bounds on a network's response for a variety of activation functions. In the parlance of control theory, their work shows bounded conic behavior similar to what we have established in this work. In contrast to their work, our approach leverages control theory to define the behavior of the network as a whole, allowing us to derive results in a more direct fashion. Our hope is that by further leveraging and framing these problems in a control-theoretic context, we can simplify the issue of dealing with adversarial attacks.

4. Conclusion

We have shown, by example, how findings from the field of control theory, and more specifically the Lyapunov theory of stability and robustness, are directly applicable to the newfound interest in adversarial attacks. Through this lens, we can more easily define the behavior of networks as a whole, resulting in bounded behavior for all possible inputs. While we do not by any means solve the issue of adversarial attacks in this work, we hope to have effectively illustrated the deep connection between these two fields.

Appendix A Mathematical Reference

Our work is based on the Lyapunov theory of stability and robustness for nonlinear dynamical systems, which emerges from the field of control theory. We recognize many in the machine learning community are not as familiar with this domain of research. Here, we give a brief overview of the mathematical principles behind our results and the proposed approach for designing robust DNNs. For a more complete background on stability and robustness of nonlinear systems in the field of control theory, we refer the reader to (Khalil, 1996).

In our work, we consider the nonlinear system H given in Fig. 1,

ẋ = f(x, u),  y = h(x, u),

where x ∈ X, u ∈ U, and y ∈ Y are respectively the state, input, and output of the system, and X, U, and Y are the state, input, and output spaces.

Remark 1: Any layer inside a DNN may be seen as a nonlinear system as described above. For layer i, the input u has the size of layer i−1 and stands for the input to the layer before the weights and biases are applied. The output y has the size of layer i and may be seen as the output of the layer after the activation functions. In this vein, f and h may be thought of as functions which model the state changes (ẋ) occurring during the training of the DNN and their relationship to the input and output signals.

Definition 1.

((Zames, 1966)) System H is instantaneously incrementally finite-gain L2-stable if, for any two inputs u1 and u2, there exists a positive gain γ such that the relation

‖y1 − y2‖ ≤ γ ‖u1 − u2‖ + β

holds. Here, ‖·‖ represents the Frobenius norm of the signals, and β may be any positive number.

Remark 2: Note that the property defined in Definition 1 is less restrictive than assuming Lipschitz continuity for a DNN. The Lipschitz property corresponds to replacing the right-hand side of the above inequality with a function of the input difference alone, i.e., γ ‖u1 − u2‖, which is linear in ‖u1 − u2‖. Further, the above assumption does not place any constraints on the initial conditions of the system (the DNN). This potentially allows for producing model distributions which have disconnected support (Fawzi et al., 2018).

Definition 2.

((Zames, 1966)) System H is considered to be instantaneously Incrementally Output Feedback Passive (IOFP) if it is dissipative with respect to the well-defined supply rate

ω(Δu, Δy) = Δu^T Δy − ρ Δy^T Δy

for some ρ, the incremental output passivity index (which may be negative).

Definition 3.

((Zames, 1966)) System H is considered to be instantaneously Incrementally Input Feed-Forward Passive (IIFP) if it is dissipative with respect to the well-defined supply rate

ω(Δu, Δy) = Δu^T Δy − ν Δu^T Δu

for some positive ν, the incremental input passivity index.

Remark 3: A well-defined supply rate function is one that is finite over time and meets certain conditions. System H is dissipative with respect to the well-defined supply rate ω if there exists a nonnegative storage function V such that the change in V over any time interval is bounded by the integral of ω over that interval. Hence, in order to show that a system is IIFP or IOFP, we need to show that the system's supply rate is greater than or equal to zero. For more details on this subject, we refer the readers to (Willems, 1972). Lastly, the IIFP and IOFP properties of a system have a direct relationship with the system's robustness and stability properties. By proving these properties for each layer of the DNN, we are effectively encouraging the same robust behavior for the entire DNN.

Theorem 4.

((Khalil, 1996)) If the dynamical system H is Incrementally Output Feedback Passive (IOFP) with a nonzero passivity index ρ, then it is incrementally finite-gain L2-stable with the gain 1/|ρ|.

Figure 1. A nonlinear system H.
Theorem 5.

((Arcak and Sontag, 2006)) A matrix A is said to be Lyapunov diagonally stable if there exists a diagonal matrix D > 0 such that DA + A^T D < 0.

Theorem 6.

((Arcak and Sontag, 2006)) A matrix of the cyclic form studied by Arcak and Sontag, with entries −a_1, …, −a_n on the diagonal and interconnection terms b_1, …, b_n, where a_i, b_i > 0, is Lyapunov diagonally stable, i.e., it satisfies the relation given in Theorem 5 for some diagonal matrix D > 0, if and only if the secant criterion

(b_1 ⋯ b_n) / (a_1 ⋯ a_n) < sec(π/n)^n

holds.

Remark 4: It is important to note that the properties given in the above theorems will be utilized in our proofs to show stability and robustness for a cascade of layers in a DNN.

Appendix B Empirical Experiments

The primary purpose of our paper is to show, by example, the connection of control theory to adversarial attacks. Having bounded the deviation of a network's activations given a constant perturbation to the input, we now show that this bound holds empirically and can be adapted into a regularizer with little work. Because our adversary is constrained to a single constant perturbation, this is of little practical importance on its own. It is, however, indicative of how more involved applications of control theory may be adapted into usable defenses, and it empirically confirms that our proof holds.

The regularization term defined in (8) follows directly from the proof. It is composed of layer-dependent penalties, each with two components. The first is a "constant" term that is determined by the values of the hyper-parameters (the desired passivity index ν, the Leaky ReLU slope α, and the hidden layer size), which are defined before training starts. Subtracted from this is a "weight" term, which is simply the sum of all weight coefficients for the hidden layer. The penalty simply encourages the sum of the weights in layer i to be larger than the layer-wise constant.

(8)
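A sketch of how the penalty described above could be implemented is given below. As in the earlier sketch, `layer_constant` is a hypothetical stand-in for the layer-wise constant in (8), which depends on ν, the Leaky ReLU slope α, and the hidden layer size; the hinge form simply penalizes layers whose weight sum falls below that constant.

```python
# Sketch of the regularization term in (8): for each layer, penalize the
# amount by which the sum of its weights falls short of a layer-wise constant.
# layer_constant() is a placeholder for the paper's exact constant.
import torch
import torch.nn as nn

def layer_constant(nu, alpha, fan_in):     # placeholder expression
    return nu * alpha * fan_in

def iifp_penalty(model, nu=1.0, alpha=0.1):
    penalty = 0.0
    for lin in (m for m in model.modules() if isinstance(m, nn.Linear)):
        shortfall = layer_constant(nu, alpha, lin.in_features) - lin.weight.sum()
        penalty = penalty + torch.relu(shortfall)   # hinge: only penalize violations
    return penalty

# In a training step this penalty is added to the MSE loss, re-scaled so that
# its magnitude matches the MSE term as described in Appendix B.2.
```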

b.1. Dataset Details

Experiments were run on the following regression datasets. Prior to use, each dataset was split into training, validation, and test sets. The independent variables for each dataset were normalized using training-set statistics, and principal component analysis was used to reduce the dimensionality of each dataset to 10. The target variable for each dataset was also scaled to a fixed range. All data sets were obtained from the UCI Machine Learning Repository.
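The preprocessing described above can be reproduced with standard scikit-learn components, as sketched below. The split ratios and the [0, 1] target range are assumptions, since the text does not specify them.

```python
# Sketch of the preprocessing pipeline: train/val/test split, standardize the
# features with training-set statistics, reduce to 10 principal components,
# and scale the target to a fixed range.  Split ratios and target range assumed.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

def preprocess(X, y, seed=0):
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.3, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=seed)

    scaler = StandardScaler().fit(X_tr)                 # training-set statistics only
    pca = PCA(n_components=10).fit(scaler.transform(X_tr))
    feats = lambda X: pca.transform(scaler.transform(X))

    y_scaler = MinMaxScaler().fit(y_tr.reshape(-1, 1))  # assumed [0, 1] target range
    target = lambda y: y_scaler.transform(y.reshape(-1, 1)).ravel()

    return ((feats(X_tr), target(y_tr)),
            (feats(X_val), target(y_val)),
            (feats(X_te), target(y_te)))
```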

Boston Housing: the first dataset we evaluate on is the Boston house price data of Harrison and Rubinfeld (1978); the target variable is the median value, in thousands of dollars, of owner-occupied homes in the area of Boston, Massachusetts.

Communities and Crime: the second data set we evaluate on is the Communities and Crime Unnormalized data set (Redmond and Baveja, 2002). The number of murders in 1995 is the target variable, and variables include potential factors such as percent of housing occupied, per capita income, and police operating budget. Independent variables from the original data set that contained missing values were dropped.

Relative Location of CT Slices on Axial Axis: the third data set we evaluate on is the Relative Location of CT Slices on Axial Axis data set (Graf et al., 2011). The data consists of a set of 53500 CT images from 74 different patients where each CT slice is described by two histograms in polar space. The histograms describe the location of bone structures in the image and the location of air inclusions inside of the body. The independent variables consist of the information contained in the two histograms, and the target variable is the relative location of an image on the axial axis.

Malware: The fourth data set we evaluate on is the Dynamic Features of VirusShare Executables data set from Huynh et al. (2017) which contains the dynamic features of executables collected by VirusShare between November 2010 and July 2014. The target variable is a risk score between 0 and 1. This data set is an intrinsically interesting use case as malware authors are an active real-life adversary.

Condition Based Maintenance: The fifth data set we evaluate our approach on is the Condition Based Maintenance of Naval Propulsion Plants data set, which consists of results from a numerical simulator of a naval vessel characterized by a gas turbine propulsion plant (Coraddu et al., 2014). This data set has two target variables, the gas turbine's compressor decay state coefficient and the gas turbine's turbine decay state coefficient. As such, we treat it as two different regression data sets that share the same feature set.

b.2. Network Architecture, Training, and Attack Settings

The network architectures in all experiments consist of an input layer; 2, 6, or 12 hidden layers of size equal to the input layer size; and a single-node output layer. Leaky rectified linear units were used as the hidden layer activation functions, with a fixed negative slope α, and Adam was used as the optimization algorithm (Kingma and Ba, 2015). The regularization term from (8) was re-scaled to receive a weight (i.e., magnitude) equal to the mean squared error in the loss function; this was done to avoid numerical issues in training. In all cases, the hyper-parameter ν_i is set to the same desired value for every layer. While training with gradient descent in this fashion does not guarantee that the conditions will be met for every layer, in practice the results are close, with more details shown in appendix subsection B.4. The adversarial attack follows Definition 1. Because the attack vector is constrained to have all entries equal, our threat model results in an adversary with a single degree of freedom. We are aware this is a weaker threat model, but our focus is to show the connections between Lyapunov theory and the domain of adversarial attacks. It also allows us to use hill climbing to find the optimal perturbation vector within the ε-bounded norm-ball.
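Because the threat model has a single degree of freedom (the scalar c in δ = c·1, with |c| ≤ ε), the optimal perturbation can be found with a simple one-dimensional hill-climbing search, sketched below; the ε value, step schedule, and iteration count are assumptions.

```python
# The adversary of Definition 1 has one degree of freedom: the scalar c in
# delta = c * 1 with |c| <= eps.  A simple hill climb over c finds the constant
# perturbation that maximizes the change in the network's output.
import torch

def hill_climb_constant_attack(model, x, eps=0.1, steps=100):
    clean = model(x).detach()
    c, best, step = 0.0, 0.0, eps / 2
    with torch.no_grad():
        for _ in range(steps):
            for cand in (c + step, c - step):
                cand = max(-eps, min(eps, cand))       # stay inside the eps-ball
                dy = (model(x + cand) - clean).norm().item()
                if dy > best:
                    best, c = dy, cand
            step *= 0.9                                # shrink the step size
    return c, best                                     # best constant and ||Delta y||
```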

b.3. Results and Discussion

Bounds for each dataset and network depth combination are computed by finding, for each layer, the largest value of ν_i that satisfies Equation 6 given the learned weight parameters.

Figure 2. Box plot describing the distribution of the output-to-input perturbation ratio for each dataset and network depth combination. Red dots represent the upper bound on the ratio, based on Equation 1.

Figure 2 shows a box plot describing the distribution of the output-to-input perturbation ratio ‖Δy‖/‖Δu‖ for each dataset and network depth combination. Red dots represent the upper bound on the ratio computed from Equation 1. The plot illustrates that no bound violations occurred in any of our experiments: for all test data points, datasets, and network depths, the observed ratio is lower than the upper bound.
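The quantity summarized in Figure 2 can be measured directly, as in the sketch below: for each test point, find the worst constant perturbation, compute the ratio of output change to input change, and compare it against a precomputed upper bound from Equation 1. The bound value is passed in rather than derived, since the closed form of Equation 1 is not reproduced in this text, and the use of the ℓ2 norm here is an assumption.

```python
# Empirically measure ||Delta y|| / ||Delta u|| for each test point under the
# single-constant threat model, and count violations of a precomputed upper
# bound from Equation (1) (passed in as `bound`, not derived here).
import torch

def empirical_ratios(model, X_test, eps, bound, attack):
    ratios = []
    for x in X_test:
        c, dy_norm = attack(model, x.unsqueeze(0), eps=eps)  # e.g. the hill climb above
        du_norm = abs(c) * (x.numel() ** 0.5)                # ||c * 1||_2 (l2 norm assumed)
        ratios.append(dy_norm / max(du_norm, 1e-12))
    violations = sum(r > bound for r in ratios)
    return ratios, violations                                # violations should be 0
```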

For many of the datasets, there is a wide gap between the worst observed perturbations resulting from an attack and the upper bound on the perturbations. The presence of large gaps suggests that there could be a difference between the bound on the universe of all possible data (what our theory provides) and the space occupied by observed data. This suggests room for tighter bounds by looking at the behavior of the network within a limited input space defined by the data. However, in some cases such as on the CBM Turbine dataset with 6 hidden layers, the gap is much smaller, demonstrating that bounding the worst case scenario does have practical value as it is possible in practice to get very close to the worst case.

b.4. Learned ν_i Are Close to the Desired Value

Figure 3. Histogram describing the distribution of the ν_i computed from the trained networks, combined across all datasets and network depths, using Equation 6.

Figure 3 contains a histogram illustrating the distribution of the ν_i computed from the trained networks, combined across all datasets and network depths, using Equation 6. As described in Section B.2, the desired ν_i in the regularization term were set to the same value for all layers in all networks. The figure shows that the resulting ν_i are almost all equal to or slightly greater than the desired value, noting that larger ν_i result in more resilient networks. This demonstrates the practical effectiveness of the regularization term defined in (8) at obtaining networks with a specified desired resilience.

It would be possible to force the chosen value of ν_i to hold exactly in the network by using a projection step after every gradient update. Not only is this more computationally demanding, but we find it makes learning a network with comparable MSE more difficult. Because the learned value of ν_i is almost always close to the desired value, we prefer to train in this relaxed fashion, which can be seen as a way of allowing the network the flexibility to reduce the bound in order to obtain a useful model. This is reasonable in our opinion, since a model with degenerate performance in all cases is intrinsically never useful.

References