Towards Robust Deep Neural Networks

We examine the relationship between the energy landscape of neural networks and their robustness to adversarial attacks. Combining energy landscape techniques developed in computational chemistry with tools drawn from formal methods, we produce empirical evidence that networks corresponding to lower-lying minima in the landscape tend to be more robust. The robustness measure used is the inverse of the sensitivity measure, which we define as the volume of an over-approximation of the reachable set of network outputs under all additive l_∞ bounded perturbations on the input data. We present a novel loss function which contains a weighted sensitivity component in addition to the traditional task-oriented and regularization terms. In our experiments on standard machine learning and computer vision datasets (e.g., Iris and MNIST), we show that the proposed loss function leads to networks which reliably optimize the robustness measure as well as other related metrics of adversarial robustness without significant degradation in the classification error.


1 Introduction

The advent of machine learning techniques, most notably deep learning [4], has made complex tasks such as object detection, natural language processing, machine translation, and stock-market analysis more efficient through their automation. Despite its tremendous success in many academic endeavors and commercial applications, the adoption of deep learning in the perception, decision and control loops of mission/safety-critical systems has been limited. One possible reason for the slow rate of adoption is the limited theoretical understanding of the inner workings of deep neural networks (DNNs). Another possible reason is the lack of guarantees of certain behavioral properties, in particular robustness properties. One such robustness property is the ability of the network to resist adversarial attacks. In an adversarial attack, the input data is perturbed minimally such that, while the resulting adversarial example closely resembles the original sample, the output of the trained network is altered. Recent work in the deep learning literature [12, 27, 35, 21, 15, 33, 10] has shown that DNNs may be susceptible to adversarial examples. For state-of-the-art DNN image classifiers, researchers have devised methods to reliably engineer small perturbations that result in successful adversarial examples. In one such example [12], the generated adversarial example looks, to the naked human eye, indistinguishable from the original sample, but produces a drastically different classifier output. Some input perturbations are synthetic [12], while others target physical, real-world objects [21, 10].

In traditional verification and validation (V&V) processes for safety-critical systems, the robustness of the system is assured via an extensive testing procedure which samples the expected variations in the values of the parameters of the operating environment and system inputs until certain criteria of appropriate coverage metrics are reached. This test-based approach is likely to be less effective for systems with deep learning components, since the dimensions of their input spaces are typically much larger. Furthermore, the lack of understanding of the topology of the input spaces involved leads to an open theoretical question: how do we define coverage metrics for systems with deep learning components? Nevertheless, we know that it is feasible to provide formal guarantees of robustness properties for small to medium-sized DNNs [16], as well as to enforce a robustness property through the training process for larger DNNs [20]. With the increase in predictability that comes with formal guarantees of adversarial robustness, it is conceivable that deep learning systems could be deployed in mission- and/or safety-critical systems.

In this paper, we present a new training method that results in networks which are less sensitive to changes in the input, and, consequently, likely to be more robust to adversarial attacks. The method reliably yields networks of increased robustness by introducing a cost term that penalizes output sensitivity to changes in the input.

The sensitivity measure is described briefly below and is further detailed in Sec. 3. Consider a portion of a two-dimensional output space, illustrated in Fig. 1 (left), where a segment of the decision boundary between two classes lies. The black dot represents the network output for a sample belonging to the first class. Consider a set containing all possible perturbation vectors expected upon network deployment; in the case of a vision-based object classification application, these could include synthetic noise, physical perturbations, shadows, glare, etc. Consider an input data point and the set resulting from perturbing it with all possible disturbances from the perturbation set. The DNN maps this perturbed input set into some output reachable set, an over-approximation of which is represented by the box in the figure. We measure the sensitivity of the network as the aggregate volume of all over-approximations of the output reachable set across the training samples, for a predetermined perturbation set.

Also shown in Fig. 1 (left) is the region of potential counter-examples (red portion of the box), which contains all the points in the over-approximated output set for which the classification changes from the first class to the second, i.e., potential outputs of adversarial inputs. We hypothesize that, by reducing the volume of the output reachable set for a given perturbed input set, the likelihood of successful adversarial inputs can be reduced, as indicated by the smaller orange box in Fig. 1 (right).

Figure 1: Reducing the size of the output reachable set could lead to increased robustness to adversarial inputs

The main contributions of this paper can be summarized as follows:


  • We introduce two different approaches to estimate the proposed sensitivity measure for feedforward neural networks (see Sec. 3): one based on Reluplex [16], an update of the classical simplex method that enables support of ReLU constraints, and one based on evaluating the dual cost of a Linear Programming (LP) relaxation of the exact encoding of a ReLU network's input-output relation [20, 8].

  • We conduct an empirical study on the relationship between the sensitivities of neural networks and the energy values of their corresponding optimization landscape minima, from which we conclude that networks corresponding to lower-lying minima tend to be less sensitive than those corresponding to higher-valued minima. We also show that traditional optimization approaches based on Stochastic Gradient Descent (SGD) algorithms and regularization do not necessarily lead to networks with good sensitivity properties (see Sec. 4).

  • Based on the above results, we propose a novel loss function which enables effective task-oriented learning while penalizing high network sensitivity to changes in the input (see Sec. 5). We verify the effectiveness of the proposed cost function on both the Iris and MNIST datasets, and compare the performance of our method with that of state-of-the-art approaches (see Sec. 6).

2 Background and Related Work

Many supervised machine and deep learning algorithms operate based on the principle of Empirical Risk Minimization (ERM), whereby the expectation of a loss function associated with a given hypothesis is approximated with its empirical estimate. The main implication of this modus operandi is the so-called Generalization Error (GE), which refers to the difference between the empirical error and the expected error; practically speaking, the GE manifests itself in a difference in algorithm performance on unseen data relative to the performance observed on the training data. Since ERM is an optimistically biased estimation process [41], techniques to bridge the gap between the empirical risk and the true risk have been proposed: the narrower the gap, the better the generalization properties of the algorithm. Broadly speaking, these techniques fall under one of the following categories:

2.1 Sensitivity and Robustness Analysis and Optimization

The authors of [41] define robustness as the property of a machine learning algorithm to produce similar test and training errors when the test and training samples are similar to each other, and hypothesize that a weak measure of robustness is both sufficient and necessary for good generalizability. Furthermore, they derive a generalization bound that holds under certain constraints. The authors of [25] conclude empirically that trained neural networks are more robust to input perturbations in the vicinity of the training manifold and that higher robustness correlates well with good generalization properties. In [32], the relation between the GE and the classification margin of a neural network is studied. The authors conclude that a necessary condition for good generalization is for the Jacobian matrix of the network to have a bounded spectral norm in the neighborhood of the training samples, in line with the results presented in [25, 42].

2.2 Robustness Against Adversarial Attacks

Adversarial examples are data samples constructed by applying perturbations to known training samples along directions that result in the largest change at the network output [13], so that the perturbed input, often indistinguishable from the original input, leads to an erroneous decision by the network. This vulnerability is directly related to the GE and can be ameliorated by training with adversarial examples [13]. The approach introduced in [14] aims at achieving robustness against adversarial attacks and consists of stacking a denoising autoencoder (DAE) with the network of interest, the role of the DAE being to map the adversarial example back to the training manifold. An alternative architecture based on contractive autoencoders (CAE) is also proposed. The authors of [20] propose a method to train deep networks based on ReLU activation units that are provably robust to adversarial attacks; they achieve this by minimizing the worst-case loss across a convex over-approximation of the reachable set under norm-bounded input perturbations.

2.3 Formal Verification of Neural Networks

Often, the GE includes behavior that is not only incorrect but also unpredictable, which prevents deployment of deep learning algorithms in safety-critical applications. Formal verification refers to techniques aimed at providing mathematical guarantees about the behavior of systems, including computer programs [28]. Formal verification is used most often for safety-critical applications in sectors such as aerospace, nuclear and rail. Early approaches to formally verifying neural networks leveraged SMT solvers [29], but did not scale well and were practical only for very small networks with a single hidden layer and a small number of neurons. Reluplex [16] extended the simplex algorithm to support the piece-wise linear nature of ReLU activation units and showed improved scalability. More recently, authors have tackled the problem using optimization-based approaches, including Mixed Integer Programming (MIP) formulations [6, 7], LP relaxation [9], branch and bound [9, 5], and dual formulations [8, 20]. In contrast to most of the other literature, the framework introduced in [8] applies to a general class of activation functions.

2.4 Analysis of Optimization Landscapes

Modern machine learning methods, most notably deep learning techniques, are posed in the form of highly non-convex optimization tasks with multiple local minima. The simplest way to quantify the quality of landscape minima is to measure their distance from the global minimum: the closer the energy of a local minimum to that of the global minimum, the better the quality of the minimum and its corresponding network. This definition is ill-posed, as the global minimum is generally not known for a typical nonlinear optimization problem. One of the dominant explanations as to why deep learning works so well is that there may be no bad minima at all for the loss functions of deep networks [2, 31, 24, 11, 34]: under certain unrealistic assumptions, it has been shown that all local minima are global minima [17]. In more realistic scenarios, cost functions with a non-vanishing regularization term have been shown to have local minima which are not global minima [22, 36]. Another branch of research leverages optimization methods developed for chemical physics, namely potential energy landscape theory [39]. Using such techniques, the authors of [3] showed that the landscape of the loss function for a specific feed-forward artificial neural network with one hidden layer trained on the MNIST dataset exhibited a single funnel structure. In a recent paper [23] (see also [40, 18]), the measure of the goodness of a minimum is refined by adding further metrics beyond the value of the cost function or the performance. The work concluded that when an overspecified artificial neural network with one hidden layer and a regularization term was used to learn the exclusive OR (XOR) function, although the classification error was often zero for various minima, the sparsity structure of the network varied with the minima.

3 Measure of Robustness Via Sensitivity

This section provides a detailed description of the sensitivity measure introduced in Sec. 1, as well as of the methods that we use to estimate sensitivity for feedforward neural networks with ReLU activation nodes under l_∞-norm bounded perturbations. Throughout the paper, we use ReLU as the activation function. A feedforward neural network is a parameterized function f_θ which maps input data in R^n to output vectors in R^m. By applying l_∞-norm bounded perturbations to an input data vector x, we obtain a perturbed input set X_ε(x) = {x' : ||x' − x||_∞ ≤ ε}, where ε denotes the perturbation bound. Given the perturbed input set X_ε(x), the output reachable set of the network is Y_ε(x) = {f_θ(x') : x' ∈ X_ε(x)}. As stated earlier, we use an estimate of the volume of Y_ε(x) as a surrogate metric for network sensitivity. For feedforward networks with ReLU activation nodes, Y_ε(x) is generally a non-convex polytope whose exact volume is difficult to compute. Instead, we define the sensitivity of the network as the volume of a box over-approximation of the output reachable set, as illustrated in Fig. 2.

Figure 2: The proposed sensitivity measure is the volume of the box over-approximation of the output polytope which results from l_∞-norm bounded perturbations to the input.

The over-approximation can be either tight (green box with solid boundary in Fig. 2) or loose (red box with dashed boundary in Fig. 2) depending on the computational methods used. There is a significant trade-off between the conservatism of the over-approximation and the time it takes to compute it. We now describe the two methods used in our experiments. The first method, based on a dual formulation, computes a conservative (i.e., loose) over-approximation; it turned out to be efficient enough in practice to be incorporated into the training process described in Sec. 5. The second method uses Reluplex, an SMT solver specialized for DNNs, to compute a tight over-approximation of the output polytope.
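To make the box over-approximation and its volume concrete, the sketch below (which is not one of the two methods used in this paper) propagates interval bounds through a small ReLU network and treats the product of the resulting box side lengths as a stand-in for the sensitivity. The network weights, the input x and the bound eps are made-up illustrative values.

```python
# A minimal sketch, assuming made-up weights, input and perturbation bound:
# interval arithmetic yields a (generally loose) box over-approximation of the
# output reachable set; its volume serves as a surrogate sensitivity value.
import numpy as np

def interval_forward(weights, biases, x, eps):
    """Propagate the l_inf ball [x - eps, x + eps] through a ReLU network,
    returning element-wise lower/upper bounds on the network output."""
    lo, hi = x - eps, x + eps
    for k, (W, b) in enumerate(zip(weights, biases)):
        center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
        mid = W @ center + b
        rad = np.abs(W) @ radius
        lo, hi = mid - rad, mid + rad
        if k < len(weights) - 1:            # ReLU on all hidden layers
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((3, 8))]
biases  = [rng.standard_normal(8), rng.standard_normal(3)]
x, eps  = rng.standard_normal(4), 0.05

lo, hi = interval_forward(weights, biases, x, eps)
print("box volume (surrogate sensitivity):", np.prod(hi - lo))
```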

3.1 Computing Sensitivity Using a Dual Formulation

The volume of the tight over-approximation (green box in Fig. 2) can be computed by finding the min and max values along each dimension of the output vector and then computing the product of the lengths of the segments between such min and max values across all dimensions. Finding the min and max of the output dimensions of an L-layer ReLU network requires solving a set of difficult optimization problems, i.e., minimize and maximize y_i, where y_i is the i-th entry of the output vector y, subject to a set of piece-wise linear constraints imposed by the ReLU activations. The min of y_i under l_∞-norm bounded perturbations to the input x with bound ε is the solution of the following optimization problem, where I is the identity matrix and e_i denotes its i-th column:

\min_{z, \hat{z}} \; e_i^\top \hat{z}_L   (1)
subject to \quad \|z_1 - x\|_\infty \le \epsilon   (2)
\hat{z}_{k+1} = W_k z_k + b_k, \quad k = 1, \dots, L-1   (3)
z_k = \max(\hat{z}_k, 0), \quad k = 2, \dots, L-1   (4)

where L is the number of layers in the network, z_1 denotes the (perturbed) network input, \hat{z}_L = y denotes the network output, and W_k and b_k are the weights and biases of layer k, respectively. The constraints in Eq. 2 capture the fact that the input belongs to the perturbed set X_ε(x). The piece-wise linear constraints in Eq. 4 denote the relations between the inputs and outputs of the layers with ReLU activations. The max of y_i is just the negative of the min of −y_i, i.e., the negative of the solution of the optimization problem in Eq. 1 with the objective replaced by −e_i^⊤ \hat{z}_L. The volume of the over-approximation is:

V(x, \epsilon; \theta) = \prod_{i=1}^{m} \big( \max y_i - \min y_i \big)   (5)

It is known within the operational research community that the optimization problem in Eq. 1 can be transformed into a mixed-integer linear programming (MILP) problem through the use of the big-M trick [30]. Therefore, computing the sensitivity of a network can be done by solving a MILP with a state-of-the-art solver such as Gurobi [26], or by using other techniques with exact encodings benchmarked in recent work [5]. However, it remains difficult to scale these "exact" methods up to efficiently verify the sensitivity of networks with multiple fully-connected layers and more than a few hundred ReLU nodes.
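For reference, one common form of the big-M encoding of a single ReLU constraint z = max(\hat{z}, 0) is sketched below; it introduces a binary variable a and assumes a constant M chosen large enough that |\hat{z}| \le M always holds (the exact encoding used by MILP-based tools may differ, e.g., by using per-neuron pre-activation bounds in place of a single M):

z \ge \hat{z}, \quad z \ge 0, \quad z \le \hat{z} + M(1 - a), \quad z \le M a, \quad a \in \{0, 1\},

where a = 1 forces z = \hat{z} (active unit) and a = 0 forces z = 0 (inactive unit).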

Due to the associated computational complexity, using methods with exact encodings is not yet practical for our main goal, which is to incorporate the sensitivity measure into the loss function driving the learning. Instead, we adopt the approach taken by recent work in which ReLU constraints are relaxed into a set of linear constraints. The relaxation is illustrated graphically in Fig. 3.

Figure 3: Linear relaxation of piece-wise linear constraints.

Consider the piece-wise linear constraints from Eq. 4; under the relaxation procedure, each ReLU constraint z = max(\hat{z}, 0) whose pre-activation bounds satisfy l < 0 < u is replaced by the set of linear constraints

z \ge 0, \quad z \ge \hat{z}, \quad z \le \frac{u}{u - l}\left(\hat{z} - l\right),   (6)

where l and u denote element-wise lower and upper bounds on the pre-activation \hat{z} (units whose pre-activation bounds do not straddle zero are fixed to the corresponding linear branch).

With the linear constraints in Eq. 6, the optimization task in Eq. 1 is transformed into a primal linear programming (LP) problem. Furthermore, the dual objective, when evaluated at any feasible point, provides a lower bound on the primal objective. As noted in [20], there is a feasible point which, in practice, yields a sufficiently tight lower bound on the primal objective.
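The sketch below illustrates the relaxation of Eq. 6 on a tiny two-input, two-hidden-unit, two-output ReLU network. Rather than evaluating the dual bound of [20], it simply solves the relaxed primal LP with scipy; all weights, the input x and the bound eps are made-up illustrative values, and the code assumes every hidden unit's pre-activation bounds straddle zero.

```python
# A minimal sketch, assuming made-up weights, input x and bound eps: the LP
# relaxation of Eq. 6 for a 2-2-2 ReLU network, solved as a primal LP with
# scipy (rather than via the dual bound of [20]) to lower-bound output y_i.
import numpy as np
from scipy.optimize import linprog

W1 = np.array([[1.0, -1.0], [0.5, 1.0]]); b1 = np.array([0.1, -0.2])
W2 = np.array([[1.0,  0.5], [-1.0, 1.0]]); b2 = np.array([0.0,  0.3])
x, eps, i = np.array([0.2, -0.1]), 0.5, 0  # nominal input, l_inf bound, output index

# Pre-activation interval bounds of the hidden layer (interval arithmetic).
lb = W1 @ x + b1 - np.abs(W1) @ np.full(2, eps)
ub = W1 @ x + b1 + np.abs(W1) @ np.full(2, eps)
assert np.all(lb < 0) and np.all(ub > 0), "sketch assumes every ReLU is 'crossing'"

# Decision vector: [z1 (2), zh_hat (2), zh (2), y (2)] -> 8 variables.
nv = 8
def e(idx):
    row = np.zeros(nv); row[idx] = 1.0; return row

# Equalities: zh_hat - W1 z1 = b1 and y - W2 zh = b2.
A_eq = np.zeros((4, nv)); b_eq = np.concatenate([b1, b2])
A_eq[0:2, 2:4] = np.eye(2); A_eq[0:2, 0:2] = -W1
A_eq[2:4, 6:8] = np.eye(2); A_eq[2:4, 4:6] = -W2

# Inequalities (relaxation of Eq. 6 for each hidden unit j):
#   zh_j >= zh_hat_j              ->  zh_hat_j - zh_j <= 0
#   zh_j <= s_j (zh_hat_j - lb_j) ->  zh_j - s_j zh_hat_j <= -s_j lb_j
A_ub, b_ub = [], []
for j in range(2):
    s = ub[j] / (ub[j] - lb[j])
    A_ub.append(e(2 + j) - e(4 + j)); b_ub.append(0.0)
    A_ub.append(e(4 + j) - s * e(2 + j)); b_ub.append(-s * lb[j])

# Variable bounds: input inside the l_inf ball, zh >= 0, everything else free.
bounds = [(x[0] - eps, x[0] + eps), (x[1] - eps, x[1] + eps),
          (None, None), (None, None), (0.0, None), (0.0, None),
          (None, None), (None, None)]

res = linprog(c=e(6 + i), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("LP lower bound on y_%d:" % i, res.fun)
```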

Let J_ε(x, C, ν; θ) be the dual objective function from [20], where x is the input data, C is the primal objective matrix (whose columns select the output directions of interest), ε is the bound of the allowable l_∞ input perturbations, ν is the dual variable, and θ represents the network parameters. The sensitivity function S(x, ε; θ), which measures the sensitivity of a neural network under norm-bounded perturbations, is then given by:

S(x, \epsilon; \theta) = \prod_{i=1}^{m} \Big( -J_\epsilon(x, -e_i, \nu; \theta) - J_\epsilon(x, e_i, \nu; \theta) \Big)   (7)

where J_ε(x, e_i, ν; θ) lower-bounds the min of y_i over the perturbed input set and −J_ε(x, −e_i, ν; θ) upper-bounds the corresponding max. Eq. 7 is the volume of a (box) over-approximation of the output reachable set of a feed-forward neural network with ReLU activation nodes under l_∞-norm bounded perturbations with bound ε.

The sensitivity function can be evaluated within seconds or less, on a GPU-enabled laptop with CUDA, for MNIST-sized networks with multiple layers and more than 1500 ReLU nodes. Comparatively, evaluating the sensitivity of a similar-sized network with an "exact" method such as Reluplex can take hours. However, this computational speed advantage comes at the cost of over-estimating the sensitivity measure, as both the linear relaxation in Eq. 6 and the evaluation of the dual objective at a feasible point add to the conservatism of the (box) over-approximation.

3.2 Computing Sensitivity Using Reluplex

As mentioned in Sec. 3.1, several "exact" techniques benchmarked in [5] can be used to compute the volume of the tight over-approximation. In the experiments of Sec. 4, we used Reluplex as the solver. To compute the volume of the tight (box) over-approximation, we posed to Reluplex the following satisfiability problem: for a given data input x and perturbation bound ε, does there exist a solution that satisfies the constraint

y_i < \underline{y}_i \;\;\text{or}\;\; y_i > \overline{y}_i   (8)

and Eqs. 2, 3 and 4, in which \underline{y}_i and \overline{y}_i are respectively the candidate min and max of y_i. If Reluplex returns a negative answer, then \underline{y}_i and \overline{y}_i are valid lower and upper bounds on y_i. Reluplex is queried repeatedly with a sequence of min and max candidates until a numerical tolerance is reached. We implemented a simple bisection algorithm to generate the sequence of candidates, and a translational tool-chain to repeatedly query Reluplex. As described in more detail in Sec. 4, Reluplex was efficient enough for computing the sensitivities of pre-trained networks between the scale of the Iris and MNIST datasets. Fig. 4 plots the relative difference between S_dual and S_Reluplex for close to 200 networks (each corresponding to a different landscape minimum), where S_dual is the sensitivity computed using the dual formulation and S_Reluplex is the sensitivity computed by Reluplex on the same network with the same input perturbation bound. The large peak around network #50 shows that the dual-of-LP technique can occasionally over-estimate the sensitivity of a network by a large amount when compared against Reluplex, but the plot also shows that for the majority of the networks the over-estimation remains relatively insignificant.
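A minimal sketch of this bisection loop is shown below, with the Reluplex query abstracted behind a hypothetical oracle callable can_exceed(t), standing in for "does Reluplex find a satisfying assignment with y_i > t subject to Eqs. 2-4?"; the tolerance value is an arbitrary choice.

```python
# A minimal sketch of the bisection used to tighten one side of the box;
# `can_exceed` is a hypothetical oracle wrapping a Reluplex query and is not a
# real Reluplex API. Tightening the lower bound is symmetric.
def bisect_upper_bound(can_exceed, lo, hi, tol=1e-3):
    """Return a certified upper bound on y_i within `tol`.

    `lo` must be attainable (e.g., the nominal output value) and `hi` must be
    a valid upper bound (e.g., from a loose dual/interval over-approximation).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if can_exceed(mid):   # SAT: some perturbed input pushes y_i above mid
            lo = mid
        else:                 # UNSAT: mid is a certified upper bound
            hi = mid
    return hi
```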

Figure 4: Relative difference between sensitivity estimation via the Dual Formulation and Reluplex.

4 Optimization Landscape Minima: Energy Value vs. Sensitivity

In this section, we investigate the quality of the different minima in the optimization landscape of neural networks by leveraging a combination of energy landscape approaches developed by chemical physicists and the formal verification tool Reluplex developed by computer scientists.

4.1 Computing the Minima

Here we give a short description of the procedure used to find multiple minima, which relies on methods developed in computational chemistry to explore the energy landscapes of atomic and molecular clusters [38]. First, using quasi-Newton methods such as limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and basin-hopping, started from different random initial guesses, we find multiple local minima. Next, an eigenvector-following method is used to find multiple saddle points of index 1, again by feeding in different random initial guesses and restricting the search to directions where the Hessian matrix has exactly one negative eigenvalue. Then, from each saddle point of index 1, we compute two steepest-descent paths in two different directions to connect the corresponding pair of minima. While following this process of connecting pairs of minima, whenever either end-point of such a path is not in the existing database of minima, it gets added to the database. This process is iterated a few times to obtain multiple minima, while retaining only energetically distinct minima (minima which sit at different locations in weight space but have the same energy value are related by a discrete symmetry and can be obtained from one another via a discrete transformation of the weights).

To run these computations, we use a wrapper around the Python Energy Landscape Explorer (PELE) [1], which performs energy-landscape computations (i.e., all the aforementioned steps) for any given unconstrained multivariate cost function with continuous variables, together with Theano [37], which generates the cost function for the given neural network architecture. To circumvent the discontinuities associated with the ReLU activation functions, we take a constant value whenever the Hessian is singular, using Theano's built-in gradient libraries. For the Iris data, we fix the regularization parameter to 0.001. Since this is a stochastic method, there is no guarantee that all minima will be found; we only sample the minima to obtain a qualitative picture.
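As a simplified illustration of the minima-collection step only (this is not PELE, and it omits the saddle-point and path-connection machinery), the sketch below uses scipy's basin-hopping with L-BFGS-B to gather energetically distinct minima of a toy one-hidden-layer network trained on XOR-style data; the network size, data and tolerances are assumptions.

```python
# A minimal sketch (not PELE): collect several distinct local minima of a small
# neural-network loss with basin-hopping, keeping only minima whose energies
# differ by more than a tolerance. The 2-2-1 ReLU network, the XOR-style data
# and all tolerances are illustrative assumptions.
import numpy as np
from scipy.optimize import basinhopping

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([0., 1., 1., 0.])
reg = 1e-3                                  # regularization weight

def unpack(w):
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    return W1, b1, W2, b2

def loss(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.maximum(X @ W1.T + b1, 0.0)      # ReLU hidden layer
    out = h @ W2 + b2
    return np.mean((out - Y) ** 2) + reg * np.sum(w ** 2)

rng = np.random.default_rng(1)
minima = []                                 # list of (energy, weights)
for _ in range(10):
    w0 = rng.standard_normal(9)
    res = basinhopping(loss, w0, niter=50,
                       minimizer_kwargs={"method": "L-BFGS-B"})
    if all(abs(res.fun - energy) > 1e-4 for energy, _ in minima):
        minima.append((res.fun, res.x))

for energy, _ in sorted(minima, key=lambda m: m[0]):
    print("minimum with energy %.6f" % energy)
```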

4.2 Iris Results and Discussion

The network architecture selected to tackle classification on the Iris dataset consists of 3 fully-connected hidden layers of 25 neurons each. We used the energy landscape methods described in Sec. 4.1 to obtain 763 Iris minima of varying energy values. The computation took a total of 20 hours on a standard laptop with a single CPU and 8 GB of memory.

For each of these minima, we assume a vector of perturbation bounds proportional to the vector of ranges of the feature values over the entire Iris dataset; that is, each input dimension has its individual perturbation bound proportional to the spread of the input data along that dimension. Given a fixed perturbation bound vector and an input data point, we queried Reluplex iteratively, as described in Sec. 3.2, until convergence to the min and max of each output dimension was achieved up to some numerical tolerance. The sensitivity of the network output is plotted as a function of the energy value of the landscape minima in Fig. 5. It can be seen that networks corresponding to lower-energy minima tend to have lower sensitivity and hence possibly exhibit more robustness to norm-bounded input perturbations. The plot illustrates the behavior of the average sensitivity over 30 samples and over all 150 samples of the Iris dataset; we observe that there is very little difference between the two averages.

Figure 5: Network sensitivity vs. network energy for Iris landscape
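The per-feature bounds described above can be computed in a few lines; the sketch below is illustrative only, and the proportionality factor alpha is an assumption (the value used for Fig. 5 is not reproduced here).

```python
# A minimal sketch, assuming a proportionality factor `alpha`: each input
# dimension gets an l_inf perturbation bound proportional to that feature's
# range over the whole Iris dataset.
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                          # (150, 4) feature matrix
feature_range = X.max(axis=0) - X.min(axis=0)
alpha = 0.05                                  # assumed factor, not the paper's
eps = alpha * feature_range                   # per-dimension perturbation bounds
print(eps)
```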

5 Learning with Sensitivity Minimization

Having shown empirically in the prior section that low-lying landscape minima lead to networks that are less sensitive than networks with higher energy, we now describe a new training method that encourages convergence to networks with reduced sensitivity. First, we point out that the traditional task-centric loss function (e.g., cross-entropy) plus a regularization term does not necessarily guarantee convergence to a network with reduced sensitivity. Fig. 6 shows a set of training runs performed on the Iris dataset, with the training epoch on the horizontal axis and the sensitivity of the network on the vertical axis. It can be seen that the evolution of the sensitivity is irregular, without any clear pattern or trend.

Figure 6: Samples of training runs with undesired evolution of sensitivities.

The new training process, which aims to reduce the sensitivity of the networks it produces, is illustrated in Fig. 7.

Figure 7: Proposed training method to reduce sensitivity.

The main idea of the proposed training process is to augment the usual loss function with an additional weighted sensitivity term, in which the multiplier κ is a new hyperparameter and S is the sensitivity function from Eq. 7. Let \mathcal{L} be the usual loss function with regularization incorporated. Given a training set {(x_j, y_j)} of size N, the optimization task is to find the following minimizer:

\theta^{*} = \arg\min_{\theta} \; \frac{1}{N} \sum_{j=1}^{N} \Big[ \mathcal{L}\big(f_{\theta}(x_j), y_j\big) + \kappa \, S(x_j, \epsilon; \theta) \Big]   (9)

To use the sensitivity function during training, some feasible values for the dual variables need to be chosen. The optimality of the values of the dual variables affects how conservative the estimated sensitivity will be. For the experiments in the next section, we pick the feasible point using Algorithm 1 from [20].
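The sketch below shows what the augmented objective of Eq. 9 looks like inside a standard PyTorch training loop. It is not the paper's implementation: in place of the dual-LP sensitivity of [20], it uses a differentiable interval-arithmetic box volume as the surrogate S, and the model, data, kappa and eps are illustrative assumptions.

```python
# A minimal sketch of training with a weighted sensitivity penalty (Eq. 9),
# using an interval-arithmetic box log-volume as a surrogate S rather than the
# dual-LP bound of [20]; model, data, kappa and eps are made-up values.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
kappa, eps = 1e-3, 0.05

def box_log_volume(model, x, eps):
    """Log-volume of an interval-arithmetic box around the outputs, per sample."""
    lo, hi = x - eps, x + eps
    for layer in model:
        if isinstance(layer, nn.Linear):
            c, r = (lo + hi) / 2, (hi - lo) / 2
            mid, rad = layer(c), r @ layer.weight.abs().t()
            lo, hi = mid - rad, mid + rad
        else:                                       # ReLU (monotone)
            lo, hi = layer(lo), layer(hi)
    return torch.log(hi - lo + 1e-12).sum(dim=1)    # log of the Eq. 5 product

x = torch.randn(32, 4)                              # toy mini-batch
y = torch.randint(0, 3, (32,))
for step in range(100):
    task = nn.functional.cross_entropy(model(x), y)
    sens = box_log_volume(model, x, eps).mean()
    loss = task + kappa * sens                      # Eq. 9 with a surrogate S
    opt.zero_grad(); loss.backward(); opt.step()
```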

6 Experimental Results

In this section, we present experimental results comparing the performance of three different training methods applied to the standard handwritten-digits dataset MNIST. The first method is the baseline, which employs a standard cross-entropy loss function with regularization. The second method is the weighted sensitivity minimization technique presented in this paper (SM). The third method is the training technique proposed in [20], denoted as K&W, which maximizes the adversarial margin. During each run, at each mini-batch step, all 3 training methods are applied to the same convolutional network, producing 3 separate training runs with identical mini-batches. The convolutional network architecture has one convolutional layer and two fully-connected layers. We keep the network architecture relatively small to save computational time, although we note that the proposed technique can scale to larger networks since it exhibits none of the combinatorial complexity present in the methods used to compute the tight over-approximation. We used a vanilla stochastic gradient descent (SGD) optimizer without any heuristics based on momentum or RMS propagation. Vanilla SGD, although slower than ADAM [19], was both more numerically stable and sufficient for our purpose, which was to evaluate the relative differences in the evolution of the adversarial robustness, sensitivity and performance of training runs using different techniques on the same set of mini-batches. The software environment used for training was PyTorch 0.3.1 and CUDA 9.0. The hardware setup consisted of a laptop with an Intel i7-2500 CPU, 16 GB of memory and a single Nvidia GTX 970M graphics card. The learning step size, the regularization weight and the batch size were held fixed, and the same l_∞ perturbation bound was used for all experiments.

6.1 Tuning the Sensitivity Multiplier

As expected, there is a trade-off between the amount of sensitivity minimization and the network's classification performance. This trade-off is observed while tuning κ, the multiplier of the sensitivity term. For a given number of training epochs, increasing the value of κ typically results in a network that is less sensitive and more robust in terms of the adversarial error measure, but that has worse performance in terms of the classification error. In our experiments, we observed that a good choice of κ is very much dependent on the dataset, with different values working well for the Iris and MNIST experiments.

6.2 MNIST Results

For the SM runs, we set the multiplier κ to 1 and geometrically decreased or increased it until the classification performance of the SM run approached that of the baseline. The resulting performance evaluations of the 3 training runs, with the value of κ selected by this iterative process, are shown in Figs. 8 and 9.
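A minimal sketch of this tuning loop is shown below; train_and_eval is a hypothetical helper standing in for a full training run returning the test classification error, and the margin, scale factor and round limit are assumptions. Only the decreasing direction is shown; increasing κ when the error is already close to the baseline is symmetric.

```python
# A minimal sketch of the geometric kappa search, assuming a hypothetical
# `train_and_eval(kappa)` helper that trains an SM model and returns its test
# classification error.
def tune_kappa(train_and_eval, baseline_error, kappa=1.0, scale=10.0,
               margin=0.005, max_rounds=6):
    for _ in range(max_rounds):
        if train_and_eval(kappa) <= baseline_error + margin:
            return kappa              # classification error close to baseline
        kappa /= scale                # weaken the sensitivity term and retry
    return kappa
```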

Fig. 8 shows the classification error of all 3 training runs evaluated on the test data. With the chosen κ, the classification error of our training method (SM) is nearly as good as that of the baseline and better than that of K&W.

Figure 8: Classification error of the three training runs on the MNIST test data.

Next, we look at the sensitivity measure of the 3 training runs evaluated on the test data. This is the measure which our training method tries to minimize, so, not surprisingly, SM outperforms both the baseline and K&W, as shown in Fig. 9. Note that the multiplier κ was small enough that the sensitivity actually increases during the run produced by our own training method.

Figure 9: Sensitivity of the three training runs on the MNIST test data.

Finally, we look at the adversarial error, i.e., the robustness measure from [20], which is the percentage of the test data for which the application of l_∞-bounded perturbations could lead to adversarial examples. As shown in Fig. 10, each of the 3 training runs is evaluated for its adversarial error on the test data. Here, SM was outperformed by K&W; however, when compared against the baseline, both methods increase the robustness of the network by a significant margin.

Figure 10: Adversarial error of the three training runs on the MNIST test data.

6.3 Discussion

Figs. 9 and 10 show a possible negative correlation between the sensitivity of the network and its adversarial robustness. While our method does not minimize the adversarial error directly, the empirical outcome in Fig. 10 indicates that encouraging small sensitivity has an effect on that measure. Complementarily, the method from [20] directly minimizes the adversarial error by maximizing the adversarial margin; however, as shown in Fig. 9, it also appears to have a significant effect on the sensitivity of the network when compared against the baseline. The difference in classification error between the method in [20] and our method could be due to the former over-fitting to the worst-case adversarial examples leveraged during learning; in contrast, our training method reduces the size of the output reachable set more uniformly, and not based on any particular sampling scheme at the output. Alternatively, if the sensitivity term in our training method were replaced by the negative of the adversarial margin of K&W, it is conceivable that we would observe a similar trade-off between their robustness measure and the classification performance.

7 Conclusion and Future Work

In this paper, we studied the relationship between the energy values of landscape minima and the sensitivity of the corresponding networks. Using formal verification techniques, we demonstrated on the Iris dataset a direct correlation between the sensitivity of the network and the energy value of the corresponding landscape minimum. Based on that result, we also studied the relationship between the sensitivity of a neural network and its adversarial robustness. We found, through a novel learning framework, that the adversarial robustness of the network can be improved via a reduction of its sensitivity. In the proposed learning approach, the sensitivity and/or robustness constraints are handled via the penalty method, which results in the usual unconstrained optimization problem solved in machine learning. The resulting penalty parameter is a hyperparameter which can be tuned to limit the degradation in the performance of the network while still providing improvements in its adversarial robustness.

So far, we have only considered perturbations of small magnitude captured by the norm-bound mathematical abstraction. The extension of the verification technique to complex additive disturbances such as occlusion, glare, and shadow, or to multiplicative disturbances such as rotation and translation, remains an open question, as the mathematical abstractions of those complex disturbances may not fit the dual-of-LP relaxation framework. Furthermore, the certification of any new process for guaranteeing the robustness of deep-learning-based components in the context of mission- and safety-critical applications remains an open question as well.

Other questions remain as to the scalability and generalization of the training method. The key benchmark of scalability is whether sensitivity, robustness, and other desired properties can be efficiently computed for significantly larger networks using the dual-of-LP relaxation formulation. The generalization question refers to whether the training technique produces the same consistent, across-the-board convergence to the desired property enforced via the penalty term on other datasets. Beyond robustness and/or sensitivity, other formalizable properties of neural networks that are critical to building an assurance case need to be studied as well. There is also a need to address the conservatism of the sensitivity computations, and whether the looseness of the approximation can be reduced without a significant loss in the efficiency of the method.

References