Reachable Set Computation and Safety Verification for Neural Networks with ReLU Activations

12/21/2017 ∙ by Weiming Xiang et al.

Neural networks have been widely used to solve complex real-world problems. Due to the complicated, nonlinear, non-convex nature of neural networks, formal safety guarantees for their output behaviors will be crucial for applications in safety-critical systems. In this paper, the output reachable set computation and safety verification problems are addressed for a class of neural networks consisting of Rectified Linear Unit (ReLU) activation functions. A layer-by-layer approach is developed to compute the output reachable set. The computation is formulated as a set of manipulations of a union of polyhedra, which can be applied efficiently with the aid of polyhedron computation tools. Based on the output reachable set computation results, safety verification for a ReLU neural network can be performed by checking the intersections between the unsafe regions and the output reachable set described by a union of polyhedra. A numerical example of a randomly generated ReLU neural network is provided to show the effectiveness of the approach developed in this paper.


1 Introduction

Artificial neural networks have been widely used in machine learning systems. Applications include adaptive control [1, 2, 3, 4, 5, 6], pattern recognition [7, 8], game playing [9], autonomous vehicles [10], and many others. Though neural networks have proven effective and powerful in solving complex problems, they are confined to systems that comply only with the lowest safety integrity levels, since a neural network is most often viewed as a black box without effective methods to assure safety specifications for its outputs. Neural networks are trained on a finite set of input-output data and are expected to generalize, producing desirable outputs even for previously unseen inputs. However, in many practical applications the set of possible inputs is essentially infinite, which means it is impossible to check all possible inputs by experiments alone. Moreover, it has been observed that neural networks can react in unexpected and incorrect ways to even slight perturbations of their inputs [11], which could result in unsafe systems. Hence, methods that provide formal guarantees are in great demand for verifying specifications or properties of neural networks. Verifying neural networks is a hard problem; even establishing simple properties about them has been proven to be NP-complete [12]. The difficulties mainly come from the presence of nonlinear activation functions and complex structures, which make neural networks large-scale, nonlinear, non-convex and thus incomprehensible to humans. Until now, only a few results have been reported on verifying neural networks. The verification of feed-forward multi-layer neural networks is investigated based on Satisfiability Modulo Theories (SMT) in [13, 14]. In [15] an abstraction-refinement approach is proposed for the verification of specific networks known as Multi-Layer Perceptrons (MLPs). In [12], a specific kind of activation function called the Rectified Linear Unit (ReLU) is considered for the verification of neural networks. A simulation-based approach is developed in [16], which turns the reachable set estimation problem into a neural network maximal sensitivity computation problem, described in terms of a chain of convex optimization problems. Additionally, some recent reachable set estimation results are reported for neural networks [17, 18, 19]; these results, which are based on Lyapunov functions, certainly have the potential to be further extended to safety verification.

A neural network is comprised of a set of layers of neurons, where each neuron computes a linear combination of the values from the nodes in the preceding layer and applies an activation function to the result. These activation functions are usually nonlinear. In this work, we focus on ReLU activation functions [20], which are widely used in many neural networks [21, 22, 23, 13]. A ReLU function is piecewise linear: it returns zero when the node receives a negative value, meaning the node is inactive, and returns the value unchanged when the node is active with a positive value. This piecewise linearity gives ReLU neural networks several advantages, such as a faster training process and avoidance of the vanishing gradient problem. For the output reachable set computation and verification problems addressed in this paper, this piecewise linearity also plays a fundamental role in the computation procedure.

The main contribution of this work is an approach for computing the output reachable set of ReLU neural networks and applying it to safety verification problems. Under the assumption that the input set is described by a union of polyhedra, the output reachable set is computed layer by layer via a set of manipulations of polyhedra. For a ReLU function, three cases can be distinguished according to the input vector:

  • Case 1: All the elements in the input vector are positive, so the output is exactly equal to the input;

  • Case 2: All the elements in the input vector are non-positive, so the ReLU function produces the zero vector by definition;

  • Case 3: The input vector has both positive and non-positive elements. This is a more intricate case; it will be proved later that the outputs belong to a union of polyhedra, which is essentially non-convex.

The above three cases fully characterize the output behaviors of a ReLU function and form the basic idea of computing the output reachable set for neural networks comprised of ReLU neurons. With this classification and a complete reachability analysis for ReLU functions, the output reachable set of a ReLU layer can be obtained case by case and expressed as a union of polyhedra. The approach is then generalized from a single layer to a neural network consisting of multiple layers. Finally, the safety verification can be performed by checking whether the intersection between the output reachable set and the unsafe regions is empty. Since the output reachable set computed in this work is exact with respect to an input set, the verification results are sound for both safe and unsafe conclusions. The main benefit of our approach is that all the computation processes are formulated in terms of operations on polyhedra, which can be carried out efficiently by existing tools for polyhedron manipulation.
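As a quick, concrete illustration (not taken from the paper), the snippet below applies an elementwise ReLU to three arbitrary example vectors, one for each of the cases above.

```python
import numpy as np

def relu(x):
    # elementwise ReLU: max(0, x)
    return np.maximum(0.0, x)

# Case 1: all inputs positive -> output equals input
print(relu(np.array([1.5, 0.2])))      # [1.5 0.2]
# Case 2: all inputs non-positive -> output is the zero vector
print(relu(np.array([-0.7, -2.0])))    # [0. 0.]
# Case 3: mixed signs -> negative coordinates are clipped to zero
print(relu(np.array([1.5, -2.0])))     # [1.5 0. ]
```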

The remainder of the paper is organized as follows. The preliminaries for ReLU neural networks and problem formulations are given in Section II. The output reachability analysis for ReLU functions is studied in Section III. The main results, reachable set computation and verification for ReLU neural networks, are presented in Section IV. A numerical example is given in Section V to illustrate our approach, and we conclude in Section VI.

Notations: $\mathbb{R}$ denotes the field of real numbers, $\mathbb{R}^n$ stands for the vector space of all $n$-tuples of real numbers, and $\mathbb{R}^{n \times m}$ is the space of $n \times m$ matrices with real entries. $\mathrm{diag}\{A_1, \ldots, A_k\}$ denotes a block-diagonal matrix with blocks $A_1, \ldots, A_k$. $\|x\|_{\infty}$ stands for the infinity norm of a vector $x \in \mathbb{R}^n$, defined as $\|x\|_{\infty} = \max_{i} |x_i|$. $A^{T}$ denotes the transpose of matrix $A$.

2 Preliminaries and Problem Formulation

This section presents the mathematical model of neural networks with ReLU activations considered in this paper and formulates the problems to be studied.

2.1 Neural Networks with ReLU Activations

A neural network consists of a number of interconnected neurons. Each neuron is a simple processing element that responds to the weighted inputs it receives from other neurons. In this paper, we consider the most popular and general class of feedforward neural networks, the Multi-Layer Perceptron (MLP). Generally, an MLP consists of three typical classes of layers: an input layer, which serves to pass the input vector to the network; hidden layers of computation neurons; and an output layer composed of at least one computation neuron to produce the output vector.

The action of a neuron depends on its activation function, which is described as

$y_i = f\left(\sum_{j=1}^{n} \omega_{ij} u_j + \theta_i\right) \qquad (1)$

where $u_j$ is the $j$th input of the $i$th neuron, $\omega_{ij}$ is the weight from the $j$th input to the $i$th neuron, $\theta_i$ is called the bias of the $i$th neuron, $y_i$ is the output of the $i$th neuron, and $f(\cdot)$ is the activation function. The activation function is generally a nonlinear function describing the reaction of the $i$th neuron to its inputs $u_j$, $j = 1, \ldots, n$. Typical activation functions include the rectified linear unit, logistic, tanh, exponential linear unit, and linear functions.

An MLP has multiple layers; each layer $l$, $l = 1, \ldots, L$, has $n^{[l]}$ neurons. In particular, layer $0$ is used to denote the input layer and $n^{[0]}$ stands for the number of inputs in the rest of this paper, while layer $L$ stands for the last layer, that is, the output layer. For a neuron $i$, $i = 1, \ldots, n^{[l]}$, in layer $l$, the corresponding input vector is denoted by $u^{[l]}$ and the weight matrix is

$W^{[l]} = \left[\omega_{1}^{[l]}, \ldots, \omega_{n^{[l]}}^{[l]}\right]^{T}$

where $\omega_{i}^{[l]}$ is the weight vector. The bias vector for layer $l$ is

$\theta^{[l]} = \left[\theta_{1}^{[l]}, \ldots, \theta_{n^{[l]}}^{[l]}\right]^{T}.$

The output vector of layer $l$ can be expressed as

$y^{[l]} = f_l\left(W^{[l]} u^{[l]} + \theta^{[l]}\right)$

where $f_l(\cdot)$ is the activation function for layer $l$.

For an MLP, the output of layer $l-1$ is the input of layer $l$, and the mapping from the input $y^{[0]}$ of the input layer to the output $y^{[L]}$ of the output layer stands for the input-output relation of the MLP, denoted by

$y^{[L]} = F\left(y^{[0]}\right) \qquad (2)$

where $F(\cdot)$ denotes the composition of the layer maps $y^{[l]} = f_l\left(W^{[l]} y^{[l-1]} + \theta^{[l]}\right)$, $l = 1, \ldots, L$.

In this work, we aim at a class of activation functions called the Rectified Linear Unit (ReLU), which is expressed as

$f(x) = \max(0, x). \qquad (3)$

Thus, the output of a neuron considered in (1) can be rewritten as

$y_i = \max\left(0, \sum_{j=1}^{n} \omega_{ij} u_j + \theta_i\right) \qquad (4)$

and the corresponding output vector of layer $l$ becomes

$y^{[l]} = \max\left(0, W^{[l]} u^{[l]} + \theta^{[l]}\right) \qquad (5)$

in which the ReLU function is applied element-wise.
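To make the layer notation concrete, the following is a minimal forward-pass sketch of an MLP with ReLU hidden layers and a linear output layer, following (1)–(5); the toy dimensions and weights are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(weights, biases, u, linear_output=True):
    """Evaluate y = F(u) layer by layer: y_l = f(W_l y_{l-1} + theta_l)."""
    y = u
    for l, (W, theta) in enumerate(zip(weights, biases)):
        z = W @ y + theta
        last = (l == len(weights) - 1)
        y = z if (linear_output and last) else relu(z)
    return y

# toy 2-3-1 network with arbitrary weights (for illustration only)
weights = [np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]]), np.array([[1.0, -2.0, 0.5]])]
biases  = [np.array([0.1, -0.2, 0.0]), np.array([0.3])]
print(mlp_forward(weights, biases, np.array([0.4, -0.6])))
```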

In most real applications, an MLP is viewed as a black box that generates a desirable output with respect to a given input. However, regarding property verification such as safety verification, it has been observed that even a well-trained neural network can react in unexpected and incorrect ways to slight perturbations of its inputs, which could result in unsafe systems. Thus, the output reachable set of an MLP, which covers all possible output values, is necessary for the safety verification of an MLP, that is, for drawing a safe or unsafe conclusion about it.

2.2 Problem Formulation

Given an input set $\mathcal{U}$, the reachable set of neural network (2) is stated by the following definition.

Definition 1

Given a neural network in the form of (2) and an input $y^{[0]}$ belonging to a set $\mathcal{U} \subseteq \mathbb{R}^{n^{[0]}}$, the output reachable set of (2) is defined by

$\mathcal{Y} \triangleq \left\{ y^{[L]} : y^{[L]} = F\left(y^{[0]}\right),\; y^{[0]} \in \mathcal{U} \right\}. \qquad (6)$

In our work, the input set is considered to be a union of polyhedra, expressed as $\mathcal{U} = \bigcup_{s=1}^{N} \mathcal{U}_s$, where the $\mathcal{U}_s$, $s = 1, \ldots, N$, are described by

$\mathcal{U}_s = \left\{ y^{[0]} \in \mathbb{R}^{n^{[0]}} : A_s y^{[0]} \le b_s \right\}. \qquad (7)$
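A union of polyhedra in the form of (7) can be stored as a list of half-space representations. The sketch below is one possible encoding (the class name and membership helper are illustrative choices, not part of the paper), together with a membership test on two arbitrary boxes.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Polyhedron:
    A: np.ndarray  # constraint matrix
    b: np.ndarray  # constraint vector; the polyhedron is {x : A x <= b}

    def contains(self, x, tol=1e-9):
        return bool(np.all(self.A @ x <= self.b + tol))

# input set U = U_1 ∪ U_2, each U_s a box written in H-representation
U = [
    Polyhedron(A=np.vstack([np.eye(2), -np.eye(2)]), b=np.array([1.0, 1.0, 0.0, 0.0])),   # [0,1]^2
    Polyhedron(A=np.vstack([np.eye(2), -np.eye(2)]), b=np.array([3.0, 3.0, -2.0, -2.0])), # [2,3]^2
]

def in_input_set(x):
    return any(P.contains(x) for P in U)

print(in_input_set(np.array([0.5, 0.5])), in_input_set(np.array([1.5, 1.5])))  # True False
```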

With respect to input set (7), the reachable set computation problem for neural network (2) with ReLU activations is given below.

Problem 1

Given an input set $\mathcal{U}$ and a neural network in the form of (2) with ReLU activations described by (5), how can the reachable set $\mathcal{Y}$ defined by (6) be computed?

Then, we will focus on the safety verification for neural networks. The safety specification for the output is expressed by a set $\mathcal{S}$ defined in the output space, describing the safety requirement. For example, in accordance with the input set, the safety region can also be considered as a union of polyhedra defined in the output space, $\mathcal{S} = \bigcup_{p=1}^{M} \mathcal{S}_p$, where the $\mathcal{S}_p$, $p = 1, \ldots, M$, are given by

$\mathcal{S}_p = \left\{ y^{[L]} \in \mathbb{R}^{n^{[L]}} : C_p y^{[L]} \le d_p \right\}. \qquad (8)$

The safety region in the form of (8) formalizes the safety requirements for the output $y^{[L]}$. If the output always belongs to the safety region $\mathcal{S}$, we say the neural network is safe; otherwise, it is called unsafe.

Definition 2

Given a neural network in the form of (2) and a safety region $\mathcal{S}$, the MLP is safe if and only if the following condition is satisfied:

$\mathcal{Y} \cap \neg\mathcal{S} = \emptyset \qquad (9)$

where $\neg$ is the symbol for logical negation and $\mathcal{Y}$ is the output reachable set of the MLP defined by (6).

Therefore, the safety verification problem for an MLP with ReLU activations can be stated as follows.

Problem 2

Given an input set $\mathcal{U}$ described by (7), a safety specification $\mathcal{S}$ described by (8), and a neural network in the form of (2) with ReLU activations described by (5), how can one check whether condition (9) is satisfied?

The above two linked problems are the main concerns to be addressed in the rest of this paper. The crucial step is to find an efficient way to compute the output reachable set for a ReLU neural network with a given input set. In the next sections, the main results will be presented for the output reachable set computation and safety verification for ReLU neural networks.
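Once the output reachable set is available as a union of polyhedra, checking condition (9) reduces to emptiness tests of polyhedron intersections, each of which is a linear feasibility problem. The sketch below is an illustrative helper, not the paper's implementation; it assumes that both the reachable pieces and the unsafe set $\neg\mathcal{S}$ have been decomposed into half-space (H-) representations, and it uses scipy's LP solver for the feasibility checks.

```python
import numpy as np
from scipy.optimize import linprog

def polyhedra_intersect(A1, b1, A2, b2):
    """True iff {y : A1 y <= b1} and {y : A2 y <= b2} have a common point,
    decided by the feasibility LP: minimize 0 s.t. both constraint sets hold."""
    A = np.vstack([A1, A2])
    b = np.concatenate([b1, b2])
    n = A.shape[1]
    res = linprog(np.zeros(n), A_ub=A, b_ub=b, bounds=[(None, None)] * n)
    return res.status == 0  # 0 = feasible, 2 = infeasible

def is_safe(reach_pieces, unsafe_pieces):
    """Condition (9): safe iff no reachable polyhedron meets any unsafe polyhedron.
    Both arguments are lists of (A, b) pairs in half-space representation."""
    return not any(polyhedra_intersect(Ar, br, Au, bu)
                   for (Ar, br) in reach_pieces
                   for (Au, bu) in unsafe_pieces)
```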

3 Output Reach Set Computation of ReLU Functions

In this section, we consider the output reachable set of a single ReLU function $y = \max(0, x)$, $x \in \mathbb{R}^n$, with an input set $\mathcal{X}$. Before presenting the result, an indicator vector $m \in \{0, 1\}^n$ is introduced for the following derivation. In the indicator vector $m$, the element $m(j)$, $j = 1, \ldots, n$, is valuated as below:

$m(j) = \begin{cases} 1, & x_j > 0 \\ 0, & x_j \le 0 \end{cases}$

Considering all the valuations of the $m(j)$ in $\{0, 1\}$, there are $2^n$ possible valuations in total, which are indexed as $m_1, \ldots, m_{2^n}$.

Furthermore, each indicator vector from $m_1$ to $m_{2^n}$ is diagonalized and denoted as $M_i = \mathrm{diag}(m_i)$, $i = 1, \ldots, 2^n$.

Now, we are ready to compute the output reachable set of the ReLU function $y = \max(0, x)$ with an input $x \in \mathcal{X}$. For the input set, we have the three cases listed below:

  • Case 1: All the elements of the input are positive, that is $x_j > 0$, $j = 1, \ldots, n$. According to the definition of the ReLU function, the output equals the input, so the corresponding output set is

    $\mathcal{Y}_1 = \left\{ x \in \mathcal{X} : x_j > 0,\ j = 1, \ldots, n \right\}. \qquad (10)$

  • Case 2: All the elements of the input are non-positive, which means $x_j \le 0$, $j = 1, \ldots, n$. By the definition of ReLU, it directly leads to

    $\mathcal{Y}_2 = \{0\} \qquad (11)$

    provided $\mathcal{X}$ contains such an input; otherwise $\mathcal{Y}_2 = \emptyset$.

  • Case 3: The input has both positive and non-positive elements, which corresponds to the indicator vectors $m_i$ with both zero and one entries. Note that, for each such $m_i$, the element $m_i(j) = 0$ indicates $x_j \le 0$ due to the definition of the indicator vector. With respect to each such $m_i$, and noting $M_i = \mathrm{diag}(m_i)$, we define the set

    $\mathcal{X}_i = \left\{ x \in \mathcal{X} : x_j \le 0 \ \text{if} \ m_i(j) = 0 \right\},$

    where $\mathcal{X}$ is the input set. In a compact form, it can be expressed as

    $\mathcal{X}_i = \left\{ x \in \mathcal{X} : \left(I - M_i\right) x \le 0 \right\}.$

    Due to the ReLU function, when $x_j \le 0$ the corresponding output element is set to $0$, thus the output for $\mathcal{X}_i$ should be $M_i \mathcal{X}_i = \{ M_i x : x \in \mathcal{X}_i \}$.

    Again, due to the ReLU function, the final value should be non-negative, that is $y \ge 0$; thus this additional constraint has to be added to $M_i \mathcal{X}_i$ to obtain

    $\mathcal{Y}_{3,i} = \left\{ y : y = M_i x,\ x \in \mathcal{X}_i \right\} \cap \left\{ y : y \ge 0 \right\}.$

    As a result, the output reachable set for this case is

    $\mathcal{Y}_3 = \bigcup\nolimits_{i} \mathcal{Y}_{3,i} \qquad (12)$

    where the union is taken over all indicator vectors $m_i$ with both zero and one entries.

An illustration of the above three cases with a two-dimensional input space is shown in Figure 1. In Figure 1, (a) corresponds to Case 1, (b) to Case 2, and (c) and (d) to Case 3. Summarizing the three cases for a ReLU function, the following result can be obtained.

Figure 1: Visualization of the ReLU function $y = \max(0, x)$, $x \in \mathbb{R}^2$. (a) corresponds to Case 1, where the red area is the output set; (b) corresponds to Case 2, where the red spot denotes the origin; (c) and (d) correspond to Case 3, where the red lines on the axes are the output sets. The resulting output set in (d) is non-convex but is expressed by a union of polyhedra.
Theorem 1

Given a ReLU function $y = \max(0, x)$ with an input set $\mathcal{X}$, its output reachable set is

$\mathcal{Y} = \mathcal{Y}_1 \cup \mathcal{Y}_2 \cup \mathcal{Y}_3 \qquad (13)$

where $\mathcal{Y}_1$, $\mathcal{Y}_2$ and $\mathcal{Y}_3$ are defined by (10), (11) and (12), respectively.

Proof. The proof can be straightforwardly obtained from the derivation of the above three cases, which completely characterize the behaviors of the ReLU function: every input either has all elements positive, all elements non-positive, or elements of both kinds. For inputs with all elements positive, the ReLU function returns the input unchanged, which leads to $\mathcal{Y}_1$. For inputs with all elements non-positive, the output is the zero vector, giving $\mathcal{Y}_2$. For the remaining inputs, the output reachable set is $\mathcal{Y}_3$. Thus, the output reachable set is the union of the output sets of the three cases, that is $\mathcal{Y} = \mathcal{Y}_1 \cup \mathcal{Y}_2 \cup \mathcal{Y}_3$.

Theorem 1 gives a general result for the output reachable set of a ReLU function, since no restriction is imposed on the input set $\mathcal{X}$. In the following, we consider the input set as a union of polyhedra described by $\mathcal{X} = \bigcup_{s=1}^{N} \mathcal{H}_s$, where the $\mathcal{H}_s$, $s = 1, \ldots, N$, are given as

$\mathcal{H}_s = \left\{ x \in \mathbb{R}^n : A_s x \le b_s \right\}. \qquad (14)$

Based on Theorem 1, the following result can be derived for input sets described by a union of polyhedra.

Theorem 2

Given a ReLU function $y = \max(0, x)$ with an input set $\mathcal{X} = \bigcup_{s=1}^{N} \mathcal{H}_s$ in which each $\mathcal{H}_s$, $s = 1, \ldots, N$, is defined by (14), its output reachable set is

$\mathcal{Y} = \bigcup_{s=1}^{N} \left( \mathcal{Y}_{1,s} \cup \mathcal{Y}_{2,s} \cup \mathcal{Y}_{3,s} \right) \qquad (15)$

where $\mathcal{Y}_{1,s}$, $\mathcal{Y}_{2,s}$, $\mathcal{Y}_{3,s}$, $s = 1, \ldots, N$, are as follows:

$\mathcal{Y}_{1,s} = \left\{ x : A_s x \le b_s,\ x \ge 0 \right\},$

$\mathcal{Y}_{2,s} = \{0\}$ if $\left\{ x : A_s x \le b_s,\ x \le 0 \right\} \ne \emptyset$, and $\mathcal{Y}_{2,s} = \emptyset$ otherwise,

$\mathcal{Y}_{3,s} = \bigcup\nolimits_{i} \left( \left\{ M_i x : A_s x \le b_s,\ \left(I - M_i\right) x \le 0 \right\} \cap \left\{ y : y \ge 0 \right\} \right),$

with $M_i = \mathrm{diag}(m_i)$ and the union in $\mathcal{Y}_{3,s}$ taken over all indicator vectors $m_i$ with both zero and one entries.

Proof. First, consider the inputs in $\mathcal{H}_s$ whose elements are all positive (Case 1). For such inputs the output equals the input, thus the output set is $\mathcal{H}_s \cap \{x : x \ge 0\}$, which is

$\mathcal{Y}_{1,s} = \left\{ x : A_s x \le b_s,\ x \ge 0 \right\}.$

Then, consider the inputs whose elements are all non-positive (Case 2); the set of such inputs can be defined by

$\left\{ x : A_s x \le b_s,\ x \le 0 \right\}.$

According to the definition of the ReLU function, it directly follows that the corresponding output set is $\mathcal{Y}_{2,s} = \{0\}$ whenever this set is non-empty, and $\mathcal{Y}_{2,s} = \emptyset$ otherwise.

Finally, we consider the inputs with both positive and non-positive elements (Case 3), which correspond to the indicator vectors $m_i$ with both zero and one entries. The inputs associated with $m_i$ can be expressed by

$\mathcal{H}_{s,i} = \left\{ x : A_s x \le b_s,\ \left(I - M_i\right) x \le 0 \right\},$

where $M_i = \mathrm{diag}(m_i)$.

Adding the additional constraint that the output is non-negative, the set $\mathcal{Y}_{3,s,i}$ is expressed as

$\mathcal{Y}_{3,s,i} = \left\{ M_i x : x \in \mathcal{H}_{s,i} \right\} \cap \left\{ y : y \ge 0 \right\}.$

Taking the union over all such indicator vectors, it can be obtained that

$\mathcal{Y}_{3,s} = \bigcup\nolimits_{i} \mathcal{Y}_{3,s,i}.$

Thus, based on Theorem 1, the output reachable set for input set $\mathcal{H}_s$ is $\mathcal{Y}_{1,s} \cup \mathcal{Y}_{2,s} \cup \mathcal{Y}_{3,s}$. Moreover, for the input set $\mathcal{X} = \bigcup_{s=1}^{N} \mathcal{H}_s$, the output reachable set is the union of these sets over $s = 1, \ldots, N$, which implies that (15) holds.

According to the result in Theorem 2, the following useful corollary can be derived before ending this section.

Corollary 1

Given a ReLU function $y = \max(0, x)$, if the input set $\mathcal{X}$ is a union of polyhedra, then the output reachable set $\mathcal{Y}$ is also a union of polyhedra.

Proof. By Theorem 2, the sets $\mathcal{Y}_{1,s}$, $\mathcal{Y}_{2,s}$ and $\mathcal{Y}_{3,s,i}$ are all polyhedra when the input set $\mathcal{X}$ is a union of polyhedra, thus $\mathcal{Y}$ is a union of polyhedra. The proof is complete.
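One straightforward way to compute such a union in practice, for a single polyhedron $\{x : A x \le b\}$, is to enumerate all $2^n$ sign patterns of the input coordinates, restrict the polyhedron to each pattern's cell, discard empty cells via a feasibility LP, and keep each surviving output piece in image form $\{M_i x : x \in \text{cell}\}$; constraining both the active and inactive coordinates makes the extra non-negativity constraint of Case 3 unnecessary. The sketch below is a variant of the construction in Theorem 2, kept in image form rather than projected to an explicit half-space description (the paper performs such operations with polyhedron manipulation tools); the function names and representation are implementation choices, and the exponential enumeration is only practical for small $n$.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def feasible(A, b):
    """Check whether the polyhedron {x : A x <= b} is non-empty via an LP."""
    n = A.shape[1]
    res = linprog(np.zeros(n), A_ub=A, b_ub=b, bounds=[(None, None)] * n)
    return res.status == 0  # 0 = feasible, 2 = infeasible


def relu_reach_pieces(A, b):
    """Reachable set of x -> max(0, x) over {x : A x <= b}, as a list of
    pieces (M, Ac, bc); each piece is the polyhedron {M x : Ac x <= bc}."""
    n = A.shape[1]
    pieces = []
    for pattern in itertools.product([0, 1], repeat=n):
        M = np.diag(np.array(pattern, dtype=float))
        # sign constraints of the cell: x_j <= 0 if inactive, -x_j <= 0 if active
        S = np.diag([1.0 if p == 0 else -1.0 for p in pattern])
        Ac = np.vstack([A, S])
        bc = np.concatenate([b, np.zeros(n)])
        if feasible(Ac, bc):
            pieces.append((M, Ac, bc))
    return pieces


# example: the box [-1, 1]^2 as input set
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
print(len(relu_reach_pieces(A, b)))  # 4 feasible sign patterns for this box
```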

Theorems 1 and 2 present the output reach set of a ReLU function. Following those results for ReLU functions, the reachable set for neural networks composed of ReLU activations will be studied in the next section.

4 Reach Set Computation and Verification for ReLU Neural Networks

Based on the reachability analysis for ReLU activation functions in the previous section, we are ready for the main problems of this paper, the reachable set computation and verification problems for ReLU neural networks. First, a ReLU neural network can be expressed recursively in the layer-by-layer form

$y^{[l]} = \max\left(0, W^{[l]} y^{[l-1]} + \theta^{[l]}\right), \quad l = 1, \ldots, L, \qquad y^{[0]} \in \mathcal{U}, \qquad (16)$

where $\mathcal{U}$ is the input set defined by a union of polyhedra as in (7). The input set and the output set of layer $l$ are denoted as $\mathcal{U}^{[l]}$ and $\mathcal{Y}^{[l]}$, respectively.

Lemma 1

Consider neural network (16) with input set $\mathcal{U}$ defined by the union of polyhedra (7); the output sets $\mathcal{Y}^{[l]}$ of each layer $l$, $l = 1, \ldots, L$, are all defined by a union of polyhedra.

Proof. Consider the input set $\mathcal{U}^{[l]}$ of layer $l$, which is assumed to be a union of polyhedra. The following set, which is an affine map of $\mathcal{U}^{[l]}$, is then also a union of polyhedra:

$\hat{\mathcal{U}}^{[l]} = \left\{ W^{[l]} u + \theta^{[l]} : u \in \mathcal{U}^{[l]} \right\}.$

Then, using Corollary 1 and $y^{[l]} = \max\left(0, W^{[l]} u + \theta^{[l]}\right)$, the output reachable set $\mathcal{Y}^{[l]}$ is a union of polyhedra.

Also from (16), the input set of the next layer is a union of polyhedra, because we have $\mathcal{U}^{[l+1]} = \mathcal{Y}^{[l]}$.

Since the input set $\mathcal{U}^{[1]} = \mathcal{U}$ is a union of polyhedra by (7), the above procedure can be iterated from $l = 1$ to $l = L$ to claim that $\mathcal{Y}^{[l]}$, $l = 1, \ldots, L$, are all defined by a union of polyhedra.

By Lemma 1, the output set of each layer is defined as a union of polyhedra, and due to $\mathcal{U}^{[l+1]} = \mathcal{Y}^{[l]}$, the input set of each layer $l$ can be represented as $\mathcal{U}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{H}_s^{[l]}$, in which $\mathcal{H}_s^{[l]}$ is

$\mathcal{H}_s^{[l]} = \left\{ u \in \mathbb{R}^{n^{[l-1]}} : A_s^{[l]} u \le b_s^{[l]} \right\}. \qquad (17)$

With regard to the input set of layer $l$ described by (17), the output reachable set of the layer can be obtained by the following theorem, which is the main result of this paper.

Theorem 3

Consider layer $l$ of the ReLU neural network (16) with input set $\mathcal{U}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{H}_s^{[l]}$ defined by (17); the output reachable set of layer $l$ is

$\mathcal{Y}^{[l]} = \bigcup_{s=1}^{N_l} \left( \mathcal{Y}_{1,s}^{[l]} \cup \mathcal{Y}_{2,s}^{[l]} \cup \mathcal{Y}_{3,s}^{[l]} \right) \qquad (18)$

where

$\mathcal{Y}_{1,s}^{[l]} = \left\{ W^{[l]} u + \theta^{[l]} : A_s^{[l]} u \le b_s^{[l]},\ W^{[l]} u + \theta^{[l]} \ge 0 \right\},$

$\mathcal{Y}_{2,s}^{[l]} = \{0\}$ if $\left\{ u : A_s^{[l]} u \le b_s^{[l]},\ W^{[l]} u + \theta^{[l]} \le 0 \right\} \ne \emptyset$, and $\mathcal{Y}_{2,s}^{[l]} = \emptyset$ otherwise,

$\mathcal{Y}_{3,s}^{[l]} = \bigcup\nolimits_{i} \left( \left\{ M_i\left(W^{[l]} u + \theta^{[l]}\right) : A_s^{[l]} u \le b_s^{[l]},\ \left(I - M_i\right)\left(W^{[l]} u + \theta^{[l]}\right) \le 0 \right\} \cap \left\{ y : y \ge 0 \right\} \right),$

with $M_i = \mathrm{diag}(m_i)$ and the union in $\mathcal{Y}_{3,s}^{[l]}$ taken over all indicator vectors $m_i$ with both zero and one entries.

Proof. The proof is briefly presented below since it follows the same line as the proof of Theorem 2, with the pre-activation $W^{[l]} u + \theta^{[l]}$ playing the role of the input $x$.

When all elements of the pre-activation are positive, combining $W^{[l]} u + \theta^{[l]} \ge 0$ with $u \in \mathcal{H}_s^{[l]}$ defines the constraint set appearing in $\mathcal{Y}_{1,s}^{[l]}$; moreover, since the ReLU output equals the pre-activation in this case, the output reachable set for these inputs is $\mathcal{Y}_{1,s}^{[l]}$.

Similarly, if all elements of the pre-activation are non-positive, the corresponding inputs form the set $\left\{ u : A_s^{[l]} u \le b_s^{[l]},\ W^{[l]} u + \theta^{[l]} \le 0 \right\}$, and applying the ReLU function gives the output set $\mathcal{Y}_{2,s}^{[l]}$.

Lastly, we consider the case that the pre-activation has both positive and non-positive elements, corresponding to the indicator vectors $m_i$ with both zero and one entries; the associated inputs are expressed by

$\mathcal{H}_{s,i}^{[l]} = \left\{ u : A_s^{[l]} u \le b_s^{[l]},\ \left(I - M_i\right)\left(W^{[l]} u + \theta^{[l]}\right) \le 0 \right\},$

where $M_i = \mathrm{diag}(m_i)$. Furthermore, due to the non-negativity of the ReLU output, an additional constraint $y \ge 0$ should be added to the image $\left\{ M_i\left(W^{[l]} u + \theta^{[l]}\right) : u \in \mathcal{H}_{s,i}^{[l]} \right\}$ to obtain $\mathcal{Y}_{3,s,i}^{[l]}$, and $\mathcal{Y}_{3,s}^{[l]}$ is the union of these sets over all such indicator vectors.

Then, following the lines of Theorem 2, the output reachable set of the form (18) can be established.
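A per-layer version of the earlier enumeration, in the spirit of Theorem 3, imposes the sign constraints on the pre-activation $W u + \theta$ over the layer's input polyhedron and keeps each surviving piece as an affine image of that polyhedron. This is an illustrative sketch with its own representation and naming, not the polyhedron-toolbox computation used in the paper.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def feasible(A, b):
    n = A.shape[1]
    res = linprog(np.zeros(n), A_ub=A, b_ub=b, bounds=[(None, None)] * n)
    return res.status == 0


def relu_layer_reach(W, theta, A, b):
    """Reachable set of y -> max(0, W y + theta) over the input polyhedron
    {y : A y <= b}, as pieces (N, d, Ac, bc) meaning {N y + d : Ac y <= bc}."""
    m = W.shape[0]
    pieces = []
    for pattern in itertools.product([0, 1], repeat=m):
        M = np.diag(np.array(pattern, dtype=float))
        sign = np.array([1.0 if p == 0 else -1.0 for p in pattern])
        # sign[j] * (W_j y + theta_j) <= 0 fixes neuron j inactive (p=0) or active (p=1)
        Ac = np.vstack([A, sign[:, None] * W])
        bc = np.concatenate([b, -sign * theta])
        if feasible(Ac, bc):
            pieces.append((M @ W, M @ theta, Ac, bc))
    return pieces
```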

As for linear activations, which are commonly used in the output layer, the output reachable set can be computed in the same manner as for ReLU layers, without the constraint $y \ge 0$ or the case splitting. The following corollary is given for linear layers.

Corollary 2

Consider a linear layer $y^{[l]} = W^{[l]} u + \theta^{[l]}$ with input set $\mathcal{U}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{H}_s^{[l]}$ defined by (17); the output reachable set of the linear layer is

$\mathcal{Y}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{Y}_s^{[l]} \qquad (19)$

where $\mathcal{Y}_s^{[l]} = \left\{ W^{[l]} u + \theta^{[l]} : A_s^{[l]} u \le b_s^{[l]} \right\}$.

Proof. For an input $u \in \mathcal{H}_s^{[l]}$, the linear relation $y^{[l]} = W^{[l]} u + \theta^{[l]}$ implies that the output reachable set is $\mathcal{Y}_s^{[l]} = \left\{ W^{[l]} u + \theta^{[l]} : u \in \mathcal{H}_s^{[l]} \right\}$. Moreover, because of $\mathcal{U}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{H}_s^{[l]}$, it directly leads to $\mathcal{Y}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{Y}_s^{[l]}$.
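For a linear (output) layer, Corollary 2 requires only an affine map of each piece. If pieces are kept in the illustrative image form (N, d, A, b) of the previous sketch, meaning {N x + d : A x <= b}, the map amounts to composing affine functions, as in the small helper below (again an implementation choice, not the paper's code).

```python
import numpy as np


def linear_layer_reach(W, theta, pieces):
    """Push pieces (N, d, A, b), each meaning {N x + d : A x <= b}, through a
    linear layer y -> W y + theta; the image is {(W N) x + (W d + theta) : A x <= b}."""
    return [(W @ N, W @ d + theta, A, b) for (N, d, A, b) in pieces]
```

With pieces kept in this form, checking intersection with an unsafe polyhedron $\{y : C y \le d_u\}$ amounts to the feasibility of $\{x : A x \le b,\ C(N x + d) \le d_u\}$, which connects back to the verification check of Section 2.2.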

With the output reachable set computation results for both ReLU layers and linear layers, we are now ready to present the overall reachable set computation and safety verification procedures, summarized as the functions OutputReLU, OutputReLUNetwork and VeriReLUNetwork presented in Algorithms 1, 2 and 3, respectively.

1:ReLU layer weight matrix $W^{[l]}$ and bias $\theta^{[l]}$, input set $\mathcal{U}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{H}_s^{[l]}$ with $\mathcal{H}_s^{[l]} = \{u : A_s^{[l]} u \le b_s^{[l]}\}$.
2:Output reachable set $\mathcal{Y}^{[l]}$.
3:function OutputReLU($W^{[l]}, \theta^{[l]}, \mathcal{U}^{[l]}$)
4:     for $s = 1 : N_l$ do
5:         $\mathcal{Y}_{1,s}^{[l]} \leftarrow \{W^{[l]} u + \theta^{[l]} : A_s^{[l]} u \le b_s^{[l]},\ W^{[l]} u + \theta^{[l]} \ge 0\}$
6:         $\mathcal{H}_{s,0}^{[l]} \leftarrow \{u : A_s^{[l]} u \le b_s^{[l]},\ W^{[l]} u + \theta^{[l]} \le 0\}$
7:         $\mathcal{Y}_{3,s}^{[l]} \leftarrow \emptyset$
8:         if $\mathcal{H}_{s,0}^{[l]} \ne \emptyset$ then
9:              $\mathcal{Y}_{2,s}^{[l]} \leftarrow \{0\}$
10:         else
11:              $\mathcal{Y}_{2,s}^{[l]} \leftarrow \emptyset$
12:         end if
13:         for each indicator vector $m_i$ with both zero and one entries do
14:              $M_i \leftarrow \mathrm{diag}(m_i)$
15:              $\mathcal{H}_{s,i}^{[l]} \leftarrow \{u : A_s^{[l]} u \le b_s^{[l]},\ (I - M_i)(W^{[l]} u + \theta^{[l]}) \le 0\}$
16:              $\mathcal{Y}_{3,s,i}^{[l]} \leftarrow \{M_i (W^{[l]} u + \theta^{[l]}) : u \in \mathcal{H}_{s,i}^{[l]}\} \cap \{y : y \ge 0\}$
17:              $\mathcal{Y}_{3,s}^{[l]} \leftarrow \mathcal{Y}_{3,s}^{[l]} \cup \mathcal{Y}_{3,s,i}^{[l]}$
18:         end for
19:         $\mathcal{Y}_{s}^{[l]} \leftarrow \mathcal{Y}_{1,s}^{[l]} \cup \mathcal{Y}_{2,s}^{[l]} \cup \mathcal{Y}_{3,s}^{[l]}$
20:     end for
21:     return $\mathcal{Y}^{[l]} = \bigcup_{s=1}^{N_l} \mathcal{Y}_{s}^{[l]}$
22:end function
Algorithm 1 Output Reach Set Computation for ReLU Layers
1:ReLU neural network weight matrices $W^{[l]}$ and biases $\theta^{[l]}$, $l = 1, \ldots, L$, input set $\mathcal{U} = \bigcup_{s=1}^{N} \mathcal{U}_s$ with $\mathcal{U}_s = \{y^{[0]} : A_s y^{[0]} \le b_s\}$.
2:Output reachable set $\mathcal{Y}$.
3:function OutputReLUNetwork($\{W^{[l]}\}, \{\theta^{[l]}\}, \mathcal{U}$)
4:     $\mathcal{U}^{[1]} \leftarrow \mathcal{U}$
5:     for $l = 1 : L$ do
6:         if layer $l$ is a linear layer then
7: