
Safety Verification and Robustness Analysis of Neural Networks via Quadratic Constraints and Semidefinite Programming

by   Mahyar Fazlyab, et al.

Analyzing the robustness of neural networks against norm-bounded uncertainties and adversarial attacks has found many applications ranging from safety verification to robust training. In this paper, we propose a semidefinite programming (SDP) framework for safety verification and robustness analysis of neural networks with general activation functions. Our main idea is to abstract various properties of activation functions (e.g., monotonicity, bounded slope, bounded values, and repetition across layers) with the formalism of quadratic constraints. We then analyze the safety properties of the abstracted network via the S-procedure and semidefinite programming. Compared to other semidefinite relaxations proposed in the literature, our method is less conservative, especially for deep networks, with an order of magnitude reduction in computational complexity. Furthermore, our approach is applicable to any activation functions.





1 Introduction

Neural networks have become increasingly effective at many difficult machine-learning tasks. However, the nonlinear and large-scale nature of neural networks makes them hard to analyze and, therefore, they are mostly used as black-box models without formal guarantees. In particular, neural networks are highly vulnerable to attacks, or more generally, to uncertainty in the input. In the context of image classification, for example, neural networks can easily be deluded into changing their classification labels by slightly perturbing the input image. Indeed, it has been shown that even imperceptible perturbations in the input of state-of-the-art neural networks cause natural images to be misclassified with high probability (Moosavi-Dezfooli et al., 2017). These perturbations can be of an adversarial nature (Szegedy et al., 2013), or they can occur merely due to compression, resizing, and cropping (Zheng et al., 2016). As another example, in the space of malware classification, the existence of adversarial examples not only limits the potential application settings of neural network classifiers but entirely defeats their purpose. These drawbacks limit the adoption of neural networks in safety-critical applications such as self-driving vehicles (Bojarski et al., 2016), aircraft collision avoidance procedures (Julian et al., 2016), speech recognition, and recognition of voice commands; see (Xiang et al., 2018a) for a survey.

Motivated by the serious consequences of the fragility of neural networks to input uncertainties or adversarial attacks, there has been an increasing effort to develop tools that measure or improve the robustness of neural networks. Many results focus on specific adversarial attacks and attempt to harden the network by, for example, crafting hard-to-classify examples (Goodfellow et al.; Kurakin et al., 2016; Papernot et al., 2016; Moosavi-Dezfooli et al., 2016). Although these methods are scalable and work well in practice, they still suffer from false negatives. Safety-critical applications require provable robustness against any bounded variation in the input data. As a result, many tools have recently been used, adapted, or developed for this purpose, such as mixed-integer linear programming (Bastani et al., 2016; Lomuscio & Maganti, 2017; Tjeng et al., 2017), robust optimization and duality theory (Kolter & Wong, 2017; Dvijotham et al., 2018), Satisfiability Modulo Theories (SMT) (Pulina & Tacchella, 2012), dynamical systems (Ivanov et al., 2018; Xiang et al., 2018b), abstract interpretation (Mirman et al., 2018), and many others (Hein & Andriushchenko, 2017). All these works aim at bounding the worst-case value of a performance measure when the input is perturbed within a specified range.

In this paper, we develop a semidefinite program (SDP) for safety verification and robustness analysis of neural networks against norm-bounded uncertainties in the input. Our main idea is to abstract the nonlinear activation functions by the constraints they impose on the pre- and post-activation values. In particular, we describe various properties of activation functions using quadratic constraints, such as bounded slope, bounded values, monotonicity, and repetition across layers. Using this abstraction, any property (e.g., safety or robustness) that we can guarantee for the “abstracted” network will automatically be satisfied by the original network as well. The quadratic form of these constraints allows us to formulate the problem as an SDP. Our main tool for developing the SDP is the S-procedure from robust control, which allows us to reason about multiple quadratic constraints. As a notable advantage, we can analyze networks with any combination of activation functions across the layers. In this paper, we focus on a canonical problem (formally stated in Section 2.1) that can be adapted to other closely related problems such as sensitivity analysis with respect to input perturbations, output reachable set estimation, adversarial example generation, and near-duplicate detection.

1.1 Related Work

The performance of certification algorithms for neural networks can be measured along three axes: the tightness of the certification bounds, the computational complexity, and the applicability across various models (e.g., different activation functions). These axes conflict. For instance, the conservatism of an algorithm is typically at odds with its computational complexity, and generalizable algorithms tend to be more conservative. The relative advantage of any of these algorithms is application-specific. For example, reachability analysis and safety verification call for less conservative algorithms, whereas in robust training, computationally fast algorithms are desirable (Weng et al., 2018).

On the one hand, formal verification techniques such as Satisfiability Modulo Theories (SMT) solvers (Ehlers, 2017; Huang et al., 2017; Katz et al., 2017) and integer programming approaches (Lomuscio & Maganti, 2017; Tjeng et al., 2017) rely on combinatorial optimization to provide tight certification bounds for piecewise-linear networks, but their complexity scales exponentially with the size of the network in the worst case. A notable work to improve scalability is (Tjeng et al., 2017), where the authors perform exact verification of piecewise-linear networks using mixed-integer programming, achieving an order of magnitude reduction in computational cost via tight formulations of the nonlinearities and careful preprocessing.

On the other hand, certification algorithms based on continuous optimization are more scalable but less accurate. A notable work in this category is (Kolter & Wong, 2017), where the authors propose a linear-programming (LP) relaxation of piecewise-linear networks and provide upper bounds on the worst-case loss using weak duality. The main advantage of this work is that the proposed algorithm relies solely on forward- and back-propagation operations on a modified network, and is thus easily integrated into existing learning algorithms. In (Raghunathan et al., 2018a), the authors propose an SDP relaxation of one-layer sigmoid-based neural networks based on bounding the worst-case loss with a first-order Taylor expansion. Finally, the closest work to the present one is (Raghunathan et al., 2018b), in which the authors propose a semidefinite relaxation (SDR) for certifying the robustness of piecewise-linear multi-layer neural networks. This technique provides tighter bounds than those of (Kolter & Wong, 2017), although it is less scalable.

Our contribution. The present work, which also relies on an SDP relaxation, has the following features:

  • We use various forms of quadratic constraints (QCs) to abstract any type of activation function.

  • Our method is able to capture the cross-coupling between neurons across different layers, thereby reducing conservatism. This feature becomes particularly effective for deep networks; see Section 6 and Figure 5.

  • We can control the trade-off between computational complexity and conservatism by systematically including or excluding different types of QCs.

  • For one-layer neural networks, the proposed SDP offers an order of magnitude reduction in computational complexity compared to (Raghunathan et al., 2018b) while preserving the accuracy. In particular, there are O(n) decision variables (where n is the total number of neurons), while the SDP of (Raghunathan et al., 2018b) has O(n^2) decision variables.

  • For multi-layer neural networks, the SDP of the present work, with all possible QCs included, is more accurate than that of (Raghunathan et al., 2018b) at the same computational complexity.

The main drawback of our approach (and of all SDP-based approaches) is the limited scalability of SDPs in general. To overcome this issue for deep networks with more than (roughly) five thousand neurons, we propose a modular approach in which we analyze the network layer by layer by solving a sequence of small SDPs, as opposed to a single large one. This approach mitigates the scalability issue but introduces additional conservatism.

1.2 Notation and Preliminaries

We denote the set of real numbers by R, the set of real n-dimensional vectors by R^n, the set of m × n matrices by R^{m×n}, and the n-dimensional identity matrix by I_n. We denote by S^n, S^n_+, and S^n_{++} the sets of n-by-n symmetric, positive semidefinite, and positive definite matrices, respectively. The p-norm (p ≥ 1) is denoted by ||·||_p.

2 Safety and Robustness Analysis

2.1 Problem Statement

Consider a nonlinear vector-valued function f: R^{n_x} → R^{n_y} described by a multi-layer feed-forward neural network. Given a bounded set X ⊂ R^{n_x} of possible inputs (e.g., adversarial examples), the neural network maps X to an output set Y given by

Y = f(X) := {f(x) : x ∈ X}.
The desirable properties that we would like to verify can often be represented by a safety specification set S in the output space of the neural network. In this context, the network is safe if the output set lies within the safe region, i.e., if the inclusion Y ⊆ S holds. In the context of image classification, for example, a popular choice is perturbations in the ℓ∞ norm, i.e., X = {x : ||x − x*||_∞ ≤ ε}, where x* is a correctly classified test image and X is the set of all images obtained by perturbing each pixel of x* by up to ε. Then Y = f(X) is the set of all perturbed logit outputs, and S is the set of all logit values that produce the same label as x*. The condition Y ⊆ S therefore guarantees that the network assigns the same label to all images in X (local robustness).

Checking the condition Y ⊆ S, however, requires an exact computation of the nonconvex set Y, which is very difficult. Instead, our interest is in finding a tight outer approximation Ỹ ⊇ Y and verifying the safety properties by checking the condition Ỹ ⊆ S. This approach never certifies an unsafe network as safe, but it may raise false alarms for safe networks, at a rate that depends on the tightness of the over-approximation; see Figure 1. The goal of this paper is to solve this problem for a broad class of input uncertainties and safety specification sets using semidefinite programming.
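Before attempting certification, a cheap sanity check is to sample the input set and look for counterexamples: a single label change proves the network unsafe, while the absence of one proves nothing, which is precisely the gap a certificate closes. A minimal sketch with a stand-in affine "network" (all names and sizes here are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))     # stand-in "network": logits = W @ x
x_star = rng.standard_normal(4)     # nominal input, assumed correctly classified
eps = 0.05                          # l_inf perturbation radius

label = int(np.argmax(W @ x_star))
falsified = False
for _ in range(1000):
    # Sample a point in the l_inf ball around x_star.
    x = x_star + rng.uniform(-eps, eps, size=x_star.shape)
    if int(np.argmax(W @ x)) != label:
        falsified = True   # a concrete counterexample: provably unsafe
        break
# falsified == False does NOT certify robustness over the whole ball.
```

If sampling finds no counterexample, the question of safety over the entire (uncountable) input set remains open; that is the problem the SDP addresses.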

Figure 1: The output set (in blue), the boundary of its over-approximation (in red), and the hyperplane characterizing the safe region (in black). Left: the network is deemed safe since the over-approximation lies entirely on the safe side of the hyperplane. Right: the network cannot be certified safe since the over-approximation crosses the hyperplane.

2.2 Neural Network Model

For the model of the neural network, we consider an ℓ-layer feed-forward network described by the following recursive equations:

x^0 = x,  x^{k+1} = φ(W^k x^k + b^k),  k = 0, …, ℓ − 1,  f(x) = W^ℓ x^ℓ + b^ℓ,    (2)

where x^0 = x ∈ X is the input to the network, X is a bounded uncertainty set, and W^k and b^k are the weight matrix and bias vector of the k-th layer. The nonlinear activation function φ (ReLU, sigmoid, tanh, leaky ReLU, etc.) is applied coordinate-wise to the pre-activation vectors, i.e., it is of the form

φ(z) = [ϕ(z_1) ⋯ ϕ(z_d)]^T,    (3)

where ϕ is the activation function of each individual neuron. The interpretation of the output f(x) depends on the specific application. For example, in image classification with cross-entropy loss, f(x) represents the logit input to the softmax function; in feedback control, x is the input to the neural network controller (e.g., the tracking error) and f(x) is the control input to the plant.
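To make the recursion concrete, here is a minimal numpy sketch of the forward map; the dimensions, weights, and ReLU choice are illustrative only, not from the paper.

```python
import numpy as np

def relu(z):
    """Coordinate-wise ReLU, phi(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

def forward(x, weights, biases, phi=relu):
    """Evaluate f(x) for x^{k+1} = phi(W^k x^k + b^k), with a final
    affine layer f(x) = W^l x^l + b^l (no activation on the output)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = phi(W @ x + b)
    return weights[-1] @ x + biases[-1]

# A small random network with two hidden layers (illustrative sizes).
rng = np.random.default_rng(0)
dims = [2, 10, 10, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [rng.standard_normal(m) for m in dims[1:]]
y = forward(np.zeros(2), weights, biases)
```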

3 Problem Abstraction via Quadratic Constraints

In this section, our goal is to provide an abstraction of the verification problem that can be converted into a semidefinite program. Our main tool is quadratic constraints (QCs), which were first developed in the context of robust control (Megretski & Rantzer, 1997) for describing nonlinear, time-varying, or uncertain components of a system; we adapt them here for our purposes. We start with the abstraction of the uncertainty set using QCs.

3.1 Input uncertainty

We now provide a particular way of representing the input set X that will prove useful for developing the SDP.

Definition (QC for input set)

Let X ⊆ R^d be a nonempty set, and let P_X be the set of all symmetric matrices P such that

[x; 1]^T P [x; 1] ≥ 0  for all x ∈ X.    (4)

We then say that X satisfies the QC defined by P_X.

Note that, by definition, P_X is a convex cone. Furthermore, we can write

X ⊆ ⋂_{P ∈ P_X} {x ∈ R^d : [x; 1]^T P [x; 1] ≥ 0}.    (5)

In other words, we can over-approximate X by expressing it as a possibly infinite intersection of sets defined by quadratic inequalities.

Proposition (QC for hyper-rectangle)

The hyper-rectangle X = {x ∈ R^d : x̲ ≤ x ≤ x̄} satisfies the QC defined by

P_X = { P : P = [ −2Γ, Γ(x̲ + x̄); (x̲ + x̄)^T Γ, −2 x̲^T Γ x̄ ] },    (6)

where Γ = diag(γ_1, …, γ_d) with γ_i ≥ 0 for all i.

Our particular focus in this paper is on perturbations in the ℓ∞ norm, X = {x : ||x − x*||_∞ ≤ ε}, which is a particular class of hyper-rectangles with x̲ = x* − ε1 and x̄ = x* + ε1.
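Numerically, with Γ = diag(γ) and γ ≥ 0, the quadratic form in the proposition reduces to 2 Σ_i γ_i (x̄_i − x_i)(x_i − x̲_i), which is nonnegative on the hyper-rectangle and becomes negative far outside it. A sketch of this check, assuming the standard block form of P spelled out in the code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x_star, eps = rng.standard_normal(n), 0.1
lb, ub = x_star - eps, x_star + eps      # the hyper-rectangle [lb, ub]
gamma = rng.uniform(0.5, 2.0, size=n)    # any nonnegative multipliers

# P = [[-2*Gamma, Gamma (lb+ub)], [(lb+ub)^T Gamma, -2 lb^T Gamma ub]]:
col = (gamma * (lb + ub))[:, None]
P = np.block([
    [-2.0 * np.diag(gamma), col],
    [col.T, np.array([[-2.0 * (gamma @ (lb * ub))]])],
])

def qform(x):
    v = np.append(x, 1.0)
    return float(v @ P @ v)   # equals 2 * sum(gamma * (ub - x) * (x - lb))

inside_ok = all(qform(rng.uniform(lb, ub)) >= -1e-9 for _ in range(200))
outside_neg = qform(ub + 1.0) < 0
```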

We can adapt the result of Proposition 3.1 to other uncertainty sets such as polytopes, zonotopes, and ellipsoids. We do not elaborate on these attack models in this paper; we instead assume that the uncertainty set can be abstracted by a quadratic constraint of the form (4). We will see in Section 4 that the matrix P appears as a decision variable in the SDP. In this way, we can optimize the outer approximation of X to minimize the conservatism for the specific verification problem we want to solve.

3.2 Safety Specification Set

In our framework, we consider specification sets that can be represented (or inner-approximated) by the intersection of finitely many quadratic inequalities:

S = ⋂_{i=1}^{m} {y ∈ R^{n_y} : [y; 1]^T S_i [y; 1] ≤ 0},    (7)

where the S_i are given symmetric matrices. This characterization includes ellipsoids and polytopes in particular. For instance, for a safety specification set described by the polytope S = {y : Cy ≤ d}, the S_i are given by

S_i = [ 0, (1/2) c_i; (1/2) c_i^T, −d_i ],  i = 1, …, m,

where c_i^T is the i-th row of C and d_i is the i-th entry of d.
3.3 Abstraction of Nonlinearities by Quadratic Constraints

One of the main difficulties in the analysis of neural networks is the presence of nonlinear activation functions. To simplify the analysis, instead of analyzing the network directly, our main idea is to remove the nonlinear activation functions from the network but retain the constraints they impose on the pre- and post-activation signals. Using this abstraction, any properties (e.g., safety or robustness) that we can guarantee for the constrained network will automatically be satisfied by the original network as well. In the following, we show how we can encode many of the important properties of the activation functions (e.g., monotonicity, bounded slope, and bounded values) using quadratic constraints. We first provide the formal definition below.


Definition (QC for nonlinearities)

Let φ: R^d → R^d, and let Q_φ be the set of all symmetric (possibly indefinite) matrices Q such that the inequality

[x; φ(x); 1]^T Q [x; φ(x); 1] ≥ 0    (8)

holds for all x ∈ R^d. Then we say that φ satisfies the QC defined by Q_φ.

We remark that our definition of a quadratic constraint differs slightly from the one used in robust control (Megretski & Rantzer, 1997): by including the constant 1 in the vector surrounding the matrix Q, we can incorporate affine constraints (e.g., bounded nonlinearities).

The derivation of quadratic constraints is function-specific, but there are certain rules that apply to all of them, which we describe below.

3.3.1 Slope-restricted Nonlinearities

Consider a nonlinear function φ: R^d → R^d. We say that φ is sector-bounded in the sector [α, β] (α ≤ β) if the following condition holds for all x:

(φ(x) − αx)^T (βx − φ(x)) ≥ 0.    (9)

Intuitively (and for the one-dimensional case d = 1), this inequality means that the function lies in the sector formed by the lines φ(x) = αx and φ(x) = βx. As an example, the ReLU function belongs to the sector [0, 1]. The sector condition, however, does not impose any restriction on the slope of the function. This motivates a more accurate description of nonlinearities that have bounded slope.

Definition (slope-restricted nonlinearity)

A nonlinear function φ: R^d → R^d is slope-restricted on [α, β] (α ≤ β) if

(φ(y) − φ(x) − α(y − x))^T (β(y − x) − (φ(y) − φ(x))) ≥ 0    (10)

for any two input-output pairs (x, φ(x)) and (y, φ(y)).

For the one-dimensional case (d = 1), the slope-restriction condition in (10) states that the chord connecting any two points on the curve of the function has a slope that is at least α and at most β:

α ≤ (φ(y) − φ(x)) / (y − x) ≤ β.    (11)

Comparing (9) and (10), we see that the sector-bound condition is a special case of the slope-restriction condition obtained by fixing one of the pairs at (0, 0). As a result, a slope-restricted nonlinearity with φ(0) = 0 is also sector-bounded; see Figure 2 for an illustration.
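The scalar slope-restriction inequality (φ(y) − φ(x) − α(y − x))(β(y − x) − (φ(y) − φ(x))) ≥ 0 is easy to spot-check by sampling; a short sketch (sampling is a sanity check, not a proof):

```python
import numpy as np

def slope_qc(phi, x, y, alpha, beta):
    """Left-hand side of the scalar slope-restriction inequality."""
    d_in, d_out = y - x, phi(y) - phi(x)
    return (d_out - alpha * d_in) * (beta * d_in - d_out)

rng = np.random.default_rng(0)
xs, ys = rng.uniform(-5, 5, 10_000), rng.uniform(-5, 5, 10_000)
relu = lambda z: np.maximum(z, 0.0)

# tanh and ReLU are slope-restricted on [0, 1] ...
tanh_ok = bool(np.all(slope_qc(np.tanh, xs, ys, 0.0, 1.0) >= -1e-12))
relu_ok = bool(np.all(slope_qc(relu, xs, ys, 0.0, 1.0) >= -1e-12))
# ... while a sector that is too narrow (beta = 0.5 for ReLU) is violated.
narrow_fails = bool(np.any(slope_qc(relu, xs, ys, 0.0, 0.5) < 0))
```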

In the context of neural networks, our interest is in repeated nonlinearities of the form (3). Furthermore, the activation values might be bounded from below or above (e.g., the ReLU function, which outputs nonnegative values). The quadratic inequality in (10) is too conservative, as it does not capture these properties. In the following, we discuss QCs for these properties.

Figure 2: A slope-restricted nonlinearity (left) and a sector-bounded nonlinearity (right).

3.3.2 Repeated Nonlinearities

Suppose ϕ: R → R is slope-restricted on [α, β], and let φ(x) = [ϕ(x_1) ⋯ ϕ(x_d)]^T be the vector-valued function constructed by component-wise repetition of ϕ. It is not hard to verify that φ is also slope-restricted on [α, β]. However, this representation simply ignores the fact that all the nonlinearities composing φ are identical. By taking advantage of this structure, we can refine the quadratic constraint that describes φ. Specifically, for an input-output pair (x, φ(x)), we can write the inequality

(ϕ(x_i) − ϕ(x_j) − α(x_i − x_j)) (β(x_i − x_j) − (ϕ(x_i) − ϕ(x_j))) ≥ 0    (12)

for all 1 ≤ i < j ≤ d. This particular QC considerably reduces conservatism, especially for deep networks, as it reasons about the coupling between all the neurons of the network. By analogy to dynamical systems, we can interpret the neural network as a time-varying discrete-time dynamical system in which the same nonlinearity is repeated for all time indices (here, the layer number). The QC in (12) then couples all pairs of neurons.
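Because the same scalar nonlinearity acts on every coordinate, the pairwise inequality holds across all coordinate pairs of a single stacked pre-activation vector, and any nonnegative combination of these terms is again a valid QC. A numeric sketch for tanh with (α, β) = (0, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)   # stacked pre-activations of all neurons
y = np.tanh(x)                    # the same scalar nonlinearity, repeated

# Pairwise slope QC across all coordinate pairs (i, j):
dx = x[:, None] - x[None, :]
dy = y[:, None] - y[None, :]
pairwise = dy * (dx - dy)         # (alpha, beta) = (0, 1)
all_pairs_ok = bool(np.all(pairwise >= -1e-12))
```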

Lemma (repeated nonlinearities)

Suppose ϕ: R → R is slope-restricted on [α, β]. Then the vector-valued function φ(x) = [ϕ(x_1) ⋯ ϕ(x_d)]^T satisfies

[x; φ(x)]^T [ −2αβ T, (α + β) T; (α + β) T, −2T ] [x; φ(x)] ≥ 0  for all x ∈ R^d,    (13)

where T = Σ_{1 ≤ i < j ≤ d} λ_{ij} (e_i − e_j)(e_i − e_j)^T, λ_{ij} ≥ 0, and e_i is the i-th unit vector in R^d.

There are several results in the literature on repeated nonlinearities. For instance, in (D’amato et al., 2001; Kulkarni & Safonov, 2002), the authors derive QCs for repeated and odd nonlinearities (e.g., the tanh function).

3.3.3 Bounded Nonlinearities

Finally, suppose the values of the nonlinear function are bounded, i.e., φ̲ ≤ φ(x) ≤ φ̄ for all x. This bound is equivalent to

(φ(x) − φ̲)^T Γ (φ̄ − φ(x)) ≥ 0,  Γ = diag(γ_1, …, γ_d),  γ_i ≥ 0.    (14)

We can write a similar inequality when the pre-activation values are known to be bounded.

We observe that the inequalities (12)-(14) are all quadratic in the input and output of the nonlinearity, and can therefore be encapsulated into QCs of the form (8). As we show in Section 4, the matrix Q that abstracts the nonlinearities appears as a decision variable in the SDP.

3.4 Quadratic Constraints for Activation Functions

To connect the results of the previous two subsections to activation functions in neural networks, we recall the following result from (Heath & Wills, 2005).

Lemma (gradient of convex functions)

Consider a function g: R → R that is α-strongly convex and β-smooth. Then its derivative g′ is slope-restricted on [α, β].

Notably, all activation functions used in deep neural networks are gradients of convex functions. They therefore belong to the class of slope-restricted nonlinearities, according to Lemma 3.4. We have the following result.


The following statements hold true.

  • The ReLU function, ϕ(x) = max(x, 0), is slope-restricted and sector-bounded in the sector [0, 1].

  • The sigmoid function, ϕ(x) = 1/(1 + e^{−x}), is slope-restricted in the sector [0, 1/4].

  • The tanh function, ϕ(x) = tanh(x), is slope-restricted and sector-bounded in the sector [0, 1].

  • The leaky ReLU function, ϕ(x) = max(ax, x) with 0 < a < 1, is slope-restricted and sector-bounded in the sector [a, 1].

  • The exponential linear unit (ELU), ϕ(x) = x for x ≥ 0 and ϕ(x) = a(e^x − 1) for x < 0, with 0 < a ≤ 1, is slope-restricted and sector-bounded in the sector [0, 1].

  • The softmax function, the gradient of the log-sum-exp function, is slope-restricted in the sector [0, 1].


It is easy to show that each of the activation functions mentioned above is the gradient of a convex function.
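The stated sectors can be sanity-checked by sampling the scalar slope-restriction inequality for each activation; the sector values below match the proposition, and the check is numerical, not a proof:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(-10, 10, 10_000), rng.uniform(-10, 10, 10_000)

def in_sector(phi, alpha, beta):
    """Check the slope-restriction inequality on sampled pairs."""
    dx, dphi = y - x, phi(y) - phi(x)
    return bool(np.all((dphi - alpha * dx) * (beta * dx - dphi) >= -1e-12))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
leaky = lambda z, a=0.1: np.maximum(a * z, z)
elu = lambda z, a=1.0: np.where(z >= 0, z, a * (np.exp(np.minimum(z, 0.0)) - 1.0))

checks = {
    "relu on [0, 1]": in_sector(lambda z: np.maximum(z, 0.0), 0.0, 1.0),
    "sigmoid on [0, 1/4]": in_sector(sigmoid, 0.0, 0.25),
    "tanh on [0, 1]": in_sector(np.tanh, 0.0, 1.0),
    "leaky relu (a=0.1) on [a, 1]": in_sector(leaky, 0.1, 1.0),
    "elu (a=1) on [0, 1]": in_sector(elu, 0.0, 1.0),
}
```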

Although the above rules can guide the search for valid QCs for activation functions, a less conservative description requires a case-by-case treatment that further exploits the structure of the nonlinearity. For instance, the ReLU function, y = max(x, 0), lies precisely on the boundary of the sector [0, 1]. Indeed, it can be described exactly by the following constraints (Raghunathan et al., 2018b):

y(y − x) = 0,  y ≥ x,  y ≥ 0.    (15)

The first constraint is the boundary of the sector, and the other two constraints prune this boundary to recover the graph of the ReLU function. In the following lemma, we provide a full QC characterization of the ReLU function.
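The three constraints are easy to confirm numerically for y = max(x, 0); conversely, any pair (x, y) satisfying all three must have y = max(x, 0), since y(y − x) = 0 forces either y = 0 (with x ≤ 0) or y = x (with x ≥ 0):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = np.maximum(x, 0.0)

nonneg = bool(np.all(y >= 0))                        # y >= 0
above_line = bool(np.all(y >= x))                    # y >= x
complementary = bool(np.allclose(y * (y - x), 0.0))  # y (y - x) = 0
```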

Lemma (QC for ReLU function)

The ReLU function, φ(x) = max(x, 0) applied coordinate-wise, satisfies the QC defined by Q_φ, where every Q ∈ Q_φ has the form

Q = [ 0, T, −ν; T, −2T, ν + η; −ν^T, (ν + η)^T, 0 ].

Here ν ≥ 0 and η ≥ 0 are vectors of multipliers, and T is given by

T = Σ_{i=1}^{d} λ_i e_i e_i^T + Σ_{1 ≤ i < j ≤ d} λ_{ij} (e_i − e_j)(e_i − e_j)^T,

with λ_i ∈ R and λ_{ij} ≥ 0.

Deriving non-conservative QCs for the other activation functions is more complicated, as they do not lie on the boundary of any sector. However, by bounding these functions at multiple points with constraints of the form (10), we can obtain a substantially better over-approximation. In Figure 3, we illustrate this idea for the tanh function.

Figure 3: The curve of the tanh function over-approximated by the intersection of three sectors.

4 SDP for One-layer Neural Networks

For simplicity of exposition, we start with the analysis of one-layer neural networks and then extend the results to the multi-layer case in Section 5. In the following theorem, we put all the pieces together to develop an SDP whose feasibility certifies the inclusion Y ⊆ S.

Theorem (SDP for one layer)

Consider a one-layer neural network described by the equations

x^1 = φ(W^0 x^0 + b^0),  f(x^0) = W^1 x^1 + b^1.    (17)

Suppose x^0 ∈ X, where X is bounded and satisfies the quadratic constraint defined by P_X, i.e., for any P ∈ P_X,

[x^0; 1]^T P [x^0; 1] ≥ 0  for all x^0 ∈ X.    (18)

Furthermore, suppose φ satisfies the quadratic constraint defined by Q_φ, i.e., for any Q ∈ Q_φ,

[z; φ(z); 1]^T Q [z; φ(z); 1] ≥ 0  for all z.    (19)

Consider the following matrix inequality:

M_in(P) + M_mid(Q) + M_out(S) ⪯ 0,    (20)

where

M_in(P) = [ I, 0, 0; 0, 0, 1 ]^T P [ I, 0, 0; 0, 0, 1 ],
M_mid(Q) = [ W^0, 0, b^0; 0, I, 0; 0, 0, 1 ]^T Q [ W^0, 0, b^0; 0, I, 0; 0, 0, 1 ],
M_out(S) = [ 0, W^1, b^1; 0, 0, 1 ]^T S [ 0, W^1, b^1; 0, 0, 1 ],

and S is a given symmetric matrix. If (20) is feasible for some P ∈ P_X and Q ∈ Q_φ, then [f(x^0); 1]^T S [f(x^0); 1] ≤ 0 for all x^0 ∈ X.
Theorem 4 states that if the matrix inequality (20) is feasible for some P ∈ P_X and Q ∈ Q_φ, then we can certify that the network is safe with respect to the perturbation set X and the safety specification set S, i.e., Y ⊆ S. Since P_X and Q_φ are both convex, (20) is a linear matrix inequality (LMI) feasibility problem and hence can be solved efficiently via interior-point solvers for convex optimization.

The crux of our idea in the development of Theorem 4 is the S-procedure (Yakubovich, 1997), a technique for reasoning about multiple quadratic constraints that is frequently used in robust control and optimization (Boyd et al., 1994; Ben-Tal et al., 2009).

4.1 Certified Upper Bounds

In Theorem 4, we developed a feasibility problem to assert whether the output set Y is enclosed in the safe set S. In particular, if S is described by the half-space

S = {y : c^T y ≤ b}

with a given c and b, then the feasibility of the LMI in Theorem 4 implies Y ⊆ S, or equivalently,

c^T f(x) ≤ b  for all x ∈ X.

In other words, b is a certified upper bound on the quantity sup_{x ∈ X} c^T f(x). If we now treat b as a decision variable, we can optimize this bound by minimizing b subject to the LMI constraint in (20). This is particularly useful for reachability analysis, where the goal is to over-approximate the output set by a polyhedron of the form

Ỹ = {y : c_i^T y ≤ b_i, i = 1, …, m},

where the directions c_i are given and the goal is to find, for each i, the smallest b_i such that Y ⊆ Ỹ.

5 Multi-layer Neural Networks

In this section, we turn to multi-layer neural networks. Assuming that the same activation function is used across all layers (repetition across layers), we can concatenate all the pre- and post-activation signals and form a more compact representation. To this end, we introduce the stacked vector x = [x^0; x^1; …; x^ℓ], where ℓ is the number of hidden layers. Then we can write (2) compactly as

B x = φ(A x + b̃),  f(x^0) = W^ℓ x^ℓ + b^ℓ,    (24)

where A is the block matrix with W^0, …, W^{ℓ−1} on its block diagonal (and a final zero block column), B selects [x^1; …; x^ℓ] from x, and b̃ = [b^0; …; b^{ℓ−1}]. In the following result, we develop the multi-layer counterpart of Theorem 4 for the model in (24).

Theorem (SDP for multiple layers)

Consider the multi-layer neural network described by (24). Suppose X and φ satisfy the quadratic constraints defined by P_X and Q_φ, respectively, as in (18) and (19). Consider the following LMI:

M_in(P) + M_mid(Q) + M_out(S) ⪯ 0,    (25)

where

M_in(P) = [ E_0, 0; 0, 1 ]^T P [ E_0, 0; 0, 1 ],
M_mid(Q) = [ A, b̃; B, 0; 0, 1 ]^T Q [ A, b̃; B, 0; 0, 1 ],
M_out(S) = [ W^ℓ E_ℓ, b^ℓ; 0, 1 ]^T S [ W^ℓ E_ℓ, b^ℓ; 0, 1 ],

the matrices E_0 and E_ℓ select x^0 and x^ℓ from the stacked vector x, and S is a given symmetric matrix. If (25) is feasible for some P ∈ P_X and Q ∈ Q_φ, then [f(x^0); 1]^T S [f(x^0); 1] ≤ 0 for all x^0 ∈ X.

6 Discussion

In this section, we discuss the numerical aspects of our approach. For solving the SDP, we used MOSEK (ApS, 2017) with CVX (CVX Research, 2012) on a 5-core personal computer with 8GB of RAM. We start off with the computational complexity of the proposed SDP.

6.1 Computational Complexity

Input set. The quadratic constraint that over-approximates the input set is parameterized by O(n_0^2) decision variables, where n_0 is the input dimension. However, if we use a diagonal matrix Γ in (6), we can reduce the number of decision variables to n_0 without significantly increasing conservatism.

Activation functions. For a network with n hidden neurons, using all possible quadratic constraints yields O(n^2) decision variables, the same number as in (Raghunathan et al., 2018b). For the one-layer case, if we ignore repeated nonlinearities, we arrive at O(n) decision variables. In our numerical experiments, we did not observe any additional conservatism after removing the repeated-nonlinearity constraints across the neurons of the same layer. However, accounting for repeated nonlinearities was very effective in the multi-layer case.

Safety specification set. The number of decision variables for the safety specification set depends on how we would like to bound the output set. For instance, for a single hyperplane, we have only one decision variable, while for ellipsoids the number of decision variables grows quadratically with the output dimension.

6.2 Experiments

In Figure 4, we compare the bounds for a network with a varying number of hidden layers and 100 neurons per layer. We observe that the bounds obtained by the SDP remain relatively accurate as a result of including repeated nonlinearities. In the supplementary material, we visualize the over-approximations for different scenarios: in Figure 5, we depict the effect of the number of layers on the quality of the approximation; in Figure 6, we show the effect of the number of hidden neurons for a single-layer network; and in Figure 7, we vary the perturbation size ε.

In Table 1, we report the computation time (CVX overhead included) for a network with one hidden layer and a varying number of neurons. We observe that the SDR of (Raghunathan et al., 2018b) runs out of memory at 1,600 neurons on the computer used for this experiment, whereas the SDP of this paper can handle networks of up to 5,000 neurons with the same memory.

Figure 4: Comparison of the bounds with the ground truth for a network with randomly selected weights and biases and a varying number of hidden layers with 100 neurons per layer. The hyperplane direction c is chosen as the vector of all ones.
Number of neurons   Solve time, SDP (this paper)   Solve time, SDR
200                    3.2                             2.7
400                   11.3                            20.4
800                   78.6                           149.1
1200                 311.2                           799.1
1600                1072.6                             OOM
2000                1249.7                             OOM
3000                3126.5                             OOM

Table 1: Comparison between the solve times of the SDP (this paper) and the SDR of (Raghunathan et al., 2018b) for a varying number of neurons. The SDR approach runs out of memory (OOM) for networks with 1,600 neurons or more.

7 Conclusion

In this paper, we proposed an SDP for the robustness analysis and safety verification of feed-forward fully-connected neural networks with general activation functions. We used quadratic constraints to abstract the various elements of the problem, namely the input uncertainty set, the safety specification set, and the nonlinear activation functions. To reduce conservatism, we developed quadratic constraints that reason about the coupling between neurons throughout the entire network. We focused on ℓ∞-bounded uncertainty sets; however, we can handle any set that can be over-approximated by quadratic constraints, such as ellipsoids, polytopes, and zonotopes.


Appendix A Appendix

A.1 Proof of Proposition 3.1

Note that the inequality x̲_i ≤ x_i ≤ x̄_i implies

γ_i (x̄_i − x_i)(x_i − x̲_i) ≥ 0

for all γ_i ≥ 0. Summing the preceding inequality over all i and denoting Γ = diag(γ_1, …, γ_d), we arrive at the claimed inequality.

A.2 Proof of Lemma 3.3.2

For the pairs (x_i, ϕ(x_i)) and (x_j, ϕ(x_j)), i ≠ j, we can write

(ϕ(x_i) − ϕ(x_j) − α(x_i − x_j)) (β(x_i − x_j) − (ϕ(x_i) − ϕ(x_j))) ≥ 0.    (28)

By substituting x_i − x_j = (e_i − e_j)^T x and ϕ(x_i) − ϕ(x_j) = (e_i − e_j)^T φ(x) into (28), we obtain

[x; φ(x)]^T [ −2αβ T_{ij}, (α + β) T_{ij}; (α + β) T_{ij}, −2 T_{ij} ] [x; φ(x)] ≥ 0,    (29)

where T_{ij} = (e_i − e_j)(e_i − e_j)^T. Multiplying both sides of (29) by λ_{ij} ≥ 0 and summing over all i < j yields the desired result.

A.3 Proof of Lemma 3.4

Consider the equivalence in (15) for the i-th coordinate of the activation function φ(x) = max(x, 0):

y_i (y_i − x_i) = 0,  y_i ≥ x_i,  y_i ≥ 0.

Multiplying these constraints by λ_i ∈ R, ν_i ≥ 0, and η_i ≥ 0, respectively, and adding them together, we obtain

−λ_i y_i (y_i − x_i) + ν_i (y_i − x_i) + η_i y_i ≥ 0.    (31)

Summing over all i and writing the result as a quadratic form in [x; y; 1] yields the diagonal part of the matrix in the lemma. On the other hand, since φ is a repeated nonlinearity that is slope-restricted on [0, 1], it satisfies the inequality

(y_i − y_j) ((x_i − x_j) − (y_i − y_j)) ≥ 0    (33)

for all i ≠ j, according to Lemma 3.3.2. Multiplying (33) by λ_{ij} ≥ 0 and adding it to (31) for all i, j yields the desired QC for the ReLU function.

A.4 Proof of Theorem 4

Using the assumption that X satisfies the QC defined by P_X, we can write, for any P ∈ P_X,

[x^0; 1]^T P [x^0; 1] ≥ 0  for all x^0 ∈ X.

By substituting the identity

[x^0; 1] = [ I, 0, 0; 0, 0, 1 ] [x^0; x^1; 1]

into the preceding inequality, we can write

[x^0; x^1; 1]^T M_in(P) [x^0; x^1; 1] ≥ 0

for all x^0 ∈ X. Next, by assumption, φ satisfies the QC defined by Q_φ. Writing the corresponding quadratic inequality with z = W^0 x^0 + b^0 and x^1 = φ(z), and using the identity

[z; φ(z); 1] = [ W^0, 0, b^0; 0, I, 0; 0, 0, 1 ] [x^0; x^1; 1],

we obtain

[x^0; x^1; 1]^T M_mid(Q) [x^0; x^1; 1] ≥ 0

for all x^0 ∈ X and all Q ∈ Q_φ. Suppose now that (20) holds for some P ∈ P_X and Q ∈ Q_φ. By left- and right-multiplying both sides of (20) by [x^0; x^1; 1]^T and [x^0; x^1; 1], respectively, we obtain that the sum of the three quadratic forms above is nonpositive. Since the first two terms are nonnegative, the last term on the left-hand side must be nonpositive for all x^0 ∈ X, or, equivalently,

[W^1 x^1 + b^1; 1]^T S [W^1 x^1 + b^1; 1] ≤ 0.

Using the relation f(x^0) = W^1 x^1 + b^1, the above inequality reads [f(x^0); 1]^T S [f(x^0); 1] ≤ 0 for all x^0 ∈ X. The proof is now complete.

A.5 Proof of Theorem 5

Since X satisfies the QC defined by P_X, we can write, using the identity E_0 x = x^0,

[x^0; 1]^T P [x^0; 1] ≥ 0  for all x^0 ∈ X.

The preceding inequality is equivalent to

[x; 1]^T M_in(P) [x; 1] ≥ 0

for all x^0 ∈ X. Next, by assumption, φ satisfies the QC defined by Q_φ. The corresponding quadratic inequality can be written as

[x; 1]^T M_mid(Q) [x; 1] ≥ 0

for all x^0 ∈ X and all Q ∈ Q_φ. Suppose (25) holds for some P ∈ P_X and Q ∈ Q_φ. By left- and right-multiplying both sides of (25) by [x; 1]^T and [x; 1], respectively, we obtain that the sum of the three quadratic forms is nonpositive for all x^0 ∈ X. Since the first two terms are nonnegative, the last quadratic term must be nonpositive, from which we can write

[W^ℓ x^ℓ + b^ℓ; 1]^T S [W^ℓ x^ℓ + b^ℓ; 1] ≤ 0

for all x^0 ∈ X. Using the relation f(x^0) = W^ℓ x^ℓ + b^ℓ from (24), the above inequality reads [f(x^0); 1]^T S [f(x^0); 1] ≤ 0. The proof is now complete.

Figure 5: Illustrations of the output set (blue), the polytope obtained from the results of this paper (red), and the polytope obtained by the semidefinite relaxation of (Raghunathan et al., 2018b). The number of neurons per layer is 100, the weights are drawn from a Gaussian distribution, and the input set is an ℓ∞-norm ball. From left to right and top to bottom, the number of hidden layers increases (the activation function is ReLU). We observe that, by including repeated nonlinearities, our method provides tighter bounds.
Figure 6: The effect of the number of hidden neurons on the over-approximation quality of the SDP for a one-layer neural network (from left to right: increasing numbers of hidden neurons). The activation function is ReLU. Quadratic constraints for repeated nonlinearities are not included.
Figure 7: The effect of ε (the radius of the ℓ∞ input set) on the over-approximation quality of the SDP (from left to right: increasing ε). The network architecture is 2-500-2 with ReLU activation functions. Quadratic constraints for repeated nonlinearities are not included.