I Introduction
Neural Networks (NNs) have been very successful in various applications such as end-to-end learning for self-driving cars [4], learning-based controllers in robotics [20], speech recognition, and image classification. Their vulnerability to input uncertainties and adversarial attacks, however, precludes the deployment of neural networks in safety-critical applications. In the context of image classification, for example, it has been shown in several works [28, 18, 21] that adding even an imperceptible perturbation to the input of a neural network-based classifier can completely change its decision. In this context, verification refers to the process of checking whether the output of a trained NN satisfies certain desirable properties when its input is perturbed within an uncertainty model. More precisely, we would like to verify whether the neural network's prediction remains the same in a neighborhood of a given test point. This neighborhood can represent, for example, the set of input examples that can be crafted by an adversary. In worst-case
safety verification, we assume that the input uncertainty is bounded, and we verify a safety property for all possible perturbations within the uncertainty set. This approach has been pursued extensively using various tools, such as mixed-integer linear programming [2, 15, 22], robust optimization and duality theory [13, 8], Satisfiability Modulo Theory (SMT) [19], dynamical systems [12, 27], robust control [9], abstract interpretation [17], and many others [11, 25]. In probabilistic verification, on the other hand, we assume that the input uncertainty is random but potentially unbounded. Random uncertainties can emerge as a result of, for example, data quantization, input preprocessing, and environmental background noise [26]. In contrast to the worst-case approach, only a few works have studied verification of neural networks in probabilistic settings [26, 7, 3]. For random uncertainty models, we ask a related question: "Can we provide statistical guarantees on the output of neural networks when their input is perturbed by random noise?" In this paper, we provide an affirmative answer by addressing two related problems:


Probabilistic Verification:
Given a safe region in the output space of the neural network, our goal is to estimate the probability that the output of the neural network will lie in the safe region when its input is perturbed by a random variable with known mean and covariance.

Confidence propagation: Given a confidence ellipsoid on the input of the neural network, we want to estimate the output confidence ellipsoid.
The rest of the paper is organized as follows. In Section II, we discuss safety verification of neural networks in both deterministic and probabilistic settings. In Section III, we provide an abstraction of neural networks using the formalism of quadratic constraints. In Section IV we develop a convex relaxation to the problem of confidence ellipsoid estimation. In Section V, we present the numerical experiments. Finally, we draw our conclusions in Section VI.
I-A Notation and Preliminaries
We denote the set of real numbers by $\mathbb{R}$, the set of real $n$-dimensional vectors by $\mathbb{R}^n$, the set of $m \times n$-dimensional matrices by $\mathbb{R}^{m \times n}$, and the $n$-dimensional identity matrix by $I_n$. We denote by $\mathbb{S}^n$, $\mathbb{S}^n_{+}$, and $\mathbb{S}^n_{++}$ the sets of $n$-by-$n$ symmetric, positive semidefinite, and positive definite matrices, respectively. We denote ellipsoids in $\mathbb{R}^n$ by
$$\mathcal{E}(\mu, \Sigma) = \{x \in \mathbb{R}^n \mid (x - \mu)^\top \Sigma^{-1} (x - \mu) \le 1\},$$
where $\mu \in \mathbb{R}^n$ is the center of the ellipsoid and $\Sigma \in \mathbb{S}^n_{++}$ determines its orientation and volume. We denote the mean and covariance of a random variable $x$ by $\mu_x$ and $\Sigma_x$, respectively.
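As a concrete illustration of this notation, the ellipsoid membership test $(x - \mu)^\top \Sigma^{-1}(x - \mu) \le 1$ takes only a few lines of numerical code (a sketch; the helper name `in_ellipsoid` and the sample numbers are ours):

```python
import numpy as np

def in_ellipsoid(x, mu, Sigma):
    """Check whether x lies in the ellipsoid E(mu, Sigma)."""
    d = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu) <= 1, via a linear solve instead of an explicit inverse
    return float(d @ np.linalg.solve(Sigma, d)) <= 1.0

mu = np.zeros(2)
Sigma = np.diag([4.0, 1.0])  # axis-aligned ellipse with semi-axes 2 and 1
print(in_ellipsoid(np.array([1.9, 0.0]), mu, Sigma))  # True
print(in_ellipsoid(np.array([0.0, 1.5]), mu, Sigma))  # False
```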
II Safety Verification of Neural Networks
II-A Deterministic Safety Verification
Consider a multi-layer feed-forward fully-connected neural network described by the following equations,
$$x^{k+1} = \phi(W^k x^k + b^k), \quad k = 0, \ldots, \ell - 1, \qquad f(x^0) = W^\ell x^\ell + b^\ell, \tag{1}$$
where $x^0$ is the input to the network, and $W^k$ and $b^k$ are the weight matrix and bias vector of the $k$-th layer. The nonlinear activation function $\phi$ (Rectified Linear Unit (ReLU), sigmoid, tanh, leaky ReLU, etc.) is applied coordinate-wise to the pre-activation vectors, i.e., it is of the form
$$\phi(x) = [\varphi(x_1), \ldots, \varphi(x_d)]^\top, \tag{2}$$
where $\varphi: \mathbb{R} \to \mathbb{R}$ is the activation function of each individual neuron. Although our framework is applicable to all activation functions, we focus our attention on ReLU activation functions, $\varphi(x) = \max(x, 0)$. In deterministic safety verification, we are given a bounded set $\mathcal{X}$ of possible inputs (the uncertainty set), which is mapped by the neural network to the output reachable set $f(\mathcal{X})$. The desirable properties that we would like to verify can often be described by a set $\mathcal{S}$ in the output space of the neural network, which we call the safe region. In this context, the network is safe if $f(\mathcal{X}) \subseteq \mathcal{S}$.
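For concreteness, the recursion (1) with ReLU activations can be sketched as follows (the network sizes and weights below are arbitrary illustrative values, not those used in the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, weights, biases):
    """Evaluate a fully-connected ReLU network: hidden layers apply phi, the output layer is affine."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(W @ x + b)
    return weights[-1] @ x + biases[-1]

# A toy 2-3-2 network with fixed illustrative weights
W0 = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]])
b0 = np.array([0.0, -1.0, 0.5])
W1 = np.array([[1.0, 0.0, 2.0], [0.0, -1.0, 1.0]])
b1 = np.array([0.1, -0.2])
y = forward(np.array([1.0, 1.0]), [W0, W1], [b0, b1])
print(y)  # [ 0.1 -1.7]
```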
II-B Probabilistic Safety Verification
In a deterministic setting, reachability analysis and safety verification is a yes/no problem whose answer does not quantify the proportion of inputs for which safety is violated. Furthermore, if the uncertainty is random and potentially unbounded, the output satisfies the safety constraint only with a certain probability. More precisely, given a safe region $\mathcal{S}$ in the output space of the neural network, we are interested in finding the probability
$$\mathbb{P}\{f(x) \in \mathcal{S}\}$$
that the neural network maps the random input $x$ to the safe region. Since $f$ is a nonlinear function, computing the distribution of $f(x)$ given the distribution of $x$ is prohibitive, except for special cases. As a result, we settle for providing a guaranteed lower bound on the desired probability.
To compute the lower bound, we adopt a geometric approach, in which we verify whether the reachable set of a confidence region of the input lies entirely in the safe set $\mathcal{S}$. We first recall the definition of a confidence region.
Definition 1 (Confidence region)
The level-$p$ ($0 < p < 1$) confidence region of a vector random variable $x$ is defined as any set $\mathcal{E}_p$ for which $\mathbb{P}\{x \in \mathcal{E}_p\} \ge p$.
Although confidence regions can have different representations, our particular focus in this paper is on ellipsoidal confidence regions. Due to their appealing geometric properties (e.g., invariance to affine subspace transformations), ellipsoids are widely used in robust control to compute reachable sets [24, 23, 5].
The next two lemmas characterize confidence ellipsoids for Gaussian random variables and random variables with known first two moments.
Lemma 1
Let $x \sim \mathcal{N}(\mu, \Sigma)$ be an $n$-dimensional Gaussian random variable. Then the level-$p$ confidence region of $x$ is given by the ellipsoid
$$\mathcal{E}_p = \{x \in \mathbb{R}^n \mid (x - \mu)^\top \Sigma^{-1} (x - \mu) \le \chi^2_n(p)\}, \tag{3}$$
where $\chi^2_n(p)$ is the quantile function of the chi-squared distribution with $n$ degrees of freedom. For non-Gaussian random variables, we can use Chebyshev's inequality to characterize confidence ellipsoids, provided that we know the first two moments.
Lemma 2
Let $x$ be an $n$-dimensional random variable with $\mathbb{E}[x] = \mu$ and $\mathrm{Cov}(x) = \Sigma$. Then the ellipsoid
$$\mathcal{E}_p = \left\{x \in \mathbb{R}^n \;\middle|\; (x - \mu)^\top \Sigma^{-1} (x - \mu) \le \frac{n}{1 - p}\right\} \tag{4}$$
is a level-$p$ confidence region of $x$.
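Comparing the two lemmas for illustrative values of $n$ and $p$ shows how conservative the distribution-free Chebyshev radius $n/(1-p)$ is relative to the exact Gaussian chi-squared radius:

```python
import numpy as np
from scipy.stats import chi2

n, p = 2, 0.95
gaussian_r2 = chi2.ppf(p, df=n)   # ~5.99: exact for Gaussian inputs (Lemma 1)
chebyshev_r2 = n / (1 - p)        # 40.0: valid for ANY distribution with these moments (Lemma 2)
print(gaussian_r2, chebyshev_r2)
```

The Chebyshev ellipsoid is much larger, which is the price paid for knowing only the first two moments.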
Lemma 3
Let $\mathcal{E}_p$ be a level-$p$ confidence region of a random variable $x$. If $f(\mathcal{E}_p) \subseteq \mathcal{S}$, then $\mathcal{S}$ is a level-$p$ confidence region for the random variable $f(x)$, i.e., $\mathbb{P}\{f(x) \in \mathcal{S}\} \ge p$.
Proof.
The inclusion $f(\mathcal{E}_p) \subseteq \mathcal{S}$ implies $\mathbb{P}\{f(x) \in \mathcal{S}\} \ge \mathbb{P}\{f(x) \in f(\mathcal{E}_p)\}$. Since $f$ is not necessarily a one-to-one mapping, we have $\mathbb{P}\{f(x) \in f(\mathcal{E}_p)\} \ge \mathbb{P}\{x \in \mathcal{E}_p\} \ge p$. Combining the last two inequalities yields the desired result. ∎
According to Lemma 3, if we can certify that the output reachable set $f(\mathcal{E}_p)$ lies entirely in the safe set $\mathcal{S}$ for some $p$, then the network is safe with probability at least $p$. In particular, finding the best lower bound corresponds to the non-convex optimization problem
$$p^\star = \sup_{p \in (0,1)} \{p \mid f(\mathcal{E}_p) \subseteq \mathcal{S}\}, \tag{5}$$
with decision variable $p$. By Lemma 3, the optimal solution then satisfies
$$\mathbb{P}\{f(x) \in \mathcal{S}\} \ge p^\star. \tag{6}$$
II-C Confidence Propagation
A problem closely related to probabilistic safety verification is confidence propagation. Explicitly, given a level-$p$ confidence region $\mathcal{E}_p$ of the input of a neural network, our goal is to find a level-$p$ confidence region for the output. To see the connection to the probabilistic verification problem, let $\hat{\mathcal{Y}}$ be any outer approximation of the output reachable set, i.e., $f(\mathcal{E}_p) \subseteq \hat{\mathcal{Y}}$. By Lemma 3, $\hat{\mathcal{Y}}$ is a level-$p$ confidence region for the output. Of course, there are infinitely many such confidence regions. Our goal is to find the "best" one with respect to some metric. Using the volume of an ellipsoid as the optimization criterion, finding the best confidence region amounts to solving the problem
$$\text{minimize} \quad \mathrm{vol}(\hat{\mathcal{Y}}) \quad \text{subject to} \quad f(\mathcal{E}_p) \subseteq \hat{\mathcal{Y}}, \tag{7}$$
over ellipsoids $\hat{\mathcal{Y}} = \mathcal{E}(\mu_y, \Sigma_y)$. The solution to the above problem provides the level-$p$ confidence region with the minimum volume. Figure 1 illustrates the procedure of confidence estimation. In the next section, we provide a convex relaxation of the optimization problem (7). The other problem, (5), is a straightforward extension of confidence estimation, so we omit the details.
III Problem Relaxation via Quadratic Constraints
Due to the presence of the nonlinear activation functions, checking the set inclusion in (5) or (7) is a non-convex feasibility problem and is NP-hard in general. Our main idea is to abstract the original network $f$ by a relaxed map $\tilde{f}$ that over-approximates the output of the original network for any input ellipsoid, i.e., $f(\mathcal{E}_p) \subseteq \tilde{f}(\mathcal{E}_p)$ for any $\mathcal{E}_p$. It then suffices to verify the safety properties of the relaxed network, i.e., to verify the inclusion $\tilde{f}(\mathcal{E}_p) \subseteq \mathcal{S}$. In the following, we use the framework of quadratic constraints to develop such an abstraction.
III-A Relaxation of Nonlinearities by Quadratic Constraints
In this subsection, we show how we can abstract activation functions, and in particular the ReLU function, using quadratic constraints. We first provide a formal definition, introduced in [9].
Definition 2
Let $\phi: \mathbb{R}^d \to \mathbb{R}^d$ and suppose $\mathcal{Q}$ is the set of all symmetric indefinite matrices $Q$ such that the inequality
$$\begin{bmatrix} x \\ \phi(x) \\ 1 \end{bmatrix}^\top Q \begin{bmatrix} x \\ \phi(x) \\ 1 \end{bmatrix} \ge 0 \tag{8}$$
holds for all $x \in \mathbb{R}^d$. Then we say $\phi$ satisfies the quadratic constraint (QC) defined by $\mathcal{Q}$.
Note that the matrices in Definition 2 are required to be indefinite, since for a positive semidefinite $Q$ the constraint holds trivially. Before deriving QCs for the ReLU function, we recall some definitions, which can be found in many references; see, for example, [16, 6].
Definition 3 (Sectorbounded nonlinearity)
A nonlinear function $\varphi: \mathbb{R} \to \mathbb{R}$ is sector-bounded on the sector $[\alpha, \beta]$ ($\alpha \le \beta$) if the following condition holds for all $x \in \mathbb{R}$:
$$(\varphi(x) - \alpha x)(\varphi(x) - \beta x) \le 0. \tag{9}$$
Definition 4 (Sloperestricted nonlinearity)
A nonlinear function $\varphi: \mathbb{R} \to \mathbb{R}$ is slope-restricted on $[\alpha, \beta]$ ($\alpha \le \beta$) if for any $x, y \in \mathbb{R}$ with $x \ne y$,
$$\alpha \le \frac{\varphi(x) - \varphi(y)}{x - y} \le \beta. \tag{10}$$
Repeated nonlinearities. Assuming that the same activation function is used in all neurons, we can exploit this structure to refine the QC abstraction of the nonlinearity. Explicitly, suppose $\varphi$ is slope-restricted on $[\alpha, \beta]$ and let $\phi(x) = [\varphi(x_1), \ldots, \varphi(x_d)]^\top$ be the vector-valued function constructed by component-wise repetition of $\varphi$. It is not hard to verify that $\phi$ is also slope-restricted on the same sector. However, this representation simply ignores the fact that all the nonlinearities composing $\phi$ are identical. By taking advantage of this structure, we can refine the quadratic constraint describing $\phi$. To be specific, for an input-output pair $(x, y = \phi(x))$, we can write the slope-restriction condition
$$\big(y_i - y_j - \alpha(x_i - x_j)\big)\big(y_i - y_j - \beta(x_i - x_j)\big) \le 0 \tag{11}$$
for all distinct $i, j$. This particular QC can tighten the relaxation incurred by the QC abstraction of the nonlinearity.
III-B QC for the ReLU Function
In this subsection, we derive quadratic constraints for the ReLU function, $y = \phi(x) = \max(x, 0)$. Note that this function lies on the boundary of the sector $[0, 1]$. More precisely, we can describe the ReLU function by three quadratic and/or affine constraints:
$$y_i \ge 0, \qquad y_i \ge x_i, \qquad y_i (y_i - x_i) = 0. \tag{12}$$
On the other hand, for any two distinct indices $i \ne j$, we can write the constraint (11) with $\alpha = 0$ and $\beta = 1$:
$$(y_i - y_j)^2 \le (y_i - y_j)(x_i - x_j). \tag{13}$$
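The elementwise constraints (12) and the cross-coupling constraints (13) can be verified numerically on random input-output pairs of the ReLU (a sanity-check sketch; the dimension is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = np.maximum(x, 0.0)  # ReLU output

# Elementwise constraints (12): y >= 0, y >= x, y * (y - x) = 0
assert np.all(y >= 0)
assert np.all(y >= x)
assert np.allclose(y * (y - x), 0.0)

# Cross-coupling constraints (13) with alpha = 0, beta = 1:
# (y_i - y_j)^2 <= (y_i - y_j)(x_i - x_j) for all i != j
for i in range(len(x)):
    for j in range(len(x)):
        if i != j:
            assert (y[i] - y[j]) ** 2 <= (y[i] - y[j]) * (x[i] - x[j]) + 1e-12
print("all ReLU quadratic constraints hold")
```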
By adding a weighted combination of all these constraints (with nonnegative weights on the inequalities and free weights on the equalities), we find that the ReLU function $y = \phi(x) = \max(x, 0)$ satisfies
$$\sum_{i=1}^{d} \lambda_i\, y_i (x_i - y_i) + \sum_{i=1}^{d} \nu_i (y_i - x_i) + \sum_{i=1}^{d} \eta_i y_i + \sum_{i < j} \lambda_{ij} \big[(y_i - y_j)(x_i - x_j) - (y_i - y_j)^2\big] \ge 0 \tag{14}$$
for any multipliers $\lambda_i \in \mathbb{R}$, $\nu_i \ge 0$, $\eta_i \ge 0$, and $\lambda_{ij} \ge 0$ for $i < j$. This inequality can be written in the compact form (8), as stated in the following lemma.
Lemma 4 (QC for ReLU function)
The ReLU function, $\phi(x) = \max(x, 0)$, satisfies the QC defined by the set of matrices of the form
$$Q = \begin{bmatrix} 0 & T & -\nu \\ T & -2T & \nu + \eta \\ -\nu^\top & (\nu + \eta)^\top & 0 \end{bmatrix}. \tag{15}$$
Here $\nu \ge 0$ and $\eta \ge 0$ element-wise, and $T$ is given by
$$T = \sum_{i=1}^{d} \lambda_i e_i e_i^\top + \sum_{i < j} \lambda_{ij} (e_i - e_j)(e_i - e_j)^\top,$$
where $e_i$ is the $i$-th basis vector in $\mathbb{R}^d$, $\lambda_i \in \mathbb{R}$, and $\lambda_{ij} \ge 0$.
Proof.
See [9]. ∎
III-C Tightening the Relaxation
In the previous subsection, we derived QCs that are valid on the whole space $\mathbb{R}^d$. When the pre-activation input is restricted to a region $\mathcal{X}$, we can tighten the QC relaxation. Consider the relationship $y = \max(x, 0)$, $x \in \mathcal{X}$, and let $\mathcal{I}^{+}$ and $\mathcal{I}^{-}$ be the sets of neurons that are always active or always inactive, i.e.,
$$\mathcal{I}^{+} = \{i \mid x_i \ge 0 \text{ for all } x \in \mathcal{X}\}, \qquad \mathcal{I}^{-} = \{i \mid x_i \le 0 \text{ for all } x \in \mathcal{X}\}. \tag{16}$$
The constraint $y_i \ge x_i$ holds with equality for active neurons ($y_i = x_i$). Therefore, the corresponding multipliers $\nu_i$, $i \in \mathcal{I}^{+}$, may take any sign. Similarly, the constraint $y_i \ge 0$ holds with equality for inactive neurons ($y_i = 0$), so the corresponding multipliers $\eta_i$, $i \in \mathcal{I}^{-}$, may also take any sign. Finally, it can be verified that the cross-coupling constraint in (13) holds with equality for pairs of always-active or always-inactive neurons. Therefore, for any $i, j \in \mathcal{I}^{+}$ or $i, j \in \mathcal{I}^{-}$, the multipliers $\lambda_{ij}$ may take any sign.
These additional degrees of freedom on the multipliers can tighten the relaxation incurred in (14). Note that the sets of active and inactive neurons are not known a priori. However, we can partially identify them using, for example, interval arithmetic.
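For a box-shaped input region, the interval-arithmetic bounds that classify neurons as provably active or inactive can be sketched as follows (the helper name and the numbers are ours):

```python
import numpy as np

def preactivation_bounds(W, b, l, u):
    """Interval-arithmetic bounds on W @ x + b over the box l <= x <= u."""
    Wp, Wm = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lb = Wp @ l + Wm @ u + b  # lower bound: positive weights hit l, negative hit u
    ub = Wp @ u + Wm @ l + b  # upper bound: positive weights hit u, negative hit l
    return lb, ub

W = np.array([[1.0, -1.0], [2.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 1.0, -3.0])
l, u = np.array([-0.1, -0.1]), np.array([0.1, 0.1])

lb, ub = preactivation_bounds(W, b, l, u)
active = np.where(lb >= 0)[0]    # neurons provably active on the box
inactive = np.where(ub <= 0)[0]  # neurons provably inactive on the box
print(active, inactive)  # [1] [2]
```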
IV Analysis of the Relaxed Network via Semidefinite Programming
In this section, we use the QC abstraction developed in the previous section to analyze the safety of the relaxed network. In the next theorem, we state our main result for one-layer neural networks; we discuss the multi-layer case in Section IV-A.
Theorem 1 (Output covering ellipsoid)
Consider a one-layer neural network described by the equation
$$f(x) = W^1 \phi(W^0 x + b^0) + b^1, \tag{17}$$
where $\phi$ satisfies the quadratic constraint defined by $\mathcal{Q}$, i.e., for any $Q \in \mathcal{Q}$,
$$\begin{bmatrix} x \\ \phi(x) \\ 1 \end{bmatrix}^\top Q \begin{bmatrix} x \\ \phi(x) \\ 1 \end{bmatrix} \ge 0. \tag{18}$$
Suppose $x \in \mathcal{E}(\mu_x, \Sigma_x)$. Consider the following matrix inequality
$$\tau M_{\mathrm{in}} + M_{\mathrm{mid}} + M_{\mathrm{out}} \preceq 0, \tag{19}$$
where
$$M_{\mathrm{in}} = \begin{bmatrix} -\Sigma_x^{-1} & 0 & \Sigma_x^{-1}\mu_x \\ 0 & 0 & 0 \\ \mu_x^\top \Sigma_x^{-1} & 0 & 1 - \mu_x^\top \Sigma_x^{-1}\mu_x \end{bmatrix}, \qquad M_{\mathrm{mid}} = \begin{bmatrix} W^0 & 0 & b^0 \\ 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix}^\top Q \begin{bmatrix} W^0 & 0 & b^0 \\ 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
with
$$M_{\mathrm{out}} = \begin{bmatrix} 0 & W^1 & b^1 \\ 0 & 0 & 1 \end{bmatrix}^\top \begin{bmatrix} \Sigma_y^{-1} & -\Sigma_y^{-1}\mu_y \\ -\mu_y^\top \Sigma_y^{-1} & \mu_y^\top \Sigma_y^{-1}\mu_y - 1 \end{bmatrix} \begin{bmatrix} 0 & W^1 & b^1 \\ 0 & 0 & 1 \end{bmatrix}.$$
If (19) is feasible for some $\tau \ge 0$, $Q \in \mathcal{Q}$, $\mu_y$, and $\Sigma_y \succ 0$, then $f(x) \in \mathcal{E}(\mu_y, \Sigma_y)$ for all $x \in \mathcal{E}(\mu_x, \Sigma_x)$.
Proof.
We first introduce the auxiliary variable $z = \phi(W^0 x + b^0)$ and rewrite the equation of the neural network as $f(x) = W^1 z + b^1$. Since $\phi$ satisfies the QC defined by $\mathcal{Q}$, we can write the following QC from the identity $z = \phi(W^0 x + b^0)$:
$$\begin{bmatrix} W^0 x + b^0 \\ z \\ 1 \end{bmatrix}^\top Q \begin{bmatrix} W^0 x + b^0 \\ z \\ 1 \end{bmatrix} \ge 0. \tag{21}$$
By substituting the identity
$$\begin{bmatrix} W^0 x + b^0 \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} W^0 & 0 & b^0 \\ 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ z \\ 1 \end{bmatrix}$$
back into (21) and denoting $u = [x^\top \; z^\top \; 1]^\top$, we can write the inequality
$$u^\top M_{\mathrm{mid}} u \ge 0, \qquad M_{\mathrm{mid}} := \begin{bmatrix} W^0 & 0 & b^0 \\ 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix}^\top Q \begin{bmatrix} W^0 & 0 & b^0 \\ 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{22}$$
for any $Q \in \mathcal{Q}$ and all $x$. By definition, for all $x \in \mathcal{E}(\mu_x, \Sigma_x)$, we have $(x - \mu_x)^\top \Sigma_x^{-1} (x - \mu_x) \le 1$, which is equivalent to
$$\begin{bmatrix} x \\ 1 \end{bmatrix}^\top \begin{bmatrix} -\Sigma_x^{-1} & \Sigma_x^{-1}\mu_x \\ \mu_x^\top \Sigma_x^{-1} & 1 - \mu_x^\top \Sigma_x^{-1}\mu_x \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix} \ge 0.$$
By using the identity
$$\begin{bmatrix} x \\ 1 \end{bmatrix} = \begin{bmatrix} I & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} u,$$
we conclude that for all $x \in \mathcal{E}(\mu_x, \Sigma_x)$,
$$u^\top M_{\mathrm{in}} u \ge 0, \qquad M_{\mathrm{in}} := \begin{bmatrix} -\Sigma_x^{-1} & 0 & \Sigma_x^{-1}\mu_x \\ 0 & 0 & 0 \\ \mu_x^\top \Sigma_x^{-1} & 0 & 1 - \mu_x^\top \Sigma_x^{-1}\mu_x \end{bmatrix}. \tag{23}$$
Suppose (19) holds for some $\tau \ge 0$ and $Q \in \mathcal{Q}$. By left- and right-multiplying both sides of (19) by $u^\top$ and $u$, respectively, we obtain
$$\tau\, u^\top M_{\mathrm{in}} u + u^\top M_{\mathrm{mid}} u + u^\top M_{\mathrm{out}} u \le 0,$$
where $M_{\mathrm{out}}$ is the output term of (19), defined so that $u^\top M_{\mathrm{out}} u = (W^1 z + b^1 - \mu_y)^\top \Sigma_y^{-1} (W^1 z + b^1 - \mu_y) - 1$. For any $x \in \mathcal{E}(\mu_x, \Sigma_x)$, the first two quadratic terms are nonnegative by (23) and (22), respectively. Therefore, the last term on the left-hand side must be nonpositive for all $x \in \mathcal{E}(\mu_x, \Sigma_x)$,
$$u^\top M_{\mathrm{out}} u \le 0.$$
But the preceding inequality, using the relation $f(x) = W^1 z + b^1$, is equivalent to
$$(f(x) - \mu_y)^\top \Sigma_y^{-1} (f(x) - \mu_y) \le 1.$$
Using our notation for ellipsoids, this means that for all $x \in \mathcal{E}(\mu_x, \Sigma_x)$, we must have $f(x) \in \mathcal{E}(\mu_y, \Sigma_y)$. ∎
In Theorem 1, we proposed a matrix inequality, in the variables $(\tau, Q, \mu_y, \Sigma_y)$, as a sufficient condition for enclosing the output of the neural network by the ellipsoid $\mathcal{E}(\mu_y, \Sigma_y)$. We can now use this result to find the minimum-volume ellipsoid with this property. Note that the matrix inequality (19) is not jointly linear in the ellipsoid parameters. Nevertheless, we can convexify it by using Schur complements.
Lemma 5
Proof.
Having established Lemma 5, we can now find the minimum-volume covering ellipsoid by solving the following semidefinite program (SDP),
$$\text{minimize} \quad \log\det(\Sigma_y) \tag{25}$$
subject to the linear matrix inequality of Lemma 5, where the decision variables are $(\tau, Q, \mu_y, \Sigma_y)$ and $\log\det(\Sigma_y)$ serves as a proxy for the volume of $\mathcal{E}(\mu_y, \Sigma_y)$. Since $\mathcal{Q}$ is a convex cone, (25) is a convex program and can be solved via interior-point solvers.
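The SDP (25) requires a semidefinite solver; as a lightweight, solver-free illustration of the minimum-volume-ellipsoid objective itself, Khachiyan's classical algorithm computes the minimum-volume ellipsoid covering a finite point cloud (e.g., sampled network outputs). This is a sampling heuristic for intuition only, not the certified bound of (25):

```python
import numpy as np

def min_volume_ellipsoid(P, tol=1e-7):
    """Khachiyan's algorithm: ellipsoid {x : (x-c)^T A (x-c) <= 1} covering the rows of P."""
    m, n = P.shape
    Q = np.vstack([P.T, np.ones(m)])  # lift the points to dimension n + 1
    u = np.full(m, 1.0 / m)           # weights on the points
    err = tol + 1.0
    while err > tol:
        X = Q @ np.diag(u) @ Q.T
        M = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(X), Q)
        j = int(np.argmax(M))
        step = (M[j] - n - 1.0) / ((n + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = P.T @ u
    A = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / n
    return A, c

# e.g., sampled network outputs; here the corners of a square plus an interior point
pts = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0], [0.2, 0.3]])
A, c = min_volume_ellipsoid(pts)
vals = np.einsum('ij,ij->i', (pts - c) @ A, pts - c)
print(float(vals.max()))
```

Unlike (25), this covers only the sampled points, so it provides no guarantee for unsampled inputs.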
IV-A Multi-layer Case
For multi-layer neural networks, we can apply the result of Theorem 1 in a layer-by-layer fashion, provided that the input confidence ellipsoid of each layer is non-degenerate. This assumption holds when the layer widths are non-increasing and the weight matrices are full rank. To see this, we note that the image of an ellipsoid under an affine map is again an ellipsoid:
$$W \mathcal{E}(\mu, \Sigma) + b = \mathcal{E}(W\mu + b,\; W \Sigma W^\top).$$
This implies that $W \Sigma W^\top$ is positive definite whenever $\Sigma$ is positive definite and $W$ is full row rank, so the image ellipsoid is non-degenerate. If the assumption is violated, we can use the compact representation of multi-layer neural networks elaborated in [9] to arrive at the multi-layer counterpart of the matrix inequality in (19).
V Numerical Experiments
In this section, we consider a numerical experiment in which we estimate the output confidence ellipsoid of a one-layer neural network. We assume the input is Gaussian with known mean $\mu_x$ and covariance $\Sigma_x$. The weights and biases of the network are chosen randomly. We use MATLAB, CVX [10], and MOSEK [1] to solve the corresponding SDP. In Figure 2, we plot the estimated output confidence ellipsoid along with sample outputs. We also plot the image of the input confidence ellipsoid under the network map, along with the estimated output confidence ellipsoid.
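A bound of this kind can also be sanity-checked by Monte Carlo simulation: sample inputs, push them through the network, and count how often the outputs land in a candidate ellipsoid. The sketch below uses illustrative dimensions and random weights (not the paper's setup) and builds the candidate ellipsoid from empirical output moments with the Chebyshev radius of Lemma 2:

```python
import numpy as np

rng = np.random.default_rng(2)
nx, nh, ny = 2, 10, 2
W0, b0 = rng.standard_normal((nh, nx)), rng.standard_normal(nh)
W1, b1 = rng.standard_normal((ny, nh)), rng.standard_normal(ny)

mu_x, Sigma_x = np.zeros(nx), 0.1 * np.eye(nx)
x = rng.multivariate_normal(mu_x, Sigma_x, size=50_000)
y = np.maximum(x @ W0.T + b0, 0.0) @ W1.T + b1  # one-layer ReLU network

# Candidate output ellipsoid: empirical moments plus the Chebyshev radius (Lemma 2)
p = 0.9
mu_y, Sigma_y = y.mean(axis=0), np.cov(y.T)
d = y - mu_y
q = np.einsum('ij,ij->i', d @ np.linalg.inv(Sigma_y), d)
coverage = np.mean(q <= ny / (1 - p))
print(round(coverage, 3))  # empirically well above the guaranteed level p = 0.9
```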
VI Conclusions
We studied probabilistic safety verification of neural networks whose inputs are subject to random noise with known first two moments. Instead of analyzing the network directly, we proposed to study the safety of an abstracted network, in which the nonlinear activation functions are relaxed by the quadratic constraints that their input-output pairs satisfy. We then showed that the safety properties of the abstracted network can be analyzed using semidefinite programming. It would be interesting to consider related problems such as closed-loop statistical safety verification and reachability analysis.
References
 [1] (2017) The MOSEK optimization toolbox for MATLAB manual. Version 8.1. External Links: Link Cited by: §V.
 [2] (2016) Measuring neural net robustness with constraints. In Advances in neural information processing systems, pp. 2613–2621. Cited by: §I.

 [3] (2018) Analytic expressions for probabilistic moments of PL-DNN with Gaussian input. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9099–9107. Cited by: §I.
 [4] (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316. Cited by: §I.
 [5] (2011) Stochastic tubes in model predictive control with probabilistic constraints. IEEE Transactions on Automatic Control 56 (1), pp. 194–200. Cited by: §IIB.
 [6] (2001) New results for analysis of systems with repeated nonlinearities. Automatica 37 (5), pp. 739–747. Cited by: §III-A.
 [7] (2018) Verification of deep probabilistic models. arXiv preprint arXiv:1812.02795. Cited by: §I.
 [8] (2018) A dual approach to scalable verification of deep networks. arXiv preprint arXiv:1803.06567. Cited by: §I.
 [9] (2019) Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming. arXiv preprint arXiv:1903.01287. Cited by: §I, §III-A, §III-B, §IV-A.
 [10] (2008) CVX: matlab software for disciplined convex programming. Cited by: §V.
 [11] (2017) Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, pp. 2266–2276. Cited by: §I.
 [12] (2018) Verisig: verifying safety properties of hybrid systems with neural network controllers. arXiv preprint arXiv:1811.01828. Cited by: §I.
 [13] (2017) Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851 1 (2), pp. 3. Cited by: §I.
 [14] (2002) All multipliers for repeated monotone nonlinearities. IEEE Transactions on Automatic Control 47 (7), pp. 1209–1212. Cited by: §IIIA.
 [15] (2017) An approach to reachability analysis for feedforward relu neural networks. arXiv preprint arXiv:1706.07351. Cited by: §I.
 [16] (1997) System analysis via integral quadratic constraints. IEEE Transactions on Automatic Control 42 (6), pp. 819–830. Cited by: §IIIA.

 [17] (2018) Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pp. 3575–3583. Cited by: §I.
 [18] (2017) Universal adversarial perturbations. arXiv preprint. Cited by: §I.
 [19] (2012) Challenging smt solvers to verify neural networks. AI Communications 25 (2), pp. 117–135. Cited by: §I.
 [20] (2018) Neural lander: stable drone landing control using learned dynamics. arXiv preprint arXiv:1811.08027. Cited by: §I.

 [21] (2019) One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation. Cited by: §I.
 [22] (2017) Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356. Cited by: §I.
 [23] (2002) A conic reformulation of model predictive control including bounded and stochastic disturbances under state and input constraints. In Proceedings of the 41st IEEE Conference on Decision and Control, 2002., Vol. 4, pp. 4643–4648. Cited by: §IIB.
 [24] (2018) Linear model predictive safety certification for learningbased control. In 2018 IEEE Conference on Decision and Control (CDC), pp. 7130–7135. Cited by: §IIB.
 [25] (2018) Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pp. 6369–6379. Cited by: §I.
 [26] (2018) PROVEN: certifying robustness of neural networks with a probabilistic approach. arXiv preprint arXiv:1812.08329. Cited by: §I.
 [27] (2018) Output reachable set estimation and verification for multilayer neural networks. IEEE transactions on neural networks and learning systems (99), pp. 1–7. Cited by: §I.
 [28] (2016) Improving the robustness of deep neural networks via stability training. In Proceedings of the ieee conference on computer vision and pattern recognition, pp. 4480–4488. Cited by: §I.