Data-Driven Assessment of Deep Neural Networks with Random Input Uncertainty

When using deep neural networks to operate safety-critical systems, assessing the sensitivity of the network outputs when subject to uncertain inputs is of paramount importance. Such assessment is commonly done using reachability analysis or robustness certification. However, certification techniques typically ignore localization information, while reachable set methods can fail to issue robustness guarantees. Furthermore, many advanced methods are either computationally intractable in practice or restricted to very specific models. In this paper, we develop a data-driven optimization-based method capable of simultaneously certifying the safety of network outputs and localizing them. The proposed method provides a unified assessment framework, as it subsumes state-of-the-art reachability analysis and robustness certification. The method applies to deep neural networks of all sizes and structures, and to random input uncertainty with a general distribution. We develop sufficient conditions for the convexity of the underlying optimization, and for the number of data samples to certify and localize the outputs with overwhelming probability. We experimentally demonstrate the efficacy and tractability of the method on a deep ReLU network.

I Introduction

Neural networks stand out for their high performance and flexibility in making data-driven predictions and decisions. However, researchers have shown that many networks are highly sensitive to inputs altered by random or adversarial perturbations [1, 2, 3]. This can result in misclassifications or outputs entering an unsafe region of the output space, as well as a large uncertainty propagation from inputs to outputs. When employing neural networks in safety-critical systems, e.g., autonomous vehicles [4, 5], this sensitive behavior is intolerable. Consequently, much effort has been placed on localizing neural network outputs and certifying their safety in the presence of input uncertainty.

In localization, one seeks to find a subset of the output space that contains the possible outputs, whereas certification is the decision problem of assessing whether the outputs enter an unsafe region or not. These two problems are clearly related: exact localization of the network outputs can be used to certify their safety. However, this approach has two problems: 1) the output set is generally intractable to compute [6], and 2) certification typically amounts to solving an NP-hard, nonconvex optimization over the output set [7]. As a result, these assessment methods have largely been treated separately in the settings of output set estimation (see also, reachability analysis) [6, 8], and robustness certification [9, 10, 11], and these remain active areas of research.

I-A Related Works

In this paper, we consider random input uncertainty with a known or sufficiently well-modeled probability distribution. Despite the large body of work on assessing sensitivity to adversarial inputs, random uncertainty often models reality more accurately than worst-case uncertainty [12]. Various methods to localize and certify outputs in the presence of random inputs have been proposed [12, 13, 14, 15, 16]. However, all these approaches rely on trading off theoretical guarantees with computational complexity, and on making restrictive assumptions about either the network structure, e.g., ReLU activations, or the input distribution, e.g., Gaussian or independent coordinates.

To overcome the above limitations, we develop a novel method using a sampling-based approach called scenario optimization, which is computationally tractable, provides probabilistic guarantees, and can be applied to arbitrary networks and input distributions. The scenario approach has recently been used in both output set estimation [17] and in robustness certification [18]; however, these methods alone fail to completely assess network sensitivity. In particular, [17] localizes outputs but may fail to determine their safety, as we demonstrate in Section V-B. Furthermore, this method is restricted to localizing outputs into a norm ball, lacking the generality needed to approximate well the more complicated (and typically nonconvex) output sets of neural networks in practice. On the other hand, [18] can efficiently issue robustness certificates, but completely ignores the aspect of localizing the outputs in order to do so.

I-B Contributions

In this paper, we formulate a unified framework that simultaneously localizes network outputs and certifies their safety with high-probability guarantees. The assessment procedure is data-driven, and subsumes the output set estimation method in [17] and the robustness certification method in [18] as special cases. Our method is completely general: it may be applied to any neural network and any input distribution. The outputs can be localized into a general class of sets, not just norm balls, and we obtain sufficient conditions on this class to ensure that the procedure amounts to a convex scenario optimization problem. Furthermore, we show that the resulting localization and robustness certification can be made to hold with overwhelming probability upon using a sufficient number of sampled data points in the scenario optimization. We illustrate the assessment procedure on a deep ReLU network, demonstrating the user’s control over the strength of the probabilistic guarantees and the varying levels of certification and localization. Finally, we show that our unified approach of localizing and certifying simultaneously can issue robustness certificates in cases where the two-step process of localizing then certifying cannot.

I-C Outline

Various notions of robustness are introduced and used to formalize the problem in Section II. In Sections III and IV, we connect the concepts of certification and localization of network outputs, and show that both can be assessed with guarantees using a single data-driven convex optimization problem with sufficiently many samples. We illustrate the results in Section V and conclude in Section VI.

I-D Notations

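The set of real numbers is denoted by $\mathbb{R}$. Given a set $\mathcal{A}$, we denote its power set (the set of all subsets of $\mathcal{A}$) by $\mathcal{P}(\mathcal{A})$. The Minkowski sum of sets $\mathcal{A}$ and $\mathcal{B}$ is defined as $\mathcal{A} + \mathcal{B} = \{a + b : a \in \mathcal{A},\ b \in \mathcal{B}\}$. Furthermore, we define $\lambda \mathcal{A} = \{\lambda a : a \in \mathcal{A}\}$ for $\lambda \in \mathbb{R}$. For a function $f$, we write the image of a set $\mathcal{A}$ under $f$ as $f(\mathcal{A}) = \{f(x) : x \in \mathcal{A}\}$. Finally, for a norm $\|\cdot\|$ on $\mathbb{R}^n$ we denote its dual norm by $\|\cdot\|_*$, where $\|y\|_* = \sup_{\|x\| \le 1} y^\top x$. We assume throughout that the optimal value of every optimization problem is attained by a solution.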

II Problem Statement

II-A Network Description, Safe Set, and Safety Level

In this paper, we consider a neural network $f \colon \mathbb{R}^{n_x} \to \mathbb{R}^{n_y}$ with arbitrary structure and parameters. We assume that the input to the neural network is a random variable $X$ with a given distribution $\mathbb{P}_X$. The support of $\mathbb{P}_X$ is called the input set, which is denoted by $\mathcal{X}$. The output set of the network is defined to be $\mathcal{Y} = f(\mathcal{X})$.

Next, consider a given convex polyhedral safe set $\mathcal{S} = \{y \in \mathbb{R}^{n_y} : Ay + b \ge 0\}$, where $A \in \mathbb{R}^{n_s \times n_y}$ and $b \in \mathbb{R}^{n_s}$ (see Footnote 1). By applying the results of this paper to each row of $A$ and $b$ individually, we may assume without loss of generality that $n_s = 1$, henceforth setting $A = a^\top$ with $a \in \mathbb{R}^{n_y}$ and $b \in \mathbb{R}$. The elements of $\mathcal{S}$ are considered to be safe. For a point $y \in \mathbb{R}^{n_y}$, the value $a^\top y + b$ is called the safety level of $y$. The point $y$ is safe if and only if its safety level is nonnegative.

Footnote 1: The assumption of a polyhedral safe set is not restrictive. For instance, the set of outputs assigned a given label by a classifier is commonly a polyhedral set. Furthermore, when assessing network robustness using an arbitrary safe set, one may always instead use a convex polyhedral inner-approximation.

II-B Various Notions of Robustness

We now use the safety level to define three notions of robustness for the neural network.

II-B1 Deterministic Robustness Level

The deterministic robustness level of the network is defined as

$$r^* = \inf_{y \in \mathcal{Y}} a^\top y + b. \tag{1}$$

If the deterministic robustness level is nonnegative, then every point of the output set $\mathcal{Y}$ lies in the safe set, which implies that the random output $f(X)$ is safe with probability one. This notion of robustness coincides with that used when considering adversarial inputs [19, 10, 11].

II-B2 Approximate Robustness Level

Although the deterministic robustness level (1) can issue strong guarantees about the safety of the network output, computing its value amounts to solving an intractable nonconvex optimization problem, since $\mathcal{Y}$ is generally a nonconvex set. Instead of computing $r^*$, we can consider approximating it by

$$\hat{r}(\hat{\mathcal{Y}}) = \inf_{y \in \hat{\mathcal{Y}}} a^\top y + b, \tag{2}$$

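where $\hat{\mathcal{Y}} \subseteq \mathbb{R}^{n_y}$, termed the surrogate output set, is more tractable than $\mathcal{Y}$, and preferably convex. We call (2) the approximate robustness level of the network. If $\hat{\mathcal{Y}}$ is chosen to cover the output set $\mathcal{Y}$, then $\hat{r}(\hat{\mathcal{Y}}) \le r^*$, since the infimum is taken over a larger set. In this case, if the approximate robustness level is nonnegative, then the random output is safe with probability one.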

II-B3 Probabilistic Robustness Level

The notion of deterministic robustness is too strong for many applications, particularly those with random input uncertainty [12]. Therefore, for a prescribed probability level $\epsilon$ we define the probabilistic robustness level of the network:

$$\bar{r}(\epsilon) = \sup\{ r \in \mathbb{R} : \mathbb{P}_X(a^\top f(X) + b \ge r) \ge 1 - \epsilon \}. \tag{3}$$

Intuitively, the condition $\mathbb{P}_X(a^\top f(X) + b \ge r) \ge 1 - \epsilon$ states that the random output has safety level at least $r$, with probability at least $1 - \epsilon$. The probabilistic robustness level of the network is the largest such number $r$. We remark that (3) is precisely the notion of probabilistic robustness used in [18]. However, [18] only provides a method for certifying that $\bar{r}(\epsilon) \ge 0$, making no effort to localize the random output in the output space.
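To make definition (3) concrete, the following minimal sketch computes its empirical counterpart from a finite list of sampled safety levels: the largest value that at least a fraction 1 − ϵ of the samples meet or exceed. The function name, the scalar ReLU "network", and the values of a and b are hypothetical and chosen only for illustration. This is a plain sample quantile, not a certified bound; the guarantees developed later in the paper come from the scenario optimization instead.

```python
import math
import random


def empirical_robustness_level(safety_levels, eps):
    """Empirical analogue of (3): the largest r such that at least a
    (1 - eps) fraction of the sampled safety levels are >= r.
    Assumes 0 <= eps < 1 and a non-empty sample list."""
    ordered = sorted(safety_levels, reverse=True)      # largest first
    # Number of samples whose safety level must be >= r.
    k = max(1, math.ceil((1.0 - eps) * len(ordered)))
    return ordered[k - 1]                               # k-th largest value


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical scalar network and safe set, for illustration only.
    f = lambda x: max(0.0, x)       # a single ReLU unit
    a, b = 1.0, 0.5                 # safety level is a * y + b
    levels = [a * f(rng.uniform(-1.0, 1.0)) + b for _ in range(1000)]
    print(empirical_robustness_level(levels, eps=0.1))
```

With ϵ = 0 the function returns the smallest sampled safety level, which mirrors how the deterministic level (1) takes an infimum over all outputs.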

In this paper, we aim to localize the neural network output while simultaneously certifying its safety. Mathematically, this amounts to estimating $\mathcal{Y}$ as well as lower bounding $\bar{r}(\epsilon)$. However, as written, these two notions are seemingly disjoint, as the probabilistic robustness level encodes no information about where in the output space the random output can reach, and the output set $\mathcal{Y}$ cannot be tractably used to ascertain robustness information due to its nonconvexity. In what follows, we bridge this gap by utilizing the approximate robustness level to bound $\bar{r}(\epsilon)$ and localize the output with high probability.

III Certification With Localization

III-A Bounding the Probabilistic Robustness Level

We begin by considering the certification aspect of our problem. It can be easily verified that the probabilistic robustness level is lower bounded by the deterministic robustness level; for all , and . Therefore, a natural question is whether one can instead use the easier-to-compute approximate robustness level to lower bound the probabilistic robustness level. As it turns out, this is the case so long as the surrogate output set has high coverage over . Before proving this claim in Proposition 1, we formally define this notion of coverage.

Definition 1 (ϵ-cover).

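Let $\hat{\mathcal{Y}}$ be a subset of $\mathbb{R}^{n_y}$. For $\epsilon \in [0, 1]$, the set $\hat{\mathcal{Y}}$ is said to be an $\epsilon$-cover of $\mathcal{Y}$ if

$$\mathbb{P}_X(f(X) \in \hat{\mathcal{Y}}) \ge 1 - \epsilon.$$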

For small , Definition 1 says that is an -cover of the output set if contains the random output with high probability. In particular, if we can compute an -cover of , then we will have probabilistically localized the output. By restricting the surrogate output set in (2) to be an -cover of , we guarantee that the approximate robustness level takes into account the safety of with high probability. In this case, we suspect to well-approximate in a probabilistic sense, thereby giving a lower bound on . We formalize this conclusion as follows.
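As a rough illustration of Definition 1, the sketch below estimates the coverage probability of a candidate set by the fraction of sampled outputs that fall inside it. The helper name, the scalar ReLU map, and the interval used as the candidate set are assumptions made for this example. A sample fraction of this kind is only an estimate of the coverage probability and does not by itself establish that the set is a cover in the sense of Definition 1.

```python
import random


def estimated_coverage(outputs, contains):
    """Fraction of sampled outputs that lie in the candidate set.

    `contains` is a predicate returning True when a point is in the set."""
    return sum(1 for y in outputs if contains(y)) / len(outputs)


if __name__ == "__main__":
    rng = random.Random(0)
    f = lambda x: max(0.0, x)                        # hypothetical scalar ReLU map
    ys = [f(rng.uniform(-1.0, 1.0)) for _ in range(5000)]
    in_interval = lambda y: 0.0 <= y <= 0.8          # hypothetical candidate set [0, 0.8]
    print(estimated_coverage(ys, in_interval))
```

For this toy map the candidate interval misses only inputs above 0.8, so the printed fraction should be close to 0.9, although the exact value depends on the random draw.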

Proposition 1 (Lower bound from ϵ-cover).

Let $\hat{\mathcal{Y}}$ be an arbitrary subset of $\mathbb{R}^{n_y}$. If $\hat{\mathcal{Y}}$ is an $\epsilon$-cover of $\mathcal{Y}$, then the approximate robustness level (2) lower bounds the probabilistic robustness level (3), i.e.,

$$\hat{r}(\hat{\mathcal{Y}}) \le \bar{r}(\epsilon). \tag{4}$$
Proof.

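Note that $y \in \hat{\mathcal{Y}}$ implies that $a^\top y + b \ge \hat{r}(\hat{\mathcal{Y}})$ by (2). Therefore, it holds that $\mathbb{P}_X(f(X) \in \hat{\mathcal{Y}}) \le \mathbb{P}_X(a^\top f(X) + b \ge \hat{r}(\hat{\mathcal{Y}}))$. Since $\hat{\mathcal{Y}}$ is an $\epsilon$-cover of $\mathcal{Y}$, we have that $\mathbb{P}_X(f(X) \in \hat{\mathcal{Y}}) \ge 1 - \epsilon$. Hence,

$$1 - \epsilon \le \mathbb{P}_X(f(X) \in \hat{\mathcal{Y}}) \le \mathbb{P}_X(a^\top f(X) + b \ge \hat{r}(\hat{\mathcal{Y}})).$$

This shows that $\hat{r}(\hat{\mathcal{Y}})$ is feasible for the optimization (3). Therefore, $\hat{r}(\hat{\mathcal{Y}}) \le \bar{r}(\epsilon)$, as desired. ∎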

Proposition 1 can be interpreted as follows. Suppose that $\hat{\mathcal{Y}}$ is chosen to be an $\epsilon$-cover of $\mathcal{Y}$ and the approximate robustness level, $\hat{r}(\hat{\mathcal{Y}})$, is computed using $\hat{\mathcal{Y}}$ as the surrogate output set. Then with high probability, the random output of the neural network has a safety level at least $\hat{r}(\hat{\mathcal{Y}})$, and is contained in $\hat{\mathcal{Y}}$. In particular, if $\hat{r}(\hat{\mathcal{Y}}) \ge 0$, then the random output is safe with probability at least $1 - \epsilon$. The proposition thereby shows that the approximate robustness level can be used for certification and localization of the output so long as the surrogate output set is chosen appropriately.

III-B Optimizing the Bound

From Proposition 1, we know that $\epsilon$-covers constitute good choices of the surrogate output set used to compute the approximate robustness level. This is because the random output of the neural network is guaranteed to have safety level at least $\hat{r}(\hat{\mathcal{Y}})$ with high probability. However, it is entirely possible that the choice of $\epsilon$-cover results in $\hat{r}(\hat{\mathcal{Y}}) < 0$, even when the network is probabilistically robust, meaning that $\bar{r}(\epsilon) \ge 0$. In this case, the approximate robustness level fails to issue a high-probability certificate for the safety of the random output $f(X)$, despite being able to localize it.

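To overcome the above problem, we turn our attention to optimizing the lower bound (4). This amounts to finding an $\epsilon$-cover of $\mathcal{Y}$ that maximizes the approximate robustness level. Since optimizing over all possible subsets of $\mathbb{R}^{n_y}$ is generally intractable, we choose to restrict our search to sets within a class $\mathcal{H} = \{h(\theta) : \theta \in \Theta\}$ parameterized by a parameter set $\Theta \subseteq \mathbb{R}^p$ and a set-valued function $h$ that maps each $\theta \in \Theta$ to a subset $h(\theta) \subseteq \mathbb{R}^{n_y}$. A concrete example of one such class is given below.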

Example 1 (Norm ball class).

Let be a fixed norm on and . Defining , let be defined by . Then, and define the class of -norm balls:

$$\mathcal{H} = \{ \{ y \in \mathbb{R}^{n_y} : \|y - \bar{y}\| \le r \} : r > 0,\ \bar{y} \in \mathbb{R}^{n_y} \}.$$

The problem of choosing $\Theta$ and $h$ (and therefore also $\mathcal{H}$) is discussed in detail in Section IV. By restricting our search for $\epsilon$-covers to within the class $\mathcal{H}$, our search reduces to maximizing the approximate robustness level over the parameter set $\Theta$. By slightly abusing notation, we denote the dependence of the approximate robustness level on the parameter $\theta$ explicitly as

$$\hat{r}(\theta) = \inf\{ a^\top y + b : y \in h(\theta) \}, \tag{5}$$

and we formulate the following optimization problem:

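$$\begin{aligned} &\text{maximize} && \hat{r}(\theta) - \lambda v(\theta) \\ &\text{subject to} && \mathbb{P}_X(f(X) \in h(\theta)) \ge 1 - \epsilon, \\ & && \theta \in \Theta, \end{aligned} \tag{6}$$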

where the optimization variable is the parameter $\theta$. Here, $\lambda \ge 0$, and $v$ can be chosen to be any nonnegative convex function on $\Theta$ that increases as the volume of $h(\theta)$ increases.

The objective in (6) is the approximate robustness level computed using the set $h(\theta)$ as the surrogate output set. The constraint enforces that we only consider parameters $\theta$ such that $h(\theta)$ is an $\epsilon$-cover of the output set $\mathcal{Y}$. The regularization term $\lambda v(\theta)$ penalizes the size of $h(\theta)$. This makes the set $h(\theta)$ as small as possible while maintaining its $\epsilon$-coverage, thereby yielding the tightest high-probability localization of the output $f(X)$. The regularization is done at the expense of a slightly suboptimal bound (4), and can be eliminated by setting $\lambda = 0$, if only certification is desired. On the other hand, taking $\lambda \to \infty$ amounts to putting all assessment efforts into localizing the output. This certification-localization tradeoff is experimentally explored in Section V-A.

IV Data-Driven Reformulation

Even when the set is convex for all , the probabilistic constraint in (6) is in general nonconvex [20]. Constraints of this form are typically referred to as chance constraints, and there exist various approaches to reformulating and relaxing them into convex constraints. Since the problem at hand considers neural networks with complicated or possibly unknown models, we seek a data-driven approach to approximately enforcing the chance constraint in (6), without losing the certification and localization properties of the solution. The scenario approach is a popular method within the stochastic optimization and robust control communities that replaces the chance constraint with hard constraints on a number of randomly sampled data points [20, 21, 22, 23]. As we will soon see, this sampling-based method fits nicely into the framework of our problem, and maintains a lower bound on the probabilistic robustness level with high probability, provided that a sufficiently large number of samples is used.

To implement the scenario approach, suppose that $\{x_1, x_2, \dots, x_N\}$ is a set of $N$ independent samples of $X$. For each input $x_j$, we compute its corresponding output $y_j = f(x_j)$. Then, replacing the chance constraint in (6) with hard constraints on the samples yields the following scenario optimization problem:

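$$\begin{aligned} &\text{maximize} && \hat{r}(\theta) - \lambda v(\theta) \\ &\text{subject to} && y_j \in h(\theta) \text{ for all } j \in \{1, 2, \dots, N\}, \\ & && \theta \in \Theta, \end{aligned} \tag{7}$$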

where the optimization variable is $\theta$. Note that solutions to (7) are random due to the random data $y_1, y_2, \dots, y_N$.

As mentioned in Section I-A, the scenario approach was used recently in reachable set estimation for dynamical systems [17] and in neural network robustness certification [18]. We remark that these works are special cases of our proposed problem (7). In particular, (7) recovers the optimization of [17] in the special case that , equals the volume of the set , and is the norm ball class. On the other hand, [18] is recovered in the special case that and is the class of all half-spaces in . Consequently, (7) subsumes these prior works, handling more general classes and regularizations , and providing a unified framework for simultaneous certification and localization of the random output . In Section V-B, we demonstrate the necessity for the more powerful formulation (7) by giving an example where reducing to the special case of [17] causes the robustness certification to fail.
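The half-space special case can be worked out by hand, which may help in seeing how (7) collapses to a pure certification problem. The derivation below is a sketch under an assumption made here for illustration: the half-spaces are parameterized by a single scalar offset $\theta$ with normal vector fixed to $a \neq 0$, and there is no regularization. The exact parameterization used in [18] may differ.

```latex
\begin{align*}
h(\theta) &= \{ y \in \mathbb{R}^{n_y} : a^\top y + b \ge \theta \},
  \qquad \theta \in \Theta = \mathbb{R}, \\
\hat{r}(\theta) &= \inf\{ a^\top y + b : a^\top y + b \ge \theta \} = \theta, \\
y_j \in h(\theta) &\iff a^\top y_j + b \ge \theta, \\
\theta^\star &= \max\{ \theta : a^\top y_j + b \ge \theta \text{ for all } j \}
  = \min_{j \in \{1, 2, \dots, N\}} \left( a^\top y_j + b \right).
\end{align*}
```

Under this parameterization the scenario solution is simply the smallest sampled safety level. Half-spaces whose normal is not parallel to $a$ would give an approximate robustness level of $-\infty$, so restricting the normal to $a$ loses nothing for certification.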

Now, although the scenario approach has successfully eliminated the chance constraint from (6), there remain two issues to consider. First, it is not immediately clear whether the scenario optimization problem is convex. In Section IV-A, we leverage results from parametric optimization to develop conditions on our choice of and to ensure that the scenario problem (7) is convex. Second, the solution of the scenario problem (7) gives a random approximation to the solution of (6), which optimizes the bound (4) on the probabilistic robustness level. In Section IV-B we develop formal guarantees showing that the solution of (7) maintains a lower bound on the probabilistic robustness level with high probability, provided that the number of samples used is sufficiently large.

IV-A Conditions for Convex Optimization

In this section, we consider the effect of and on lower bounding the probabilistic robustness level of the network, and on the tractability of the resulting scenario optimization (7). A key insight is this: an -cover of the output set may in general be much larger than the output set itself. This is because regions of an -cover that do not intersect with also do not count towards the coverage proportion . Therefore, if the class from which we choose an -cover does not have high enough complexity, then the -covers within may need to be exceedingly large in order to achieve -coverage. As an example, consider covering a line segment in first with an -norm ball, and then, instead, with an ellipsoid. See Figure 1. Clearly, the additional complexity of the ellipsoid allows for tighter coverage of the line segment.

The problem with unnecessarily large $\epsilon$-covers is that the feasible set in (5) includes many vectors $y$ that may not be actual outputs in $\mathcal{Y}$. In this case, the approximate robustness level is small, even though the probabilistic robustness level may be high. To avoid this problem, our choice of $\Theta$ and $h$ should ensure that the class $\mathcal{H}$ has high enough complexity. However, our choices should also yield a scenario problem (7) that is convex. Indeed, Theorem 1 gives sufficient conditions for the convexity of the scenario optimization. Before presenting these conditions, let us recall a fundamental definition for set-valued functions.

Definition 2 (Convexity of set-valued functions).

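Consider a set-valued function $h$ defined on the convex set $\Theta$. The function $h$ is said to be convex on $\Theta$ if

$$\lambda h(\theta_1) + (1 - \lambda) h(\theta_2) \subseteq h(\lambda \theta_1 + (1 - \lambda) \theta_2)$$

for all $\theta_1, \theta_2 \in \Theta$ and all $\lambda \in [0, 1]$. The function $h$ is said to be concave on $\Theta$ if

$$h(\lambda \theta_1 + (1 - \lambda) \theta_2) \subseteq \lambda h(\theta_1) + (1 - \lambda) h(\theta_2)$$

for all $\theta_1, \theta_2 \in \Theta$ and all $\lambda \in [0, 1]$. Finally, the function $h$ on $\Theta$ is said to be affine if it is both convex and concave.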

Remark 1.

The definitions of convexity and concavity for a set-valued function appear to be opposite of those for scalar-valued and vector-valued functions. However, these definitions are consistent with those used in set-valued optimization and coincide with the traditional definition of cone-convexity. In particular, a convex cone defines an order relation on ; are ordered as if and only if [24]. Taking yields the familiar partial order of subset inclusion, and Definition 2 amounts to the usual definition of cone-convexity with respect to the order .

Example 2 (Norm ball functions are affine).

Consider again the norm ball class given in Example 1. It is easily verified by Definition 2 that the set-valued function defining the class is both convex and concave on . Therefore, is an affine set-valued function.
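The verification mentioned in Example 2 can be written out in a few lines. The sketch below introduces the unit ball $\mathcal{B}$ as auxiliary notation that does not appear in the original text, and relies on the standard fact that $s\mathcal{B} + t\mathcal{B} = (s + t)\mathcal{B}$ for $s, t \ge 0$ whenever $\mathcal{B}$ is convex.

```latex
\begin{align*}
\mathcal{B} &= \{ z \in \mathbb{R}^{n_y} : \|z\| \le 1 \},
  \qquad h(\bar{y}, r) = \bar{y} + r\mathcal{B}, \\
\lambda h(\bar{y}_1, r_1) + (1 - \lambda) h(\bar{y}_2, r_2)
  &= \lambda \bar{y}_1 + (1 - \lambda) \bar{y}_2
     + \lambda r_1 \mathcal{B} + (1 - \lambda) r_2 \mathcal{B} \\
  &= \lambda \bar{y}_1 + (1 - \lambda) \bar{y}_2
     + \left( \lambda r_1 + (1 - \lambda) r_2 \right) \mathcal{B} \\
  &= h\left( \lambda \bar{y}_1 + (1 - \lambda) \bar{y}_2,\ 
             \lambda r_1 + (1 - \lambda) r_2 \right).
\end{align*}
```

Since the two sides are equal for every $\lambda \in [0, 1]$, both inclusions in Definition 2 hold, which is exactly the affine property. The argument uses only the convexity of the unit ball, so it applies to any norm.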

With tools for defining and proving convexity of set-valued functions now in place, we can present conditions under which the scenario optimization (7) is convex, and therefore easily solvable.

Theorem 1 (Convex scenario optimization).

Consider the scenario optimization problem (7). Suppose takes the form

$$\Theta = \{ \theta \in \mathbb{R}^p : g_i(\theta) \le 0 \text{ for all } i \in \{1, 2, \dots, m\} \},$$

where the functions are convex. Furthermore, suppose is a concave set-valued function that takes the form

$$h(\theta) = \{ y \in \mathbb{R}^{n_y} : h_i(y, \theta) \le 0 \text{ for all } i \in \{1, 2, \dots, n\} \},$$

where and is convex for all . Then, (7) is a convex optimization problem.

Proof.

Since (7) is a maximization problem, we must show that under the assumptions on and , the objective is concave on and the constraints are convex.

Let us first consider the objective , where . Since

1. is jointly concave on ;

2. is a concave set-valued function on ;

3. and is a convex set;

Proposition 3.1 of [25] gives that is a concave function on . Since is assumed to be convex on and , we conclude that the objective is concave.

Now, let us consider the constraints. The constraints are convex, so is a convex constraint. Next, the random constraint is equivalent to the constraint on that for all . Since is a convex function, the constraint is convex. Since this holds for all and all , we conclude that all of the constraints in (7) are convex. ∎

Remark 2.

Theorem 1 is easily extended to include affine equality constraints in the forms taken by and . Additionally, if the functions in Theorem 1 are jointly convex, one can show that is an affine set-valued function, and therefore in (7) is affine (by applying Proposition 4.2 of [25]). Therefore, if is also affine, the scenario problem (7) has an affine objective.

Theorem 1 precisely answers our earlier inquiry: the class should be complex enough to contain -covers of the output set that are not unnecessarily large, but at the same time should be defined by convex constraints and should be taken as a concave set-valued function also defined by convex constraints. Note that these conditions on are not as restrictive as they may seem. In particular, Example 2 shows for the norm ball class that is affine (and therefore concave) and defined by convex constraints, and that this holds for all norms on , even though norm functions themselves are not affine. Therefore, Theorem 1 guarantees that the scenario optimization (7) using the norm ball class is a convex problem, and its objective is affine per Remark 2. We verify this fact in the following example.

Example 3 (Scenario optimization with norm ball class).

Recall the norm ball class and its corresponding set-valued function defined on given by

$$h(\bar{y}, r) = \{ y \in \mathbb{R}^{n_y} : \|y - \bar{y}\| \le r \}.$$

We show that (7) using this class is convex. Indeed, the approximate robustness level is

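$$\begin{aligned} \hat{r}(\bar{y}, r) &= \inf_{\|y - \bar{y}\| \le r} a^\top y + b \\ &= b - \sup_{\|z\| \le 1} -a^\top (r z + \bar{y}) \\ &= b + a^\top \bar{y} - r \|a\|_*, \end{aligned}$$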

which is affine in the optimization variable . Hence, the scenario problem reduces to

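$$\begin{aligned} &\text{maximize} && b + a^\top \bar{y} - r \|a\|_* - \lambda v(\bar{y}, r) \\ &\text{subject to} && \|y_j - \bar{y}\| \le r \text{ for all } j \in \{1, 2, \dots, N\}, \\ & && r > 0, \end{aligned} \tag{8}$$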

which is a convex problem since is convex.
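The closed form for the approximate robustness level in Example 3 can be sanity-checked numerically. The sketch below assumes the Euclidean norm, which is its own dual, and compares the formula against the smallest safety level found over random points on the sphere of radius r around the center. Both function names and the numbers in the demo are hypothetical; the random search is only a crude check and can never fall below the true infimum.

```python
import math
import random


def approx_robustness_l2_ball(a, b, center, radius):
    """Closed form from Example 3 for the Euclidean norm (self-dual):
    r_hat = b + a^T y_bar - r * ||a||_2."""
    dot = sum(ai * ci for ai, ci in zip(a, center))
    return b + dot - radius * math.sqrt(sum(ai * ai for ai in a))


def sampled_min_on_ball(a, b, center, radius, n=5000, seed=0):
    """Smallest safety level over n random points on the sphere of the ball.
    A linear function attains its minimum over a ball on the boundary."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in a]            # random direction
        norm = math.sqrt(sum(zi * zi for zi in z))
        y = [ci + radius * zi / norm for ci, zi in zip(center, z)]
        best = min(best, b + sum(ai * yi for ai, yi in zip(a, y)))
    return best


if __name__ == "__main__":
    a, b, center, radius = [3.0, 4.0], 1.0, [1.0, 1.0], 2.0
    print(approx_robustness_l2_ball(a, b, center, radius))
    print(sampled_min_on_ball(a, b, center, radius))
```

For other norms only the dual-norm term changes, for example the 1-norm of a when the ball is taken in the infinity norm.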

IV-B High-Probability Guarantees

We now turn to consider the randomness of the scenario problem’s optimal value. In particular, we ask the following question: can the random scenario problem (7) be used to accurately lower bound the probabilistic robustness level and localize the random output ? In Theorem 2, we show that the answer is affirmative with high probability, provided that the problem is convex and a large enough number of samples is used.

Theorem 2 (High-probability guarantees).

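Let $\epsilon, \delta \in (0, 1)$. Assume that the scenario optimization (7) is convex and is attained by a solution $\theta^*$. If

$$N \ge \frac{2}{\epsilon} \left( \log \frac{1}{\delta} + p \right),$$

then the following events hold with probability at least $1 - \delta$:

1. $h(\theta^*)$ is an $\epsilon$-cover;

2. $\hat{r}(\theta^*) \le \bar{r}(\epsilon)$.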

Proof.

Since the scenario problem is convex and $N \ge \frac{2}{\epsilon} \left( \log \frac{1}{\delta} + p \right)$, Theorem 1 of [22] gives that, with probability at least $1 - \delta$, we have

$$\mathbb{P}_X(f(X) \in h(\theta^*)) \ge 1 - \epsilon.$$

By Definition 1, this implies that $h(\theta^*)$ is an $\epsilon$-cover of $\mathcal{Y}$. By Proposition 1, this further implies that $\hat{r}(\theta^*) \le \bar{r}(\epsilon)$. ∎

In Theorem 2, randomness of a solution $\theta^*$ to the scenario problem (7) is taken care of by the probability bound. In particular, $h(\theta^*)$ may not actually be an $\epsilon$-cover, albeit with probability at most $\delta$. This added randomness is precisely the price paid for replacing the intractable chance-constrained problem (6) with the tractable scenario problem (7). However, as Theorem 2 shows, the additional randomness is not a problem, since the requirement on $N$ scales like $\log(1/\delta)$. Therefore, we can take $\delta$ very small and still maintain a reasonable sample size $N$. In doing so, the scenario problem can be used in place of the chance-constrained problem to compute the maximum approximate robustness level and lower bound the probabilistic robustness level of the neural network. The resulting certification and localization hold with a probability that can be made arbitrarily close to one. For this reason, we slightly abuse terminology and call $h(\theta^*)$ in the scenario problem (7) the optimal $\epsilon$-cover.
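The sample requirement of Theorem 2 is easy to evaluate directly. The helper below is a minimal sketch that returns the smallest integer N satisfying the bound; the function name is hypothetical, and the logarithm is assumed to be the natural logarithm, which the text does not state explicitly.

```python
import math


def scenario_sample_size(eps, delta, p):
    """Smallest integer N with N >= (2 / eps) * (log(1 / delta) + p).

    eps   : violation level of the cover, in (0, 1)
    delta : allowed failure probability of the guarantee, in (0, 1)
    p     : number of scalar decision variables in the scenario problem
    The natural logarithm is an assumption of this sketch."""
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0 and p >= 1):
        raise ValueError("need 0 < eps < 1, 0 < delta < 1 and p >= 1")
    return math.ceil((2.0 / eps) * (math.log(1.0 / delta) + p))


if __name__ == "__main__":
    for delta in (1e-5, 1e-10):
        print(delta, scenario_sample_size(eps=0.1, delta=delta, p=3))
```

Squaring the failure probability from 1e-5 to 1e-10 less than doubles the required number of samples in this illustration, which reflects the logarithmic dependence on the failure probability.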

IV-C Procedural Outline

Before demonstrating our theoretical developments in Section V, we briefly recapitulate our proposed assessment method, and note the procedure’s remarkable generality. The procedure amounts to three steps:

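1. Choose the parameter set $\Theta$ and concave set-valued function $h$ according to Theorem 1 with sufficiently high complexity (e.g., moderately large $p$).

2. Choose probability levels $\epsilon, \delta$ close to zero. Independently sample $N \ge \frac{2}{\epsilon} \left( \log \frac{1}{\delta} + p \right)$ inputs $x_1, x_2, \dots, x_N$ from the distribution $\mathbb{P}_X$ over the support $\mathcal{X}$, and then compute $y_j = f(x_j)$ for each $j$.

3. Choose a regularization parameter $\lambda \ge 0$ and nonnegative convex function $v$. Solve the scenario optimization problem (7). Theorem 1 guarantees that the problem is convex, and Theorem 2 guarantees with probability $1 - \delta$ that the solution $\hat{r}(\theta^*)$ lower bounds the probabilistic robustness level $\bar{r}(\epsilon)$ and that $h(\theta^*)$ is an $\epsilon$-cover of $\mathcal{Y}$.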
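The three steps above can be sketched end to end in the simplest special case. The code below assumes the half-space class with a single scalar offset (so p = 1) and no regularization, for which the scenario problem is solved in closed form by the smallest sampled safety level and no convex solver is needed. The network `relu_net`, the input sampler, and the values of a and b are hypothetical stand-ins; richer classes such as norm balls or ellipsoids would require a convex optimization solver instead.

```python
import math
import random


def relu_net(x):
    """Hypothetical 2-2-2 ReLU network, treated purely as a black box."""
    w1, c1 = [[1.0, -1.0], [0.5, 1.0]], [0.0, 0.1]
    w2, c2 = [[1.0, 0.5], [-0.5, 1.0]], [0.2, 0.0]
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + c)
              for row, c in zip(w1, c1)]
    return [sum(w * hi for w, hi in zip(row, hidden)) + c
            for row, c in zip(w2, c2)]


def assess(f, sample_input, a, b, eps, delta, seed=0):
    """Half-space special case of the procedure (p = 1, no regularization).

    Returns the number of samples used and the scenario solution r_hat."""
    rng = random.Random(seed)
    p = 1                                            # one decision variable: the offset
    n = math.ceil((2.0 / eps) * (math.log(1.0 / delta) + p))   # step 2: sample size
    outputs = [f(sample_input(rng)) for _ in range(n)]         # step 2: y_j = f(x_j)
    # Step 3: the largest offset keeping every sample in the half-space
    # is the smallest sampled safety level a^T y_j + b.
    r_hat = min(b + sum(ai * yi for ai, yi in zip(a, y)) for y in outputs)
    return n, r_hat


if __name__ == "__main__":
    sampler = lambda rng: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    n, r_hat = assess(relu_net, sampler, a=[1.0, 0.0], b=0.0, eps=0.1, delta=1e-5)
    print(n, r_hat)
```

If the returned value is nonnegative, Theorem 2 would indicate that the random output is safe with probability at least 1 − ϵ, and that this conclusion itself holds with probability at least 1 − δ over the draw of the samples.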

We now remark on the generality of our procedure. First, the procedure does not require knowledge of the model of the network or its internal structures. Indeed, the only characteristics of the network that affect the above computation are the input and output dimensions, $n_x$ and $n_y$. Therefore, this procedure is effectively invariant to the number and width of hidden layers, making it particularly powerful in assessing the probabilistic robustness of deep neural networks. Furthermore, the procedure makes no assumptions on the differentiability, continuity, or nonlinearity type of the network’s activation functions.

Another remarkable generality of the proposed approach is that it applies to any input probability distribution . The support of the distribution, i.e., the input set , can be nonconvex, and our procedure still reduces to solving a convex optimization problem.

Finally, we remark the personalization granted to the user. Specifically, the user has the freedom to choose , , , , , and . These choices correspond to trading off computational cost with the tightness of the high-probability guarantees and with the tightness of the resulting bound on the probabilistic robustness level. Thus, the procedure can always be tailored to the user’s individual resources and desires. In particular, computational resources permitting, our data-driven approach can make the certification and localization hold with arbitrarily high probability by choosing and small enough. Finally, by varying , the user can choose the amount of importance they place on robustness certification versus on output localization. In particular, taking reduces to pure certification, whereas reduces to pure localization. This effect of varying is empirically demonstrated in Section V-A.

V Numerical Experiments

V-A Illustrative Example

Consider a neural network with ReLU activations and randomly designed weights. In our computations, we treat the weights and network structure as unknown, but assume that for we may compute . The input is distributed uniformly on the input set , where and . We consider the safe set , where and are chosen randomly for the purpose of this experiment.

We now follow our procedural outline given in Section IV-C to localize the output and assess its safety. We start by selecting the set and the set-valued function defined by

$$h(\bar{y}, r) = \{ y \in \mathbb{R}^2 : \|y - \bar{y}\|_Q \le r \},$$

where is a norm on defined by for a fixed symmetric positive definite matrix . It is easily shown that the dual norm of takes the form . As shown in Example 2, is an affine set-valued function, and therefore and satisfy the conditions of Theorem 1.

The probability levels are chosen as and . We set , then uniformly sample inputs from and compute their corresponding random outputs . We compute the (symmetric positive definite) sample covariance matrix of the data and use it to define . Namely, we set . By doing so, we take our class to be the set of ellipsoids with axes scaled and oriented according to the principal components of the sampled output data.

As shown in Example 3, the scenario problem of interest takes the form

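$$\begin{aligned} &\text{maximize} && b + a^\top \bar{y} - r \|a\|_{Q*} - \lambda v(\bar{y}, r) \\ &\text{subject to} && \|y_j - \bar{y}\|_Q \le r \text{ for all } j \in \{1, 2, \dots, N\}, \\ & && r > 0, \end{aligned}$$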

where the optimization variable is . We choose the regularizer to be the square of the norm ball radius, i.e., . The optimization problem is convex as guaranteed by Theorem 1.

We solve the scenario problem first without regularization, and then with two different levels of regularization: and . The respective solutions are denoted by , , and . Each instance takes approximately 15 seconds to solve using CVX in Matlab on a standard laptop with a 2.9 GHz quad-core i7 processor. The resulting approximate robustness levels are , , and . In each instance, Theorem 2 guarantees that the probabilistic robustness level is at least with probability at least . In other words, the random output has a safety level of with high probability, showing that the neural network is probabilistically robust.

The optimal -covers, , , and , contain with probability at least (disregarding ), and are shown in Figures 2 and 3. The set is massively over-conservative due to the choice , which corresponds to pure robustness certification. In the cases of and , the optimal -covers give much tighter localizations of the output . The approximate robustness levels with regularization are only slightly lower than the unregularized value. Yet the most regularized -cover, , clearly provides a much tighter approximation to , and still guarantees with high probability that . Despite the clear success of regularization in this example, it is important to remark that when the norm ball is not chosen to align with the data, the effect of regularization on the approximate robustness level can be more dramatic, and may cause the approximate robustness level to be negative even when the unregularized value is nonnegative.

V-B Comparison to Output Set Estimation

In this example, we compare our proposed assessment method to an alternative approach. In this second approach, we first estimate the output set of the neural network using the scenario-based reachability analysis in [17]. We then use the resulting output set estimate to assess the robustness of the network. Recall that our proposed scenario optimization (7) generalizes the reachability analysis of [17]. In addition to localizing the network outputs, our approach directly takes the goal of robustness certification into account, whereas the estimation technique of [17] does not.

To illustrate our comparison, consider a simple ReLU neural network given by , where for . The input is distributed uniformly on the input set , where . The safe set is given as , where and . It is straightforward to show that the output set is the top half of the input set, namely, . Hence, if then . Therefore, , and so the random output is safe with probability one. The network is deterministically robust (and therefore has a nonnegative probabilistic robustness level as well).

We now perform the two assessments at hand, computing our proposed solution first. We choose the -norm ball class for our candidate -covers and draw sufficiently many output samples according to Theorem 2 with and . Next, we choose the regularizer and regularization parameter , and then solve our proposed scenario problem (8) for the -norm ball class. The solution correctly certifies that network outputs are safe with high probability; see the blue set in Figure 4.

We now turn to the alternative method using the reachability analysis proposed in [17]. We use the same -norm ball class as above and solve for the minimum volume -cover using the same sampled outputs. The estimated output set is shown in red in Figure 4. Despite being a tighter localization, a substantial portion of the estimated output set exits the safe set, meaning this approach cannot certify the robustness of the network, even though the random output is truly safe with probability one. This comparison illustrates the fundamental difference between the problems of output set estimation and robustness certification. In particular, a good estimate of the output set of the network may not be the most informative set to use for robustness certification. This observation endorses our proposed method, which simultaneously encodes both goals of certification and localization.

VI Conclusions

In this paper, we propose a data-driven method for assessing the robustness of a general deep neural network to an input with random uncertainty. We introduce an intuitive notion of probabilistic robustness based on the safety level of the random output, and we relate this to the more common definition of deterministic robustness. We show that by approximating the deterministic robustness level using -covers of the output set, the probabilistic robustness level can be lower bounded while simultaneously localizing the output. We provide conditions to ensure that optimizing the lower bound amounts to a tractable convex optimization problem. The optimization’s solution issues formal guarantees on the safety and localization of the random output that can be made to hold with overwhelming probability.