1. Introduction
With the fast spread of artificial intelligence, and particularly neural networks, in various fields including safety-critical applications such as autonomous driving
(Xiang et al., 2018), there is a growing need to replace verification methods based on statistical testing (Kim et al., 2020) with more formal methods able to provide guarantees on the satisfaction of desired properties on the output of the neural network (Liu et al., 2021) or on the global behavior of a closed-loop system with a neural network controller (Akintunde et al., 2018; Xiang et al., 2020; Saoud and Sanfelice, 2021). When focusing on neural networks alone, the rapidly growing field of formal verification has been categorized into three main types of objectives (see the survey paper (Liu et al., 2021)): counterexample results, which find an input-output pair that violates the desired property; adversarial results, which determine the maximum allowed disturbance that can be applied to a nominal input while preserving the properties of the nominal output; and reachability results, which evaluate (exactly or through over-approximations) the set of all output values reachable by the network when given a bounded input set. While the first two types of results often rely on solving primal or dual optimization problems (Liu et al., 2021; Salman et al., 2019), reachability results (which are considered in this paper) naturally lean toward reusing existing, or developing new, reachability analysis approaches to be applied layer by layer in the network. The current reachability methods in the literature consider various set representations to over-approximate the network's output set: static intervals (Xiang et al., 2020); symbolic interval equations and linear relaxations of activation functions (ReluVal (Wang et al., 2018b), Neurify (Wang et al., 2018a), VeriNet (Henriksen and Lomuscio, 2020), CROWN (Zhang et al., 2018)); or polyhedra and zonotopes in the tool libraries NNV (Tran et al., 2020) and ERAN (4).
The main weakness of existing neural network verifiers is that most of them are limited to a very small class of activation functions. The large majority of verifiers only handles piecewise-affine activations, focusing on the most popular ReLU function (Wang et al., 2018b, a; Katz et al., 2019; Botoeva et al., 2020). A handful of tools also cover other popular activation functions, primarily S-shaped functions such as the sigmoid or hyperbolic tangent (Henriksen and Lomuscio, 2020; Tran et al., 2020; 4). Several other tools claiming to handle general activation functions are either implicitly restricted to monotone functions in their theory or implementation (Dvijotham et al., 2018; Raghunathan et al., 2018), or their claimed generality rather refers to a mere compatibility with general activation functions, assuming that the user provides their own implementation of how to handle these new functions (Zhang et al., 2018).
Contributions.
The goal of this tool paper is to present a novel method for the reachability analysis of feedforward neural networks with general activation functions. The proposed approach adapts the existing mixed-monotonicity reachability method for dynamical systems (Meyer et al., 2021) to neural networks, so that we can obtain an interval over-approximation of the set of all the network's outputs reachable from a given input set. Since our reachability method is also applicable to any partial network within the main neural network and always returns a sound over-approximation of the last layer's output set, intersecting the interval over-approximations from several partial networks ending at the same layer can only reduce the conservativeness while preserving the soundness of the results. To take full advantage of this, we propose an algorithm that applies our new mixed-monotonicity reachability method to all partial networks contained within the considered network, to ensure that the resulting interval over-approximation is the tightest output bound of the network obtainable using mixed-monotonicity reachability analysis.
Mixed-monotonicity reachability analysis is applicable to any system whose Jacobian matrix is bounded on any bounded input set (Meyer et al., 2021). In the case of neural networks combining linear transformations and nonlinear activation functions, this requirement implies that our approach is applicable to any Lipschitz-continuous activation function. Since extracting the bounds of the Jacobian matrix of a system (or of the derivative of an activation function in our case) may not always be straightforward, for the sake of self-containment of the paper we provide a method to automatically obtain these bounds for a (still very general) subclass of activation functions whose derivative can be defined as a piecewise-monotone function. Apart from the binary step and the Gaussian function, all activation functions that the author could find in the literature (including the many non-monotone activation functions reviewed in (Zhu et al., 2021)) belong to this class. Although most of the tools cited above provide a two-part verification framework for neural networks (one reachability part to compute output bounds of the network, and one part to iteratively refine the network's input domain until a definitive answer can be given to the verification problem), this paper focuses on providing a novel method for the first, reachability step only. The proposed approach can thus be seen either as a preliminary step towards the development of a larger verification framework for neural networks combining our new reachability method with an iterative splitting of the input domain, or as a standalone tool for the reachability analysis of neural networks, as used for example in the analysis of closed-loop systems with a sample-and-hold neural network controller (Xiang et al., 2020). In summary, the main contributions of this paper are:

a novel approach to obtain sound output bounds of a neural network using mixed-monotonicity reachability analysis;

a method that is compatible with any Lipschitz-continuous activation function;

a framework that makes it easy to add new activation functions: the user only needs to provide the definition of the activation function and its derivative, together with the global extrema of the derivative and the input values at which they are attained.
The paper is organized as follows. Section 2 defines the considered problem and provides useful preliminaries for the main algorithm, such as the definition of mixed-monotonicity reachability for a neural network, and how to automatically obtain local bounds on the derivative of the activation function. Section 3 presents the main algorithm, which applies mixed-monotonicity reachability to all partial networks within the given network. Finally, Section 4 compares our novel reachability approach to the bounding methods of other reachability-based verifiers in the literature.
2. Preliminaries
Given $\underline{x}, \overline{x} \in \mathbb{R}^n$ with $\underline{x} \le \overline{x}$, the (multidimensional) interval $[\underline{x}, \overline{x}]$ is defined as the set $\{x \in \mathbb{R}^n \mid \underline{x} \le x \le \overline{x}\}$ using componentwise inequalities.
2.1. Problem definition
Consider an $L$-layer feedforward neural network defined for all $l \in \{1, \dots, L\}$ as

(1) $x^l = \Phi(W^l x^{l-1} + b^l)$,

where $x^l$ is the output vector of layer $l$, with $x^0$ and $x^L$ being the input and output of the neural network, respectively. We assume that the network is pre-trained, and all weight matrices $W^l$ and bias vectors $b^l$ are known. The function $\Phi$ is defined as the componentwise application of a scalar and Lipschitz-continuous activation function. Some verification problems on neural networks aim at measuring the robustness of the network with respect to input variations. Since the exact output set of the network cannot always be easily computed, we instead rely on approximating this set by a simpler set representation, such as a multidimensional interval. By ensuring that we compute an over-approximation interval containing the whole output set, we can then solve the verification problem on the interval: if the desired property is satisfied on the over-approximation, then it is also satisfied on the real output set of the network. In this paper, we focus on the problem of computing an interval over-approximation of the output set of the network when its input is taken in a known bounded set, as formalized below.
Problem 1.
Given the neural network (1) and an input interval $[\underline{x}^0, \overline{x}^0]$, compute output bounds $\underline{x}^L, \overline{x}^L$ such that $x^L \in [\underline{x}^L, \overline{x}^L]$ for all $x^0 \in [\underline{x}^0, \overline{x}^0]$.
Naturally, our secondary objective is to ensure that the computed interval overapproximation is as small as possible in order to minimize the number of false negative results in the subsequent verification process.
2.2. Mixed-monotonicity reachability
In this paper, we solve Problem 1 by iteratively computing over-approximation intervals of the output of each layer, until the last layer of the network is reached. These over-approximations are obtained by adapting to neural networks existing methods for the reachability analysis of dynamical systems. More specifically, we rely on the mixed-monotonicity approach in (Meyer et al., 2021) to over-approximate the reachable set of any discrete-time system with a Lipschitz-continuous vector field. Since the neural network (1) is instead defined as a function with input and output of different dimensions, we present below the generalization of the mixed-monotonicity method from (Meyer et al., 2021) to any Lipschitz-continuous function.
Consider the function $y = f(x)$ with input $x \in \mathbb{R}^n$, output $y \in \mathbb{R}^m$, and Lipschitz-continuous $f: \mathbb{R}^n \to \mathbb{R}^m$. The Lipschitz-continuity assumption ensures that the derivative $\partial f / \partial x$ (also called the Jacobian matrix in this paper) is bounded.
Proposition 2.1.
Given an interval matrix $[\underline{J}, \overline{J}]$ bounding the derivative $\frac{\partial f}{\partial x}(x)$ for all $x \in [\underline{x}, \overline{x}]$, denote the center of this interval as $J^* = (\underline{J} + \overline{J})/2$. For each output dimension $i \in \{1, \dots, m\}$, define input vectors $\underline{\xi}^i, \overline{\xi}^i \in \mathbb{R}^n$ and a row vector $\alpha^i \in \mathbb{R}^{1 \times n}$ such that for all $j \in \{1, \dots, n\}$,
$$(\underline{\xi}^i_j, \overline{\xi}^i_j, \alpha^i_j) = \begin{cases} (\underline{x}_j, \overline{x}_j, \min(0, \underline{J}_{ij})) & \text{if } J^*_{ij} \ge 0, \\ (\overline{x}_j, \underline{x}_j, \max(0, \overline{J}_{ij})) & \text{if } J^*_{ij} < 0. \end{cases}$$
Then for all $x \in [\underline{x}, \overline{x}]$ and output dimension $i$, we have:
$$f_i(x) \in \left[ f_i(\underline{\xi}^i) + \alpha^i (\overline{\xi}^i - \underline{\xi}^i),\ f_i(\overline{\xi}^i) + \alpha^i (\underline{\xi}^i - \overline{\xi}^i) \right].$$
Proposition 2.1 can thus provide an interval over-approximation of the output set of any function as long as bounds on its derivative are known. Obtaining such bounds for a neural network is made possible by computing local bounds on the derivative of its activation functions, as detailed in Section 2.3.
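As an illustration, the interval computation of Proposition 2.1 can be sketched in Python as follows (the function and variable names are our own; this is a sketch of the construction as we understand it, not the paper's implementation):

```python
import numpy as np

def mixed_monotonicity_bounds(f, x_lo, x_hi, J_lo, J_hi):
    """Interval over-approximation of f([x_lo, x_hi]) given entrywise
    bounds [J_lo, J_hi] on the (m, n) Jacobian of f over that box."""
    m, n = J_lo.shape
    J_center = (J_lo + J_hi) / 2
    y_lo, y_hi = np.empty(m), np.empty(m)
    for i in range(m):
        # Pick the input corners and slope correction per input dimension,
        # according to the sign of the center of the Jacobian interval.
        lo_pt, hi_pt = x_lo.copy(), x_hi.copy()
        alpha = np.minimum(0.0, J_lo[i])
        neg = J_center[i] < 0
        lo_pt[neg], hi_pt[neg] = x_hi[neg], x_lo[neg]
        alpha[neg] = np.maximum(0.0, J_hi[i, neg])
        y_lo[i] = f(lo_pt)[i] + alpha @ (hi_pt - lo_pt)
        y_hi[i] = f(hi_pt)[i] + alpha @ (lo_pt - hi_pt)
    return y_lo, y_hi
```

When the Jacobian interval of a row has constant sign (e.g. a linear map with exact Jacobian bounds), the correction term vanishes and the bounds are exact; otherwise the correction term inflates the interval to account for possible slope reversals inside the box.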
2.3. Local bounds of activation functions
Proposition 2.1 can be applied to any neural network whose activation functions are Lipschitz continuous, since such functions have a bounded derivative. However, to avoid requiring users to manually provide these bounds for each activation function they want to use, we instead provide a method to automatically and easily compute local bounds for a very general class of functions covering most popular activation functions and their derivatives.
Let $f: \mathbb{R} \to \mathbb{R}$ be a scalar function. In this paper, we focus on piecewise-monotone functions for which there exist $\underline{z}, \overline{z} \in \mathbb{R} \cup \{-\infty, +\infty\}$ with $\underline{z} \le \overline{z}$ such that $f$ is:

nonincreasing on $(-\infty, \underline{z}]$ until reaching its global minimum $f(\underline{z})$;

nondecreasing on $[\underline{z}, \overline{z}]$ until reaching its global maximum $f(\overline{z})$;

and nonincreasing on $[\overline{z}, +\infty)$.

When $\underline{z} = -\infty$ (resp. $\overline{z} = +\infty$), the first (resp. last) monotone segment is squeezed into a singleton at infinity and can thus be ignored.
Although this description may appear restrictive, we should note that the vast majority of activation functions, as well as their derivatives, belong to this class. In particular, this is the case for (but not restricted to) popular piecewise-affine activation functions (identity, ReLU, parameterized ReLU), monotone functions (hyperbolic tangent, arctangent, sigmoid, SoftPlus, ELU), as well as many non-monotone activation functions (SiLU, GELU, HardSwish, Mish, REU, PFLU, FPFLU, which are all reviewed or introduced in (Zhu et al., 2021)). Two examples of such functions are provided in Figure 1, representing the Sigmoid Linear Unit (SiLU, also called Swish) activation function and its derivative. The only exception that the author could find is the Gaussian activation function, which belongs to this class while its derivative does not (although the negation of its derivative does belong to this class, so a very similar approach can still be applied).
Proposition 2.2.
Given a scalar function $f$ as defined above and a bounded input domain $[\underline{x}, \overline{x}] \subseteq \mathbb{R}$, the local bounds of $f$ on $[\underline{x}, \overline{x}]$ are given by:
$$\min_{x \in [\underline{x}, \overline{x}]} f(x) = \begin{cases} f(\underline{z}) & \text{if } \underline{z} \in [\underline{x}, \overline{x}], \\ \min(f(\underline{x}), f(\overline{x})) & \text{otherwise;} \end{cases} \qquad \max_{x \in [\underline{x}, \overline{x}]} f(x) = \begin{cases} f(\overline{z}) & \text{if } \overline{z} \in [\underline{x}, \overline{x}], \\ \max(f(\underline{x}), f(\overline{x})) & \text{otherwise.} \end{cases}$$
In this paper, Proposition 2.2 is primarily used to obtain local bounds on the derivatives of the activation functions in order to compute bounds on the network's Jacobian matrix for Proposition 2.1. On the other hand, Proposition 2.2 can also be applied to the activation functions themselves, to compare (in Section 4) our method based on mixed-monotonicity reachability with the naive interval propagation through the layers.
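For this class of piecewise-monotone functions, the case distinction behind Proposition 2.2 reduces to a few lines of code. The sketch below assumes the user supplies the function together with the locations of its global minimum and maximum (possibly infinite); all names are our own:

```python
import math

def local_bounds(f, argmin, argmax, x_lo, x_hi):
    """Local min/max on [x_lo, x_hi] of a function f that is nonincreasing
    up to its global argmin, nondecreasing up to its global argmax, and
    nonincreasing afterwards (argmin <= argmax, possibly +-infinity)."""
    # The global extremum is attained on the interval iff its location lies
    # inside; otherwise f is monotone there and the extremum is at an endpoint.
    lo = f(argmin) if x_lo <= argmin <= x_hi else min(f(x_lo), f(x_hi))
    hi = f(argmax) if x_lo <= argmax <= x_hi else max(f(x_lo), f(x_hi))
    return lo, hi
```

For instance, for the SiLU activation (global minimum near $x \approx -1.278$, nondecreasing afterwards), `local_bounds(silu, -1.278, math.inf, -3.0, 3.0)` returns roughly $(-0.278, 2.858)$.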
3. Reachability algorithm
Solving Problem 1 by applying the mixed-monotonicity approach from Proposition 2.1 to the neural network (1) can be done in many different ways. One possibility is to iteratively compute bounds on the Jacobian matrix of the whole network and then apply Proposition 2.1 once for the whole network. However, this may result in wide bounds on the network's Jacobian, which in turn may result in a fairly conservative over-approximation of the network output due to the correction term in Proposition 2.1. The dual approach is to apply Proposition 2.1 iteratively to each layer of the network, which results in tighter Jacobian bounds (since one layer's Jacobian requires fewer interval matrix products than the Jacobian of the whole network), and thus a tighter over-approximation of each layer's output. However, the loss of dependency on the network's input may induce another source of accumulated conservativeness when there are many layers. Many other intermediate approaches can also be considered, where we split the network into several consecutive partial networks and apply Proposition 2.1 iteratively to each of them.
Although all the above approaches result in over-approximations of the network output, the main challenge is that we cannot determine in advance which method would yield the tightest bounds, since this is highly dependent on the considered network and input interval. We thus devised an algorithm that encapsulates all possible choices by running Proposition 2.1 on all possible partial networks of (1), before taking the intersection of the obtained bounds to get the tightest bounds of each layer's output that can be obtained with mixed monotonicity. The main steps of this approach are presented in Algorithm 1 and summarized below.
Given the $L$-layer network (1) with activation function and input interval as in Problem 1, our goal is to apply Proposition 2.1 to each partial network of (1), denoted as $N_{k \to l}$, containing only layers $k$ to $l$ (with $1 \le k \le l \le L$) and with input $x^{k-1}$ and output $x^l$. We start by initializing the Jacobian bounds of each partial network to the identity matrix. Then, we iteratively explore the network going forward, where for each layer $l$ we first use interval arithmetic (Jaulin et al., 2001) to compute the pre-activation bounds based on the knowledge of the output bounds of the previous layer, and then apply Proposition 2.2 to obtain local bounds on the derivative of the activation function. Next, for each partial network $N_{k \to l}$ ending at the current layer $l$, we compute its Jacobian bounds based on the Jacobian of layer $l$ and the Jacobian of the partial network $N_{k \to l-1}$. We then apply Proposition 2.1 to the partial network $N_{k \to l}$ with the Jacobian bounds we just computed and input bounds on $x^{k-1}$. Finally, once we have computed the over-approximations of $x^l$ corresponding to each partial network ending at layer $l$, we take the intersection of all of them to obtain the final bounds on $x^l$.
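The procedure described above can be sketched in Python as follows (all function and variable names are our own assumptions; the per-partial-network bounding step re-implements the mixed-monotonicity construction of Proposition 2.1 as we understand it, and the example is restricted to a componentwise activation with user-supplied local derivative bounds):

```python
import numpy as np

def eval_partial(Ws, bs, act, k, l, x):
    # Evaluate layers k..l (1-indexed) of network (1) on input x.
    for i in range(k, l + 1):
        x = act(Ws[i - 1] @ x + bs[i - 1])
    return x

def interval_matmul(A_lo, A_hi, B_lo, B_hi):
    # Entrywise bounds on A @ B for interval matrices A and B.
    prods = [np.einsum('ik,kj->ikj', A, B) for A in (A_lo, A_hi) for B in (B_lo, B_hi)]
    return np.minimum.reduce(prods).sum(axis=1), np.maximum.reduce(prods).sum(axis=1)

def mm_bounds(f, x_lo, x_hi, J_lo, J_hi):
    # Mixed-monotonicity bounds of f over [x_lo, x_hi] from Jacobian bounds.
    m = J_lo.shape[0]
    Jc = (J_lo + J_hi) / 2
    y_lo, y_hi = np.empty(m), np.empty(m)
    for i in range(m):
        lo_pt, hi_pt = x_lo.copy(), x_hi.copy()
        alpha = np.minimum(0.0, J_lo[i])
        neg = Jc[i] < 0
        lo_pt[neg], hi_pt[neg] = x_hi[neg], x_lo[neg]
        alpha[neg] = np.maximum(0.0, J_hi[i, neg])
        y_lo[i] = f(lo_pt)[i] + alpha @ (hi_pt - lo_pt)
        y_hi[i] = f(hi_pt)[i] + alpha @ (lo_pt - hi_pt)
    return y_lo, y_hi

def reach(Ws, bs, act, dact_bounds, x_lo, x_hi):
    """Bound the network output by applying mixed-monotonicity to every
    partial network and intersecting the resulting bounds at each layer."""
    L = len(Ws)
    lo, hi = [x_lo], [x_hi]
    J = {}  # J[k]: interval Jacobian bounds of the partial network from layer k to the current layer
    for l in range(1, L + 1):
        W, b = Ws[l - 1], bs[l - 1]
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        z_lo = Wp @ lo[l - 1] + Wn @ hi[l - 1] + b  # pre-activation bounds (interval arithmetic)
        z_hi = Wp @ hi[l - 1] + Wn @ lo[l - 1] + b
        d_lo, d_hi = dact_bounds(z_lo, z_hi)        # local bounds on the activation derivative
        # Interval Jacobian of layer l alone: diag(act'(z)) @ W.
        Jl = (np.where(W >= 0, d_lo[:, None] * W, d_hi[:, None] * W),
              np.where(W >= 0, d_hi[:, None] * W, d_lo[:, None] * W))
        for k in range(1, l):
            J[k] = interval_matmul(*Jl, *J[k])      # extend partial network k..l-1 to k..l
        J[l] = Jl
        out_lo = np.full(W.shape[0], -np.inf)
        out_hi = np.full(W.shape[0], np.inf)
        for k in range(1, l + 1):                   # every partial network ending at layer l
            f = lambda x, k=k, l=l: eval_partial(Ws, bs, act, k, l, x)
            b_lo, b_hi = mm_bounds(f, lo[k - 1], hi[k - 1], *J[k])
            out_lo = np.maximum(out_lo, b_lo)       # intersection of the candidate intervals
            out_hi = np.minimum(out_hi, b_hi)
        lo.append(out_lo)
        hi.append(out_hi)
    return lo[L], hi[L]
```

The intersection step is sound because every candidate interval is itself a sound over-approximation of the same layer output, as argued in Theorem 3.1.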
Theorem 3.1.
The interval returned by Algorithm 1 is a solution to Problem 1.
The proof of this result is straightforward since we compute several interval overapproximations of the output of each layer using Proposition 2.1, thus ensuring that their intersection is still an overapproximation of the layer’s output. Then, using these overapproximations as the input bounds of the next layers guarantees the soundness of the approach.
Note also that in Algorithm 1, Proposition 2.1 is applied to every possible partial network that exists within the main network (1). We are thus guaranteed that the resulting interval from Algorithm 1 is the least conservative solution to Problem 1 that could be obtained from applying the mixed-monotonicity reachability analysis of Proposition 2.1 to any decomposition of (1) into consecutive partial networks.
4. Numerical comparisons
The recent years have seen the development of a wide variety of new methods and tools for bounding and verifying properties on the outputs of neural networks, including optimization-based approaches (see e.g. the survey paper (Liu et al., 2021)) and methods based on reachability analysis with various set representations, such as polyhedra and zonotopes in the ERAN (4) and NNV (Tran et al., 2020) libraries. Since the method presented in this paper is an optimization-free reachability analysis approach using (multidimensional) interval bounds, we focus our numerical comparisons on the five most related tools, which also rely on optimization-free interval reachability. The main ideas of these approaches and their limitations are summarized below.

Naive interval bound propagation (IBP), as presented e.g. in (Xiang et al., 2020), is the most straightforward approach, where interval arithmetic operations (Jaulin et al., 2001) are used to propagate the input bounds of the network through each affine transformation and activation function. Its implementation in the framework of (Xiang et al., 2020) is restricted to monotone activation functions, and the results are very conservative due to the loss of dependency on the network's input.

To preserve part of the input dependency and thus obtain tighter bounds, ReluVal (Wang et al., 2018b) propagates input-dependent symbolic intervals through the network. This tool is restricted to ReLU activations, and the input dependency is lost (through concretization of the symbolic bounds) whenever the pre-activation bounds span both activation modes of a ReLU node.

Neurify (Wang et al., 2018a) is an improved version of ReluVal where the concretization of the symbolic intervals is replaced by a linear relaxation of the ReLU activation, thus bounding the nonlinear function by two linear equations.

VeriNet (Henriksen and Lomuscio, 2020) generalizes the approach of Neurify (symbolic interval propagation and linear relaxation of activation functions) to S-shaped activation functions, such as the sigmoid and hyperbolic tangent. In terms of implementation, VeriNet propagates only a single symbolic equation and an error matrix, which is more efficient than propagating two symbolic equations as done in Neurify and ReluVal.

Unlike all the above methods, CROWN (Zhang et al., 2018) relies on a backward propagation of the activations' linear relaxations through the layers of the network. Although its authors claim the CROWN verifier to be compatible with any activation function (as long as the user provides their own methods to compute linear relaxations of the activation functions based on the pre-activation bounds), in practice its current implementation in auto_LiRPA (Xu et al., 2020) can only handle piecewise-affine and S-shaped activations, similarly to VeriNet.
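As a concrete baseline, one step of the naive IBP scheme from the first item above can be sketched as follows (all names are our own; the restriction to monotone activations appears in the last line):

```python
import numpy as np

def ibp_layer(W, b, act, x_lo, x_hi):
    """One step of naive interval bound propagation: an affine map followed
    by a monotone nondecreasing activation applied componentwise."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)  # sign split for the interval product
    z_lo = Wp @ x_lo + Wn @ x_hi + b
    z_hi = Wp @ x_hi + Wn @ x_lo + b
    # Monotonicity of `act` lets us propagate the interval endpoints directly;
    # a non-monotone activation would instead need local bounds as in Proposition 2.2.
    return act(z_lo), act(z_hi)
```

Repeating this step layer by layer loses all correlation between the neurons, which is the source of the conservativeness mentioned above.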
Since the tools to be compared in this section are written in various programming languages (Matlab, C, Python), for the convenience of comparison we have reimplemented the bounding method of each tool in Matlab. The implemented approaches are those described in each of the original papers cited above, and may thus slightly differ from the latest updates of the corresponding public toolboxes.
Most neural network verifiers have a two-part structure: one part bounding the output of the network, and one part iteratively refining the network's input domain until the union of the computed bounds satisfies or falsifies the desired property. Since the main contributions of this paper are related to the bounding part (and not to the full verification framework), restricting our numerical comparisons to a handful of pre-trained networks from the literature would not be very meaningful to evaluate the relative qualities of the compared bounding methods for various activation functions. We thus rather focus the comparisons on the average results obtained from each approach over a large set of randomly generated neural networks of various sizes.
Each considered feedforward neural network has the structure defined in (1), with all its parameters randomly chosen: the depth; the numbers of input and output variables; the number of neurons per hidden layer (each layer may have a different width); and the input bounds, defined as a hypercube around a randomly chosen center input. The comparison between our mixed-monotonicity method in Algorithm 1 and the five approaches from the literature cited above (when applicable) is done over a set of random networks for each of the following activation functions:
Rectified Linear Unit (ReLU) is the piecewise-affine function $\phi(x) = \max(0, x)$;

Hyperbolic tangent (TanH) is the S-shaped function $\phi(x) = \tanh(x)$;

Exponential Linear Unit (ELU) is the monotone function defined as $\phi(x) = x$ if $x \ge 0$ and $\phi(x) = e^x - 1$ if $x < 0$;

Sigmoid Linear Unit (SiLU) is the non-monotone function $\phi(x) = \frac{x}{1 + e^{-x}}$.
Note that all these activation functions are Lipschitz continuous and their derivatives satisfy the shape described in Section 2.3, meaning that Algorithm 1 can be applied to all of them.
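As an illustration, these four activation functions can be registered with their derivatives and the derivative's global extrema, which is all the proposed framework requires from the user (the registration format is our own assumption, and the SiLU values are numerical approximations computed by us):

```python
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

# Each entry: the function, its derivative, and the derivative's global
# extrema with the inputs at which they are attained (possibly infinite).
ACTIVATIONS = {
    "relu": dict(f=lambda x: max(0.0, x),
                 df=lambda x: 1.0 if x > 0 else 0.0,
                 df_min=0.0, df_argmin=-math.inf,   # derivative nondecreasing: 0 then 1
                 df_max=1.0, df_argmax=math.inf),
    "tanh": dict(f=math.tanh,
                 df=lambda x: 1.0 - math.tanh(x) ** 2,
                 df_min=0.0, df_argmin=-math.inf,   # derivative peaks at 0, decays to 0
                 df_max=1.0, df_argmax=0.0),
    "elu":  dict(f=lambda x: x if x >= 0 else math.exp(x) - 1.0,
                 df=lambda x: 1.0 if x >= 0 else math.exp(x),
                 df_min=0.0, df_argmin=-math.inf,   # derivative nondecreasing from 0 to 1
                 df_max=1.0, df_argmax=math.inf),
    "silu": dict(f=lambda x: x * sigma(x),
                 df=lambda x: sigma(x) * (1.0 + x * (1.0 - sigma(x))),
                 # approximate values, computed numerically
                 df_min=-0.0998, df_argmin=-2.3994,
                 df_max=1.0998, df_argmax=2.3994),
}
```

Each derivative above fits the piecewise-monotone shape of Section 2.3 (with infinite locations standing for empty monotone segments), so the local-bound computation of Proposition 2.2 applies directly.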
For each of the four considered activation functions and six bounding methods, the average results over the randomly generated networks are given in Table 1 for the computation time (on a laptop running Matlab 2021a) and in Table 2 for the width of the output bounds. For the average of the results to be meaningful despite the varying depth and width of the random networks, the results for each network are scaled down with respect to its total number of neurons.
The first and main outcome of these comparisons is the much greater generality of our proposed mixed-monotonicity approach compared to all the others. Indeed, with no additional effort it can natively handle piecewise-affine activation functions (as ReluVal and Neurify do), S-shaped functions (as VeriNet and CROWN do), other monotone functions (as the IBP implementation in (Xiang et al., 2020) does), as well as general non-monotone activation functions currently not handled by any of the above five tools. Note that the results in parentheses in Tables 1-3 correspond to activations not natively supported by the original tools, but added in our own implementation of their bounding approaches. This is the case for the IBP approach, which is restricted to monotone activation functions in the implementation of (Xiang et al., 2020), but which we extended to non-monotone activations using Proposition 2.2. For the ELU activation, which is not an S-shaped function and is not natively handled by the bounding methods of VeriNet and CROWN, we still managed to include it in our own implementation of these tools: since ELU is a convex function, linear relaxations can be computed by taking the tangent at the center of the pre-activation bounds as the lower bounding line and the intercepting line between the endpoints as the upper bounding line.
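The ELU relaxation described above exploits convexity: the tangent taken at the midpoint of the pre-activation interval lies below the function, and the chord through the interval endpoints lies above it. A minimal sketch, with our own naming:

```python
import math

def elu(x):
    return x if x >= 0 else math.exp(x) - 1.0

def delu(x):
    return 1.0 if x >= 0 else math.exp(x)

def elu_relaxation(z_lo, z_hi):
    """Linear lower/upper relaxations of ELU on [z_lo, z_hi] (z_lo < z_hi):
    returns (a, b), (c, d) with a*z + b <= elu(z) <= c*z + d on the interval."""
    mid = (z_lo + z_hi) / 2.0
    a = delu(mid)
    b = elu(mid) - a * mid                    # tangent at the midpoint (below, by convexity)
    c = (elu(z_hi) - elu(z_lo)) / (z_hi - z_lo)
    d = elu(z_lo) - c * z_lo                  # chord through the endpoints (above, by convexity)
    return (a, b), (c, d)
```

When the pre-activation interval is entirely negative and far from zero, both lines become nearly horizontal, which is the degenerate case mentioned for CROWN in Section 4.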
In addition to the generality of our approach, its simplicity in handling new user-provided activation functions should be highlighted. Indeed, ReluVal (Wang et al., 2018b), Neurify (Wang et al., 2018a), VeriNet (Henriksen and Lomuscio, 2020) and CROWN (Zhang et al., 2018) all rely on linear relaxations of the nonlinear activation functions, which may require long and complex implementations to be provided by the user for each new activation function (e.g. finding the optimal linear relaxations of S-shaped activation functions takes several hundred lines of code in the implementation of VeriNet). In contrast, for a user to add a new activation type to our mixed-monotonicity approach, all they need to provide is the definition of the activation function and its derivative, together with the global extrema of the derivative and the input values at which they are attained, as defined in Section 2.3. Everything else is handled automatically by Algorithm 1.
This generality and ease of use, however, come at the cost of a higher computational complexity, as can be seen in Table 1. This higher complexity primarily comes from the fact that most other methods propagate interval or symbolic bounds through the network only once (except for CROWN, which is also computationally expensive), while our approach in Algorithm 1 calls the mixed-monotonicity reachability method from Proposition 2.1 on all partial networks within the main network.
Table 1. Average computation time of each bounding method (IBP (Xiang et al., 2020), ReluVal (Wang et al., 2018b), Neurify (Wang et al., 2018a), VeriNet (Henriksen and Lomuscio, 2020), CROWN (Zhang et al., 2018), and our mixed-monotonicity approach) for the ReLU, TanH, ELU and SiLU activation functions.

Table 2. Average width of the output bounds obtained by each of the six bounding methods for each of the four activation functions.

Table 3. Proportion of the random networks on which our mixed-monotonicity approach returns bounds at least as tight as each of the other five methods, for each activation function.
In terms of the width of the output bounds, we can see in Table 2 that on average our approach performs better than the IBP method and worse than ReluVal, Neurify, VeriNet and CROWN when they are applicable. On the other hand, Table 3 gives the proportion of the random networks, for each activation function, on which our mixed-monotonicity approach returns tighter (or at least as tight) bounds as the other five methods. In particular, we can see that Algorithm 1 always performs at least as well as the IBP approach, and that it returns tighter bounds than the other four tools (ReluVal, Neurify, VeriNet, CROWN) in a proportion of cases that depends on the tool and activation function. Note also that in some cases CROWN was not able to return a valid output for the ELU activation, its formulation in (Zhang et al., 2018) being incompatible with horizontal relaxations (which arise when the pre-activation bounds of the ELU are very negative).
5. Conclusions
This paper presents a new tool for the sound reachability analysis of feedforward neural networks using mixed-monotonicity. The main strength of the proposed approach is that it can be applied to very general networks, since its only restriction is to Lipschitz-continuous activation functions, which covers the vast majority of activation functions currently used in the literature. In addition, on activation functions handled by other reachability tools, our algorithm outperforms them on a significant proportion of the tested networks, depending on the tools and activation functions. Another significant strength of our framework is its simplicity for an external user to add new activation functions: while tools based on symbolic intervals require user-provided linear relaxations of each new activation function (which may need up to several hundred lines of code in some cases), handling a new activation function in our approach only requires the user to provide the definition of the function and its derivative, and the global extrema of the derivative. The generality and ease of use of our approach come at the cost of a higher computational complexity.
Although this tool can already be used on its own for the reachability analysis of neural networks, one of our next objectives is to incorporate this new reachability method into a larger neural network verification framework with an iterative refinement of the input domain until the safety problem can be decided.
References
Akintunde et al. (2018). Reachability analysis for neural agent-environment systems. In Sixteenth International Conference on Principles of Knowledge Representation and Reasoning.
Botoeva et al. (2020). Efficient verification of ReLU-based neural networks via dependency analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 3291–3299.
Dvijotham et al. (2018). A dual approach to scalable verification of deep networks. In UAI, Vol. 1, p. 3.
[4] ERAN: ETH robustness analyzer for neural networks (2021). Available online at: https://github.com/eth-sri/eran
Henriksen and Lomuscio (2020). Efficient neural network verification via adaptive refinement and adversarial search. In ECAI 2020, pp. 2513–2520.
Jaulin et al. (2001). Applied interval analysis: with examples in parameter and state estimation, robust control and robotics. Vol. 1, Springer Science & Business Media.
Katz et al. (2019). The Marabou framework for verification and analysis of deep neural networks. In International Conference on Computer Aided Verification, pp. 443–452.
Kim et al. (2020). A programmatic and semantic approach to explaining and debugging neural network based object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11128–11137.
Liu et al. (2021). Algorithms for verifying deep neural networks. Foundations and Trends in Optimization 4(3–4), pp. 244–404.
Meyer et al. (2021). Interval reachability analysis: bounding trajectories of uncertain systems with boxes for control and verification. Springer.
Raghunathan et al. (2018). Certified defenses against adversarial examples. In International Conference on Learning Representations.
Salman et al. (2019). A convex relaxation barrier to tight robustness verification of neural networks. arXiv preprint arXiv:1902.08722.
Saoud and Sanfelice (2021). Computation of controlled invariants for nonlinear systems: application to safe neural networks approximation and control. IFAC-PapersOnLine 54(5), pp. 91–96.
Tran et al. (2020). NNV: the neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. In International Conference on Computer Aided Verification, pp. 3–17.
Wang et al. (2018a). Efficient formal safety analysis of neural networks. In 32nd Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada.
Wang et al. (2018b). Formal security analysis of neural networks using symbolic intervals. In 27th USENIX Security Symposium.
Xiang et al. (2018). Verification for machine learning, autonomy, and neural networks survey. arXiv preprint arXiv:1810.01989.
Xiang et al. (2020). Reachable set estimation for neural network control systems: a simulation-guided approach. IEEE Transactions on Neural Networks and Learning Systems 32(5), pp. 1821–1830.
Xu et al. (2020). Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems 33.
Zhang et al. (2018). Efficient neural network robustness certification with general activation functions. Advances in Neural Information Processing Systems 31, pp. 4939–4948.
Zhu et al. (2021). PFLU and FPFLU: two novel non-monotonic activation functions in convolutional neural networks. Neurocomputing 429, pp. 110–117.