# ART: Abstraction Refinement-Guided Training for Provably Correct Neural Networks

Artificial Neural Networks (ANNs) have demonstrated remarkable utility in various challenging machine learning applications. While formally verified properties of their behaviors are highly desired, they have proven notoriously difficult to derive and enforce. Existing approaches typically formulate this problem as a post facto analysis process. In this paper, we present a novel learning framework that ensures such formal guarantees are enforced by construction. Our technique enables training provably correct networks with respect to a broad class of safety properties, a capability that goes well beyond existing approaches, without compromising much accuracy. Our key insight is that we can integrate an optimization-based abstraction refinement loop into the learning process and operate over dynamically constructed partitions of the input space that consider accuracy and safety objectives synergistically. The refinement procedure iteratively splits the input space from which training data is drawn, guided by the efficacy with which such partitions enable safety verification. We have implemented our approach in a tool (ART) and applied it to enforce general safety properties on the ACAS Xu dataset for unmanned aircraft collision avoidance and on the Collision Detection dataset. Importantly, we empirically demonstrate that realizing safety does not come at the price of much accuracy. Our methodology demonstrates that abstraction refinement provides a meaningful pathway for building machine learning networks that are both accurate and correct.


## 1. Introduction

Artificial neural networks (ANNs) have emerged in recent years as the primary computational structure for implementing many challenging machine learning applications. Their success has been due in large measure to their sophisticated architecture, typically composed of multiple layers of connected neurons (or activation functions), in which each neuron represents a possibly non-linear function over the inputs generated in a previous layer. In a supervised setting, the goal of learning is to identify the proper coefficients (i.e., weights) of these functions that minimize differences between the outputs generated by the network and ground truth, established via training samples. Their ability to identify fine-grained distinctions among their inputs through the execution of this process makes these networks particularly useful in problems related to classification, image recognition, natural language translation, and autonomous driving, all areas of major interest in the machine learning community.

Nonetheless, their complexity also poses significant challenges to verification, in large part because of the difficulty in identifying how logical notions of high-level correctness relate to overall network structure, a low-level, uninterpretable artifact. For certain kinds of properties such as local robustness (Papernot et al., 2016; Gehr et al., 2018), which are important to guarantee that the network is resilient to adversarial attacks (Goodfellow et al., 2015; Nguyen et al., 2015; Madry et al., 2018), recent efforts have presented techniques that can verify the robustness of an existing network (Singh et al., 2019b; Gehr et al., 2018) or appropriately guide the training phase used in the construction of the network (Mirman et al., 2018).

While local robustness properties are useful to prove that ANNs cannot be fooled by adversarial examples, it is also important to make sure that ANNs behave correctly, operating in ways consistent with more general specifications and invariants. For instance, Figure 1 depicts the ACAS Xu (Airborne Collision Avoidance System) application developed to handle midair collisions between commercial aircraft (Julian et al., 2016). The system is controlled by a series of ANNs that produce horizontal maneuver advisories. One example correctness property (we do not distinguish between correctness and safety, and use these terms interchangeably) states that if a potential intruder is far away ($\rho$ is significantly large) and is significantly slower than the ownship ($v_{int}$ is significantly lower than $v_{own}$), then regardless of the relative directions of the intruder and the subject (the angles $\theta$ and $\psi$), the ANN controller should output Clear-of-Conflict (as it is unlikely that the intruder can collide with the subject). Unfortunately, even a sophisticated ANN handler used in the ACAS Xu system, although well trained, has been shown to violate this correctness property (Katz et al., 2017).

Existing approaches that are capable of potentially identifying violations of such properties separate verification from learning (Katz et al., 2017; Wang et al., 2018c), which has an inherent disadvantage when verification fails, given the complexity and uninterpretability of these networks. In other words, if ANNs are generated without incorporating provable correctness as part of their training objective, then there is no guarantee that the weights discovered by the training process are provably correct, as exemplified by the incorrect ANN-controlled ACAS Xu system above. However, the lack of a principled methodology to repair networks that are not verifiable, short of restarting the training process from scratch, makes leveraging verification counterexamples post facto a challenging exercise for which no credible proposal has been put forth thus far.

In this paper, we target a significant generalization of other state-of-the-art verification approaches that enables correct-by-construction generation of ANNs with respect to a broad class of correctness properties expressed over the network's inputs. Developing a scalable training technique with correctness in mind that nonetheless retains desirable precision is the primary challenge to realizing this goal. Scalability is an important issue for any such strategy given the large size of the input space and the potentially large number of neurons that comprise the network. Like previous efforts (Gehr et al., 2018; Mirman et al., 2018), we employ abstract interpretation methods to generate sound abstractions of both the input space and the network itself.

However, simply generating a safe over-approximation of a network is not necessarily useful, because an excessively over-approximate abstraction may provide imprecise information on how to further optimize the network for correctness. The difficulty in balancing the goal of scalable verification with accurate optimization in our context arises from the need to integrate correctness constraints within the gradient descent optimization loop that sits at the core of the training procedure. When the property to verify is locally defined, as in the case of robustness, it may be possible to bake in these considerations as part of the abstraction itself, leading to a clean characterization of the optimization procedure in terms of the over-approximation induced by the abstraction (Mirman et al., 2018).

In contrast, in our case the structure of the optimization procedure must be significantly different, since we do not know a priori how to guide the optimization loop by the logical characteristics of the correctness property. To overcome this challenge, we obtain such information on the fly, as shown by the workflow depicted schematically in Figure 2. Our approach takes as input a correctness property $(\Phi_{in}, \Phi_{out})$ that prescribes desired network output behavior using logic constraints $\Phi_{out}$ when the inputs to the network are within a domain described by $\Phi_{in}$. In particular, our training procedure involves an abstract domain $D$ (e.g., the interval domain) and a refinement loop over our abstraction of the input space, expressed in terms of correctness properties defined over these inputs. A non-zero correctness loss $L_D$ on an input abstraction $\Phi_{in}$, obtained by abstract interpretation of the network $F$ over $D$ by estimating the distance from the abstracted output $F_D(\Phi_{in})$ of the network to the correctness constraint $\Phi_{out}$, may indicate a potential violation of the network's output correctness. This loss can then be used to optimize the network's weights to mitigate the loss of correctness on $\Phi_{in}$ (the right loop of Figure 2). On the other hand, since the amount of imprecision introduced by the input space abstraction $\Phi_{in}$ is correlated with the precision of the abstracted network output $F_D(\Phi_{in})$, we additionally propose a refinement mechanism over the input space abstraction, optimized for this imprecision (the left loop of Figure 2). This abstraction refinement process allows us to apply gradient descent methods to construct networks that are provably correct. Notably, our correct-by-construction generation of ANNs can be applied with standard ANN training algorithms, without compromising the accuracy guarantees offered by classical optimization methods (the top of Figure 2).

This paper makes the following contributions:

1. We present an abstract interpretation-guided training strategy for building correct-by-construction neural networks, defined with respect to a rich class of correctness properties that go well beyond local robustness assertions.

2. We define an input space abstraction refinement loop that reduces training on input data to training on input space partitions, where the precision of the abstraction is, in turn, guided by a notion of correctness loss as determined by the correctness property.

3. We formalize soundness claims that capture correctness guarantees provided by our methodology; these results characterize the ability of our approach to ensure correctness with respect to domain-specific correctness properties.

4. We have implemented our ideas in a tool (Art) and applied it to a challenging benchmark, the ACAS Xu collision avoidance dataset (Julian et al., 2016; Katz et al., 2017). We provide a detailed evaluation study quantifying the effectiveness of our approach and assess its utility to ensure correctness without compromising accuracy. We additionally provide a comparison of our approach with a post facto counterexample-guided verification strategy that provides strong evidence for the benefits of Art's methodology compared to such techniques. These experiments justify our claim that synthesizing correct-by-construction networks is feasible even when the correctness properties under consideration are highly sophisticated.

The remainder of the paper is organized as follows. In the next section, we provide a simple motivating example that illustrates our approach. Section 3 provides background and context. Section 4 presents a formalization of our approach. Details about Art's implementation and evaluation are provided in Section 5. Related work and conclusions are given in Sections 6 and 7, respectively.

## 2. Illustrative Example

We illustrate and motivate the key components of our approach using a realistic albeit simple end-to-end example. We consider the construction of a learning-enabled system for autonomous driving. The learning objective is to identify potentially dangerous objects within a prescribed range of the vehicle’s current position.

Problem Setup. For the purpose of this example, we simplify our scenario, depicted in Figure 3, by assuming that we track only a single object and that the information given by the vehicle's radar is a feature vector of size two, containing the object's normalized relative speed and its relative angular position in a polar coordinate system with our vehicle located in the center. Here, means the vehicle is getting closer to the object with the speed of ; means our vehicle is moving away from the object; and, means the object and vehicle are moving in lock-step with respect to each other.

Consider an implementation of an ANN for this problem that uses a 2-layer ReLU neural network $F$ with initialized weights as depicted in Figure 4. The network takes an input vector $x$ and outputs a vector $y = (y_1, y_2)$, where $y_1$ and $y_2$ are the prediction scores for action Report and action Ignore, respectively. The advisory system picks the action with the higher prediction score as the result. For simplicity, both layers in $F$ are linear layers with 2 neurons and without bias terms. An element-wise ReLU activation function is applied after the first layer. In this example, we assume the activation function in each layer is a simple linear combination of the inputs whose coefficients are given by the weights associated with the function's input edges. Thus, is defined as where the initial weight assignment shown assigns 1 to and to . The output of is fed into a ReLU unit that emits 4.5 (since ). The output layer of the network again computes a linear combination of the ReLU outputs, which serve as its inputs, using the weight coefficients depicted.

Correctness Property. To serve as a useful advisory system, we can ascribe some correctness properties that we would like the network to always satisfy, as discussed in Sec. 1. In this example, we focus on one such correctness property, defined below. Our approach generalizes to an arbitrary number of such correctness properties that one may want to enforce in a learning-enabled system.

Correctness property (informal): Objects in front of the vehicle that are static or moving closer to our vehicle should always be reported.

We can interpret the assumptions of “static or moving closer” and “in front of” in terms of predicates over feature vector components such as and 222We pick because it is slightly wider than the front view angle of ., respectively. Using this representation and recalling that , the correctness property we want to ensure can be formulated as:

:

: .

Observe that the network shown in Figure 4 does not satisfy this property; we exhibit a concrete counterexample below.

Thus, this property (and, more generally, the correctness properties considered by our system) can be expressed using a pair of predicates $(\Phi_{in}, \Phi_{out})$ specifying the assumptions on the network input and the corresponding requirements on the network output.

Correctness Loss Function.

To quantify how incorrect a neural network is, we define a distance function between the output of the neural network (on inputs satisfying the input predicate $\Phi_{in}$) and the output predicate $\Phi_{out}$. For this example, we can define the distance of the network output $y$ from $\Phi_{out}$ as:

$$\mathrm{dist}(y, \Phi_{out}) = \min_{q \models \Phi_{out}} \mathrm{dist}_e(y, q)$$

where $\mathrm{dist}_e$ is the standard Euclidean distance function. Clearly, when $\mathrm{dist}(y, \Phi_{out}) = 0$, it follows that $y$ satisfies the output predicate $\Phi_{out}$.

The distance function can be used as a loss function, alongside other training objectives (e.g., optimizing the ANN to mimic an expert's decisions), to train the neural network using a training set. However, a general correctness property like this one is defined over an infinite set of inputs; since training is necessarily performed using only a finite set of samples, we cannot generalize observations made on just these samples to assert that the trained network satisfies the property.

Our approach, therefore, leverages abstract interpretation techniques to generate sound abstractions of both the network input space and the network itself. By training on an abstract input space, our method obtains a finite approximation of the infinite set of possible network behaviors. By training on a network's abstraction, our technique is correct-by-construction, intentionally optimizing over the abstraction's worst-case correctness loss.

We parameterize our approach on any abstract domain that can soundly approximate a neural network’s behavior so that an abstract output is guaranteed to subsume all possible outputs for the set of inputs being abstracted. In the example, we consider a simple interval abstract domain that has been used for neural network verification (Gehr et al., 2018; Wang et al., 2018c).

For example, an interval abstraction of our 2-layer ReLU network is shown in Figure 5. Intervals maintain a maximum and minimum bound for each neuron, and abstract the concrete neural network computation using interval arithmetic (Moore et al., 2009), denoted as . Let us denote the lower bound and upper bound of a neuron $n$ as $l_n$ and $u_n$, respectively. Using interval arithmetic, $l_n$ and $u_n$ can be computed from the bounds of neurons in the previous layer. For example, for neuron : and . For each neuron, the (abstracted) ReLU function applies to its lower and upper bounds directly, since bound values are maintained explicitly. Consider abstract value propagation from to . By definition of ReLU, the lower bound of neuron is reset to 0 while its upper bound is unchanged.
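The interval propagation rules above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a bias-free linear layer given as a list of weight rows; the function names and the weights in the usage lines are hypothetical and are not the ones shown in Figure 4.

```python
def linear_interval(lo, hi, weights):
    """Propagate the input box [lo, hi] through y = W x (no bias) with interval arithmetic."""
    out_lo, out_hi = [], []
    for row in weights:
        l = u = 0.0
        for w, a, b in zip(row, lo, hi):
            # A non-negative weight keeps the bound order; a negative weight swaps it.
            if w >= 0:
                l += w * a
                u += w * b
            else:
                l += w * b
                u += w * a
        out_lo.append(l)
        out_hi.append(u)
    return out_lo, out_hi


def relu_interval(lo, hi):
    """ReLU is monotone, so the abstract ReLU is applied to each bound directly."""
    return [max(0.0, l) for l in lo], [max(0.0, u) for u in hi]


# Hypothetical 2x2 first layer and input box x1 in [0, 1], x2 in [-1, 1].
lo, hi = linear_interval([0.0, -1.0], [1.0, 1.0], [[1.0, -1.0], [0.5, 2.0]])
print(lo, hi)                 # [-1.0, -2.0] [2.0, 2.5]
print(relu_interval(lo, hi))  # ([0.0, 0.0], [2.0, 2.5])
```

Both negative lower bounds are clamped to 0 by the abstract ReLU while the upper bounds pass through unchanged, which is the propagation step described in the paragraph above.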

Applying these rules, the bounds on the output layer can be computed as and . This abstracted network output fails to show that always holds under the prescribed input space predicate . Indeed, the network in Figure 4 is incorrect: for and , the network generates an output and that violates the correctness property.

Our approach leverages the neural network abstraction to quantify the loss of correctness on the abstract domain. To simplify the exposition, we create a new temporary variable $z$ and apply the interval abstract transformer for the assignment $z = y_2 - y_1$, where $y_1$ and $y_2$ are the prediction scores for Report and Ignore, respectively. The transformer then computes interval bounds for $z$, which produces $[-6.125, 11.875]$, from the bounds generated for $y_1$ and $y_2$. We then rewrite the output predicate of the correctness property as $z \le 0$ (i.e., $z \in [-\infty, 0]$).

We define a correctness loss function $L_D$, parameterized by an abstract domain $D$ (in the example, $D$ is the interval abstract domain $I$), to measure the worst-case distance between an abstracted neural network output $F_D(\Phi_{in})$, e.g., $[-6.125, 11.875]$ in the example, and the output predicate $\Phi_{out}$ of the correctness property, e.g., $[-\infty, 0]$ in the example:

$$
\begin{aligned}
L_D(F, \Phi_{in}, \Phi_{out}) = L_I(F, \Phi_{in}, \Phi_{out})
  &= \max_{p \in F_I(\Phi_{in})} \mathrm{dist}(p, \Phi_{out}) \\
  &= \max_{p \in [-6.125,\, 11.875]} \mathrm{dist}(p, [-\infty, 0]) \\
  &= \max_{p \in [-6.125,\, 11.875]} \; \min_{q \in [-\infty, 0]} \mathrm{dist}_e(p, q) \\
  &= 11.875 - 0 = 11.875
\end{aligned}
$$

where $\mathrm{dist}_e$ is the standard Euclidean distance function. The correctness loss function enumerates all possible neural network outputs that are subsumed by the abstract network output to find the one that has the highest distance from $\Phi_{out}$. When $L_D$ returns 0, the abstracted output is subsumed by the output predicate of the correctness property, and therefore all inputs subsumed by the abstracted network's input region are guaranteed to be correct. In our example, however, $L_I$ returns 11.875; that is, the worst-case correctness loss occurs at the upper bound of the abstract neural network output.
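For a scalar output and the predicate $z \le 0$, the distance from a point $p$ to the predicate is $\max(p, 0)$, which is non-decreasing in $p$, so the worst case over an interval is attained at its upper end. The following sketch (the function name is ours, for illustration only) reproduces the value 11.875 computed above.

```python
def interval_loss_leq(lo, hi, bound=0.0):
    """Worst-case distance from any p in [lo, hi] to the set {z : z <= bound}.

    dist(p, {z <= bound}) = max(p - bound, 0) is non-decreasing in p,
    so its maximum over the interval is attained at the upper end hi.
    """
    if lo > hi:
        raise ValueError("empty interval")
    return max(hi - bound, 0.0)


print(interval_loss_leq(-6.125, 11.875))  # 11.875
print(interval_loss_leq(-3.0, -1.0))      # 0.0: the whole interval satisfies z <= 0
```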

Training on an Abstract Domain. Leveraging the correctness loss function, our approach derives the gradient of the loss with respect to the network weights and, in the usual fashion, applies a gradient descent optimization algorithm to update the network weights. Note that with the interval abstraction, the correctness loss function can be implemented using MaxPooling and MinPooling units, and is hence differentiable. Since an interval abstract domain is suited for differentiation and gradient descent, we can use off-the-shelf automatic differentiation frameworks (Paszke et al., 2017) to backpropagate the gradient of the correctness loss function and readjust the neural network weights end-to-end so as to improve the correctness of the neural network.

Input Space Abstraction Refinement. An ANN $F$ is correct with respect to a property $(\Phi_{in}, \Phi_{out})$ if for every input $x$ that satisfies $\Phi_{in}$, the network produces an output $F(x)$ that satisfies $\Phi_{out}$. When applied to an abstract network, the correctness loss function measures the degree of imprecision in the abstracted output. Our goal is to minimize this imprecision, making it as close as possible to the behavior of the concrete network, without violating correctness. However, imprecision in the abstracted output is directly correlated with the size of the input domain. Thus, identifying ways to reduce this size, without compromising correctness or scalability, is critical.

The correctness loss function provides a direction for applying gradient descent to train on neural network abstractions. Like standard ANN training algorithms, we could iteratively leverage the correctness loss function to update the neural network weights until reaching convergence. However, the correctness loss computed over the whole input domain may be overly imprecise, since the amount of imprecision introduced by the neural network abstraction is correlated with the size of the input region described by $\Phi_{in}$. Observe that if we simply bisect $\Phi_{in}$ along every input dimension and compute the correctness loss for each resulting sub-region, we have

1. For and , the correctness loss ;

2. For and , the correctness loss ;

3. For and , the correctness loss ;

4. For and , the correctness loss .

Evidently, the original correctness loss (11.875) does not pertain to any real data point, since the maximum correctness loss shrinks after a simple refinement.

Based on this observation, and to use more accurate gradients for network weight optimization, our approach also iteratively partitions the input region during training to aid the abstract interpreter. In other words, we seek an input space abstraction refinement mechanism that reduces the imprecise correctness loss introduced by abstract interpretation. Notably, incorporating input space abstraction refinement with the gradient descent optimizer does not compromise the soundness of our approach: as long as all sub-regions of $\Phi_{in}$ are provably correct, the network's correctness with respect to $(\Phi_{in}, \Phi_{out})$ trivially holds. However, the simplistic input abstraction refinement mechanism described above does not work in practice because it partitions every dimension; its complexity is thus exponential in the number of dimensions. To overcome this weakness, we apply an optimization-based heuristic similar to the mechanism proposed by Wang et al. (2018c) that utilizes the correctness loss function to pick a single input space dimension along which to bisect in each training iteration. In the example, assume the heuristic selects one input dimension for partitioning. In the next training iteration, we show the partitioned input sub-regions and their correctness loss:

1. For and , correctness loss ;

2. For and , correctness loss ;

The result shows that the maximum correctness loss decreases from 11.875 to 9.375.
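The dimension-selection step can be sketched as follows. This is a simplified stand-in, not the heuristic of Wang et al. (2018c): it tries bisecting each dimension, evaluates the interval correctness loss of both halves, and keeps the dimension with the smallest worst-case loss. Each iteration therefore costs two loss evaluations per dimension rather than $2^m$ sub-regions. The names `make_loss` and `pick_dimension` and the toy weights `W1`, `W2` are hypothetical. The toy network feeds its first input through two neurons that cancel, so interval arithmetic produces a spurious loss similar to the one discussed above.

```python
def affine_bounds(lo, hi, W):
    """Interval bounds of W x for x in the box [lo, hi] (no bias)."""
    out_lo = [sum(w * (a if w >= 0 else b) for w, a, b in zip(row, lo, hi)) for row in W]
    out_hi = [sum(w * (b if w >= 0 else a) for w, a, b in zip(row, lo, hi)) for row in W]
    return out_lo, out_hi


def make_loss(W1, W2):
    """Interval correctness loss of z = W2 relu(W1 x) against the predicate z <= 0."""
    def loss(lo, hi):
        h_lo, h_hi = affine_bounds(lo, hi, W1)
        h_lo = [max(0.0, v) for v in h_lo]
        h_hi = [max(0.0, v) for v in h_hi]
        _, z_hi = affine_bounds(h_lo, h_hi, W2)  # W2 has a single output row z
        return max(z_hi[0], 0.0)
    return loss


def pick_dimension(lo, hi, loss):
    """Return the dimension whose bisection minimizes the worst-case loss of the two halves."""
    best_dim, best_worst = None, float("inf")
    for d in range(len(lo)):
        mid = (lo[d] + hi[d]) / 2
        left = (lo, hi[:d] + [mid] + hi[d + 1:])
        right = (lo[:d] + [mid] + lo[d + 1:], hi)
        worst = max(loss(*left), loss(*right))
        if worst < best_worst:
            best_dim, best_worst = d, worst
    return best_dim


# Hypothetical toy network: z = relu(x1) - relu(x1) - 0.1 * relu(x2), which is never positive.
W1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, -1.0, -0.1]]
loss = make_loss(W1, W2)
print(loss([0.0, 0.0], [1.0, 1.0]))                  # 1.0 on the whole box (purely spurious)
print(pick_dimension([0.0, 0.0], [1.0, 1.0], loss))  # 0: bisecting x1 halves the spurious loss
```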

Iterative Training. Our correct-by-construction training algorithm interweaves input space abstraction refinement and gradient descent training on a network abstraction in each training iteration, leveraging the correctness loss function produced by the network abstract interpreter (as depicted in Figure 2), until a provably correct ANN is trained. For our illustrative example, we set the learning rate of the optimizer to be . In our experiment, the maximum correctness loss among all refined input space abstractions drops to 0 after 16 iterations, and convergence was achieved by partitioning the input space into 123 pieces. The trained ANN is therefore guaranteed to satisfy the correctness property.
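The refinement half of this loop can be sketched as a worklist algorithm. This is a simplified illustration under stated assumptions. The network is kept fixed; ART additionally takes gradient steps on the weights in every iteration, which this sketch omits. `loss` stands for any interval correctness loss over a box. The splitting rule (widest dimension) and the `max_boxes` budget are hypothetical choices, not Art's settings.

```python
def refine_until_verified(lo, hi, loss, max_boxes=1000):
    """Bisect boxes until every sub-box has zero correctness loss.

    Returns the list of verified sub-boxes, or None if the box budget is exhausted.
    """
    verified, worklist = [], [(list(lo), list(hi))]
    while worklist:
        b_lo, b_hi = worklist.pop()
        if loss(b_lo, b_hi) == 0.0:
            verified.append((b_lo, b_hi))
            continue
        if len(verified) + len(worklist) + 2 > max_boxes:
            return None  # budget exhausted; ART would keep training the weights instead
        d = max(range(len(b_lo)), key=lambda i: b_hi[i] - b_lo[i])  # widest dimension
        mid = (b_lo[d] + b_hi[d]) / 2
        worklist.append((b_lo, b_hi[:d] + [mid] + b_hi[d + 1:]))
        worklist.append((b_lo[:d] + [mid] + b_lo[d + 1:], b_hi))
    return verified


# The interval bound of z = x - x over [lo, hi] is hi - lo, although z is always 0.
# Checking z <= 0.25 therefore gives a purely spurious loss that shrinks with the box.
def dependency_loss(lo, hi):
    return max(0.0, (hi[0] - lo[0]) - 0.25)


boxes = refine_until_verified([0.0], [1.0], dependency_loss)
print(len(boxes))  # 4
```

With a fixed network, refinement alone can only remove spurious loss; when the concrete network really violates the property, no partition reaches zero loss, which is why ART also updates the weights.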

## 3. Background

###### Definition 3.1 (Neural network).

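Neural networks are functions composed of layers and activation functions. Each layer is a function $L_i : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i}$ for $i \in \{1, \dots, k\}$, where $n_0 = m$ and $n_k = n$. Each activation function is of the form $\sigma_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_i}$ for $i \in \{1, \dots, k-1\}$. Then $F = L_k \circ \sigma_{k-1} \circ L_{k-1} \circ \dots \circ \sigma_1 \circ L_1 : \mathbb{R}^m \to \mathbb{R}^n$.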
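Definition 3.1 treats a network as an alternating composition of layers and activation functions. A minimal sketch of that composition follows, assuming layers and activations are plain Python callables on lists. The helper `compose_network` and the example functions are hypothetical.

```python
from functools import reduce


def compose_network(layers, activations):
    """Build F = L_k . sigma_{k-1} . L_{k-1} . ... . sigma_1 . L_1 from callables."""
    if len(activations) != len(layers) - 1:
        raise ValueError("need exactly one activation between consecutive layers")
    stages = []
    for i, layer in enumerate(layers):
        stages.append(layer)
        if i < len(activations):
            stages.append(activations[i])
    # Apply the stages left to right: L_1 first, L_k last.
    return lambda x: reduce(lambda v, f: f(v), stages, x)


def relu(v):
    return [max(0.0, a) for a in v]


def L1(x):
    return [x[0] - x[1], x[0] + x[1]]


def L2(h):
    return [h[0] + 2.0 * h[1]]


F = compose_network([L1, L2], [relu])
print(F([1.0, 3.0]))  # [8.0]
```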

###### Definition 3.2 (Abstract domain).

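An abstract domain $D$ is defined by a tuple $\langle \alpha, \gamma, \mathcal{T} \rangle$, with $\alpha$ and $\gamma$ forming a Galois connection

$$(D_c, \subseteq) \underset{\alpha}{\overset{\gamma}{\leftrightarrows}} (D_a, \sqsubseteq).$$

Here $D_c$ and $D_a$ are the domains of concrete and abstract elements, respectively. $\alpha : D_c \to D_a$ is the abstraction function that maps concrete elements to abstract elements, and $\gamma : D_a \to D_c$ is the concretization function mapping backwards. $\mathcal{T}$ is a set of transformer pairs over $D_c$ and $D_a$.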
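For the interval domain used throughout this paper, the abstraction function maps a set of points to its bounding box, and the concretization of a box is the set of points inside it. A minimal sketch follows, assuming finite, non-empty point sets and representing $\gamma$ as a membership test. The names `alpha` and `gamma_contains` are illustrative.

```python
def alpha(points):
    """Abstraction: the smallest box [lo, hi] containing a finite, non-empty set of points."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return lo, hi


def gamma_contains(box, point):
    """Concretization as a membership test: is `point` an element of gamma(box)?"""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))


pts = [(0.0, 1.0), (2.0, -1.0), (1.0, 0.5)]
box = alpha(pts)
print(box)  # ([0.0, -1.0], [2.0, 1.0])
# Soundness of the Galois connection: every concrete point lies in gamma(alpha(X)).
print(all(gamma_contains(box, p) for p in pts))  # True
```

The point (1.5, 0.0) is not in the original set but does lie in $\gamma(\alpha(X))$, which is exactly the over-approximation the abstraction introduces.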

###### Definition 3.3 (D-compatible).

Given an abstract domain $D$, a neural network $F$ is $D$-compatible iff

1. for every layer $L$ in $F$, there exists a differentiable abstract transformer $L^\#$ such that $(L, L^\#) \in \mathcal{T}$, and

2. for every activation function $\sigma$ in $F$, there exists a differentiable abstract transformer $\sigma^\#$ such that $(\sigma, \sigma^\#) \in \mathcal{T}$.

For a $D$-compatible neural network $F$, we denote by $F^\#_D$ the over-approximation of $F$ in which every layer and activation function is replaced by its corresponding abstract transformer in $\mathcal{T}$.

To reason about a neural network over some abstract domain $D$, we first need to characterize what it means for an ANN to operate over $D$.

###### Definition 3.4 (Evaluation over Abstract Domain).

Given an abstract domain $D$ and a $D$-compatible neural network $F$, the evaluation of $F$ over $D$ and a range of inputs $X$, denoted as $F_D(X)$, is

$$F_D(X) = \gamma\big(F^\#_D(\alpha(X))\big).$$

In other words, $F_D(X)$ defines the over-approximated output that covers all possible outputs corresponding to any input belonging to $X$. This is formulated in the following theorem.

###### Theorem 3.5 (Over-approximation Soundness).

For any input feature vector $x$ and input range $X$, $x \in X \implies F(x) \in F_D(X)$.
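Theorem 3.5 can be spot-checked empirically for the interval domain: sample concrete inputs from a box, run the concrete network, and confirm that every output lies inside the interval output. Sampling can only fail to refute soundness, not prove it; the theorem is what guarantees it. The weights and the helper names below are hypothetical.

```python
import random


def affine(x, W):
    """Concrete bias-free linear layer."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def affine_bounds(lo, hi, W):
    """Interval transformer for the same layer."""
    out_lo = [sum(w * (a if w >= 0 else b) for w, a, b in zip(row, lo, hi)) for row in W]
    out_hi = [sum(w * (b if w >= 0 else a) for w, a, b in zip(row, lo, hi)) for row in W]
    return out_lo, out_hi


def relu(v):
    return [max(0.0, a) for a in v]


def spot_check_soundness(lo, hi, W1, W2, samples=1000, seed=0):
    """True if every sampled F(x), x in [lo, hi], lies inside the interval output F_I([lo, hi])."""
    h_lo, h_hi = affine_bounds(lo, hi, W1)
    out_lo, out_hi = affine_bounds(relu(h_lo), relu(h_hi), W2)
    rng = random.Random(seed)
    for _ in range(samples):
        # Clamp guards against uniform() rounding just outside [l, h].
        x = [min(max(rng.uniform(l, h), l), h) for l, h in zip(lo, hi)]
        y = affine(relu(affine(x, W1)), W2)
        if not all(a <= v <= b for a, v, b in zip(out_lo, y, out_hi)):
            return False
    return True


print(spot_check_soundness([-1.0, 0.0], [1.0, 2.0], [[1.0, -2.0], [0.5, 0.5]], [[1.0, -1.0]]))  # True
```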

Although our approach is parametric over abstract domains, we require the abstract transformers associated with these domains to be differentiable, to enable training over the worst cases over-approximated by $F_D$ via gradient-descent-style optimization algorithms.

## 4. Correct-by-Construction Training

Our approach aims to train an ANN $F$ with respect to a correctness property $(\Phi_{in}, \Phi_{out})$, which is formally defined in Section 4.1. The core observation underlying our approach is that although the abstract transformer $F^\#_D$ based on the abstract domain $D$ provides only a loose bound on the abstracted output, $F$ can nonetheless be trained to make this bound much tighter and thereby improve the quality of its correctness guarantees. To this end, the training procedure must use precise gradient information for optimization. Section 4.2 introduces input space abstraction and refinement as mechanisms that can reduce the imprecision of gradient-based optimization over $F_D$. Specifically, an input space abstraction induces a set of non-overlapping partitioned input domains. Section 4.3 formally defines a correctness loss function $L_D$ over $F_D$ that supplies the gradient of the loss to aid automated end-to-end differentiation. The correctness loss function guides both the optimization of $F$'s weights and the refinement of the input space abstraction. This abstraction refinement mechanism is the key to our training algorithm.

### 4.1. Correctness Property

The correctness properties we consider are expressed as logical propositions over the network’s inputs and outputs. We assume that an ANN correctness property checks the outputs for violations, given assumptions on the inputs. Formally,

###### Definition 4.1 (Correctness Property).

Given a neural network $F : \mathbb{R}^m \to \mathbb{R}^n$, a correctness property is a pair $(\Phi_{in}, \Phi_{out})$ in which $\Phi_{in}$ defines a bounded input domain over $\mathbb{R}^m$, and $\Phi_{out}$ is an arbitrary Boolean combination of linear inequalities over the network output vector $y = F(x)$. Specifically, $\Phi_{in}$ is of the form $l \le x \le u$, where $l$ is an $m$-dimensional vector of lower bounds on the network inputs and $u$ is the vector of upper bounds. We define an auxiliary function size to measure the size of an input domain $\Phi_{in}$:

$$\mathrm{size}(\Phi_{in}) = \int_{\Phi_{in}} d\mu.$$

###### Example 4.2.

In Section 2, the correctness property we wanted to train and verify was of the form: where . Here, is the input domain of the correctness property that captures the range of valid values for and , and defines a predicate on the network’s output vector.

A correctness property $(\Phi_{in}, \Phi_{out})$ holds on $F$, denoted $F \models (\Phi_{in}, \Phi_{out})$, iff for any input feature vector $x$,

$$x \in \Phi_{in} \implies \Phi_{out}(F(x)).$$

In practice, we formulate any Boolean combination of linear inequalities on the output of the network as a sequence of additional linear and max-pooling layers. The verification problem is hence reduced to determining whether the scalar output of the modified network can reach a negative value (see Section 2 for an example). Disjunctive clauses can be encoded using a MaxPooling unit, and conjunctive clauses can be encoded similarly.
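One way to read this encoding is to map a Boolean combination of linear inequalities to a scalar $g(y)$ that is non-negative exactly when $y$ satisfies the predicate. Disjunctions use max (a MaxPooling unit) and conjunctions use min. Verification then asks whether $g$ can become negative over $\Phi_{in}$. The sketch below illustrates that idea with a hypothetical tuple-based predicate format; it is not Art's implementation.

```python
def encode(pred, y):
    """Scalar g(y) such that g(y) >= 0 iff y satisfies pred.

    pred is ("ge", a, b) for the linear inequality a . y >= b,
    or ("or", p1, p2, ...) / ("and", p1, p2, ...) over sub-predicates.
    """
    kind = pred[0]
    if kind == "ge":
        _, a, b = pred
        return sum(ai * yi for ai, yi in zip(a, y)) - b
    values = [encode(p, y) for p in pred[1:]]
    if kind == "or":
        return max(values)  # MaxPooling: at least one disjunct is non-negative
    if kind == "and":
        return min(values)  # MinPooling: every conjunct is non-negative
    raise ValueError(f"unknown predicate kind: {kind!r}")


report = ("ge", [1.0, -1.0], 0.0)  # y1 - y2 >= 0: Report wins
print(encode(report, [2.0, 1.0]))  # 1.0
either = ("or", ("ge", [1.0, 0.0], 1.0), ("ge", [0.0, 1.0], 1.0))
both = ("and",) + either[1:]
print(encode(either, [0.0, 2.0]), encode(both, [0.0, 2.0]))  # 1.0 -1.0
```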

### 4.2. Input Space Abstraction Refinement

Recall that in Section 2 we illustrated how an input space abstraction refinement mechanism can help reduce imprecise worst-case correctness loss. We formally define this notion here. Given a correctness property $(\Phi_{in}, \Phi_{out})$, an input space abstraction decomposes $\Phi_{in}$ into a set of non-overlapping intervals $\{\Phi^1_{in}, \dots, \Phi^k_{in}\}$ such that $\bigcup_i \Phi^i_{in} = \Phi_{in}$.

###### Definition 4.3 (Input Space Abstraction).

An input space abstraction $S$ refines a correctness property into a set of correctness properties $\{(\Phi^1_{in}, \Phi^1_{out}), \dots, (\Phi^k_{in}, \Phi^k_{out})\}$ such that $\Phi^i_{in} \cap \Phi^j_{in} = \emptyset$ for all $i \neq j$, meaning that refined input domains are non-overlapping. Two abstractions $S_1$ and $S_2$ are non-overlapping iff the input domains of any pair of their correctness properties are non-overlapping. We use $|S|$ to denote the number of correctness properties included in $S$. Given a neural network $F$ and a correctness property with input space abstraction $S$, we have

$$F \models S \iff \bigwedge_{\Phi \in S} F \models \Phi.$$
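An input space abstraction over boxes, and the check $F \models S$ as a conjunction over its members, can be sketched as follows. The sketch makes three assumptions. Boxes are pairs of lower/upper bound lists. Bisection halves share a boundary face, so the sketch checks that box interiors are disjoint. `holds` is a hypothetical per-box verifier, for example a zero-loss test in the sense of Theorem 4.8. All function names are illustrative.

```python
import math


def size(lo, hi):
    """size(Phi_in) for a box: the product of its side lengths."""
    return math.prod(h - l for l, h in zip(lo, hi))


def bisect_box(lo, hi, d):
    """Split a box into two halves along dimension d."""
    mid = (lo[d] + hi[d]) / 2
    return [(lo, hi[:d] + [mid] + hi[d + 1:]), (lo[:d] + [mid] + lo[d + 1:], hi)]


def interiors_overlap(box_a, box_b):
    """True if the open interiors of two boxes intersect."""
    (a_lo, a_hi), (b_lo, b_hi) = box_a, box_b
    return all(max(l1, l2) < min(h1, h2) for l1, h1, l2, h2 in zip(a_lo, a_hi, b_lo, b_hi))


def holds_on_abstraction(boxes, holds):
    """F |= S iff F |= Phi for every Phi in S (all members share the same Phi_out here)."""
    return all(holds(lo, hi) for lo, hi in boxes)


# Refine [0, 1]^2 into three boxes: bisect along x1, then bisect the right half along x2.
S = bisect_box([0.0, 0.0], [1.0, 1.0], 0)
S = [S[0]] + bisect_box(*S[1], 1)
print(sum(size(*b) for b in S))  # 1.0, the size of the original box
```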

###### Example 4.4.

In Section 2, the input domain of the correctness property in Example 4.2 was decomposed into two non-overlapping input domains and . Thus, the input space abstraction includes .

Armed with these definitions, we can now formally state our central notion of input space abstraction refinement:

###### Definition 4.5 (Input Space Abstraction Refinement).

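A well-founded abstraction refinement $\sqsubseteq$ is a binary relation over a set of input abstractions such that:

(reflexivity): $S \sqsubseteq S$

(refinement):

Given a correctness property $(\Psi_{in}, \Psi_{out})$ and an input space abstraction $S$,

$$\Big(\bigwedge_{S_i \equiv (\_,\, \Phi^i_{out}) \in S} \Phi^i_{out} \iff \Psi_{out}\Big) \wedge \Big(\Psi_{in} = \bigcup_{S_i \equiv (\Phi^i_{in},\, \_) \in S} \Phi^i_{in}\Big) \implies S \sqsubseteq \{(\Psi_{in}, \Psi_{out})\}$$

(transitivity):

$S_1 \sqsubseteq S_2 \wedge S_2 \sqsubseteq S_3 \implies S_1 \sqsubseteq S_3$, and,

(composition):

$S_1 \sqsubseteq S_2 \wedge S_3 \sqsubseteq S_4 \implies S_1 \cup S_3 \sqsubseteq S_2 \cup S_4$.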

The reflexivity, transitivity, and composition requirements for a well-founded refinement are natural. If $\Psi$ is a correctness property, then $S \sqsubseteq \{\Psi\}$ if the output predicates in $S$ are logically equivalent to $\Psi_{out}$ and the union of all input domains in $S$ is equivalent to $\Psi_{in}$. Intuitively, this relation allows $\Psi$ to be safely decomposed into a set of sub-domains. Notably, refining an abstract input domain in this way does not compromise correctness:

###### Theorem 4.6.

$$S_1 \sqsubseteq S_2 \wedge F \models S_1 \implies F \models S_2.$$

###### Proof Sketch.

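By induction on Definition 4.5. We only show the case when $S_2$ is a single correctness property $(\Phi_{in}, \Phi_{out})$ and $S_1$ is a refinement of $S_2$ in which the input domain $\Phi_{in}$ is decomposed into $\{\Phi^i_{in}\}$. We prove that if $F$ is correct with respect to $S_1$, then it is also correct with respect to $S_2$.
By Definition 4.3, from $F \models S_1$ we have:

$$\bigwedge_{(\Phi^i_{in},\, \Phi^i_{out}) \in S_1} F \models (\Phi^i_{in}, \Phi^i_{out})$$

Given the hypothesis:

$$\Big(\bigwedge_{(\_,\, \Phi^i_{out}) \in S_1} \Phi^i_{out} \iff \Phi_{out}\Big) \wedge \Big(\Phi_{in} = \bigcup_{(\Phi^i_{in},\, \_) \in S_1} \Phi^i_{in}\Big)$$

we obtain:

$$\bigwedge_{(\Phi^i_{in},\, \_) \in S_1} F \models (\Phi^i_{in}, \Phi_{out})$$

which, since the sub-domains $\Phi^i_{in}$ cover $\Phi_{in}$, leads to $F \models (\Phi_{in}, \Phi_{out})$ by Definition 4.3. Hence, $F \models S_2$. ∎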

### 4.3. Correctness Loss Function

For an output predicate $\Phi_{out}$, we define the distance function that quantifies the distance from an output vector $y$ to $\Phi_{out}$ by

$$\mathrm{dist}(y, \Phi_{out}) = \min_{q \models \Phi_{out}} \mathrm{dist}_e(y, q)$$

where $\mathrm{dist}_e$ is the standard Euclidean distance function. We further extend this notion to quantify the distance of an abstracted output $F_D(\Phi_{in})$ to an output predicate $\Phi_{out}$, based on which the correctness loss function is formally defined.
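For the common case where $\Phi_{out}$ is a single linear inequality $a \cdot y \le b$, the minimization in $\mathrm{dist}$ has a closed form: the Euclidean distance from $y$ to a half-space. A small sketch follows; the helper name is ours. For a disjunction of such predicates, the distance is the minimum over the disjuncts, since the distance to a union of sets is the minimum of the individual distances.

```python
import math


def dist_halfspace(y, a, b):
    """Euclidean distance from y to {q : a . q <= b}; zero iff y satisfies a . y <= b."""
    norm = math.sqrt(sum(ai * ai for ai in a))
    if norm == 0.0:
        raise ValueError("a must be non-zero")
    violation = sum(ai * yi for ai, yi in zip(a, y)) - b
    return max(0.0, violation) / norm


# "Report wins" written as y2 - y1 <= 0, i.e. a = (-1, 1), b = 0.
print(dist_halfspace([2.0, 1.0], [-1.0, 1.0], 0.0))  # 0.0
print(dist_halfspace([0.0, 1.0], [-1.0, 1.0], 0.0))  # 1/sqrt(2), about 0.7071
```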

###### Definition 4.7 (Correctness Loss Function).

Given an abstract domain $\mathcal{D}$, a $\mathcal{D}$-compatible neural network $F$, and a correctness property $(\Phi_{in}, \Phi_{out})$, the correctness loss function from the abstracted output $F^{\mathcal{D}}(\Phi_{in})$ to $\Phi_{out}$ is

$$\mathcal{L}_{\mathcal{D}}(F, \Phi_{in}, \Phi_{out}) = \max_{p \in F^{\mathcal{D}}(\Phi_{in})} \mathit{dist}(p, \Phi_{out}).$$

Thus, the correctness loss function considers all possible neural network outputs subsumed by the abstract network output $F^{\mathcal{D}}(\Phi_{in})$ and selects the one farthest from $\Phi_{out}$; this value is the worst-case correctness distance from the abstract output on the abstract domain $\mathcal{D}$ to $\Phi_{out}$. Figure 6 visualizes the definition of the correctness loss function $\mathcal{L}_{\mathcal{D}}$.

The seemingly formidable definition of $\mathcal{L}_{\mathcal{D}}$ can be computed efficiently, especially when the maximum and minimum of the distance over an abstract element are attained only at its vertices, as depicted in Figure 6. This condition holds for common abstract domains such as interval, zonotope, hybrid zonotope (Mirman et al., 2018), and DeepPoly (Singh et al., 2019b). Observe that $\mathcal{L}_{\mathcal{D}}$ is differentiable, since it can be encoded via a MaxPooling unit for such domains.
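The vertex argument can be illustrated for the interval domain, assuming again a box-shaped $\Phi_{out}$. The abstract output $F^{\mathcal{D}}(\Phi_{in})$ is then a box, and because the Euclidean distance to a convex set is a convex function, its maximum over the box is attained at a vertex. The sketch below (hypothetical names, plain Python rather than a differentiable MaxPooling encoding) takes the maximum over all $2^m$ vertices of an $m$-dimensional output box.

```python
import itertools
import math


def dist_to_box(y, lower, upper):
    """Euclidean distance from y to the box predicate [lower, upper] (clip-and-measure)."""
    return math.sqrt(sum((yi - min(max(yi, lo), hi)) ** 2
                         for yi, lo, hi in zip(y, lower, upper)))


def interval_correctness_loss(out_lo, out_hi, pred_lo, pred_hi):
    """Worst-case distance from the interval output [out_lo, out_hi] to a box predicate.

    The distance to a convex set is convex, so its maximum over the output box
    is attained at one of the box's vertices; enumerating them is exact.
    """
    vertices = itertools.product(*zip(out_lo, out_hi))  # all 2**m corner points
    return max(dist_to_box(v, pred_lo, pred_hi) for v in vertices)
```

Vertex enumeration grows as $2^m$ in the output dimension $m$. For box predicates the squared distance is separable across axes, so the same maximum could also be computed per axis in $O(m)$.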

From the definition of the correctness loss function, it follows naturally that when $\mathcal{L}_{\mathcal{D}}$ becomes $0$, we can ensure the correctness of $F$ with respect to the correctness property. This is formulated in the following theorem.

###### Theorem 4.8 (Zero Correctness Loss).

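Given an abstract domain $\mathcal{D}$, a $\mathcal{D}$-compatible neural network $F$, and a correctness property $(\Phi_{in}, \Phi_{out})$, $\mathcal{L}_{\mathcal{D}}(F, \Phi_{in}, \Phi_{out}) = 0 \implies F \models (\Phi_{in}, \Phi_{out})$.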

###### Proof Sketch.

When $\mathcal{L}_{\mathcal{D}}(F, \Phi_{in}, \Phi_{out}) = 0$, by Definition 4.7,

$$\max_{p \in F^{\mathcal{D}}(\Phi_{in})} \mathit{dist}(p, \Phi_{out}) = 0$$

Since $\mathit{dist}$ is a non-negative function, we have:

$$\forall p \in F^{\mathcal{D}}(\Phi_{in}),\ \mathit{dist}(p, \Phi_{out}) = 0$$

Therefore,

$$\forall p \in F^{\mathcal{D}}(\Phi_{in}),\ p \models \Phi_{out}$$

By Theorem 3.5, we have $\forall x \models \Phi_{in},\ F(x) \in F^{\mathcal{D}}(\Phi_{in})$. Hence,

$$\forall x \models \Phi_{in},\ F(x) \models \Phi_{out}$$

Thus, $F \models (\Phi_{in}, \Phi_{out})$. ∎

A refined input space abstraction leads to smaller or equal worst-case correctness loss. Consider Example 4.4: when we decompose its input domain into two sub-domains, as illustrated in Section 2, the worst-case correctness loss on each partitioned input domain decreases. This intuition is formalized by the following theorem.

###### Theorem 4.9 (Smaller Φin Implies Smaller or Equal Loss).

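Given an abstract domain $\mathcal{D}$, a $\mathcal{D}$-compatible neural network $F$, a correctness property $(\Phi_{in}, \Phi_{out})$, and an input domain $\Psi_{in}$, $\Phi_{in} \subseteq \Psi_{in} \implies \mathcal{L}_{\mathcal{D}}(F, \Phi_{in}, \Phi_{out}) \le \mathcal{L}_{\mathcal{D}}(F, \Psi_{in}, \Phi_{out})$.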

###### Proof Sketch.

If $\Phi_{in}$ is an input domain partitioned from $\Psi_{in}$, we prove that the correctness loss on $\Phi_{in}$ is no larger than that on $\Psi_{in}$. By Definition 3.2 and Definition 3.4, $F^{\mathcal{D}}(\Phi_{in}) \subseteq F^{\mathcal{D}}(\Psi_{in})$. Intuitively, since the over-approximated output for a range of inputs must cover all outputs from these inputs, the approximated output of a subset must lie within that of any set covering it. Therefore, by Definition 4.7,

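$$\begin{aligned}
\mathcal{L}_{\mathcal{D}}(F, \Phi_{in}, \Phi_{out}) &= \max_{p \in F^{\mathcal{D}}(\Phi_{in})} \mathit{dist}(p, \Phi_{out}) \\
&\le \max_{p \in F^{\mathcal{D}}(\Psi_{in})} \mathit{dist}(p, \Phi_{out}) \\
&= \mathcal{L}_{\mathcal{D}}(F, \Psi_{in}, \Phi_{out})
\end{aligned}$$

∎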
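Theorem 4.9 can be checked numerically for the interval domain. The sketch below propagates an input box through a small fully connected ReLU network with interval arithmetic (a standard interval abstract transformer, used here as an example of $F^{\mathcal{D}}$) and evaluates the correctness loss for a hypothetical property $y_0 \le c$, for which the loss reduces to $\max(0, \overline{y_0} - c)$ with $\overline{y_0}$ the upper output bound. All function names and the weights in `LAYERS` are made up for illustration.

```python
def affine_interval(lo, hi, weights, bias):
    """Interval image of the affine map x -> W x + b; each row of `weights` is one neuron."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        # A non-negative weight takes the lower output bound from the lower input
        # bound; a negative weight takes it from the upper input bound (and vice versa).
        out_lo.append(b + sum(w * (l if w >= 0 else h) for w, l, h in zip(row, lo, hi)))
        out_hi.append(b + sum(w * (h if w >= 0 else l) for w, l, h in zip(row, lo, hi)))
    return out_lo, out_hi


def relu_interval(lo, hi):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def network_interval(lo, hi, layers):
    """Propagate an input box through (weights, bias) layers with ReLU between them."""
    for k, (weights, bias) in enumerate(layers):
        lo, hi = affine_interval(lo, hi, weights, bias)
        if k < len(layers) - 1:  # no ReLU after the output layer
            lo, hi = relu_interval(lo, hi)
    return lo, hi


def upper_bound_loss(lo, hi, layers, c):
    """Interval correctness loss for the hypothetical property 'y_0 <= c' on [lo, hi]."""
    _, out_hi = network_interval(lo, hi, layers)
    return max(0.0, out_hi[0] - c)


# Made-up weights for illustration: 2 inputs -> 2 ReLU neurons -> 1 output.
LAYERS = [
    ([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
```

With `c = 1.0`, the loss is `3.0` on the box $[-1, 1]^2$, `2.0` on the sub-box $[0, 1]^2$, and `0.0` on $[-1, 0]^2$: shrinking the input domain never increases the loss, as Theorem 4.9 states.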

Finally, we extend the notion of correctness loss from a single correctness property to an input space abstraction.

###### Definition 4.10 (Abstract Correctness Loss).

Given an abstract domain $\mathcal{D}$, a $\mathcal{D}$-compatible neural network $F$, a correctness property $(\Phi_{in}, \Phi_{out})$, and an input space abstraction $\mathcal{S}$, the abstract correctness loss of $F$ with respect to $\mathcal{S}$ is

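$$\mathcal{L}_{\mathcal{D}}(F, \mathcal{S}) = \sum_{(\Phi^i_{in}, \Phi^i_{out}) \in \mathcal{S}} \frac{\mathit{size}(\Phi^i_{in})}{\mathit{size}(\mathcal{S})} \cdot \mathcal{L}_{\mathcal{D}}(F, \Phi^i_{in}, \Phi^i_{out})$$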

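Here, $\mathit{size}(\mathcal{S}) = \sum_{(\Phi^i_{in},\, \_) \in \mathcal{S}} \mathit{size}(\Phi^i_{in})$. $\mathcal{L}_{\mathcal{D}}(F, \mathcal{S})$ is essentially an accumulation of the correctness losses of the refined correctness properties encompassed within the input space abstraction $\mathcal{S}$, where each term is weighted in proportion to the size of the input domain of the corresponding refined correctness property.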
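Definition 4.10 amounts to a size-weighted average of per-property losses. A minimal sketch, assuming each input domain is a box whose size is its volume and that $\mathit{size}(\mathcal{S})$ is the total volume; the callback `loss_fn` is a hypothetical stand-in for $\mathcal{L}_{\mathcal{D}}(F, \Phi^i_{in}, \Phi^i_{out})$ computed in some abstract domain.

```python
import math


def box_volume(lo, hi):
    """Size of a box-shaped input domain, taken here to be its volume."""
    return math.prod(h - l for l, h in zip(lo, hi))


def abstract_correctness_loss(partitions, loss_fn):
    """Size-weighted correctness loss over an input space abstraction.

    `partitions` lists refined correctness properties as (lo, hi, pred) triples,
    and `loss_fn(lo, hi, pred)` returns their per-property correctness loss.
    size(S) is assumed to be the total volume of all input boxes, so the
    weights sum to 1.
    """
    total = sum(box_volume(lo, hi) for lo, hi, _ in partitions)
    return sum(
        box_volume(lo, hi) / total * loss_fn(lo, hi, pred)
        for lo, hi, pred in partitions
    )
```

Because the weights sum to 1, an abstraction with a single property has exactly that property's loss, which is the final equality used in the proof of Theorem 4.12.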

We extend Theorem 4.8, showing that when the abstract correctness loss reduces to $0$, we can prove the correctness of the neural network over the input space abstraction.

###### Theorem 4.11 (Zero Abstract Correctness Loss).

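Given an abstract domain $\mathcal{D}$, a $\mathcal{D}$-compatible neural network $F$, and an input space abstraction $\mathcal{S}$, $\mathcal{L}_{\mathcal{D}}(F, \mathcal{S}) = 0 \implies F \models \mathcal{S}$.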

###### Proof Sketch.

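Intuitively, when training reduces the correctness loss on each partitioned input domain to 0, the network is correct on every concrete input subsumed by these input domains. By Definition 4.10, since each weight $\mathit{size}(\Phi^i_{in})/\mathit{size}(\mathcal{S})$ is positive and each loss $\mathcal{L}_{\mathcal{D}}(F, \Phi^i_{in}, \Phi^i_{out})$ is non-negative, when $\mathcal{L}_{\mathcal{D}}(F, \mathcal{S}) = 0$, we have: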

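$$\mathcal{L}_{\mathcal{D}}(F, \mathcal{S}) = 0 \implies \bigwedge_{(\Phi^i_{in}, \Phi^i_{out}) \in \mathcal{S}} \mathcal{L}_{\mathcal{D}}(F, \Phi^i_{in}, \Phi^i_{out}) = 0$$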

By Theorem 4.8,

$$\bigwedge_{(\Phi^i_{in}, \Phi^i_{out}) \in \mathcal{S}} F \models (\Phi^i_{in}, \Phi^i_{out})$$

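Then $F \models \mathcal{S}$ by Definition 4.3. Thus, $\mathcal{L}_{\mathcal{D}}(F, \mathcal{S}) = 0 \implies F \models \mathcal{S}$. ∎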

Similarly, we extend Theorem 4.9, showing that input space abstraction refinement leads to smaller or equal correctness loss. This is expressed in the following theorem.

###### Theorem 4.12 (Refine Implies Smaller or Equal Loss).

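Given an abstract domain $\mathcal{D}$ and a $\mathcal{D}$-compatible neural network $F$, for any two input space abstractions $\mathcal{S}_1 \sqsubseteq \mathcal{S}_2$, $\mathcal{L}_{\mathcal{D}}(F, \mathcal{S}_1) \le \mathcal{L}_{\mathcal{D}}(F, \mathcal{S}_2)$.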

###### Proof Sketch.

By induction on Definition 4.5. We only show the case when $\mathcal{S}_2 = \{(\Phi_{in}, \Phi_{out})\}$ is a single correctness property and $\mathcal{S}_1$ is a refinement of $\mathcal{S}_2$ in which the input domain $\Phi_{in}$ is decomposed into the sub-domains $\Phi^i_{in}$. We prove that input space abstraction refinement does not increase the correctness loss; this follows directly from Theorem 4.9.
Given the hypothesis:

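$$\Big(\bigwedge_{(\_,\, \Phi^i_{out}) \in \mathcal{S}_1} (\Phi^i_{out} \iff \Phi_{out})\Big) \wedge \Big(\Phi_{in} = \bigcup_{(\Phi^i_{in},\, \_) \in \mathcal{S}_1} \Phi^i_{in}\Big)$$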

we obtain:

$$\bigwedge_{(\Phi^i_{in}, \Phi^i_{out}) \in \mathcal{S}_1} (\Phi^i_{out} \iff \Phi_{out}) \wedge (\Phi^i_{in} \subseteq \Phi_{in})$$

Therefore, by Theorem 4.9,

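$$\begin{aligned}
\mathcal{L}_{\mathcal{D}}(F, \mathcal{S}_1) &= \sum_{(\Phi^i_{in}, \Phi^i_{out}) \in \mathcal{S}_1} \frac{\mathit{size}(\Phi^i_{in})}{\mathit{size}(\mathcal{S}_1)} \cdot \mathcal{L}_{\mathcal{D}}(F, \Phi^i_{in}, \Phi^i_{out}) \\
&= \sum_{(\Phi^i_{in}, \Phi^i_{out}) \in \mathcal{S}_1} \frac{\mathit{size}(\Phi^i_{in})}{\mathit{size}(\mathcal{S}_1)} \cdot \mathcal{L}_{\mathcal{D}}(F, \Phi^i_{in}, \Phi_{out}) \\
&\le \sum_{(\Phi^i_{in}, \Phi^i_{out}) \in \mathcal{S}_1} \frac{\mathit{size}(\Phi^i_{in})}{\mathit{size}(\mathcal{S}_1)} \cdot \mathcal{L}_{\mathcal{D}}(F, \Phi_{in}, \Phi_{out}) \\
&= \mathcal{L}_{\mathcal{D}}(F, \Phi_{in}, \Phi_{out}) = \mathcal{L}_{\mathcal{D}}(F, \mathcal{S}_2)
\end{aligned}$$

∎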

### 4.4. Algorithm

Our correct-by-construction ANN training algorithm is given in Algorithm 1. The algorithm takes as input an initial input space abstraction, which simply corresponds to a prescribed correctness property. While aiming to guarantee correctness, it additionally takes a set of labeled training data as input in order to achieve a desired accuracy on the trained model. Each training iteration interleaves network weight optimization with input space abstraction refinement. Algorithm 1 first computes the correctness loss of the current input space abstraction, which totals the weighted worst-case correctness loss (with respect to the abstract domain $\mathcal{D}$) over all possible counterexamples to the correctness properties defined in the abstraction. Since this computation is over-approximated, if the correctness loss reduces to $0$, the neural network is guaranteed to be correct with respect to the prescribed correctness property. The goal of the algorithm is to train the network until the correctness loss falls below a very small threshold. Algorithm 1 also records the accuracy loss with respect to the given training examples and their labels; the accuracy loss function can be set to a standard loss function in machine learning, such as the cross-entropy loss.

In its optimization step, Algorithm 1 applies gradient descent to the abstraction of $F$, leveraging the worst-case loss and the fact that $\mathcal{L}_{\mathcal{D}}$ is differentiable in our framework. In the implementation, this step is aided by PyTorch (Paszke et al., 2017), an off-the-shelf automatic differentiation library. The accuracy loss is also taken into account at this step.

The refinement snippet of Algorithm 1 heuristically picks a few refined correctness properties in the input space abstraction whose correctness loss exceeds the average. This heuristic selection strategy assigns the highest priority for loss reduction to the most imprecise cases. As illustrated in Section 2, input space abstraction refinement is then performed on the selected properties, giving the optimization process more accurate gradient information. This snippet can be shown to satisfy the refinement relation (Definition 4.5), as formulated in the following theorem:

###### Theorem 4.13 (Valid Refinement).

For any input space abstraction $\mathcal{S}$, the refinement snippet of Algorithm 1 yields an input space abstraction $\mathcal{S}'$ such that $\mathcal{S}' \sqsubseteq \mathcal{S}$.
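A minimal sketch of the refinement step, assuming box-shaped input domains and bisection along the widest input dimension (the split rule, like the function names, is an assumption made for illustration). Partitions whose loss exceeds the mean are halved, and each half keeps its parent's output predicate. Because the halves cover the parent box exactly, the result satisfies the refinement condition of Definition 4.5, mirroring Theorem 4.13 for this simplified split.

```python
def bisect_widest(lo, hi):
    """Split a box in half along its widest dimension.

    The two closed halves share the splitting face, which has zero volume.
    """
    d = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[d] + hi[d]) / 2.0
    left_hi = list(hi)
    left_hi[d] = mid
    right_lo = list(lo)
    right_lo[d] = mid
    return (list(lo), left_hi), (right_lo, list(hi))


def refine(partitions, losses):
    """One refinement step: bisect every partition whose loss exceeds the mean.

    `partitions` is a non-empty list of (lo, hi, pred) triples and `losses[i]`
    is the correctness loss of partitions[i]. Children inherit the parent's
    output predicate, so their union is exactly the parent's input box.
    """
    mean = sum(losses) / len(losses)
    refined = []
    for (lo, hi, pred), loss in zip(partitions, losses):
        if loss > mean:
            (l1, h1), (l2, h2) = bisect_widest(lo, hi)
            refined += [(l1, h1, pred), (l2, h2, pred)]
        else:
            refined.append((lo, hi, pred))  # kept unchanged
    return refined
```

When all losses are equal, no partition is strictly above the mean and the abstraction is returned unchanged, which trivially satisfies reflexivity.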

We can formalize the soundness guarantees offered by our approach.

###### Corollary 4.14 (ART Soundness).

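Given an abstract domain $\mathcal{D}$, a $\mathcal{D}$-compatible neural network $F$, and an initial input space abstraction $\mathcal{S}$ of correctness properties, the neural network output by Algorithm 1 is sound with respect to $\mathcal{S}$, i.e., it satisfies every correctness property in $\mathcal{S}$, as long as the correctness loss reduces to $0$.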

###### Proof.

From Theorem 4.13, we know for any input space abstraction generated during the execution of Algorithm 1,