Self-Repairing Neural Networks: Provable Safety for Deep Networks via Dynamic Repair

by   Klas Leino, et al.
Carnegie Mellon University

Neural networks are increasingly being deployed in contexts where safety is a critical concern. In this work, we propose a way to construct neural network classifiers that dynamically repair violations of non-relational safety constraints called safe ordering properties. Safe ordering properties relate requirements on the ordering of a network's output indices to conditions on their input, and are sufficient to express most useful notions of non-relational safety for classifiers. Our approach is based on a novel self-repairing layer, which provably yields safe outputs regardless of the characteristics of its input. We compose this layer with an existing network to construct a self-repairing network (SR-Net), and show that in addition to providing safe outputs, the SR-Net is guaranteed to preserve the accuracy of the original network. Notably, our approach is independent of the size and architecture of the network being repaired, depending only on the specified property and the dimension of the network's output; thus it is scalable to large state-of-the-art networks. We show that our approach can be implemented using vectorized computations that execute efficiently on a GPU, introducing run-time overhead of less than one millisecond on current hardware – even on large, widely-used networks containing hundreds of thousands of neurons and millions of parameters.




1. Introduction

Neural networks are being deployed as components in many safety- and security-critical domains, such as autonomous transport, banking, and medical diagnosis, motivating the need to provide proofs of safety for such neural components. Even well-tested, highly accurate networks may still be unsafe, leading to potentially dangerous situations. For instance, the well-studied ACAS Xu networks that implement an airborne collision avoidance system for commercial aircraft have been shown to violate key safety properties (katz17).

A (feed-forward) neural network $f$ is a total function of type $\mathbb{R}^n \to \mathbb{R}^m$ learned from input-output examples. In a common mode of use, neural networks are used as classifiers by taking the index of the maximum element of the output vector. Safety properties for classifiers are often expressed in terms of ordering constraints, which specify the safe sets of possible total orderings over the indices of the network’s output vector, defined by the natural order of the corresponding real components. For instance, the safety properties for ACAS Xu studied in prior work (katz17) specify conditions when, e.g., the first index of the network’s output should be maximal.

As the exact real values of the classifier’s output are not relevant to these properties, we may view neural networks as programs that map each input to a total ordering over the $m$ indices of the output vector. These safe ordering properties can be formalized as non-relational safety properties (clarkson08) of the logical form $P \implies Q$, where $P$ is an arbitrary decidable formula over the classifier’s input, and $Q$ is a statement in the theory of totally ordered sets over its output. They are a sufficiently generic formulation of non-relational safety, as the scale of each output component (also called a logit) is unconstrained, so properties that are sensitive to their values are of limited use. Some applications scale the output to produce a discrete probability distribution, thus interpreting the values as “confidence” scores. However, these are difficult to calibrate (guo2017calibration; johansson19calibration), and may be unreliable in practice. Safe ordering properties thus capture the relevant range of behaviors needed for non-relational safety, and we aim to construct neural networks that are provably safe according to such properties.

Verifying Neural Network Safety

To ensure the safety of neural networks, one strategy is to consider neural networks as programs generated by a learning routine. This casts the problem as one of normal program safety, to which techniques from the wide literature on program verification can be applied. For instance, abstract interpretation is used by gehr18; singh19 to verify properties stated as polyhedral pre- and postconditions; combined with the appropriate use of strengthening and weakening, this leads to a strategy for producing complete Hoare-style safety proofs for neural networks. Alternatively, katz17 encode a network’s semantics as a system of constraints, and pose safety verification as satisfiability modulo the relevant theories.

Employing program verification methods to prove neural network safety has led to some success. However, the scalability of these techniques remains a serious challenge for the large majority of neural network applications. More importantly, post-training verification does not address the problem of constructing safe networks to begin with, and conventional training methods are unlikely to produce networks that satisfy many useful safety properties, as shown by, e.g., mirman2018differentiable; lin20. Naively repeating training when verification fails, while supervising the correction of an observed set of counterexamples, is likely to be prohibitively expensive: training a state-of-the-art image model (tan19; tan21) or a modern language model requires an enormous number of floating-point operations (FLOPs).


Safe-by-Construction Learning

Another strategy takes the perspective that neural networks should be safe by construction. In other words, one proves that the learning algorithm produces only safe networks, rather than verifying the networks a posteriori. Supervised learning, the most common framework for producing neural classifiers, is framed as an optimization problem:

$$\theta^* = \operatorname*{argmin}_{\theta} \; L(f_\theta, D)$$

where $f_\theta$ is a family of neural networks (referred to as the architecture) parameterized by $\theta$, $D$ is a labeled training set, and $L$ is a real-valued loss function that measures how well $f_\theta$ “fits” the training data. As $L$ may be insufficient to ensure that the solution $f_{\theta^*}$ satisfies a safety property $\phi$, safe-by-construction methods typically modify the objective by adding a safety loss term $L_\phi$ that measures the degree to which $f_\theta$ satisfies $\phi$:

$$\theta^* = \operatorname*{argmin}_{\theta} \; L(f_\theta, D) + L_\phi(f_\theta)$$

If $L_\phi$ is carefully designed, then one may prove that when the optimization result meets certain criteria (e.g., $L_\phi(f_{\theta^*}) = 0$), then $f_{\theta^*}$ is provably safe (lin20). However, there is no guarantee that $\theta^*$ will satisfy such criteria, and in those cases, one must fall back on verifying $f_{\theta^*}$.

An alternative approach to safe learning modifies the architecture such that every network in the family is safe; in other words, one proves that $f_\theta$ is safe for all parameters $\theta$. This is attractive, as safety is, in principle, decoupled from optimization, and thus not conditional on its outcome. It also ensures that learning need not be repeated for the sake of safety, and network verification is unnecessary. However, designing a provably-safe architecture that is both compatible with effective learning and flexible enough to support a broad set of safety properties is challenging. To date, this approach has only been employed in a few restricted settings, such as Lipschitz certification (anil19; li2019preventing; trockman2021orthogonalizing), a relational notion of safety, and global robustness (leino21; leino2021relaxing), a relaxed notion of robustness where the network is allowed to selectively abstain from prediction while maintaining safety.

Our Contributions

We present a technique for modifying neural network architectures to ensure that they satisfy safe ordering properties. In particular, we describe a transformer that, given an architecture and a set of safe ordering properties, produces a new architecture in which every network satisfies the conjunction of the properties for all parameters. Viewing the neural network as a composition of layers, our transformer appends a self-repairing layer (SR-Layer) to the original network. This layer encodes a check-and-repair mechanism, so that when the original output violates a property, the SR-Layer modifies the output to ensure safety. This approach is similar in spirit to those that dynamically repair errors caused by traditional software issues like division-by-zero, null dereference, and others (long14; rinard04; kling12; berger06; qin05; perkins09). Such mechanisms may be impractical for arbitrary neural network safety properties, as they may require solving arbitrarily complex constraint-satisfaction problems. We show that this is not the case for safe ordering properties, and that the solver needed for these constraints can be efficiently embedded in the repair layer.

A more pernicious issue with repair mechanisms for neural networks is that safe outputs may not be accurate, or that the steps taken to repair the network inadvertently damage its accuracy on points outside the scope of a given safety property. (A classifier is accurate on an input if its prediction matches the corresponding label given in the training or test data.) For example, sotoudeh2021provable propose a static repair method for safety properties over polyhedral pre- and postconditions, and observe non-trivial drawdown on several benchmarks, where the classifier’s previously-correct behavior on affected points is “forgotten” after repair. Thus, if not done carefully, repair may harm the model’s accuracy. We identify a key property, transparency, which ensures that the repair mechanism never has a negative impact on the network’s accuracy. Transparency requires that a prediction of the original network be retained whenever it is consistent with at least one ordering allowed by the specified properties. However, if the specification is inconsistent with the “correct” label specified by the data, then it is impossible for the network to be safe without harming accuracy, and the repair prioritizes safety.

Beyond transparency, in some cases repair may improve classifier accuracy. This can potentially occur, for example, when the correct prediction is consistent with , but a suboptimal training outcome yields a model that is both unsafe and incorrect. To open the possibility for the training procedure to take repair into account during training—so that models that are more often safe and correct can be found—we design the SR-Layer, including the embedded constraint solver, to be both vectorized and differentiable. This allows the implementation of our approach within popular neural network frameworks, so that models composed with them can be efficiently trained using gradient-based optimization with hardware acceleration. Moreover, the vectorized implementation significantly reduces the run-time overhead of our check-and-repair mechanism.

Finally, while the SR-Layer achieves safety without negatively impacting accuracy, it necessarily adds computational overhead each time the network is executed. Our empirical evaluation focuses on this issue, and how the overhead is impacted by several key factors. We show that the cost of the SR-Layer depends solely on the specified properties and the length of the output vector, and thus, importantly, is independent of the size or complexity of the underlying neural network. On three widely-used benchmark datasets (ACAS Xu (katz17), Collision Detection (ehlers2017formal), and CIFAR-100 (krizhevsky09)), we show that this overhead is small in real terms (0.26-0.82 ms), and does not pose an impediment to practical adoption. In fact, because the overhead is independent of network size, its impact is less noticeable on larger networks, where the cost of evaluating the original classifier may come to dominate that of the repair. To further characterize the role of the property set and the output dimension, we use synthetic data and random safe ordering properties to isolate the effects that the postcondition complexity and number of classes have on network run time. Our results suggest that while these structural traits of the specified safety property can impact run time (the satisfiability of general ordering constraints is NP-complete (guttman06variations)), this may often not be an issue in practice.

To summarize, the main contributions of our work are as follows:

  • We define a generic notion of non-relational safety for neural network classifiers, which we call safe ordering properties.

  • We present a method for transforming arbitrary neural network architectures into safe-by-construction versions that are guaranteed to (i) satisfy a given set of safe ordering properties, and (ii) preserve or improve the empirical accuracy of the original model.

  • We show that the SR-Layer can be designed in a way that is both fully-vectorized and differentiable, which enables hardware acceleration to reduce run-time overhead, and facilitates its use during training in cases where doing so might improve the repaired model’s accuracy.

  • We empirically demonstrate that the overhead introduced by the SR-Layer is small enough for its deployment in practical settings.

The rest of the paper is organized as follows. In Section 2, we formally set up our problem. In Section 3, we present details of the SR-Layer. In Section 4, we prove that our SR-Layer ensures safety without loss in accuracy. In Section 5, we show how to encode the SR-Layer as a differentiable and vectorizable computation. We describe our experimental setting and results on evaluating the run-time overheads of the SR-Layer in Section 6. Finally, we survey the related work in Section 7 and conclude in Section 8.

2. Problem Setting

In this section, we formalize the concepts of safe ordering properties and self-repair. We begin by presenting background on neural networks and an illustrative application of safe ordering properties. We then formally define the problem we aim to solve, and introduce a set of desired properties for our self-repairing transformer.

2.1. Background

Neural Networks

A neural network, $f_\theta : \mathbb{R}^n \to \mathbb{R}^m$, is a total function defined by an architecture, or composition of linear and non-linear transformations, and a set of weights, $\theta$, parameterizing its linear transformations. As neither the details of a network’s architecture nor the particular valuation of its weights are relevant to much of this paper, we will by default omit the subscript $\theta$, and treat $f$ as a black-box function. Neural networks are commonly used as classifiers, by extracting predictions from the output $f(x)$, also called the logits of a network. Given a neural network $f$, we use the upper-case $F$ to refer to the corresponding neural classifier: $F(x) = \operatorname{argmax}_i f_i(x)$. For our purposes, we will assume that $\operatorname{argmax}$ returns a single index; ties may be broken arbitrarily.
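To make the relationship between a network and its classifier concrete, the following is a minimal Python sketch. The names `classifier` and `toy_network` are hypothetical and used only for illustration; the first-maximal-index rule is just one possible arbitrary tie-breaking choice.

```python
from typing import Callable, Sequence


def classifier(f: Callable[[Sequence[float]], Sequence[float]]) -> Callable[[Sequence[float]], int]:
    """Wrap a logit function f into a classifier F(x) = argmax_i f(x)_i."""
    def F(x: Sequence[float]) -> int:
        y = f(x)
        # max over indices returns the first maximal index, which is one
        # (arbitrary) way of breaking ties.
        return max(range(len(y)), key=lambda i: y[i])
    return F


# Hypothetical stand-in for a trained network with three output logits.
def toy_network(x: Sequence[float]) -> Sequence[float]:
    return [0.1 * x[0], 2.0, -1.0]


F = classifier(toy_network)
prediction = F([3.0])  # logits are [0.3, 2.0, -1.0]
```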

ACAS Xu: An Illustrative Example

Throughout this paper, we will use ACAS Xu as a running example to present the problem and the solutions we propose. The Airborne Collision Avoidance System X (ACAS X) (kochenderfer15) is a family of collision avoidance systems for both manned and unmanned aircraft. ACAS Xu, the variant for unmanned aircraft, is implemented as a large numeric lookup table (2GB) mapping sensor measurements to horizontal maneuver advisories. The lookup table represents an optimal finite-horizon policy for a partially observable Markov decision process formulation of the collision avoidance problem. As the table is too large for many certified avionics systems, julian2019acas proposed the use of neural networks as a compressed, functional representation of the lookup table.

A neural network for the ACAS Xu problem therefore is a function which, given an encoding of the physical state of the aircraft and of a neighboring object (the intruder), outputs maneuver advisories that the aircraft should follow to avoid collision. The physical state comprises six features: the distance ($\rho$) between the aircraft and the intruder, the relative angle ($\theta$) from the aircraft to the intruder, the angle ($\psi$) from the intruder’s heading to the aircraft’s heading, the speed of the aircraft ($v_{\text{own}}$), and of the intruder ($v_{\text{int}}$), and the time ($\tau$) until loss of vertical separation. The model outputs one of five possible advisories: either that no change is needed, also called clear-of-conflict (COC), or that the aircraft must steer weakly to the left, weakly to the right, strongly to the left, or strongly to the right. These advisories correspond to output dimensions 0 through 4, respectively. The networks proposed by julian2019acas are functions $f : \mathbb{R}^5 \to \mathbb{R}^5$; the value of $\tau$ is discretized and 45 different neural networks are constructed, one for each combination of the previous advisory ($a_{\text{prev}}$) and discretized value of $\tau$. While julian2019acas used the convention that the index of the minimal element of $f(x)$ is the predicted advisory, in this paper we will use the more common convention of the maximal value’s index. Thus, if the logit at index 1 of $f(x)$ is maximal, then $F(x) = 1$, which corresponds to the advisory weak left.

Observing that the ACAS Xu neural networks are approximations of the original lookup table, and could thus deviate arbitrarily from the table’s advisory on untested regions of their input space, several researchers have proposed verifying them against a set of safety properties (katz17; singh19; lin20). katz17 proposed 10 such properties, which capture requirements such as, “If the intruder is near and approaching from the left, the network advises strong right.” Formally, this property would consist of a precondition on the network’s input that encodes the intruder being “near,” and a postcondition that the logit corresponding to strong right is maximal.

2.2. Problem Definition

Definition 1 presents the safe ordering properties that we consider throughout the rest of the paper. Intuitively, these properties associate constraints on the relative ordering of a network’s output values (a postcondition) with a predicate on the corresponding input (a precondition). As we will see in later sections, the precondition does not need to belong to a particular theory, and need only come with an effective procedure for deciding new instances.

Definition 1 (Safe ordering property).

Given a neural network , a safe ordering property, , is a precondition, , consisting of a decidable proposition over , and a postcondition, , given as a Boolean combination of order relations between the real components of .

Assuming a function, eval  :  bool, that decides given , notated as , and a similar eval function for , we say satisfies safe ordering property at iff . We use the shorthand to denote this; and given a set of properties , we write to denote and to denote .

Two points about our definition of safe ordering properties bear mentioning. First, although postconditions are evaluated using the inequality relation from real arithmetic, we assume that , and thus specifically exclude equality comparisons between the output components. This is a realistic assumption in nearly all practical settings, and in cases where it does not hold, can be resolved with arbitrary tie-breaking protocols that perturb to remove any equalities. Second, we omit explicit negation from our syntax, as it can be achieved by swapping the positions of the affected order relations; i.e., is just , as we exclude the possibility that .
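As an illustration of Definition 1, the sketch below encodes a safe ordering property as a pair of a decidable precondition (a Python callable) and a postcondition (a small Boolean formula over order literals), and checks it at a point. The tuple encoding, the function names, and the toy precondition are assumptions made for this example only; they are not the encoding used in the paper, and the precondition is not the actual formula from katz17.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical postcondition encoding:
#   ("lt", i, j)        means y[i] < y[j]
#   ("and", q1, q2, ..) conjunction of sub-formulas
#   ("or", q1, q2, ..)  disjunction of sub-formulas
Pre = Callable[[Sequence[float]], bool]
Prop = Tuple[Pre, tuple]


def eval_post(q: tuple, y: Sequence[float]) -> bool:
    """Evaluate a postcondition on an output vector y."""
    tag = q[0]
    if tag == "lt":
        _, i, j = q
        return y[i] < y[j]
    if tag == "and":
        return all(eval_post(s, y) for s in q[1:])
    if tag == "or":
        return any(eval_post(s, y) for s in q[1:])
    raise ValueError(f"unknown connective: {tag!r}")


def satisfies(prop: Prop, x: Sequence[float], y: Sequence[float]) -> bool:
    """A property holds at (x, y) iff its precondition implies its postcondition."""
    pre, post = prop
    return (not pre(x)) or eval_post(post, y)


def satisfies_all(props: Sequence[Prop], x: Sequence[float], y: Sequence[float]) -> bool:
    return all(satisfies(p, x, y) for p in props)


# Toy property: "if x[0] < 1.0, then index 4 must be maximal".
strong_right_maximal = (
    lambda x: x[0] < 1.0,
    ("and", ("lt", 0, 4), ("lt", 1, 4), ("lt", 2, 4), ("lt", 3, 4)),
)
```

Because negation is omitted from the syntax, "not y[i] < y[j]" would be written as `("lt", j, i)` under the assumption that no two logits are equal.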

Sections 6.3 and 6.4 provide several concrete examples of safe ordering properties. Example 2 revisits the property for ACAS Xu that was discussed in the previous section.

Example 2 (ACAS Xu).

Recall the property described earlier: “If the intruder is near and approaching from the left, the network advises strong right.” This is a safe ordering property , where the precondition is captured as a linear real arithmetic formula given by katz17:

In fact, nine of the ten properties proposed by katz17 are safe ordering properties. The single exception has a postcondition that places a constant lower-bound on , despite the fact that the logit values of the network can be freely scaled without impacting the network’s behavior as a classifier, making the interpretation of this property unclear.

Given a set of safe ordering properties, , our goal is to obtain a neural network that satisfies everywhere. In later sections, we show how to accomplish this by describing the construction of a self-repairing transformer (Definition 3) that takes an existing, possibly unsafe network, and produces a related model that satisfies at all points. While in practice, a meaningful, well-defined specification should be satisfiable for all inputs, our generic formulation of safe ordering properties in Definition 1 does not enforce this restriction; we can, for instance, let . To account for this, we lift predicates to operate on , where is considered valid for all .

Definition 3 (Self-repairing transformer).

A self-repairing transformer, , is a function that, given a set of safe ordering properties, , and a neural network, , produces a network, denoted as , that satisfies the following properties:

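  1. Safety: at every input, the output of the transformed network satisfies every property in the given set.

  2. Forewarning: the transformed network abstains (outputs $\bot$) at an input if and only if the properties are not jointly satisfiable at that input.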

In other words, the transformed network is safe with respect to the given properties and produces a non-abstaining (non-$\bot$) output everywhere that the properties are satisfiable. We will refer to the output of the transformer as a self-repairing network, or SR-Net.

The safety condition of Definition 3 captures the essence of the problem that we aim to solve, requiring that the self-repairing network make changes to its output according to the given properties. While allowing it to abstain from prediction by outputting $\bot$ may appear to relax the underlying problem, note that this is only allowed in cases where the properties cannot be satisfied at the input: the forewarning condition of Definition 3 is an equivalence that precludes trivial solutions such as a network that abstains everywhere. However, it still allows abstention in exactly the cases where it is needed for principled reasons. A set of safe ordering properties may be mutually satisfiable almost everywhere, except in some places. In such a case, the SR-Net can abstain at the inputs where the properties conflict, and everywhere else must produce real-valued outputs obeying the properties.

While the properties required by Definition 3 are sufficient to ensure a non-trivial, safe-by-construction neural network, in practice, we aim to apply , which we will write as , to models that already perform well on observed test cases, but that still require a safety guarantee. Thus, we wish to repair networks without interfering with the existing network behavior when possible, a property we call transparency (Property 4).

Property 4 (Transparency).

Let be a self-repairing transformer. We say that satisfies transparency if

In other words, always produces an SR-Net, , for which the predictions derived from the safe output vectors of agree with the predictions of the original model whenever possible.

Property 4 leads to a useful result, namely that whenever is consistent with accurate predictions, then the classifier obtained from is at least as accurate as (Theorem 5). Formally, we characterize accuracy in terms of agreement with an oracle classifier that “knows” the correct label for each input, so that is accurate on if and only if . We note that accuracy is often defined with respect to a distribution of labeled points rather than an oracle; however our formulation captures the key fact that Theorem 5 holds regardless of how the data are distributed.

Theorem 5 (Accuracy Preservation).

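Given a neural network and a set of safe ordering properties, consider the SR-Net obtained by applying a self-repairing transformer to them, and let the oracle classifier be as described above. Assume that the transformer satisfies transparency. Further, assume that accuracy is consistent with safety, i.e., at every input there exists an output vector that satisfies the properties and whose maximal index is the oracle’s label. Then the classifier obtained from the SR-Net is accurate at every input where the original classifier is accurate.

Proof. Let $x$ be an input at which the original classifier agrees with the oracle. By hypothesis, there is a safe output vector at $x$ whose maximal index is the original prediction, hence we can apply Property 4 to conclude that the SR-Net’s prediction at $x$ equals the original prediction, and therefore the oracle’s label. ∎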

One subtle point to note is that even when is consistent with accurate predictions, it is possible for a network to be accurate yet unsafe at an input. Example 6 describes such a situation. Our formulation of Property 4 is carefully designed to ensure accuracy preservation even in such scenarios.

Example 6 (Accuracy need not imply safety).

Consider the property proposed for ACAS Xu by katz17 which says: “Even if the intruder is distant and is significantly slower than the own ship, the score of the COC advisory should never be minimal.” This safe ordering property is applicable for all networks that correspond to COC and is concretely written as follows:

For some such that is true, let us assume that and for a network , , so that . Then, is accurate at , but the COC advisory receives the minimal score, meaning is unsafe at with respect to . If the transformer satisfies Property 4, then by Theorem 5, is guaranteed to be accurate as well as safe at , since is consistent with accuracy here (as does not preclude class 1 from being maximal).
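The situation in Example 6 can be replayed numerically. In the sketch below, the logit values and the oracle label are hypothetical (the concrete values of the example are not reproduced here); the only thing taken from the text is the postcondition that the COC score (index 0) must not be minimal.

```python
def coc_not_minimal(y):
    """Postcondition: the logit at index 0 (COC) exceeds at least one other logit."""
    return any(y[0] > y[j] for j in range(1, len(y)))


def argmax(y):
    return max(range(len(y)), key=lambda i: y[i])


# Hypothetical logits: class 1 (weak left) is predicted, COC is minimal.
y = [0.1, 0.9, 0.5, 0.3, 0.2]
oracle_label = 1  # assumed correct label for this input

accurate = argmax(y) == oracle_label  # the network is accurate here ...
safe = coc_not_minimal(y)             # ... but violates the postcondition

# One permutation of the same logit values that is safe and keeps the
# prediction: swap the two smallest values so that COC is no longer minimal.
repaired = [0.2, 0.9, 0.5, 0.3, 0.1]
```

The repaired vector is a permutation of the original logits, satisfies the postcondition, and still predicts class 1, which is the outcome that transparency is meant to guarantee whenever such a vector exists.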

3. Self-repairing Transformer

In this section, we describe our self-repairing transformer, SR. We begin with a high-level overview of the approach (Section 3.1), and provide algorithmic details in Sections 3.2 and 3.3. In later sections, we prove that SR models Definition 3, satisfies transparency, and can be implemented as a vectorized, differentiable function.

3.1. Overview

Figure 1. Illustration of the result of applying the transformer SR, with a set of safe ordering properties, to a network (diagram labels: original network, transformed neural network).

Our self-repairing transformer, SR, leverages the fact that whenever a safe ordering property is satisfiable at a point, it is possible to bring the network to compliance. Neural networks are typically constructed by composing a sequence of layers; we thus compose an additional self-repair layer that operates on the original network’s output, and produces a result that will serve as the transformed network’s new output. This is reflected in the SR routine in Algorithm 1, and Figure 1 shows the resulting network. The original network executes normally, and the self-repair layer subsequently takes both the input (to facilitate checking the preconditions of the properties) and the original output, from which it either abstains (outputs $\bot$) or produces an output that is guaranteed to satisfy the properties.

The high-level workflow of the self-repair layer, SR-Layer, proceeds as follows. The layer starts by checking the input against each of the preconditions, and derives an active postcondition. This is then passed to a solver, which attempts to find the set of orderings that are consistent with the active postcondition. If no such ordering exists, i.e., if the active postcondition is unsatisfiable, then the layer abstains by outputting $\bot$. Otherwise, the layer minimally permutes the indices of the original output vector in order to satisfy the active postcondition while ensuring transparency (Property 4).

3.2. Algorithmic Details of the Self-Repair Layer

The core logic of our approach is handled by a self-repair layer, or SR-Layer, that is appended to the original model, and dynamically ensures its outputs satisfy the requisite safety properties. The procedure followed by this layer, SR-Layer (shown in Algorithm 1), first checks if the input and output of the base network already satisfy the properties. If they do, no repair is necessary and the repaired network can safely return the original output. Otherwise, SR-Layer attempts to find a satisfiable ordering constraint $q$ that entails the relevant postconditions. FindSatConstraint either returns such a term, which consists of a conjunction of ordering literals, or returns $\bot$ whenever no such $q$ exists. When FindSatConstraint returns $\bot$, SR-Layer does as well. Otherwise, the constraint identified by FindSatConstraint is used to repair the network’s output, where Repair permutes the logit values in $y$ to arrive at a vector that satisfies $q$. Note that because $q$ is satisfiable, it is always possible to find a satisfying solution by simply permuting $y$, because the specific real values are irrelevant, and only their order matters (see Section 4).

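Inputs: A set of safety properties, $\Phi$, and a network, $f$
Output: A network, $f^\Phi$
SR($\Phi$, $f$):
    return $\lambda x.$ SR-Layer($\Phi$, $x$, $f(x)$)
SR-Layer($\Phi$, $x$, $y$):
    if $(x, y)$ satisfies $\Phi$ then
        return $y$
    else
        $q$ := FindSatConstraint($\Phi$, $x$, $y$)
        if $q = \bot$ then
            return $\bot$
        else
            return Repair($q$, $y$)
Algorithm 1 Self-repairing transformer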
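The control flow of the SR-Layer can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the three components (`satisfies_all`, `find_sat_constraint`, `repair`) are passed in as callables with hypothetical signatures, `None` stands in for the abstention value, and the demo components at the bottom are toy stand-ins for a two-class problem.

```python
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def make_sr_net(
    f: Callable[[Vector], Vector],
    satisfies_all: Callable[[Vector, Vector], bool],
    find_sat_constraint: Callable[[Vector, Vector], Optional[object]],
    repair: Callable[[object, Vector], List[float]],
) -> Callable[[Vector], Optional[List[float]]]:
    """Compose a check-and-repair layer with an existing network f."""

    def sr_layer(x: Vector, y: Vector) -> Optional[List[float]]:
        if satisfies_all(x, y):
            return list(y)               # already safe: output is unchanged
        q = find_sat_constraint(x, y)
        if q is None:                    # properties unsatisfiable at x
            return None                  # abstain
        return repair(q, y)              # permute logits to satisfy q

    return lambda x: sr_layer(x, f(x))


# Toy demo: the only property is "if x[0] > 0 then index 0 is maximal".
f = lambda x: [1.0, 2.0]
check = lambda x, y: x[0] <= 0 or y[0] > y[1]
find = lambda x, y: "y0 > y1"                      # always satisfiable here
fix = lambda q, y: sorted(y, reverse=True)         # puts the largest value at index 0

net = make_sr_net(f, check, find, fix)
```

With these toy components, `net([1.0])` yields the repaired vector and `net([-1.0])` returns the original logits untouched, since the precondition does not apply there.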

3.2.1. Finding Satisfiable Constraints

The FindSatConstraint procedure is shown in Algorithm 2. Recall that the goal is to identify a conjunction of ordering literals that implies the relevant postconditions in at the given input . More precisely, this means that for each precondition satisfied by , the corresponding postcondition is implied by . This is sufficient to ensure that any model of will satisfy at ; i.e., .

To accomplish this, FindSatConstraint first evaluates each precondition, and obtains a disjunctive normal form (DNF), , of the active postcondition, defined in Equation 1 (line 2).


In practice, we implement a lazy version of ToDNF that generates disjuncts as needed (see Section 5), as this step may be a bottleneck and we only need to process each clause individually. At this point, FindSatConstraint could proceed directly, checking the satisfiability of each disjunct in , and returning the first satisfiable one it encounters. This would be correct, but as we wish to satisfy transparency (Property 4), we first construct an ordered list of the terms in which prioritizes constraints that maintain the maximal position of the original prediction, (Prioritize, line 2). Property 1 formalizes the behavior required of Prioritize.
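One way to generate the disjuncts of the active postcondition lazily is sketched below. It assumes, for illustration only, that each postcondition is already supplied in DNF as a list of clauses, that a clause is a set of literals, and that a literal `(i, j)` means y[i] < y[j]; the function name and this encoding are hypothetical. The DNF of the conjunction of the active postconditions is then the cross product of their clause lists, which `itertools.product` enumerates one combination at a time.

```python
from itertools import product
from typing import Callable, FrozenSet, Iterator, List, Sequence, Tuple

Literal = Tuple[int, int]            # (i, j) encodes y[i] < y[j]
Clause = FrozenSet[Literal]          # conjunction of literals
Prop = Tuple[Callable[[Sequence[float]], bool], List[Clause]]


def lazy_active_dnf(props: List[Prop], x: Sequence[float]) -> Iterator[Clause]:
    """Yield the disjuncts of the active postcondition at x one by one."""
    # Keep only the postconditions whose precondition holds at x.
    active = [post for pre, post in props if pre(x)]
    # Each combination picks one clause per active postcondition; their
    # union is one disjunct of the conjunction of all active postconditions.
    for combo in product(*active):
        yield frozenset().union(*combo)


props = [
    (lambda x: True, [frozenset({(1, 0)}), frozenset({(2, 0)})]),
    (lambda x: x[0] > 0, [frozenset({(0, 1), (0, 2)})]),
]
disjuncts = lazy_active_dnf(props, [1.0])
first = next(disjuncts)   # later disjuncts are only built if requested
```

When no precondition is active, the generator yields a single empty clause, which corresponds to the trivially true constraint.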

Property 1 (Prioritize).

Given and a list of conjunctive ordering constraints , the result of is a reordered list such that:


where denotes the set of root nodes of the directed graph .
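A prioritization step in the spirit of Property 1 might look like the following sketch. It assumes the hypothetical literal encoding `(i, j)` for y[i] < y[j], and treats an index as a root when it never appears as the smaller side of a literal (so nothing is required to exceed it). A stable sort then moves the clauses under which the original prediction is a root to the front, without otherwise changing their order.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Literal = Tuple[int, int]            # (i, j) encodes y[i] < y[j]
Clause = FrozenSet[Literal]


def roots(q: Clause, m: int) -> Set[int]:
    """Indices that no literal in q forces below another index."""
    forced_smaller = {lo for lo, _hi in q}
    return set(range(m)) - forced_smaller


def prioritize(clauses: List[Clause], y: Sequence[float]) -> List[Clause]:
    """Stable reordering: clauses that let argmax(y) stay maximal come first."""
    m = len(y)
    top = max(range(m), key=lambda i: y[i])
    # False sorts before True, and sorted() is stable.
    return sorted(clauses, key=lambda q: top not in roots(q, m))


y = [0.1, 0.9, 0.5]
clauses = [frozenset({(1, 0)}), frozenset({(2, 0)})]
ordered = prioritize(clauses, y)
```

Here the prediction is index 1; the first clause forces y[1] below y[0], so it is moved behind the second clause, under which index 1 can remain maximal.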

The IsSat procedure (also shown in Algorithm 2) determines whether a conjunctive ordering constraint $q$ is satisfiable. It is based on an encoding of $q$ as a directed graph, embodied in OrderGraph, where each component index of the output corresponds to a node, and there is a directed edge from $i$ to $j$ if the literal $y_j < y_i$ appears in the constraint $q$. A constraint is satisfiable if and only if its graph contains no cycles. Informally, acyclicity is necessary and sufficient for satisfiability because the directed edges encode immediate ordering requirements, and by transitivity, a cycle involving $i$ entails that $y_i < y_i$.

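Inputs: A set of safety properties, $\Phi$, a vector $x$, and a vector $y$
Output: Satisfiable ordering constraint, $q$, or $\bot$
OrderGraph($q$):
    build $V$ (one node per output index) and $E$ (one edge per ordering literal in $q$)
    return $(V, E)$
IsSat($q$):
    $G$ := OrderGraph($q$)
    return whether $G$ contains no cycle
FindSatConstraint($\Phi$, $x$, $y$):
    $Q_p$ := Prioritize(ToDNF(active postcondition of $\Phi$ at $x$), $y$)
    foreach $q \in Q_p$ do
        if IsSat($q$) then
            return $q$
    return $\bot$
Algorithm 2 Finding a satisfiable ordering constraint from safe-ordering properties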
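The satisfiability check and the search loop can be sketched as below. The literal encoding `(i, j)` for y[i] < y[j] is an assumption of this example, and cycle detection is done with Kahn's algorithm, which is one standard choice and not necessarily the one used in the paper. The input to `find_sat_constraint` is assumed to be the already prioritized list of clauses, and `None` stands in for abstention.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Literal = Tuple[int, int]            # (i, j) encodes y[i] < y[j]
Clause = FrozenSet[Literal]


def order_graph(q: Iterable[Literal], m: int) -> Dict[int, Set[int]]:
    """Successor sets: an edge hi -> lo for every literal y[lo] < y[hi]."""
    succ: Dict[int, Set[int]] = {i: set() for i in range(m)}
    for lo, hi in q:
        succ[hi].add(lo)
    return succ


def is_sat(q: Iterable[Literal], m: int) -> bool:
    """A conjunctive ordering constraint is satisfiable iff its graph is acyclic."""
    succ = order_graph(q, m)
    indeg = {i: 0 for i in range(m)}
    for i in succ:
        for j in succ[i]:
            indeg[j] += 1
    ready = [i for i in range(m) if indeg[i] == 0]
    removed = 0
    while ready:                      # Kahn's algorithm: peel off root nodes
        i = ready.pop()
        removed += 1
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    return removed == m               # leftover nodes lie on a cycle


def find_sat_constraint(clauses: Iterable[Clause], m: int) -> Optional[Clause]:
    """Return the first satisfiable clause, or None if there is none."""
    for q in clauses:
        if is_sat(q, m):
            return q
    return None
```

For instance, the clause {y[0] < y[1], y[1] < y[0]} is rejected because its graph contains a two-node cycle, while {y[0] < y[1]} alone is accepted.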

3.2.2. Repairing Violations

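Inputs: Satisfiable ordering constraint $q$, model output $y$
Output: A vector $y'$
Repair($q$, $y$):
    $\pi$ := TopologicalSort(OrderGraph($q$), $y$)
    $y^s$ := SortDescending($y$)
    return $y'$ with $y'_j = y^s_{\pi(j)}$ for every index $j$
Algorithm 3 Repair procedure for safe ordering properties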

Algorithm 3 describes the Repair procedure, used to ensure that the outputs of the SR-Layer satisfy safety. The inputs to Repair are a satisfiable ordering constraint $q$ and the output $y$ of the original network. The goal is to permute $y$ such that the result satisfies $q$, without violating transparency. Our approach is based on OrderGraph, the same directed-graph encoding used by IsSat. It uses a stable topological sort of the graph encoding of $q$ to construct a total order over the indices of $y$ that is consistent with the partial ordering implied by $q$. TopologicalSort returns a permutation $\pi$, a function that maps indices in $y$ to their rank (or position) in the total order. Formally, TopologicalSort takes as argument a graph $G = (V, E)$, and returns $\pi$ such that Equation 3 holds.

$$\forall i, j \in V.\ (i, j) \in E \implies \pi(i) < \pi(j) \tag{3}$$

That is, if the edge $(i, j)$ is in the graph, then $i$ occurs before $j$ in the ordering. In general, many total orderings may be consistent with the graph, but in order to guarantee transparency, TopologicalSort must also maintain the invariant captured by Property 2: the maximal index is listed first in the total order whenever possible.

Property 2 ().

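Given a graph, $G$, and $y$, the result $\pi$ of $\mathrm{TopologicalSort}(G, y)$ satisfies

$$\operatorname{argmax}_i\, y_i \in \mathrm{Roots}(G) \implies \pi(\operatorname{argmax}_i\, y_i) = 0$$

where $\mathrm{Roots}(G)$ denotes the set of root nodes of the directed graph $G$.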

In other words, the topological sort preserves the network’s original prediction when doing so is consistent with $q$. Then, by sorting $y$ in descending order, the sorted vector $y^s$ can be used to construct the final output of Repair, $y'$. For any index $j$, we simply set $y'_j$ to the $\pi(j)$-th component of $y^s$, since $\pi(j)$ gives the desired rank of the $j$-th logit value and the components of $y^s$ are sorted according to the component values. An example of the complete Repair procedure is given by Example 3.

Example 3 (Repair).

We refer again to the safety properties introduced for ACAS Xu (katz17). The postcondition of property states that the logit score for class 0 (COC) is not minimal, which can be written as the following ordering constraint:

Suppose that for some input , the active postcondition is equivalent to , and that . Further, suppose that FindSatConstraint has returned , corresponding to the second disjunct of (satisfying ). We then take the following steps according to :

  • First we let . We note that all vertices of the graph representation of are roots except for , which has as its parent. We observe that , which corresponds to a root node; thus by Property 2, . Moreover, by our ordering constraint, we also have that . Thus, the ordering where , , , , and is a possible result of TopologicalSort, which we will assume for this example.

  • Next we obtain by a descending sort that .

  • Finally we obtain by indexing by the inverse of , that is . This gives us that , , , , and , resulting in a final output of , which we observe (i) satisfies , and (ii) preserves the prediction of class 1.
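The three steps of the example (topological sort, descending sort, indexing by rank) can be sketched in Python as follows. This is an illustrative sketch, not the matrix-based Algorithm 4: it uses a sequential Kahn-style topological sort that always emits the ready vertex with the largest logit, which is one simple way to place the predicted class first whenever it is a root. The literal encoding `(smaller, larger)` and the logit values in the usage note are hypothetical and are not the values of Example 3.

```python
import heapq
from typing import List, Tuple

# Hypothetical encoding: the pair (j, i) is read as the literal "y[j] < y[i]".
Literal = Tuple[int, int]


def stable_topological_sort(q: List[Literal], y: List[float]) -> List[int]:
    """Return pi, where pi[i] is the rank of index i in a total order consistent with q."""
    m = len(y)
    children: List[List[int]] = [[] for _ in range(m)]
    indegree = [0] * m
    for smaller, larger in q:
        children[larger].append(smaller)  # edge from the larger index to the smaller
        indegree[smaller] += 1
    # Among vertices whose parents are all placed, prefer the largest logit.
    ready = [(-y[i], i) for i in range(m) if indegree[i] == 0]
    heapq.heapify(ready)
    pi = [0] * m
    rank = 0
    while ready:
        _, u = heapq.heappop(ready)
        pi[u] = rank
        rank += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                heapq.heappush(ready, (-y[v], v))
    if rank != m:
        raise ValueError("constraint is unsatisfiable: its order graph has a cycle")
    return pi


def repair(q: List[Literal], y: List[float]) -> List[float]:
    """Permute y so that every literal of q holds; logit values are assumed distinct."""
    pi = stable_topological_sort(q, y)
    y_sorted = sorted(y, reverse=True)
    return [y_sorted[pi[j]] for j in range(len(y))]
```

With the hypothetical logits `[1.0, 5.0, 3.0, 4.0, 2.0]` and the single literal "y[2] < y[0]", `repair` returns `[2.0, 5.0, 1.0, 4.0, 3.0]`: the literal now holds and class 1 remains the prediction.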

3.3. Complexity

Given a neural network , we define the input size as and output size as . Also, assuming that the postconditions for all are expressed in DNF, we define the size of a property as the number of disjuncts in and define , i.e., the number of properties in . Then, the worst-case computational complexity of SR-Layer is given by Equation 4, where is the complexity of ContainsCycle, is the complexity of TopologicalSort, and is the maximum number of disjuncts possible in if the postconditions are in DNF.


The complexity given by Equation 4 is with respect to a cost model that treats matrix operations—e.g., matrix multiplication, associative row/column reductions—as constant-time primitives. Crucially, note that the complexity does not depend on the size of the neural network .

4. Safety and Accuracy

In this section, we present the two main theorems of our work. First, we show that the approach presented in Section 3 yields a self-repairing transformer, i.e., SR satisfies the properties from Definition 3. Second, we prove that the network transformer we propose also satisfies transparency (Property 4), ensuring a repaired network is at least as accurate as the original network (Theorem 5).

4.1. SR is a Self-Repairing Transformer

We start by proving that the transformer presented in Algorithm 1, SR, is self-repairing, i.e., it satisfies Properties 3(i) and 3(ii). Recall that this means that the repaired network will either return safe output vectors or, in the event that $\Phi$ is inconsistent at a point, and only in that event, return $\bot$.

Let $x$ be an arbitrary vector, and let $y$ be the output of the original network on $x$. If $\Phi(x, y)$ is initially satisfied, the SR-Layer does not modify the original output $y$, and Properties 3(i) and 3(ii) are trivially satisfied. If $\Phi(x, y)$ does not hold, we will rely on two key properties of FindSatConstraint and Repair to establish that SR is self-repairing. The first, Property 1, requires that FindSatConstraint either return $\bot$, or else return ordering constraints that are sufficient to establish $\Phi$.

Property 1 (FindSatConstraint).

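Let $\Phi$ be a set of safety properties, and $x$ and $y$ two vectors. Then $\mathrm{FindSatConstraint}(\Phi, x, y)$ satisfies the following properties:

  (i) $\mathrm{FindSatConstraint}(\Phi, x, y) = \bot \iff \forall y'.\ \neg\Phi(x, y')$

  (ii) $\mathrm{FindSatConstraint}(\Phi, x, y) = q \neq \bot \implies \forall y'.\ \big(q(y') \implies \Phi(x, y')\big)$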


The first observation is that the list of ordering constraints in $Q$ accurately models the initial set of properties $\Phi$, i.e.,

$$\forall y'.\ \Phi(x, y') \iff \exists q \in Q.\ q(y') \tag{5}$$

This stems from the definition of the disjunctive normal form, and from the fact that Prioritize only performs a permutation of the disjuncts.

We also rely on the following loop invariant, stating that all disjuncts considered so far, when iterating over $Q$, were unsatisfiable:

$$\forall q' \in Q.\ \mathrm{idx}(q', Q) < \mathrm{idx}(q, Q) \implies \forall y'.\ \neg q'(y') \tag{6}$$

Here, $\mathrm{idx}(q, Q)$ returns the index of constraint $q$ in the list $Q$. This invariant is trivially true when entering the loop, since the current $q$ is the first element of the list. Its preservation relies on IsSat correctly determining whether $q$ is satisfiable, i.e., $\mathrm{IsSat}(q) \iff \exists y'.\ q(y')$ (graphissat).

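Combining these two facts, we can now establish that FindSatConstraint satisfies Properties 1(i) and 1(ii). By definition, $\mathrm{FindSatConstraint}(\Phi, x, y)$ outputs $\bot$ if and only if it traverses the entire list $Q$, never returning a constraint $q$. From loop invariant 6, this is equivalent to $\forall q \in Q.\ \forall y'.\ \neg q(y')$, which finally yields Property 1(i) from Equation 5. Conversely, if $\mathrm{FindSatConstraint}(\Phi, x, y)$ outputs $q \neq \bot$, then $q \in Q$. We directly obtain Property 1(ii) as, for any $y'$, $q(y')$ implies that $\Phi(x, y')$ by application of Equation 5.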

Next, Property 2 states that Repair correctly permutes the output of the network to satisfy the constraint that it is given. Combined with Property 1, this is sufficient to show that SR is a self-repairing transformer (Theorem 3).

Property 2 (Repair).

Let $q$ be a satisfiable ordering constraint, and $y$ a vector. Then $\mathrm{Repair}(q, y)$ satisfies $q$.


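Let $y_i < y_j$ be an atom in $q$. Reusing notation from Algorithm 3, let $\pi$ be the result of $\mathrm{TopologicalSort}(\mathrm{OrderGraph}(q), y)$, let $y^s$ be $y$ sorted in descending order, and let $y' = \mathrm{Repair}(q, y)$. We have that $(j, i)$ is an edge in $\mathrm{OrderGraph}(q)$, which implies that $\pi(j) < \pi(i)$ by Equation 3. Because the elements of $y$ are sorted in descending order, and assumed to be distinct (Definition 1), we obtain that $y^s_{\pi(i)} < y^s_{\pi(j)}$, i.e., that $y'_i < y'_j$. ∎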

Theorem 3 (SR is a self-repairing transformer).

SR (Algorithm 1) satisfies conditions (i) and (ii) of Definition 3.


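By definition of Algorithm 1, the SR-Layer outputs $\bot$ if and only if $\mathrm{FindSatConstraint}(\Phi, x, y)$ outputs $\bot$. We derive from Property 1(i) that this is equivalent to $\forall y'.\ \neg\Phi(x, y')$, which corresponds exactly to Property 3(ii). Conversely, if $\Phi$ is satisfiable for input $x$, i.e., $\exists y'.\ \Phi(x, y')$, then $\mathrm{FindSatConstraint}(\Phi, x, y)$ outputs $q \neq \bot$. By definition, the output of the SR-Layer is then $\mathrm{Repair}(q, y)$, which satisfies $q$ by application of Property 2, which in turn implies that it satisfies $\Phi$ at $x$ by application of Property 1(ii). ∎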

4.2. SR is Transparent

Now that we have demonstrated that our approach produces safe-by-construction networks, we next prove that it also preserves predictions when possible, i.e., that SR satisfies transparency, as formalized in Property 4.

Let be an arbitrary vector. As in the previous section, if is initially satisfied, transparency trivially holds, as the repair layer does not modify the original output . When does not hold, we will rely on several additional properties about FindSatConstraint, Repair, and OrderGraph. The first, Property 4, states that whenever the index of the network’s prediction is a root of the graph encoding of used by FindSatConstraint and Repair, then there exists an output which satisfies that preserves that prediction.

Property 4 (OrderGraph).

Let be a satisfiable, disjunction-free ordering constraint, and a vector. Then,

The intuition behind this property is that belongs to the roots of if and only if there is no constraint in ; hence since is satisfiable, we can always permute indices in a solution to have . Formally, Lemma 1 in Section 5.1.2 entails this property, as it shows that the permutation returned by TopologicalSort satisfies it.

Next, Property 5 formalizes the requirement that whenever FindSatConstraint returns a constraint (rather than ), then that constraint will not eliminate any prediction-preserving solutions that would otherwise have been compatible with the full set of properties.

Property 5 (FindSatConstraint).

Let be a set of safety properties, and two vectors, and . Then,


Let us assume that , and that . We will proceed by contradiction, assuming that there does not exist such that and , which entails that by application of Property 4. In combination with the specification of Prioritize (Property 1), this implies that any such that occurs before in , i.e., . From loop invariant 6, we therefore conclude that there does not exist such a , which contradicts the hypothesis by application of Equation 5. ∎

Lastly, Property 6 states that Repair (Algorithm 3) will always find an output that preserves the original prediction, whenever the constraint returned by FindSatConstraint allows it. This is the final piece needed to prove Theorem 7, the desired result about the self-repairing transformer.

Property 6 (Repair).

Let be a satisfiable term, and a vector. Then,


Assume that there exists such that and . This entails that (Property 4), which in turn implies that is 0 (property 2). By definition of a descending sort, we have that , such that , hence concluding that by injectivity of . ∎

Theorem 7 (Transparency of SR).

SR, the self-repairing transformer described in Algorithm 1, satisfies Property 4.


That the SR transformer satisfies transparency is straightforward given Properties 4-6. Let us assume that there exists such that and . By application of Property 1(i), this implies that outputs , and therefore that there exists such that and by application of Property 5, since is defined as . Composing this fact with Property 6, we obtain that , since by definition. ∎

5. Vectorizing Self-Repair

Widely-used machine learning libraries, such as TensorFlow (abadi2016tensorflow), simplify the implementation of parallelized, hardware-accelerated code by providing a collection of operations on multi-dimensional arrays of uniform type, called tensors. One can view such libraries as domain-specific languages that operate primarily over tensors, providing embarrassingly parallel operations like matrix multiplication and associative reduction, as well as non-parallel operations like iterative loops and sorting routines. In this section, we present matrix-based algorithms implementing the core procedures used by SR-Layer described in Section 3. As we will later see in Section 6, taking advantage of these frameworks allows our implementation to introduce minimal overhead, typically fractions of milliseconds. Additionally, it means that SR-Layer can be automatically differentiated, making it fully compatible with training and fine-tuning.

Several of the subroutines of FindSatConstraint and Repair (Algorithms 2 and 3 presented in Section 3) operate on an OrderGraph, which represents a conjunction of ordering literals, $q$. An OrderGraph contains a vertex set, $V$, and edge set, $E$, where $V$ contains a vertex, $i$, for each of the $m$ classes, and $E$ contains an edge, $(i, j)$, from vertex $i$ to vertex $j$ if the literal $y_j < y_i$ is in $q$. We represent an OrderGraph as an adjacency matrix, $A$, defined according to Equation 7.

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases} \tag{7}$$
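As a small illustration of this representation, the following Python sketch builds the adjacency matrix of an order graph from a list of literals and reads off its root vertices. The pair encoding `(smaller, larger)` for a literal and the helper names are assumptions made for this example, and plain nested lists stand in for tensors.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: the pair (j, i) is read as the literal "y[j] < y[i]".
Literal = Tuple[int, int]


def adjacency_matrix(q: List[Literal], m: int) -> List[List[int]]:
    """A[i][j] == 1 iff the order graph has an edge from vertex i to vertex j."""
    A = [[0] * m for _ in range(m)]
    for smaller, larger in q:
        A[larger][smaller] = 1  # the larger index points at the smaller one
    return A


def roots(A: List[List[int]]) -> Set[int]:
    """Vertices with no incoming edge, i.e. all-zero columns of A."""
    m = len(A)
    return {j for j in range(m) if not any(A[i][j] for i in range(m))}
```

For three classes and the single literal "y[2] < y[0]", the matrix has one nonzero entry at row 0, column 2, and vertices 0 and 1 are roots.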


Section 5.1 describes the matrix-based algorithm that we use to conduct the stable topological sort that Repair (Algorithm 3) depends on. It is based on a classic parallel algorithm due to dekel81graph, which we modify to ensure that SR satisfies transparency (Property 4). Section 5.2 describes our approach to cycle detection, which is able to share much of the work with the topological sort. Finally, Section 5.3 discusses efficiently prioritizing ordering constraints, needed to ensure that SR satisfies transparency.

5.1. Stable Topological Sort

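Inputs: A graph, $G$, represented as an adjacency matrix, and a vector, $y$
Result: A permutation, $\pi$
         compute the all pairs longest paths matrix of $G$ (Section 5.1.1)
         // set the value of each vertex to the smallest value among its ancestors
         compute the adjusted value of each vertex (Equation 11)
         // calculate the depth of each vertex in the graph
         compute the depth of each vertex as the maximum of its column of the longest-paths matrix
         return argsort of the adjusted values, in descending order   // break ties in favor of minimum depth
Algorithm 4 Stable Topological Sort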

Our approach builds on a parallel topological sort algorithm given by dekel81graph, which is based on constructing an all pairs longest paths (APLP) matrix. However, dekel81graph’s algorithm is not stable in the sense that the resulting order depends only on the graph, and not on the original order of the sequence, even when multiple orderings are possible. While for our purposes this is sufficient for ensuring safety, it is not for transparency. We begin with background on constructing the APLP matrix, showing that it is compatible with a vectorized implementation, and then describe how it is used to perform a stable topological sort.

5.1.1. All Pairs Longest Paths.

The primary foundation underpinning many of the graph algorithms in this section is the all pairs longest paths (APLP) matrix, which we will denote by $P$. On acyclic graphs, $P_{ij}$ for $i \neq j$ is defined to be the length of the longest path from vertex $i$ to vertex $j$. In the absence of cycles, the distance from a vertex to itself, $P_{ii}$, is defined to be 0. For vertices $i$ and $j$ for which there is no path from $i$ to $j$, we let $P_{ij} = -\infty$.

We compute $P$ from $A$ using a matrix-based algorithm from dekel81graph, which requires taking matrix max-distance products, where the max-distance product is equivalent to a matrix multiplication where element-wise multiplications have been replaced by additions and element-wise additions have been replaced by the pairwise maximum. That is, a matrix product of two matrices $B$ and $C$ can be abstractly written with respect to operations $\otimes$ and $\oplus$ according to Equation 8, and the max-distance product corresponds to the case where $a \otimes b := a + b$ and $a \oplus b := \max(a, b)$.

$$(BC)_{ij} = (B_{i1} \otimes C_{1j}) \oplus \cdots \oplus (B_{im} \otimes C_{mj}) \tag{8}$$


Using this matrix product, can be computed recursively from by performing a fast matrix exponentiation, as described in Equations 9 and 10.
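The computation can be illustrated with a short Python sketch that uses plain nested lists in place of tensors. The initial matrix is an assumption made for this sketch (0 on the diagonal, 1 for each edge, and negative infinity elsewhere), chosen so that squaring under the max-distance product doubles the maximum path length accounted for; the function names are hypothetical, and the input graph is assumed to be acyclic.

```python
import math
from typing import List

Matrix = List[List[float]]


def max_distance_product(B: Matrix, C: Matrix) -> Matrix:
    """Matrix product with (+) in place of multiplication and max in place of addition."""
    n = len(B)
    return [[max(B[i][k] + C[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_pairs_longest_paths(A: List[List[int]]) -> Matrix:
    """Longest-path lengths of an acyclic graph given as a 0/1 adjacency matrix."""
    m = len(A)
    # Assumed starting point: paths that use at most one edge.
    P: Matrix = [[0 if i == j else (1 if A[i][j] else -math.inf) for j in range(m)]
                 for i in range(m)]
    covered = 1  # P currently accounts for paths with at most `covered` edges
    while covered < m - 1:
        P = max_distance_product(P, P)  # repeated squaring
        covered *= 2
    return P
```

Because a simple path in a graph with $m$ vertices has at most $m - 1$ edges, the loop performs on the order of $\log_2 m$ products.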


5.1.2. Stable Sort

We propose a stable variant of the dekel81graph topological sort, shown in Algorithm 4. Crucially, this variant satisfies Property 2 (Lemma 1), which Section 3.2.2 identifies as sufficient for ensuring transparency. Essentially, the value of each logit $y_j$ is adjusted so that it is at least as small as the smallest logit value corresponding to vertices that are ancestors of vertex $j$, including $j$ itself. A vertex, $i$, is an ancestor of vertex $j$ if $P_{ij} \geq 0$, meaning that there is some path from vertex $i$ to vertex $j$ or $i = j$. The logits are then sorted in descending order, with ties being broken in favor of minimum depth in the dependency graph. The depth of vertex $j$ is the maximum of the $j$-th column of $P$, i.e., the length of the longest path from any vertex to $j$. An example trace of Algorithm 4 is given in Figure 2. By adjusting $y_j$ into $v_j$ such that $v_i \geq v_j$ for all ancestors, $i$, of $j$, we ensure each child vertex appears after each of its parents in the returned ordering once ties have been broken by depth. The child’s depth will always be strictly larger than that of any of its parents, since a path of length $d$ to an immediate parent of vertex $j$ implies the existence of a path of length $d + 1$ to vertex $j$.

Lemma 1 ().

TopologicalSort satisfies Property 2.


Note that the adjusted logit values, $v$, are chosen according to Equation 11.

$$v_j := \min_{i \,:\, P_{ij} \geq 0} y_i \tag{11}$$

We observe that (i) for all root vertices, $i$, $v_i = y_i$, and (ii) the root vertex with the highest original logit value will appear first in the topological ordering. The former follows from the fact that the root vertices have no ancestors other than themselves. The latter subsequently follows from the fact that the first element in a valid topological ordering must correspond to a root vertex. Thus if $\operatorname{argmax}_i y_i \in \mathrm{Roots}(G)$, then $\operatorname{argmax}_i y_i$ is the root vertex with the highest logit value, and so by (ii), it will appear first in the topological ordering produced by TopologicalSort, establishing Property 2. ∎
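The steps of Algorithm 4 can be traced with the following Python sketch, which uses nested lists rather than tensors. The logit values are those shown in the first panel of Figure 2; the edge set (1 to 3, 0 to 4, 3 to 2) is an assumption, chosen because it reproduces the bracketed pairs of the second panel when they are read as (adjusted value, depth). The function names and the starting matrix of the longest-paths computation are likewise assumptions of this sketch.

```python
import math
from typing import List


def all_pairs_longest_paths(A: List[List[int]]) -> List[List[float]]:
    """Longest paths of an acyclic graph via repeated max-distance squaring."""
    m = len(A)
    P = [[0 if i == j else (1 if A[i][j] else -math.inf) for j in range(m)]
         for i in range(m)]
    covered = 1
    while covered < m - 1:
        P = [[max(P[i][k] + P[k][j] for k in range(m)) for j in range(m)]
             for i in range(m)]
        covered *= 2
    return P


def stable_topological_sort(A: List[List[int]], y: List[float]) -> List[int]:
    """Return pi, where pi[j] is the rank of vertex j."""
    m = len(y)
    P = all_pairs_longest_paths(A)
    # Adjusted value: smallest logit among the ancestors of j, including j itself.
    v = [min(y[i] for i in range(m) if P[i][j] >= 0) for j in range(m)]
    # Depth: length of the longest path from any vertex to j.
    depth = [max(P[i][j] for i in range(m)) for j in range(m)]
    # Descending adjusted value, ties broken in favor of minimum depth.
    order = sorted(range(m), key=lambda j: (-v[j], depth[j]))
    pi = [0] * m
    for rank, j in enumerate(order):
        pi[j] = rank
    return pi
```

Under these assumptions the adjusted values are `[2, 3, 1, 3, 2]` and the depths are `[0, 0, 2, 1, 1]`, so the vertices are visited in the order 1, 3, 0, 4, 2 and every child follows its parent.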

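[Figure 2, first panel: dependency graph over five vertices, each labeled with its index and, in brackets, its logit value: 0 (2), 1 (3), 2 (1), 3 (4), 4 (5).]

[Figure 2, second panel: the same five vertices, each labeled with its index and a bracketed pair: 0 (2,0), 1 (3,0), 2 (1,2), 3 (3,1), 4 (2,1).]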
Figure 2. Example trace of Algorithm 4. (fig:sort:init): The dependency graph and original logit values, . The values of each logit are provided; the non-bracketed number indicates the logit index and the number in brackets is the logit value, e.g., . Arrows indicate a directed edge in the dependency graph; e.g., we require