nn-dependability-kit: Engineering Neural Networks for Safety-Critical Systems

11/16/2018 · Chih-Hong Cheng, et al.

nn-dependability-kit is an open-source toolbox to support safety engineering of neural networks. The key functionality of nn-dependability-kit includes (a) novel dependability metrics for indicating sufficient elimination of uncertainties in the product life cycle, (b) a formal reasoning engine for ensuring that generalization does not lead to undesired behaviors, and (c) runtime monitoring for reasoning whether a decision of a neural network at operation time is supported by prior similarities in the training data.




1 Introduction

In recent years, neural networks have been widely adopted in engineering automated driving systems, with examples in perception [5, 19, 32], decision making [25, 35, 2], and even end-to-end driving [3, 40]. As these systems are safety-critical in nature, problems during operation, such as failed identification of pedestrians, may contribute to risky behavior. Importantly, the root cause of such undesired behavior can be independent of hardware faults and software programming errors and may reside solely in the data-driven engineering process, e.g., unexpected results of function extrapolation between correctly classified training data.

In this paper, we present nn-dependability-kit, an open-source toolbox to support data-driven engineering of neural networks for safety-critical domains. The goal is to provide evidence of uncertainty reduction in key phases of the product life cycle, ranging from data collection, training & validation, and testing & generalization to operation. nn-dependability-kit is built upon our previous work [10, 9, 7, 8, 6], where (a) novel dependability metrics [9, 7] are introduced to indicate uncertainties being reduced in the engineering life cycle, (b) a formal reasoning engine [10, 6] is used to ensure that generalization does not lead to undesired behaviors, and (c) runtime neuron activation pattern monitoring [8] is applied to reason whether a decision of a neural network at operation time is supported by prior similarities in the training data.

Figure 1: Using a simplified GSN to understand solutions (Sn) provided by nn-dependability-kit, including the associated goals (G), assumptions (A) and strategies (S).

Concerning related work, our results [10, 9, 7, 8, 6] fall within recent research efforts from the software engineering and formal methods communities aiming to provide provable guarantees over neural networks [39, 20, 30, 10, 23, 14, 18, 13, 12] or to test neural networks [36, 37, 7, 29, 15, 26]. The static analysis engine inside nn-dependability-kit for formally analyzing neural networks, introduced in our 2017 work [10] as a pre-processing step before exact constraint solving, has been further extended to support the octagon abstract domain [28] in addition to the interval domain. One deficiency of the above-mentioned works is the missing connection from safety goals or uncertainty identification [17, 22, 33] to the concrete evidence required in the safety engineering process. This gap is narrowed by our earlier work on dependability metrics [9], which is partly integrated inside nn-dependability-kit. Lastly, our runtime monitoring technique [8] differs from known results that either use an additional semantic embedding (integrated in the loss function) for computing difference measures [27] or use Monte-Carlo dropout [16] as ensembles: our approach provides a sound guarantee of the similarity measure based on the neuron word distance to the training data.

2 Features and Connections to Safety Cases

Fig. 1 provides a simplified Goal Structuring Notation (GSN) [21] diagram to help understand how the features provided by nn-dependability-kit contribute to the overall safety goal (note that the GSN in Fig. 1 may not be complete, but it can serve as a basis for further extensions). Our proposed metrics, unless explicitly specified, are based on extensions of our early work [9]. Starting with the goal of having the neural network function correctly (G1), based on the assumptions that no software or hardware faults appear (A1, A2), the strategy (S1) is to ensure that correctness is maintained within each phase of the product life cycle. These phases include data preparation (G2), training and validation (G3), testing and generalization (G4), and operation (G5).

(Data preparation)

Data preparation includes data collection and labeling. Apart from correctly labeling the data (G6), one needs to ensure that the collected data covers all operating scenarios (G7). An artificial example of violating such a principle is to only collect data in sunny weather, while the vehicle is also expected to be operated in snowy weather. Quantitative projection coverage metric [7] and its associated test case generation techniques (Sn1, Sn2), based on the concept of combinatorial testing [24, 11], are used to provide a relative form of completeness against the combinatorial explosion of scenarios.
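The flavor of such a coverage metric can be sketched in a few lines. The sketch below computes plain pairwise (2-way) combinatorial coverage over hypothetical scenario categories; the categories, their values, and the uniform weighting are illustrative assumptions, whereas the actual metric in nn-dependability-kit generalizes to weighted quantitative projections.

```python
from itertools import combinations, product

# Hypothetical operating-scenario categories (illustrative only).
CATEGORIES = {
    "weather": ["sunny", "rainy", "snowy"],
    "time": ["day", "night"],
    "road": ["highway", "urban"],
}

def pairwise_coverage(samples):
    """Fraction of all 2-way category-value combinations seen in the data."""
    dims = list(CATEGORIES)
    # Every pair of values across every pair of categories must be covered.
    required = set()
    for d1, d2 in combinations(dims, 2):
        for v1, v2 in product(CATEGORIES[d1], CATEGORIES[d2]):
            required.add((d1, v1, d2, v2))
    # Collect the pairs actually exercised by the data set.
    covered = set()
    for s in samples:
        for d1, d2 in combinations(dims, 2):
            covered.add((d1, s[d1], d2, s[d2]))
    return len(covered & required) / len(required)

data = [
    {"weather": "sunny", "time": "day", "road": "highway"},
    {"weather": "rainy", "time": "night", "road": "urban"},
]
print(pairwise_coverage(data))  # 6 of 16 required pairs -> 0.375
```

A coverage value below 1 signals scenario combinations (e.g., snowy weather at night) for which test cases still need to be generated.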

(Training and validation)

Currently, the goal of training correctness is refined into two subgoals of understanding the decision of the neural network (G10) and correctness rate (G8), which is further refined by considering the performance under different operating scenarios (G9). Under the assumption where the neural network is used for vision-based object detection (A5), metrics such as interpretation precision (Sn5) and sensitivity of occlusion (Sn6) are provided by nn-dependability-kit.
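The idea behind an occlusion-sensitivity metric can be illustrated with a sliding-patch sketch; the patch size, stride, and toy confidence function below are made-up assumptions and do not reflect the toolbox's actual API.

```python
import numpy as np

def occlusion_sensitivity(predict, image, patch=4, stride=4):
    """Slide a gray patch over the image and record the confidence drop.
    `predict` maps an HxW image to a scalar confidence (hypothetical model)."""
    base = predict(image)
    h, w = image.shape
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i*stride:i*stride+patch, j*stride:j*stride+patch] = 0.5
            heat[i, j] = base - predict(occluded)  # big drop = important region
    return heat

# Toy confidence: mean brightness of the top-left quadrant of the image.
predict = lambda img: float(img[:8, :8].mean())
img = np.ones((16, 16))
heat = occlusion_sensitivity(predict, img)
```

Regions whose occlusion causes a large confidence drop are the ones the network relies on, which helps judge whether a decision is based on the relevant parts of the input.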

(Testing and generalization)

Apart from classical performance measures (Sn7), we also test the generalization subject to known perturbations such as haze or Gaussian noise (G11) using the perturbation loss metric (Sn8). Provided that domain knowledge can be formally specified (A3), one can also apply formal verification (Sn9) to examine if the neural network demonstrates correct behavior with respect to the specification (G12).


(Operation)

At operation time, as the ground truth is no longer available, a dependable system shall raise warnings when a decision of the neural network is not supported by prior similarities in training (S3). nn-dependability-kit provides runtime monitoring (Sn10) [8] by first using binary decision diagrams (BDDs) [4] to record binarized neuron activation patterns at training time, followed by checking whether an activation pattern produced during operation is contained in the BDD.
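A minimal sketch of this record-then-check idea follows; a Python set of bit-tuples stands in for the BDD used by nn-dependability-kit, and the activation values are made up.

```python
def binarize(activations):
    """Binarize neuron activations by their sign."""
    return tuple(1 if a > 0 else 0 for a in activations)

class ActivationMonitor:
    def __init__(self):
        self.patterns = set()  # stand-in for the BDD

    def record(self, activations):
        """Called during training: store the visited activation pattern."""
        self.patterns.add(binarize(activations))

    def is_supported(self, activations):
        """Called in operation: is this pattern among those seen in training?"""
        return binarize(activations) in self.patterns

monitor = ActivationMonitor()
monitor.record([0.7, -1.2, 3.1])                # pattern (1, 0, 1)
print(monitor.is_supported([0.2, -0.1, 5.0]))   # same pattern -> True
print(monitor.is_supported([-0.3, 0.4, 1.0]))   # unseen pattern -> False
```

The BDD makes this containment check compact and efficient even for the huge pattern sets produced by real layers; the set-based sketch only conveys the semantics.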

3 Using nn-dependability-kit

Example 1: Formal Verification of a Highway Front Car Selection Network.

The first example is to formally verify properties of a neural network that selects the target vehicle for an adaptive cruise control (ACC) system to follow. The overall pipeline is illustrated in Fig. 2, where two modules use images of a front-facing camera of a vehicle to (i) detect other vehicles as bounding boxes (vehicle detection via YOLO [31]) and (ii) identify the ego-lane boundaries (ego-lane detection). Outputs of these two modules are fed into the third module, called target vehicle selection, which is a neural-network-based classifier that reports either the index of the bounding box where the target vehicle is located or a special class for "no target vehicle".

Figure 2: Illustration of the target vehicle selection pipeline for ACC. The input features of the target vehicle selection NN are defined as follows: 1-8 (possibly up to 10) are bounding boxes of detected vehicles, E is an empty input slot, i.e., there are fewer than ten vehicles, and L stands for the ego-lane information.

As the target vehicle selection neural network takes a fixed number of bounding boxes, one undesired situation appears when the network outputs the existence of a target vehicle at some index i, but the corresponding i-th input does not contain a vehicle bounding box (marked with "E" in Fig. 2). For the snapshot in Fig. 2, the neural network should not output a class index larger than the number of detected vehicle bounding boxes. The absence of this undesired property, encoded in the form of linear constraints over the inputs and outputs of the neural network, can be proven with the static analysis engine of nn-dependability-kit in seconds, using the code snippet below.

from nndependability.formal import staticanalysis
...
staticanalysis.verify(inputMinBound, inputMaxBound, net, True,
                      inputConstraints, riskProperty)
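The interval-domain reasoning underlying the engine can be illustrated by propagating an input box through a single ReLU layer; the weights, bias, and bounds below are made up for illustration and are not part of the toolbox's API.

```python
import numpy as np

def interval_relu_layer(W, b, lo, hi):
    """Propagate an input box [lo, hi] through y = relu(W @ x + b)
    using interval arithmetic (the interval abstract domain)."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b   # smallest possible pre-activation
    out_hi = W_pos @ hi + W_neg @ lo + b   # largest possible pre-activation
    return np.maximum(out_lo, 0.0), np.maximum(out_hi, 0.0)

# Made-up single neuron computing relu(x1 - x2) over the input box [0, 1]^2.
W = np.array([[1.0, -1.0]])
b = np.array([0.0])
lo_out, hi_out = interval_relu_layer(W, b,
                                     np.array([0.0, 0.0]),
                                     np.array([1.0, 1.0]))
print(lo_out, hi_out)  # sound output bounds [0, 1]
```

Layer-by-layer propagation of such boxes yields sound output bounds; if the risk property is infeasible within those bounds, the property is proven without exact constraint solving, and the octagon domain tightens the bounds further by tracking relations between variables.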

Example 2: Perturbation Loss over German Traffic Sign Recognition Network.

Another example is to analyze a neural network trained under the German Traffic Sign Recognition Benchmark [34] with the goal of classifying various traffic signs. With nn-dependability-kit, one can apply the perturbation loss metric using the below code snippet, in order to understand the robustness of the network subject to known perturbations.

from nndependability.metrics import PerturbationLoss

metric = PerturbationLoss.Perturbation_Loss_Metric()
...
metric.addInputs(net, image, label)
...
metric.printMetricQuantity("AVERAGE_LOSS")
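The quantity behind such a metric can be sketched as the average (and worst-case) confidence drop under a perturbation; the Gaussian-noise perturbation and the toy confidence function below are illustrative assumptions, not the toolbox's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbation_loss(predict, images, sigma=0.1):
    """Average and maximum confidence drop when Gaussian noise is added.
    `predict` is a hypothetical function returning the confidence of the
    correct class for an image with pixel values in [0, 1]."""
    losses = []
    for img in images:
        noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
        losses.append(predict(img) - predict(noisy))
    return float(np.mean(losses)), float(np.max(losses))

# Toy confidence: mean brightness of the image (made up for illustration).
predict = lambda img: float(img.mean())
imgs = [np.ones((8, 8)) for _ in range(5)]
avg_loss, max_loss = perturbation_loss(predict, imgs)
```

Computing both the average and the maximum drop per perturbation type, as in Fig. 3-b, separates perturbations the network tolerates on average from those with rare but severe failures.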

Figure 3: Using nn-dependability-kit to perturb the image of a traffic sign (a) and compute the maximum and average performance drop, by applying perturbation on the data set (b).

As shown in the center of Fig. 3-a, the original image of "end of no overtaking zone" is perturbed by nn-dependability-kit using seven methods. The application of Gaussian noise changes the result of classification, with a substantial drop in the confidence of the class "end of no overtaking zone". When applying the perturbations systematically over the complete data set, the results are summarized in Fig. 3-b, where one concludes that the network is robust against haze and fog (the maximum probability reduction remains small) but should be improved against FGSM attacks [38] and snow.