Quantitative Projection Coverage for Testing ML-enabled Autonomous Systems

by Chih-Hong Cheng et al.

Systematically testing models learned by neural networks remains a crucial unsolved barrier to justifying safety for autonomous vehicles engineered using data-driven approaches. We propose quantitative k-projection coverage as a metric to mediate combinatorial explosion while guiding the data sampling process. By assuming that domain experts propose largely independent environment conditions and associate the elements of each condition with weights, the product of these conditions forms scenarios, and one may interpret the weight associated with each equivalence class as its relative importance. Achieving full k-projection coverage requires that the data set, when projected onto the hyperplane formed by arbitrarily selected k conditions, covers each class with a number of data points no less than the associated weight. For the general case where scenario composition is constrained by rules, precisely computing k-projection coverage remains in NP. In terms of finding minimum test cases to achieve full coverage, we present theoretical complexity results for important sub-cases and an encoding to 0-1 integer programming. We have implemented a research prototype that generates test cases for a visual object detection unit in automated driving, demonstrating the technological feasibility of our proposed coverage criterion.






1 Introduction

There has been considerable recent hype around applying neural networks in automated driving, ranging from perception [3, 9] to the creation of driving strategies [14, 21] and even to end-to-end driving setups [1]. Despite many public stories that seemingly hint at the technical feasibility of using neural networks, one fundamental challenge is to establish rigorous safety claims by considering all classes of relevant scenarios whose presence is subject to technical or societal constraints.

The key motivation of this work is that, apart from recent formal verification efforts [8, 7, 10, 5] where scalability and the lack of specifications are obvious concerns, the most plausible approach from a certification perspective remains testing. Domain experts or authorities in autonomous driving may suggest (incomplete) weighted criteria for describing operating conditions such as weather, landscape, or partially occluded pedestrians; with these criteria one can systematically partition the domain and weight each partitioned class by its relative importance. This step fits well with the automotive safety standard ISO 26262, where for deriving test cases it is highly recommended to perform analysis of equivalence classes (Chap. 6, Table 11, item 1b). Unfortunately, the partitioning yields an exponential number of classes, making the naïve coverage metric of having at least one data point in each class infeasible. In addition, such a basic metric is qualitative, in that it does not address the relative importance of different scenarios.

To address the above issues, in this paper we study the problem of quantitative k-projection coverage: for arbitrarily selected k criteria (k being a small constant), the data set, when projected onto the corresponding k-hyperplane, needs to have in each region a number of data points no less than the associated weight. When k is a constant, the number of data points required to achieve full quantitative k-projection coverage remains polynomially bounded. Even more importantly, for the case where the composition of scenarios is constrained by rules, we present an NP algorithm to compute exact k-projection coverage. This is in contrast to the case without projection, where computing exact coverage is #P-hard.

Apart from calculating coverage, another crucial problem is to generate as few additional scenarios as possible while increasing coverage, since generating images or videos matching a scenario in autonomous driving is largely semi-automatic and requires substantial human effort. While we demonstrate that for unconstrained quantitative 1-projection, finding a minimum set of test scenarios to achieve full coverage remains polynomial, we prove that for 2-projection the problem is NP-complete. To this end, we develop an efficient encoding to 0-1 integer programming which allows incrementally creating scenarios that maximally increase coverage.

To validate our approach, we have implemented a prototype to define and ensure coverage of a vision-based front-car detector. The prototype has integrated state-of-the-art traffic simulators and image synthesis frameworks [15, 25], in order to synthesize close-to-reality images specific to automatically proposed scenarios.

Figure 1: A total of data points and their corresponding equivalence classes (highlighted as bounding boxes).

(Related Work)

The use of AI technologies, in particular neural networks, has created fundamental challenges in safety certification. Since 2017 there has been tremendous research progress in formally verifying properties of neural networks, with a focus on neurons using the piecewise-linear activation function (ReLU). Among sound-and-complete approaches, Reluplex and Planet developed specialized rules for managing the 0-1 activation in the proof system [10, 7]. Our previous work [5, 4] focused on the reduction to mixed integer linear programming (MILP) and applied techniques to compute tighter bounds such that, in MILP, the relaxation bound is closer to the real bound. Exact approaches suffer from combinatorial explosion, and the verification speed is currently not satisfactory. Among imprecise yet sound approaches, recent work has emphasized linear relaxation of ReLU units by approximating them with outer convex polytopes [26, 11, 20], making the verification problem feasible for linear programming solvers. These approaches have even been applied in the synthesis (training) process, so that one can derive provable guarantees [11, 20]. Almost all verification work (apart from [10, 7, 4]) targets robustness properties, which is similar to adversarial testing (e.g., FGSM and iterative attacks [24], DeepFool [16], Carlini-Wagner attacks [2]) in the machine learning community. All these approaches can be complemented by ours: our approach covers important scenarios, while adversarial training or formal verification measures robustness within each scenario.

Classical structural coverage testing criteria such as MC/DC fail to deliver assurance promises for neural networks, as achieving full coverage is either trivial (tanh) or intractable (ReLU). The recent work by Sun, Huang, and Kroening [22] borrows the concept of MC/DC and considers a structural coverage criterion where one needs to find tests ensuring that, for every neuron, its activation is supported by independent activation of the neurons in its immediately preceding layer. Such an approach can further be supported by concolic testing, as recently demonstrated by the same team [23]. Our work and theirs should be viewed as complementary: we focus on the data space for training and testing neural networks, while they focus on the internal structure of a neural network. However, as in the original MC/DC concept, each condition in a conditional statement (apart from detecting programming errors such as array out-of-bounds, which is not the core problem of neural networks) is designed to describe scenarios, which should be viewed as natural consequences of input space partitioning (our work). Working on coverage criteria related to the internal structure of neural networks, given that one cannot enforce the meaning of an individual neuron but can only analyze it empirically via reverse engineering (as in standard approaches like saliency maps [19]), is less likely to provide direct benefits. Lastly, one major benefit of these structural testing approaches, based on the authors' claims, is to find adversarial examples via perturbation, but this benefit may be reduced by new training methods with provable perturbation bounds [11, 20].

Lastly, our proposed metric connects tightly to the classic work on combinatorial testing and covering arrays [6, 17, 12, 13, 18]. However, as their application originated in hardware testing (i.e., each input variable being true or false), the quantitative aspect is not really needed there, and constrained input cases need not be considered, contrary to our practical motivation in the context of autonomous driving. For unconstrained cases, there are NP-completeness results in the field of combinatorial testing, largely based on the proof in [18]. That proof is not applicable to our case, as it relies on having the freedom to define the set of groups to be listed in the projection. In fact, as noted in a survey paper [12], the authors commented that it remains open whether "the problem of generating a minimum test set for pairwise testing (k = 2) is NP-complete" and that the "existing proof in [13] for the NP-completeness of pairwise testing is wrong" (for the same reason: pairwise testing does not have the freedom to define the set of groups). Our new NP-completeness result in this paper can be viewed as a relaxed case, obtained by considering sampling that is quantitative rather than qualitative.

2 Discrete Categorization and Coverage

Let 𝒟 be the data space, D ⊆ 𝒟 be a finite set called the data set, and d ∈ D be called a data point. A categorization 𝒞 = (C_1, …, C_n) is a list of functions that transform any data point d to a discrete categorization point 𝒞(d) = (C_1(d), …, C_n(d)), where for all i ∈ {1, …, n}, C_i has a finite, discrete co-domain. Two data points d_1 and d_2 are equivalent by categorization, denoted d_1 ≡ d_2, if 𝒞(d_1) = 𝒞(d_2). The weight W of a categorization further assigns each value v in the co-domain of C_i an integer value W_i(v).

Next, we define constraints over categorizations, allowing domain experts to express knowledge by specifying relations among categorizations. Importantly, for all data points in the data space, whenever they are transformed using 𝒞, the resulting discrete categorization points satisfy the constraints.

Definition 1 (Categorization constraint).

A categorization constraint Ψ is a set of constraints, with each being a CNF formula having literals of the form C_i(d) = v, where i ∈ {1, …, n} and v is in the co-domain of C_i.

Let W(c) abbreviate W_1(c_1) ⊕ ⋯ ⊕ W_n(c_n), where ⊕ can be either scalar addition, multiplication, or the max operator. In this paper, unless specially mentioned, we always treat ⊕ as scalar multiplication. Let S be the multi-set {𝒞(d) | d ∈ D}, and rm(S) be a removal operation on S such that every categorization point c has cardinality at most W(c). We define categorization coverage by requiring that, in order to achieve full coverage, each discrete categorization point c be covered by at least W(c) data points.

Definition 2 (Categorization coverage).

Given a data set D, a categorization 𝒞 and its associated weights W, define the categorization coverage for data set D over 𝒞 and W to be cov(D) := |rm(S)| / Σ_{c ∈ C_Ψ} W(c), where C_Ψ is the set of discrete categorization points satisfying the constraints Ψ.

(Example 1) In Fig. 1, let the data space be and the data set be . By setting where for , then for data points , and , applying and creates , i.e., . Similarly, .

  • If is an empty set and , then , removes by keeping one element in each equivalence class, and equals .

  • If and , then rather than  in the unconstrained case, and equals . Notice that all data points, once when being transformed into discrete categorization points, satisfy the categorization constraint.

  • Assume that is an empty set, and , always returns  apart from and returning . Lastly, let be scalar multiplication. Then for discrete categorization points having the form of , a total of data points are needed. One follows the definition and computes to be .
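As an illustration, the coverage computation of Definition 2 (unconstrained case) can be sketched in a few lines of Python; the function name, co-domains, weights, and data points below are illustrative stand-ins rather than the exact values of Example 1:

```python
from collections import Counter
from itertools import product

def categorization_coverage(cat_points, co_domains, weight):
    """Categorization coverage of Definition 2 (unconstrained case):
    each equivalence class c contributes min(#points in c, weight(c))
    to the numerator, and weight(c) to the denominator."""
    counts = Counter(cat_points)
    numer = denom = 0
    for c in product(*co_domains):
        numer += min(counts.get(c, 0), weight(c))
        denom += weight(c)
    return numer / denom

# two categorizations with co-domain {0, 1, 2}, unit weights,
# three data points falling into two equivalence classes
pts = [(0, 0), (0, 0), (2, 1)]
cov = categorization_coverage(pts, [range(3), range(3)], lambda c: 1)
print(cov)  # 2 covered classes out of 9
```

Each class contributes at most its weight to the numerator, so additional data points in an already-saturated class do not raise the coverage.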

Achieving full categorization coverage is intrinsically hard, as it requires exponentially many data points.

Proposition 1.

Provided that Ψ = ∅ and each categorization has at least two values in its co-domain, the size of a data set achieving full coverage is exponential in the number of categorizations.


Based on the given condition, Ψ = ∅, and for each i, the co-domain of C_i has at least two values. Therefore, the number of discrete categorization points is at least 2^n. As each such point must be covered by at least one data point to achieve full coverage, |rm(S)| (and correspondingly |D|) needs to be exponential in the number of categorizations. ∎

Proposition 2.

Computing exact cov(D) is #P-hard.


Computing the exact value of the denominator in Definition 2, under the condition that all weights equal 1, amounts to the problem of model counting for a SAT formula, which is known to be #P-complete. ∎

3 Quantitative Projection Coverage

The intuition behind quantitative projection-based coverage is that, although it is infeasible to cover all discrete categorization points, one may trade off confidence by asking whether the data set covers every pair or triple of possible categorization values with a sufficient number of data points.

Definition 3 (k-projection).

Let Π = {i_1, …, i_k} ⊆ {1, …, n} be a set of k indices, where the elements in Π do not overlap. Given a discrete categorization point c = (c_1, …, c_n), define the projection of c over Π to be P_Π(c) := (c_{i_1}, …, c_{i_k}).

Given a multi-set S of discrete categorization points, we use P_Π(S) to denote the multi-set resulting from applying the projection function on each element in S, and analogously define rm to be a function which removes elements in P_Π(S) such that every projected element has cardinality at most its associated weight. Finally, we define k-projection coverage by applying the projection operation on the data set D, for all possible subsets Π of {1, …, n} of size k.

Definition 4 (k-projection coverage).

Given a data set D and categorization 𝒞, define the k-projection categorization coverage for data set D over 𝒞 and W to be

cov_k(D) := ( Σ_{Π: |Π| = k} |rm(P_Π(S))| ) / ( Σ_{Π: |Π| = k} Σ_{c ∈ toSet(P_Π(C_Ψ))} W(c) ),

where the function toSet translates a multi-set to a set without element repetition.

(Example) Consider again Fig. 1 with being scalar multiplication, and .

  • For , one computes . In the denominator, has choices, namely , , or . Here we do detailed analysis over , i.e., we consider the projection to .

    • Since , allows all possible  assignments.

    • creates a set with elements 0, 1, 2 with each being repeated  times, and removes multiplicity and creates . The sum equals .

    The “” in the numerator comes from the contribution of with  (albeit it has data points), with , and with .

  • For k = 2, one computes the coverage analogously. The denominator captures three hyperplanes, with each having a grid per pair of projected values and each grid allowing a number of data points equal to its associated weight.
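The unconstrained version of Definition 4 can be sketched as follows; the helper name `k_projection_coverage` and the uniform cell weights are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations, product

def k_projection_coverage(cat_points, co_domains, k, weight=lambda cell: 1):
    """Quantitative k-projection coverage (unconstrained case).

    cat_points: list of discrete categorization points (tuples).
    co_domains: co_domains[i] iterates over the co-domain of C_i.
    weight:     weight of a projected grid cell (uniform here).
    """
    n = len(co_domains)
    numer = denom = 0
    for idx in combinations(range(n), k):          # every size-k projection
        counts = Counter(tuple(p[i] for i in idx) for p in cat_points)
        for cell in product(*(co_domains[i] for i in idx)):
            w = weight(cell)
            denom += w
            numer += min(counts[cell], w)          # cap multiplicity at the weight
    return numer / denom

pts = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
print(k_projection_coverage(pts, [range(2)] * 3, k=2))  # 9 of 12 cells covered
```

With three binary categorizations, the three 2-projections each have four grid cells; the three data points occupy three distinct cells on each plane, giving 9/12 = 0.75.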

Notice that Definitions 2 and 4 coincide when one takes k = n.

Proposition 3.

For k = n, cov_k(D) = cov(D).

When k = n, the only projection is Π = {1, …, n}, and the projection operator does not change S. Subsequently, the to-set operator is not effective, as P_Π(C_Ψ) is already a set rather than a multi-set. Thus the denominator parts of Definition 2 and 4 compute the same value. The argument also holds for the numerator part. Thus cov_n(D) can be rewritten as cov(D). ∎

The important differences between categorization coverage and k-projection coverage (where k is a constant) include the number of data points needed to achieve full coverage (exponential vs. polynomial), as well as the required time to compute exact coverage (#P vs. NP).

Proposition 4.

If k is a constant, then to satisfy full k-projection coverage, one can find a data set whose size is bounded by a number polynomial in the number of categorizations, the sizes of their co-domains, and the maximum weight.


In Definition 4, the denominator is bounded as follows.

  • The total number of possible Π with size k equals C(n, k), which is a polynomial of n with highest degree k.

  • For each Π, P_Π(c) has at most m^k possible assignments, where m is the maximum co-domain size; this happens when every projected categorization has a co-domain of maximum size.

  • For each assignment of P_Π(c), the associated weight is at most the largest weight value.

As one can use one data point for each element counted in the denominator, a data set which achieves full coverage is polynomially bounded. ∎

(Example 2) Consider a setup of defining traffic scenarios where one has and . When and , the denominator of categorization coverage as defined in Definition 2 equals , while the denominator of -projection coverage equals  and the denominator of -projection coverage equals .

Proposition 5.

If k is a constant, then computing k-projection coverage can be done in NP. If, in addition, Ψ = ∅, then computing k-projection coverage can be done in P.

  • For the general case where Ψ ≠ ∅, to compute k-projection coverage, the crucial problem is to know the precise value of the denominator. In the denominator, the weight summation is actually only checking, for each grid in the projected k-hyperplane, whether the grid can possibly be occupied under the constraint Ψ. If one knows that it can be occupied, one simply adds its weight to the denominator. This "occupation checking" step can be achieved by examining the satisfiability of Ψ with the projected categorizations replaced by the concrete assignment of the grid. As there are polynomially many grids (there are C(n, k) hyperplanes, with each having polynomially many grids), and for each grid, checking is done in NP (the SAT problem being in NP), the overall process is in NP.

  • For the special case where Ψ = ∅, the "occupation checking" step mentioned previously is not required. As there are polynomially many grids (there are C(n, k) hyperplanes, with each having polynomially many grids), the overall process is in P. ∎
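The "occupation checking" step of the proof can be sketched as follows. The constraint set Ψ is modelled here as a plain Python predicate over complete assignments (a stand-in for the CNF formulas of Definition 1), and satisfiability is checked by brute-force enumeration instead of a SAT solver, so this sketch is only viable for tiny co-domains:

```python
from itertools import combinations, product

def constrained_denominator(co_domains, k, psi, weight=lambda cell: 1):
    """Denominator of k-projection coverage under constraints: a grid cell
    on a k-hyperplane counts only if some complete assignment satisfying
    psi projects onto it ("occupation checking")."""
    n = len(co_domains)
    denom = 0
    for idx in combinations(range(n), k):
        for cell in product(*(co_domains[i] for i in idx)):
            occupiable = any(
                psi(full) for full in product(*co_domains)
                if tuple(full[i] for i in idx) == cell)
            if occupiable:
                denom += weight(cell)
    return denom

# example constraint: C_0 = 1 implies C_1 = 1
psi = lambda a: not (a[0] == 1 and a[1] != 1)
print(constrained_denominator([range(2)] * 3, k=2, psi=psi))
```

For three binary categorizations, the constraint rules out exactly one cell on the (C_0, C_1) plane, so 11 of the 12 grid cells remain countable.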

4 Fulfilling k-projection Coverage

As a given data set may not fulfill full k-projection coverage, one needs to generate additional data points to increase coverage. By assuming that there exists a data generator function which can, from any discrete categorization point c, create a new data point d in 𝒟 such that 𝒞(d) = c (e.g., for image generation, the generator can be realized using techniques such as conditional GAN [15] to synthesize an image following the specified criteria, or using manually synthesized videos), generating data points to increase coverage amounts to the problem of finding additional discrete categorization points.

Definition 5 (Efficiently increasing k-projection coverage).

Given a data set D, a categorization 𝒞 and a generator, the problem of increasing k-projection coverage refers to the problem of finding a minimum-sized set of additional discrete categorization points such that generating one data point per point in this set achieves full k-projection coverage.

(Book-keeping k-projection for a given data set)

For each projection, we use a table as the data structure for book-keeping the covered items, and use a subscript to indicate how many times a certain categorization value has been covered by the existing data set.

(Example 3) Consider the following three discrete categorization points. The results of applying 1-projection and 2-projection are book-kept in Equations 1 and 2, respectively.


(Full k-projection coverage under Ψ = ∅)

To achieve full k-projection coverage under Ψ = ∅, in the worst case, one can always generate the required discrete categorization points in polynomial time. Precisely, to complete coverage on a particular projection Π, simply enumerate all possible assignments of the projected categorizations (as k is a constant, there are only polynomially many assignments, so the process is done in polynomial time), extend each of them by associating every categorization outside Π with an arbitrary value in its co-domain, and repeat each assignment according to its associated weight. For example, to increase 2-projection coverage in Equation 2, one first completes the first projection by adding points where "-" can be either 0 or 1, then further improves the next projection, and subsequently all others.
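The worst-case completion procedure described above can be sketched as follows; padding the categorizations outside Π with 0 mirrors the construction in the text, while the function name and toy values are illustrative:

```python
from collections import Counter
from itertools import combinations, product

def naive_complete(cat_points, co_domains, k, weight=lambda cell: 1):
    """Worst-case polynomial completion (unconstrained case): for each
    size-k projection, enumerate its grid cells and emit one new
    categorization point per missing occurrence, padding the remaining
    categorizations with 0."""
    n = len(co_domains)
    added = []
    for idx in combinations(range(n), k):
        counts = Counter(tuple(p[i] for i in idx) for p in cat_points)
        for cell in product(*(co_domains[i] for i in idx)):
            for _ in range(weight(cell) - counts[cell]):
                point = [0] * n                  # arbitrary filler value
                for pos, i in enumerate(idx):
                    point[i] = cell[pos]         # fix the projected values
                added.append(tuple(point))
    return added

pts = [(0, 0), (1, 1)]
print(naive_complete(pts, [range(2), range(2)], k=2))  # the two missing cells
```

Here n = k = 2, so there is a single projection with four cells, two of which are already covered; the sketch emits the two missing ones.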

As setting the categorizations outside Π to arbitrary values can still create problems when data points are manually generated from discrete categorization points, in the following we demonstrate important sub-cases with substantially improved bounds over the worst case.

Proposition 6 (1-projection coverage).

Finding an additional set of discrete categorization points to achieve full 1-projection coverage, with minimum size and under the condition of Ψ = ∅, can be solved in polynomial time, with the size of the set bounded by the largest total weight over all categorizations.


We present an algorithm (Algo. 1) that generates a minimum set of discrete categorization points for full 1-projection coverage. Recall that for 1-projection, our starting point is the book-kept table, with each entry recording the number of appearances of an element. We use this count to determine how often each element of a categorization still needs to appear.

Data: of a given data set, and weight function
Result: The minimum set of additional discrete categorization points to guarantee full -projection
1 while true do
2       let ;
3       for  do
4             for  do
5                   if  then
6                         replace the -th element of by value ;
7                         ;
8                         break /* inner-loop */;
10                   end if
12             end for
14       end for
15      if  then  return ;
16      else  replace every in by value 0, ;
18 end while
Algorithm 1 Algorithm for achieving -projection.

In Algo. 1, for every projection, the inner loop picks a value whose number of appearances is lower than its associated weight (lines 5-9). If no value is picked for some projection, then the algorithm just replaces the placeholder by 0, before adding the point to the set used to increase coverage (line 13). If after the iteration the point remains untouched, then we have achieved full 1-projection coverage and the program exits (line 12). The algorithm is guaranteed to return a set fulfilling full 1-projection coverage with minimum size, due to the observation that each categorization is independent, so the algorithm stops as soon as the categorization which misses the most elements is completed. In the worst case, if a projection started without any data, then after a number of iterations equal to its total weight, it no longer requires additional discrete categorization points. Thus, the size of the returned set is guaranteed to be bounded by the largest total weight over all categorizations.

Consider the example in Eq. 1: the above algorithm reports that only one additional discrete categorization point is needed to satisfy full 1-projection coverage.
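A minimal Python sketch of Algo. 1, assuming the book-kept counts are recomputed from the data set and the weights are given as a function `weight(i, v)` (both illustrative choices):

```python
from collections import Counter

def complete_1_projection(cat_points, co_domains, weight):
    """Greedy completion for full 1-projection coverage: in each round,
    every categorization independently picks one still-under-covered
    value (or the filler 0 if it is already complete); the number of
    rounds equals the largest remaining deficit over all categorizations."""
    n = len(co_domains)
    # counts[i][v] = how often value v appears at position i in the data set
    counts = [Counter(p[i] for p in cat_points) for i in range(n)]
    added = []
    while True:
        point, progressed = [], False
        for i in range(n):
            for v in co_domains[i]:
                if counts[i][v] < weight(i, v):   # value v still under-covered
                    point.append(v)
                    counts[i][v] += 1
                    progressed = True
                    break
            else:
                point.append(0)                   # categorization i is complete
        if not progressed:
            return added
        added.append(tuple(point))

pts = [(0, 0), (0, 1)]
print(complete_1_projection(pts, [range(2), range(2)], lambda i, v: 1))
```

With unit weights, only the value 1 of the first categorization is missing, so a single additional point suffices, matching the example of Eq. 1.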

On the other hand, efficiently increasing 2-projection coverage, even under the condition of Ψ = ∅, is hard.

Proposition 7 (Hardness of maximally increasing 2-projection coverage, when Ψ = ∅).

Checking whether there exists one discrete categorization point that increases the 2-projection coverage from its existing value to a given target value, under the condition where ⊕ is scalar multiplication, is NP-hard.


(Sketch) The hardness result is via a reduction from 3-SAT satisfiability, where we assume that each clause has exactly three variables; this problem is known to be NP-complete. We consider the case where each categorization function creates values in {0, 1, 2}. Given a 3-SAT formula with m clauses over the set of variables {x_1, …, x_n}, we perform the following construction.

  • Set the weight of each categorization such that W_i(0) = W_i(1) = 1 and W_i(2) = 0.

  • For each clause such as x_1 ∨ ¬x_2 ∨ x_3, we create a discrete categorization point by setting c_1 = 0, c_2 = 1, c_3 = 0 (i.e., the corresponding assignment makes the clause false) and by setting all remaining values to 2. Therefore, the process creates a total of m discrete categorization points and can be done in polynomial time.

  • Subsequently, prepare the data structure and record the result of 2-projection for the above created discrete categorization points. As there are at most C(n, 2) boxes, with each box having 3 × 3 = 9 items, the construction can be conducted in polynomial time.

  • One can subsequently compute the 2-projection coverage. Notice that due to the construction of W, all projected elements that contain value 2 are not counted. The computed denominator is therefore 4 · C(n, 2) rather than 9 · C(n, 2).

Then the 3-SAT formula has a satisfying instance iff there exists a discrete categorization point which increases the 2-projection coverage from its existing value v to v + 1/4.

  • (⇒) If the formula has a satisfying instance, create a discrete categorization point where c_i = 0 (c_i = 1) if the satisfying assignment sets x_i to false (true). The created discrete categorization point, when being projected, will

    • not occupy already occupied space (recall that overlapping with existing items in a box would imply that the corresponding clause cannot be satisfied), and

    • not occupy a grid having value 2 (as the assignment only makes each c_i be 0 or 1), making the added point truly help in increasing the numerator of the computed coverage.

    Overall, each of the C(n, 2) projections increases the numerator by 1, and therefore the 2-projection coverage increases from v to v + 1/4.

  • (⇐) Conversely, if there exists one discrete categorization point that increases the coverage by 1/4, then, due to the fact that we only have one point and there are C(n, 2) projections, it needs to add one new item in every box representing a 2-projection, without overlapping with existing items in that box and without using value 2. One can subsequently use the values of the discrete categorization point to create a satisfying assignment. ∎

In the following, we present an algorithm which encodes the problem of finding a discrete categorization point with maximum coverage increase as a 0-1 integer programming problem. Stated in Algo. 2, line 1 prepares the variables and constraints to be placed in the 0-1 programming framework. For each categorization C_i and each possible value v, we create a 0-1 variable (lines 3-5), which equals 1 iff the newly introduced discrete categorization point has C_i using value v. As the algorithm proceeds by generating only one discrete categorization point, only one of these variables per categorization can be true, which is reflected in the constraint in line 6.

Then, starting from line 8, the algorithm checks whether a particular projected value still allows improvement. If so, it creates a variable (line 10) which is set to 1 iff the newly introduced discrete categorization point will occupy this grid when being projected. As our goal is to maximally increase k-projection coverage, this variable is introduced into the objective function (lines 11 and 16), where the sum of all such variables is the objective to be maximized. The variable must be 1 exactly when all projected categorizations of the new point match the grid. For this purpose, line 12 applies a standard encoding tactic in 0-1 integer programming: if all of the corresponding value variables equal 1, the right-hand inequality of the constraint forces the grid variable to 1; contrarily, if any of them has value 0, the left-hand inequality forces the grid variable to 0. Consider the example in Eq. 2: for improving one of its entries, line 12 generates a constraint of exactly this form.

Line 14 will be triggered when no improvement can be made by any check of line 9, meaning that the system has already achieved full k-projection coverage. Lastly, apply 0-1 integer programming and translate each variable having value 1 by assigning the corresponding value in the newly generated discrete categorization point (lines 17, 18).

Here we omit technical details, but Algo. 2 can easily be extended to constrained cases by adding Ψ to the list of constraints.

Data: The set of the current -projection records, and weight function
Result: One discrete categorization point which maximally increase coverage, or null if current records have achieved full coverage.
1 let , , ,;
2 forall ,  do
3       forall  do
4             ;
6       end forall
7      ;
9 end forall
10forall  do
11       if  then
12             ;
13             ;
14             ;
16       end if
17      if  then return null;
18       else
19             let ;
20             let ;
21             return where in assignment is assigned to ;
23       end if
25 end forall
Algorithm 2 Finding a discrete categorization point which maximally increases -projection coverage, via an encoding to 0-1 integer programming.
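Algo. 2 requires a 0-1 integer programming solver. As an illustrative stand-in that avoids a solver dependency, the following brute-force sketch enumerates every candidate discrete categorization point and returns one maximizing the number of newly occupied, still-under-covered grids; this enumeration is exponential in n, which is exactly the blow-up the ILP encoding avoids:

```python
from collections import Counter
from itertools import combinations, product

def best_new_point(cat_points, co_domains, k=2, weight=lambda cell: 1):
    """Exhaustive substitute for Algo. 2: return (point, gain), where gain
    is the number of k-projection grids the new point would newly cover.
    Returns (None, 0) when full coverage is already achieved."""
    n = len(co_domains)
    projs = list(combinations(range(n), k))
    counts = {idx: Counter(tuple(p[i] for i in idx) for p in cat_points)
              for idx in projs}
    best, best_gain = None, 0
    for cand in product(*co_domains):
        gain = sum(1 for idx in projs
                   if counts[idx][tuple(cand[i] for i in idx)]
                      < weight(tuple(cand[i] for i in idx)))
        if gain > best_gain:
            best, best_gain = cand, gain
    return best, best_gain

pts = [(0, 0, 0)]
print(best_new_point(pts, [range(2)] * 3))
```

Starting from the single point (0, 0, 0), the first candidate that improves all three 2-projection planes at once is (0, 1, 1), with a gain of 3.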

5 Implementation and Evaluation

Figure 2: Workflow in the developed prototype for quantitative projection coverage and generation of new synthetic data.
Figure 3: Existing data points (E to E), and the automatically generated data points (G to G) to achieve full coverage.

We have implemented the above-mentioned techniques as a workbench to support training vision-based perception units for autonomous driving. The internal workflow of our developed tool is summarized in Fig. 2. It takes an existing labelled/categorized data set and the user-specified k value as input, computes the k-projection coverage, and finds a new discrete categorization point which can increase the coverage most significantly. For the underlying 0-1 programming solving, we use IBM CPLEX Optimization Studio (https://www.ibm.com/analytics/data-science/prescriptive-analytics/cplex-optimizer).

To convert the generated discrete categorization points to real images, we have further implemented a C++ plugin over the Carla simulator (http://carla.org/), an open-source simulator for autonomous driving based on Unreal Engine 4 (https://www.unrealengine.com/). The plugin reads the scenario from the discrete categorization point and configures the ground truth, the weather, and the camera angle accordingly. Then the plugin starts the simulation and takes a snapshot using the camera mounted on the simulated vehicle. The camera can either return synthetic images (e.g., the images in Fig. 3) or images with segmentation information; for the latter, we further generate close-to-real images by applying the conditional GAN framework pix2pixHD from NVIDIA (https://github.com/NVIDIA/pix2pixHD). Due to space limits, here we detail a small example by choosing the following operating conditions of autonomous vehicles as our categories.

  • Weather = {sunny, cloudy, rainy}

  • Lane orientation = {straight, curvy}

  • Total number of lanes (one side) = {1, 2}

  • Current driving lane = {1, 2}

  • Forward vehicle existing = {yes, no}

  • Oncoming vehicle existing = {yes, no}

We used our test case generator to generate new data points to achieve full 2-projection coverage, starting with a small set of randomly captured data points (Fig. 3, images E to E). Images G to G are synthesized in sequence until full 2-projection coverage is achieved. The coverage condition of each 2-projection plane is shown in Table 1. Note that there exists one entry in sub-table (f) which is not coverable (labelled as "X"), as there is a constraint stating that if there exists only 1 lane, it is impossible for the vehicle to drive on lane 2. Fig. 4 demonstrates the growth of 2-projection coverage when gradually introducing images G to G.

1 Lane 2 Lanes
Sunny G G
Cloudy G G
Rainy E E
(a) Weather & Lanes
Lane 1 Lane 2
Sunny G G
Cloudy G G
Rainy E G
(b) Weather & Current Lane
Straight Curvy
Sunny G G
Cloudy G G
Rainy G E
(c) Weather & Lane Curve
Sunny G G
Cloudy G G
Rainy G E
(d) Weather & Forward Car
Sunny G G
Cloudy G G
Rainy G E
(e) Weather & Oncoming Car
Lane 1 Lane 2
1 Lane E X
2 Lanes E G
(f) Lanes & Current Lane
Straight Curvy
1 Lane G E
2 Lanes E E
(g) Lanes & Lane Curve
1 Lane E G
2 Lanes G G
(h) Lanes & Forward Car
1 Lane G E
2 Lanes E E
(i) Lanes & Oncoming Car
Straight Curvy
Lane 1 E E
Lane 2 G G
(j) Current Lane & Lane Curve
Lane 1 E E
Lane 2 G G
(k) Current Lane & Forward Car
Lane 1 E E
Lane 2 G G
(l) Current Lane & Oncoming Car
Straight G E
Curvy E E
(m) Lane Curve & Forward Car
Straight E G
Curvy E E
(n) Lane Curve & Oncoming Car
(o) Forward Car & Oncoming Car
Table 1: 2-projection coverage tables of the final data set
Figure 4: Change of 2-projection coverage due to newly generated data.

6 Concluding Remarks

In this paper, we presented quantitative k-projection coverage as a method to systematically evaluate the quality of data for systems engineered using machine learning approaches. Our prototype implementation is used to compute coverage and synthesize additional images for engineering a vision-based perception unit for automated driving. The proposed metric can further serve as a basis to refine other classical metrics, such as MTBF or availability, which are based on statistical measurement.

Currently, our metric takes more data points for important (higher-weight) scenarios. For larger k values, achieving full projection coverage may not be feasible, so one extension is to adapt the objective function of Algo. 2 such that the generation process favors discrete categorization points with higher weights when being projected. Another direction is to improve the encoding of Algo. 2 such that the algorithm can return multiple discrete categorization points instead of one. Yet another direction is to further associate temporal behaviors with categorizations and the associated categorization constraints, when the data space represents sequences of images.