Towards Better Test Coverage: Merging Unit Tests for Autonomous Systems

We present a framework for merging unit tests for autonomous systems. Typically, it is intractable to test an autonomous system for every scenario in its operating environment. The question of whether it is possible to design a single test for multiple requirements of the system motivates this work. First, we formally define three attributes of a test: a test specification that characterizes behaviors observed in a test execution, a test environment, and a test policy. Using the merge operator from contract-based design theory, we provide a formalism to construct a merged test specification from two unit test specifications. Temporal constraints on the merged test specification guarantee that non-trivial satisfaction of both unit test specifications is necessary for a successful merged test execution. We assume that the test environment remains the same across the unit tests and the merged test. Given a test specification and a test environment, we synthesize a test policy filter using a receding horizon approach, and use the test policy filter to guide a search procedure (e.g. Monte-Carlo Tree Search) to find a test policy that is guaranteed to satisfy the test specification. This search procedure finds a test policy that maximizes a pre-defined robustness metric for the test while the filter guarantees a test policy for satisfying the test specification. We prove that our algorithm is sound. Furthermore, the receding horizon approach to synthesizing the filter ensures that our algorithm is scalable. Finally, we show that merging unit tests is impactful for designing efficient test campaigns to achieve similar levels of coverage in fewer test executions. We illustrate our framework on two self-driving examples in a discrete-state setting.






1 Introduction

Rigorous test and evaluation of autonomous systems is imperative for deploying autonomy in safety-critical settings [25]. In the case of testing self-driving cars, operational tests are constructed manually by experienced test engineers and can be combined with test cases generated in simulators using falsification techniques [11]. In addition, operational testing of self-driving cars on the road is expensive, and would need to be repeated after every design iteration [13]. In this paper, we pose the question of whether it is possible to check multiple requirements in a single test execution. Addressing this question is the first step towards optimizing for the largest number of test requirements checked in as few operational tests as possible.

The study of principled approaches to testing, verification and validation is a relatively young but growing research area. In the formal methods community, falsification is the technical term referring to the study of optimization algorithms, typically black-box, and sampling techniques to search for inputs that result in the system-under-test violating its formal requirements on input-output behavior [1, 2, 8, 9, 12, 24]. Falsification algorithms require a metric defined over temporal logic requirements to quantitatively reason about the degree to which a formal requirement has been satisfied. Assuming that the design of the autonomous system is black-box, falsification algorithms seek to find inputs that minimize the metric associated with satisfying formal requirements. The reasoning here is that minimizing this metric brings the system closer to violating its requirements, thus being a critical test scenario [14]. Formal methods literature uses falsification and testing interchangeably. In addition to manually constructing operational tests, falsification is used to find critical scenarios in simulation and the test environment parameters characterizing these critical scenarios are used for operational testing [11].

Figure 1: Overview of the proposed framework. The blocks on the left represent the inputs to the algorithm that define the unit tests, the blocks on the top represent inputs describing the system under test, the building blocks of our approach are shown in the blue shaded box, and the test policy is the result of the algorithm.

Falsification aims to find parameters of the test environment that lead the system to violate its requirements. Our approach is different in that we construct a test with respect to a test specification, which characterizes a set of desired test executions. For example, consider an autonomous car on a test track. The requirement for the autonomous car is to drive around the track and follow traffic rules, while the human drivers of the test vehicles are instructed to drive in a specific fashion (e.g., maintaining some distance between each other). These guidelines given to the test drivers constitute the test specification, which is not known to the system-under-test. Instead of considering all possible test environment policies, the test specification restricts the space of scenarios that our test policy search algorithm searches over. It also leverages reactivity: test scenarios are not planned in advance; rather, the test environment agents react to the actions taken by the system under test.

Our contributions are the following. First, we formally characterize a test by three attributes: a test specification, a test environment, and a test policy. Second, we leverage the merge operator from assume-guarantee contract theory to merge two unit test specifications into a merged specification, resulting in a single test that checks the test specifications of both unit tests; where necessary, we characterize temporal constraints on the merged test specification. Finally, we use Monte Carlo Tree Search (MCTS) to search for a test environment policy corresponding to the test specification, and use receding horizon synthesis techniques to prevent the search procedure from exploring policies that violate the test specification. This framework is illustrated in Figure 1.

2 Background

In this work, we choose Linear Temporal Logic (LTL) to represent the system and test specifications. LTL is a temporal logic language for describing linear-time properties over traces of computer programs and formally verifying their properties [23]. Although first introduced to formally describe properties of computer programs, LTL has been used for formal methods applications in control such as temporal logic synthesis of planners and controllers [26, 17, 15].

Definition 1 (Linear Temporal Logic (LTL) [3])

Given a set of atomic propositions $AP$, the syntax of LTL is given by the following grammar:

$$\varphi ::= a \mid \varphi \wedge \varphi \mid \neg\varphi \mid \bigcirc\varphi \mid \varphi\,\mathcal{U}\,\varphi,$$

where $a \in AP$ is an atomic proposition, $\wedge$ (and) and $\neg$ (not) are logic operators, and $\bigcirc$ (next) and $\mathcal{U}$ (until) are temporal operators. Other temporal operators such as $\square$ (always), $\Diamond$ (eventually), $\square\Diamond$ (always eventually), and $\Diamond\square$ (eventually always) can be derived. Let $\varphi$ be an LTL formula over the set of atomic propositions $AP$. The semantics of LTL are inductively defined over an infinite sequence of states $\sigma = s_0 s_1 s_2 \ldots$ as follows: i) if $a \in AP$, then $\sigma_t \models a$ iff $a$ evaluates to true at $s_t$; ii) $\sigma_t \models \neg\varphi$ iff $\sigma_t \not\models \varphi$; iii) $\sigma_t \models \varphi_1 \wedge \varphi_2$ iff $\sigma_t \models \varphi_1$ and $\sigma_t \models \varphi_2$; iv) $\sigma_t \models \bigcirc\varphi$ iff $\sigma_{t+1} \models \varphi$; v) $\sigma_t \models \varphi_1\,\mathcal{U}\,\varphi_2$ iff there exists $k \geq t$ such that $\sigma_k \models \varphi_2$ and $\sigma_i \models \varphi_1$ for all $t \leq i < k$. An infinite sequence $\sigma$ satisfies an LTL formula $\varphi$, denoted by $\sigma \models \varphi$, iff $\sigma_0 \models \varphi$.
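As an informal illustration (not part of the paper's toolchain), the derived operators can be evaluated over a finite trace prefix. True LTL semantics is over infinite traces, so "always" and "eventually" here are bounded approximations on the prefix; each state is represented as the set of atomic propositions that hold there.

```python
# Sketch: evaluating a few LTL operators on a finite trace prefix.
# States are sets of atomic propositions; this is a bounded approximation
# of the infinite-trace semantics given in Definition 1.

def holds(prop, state):
    """An atomic proposition holds iff it is in the state's label set."""
    return prop in state

def always(prop, trace):
    return all(holds(prop, s) for s in trace)

def eventually(prop, trace):
    return any(holds(prop, s) for s in trace)

def until(p, q, trace):
    """p U q on the prefix: q holds at some step k, and p holds at all earlier steps."""
    for k in range(len(trace)):
        if holds(q, trace[k]):
            return all(holds(p, trace[i]) for i in range(k))
    return False

trace = [{"p"}, {"p"}, {"p", "q"}, {"q"}]
assert until("p", "q", trace)       # q first holds at step 2; p holds at steps 0-1
assert eventually("q", trace)
assert not always("q", trace)
```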

In our framework, we consider a fragment of LTL specifications in the class of generalized reactivity of rank 1 (GR(1)) [22]. GR(1) specifications are expressive enough to capture the safety ($\square$), liveness ($\Diamond$), and recurrence ($\square\Diamond$) requirements that are relevant to several autonomous systems [17, 26]. A GR(1) formula $\varphi$ has the form

$$\varphi = \Big(\varphi_{init}^{e} \wedge \square\varphi_{s}^{e} \wedge \bigwedge_{i}\square\Diamond\varphi_{p,i}^{e}\Big) \rightarrow \Big(\varphi_{init}^{s} \wedge \square\varphi_{s}^{s} \wedge \bigwedge_{j}\square\Diamond\varphi_{p,j}^{s}\Big),$$

where the superscript $s$ refers to the robotic system for which a reactive controller is being synthesized, and $\varphi_{init}^{s}$, $\varphi_{s}^{s}$, and $\varphi_{p,j}^{s}$ define, respectively, the initial requirements, safety requirements, and recurrence requirements on the system. Similarly, $\varphi_{init}^{e}$, $\varphi_{s}^{e}$, and $\varphi_{p,i}^{e}$ define requirements on the environment $e$ of the system. Furthermore, synthesis for GR(1) formulas has time complexity $O(|V|^3)$, where $|V|$ is the size of the state space [22].

2.0.1 Assume-Guarantee Contracts

Contract-based design was first developed as a formal modular design methodology for analysis of component-based software systems [19, 7, 18], and later applied for the design and analysis of complex autonomous systems [20, 10]. In this work, we adopt the mathematical framework of assume-guarantee contracts presented in [5, 21].

Definition 2 (Assume-Guarantee Contract)

Let $\Sigma$ be an alphabet and $\mathcal{B}(\Sigma)$ be the set of all behaviors over $\Sigma$. A component $M$ over the alphabet $\Sigma$ is defined as a subset $M \subseteq \mathcal{B}(\Sigma)$. Then an assume-guarantee contract is defined as a pair $\mathcal{C} = (A, G)$, where $A \subseteq \mathcal{B}(\Sigma)$ is a set of behaviors for assumptions on the environment in which the component operates, and $G \subseteq \mathcal{B}(\Sigma)$ is a set of behaviors for the guarantees that the component provides, assuming its assumptions on the environment are met. $M$ is an implementation of a contract $\mathcal{C}$, written $M \models \mathcal{C}$, if and only if $M \cap A \subseteq G$ [4].

In this work, the assumptions and guarantees constituting assume-guarantee contracts are LTL formulas. To facilitate the contract algebra, we consider contracts in their saturated form, in which a contract is defined as $\mathcal{C} = (a, \neg a \vee g)$, so the guarantee trivially holds whenever the assumption fails. In Section 3 we define system and test specifications with LTL, and in Section 4.1 we borrow operators from assume-guarantee contract theory to formally define the merge of two unit tests.

3 Problem Setup

First, we define the system under test, which we will refer to as the system for brevity, and its corresponding system specification. We assume that the system, the system specification, and the controller are provided by the designer of the system and cannot be modified when designing the test.

Definition 3 (Transition System)

A transition system is a tuple $T = (Q, \delta)$, where $Q$ is a set of states and $\delta \subseteq Q \times Q$ is a transition relation. If there is a transition from $q$ to $q'$, we write $q \rightarrow q'$.

Definition 4 (System)

Let $\mathcal{V}_{sys}$ be the set of system variables, and let $Q_{sys}$ be the set of all possible valuations of $\mathcal{V}_{sys}$. A system is a transition system $T_{sys} = (Q_{sys}, \delta_{sys})$, where the transition relation $\delta_{sys}$ is defined by the dynamics of the system.

Definition 5 (System Specification)

A system specification $\varphi_{sys}$ is the GR(1) formula

$$\varphi_{sys} = \big(\varphi_{init}^{t} \wedge \square\varphi_{s}^{t} \wedge \square\Diamond\varphi_{p}^{t}\big) \rightarrow \big(\varphi_{init}^{sys} \wedge \square\varphi_{s}^{sys} \wedge \square\Diamond\varphi_{p}^{sys}\big), \qquad (3)$$

where $\varphi_{init}^{sys}$ is the initial condition that the system needs to satisfy, $\square\varphi_{s}^{sys}$ encodes system dynamics and safety requirements on the system, and $\square\Diamond\varphi_{p}^{sys}$ specifies recurrence goals for the system. Likewise, $\varphi_{init}^{t}$, $\square\varphi_{s}^{t}$, and $\square\Diamond\varphi_{p}^{t}$ represent assumptions the system has on the test environment.

The system is evaluated in a test environment, which comprises both the test track and the test agents. A test is characterized by the test environment, a test specification, and a test policy. Our approach differs from falsification in that we are not generating a test strategy to stress-test the system against $\varphi_{sys}$. Instead, we synthesize a test for a new concept, a test specification, which describes the set of behaviors we would like to see in a test. For example, an informal version of a test specification is requiring test agents to “drive around the test track at a fixed speed while maintaining a certain distance from each other”.

Definition 6 (Test Environment)

Let $\mathcal{V}_{test}$ be the set of test environment variables, and let $Q_{test}$ be the set of all possible valuations of $\mathcal{V}_{test}$. A test environment is a transition system $T_{test} = (Q_{test}, \delta_{test})$, where the transition relation $\delta_{test}$ is defined by the dynamics of the test agents.

Definition 7 (Test Specification)

A test specification $\varphi_{test}$ is the formula

$$\varphi_{test} = \big(\varphi_{init}^{sys} \wedge \square\varphi_{s}^{sys} \wedge \square\Diamond\varphi_{p}^{sys}\big) \rightarrow \Big(\varphi_{init}^{t} \wedge \square\varphi_{s}^{t} \wedge \square\Diamond\varphi_{p}^{t} \wedge \square\psi_{s}^{t} \wedge \bigwedge_{i}\square\Diamond\psi_{p,i}^{t}\Big), \qquad (4)$$

where $\varphi_{init}^{sys}$, $\varphi_{s}^{sys}$, $\varphi_{p}^{sys}$, $\varphi_{init}^{t}$, $\varphi_{s}^{t}$, and $\varphi_{p}^{t}$ are propositional formulas from equation (3). Additionally, $\square\psi_{s}^{t}$ and $\square\Diamond\psi_{p,i}^{t}$ describe the safety and recurrence formulas for the test environment, in addition to the dynamics of the test environment known to the system. Note that the system is unaware of these additional specifications on the test environment, and the test specification is such that the system is still allowed to satisfy its requirements. Defining the test specification in this manner allows for i) synthesizing a test in which the system, if properly designed, can meet $\varphi_{sys}$, and ii) specifying additional requirements on the test environment that are unknown to the system at design time. We assume that test specifications are defined a priori; finding relevant test specifications is left to future work.

Let $T_{prod}$ be a turn-based product transition system constructed from $T_{sys}$ and $T_{test}$, where the states are $Q_{prod} = Q_{sys} \times Q_{test}$. In particular, for every system transition $q_{sys} \rightarrow q_{sys}'$, we have $(q_{sys}, q_{test}) \rightarrow (q_{sys}', q_{test})$, where $q_{test} \in Q_{test}$. Similarly, for every test environment transition $q_{test} \rightarrow q_{test}'$, we have $(q_{sys}, q_{test}) \rightarrow (q_{sys}, q_{test}')$, where $q_{sys} \in Q_{sys}$.
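The turn-based product can be sketched in a few lines; the dictionary representation and variable names below are our own, not the paper's.

```python
# Sketch (names assumed): a turn-based product of a system transition system
# and a test-environment transition system. Vertices are (q_sys, q_test, turn);
# on the system's turn only the system component moves and the turn passes to
# the test environment, and vice versa.

def turn_based_product(sys_trans, test_trans):
    sys_states, test_states = set(sys_trans), set(test_trans)
    vertices = {(qs, qt, turn)
                for qs in sys_states for qt in test_states
                for turn in ("sys", "test")}
    edges = set()
    for qs in sys_states:
        for qt in test_states:
            for qs2 in sys_trans[qs]:      # system move: sys turn -> test turn
                edges.add(((qs, qt, "sys"), (qs2, qt, "test")))
            for qt2 in test_trans[qt]:     # environment move: test turn -> sys turn
                edges.add(((qs, qt, "test"), (qs, qt2, "sys")))
    return vertices, edges

sys_trans = {0: [1], 1: [1]}           # system can advance 0 -> 1, then stay
test_trans = {"a": ["b"], "b": ["a"]}  # tester oscillates between two cells
V, E = turn_based_product(sys_trans, test_trans)
assert len(V) == 8                     # 2 x 2 product states, 2 turn labels
assert ((0, "a", "sys"), (1, "a", "test")) in E
assert ((1, "a", "test"), (1, "b", "sys")) in E
```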

Definition 8 (Game Graph)

Let $V_{sys}$ and $V_{test}$ be copies of the states $Q_{prod}$, denoting the states from which it is the system's and the test environment's turn to move, respectively. Let $E_{sys}$ denote the set of transitions $(v, v')$ with $v \in V_{sys}$ and $v' \in V_{test}$ taken by the system, and let $E_{test}$ denote the set of transitions $(v, v')$ for some $v \in V_{test}$ and $v' \in V_{sys}$ taken by the test environment. Then the game graph $G = (V, E)$ is a directed graph with vertices $V = V_{sys} \cup V_{test}$ and edges $E = E_{sys} \cup E_{test}$.

Definition 9 (Policy)

On the game graph $G$, a policy for the system is a function $\pi_{sys} : V^{*}\,V_{sys} \rightarrow V_{test}$ such that $(v, \pi_{sys}(w\,v)) \in E$, where $v \in V_{sys}$ and $w \in V^{*}$. Similarly defined, $\pi_{test} : V^{*}\,V_{test} \rightarrow V_{sys}$ denotes the test environment policy. Here $^{*}$ is the Kleene star operator.

Definition 10 (Test Execution)

A test execution $\sigma = v_0 v_1 v_2 \ldots$ starting from vertex $v_0$ is an infinite sequence of states on the game graph $G$. Since $G$ is a turn-based game graph, the states in the test execution alternate between $V_{sys}$ and $V_{test}$, so if $v_i \in V_{sys}$, then $v_{i+1} \in V_{test}$. Let $\sigma(v_0, \pi_{sys}, \pi_{test})$ be the test execution starting from state $v_0$ under the policies $\pi_{sys}$ and $\pi_{test}$. Let $\mathcal{T}(G)$ denote the set of all possible test executions on $G$. A robustness metric $\rho : \mathcal{T}(G) \rightarrow \mathbb{R}$ is a function assigning a scalar value to a test execution.

Problem 1

Given system and test environment transition systems $T_{sys}$ and $T_{test}$, two unit test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$, and a robustness metric $\rho$, find a test policy $\pi_{test}^{*}$ such that

$$\pi_{test}^{*} = \operatorname*{arg\,max}_{\pi_{test}}\; \rho\big(\sigma(v_0, \pi_{sys}, \pi_{test})\big) \quad \text{s.t.} \quad \sigma(v_0, \pi_{sys}, \pi_{test}) \models \varphi_{test,1} \wedge \varphi_{test,2} \ \text{for every system policy } \pi_{sys}.$$
3.0.1 Running Example — Lane Change

Consider the lane change example illustrated in Figure 2. The system (red car) must merge into the lower lane before the track ends and must not collide with the test environment agents (blue cars). Thus, the liveness requirement of changing lanes and the safety requirement of not colliding with the test agents constitute part of the system specification $\varphi_{sys}$. In the two unit tests, the system changes into the other lane in front of and behind a tester car, respectively, and in the merged test, it finishes its lane change maneuver in between the two tester cars.

Figure 2: Lane change example with initial (left) and final (right) configurations. The $x$-coordinates are numbered from left to right, and the $y$-coordinates from top to bottom, starting from 1. The system (red) is required to merge into the lower lane without colliding, merging in front of (top), behind (center), or in between (bottom) the tester agents (blue).

4 Merging Unit Tests

In this section, we will outline our main approach for merging unit tests. First, we define the notion of a merged test and use the merge operator for merging test specifications and add temporal constraints to the test specification, if necessary. Then, we construct an auxiliary graph corresponding to the merged test specification and describe the synthesis of the test policy filter on this auxiliary graph using a receding horizon approach.

4.1 Merging Test Specifications

The merge, also known as strong merge, of two saturated contracts $\mathcal{C}_1 = (a_1, g_1)$ and $\mathcal{C}_2 = (a_2, g_2)$ is defined as follows:

$$\mathcal{C}_1 \bullet \mathcal{C}_2 = \big(a_1 \wedge a_2,\; \neg(a_1 \wedge a_2) \vee (g_1 \wedge g_2)\big).$$

In addition to strong merge, contract theory defines other operators over a pair of contracts, such as composition and conjunction [5, 21]. Among these operators, strong merge is the only one that requires the assumptions of both unit contracts (and, as a result, both unit test specifications) to hold. Thus, we choose the strong merge operator to derive the merged test specification. Given two unit test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$, we can construct the corresponding saturated contracts $\mathcal{C}_1 = (a_1, g_1)$ and $\mathcal{C}_2 = (a_2, g_2)$, where $a_i$ denotes the assumptions on the system (under test) and $g_i$ the guarantees for unit test $i$.
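As a toy illustration of strong merge in the set-based contract semantics of Definition 2 (a finite behavior universe; all names are ours, and the paper's setting is LTL rather than finite sets), saturation and merging reduce to simple set algebra:

```python
# Sketch: contracts as sets of behaviors over a finite universe U. A saturated
# contract (A, G) has G extended with all behaviors outside A; strong merge
# conjoins the assumptions and the saturated guarantees.

def saturate(universe, A, G):
    return G | (universe - A)

def strong_merge(universe, c1, c2):
    a1, g1 = c1
    a2, g2 = c2
    a = a1 & a2
    g = saturate(universe, a, g1 & g2)
    return a, g

def implements(component, contract):
    """M implements (A, G) iff M's behaviors under the assumptions lie in G."""
    A, G = contract
    return (component & A) <= G

U = set(range(8))
c1 = ({0, 1, 2, 3}, saturate(U, {0, 1, 2, 3}, {0, 1}))
c2 = ({0, 1, 4, 5}, saturate(U, {0, 1, 4, 5}, {1, 2}))
a, g = strong_merge(U, c1, c2)
assert a == {0, 1}          # both assumptions must hold
assert 1 in g               # behavior 1 satisfies both guarantees
assert 0 not in g           # behavior 0 meets both assumptions but not both guarantees
assert (U - a) <= g         # outside the merged assumptions, anything is allowed
```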

Remark 1

We make the following modifications to the guarantees for brevity. First, we assume that the only recurrence requirement in each unit test specification is $\square\Diamond\psi_{p,i}^{t}$, which is not a part of the system's assumptions on the environment. Second, we assume that the merged test environment is a simple product transition system of the unit test environments $T_{test,1}$ and $T_{test,2}$. On the merged test environment, we assume that the initial conditions of the two unit tests are equivalent, and that their test environment dynamics are equivalent. Therefore, in merging the two unit specifications, we refer to the test guarantees simply as $g_i = \square\Diamond\psi_{p,i}^{t}$.

Definition 11 (Merged Test)

From the merged contract $\mathcal{C}_m = \mathcal{C}_1 \bullet \mathcal{C}_2 = (a_m, g_m)$, the specification $\varphi_m = g_m$, where $a_m = a_1 \wedge a_2$ and $g_m = \neg(a_1 \wedge a_2) \vee (g_1 \wedge g_2)$, is the merged test specification. A test environment policy for the merged test specification results in a test execution $\sigma_m \models \varphi_m$.

Lemma 1

Given unit test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$, let $\varphi_m$ be the corresponding merged test specification. Then, for every test execution $\sigma$ such that $\sigma \models \varphi_m$, we also have $\sigma \models \varphi_{test,1}$ and $\sigma \models \varphi_{test,2}$.


Proof. Suppose $\mathcal{C}_1 = (a_1, g_1)$ and $\mathcal{C}_2 = (a_2, g_2)$ are the saturated assume-guarantee contracts corresponding to unit test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$. Applying the strong merge operator on contracts $\mathcal{C}_1$ and $\mathcal{C}_2$, we get:

$$\mathcal{C}_m = \big(a_1 \wedge a_2,\; \neg(a_1 \wedge a_2) \vee (g_1 \wedge g_2)\big).$$

Thus, the merged test specification requires either one of the assumptions to not be satisfied, or both guarantees to hold. Since the assumptions coincide by Remark 1, $\varphi_m$ implies both $\neg a_1 \vee g_1$ and $\neg a_2 \vee g_2$, and we get that $\sigma \models \varphi_{test,1}$ and $\sigma \models \varphi_{test,2}$. ∎

A key point in our framework is that we require the assumptions $a_1$ and $a_2$ to hold during the test search, that is, we do not allow merged test policies that vacuously satisfy the merged test specification. This ensures that the test environment always gives the system an opportunity to satisfy its specification. If the assumptions are ever violated, that is because of the system, and not the design of the test.

Returning to our lane change example, we define the unit test specifications as merging behind a tester car and merging in front of a tester car. The respective saturated assume-guarantee contracts are $\mathcal{C}_1 = (a_1, g_1)$ and $\mathcal{C}_2 = (a_2, g_2)$, with $a_1$, $a_2$ and $g_1$, $g_2$ being the assumptions and guarantees of the two individual tests. Thus, after applying the strong merge operation to the two contracts, the guarantee of the merged test specification for the lane change example is

$$g_m = \neg(a_1 \wedge a_2) \vee (g_1 \wedge g_2).$$
4.2 Temporal Constraints on the Merged Test Specification

Definition 12 (Temporally constrained tests)

For a test trace $\sigma$, let $\sigma_t$ be the suffix of the trace starting at time $t$. Let $t_1$ and $t_2$ be times such that $\sigma_{t_1} \models g_1$ and $\sigma_{t_2} \models g_2$, and assume there exists a time $t_1' \geq t_1$ such that for all $t > t_1'$, $\sigma_t \not\models g_1$, and assume that there exists a time $t_2' \geq t_2$ such that for all $t > t_2'$, $\sigma_t \not\models g_2$. Then if $t_1 \leq t_2'$ and $t_2 \leq t_1'$, the tests are parallel-merged in the interval $[\max(t_1, t_2), \min(t_1', t_2')]$. If $t_1' < t_2$, or $t_2' < t_1$, the tests are temporally constrained.

In this section, we outline when the merged test specification requires a more constrained temporal structure. To ensure that the test execution provides the desired information, we need to make certain that each test specification is sufficiently checked. Consider the lane change example: there exist many executions in which one of the unit tests is satisfied (e.g., the car merges in front of a vehicle) without the other specification being satisfied as well, so these two tests can be parallel-merged. In contrast, there exist pairs of test specifications where satisfying one trivially satisfies the other. Then we cannot distinguish which specification was checked, so these unit tests should not be parallel-merged; instead, the test must contain a point in time where each test specification is satisfied individually.

Proposition 1

If, for two test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$ and the set of all test executions $\mathcal{T}(G)$, we have $\{\sigma \in \mathcal{T}(G) : \sigma \models g_1\} \subseteq \{\sigma \in \mathcal{T}(G) : \sigma \models g_2\}$, then these tests cannot be parallel-merged. Instead, the temporal constraint must be enforced on $\varphi_{test,1}$ and $\varphi_{test,2}$.


Proof. We refine the general specification in equation (7), which allows any temporal structure, to include the temporal constraints in the guarantees. The temporally constrained merged test specification $\varphi_m'$ is thus defined with the guarantee

$$g_m' = g_m \wedge \Diamond(g_1 \wedge \neg g_2) \wedge \Diamond(g_2 \wedge \neg g_1). \qquad (8)$$

Because any trace satisfying $g_m'$ will also satisfy $g_m$, $\varphi_m' \rightarrow \varphi_m$. Any test trace satisfying this specification will contain at least one occurrence of visiting a state satisfying $g_1$ and not $g_2$, and vice versa. Thus the guarantees of the specifications for each unit test, $g_1$ and $g_2$, are checked individually during the merged test, which satisfies the temporal constraints. ∎
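The requirement that each unit guarantee be satisfied individually at some point of the trace can be checked mechanically; the following sketch uses purely illustrative lane-change-flavored predicates (not the paper's formulas):

```python
# Sketch: checking that a test trace exercises each unit test guarantee
# individually, as the temporally constrained merged specification requires:
# some state satisfies g1 and not g2, and some state satisfies g2 and not g1.

def individually_checked(trace, g1, g2):
    seen_g1_only = any(g1(s) and not g2(s) for s in trace)
    seen_g2_only = any(g2(s) and not g1(s) for s in trace)
    return seen_g1_only and seen_g2_only

# Illustrative predicates on the system's x-cell: g1 = "merged behind at x=3",
# g2 = "merged in front at x=5" (hypothetical values, not from the paper).
g1 = lambda s: s == 3
g2 = lambda s: s == 5
assert individually_checked([1, 3, 5], g1, g2)       # both checked separately
assert not individually_checked([1, 3, 3], g1, g2)   # g2 is never exercised
```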

4.3 Receding Horizon Synthesis of Test Policy Filter

Since the test specification characterizes the set of possible test executions, we need a policy for the test environment that is consistent with the test specification. In this section, we detail the construction of an auxiliary game graph and algorithms for receding horizon synthesis of the test specification on the auxiliary game graph. This filter will then be used to find the test policy (detailed in Section 4.4).

4.3.1 Auxiliary Game Graph

Assume we are given a game graph $G$ constructed according to Definition 8, and a (merged) test specification in GR(1) form as in equation (4). Then, for each recurrence requirement $\square\Diamond\psi_{p,i}^{t}$ in the test specification, we can find a set of states $\mathcal{B}_i$ that satisfy the propositional formula $\psi_{p,i}^{t}$. For each $\mathcal{B}_i$, there exists a non-empty subset of vertices of $G$ that can be partitioned into sets $\mathcal{V}_0^{i}, \ldots, \mathcal{V}_n^{i}$. We follow [26] in partitioning the states: $\mathcal{V}_k^{i}$ is the set of states in $V$ that are exactly $k$ steps away from the goal states $\mathcal{B}_i$. From this partition of states, we can construct a partial order such that $\mathcal{V}_k^{i} \preceq \mathcal{V}_{k-1}^{i}$ for all $k$. This partial order will be useful in the receding horizon synthesis of the test policy outlined below [26]. We construct an auxiliary game graph $G_{aux}$ (illustrated in Figure 3) to accommodate any temporal constraints on the merged test specification before proceeding to synthesize a filter for the test policy. Without loss of generality, we elaborate on the auxiliary graph construction for the case of one recurrence requirement in each unit specification, but this approach can be easily extended to multiple progress requirements. Let $\varphi_{test,1}$ and $\varphi_{test,2}$ be the two unit test specifications, with progress propositional formulas $p_1$ and $p_2$, respectively. First, we make three copies of the game graph: $G_1$, $G_2$, and $G_3$. Note that $G_1$, $G_2$, and $G_3$ are all copies of $G$, but are denoted differently to differentiate between the vertices that constitute each subgraph, and a similar argument applies to the edges of these subgraphs. Let $S_1$ be the set of states in $G_1$ that satisfy the propositional formula $p_1 \wedge \neg p_2$. Likewise, let $S_2$ be the set of states in $G_1$ that satisfy the propositional formula $p_2 \wedge \neg p_1$.
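The distance-based partition of vertices around a goal set can be computed with a backward breadth-first search; a minimal sketch (successor-map graph representation assumed, names ours):

```python
# Sketch: partitioning vertices by backward BFS distance to a goal set, giving
# the sets V_0, V_1, ... where V_k holds the vertices exactly k steps from the
# goal. Predecessors are derived from the successor map.

from collections import deque

def distance_partition(succ, goal):
    pred = {}
    for u, vs in succ.items():
        for v in vs:
            pred.setdefault(v, set()).add(u)
    dist = {g: 0 for g in goal}
    frontier = deque(goal)
    while frontier:                     # standard BFS from the goal, backwards
        v = frontier.popleft()
        for u in pred.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                frontier.append(u)
    partition = {}
    for v, d in dist.items():
        partition.setdefault(d, set()).add(v)
    return partition

succ = {"a": ["b"], "b": ["c"], "c": ["c"], "d": ["b"]}
P = distance_partition(succ, {"c"})
assert P[0] == {"c"} and P[1] == {"b"} and P[2] == {"a", "d"}
```

Vertices that cannot reach the goal simply never appear in the partition, which matches the restriction to the non-empty reachable subset above.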

Figure 3: Auxiliary game graph construction for the merged test specification of unit test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$. Subgraphs $G_1$, $G_2$, and $G_3$ are copies of the game graph $G$ constructed per Definition 8. In $G_1$, the sets of states at which the progress propositional formulas of the test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$ are satisfied are shaded yellow and blue, respectively.

Now, we connect the subgraphs through the vertices in $S_1$ and $S_2$. Let $(u, v)$ be an outgoing edge from a node $u \in S_1$, and let $v_2$ be the vertex in subgraph $G_2$ that corresponds to vertex $v$ in $G_1$. We remove edge $(u, v)$ and add the edge $(u, v_2)$. Likewise, every outgoing edge from $S_2$ in $G_1$ is replaced by adding edges to the corresponding vertices in $G_3$. On subgraphs $G_2$ and $G_3$, vertices are partitioned and partial orders are constructed once again, for $p_2$ and $p_1$, respectively. From the goal states of the partial order defined on the nodes of subgraph $G_2$, every outgoing edge is replaced by a corresponding edge back to $G_1$. Subgraph $G_3$ is connected back to $G_1$ in a similar manner. The construction of the auxiliary graph and partial order is summarized in Algorithm 2. Our choice of constructing the auxiliary graph in this manner is amenable to constructing a simple partial order, as outlined below.

Assumption 1

For unit test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$ with recurrence specifications $\square\Diamond p_1$ and $\square\Diamond p_2$, respectively, suppose there exist partial orders on $G_2$ and $G_3$ corresponding to $p_1$ and $p_2$, respectively. Assume that at least one of the following is true: (a) there exists an edge $(u, v)$ where $u \in S_1$ and $v \in \mathcal{V}_k^{p_2}$ for some $k$, or (b) there exists an edge $(u, v)$ where $u \in S_2$ and $v \in \mathcal{V}_k^{p_1}$ for some $k$.

Lemma 2

If Assumption 1 holds, there exists a partial order on $G_{aux}$ for the merged recurrence propositional formula $p_m$, where $p_m$ is the propositional formula that evaluates to true at: (i) all $v$ in $G_2$ such that $v$ satisfies $p_2$, (ii) all $v$ in $G_3$ such that $v$ satisfies $p_1$, and (iii) all $v$ in $G_1$ such that $v$ satisfies $p_1 \wedge p_2$.


Proof. Let $\mathcal{B}_m$ denote the non-empty set of states at which $p_m$ evaluates to true. Then, let $\mathcal{V}_k^{p_m}$ be the subset of states that are exactly $k$ steps away from a vertex in $\mathcal{B}_m$. Then, we can construct the partial order $\mathcal{V}_n^{p_m} \preceq \ldots \preceq \mathcal{V}_0^{p_m}$, where $n$ is the distance of the farthest vertex connected to $\mathcal{B}_m$. The subset of vertices $\mathcal{V}_0^{p_m}$ is non-empty because $\mathcal{B}_m$ is non-empty. Furthermore, from Assumption 1, if (a) holds, there exists a $k$ such that $\mathcal{V}_k^{p_m}$ contains vertices of $G_2$. Likewise, if (b) holds, there exists a $k$ such that $\mathcal{V}_k^{p_m}$ contains vertices of $G_3$. Therefore, for some initial vertex there exists a test execution $\sigma$ over the game graph $G_{aux}$ such that $\sigma \models \square\Diamond p_m$. ∎

Remark 2

If Assumption 1 does not hold, the unit tests corresponding to test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$ cannot be merged.

4.3.2 Receding Horizon Synthesis on $G_{aux}$

We leverage receding horizon synthesis to scalably compute the set of states $\mathcal{W}$ from which the test environment can realize the test specification in a test execution. Note that we are not synthesizing a test strategy using the receding horizon approach; instead, we use $\mathcal{W}$ as a filter on a search algorithm (MCTS) that finds an optimal test policy. Further details on applying receding horizon strategies for temporal logic planning can be found in [26]. A distinction in our work is that there can be multiple states in the graph that satisfy a progress requirement of the test specification.

For a test specification with progress propositional formula $p$, let $\mathcal{B}$ be the set of states on $G_{aux}$ at which $p$ evaluates to true. Specifically, for some goal $b \in \mathcal{B}$, if the product state starts $k$ steps from $b$ (i.e., $v \in \mathcal{V}_k^{b}$), the test environment is required to guide the product state to a set $\mathcal{V}_{k'}^{b}$ with $k' < k$. The corresponding formal specification for the test environment is

$$\psi_k^{b} = \big(v \in \mathcal{V}_k^{b} \wedge \Phi\big) \rightarrow \big(\Diamond(v \in \mathcal{V}_{k'}^{b}) \wedge \square\varphi_{s}^{t} \wedge \square\Phi\big),$$

where $\Phi$ is the invariant condition that ensures that $\psi_k^{b}$ is realizable; see [26] for further details on how this invariant can be constructed. Since there are different ways to satisfy the goal requirement, the test specification requires that we satisfy $\square\Diamond p$ for at least one $b \in \mathcal{B}$. To capture this in the receding horizon framework, the test execution must progress to at least one $b \in \mathcal{B}$, formally stated as

$$\psi_k = \bigvee_{b \in \mathcal{B}} \psi_k^{b}. \qquad (11)$$

Thus, the set of states from which the test environment has a strategy satisfying the specification in equation (11) is the short-horizon filter, denoted by $\mathcal{W}_k$. Let $n$ denote the supremum of all shortest path lengths from a vertex of $G_{aux}$ to some $b \in \mathcal{B}$. Then the overall test policy filter is the union of the short-horizon test policy filters:

$$\mathcal{W} = \bigcup_{k=1}^{n} \mathcal{W}_k.$$
The synthesis of $\mathcal{W}$ and its use as a test policy filter in the MCTS procedure used to find the test environment policy is outlined in Algorithm 1. Note that this receding horizon approach to generating a filter can be applied to any such specification and its corresponding game graph. For the merged test specification, $\mathcal{W}$ is generated on $G_{aux}$, where $\mathcal{B}_m$ is the set of states corresponding to $p_m$, and for simplicity, we apply the following arguments on $G_{aux}$. Let $G_{\mathcal{W}}$ be the subgraph of $G_{aux}$ induced by the vertices in $\mathcal{W}$, together with the edges between them.
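The paper synthesizes $\mathcal{W}$ with GR(1) synthesis tools; as a simplified stand-in conveying the idea of a winning-set filter, the standard turn-based attractor computes the states from which the test environment can force the play into a goal set regardless of the system's moves (names and representation are ours):

```python
# Sketch: turn-based attractor as an illustration of the winning-set filter.
# `turn[v]` says whose move it is at vertex v: the environment needs one
# winning successor, while the system may pick any, so its vertices need all
# successors to be winning.

def env_attractor(vertices, succ, turn, goal):
    win = set(goal)
    changed = True
    while changed:                      # iterate to a fixpoint
        changed = False
        for v in vertices:
            if v in win or not succ.get(v):
                continue
            if turn[v] == "test":       # environment chooses: one winning move suffices
                ok = any(u in win for u in succ[v])
            else:                       # system chooses: all moves must stay winning
                ok = all(u in win for u in succ[v])
            if ok:
                win.add(v)
                changed = True
    return win

vertices = {1, 2, 3, 4, 5}
succ = {1: [2, 3], 2: [4], 3: [3], 4: [], 5: [2]}
turn = {1: "sys", 2: "test", 3: "test", 4: "test", 5: "sys"}
# From 1 the system can escape into the losing self-loop at 3, so 1 is not winning.
assert env_attractor(vertices, succ, turn, {4}) == {2, 4, 5}
```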

4.3.3 On $\mathcal{W}$ as a test policy filter

Inspired by work on shield synthesis [6], we use the winning set $\mathcal{W}$ as a filter to guide rollouts in the Monte Carlo Tree Search sub-routine for finding the test policy. Since $\psi_k$ is a disjunction of short-horizon specifications, it is possible that an execution always satisfies $\psi_k$ without ever satisfying the progress requirement $\square\Diamond p$. This happens when the test execution makes progress towards some goal in $\mathcal{B}$ but never actually reaches a goal in $\mathcal{B}$, resulting in a livelock. Further details addressing this are given in the Appendix. We assume that the graph is constructed such that there are no such cycles. In addition to using $\mathcal{W}$ to ensure that $\psi_k$ will always be satisfied, we enforce progress by only allowing the search procedure to take actions that lead to a state closer to one of the goals. Thus, the search procedure ensures that for every state $v \in \mathcal{V}_k^{b}$, the control strategy for the next horizon ends in some $\mathcal{V}_{k'}^{b}$ with $k' < k$, for at least one goal $b \in \mathcal{B}$.

Figure 4: Illustration of the intersection of the winning sets for the unit specifications. Goal states for one unit specification are depicted as circles and for the other as rhombi. The black states lie in the intersection, and the filter ensures that only these states are searched. The orange intersection represents the set of traces of the merged test specification.
Theorem 4.1

Receding horizon synthesis of the test filter $\mathcal{W}$ is such that any test execution on $G_{\mathcal{W}}$ starting from an initial state in $\mathcal{W}$ satisfies the test specification in equation (4).


Proof. For the recurrence formula $\square\Diamond p_m$ of the merged test specification, suppose first that a single vertex $b$ on $G_{aux}$ satisfies $p_m$. Then, it is shown in [26] that if there exists a partial order on $G_{aux}$ for $b$, we can find a set of vertices $\mathcal{W}$ such that every test execution that remains in $\mathcal{W}$ satisfies the safety requirements and the invariant $\Phi$. Furthermore, given the partial order, one can find a test policy ensuring that the execution makes progress along the partial order, such that for some time $t$, $\sigma_t \models p_m$. However, in case of multiple vertices in $G_{aux}$ that satisfy $p_m$, we need to extend the receding horizon synthesis to the specification $\psi_k$. We construct the filter $\mathcal{W}$ and also check that for every test execution, there exists a goal $b \in \mathcal{B}_m$ such that progress is made along the partial order of $b$ at every horizon. Therefore, because the auxiliary game graph is assumed to have no such cycles, the test execution makes progress on the partial order of at least one $b \in \mathcal{B}_m$ at each timestep, thus eventually satisfying $p_m$. Thus every execution of our algorithm satisfies equation (4). ∎

0:  Input: unit test specifications $\varphi_{test,1}$ and $\varphi_{test,2}$, system specification $\varphi_{sys}$, system $T_{sys}$, unit test environments $T_{test,1}$ and $T_{test,2}$, and quantitative metric of robustness $\rho$
0:  Output: merged test specification $\varphi_m$, merged test environment $T_{test}$, merged test policy $\pi_{test}^{*}$
1:  $\mathcal{C}_1, \mathcal{C}_2 \leftarrow$ Construct contracts for $\varphi_{test,1}$ and $\varphi_{test,2}$
2:  $T_{test} \leftarrow T_{test,1} \times T_{test,2}$  {Merged test environment}
3:  $T_{prod} \leftarrow T_{sys} \times T_{test}$  {Product transition system}
4:  $G \leftarrow$ Game graph from product transition system $T_{prod}$
5:  $\mathcal{C}_m \leftarrow \mathcal{C}_1 \bullet \mathcal{C}_2$ (strong merge)  {Constructing the merged specification}
6:  $\varphi_m \leftarrow$ Merged test specification from $\mathcal{C}_m$
7:  $G_{aux} \leftarrow$ Auxiliary game graph
8:  $\mathcal{B}_m \leftarrow$ Defining goal states and partial orders
9:  for $b \in \mathcal{B}_m$ do
10:      Partial order for goal $b$
11:      Receding horizon specification $\psi_k^{b}$ for goal $b$ at distance $k$
12:  end for
13:  $\mathcal{W} \leftarrow \bigcup_{k}\mathcal{W}_k$  {Test policy filter for goals at distance $k$}
14:  $\pi_{test}^{*} \leftarrow$ MCTS on $G_{aux}$  {Searching for test policy guided by $\mathcal{W}$}
15:  return  $\varphi_m$, $T_{test}$, $\pi_{test}^{*}$
Algorithm 1 Merge Unit Tests

4.4 Searching for a Test Policy

To find the merged test policy $\pi_{test}^{*}$, we use Monte Carlo Tree Search (MCTS), a search method that combines random sampling with the precision of a tree search. Using MCTS with an upper confidence bound (UCB) was introduced in [16] as upper confidence bound for trees (UCT), which guarantees that, given enough time and memory, the result converges to the optimal solution. We use MCTS to find $\pi_{test}^{*}$, the approximate solution to Problem 1 for the merged test. We apply the filter $\mathcal{W}$, generated according to the approach detailed in Section 4.3, to constrain the search space, as shown graphically in Figure 4.
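A minimal sketch of the UCB1 selection rule behind UCT (variable names ours; this is not the paper's implementation): each child is scored by its mean reward plus an exploration bonus that shrinks with its visit count.

```python
# Sketch: UCB1 child selection as used in UCT. `children` is a list of
# (total_reward, visit_count) pairs; unvisited children are tried first.

import math

def ucb_choice(children, c=math.sqrt(2)):
    total_visits = sum(n for _, n in children)
    def ucb(stats):
        q, n = stats
        if n == 0:
            return float("inf")         # force exploration of unvisited children
        return q / n + c * math.sqrt(math.log(total_visits) / n)
    return max(range(len(children)), key=lambda i: ucb(children[i]))

assert ucb_choice([(5.0, 10), (0.0, 0)]) == 1   # unvisited child is tried first
assert ucb_choice([(9.0, 10), (1.0, 10)]) == 0  # equal visits: better mean wins
```

In our setting, the filter $\mathcal{W}$ would additionally prune children whose states fall outside the winning set before this selection step.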

Proposition 2

Algorithm 1 is sound.


Proof. This follows by construction of the algorithm and the use of MCTS with UCB. Given a test policy $\pi_{test}$ and a system policy $\pi_{sys}$, every resulting execution starting from an initial state in $\mathcal{W}$ is guaranteed to satisfy the merged test specification by Theorem 4.1. This is because for any action chosen by the test environment according to the policy found by MCTS, we are guaranteed to remain in $\mathcal{W}$ for any valid system policy $\pi_{sys}$. If $\mathcal{W}$ is empty or the initial state is not in $\mathcal{W}$, the algorithm terminates before any rollout is attempted and no policy is returned. It can be shown that the probability of selecting the optimal action converges to 1 as the number of rollouts goes to infinity; for the convergence analysis of MCTS, please refer to [16]. ∎

4.4.1 Complexity Analysis

The time complexity of GR(1) synthesis is of the order $O(|V|^3)$, where $|V|$ is the size of the state space. To improve scalability, our algorithm uses a receding horizon approach to synthesize the winning sets, which reduces the time complexity significantly; please refer to [26]. The complexity of MCTS is $O(r\,b\,d\,i)$, where $r$ is the number of rollouts, $b$ the branching factor of the tree, $d$ the depth of the tree, and $i$ the number of iterations. In our approach, the filter $\mathcal{W}$ reduces the size of the search space; for a visualization, refer to Figure 4. The number of rollouts and iterations are design variables that can be chosen to ensure convergence. More details on the complexity of MCTS for the lane change example can be found in the Appendix.

Definition 13 (Coverage)

A test execution covers a test specification if the test execution non-trivially satisfies the test specification, that is, the specification is satisfied and the satisfaction is not vacuous. A set of test executions covers a set of test specifications iff, for each test specification in the set, there exists a test execution that covers it.
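Definition 13 can be checked directly once the satisfaction relation is available. In this sketch the non-trivial satisfaction check itself (e.g. via model checking against the specification) is abstracted into a caller-supplied `satisfies` predicate, which is an assumption of the example rather than part of the paper's framework.

```python
def covers(executions, specs, satisfies):
    """True iff every spec is non-trivially satisfied by some execution.

    executions: iterable of test executions
    specs: iterable of test specifications
    satisfies: predicate (execution, spec) -> bool for non-trivial satisfaction
    """
    return all(any(satisfies(e, s) for e in executions) for s in specs)
```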

Optimizing for the smallest set of test executions that covers a set of test specifications is combinatorial in the number of test specifications. In this work, we outlined an algorithm for merging two unit tests. In future work, given a set of unit tests, we will consider the problem of constructing a smaller set of merged test specifications with upper bounds on its cardinality.
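One simple heuristic for that future direction is greedy pairwise merging: repeatedly merge any mergeable pair of specifications until none remain. Here `can_merge` and `merge` are placeholders for the paper's merge operator and its feasibility check (e.g. whether the temporally constrained merged specification admits a test execution); they are assumptions of this sketch.

```python
def greedy_merge(specs, can_merge, merge):
    """Greedily merge pairs of specifications until no pair is mergeable.

    Returns a set of specifications whose cardinality is at most len(specs),
    with equality iff no two specifications could be merged (cf. Lemma 3).
    """
    specs = list(specs)
    changed = True
    while changed:
        changed = False
        for i in range(len(specs)):
            for j in range(i + 1, len(specs)):
                if can_merge(specs[i], specs[j]):
                    merged = merge(specs[i], specs[j])
                    specs = [s for k, s in enumerate(specs) if k not in (i, j)]
                    specs.append(merged)
                    changed = True
                    break
            if changed:
                break
    return specs
```

Greedy merging gives no optimality guarantee for the combinatorial problem above, but it never increases the number of required test executions.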

Lemma 3

Given a set of unit test specifications such that one test execution is required per specification to cover the set, merging unit tests results in a covering set of test executions whose cardinality is at most that of the original set of specifications. Equality holds iff no two unit tests in the set can be merged.


If at least one pair of test specifications in the set can be merged, we can characterize a set of merged test specifications whose cardinality is strictly smaller than that of the original set. Since each test specification in this set has one test execution, the number of test executions equals its cardinality. ∎

5 Examples

We implemented the examples as a discrete gridworld simulation in Python, where the system controller is non-deterministic and the test agents follow the test policy generated by our framework. We use the Temporal Logic and Planning Toolbox (TuLiP) to synthesize the winning sets [27] and online MCTS to find the test policy. Videos of the results can be found in the linked GitHub repository.

5.1 Lane Change

For our discrete lane change example, we define the robustness metric as the x-value of the cell in which the system finishes its lane change maneuver. We search for the test policy that satisfies the test specification in equation (8), as explained in Section 4. Snapshots of the resulting test execution are depicted in Figure 5.

Figure 5: Snapshots during the execution of the test generated by our framework. The system under test (red car) needs to merge onto the lower lane between the two test agents (blue cars).

5.1.1 Unprotected left-turn at intersection

Consider the example of an autonomous vehicle (AV) crossing an intersection with the intention of taking a left turn. The test agents are a car approaching the intersection from the opposite direction and a pedestrian crossing the crosswalk to the left of the AV under test. The intersection layout can be seen in Figure 6. The individual tests are defined to be waiting for a car, and waiting for a pedestrian, while taking a left turn. The unit specification for waiting for the pedestrian is defined according to equation (4), with


the relevant variables being the system coordinates, the initial state of the system, the set of desired goal states after the left turn, the pedestrian coordinates, and the states in which the car must wait for the pedestrian while the pedestrian occupies the corresponding crosswalk states. Similarly, we define the specification for waiting for the tester car (detailed in the Appendix).

Figure 6: Snapshots during the execution of the unprotected left turn test generated by our framework. The autonomous vehicle (AV) under test (red) should take an unprotected left turn and wait for the pedestrian and the car (blue) individually, which are agents of the test environment. In the snapshots at time steps 8 and 12, the AV waits just for the car, and in time step 21 it waits just for the pedestrian.

The robustness metric is assumed to be the time remaining until the traffic light changes to red, measured from the moment the system executes a successful left turn; minimizing this metric results in a difficult test execution. Next, we merge the unit test contracts and derive the resulting merged test specification. According to Proposition 1, this merged specification needs to include the temporal constraints defined in equation (9). In this example, waiting for the tester car and waiting for the pedestrian trivially imply each other: any execution in which the system waits at the intersection satisfies both unit specifications. Thus we need to find a test in which the system waits for just the tester car at some time during the test execution and for just the tester pedestrian at another time. We follow the approach detailed in Section 4.3.1 to generate the auxiliary graph for this example, with the terminal states corresponding to a successful left turn through the intersection after satisfying the temporally constrained merged test specification. The graph for this example is illustrated in Figure 3, with subscripts denoting the first and second unit test specifications. We then generate the test policy filter by constructing a partial order on the goal states and synthesizing the winning sets with the receding horizon strategy detailed in Section 4.3. Finally, we apply this test filter in MCTS to find the test policy. Figure 6 shows snapshots from a test execution resulting from a test policy generated by Algorithm 1. As expected, we see the system first waiting for the tester car to pass the intersection. Even after the tester car passes, the pedestrian is still traversing the crosswalk, causing the system to wait for the pedestrian, satisfying the temporally constrained merged test specification.
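The bookkeeping behind the auxiliary graph can be sketched as follows: along a rollout we track which unit specifications have been exclusively satisfied so far, and declare the temporally constrained merged test successful only once both have held at some (possibly different) time step. The predicates `waits_only_for_car` and `waits_only_for_ped` are hypothetical stand-ins for the paper's unit-test propositions.

```python
def merged_test_satisfied(trace, waits_only_for_car, waits_only_for_ped):
    """True iff, somewhere along the trace, the system waited for just the
    car and, at some other step, for just the pedestrian."""
    seen_car, seen_ped = False, False
    for state in trace:
        seen_car = seen_car or waits_only_for_car(state)
        seen_ped = seen_ped or waits_only_for_ped(state)
    return seen_car and seen_ped
```

The pair `(seen_car, seen_ped)` plays the role of the auxiliary-graph partition: transitions between its values mirror moving between the graph's copies of the state space.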

6 Conclusion and Future Work

In this work, we presented a framework for merging unit test specifications. While we applied this framework to two discrete-state examples in the self-driving domain, this framework can be applied to test other autonomous systems as well. This paper details the mathematical and algorithmic foundation for merging two unit tests. This technique could be used as a subroutine to optimize for a small set of tests that cover several unit specifications. The winning set structure of the unit specifications could be leveraged to decide which unit specifications should be merged. The scalability of our algorithm can be further improved by symbolic implementations to synthesize the test policy filter. Lastly, we would like to show the results of this framework on continuous dynamical systems with a discrete abstraction for which the test policy filter can be synthesized.


We thank Dr. Ioannis Filippidis, Dr. Tichakorn Wongpiromsarn, Íñigo Íncer Romeo, Dr. Qiming Zhao, Dr. Michel Ingham, and Dr. Karena Cai for valuable discussions that helped shape this work. The authors acknowledge funding from AFOSR Test and Evaluation program, grant FA9550-19-1-0302 and National Science Foundation award CNS-1932091.


  • [1] H. Abbas, G. Fainekos, S. Sankaranarayanan, F. Ivančić, and A. Gupta (2013) Probabilistic temporal logic falsification of cyber-physical systems. ACM Transactions on Embedded Computing Systems (TECS) 12 (2s), pp. 1–30. Cited by: §1.
  • [2] Y. Annpureddy, C. Liu, G. Fainekos, and S. Sankaranarayanan (2011) S-taliro: a tool for temporal logic falsification for hybrid systems. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 254–257. Cited by: §1.
  • [3] C. Baier and J. Katoen (2008) Principles of Model Checking. MIT press. Cited by: Definition 1.
  • [4] A. Benveniste, B. Caillaud, A. Ferrari, L. Mangeruca, R. Passerone, and C. Sofronis (2007) Multiple viewpoint contract-based specification and design. In International Symposium on Formal Methods for Components and Objects, pp. 200–225. Cited by: Definition 2.
  • [5] A. Benveniste, B. Caillaud, D. Nickovic, R. Passerone, J. Raclet, P. Reinkemeier, A. L. Sangiovanni-Vincentelli, W. Damm, T. A. Henzinger, K. G. Larsen, et al. (2018) Contracts for system design. Foundations and Trends in Electronic Design Automation 12 (2-3), pp. 124–400. Cited by: §2.0.1, §4.1.
  • [6] R. Bloem, B. Könighofer, R. Könighofer, and C. Wang (2015) Shield synthesis. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 533–548. Cited by: §4.3.3.
  • [7] E. W. Dijkstra (1975) Guarded commands, nondeterminacy and formal derivation of programs. Communications of the ACM 18 (8), pp. 453–457. Cited by: §2.0.1.
  • [8] T. Dreossi, A. Donzé, and S. A. Seshia (2019) Compositional falsification of cyber-physical systems with machine learning components. Journal of Automated Reasoning 63 (4), pp. 1031–1053. Cited by: