A Mutation-based Approach for Assessing Weight Coverage of a Path Planner

10/02/2019 ∙ by Thomas Laurent, et al.

Autonomous cars are subjected to many different kinds of inputs (other cars, road structure, etc.); testing the car under all possible conditions is therefore impossible. To tackle this problem, scenario-based testing for automated driving defines categories of different scenarios that should be covered. Although this kind of coverage is a necessary condition, it still does not guarantee that every possible behaviour of the autonomous car is tested. In this paper, we consider the path planner of an autonomous car, which decides, at each timestep, the short-term path to follow in the next few seconds; this decision is made using a weighted cost function that considers different aspects (safety, comfort, etc.). In order to assess whether all the possible decisions that can be taken by the path planner are covered by a given test suite T, we propose a mutation-based approach that mutates the weights of the cost function and then checks whether at least one scenario of T kills the mutant. Preliminary experiments on a manually designed test suite show that some weights are easier to cover, as they consider aspects that are more likely to occur in a scenario, and that more complicated scenarios (which generate more complex paths) are those that cover more weights.


I Introduction

Automated driving is a technology currently under intense development that promises to impact our lives in many ways. Applications of automated vehicles range from the transport of goods (automated freight) to personal mobility, with offers such as Tesla’s or Uber’s. As promising as the technology is, great care must be taken in evaluating and validating such systems, to avoid tragic accidents [13, 8].

Testing automated driving systems is critical for the satisfaction and safety of all stakeholders, but is also a very expensive operation. Therefore, it is essential to know when a system has been sufficiently tested. This is the question we focus on in this paper.

An autonomous driving system can be seen as a set of components that sense the environment of the vehicle, choose a path given an itinerary, and implement this path through concrete actions performed by actuators. In this paper, we consider a path planner component provided by our industry partner, which computes the best trajectory for the vehicle given a target destination. At every timestep, the path planner decides the short-term path that the car should follow in the next few seconds and the control commands that must be provided to implement it (such as acceleration and angle); in order to decide the next short-term path, an optimisation algorithm is employed. A set of short-term paths, going from the head of the car to a grid of points in front of the car, is enumerated. This is shown in Fig. 1, where the white car is the ego car and the red car is another, immobile car. The translucent cars represent the possible future positions sampled by the path planner, and the blue arrows the associated short-term paths.

Fig. 1: Illustration of the path planner.

Then, each short-term path is evaluated according to a cost function. The cost function considers different aspects, such as safety, vehicle limitation, regulation compliance, and comfort. Given the ranking of all the short-term paths, the one with the lowest cost is taken.

The path planner is tested using a simulator. The simulator takes as input a path planner and a scenario (a road configuration, together with a starting position, direction, and speed for the automated car, or ego car, and for the other objects on the road) and runs the path planner in this particular scenario, computing the path the ego car would take. Evaluating the output of the simulation, i.e., a path, is not done with a pass/fail oracle, as there is no comprehensive definition of what a valid path is. The presence or absence of a crash, for example, is not a good enough oracle: one can drive badly and not crash, or can take the best possible decision and still experience an unavoidable crash. Instead, we can define some metrics to capture some measures of interest (e.g., the minimum distance of the ego car from all the other objects along the path); these metrics can be used to evaluate a path or to compare paths computed by different path planners for the same scenario.

In testing, coverage criteria are used to evaluate the quality of a test suite $T$, i.e., whether $T$ is sufficient for testing the System Under Test. For example, classical structural criteria check that the code has been covered sufficiently. In scenario-based testing [10] for automated driving (as done with our simulator), the main approaches aim at covering all the different traffic situations (e.g., number and positions of the other cars) and the manoeuvres done by the ego car [5]. To support this kind of testing, ontologies regarding driving behaviours, road topologies, environmental conditions, etc. have been devised (see the eight documents in [3]). Although this kind of coverage is also necessary for our path planner, it still does not guarantee coverage of all possible behaviours of the path planner.

Therefore, in this paper, we propose a definition of what it means to sufficiently test a path planner. At a very high level, we want to check that all the possible decisions that can be taken by the path planner are observed in at least one test. This is difficult because, in general, we do not know which scenarios lead to some given decisions; this is why the coverage of scenario elements (as in [5]) may not be sufficient, and so we need a different criterion. Moreover, we do not even have a proper characterisation of the different decisions, i.e., given two short-term paths computed by the path planner, we cannot say whether they have been taken for the same reasons (i.e., for respecting the same aspects).

Although it is impossible to evaluate directly whether all possible decisions are covered, we propose an indirect way of assessing this. The path planner can be seen as a weighted function of the different aspects listed before (i.e., safety, comfort, etc.). For each aspect, the path planner has one or more weight(s) set by the system designer. Such weights represent how important that aspect is in selecting a short-term path (we call this selection a decision). We claim that a minimal condition for testing the path planner is that each weight is shown to be “relevant” in at least one decision in one test. We say that a weight $w$ is covered by a scenario $s$ (i.e., that $w$ is “relevant” to at least one decision taken for $s$) if, using a different value for $w$ in the path planner, the computed path is different from the path computed with the original weight. Indeed, if changing the weight in all possible ways does not affect any decision, it means that the aspect considered by $w$ is irrelevant in that scenario.

Since trying all the possible alternative weights is infeasible (as their number is very large), we propose a mutation-based approach [11] that is able to estimate the weight coverage of a given test suite $T$. The approach consists in mutating a weight $w$ with a finite set of mutation operators; $w$ is considered covered by a test suite $T$ if, in at least one test in $T$, the path computed by the mutated path planner is different from the path computed by the original path planner, according to a mutation oracle. We propose three mutation oracles that provide different guarantees in terms of coverage: the path oracle simply compares the two paths point-wise, the safety oracle compares the minimum distance of the ego car w.r.t. the other objects in the two paths, and the comfort oracle compares the smoothness of the paths. Note that some mutation oracles, such as the path oracle, are more likely to say that two paths are different, while others, such as the comfort oracle, are more demanding and require bigger differences in order to consider two paths different; in general, stronger mutation oracles provide stronger guarantees that two paths are significantly different [11].

The rest of this paper is structured as follows: Section II introduces some core definitions, and Section III presents our definition of weight coverage and a mutation-based approach to estimate it. Section IV presents some experiments we performed to evaluate our approach and discusses their results. Section V further discusses some insights from the experiments, and Section VI tackles some threats that could affect the validity of the proposed approach. Finally, Section VII reviews some related work, and Section VIII concludes the paper.

II Definitions

In the following, we provide some definitions related to the path planner and its simulator.

Definition 1 (Scenario).

A scenario $s$ describes the environment in which the ego car is operating. It consists of:

  • a map describing the road structure;

  • an initial position, speed, acceleration, and direction of the ego car;

  • a target destination of the ego car;

  • a set of static objects; each static object is characterised by its position in the map, and its size (length and width);

  • a set of dynamic objects; each dynamic object, in addition to position and size, is also characterised by its initial speed, acceleration, and direction;

  • a timeout; the scenario must be run until this time.

We will use the dot notation to access a particular field of a scenario.

For the sake of conciseness, in the following, we consider static objects as dynamic objects having no velocity and no acceleration.

Definition 2 (Path).

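A path $p$ is a sequence of tuples $\langle t, \mathit{loc}, \mathit{dir}, v, a \rangle$, where each tuple identifies a timestamp $t$, a location $\mathit{loc}$ in the map, a direction $\mathit{dir}$, a speed $v$, and an acceleration $a$. We use the dot notation to access tuple fields at a given time (e.g., $p(t).\mathit{loc}$).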

Note that, for each dynamic object $do$ of a scenario $s$, we can automatically compute its path up to the timeout; we will write $p_{do}$ for this path. (In the path planner simulator we are using, the behaviour of dynamic objects does not depend on the current situation, but only on the initial conditions specified in the scenario. For this reason, we can compute the path offline.)

Definition 3 (Path Planner).

A path planner PP can be seen as a function that, given a scenario $s$, produces a path $p$ for the ego car up to the scenario timeout; formally, $\mathit{PP}(s) = p$. We call each pair of consecutive tuples of $p$ a short-term path: it corresponds to a decision taken by the path planner.

II-A Evaluation metrics (path quality metrics)

Given a scenario $s$, we can define different metrics characterising the whole path computed by the path planner. In the following, let $p$ be the path computed by the path planner for the ego car, and $p_{do_1}$, …, $p_{do_n}$ be the paths of the dynamic objects $do_1$, …, $do_n$ of $s$.

Safety metric

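The first metric provides a quantitative evaluation of how safe the chosen path is. It is defined as the minimum distance between the ego car and any other object along the path:

$$M_{\mathit{safety}}(p) = \min_{t} \; \min_{i \in \{1, \dots, n\}} d\big(p(t).\mathit{loc},\; p_{do_i}(t).\mathit{loc}\big)$$

where $p$ is the path of the ego car, $p_{do_i}$ is the path of the $i$-th dynamic object, $p(t).\mathit{loc}$ is the location at time $t$, and $d$ is the Euclidean distance between two points.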

Comfort metric

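This metric assesses how comfortable the path has been for the driver. It is defined as the maximum acceleration along the path:

$$M_{\mathit{comfort}}(p) = \max_{t} \; p(t).a$$

where $p(t).a$ is the acceleration of the ego car at time $t$.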

Note that other comfort metrics could be defined in terms of, e.g., maximum torque or maximum lateral acceleration.
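The two metrics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: paths are represented as lists of dictionaries sampled at the same timesteps, and the keys `loc` and `acc`, the function names, and the toy data are all hypothetical rather than taken from the simulator.

```python
import math

def safety_metric(ego_path, object_paths):
    """Minimum Euclidean distance between the ego car and any object over all timesteps."""
    return min(
        math.dist(ego_step["loc"], obj_path[t]["loc"])
        for t, ego_step in enumerate(ego_path)
        for obj_path in object_paths
    )

def comfort_metric(ego_path):
    """Maximum acceleration of the ego car along the path."""
    return max(step["acc"] for step in ego_path)

# Toy data (made up for illustration): three timesteps, one dynamic object.
ego = [
    {"loc": (0.0, 0.0), "acc": 0.0},
    {"loc": (1.0, 0.0), "acc": 1.5},
    {"loc": (2.0, 0.0), "acc": 0.5},
]
other = [{"loc": (0.0, 3.0)}, {"loc": (1.0, 2.0)}, {"loc": (5.0, 4.0)}]

print(safety_metric(ego, [other]))  # 2.0
print(comfort_metric(ego))          # 1.5
```

Note that `safety_metric` raises a `ValueError` when the scenario contains no other object, since the minimum over an empty set is undefined.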

III Proposed approach

In this paper, we are interested in defining sufficiency criteria for path planner testing. A path planner can take different decisions on the basis of the different environmental and driving conditions in which it is operating; we would like to check that all the possible decisions that can be taken by the path planner are observed in at least one test. However, we do not precisely know which scenarios cause a particular decision. Moreover, we cannot even characterise all the possible decisions taken by the path planner, i.e., given two decisions (two short-term paths computed by the path planner), we do not know whether they have been selected for the same reason. However, we can exploit the architecture of the particular path planner under test in order to create a proxy for these decisions. In this section, we describe how we propose to do this using a mutation-based approach.

III-A Path planner under test

The path planner provided by our industrial partner works as follows. At each timestep, it chooses which short-term path to follow in the next time period (see Def. 3). In order to do this, it enumerates a set of possible short-term paths, and scores them using a weighted cost function that considers different aspects:

  • Safety: no collision with moving or static objects must happen and safety distances must be respected;

  • Vehicle Limitation: actions that cannot be achieved by the car must be avoided (e.g., no impossible steering can be required to follow a path);

  • Compliance: the car should respect road regulations as much as possible;

  • Comfort: the path should be as comfortable as possible for the passenger, avoiding too much forward and/or lateral acceleration.

In particular, the cost function uses these weights for the different aspects:

  • a factor that is multiplied with the maximum lateral acceleration along the short-term path;

  • a constant that is added to the total cost if the maximum lateral acceleration is over a given threshold;

  • a constant that is added to the total cost if the speed is over a given speed limit;

  • a constant that is added to the total cost if the maximum acceleration along the short-term path is over a certain threshold;

  • a constant that is added to the total cost if the maximum deceleration along the short-term path is over a given threshold;

  • a constant that is added to the total cost if the curvature along the short-term path is over a given threshold.

As such, the different decisions that the system can take are tightly dictated by these weights. Note that five of the weights are related to the safety aspect, three to the comfort aspect, one to the compliance aspect, and three to the vehicle limitation aspect. A weight can be associated with more than one aspect: for example, one weight is associated with both the safety and comfort aspects, and another with all the aspects.

Our industrial partner provided us with a version of the path planner that has been calibrated with a satisfactory set of weight values $W$. In the following, we write $\mathit{PP}_W$ for the path planner configured with weight values $W$.
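To make the role of the six weights concrete, the following sketch scores a few candidate short-term paths and picks the cheapest one. It is not the industrial path planner: the field names, weight values, and thresholds are hypothetical, and only the six cost terms listed above are modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """Summary of one candidate short-term path (field names are hypothetical)."""
    name: str
    max_lat_acc: float
    speed: float
    max_acc: float
    max_dec: float
    curvature: float

# Hypothetical weight values and thresholds; the calibrated ones are not given in the text.
WEIGHTS = {"lat_factor": 1.0, "lat_penalty": 50.0, "speed_penalty": 100.0,
           "acc_penalty": 20.0, "dec_penalty": 20.0, "curv_penalty": 30.0}
LIMITS = {"lat": 2.0, "speed": 14.0, "acc": 2.5, "dec": 3.0, "curv": 0.2}

def cost(c, w, lim=LIMITS):
    """Weighted cost: one multiplicative term plus five threshold penalties."""
    total = w["lat_factor"] * c.max_lat_acc
    if c.max_lat_acc > lim["lat"]:
        total += w["lat_penalty"]
    if c.speed > lim["speed"]:
        total += w["speed_penalty"]
    if c.max_acc > lim["acc"]:
        total += w["acc_penalty"]
    if c.max_dec > lim["dec"]:
        total += w["dec_penalty"]
    if c.curvature > lim["curv"]:
        total += w["curv_penalty"]
    return total

def decide(candidates, w):
    """The decision is the candidate short-term path with the lowest cost."""
    return min(candidates, key=lambda c: cost(c, w))

CANDIDATES = [
    Candidate("keep_lane", 0.1, 15.0, 0.5, 0.0, 0.0),  # exceeds the speed limit
    Candidate("swerve", 2.5, 12.0, 1.0, 0.0, 0.3),     # high lateral acceleration and curvature
    Candidate("brake", 0.2, 10.0, 0.0, 3.5, 0.0),      # hard deceleration
]

print(decide(CANDIDATES, WEIGHTS).name)  # brake
```

With these made-up numbers the costs are 100.1, 82.5, and 20.2, so braking wins; lowering the speed penalty enough would make keeping the lane the cheapest option instead.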

III-B Weight coverage

Since the weights are strictly bound to the aspects that are considered in the decisions, we propose to map the coverage of the possible decisions to the coverage of the weights used to make the decisions.

Therefore, in this section, we propose a way to assess whether a weight is involved in a decision and, in Section III-C, a technique for measuring the sufficiency of a given test suite $T$ in testing the weights $W$.

Definition 4 (Weight coverage criterion).

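Given a path planner $\mathit{PP}_W$ with weights $W = \{w_1, \dots, w_n\}$, a test scenario $s$ covers a weight $w_i \in W$ w.r.t. a metric $M$ if there exists a weight value $w_i' \neq w_i$ such that $M(\mathit{PP}_{W'}(s)) \neq M(\mathit{PP}_W(s))$, with $W' = (W \setminus \{w_i\}) \cup \{w_i'\}$.

Intuitively, a test scenario covers a weight if, with another value of the weight, the path planner behaves differently according to metric $M$. A good test suite should then cover all the weights in $W$.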

Note that weight coverage has similarities with the MC/DC coverage criterion [2] for Boolean expressions, in which each clause must be shown to determine the value of the global predicate in a test: given an assignment of truth values, a clause $c$ determines the value of the global predicate $P$ if flipping the value of $c$ changes the value of $P$. In our case, for each weight $w$, we want to have a test in which the aspect considered by $w$ has some influence on the final decision taken by the path planner; we want to show that, by modifying the weight in some way, we can also modify the decision.
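The notion of a clause determining a predicate, which the analogy relies on, can be illustrated with a minimal sketch. The helper `determines` and the example predicate are hypothetical and only serve to show the flipping test.

```python
def determines(predicate, assignment, clause):
    """True if flipping `clause` alone changes the value of `predicate`."""
    flipped = {**assignment, clause: not assignment[clause]}
    return predicate(**assignment) != predicate(**flipped)

def p(a, b, c):
    # Example global predicate with three clauses.
    return a and (b or c)

# With a true and c false, clause b decides the outcome.
print(determines(p, {"a": True, "b": False, "c": False}, "b"))   # True
# With a false, the predicate is false whatever b is.
print(determines(p, {"a": False, "b": False, "c": False}, "b"))  # False
```

Weight coverage asks the analogous question for a numeric weight: is there a test in which changing the weight changes the decision?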

Remark 1.

The path planner, in order to decide the next short-term path in a scenario $s$, assigns a numerical cost to each element of a set of possible short-term paths $\{c_1, \dots, c_m\}$, using a cost function $f_W$ that depends on the weights $W$; then, it selects the candidate with the lowest cost. Changing a weight $w$ in $W$ will change the cost of a given short-term path $c_j$ from $f_W(c_j)$ to $f_{W'}(c_j) = f_W(c_j) + \delta_j$. If the weight $w$ considers an aspect that is relevant for the scenario $s$, $\delta_j$ will be different for the different short-term paths, and so their ranking (and thus the final decision) could be modified. Instead, if weight $w$ considers an aspect that is irrelevant for the scenario $s$, the costs of all the possible short-term paths will be modified by the same value $\delta$; therefore, the ranking of the possible short-term paths will not be affected and the same short-term path (i.e., the one selected with the original weights) will be selected as the final decision.
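The remark can be checked on a toy example. The sketch below assumes a cost that is a plain weighted sum of per-candidate features; the feature names, values, and the two candidate sets are invented for illustration.

```python
def cost(features, weights):
    """Weighted sum of the features of one candidate short-term path."""
    return sum(weights[name] * features[name] for name in weights)

def decide(candidates, weights):
    """Name of the candidate with the lowest cost."""
    return min(candidates, key=lambda c: cost(candidates[c], weights))

W = {"lat_acc": 1.0, "over_limit": 10.0}

# Both candidates exceed the speed limit: the aspect is irrelevant to the choice.
irrelevant = {"A": {"lat_acc": 1.0, "over_limit": 1.0},
              "B": {"lat_acc": 2.0, "over_limit": 1.0}}
# Only candidate A exceeds the speed limit: the aspect is relevant.
relevant = {"A": {"lat_acc": 1.0, "over_limit": 1.0},
            "B": {"lat_acc": 2.0, "over_limit": 0.0}}

for k in (0, 10):
    mutated = {**W, "over_limit": k * W["over_limit"]}
    print(k, decide(irrelevant, mutated), decide(relevant, mutated))
```

In the first candidate set, every mutation of the `over_limit` weight shifts both costs by the same amount, so candidate A is always selected. In the second set, the original weights select B, while cancelling the weight (constant 0) makes A cheaper, so the decision changes.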

III-C Mutation-based approximation of weight coverage

As we cannot exhaustively evaluate the weight coverage of a test suite $T$ (the weights having continuous values), we propose a mutation-based approach to estimate whether or not $T$ covers the different weights.

In the following, we describe the mutation operators we use to generate mutants, some oracles that we use to assess whether a test kills a mutant, and finally how we use these for estimating weight coverage.

III-C1 Mutation operators

In this work, we are only concerned with the coverage of the test suite $T$ w.r.t. each individual weight $w$. Thus, we propose a simple mutation operator: each mutant differs only in the value of a weight $w$, which is multiplied by a constant $k$, i.e., $w' = k \cdot w$. In order to explore different ranges for each weight, we use the following values of $k$: 0, 0.5, 0.9, 1.1, 1.5, 2, 10. This leads to seven versions of the operator, which we refer to by their constant $k$. These factors were chosen to sample the space of possible weight values. In particular, 0 and 10 represent extreme changes, with 0 completely cancelling the effect of a weight. The other values of $k$ let us explore the effect of different scales of change to the weight values.

In the following, we write $\mathit{PP}_{w,k}$ for the path planner obtained from $\mathit{PP}_W$ by mutating weight $w$ with the mutation operator with constant $k$.
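Generating the mutated weight sets is mechanical. The sketch below assumes the weights are stored in a dictionary; the weight names and values are hypothetical placeholders, and only the seven constants come from the text.

```python
FACTORS = (0, 0.5, 0.9, 1.1, 1.5, 2, 10)

def mutants(weights):
    """Yield (weight name, constant, mutated weights) for every single-weight mutant."""
    for name, value in weights.items():
        for k in FACTORS:
            # Copy the original weights and scale exactly one of them.
            yield name, k, {**weights, name: k * value}

# Hypothetical weight names and values.
W = {"lat_factor": 1.0, "lat_penalty": 50.0, "speed_penalty": 100.0,
     "acc_penalty": 20.0, "dec_penalty": 20.0, "curv_penalty": 30.0}

all_mutants = list(mutants(W))
print(len(all_mutants))  # 42
```

With six weights and seven constants this yields 42 mutants. A weight whose original value is 0 would be left unchanged by every constant, so such a weight could never be covered with this operator.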

Remark 2.

Note that our mutation operators are not meant to be related to fault classes as in classical mutation analysis, i.e., they are not meant to replicate possible faults. They are used to artificially perturb the path planner, such that it possibly takes different decisions due to the mutated weight. As future work, we could design more targeted mutation operators, based on system and domain knowledge.

III-C2 Mutation oracles

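In order to assess whether a mutant has been killed, we need to compare the paths computed by the original path planner $\mathit{PP}_W$ and the mutated one $\mathit{PP}_{w,k}$. In the following, given a scenario $s$, let $p$ and $p'$ be the two paths computed by the two path planners, and $p_{do_1}$, …, $p_{do_n}$ be the paths of the dynamic objects of $s$. The mutated path planner is considered killed by $s$ if $p$ and $p'$ are sufficiently different.

In order to assess this, we can use different mutation oracles that differ in the characteristics of the paths they consider (e.g., safety or comfort). We devised the following oracles, each defined as a predicate $O(s, w, k)$ that tells whether a scenario $s$ kills the path planner obtained by mutating weight $w$ with the mutation operator with constant $k$.

  • Path oracle (PO): given a threshold $\tau_{PO}$, the mutated path planner is considered killed if there is a timestep at which the difference in the position of the ego car in the two paths is greater than $\tau_{PO}$, i.e.,

    $$\mathit{PO}(s, w, k) \iff \exists t \colon d\big(p(t).\mathit{loc},\; p'(t).\mathit{loc}\big) > \tau_{PO}$$

    where $p(t).\mathit{loc}$ is the location of the ego car at time $t$ and $d$ is the Euclidean distance.

  • Safety oracle (SO): given a threshold $\tau_{SO}$, the mutated path planner is killed if the difference of the minimum distances (with the dynamic objects) of the two paths is greater than $\tau_{SO}$, i.e.,

    $$\mathit{SO}(s, w, k) \iff \big| M_{\mathit{safety}}(p) - M_{\mathit{safety}}(p') \big| > \tau_{SO}$$

    where $M_{\mathit{safety}}$ is the safety metric of Sect. II-A.

  • Comfort oracle (CO): given a threshold $\tau_{CO}$, the mutated path planner is killed if the difference of the comfort measure in the two paths is greater than $\tau_{CO}$, i.e.,

    $$\mathit{CO}(s, w, k) \iff \big| M_{\mathit{comfort}}(p) - M_{\mathit{comfort}}(p') \big| > \tau_{CO}$$

    where $M_{\mathit{comfort}}$ is the comfort metric of Sect. II-A.

Thresholds $\tau_{PO}$, $\tau_{SO}$, and $\tau_{CO}$ must be selected by the domain expert, who can tune how much difference must be observed in order to declare a mutant killed.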
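The three oracles can be sketched as small predicates over a pair of paths. As before, this is an illustration under assumptions: paths are lists of dictionaries sampled at the same timesteps, the keys `loc` and `acc` and all function names are hypothetical, and the toy paths are invented.

```python
import math

def min_distance(path, object_paths):
    """Safety metric: minimum distance between the ego car and any object."""
    return min(math.dist(step["loc"], obj[t]["loc"])
               for t, step in enumerate(path) for obj in object_paths)

def max_acceleration(path):
    """Comfort metric: maximum acceleration along the path."""
    return max(step["acc"] for step in path)

def path_oracle(p, p_mut, tau=0.0):
    # Killed if the ego positions differ by more than tau at some timestep.
    return any(math.dist(a["loc"], b["loc"]) > tau for a, b in zip(p, p_mut))

def safety_oracle(p, p_mut, object_paths, tau=0.0):
    # Killed if the minimum distances of the two paths differ by more than tau.
    return abs(min_distance(p, object_paths) - min_distance(p_mut, object_paths)) > tau

def comfort_oracle(p, p_mut, tau=0.0):
    # Killed if the maximum accelerations of the two paths differ by more than tau.
    return abs(max_acceleration(p) - max_acceleration(p_mut)) > tau

# Toy paths: the mutant deviates laterally at the second timestep only.
original = [{"loc": (0.0, 0.0), "acc": 1.0}, {"loc": (1.0, 0.0), "acc": 0.5},
            {"loc": (2.0, 0.0), "acc": 0.0}]
mutated = [{"loc": (0.0, 0.0), "acc": 1.0}, {"loc": (1.0, -0.5), "acc": 0.5},
           {"loc": (2.0, 0.0), "acc": 0.0}]
parked = [{"loc": (0.0, 1.0)}] * 3  # a static object

print(path_oracle(original, mutated))              # True
print(safety_oracle(original, mutated, [parked]))  # False
print(comfort_oracle(original, mutated))           # False
```

In this example the mutant follows a different path, so the path oracle kills it, but the closest approach to the parked object and the maximum acceleration are unchanged, so the safety and comfort oracles do not.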

The different measures assess different aspects of the system but are also more or less “strict”. For example, mutants are “easiest” to kill with the path oracle, which should be subsumed by the other oracles (for equivalent thresholds). This idea is loosely related to the idea of weak and strong mutation testing [11] in classic software testing, where weak mutation measures changes in the internal state of the program caused by mutants, while strong mutation considers only changes to the output.

III-C3 Estimating weight coverage

We can now describe a way to estimate weight coverage. From an original valuation $W$ of weights for the path planner PP, we create mutants by applying the mutation operators described before, changing the value of each weight in $W$ in turn, thereby obtaining mutated versions $\mathit{PP}_{w,k}$ of the path planner. We then run the test suite $T$ against each $\mathit{PP}_{w,k}$ and determine which mutants are killed by $T$ according to our different oracles. Finally, following our initial definition of weight coverage (see Def. 4), we estimate that a weight $w$ is covered w.r.t. a mutation oracle $O$ (with $O \in \{\mathit{PO}, \mathit{SO}, \mathit{CO}\}$) if one of its mutants is killed in a scenario, i.e.,

$$\mathit{cov}_O(w) \iff \exists s \in T,\ \exists k \in \{0, 0.5, 0.9, 1.1, 1.5, 2, 10\} \colon O(s, w, k) \qquad (1)$$

By Def. 4, if $\mathit{cov}_O(w)$ holds, then $w$ is also covered under the weight coverage criterion. If $\mathit{cov}_O(w)$ does not hold, we estimate that weight coverage does not hold either, assuming that the mutation operators are a good proxy of all the possible weight changes (see Sect. VI for a more detailed discussion on this point). In the remainder of the text, we use the phrase weight coverage for both the coverage and its mutation-based approximation.
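The estimation procedure amounts to a pair of nested loops. The sketch below uses a toy planner (a weighted sum over candidate features at each step) and a simple path-difference oracle as stand-ins; the planner, the oracle, the weight names, and the scenario are hypothetical and only illustrate the structure of Eq. 1.

```python
FACTORS = (0, 0.5, 0.9, 1.1, 1.5, 2, 10)

def plan(weights, scenario):
    """Toy planner: at each step, pick the candidate with the lowest weighted cost."""
    def cost(features):
        return sum(weights[name] * value for name, value in features.items())
    return [min(step, key=lambda c: cost(step[c])) for step in scenario]

def path_differs(path, mutated_path):
    """Stand-in for a mutation oracle: any difference in the decisions kills the mutant."""
    return path != mutated_path

def weight_coverage(weights, test_suite, oracle=path_differs):
    """A weight is covered if some scenario kills at least one of its mutants."""
    coverage = {}
    for name, value in weights.items():
        coverage[name] = any(
            oracle(plan(weights, s), plan({**weights, name: k * value}, s))
            for s in test_suite
            for k in FACTORS
        )
    return coverage

# Hypothetical weights; one scenario with a single decision between two candidates.
W = {"lat": 1.0, "speed": 10.0, "curv": 5.0}
scenario = [{"A": {"lat": 1.0, "speed": 1.0}, "B": {"lat": 3.0, "speed": 0.0}}]

print(weight_coverage(W, [scenario]))
# {'lat': True, 'speed': True, 'curv': False}
```

In this toy run, `lat` is covered only by the constant 10 and `speed` only by the constant 0, while `curv` plays no role in the scenario and stays uncovered, which mirrors the situation in which a test suite needs an additional scenario.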

IV Experiments

In order to evaluate the approach, we designed a test suite $T$ composed of 10 scenarios, whose descriptions are reported in Table I.

ID          Description
$s_1$       The ego car is proceeding on a lane and two dynamic objects cross the street closely in front of it.
$s_2$       The ego car is proceeding on a lane, following a slowing dynamic object and with a faster dynamic object coming from behind.
$s_3$       The ego car is proceeding on a lane and a dynamic object is proceeding in the opposite direction on a different lane.
$s_4$       The ego car is proceeding on a lane, encounters a parked car, and overtakes it.
$s_5$       Similar to $s_4$, but there is another car coming from the opposite direction. The ego car has enough time to overtake the parked car before the other car arrives.
$s_6$       Similar to $s_5$, but the ego car must let the other car pass before overtaking the parked car (as there is not enough time before).
$s_7$       At a crossing, the ego car must turn right, while a dynamic object crosses the intersection from the opposite direction. The ego car must let the object pass before turning.
$s_8$       At a crossing, the ego car must turn right, while a dynamic object is approaching the intersection from the right. The ego car must slow down and let the dynamic object pass.
$s_9$       At a crossing, the ego car turns right and, just after the turn, it encounters a dynamic object coming against the flow of traffic in its target lane.
$s_{10}$    The ego car is approaching from behind a dynamic object that is slowing down.
TABLE I: Description of the test suite scenarios

Note that the path planner is designed to work in countries, such as Ireland and Japan, that adopt left-hand traffic. While designing the test suite, we tried to cover different kinds of manoeuvres (e.g., going straight, overtaking a parked car, turning at a crossroad, giving precedence to another car, etc.). All the scenarios have been designed manually, except for one scenario, which was found using a search algorithm with the aim of obtaining a dangerous situation.

Then, we mutated the six weights of the original path planner (see Sect. III-A) using the seven mutation operators described in Sect. III-C1. Therefore, in total we have $6 \times 7 = 42$ mutated versions of the path planner; as before, we write $\mathit{PP}_{w,k}$ for the path planner obtained from the original one by mutating weight $w$ with the mutation operator with constant $k$.

We then ran the designed test suite $T$ on the original path planner and the 42 mutated versions; we collected all the produced paths and computed the mutation oracles as specified in Sect. III-C2. For the experiments, the thresholds of the three mutation oracles have been set to 0: in this way, it is easier to compare the killing strength of each oracle.

We evaluated the approach using four research questions.

  • RQ1: What is the weight coverage of the designed test suite?

We are interested in assessing how much the designed test suite covers the path planner weights. Table II reports, for each weight, its coverage (either True or False) according to the three mutation oracles (see Eq. 1).

                                        Mutation oracle
Weight                                  PO    SO    CO
1. lateral acceleration factor          T     T     T
2. lateral acceleration penalty         T     T     T
3. speed limit penalty                  T     F     F
4. acceleration penalty                 T     T     T
5. deceleration penalty                 T     T     T
6. curvature penalty                    T     F     F
TABLE II: Weight coverage (T: covered, F: not covered). Weights are listed in the order of Sect. III-A.

The four weights related to (lateral) acceleration and deceleration (weights 1, 2, 4, and 5 in the order of Sect. III-A) are covered by the test suite with all the mutation oracles. The fact that they are all covered means that the test suite contains tests in which the acceleration has some effect on the decision taken by the path planner. Moreover, they are covered not only with the path oracle (which is a weak criterion for declaring a mutant killed), but also with the safety and comfort oracles, which are more demanding: this means that the killed mutants change both the minimum distance with the other dynamic objects (considered in the safety oracle) and the maximum acceleration (considered in the comfort oracle).

The weights related to the speed limit and to sudden changes of direction (weights 3 and 6), instead, are only covered by the path oracle. This means that, although the mutants can slightly change the taken path, they do not affect the minimum distance with the other cars or the maximum acceleration.

  • RQ2: What is the weight coverage provided by each single scenario?

We want to conduct a deeper analysis on the coverage provided by each single scenario. Table III reports, for the three mutation oracles, whether a given scenario covers a given weight.

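            Weight
s           1      2      3      4      5      6      Count
$s_1$       T      T      T      T      F      F      4/6
$s_2$       F      T      F      T      F      T      3/6
$s_3$       F      F      F      F      F      F      0/6
$s_4$       T      T      F      F      F      F      2/6
$s_5$       T      T      F      T      T      F      4/6
$s_6$       F      T      F      T      T      F      3/6
$s_7$       T      T      F      T      T      F      4/6
$s_8$       T      T      F      T      T      F      4/6
$s_9$       T      T      F      T      T      F      4/6
$s_{10}$    T      F      F      T      T      F      3/6
Count       7/10   8/10   1/10   8/10   6/10   1/10
(a) Mutation oracle PO

            Weight
s           1      2      3      4      5      6      Count
$s_1$       F      T      F      F      F      F      1/6
$s_2$       F      T      F      F      F      F      1/6
$s_3$       F      F      F      F      F      F      0/6
$s_4$       T      F      F      F      F      F      1/6
$s_5$       T      T      F      T      T      F      4/6
$s_6$       F      T      F      F      T      F      2/6
$s_7$       T      T      F      T      T      F      4/6
$s_8$       F      F      F      F      T      F      1/6
$s_9$       T      T      F      F      F      F      2/6
$s_{10}$    T      F      F      T      T      F      3/6
Count       5/10   6/10   0/10   3/10   5/10   0/10
(b) Mutation oracle SO

            Weight
s           1      2      3      4      5      6      Count
$s_1$       F      F      F      F      F      F      0/6
$s_2$       F      T      F      F      F      F      1/6
$s_3$       F      F      F      F      F      F      0/6
$s_4$       F      F      F      F      F      F      0/6
$s_5$       F      F      F      F      F      F      0/6
$s_6$       F      F      F      F      F      F      0/6
$s_7$       T      T      F      T      F      F      3/6
$s_8$       F      F      F      F      F      F      0/6
$s_9$       F      F      F      F      F      F      0/6
$s_{10}$    F      F      F      F      T      F      1/6
Count       1/10   2/10   0/10   1/10   1/10   0/10
(c) Mutation oracle CO

TABLE III: Weight coverage by scenario (T: covered, F: not covered). Rows follow the scenario order of Table I. Weight columns 1 to 6 follow the order of the weight list in Sect. III-A: lateral acceleration factor, lateral acceleration penalty, speed limit penalty, acceleration penalty, deceleration penalty, curvature penalty.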

Considering mutation oracle PO, we observe that scenarios $s_1$, $s_5$, $s_7$, $s_8$, and $s_9$ (where $s_i$ denotes the $i$-th scenario of Table I) cover more than half of the weights (4/6); these scenarios are among the most complicated ones (see the description in Table I), in which different aspects must be taken into consideration; this also partially holds for scenarios $s_2$, $s_6$, and $s_{10}$, which cover half of the weights. Scenario $s_3$ does not cover any weight, as it simply describes a situation in which the ego car is going straight, and not many factors influence the decision of the path planner in this case.

Regarding the mutation oracle SO, in general, scenarios cover fewer weights than with the mutation oracle PO: this is expected, as SO subsumes PO (i.e., if the minimum distance is different, the path must be different as well, but not the other way round). However, scenarios $s_5$, $s_7$, and $s_{10}$ cover the same weights with the two oracles; this means that, for these scenarios, the mutants always lead to a different path in which the minimum distance with the dynamic objects is affected (either smaller or larger). Indeed, using the original path planner, the ego car gets quite close to the dynamic objects, and so it is reasonable that any change in the path also affects the minimum distance.

Regarding the mutation oracle CO, only three scenarios cover some weight. Scenario $s_7$ achieves the highest coverage, covering half of the weights; this is due to the fact that the change of the weights leads to either a greater maximum acceleration to cross before the incoming car, or a lower maximum acceleration to let the other car pass, depending on the mutant (see the scenario description in Table I).

  • RQ3: How many scenarios cover each weight?

We now want to assess how easy it is to cover a weight; we assume that the more scenarios cover a weight, the easier it is to cover it. The last rows of the tables in Table III report the count of scenarios covering a given weight.
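The counts in the last row and last column of each sub-table can be recomputed from the coverage matrix. The short sketch below does this for the PO results of Table III(a); the encoding of each row as a string of T/F characters is just a convenience chosen for this example.

```python
# Rows: the ten scenarios of Table III(a), in order; columns: the six weights.
PO = ["TTTTFF", "FTFTFT", "FFFFFF", "TTFFFF", "TTFTTF",
      "FTFTTF", "TTFTTF", "TTFTTF", "TTFTTF", "TFFTTF"]

# Number of weights covered by each scenario (last column of the table).
per_scenario = [row.count("T") for row in PO]

# Number of scenarios covering each weight (last row of the table).
per_weight = [sum(row[j] == "T" for row in PO) for j in range(6)]

print(per_scenario)  # [4, 3, 0, 2, 4, 3, 4, 4, 4, 3]
print(per_weight)    # [7, 8, 1, 8, 6, 1]
```

The same two comprehensions apply unchanged to the SO and CO matrices.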

Using the mutation oracle PO, we observe that the lateral acceleration factor, the lateral acceleration penalty, and the acceleration penalty (covered by 7, 8, and 8 scenarios out of 10, respectively) are the weights that are easiest to cover. Indeed, they are all related to lateral/normal acceleration, and a decision of the path planner very likely depends on the acceleration (and so a perturbation of the weights changes the computed path).

Instead, the weight related to the violation of the speed limit is only covered by scenario $s_1$, in which the ego car is close to collision with two other dynamic objects. We further observe that, for this weight, only the mutant with constant 0, in which the constraint on the speed limit is completely removed, is killed: in this way, the path planner can compute an even safer (and so different) path that avoids the dynamic objects faster.

The weight related to sudden changes of direction is also covered by a single scenario only, namely $s_2$. In $s_2$, the ego car is approaching a slowing car and is followed by a fast car that is approaching from behind: by relaxing the constraint on the sudden change of lane, the path planner can compute a different and safer path.

Observations similar to those made for mutation oracle PO can also be made for mutation oracle SO. We only observe that the speed-limit weight is no longer covered by $s_1$: this means that, although the mutated path planner can compute a different path, the minimum distance to the other dynamic objects remains the same (the mutated path planner can simply exit from the dangerous situation faster, as the constraint on the speed limit has been relaxed). In the same way, scenario $s_2$ no longer covers the weight related to sudden changes of direction: the mutated path planner can avoid the dangerous situation with a more sudden action (because the weight is relaxed), but it reaches the same minimum distance as the original path planner.

As we already observed in RQ2, the comfort oracle CO is highly demanding and it is difficult to kill mutants with this oracle (it is only possible by obtaining a path with a different maximum acceleration). As expected, only the four weights related to acceleration and deceleration can be covered by at least one scenario.

  • RQ4: What is the weight coverage provided by each mutation operator?

We are interested in assessing which mutation operators produce mutants that are easier to kill. Table IV reports, for the three mutation oracles, whether a given mutation operator (we report the constant $k$ used in the operator) produces, for a given weight, a mutated path planner that is killed by at least one scenario of the test suite $T$.

            Weight
k           1      2      3      4      5      6      Count
0           T      T      T      T      T      F      5/6
0.5         T      F      F      T      T      F      3/6
0.9         F      F      F      T      F      F      1/6
1.1         F      F      F      F      T      F      1/6
1.5         F      T      F      T      T      F      3/6
2           F      T      F      T      T      F      3/6
10          T      T      F      T      T      T      5/6
Count       3/7    4/7    1/7    6/7    6/7    1/7
(a) Mutation oracle PO

            Weight
k           1      2      3      4      5      6      Count
0           T      T      F      T      T      F      4/6
0.5         F      F      F      T      T      F      3/6
0.9         F      F      F      T      F      F      1/6
1.1         F      F      F      F      T      F      1/6
1.5         F      T      F      F      T      F      2/6
2           F      T      F      F      T      F      2/6
10          T      T      F      T      T      F      4/6
Count       2/7    4/7    0/7    4/7    6/7    0/7
(b) Mutation oracle SO

            Weight
k           1      2      3      4      5      6      Count
0           F      T      F      T      T      F      3/6
0.5         F      F      F      F      F      F      0/6
0.9         F      F      F      F      F      F      0/6
1.1         F      F      F      F      F      F      0/6
1.5         F      F      F      F      F      F      0/6
2           F      T      F      F      F      F      1/6
10          T      T      F      F      T      F      3/6
Count       1/7    3/7    0/7    1/7    2/7    0/7
(c) Mutation oracle CO

TABLE IV: Weight coverage by mutation operator (T: covered, F: not covered). Weight columns 1 to 6 follow the order of the weight list in Sect. III-A: lateral acceleration factor, lateral acceleration penalty, speed limit penalty, acceleration penalty, deceleration penalty, curvature penalty.

For all the mutation oracles, coverage is correlated with the degree of change of the weight: mutation operators that change the weight significantly (i.e., 0 and 10) are those that cover the most (5 out of 6 weights), while weaker mutation operators (i.e., 0.9 and 1.1) cover less. For mutation oracle CO, only mutants with equal to 0, 2, or 10, can lead to the coverage of at least one weight.

The results in Table IV also provide some insights into the weights themselves. Consider the results of mutation oracle PO in Table IV(a). We observe that some weights such as and are covered with almost any mutation operator: this means that these weights are important in the decision making of the path planner, which is thus sensitive to small changes in them. On the other hand, if a system designer knows that the test suite is strong but a weight is still not covered, this could indicate that the weight has no influence on the decision making, and could highlight a fault in the system or its design.

V Discussion

We now provide more general observations about the proposed approach.

The first observation is related to the coverage of a weight. If a weight is never covered in a test suite, it could mean that either the test suite is not complete enough to cover , or is never relevant in the decisions taken by the path planner. In the former case, we would just need to add some scenario trying to cover ; in the latter case, we would need to mark as an infeasible test requirement and we could report a problem in the path planner. However, detecting infeasible test requirements is in general undecidable.

Another observation is related to the completeness of the mutation-based approach. In order to approximate weight coverage of a weight , we propose to use a set of seven mutants where is modified using seven constants of different scales. It could be that is covered by a test suite according to weight coverage (see Def. 4), but not according to the mutation-based approximation. However, we believe that this does not affect the general conclusions of our experiments regarding the relations between the scenarios and the weights: it is unlikely that, given two scenarios that do not cover any weight with the mutation-based approach, there are some other changes of the weights (not considered by the mutants) that cover one scenario and not the other. Indeed, the path planner considered in this work uses a linear cost function. For a new value to change the result of a test when no mutant does, it would need to induce a greater change than the mutants, which have already been designed to induce significant changes to the weights. Still, if such a case occurred, one could question the significance of such coverage: does a scenario meaningfully cover a weight if a massive change to the weight is needed to change the decision of the path planner in this scenario?
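The mutation-based approximation can be sketched in Python under stated assumptions: each mutant multiplies one weight by one of the seven constants reported in Table IV, and the path planner is reduced to choosing, among candidate paths, the one with the lowest linear weighted cost. The weight names, the feature values, and all function names below are hypothetical; the actual path planner is not described at this level of detail.

```python
CONSTANTS = [0, 0.5, 0.9, 1.1, 1.5, 2, 10]  # constants reported in Table IV


def cost(weights, features):
    """Linear cost: weighted sum of the features of a candidate path."""
    return sum(weights[name] * features[name] for name in weights)


def choose(weights, candidates):
    """Index of the candidate path with the lowest cost."""
    return min(range(len(candidates)),
               key=lambda i: cost(weights, candidates[i]))


def killing_constants(weights, name, candidates):
    """Constants whose mutant (weight * constant) changes the chosen path."""
    baseline = choose(weights, candidates)
    killed = []
    for constant in CONSTANTS:
        mutated = dict(weights)
        mutated[name] = weights[name] * constant  # assumed multiplicative mutation
        if choose(mutated, candidates) != baseline:
            killed.append(constant)
    return killed


# Hypothetical scenario with two candidate paths and three weights.
weights = {"safety": 1.0, "comfort": 1.0, "speed": 1.0}
candidates = [
    {"safety": 1.0, "comfort": 3.0, "speed": 2.0},
    {"safety": 2.5, "comfort": 1.0, "speed": 2.0},
]
```

In this toy scenario, `comfort` is covered only by the constants below 1 and `safety` only by the larger constants, while `speed` is never covered because both candidates have the same `speed` feature. The last case corresponds to a weight that is not relevant in the decisions exercised by the scenario.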

A final observation is related to the mutation oracles. We can note that, although the path oracle is a very weak criterion, it is still useful to decide whether a scenario should be kept in the test suite: if a scenario cannot cover any weight even with regard to the path oracle, it is not challenging the path planner at all and should be removed from the test suite (as scenario ; see Table III(a)).

VI Threats to validity

We identify these threats to the validity of the approach.

A threat to external validity [15] is that the approach may not be generalizable to other systems. As this project is driven by a collaboration with an industrial partner, the solution risks being too domain-specific. First of all, we want to point out that, in some cases, a solution to a given problem is necessarily domain-specific, and trying to achieve generalizability could even be counterproductive [1]. Moreover, we still believe that the approach could be applied to other systems similar to the path planner, i.e., systems that solve optimization problems using weights to account for different aspects. As future work, we plan to evaluate whether the approach is generalizable to a broader class of systems.

A threat to internal validity [15] is that the implementation of our mutation-based approach could be faulty, so that the obtained results would not be meaningful. In order to mitigate this threat, we checked that the mutated scenarios are syntactically correct and that they are parsed correctly by the path planner simulator; moreover, we assessed that the mutation oracles are implemented correctly by verifying that some known relations between them hold: for example, if the path oracle is 0, the other two oracles must be 0 as well.
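The relation between the oracles mentioned above can be checked mechanically. The following Python sketch assumes that the verdicts are stored per (mutant, scenario) pair as a triple of booleans for PO, SO, and CO, where True stands for 1 (killed) and False for 0; this storage format and the identifiers are hypothetical.

```python
def inconsistent_verdicts(verdicts):
    """Return the (mutant, scenario) pairs that violate the relation
    'if the path oracle is 0, the other two oracles must be 0 as well'."""
    return sorted(key for key, (po, so, co) in verdicts.items()
                  if not po and (so or co))


# Illustrative verdicts: (mutant id, scenario id) -> (PO, SO, CO).
consistent = {
    ("m1", "s1"): (True, True, False),
    ("m1", "s2"): (True, False, False),
    ("m2", "s1"): (False, False, False),
}
faulty = dict(consistent)
faulty[("m2", "s1")] = (False, True, False)  # SO kills although PO does not
```

An empty result for the verdicts of a whole experiment is consistent with a correct implementation of the oracles, although it does not prove it.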

VII Related work

In this section, we review related work on testing of, and testing criteria for, automated driving systems, as well as non-conventional applications of mutation analysis.

Testing of autonomous driving systems is a complex issue that includes many challenges, as highlighted by Koopman and Wagner in [7]. In [14], Wachenfeld and Winner show that it is infeasible to test autonomous driving systems only using real life test drives. Indeed, they show that, according to German highway driving data, one would have to drive 6.61 billion kilometers in order to encounter some fatal scenarios, i.e., the scenarios that should be most critically tested. Zhao and Peng [16] and Helmer et al. [6] arrive at similar conclusions, stating that billions of kilometers should be driven to achieve sufficient testing guarantees.

As such, many works [10, 12, 4, 6] focus on using simulation and particular scenarios to test autonomous driving systems; this is also the setting of this work. In this context, the question of test sufficiency, or of a test stopping criterion, becomes essential. Indeed, as Hauer et al. remark in [5], “One can always come up with another scenario type as well as with instances of those types that are different from the types and instances used before”, which means that we need a criterion to know when our test data has covered all plausible situations. Our work focuses not only on a test stopping criterion but on a more general testing criterion that lets us evaluate how much of the system’s decision space a test suite covers, rather than how many of the possible scenarios have been covered, as different scenarios could lead to the same decisions.

Mutation analysis has been applied to diverse domains [11], and recently to deep neural networks (DNNs). DNNs share a characteristic with the path planner: their behavior is governed by computed numerical values rather than logical branches, and their correctness is evaluated by some metrics (e.g., accuracy) rather than with pass/fail tests. A mutation analysis method for DNNs has been proposed that considers mutations on training data, training code, and trained models [9]. The mutation score evaluates whether each mutation changes correct classification into misclassification in some test data. Our proposal works with the more complex situation of a path planner. Although the mutation targets (weights) are also continuous values, we deal with complex oracles and multiple evaluation criteria, instead of the binary problem of misclassification.

VIII Conclusions

In this paper, we proposed a mutation-based approach for assessing whether all the possible decisions that can be taken by the path planner of an autonomous car are covered by a test suite (each test is a scenario). The path planner we consider makes decisions by using a weighted function of different aspects (safety, comfort, etc.). The approach consists in mutating the weights and checking whether the test suite is able to kill the mutant. The approach has been evaluated on a manually designed test suite; we observed that some weights are easier to cover as they consider aspects that occur more often in a scenario. Moreover, more complicated scenarios that generate more complex paths are those that allow coverage of more weights. We believe that these preliminary results confirm our intuition that the proposed coverage criterion is reasonable. However, more rigorous and systematic evaluation is needed: as future work, we plan to perform a wider set of experiments using different test suites, both automatically generated and manually designed. Moreover, we plan to assess whether weight coverage correlates with good fault detection.

Finally, we believe that the proposed approach is applicable not only to path planners, but to any optimisation program that relies on a weighted function; as future work, we plan to give a more general definition of the weight coverage criterion, and to evaluate it on a wider class of systems.

References

  • [1] L. C. Briand, D. Bianculli, S. Nejati, F. Pastore, and M. Sabetzadeh (2017) The case for context-driven software engineering research: generalizability is overrated. IEEE Software 34 (5), pp. 72–75. ISSN 0740-7459.
  • [2] J. J. Chilenski and S. P. Miller (1994-09) Applicability of modified condition/decision coverage to software testing. Software Engineering Journal 9 (5), pp. 193–200. ISSN 0268-6961.
  • [3] K. Czarnecki (2018-07) WISE drive: requirements analysis framework for automated driving systems. Technical report, Waterloo Intelligent Systems Engineering Lab (WISE), University of Waterloo. https://uwaterloo.ca/waterloo-intelligent-systems-engineering-lab/projects/wise-drive-requirements-analysis-framework-automated-driving
  • [4] E. de Gelder and J. Paardekooper (2017-06) Assessment of automated driving systems using real-life scenarios. In 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 589–594.
  • [5] F. Hauer, T. Schmidt, B. Holzmüller, and A. Pretschner (2019) Did we test all scenarios for automated and autonomous driving systems? In PrePrint for Proc. of IEEE Intelligent Transportation Systems Conference.
  • [6] T. Helmer, L. Wang, K. Kompass, and R. Kates (2015-09) Safety performance assessment of assisted and automated driving by virtual experiments: stochastic microscopic traffic simulation as knowledge synthesis. In 2015 IEEE 18th International Conference on Intelligent Transportation Systems, pp. 2019–2023. ISSN 2153-0017.
  • [7] P. Koopman and M. Wagner (2016-04) Challenges in autonomous vehicle testing and validation. SAE Int. J. Trans. Safety 4, pp. 15–24.
  • [8] S. Levin and J. Wong Carrie (2018-03) (Website). Accessed October 3, 2019.
  • [9] L. Ma, F. Zhang, J. Sun, M. Xue, B. Li, F. Juefei-Xu, C. Xie, L. Li, Y. Liu, J. Zhao, and Y. Wang (2018-10) DeepMutation: mutation testing of deep learning systems. In 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE), pp. 100–111. ISSN 2332-6549.
  • [10] T. Menzel, G. Bagschik, and M. Maurer (2018-06) Scenarios for development, test and validation of automated vehicles. In 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1821–1827. ISSN 1931-0587.
  • [11] M. Papadakis, M. Kintis, J. Zhang, Y. Jia, Y. Le Traon, and M. Harman (2019) Chapter six - mutation testing advances: an analysis and survey. Advances in Computers, Vol. 112, pp. 275–378. ISSN 0065-2458.
  • [12] C. Roesener, F. Fahrenkrog, A. Uhlig, and L. Eckstein (2016-11) A scenario-based assessment approach for automated driving by using time series classification of human-driving behaviour. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 1360–1365. ISSN 2153-0017.
  • [13] J. Stewart (2018-03) (Website). Accessed October 3, 2019.
  • [14] W. Wachenfeld and H. Winner (2016) The release of autonomous vehicles. In Autonomous Driving: Technical, Legal and Social Aspects, M. Maurer, J. C. Gerdes, B. Lenz, and H. Winner (Eds.), pp. 425–449. ISBN 978-3-662-48847-8.
  • [15] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén (2012) Experimentation in software engineering. Springer Publishing Company, Incorporated. ISBN 3642290434, 9783642290435.
  • [16] D. Zhao and H. Peng (2017) From the lab to the street: solving the challenge of accelerating automated vehicle testing. CoRR abs/1707.04792.