
A Finite-Sampling, Operational Domain Specific, and Provably Unbiased Connected and Automated Vehicle Safety Metric

by   Bowen Weng, et al.

A connected and automated vehicle safety metric determines the performance of a subject vehicle (SV) by analyzing the data involving the interactions among the SV and other dynamic road users and environmental features. When the data set contains only a finite set of samples collected from the naturalistic mixed-traffic driving environment, a metric is expected to generalize the safety assessment outcome from the observed finite samples to the unobserved cases by specifying in what domain the SV is expected to be safe and how safe the SV is, statistically, in that domain. However, to the best of our knowledge, none of the existing safety metrics are able to justify the above properties with an operational domain specific, guaranteed complete, and provably unbiased safety evaluation outcome. In this paper, we propose a novel safety metric that involves the α-shape and the ϵ-almost robustly forward invariant set to characterize the SV's almost safe operable domain and the probability for the SV to remain inside the safe domain indefinitely, respectively. The empirical performance of the proposed method is demonstrated in several different operational design domains through a series of cases covering a variety of fidelity levels (real-world and simulators), driving environments (highway, urban, and intersections), road users (car, truck, and pedestrian), and SV driving behaviors (human driver and self driving algorithms).





I Introduction

Advanced Driver Assistance System (ADAS) or Automated Driving System (ADS) equipped Connected and Automated Vehicles (CAVs) operate in a mixed traffic environment with various traffic participants (e.g., pedestrians, cyclists, and different types of vehicles) and environmental disturbances (e.g., road gradients, surface friction, and weather conditions). In general, to ensure the safe performance of a Subject Vehicle (SV) or a fleet of SVs (e.g., a group of CAVs) in the real-world mixed traffic driving environment (also referred to as the naturalistic driving environment in the literature [feng2020testing] and the “nominal driving environment” in the remainder of this paper), one typically follows a two-step procedure with testing and analysis. First, the testing procedure deploys the SV (or a fleet of SVs) in the environment and acquires the traffic interactions and other observable infrastructure information [altekar2021infrastructure] near each SV with a certain data acquisition system. This creates a set of finite observations sampled from the nominal driving environment. Note that the environment can consist of simulated scenes, real-world on-road operation, or controlled testing and proving grounds. Second, the analysis procedure summarizes the safety performance from the finite sampled observations and seeks to generalize the understanding, intuitively or provably, to the non-sampled unobserved cases. In this paper, we only focus on the analysis step, which involves one or more specific safety metrics to which the previously collected data is presented as it stands.

Consider a toy example in which one observes a certain SV operating safely (without collisions, human driver engagements, or traffic rule violations) while navigating one mile from the Empire State Building to Times Square (both are attractions in New York City, United States) at 6 P.M. on a weekday, through a crowd of vehicles, pedestrians, and cyclists, and an intersection with traffic lights. A safety measure then seeks to infer the SV’s overall safety performance in the mixed-traffic driving environment from the collected one-mile observation.

The first class of measures is known as leading measures, as they “reflect performance, activity, and prevention” [fraade2018measuring]. Examples include infractions (i.e., noncriminal violations of state and local traffic law) [censi2019liability] and disengagements [favaro2018autonomous]. In general, it is expected that the leading measure outcomes from the one-mile trip imply a certain safety property, yet such an implication is mostly intuitive (e.g., the observed engagement rate within one mile does not necessarily hold for the rest of the trip).

On the other hand, the lagging measures are primarily interested in safety outcomes or harm [fraade2018measuring]. They can be further classified as observed failures, predictive failures, and inferred failures. As a collision is the most well-adopted failure event in the literature, it is considered interchangeable with failure for the remainder of this paper.

First, the observed failures share the same spirit as many of the aforementioned leading measures. For example, the observed collision rate in the one-mile trip does not necessarily hold as the vehicle operation proceeds. It can also be expanded from a scalar value measure to a more complex group of collision ratings [schwall2020waymo], yet the above-mentioned problem remains. Second, the predictive failure is often derived by asserting surrogate models and assumptions [bowen2020presentation, weng2021model]. Hence some lagging safety measures are also referred to as surrogate safety measures in the literature [wang2021review]. One well-adopted combination is the steady-state assumption (all road users maintain the current velocity and heading) with linear double integrator dynamics, leading to a series of classic safety measures including time-to-collision (TTC) [lee1976theory] and the minimum safe distance (MSD) based variants [wishart2020driving]. Some recently proposed metrics, developed as more complex dynamic and behavioral models are considered, include the Responsibility Sensitive Safety (RSS) [shalev2017formal] based method, Instantaneous Safety Metric (ISM) [every2017novel], criticality metric [junietz2018criticality], and Model Predictive Instantaneous Safety Metric (MPrISM) [weng2020model], which all belong to a class of model predictive safety measures [weng2021model]. Note that many of the lagging measures generalize the finite observations to the non-sampled cases to some extent, but the generalization relies heavily on asserted surrogate models and assumptions, which are mostly invalid in the real-world mixed traffic driving environment [bowen2020presentation, weng2021model].
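As a small illustration of a predictive measure built on the steady-state assumption, the sketch below computes TTC for a lead-vehicle following pair. It assumes both vehicles keep their current speed in the same lane; the function name and argument names are hypothetical and not taken from the cited works.

```python
import math


def time_to_collision(gap_m: float, follower_speed_mps: float,
                      leader_speed_mps: float) -> float:
    """TTC under the steady-state assumption (both speeds held constant).

    gap_m is the bumper-to-bumper distance in meters. Returns math.inf
    when the follower is not closing in on the leader.
    """
    if gap_m < 0.0:
        raise ValueError("gap_m must be non-negative")
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0.0:
        return math.inf  # the gap never shrinks under the assumption
    return gap_m / closing_speed


if __name__ == "__main__":
    # 30 m gap closing at 5 m/s.
    print(time_to_collision(30.0, 20.0, 15.0))  # 6.0
```

The value is only as trustworthy as the constant-speed assumption behind it, which is the limitation discussed above.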

Finally, in contrast to the predictive failure based lagging measures that generalize the safety assessment to the “imaginary” domain, the statistically inferred failure rate estimate is an unbiased safety assessment generalization from the observed samples to the nominal driving environment. One representative method in this category comes from Fraade et al. [fraade2018measuring], who use the Monte-Carlo sampling approach to provide finite-sampling safety assurance by inferring the SV’s fatality rate estimate from consecutively operating safely for a certain number of miles. If applied to the aforementioned one-mile trip example, with confidence level 90%, the SV has a fatality rate of 90 million fatalities per 100 million miles. Although the 90% risk is rigorously provable, note that the safety measure outcome is essentially invariant to the mixed traffic environment, as one still obtains the same values if the vehicle safely operates on the same route on empty streets at 3 A.M. (i.e., no other traffic objects are present). Note that the importance sampling based technique [ding2011toward] has been shown capable of improving the sampling efficiency of the Monte-Carlo sampling methods. However, the accuracy of the estimated failure rate relies heavily on the accuracy of the estimated importance function, which is not a provable condition in general.
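The one-mile number above can be reproduced with a short sketch. It assumes the standard failure-free bound for independent per-mile trials, in which n failure-free miles at confidence C give a rate bound of 1 − (1 − C)^(1/n); whether the cited report uses exactly this expression is an assumption, and the function name is hypothetical.

```python
def failure_rate_bound(failure_free_miles: float, confidence: float) -> float:
    """Per-mile failure rate bound after a failure-free run.

    Assumes independent, identically distributed miles, so the chance of
    observing n failure-free miles at rate r is (1 - r) ** n.
    """
    if failure_free_miles <= 0:
        raise ValueError("failure_free_miles must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / failure_free_miles)


if __name__ == "__main__":
    per_mile = failure_rate_bound(1, 0.90)
    # Roughly 0.9 per mile, i.e. about 90 million per 100 million miles.
    print(per_mile * 100_000_000)
```

The bound depends only on the mileage and the confidence level, which is why the outcome is the same at 6 P.M. in traffic and at 3 A.M. on empty streets.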

Another line of research on formal safety analysis relies on a model-based approach where one first approximates a certain probabilistic model, parametric or non-parametric, from the observed data and then derives the risk rate estimate [aasljung2019probabilistic], information gain [collin2021plane], and other safety related properties [hejase2020methodology] using the obtained model. This shares a similar problem with the aforementioned importance sampling based methods, as the safety outcome estimate is unbiased only if the approximated model is also unbiased with analytically justifiable variance, which remains an open challenge to date.

To a certain extent, existing efforts seek to establish a CAV safety measurement that is monotonic w.r.t. the SV’s safety performance (e.g., a lower TTC value indicates a more unsafe SV behavior than a higher TTC value). This is generally true if other variables are controlled properly. One particularly important variable is the SV’s operable domain. As we have discussed before, the leading measures fail to control the domain variables since the generalization is biased. The predictive failure based lagging measures also fail, for while the generalization is provably true in a certain predictive domain, it does not necessarily align with the nominal driving environment. Finally, for the statistically inferred collision rate [fraade2018measuring], the operable domain is invariant as the required total mileage to claim a certain fatality rate with a given confidence level does not change as one moves from the lead-vehicle following domain to a more complex operable domain involving mixed-traffic interactions. Moreover, the particular SV driving behavior also partially affects its operable domain construction. As a result, the notion of one vehicle being safer than the other is mostly problematic as it is essentially a multi-dimensional comparison. This will be demonstrated in detail through a series of examples in Section IV.

In summary, to make a competitive safety measurement for the SV that resolves the various mentioned problems of existing methods, the following two questions need to be jointly and rigorously addressed:

  • Q1: Where (in terms of the operable domain) will the vehicle be statistically safe within the nominal driving environment?

  • Q2: Supplied with a certain operable domain, how safe will the vehicle be within the given domain?

To the best of our knowledge, there does not exist a safety metric that rigorously addresses the above two questions simultaneously.

In this paper, we propose a novel safety metric using the α-shape [akkiraju1995alpha] and the ϵ-almost set invariance property [weng2021towards, weng2021formal]. Given the driving data collected from a certain testing procedure, the proposed method first rearranges the data to formulate the Operational State Space (OSS) of a multi-agent system that admits measurable states and other non-observable uncertainties. One then characterizes an Operational Design Domain (ODD) as a subset of the formulated OSS that is “almost” forward invariant for the multi-agent dynamics. As the characterized domain does not intersect with the set of failure events, the SV is also almost safe in the given domain except for an arbitrarily small subset with a certain confidence level. The main contributions of this paper are summarized as follows.

An operational domain specific safety indicator: The proposed method characterizes an operational domain specific set using the α-shape and other coverage properties, which formally answers question Q1. The effectiveness of the proposed methodology is empirically demonstrated through a group of challenging cases. The study not only includes the classic three-dimensional lead-vehicle following domain, but also considers the challenging vehicle-to-vehicle and vehicle-pedestrian interactions with up to a 17-dimensional state space.

An unbiased safety indicator: The ϵ-almost robustly forward invariant set is a provably unbiased safety indicator that generalizes the observation from sampled driving data to the unobserved domain. In particular, given a certain confidence level, the probability coefficient ϵ answers question Q2 by provably quantifying the performance of the SV statistically within the constructed set. The process does not involve any asserted behavioral assumptions, distribution estimates, or model fitting.

Empirical evaluation: The empirical performance of the proposed method is demonstrated in a series of cases covering a variety of fidelity levels (real-world and simulators), driving environments (highway, urban, and intersections), road users (car, truck, and pedestrian), and SV driving behaviors (human driver and self driving algorithms).

I-A Constructions and Notation

Notation: The sets of real and positive real numbers are denoted by $\mathbb{R}$ and $\mathbb{R}_{>0}$, respectively. $\mathbb{Z}$ denotes the set of all positive integers and $\mathbb{Z}_N = \{1, \ldots, N\}$. The $\ell_p$-norm is denoted by $\|\cdot\|_p$. $|X|$ is the cardinality of the set $X$, e.g., for a finite set $\mathcal{D}$, $|\mathcal{D}|$ denotes the total number of points in $\mathcal{D}$. Let $\partial X$ be the boundary of the set $X$. Some common acronyms are also adopted, including i.i.d. (independent and identically distributed), w.r.t. (with respect to), and w.l.o.g. (without loss of generality).

In the remainder of the paper, Section II will present the preliminaries along with formulating the finite-sampling operable domain quantification problem. Section III introduces details of the proposed safety metric. The empirical performance of the proposed metric is demonstrated in Section IV. Section V summarizes the paper and discusses future work of interest.

II Preliminaries and Problem Formulation

II-A Mixed-Traffic Environment Formulation

Fig. 1: Some illustrative examples of OSS specifications considered by this study: (a) a 13-dimensional multi-vehicle interactive state space of nearby vehicle-only traffic defined over the SV’s local coordinates, (b) a special case to complement the ODD specifications in (a), (c) a 3-dimensional lead-vehicle following state space, (d) a generalization of (c) to non-straight road segments, (e) a 5-dimensional vehicle-pedestrian interactive state space.

Consider the mixed traffic environment as a time-variant heterogeneous multi-agent system of $k$ agents at time $t$, where the $i$-th ($i \in \{0, \ldots, k-1\}$) agent admits the motion dynamics

$$s_i(t+1) = f_i(s_i(t); \omega_i(t)), \tag{1}$$

with state $s_i \in S_i$, disturbances and uncertainties $\omega_i \in \Omega_i$, and $f_i : S_i \times \Omega_i \to S_i$. Note that the agent is not limited to dynamic road users (e.g., vehicles, pedestrians, cyclists), but can also include other environmental features such as the traffic light color, stop sign position, weather condition, and road surface friction. Let the index $i = 0$ denote the test SV. For a fleet of SVs, one can assign the index 0 to each SV iteratively for further analysis, as the safety of each individual SV ensures the overall safety of the fleet. In general, the above system can be very complex, as the real-world driving environment has a very large $k$ that varies with respect to time.

The desired Operational Design Domain (ODD) is thus introduced to specify the set within which the SV is expected to operate safely. Formal specifications of the ODD are further derived from an OSS $\mathcal{S}$ with explicitly defined observable states $s \in \mathcal{S} \subset \mathbb{R}^n$ and implicitly induced disturbances and uncertainties $\omega \in \Omega$. Some examples of OSS specifications are presented in Fig. 1. This paper is primarily focused on three OSS specifications that are explained as follows.

II-A1 The lead-vehicle following domain

This OSS characterizes the lead-vehicle following safety performance. It incorporates all instances from the on-road driving data with a preceding vehicle present in the same lane as the SV. It is applicable to many ADAS features such as Automatic Emergency Braking (AEB) and Traffic-Jam Assist (TJA). The lead-vehicle following domain is also a commonly studied instance with other domain specifications incorporating time duration [arief2021deep] and assumed hybrid control modes [fan2017d]. In this paper, we consider a more general configuration than the aforementioned references, as the state specification takes the speeds of both vehicles ($v_0$, $v_1$) and the bumper-to-bumper distance headway (DHW) ($d$) between the two vehicles as the states of interest. The specification is applicable to both straight road segments (Fig. 1(c)) and curved roads (Fig. 1(d)), i.e., the road curvature is considered to be part of $\omega$, as are other factors such as road gradients, weather condition, and road surface friction.

II-A2 The multi-vehicle interaction domain

This OSS defines the SV’s interaction with nearby vehicles. All the position states are represented with respect to the SV’s local coordinates. The near-SV region is divided into 6 subregions: front-left (fl), front-center (fc), front-right (fr), rear-left (rl), rear-center (rc), and rear-right (rr). The left, the center, and the right regions are typically determined by the lane width. Within each region, the nearest vehicle is determined through the center-to-center 2-norm distance to the SV. Two features of each nearest vehicle are selected: the bumper-to-bumper longitudinal distance clearance to the SV, $d$, and the vehicle speed, $v$. When presented with an alongside vehicle (i.e., the vehicles partially overlap longitudinally) on either side of the SV, the bumper-to-bumper distance is set to zero as shown in Fig. 1(b). Combined with the SV’s speed $v_0$, we have a 13-dimensional state space, i.e., $s = [v_0, d_{fl}, v_{fl}, d_{fc}, v_{fc}, d_{fr}, v_{fr}, d_{rl}, v_{rl}, d_{rc}, v_{rc}, d_{rr}, v_{rr}]$. If a particular subregion is empty or if any of the states falls outside the domain of interest defined by $\mathcal{S}$ and other given bounds, a fixed low-risk state is assigned (e.g., if the front-center region is empty, fixed low-risk values are assigned to $d_{fc}$ and $v_{fc}$). To have a valid state $s$, at least one of the six subregions must remain non-empty with a vehicle satisfying the state bounds. The lateral distances are treated as uncertainties, as each subregion is limited by the lane width, which already provides certain sideways localization information. Some other examples of disturbances and uncertainties include the presence of other dynamic road users, the road curvature, and different road infrastructures. A similar multi-vehicle configuration is also adopted by other studies for scenario extraction purposes [hauer2020clustering] and driver behavioral modeling [yan2021distributionally].
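The state assembly described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `NearbyVehicle` record, the lane width, the clearance bound, and the low-risk defaults used for empty subregions are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Subregion order used for the 13-dimensional state vector.
REGIONS = ("fl", "fc", "fr", "rl", "rc", "rr")


@dataclass
class NearbyVehicle:
    x: float          # longitudinal center offset in SV frame (m), front > 0
    y: float          # lateral center offset in SV frame (m), left > 0
    clearance: float  # bumper-to-bumper longitudinal clearance (m), 0 if alongside
    speed: float      # vehicle speed (m/s)


def region_of(v: NearbyVehicle, lane_width: float) -> str:
    """Classify a vehicle into one of the six near-SV subregions."""
    lon = "f" if v.x >= 0.0 else "r"
    if v.y > lane_width / 2.0:
        lat = "l"
    elif v.y < -lane_width / 2.0:
        lat = "r"
    else:
        lat = "c"
    return lon + lat


def build_state(sv_speed: float, vehicles: List[NearbyVehicle],
                lane_width: float = 3.5, max_clearance: float = 100.0,
                empty_speed: Optional[float] = None) -> Optional[List[float]]:
    """Return [v0, d_fl, v_fl, ..., d_rr, v_rr], or None if no region is occupied."""
    if empty_speed is None:
        empty_speed = sv_speed  # assumed low-risk default: zero relative speed
    nearest = {}
    for v in vehicles:
        if not 0.0 <= v.clearance <= max_clearance:
            continue  # outside the assumed domain of interest
        key = region_of(v, lane_width)
        best = nearest.get(key)
        if best is None or math.hypot(v.x, v.y) < math.hypot(best.x, best.y):
            nearest[key] = v  # keep the nearest vehicle by 2-norm distance
    if not nearest:
        return None  # a valid state needs at least one occupied subregion
    state = [sv_speed]
    for key in REGIONS:
        v = nearest.get(key)
        state += [v.clearance, v.speed] if v else [max_clearance, empty_speed]
    return state
```

Lateral offsets are used only to pick the subregion and are then dropped from the state, which mirrors their treatment as uncertainties in the text.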

II-A3 The vehicle-pedestrian interaction domain

This OSS is primarily concerned with the SV interacting with pedestrians. Only pedestrians in front of the SV are involved in the specification due to responsibility oriented causes [shalev2017formal]. The front-left corner and the front-right corner of the SV are the reference points. Then, the nearest pedestrian to each reference point in terms of 2-norm distance is considered as the pedestrian of interest. For each pedestrian of interest, its longitudinal offset and lateral offset from the corresponding reference point are selected as part of the state $s$. Combined with the SV’s velocity $v_0$, we have the 5-dimensional state space for the vehicle-pedestrian interaction domain.

Note that the above OSSs and possibly other variants can be further combined to formulate various mixed traffic operational environments. For example, combining the multi-vehicle interaction domain with the vehicle-pedestrian interaction domain results in a 17-dimensional state space, since the SV’s speed is shared by both specifications (13 + 5 − 1 = 17).

Remark 1.

The driving data studied in this paper can be collected from both on-road tests and scenario-based tests, as long as the data collection follows the nominal distribution of the mixed-traffic driving environment within which the SV is being tested.

W.l.o.g., let there be some states from the collected driving data consistent with the given $\mathcal{S}$. We then have the primary driving data $\mathcal{D} \subset \mathcal{S}$, which comprises finite observations that allow us to implement the safety analysis of the SV’s performance in $\mathcal{S}$. Moreover, some of the states are consecutively collected in time w.r.t. the same SV; such a sequence is further referred to as a trajectory. In this paper, we often extract state transition pairs from all trajectories in $\mathcal{D}$ as $\mathcal{T} = \{(s, s')\}$, where $s, s' \in \mathcal{D}$ are two consecutively collected states in time w.r.t. the same SV. Note that each state also inherently admits a certain motion dynamics as

$$s(t+1) = f(s(t); \omega(t)). \tag{2}$$
In this paper, our focus is to present a safety performance evaluation metric that identifies the real operable domain in a data-driven manner (from $\mathcal{D}$) and identifies its safety-related properties. This is formally presented as the finite-sampling operable domain quantification problem, as we shall introduce in the following section.
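The extraction of state transition pairs from trajectories, as described in this subsection, can be sketched in a few lines. The representation of a state as a tuple of floats and the function name are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, State]


def extract_transitions(trajectories: Sequence[Sequence[State]]) -> List[Transition]:
    """Pair every state with its successor inside the same trajectory.

    Pairs never span two trajectories, so each pair holds two states that
    were collected consecutively in time for the same SV.
    """
    pairs: List[Transition] = []
    for trajectory in trajectories:
        pairs.extend(zip(trajectory, trajectory[1:]))
    return pairs


if __name__ == "__main__":
    demo = [[(1.0,), (2.0,), (3.0,)], [(5.0,)]]
    print(extract_transitions(demo))
```

A trajectory with a single state contributes no transition, since it has no observed successor.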

II-B Finite-sampling Safety Assurance with Set Invariance

Given a certain set $\Phi \subseteq \mathcal{S}$, if one continuously observes sampled state transitions staying inside $\Phi$, then the set $\Phi$ is potentially forward invariant. To formally quantify such a statistical potential, we introduce the almost forward invariant set adapted from [weng2021towards] as follows.

Definition 1.

[ϵ-Almost Robustly Forward Invariant Set] Let ϵ ∈ (0, 1); the set $\Phi \subseteq \mathcal{S}$ is ϵ-almost robustly forward invariant for (2) if


To further relate the above definition to the purpose of safety analysis, let $\mathcal{C} \subset \mathcal{S}$ be the set of unsafe states such as collisions. Then we have the following definition for the almost safe set.

Definition 2.

[ϵ-Almost Safe Set] Let ϵ ∈ (0, 1); the SV is ϵ-almost safe in $\Phi$ if $\Phi \cap \mathcal{C} = \emptyset$ and $\Phi$ is ϵ-almost robustly forward invariant for (2) by Definition 1.

The problem of interest for this paper is then formally presented as follows.

Problem 1.

[The Finite-Sampling Operable Domain Quantification Problem] Given $\mathcal{S}$ and a group of observed states $\mathcal{D} \subset \mathcal{S}$, the finite-sampling operable domain quantification problem seeks an algorithm that identifies a certain set $\Phi$ and ϵ such that, with confidence level of at least 1 − β, the SV is ϵ-almost safe in $\Phi$ by Definition 2.

The above problem formulation is fundamentally different from many of the existing CAV safety metrics mentioned in Section I. The desired set $\Phi$ is the specific operable domain within which the SV is expected to operate safely. The probability coefficient ϵ quantifies the statistical potential of the SV’s safety performance in $\Phi$. The next section discusses details of the proposed algorithm that solves the aforementioned problem.

III Finite-Sampling Operable Domain Quantification

The proposed solution to Problem 1 follows a two-step procedure: (i) set construction and (ii) set validation. First, the set construction step seeks to construct a certain set $\Phi$ from $\mathcal{D}$. Second, as one replays the data in $\mathcal{T}$, one can validate the almost forward invariance property of the constructed set by consecutively observing transitions among states in $\Phi$. The derived ϵ also relies on the given confidence level 1 − β defined in Problem 1. For the remainder of this section, we address the two steps in Section III-A and Section III-B, respectively. The complete algorithm is summarized in Section III-C.

III-A Set Construction with α-Shape and Coverage Measures

For the safety evaluation purpose, the constructed set is expected to cover all potentially safe points. This set of points is referred to as $\mathcal{X}$ and is obtained through Algorithm 1. A series of methods is then proposed to formally construct a set that characterizes the shape and the coverage information of $\mathcal{X}$.

2:Initialize: Empty graph
3:Collect all safe trajectories
4:Collect all unsafe trajectories
5:While :
7:    While :
8:        .add
13:    While :
14:        For in do
15:            Reachable
16:            .remove
17:        End For
19:End If
Algorithm 1 Extract all potential safe states from $\mathcal{D}$

Note that Reachable($G$, $s$) returns all vertices on the graph $G$ that connect, directly or indirectly, to the given point $s$. In practice, this is achieved through a standard depth-first-search (DFS) routine. Moreover, add and remove are both notational functions, where $G$.add($(s, s')$) adds the edge $(s, s')$ to the graph $G$, and $G$.remove($R$) removes all vertices in $R$ from $G$.

We are now ready to construct the potentially safe set from $\mathcal{X}$. In this paper, we adopt the α-shape [akkiraju1995alpha] to characterize the shape of the desired set. The following definition is standard [alphashape2011].
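Since several steps of Algorithm 1 did not survive in the listing above, the following is only a sketch of one plausible reading: build a directed graph from the observed transitions, then use a DFS to remove every vertex that connects, directly or indirectly, to an observed unsafe state. The direction of the search (states that lead into a failure), the hashable state type, and all function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def potentially_safe_states(
    transitions: Iterable[Tuple[Hashable, Hashable]],
    unsafe_states: Iterable[Hashable],
) -> Set[Hashable]:
    """Return the vertices that cannot reach any observed unsafe state."""
    predecessors: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    vertices: Set[Hashable] = set()
    for s, s_next in transitions:
        predecessors[s_next].add(s)  # reverse edge, used to walk backwards
        vertices.add(s)
        vertices.add(s_next)

    removed: Set[Hashable] = set()
    stack = list(unsafe_states)
    while stack:  # iterative depth-first search from the unsafe states
        vertex = stack.pop()
        if vertex in removed:
            continue
        removed.add(vertex)
        stack.extend(predecessors.get(vertex, set()) - removed)
    return vertices - removed


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "crash")]
    print(sorted(potentially_safe_states(edges, {"crash"})))  # ['a', 'b', 'c']
```

In practice the states are continuous vectors, so some discretization or identity scheme would be needed before they can serve as graph vertices; that step is left out here.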

Definition 3.

Consider a finite set of points $P \subset \mathbb{R}^d$. Let an α-ball be an open ball with radius α. Let $\sigma_T$ be a $k$-simplex for some $T \subseteq P$ such that $|T| = k + 1 \le d$. A $k$-simplex $\sigma_T$ is α-exposed if there exists an empty α-ball $b$ with $T = \partial b \cap P$. An α-shape, $\mathcal{A}_\alpha(P)$, of the set $P$ satisfies $\mathcal{A}_\alpha(P) \subset \mathbb{R}^d$ and

$$\partial \mathcal{A}_\alpha(P) = \{\sigma_T \mid T \subseteq P,\ |T| \le d,\ \sigma_T \text{ is } \alpha\text{-exposed}\}, \tag{4}$$

i.e., the boundary of the α-shape consists of all $k$-simplices of $P$ for $0 \le k < d$ which are α-exposed.

It follows that $\lim_{\alpha \to 0} \mathcal{A}_\alpha(P) = P$ and $\lim_{\alpha \to \infty} \mathcal{A}_\alpha(P)$ is the ordinary convex hull of $P$. The α-shape of a finite point set is uniquely determined by $P$ and α. For any given α, the time complexity of the corresponding α-shape determination algorithm depends on $m$ [akkiraju1995alpha], where $m$ denotes the number of points in $P$, i.e., $m = |P|$. In practice, one may also require a certain preferred shape such as a single polytope that wraps $P$ with the smallest cardinality. This implies a certain cost function of the α-shape with the optimal cost determined by the preferred shape. This is typically performed through a logarithmic search of α-shapes by modifying the lower and upper bounds of the tested α until the gap between the two bounds becomes sufficiently small [kengithub]. However, with this method the computational complexity also increases.
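The bound-tightening search mentioned above can be sketched as a plain bisection. The sketch assumes the α parameter is the ball radius (so larger values give coarser shapes closer to the convex hull) and that acceptability is monotone in that parameter; the predicate `is_acceptable`, which would in practice build the shape and test for the preferred form, is a hypothetical placeholder.

```python
from typing import Callable


def search_alpha(is_acceptable: Callable[[float], bool], lower: float,
                 upper: float, tol: float = 1e-3, max_iter: int = 100) -> float:
    """Smallest acceptable alpha in (lower, upper], up to tolerance tol.

    Assumes is_acceptable is monotone: once it holds for some alpha it
    holds for every larger alpha.
    """
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    if not is_acceptable(upper):
        raise ValueError("the upper bound itself is not acceptable")
    for _ in range(max_iter):
        if upper - lower <= tol:
            break
        middle = 0.5 * (lower + upper)
        if is_acceptable(middle):
            upper = middle   # tighten from above, keep it acceptable
        else:
            lower = middle   # tighten from below
    return upper


if __name__ == "__main__":
    # Toy predicate standing in for "the shape is a single polytope".
    print(search_alpha(lambda a: a >= 2.5, 0.0, 10.0))
```

Each predicate evaluation requires one shape construction, which is where the additional computational cost noted in the text comes from.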

Remark 2.

In the previous literature on scenario-sampling almost safe set validation [weng2021towards, weng2021formal], the δ-covering set is adopted to characterize the set construction. However, as indicated by Problem 1, the data set is presented as it is in this study, and one cannot control the scenario sampling to modify the testing procedure or to add more observations for analysis. For sparse data sets, the δ-covering set tends to have a significant over-approximation. Moreover, the δ-covering set of the finite set is not unique for all non-zero δs. As a result, the α-shape is a more flexible solution to handle various levels of data sparsity with a uniquely determined solution for a given finite set.

For this paper, the finite set being considered is $\mathcal{X}$, derived from Algorithm 1. Hence the α-shape takes the notation $\mathcal{A}_\alpha(\mathcal{X})$. Finally, to characterize the coverage performance of $\mathcal{A}_\alpha(\mathcal{X})$, we also adopt the following two measures to characterize the density and occupancy as


Note that there also exist other coverage indicators such as the index of dispersion [selby1965index] and the star discrepancy [dang2008sensitive]. However, the index of dispersion is not directly applicable in this paper, as the states are not all positive. The star discrepancy is not selected due to computational complexity concerns. More representative coverage metrics are of future interest.

III-B Finite-Sampling Almost Robustly Forward Invariant Set Validation

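Input: A set of state transitions $\mathcal{T}$, $\Phi$
Initialize: $N = 0$
While $\mathcal{T} \neq \emptyset$:
    $(s, s') \leftarrow \mathcal{T}$.pop
    If $s \in \Phi$ and $s' \in \Phi$:
        $N \leftarrow N + 1$
    Else
        $N \leftarrow 0$
    End If
End While
Output: $N$
Algorithm 2 Count consecutive safe transitions for validation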

We are now ready to characterize how safe the SV is in $\Phi$. Suppose the validation is executed online with a single SV. The data $\mathcal{D}$ and its corresponding $\mathcal{T}$ are thus collected following a particular time sequence w.r.t. the same SV. At a certain step, if one starts consecutively observing transitions that start and stay inside $\Phi$ until the end of the test, one then has statistical evidence to claim the robustly forward invariance property of $\Phi$ by Definition 1. This is presented as a validation routine in Algorithm 2.

However, the online procedure described above is no longer applicable to a fleet of SVs deployed simultaneously in the nominal driving environment test. Moreover, a safety metric is primarily used to analyze the safety performance of a system in a post-processing manner. That is, one replays the data set $\mathcal{T}$ following a certain order of all elements in $\mathcal{T}$. For statistical inference, as the initializations of all transition pairs are i.i.d. w.r.t. the underlying distribution on $\mathcal{S}$, $\mathcal{T}$ can be replayed in any order. In particular, the replay of $\mathcal{T}$ is formally specified as follows.
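The counting routine described in this paragraph can be sketched as below. It assumes the count resets to zero whenever a transition starts or ends outside the set, so the returned value is the length of the final uninterrupted run; the membership test `in_set` is a hypothetical callable standing in for a point-in-set query.

```python
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")


def count_consecutive_safe(transitions: Iterable[Tuple[S, S]],
                           in_set: Callable[[S], bool]) -> int:
    """Length of the trailing run of transitions that start and stay in the set."""
    count = 0
    for s, s_next in transitions:
        if in_set(s) and in_set(s_next):
            count += 1
        else:
            count = 0  # the run is broken, start counting again
    return count


if __name__ == "__main__":
    inside = lambda s: 0 <= s <= 10
    print(count_consecutive_safe([(1, 2), (2, 11), (3, 4), (4, 5)], inside))  # 2
```

Only the run that lasts until the end of the replay counts as evidence, which is why the order of the replay matters for the resulting number.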

Definition 4.

Consider the domain-specific finite set $\mathcal{D}$ and the corresponding set of all state transitions $\mathcal{T}$ as presented in Section II. The replay of $\mathcal{T}$, $\mathcal{T}_i$, is a permutation of $\mathcal{T}$, i.e., a certain rearrangement of all elements in $\mathcal{T}$.

It is immediate that the total number of possible replays of $\mathcal{T}$ is $|\mathcal{T}|!$. As long as the probability for each replay order to occur remains the same (i.e., $1/|\mathcal{T}|!$), the set of initializations of all transition pairs in $\mathcal{T}_i$ remains i.i.d. w.r.t. the same underlying distribution on $\mathcal{S}$. We can then formally justify the safety performance of the SV in $\mathcal{A}_\alpha(\mathcal{X})$ through the following theorem.

Theorem 1.

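[ϵ-Almost Robustly Forward Invariance Validation] Consider β ∈ (0, 1), α > 0, the domain-specific finite set $\mathcal{D}$ and the corresponding set of all state transitions $\mathcal{T}$. Let $\mathcal{X}$ be the set of potentially safe states extracted from $\mathcal{D}$ through Algorithm 1. Let $\mathcal{A}_\alpha(\mathcal{X})$ be the α-shape of $\mathcal{X}$ as specified by Definition 3. For a certain replay of $\mathcal{T}$ denoted by the index $i$ as $\mathcal{T}_i$, let $N_i$ be the output of Algorithm 2 given $\mathcal{T}_i$ and $\mathcal{A}_\alpha(\mathcal{X})$. Then, we have that $\mathcal{A}_\alpha(\mathcal{X})$ is $\epsilon_i$-almost robustly forward invariant with confidence level 1 − β, where $\epsilon_i$ is determined by $N_i$ and β through Theorem 2 in [weng2021towards].

Moreover, $\mathcal{A}_\alpha(\mathcal{X})$ is expected to be $\bar{\epsilon}$-almost robustly forward invariant with confidence level 1 − β and

$$\bar{\epsilon} = \frac{1}{|\mathcal{T}|!} \sum_{i=1}^{|\mathcal{T}|!} \epsilon_i. \tag{7}$$

As $\mathcal{A}_\alpha(\mathcal{X}) \cap \mathcal{C} = \emptyset$, $\mathcal{A}_\alpha(\mathcal{X})$ is also an $\bar{\epsilon}$-almost safe set.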


Proof. For any fixed choice of β and a particular data replay $\mathcal{T}_i$, the proof of the $\epsilon_i$-almost robustly forward invariance property is a direct outcome of Theorem 2 in [weng2021towards]. Furthermore, consider $\epsilon_i$ as a random variable, where the occurrence probability for each replay is the same, i.e., $1/|\mathcal{T}|!$. The expected ϵ is thus obtained in the form of (7). Finally, the ϵ-almost safe property is a direct outcome of Definition 2. ∎
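To make the averaging over replays concrete, the sketch below enumerates every replay of a tiny transition set and averages the per-replay coefficient. The per-replay bound used here, ϵ = 1 − β^(1/N) with ϵ = 1 when N = 0, is an assumption chosen because it is a common finite-sampling bound; it is not confirmed to be the exact expression of Theorem 1. All function names are hypothetical.

```python
import itertools
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")


def count_consecutive_safe(transitions, in_set) -> int:
    """Trailing run of transitions that start and stay inside the set."""
    count = 0
    for s, s_next in transitions:
        count = count + 1 if in_set(s) and in_set(s_next) else 0
    return count


def epsilon_from_count(n: int, beta: float) -> float:
    """ASSUMED bound: 1 - beta ** (1 / n), and no guarantee when n == 0."""
    if n == 0:
        return 1.0
    return 1.0 - beta ** (1.0 / n)


def expected_epsilon(transitions: Sequence[Tuple[S, S]],
                     in_set: Callable[[S], bool], beta: float) -> float:
    """Average epsilon over all equally likely replays (permutations)."""
    total = 0.0
    replays = 0
    for replay in itertools.permutations(transitions):
        total += epsilon_from_count(count_consecutive_safe(replay, in_set), beta)
        replays += 1
    return total / replays


if __name__ == "__main__":
    inside = lambda s: 0 <= s <= 10
    data = [(1, 2), (2, 3), (3, 20)]  # the last pair leaves the set
    print(expected_epsilon(data, inside, beta=0.01))
```

Full enumeration grows factorially and is only feasible for toy inputs; grouping replays that share the same count, as the paper does in Section III-C, avoids visiting each permutation individually.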

Fig. 2: Overview of the proposed finite-sampling safe operable domain quantification algorithm.

III-C Finite-sampling Operable Domain Quantification Algorithm

So far, we have presented the set construction and set validation steps. The complete algorithm that tackles Problem 1 is summarized in Algorithm 3 and is also conceptually illustrated in Fig. 2.

1:Input: , ,
2:Determine with Algorithm 1
3:Determine given and  [akkiraju1995alpha, kengithub]
4:Determine all pairs of state transitions such that and
6:For in do
10:End For
Algorithm 3 Finite-sampling Operable Domain Quantification

Note that the derivation of ϵ in Algorithm 3 is slightly different from Theorem 1, as replays sharing the same $N$ value are grouped together to improve the computational performance. The density and occupancy features can also be derived through (5). Given the finite set $\mathcal{X}$ and the selected α, $\mathcal{A}_\alpha(\mathcal{X})$ is determined by the standard α-shape algorithm [akkiraju1995alpha, kengithub]. In practice, we also use the logarithmic search scheme discussed in Section III-A to determine the appropriate α. Implementation details will also be discussed in Section IV.

We conclude this section by emphasizing that the graph $G$ and the obtained α-shape are not only embedded with coverage and forward invariance information. The graph $G$ induces state transitions that could be used for other safety related applications such as fault tree analysis with backtracking process algorithms [hejase2020methodology, capito2021bpa] and information gain justification [collin2021plane]. The states can also be associated with other safety features available from the raw data such as human driver engagement (e.g., a human may tend to engage within a certain subset of the obtained covering set) and ADAS/ADS signals (e.g., the forward collision warning may only be triggered in a certain subregion). Existing driving data sets collected from real-world and simulators are not comprehensive enough to provide the aforementioned features. Hence, considering those features is regarded as future work.

IV Case Study

To demonstrate the performance of the proposed safety metric, a series of cases are studied in this section. Detailed configurations are summarized in Table I and explained as follows.

Case HighD data Waymo open data set SUMO Carla NCAP-AEB
(straight highway)
(highway, urban, intersection)
(straight highway)
SV Driver human
Waymo’s self-driving
car (SDC)

lane change heuristics

driver IDM_0 IDM_1 IDM_0 IDM_1
SV Type Car Truck Car Car Car Car
Traffic Type
& Truck
& Truck
& Pedestrian
& Pedestrian
TABLE I: Summary of all real-world data sets and simulators used for the case study section.
HighD data set

The HighD data set [krajewski2018highd] is a data set of naturalistic vehicle trajectories recorded on German highways. It contains a mixture of car and truck drivers operating on straight-road highway segments. Naturalistic driving behavior is known to be statistically consistent overall while exhibiting discrepancies that depend on the vehicle type. This motivates our analysis of human driver safety performance w.r.t. different vehicle types and different ODDs.

Waymo open data

The Waymo open data [sun2020scalability] used in this study is the motion data set, which is primarily used for training and validating traffic motion prediction algorithms. In this study, we repurpose the data set for safety analysis by taking advantage of the motion trajectories recorded for Waymo’s self-driving car (SDC) and the surrounding mixed-traffic road users, especially vehicles and pedestrians.


SUMO

The Simulation of Urban MObility (SUMO) [SUMO2018] is an open source, microscopic, and continuous multi-modal traffic simulator. In this study, we compare two parametric self-driving algorithms whose main difference lies in the Intelligent Driver Model (IDM) hyper-parameters, referred to as IDM_0 and IDM_1. The simulated traffic is created with a variety of vehicles of different dynamics and self-driving configurations operating in a mapped environment with a mixture of highway and urban roads. A fleet of 20 SVs with each parametric policy is then deployed along with the simulated traffic.


Carla

The simulated mixed-traffic environment in Carla [dosovitskiy2017carla] involves a fleet of vehicles driven by the default autopilot algorithm, along with other randomly deployed vehicles and pedestrians.


NCAP-AEB

The SV algorithms adopted to create this data set are the same as those in the SUMO case. Two parametric IDM algorithms are deployed in a simulated straight-road segment with the lead principal other vehicle (POV) executing the testing policy specified in the NCAP Autonomous Emergency Braking (AEB) car-to-car test program [van2017euro]. The program involves 48 testing scenarios, each executed once.

Remark 3.

To differentiate between IDM_0 and IDM_1, IDM_0 is parameterized with a stronger braking capability but is less willing to take extreme maneuvers for collision avoidance (due to a smaller minimum safe distance and a smaller safe time headway). The IDMs used in NCAP-AEB and SUMO mostly share similar specifications. However, the SUMO simulator also includes other hyper-parameters that may affect the performance, such as the perfectness and the lateral lane change heuristics.

Remark 4.

As a purely data-driven approach, the safety performance evaluations obtained throughout this section are based only on the given data, assuming the collected data points are i.i.d. w.r.t. the distribution in the nominal driving environment. This is generally true in simulator-based tests such as in Carla and SUMO, but is not necessarily valid for real-world driving data sets as the data processing details are largely unknown. As a result, the safety performance claimed from the HighD data set and the Waymo open motion data set does not necessarily represent the corresponding SVs’ actual safety performance.

Before proceeding to the domain-oriented safety evaluation outcomes, we first emphasize some featured observations:

  1. The safe operable domain of a certain SV is a joint outcome of the SV’s own driving behavior, the other dynamic road users’ behavior, and the test environment.

  2. Within the same case study (i.e., the same testing behavior and environment), it is in general inaccurate to claim that one SV is safer than another unless the outcome concurs among all features, i.e., small ϵ, large density, and large occupancy.

  3. Compared with the statistical fatality rate inference [fraade2018measuring], given the same confidence level and the same data set , the magnitude of is significantly smaller than the fatality rate value. That is, the inferred fatality rate metric tends to over-estimate the risk, especially when the collected finite states are clustered in a specific sub-domain of the nominal driving environment. Meanwhile, the operational domain specific nature of the proposed metric helps establish a more precise safety performance assessment.

| Case | SV | Safe Distance | 1-R (C=0.999) | TTC mean (s) | TTC std (s) | TTC Valid Rate | | | |
|---|---|---|---|---|---|---|---|---|---|
| NCAP-AEB | IDM_0 | N/A | N/A | 1.368 | 1.051 | 0.964 | 13.0199 | 0.832 | 0.1960 |
| NCAP-AEB | IDM_1 | N/A | N/A | 1.229 | 2.223 | 0.244 | 4.9077 | 2.464 | 0.0568 |
| SUMO | IDM_0 | 5725.99 | 0.0019 | 8.844 | 0.685 | 0.378 | 0.7717 | 1.752 | 0.2619 |
| SUMO | IDM_1 | N/A | N/A | 8.581 | 1.288 | 0.447 | 0.5121 | 2.009 | 0.3265 |
| HighD | Car | 3276.48 | 0.0034 | 8.871 | 0.666 | 0.507 | 0.5732 | 7.675 | 0.5807 |
| HighD | Truck | 551.81 | 0.0199 | 8.951 | 0.413 | 0.442 | 2.9755 | 1.968 | 0.4418 |

TABLE II: Safety study of the lead-vehicle following domain with the NCAP-AEB test of two parametric IDMs, two types of human-driven vehicles from the HighD data set, and two different parametric driving algorithms from the SUMO simulator. Bold typeface highlights values that indicate the higher-risk driving behavior.
(a) The ϵ-almost safe sets obtained for IDM_0 in the NCAP-AEB case.
(b) The ϵ-almost safe sets obtained for IDM_1 in the NCAP-AEB case.
(c) Comparing the ϵ-almost safe sets obtained from the class of car drivers and the class of truck drivers in the HighD case.
Fig. 3: Comparing the ϵ-almost safe sets obtained from various cases for the lead-vehicle following domain.

Finally, note that the selection of α for the set construction follows the procedure described in Section III-A, with the initial lower and upper bounds of α set to and , respectively. The search terminates once the best α-shape that wraps in a single polytope is found (the termination threshold is 0.1). That is, to a certain extent, the proposed algorithm finds not only an almost safe operable domain but also the optimal one. For the high-dimensional ODD analysis with a significantly large data set, such as the multi-vehicle interaction domain with the HighD data set, we also implement a hierarchical k-means clustering routine that divides the data points into clusters until all clusters are smaller than a preset threshold. The final α-shape is then determined by combining all α-shapes derived from the obtained clusters. Throughout this section, we also have . As a result, all obtained values are derived with a confidence level of at least 0.999.
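The clustering step described above can be sketched as a recursive split. This is a minimal illustration rather than the routine used for the reported results: it assumes k = 2 at every level of the hierarchy, a deterministic initialization, and Euclidean distance, and the names `two_means`, `hierarchical_clusters`, and `max_size` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty set of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def two_means(points: List[Point], iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split points into two groups with a few Lloyd iterations (k = 2)."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # farthest point from c0
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = centroid(left), centroid(right)
    if not left or not right:
        # Degenerate case (e.g. duplicated points): fall back to an index split.
        half = len(points) // 2
        return points[:half], points[half:]
    return left, right


def hierarchical_clusters(points: List[Point], max_size: int) -> List[List[Point]]:
    """Recursively split until every cluster holds at most max_size points."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(points) <= max_size:
        return [points]
    left, right = two_means(points)
    return hierarchical_clusters(left, max_size) + hierarchical_clusters(right, max_size)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for cluster in hierarchical_clusters(data, max_size=3):
        print(cluster)
```

Each resulting cluster would then be passed to the α-shape construction separately, with the per-cluster shapes combined afterwards as described in the text.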

IV-A The Lead-Vehicle Following Domain

We start with the lead-vehicle following domain and analyze three cases: the HighD data set, the SUMO-based simulation, and a customized simulation executing the NCAP-AEB car-to-car testing procedure. For the HighD data set, we define with , and and consider all vehicles in all lanes. The extracted trajectories are further classified into two categories determined by the SV’s type (car or truck). For the SUMO simulation, we consider with , and . For the NCAP-AEB case, the testing procedure defines the as , and . The experiment results are summarized in Table II and illustrated in Fig. 3.

In Table II, the safe travel distance and the inferred fatality rate are not available (N/A) for some of the cases because the corresponding data include collisions. The TTC is presented with the average value and the standard deviation over all admissible states, and all TTC values are clipped at 9 seconds. A TTC is valid if it is positive. The TTC valid rate is the ratio between the number of time steps with a valid TTC and the total number of time steps in the data. Within each case and each column, the bold font emphasizes the value that indicates the higher-risk driving behavior (e.g., small average TTC and small α-shape occupancy). We further emphasize some observations as follows.
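The TTC statistics described above can be sketched as follows. This is a minimal illustration under assumptions not stated in this section: TTC is taken as the bumper-to-bumper gap divided by the closing speed, invalid time steps are excluded from the average (rather than clipped into it), and the population standard deviation is used. The function and argument names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def ttc_summary(gaps_m: Sequence[float],
                closing_speeds_mps: Sequence[float],
                clip_s: float = 9.0) -> Dict[str, float]:
    """Mean, standard deviation, and valid rate of clipped TTC values.

    A time step is counted as valid when the gap and the closing speed are
    both positive, so that the resulting TTC is positive and finite.
    """
    if len(gaps_m) != len(closing_speeds_mps):
        raise ValueError("inputs must have the same length")
    ttcs = [min(gap / dv, clip_s)
            for gap, dv in zip(gaps_m, closing_speeds_mps)
            if gap > 0 and dv > 0]
    total = len(gaps_m)
    return {
        "mean": mean(ttcs) if ttcs else float("nan"),
        "std": pstdev(ttcs) if ttcs else float("nan"),
        "valid_rate": len(ttcs) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Four hypothetical time steps: two closing, one opening, one at equal speed.
    print(ttc_summary([10.0, 20.0, 30.0, 40.0], [5.0, -1.0, 1.0, 0.0]))
```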

First, the comparison between SUMO and NCAP-AEB presents an interesting case in which the same set of SVs is tested under different testing policies induced by the traffic vehicle behavior. From the statistical summary in Table II, the TTC-based evaluation considers IDM_1 more dangerous in both the NCAP-AEB and the SUMO cases, yet the valid rates are different. On the other hand, the proposed operational domain specific and unbiased safety evaluation considers IDM_1 mostly the safer behavior because it exhibits a higher probability of remaining inside the operable domain dictated by the α-shape (small ϵ) with a higher density. However, both the α-shapes illustrated in Fig. 3(a) and Fig. 3(b) and the occupancy values in Table II show that the safe operable domains obtained from the two cases (SUMO and NCAP-AEB) are different. Recalling Remark 3, IDM_0 is less willing to execute collision avoidance maneuvers. This deficiency is more pronounced in the NCAP-AEB case, which has a more aggressive lead-vehicle driving behavior than the SUMO case. IDM_1 thus ends up with a relatively smaller safe operable domain than IDM_0 in the NCAP-AEB case, whereas in the SUMO case, IDM_1’s safe operable domain is larger. In summary, IDM_1’s willingness to brake leads to a safer behavior than IDM_0 in the normal driving environment, yet it also confines IDM_1 to a smaller safe operable domain in the NCAP-AEB case, which is more biased towards the falsification purpose with the other traffic behaving relatively aggressively.

Second, for the HighD case, the proposed metric mostly agrees with the mileage-based fatality rate measure and identifies the naturalistic behavior induced by the class of truck drivers as more dangerous than that of the class of car drivers. This contradicts the TTC-based metric as the class of car drivers exhibits a smaller average TTC. This aligns with the well-known deficiency of TTC also reported by other work in the literature [weng2020model, weng2021model].

Finally, throughout all the cases, given the same confidence level, the values from the proposed metric all exhibit a significantly smaller magnitude than the fatality rate value (mostly ten-thousand times smaller). Fundamentally, this occurs because the inferred fatality rate from [fraade2018measuring] does not have an explicitly defined operable domain. As a result, one would require a larger data set to obtain a similar level of probability to that of our proposed domain-specific metric.
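For reference, the 1-R(C=0.999) entries reported in Tables II to IV are consistent with a zero-failure bound of the form C = 1 - R^n, where n is the failure-free travel distance. The sketch below is an illustration under assumptions: the section does not state the formula explicitly, and expressing n in miles (converted from the tabulated kilometers) is inferred from the reported values. The function name is hypothetical.

```python
KM_PER_MILE = 1.609344


def failure_rate_bound(safe_distance_km: float, confidence: float = 0.999) -> float:
    """Upper bound 1 - R on the per-mile failure rate after a failure-free run.

    Solves confidence = 1 - R**n for R, with n the distance in miles
    (the unit is an assumption, see the lead-in).
    """
    if safe_distance_km <= 0:
        raise ValueError("safe_distance_km must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    miles = safe_distance_km / KM_PER_MILE
    reliability = (1.0 - confidence) ** (1.0 / miles)
    return 1.0 - reliability


if __name__ == "__main__":
    # Safe distance of the Waymo row in Table IV; prints 0.2386.
    print(round(failure_rate_bound(40.778), 4))
```

Because the bound depends only on the accumulated distance, it carries no information about where in the state space that distance was accumulated, which is the limitation discussed above.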

IV-B Multi-vehicle Interaction and Vehicle-to-Pedestrian Interaction

(a) Subspace slicing for with truck being the SV.
(b) Subspace slicing for with car being the SV.
(c) Subspace slicing for with car being the SV.
Fig. 4: Comparing the subspace slicing of the ϵ-almost safe sets obtained from the HighD data set for the multi-vehicle interaction domain with different SV velocity ranges.
| SV | Safe Distance (km) | 1-R (C=0.999) | (C=0.999) | | |
|---|---|---|---|---|---|
| Car | 536.895 | 0.0205 | 0.0004 | 0.1507 | 0.3505 |
| Truck | 168.042 | 0.0640 | 0.0012 | 0.3461 | 0.0441 |

TABLE III: Safety study of the multi-vehicle interaction domain with the HighD data set.

This section starts with a case study of the HighD case with the multi-vehicle interaction domain. As the driving environment in the HighD case consists of only straight-road segments, the domain specification shown in Fig. 1 directly applies with . The domain extraction for the left and right regions is confined to the adjacent lanes near the SV’s lane and also excludes some SV lanes with light traffic on the side (mostly with lane ID 5). The results are summarized in Table III and Fig. 4.

For the statistically inferred fatality rate, the truck is considered more dangerous because of its shorter safe travel distance. On the other hand, for the proposed method, the truck is considered more dangerous with a large ϵ and a large occupancy value, yet the point density is also large. Moreover, comparing the center column subplots between Fig. 4(a) and Fig. 4(b), at least within the inspected SV velocity range, the car and the truck share similar lead-following distances (indicated by the bottom of the front-center subplots in green) and rear-following distances (indicated by the bottom of the rear-center subplots in purple). That is, the other traffic vehicles do not stay further away from a truck than they do from a car, nor do trucks maintain a longer following distance from the lead vehicle than cars. This contradicts the intuitive opinion one typically forms about naturalistic driving behavior in the real world. Finally, comparing the first column with the third column in all three subplots in Fig. 4, one observes that vehicles on the left typically travel at a faster speed than the SV. This aligns with the nature of the data set, as the HighD data set is primarily collected from German highways. This observation may not hold in other driving environments, as we shall soon demonstrate.

For the Waymo case and the Carla case, where pedestrian information is available, the vehicle-pedestrian interaction domain is also involved. Although both cases involve a variety of driving environments including highways, urban roads, intersections, and roundabouts, the domain definition considers them as unknown disturbances and uncertainties. The specifications shown in Fig. 1 still apply. For both cases, we have . The side subregion falls between the lateral offsets of and from the SV’s geometric center. The results are summarized in Table IV. Fig. 5 illustrates the multi-vehicle interaction domain sub-space analysis where the vehicle-pedestrian states (the last 4 dimensions in the 17-dimensional OSS) are ignored. The vehicle-to-pedestrian interaction is analyzed separately in Fig. 6.

(a) Subspace slicing for with Waymo-SDC being the SV.
(b) Subspace slicing for with Carla-Autopilot being the SV.
(c) Subspace slicing for with Carla-Autopilot being the SV.
Fig. 5: Comparing the subspace slicing of the ϵ-almost safe sets obtained from the Waymo and Carla cases for the multi-vehicle interaction domain with different SV velocity ranges and ignoring the vehicle-pedestrian states.
| SV | Safe Distance (km) | 1-R (C=0.999) | (C=0.999) | | |
|---|---|---|---|---|---|
| Waymo | 40.778 | 0.2386 | 8.8567 | 1.4591 | 0.0096 |
| Carla | 399.195 | 0.0275 | 0.8060 | 18.3606 | 0.0118 |

TABLE IV: Safety study of the combined multi-vehicle interaction domain and the vehicle-pedestrian domain in the Waymo and the Carla cases.
(a) Waymo case.
(b) Carla case.
Fig. 6: Subspace slicing of the vehicle-pedestrian interaction domain: the first row in both sub-figures denotes the obtained α-shapes and the second row denotes the points in within the selected sub-space.

Within the SV velocity range of (m/s) (see the center column of Fig. 5(a) and Fig. 5(b)), the Waymo-SDC is more conservative as it maintains a longer following distance. Moreover, the observation also generalizes to the vehicle-pedestrian interaction domain, where the Carla-Autopilot exhibits a shorter vehicle-pedestrian distance within a large velocity range (see Fig. 6). However, note that these comparisons are not necessarily fair as the driving environments are essentially different.

In comparison with the HighD case, the two analyzed cases have poorer coverage performance, and vehicles mostly operate in a low speed range given that the driving environments are different. The observation from the HighD case that vehicles on the left run faster is no longer valid, as illustrated in Fig. 5. The advantage of a domain-specific safety analysis can also be seen in the Waymo case in Table IV. Limited by the data availability, the total safe travel distance for the Waymo-SDC is short, leading to a large fatality rate (0.2386). In contrast, the ϵ value is much smaller for the same confidence level.

V Conclusion and Discussions

This paper has presented a novel safety metric that is operational domain specific and provably unbiased for the performance evaluation of ADS/ADAS equipped CAVs, based on the α-shape and the ϵ-almost robustly forward invariant set. The performance of the proposed method is demonstrated over several commonly encountered and challenging ODDs with a variety of data sets collected at different fidelity levels. It is shown, provably and empirically, to be more accurate than many leading measures as well as observed and predictive lagging measures. In comparison with the inferred fatality rate, the domain-specific nature also yields a more precise safety assessment.

As discussed in Section III, it is of future interest to expand the ϵ-almost safe set with richer information related to the dynamic modeling, engagement information, and other safety related features. It is also of practical value to explore more efficient algorithms for deriving the (optimal) α-shape.