1 Challenges, Background, and Contribution
Automated and autonomous vehicles (AVs) are responsible for avoiding mishaps, and even for mitigating hazardous situations, in as many operational situations as possible. Hence, AVs are examples of systems where the identification (2a) and mitigation (2b) of hazards have to be highly automated. This makes these systems even more complex and difficult to design. Thus, safety engineers require specific models and methods for risk analysis and mitigation.
As an example, we consider manned road vehicles in road traffic with an autopilot (AP) feature. Such vehicles can conduct a ride automatically, given only a valid target and with minimal human intervention. The following AV-level (S)afety (G)oal specifies the problem we focus on in this paper:
SG: The AV can always reach a safest possible state w.r.t. the hazards identified and present in a specific operational situation.
Adopting terminology from [4, 9], we give a brief overview of the terms used in this paper: We perceive a mishap as an event of harm, injury, damage, or loss. A hazard (or hazardous state) is an event that can lead to a mishap. We consider hazards to be factorable. Hence, a hazard can play the role of a causal factor of another hazard or a mishap. We denote causal factors, hazards, and mishaps—i.e., the elements of a causal (event) chain—by the term safety risk (risk state or risk for short). We perceive the part of a causal chain increasing risk as an endangerment scenario, and the part of a causal chain decreasing risk as a mitigation strategy. Table 1 exemplifies different endangerment scenarios and how these can be mitigated using corresponding strategies.
Mitigation strategies can be seen as specific system-level safety requirements implemented by a given control system architecture. We assume that a control system architecture consists of features deployed on sensors, actuators, and software components running on networked computing units (cf. Figure (a)). By traditional driver assistance (TDA), we refer to driver assistance features already in the field, e.g. adaptive cruise control (ACC) and lane keeping assistance (LKA).
We distinguish between the domains vehicle, driver, and road environment. For highly and fully automated driving, not all domains have to be considered. For example, in full automation (e.g. level 5 in ), the vehicle has to operate under all road and environmental conditions manageable by a human driver and therefore a driver does not have to be taken into account.
||Possible Mitigation Strategy|||
|Scenario of Endangerment|Vehicle|Driver|RoadEnv|
||||digital road signs|
|Driver maloperation|passive safety|safe reaction (if controllable)|digital road signs, x2car com.|
|IT attack|security pattern|safe reaction (if controllable)||
In this paper, we contribute
(i) a framework for modeling, analysis, and design of planners (i.e., high-level controllers) capable of run-time hazard identification and mitigation, and
(ii) a procedure for constructing planning models from hazard analysis.
For this, we formalize the core engineering steps necessary for (2a) the identification and analysis of scenarios of endangerment and (2b) the design of operational mitigation strategies. Using an exemplary AV, we incrementally build up a risk structure involving three hazards in the vehicle domain, as well as several strategies to reach safe states in the presence of these hazards. We discuss approaches to model reduction suited for run-time hazard analysis and mitigation planning, where the efficient identification of operational situations and acting therein play a crucial role.
In this paper, we discuss related work in Section 2, our abstraction in Section 3, and our modeling framework in Section 4. Section 5 shows a procedure for building a hazard mitigation planning model. We present an AV example in Section 6, discuss our approach in Section 7, and conclude in Section 8.
2 Related Work
Among the related formal methods available in robotics planning, embedded systems, and automated vehicle control, we discuss only a few of the more recent ones and highlight how we can improve over them.
Güdemann and Ortmeier  present a language for probabilistic system modeling for safety analysis. Based on Markov decision processes (MDPs), they propose two ways of modeling failure modes (i.e., per-time and per-demand failure modes) and two ways of deductive cause-consequence reasoning (i.e., quantitative and qualitative). Their model and reasoning can extend our approach. However, our work (i) adds stronger guidelines on how to build planning models and (ii) puts hazard analysis into the context of autonomous systems and mitigation planning.
Eastwood et al.  present an algorithm for finding permissive robot action plans optimal w.r.t. safety and performance. They employ partially observable MDPs (helpful for handling uncertainty and robot limitations) to model robot behavior, and two abstractions from this model to capture a system’s modes and hazards. Our framework uses three layers of abstraction, operational situations to capture control modes, and a structure to capture hazards. While they directly encode hazard severity for plan selection, our framework allows the planner to calculate the risk priority based on a causal event tree towards mishaps. As opposed to complete behavioral planning, our approach focuses on the construction of mitigation planning models. For example, for system faults we can plan mitigations by using adaptation mechanisms of a given control system architecture.
Jha and Raman  discuss the synthesis of vehicle trajectories from probabilistic temporal logic assertions. Synthesized trajectories take into account perception uncertainty through approximation of sensed obstacles by combining Gaussian polytopes. In a similar context, Rizaldi and Althoff  formalize safe driving policies to derive safe control strategies implementing worst-case braking scenarios in autonomous driving. They apply a hybrid-trace-based formalization of the physics required for model checking of recorded and planned strategies. [8, 10, 11] discuss low-level control for a specific class of driving scenarios, whereas our approach provides for (i) the investigation and combination of many related operational situations, thus forming a more comprehensive perspective of driving safety, and (ii) the consideration of various kinds of hazards that might play a role in high- and low-level control beyond safe and optimal trajectory planning and collision avoidance.
Wei et al.  describe an autonomous driving platform capable of bringing vehicles to a safe state and a stop, i.e., activating a fail-operational mode on critical failures and a limp-home mode on less critical failures. These are mitigation strategies we can assess in our framework. While their work elaborates on designing a specific class of architectures, we additionally provide an approach to systematically evaluate risks and, consequently, derive an architecture design.
Babin et al.  propose a system reconfiguration approach developed with the Event-B method in a correct-by-construction fashion, using a behavior pattern similar to our approach (particularly, Figure (b)). Reconfiguration as one way to mitigate faults is also discussed in our work. Wardziński  discusses hazard identification and mitigation for autonomous vehicles by predetermined risk assessment (i.e., with safety barriers) and dynamic risk assessment. For both, he provides argumentation patterns for creating AV safety cases. In addition to his work, the abstraction and the method we propose cover both paradigms in one framework, and we provide formal notions of all core concepts.
3 Abstraction for Run-time Hazard Mitigation
Figure 1 depicts three abstractions—, , and —for run-time hazard mitigation in AVs. The state space pertains to the quantization of continuous signals from the physical world encompassing the driver (), the vehicle (), and the road environment (). For instance, the quantity speed is represented by the discrete state variable , which in turn is used to formulate predicates to obtain the abstract state space . For example, a predicate over sensor values , , can encode , an invariant constraining the activity of leaving a tunnel. We describe this two-staged abstraction in more detail in .
Here, we will work with the risk state space whose concepts—actions, hazard phases, their composition and ordering—are discussed below:
Let be a set of actions. We abstract from control loop behaviors within and across operational situations by distinguishing four classes of actions: endangerments , mitigations (see Figure (b)), mishaps , and ordinary actions . Note that actions can take place in one or more out of the three domains, drv, veh, and renv, depending on the quantities they modify. We require .
Definition 1 (Hazard Phases)
Let be a set of hazards. Given , endangerment actions , and mitigation actions , we define the phases of a hazard as the set whose elements denote the following:
hazard is (inact)ive,
hazard has been (act)ivated by an action ,
(act)tivated hazard has contributed to a mishap by an action , and
hazard has been (mit)igated by an action .
For each hazard , Figure (a) depicts as a transition system where , the indices , the state subsumes phases, subsumes phases and . For example, in the vehicle domain, can model degradation transitions and or can model repair transitions.
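The per-hazard phase transition system of Definition 1 can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the phase names mirror the definition, but the exact transition relation (e.g. re-activation of a mitigated hazard by a further degradation) is illustrative, not taken verbatim from the paper.

```python
from enum import Enum

class Phase(Enum):
    INACT = "inactive"   # hazard not activated
    ACT = "activated"    # hazard activated by an endangerment action
    MISH = "mishap"      # activated hazard contributed to a mishap
    MIT = "mitigated"    # hazard mitigated by a mitigation action

# Assumed phase transitions per action class (Definition 1):
# endangerments (re-)activate, mitigations deactivate, mishaps are final.
TRANSITIONS = {
    ("endanger", Phase.INACT): Phase.ACT,
    ("endanger", Phase.MIT): Phase.ACT,   # e.g. a repeated degradation
    ("mitigate", Phase.ACT): Phase.MIT,   # e.g. a repair transition
    ("mishap", Phase.ACT): Phase.MISH,
}

def step(phase: Phase, action_class: str) -> Phase:
    """Advance one hazard's phase; moves not in the relation leave it unchanged."""
    return TRANSITIONS.get((action_class, phase), phase)
```

Note that `step` leaves a mishap phase unchanged, matching the requirement below that mishaps are final.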
From all the sets of hazard phases, we compose a tuple space as follows:
Definition 2 (Risk State Space)
Based on Definition 1, we define the risk state space as the set of -tuples
We call any subset of a region. Let with and . To quantify risk in scenarios of endangerment and mitigation strategies (Table 1), we define a partial order over :
Definition 3 (Mitigation Order)
Let be a set of phases for hazard (Definition 1) and . By the reflexive transitive closure (here, for a relation , represents the composition of relations), we define the mitigation order , for states , as follows:
Intuitively, denotes “ is better or further in mitigation than .” (We use the convention .)
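A pointwise sketch of the mitigation order over risk states (tuples of hazard phases) could look as follows. The paper defines the order via a reflexive transitive closure over phase transitions; the concrete ranking below, in particular ranking "inact" and "mit" equally, is our assumption for illustration.

```python
# Illustrative ranking of hazard phases (higher = further in mitigation).
RANK = {"mish": 0, "act": 1, "inact": 2, "mit": 2}

def mitigation_leq(x, y):
    """Pointwise mitigation order over risk states (tuples of phases):
    True iff y is better or further in mitigation than x for every hazard."""
    return all(RANK[a] <= RANK[b] for a, b in zip(x, y))
```

Because the order is pointwise, states can be incomparable, which is what makes it a partial rather than a total order.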
4 Concepts for Run-time Hazard Mitigation
In this section, we explain the core concepts of deriving a risk structure for a specific operational situation. Using the risk state space and actions , we define the notions of risk structure, risk region, and operational situation:
Definition 4 (Risk Structure)
A risk structure is a weighted labeled transition system with
a set called the risk state space (Definition 2),
a set of actions used as transition labels,
a relation called labeled transition relation, and
a set of partial functions called weights, where the set can be, e.g. , or  ((m)arginal, (c)ritical, (f)atal; for other examples of severity scales, see ).
To capture the notions of endangerment scenario and mitigation strategy (Table 1) based on , we consider paths and strategies:
Definition 5 (Paths, Strategies, and Reachability)
By convention, we write for . Then, for , a path is a sequence . By we denote the set of all paths of length and by all paths over . Furthermore, we call a set a strategy. By with , we denote the set of states reachable in from a state .
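Definitions 4 and 5 can be sketched as a small data structure with a reachability computation. This is a minimal sketch; the state and action names, the weight encoding, and the adjacency-map representation are our assumptions.

```python
from collections import deque

class RiskStructure:
    """Minimal sketch of a weighted labeled transition system
    (Definition 4) with the reachability notion of Definition 5."""

    def __init__(self):
        self.trans = {}    # state -> list of (action, successor)
        self.weights = {}  # (state, action, successor) -> weight

    def add(self, s, a, t, w=None):
        """Add a labeled transition s -a-> t with an optional weight."""
        self.trans.setdefault(s, []).append((a, t))
        if w is not None:
            self.weights[(s, a, t)] = w

    def reachable(self, start):
        """All states reachable from `start` along paths in the structure."""
        seen, todo = {start}, deque([start])
        while todo:
            s = todo.popleft()
            for _, t in self.trans.get(s, []):
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen
```

A path then corresponds to a sequence of `(action, successor)` hops through `trans`, and a strategy to a set of such paths.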
We consider an action as an endangerment, i.e., , if for a transition . The class models steps of endangerment scenarios. For example, can stem from faults in drv, veh, and renv.
We consider an action as a mitigation, i.e., , if for a transition . The class models steps of mitigation strategies. One objective of a good mitigation strategy is to achieve a stable safe state.
States and regions in both correspond to subsets of (Section 3). To limit the scope of a risk analysis, we use an operational situation which combines an initial region with a (reasonably weak) invariant holding along the driving scenarios in a specific road environment.
Definition 6 (Operational Situation)
An operational situation is a tuple where and is an invariant over including all representations of in . Let be the set of all operational situations.
Below, we will work with a risk structure and assume a fixed operational situation associated with . Hence, we use solely.
4.0.1 Risk Regions.
We consider specific subsets of called risk regions, particularly, the safe region , the hazardous region , and the mishap region (see Figure (b)). Safety engineers aim at the design of mitigations which (i) avoid and (ii) react to endangerments as early and effectively as possible. Then, reduces to unavoidable actions from so-called near-mishaps still in towards . For example, we consider a successfully deployed airbag to be in such that is not reached in such an accident (more in Section 7).
Our definitions of risk regions depend on : First, . We require mishaps to be final, i.e., . Second, and vary with a given operational situation. Moreover, they can be defined based on, e.g. weights and equivalences. However, and, for an , we start in the safe region iff .
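One plausible instantiation of the three risk regions, phrased over tuples of hazard phases, is sketched below. The mishap region is fixed by the definition above, but the safe/hazardous split varies with the operational situation, so the classifier here is an assumption for illustration only.

```python
def region(state):
    """Classify a risk state (tuple of hazard phases) into the mishap,
    safe, or hazardous region. Illustrative split: the safe region
    would in general depend on the operational situation."""
    if any(p == "mish" for p in state):
        return "mishap"          # mishaps are final
    if all(p in ("inact", "mit") for p in state):
        return "safe"
    return "hazardous"           # at least one activated hazard
```

With this split, the initial all-inactive state lies in the safe region, matching the requirement that the initial region starts safe.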
By associating weights with elements of , we quantify further details on the physical phenomena of the controlled process relevant for risk analysis.
For example, given with , the probability of endangerment
yields the probability that hazard gets activated in by performing in . Furthermore, given with ,
the probability of mitigation yields the probability that hazard gets mitigated in by performing in .
the cost of mitigation yields the potential effort (i.e., time, energy, other resources) of performing the mitigation .
For any mishap , specifies its severity. Depending on the abstraction, we can use qualitative (as shown above) or quantitative scales for and . Anyway, we assume to have operators for and , e.g. see Figure (a)a.
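For the qualitative case, the assumed operators over the severity scale can be sketched as follows. We assume a total order over the scale from the footnote of Definition 4 ((m)arginal < (c)ritical < (f)atal); this ordering and the join operator are our assumptions.

```python
# Qualitative severity scale: (m)arginal < (c)ritical < (f)atal.
SCALE = ("m", "c", "f")

def worse(a, b):
    """Join operator: return the more severe of two qualitative severities,
    e.g. for aggregating severities along a causal chain."""
    return a if SCALE.index(a) >= SCALE.index(b) else b
```

For quantitative scales, `worse` would simply be `max` over numeric severities.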
Weights are typically calculated from measurements of the controlled process. For example, the estimation of might be the result of a controllability analysis of in (of an operational situation). Moreover, further quantities (e.g. risk priority) might (i) be calculated from weights, (ii) be propagated along , and (iii) lead to an update of weights.
4.0.2 Risk Priority.
Given , and a function , we can compute the minimum partial risk priority
where denotes the probability (see, e.g.  for details about probabilistic temporal logic and reasoning) that from some mishap is eventually () reached in . This definition implements a traditional measure of risk analysis (see, e.g. ), referring to the minimum negative outcome (i.e., damage, injury, harm, loss) possibly reachable from in a specific operational situation . Note that for , .
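The minimum partial risk priority can be sketched as a fold over the mishaps reachable from a state. In this sketch, `trans` is an adjacency map over the risk structure, and `severity` and `prob` are assumed oracles standing in for the weights and for the probabilistic temporal reasoning the paper refers to; returning 0 when no mishap is reachable is also our assumption.

```python
from collections import deque

def min_partial_risk_priority(trans, start, severity, prob):
    """Minimum over the mishap states reachable from `start` of
    severity weighted by the probability of eventually reaching that
    mishap. `severity(s)` returns None for non-mishap states."""
    seen, todo = {start}, deque([start])
    while todo:  # forward reachability over the risk structure
        s = todo.popleft()
        for _, t in trans.get(s, []):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    mishaps = [s for s in seen if severity(s) is not None]
    if not mishaps:
        return 0.0  # no mishap reachable: zero risk priority
    return min(severity(m) * prob(start, m) for m in mishaps)
```

For a state that is itself a mishap, the reachable set contains only that state, so the priority collapses to its own severity.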
4.0.3 Equivalences over .
For simplification of complex risk structures , we can construct equivalence classes over states. From the structure of states in , the dynamics in , and the elements of the control system architecture (Section 1.0.1), we give a brief informal overview of equivalences over to be considered:
We speak of feature equivalence, , iff both and map to the same set of active features of the control system, i.e., in-the-loop, no matter whether they are fully operational, faulty, or degraded. Note that out-of-the-loop features can be faulty, deactivated, or in standby mode. Next, we speak of degradation equivalence, , iff and both states share the same set of degraded features. Furthermore, we speak of hazard (or fault) equivalence, , iff , and, particularly, of mishap equivalence, , iff . Based on , we finally define:
Definition 7 (Mitigation Equivalence)
Based on Definition 3, two states are mitigation equivalent, written , iff
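Mitigation equivalence can be sketched as mutual comparability in the mitigation order. The phase ranking below repeats our earlier illustrative assumption that "inact" and "mit" share a rank; under that assumption, two distinct states (e.g. with the roles of a mitigated and a never-activated hazard swapped) become mitigation equivalent.

```python
# Assumed phase ranking; "inact" and "mit" share a rank, so distinct
# states can be mitigation equivalent.
RANK = {"mish": 0, "act": 1, "inact": 2, "mit": 2}

def leq(x, y):
    """Pointwise mitigation order over risk states (tuples of phases)."""
    return all(RANK[a] <= RANK[b] for a, b in zip(x, y))

def mitigation_equivalent(x, y):
    """Definition 7 (sketch): x and y are mitigation equivalent iff
    each is below-or-equal the other in the mitigation order."""
    return leq(x, y) and leq(y, x)
```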
5 Construction of Risk Structures
In this section, we describe an incremental and forward (for generation of , backward reasoning is the alternative, not shown here) reasoning approach to building a risk structure .
5.0.1 Identification of Hazards.
Throughout the construction of , we assume to have a procedure for the identification of a set of hazards based on a fixed control loop design of a class of AVs and their environments, and a fixed set of operational situations (Definition 6). Failure mode and effects analysis as well as fault-tree analysis (see, e.g. ) are widely practiced schemes for .
5.0.2 Building the Risk Structure.
Figure (b) shows the main steps of a procedure which, given a set and after termination, returns all elements of a complete risk structure . Here, completeness is relative to and means that can no longer be extended by (i) states which are reachable by existing actions in , (ii) actions which allow reaching non-visited states in , (iii) transitions in which are technically possible and probable, and (iv) further knowledge by extending the domains of weights. Based on Figure (b), Algorithm 1 refines for a control loop and an operational situation .
The while-loop (cf. line 2) accounts for the alternation between adding endangerments and mitigations. By using the maps and (cf. lines 2, 3, 14, 17, 26), the algorithm keeps track of the endangerment- and mitigation-coverage of visited states, i.e., for which hazards has already been visited.
We assume to have (i) a function (cf. lines 9, 11, 22, 23) which acts as an oracle for weights (Section 4.0.1) depending on , and (ii) a function (cf. lines 6, 20) which acts as an oracle for determining the technical possibility of newly identified transitions.
The first for-loop checks for the addition of new transitions to (cf. line 7). The transition constructor returns a state with the given hazard or mishap activated (i.e., phases or ). Note that can generate reachable via .
The second for-loop checks for the addition of new transitions to (cf. line 21). The transition constructor returns a state with the given hazards mitigated to a new phase for each .
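The alternation of the two for-loops can be sketched as a worklist construction. This is a simplified sketch of the forward construction, not Algorithm 1 itself: it uses only the inact/act/mit phases, and the oracle predicates `can_endanger` and `can_mitigate` stand in for the paper's possibility and weight oracles.

```python
from collections import deque

def build_risk_structure(hazards, can_endanger, can_mitigate):
    """Forward construction sketch: starting from the all-inactive
    state, add endangerment transitions (inact -> act per hazard) and
    mitigation transitions (act -> mit), guided by oracle predicates."""
    init = tuple("inact" for _ in hazards)
    trans, todo, seen = [], deque([init]), {init}
    while todo:
        s = todo.popleft()
        for i, h in enumerate(hazards):
            if s[i] == "inact" and can_endanger(h, s):
                t = s[:i] + ("act",) + s[i + 1:]
                trans.append((s, f"endanger_{h}", t))
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
            if s[i] == "act" and can_mitigate(h, s):
                t = s[:i] + ("mit",) + s[i + 1:]
                trans.append((s, f"mitigate_{h}", t))
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return seen, trans
```

With permissive oracles and two hazards, the construction explores all nine phase combinations; restrictive oracles prune technically impossible transitions, as in the algorithm.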
5.0.3 Model Reduction.
To keep reasoning efficient, we have to apply reachability-preserving simplifications to (cf. lines 29f), e.g. equivalences such as in Definition 7. The mitigation order (Definition 3) helps in reducing the state space and in merging actions modifying phases of the same hazards (i.e., by hazard equivalence).
5.0.4 Abstraction from Control System Architecture.
In both stages of Algorithm 1, we need to analyze the given or envisaged architecture and to identify state variables, e.g. for software modules, at an appropriate level of granularity.
In the endangerment stage (lines 3ff), we can perform dependability analyses to identify events that can activate causal factors. Off-line, we then design specific measures to reach the safe region again, and, on-line, we design generic measures to be refined at run-time.
The mitigation stage (lines 17ff) helps to revise a control system architecture, e.g. by adding redundant execution units and degradation paths. Moreover, we can pursue off-line synthesis of the respective parts of the control system architecture.
5.0.5 Hazard Mitigation Planning.
First, is hybrid in the sense that it (i) performs the sensing of already known endangerment scenarios (e.g. near-collision detection, component fault diagnosis) on-line, and (ii) allows the addition of new scenarios from off-line hazard analysis.
Second, a simple planner would continuously perform shortest weighted path search in to keep a list of all available lowest-risk mitigation paths (Definition 5) and coordinate optimized lower-level controllers.
Based on these two steps, we assume to be continuously updated according to the available information (i.e., adding or modifying endangerments and mitigations according to known scenarios). It is important to have powerful and precise update mechanisms, highly responsive actuation, and short control loop delays. Main issues of signal processing are briefly mentioned in Section 7.
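The shortest weighted path search mentioned above can be sketched with a Dijkstra-style search over the risk structure. The representation (an adjacency map, a per-transition cost function such as mitigation effort, and a safe-region predicate) is an assumption of this sketch.

```python
import heapq

def lowest_risk_plan(trans, cost, start, is_safe):
    """Return (total cost, action sequence) of a lowest-cost mitigation
    path from `start` to any state in the safe region, or None if the
    safe region is unreachable."""
    frontier = [(0, start, [])]
    best = {start: 0}
    while frontier:
        d, s, plan = heapq.heappop(frontier)
        if is_safe(s):
            return d, plan
        if d > best.get(s, float("inf")):
            continue  # stale queue entry
        for a, t in trans.get(s, []):
            nd = d + cost(s, a, t)
            if nd < best.get(t, float("inf")):
                best[t] = nd
                heapq.heappush(frontier, (nd, t, plan + [a]))
    return None
```

A planner would re-run (or incrementally update) this search whenever the risk structure is updated with new endangerments or mitigations.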
The notion of safest possible state (SG, Section 1) is governed by the accuracy of (Section 3), the completeness of the results of , and the exhaustiveness of for a fixed setting . According to Definition 3, for a pair , we might say that is the safest possible state iff we have
where . Any controller for SG would have to find and completely conduct a shortest plan for to reach .
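The selection of safest possible states can be sketched as computing the maximal elements of the reachable set under the mitigation order. Both the phase ranking and the pointwise order are the illustrative assumptions used in the earlier sketches.

```python
RANK = {"mish": 0, "act": 1, "inact": 2, "mit": 2}

def leq(x, y):
    """Assumed pointwise mitigation order over tuples of hazard phases."""
    return all(RANK[a] <= RANK[b] for a, b in zip(x, y))

def safest_possible_states(reachable, order=leq):
    """Candidates for the safest possible state (SG): reachable states
    not strictly dominated by any other reachable state."""
    return [s for s in reachable
            if not any(order(s, t) and not order(t, s) for t in reachable)]
```

A controller for SG would pick one of these candidates and conduct a shortest plan (e.g. via the weighted path search above) to reach it.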
6 Example: Fail-operational Driver Assistance
6.0.1 Identifying an Operational Situation.
We consider the situation : “AV is taking an exit in a tunnel, at a speed between 30 and 90 km/h, with the driver being properly seated, and the next road segments contain a crossing.” Figure (b) depicts the corresponding street segment.
|Domain|State variables|Prefix|
|Driver|Physical presence, consciousness, vigilance, …|drv|
|Vehicle|Speed, loc(ation), fault conditions, …|veh|
|RoadEnv|Daylight, weather, traffic, road, …|renv|
6.0.2 Modeling the Road Vehicle Domain.
Figure (a) shows a simplified control system architecture used for driver assistance systems. We model the relevant state information according to the abstractions described in Section 3. State variables commonly used for road vehicles are listed in Table 2. For , we assume to have the following variables, prefixed with their domains and with their types in parentheses (variable types and usage depend on the AV sensors and car2X services through which they are measured; we assume individual error estimators for all variables): (coordinate),
(vector of floats), (street map, with, e.g., a topological coordinate system and information about tunneled parts), and (enumeration). veh denotes all variables of this domain. For , we identify the following predicates (here, refers to a pattern for the street map element class, which acts like a filter on the street map data type; for the sake of brevity, we omit details of sensor fusion and street map calculations required for evaluating these predicates):
Furthermore, we use unspecified predicates:
The invariant for is . Note that the AP is active in the initial state associated with .
6.0.3 Incremental Forward Construction of the Risk Structure.
Refining the regions and (Figure (b)), we construct from three hazards , and identified by (Section 5.0.1). Table 3 sketches the construction of the first and second increments towards , including the events “AP sensor fault” and “TDA LKA software fault.”
|Introduce faults (e.g. from fault model)|
|AP sensor fault|
|TDA LKA software fault|
|End. phases: Comb. of and||,|
|…“LKA faulty” “TDA active” “AP out of the loop”|
|Actions establishing and (e.g. from architecture analysis)||,|
|Probability of endangerment||e.g. ,|
|Severity …“high-speed collision”||,|
|…“AP fail-op. by degrad. to TDA”|
|…“deact. ACC” “driver in loop”|
|…“TDA fail-silent and warn”|
|…“TDA total fail-silent” “immediate handover to driver”||,|
Mitigation phases: …“ fault” “TDA active,”
…“ fault” “handed to driver” “TDA active”
…“ fault” “handed to driver” “AP out of the loop”
…“TDA out of the loop” “driver warned”
…“AP and TDA out of the loop” “handed to driver”
|Probability of mitigation||e.g.|
|Cost of mitigation||e.g.|
|Simplifications: e.g. (cf. Definition 7)|
Figure (a) shows for . According to Algorithm 1, we try to add the fault condition to and other states in (i.e., black states in Figure (a)). Based on the action , this step yields the states , and . Then, a mitigation step yields the states and and, finally, another step of endangerment analysis based on the action yields .
Risk Priority Estimation.
From the state with , we can derive, e.g. according to Eq. (1). We can as well derive because reaching by driving assistance control is no longer possible.
Equivalences and Model Reduction.
In Figure (a)a, for example,
because in both states is mitigated and other hazards are inactive (, cf. Definition 7),
because in the degraded variants of LKA and ACC, i.e., LKA and ACC, are in the loop,
because in both states LKA and ACC are in the loop,
because in both states, LKA and ACC are in the loop, and
because ACC (part of AP) is faulty and ACC (part of TDA) is fully operational.
Simplifications can be derived from Figure (a), where we might (i) merge two states if , or (ii) merge two consecutive states on a “safe” mitigation path, e.g. from any to if actions such as limp-home, shutdown, and repair are feasible from .
Figure (b) shows a simplification of . We omit irrelevant transitions () and collapse the mitigation-equivalent () states and . Consequently, with the states and we get a refinement of . According to Eq. (2), is a safest possible state reachable from .