
Decision-making for automated vehicles using a hierarchical behavior-based arbitration scheme

by   Piotr Franciszek Orzechowski, et al.
FZI Forschungszentrum Informatik

Behavior planning and decision-making are among the biggest challenges for highly automated systems. A fully automated vehicle is confronted with numerous tactical and strategical choices. Most state-of-the-art automated vehicle platforms implement tactical and strategical behavior generation using finite state machines. However, these usually result in poor explainability, maintainability and scalability. Research in robotics has produced many architectures to mitigate these problems, most notably behavior-based systems and hybrid derivatives. Inspired by these approaches, we propose a hierarchical behavior-based architecture for tactical and strategical behavior generation in automated driving. It is a generalizing and scalable decision-making framework, utilizing modular behavior components to compose more complex behaviors in a bottom-up approach. The system is capable of combining a variety of scenario- and methodology-specific solutions, like POMDPs, RRT* or learning-based behavior, into one understandable and traceable architecture. We extend the hierarchical behavior-based arbitration concept to address scenarios where multiple behavior options are applicable but have no clear priority against each other. Then, we formulate the behavior generation stack for automated driving in urban and highway environments, incorporating parking and emergency behaviors as well. Finally, we illustrate our design in an explanatory evaluation.



I Introduction

Recent years have shown significant research and engineering progress in the field of automated driving and advanced driver assistance systems. In particular, the performance of state-of-the-art environment perception methods is improving quickly due to the advances in deep learning and other AI technologies.

Besides perception, one of the biggest challenges for highly automated systems is behavior planning and decision-making. In urban driving, traffic participants are confronted with numerous tactical and strategical choices. Humans make most of these decisions reactively, for example when stopping at a zebra crossing, choosing an appropriate gap when merging or yielding at intersections. Long-term decisions, like goal and route selection or the choice of driving style and behavior preferences, consider longer time horizons.

So far, considerable results in behavior and trajectory planning have already been achieved for some complex scenarios [hoermann_entering_2017, hubmann_automated_2018, bouton_scalable_2018]. However, no generalizing and scalable decision-making framework has been found that is capable of combining a variety of such scenario- and methodology-specific approaches into one understandable and traceable architecture.

How and when should an automated vehicle switch from a regular ACC controller to a lane change, cooperative zip merge or parking planner? How can we support POMDPs, hybrid A* and any other planning method in our behavior generation?

Most state-of-the-art fully automated vehicles that have proven successful in the DARPA Urban Challenge [buehler_darpa_2009, bacha_odin_2008, montemerlo_junior_2008] or during test rides on public roads [ziegler_making_2014, aeberhard_experience_2015] have used finite state machines (FSMs) for tactical and/or strategical behavior generation. FSMs are a useful tool for simple systems with a small number of behavior options and maneuvers, where each state represents one maneuver or driving mode. In practice, FSMs, even hierarchical FSMs, turn out to be unsuitable for more complex tasks due to their poor explainability (about the reason why a behavior is executed), maintainability (the effort to refine existing behavior) and scalability (the effort to achieve a high number of behaviors). These shortcomings motivate the search for other architectures that can be used for tactical and strategical behavior generation.

Decision-making is a well known research field in robotics, also referred to as “robot control” or “action selection” [siciliano_springer_2016]. Generally, the various approaches can be classified into knowledge- or behavior-based systems.

Knowledge-based systems, like FSMs, typically perform the action selection in a centralized, top-down manner using a knowledge database that contains a fused and abstracted representation of all available sensor data. As a result, the engineer designing the action selection module (in FSMs the state transitions) has to be aware of the conditions, effects and possible interactions of all behaviors at hand.

Behavior-based systems, on the other hand, decouple actions into atomic, simple behavior components that should be aware of their conditions and effects themselves. These modular behavior components are then combined into more complex behaviors in a bottom-up approach. Many architectures for behavior coordination have been proposed. The most prominent are the subsumption architecture [brooks_robust_1986], activation networks [pattie_maes_how_1989] and voting systems [julio_k._rosenblatt_damn_1997].

In this publication, we propose a hybrid approach combining the best from both worlds: A hierarchical behavior-based architecture for tactical and strategical behavior generation in automated driving. We combine atomic behavior components to more complex behaviors using generic arbitrators. Arbitrators can again be combined with other arbitrators or behavior components to generate an even more complex system behavior. A similar approach has proven very successful in robot soccer [lauer_cognitive_2010].

In automated driving, each behavior represents a tactical driving maneuver like changing the lane, crossing an unsignalized intersection or parking into a nearby parking lot. The integration of these self-contained maneuvers into the arbitration structure also realizes strategical behaviors like switching from urban to highway driving.

The behavior components make use of an environment model to assess if they are applicable in the current state and make this information available to their parent arbitrator. Arbitrators do not know the nature of their underlying behavior options by design (functional decomposition). Instead, given their arbitration strategy, they select an appropriate behavior based on abstract information only (e.g., expected utility or priority).

Such a hierarchical behavior-based arbitration scheme has great advantages:

  • Scenario-specific solutions can be combined easily.

  • It supports different planning approaches.

  • The resulting behavior can be well explained.

  • It can be iteratively extended by more behaviors.

  • The modularity improves robustness and efficiency.

  • Complex behavior emerges from simple components.

The remainder of this paper is structured as follows: Section II summarizes the main concepts of hierarchical behavior-based arbitration schemes as found in [lauer_cognitive_2010]. The main contribution of this publication is presented in Section III, where we formulate the behavior generation for automated driving using a hierarchical behavior-based arbitration scheme. We extend the existing arbitration approach, develop a suitable maneuver representation, define a set of fundamental driving behaviors and combine these to an overall system behavior using arbitrators. Section IV validates this approach with explanatory experiments on an urban route. Finally, Section V concludes with the key advantages and a brief outlook.

II Fundamentals

A first concept of hierarchical behavior-based arbitration schemes for behavior generation has been presented in detail in [lauer_cognitive_2010]. This chapter highlights the main ideas that also have been briefly outlined in section I.

The concept is based on simple modular behavior components and generic arbitrators.

II-A Behaviors: How to do things

Behavior components are the fundamental building blocks of a behavior-based architecture. They describe how and when things can be done.

A behavior component provides three main functionalities: The invocation condition indicates if this behavior is applicable in the current situation. The commitment condition of a currently active behavior signals whether it can be continued. Finally, if either the invocation or commitment condition is true, the behavior can be selected to generate and execute its actual behavior command.
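The three functionalities described above can be sketched as a small Python interface. This is only an illustration under stated assumptions, not the authors' implementation: the names `Behavior`, `DriveStraightSketch` and `is_selectable`, the dictionary-based environment model and the string command are all hypothetical, and treating the commitment condition as relevant only for an already active behavior is one reading of the text.

```python
from abc import ABC, abstractmethod


class Behavior(ABC):
    """Hypothetical interface of a behavior component (all names are assumptions)."""

    @abstractmethod
    def invocation_condition(self, env: dict) -> bool:
        """True if the behavior is applicable in the current situation."""

    @abstractmethod
    def commitment_condition(self, env: dict) -> bool:
        """True if an already active behavior can be continued."""

    @abstractmethod
    def get_command(self, env: dict) -> str:
        """Generate the actual behavior command."""

    def is_selectable(self, env: dict, active: bool) -> bool:
        # Selectable if it can be newly invoked, or if it is currently
        # active and its commitment condition still holds.
        return self.invocation_condition(env) or (
            active and self.commitment_condition(env)
        )


class DriveStraightSketch(Behavior):
    """Toy behavior used only to exercise the interface."""

    def invocation_condition(self, env: dict) -> bool:
        return env.get("lane_free", False)

    def commitment_condition(self, env: dict) -> bool:
        return False  # never insists on being continued

    def get_command(self, env: dict) -> str:
        return "drive_straight"
```

Keeping both conditions inside the component is what lets a parent arbitrator stay ignorant of what the behavior actually does.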

II-B Arbitrators: Which thing to do

Arbitrators hierarchically combine behaviors to produce more complex behavior strategies. They decide which thing to do.

An arbitrator contains a list of behavior options to choose from. A specific selection logic determines which option is chosen based on abstract information.

Any knowledge and decision logic necessary for the selection and execution of a behavior component is completely encapsulated inside the behavior component itself. As a result, arbitrators do not need any knowledge about the nature of their underlying behavior options.

This bottom-up design approach leads to strong functional and semantic decomposition.

To generate even more complex behavior, an arbitrator can also be a behavior option of a hierarchically higher arbitrator.

The following selection schemes have been proposed: The highest priority first arbitrator organizes its behavior options in a list ordered by priority. An applicable option with the highest priority is chosen. The sequence arbitrator executes its options based on a fixed predefined order. A random arbitrator assigns probabilities to its behavior options and selects one among all applicable options randomly.
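Two of these selection schemes, priority-based and random, can be sketched in a few lines of Python. The sketch makes several assumptions: `Option`, `select_by_priority` and `select_random` are hypothetical names, an option is reduced to a name plus an applicability check, and the probabilities of the random arbitrator are modeled as plain weights.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Option:
    """Hypothetical behavior option: a name plus an applicability check."""
    name: str
    applicable: Callable[[], bool]


def select_by_priority(options: List[Option]) -> Optional[Option]:
    # Options are ordered from highest to lowest priority;
    # the first applicable one wins.
    for option in options:
        if option.applicable():
            return option
    return None


def select_random(options: List[Option], weights: List[float],
                  rng: random.Random) -> Optional[Option]:
    # Draw among the applicable options only, using their assigned weights.
    candidates = [(o, w) for o, w in zip(options, weights) if o.applicable()]
    if not candidates:
        return None
    opts, ws = zip(*candidates)
    return rng.choices(opts, weights=ws, k=1)[0]
```

Both functions see only the abstract applicability flag, never the internals of an option, which mirrors the decomposition described above.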

Additional arbitration schemes that are necessary for, but not limited to automated driving are introduced in section III-D.

III Application to automated driving

This chapter describes the main contribution of this publication: how a hierarchical behavior-based arbitration scheme can be utilized for decision-making in automated driving.

In contrast to classical behavior-based systems, the behavior components are not directly connected to the sensors and actuators. Instead, the input is an abstract environment model that contains a fused, tracked and filtered representation of the world. The behaviors’ output is also in a more generic form that can be passed to a trajectory planner or controller. In this sense, we follow the sense-plan-act paradigm in the overall software structure [siciliano_springer_2016] but employ a behavior-based approach in the decision-making module.

III-A Environment Model

The environment model in our implementation contains a lanelet map [poggenhans_lanelet2_2018], ego motion state and detected objects with prediction. The map describes drivable areas, distinct lanes, parking lots, traffic rules, etc. The ego motion state mainly depicts the current pose and velocity of the ego vehicle. Currently, we assume that the objects are provided with a decoupled prediction, but integrated planning and prediction within the behavior components is also possible. We think our design should support both open-loop and closed-loop prediction.

III-B Maneuver Representation

As we aim for a generalizing approach that is applicable to various driving environments our behavior representation should be as task-agnostic as possible. It should fit all relevant use cases and environments of automated driving, namely highway, urban and parking. However, the proposed representation and interfaces would also work for other environments like off-road driving.

Our smallest behavior building blocks, the behavior components, represent basic driving maneuvers such as “follow the ego lane”, “merge into traffic” or “park near goal”. In general, we can distinguish between maneuvers in a structured or unstructured environment. Urban and highway scenarios provide road boundaries or even distinct lanes, while parking lots and off-road areas feature open space like scenarios.

Therefore, we use a twofold maneuver representation:

Figure 1: Maneuver corridor for a lane change, right bound in green, left bound in red, reference line in blue. The planned trajectory as circles, one circle per time step.

Driving commands in structured environments use a corridor-based maneuver representation. It consists of a maneuver corridor, reference line, predicted objects and the chosen maneuver variant. The corridor is usually generated from map data [poggenhans_lanelet2_2018], but could also be provided online, e.g. from semantic segmentation [meyer_deep_2018]. The reference line is an approximation of the centerline and can serve as a rough positional reference. Additionally, velocity objectives are given along this line, e.g. derived from the speed limit and curvature. The object list contains all objects that are relevant for this maneuver, their predictions as well as virtual objects indicating stop positions. Finally, the maneuver variant defines the chosen homotopy class, as discussed in [bender_combinatorial_2015]. An example of a corridor-based driving command is shown in Fig. 1.

Driving commands in unstructured environments directly use a trajectory to represent the requested maneuver. We did not choose a more abstract representation in this case, in order to support a wide variety of use cases in such environments.

Depending on the command representation type, the system following the decision-making module runs different pipelines to execute these maneuvers. Corridor-based maneuvers are passed to a trajectory planner, e.g. [ziegler_trajectory_2014] or [gutjahr_lateral_2017], followed by an appropriate controller. Trajectory-based driving commands, in contrast, are handed directly to a trajectory controller that is tuned for slow driving and capable of driving backward, as needed for parking and similar maneuvers.
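The twofold maneuver representation and the dispatch to different downstream pipelines can be pictured with two small data classes. This is a sketch under assumptions: the class and field names, the use of 2D point lists, and the string labels for the pipelines are all hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, float]  # (x, y) in map coordinates (assumed convention)


@dataclass
class CorridorCommand:
    """Driving command for structured environments (hypothetical layout)."""
    left_bound: List[Point]
    right_bound: List[Point]
    reference_line: List[Point]
    velocity_objectives: List[float]  # one target speed per reference point
    objects: List[str] = field(default_factory=list)  # ids of relevant objects
    variant: str = ""  # label of the chosen maneuver variant


@dataclass
class TrajectoryCommand:
    """Driving command for unstructured environments (hypothetical layout)."""
    states: List[Tuple[float, float, float, float]]  # (t, x, y, heading)


DrivingCommand = Union[CorridorCommand, TrajectoryCommand]


def downstream_pipeline(command: DrivingCommand) -> str:
    # Corridor-based commands go to a trajectory planner first;
    # trajectory-based commands go straight to a trajectory controller.
    if isinstance(command, CorridorCommand):
        return "trajectory_planner"
    return "trajectory_controller"
```

Dispatching on the command type keeps the decision-making module independent of how each representation is executed.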

III-C Driving Maneuvers: How to drive

Following the behavior-based approach, we begin by designing atomic behavior components for simple tasks before stacking them together in section III-D. Here, we do not attempt to present a feature-complete list of all necessary behaviors. Instead, we focus on explaining the main design concept using some hand-picked example behaviors that should provide a decent starting point for developing an automated vehicle. This stack can then be extended iteratively by more specialized behavior components addressing specific driving situations. Furthermore, a behavior component can compute its maneuver command with any preferred state-of-the-art method.

An urban environment is probably the most challenging one for automated driving. We can think of at least three basic driving maneuvers needed in an urban setting:


As long as the ego pose matches any urban lane of our route and this lane does not intersect other lanes, our vehicle could probably follow it. Thus, the invocation condition of FollowEgoLane is true as long as such a matching lane exists. Executing this behavior for one time step will keep the vehicle in a well-defined state, so we can leave the commitment condition false to allow other behaviors to be selected after a FollowEgoLane command.


One characteristic of urban environments is the large number of signalized and unsignalized intersections, which need specific behavior. An automated vehicle has to make sure to yield to traffic participants that have the right of way (such as vehicles and vulnerable road users (VRUs)). Additionally, occlusions at some intersections call for even greater caution [orzechowski_tackling_2018].

The invocation condition of CrossIntersection is true as long as the current ego lane intersects other lanes within its planning horizon. Because crossing an intersection can be critical, its commitment condition stays true until the intersection has been passed. This helps, for example, to clear the intersection as soon as reasonably possible and to avoid unintentional lane changes within the intersection.


Lane changes, on the other hand, are only possible when the current ego lane has a directly adjacent reachable lane on the left or right side with a safe distance to the following and leading vehicles. The ChangeLane component is defined w.r.t. the supposed changing direction and instantiated once for each direction to improve reusability.

As a result, the invocation condition of ChangeLane is true as long as the current ego lane has a directly adjacent reachable lane in the respective direction with a big enough gap to safely change into. In order to produce consistent driving behavior, the commitment condition is true until the lane change maneuver has been completed or properly aborted, e.g. in case the selected gap becomes too small.
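The ChangeLane conditions can be illustrated with a short Python sketch. Everything concrete in it is an assumption made for the example: the `LaneChangeState` snapshot, the 20 m gap threshold, and the simplification that a shrinking gap immediately drops the commitment instead of triggering a dedicated abort maneuver.

```python
from dataclasses import dataclass


@dataclass
class LaneChangeState:
    """Hypothetical snapshot of what ChangeLane reads from the environment model."""
    adjacent_lane_reachable: bool
    gap_length_m: float
    change_in_progress: bool


class ChangeLaneSketch:
    """One instance per direction, as suggested in the text."""

    def __init__(self, direction: str, min_gap_m: float = 20.0):
        self.direction = direction  # "left" or "right"
        self.min_gap_m = min_gap_m  # assumed threshold, not taken from the paper

    def invocation_condition(self, s: LaneChangeState) -> bool:
        # Applicable if a reachable adjacent lane exists and the gap is big enough.
        return s.adjacent_lane_reachable and s.gap_length_m >= self.min_gap_m

    def commitment_condition(self, s: LaneChangeState) -> bool:
        # Stay committed while the change is ongoing; give up the commitment
        # when the selected gap becomes too small (simplified abort).
        return s.change_in_progress and s.gap_length_m >= self.min_gap_m


change_left = ChangeLaneSketch("left")
change_right = ChangeLaneSketch("right")
```

Instantiating the same class once per direction is a simple way to get the reuse mentioned above.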

However, in dense traffic it might be necessary to perform a lane change in three consecutive phases — ApproachGap, IndicateIntention and MergeIntoGap [nilsson_lane_2016]. These can be designed as behavior components as well and put into sequence in section III-D.

For better clarity and conciseness, the remaining behaviors will only be described briefly.


A gap approach should be possible, if the current ego lane has a directly adjacent reachable lane, regardless of the nearest gap size.


Once the gap has been reached but is too tight to merge into directly, the vehicle will stay in its lane but indicate its intention using the turn signals and by laterally approaching the other lane.


As soon as the gap size is big enough, the vehicle can safely merge into it.

Another typical application for automated vehicles is driving on highways. Many occurring behaviors, like lane changing and ACC like lane following, are similar to those provided for urban environments. High velocities, special traffic rules and the absence of VRUs justify distinct behavior components though, in order to properly adjust these for highway environments.


One of the most critical maneuvers on highways is merging onto the highway from an onramp. High relative velocities and sometimes short onramps pose a challenge to select an adequate gap, approach it with possibly high acceleration and finally merge into it. Similarly to lane changes in dense urban traffic, MergeOntoHighway could be modeled with sequential sub-behaviors, to further decompose the problem.


Following a highway lane is comparable to FollowEgoLane, except that the environment is much more structured. Additionally, it can be assumed that no unpredictable VRU will enter or even cross the highway, which avoids overcautious behavior and enables driving at higher speeds. Furthermore, in countries like Germany, special highway laws prohibit overtaking other traffic participants on the right side.


Again, comparable to lane changes in dense urban traffic, changing lanes on highways can be modeled as a multi-phase behavior or as one integrated interaction aware behavior, using e.g. POMDPs [hubmann_belief_2018].


Exiting from highways can be as simple as changing to a new diverging lane or as challenging as crossing traffic that is meanwhile entering the highway. Such scenarios can be found in cloverleaf interchanges, for example, where offramps overlap onramps.

At the beginning, at the end or even during an automated drive, the vehicle has to park in a suitable place, either to drop off and pick up passengers or for standby and charging. Usually, path or trajectory planners based on graph search methods are used in these environments [banzhaf_footprints_2018].


Typically a ride starts with the car parked in a parking lot or garage. In this case, LeaveGarage is the only applicable behavior.


As soon as the automated vehicle is close to its goal and a suitable parking lot is found, the vehicle can reduce its speed and park in this parking lot. Note that the search for a parking lot is not included here. It might be modeled as another behavior component or supplied by the routing module.

Finally, we add fail-safe emergency behaviors, in case a dangerous unforeseen traffic situation evolves or as a fall back if no other behavior component provides feasible commands.


If, for whatever reason, an unavoidable collision is anticipated, the EmergencyStop behavior provides a full-stop trajectory to reduce damage and fatalities.


If a collision is anticipated that cannot be avoided longitudinally but could be avoided or at least mitigated by an evasive maneuver like [werling_automatic_2012], EvadeObject provides such a behavior option.


As a fail-safe fallback for any system failure or if none of the other behavior components provide feasible commands, SafeStop will bring the vehicle to a safe stop, preferably in a safe position.

III-D Arbitration Scheme: Which maneuver to drive

Figure 2: Full arbitration graph of the proposed minimal behavior set for automated driving. Basic behavior components are drawn with round corners, arbitrators have sharp corners.

Now that we have developed a couple of basic behavior components, we can use them to compose the overall behavior for automated driving, starting bottom-up.

We follow a similar notation to [lauer_cognitive_2010], denoting the behavior options of an arbitrator with , using round brackets for an ordered list and curly brackets for a set of options. Basic behavior components are highlighted with ItalicNames and arbitrators with CapitalNames.

In an urban environment FollowEgoLane, CrossIntersection, ChangeLane and MergeIntoLane are possible behaviors. Typically, these have no clear and consistent priority over each other, yet the most reasonable one should be chosen. As none of the existing arbitration schemes (by priority, sequence or random) is sufficient for this task, we define a new cost-based arbitrator that selects the behavior option with the lowest expected cost. By introducing such a cost arbitrator, the decision-making concept can be extended to dynamically changing preferences.

Now, such a cost-based UrbanDriving arbitrator can be used to select among the urban behavior components:

The cost-based arbitrator uses a behavior-agnostic cost estimation module to estimate the expected average travel velocity, incorporate routing costs and penalize lane changes.
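A cost-based selection of this kind can be sketched as follows. The function name `select_lowest_cost`, the tuple layout of a candidate and the opaque dictionary standing in for a maneuver command are assumptions for this illustration. The point of the sketch is that the cost estimator receives only the proposed command, never the behavior that produced it.

```python
from typing import Callable, List, Optional, Tuple

# (option name, applicable?, proposed command); the command is an opaque dict here.
Candidate = Tuple[str, bool, dict]


def select_lowest_cost(candidates: List[Candidate],
                       estimate_cost: Callable[[dict], float]) -> Optional[str]:
    """Return the name of the applicable option with the lowest estimated cost."""
    best_name: Optional[str] = None
    best_cost = float("inf")
    for name, applicable, command in candidates:
        if not applicable:
            continue
        cost = estimate_cost(command)  # sees only the command, not the behavior
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name
```

Because preferences live in `estimate_cost` rather than in a fixed ordering, they can change from one time step to the next without touching the list of options.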

As discussed in section III-C lane changes in dense traffic can be decomposed into three stages. As a result, a sequence-based arbitrator is used to compose MergeIntoLane:
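As a rough illustration of sequence-based arbitration over the three stages (not the paper's formal definition), consider the sketch below. The class name, the environment keys and, in particular, the rule for advancing are assumptions: here the arbitrator moves to the next stage as soon as that stage's invocation condition holds and otherwise stays where it is.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[dict], bool]]  # (name, invocation condition)


class SequenceArbitratorSketch:
    """Steps through its stages in a fixed, predefined order."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages
        self.index = -1  # no stage active yet

    def step(self, env: dict) -> Optional[str]:
        nxt = self.index + 1
        # Advance only to the immediate successor, and only if it is applicable.
        if nxt < len(self.stages) and self.stages[nxt][1](env):
            self.index = nxt
        return self.stages[self.index][0] if self.index >= 0 else None


merge_into_lane = SequenceArbitratorSketch([
    ("ApproachGap", lambda env: env["adjacent_lane"]),
    ("IndicateIntention", lambda env: env["gap_reached"]),
    ("MergeIntoGap", lambda env: env["gap_reached"] and env["gap_big_enough"]),
])
```

Calling `merge_into_lane.step(env)` once per planning cycle walks through the three stages in order and never skips or revisits a stage.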

Similar to the urban case, highway behavior options are combined using a cost-based arbitrator:

In case of Parking at most one option — parking in or out — is feasible after all, such that a trivial priority-based arbitrator can be used:

The emergency maneuvers for unavoidable collisions are grouped together using a cost-based arbitrator estimating the expected damage. In such a way, it chooses the option with the lowest expected damage:

Finally, these arbitrators and the SafeStop fallback are composed together to the top-most priority-based AutomatedDriving arbitrator:

The resulting arbitration graph is shown in Fig. 2.

Figure 3: Behavior choices in the experiment driving the whole test track. The plot shows the selected behavior (SafeStop, ChangeLaneRight, FollowEgoLane, ChangeLaneLeft, ParkNearGoal) over time in seconds, from 0 to 600 s.
Figure 4: Test track running through Karlsruhe, Germany. Start and end position is a parking lot on the university campus. Tiles © 2020 Google, Map data © 2020 GeoBasis-DE/BKG.
Figure 5: Example arbitration graph, that has been used in our simulative experiments.

IV Experiments

In this section, we show the applicability of the proposed concept to utilize a hierarchical behavior-based architecture for behavior generation in automated driving.

IV-A Setup

The explanatory example performs basic urban driving behaviors on a simulative test track based on our real-world test route in Karlsruhe, Germany. The route, shown in Fig. 4, contains segments with speed limits of , and , crosses or turns at 12 intersections, traverses one roundabout and ends at a parking lot.

We use the ROS-based open-source simulation framework CoInCar-Sim [naumann_coincar-sim_2018]. One great advantage of this framework is that it provides the same interface as our test vehicle Bertha [ziegler_making_2014]. Hence, we can develop, test and deploy the same behavior and planning pipeline in CoInCar-Sim and Bertha.

Our basic example maneuvers for this track are: ParkNearGoal, FollowEgoLane, ChangeLane (one instantiation for left, another for right lane changes) and SafeStop. Lane following and both lane change behaviors are combined within a cost-based UrbanDriving arbitrator. Parking, urban driving and the safe stop fallback then constitute the overall behavior using a priority-based AutomatedDriving arbitrator. Fig. 5 illustrates this arbitration graph.

This design has the following motivation. ParkNearGoal is only applicable in the vicinity of the goal and a nearby parking lot. Thus, as long as the ego vehicle is still on the route, FollowEgoLane is applicable and ChangeLaneLeft or ChangeLaneRight might be applicable as well. UrbanDriving will select the most promising one w.r.t. the expected average velocity, routing costs and lane change penalties. As soon as the vehicle approaches its goal, FollowEgoLane will bring it to a stop within the last lanelet. Then, ParkNearGoal will become applicable, be chosen by priority and lead the car into its parking lot. When the parking maneuver is finished, ParkNearGoal will become inapplicable again. At that point, none of the UrbanDriving behaviors is applicable any more either, because the car has left the route. As a result, AutomatedDriving selects the lowest-priority behavior SafeStop. This is a good illustration of how the fallback behavior prevents undefined states and keeps the vehicle in a safe position.
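The walk-through above can be reproduced with a small, self-contained Python sketch of the experimental arbitration graph. It is an illustration under assumptions, not the authors' code: the classes `Leaf`, `PriorityArbitrator` and `CostArbitrator`, the environment keys and the per-behavior costs passed in through the environment are hypothetical, and the priority order ParkNearGoal, UrbanDriving, SafeStop is inferred from the description.

```python
from typing import Callable, List


class Leaf:
    """Hypothetical behavior component reduced to a name and an applicability test."""

    def __init__(self, name: str, applicable: Callable[[dict], bool]):
        self.name = name
        self._applicable = applicable

    def applicable(self, env: dict) -> bool:
        return self._applicable(env)

    def select(self, env: dict) -> str:
        return self.name


class PriorityArbitrator:
    """Chooses the first applicable option; is itself usable as an option."""

    def __init__(self, name: str, options: List):
        self.name = name
        self.options = options  # highest priority first

    def applicable(self, env: dict) -> bool:
        return any(o.applicable(env) for o in self.options)

    def candidates(self, env: dict) -> List:
        found = [o for o in self.options if o.applicable(env)]
        if not found:
            raise RuntimeError(f"{self.name}: no applicable option")
        return found

    def select(self, env: dict) -> str:
        return self.candidates(env)[0].select(env)


class CostArbitrator(PriorityArbitrator):
    """Chooses the applicable option with the lowest cost."""

    def __init__(self, name: str, options: List,
                 cost: Callable[[str, dict], float]):
        super().__init__(name, options)
        self.cost = cost

    def select(self, env: dict) -> str:
        best = min(self.candidates(env), key=lambda o: self.cost(o.name, env))
        return best.select(env)


urban_driving = CostArbitrator("UrbanDriving", [
    Leaf("FollowEgoLane", lambda e: e["on_route"]),
    Leaf("ChangeLaneLeft", lambda e: e["on_route"] and e["left_gap"]),
    Leaf("ChangeLaneRight", lambda e: e["on_route"] and e["right_gap"]),
], cost=lambda name, e: e["costs"][name])

automated_driving = PriorityArbitrator("AutomatedDriving", [
    Leaf("ParkNearGoal", lambda e: e["near_goal"] and not e["parked"]),
    urban_driving,
    Leaf("SafeStop", lambda e: True),  # always applicable fallback
])
```

Since an arbitrator exposes the same `applicable` and `select` methods as a leaf, nesting `urban_driving` inside `automated_driving` needs no special handling.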

IV-B Results

Fig. 3 shows the resulting behavior selection over time. The whole route takes 9:40min and features the expected behavior characteristics. The vehicle starts leaving the campus area by following the lane. At intersection , it changes to the right lane in order to take a turn into a north-east direction. At point , it takes another right turn following the ego lane and has to merge into traffic on the left lane. When approaching the next intersection , the ego vehicle changes onto the exit lane in order to turn into south-east direction. At it approaches and passes the roundabout .

Fig. 6 shows the two applicable behavior options at point , where the route leads onto the “Adenauerring” again. The route continues with a right turn from the rightmost lane, while the ego is on the leftmost lane still. This is a suitable scenario to explain the cost-based arbitration in detail. The urban driving cost estimate incorporates the average expected travel velocity, routing costs and penalizes lane changes:

depicts the expected average velocity of this maneuver. Each of the lane changes needed to follow the route after this command is charged with . Lane change behaviors themselves are penalized with a lower . Hence, the arbitrator generally prefers the follow lane behavior as long as it matches the route. As soon as one or multiple lane changes become necessary, the lane change maneuver becomes more favorable.

At point , the behaviors have these costs:

Consequently the cost-based arbitrator chooses ChangeLaneRight, which has lower cost than FollowEgoLane.
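To make the trade-off tangible, the following Python sketch combines the three ingredients named above into one cost value. It is not the paper's cost function: the additive form, the sign convention for velocity, the function name `urban_cost` and all numbers are assumptions chosen purely for illustration, and they are not the values from the experiment.

```python
def urban_cost(expected_velocity_mps: float,
               remaining_lane_changes: int,
               is_lane_change: bool,
               route_penalty: float = 5.0,
               lane_change_penalty: float = 1.0) -> float:
    """Hypothetical urban driving cost: lower is better."""
    # Assumed form: higher expected velocity lowers the cost.
    cost = -expected_velocity_mps
    # Every lane change still needed after this maneuver is charged.
    cost += route_penalty * remaining_lane_changes
    # Lane change maneuvers themselves carry a smaller penalty.
    if is_lane_change:
        cost += lane_change_penalty
    return cost


# Illustrative situation: the route requires two lane changes to the right.
follow = urban_cost(8.0, remaining_lane_changes=2, is_lane_change=False)
change = urban_cost(8.0, remaining_lane_changes=1, is_lane_change=True)
```

With these made-up numbers, `follow` evaluates to 2.0 and `change` to -2.0, so the lane change wins. When no lane change is needed for the route, the lane change penalty alone makes lane following cheaper.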

An interesting part is directly after taking the right turn at point from to . Here, the vehicle performs two consecutive lane changes in order to pass this two-lane road from the rightmost lane to the exit lane. This is especially noteworthy, as no double lane change or other hand-crafted behavior has been defined for such a scenario. The behavior emerges purely because the routing has been incorporated into the cost estimate.

The road leads back to the campus again, where the vehicle slows down and stops at the end of the route. Finally, the parking behavior becomes active and brings the car into its parking lot. After finishing the parking maneuver, the safe stop behavior is the last suitable option and keeps the car at a standstill.

Figure 6: FollowEgoLane and ChangeLaneRight maneuver corridors at point . The route continues to the right at this point. As a result, FollowEgoLane corridor ends in , while the ChangeLaneRight corridor has a length of .

V Conclusions and Future Work

In this publication, we presented the following contributions:

An extension to the hierarchical behavior-based arbitration concept introduced in [lauer_cognitive_2010]. We introduced a cost-based arbitration scheme that is helpful when multiple behavior options are applicable but have no clear and consistent priority against each other.

We have formulated a behavior generation stack for automated driving based on the hierarchical behavior-based arbitration scheme. It consists of maneuvers for urban and highway environments, contains parking and emergency behaviors, and prevents undefined states with a fallback safe stop behavior.

We have shown the usefulness and applicability of our design in an explanatory evaluation on a simulative route.

The key advantages of the approach are:

  • Scenario-specific solutions can be combined easily.
    In the experiments, five different behaviors have been employed to handle various scenarios, from four-way intersections, T-junctions, a roundabout to multilane bypass roads and parking.

  • It supports different planning approaches.
    We utilized two different trajectory planners in our experiments. Urban corridor-based maneuvers activated an optimization-based planner similar to [ziegler_trajectory_2014], while the parking maneuver used an RRT* motion planner to generate Hybrid Curvature trajectories [banzhaf_footprints_2018].

  • The resulting behavior can be well explained.
    The strongly modular design significantly improves understandability compared to FSMs or classical behavior-based systems. Each invocation condition can be well understood; the selection logic of arbitrators is comprehensible. As a result, the hierarchical decision-making process can be well explained and traced over time.

  • It can be iteratively extended by more behaviors.
    In order to add the parking behavior to our behavior generation, the definition of invocation and commitment conditions for parking was sufficient to add it to the AutomatedDriving arbitrator. Thanks to the strong decoupling, no changes to any other behavior component were necessary.

  • The modularity supports robustness and efficiency.
    Each behavior component is self-contained, so that occurring failures are contained as well and do not affect the overall system stability. In case of a failure, the system will degrade seamlessly by ignoring this behavior option. Furthermore, the atomic structure allows behavior options to be evaluated in parallel to increase efficiency. Strong modularity has many more advantages, among others reusability and maintainability.

  • Complex behavior emerges from simple components.
    Complex system behavior, such as multiple consecutive lane changes to approach an exit lane, emerges from the arbitration scheme without the need for hand-crafted decision or planning logic.
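The degradation property from the list above, where a failing behavior option is simply ignored, can be sketched in Python as follows. The names `select_with_degradation` and `failing_planner`, the tuple layout of an option and the choice to treat any exception as "not applicable" are assumptions for this illustration.

```python
from typing import Callable, List, Optional, Tuple

# (name, applicability check, command generator)
Option = Tuple[str, Callable[[], bool], Callable[[], str]]


def failing_planner() -> str:
    """Stand-in for a behavior whose planner crashes (hypothetical)."""
    raise RuntimeError("planner failed")


def select_with_degradation(options: List[Option]) -> Optional[str]:
    """Priority selection that skips options whose evaluation fails."""
    for name, applicable, command in options:
        try:
            if applicable():
                return command()
        except Exception:
            # A failing behavior is treated as not applicable, so the
            # arbitrator falls through to the next option in priority order.
            continue
    return None
```

With a fallback such as SafeStop at the end of the list, a failure in any single component still leaves a defined command to execute.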

These benefits have led to a smooth development process with promising results, as outlined in section IV. We look forward to further enhancing the numerous existing behavior components, extending the behavior stack, e.g. with our MIQP approach for cooperative zip merges [burger_cooperative_2018], and, most excitingly, integrating this stack on our test vehicle Bertha.