Interpretable Modelling of Driving Behaviors in Interactive Driving Scenarios based on Cumulative Prospect Theory

by   Liting Sun, et al.
berkeley college

Understanding human driving behavior is important for autonomous vehicles. In this paper, we propose an interpretable human behavior model for interactive driving scenarios based on the cumulative prospect theory (CPT). As a non-expected utility theory, CPT can explain some systematically biased or "irrational" human behaviors and decisions that cannot be explained by the expected utility theory. Hence, the goal of this work is to formulate the behavior generation model of human drivers with CPT so that such "irrational" behaviors and decisions can be better captured and predicted. Towards this goal, we first develop a CPT-driven decision-making model focusing on driving scenarios with two interacting agents. A hierarchical learning algorithm is then proposed to learn the utility function, the value function, and the decision weighting function in the CPT model. A case study on roundabout merging is provided as verification. With real driving data, the prediction performances of three different models are compared: a predefined model based on time-to-collision (TTC), a learning-based model based on neural networks, and the proposed CPT-based model. The results show that the proposed model outperforms the TTC model and achieves performance similar to that of the learning-based model with much less training data and better interpretability.




I Introduction

Modelling the interactive behavior of human drivers is extremely important for enabling safe and full autonomy for vehicles. It not only facilitates better prediction of human drivers’ intentions and motions, but can also serve as a valuable asset for generating more human-like decisions and trajectories for autonomous vehicles.

In the past decade, a great amount of effort has been devoted to driver behavior modelling, e.g., [1, 2, 3]. Most of the proposed methodologies can be categorized into three groups: 1) predefined models, 2) learning models, and 3) utility-driven models. Predefined models generate driving behavior based on IF-THEN rules [4], on selected key indices such as time-to-collision (TTC) and time-to-intersection (TTI) [5], or on analytical functions dedicated to describing the behavior in specific scenarios. Examples include the intelligent driver model (IDM) [6] for car following and the minimizing overall braking model for lane changing (MOBIL) [7]. Such predefined models are highly interpretable, i.e., explicit physical meanings can be found for all the model structures, variables and parameters. However, these models typically require a lot of manual work in designing structures and tuning parameters, which can become overwhelming when the amount of data is large.

Learning models generate driving behavior based on trained machine-learning models. They can be either discriminative models such as support vector machines (SVM) [8] and mixture density networks (MDN) [9], or generative models such as hidden Markov models (HMM) [10], generative adversarial networks (GAN) [11, 12] and variational auto-encoders (VAE) [13, 14, 15]. Compared to the predefined models, such learning models can better approximate the complicated distributions of human behavior in massive driving data without manual tuning of model parameters. However, they also suffer from several fundamental problems. First, most of the learning models, particularly the deep networks, are data-hungry. With a relatively small amount of data, they can hardly achieve satisfactory performance. Even with a sufficient amount of data, they still suffer from a second problem: the lack of causality and interpretability of the learned behavior model. Consequently, it is hard to efficiently generalize them to new scenarios such as those with a varying number of agents or new driving maps.

Utility-driven models come from the theory of mind (TOM) [16]. A key feature of these models is that they leverage the fact that human drivers are not random agents, but agents who optimize some utility functions. Hence, they assume that human drivers try to make decisions or plan trajectories that maximize their utilities (or minimize their costs). Such an assumption is often known as the Boltzmann noisily rational model [17]. Stemming from TOM, utility-driven models inherently provide causality, and are more interpretable since all the utilities and constraints are associated with explicit physical meanings, as mentioned in [18]. In order to infer the various utility functions of humans from actual driving data, inverse optimal control (or inverse reinforcement learning (IRL)) [19, 20, 21] has been widely adopted. For instance, [22] uses IRL to model courteous behavior, while [23] and [24] use it for probabilistic reaction prediction under social interactions. Benefiting from a causal and interpretable structure design, the utility-driven models are more data-efficient, i.e., a satisfactory model can be learned with a relatively small amount of data. Hence, they provide a promising balance between interpretability, model flexibility, and data-efficiency.

One remaining challenge for the utility-driven models is that, as mentioned above, most of them assume the rationality (or at least noisy rationality) of human drivers with respect to the expected utility theory (EUT). However, there is substantial evidence in various domains contradicting this assumption. Human behavior is often found to deviate systematically from the optimal (or rational) behavior predicted by EUT. Examples include the framing effect, risk-seeking behavior, loss-aversion behavior, and so on [25, 26]. In this paper, we define such behavior that is systematically biased with respect to EUT as “irrational behavior”. In driving scenarios, such irrational behavior can be readily observed, particularly when the drivers are interacting with each other. Under such circumstances, the traditional EUT-based utility-driven models can no longer correctly predict human behavior, which might cause collisions for autonomous vehicles.

Therefore, in this paper, we aim to extend the utility-based behavior generation model to capture both the rational and irrational behavior of human drivers. Towards this goal, we reformulate the utility-based models in the framework of the cumulative prospect theory (CPT) [26], a well-known non-expected utility theory (NEUT) that can explain many of the irrational behaviors mentioned above. Afterwards, a hierarchical learning algorithm is proposed to learn the utility function, the value function, and the decision weighting function in the developed CPT model. A case study for roundabout merging is presented with real data from the INTERACTION dataset [27, 28]. Prediction performances of three different models are compared: a predefined model based on TTC, a learning-based model based on a neural network, and the proposed CPT-based model. The results show that the proposed model outperforms the TTC model, and achieves performance similar to that of the learning-based method with much less training data and better interpretability.

II A Brief Introduction to Utility Theories

We briefly review two utility theories for modelling the decision-making process of humans: the expected utility theory and the cumulative prospect theory, a non-expected utility theory.

II-A Expected utility theory

The expected utility theory (EUT) [29] was first introduced by Bernoulli in 1738. It approximates decision makers as maximizers of their expected utilities. Mathematically, the process can be modelled as follows.

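Let $\mathcal{A}$ be a set of possible actions/choices. With each action $a \in \mathcal{A}$, define the possible state set as $X^a = \{x_1^a, \dots, x_n^a\}$ with $x_i^a \in \mathcal{X}$ for $i = 1, \dots, n$ and $n \in \mathbb{N}^+$. The probability of each state $x_i^a$ is represented by $p_i^a$, satisfying $\sum_{i=1}^{n} p_i^a = 1$. Define $u(x, a)$ as the function that assigns utility to each pair of state and action. Then, under each decision choice $a$, the possible outcome profile (i.e., the prospect) can be represented by $(u^a, p^a)$, where $u^a = [u(x_1^a, a), \dots, u(x_n^a, a)]$ is the utility vector defined on the possible state set, and $p^a = [p_1^a, \dots, p_n^a]$ is the corresponding probability vector of $X^a$.

The expected utility of each decision can then be written as

$$U(a) = \sum_{i=1}^{n} p_i^a \, u(x_i^a, a), \tag{1}$$

and decision makers choose the action that generates the maximum expected utility, i.e.,

$$a^* = \arg\max_{a \in \mathcal{A}} U(a). \tag{2}$$
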
Although the EUT has been adopted in many application domains as the dominant model to describe how individuals make decisions under uncertainty, there is substantial evidence showing that human behavior often violates the EUT hypothesis in systematic ways, such as loss aversion, risk seeking and nonlinear preferences [26, 30].
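The expected-utility rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical prospects are hypothetical and do not come from the driving data used in this paper.

```python
def expected_utility(utilities, probabilities):
    """Expected utility of one action: sum_i p_i * u_i."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in zip(probabilities, utilities))


def best_action(prospects):
    """Return the action whose prospect has the highest expected utility."""
    return max(prospects, key=lambda a: expected_utility(*prospects[a]))


# Hypothetical prospects: action -> (utility vector, probability vector).
prospects = {
    "pass": ([10.0, -20.0], [0.8, 0.2]),
    "yield": ([3.0], [1.0]),
}
chosen = best_action(prospects)
```

An EUT decision maker compares only the probability-weighted averages, so any systematic distortion of probabilities or outcomes is outside the scope of this rule.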

II-B Cumulative prospect theory, a non-expected utility theory

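Many non-expected utility theories (NEUT) were developed to explain the above-mentioned behaviors which deviate from EUT. Among them, the cumulative prospect theory (CPT), proposed by Kahneman and Tversky [26], is one that formulates many such biased or irrational human behaviors in a uniform way. Compared to the EUT in (1), CPT introduced two additional concepts in the definition of the prospect $(u^a, p^a)$: a value function $v$ defined on the utility and a decision weight function $\pi$ defined on the cumulative probability. With the outcomes indexed in increasing order of utility, each action is evaluated by the function

$$V(a) = \sum_{i=0}^{n} \pi_i^{+} \, v(u_i^{+}) + \sum_{i=-m}^{0} \pi_i^{-} \, v(u_i^{-}), \tag{3}$$

where the function $v$ is a strictly increasing function, and $u_i^{+}$ and $u_i^{-}$ represent, respectively, the gains and losses of $u_i$ compared to a reference utility $u_0$. The decision weights are defined as

$$\pi_n^{+} = w^{+}(p_n), \qquad \pi_i^{+} = w^{+}(p_i + \dots + p_n) - w^{+}(p_{i+1} + \dots + p_n), \quad 0 \le i \le n-1,$$

$$\pi_{-m}^{-} = w^{-}(p_{-m}), \qquad \pi_i^{-} = w^{-}(p_{-m} + \dots + p_i) - w^{-}(p_{-m} + \dots + p_{i-1}), \quad 1-m \le i \le 0,$$

where $w^{+}, w^{-}: [0, 1] \to [0, 1]$ are both strictly increasing functions with $w^{+}(0) = w^{-}(0) = 0$, and $w^{+}(1) = w^{-}(1) = 1$.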

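Typically, the value function $v$ is concave for gains ($u \ge u_0$) and convex for losses ($u < u_0$), and it is steeper for losses than for gains. Figure 1(a) shows one example of the value function when $u_0 = 0$ is set as the reference utility. Many experimental studies have shown that representative functional forms for $v$ and $w^{+}, w^{-}$ can be written as

$$v(u) = \begin{cases} u^{\alpha}, & u \ge 0, \\ -\lambda (-u)^{\beta}, & u < 0, \end{cases}$$

$$w^{+}(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1-p)^{\gamma}\right)^{1/\gamma}}, \qquad w^{-}(p) = \frac{p^{\delta}}{\left(p^{\delta} + (1-p)^{\delta}\right)^{1/\delta}},$$

respectively, with $\alpha, \beta, \gamma, \delta \in (0, 1]$ and $\lambda \ge 1$. As shown in Figure 1(b), such decision weight functions can describe the well-observed tendency of humans to overestimate the occurrence of low-probability events but underestimate that of high-probability ones.

Similarly to EUT, the CPT model assumes that decision makers choose the action that yields the maximum value defined in (3), i.e.,

$$a^* = \arg\max_{a \in \mathcal{A}} V(a).$$
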
Fig. 1: Examples of the value function and weighting function
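As a rough illustration of how the CPT value in (3) is evaluated, the sketch below uses the power value function and the inverse-S weighting function of [26]. The function names are illustrative, and the default parameter values are the median estimates reported in [26]; they are placeholders, not the values learned in this paper.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value of a gain (x >= 0) or loss (x < 0) relative to the reference."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta


def weight(p, g):
    """Inverse-S probability weighting function with curvature g."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p ** g / (p ** g + (1.0 - p) ** g) ** (1.0 / g)


def cpt_value(utilities, probs, gamma=0.61, delta=0.69, u0=0.0):
    """CPT value of a discrete prospect with reference utility u0."""
    outcomes = sorted(zip((u - u0 for u in utilities), probs))
    total = 0.0
    for k, (x, p) in enumerate(outcomes):
        if x >= 0:
            # Gains: weight the probability of "this outcome or better".
            tail = sum(q for _, q in outcomes[k:])
            pi = weight(tail, gamma) - weight(tail - p, gamma)
        else:
            # Losses: weight the probability of "this outcome or worse".
            head = sum(q for _, q in outcomes[:k + 1])
            pi = weight(head, delta) - weight(head - p, delta)
        total += pi * value(x)
    return total


low = weight(0.01, 0.61)   # larger than 0.01: rare events are overweighted
high = weight(0.99, 0.61)  # smaller than 0.99: likely events are underweighted
```

For a single certain outcome the decision weight is 1, so `cpt_value([4.0], [1.0])` reduces to `value(4.0)`.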

III Driving Behaviour Modelling

As discussed above, although EUT has been extensively utilized to model human driving behaviors, there is substantial evidence showing that actual human behaviors often systematically deviate from it. Motivated by this, we reformulate the decision-making process of human drivers in the framework of the CPT model, aiming to capture some irrational behaviors/decisions of human drivers under uncertainty in interactive driving scenarios.

III-A Modelling the decision-making process via CPT

We consider driving scenarios with two interacting drivers. Each driver has two discrete decisions/actions: yield and pass. Such scenarios can be found in many urban driving circumstances such as intersections, roundabouts and ramp merging.

Throughout the paper, we refer to the predicted vehicle as the target vehicle (denoted with subscript $T$), and the other one as the interacting vehicle (with subscript $I$). Denote the action set with pass and yield as $\mathcal{A} = \{a^{p}, a^{y}\}$. At time $t$, given the historical trajectories of both vehicles, $\xi^{h} = (\xi_T^{h}, \xi_I^{h})$, we aim to obtain an interpretable decision-making model to predict the decision of the target vehicle. Note that in interactive driving scenarios, the responses from the interacting vehicle are probabilistic in nature, which brings uncertainty to the decision-making process of the target vehicle. Under the decision $a^{p}$, the target vehicle has to consider the possibility of the interacting vehicle not yielding, which might force the target vehicle to brake and fail to pass. For the decision $a^{y}$, however, we can assume that it will always succeed. Hence, the prospects for $a^{p}$ and $a^{y}$ are, respectively,

$$\Big( \big[\, u(\xi_T^{p}, \xi_I^{y}),\; u(\xi_T^{p}, \xi_I^{n}) \,\big],\; \big[\, p,\; 1-p \,\big] \Big),$$

$$\Big( u(\xi_T^{y}, \cdot),\; 1 \Big),$$

where $p$ represents the probability of the interacting vehicle yielding to the target one, and $\xi_I^{y}$ and $\xi_I^{n}$ are, respectively, the yielding and non-yielding trajectories of the interacting vehicle. Similarly, $\xi_T^{p}$ and $\xi_T^{y}$ are, respectively, the passing and yielding trajectories of the target vehicle.

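Set the reference utility to $u_0 = 0$. Recalling the CPT model defined in (3)-(9), we can write the CPT values of the target vehicle under different decisions as:

$$V(a^{p}) = \pi^{y} \, v\big(u(\xi_T^{p}, \xi_I^{y})\big) + \pi^{n} \, v\big(u(\xi_T^{p}, \xi_I^{n})\big), \tag{13}$$

$$V(a^{y}) = v\big(u(\xi_T^{y}, \cdot)\big), \tag{14}$$

where $\pi^{y}$ and $\pi^{n}$ are the decision weights of Section II-B evaluated for the probabilities $p$ and $1-p$, and the single certain outcome of $a^{y}$ carries the weight $w(1) = 1$.

Note that in (13)-(14), with $u_0 = 0$, the gain or loss $u - u_0$ simplifies to $u$.

The decision of the target vehicle is then written as

$$a^* = \arg\max_{a \in \{a^{p}, a^{y}\}} V(a). \tag{15}$$
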
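To make the pass/yield comparison concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: all three utilities are non-negative gains relative to the reference, passing unobstructed is at least as good as being forced to brake, and the power value function and inverse-S weighting function of [26] are used. The names `decide`, `u_py`, `u_pn` and `u_y`, and all numbers, are hypothetical.

```python
def w(p, gamma):
    """Inverse-S probability weighting function."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def v(u, alpha):
    """Power value function for gains (u >= 0)."""
    return u ** alpha


def decide(u_py, u_pn, u_y, p, alpha, gamma):
    """Compare the CPT values of passing and yielding.

    u_py: utility of passing when the other car yields (probability p)
    u_pn: utility of passing when the other car does not yield
    u_y:  utility of yielding (certain)
    Assumes u_py >= u_pn >= 0 and u_y >= 0.
    """
    weight_yield = w(p, gamma)
    v_pass = weight_yield * v(u_py, alpha) + (1.0 - weight_yield) * v(u_pn, alpha)
    v_yield = v(u_y, alpha)
    return ("pass" if v_pass > v_yield else "yield"), v_pass, v_yield


# With alpha = gamma = 1 the model reduces to expected utility.
eut_choice, _, _ = decide(10.0, 0.0, 8.0, 0.9, alpha=1.0, gamma=1.0)
# With gamma = 0.61 the 90% chance is underweighted and the choice flips.
cpt_choice, _, _ = decide(10.0, 0.0, 8.0, 0.9, alpha=1.0, gamma=0.61)
```

In this toy case the expected utility of passing (9) exceeds that of yielding (8), yet the weighted value of passing drops to roughly 7.1, so the CPT decision maker yields.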
III-B Hierarchical learning of the model parameters

In the CPT-based decision-making model given in (13)-(14), we have many unknowns that need to be learned from data: the parameters $\alpha$ and $\gamma$, the utility function $u$, and the probability $p$ given $\xi^{h}$. We propose to learn them hierarchically based on the following two assumptions:

Assumption 1: The parametrization of the utility function of the target vehicle does not change with decisions. For instance, if we assume that $u$ is a linear combination of a set of features defined on trajectories, the weights of the features will not change.

Assumption 2: When the target vehicle is evaluating the CPT value under each decision, the best achievable utilities corresponding to different responses of the interacting vehicle will be adopted. Namely, in (13)-(14), we evaluate the utilities at $\max_{\xi_T^{p}} u(\xi_T^{p}, \xi_I^{y})$, $\max_{\xi_T^{p}} u(\xi_T^{p}, \xi_I^{n})$, and $\max_{\xi_T^{y}} u(\xi_T^{y}, \cdot)$.

III-B1 Learning the utility function

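We start with learning the utility function $u$ for the target vehicle. With Assumption 1, we learn $u$ from a set of decision-free trajectories of the target vehicle, so that influences of decisions on the demonstrated trajectories can be avoided. This transforms the learning of the utility function into a typical IRL problem. We assume that $u$ is a linear combination of a set of selected features $\mathbf{f}$ defined over $\xi$ with a horizon length of $N$:

$$u(\xi; \boldsymbol{\theta}) = \boldsymbol{\theta}^{\top} \mathbf{f}(\xi). \tag{16}$$

The goal is to find the weights $\boldsymbol{\theta}$ which maximize the likelihood of the demonstration set $\mathcal{D} = \{\xi^{(1)}, \dots, \xi^{(M)}\}$:

$$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}} \prod_{j=1}^{M} P(\xi^{(j)} \mid \boldsymbol{\theta}). \tag{17}$$

With the principle of maximum entropy [20], trajectories with higher utilities are exponentially more likely:

$$P(\xi \mid \boldsymbol{\theta}) \propto \exp\big(u(\xi; \boldsymbol{\theta})\big). \tag{18}$$

Thus (17) becomes

$$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta}} \sum_{j=1}^{M} \log \frac{\exp\big(u(\xi^{(j)}; \boldsymbol{\theta})\big)}{\int \exp\big(u(\tilde{\xi}; \boldsymbol{\theta})\big) \, d\tilde{\xi}}. \tag{19}$$

To solve (19), we use the continuous-domain IRL algorithm proposed in [21]; we refer the reader to [21] for details.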
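The maximum-entropy objective can be illustrated on a toy problem. The sketch below is not the continuous-domain algorithm of [21]: it assumes the trajectory space is replaced by a finite set of candidate trajectories, so the normalizing term becomes a plain sum, and it uses simple gradient ascent. All names and numbers are hypothetical.

```python
import math


def maxent_irl(demo_features, candidate_features, lr=0.1, iters=500):
    """Fit linear utility weights theta by maximum-entropy IRL.

    demo_features:      feature vectors of demonstrated trajectories
    candidate_features: feature vectors of a finite candidate set that
                        stands in for the whole trajectory space
    """
    dim = len(candidate_features[0])
    theta = [0.0] * dim
    n_demo = len(demo_features)
    mean_demo = [sum(f[d] for f in demo_features) / n_demo for d in range(dim)]
    for _ in range(iters):
        scores = [sum(t * x for t, x in zip(theta, f)) for f in candidate_features]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        z = sum(weights)
        expected = [
            sum(wi * f[d] for wi, f in zip(weights, candidate_features)) / z
            for d in range(dim)
        ]
        # Log-likelihood gradient: demonstrated minus expected features.
        theta = [t + lr * (m - e) for t, m, e in zip(theta, mean_demo, expected)]
    return theta


# One feature (e.g. a squared-acceleration cost); demonstrations avoid it.
theta_cost = maxent_irl([[0.0]], [[0.0], [1.0], [2.0]])
```

Because the demonstrations have a lower feature value than the candidate average, the learned weight is negative, i.e. the feature is treated as a cost.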

III-B2 Evaluating the utilities and probabilities

Once the utility function is obtained, we can generate the best achievable utilities under different decisions. Based on Assumption 2, utilities under different decisions are generated as follows:

  • $u(\xi_T^{p}, \xi_I^{y})$ describes the utility when the target vehicle passes and the interacting vehicle yields. It can be approximated by $u(\xi_T^{*})$ with $\xi_T^{*} = \arg\max_{\xi_T} u(\xi_T)$. Intuitively, this utility is equivalent to the best achievable utility as if the interacting vehicle were not there, since it would yield to the target vehicle.

  • $u(\xi_T^{p}, \xi_I^{n})$, on the other hand, describes the utility when the target vehicle passes but with a non-yielding interacting vehicle. In this situation, the target vehicle might have to brake and terminate the action of passing. Therefore, we set $\xi_I^{n}$ as if the interacting vehicle were maintaining its initial speed. $\xi_T^{p}$ is calculated as $[\xi_T^{*}(1{:}k), \xi_T^{\mathrm{brake}}]$, where the first part is the first $k$ steps of an optimal passing trajectory, while the second part is a braking trajectory that avoids collision with the interacting vehicle. The maximum value of $k$ is found via bounds on deceleration. Hence, the corresponding utility in this situation is given by $u([\xi_T^{*}(1{:}k), \xi_T^{\mathrm{brake}}], \xi_I^{n})$.

  • $u(\xi_T^{y}, \cdot)$ is the utility if the target vehicle chooses to yield. For this scenario, it does not matter whether the interacting vehicle yields or not. We can directly solve for the optimal trajectory of the target vehicle with additional constraints on its trajectories. For instance, we can set an upper bound on the achievable zones of its trajectories to force it to yield. (Footnote 1: The settings of the upper bound differ depending on the scenario. For intersections and roundabouts, the upper bound comes from the traffic-rule maps, such as the locations of stop bars. For ramp merging, the upper bound can be drawn from the trajectory of the interacting vehicle. We explain this in more detail in the case study.) Hence, $u(\xi_T^{y}, \cdot) = \max_{\xi_T} u(\xi_T)$ with constraints in the form of $\xi_T \le \xi_{\mathrm{ub}}$.

Apart from the utilities, we also need to find an objective probability variable that can quantify approximately how the interacting vehicle will respond if the target vehicle were to take the pass action, i.e., approximating $p$ in (13) given historical observations $\xi^{h}$. Inspired by the predefined models, we use the TTC to approximate it. Define the TTC gap between the two interacting vehicles as $\Delta_{\mathrm{TTC}} = \mathrm{TTC}_T - \mathrm{TTC}_I$. We assume that $p$ is higher if $\Delta_{\mathrm{TTC}}$ is lower:

$$p = \frac{1}{1 + \exp(\Delta_{\mathrm{TTC}})}. \tag{20}$$

III-B3 Learning the value function and decision weighting function

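With the acquired utilities and probabilities, the next step is to formulate a learning problem to find the unknown parameter $\alpha$ in the value function as well as $\gamma$ in the decision weighting function in (13)-(14). To achieve this goal, we again adopt the principle of maximum entropy to convert the decision selection process in (15) into a soft-max problem:

$$P(a^{p}) = \frac{\exp\big(V(a^{p})\big)}{\exp\big(V(a^{p})\big) + \exp\big(V(a^{y})\big)}, \tag{21}$$

$$P(a^{y}) = \frac{\exp\big(V(a^{y})\big)}{\exp\big(V(a^{p})\big) + \exp\big(V(a^{y})\big)}, \tag{22}$$

where $P(a^{p})$ and $P(a^{y})$ represent the probabilities of choosing action $a^{p}$ and $a^{y}$, respectively. Given a set of interactive trajectories with labelled decisions for the target vehicle, denoted by $\mathcal{D}_a = \{(\xi^{(j)}, a^{(j)})\}_{j=1}^{M}$, we can formulate the learning of $\alpha$ and $\gamma$ as a nonlinear logistic regression problem with the loss function

$$L(\alpha, \gamma) = -\sum_{j=1}^{M} \Big[ y_j \log P_j(a^{p}) + (1 - y_j) \log P_j(a^{y}) \Big],$$

where $y_j = 1$ if $a^{(j)} = a^{p}$ and $y_j = 0$ otherwise. $P_j(a^{p})$ and $P_j(a^{y})$ are the evaluated probabilities as in (21)-(22) on the $j$-th pair of interactive trajectories based on (13)-(14) and (20).

The optimal $\alpha$ and $\gamma$ can be found via

$$(\alpha^*, \gamma^*) = \arg\min_{\alpha, \gamma} L(\alpha, \gamma).$$
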
With the three steps described above, all the unknowns in the CPT model in (13)-(14) can be obtained.
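The last step can be sketched as a small logistic-regression fit. The sketch below makes several assumptions that are not from the paper: utilities are non-negative gains, the value function is a power function with exponent `alpha`, the weighting function is the inverse-S form of [26] with exponent `gamma`, and the minimization is done by an exhaustive grid search rather than a gradient-based solver. The data set is hypothetical.

```python
import math


def w(p, gamma):
    """Inverse-S probability weighting function."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def pass_probability(sample, alpha, gamma):
    """Soft-max probability of choosing 'pass' for one interaction sample."""
    u_py, u_pn, u_y, p = sample
    v_pass = w(p, gamma) * u_py ** alpha + (1.0 - w(p, gamma)) * u_pn ** alpha
    v_yield = u_y ** alpha
    # A two-action soft-max equals a sigmoid of the value difference.
    return 1.0 / (1.0 + math.exp(v_yield - v_pass))


def loss(data, alpha, gamma, eps=1e-12):
    """Cross-entropy loss over labelled samples (label 1 = pass)."""
    total = 0.0
    for sample, label in data:
        q = pass_probability(sample, alpha, gamma)
        total -= label * math.log(q + eps) + (1 - label) * math.log(1.0 - q + eps)
    return total


def fit(data, grid):
    """Return the (alpha, gamma) pair on the grid with the lowest loss."""
    return min(((a, g) for a in grid for g in grid),
               key=lambda ag: loss(data, ag[0], ag[1]))


# Hypothetical samples: ((u_py, u_pn, u_y, p), label).
data = [
    ((10.0, 0.0, 8.0, 0.9), 0),
    ((10.0, 0.0, 5.0, 0.9), 1),
    ((10.0, 2.0, 6.0, 0.5), 0),
    ((10.0, 0.0, 3.0, 0.2), 0),
]
grid = [0.4, 0.6, 0.8, 1.0]
best_alpha, best_gamma = fit(data, grid)
```

Setting `alpha = gamma = 1` recovers a soft-max over expected utilities, so comparing the fitted loss with `loss(data, 1.0, 1.0)` indicates how much the CPT distortions help on a given data set.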

IV Case Study

IV-A A driving scenario: roundabout

To evaluate the performance of the proposed approach, we select a roundabout merging scenario from the INTERACTION dataset [27, 28]. As shown in Figure 2(a), the roundabout has 6 branches and each branch has two directions (both in and out). We selected the interactive motions of two cars at the left-most branch (Figure 2(b)) because there are no enforced stop signs at this branch before merging into the roundabout. This makes the interaction more intensive, and consequently creates a more challenging prediction problem.

We define the merging-in vehicle (the red one in Figure 2(b)) as the target vehicle, and the one already in the roundabout as the interacting vehicle (the blue one in Figure 2(b)). Based on a period of historical data on both vehicles, different driving behavior models try to predict whether the red target vehicle will decide to merge in front of the blue interacting vehicle (i.e., the target vehicle passes), or wait to merge until the blue car passes (i.e., the target vehicle yields).

Fig. 2: The map of the roundabout and one example pair of interactive trajectories. Red stars: the target vehicle; Blue circles: the interacting vehicle.

We use the Frenet frame [2] to represent the trajectory coordinates of each vehicle. Reference paths of the map are shown in Figure 3(a). To capture the relationship between the two cars in the longitudinal direction, we set the crossing point of the reference paths of the two interacting cars as their shared reference point. Before the crossing point, the longitudinal coordinates of both cars are negative. Once past the crossing point, both longitudinal coordinates become positive. One example of the interactive trajectories in the defined Frenet frame can be found in Figure 3(b).

Fig. 3: The reference paths (a) and trajectories in Frenet frame (b). The crossing points on the pair of reference paths define the common reference zero points for two interactive cars.

IV-B Comparison models

We compared the decision prediction performance among three different models: 1) a predefined TTC rule-based model, 2) a learning-based neural network model, and 3) the proposed CPT model. A brief introduction of each model is given below.

IV-B1 The TTC rule-based model

The TTC rule-based model uses the TTC as an indicator to predict which car will go first between the two interacting cars. Given the trajectories of each car in the Frenet frame as shown in Figure 3(b), the TTC can be easily calculated via

$$\mathrm{TTC}(t) = \frac{d(t)}{\dot{s}(t)},$$

where $d(t)$ represents the longitudinal length from the current location of the car at time $t$ to the collision point along its reference path, and $\dot{s}(t)$ is the current speed of the car.

As discussed in (20) in Section III, we calculate the soft-max probability of the target car passing via

$$P(a^{p}) = \frac{\exp(-\mathrm{TTC}_T)}{\exp(-\mathrm{TTC}_T) + \exp(-\mathrm{TTC}_I)}.$$

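A minimal sketch of the TTC baseline follows. It assumes that the soft-max is taken over the negative TTCs of the two cars without any scaling factor, and it clamps very small speeds to avoid division by zero; both choices, as well as the function names, are assumptions made for illustration.

```python
import math


def ttc(distance_to_conflict, speed, min_speed=1e-3):
    """Time to reach the conflict point at the current speed (seconds).

    In the Frenet frame described above, the longitudinal coordinate is
    negative before the crossing point, so the distance is its negation.
    """
    return distance_to_conflict / max(speed, min_speed)


def pass_probability_ttc(ttc_target, ttc_interacting):
    """Soft-max over negative TTCs: the car that arrives first tends to go first."""
    gap = ttc_target - ttc_interacting
    return 1.0 / (1.0 + math.exp(gap))


# Target is 20 m from the crossing point at 10 m/s; the other car needs 4 s.
p_pass = pass_probability_ttc(ttc(20.0, 10.0), 4.0)
```

When both cars have the same TTC the model is indifferent and returns 0.5.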
IV-B2 The learning-based neural network model

The learning-based model we used is based on neural networks (NNs). The input is a period of historical trajectories of the two interacting vehicles in the Frenet frame. The first layer is a long short-term memory (LSTM) cell with 16 neurons, followed by two fully connected layers, each with 8 neurons. Afterwards, a nonlinear activation layer is applied, followed by a softmax layer as the final layer to output the classification results. In order to avoid over-fitting, we applied dropout to the fully connected layers with a dropout rate of 0.5 and added an L2 regularization term to the original cross-entropy loss function.

IV-B3 The proposed CPT model

In the proposed CPT model, we have selected four features in the utility function:

  • Speed feature;

  • Acceleration feature;

  • Jerk feature;

  • Safety feature.

Note that the speed, acceleration and jerk variables can all be written as linear functions of the trajectory of the target vehicle based on backward differentiation.
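The backward-difference relationship mentioned above can be sketched as follows. The function names and the sample trajectory are hypothetical; the point is only that speed, acceleration and jerk are obtained from sampled positions by repeated differencing, which is a linear operation on the trajectory.

```python
def backward_diff(seq, dt):
    """First-order backward difference of a uniformly sampled sequence."""
    return [(cur - prev) / dt for prev, cur in zip(seq, seq[1:])]


def kinematics(positions, dt):
    """Speed, acceleration and jerk from longitudinal positions."""
    speed = backward_diff(positions, dt)
    accel = backward_diff(speed, dt)
    jerk = backward_diff(accel, dt)
    return speed, accel, jerk


# Longitudinal positions sampled at 10 Hz under a constant 2 m/s^2 acceleration.
dt = 0.1
positions = [(k * dt) ** 2 for k in range(6)]
speed, accel, jerk = kinematics(positions, dt)
```

Each differencing step shortens the sequence by one sample, so a horizon of 6 positions yields 5 speeds, 4 accelerations and 3 jerk values.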

As for the calculation of the key utilities in (13)-(14), examples of the corresponding trajectories for the utility evaluation are shown in Figure 4. The ground truth interactive trajectories are shown in Figure 4(a) with red for the target vehicle and blue for the interacting vehicle. Figure 4(b) shows the optimal yielding trajectory of the target vehicle (cyan) and the ground truth trajectory of the interacting vehicle (blue). Figure 4(c) and Figure 4(d) show the trajectories of the target vehicle (cyan) under a passing decision with a non-yielding and a yielding interacting vehicle, respectively. If the interacting vehicle does not yield, we assume that it will maintain its initial speed, as shown in green in Figure 4(c). In this case, the target vehicle is forced to brake. On the other hand, with a yielding interacting vehicle, the optimal passing trajectory of the target vehicle is shown in Figure 4(d).

Fig. 4: An example of the trajectories used for utility calculation under different decisions and different responses of the interacting vehicle: (a) the ground truth trajectories (red: the target car; blue: the interacting car); (b) the optimal trajectory of the target car (cyan) under the decision of yielding, and the ground truth of the interacting car (blue); (c) the forced braking trajectory of the target car (cyan) under a passing decision but with a non-yielding interacting car (green). The virtual trajectory of the interacting car is assumed to maintain its initial speed; (d) the optimal trajectory of the target car (cyan) under a passing decision with a yielding interacting car.

IV-C Experiment results and discussion

We discuss the experimental results in two aspects: prediction performance comparison among the three models, and the interpretability of these models.

IV-C1 Comparison of the prediction performance

We trained and tested all three models on a dataset containing 67 pairs of interacting trajectories with a sampling frequency of 10Hz. To learn more generalized results, we slice the trajectories into frames with a fixed length using moving windows. Each frame contains the trajectories in 1s. Thus, all 67 pairs of interacting trajectories generate 2680 frames. To achieve better performance for the learning-based model, we have conducted two sets of experiments for the training of the neural network:
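The moving-window slicing can be sketched as below. At 10 Hz, a 1 s frame holds 10 samples; the stride of one sample and the trajectory length are assumptions made for illustration, since neither is stated here. Note that 2680 frames over 67 pairs corresponds to 40 frames per pair on average.

```python
def sliding_windows(trajectory, window=10, stride=1):
    """Slice a sampled trajectory into fixed-length overlapping frames."""
    last_start = len(trajectory) - window
    return [trajectory[i:i + window] for i in range(0, last_start + 1, stride)]


# A hypothetical 4.9 s trajectory sampled at 10 Hz (49 samples).
trajectory = list(range(49))
frames = sliding_windows(trajectory)
```

With these settings a 49-sample trajectory yields 40 frames, and consecutive frames share 9 of their 10 samples, which matters for how the data should be split.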

  • Experiment 1: randomly shuffle all the trajectory pairs and select 80% of them for training and the other 20% for testing. The success rate (defined as the percentage of correct predictions among all test examples) is 65% for testing.

  • Experiment 2: directly shuffle all frames for the neural network and randomly select 80% for training and 20% for testing. The success rate is 97% for testing.

The large discrepancy between the testing accuracies of the two experiments with the NN model is mainly due to over-fitting caused by data insufficiency. Experiment 1 shows that the NN model learned on 80% of the trajectory pairs does not generalize well to other interaction pairs.
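The difference between the two splitting schemes can be made explicit with a short sketch. The helper names and the synthetic frame list (67 pairs with 40 frames each) are hypothetical. Splitting by pair keeps every frame of an interaction on one side, whereas shuffling individual frames places frames from the same interaction in both the training and the test set.

```python
import random


def split_by_pair(frames, train_frac=0.8, seed=0):
    """Experiment 1 style: hold out whole trajectory pairs."""
    pair_ids = sorted({pid for pid, _ in frames})
    random.Random(seed).shuffle(pair_ids)
    train_ids = set(pair_ids[:int(train_frac * len(pair_ids))])
    train = [f for f in frames if f[0] in train_ids]
    test = [f for f in frames if f[0] not in train_ids]
    return train, test


def split_by_frame(frames, train_frac=0.8, seed=0):
    """Experiment 2 style: shuffle individual frames."""
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def shared_pairs(train, test):
    """Pair ids that appear on both sides of a split."""
    return {pid for pid, _ in train} & {pid for pid, _ in test}


# Synthetic data: (pair_id, frame_index) for 67 pairs with 40 frames each.
frames = [(pid, k) for pid in range(67) for k in range(40)]
pair_train, pair_test = split_by_pair(frames)
frame_train, frame_test = split_by_frame(frames)
```

With the frame-level split, the test frames overlap heavily with training frames from the same interaction, so a high test success rate in that setting says little about generalization to unseen interactions.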

We list the success rates for prediction from all three models in Table I. It shows that the proposed CPT model outperformed the TTC model and the NN model in experiment 1, and it achieved similar performance as the NN model in experiment 2. Moreover, both the TTC model and the proposed CPT model are more data-efficient for similar achievable performance.

Success rates
TABLE I: Comparison of the success rates in three models

IV-C2 Interpretability of the CPT model

In the CPT model, the parameters we have learned via the nonlinear logistic regression are


With the optimal $\gamma$, the learned decision weighting function is shown in Figure 5. We can see that the CPT model indeed captured the human choice pattern that events with low probabilities tend to be overestimated, while high-probability events are often underestimated. Such results are consistent with many studies of human behavior in other domains such as economics, investment and waiting paradox problems.

Fig. 5: The learned decision weighting function (red curve)

V Conclusion

In this paper, we proposed an interpretable and irrationality-aware human behavior model for interactive driving scenarios based on the cumulative prospect theory (CPT). To learn the model parameters from real driving data, a hierarchical learning algorithm was also developed, in which inverse reinforcement learning and nonlinear logistic regression were combined. Comparison studies were conducted among three different models: a predefined TTC model, a neural network (NN) based learning model, and the proposed CPT model. The results showed that the proposed CPT model outperformed the TTC model in terms of prediction accuracy. The CPT model achieved performance similar to that of the NN model, but with much less data. Moreover, the learned parameters of the CPT model have explicit and interpretable physical meanings, which match the observations of human behavior in many domains.


  • [1] N. AbuAli and H. Abou-zeid, “Driver Behavior Modeling: Developments and Future Directions,” International Journal of Vehicular Technology, 2016.
  • [2] W. Wang, J. Xi, and H. Chen, “Modeling and Recognizing Driver Behavior Based on Driving Data: A Survey,” Mathematical Problems in Engineering, 2014.
  • [3] M. Rahman, M. Chowdhury, Y. Xie, and Y. He, “Review of Microscopic Lane-Changing Models and Future Research Opportunities,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1942–1956, Dec. 2013.
  • [4] M. McDonald, J. Wu, and M. Brackstone, “Development of a Fuzzy Logic based Microscopic Motorway Simulation Model,” in Proceedings of Conference on Intelligent Transportation Systems.   Boston, MA, USA: IEEE, 1997, pp. 82–87.
  • [5] R. J. Kiefer, J. Salinger, and J. J. Ference, “Status of NHTSA’s Rear-End Crash Prevention Research Program,” June 2005.
  • [6] M. Treiber, A. Hennecke, and D. Helbing, “Congested Traffic States in Empirical Observations and Microscopic Simulations,” Physical Review E, vol. 62, no. 2, pp. 1805–1824, Aug. 2000.
  • [7] A. Kesting, M. Treiber, and D. Helbing, “General Lane-Changing Model MOBIL for Car-Following Models,” Transportation Research Record, vol. 1999, no. 1, pp. 86–94, Jan. 2007.
  • [8] G. S. Aoude, V. R. Desaraju, L. H. Stephens, and J. P. How, “Driver Behavior Classification at Intersections and Validation on Large Naturalistic Data Set,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 2, pp. 724–736, June 2012.
  • [9] Y. Hu, W. Zhan, and M. Tomizuka, “Probabilistic Prediction of Vehicle Semantic Intention and Motion,” in 2018 IEEE Intelligent Vehicles Symposium (IV), June 2018, pp. 307–313.
  • [10] J. Li, W. Zhan, and M. Tomizuka, “Generic Vehicle Tracking Framework Capable of Handling Occlusions Based on Modified Mixture Particle Filter,” in 2018 IEEE Intelligent Vehicles Symposium (IV), June 2018, pp. 936–942.
  • [11] J. Li, H. Ma, W. Zhan, and M. Tomizuka, “Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling,” in IEEE Intelligent Vehicles Symposium, 2019.
  • [12] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social gan: Socially acceptable trajectories with generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2255–2264.
  • [13] Y. Hu, W. Zhan, and M. Tomizuka, “A Framework for Probabilistic Generic Traffic Scene Prediction,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 2790–2796.
  • [14] Y. Ma, X. Zhu, S. Zhang, R. Yang, W. Wang, and D. Manocha, “Trafficpredict: Trajectory prediction for heterogeneous traffic-agents,” arXiv preprint arXiv:1811.02146, 2018.
  • [15] Y. Hu, W. Zhan, L. Sun, and M. Tomizuka, “Multi-modal probabilistic prediction of interactive behavior via an interpretable model,” in Proceedings of the IEEE Intelligent Vehicle Symposium (IV2019), 2019.
  • [16] D. Premack and G. Woodruff, “Does the chimpanzee have a theory of mind?” Behavioral and brain sciences, vol. 1, no. 4, pp. 515–526, 1978.
  • [17] C. L. Baker and J. B. Tenenbaum, “Modeling human plan recognition using bayesian theory of mind,” Plan, activity, and intent recognition: Theory and practice, pp. 177–204, 2014.
  • [18] K. Driggs-Campbell, V. Govindarajan, and R. Bajcsy, “Integrating Intuitive Driver Models in Autonomous Planning for Interactive Maneuvers,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 12, pp. 3461–3472, Dec. 2017.
  • [19] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the twenty-first international conference on Machine learning.   ACM, 2004, p. 1.
  • [20] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning.” in AAAI, vol. 8.   Chicago, IL, USA, 2008, pp. 1433–1438.
  • [21] S. Levine and V. Koltun, “Continuous inverse optimal control with locally optimal examples,” in the 29th International Conference on Machine Learning (ICML-12), 2012.
  • [22] L. Sun, W. Zhan, M. Tomizuka, and A. D. Dragan, “Courteous Autonomous Cars,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2018, pp. 663–670.
  • [23] L. Sun, W. Zhan, and M. Tomizuka, “Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 2111–2117.
  • [24] Y. Hu, L. Sun, and M. Tomizuka, “Generic prediction architecture considering both rational and irrational driving behavior,” in Proceedings of the IEEE Transportation System Conference (ITSC2019), 2019.
  • [25] D. Kahneman, “Prospect theory: An analysis of decisions under risk,” Econometrica, vol. 47, p. 278, 1979.
  • [26] A. Tversky and D. Kahneman, “Advances in prospect theory: Cumulative representation of uncertainty,” Journal of Risk and uncertainty, vol. 5, no. 4, pp. 297–323, 1992.
  • [27] W. Zhan, L. Sun, D. Wang, Y. Jin, and M. Tomizuka, “Constructing a Highly Interactive Vehicle Motion Dataset,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
  • [28] W. Zhan, L. Sun, D. Wang, H. Shi, A. Clausse, M. Naumann, J. Kümmerle, H. Königshof, C. Stiller, A. de La Fortelle, and M. Tomizuka, “INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Scenarios with Semantic Maps,” 2019.
  • [29] D. Bernoulli, “Exposition of a new theory on the measurement of risk,” Econometrica, vol. 22, no. 1, pp. 23–36, 1954.
  • [30] M. Allais, “Rational man’s behavior in the presence of risk: Critique of the postulates and axioms of the american school,” Econometrica, vol. 21, no. 4, pp. 503–46, 1953.