Modeling Multi-Vehicle Interaction Scenarios Using Gaussian Random Field

06/25/2019 ∙ by Yaohui Guo, et al. ∙ Carnegie Mellon University

Autonomous vehicles (AVs) are expected to navigate complex traffic scenarios with multiple surrounding vehicles. The correlations between road users vary over time, and their degree could, in theory, be infinitely large, which poses a great challenge for modeling and predicting the driving environment. In this research, we propose a method to reproduce such high-dimensional scenarios in a finitely tractable form by defining a stochastic vector field model of multi-vehicle interactions. We then apply non-parametric Bayesian learning to extract the underlying motion patterns from a large quantity of naturalistic traffic data. We use a Gaussian process to model multi-vehicle motion and a Dirichlet process to assign each observation to a specific scenario. We implement the proposed method on the NGSIM highway and intersection data sets, in which complex multi-vehicle interactions are prevalent. The results show that the proposed method is capable of capturing motion patterns from both settings without imposing heroic priors, and hence can be applied to a wide array of traffic situations. The proposed modeling can enable simulation platforms and other testing methods designed for AV evaluation to easily model and generate traffic scenarios emulating large-scale driving data.




I Introduction

The deployment of an autonomous vehicle (AV) on public roads requires that the AV be able to handle naturalistic driving scenarios, most of which involve multiple road users. At the same time, the public expectation that an AV should drive and merge seamlessly in complex traffic keeps growing [1]. Hence, modeling multi-vehicle interaction scenarios is essential.

Traditionally, due to the limitations of the training data sets, researchers and engineers in transportation have relied on strong assumptions to keep inference tractable. Researchers often assume a fixed number of vehicles within a prediction window [2], assume that all road users follow the same driving strategy [3], or simplify the system by simulating only one-to-one interactions [4]. These assumptions restrict the applicability of such models to the study of naturalistic driving systems. To relax them, a model must account for the interactions among vehicles without presupposing the number of vehicles involved, which is a challenging task.

To fulfil these modeling requirements, we combine a Gaussian Process (GP) with a Dirichlet Process (DP). GPs have proven effective in modeling trajectory patterns: [5] uses a Gaussian vector random field to model observed trajectories, and [6] constructs a Gaussian random field model on fully or partially observed trajectories to perform classification and prediction. However, GPs have rarely been used for modeling multi-agent interactions. In this study, we build a multi-vehicle interaction model by combining the effectiveness of the GP [7] in modeling high-dimensional primitive patterns with the versatility of the DP [8] as a nonparametric method for modeling a stochastic system from a Bayesian viewpoint. By combining these methods, we can model highly dynamic multi-vehicle interactions without pre-specifying the number of vehicles involved or the driving events considered, and without the heroic assumption of independence between vehicles in the system. The motion patterns learned by the proposed model are therefore fully data-driven, which allows us to characterize the observed traffic scenes and generate variants of the observed driving cases based on the learned motion patterns.

Many existing multi-vehicle modeling methods are focused on trajectory prediction. Some of these approaches employ neural network based methods while some others use stochastic models.

[9] uses an LSTM to model surrounding vehicles and predict their motion, with experiments limited to highway datasets. In [10], a deep convolutional neural network is used to output predicted trajectories with associated probabilities. [11] uses context-aware Markovian models to describe multi-agent behavior and dynamic Bayesian networks to perform the prediction.

[12] applies Gaussian processes to multi-modal maneuver recognition and trajectory prediction, using regression for likelihood calculations. In modeling motion patterns for trajectory prediction, [13] shows that DP-GP performs better than Markov-based models, but with experiments limited to a single vehicle. DP-GP modeling has also been used in the prediction of pedestrian trajectories [14].

While prediction is a popular outcome of modeling multi-agent motion using DP-GP, we study how DP-GP can also prove useful in capturing interaction scenarios from traffic data and in generating simulated trajectories that emulate the data.

This paper proceeds as follows. In Section II, the formulation of our proposed scheme is presented, followed by the overall framework in Section III. The experiment and results are described in Section IV, and the findings are discussed in Section V. Finally, the conclusion and future work are summarized in Section VI.

II Formulation

The multi-vehicle motion model is defined as an infinite Gaussian mixture [15] of interaction scenarios, with each mixture component defined by a Gaussian Process (GP):

$$p(s_i) = \sum_{k=1}^{\infty} \pi_k \, p(s_i \mid b_k), \tag{1}$$

where $s_i$ is an observation, $b_k$ is the $k$-th motion pattern, and $\pi_k$ are the respective mixing proportions, defined using a Dirichlet Process (DP) prior. The problem can therefore be divided into several parts: how to define the motion patterns $b_k$, how to determine the number of mixture components $K$, and how to infer the model parameters.

II-A Gaussian process motion patterns

II-A1 Modeling motion pattern with Gaussian process velocity vector field

We define a motion pattern $b_k$ as a GP that maps position coordinates to velocity, as shown in Fig. 1:

$$b_k : (x, y) \mapsto (v_x, v_y), \quad (x, y) \in \Omega,$$

where $\Omega \subset \mathbb{R}^2$ is the region of interest.

Fig. 1: Gaussian process mapping from position to velocity vector field in $\Omega$

The velocity field within a small region is expected to be consistent, a property that motivates the use of a GP. A GP is defined as a collection of random variables, any finite subset of which has a joint Gaussian distribution [7]. A GP motion pattern here models the velocities $v_x$ and $v_y$ as Gaussian random variables. Here, $v_x$ and $v_y$ are assumed to be independent for simplicity. For concise representation, we use an indicator $d \in \{x, y\}$ to avoid writing separate equations for the $x$ and $y$ directions; for example, the velocity at $(x, y)$ is written as $v_d(x, y)$ or simply $v_d$.

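A GP is fully specified by its mean function and covariance function:

$$m_d(x, y) = \mathbb{E}[v_d(x, y)], \qquad k_d(x, y, x', y') = \mathbb{E}\big[(v_d(x, y) - m_d(x, y))(v_d(x', y') - m_d(x', y'))\big],$$

and is written as

$$v_d(x, y) \sim \mathcal{GP}\big(m_d(x, y),\, k_d(x, y, x', y')\big), \tag{2}$$

where $(x, y)$ is any position coordinate in the region of interest $\Omega$. The covariance is defined using the squared exponential function as follows:

$$k_d(x, y, x', y') = \sigma_d^2 \exp\left(-\frac{(x - x')^2}{2 w_x^2} - \frac{(y - y')^2}{2 w_y^2}\right), \tag{3}$$

where $\sigma_d^2$ is the variance of the $d$-component speed, and $w_x$ and $w_y$ are the characteristic length-scale parameters, the inference of which is discussed in detail in Section III.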
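As a minimal sketch of the squared exponential covariance described above, the following function evaluates the kernel between two positions, assuming the standard anisotropic form with one length scale per axis. The function name `se_kernel` and the argument names `sigma2`, `w_x`, and `w_y` are illustrative choices, not names taken from the paper.

```python
import math


def se_kernel(p, q, sigma2, w_x, w_y):
    """Squared exponential covariance between positions p=(x, y) and q=(x', y').

    sigma2 is the variance of the velocity component; w_x and w_y are the
    characteristic length scales along x and y (illustrative argument names).
    """
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return sigma2 * math.exp(-dx * dx / (2.0 * w_x ** 2)
                             - dy * dy / (2.0 * w_y ** 2))


# Covariance shrinks as two positions move apart relative to the length scale.
near = se_kernel((0.0, 0.0), (1.0, 0.0), sigma2=1.0, w_x=5.0, w_y=5.0)
far = se_kernel((0.0, 0.0), (20.0, 0.0), sigma2=1.0, w_x=5.0, w_y=5.0)
```

Larger length scales make distant positions more strongly correlated, which is why their values matter for how smooth the learned velocity field looks.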

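The observed data is assumed to have additive, independent, identically distributed Gaussian noise with zero mean and variance $\sigma_n^2$. As such, the covariance function for the noisy observed velocities $v_d(x_p, y_p)$ and $v_d(x_q, y_q)$ becomes

$$\operatorname{cov}\big(v_d(x_p, y_p),\, v_d(x_q, y_q)\big) = k_d(x_p, y_p, x_q, y_q) + \sigma_n^2 \delta_{pq},$$

where $\delta_{pq}$ is the Kronecker delta function defined as

$$\delta_{pq} = \begin{cases} 1, & p = q, \\ 0, & p \neq q. \end{cases}$$

Each observation $s_i$, referred to as a frame, is a sample from time-series data and is a 2-dimensional representation of the region of interest $\Omega$ at a given time. Each frame is considered to have the position information and the corresponding velocity information of all the vehicles observed in $\Omega$ at that time. In vector form, the observed data is given by $s_i = (\mathbf{x}_i, \mathbf{y}_i, \mathbf{v}_{x,i}, \mathbf{v}_{y,i})$. Here $\mathbf{v}_{d,i}[j]$ is the observed velocity at $(\mathbf{x}_i[j], \mathbf{y}_i[j])$, where $[j]$ denotes the $j$-th element of a vector. Here, $\mathbf{x}_i$, $\mathbf{y}_i$, and $\mathbf{v}_{d,i}$ are also vectors of the observed data of all agents in frame $s_i$. Similarly, we write the testing data as $(\mathbf{x}^*, \mathbf{y}^*, \mathbf{v}_d^*)$, where $\mathbf{v}_d^*$ is unknown.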

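By the definition of a GP, the outputs have a joint Gaussian distribution given by

$$\begin{bmatrix} \mathbf{v}_d \\ \mathbf{v}_d^* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mathbf{m}_d \\ \mathbf{m}_d^* \end{bmatrix}, \begin{bmatrix} K + \sigma_n^2 I & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right),$$

where the mean vectors $\mathbf{m}_d$, $\mathbf{m}_d^*$ and the covariance blocks $K$, $K_*$, $K_{**}$ (the covariance function evaluated between training points, between training and testing points, and between testing points, respectively) describe the prior distribution of the velocity, and $\sigma_n^2$ is the observation noise variance. Conditioned on the observation $(\mathbf{x}, \mathbf{y}, \mathbf{v}_d)$, the posterior distribution of $\mathbf{v}_d^*$ is still Gaussian:

$$\mathbf{v}_d^* \mid \mathbf{x}, \mathbf{y}, \mathbf{v}_d, \mathbf{x}^*, \mathbf{y}^* \sim \mathcal{N}(\boldsymbol{\mu}_d^*, \Sigma_d^*), \tag{4}$$

$$\boldsymbol{\mu}_d^* = \mathbf{m}_d^* + K_*^\top (K + \sigma_n^2 I)^{-1} (\mathbf{v}_d - \mathbf{m}_d), \qquad \Sigma_d^* = K_{**} - K_*^\top (K + \sigma_n^2 I)^{-1} K_*.$$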
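The conditioning step can be sketched in a few lines of standard-library Python. This is an illustrative sketch of textbook GP regression for one velocity component, assuming a constant prior mean and a squared exponential kernel; the helper names (`se_kernel`, `solve`, `gp_posterior`) are hypothetical, and the default noise variance of 0.04 is the value reported in the experiment setup.

```python
import math


def se_kernel(p, q, sigma2=1.0, w_x=1.0, w_y=1.0):
    """Squared exponential covariance between two (x, y) positions."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return sigma2 * math.exp(-dx * dx / (2 * w_x ** 2) - dy * dy / (2 * w_y ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_posterior(train_pts, train_v, test_pts, prior_mean=0.0,
                 noise_var=0.04, **kern):
    """Posterior mean and variance of one velocity component per test point."""
    n = len(train_pts)
    # Noisy training covariance: kernel plus noise variance on the diagonal.
    K = [[se_kernel(train_pts[i], train_pts[j], **kern)
          + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, [v - prior_mean for v in train_v])
    means, variances = [], []
    for t in test_pts:
        k_star = [se_kernel(t, p, **kern) for p in train_pts]
        means.append(prior_mean + sum(a * k for a, k in zip(alpha, k_star)))
        beta = solve(K, k_star)
        variances.append(se_kernel(t, t, **kern)
                         - sum(b * k for b, k in zip(beta, k_star)))
    return means, variances


# One observed vehicle at the origin moving at 1.0; query near and far away.
means, variances = gp_posterior([(0.0, 0.0)], [1.0], [(0.0, 0.0), (50.0, 0.0)])
```

Near the observation the posterior mean follows the data and the variance collapses toward the noise level; far away the prediction reverts to the prior. A production implementation would use a Cholesky factorization from a linear algebra library rather than the hand-written solver shown here.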




II-A2 Multi-vehicle trajectory generation from motion patterns

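In order to calculate the likelihood of a GP motion pattern $b_k$ given a frame $s_i$, we need to specify how $b_k$ generates $s_i$. We model this procedure in three steps: drawing the number of agents $n_i$, the locations of the agents $(\mathbf{x}_i, \mathbf{y}_i)$, and the velocities $\mathbf{v}_{d,i}$, respectively.

Let $N_n$ be the number of observations with $n$ agents observed, so that $\sum_n N_n = N$ is the total number of observations, and assign the weight $N_n / N$ to $\delta_n$, where $\delta_n$ is the distribution concentrated at the single point $n$. The empirical distribution of the number of vehicles is as follows:

$$\hat{p}_{\mathrm{num}} = \sum_{n} \frac{N_n}{N}\, \delta_n.$$

To draw the location of each car independently from the empirical population of cars over the region of interest $\Omega$, we use a mixture of uniform distributions to fit the location distribution from the data. First, we discretize $\Omega$ into $M$ disjoint bins $B_1, \ldots, B_M$, such that $B_p \cap B_q = \emptyset$ for any $p \neq q$ and $\bigcup_{m} B_m = \Omega$; then we count the number of agents $c_m$ that appear in each bin $B_m$, and assign a weight $\omega_m = c_m / \sum_{m'} c_{m'}$ to each bin $B_m$. We have:

$$\hat{p}_{\mathrm{loc}}(x, y) = \sum_{m=1}^{M} \omega_m\, U_m(x, y),$$

where $U_m$ is a uniform distribution over bin $B_m$.

For sampling the velocity of each agent from $b_k$, similar to the notation in (2) in which we use the indicator $d \in \{x, y\}$, we have

$$\mathbf{v}_{d,i} \mid \mathbf{x}_i, \mathbf{y}_i, b_k \sim \mathcal{N}(\boldsymbol{\mu}_d, \Sigma_d), \tag{5}$$

where $\boldsymbol{\mu}_d$ and $\Sigma_d$ are the mean and covariance of the GP $b_k$ evaluated at the agent locations.

The likelihood of motion pattern $b_k$ given observation $s_i$ is therefore

$$p(s_i \mid b_k) = \hat{p}_{\mathrm{num}}(n_i) \prod_{j=1}^{n_i} \hat{p}_{\mathrm{loc}}(\mathbf{x}_i[j], \mathbf{y}_i[j]) \prod_{d \in \{x, y\}} p(\mathbf{v}_{d,i} \mid \mathbf{x}_i, \mathbf{y}_i, b_k). \tag{6}$$

For a given dataset, the empirical distributions $\hat{p}_{\mathrm{num}}$ and $\hat{p}_{\mathrm{loc}}$ are implicitly defined by the data. However, the discussed formulation enables the model to scale to data generation platforms where the distributions are expected to be explicitly defined.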
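The first two generation steps, drawing the number of agents and then an independent location for each agent, can be sketched as follows. This is an illustrative sketch under the assumption that the region of interest is a rectangle split into a regular grid of bins; the function names and the grid-edge arguments `x_edges` and `y_edges` are hypothetical, and each frame is represented simply as a list of `(x, y)` positions.

```python
import random
from collections import Counter


def find_bin(value, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing value."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    raise ValueError("value outside the region of interest")


def fit_count_distribution(frames):
    """Empirical distribution of the number of agents per frame."""
    counts = Counter(len(frame) for frame in frames)
    return {n: c / len(frames) for n, c in counts.items()}


def fit_location_bins(frames, x_edges, y_edges):
    """Bin weights of a mixture of uniform distributions over a grid."""
    hits = Counter()
    for frame in frames:
        for x, y in frame:
            hits[(find_bin(x, x_edges), find_bin(y, y_edges))] += 1
    total = sum(hits.values())
    return {b: c / total for b, c in hits.items()}


def sample_frame_layout(count_dist, bin_weights, x_edges, y_edges, rng):
    """Draw a number of agents, then an independent location for each one."""
    n = rng.choices(list(count_dist), weights=list(count_dist.values()))[0]
    bins = list(bin_weights)
    chosen = rng.choices(bins, weights=[bin_weights[b] for b in bins], k=n)
    # Within the chosen bin, the location is uniform.
    return [(rng.uniform(x_edges[i], x_edges[i + 1]),
             rng.uniform(y_edges[j], y_edges[j + 1])) for i, j in chosen]


# Toy data: two frames over a 2 x 1 region split into two bins along x.
frames = [[(0.5, 0.5)], [(0.5, 0.5), (1.5, 0.5)]]
x_edges, y_edges = [0.0, 1.0, 2.0], [0.0, 1.0]
count_dist = fit_count_distribution(frames)
bin_weights = fit_location_bins(frames, x_edges, y_edges)
layout = sample_frame_layout(count_dist, bin_weights, x_edges, y_edges,
                             random.Random(0))
```

The velocities for a sampled layout would then be drawn from the GP motion pattern at those locations, which is the third generation step.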

II-B Dirichlet Process Mixture of Motion Model

The proposed model considers the dataset as generated by an infinite mixture of motion patterns, as shown in (1). Since the total number of motion patterns is not known, we place a Dirichlet Process (DP) prior on the mixture weights. A DP is a distribution over distributions with infinitely many components. In our case, however, since the number of observations is finite, only finitely many components will be discovered from the data.

Fig. 2: Dirichlet process mixture of motion model

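An indicator variable $z_i$ is introduced, where $z_i = k$ means that the frame $s_i$ is associated with latent motion pattern $b_k$. Let $\theta_i$ denote the motion pattern that generates $s_i$. The predictive distribution of $\theta_i$ conditioned on the other motion patterns $\theta_{-i}$, where $\theta_{-i} = \{\theta_j : j \neq i\}$, is

$$\theta_i \mid \theta_{-i} \sim \frac{1}{N - 1 + \alpha} \sum_{j \neq i} \delta_{\theta_j} + \frac{\alpha}{N - 1 + \alpha}\, G_0,$$

where $\alpha$ is the concentration parameter, $G_0$ is the base measure, and $\delta_{\theta_j}$ is the point mass at $\theta_j$. Then the prior probability of $s_i$ belonging to an existing motion pattern or to an unseen motion pattern is given by

$$p(z_i = k \mid z_{-i}, \alpha) = \frac{n_k}{N - 1 + \alpha}, \qquad p(z_i = K + 1 \mid z_{-i}, \alpha) = \frac{\alpha}{N - 1 + \alpha}, \tag{7}$$

where $n_k$ is the number of other observations currently assigned to $b_k$ and $N$ is the total number of observations.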
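The prior over assignments follows the usual Chinese restaurant process form, which can be sketched as below. The function name `crp_prior` and the list-of-labels representation of the assignments are illustrative assumptions; the counts exclude the frame being reassigned, so the probabilities sum to one.

```python
def crp_prior(assignments, i, alpha):
    """Prior probabilities that frame i joins each existing pattern or a new one.

    assignments[j] is the pattern label of frame j; frame i is excluded from
    the counts. Returns (dict label -> probability, probability of a new pattern).
    """
    counts = {}
    for j, z in enumerate(assignments):
        if j != i:
            counts[z] = counts.get(z, 0) + 1
    denom = len(assignments) - 1 + alpha
    existing = {k: n_k / denom for k, n_k in counts.items()}
    return existing, alpha / denom


# Four frames, the last one being reassigned, with concentration alpha = 1.
existing, new = crp_prior([0, 0, 1, 0], i=3, alpha=1.0)
```

Patterns that already hold many frames are more likely a priori to attract another one, while a larger concentration parameter shifts mass toward opening a new pattern.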

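Combining the likelihood from (6) and the prior from (7), we have the posterior distribution of $z_i$ as

$$p(z_i = k \mid s_i, z_{-i}, \alpha) \propto \frac{n_k}{N - 1 + \alpha}\, p(s_i \mid b_k), \qquad p(z_i = K + 1 \mid s_i, z_{-i}, \alpha) \propto \frac{\alpha}{N - 1 + \alpha} \int p(s_i \mid b)\, \mathrm{d}G_0(b). \tag{8}$$

The integral computes the likelihood of observation $s_i$ averaged over all motion patterns $b$ under the base measure $G_0$.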
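Combining prior and likelihood is easiest to do in log space, because likelihoods of whole frames can be extremely small. The sketch below normalizes the product over all candidate patterns, with the unseen pattern included under the illustrative key `"new"`; the function name and the dictionary layout are assumptions made for this example.

```python
import math


def assignment_posterior(log_prior, log_lik):
    """Normalize prior x likelihood over candidate patterns in log space."""
    log_post = {k: log_prior[k] + log_lik[k] for k in log_prior}
    top = max(log_post.values())
    # Log-sum-exp: subtract the maximum before exponentiating for stability.
    norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {k: math.exp(v - norm) for k, v in log_post.items()}


log_prior = {0: math.log(0.5), 1: math.log(0.25), "new": math.log(0.25)}
log_lik = {0: math.log(0.1), 1: math.log(0.4), "new": math.log(0.2)}
posterior = assignment_posterior(log_prior, log_lik)
best = max(posterior, key=posterior.get)  # maximum a posteriori assignment
```

In this toy case the second pattern wins despite its smaller prior weight, because its likelihood for the frame is four times higher.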

Fig. 3: DP-GP mixture model simulation of traffic scenarios

III Framework

In order to find a posterior motion pattern mixture, we use Gibbs sampling to infer the parameters of the model. For every iteration of Gibbs sampling, the model parameters and the mixture assignment of frames into motion patterns are updated.

III-A Mixture model assignment

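The assignment $z_i$ of all frames $s_i$ from the dataset is performed according to (8).

The likelihood $p(s_i \mid b_k)$ defined in (6) for assigning frame $s_i$ to an existing pattern $b_k$ is computed using the GP posterior from (4), with the training data now given by $(\mathbf{x}_k, \mathbf{y}_k, \mathbf{v}_{d,k})$, which is the vector form of the data of the frames clustered under $b_k$, and the testing data given by the data of frame $s_i$:

$$p(\mathbf{v}_{d,i} \mid \mathbf{x}_i, \mathbf{y}_i, b_k) = \mathcal{N}(\mathbf{v}_{d,i};\, \boldsymbol{\mu}_d, \Sigma_d),$$

where $\boldsymbol{\mu}_d$ and $\Sigma_d$ hold the same definition as in (5). A maximum a posteriori estimation is then performed across all the motion patterns to identify the assignment:

$$z_i = \arg\max_{k}\; p(z_i = k \mid s_i, z_{-i}, \alpha).$$

For the assignment of frame $s_i$ to a new, unseen pattern $b_{K+1}$, Monte Carlo (MC) integration is used to approximate the likelihood integral $\int p(s_i \mid b)\, \mathrm{d}G_0(b)$. Each MC iteration samples a new motion pattern using the priors of the model parameters and computes the likelihood using the GP prior, given as

$$p(\mathbf{v}_{d,i} \mid \mathbf{x}_i, \mathbf{y}_i, b) = \mathcal{N}(\mathbf{v}_{d,i};\, m_d \mathbf{1},\, K + \sigma_n^2 I),$$

where $K$ is the covariance (3) evaluated at the agent locations of frame $s_i$, $m_d$ and $\sigma_d^2$ are set to the data mean and variance, respectively, and $(w_x, w_y)$ is sampled using the prior defined later in (9).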
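The Monte Carlo approximation of the new-pattern likelihood amounts to averaging the likelihood over parameter draws from the prior. The sketch below shows that averaging in log space; `sample_params` and `log_lik_given_params` are hypothetical callables standing in for the prior sampler and the GP prior likelihood, and the toy likelihood in the usage lines is purely illustrative.

```python
import math
import random


def log_mean_exp(values):
    """Stable log of the arithmetic mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def mc_log_marginal(log_lik_given_params, sample_params, n_samples, rng):
    """Monte Carlo estimate of the log of the integrated likelihood.

    sample_params(rng) draws pattern parameters from their prior and
    log_lik_given_params(params) returns the log-likelihood of the frame
    under those parameters; both are supplied by the caller.
    """
    draws = [log_lik_given_params(sample_params(rng)) for _ in range(n_samples)]
    return log_mean_exp(draws)


# Toy usage: a length scale drawn from a Gamma(shape=2, scale=1) prior and
# an illustrative stand-in log-likelihood that peaks at w = 2.
rng = random.Random(0)
estimate = mc_log_marginal(lambda w: -0.5 * (w - 2.0) ** 2,
                           lambda r: r.gammavariate(2.0, 1.0), 500, rng)
```

The estimate improves with more samples, at the cost of one likelihood evaluation per draw, so the number of MC iterations trades accuracy against run time.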

III-B Model parameters

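The length-scale parameters $w_x$ and $w_y$ from the squared exponential covariance in (3) are given a vague gamma prior:

$$w_x, w_y \sim \mathrm{Gamma}(a, \lambda), \tag{9}$$

where the shape factor $a$ and the scale factor $\lambda$ are constants. The posterior of $(w_x, w_y)$ uses the likelihood given by the GP prior of the frame data $(\mathbf{x}_k, \mathbf{y}_k, \mathbf{v}_{d,k})$ assigned to motion pattern $b_k$. The parameters are therefore updated by re-sampling from the posterior given by

$$p(w_x, w_y \mid \mathbf{x}_k, \mathbf{y}_k, \mathbf{v}_{d,k}) \propto p(w_x, w_y)\, p(\mathbf{v}_{d,k} \mid \mathbf{x}_k, \mathbf{y}_k, w_x, w_y).$$

For the concentration parameter $\alpha$, similar to [15], an inverse gamma prior is chosen, and $\alpha$ is updated by re-sampling from its posterior distribution.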
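The text does not state which sampler is used for the re-sampling step. One common choice for a positive parameter with a non-conjugate posterior is a Metropolis-Hastings update with a log-normal random-walk proposal, sketched below as an assumption rather than as the authors' method. The names `log_gamma_prior` and `mh_step` are hypothetical, and `log_post` stands for the log of prior times likelihood.

```python
import math
import random


def log_gamma_prior(w, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at w > 0."""
    return ((shape - 1.0) * math.log(w) - w / scale
            - math.lgamma(shape) - shape * math.log(scale))


def mh_step(w, log_post, rng, step=0.5):
    """One Metropolis-Hastings update with a log-normal random-walk proposal."""
    proposal = w * math.exp(step * rng.gauss(0.0, 1.0))
    # The log-normal proposal is asymmetric; the Hastings correction
    # q(w | w') / q(w' | w) equals w' / w.
    log_accept = log_post(proposal) - log_post(w) + math.log(proposal / w)
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return w


# Sanity check with the prior alone as the target: the chain average should
# approach the Gamma(2, 1) mean of 2.
rng = random.Random(1)
w, total, n_iter = 1.0, 0.0, 20000
for _ in range(n_iter):
    w = mh_step(w, lambda x: log_gamma_prior(x, 2.0, 1.0), rng)
    total += w
chain_mean = total / n_iter
```

In the actual model, `log_post` would add the GP log-likelihood of the frames assigned to the pattern to the log prior, and each length scale would get its own update.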

The inference algorithm is summarized in Algorithm 1.


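for Gibbs sampling iterations do

       Update Mixture Assignment (z_i):
             for frames i = 1,2,…,N do
                   for motion patterns k = 1,2,…,K do
                         compute the posterior of z_i = k according to (8)
                   end for
                   compute the posterior of a new pattern K+1 by MC integration
                   assign z_i by maximum a posteriori estimation
             end for
       Update Model Parameters (w_x, w_y, α):
             for motion patterns k = 1,2,…,K do
                   re-sample w_x, w_y from their posterior
             end for
             re-sample α from its posterior
end for
Algorithm 1 Inference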

After the Gibbs sampling iterations, the posterior mixture model is used to generate simulated trajectories as shown in Fig 3. Given a test frame, a motion pattern assignment is performed using this mixture which is used to generate the GP posterior mean velocity field. The velocity field defines the multi-vehicle trajectory simulation on the test frame.
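How a velocity field drives a multi-vehicle simulation can be sketched with a simple forward Euler integrator. The integration scheme is an assumption, since the text does not specify one; the function name `simulate_trajectories` is hypothetical, and `velocity_field` stands for any function returning the mean velocity at a position, such as a GP posterior mean.

```python
def simulate_trajectories(positions, velocity_field, dt=0.5, n_steps=10):
    """Advance each vehicle through a velocity field with forward Euler steps.

    velocity_field(x, y) returns (v_x, v_y). Returns one path per vehicle,
    each a list of (x, y) positions starting at the initial position.
    """
    paths = [[p] for p in positions]
    for _ in range(n_steps):
        for path in paths:
            x, y = path[-1]
            vx, vy = velocity_field(x, y)
            path.append((x + vx * dt, y + vy * dt))
    return paths


# Toy field: uniform flow of 2 units per second along x.
paths = simulate_trajectories([(0.0, 0.0), (0.0, 3.5)],
                              lambda x, y: (2.0, 0.0), dt=0.5, n_steps=4)
```

Because every vehicle reads its velocity from the same field, vehicles that start in the same lane follow consistent paths, which is what makes the field a compact description of an interaction scenario.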

IV Experiment and Results

IV-A Experiment Setup

To evaluate the proposed motion model, we use a real-world traffic dataset collected as part of the Federal Highway Administration's (FHWA) Next Generation SIMulation (NGSIM) project [16, 17], which provides detailed multi-vehicle trajectory data as a time-series sequence. The velocity information, as $v_x$ and $v_y$ components, is derived from this trajectory data. The model is evaluated on two traffic settings: a highway dataset collected on a segment of US Highway 101 (Hollywood Freeway) in Los Angeles, and an intersection dataset collected on Lankershim Boulevard at Universal Hollywood Dr. in Los Angeles.
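The text does not say how the velocity components are derived from the trajectories. One simple possibility, shown here purely as an assumption, is a forward difference between consecutive position samples of the same vehicle; the function name `velocities_from_track` is hypothetical.

```python
def velocities_from_track(track, dt):
    """Forward-difference velocity estimates for one vehicle.

    track is a list of (x, y) positions sampled every dt seconds.
    Returns one (v_x, v_y) pair per consecutive pair of samples.
    """
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


# Three samples of one vehicle at a 0.5 s spacing.
velocities = velocities_from_track([(0.0, 0.0), (1.0, 2.0), (3.0, 2.0)], dt=0.5)
```

Forward differences amplify position noise, so a real pipeline might smooth the trajectories first or use central differences instead.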

The inference algorithm is run for 100 Gibbs sampling iterations and is executed using parallel computing on a 44-core computer processor. The parameters and are chosen for gamma prior of the length scale parameters. The variance for the additive Gaussian noise is set to 0.04.

IV-B Highway Traffic Scenarios

The highway dataset is down-sampled to 1000 frames of time-sequence data with discretization of 0.5s. The mixture model resulted in 99 motion patterns being extracted from the data. The mixture proportion in a decreasing order is presented in Fig 4.

Fig. 4: DP-GP mixture proportion from: (a) Highway dataset (b) Intersection dataset

To generate the simulated multi-vehicle trajectories from the motion patterns obtained at the end of the Gibbs sampling iterations, a test frame is randomly chosen from the dataset. A motion pattern from the mixture is then assigned to the frame according to the assignment procedure discussed in Section III. The derived mean GP velocity field is imposed on the vehicle distribution present in the region of interest to run the simulation for that interaction scenario, and the result is presented in Fig. 5. To illustrate the clustering, the original observations assigned to the same motion pattern are also included in the figure (with the vehicle velocity vectors shown).

Fig. 5: Highway dataset: (a) Mean GP velocity field of motion pattern 1 (b) Motion pattern 1 based multi-vehicle trajectory simulation of test frame (c) Frame 151: observation clustered under motion pattern 1 (d) Frame 958: observation clustered under motion pattern 1

IV-C Intersection Traffic Scenarios

The results seen in the case of highway dataset are reproduced for the intersection dataset. The dataset is down-sampled to 600 frames of time-sequence data with discretization of 0.5s. The posterior mixture model consists of 86 motion patterns whose mixture proportion in a decreasing order is presented in Fig 4.

A simulated trajectory with the motion pattern vector field and the data observations are presented in Fig 6 similar to the results seen from the highway dataset.

Fig. 6: Intersection dataset: (a) Intersection image (b) Mean GP velocity field of motion pattern 1 (c) Motion pattern 1 based multi-vehicle trajectory simulation of test frame (d) Frame 45: observation clustered under motion pattern 1 (e) Frame 399: observation clustered under motion pattern 1
Fig. 7: Intersection dataset: Mean GP velocity field of (a) motion pattern 5 (b) motion pattern 19 (c) motion pattern 64 (d) motion pattern 85

V Discussion

V-A Result analysis

The results from both the datasets demonstrate that the proposed model when applied to large time-series data sequences, extracts the underlying motion patterns which can be used to represent the interaction scenarios.

It can be noted for both datasets that the observation (training) frames clustered under the presented motion patterns are far apart in the data sequence. In the highway dataset, the two observation frames shown in Fig. 5 (frames 151 and 958) are over 800 frames, i.e., over 400 s, apart. The frame indexes from the intersection results in Fig. 6 show the same effect.

The motion pattern from the highway dataset presented in Fig. 5 indicates an interaction scenario with vehicles in some parts of the region of interest (especially the rightmost lanes) moving faster than the others. This can be noted from the two observation frames in Fig. 5, and the same is reflected in the simulated trajectory.

For the intersection-based region of interest, the GP mean velocity field of the generated motion pattern and the corresponding simulated trajectory presented in Fig. 6 indicate a motion scenario in which some of the vehicles oncoming from one direction take a left turn at the intersection while others continue straight, with vehicles from the other direction standing still. The model generated this scene on the test frame after learning the motion pattern from the data. The GP velocity field and the simulated trajectory show that the model learns the physical road layout from the data: the posterior vector field is almost non-existent outside the road boundary. The model is also seen to learn lane information, such as which lanes correspond to a left turn, without having any explicit information about the road layout.

While the highway dataset has been clustered into motion patterns as expected, it offers little insight into the semantic visualization of the results, because the vehicle interactions are limited to motion in only one direction. In that sense, the intersection results offer better diversity, since the interactions involve vehicle motion in multiple directions. To illustrate this, further motion patterns generated from the intersection dataset are presented in Fig. 7. Two of the patterns present interaction scenarios in which vehicles from both sides travel straight, a third almost exclusively captures the left-turning motion of the vehicles from one direction, and the fourth presents a more complex scenario with vehicle motion flowing in many directions.

In the highway mixture proportion presented in Fig. 4, the tail end of the plot indicates a few motion patterns generated from only one frame, i.e., each of those frames is clustered as a single motion pattern. In the intersection results in Fig. 4, there are fewer such patterns in proportion to the total number of patterns. This could be due to the higher speeds of the vehicles on the highway, which strongly affect the correlation between sequential frames because the vehicle distribution in the region of interest changes quickly. In the case of the intersection, the relatively lower speeds make sequential frames more likely to be clustered under the same motion pattern, possibly in addition to other frames elsewhere in the time sequence.

V-B Limitations

The primary limitation of this work lies in the inference of the mixture model using Gibbs sampling. A termination criterion is not explicitly available, especially because of the unsupervised nature of the problem, which makes it difficult to choose a suitable number of iterations. An evaluation of the resultant mixture assignment could be defined to determine convergence.

Furthermore, in a dataset involving a more complex intersection, modeling all the multi-vehicle interactions with a single Gaussian process might overly marginalize the true velocity information of the data. Future research shall employ multiple Gaussian processes possibly conditional on vehicle direction of motion and other information to model each interaction scenario. Also, the true road layout information is not included in the model. Although the results show that the model has learnt these boundaries from the data, the generated simulation trajectories can be treated with higher confidence if the road boundary and other traffic rules based information are embedded into the model wherever available.

VI Conclusion and Future Work

In this work, we formulate a model for multi-vehicle interaction scenarios using Gaussian processes, a mixture of which is learned from naturalistic data by non-parametric Bayesian learning. By employing a Dirichlet Process prior over the mixture of motion models, we remove the restriction on the number of motion patterns existing in the dataset, allowing the model to be fully data-driven. The experimental results on the NGSIM datasets demonstrate that the extracted motion patterns capture the highly dynamic multi-vehicle interactions in highway and intersection scenes. This result allows modelers to extract multi-vehicle interaction scenarios efficiently from large-scale data, which can further be used for simulating complex traffic scenes, predicting the trajectories of vehicles in multi-vehicle systems, and evaluating the safety of AVs when interacting with human-driven vehicles in complex driving situations.


Toyota Research Institute (“TRI”) provided funds to assist the authors with their research but this article solely reflects the opinions and conclusions of its authors and not TRI or any other Toyota entity.