Future urban transportation system concepts include a mixture of both autonomous and human-driven vehicles. We anticipate driverless cars on roads as well as delivery drones [choudhury2020efficient] and urban air mobility (UAM) systems with vertical take-off and landing capabilities [holden2016]. These advances add complexity to our transportation systems (Figure 1a), making decision making for autonomous vehicles operating in such dynamic systems even more challenging.
Safe and efficient transportation in a congested environment requires accurately modeling the flow of traffic in 3D space at both the macroscopic and microscopic levels. At the macroscopic level, the velocity maps estimate the average behavior of vehicles at different locations in the system, as opposed to estimating the current behavior of surrounding targets (Figure 1b). We may have large amounts of data to build such models, and we want to be able to build them efficiently. At the microscopic level, we want to model the surroundings of an individual vehicle from a continuous stream of data to allow it to maneuver safely (Figure 1c). These velocity maps estimate the behavior of vehicles surrounding ego vehicle at the current point in time. We often only have to learn models from very little data.
To provide robust control of these vehicles, it is important to capture the uncertainty associated with our velocity maps. For every point in the 3D space, for example, it would be helpful to quantify both the mean and variance of the various velocity components. While Bayesian nonparametric methods such as Gaussian process-based models[rasmussen2003gaussian] can provide uncertainty estimates, they (as well as many of their approximations [senanayake2017learning]) are generally not suitable for building large-scale macroscopic models because their computational complexity grows too quickly with the number of data points [rasmussen2003gaussian, o2012gaussian].
This paper applies an approach that has many of the desirable attributes of a Gaussian process-based model, but it is faster and more memory efficient. We adopt an approach that involves projecting the spatial coordinates into a high-dimensional, yet interpretable, feature space to capture information local to a given area. Bayesian linear regression is used to learn the parameters of the model. We can then query arbitrary points in the 3D space to obtain smooth estimates of the mean and variance of the velocity components. The model can be efficiently updated online, making it amenable to changes in the environment. A theoretical analysis shows that this model is robust against input noise. We demonstrate our approach on simulated and real-world datasets.
Ii Related Work
Many robotics applications involve building maps of the environment. Occupancy maps is the most common representation. Occupancy maps alone can only be used to navigate through an environment when surrounding obstacles are stationary. However, in real urban environments an autonomous vehicle must be able to safely navigate around both stationary obstacles and moving vehicles. Developing velocity maps are therefore crucial for planning algorithms to plan safe trajectories for autonomous vehicles operating in urban environments.
Techniques such as occupancy grid maps [elfes1989using]
discretize the environment and model if a cell is free or occupied. Such models assume the cells are independent, ignoring neighborhood information. While such methods can model the binary occupancy probability, they do not model epistemic uncertainty. To address these limitations, several 2D Bayesian models have been proposed[o2012gaussian, senanayake2017bayesian, mcleod2019navigating, duong2020autonomous]. These models can be queried at an arbitrary resolution at run time. There have also been various attempts to model occupancy in dynamic environments [senanayake2016spatio, itkina2019dynamic, Toyungyernsub2020arxiv]
. However, such models do not explicitly represent the velocity as a map. We model the uncertainty of velocity in the 3D space which is computationally challenging compared to conventional 2D occupancy mapping techniques. Although there are similarities with the model we explore in this paper, these models use a binary random variable to model occupancy[senanayake2018building]
. In this work, we are interested in modeling the velocity which is not a binary variable.
Velocity modeling has previously been studied in various disciplines [lawrance2011path, homicz2002three]. However, these models are deterministic. Other models that attempt to estimate quantities that change temporally include modeling the long-term occupancy [senanayake2017learning] and directions [Senanayake2020itsc] in 2D. In contrast, the objective of this paper is modeling velocity with their associated epistemic uncertainties in 3D space.
Iii Bayesian Dynamic Fields
This section introduces the proposed framework for modeling 3D velocity maps. Since we want to model the spatial field of dynamics such as velocity with its associated uncertainty, we refer to this framework as Bayesian Dynamic Fields (BDF). First, we explain how to generate high dimensional features from data using kernel functions. Then, we discuss how to build a linear model from these features and estimate uncertainty with Bayesian linear regression with these basis functions. These basis functions can be customized for different datasets. This model is applied to build macroscopic and microscopic dynamic models. Finally, a theoretical analysis of the robustness of the proposed framework is presented.
Iii-a High dimensional feature space
Our objective is to model the velocity field and its corresponding uncertainty field in a given 3D space. Velocity variations exhibit nonlinear patterns with respect to spatial location. A common tool for modeling nonlinear patterns is deep neural networks, but they generally require large amounts of training data and can be slow to train. We focus on kernel methods[kivinen2002online], which have been successfully used to model spatial quantities such as occupancy [senanayake2017bayesian, vallicrosa2018h, ramos2016hilbert] and directions [zhi2019spatiotemporal] in robotics, soil concentration in geostatistics [cressie2015statistics], and disease propagation in epidemiology [senanayake2016predicting].
Kernels are similarity functions. A kernel, , takes two inputs and and outputs a measure on how similar the two inputs are. In this work, we use the squared-exponential kernel because of its simplicity and interpretability:
is the inverse bandwidth hyperparameter. This hyperparameter controls the sensitivity of the similarity. Kernels with smaller values ofcapture correlations over larger areas.
We use this kernel to define a set of basis functions, , where is any point in 3D space and are fixed points in that space. We arrange the fixed points in a 3D regular grid. Although kernels can alternatively be computed through random Fourier features [mildenhall2020nerf, ramos2016hilbert] or Nystrom approximation, we use a grid for simplicity, interpretability, and accuracy [ramos2016hilbert]. If desired, these fixed points can be learned alongside other parameters [senanayake2018automorphing].
A data point may come, for example, from radar (Figure 2a–b) or IMU measurements. For such data points, , the feature matrix is defined as,
This matrix would be mostly sparse with many values close to zero because the kernel function goes to zero when its two inputs are far apart.
Unlike LIDAR, automotive radars provides the velocity associated with each point in the point cloud. We want to estimate the uncertainty of velocity at the arbitrary point indicated by the three colored arrows. (a) The camera image of an area an automotive radar can see. (b) The velocity vectors from pre-processed 3D radar measurements[meyer2019automotive] are in black. (c) The estimates from our model are not deterministic vectors but distributions of velocities for each direction. For simplicity, we only show the red and blue directions. A few resulting vectors are shown in black.
Iii-B Bayesian inference
We want to build a model that can estimate the velocity of a given point in the environment. For instance, as shown in Figure 2, given some sparse velocity measurements, we want to know the velocity at an arbitrary location indicated by the three colored arrows. Because the three directional velocity components , , and are independent with each other, three different models are learned in parallel.
If the velocity component labels of each datapoint is denoted by , then the training dataset can be defined as . Since we projected the data into -dimensional space, we can now create a linear model with noise where is the noise precision. Our objective is to learn the parameter vector from
. The velocities at one of the three directions in a given location are modeled as a Gaussian distribution. Because measurements arei.i.d., the likelihood can be decomposed as .
As illustrated in Figure 2
c, we are not only interested in estimating the velocity but also the associated uncertainty. In order to model the epistemic uncertainty, we consider a prior probability distribution over. A Gaussian distribution over
is a conjugate prior to the likelihood model of our interest. With this prior, the posterior distributionfrom Bayesian linear regression in the feature space can be computed analytically [bishop2006pattern]:
Given that our prior knowledge about an area is minimal, we can set and , where is a small parameter indicating the precision (i.e. the inverse of variance) of the prior. This indicates an almost uninformative prior. Furthermore, this prior acts as a natural regularizer for the high-dimensional regression problem.
Iii-C Dimension-adjusted kernels
When defining the kernel in (1), we considered that the norm between a data point and a fixed point is scaled by the hyperparameter . However, if the nonlinear patterns change in different rates in different axes of the 3D space, we can scale them differently to better fit the model:
is the hyperparameter matrix. Although it is possible to consider the full non-diagonal matrix by considering the hyperparameter-hyperparameter covariances, in this application, we ignore such complex interactions for the sake of simplicity. It is also possible to learn the parameters [senanayake2018automorphing] or learn a completely new kernel function (1) [gonen2011multiple].
Iii-D Macroscopic and microscopic velocity maps
As shown in Figure 1b, a macroscopic model represents the velocity of a large area of an urban environment. These models will especially be useful when designing urban air mobility systems [thipphavong2018urban]. In order to build a global model, trajectory information of vehicles in the environment are collected for a long time period. This information can be obtained from surveillance radar [Jung2019] or IMU data. The velocity is represented as and we perform inference separately for each of the different dimensions. For each of these learning problems, we place a regular grid over the entire space for fixed points.
In macroscopic mapping, we have to learn large environments with lots of data. In such environments, when data arrives sequentially, it is possible to use the posterior distribution from the previous time step as the prior distribution in the current time step before applying the update rules in (3)–(4). Furthermore, when data is obtained sequentially, we can start with the assumption that the environment is static and populate the 3D environment (training dataset) with quasi-Monte Carlo (QMC) samples of velocity zero to improve the learning efficiency. QMC sampling techniques are known to be sample efficient over Monte Carlo techniques [lemieux2009monte]. In particular, Sobol and generalized Halton QMC sequences can populate the 3D free space more evenly [lemieux2009monte, tompkins2019black]. Trajectory data points that are in the neighborhood measured by the Euclidean distance of the populated data are removed from the dataset.
For the microscopic model, we are interested only in modeling the field of view of the ego vehicle (Figures 1c and 2). For such models, velocity information coming from automotive radar is used. Since automotive radar point clouds, unlike LIDAR measurements, are extremely sparse, modeling the epistemic uncertainty is important so that we know which of our velocity estimates are less reliable. Furthermore, automotive radar measurements tend to have higher noise levels and therefore, modeling the aleatoric uncertainty is equally important. Equation (6) is the combined aleatoric-epistemic uncertainty estimation.
Iii-E Wasserstein robustness against input noise
Sensor measurements, especially radar measurements, are typically corrupted by some noise. In this section, we theoretically analyze whether the proposed model can withstand input perturbations [kuhn2019wasserstein].
The squared 2-Wasserstein distance between a normal distribution
The squared 2-Wasserstein distance between a normal distributionand a point is .
By computing between and , it can be shown that [kuhn2019wasserstein]. For diagonal matrices and , by reducing the Gaussian to a Dirac delta distribution (by taking the limit of the variance terms to zero), we obtain .
Expectation of estimations of the linear model with weights where in Bayesian dynamic fields are unaltered by the input noise under the 2-Wasserstein metric.
Let us rewrite the introduced in Section III-B as a summation,
The measurement noise is absorbed into the distributions with whose parameters are estimated during training. Therefore, the mean estimations are unaffected by noise.
|Chunks||Synthetic||3 closely packed velocity clusters|
|Blobs||Synthetic||3 separated velocity clusters as blobs|
|Carla||Simulated||Automotive radar data|
|Astyx||Real||Automotive radar data|
|nuScenes||Real||Automotive radar data|
|AirSim||Simulated||60 drone trajectories to simulate UAM|
|Airport||Real||100 aircraft trajectories around an airport|
Iv-a Experimental setup
We studied the effectiveness of BDFs in constructing macroscopic and microscopic models from a variety of datasets summarized in Table I. These datasets were obtained from hi-fidelity simulators and real-world benchmark datasets. Since automotive radar is becoming increasingly popular in driverless cars, we included some of those datasets as well. The models we build using small automotive radar datasets, Carla, Astyx, and nuScenes are microscopic because they only model their surroundings. These datasets have only a few data points per scan (Figure 0(c) and 1(b)). AirSim is a dataset that we generated using the AirSim simulator [airsim2017fsr]. It contains 66859 data points of drone trajectories (Figure 0(b)) in a large area 100040060 m. The airport dataset contains 128349 data points of real aircraft tracks within 30 nautical miles of the John F. Kennedy airport [Jung2019].
Twenty percent of each dataset is used as the test dataset. For large areas, we normalized data to be in a cube of between and and picked the hyperparameters and grid distance using cross validation. The parameters and were set to and , respectively. The code and video can be found at https://github.com/RansML/BDF. Experiments were run on a 3.30 GHz CPU. Since Gaussian process-based models are a common choice for modeling epistemic uncertainty in many robotics tasks [deisenroth2013gaussian, o2012gaussian], we base-lined against full Gaussian process (FGP) [rasmussen2003gaussian] and its scalable approximations such as subset of data Gaussian process (SGP) [herbrich2003fast] and more recent big data Gaussian process (BGP) [senanayake2017learning]. GPflow [GPflow2017] was used for benchmarking.
Iv-B Effect of dimension adjustment
As the first experiment, we verify that dimension-adjusted kernels are useful to maintain sharp velocity transitions. For this purpose, we use the the Chunks dataset as it has sharp velocity transitions. Figure 3 shows that we get much crisper edges when we use a higher in the direction. This is because gamma controls the contribution from each direction. Table II further corroborates that higher for the -axis has the least root mean squared error (RMSE) for a similar mean standardized log loss (MSLL) [rasmussen2003gaussian].
Iv-C Runtime and accuracy
|Dataset||Method||Train time (s)||Query time (s)||RMSE|
|Due to limited scalability of benchmarks, only 3.8% and 4.3% of|
|data in AirSim and JFK, respectively, was used for all four methods.|
Figure 4 shows a detailed example of how we can model the 3D space. Observe that the variance in areas where we do not have data is higher, indicating the epistemic uncertainty. Metrics for each dataset are reported in Table III. For almost all datasets, our model has the least training time to achieve similar accuracy to other methods. For small datasets (points < 1000), SGP is equivalent to FGP as the subset is the same as the full dataset. Since BGP is based on a stochastic gradient approach, it is typically harder to optimize compared to other analytical forms and is not extremely useful in small data settings.
The efficiency of our model is more pronounced in large datasets such as Airport (Figure 6) and AirSim (Figure 7). BDF is at least 10 times faster than the second best performing model. This is because its asymptotic computational complexity is for kernels. That is, the speed depends only on the number of kernels but not on the number of data points. In contrast, FGPs have a memory complexity for both training and query for data points. In their sparse approximations, inducing points are used to represent the key points in the dataset [quinonero2005unifying]. For , the computational complexities of BGP and SGP are and , respectively. Although SGP, in theory, has a similar asymptotic computational complexity, the major drawback of SGP is that it discards a large amount of data to achieve this speedup. In our method, every data point has an equal contribution when training the model.
Although BDFs have complexity, in practice, when is finite (around 1000 in most of our experiments), we observe a slight increase with the number of data points due to the matrix product in (3). This can be observed in Figure 5 in which we separately train the model for an increasing number of data points. This slight increase of time is negligible compared to BGP and FGP in which the training time is 50 minutes before failing.
Since BDFs are parametric models that use kernels, we can update parts of the model or combine various models. This is because kernels are similarity functions, and therefore the weight parameters of a kernel located atare not affected by data far away from them. Similarly, a large-scale model can be easily decomposed to create light-weight models that cater to only a designated area of the environment.
This paper presented Bayesian Dynamic Fields to model velocity in the 3D space. The velocity of the environment is represented as a continuous function that can be queried at arbitrary resolutions. The training procedure is equally suitable for both small and big data regimes, making it suitable to build microscopic and macroscopic transportation models. The model captures both velocity estimates as well as the uncertainty associated with those estimates. In conjunction with common environment representations such as occupancy maps in robotics, in the future, we will use the uncertainty estimates of velocity for decision-making under uncertainty and safety analysis [cannon2012stochastic].
The authors thank Soyeon Jung for assisting with preprocessing the aviation radar dataset. Toyota Research Institute (TRI) provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not TRI or any other Toyota entity.