Shared Cross-Modal Trajectory Prediction for Autonomous Driving
We propose a framework for predicting the future trajectories of traffic agents in highly interactive environments. Since autonomous vehicles are equipped with various types of sensors (e.g., a LiDAR scanner and an RGB camera), our work aims to benefit from multiple input modalities that are complementary to each other. The proposed approach consists of two stages: (i) feature encoding, where we discover the motion behavior of the target agent with respect to other directly and indirectly observable influences, extracting such behaviors from multiple perspectives such as the top-down and frontal views; and (ii) cross-modal embedding, where we embed the set of learned behavior representations into a single cross-modal latent space. We construct a generative model and formulate the objective functions with an additional regularizer specifically designed for future prediction. An extensive evaluation on two benchmark driving datasets demonstrates the efficacy of the proposed framework.
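As a rough illustration of the two-stage design (modality-specific feature encoding followed by cross-modal embedding with a generative head), the sketch below shows one plausible realization. All module names, feature dimensions, the simple MLP encoders, and the CVAE-style formulation are assumptions made for illustration; they are not the paper's actual networks, and the prediction-specific regularizer mentioned in the abstract is not reproduced.

```python
# Minimal sketch (assumed architecture, not the paper's): two modality
# encoders produce behavior features that are fused into a single shared
# latent space, from which future trajectories are decoded.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Encodes one input modality into a fixed-size behavior feature."""

    def __init__(self, in_dim: int, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class CrossModalCVAE(nn.Module):
    """Embeds concatenated behavior features into one latent space and
    decodes a future trajectory (hypothetical simplification)."""

    def __init__(self, feat_dim: int = 128, latent_dim: int = 32,
                 horizon: int = 12):
        super().__init__()
        self.to_latent = nn.Linear(2 * feat_dim, 2 * latent_dim)  # mu, logvar
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),  # (x, y) offsets per future step
        )
        self.horizon = horizon

    def forward(self, feat_topdown, feat_frontal):
        h = torch.cat([feat_topdown, feat_frontal], dim=-1)
        mu, logvar = self.to_latent(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        pred = self.decoder(torch.cat([z, h], dim=-1))
        return pred.view(-1, self.horizon, 2), mu, logvar


# Toy usage with random tensors standing in for per-view inputs.
enc_td = ModalityEncoder(in_dim=64)   # e.g., flattened top-down (LiDAR) features
enc_fr = ModalityEncoder(in_dim=64)   # e.g., flattened frontal-view (camera) features
model = CrossModalCVAE()

x_td, x_fr = torch.randn(8, 64), torch.randn(8, 64)
traj, mu, logvar = model(enc_td(x_td), enc_fr(x_fr))

# ELBO-style objective: trajectory reconstruction plus a KL regularizer on
# the shared latent space.
gt = torch.randn(8, 12, 2)
recon = nn.functional.mse_loss(traj, gt)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon + 0.1 * kl
loss.backward()
```

In this kind of setup, the key design choice is that both views are mapped into one latent distribution, so the decoder can be driven by whichever combination of modalities is available at inference time.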