ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

by Junru Gu, et al.

Existing autonomous driving pipelines separate the perception module from the prediction module. The two modules communicate through hand-picked interface features such as agent boxes and trajectories. Because of this separation, the prediction module receives only partial information from the perception module. Worse, errors from the perception module can propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a visual trajectory prediction pipeline that leverages the rich information from raw videos to predict future trajectories of agents in a scene. ViP3D employs sparse agent queries throughout the pipeline, making it fully differentiable and interpretable. Furthermore, we propose an evaluation metric for this novel end-to-end visual trajectory prediction task. Extensive experimental results on the nuScenes dataset show the strong performance of ViP3D over traditional pipelines and previous end-to-end models.




Control-Aware Prediction Objectives for Autonomous Driving

Autonomous vehicle software is typically structured as a modular pipelin...

LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction

Multi-agent trajectory prediction is a fundamental problem in autonomous...

Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces

We introduce an approach to model surface properties governing bounces i...

Learning Driving Decisions by Imitating Drivers' Control Behaviors

Classical autonomous driving systems are modularized as a pipeline of pe...

MTP: Multi-Hypothesis Tracking and Prediction for Reduced Error Propagation

Recently, there has been tremendous progress in developing each individu...

Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction

In this work we give a case study of an embodied machine-learning (ML) p...

Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation

In this paper, we present Goal-GAN, an interpretable and end-to-end trai...