TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

03/17/2023
by Shaoheng Fang, et al.

Vision-centric joint perception and prediction (PnP) has become an emerging trend in autonomous driving research. It predicts the future states of the traffic participants in the surrounding environment from raw RGB images. However, it remains a critical challenge to synchronize features obtained from multiple camera views and timestamps, owing to inevitable geometric distortions, and to further exploit those spatial-temporal features. To address this issue, we propose a temporal bird's-eye-view pyramid transformer (TBP-Former) for vision-centric PnP, which includes two novel designs. First, a pose-synchronized BEV encoder maps raw image inputs with any camera pose at any time to a shared and synchronized BEV space for better spatial-temporal synchronization. Second, a spatial-temporal pyramid transformer comprehensively extracts multi-scale BEV features and predicts future BEV states with the support of spatial-temporal priors. Extensive experiments on the nuScenes dataset show that our proposed framework overall outperforms all state-of-the-art vision-based prediction methods.
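
The geometric idea behind mapping inputs "with any camera pose at any time" into one shared BEV space can be illustrated with a small sketch. This is not the TBP-Former encoder itself; it only shows, under simplifying assumptions, how a ground-plane point observed in a past ego frame can be re-expressed in the current ego frame. The function names, the planar (x, y, yaw) pose format, and the x-forward, y-left axis convention are all assumptions made for illustration.

```python
import math

# Assumed pose format (hypothetical): (x, y, yaw) of the ego vehicle in a
# fixed world frame, yaw in radians, ego x-axis forward and y-axis left.

def ego_to_world(pose, point):
    """Map a 2D point from the ego frame given by `pose` into the world frame."""
    x, y, yaw = pose
    px, py = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * px - s * py, y + s * px + c * py)

def world_to_ego(pose, point):
    """Map a 2D world-frame point into the ego frame given by `pose`."""
    x, y, yaw = pose
    dx, dy = point[0] - x, point[1] - y
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the inverse (transposed) rotation to the translated point.
    return (c * dx + s * dy, -s * dx + c * dy)

def sync_to_current(past_pose, current_pose, point_in_past):
    """Re-express a point seen in a past ego frame in the current ego frame."""
    return world_to_ego(current_pose, ego_to_world(past_pose, point_in_past))

# The ego vehicle drove 2 m straight ahead between the two timestamps, so a
# static point 5 m ahead in the past frame is 3 m ahead in the current frame.
moved = sync_to_current((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 1.0))
```

Applying such a transform to every BEV grid location is one simple way to place features from different timestamps in a common reference frame; the pose-synchronized encoder described above addresses the same alignment problem, but its actual mechanism is given in the full text, not here.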

research
03/15/2020

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

The ability to reliably perceive the environmental states, particularly ...
research
07/15/2022

ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

Many existing autonomous driving paradigms involve a multi-stage discret...
research
07/18/2022

UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View

Bird's eye view (BEV) representation is a new perception formulation for...
research
07/04/2023

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

This technical report summarizes the winning solution for the 3D Occupan...
research
08/12/2017

Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues

In recent years, autonomous driving algorithms using low-cost vehicle-mo...
research
03/31/2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3D visual perception tasks, including 3D detection and map segmentation ...
research
06/26/2023

Imitation with Spatial-Temporal Heatmap: 2nd Place Solution for NuPlan Challenge

This paper presents our 2nd place solution for the NuPlan Challenge 2023...
