BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

05/19/2022
by Yunpeng Zhang, et al.

In this paper, we present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems. Unlike existing studies that focus on improving single-task approaches, BEVerse produces spatio-temporal Birds-Eye-View (BEV) representations from multi-camera videos and jointly reasons about multiple tasks for vision-centric autonomous driving. Specifically, BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp, multi-view images. After ego-motion alignment, a spatio-temporal encoder performs further feature extraction in BEV. Finally, multiple task decoders are attached for joint reasoning and prediction. Within the decoders, we propose a grid sampler that generates BEV features with different ranges and granularities for different tasks, and we design an iterative-flow method for memory-efficient future prediction. We show that temporal information improves 3D object detection and semantic map construction, while multi-task learning implicitly benefits motion prediction. With extensive experiments on the nuScenes dataset, we show that the multi-task BEVerse outperforms existing single-task methods on 3D object detection, semantic map construction, and motion prediction. Compared with the sequential paradigm, BEVerse also offers significantly improved efficiency. The code and trained models will be released at https://github.com/zhangyp15/BEVerse.
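
To make the grid-sampler idea concrete, the sketch below resamples one shared BEV feature map onto task-specific grids with different metric ranges and cell sizes via bilinear interpolation. The function name, tensor shapes, and range values are illustrative assumptions, not the released BEVerse implementation.

```python
# Minimal sketch of a BEV grid sampler (assumed interface, not the official code):
# resample a shared BEV feature map onto task-specific grids that cover
# different spatial ranges and granularities.
import torch
import torch.nn.functional as F


def grid_sample_bev(bev_feat, src_range, dst_range, dst_resolution):
    """Resample BEV features to a new range and granularity.

    bev_feat:       (B, C, H, W) shared BEV feature map
    src_range:      (x_min, x_max, y_min, y_max) covered by bev_feat, in meters
    dst_range:      (x_min, x_max, y_min, y_max) required by a task decoder
    dst_resolution: (H_out, W_out) cells of the task-specific grid
    """
    B = bev_feat.shape[0]
    H_out, W_out = dst_resolution
    x_min, x_max, y_min, y_max = dst_range
    sx_min, sx_max, sy_min, sy_max = src_range

    # Metric coordinates of the destination cell centers.
    xs = torch.linspace(x_min, x_max, W_out, device=bev_feat.device)
    ys = torch.linspace(y_min, y_max, H_out, device=bev_feat.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")

    # Normalize to [-1, 1] w.r.t. the source range, as grid_sample expects.
    norm_x = 2 * (grid_x - sx_min) / (sx_max - sx_min) - 1
    norm_y = 2 * (grid_y - sy_min) / (sy_max - sy_min) - 1
    grid = torch.stack((norm_x, norm_y), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)

    return F.grid_sample(bev_feat, grid, mode="bilinear", align_corners=True)


# Example: one shared BEV map feeds a detection head and a map head that use
# different ranges and cell sizes (all values are assumptions for illustration).
shared_bev = torch.randn(2, 64, 128, 128)
det_feat = grid_sample_bev(shared_bev, (-51.2, 51.2, -51.2, 51.2),
                           (-51.2, 51.2, -51.2, 51.2), (128, 128))
map_feat = grid_sample_bev(shared_bev, (-51.2, 51.2, -51.2, 51.2),
                           (-30.0, 30.0, -15.0, 15.0), (100, 200))
```

Sampling from a single shared BEV map, rather than encoding a separate map per task, is what lets each decoder pick its own range and resolution without duplicating the expensive view transformation.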

Related research

03/15/2020  MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps
The ability to reliably perceive the environmental states, particularly ...

03/31/2022  BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
3D visual perception tasks, including 3D detection and map segmentation ...

04/11/2022  M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation
In this paper, we propose M^2BEV, a unified framework that jointly perfo...

06/19/2023  PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View
Accurately perceiving instances and predicting their future motion are k...

07/04/2023  FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation
This technical report summarizes the winning solution for the 3D Occupan...

04/06/2023  Geometric-aware Pretraining for Vision-centric 3D Object Detection
Multi-camera 3D object detection for autonomous driving is a challenging...

01/17/2021  Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
Over the last few years, we have witnessed tremendous progress on many s...
