VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection

04/03/2023
by Zhuoling Li, et al.

In recent years, transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks. However, their performance in multi-view 3D object detection remains inferior to that of state-of-the-art (SOTA) detectors based on convolutional neural networks. In this work, we investigate this issue from the perspective of bird's-eye-view (BEV) feature generation. Specifically, we examine the BEV feature generation method employed by the transformer-based SOTA, BEVFormer, and identify two limitations: (i) it generates attention weights only from the BEV, which precludes the use of lidar points for supervision, and (ii) it aggregates camera view features to the BEV through deformable sampling, which selects only a small subset of features and fails to exploit all information. To overcome these limitations, we propose a novel BEV feature generation method, dual-view attention, which generates attention weights from both the BEV and the camera view. This method encodes all camera features into the BEV feature. By combining dual-view attention with the BEVFormer architecture, we build a new detector named VoxelFormer. Extensive experiments are conducted on the nuScenes benchmark to verify the superiority of dual-view attention and VoxelFormer. We observe that even when adopting only 3 encoders and 1 historical frame during training, VoxelFormer still outperforms BEVFormer significantly. When trained in the same setting, VoxelFormer can surpass BEVFormer by 4.9. Code: https://github.com/Lizhuoling/VoxelFormer-public.git
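
To make the idea of dual-view attention more concrete, the following is a minimal, purely illustrative sketch and not the authors' implementation. The abstract does not say how the two sets of attention weights are produced or combined, so everything here is an assumption: the function name `dual_view_attention`, the `ray_cells` lookup that maps each camera pixel to candidate BEV cells, the softmax on the camera side, the sigmoid gate on the BEV side, and the choice to multiply the two weights. The one property it is meant to show is that every camera feature contributes to the BEV feature, rather than a sampled subset.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dual_view_attention(cam_feats, cam_logits, ray_cells, bev_logits):
    """Aggregate all camera features into BEV cells (hypothetical sketch).

    cam_feats:  P x C list, one feature vector per camera pixel.
    cam_logits: P x K list, camera-view scores of each pixel for its K
                candidate BEV cells (assumed to come from the image branch).
    ray_cells:  P x K list of BEV cell indices for those candidates
                (assumed precomputed from camera geometry).
    bev_logits: length-M list, BEV-view score per cell (assumed to come
                from the BEV branch).
    Returns an M x C list of BEV features.
    """
    channels = len(cam_feats[0])
    bev = [[0.0] * channels for _ in bev_logits]
    bev_gate = [sigmoid(b) for b in bev_logits]  # BEV-view weights

    for feat, logits, cells in zip(cam_feats, cam_logits, ray_cells):
        cam_w = softmax(logits)  # camera-view weights over candidate cells
        for w, cell in zip(cam_w, cells):
            weight = w * bev_gate[cell]  # assumed combination: product
            for c in range(channels):
                bev[cell][c] += weight * feat[c]
    return bev


# Two pixels, one channel, three BEV cells; all logits are zero.
bev = dual_view_attention(
    cam_feats=[[1.0], [2.0]],
    cam_logits=[[0.0, 0.0], [0.0, 0.0]],
    ray_cells=[[0, 1], [1, 2]],
    bev_logits=[0.0, 0.0, 0.0],
)
print(bev)  # [[0.25], [0.75], [0.5]]
```

In this toy setup both pixels write into cell 1, so its value is the sum of their weighted features. A real detector would compute the logits with learned layers and run the aggregation as batched tensor operations.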

