Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion

11/19/2022
by Xuewu Lin, et al.

Bird's-eye-view (BEV) based methods have recently made great progress on the multi-view 3D detection task. Compared with BEV-based methods, sparse-based methods lag behind in performance but still have non-negligible merits. To push sparse 3D detection further, we introduce Sparse4D, a novel method that iteratively refines anchor boxes by sparsely sampling and fusing spatial-temporal features. (1) Sparse 4D Sampling: for each 3D anchor, we assign multiple 4D keypoints, which are then projected onto multi-view/scale/timestamp image features to sample the corresponding features; (2) Hierarchy Feature Fusion: we hierarchically fuse the sampled features across views/scales, across timestamps, and across keypoints to generate a high-quality instance feature. In this way, Sparse4D achieves 3D detection efficiently and effectively without relying on dense view transformation or global attention, and is friendlier to deployment on edge devices. Furthermore, we introduce an instance-level depth reweight module to alleviate the ill-posed nature of 3D-to-2D projection. In experiments, our method outperforms all sparse-based methods and most BEV-based methods on the detection task of the nuScenes dataset.
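The sampling-then-fusing idea in (1) and (2) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the function names (`project_point`, `bilinear_sample`, `sample_and_fuse`), the 3x4 projection matrix convention, the single-channel feature maps, and the uniform averaging are all assumptions made for illustration. In particular, the abstract describes hierarchical fusion over views/scales, timestamps, and keypoints, whereas this sketch replaces that with plain means and omits scales, timestamps, and the depth reweight module entirely.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def project_point(point_xyz: Sequence[float], proj: Matrix) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 matrix (assumed convention).

    Returns pixel coordinates (u, v), or None if the point is not in front of the camera.
    """
    x, y, z = point_xyz
    hom = (x, y, z, 1.0)
    u_h, v_h, depth = (sum(row[i] * hom[i] for i in range(4)) for row in proj)
    if depth <= 1e-5:
        return None
    return u_h / depth, v_h / depth


def bilinear_sample(feature_map: Matrix, u: float, v: float) -> Optional[float]:
    """Bilinearly interpolate a single-channel H x W map (H, W >= 2) at (u, v).

    Returns None if (u, v) falls outside the map.
    """
    h, w = len(feature_map), len(feature_map[0])
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        return None
    u0, v0 = min(int(u), w - 2), min(int(v), h - 2)
    du, dv = u - u0, v - v0
    top = feature_map[v0][u0] * (1 - du) + feature_map[v0][u0 + 1] * du
    bottom = feature_map[v0 + 1][u0] * (1 - du) + feature_map[v0 + 1][u0 + 1] * du
    return top * (1 - dv) + bottom * dv


def sample_and_fuse(
    keypoints: Sequence[Sequence[float]],
    views: Sequence[Tuple[Matrix, Matrix]],
) -> float:
    """Sample each keypoint in every view where it is visible, then fuse.

    Fusion here is a two-level uniform mean (over views, then over keypoints);
    this is a stand-in for the fusion described in the abstract.
    """
    per_keypoint: List[float] = []
    for kp in keypoints:
        samples: List[float] = []
        for proj, fmap in views:
            uv = project_point(kp, proj)
            if uv is None:
                continue  # keypoint is behind this camera
            value = bilinear_sample(fmap, *uv)
            if value is not None:
                samples.append(value)
        if samples:
            per_keypoint.append(sum(samples) / len(samples))
    return sum(per_keypoint) / len(per_keypoint) if per_keypoint else 0.0


if __name__ == "__main__":
    # Toy camera at the origin looking along +z, with identity intrinsics.
    P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    F = [[0.0, 1.0], [2.0, 3.0]]
    print(sample_and_fuse([(1.0, 1.0, 2.0), (0.0, 0.0, -1.0)], [(P, F)]))
```

Because only the keypoints' projected locations are read from each feature map, the cost grows with the number of anchors and keypoints rather than with the resolution of a dense BEV grid, which is the efficiency argument the abstract makes.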

research
09/09/2019

MLOD: A multi-view 3D object detection based on robust feature fusion method

This paper presents Multi-view Labelling Object Detector (MLOD). The det...
research
12/15/2022

DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention

3D object detection with surround-view images is an essential task for a...
research
04/21/2021

MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of LiDAR Data

In this work, we propose MVFuseNet, a novel end-to-end method for joint ...
research
05/23/2023

Sparse4D v2: Recurrent Temporal Fusion with Sparse Model

Sparse algorithms offer great flexibility for multi-view temporal percep...
research
01/10/2023

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

The transformation of features from 2D perspective space to 3D space is ...
research
07/18/2022

UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View

Bird's eye view (BEV) representation is a new perception formulation for...
research
03/18/2022

VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention

Detecting objects from LiDAR point clouds is of tremendous significance ...
