CenterFormer: Center-based Transformer for 3D Object Detection

09/12/2022
by Zixiang Zhou, et al.

Query-based transformers have shown great potential for modeling long-range attention in many image-domain tasks, but have rarely been considered for LiDAR-based 3D object detection because of the overwhelming size of point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of each center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict the bounding box from the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set, outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer
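
The two central steps of the abstract, selecting center candidates from a heatmap and using their features as transformer queries, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names `select_center_queries` and `cross_attention`, the dense 2D bird's-eye-view feature layout, and the plain single-head softmax attention are all assumptions made for illustration.

```python
import numpy as np


def select_center_queries(heatmap, features, k):
    """Pick the k highest-scoring heatmap cells and gather their features.

    heatmap:  (H, W) array of center scores (hypothetical layout).
    features: (C, H, W) array of encoder features (hypothetical layout).
    Returns (k, C) query embeddings and (k, 2) integer (row, col) positions.
    """
    h, w = heatmap.shape
    # Flat indices of the k largest scores, highest score first.
    top = np.argsort(heatmap.ravel())[::-1][:k]
    rows, cols = np.unravel_index(top, (h, w))
    # Advanced indexing yields (C, k); transpose to one row per candidate.
    queries = features[:, rows, cols].T
    positions = np.stack([rows, cols], axis=1)
    return queries, positions


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: (K, C), keys: (N, C), values: (N, C). Returns (K, C).
    """
    logits = queries @ keys.T / np.sqrt(queries.shape[-1])
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heatmap = rng.random((8, 8))
    features = rng.standard_normal((16, 8, 8))
    queries, positions = select_center_queries(heatmap, features, k=4)
    # Attend from the center queries to every cell of the feature map.
    tokens = features.reshape(16, -1).T
    fused = cross_attention(queries, tokens, tokens)
    print(queries.shape, positions.shape, fused.shape)
```

Because only the k selected centers act as queries, the attention matrix has k rows rather than one row per feature-map cell, which is one way to read the abstract's claim about reduced computational complexity.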

research
08/08/2021

Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud

Most of the existing single-stage and two-stage 3D object detectors are ...
research
11/21/2020

Rethinking Transformer-based Set Prediction for Object Detection

DETR is a recently proposed Transformer-based method which views object ...
research
10/05/2022

Centralized Feature Pyramid for Object Detection

Visual feature pyramid has shown its superiority in both effectiveness a...
research
07/17/2023

Box-DETR: Understanding and Boxing Conditional Spatial Queries

Conditional spatial queries are recently introduced into DEtection TRans...
research
05/12/2023

SSD-MonoDTR: Supervised Scale-constrained Deformable Transformer for Monocular 3D Object Detection

Transformer-based methods have demonstrated superior performance for mon...
research
01/15/2023

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Designing an efficient yet deployment-friendly 3D backbone to handle spa...
research
09/02/2023

S^3-MonoDETR: Supervised Shape Scale-perceptive Deformable Transformer for Monocular 3D Object Detection

Recently, transformer-based methods have shown exceptional performance i...
