DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

01/15/2023
by Haiyang Wang, et al.

Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is a fundamental problem in 3D object detection. Compared with customized sparse convolution, the attention mechanism in Transformers is better suited to flexibly modeling long-range relationships and is easier to deploy in real-world applications. However, because point clouds are sparse, applying a standard Transformer to sparse points is non-trivial. In this paper, we present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D object detection. To process sparse points efficiently in parallel, we propose Dynamic Sparse Window Attention, which partitions each window into a series of local regions according to its sparsity and then computes the features of all regions in a fully parallel manner. To allow cross-set connections, we design a rotated set partitioning strategy that alternates between two partitioning configurations in consecutive self-attention layers. To support effective downsampling and better encode geometric information, we also propose an attention-style 3D pooling module on sparse points, which is powerful and deployment-friendly because it uses no customized CUDA operations. Our model achieves state-of-the-art performance on the large-scale Waymo Open Dataset with remarkable gains. More importantly, DSVT can be easily deployed with TensorRT at real-time inference speed (27 Hz). Code will be available at <https://github.com/Haiyang-W/DSVT>.
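The set partitioning described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `partition_into_sets`, the 2D integer voxel coordinates, the near-equal split via `np.array_split`, and the choice of X-first versus Y-first sorting as the two "rotated" configurations are all assumptions made for illustration. The abstract states only that regions are formed per window according to sparsity and that two configurations alternate across layers.

```python
import numpy as np

def partition_into_sets(coords, window_size, set_size, axis_order):
    """Group non-empty voxels into per-window sets (illustrative sketch only).

    coords      : (N, 2) integer (x, y) coordinates of non-empty voxels
    window_size : side length of a square window, in voxels
    set_size    : assumed upper bound on voxels per set
    axis_order  : (primary, secondary) sort axes inside a window;
                  (0, 1) sorts X-first, (1, 0) sorts Y-first
    Returns a list of index arrays into `coords`, one array per set.
    """
    coords = np.asarray(coords, dtype=np.int64).reshape(-1, 2)
    if len(coords) == 0:
        return []

    win = coords // window_size    # which window each voxel falls in
    inner = coords % window_size   # position of the voxel inside its window
    a, b = axis_order

    # np.lexsort treats the LAST key as the primary one: group by window
    # first, then order voxels inside each window by the chosen axes.
    order = np.lexsort((inner[:, b], inner[:, a], win[:, 1], win[:, 0]))

    # Find where the window id changes in the sorted sequence.
    sorted_win = win[order]
    change = np.any(sorted_win[1:] != sorted_win[:-1], axis=1)
    starts = np.concatenate(([0], np.nonzero(change)[0] + 1, [len(order)]))

    sets = []
    for s, e in zip(starts[:-1], starts[1:]):
        idx = order[s:e]
        # The number of sets grows with how many voxels the window holds,
        # so denser windows get more sets (ceil division).
        n_sets = -(-len(idx) // set_size)
        sets.extend(np.array_split(idx, n_sets))
    return sets

# Five non-empty voxels: four in one window, one in another.
voxels = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]
sets_x = partition_into_sets(voxels, window_size=4, set_size=2, axis_order=(0, 1))
sets_y = partition_into_sets(voxels, window_size=4, set_size=2, axis_order=(1, 0))
```

In this toy example, the X-first configuration groups voxel 0 with voxel 2, while the Y-first configuration groups it with voxel 1. Alternating the two configurations in consecutive attention layers therefore lets information flow between voxels that never share a set in a single layer. Self-attention would then be computed within each set independently, which is what makes the computation parallel across sets.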

research
10/13/2022

SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds

3D object detection in point clouds is a core component for modern robot...
research
09/06/2021

Voxel Transformer for 3D Object Detection

We present Voxel Transformer (VoTr), a novel and effective voxel-based T...
research
10/09/2022

CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds

We present a novel two-stage fully sparse convolutional 3D object detect...
research
05/04/2023

OctFormer: Octree-based Transformers for 3D Point Clouds

We propose octree-based transformers, named OctFormer, for 3D point clou...
research
09/12/2022

CenterFormer: Center-based Transformer for 3D Object Detection

Query-based transformer has shown great potential in constructing long-r...
research
12/09/2021

Fast Point Transformer

The recent success of neural networks enables a better interpretation of...
research
05/01/2021

SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers

Point cloud-based large scale place recognition is fundamental for many ...