Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

03/19/2022
by   Chenhang He, et al.

Transformers have demonstrated promising performance in many 2D vision tasks. However, computing self-attention on large-scale point cloud data is cumbersome, because a point cloud forms a long sequence and is unevenly distributed in 3D space. To address this, existing methods usually either compute self-attention locally by grouping the points into clusters of the same size, or perform convolutional self-attention on a discretized representation. The former results in stochastic point dropout, while the latter typically has narrow attention fields. In this paper, we propose a novel voxel-based architecture, the Voxel Set Transformer (VoxSeT), which detects 3D objects from point clouds by means of set-to-set translation. VoxSeT is built upon a voxel-based set attention (VSA) module, which reduces the self-attention in each voxel to two cross-attentions and models features in a hidden space induced by a group of latent codes. With the VSA module, VoxSeT can handle voxelized point clusters of arbitrary size over a wide range and process them in parallel with linear complexity. VoxSeT combines the high performance of transformers with the efficiency of voxel-based models, and can serve as an alternative to convolutional and point-based backbones. VoxSeT reports competitive results on the KITTI and Waymo detection benchmarks. The source code is available at <https://github.com/skyhehe123/VoxSeT>.
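
To make the two-cross-attention idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head, no learned query/key/value projections, and no additional processing in the hidden space; the function name `voxel_set_attention` and its arguments are hypothetical. The first cross-attention lets each of the k latent codes pool the points of every voxel into k hidden features; the second lets each point read back from the hidden features of its own voxel. The cost grows linearly with the number of points, and voxels may hold any number of points.

```python
import numpy as np

def voxel_set_attention(feats, voxel_ids, latents):
    """Illustrative sketch (hypothetical helper, not the VoxSeT code).

    feats:     (n, d) point features
    voxel_ids: (n,)   integer voxel index of each point, in [0, v)
    latents:   (k, d) latent codes shared by all voxels
    returns:   (n, d) updated point features
    """
    n, d = feats.shape
    k = latents.shape[0]
    v = int(voxel_ids.max()) + 1
    scale = 1.0 / np.sqrt(d)

    # Cross-attention 1: latent codes (queries) attend to the points of each voxel.
    logits = (feats @ latents.T) * scale                  # (n, k)
    # Softmax over the points of the same voxel, separately for each latent code.
    vmax = np.full((v, k), -np.inf)
    np.maximum.at(vmax, voxel_ids, logits)                # per-voxel max for stability
    w = np.exp(logits - vmax[voxel_ids])                  # (n, k)
    denom = np.zeros((v, k))
    np.add.at(denom, voxel_ids, w)                        # per-voxel normalizer
    w = w / denom[voxel_ids]
    hidden = np.zeros((v, k, d))
    np.add.at(hidden, voxel_ids, w[:, :, None] * feats[:, None, :])  # (v, k, d)

    # Cross-attention 2: each point (query) attends to the k hidden features of its voxel.
    h = hidden[voxel_ids]                                 # (n, k, d)
    logits2 = np.einsum("nd,nkd->nk", feats, h) * scale
    logits2 = logits2 - logits2.max(axis=1, keepdims=True)
    a = np.exp(logits2)
    a = a / a.sum(axis=1, keepdims=True)                  # softmax over the k hidden features
    return np.einsum("nk,nkd->nd", a, h)

# Usage: six points spread unevenly over three voxels, three latent codes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
voxel_ids = np.array([0, 0, 0, 1, 1, 2])
latents = rng.normal(size=(3, 4))
out = voxel_set_attention(feats, voxel_ids, latents)
```

Because every point attends only to k hidden features rather than to every other point in its voxel, no points need to be dropped or padded to equalize cluster sizes in this sketch.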

research
09/06/2021

Voxel Transformer for 3D Object Detection

We present Voxel Transformer (VoTr), a novel and effective voxel-based T...
research
04/18/2020

DAPnet: A double self-attention convolutional network for segmentation of point clouds

LiDAR point cloud has a complex structure and the 3D semantic labeling o...
research
01/07/2021

Self-Attention Based Context-Aware 3D Object Detection

Most existing point-cloud based 3D object detectors use convolution-like...
research
09/05/2022

SEFormer: Structure Embedding Transformer for 3D Object Detection

Effectively preserving and encoding structure features from objects in i...
research
12/09/2021

Fast Point Transformer

The recent success of neural networks enables a better interpretation of...
research
02/28/2022

Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud

Environment perception including detection, classification, tracking, an...
research
06/18/2020

SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks

We introduce the SE(3)-Transformer, a variant of the self-attention modu...
