OcTr: Octree-based Transformer for 3D Object Detection

03/22/2023
by Chao Zhou, et al.

A key challenge for LiDAR-based 3D object detection is capturing sufficient features from large-scale 3D scenes, especially for distant and/or occluded objects. Despite recent efforts by Transformer-based methods, which offer long-sequence modeling capability, they fail to properly balance accuracy and efficiency, suffering from either inadequate receptive fields or coarse-grained holistic correlations. In this paper, we propose an Octree-based Transformer, named OcTr, to address this issue. It first constructs a dynamic octree on the hierarchical feature pyramid by conducting self-attention at the top level and then recursively propagating to the level below, restricted by the octants. This captures rich global context in a coarse-to-fine manner while keeping the computational complexity under control. Furthermore, for enhanced foreground perception, we propose a hybrid positional embedding, composed of a semantic-aware positional embedding and an attention mask, to fully exploit semantic and geometric clues. Extensive experiments are conducted on the Waymo Open Dataset and the KITTI Dataset, and OcTr achieves new state-of-the-art results.
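The coarse-to-fine idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the flat octree layout (cell `i` has children `8*i` to `8*i+7`), the top-k rule for choosing which octants to descend into, and the names `coarse_to_fine_attention`, `levels`, and `top_k` are all assumptions made for illustration. A single query attends densely at the coarsest level, and at each finer level only the children of the best-scoring octants remain candidates.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine_attention(query, levels, top_k):
    """Attend one query over a feature pyramid, coarsest level first.

    levels[0] is the coarsest list of feature vectors. Assumed layout
    (hypothetical): cell i at one level owns cells 8*i .. 8*i+7 at the
    next finer level. Returns (output vector, attended fine indices).
    """
    scale = math.sqrt(len(query))
    # Dense attention candidates at the top level.
    candidates = list(range(len(levels[0])))
    for depth, feats in enumerate(levels):
        scores = [dot(query, feats[i]) / scale for i in candidates]
        if depth == len(levels) - 1:
            # Finest level: ordinary softmax attention over the survivors.
            weights = softmax(scores)
            out = [
                sum(w * feats[i][d] for w, i in zip(weights, candidates))
                for d in range(len(query))
            ]
            return out, candidates
        # Assumed selection rule: keep the top_k highest-scoring octants
        # and restrict the next level to their eight children.
        ranked = sorted(zip(scores, candidates), reverse=True)[:top_k]
        kept = sorted(i for _, i in ranked)
        candidates = [8 * i + c for i in kept for c in range(8)]
```

With 8 coarse cells, 64 fine cells, and `top_k=2`, the query scores 8 coarse keys and then only 16 fine keys instead of all 64, which is the kind of saving the abstract refers to when it mentions keeping complexity under control. Setting `top_k` to the number of coarse cells recovers dense attention at the fine level.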


