CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

04/01/2022
by Yanan Zhang, et al.

In autonomous driving, LiDAR point clouds and RGB images are two major data modalities that provide complementary cues for 3D object detection. Exploiting both fully is difficult, however, because of the large discrepancies between the modalities. To address this issue, we propose a novel framework, the Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch and an Imageformer (IT) branch, together with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploiting multi-modal information for detection. Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach based on hierarchical contrastive learning at both the point and object levels. OMDA significantly improves accuracy by augmenting only the point clouds, which avoids the complex generation of paired samples across the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves a new state of the art, highlighting its effectiveness.
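To make the two ideas in the abstract more concrete, here is a minimal NumPy sketch of (a) point features attending to image features through scaled dot-product cross-attention and (b) a contrastive loss between two sets of features. It is not the authors' implementation: the abstract does not specify the internals of CMT or the loss used by OMDA, so generic cross-attention and an InfoNCE loss are swapped in as assumptions. The names `cross_modal_attention` and `info_nce`, the feature sizes, and the temperature are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(point_feats, image_feats, w_q, w_k, w_v):
    """Generic cross-attention (assumed form, not the paper's CMT).

    point_feats: (N, d) features of N points, used as queries.
    image_feats: (M, d) features of M image tokens, used as keys/values.
    w_q, w_k, w_v: (d, d) projection matrices.
    """
    q = point_feats @ w_q
    k = image_feats @ w_k
    v = image_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M) similarities
    attn = softmax(scores, axis=-1)           # each row sums to 1
    fused = point_feats + attn @ v            # residual connection
    return fused, attn


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss (assumed; the abstract does not name the loss).

    Row i of `anchors` is pulled toward row i of `positives` and pushed
    away from every other row of `positives`.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy usage with random data (shapes are illustrative only).
rng = np.random.default_rng(0)
d = 8
points = rng.normal(size=(5, d))   # 5 hypothetical point features
pixels = rng.normal(size=(7, d))   # 7 hypothetical image features
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

fused, attn = cross_modal_attention(points, pixels, w_q, w_k, w_v)

# Matching pairs should give a lower loss than mismatched pairs.
loss_aligned = info_nce(points, points)
loss_shuffled = info_nce(points, np.roll(points, 1, axis=0))
```

In this sketch the point stream supplies the queries, so each fused point feature is its original feature plus a weighted mix of image features. The contrastive term only illustrates the pull-together, push-apart principle; how CAT-Det forms point-level and object-level pairs is described in the full paper, not in this abstract.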


research
02/16/2023

Hierarchical Cross-modal Transformer for RGB-D Salient Object Detection

Most of existing RGB-D salient object detection (SOD) methods follow the...
research
07/21/2022

AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection

Point clouds and RGB images are two general perceptional sources in auto...
research
01/03/2023

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection

In this paper, we propose a robust 3D detector, named Cross Modal Transf...
research
04/23/2023

Informative Data Selection with Uncertainty for Multi-modal Object Detection

Noise has always been nonnegligible trouble in object detection by creat...
research
05/11/2023

Multi-modal Multi-level Fusion for 3D Single Object Tracking

3D single object tracking plays a crucial role in computer vision. Mains...
research
05/25/2023

Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

This paper addresses the problem of 3D referring expression comprehensio...
research
03/17/2023

PersonalTailor: Personalizing 2D Pattern Design from 3D Garment Point Clouds

Garment pattern design aims to convert a 3D garment to the corresponding...
