Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

05/25/2023
by Wenhao Cheng, et al.

This paper addresses the problem of 3D referring expression comprehension (REC) in autonomous driving scenarios, which aims to ground a natural language expression to the targeted region in LiDAR point clouds. Previous REC approaches usually focus on the 2D or indoor 3D domain, which makes them unsuitable for accurately localizing the queried 3D region in an autonomous driving scene. In addition, the upper-bound limitation and the heavy computational cost of existing approaches motivate us to explore a better solution. In this work, we propose a new multi-modal visual grounding task, termed LiDAR Grounding. We then devise a Multi-modal Single Shot Grounding (MSSG) approach with an effective token fusion strategy. It jointly learns the LiDAR-based object detector with the language features and predicts the targeted region directly from the detector without any post-processing. Moreover, image features can be flexibly integrated into our approach to provide rich texture and color information. The cross-modal learning forces the detector to concentrate on important regions in the point cloud by considering the informative language expressions, leading to much better accuracy and efficiency. Extensive experiments on the Talk2Car dataset demonstrate the effectiveness of the proposed methods. Our work offers deeper insight into the LiDAR-based grounding task, and we expect it to open a promising direction for the autonomous driving community.
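The abstract does not specify how MSSG's token fusion works. The toy sketch below only illustrates the general idea of single-shot grounding it describes: each detector candidate attends over language token embeddings (scaled dot-product attention is an assumption chosen here), gets a language-conditioned score, and the single best box is returned directly, with no NMS or other post-processing. All names (`fuse_with_language`, `ground`), the feature vectors, and the 7-value box format `(x, y, z, l, w, h, yaw)` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical box format: (x, y, z, length, width, height, yaw)
Box = Tuple[float, ...]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_language(candidate: Vector, tokens: List[Vector]) -> List[float]:
    """Attend from one candidate feature over the language tokens
    (scaled dot-product attention; an assumed fusion, not MSSG's)."""
    d = len(candidate)
    weights = softmax([dot(candidate, t) / math.sqrt(d) for t in tokens])
    # Weighted sum of token embeddings = language context for this candidate.
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]


def ground(
    candidates: List[Tuple[Vector, Box]], tokens: List[Vector]
) -> Tuple[Optional[Box], float]:
    """Score every candidate against the expression and return the single
    highest-scoring box directly (no NMS or other post-processing)."""
    best_box: Optional[Box] = None
    best_score = -math.inf
    for feat, box in candidates:
        context = fuse_with_language(feat, tokens)
        score = dot(feat, context)  # language-conditioned matching score
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score
```

For example, with two candidates whose features are `[1.0, 0.0]` and `[0.0, 1.0]` and language tokens `[[0.0, 2.0], [0.0, 1.0]]`, only the second candidate aligns with the expression, so `ground` returns its box. A real model would learn these features end to end inside the detector rather than use fixed vectors.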


research
01/18/2023

PTA-Det: Point Transformer Associating Point cloud and Image for 3D Object Detection

In autonomous driving, 3D object detection based on multi-modal data has...
research
04/28/2022

TJ4DRadSet: A 4D Radar Dataset for Autonomous Driving

The new generation of 4D high-resolution imaging radar provides not only...
research
04/20/2021

Efficient Online Transfer Learning for 3D Object Classification in Autonomous Driving

Autonomous driving has achieved rapid development over the last few deca...
research
04/01/2022

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

In autonomous driving, LiDAR point-clouds and RGB images are two major d...
research
09/24/2022

Ground then Navigate: Language-guided Navigation in Dynamic Scenes

We investigate the Vision-and-Language Navigation (VLN) problem in the c...
research
03/14/2022

Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention

Grounding a command to the visual environment is an essential ingredient...
research
10/07/2019

Adversarial reconstruction for Multi-modal Machine Translation

Even with the growing interest in problems at the intersection of Comput...
