RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving

01/24/2023
by Angelika Ando, et al.

Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via range projection, is an effective and popular approach. These projection-based methods usually benefit from fast computations and, when combined with techniques that use other point cloud representations, achieve state-of-the-art results. Today, projection-based methods leverage 2D CNNs, but recent advances in computer vision show that vision transformers (ViTs) have achieved state-of-the-art results on many image-based benchmarks. In this work, we ask whether projection-based methods for 3D semantic segmentation can benefit from these latest improvements in ViTs. We answer positively, but only after combining them with three key ingredients: (a) ViTs are notoriously hard to train and require a lot of training data to learn powerful representations. By preserving the same backbone architecture as for RGB images, we can exploit the knowledge from long training on large image collections, which are much cheaper to acquire and annotate than point clouds. We reach our best results with ViTs pre-trained on large image datasets. (b) We compensate for ViTs' lack of inductive bias by substituting a tailored convolutional stem for the classical linear embedding layer. (c) We refine pixel-wise predictions with a convolutional decoder and a skip connection from the convolutional stem, combining the low-level but fine-grained features of the convolutional stem with the high-level but coarse predictions of the ViT encoder. With these ingredients, we show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and SemanticKITTI. We provide the implementation code at https://github.com/valeoai/rangevit.
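
The range projection mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it is the generic spherical projection commonly used by projection-based methods, and the function name `range_projection`, the image size, and the vertical field-of-view values are all assumptions chosen for illustration.

```python
import numpy as np

def range_projection(points, height=32, width=2048,
                     fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project an (N, 3) LiDAR point cloud onto a (height, width) range image.

    All default values are illustrative assumptions, not values from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)

    # Spherical angles of every point as seen from the sensor origin.
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Map angles to continuous image coordinates, then to integer pixels.
    u = 0.5 * (1.0 - yaw / np.pi) * width            # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * height    # row from elevation
    cols = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    rows = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    # Write points from farthest to nearest so that, when several points
    # fall into one pixel, the nearest one is written last and kept.
    order = np.argsort(depth)[::-1]
    range_image = np.full((height, width), -1.0, dtype=np.float32)
    index_image = np.full((height, width), -1, dtype=np.int64)
    range_image[rows[order], cols[order]] = depth[order]
    index_image[rows[order], cols[order]] = order

    return range_image, index_image, rows, cols
```

The returned `rows` and `cols` give, for every 3D point, the pixel it projects to, so pixel-wise class predictions from a 2D network can be copied back to the points. Pixels that receive no point keep the value -1.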

research
01/24/2023

Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation

Semantic segmentation of point clouds in autonomous driving datasets req...
research
11/03/2020

Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds

Semantic segmentation of 3D point cloud data is essential for enhanced h...
research
08/24/2020

TORNADO-Net: mulTiview tOtal vaRiatioN semAntic segmentation with Diamond inceptiOn module

Semantic segmentation of point clouds is a key component of scene unders...
research
02/28/2023

Applying Plain Transformers to Real-World Point Clouds

Due to the lack of inductive bias, transformer-based models usually requ...
research
03/07/2020

SalsaNext: Fast Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving

In this paper, we introduce SalsaNext for the semantic segmentation of a...
research
08/28/2023

Attention-Guided Lidar Segmentation and Odometry Using Image-to-Point Cloud Saliency Transfer

LiDAR odometry estimation and 3D semantic segmentation are crucial for a...
research
05/29/2019

A survey of Object Classification and Detection based on 2D/3D data

Recently, by using deep neural network based algorithms, object classifi...
