ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation

05/31/2022
by   Pramit Dutta, et al.

Generating a detailed near-field perceptual model of the environment is an important and challenging problem in both self-driving vehicles and autonomous mobile robotics. A Bird's-Eye-View (BEV) map is a commonly used panoptic representation: a simplified 2D view of the vehicle's surroundings with accurate semantic-level segmentation for many downstream tasks. Current state-of-the-art approaches to generating BEV maps employ a Convolutional Neural Network (CNN) backbone to create feature maps, which are passed through a spatial transformer that projects the derived features onto the BEV coordinate frame. In this paper, we evaluate the use of vision transformers (ViT) as a backbone architecture for generating BEV maps. Our network architecture, ViT-BEVSeg, employs standard vision transformers to generate a multi-scale representation of the input image. The resulting representation is then provided as input to a spatial transformer decoder module, which outputs segmentation maps in the BEV grid. We evaluate our approach on the nuScenes dataset, demonstrating a considerable improvement in performance relative to state-of-the-art approaches.
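To make the data flow described above more concrete, the following is a minimal shape-bookkeeping sketch, not the authors' implementation. It only tracks how an image becomes a ViT token grid, how that grid could be resampled into a multi-scale pyramid, and how large the output BEV grid would be. The patch size, channel count, strides, and BEV extent are all illustrative assumptions that the abstract does not specify, and the names `patch_grid`, `multi_scale_shapes`, and `bev_grid` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    """Shape of one feature map: channels x height x width."""
    channels: int
    height: int
    width: int


def patch_grid(image_h: int, image_w: int, patch: int) -> tuple[int, int]:
    """Token grid obtained by cutting the image into non-overlapping patches."""
    if image_h % patch or image_w % patch:
        raise ValueError("image size must be divisible by the patch size")
    return image_h // patch, image_w // patch


def multi_scale_shapes(
    image_h: int,
    image_w: int,
    patch: int = 16,                              # assumed patch size
    channels: int = 256,                          # assumed channel count
    strides: tuple[int, ...] = (4, 8, 16, 32),    # assumed pyramid strides
) -> list[FeatureMap]:
    """Shapes of a feature pyramid built by resampling the ViT token grid."""
    rows, cols = patch_grid(image_h, image_w, patch)
    shapes = []
    for stride in strides:
        # The token grid has an effective stride equal to the patch size;
        # rescale it to the requested stride.
        shapes.append(
            FeatureMap(channels, rows * patch // stride, cols * patch // stride)
        )
    return shapes


def bev_grid(depth_m: float, width_m: float, cell_m: float) -> tuple[int, int]:
    """Number of (rows, cols) cells in a BEV grid covering depth_m x width_m metres."""
    return round(depth_m / cell_m), round(width_m / cell_m)


if __name__ == "__main__":
    for fmap in multi_scale_shapes(224, 224):
        print(fmap)
    print(bev_grid(50.0, 50.0, 0.25))  # assumed 50 m x 50 m extent, 0.25 m cells
```

In a real model, each of these shapes would correspond to a tensor produced by the backbone and consumed by the spatial transformer decoder. The sketch deliberately omits the learned layers and the camera geometry.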

research
03/07/2023

F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera Images for Automated Driving

Bird's Eye View (BEV) representations are tremendously useful for percep...
research
08/06/2021

Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images

Bird's-Eye-View (BEV) maps have emerged as one of the most powerful repr...
research
05/14/2022

Transformer Scale Gate for Semantic Segmentation

Effectively encoding multi-scale contextual information is crucial for a...
research
12/29/2022

Local Learning on Transformers via Feature Reconstruction

Transformers are becoming increasingly popular due to their superior per...
research
03/18/2023

Social Occlusion Inference with Vectorized Representation for Autonomous Driving

Autonomous vehicles must be capable of handling the occlusion of the env...
research
10/05/2021

Transformer Assisted Convolutional Network for Cell Instance Segmentation

Region proposal based methods like R-CNN and Faster R-CNN models have pr...
research
04/16/2022

GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation

Birds-eye-view (BEV) semantic segmentation is critical for autonomous dr...