RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

by Jiahang Li, et al.

Recent advancements in deep convolutional neural networks have shown significant promise in the domain of road scene parsing. Nevertheless, existing works focus primarily on freespace detection, with little attention given to hazardous road defects that could compromise both driving safety and comfort. In this paper, we introduce RoadFormer, a novel Transformer-based data-fusion network developed for road scene parsing. RoadFormer utilizes a duplex encoder architecture to extract heterogeneous features from both RGB images and surface normal information. The encoded features are subsequently fed into a novel heterogeneous feature synergy block for effective feature fusion and recalibration. The pixel decoder then learns multi-scale long-range dependencies from the fused and recalibrated heterogeneous features, which are subsequently processed by a Transformer decoder to produce the final semantic prediction. Additionally, we release SYN-UDTIRI, the first large-scale road scene parsing dataset that contains over 10,407 RGB images, dense depth images, and the corresponding pixel-level annotations for both freespace and road defects of different shapes and sizes. Extensive experimental evaluations conducted on our SYN-UDTIRI dataset, as well as on three public datasets, including KITTI road, CityScapes, and ORFD, demonstrate that RoadFormer outperforms all other state-of-the-art networks for road scene parsing. Specifically, RoadFormer ranks first on the KITTI road benchmark. Our source code, created dataset, and demo video are publicly available.
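The "fusion and recalibration" step described in the abstract can be illustrated with a toy example. The sketch below is not the paper's heterogeneous feature synergy block. It assumes a simple stand-in: the two encoder outputs are concatenated along the channel axis, and each channel is then rescaled by a sigmoid gate computed from its global average. The function name `fuse_and_recalibrate`, the tensor shapes, and the gating rule are all hypothetical choices made for illustration.

```python
import numpy as np


def fuse_and_recalibrate(rgb_feat, normal_feat):
    """Toy fusion of two (C, H, W) feature maps.

    Assumption: this is a simplified stand-in, NOT the paper's
    heterogeneous feature synergy block.
    """
    if rgb_feat.shape != normal_feat.shape:
        raise ValueError("both feature maps must share the same (C, H, W) shape")
    # Fusion: stack the two modalities along the channel axis -> (2C, H, W).
    fused = np.concatenate([rgb_feat, normal_feat], axis=0)
    # Recalibration: one sigmoid gate per channel from its global average.
    channel_mean = fused.mean(axis=(1, 2))
    gate = 1.0 / (1.0 + np.exp(-channel_mean))
    # Rescale every channel by its gate, broadcasting over H and W.
    return fused * gate[:, None, None]


rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((4, 8, 8))     # stand-in for RGB encoder output
normal_feat = rng.standard_normal((4, 8, 8))  # stand-in for surface-normal encoder output
out = fuse_and_recalibrate(rgb_feat, normal_feat)
print(out.shape)  # (8, 8, 8)
```

In the architecture the abstract describes, the fused and recalibrated features would then go to the pixel decoder and the Transformer decoder. Both are omitted here.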




Traffic Scene Parsing through the TSP6K Dataset

Traffic scene parsing is one of the most important tasks to achieve inte...

Infrared Colorization Using Deep Convolutional Neural Networks

This paper proposes a method for transferring the RGB color spectrum to ...

Applying Surface Normal Information in Drivable Area and Road Anomaly Detection for Ground Mobile Robots

The joint detection of drivable areas and road anomalies is a crucial ta...

Depth Not Needed - An Evaluation of RGB-D Feature Encodings for Off-Road Scene Understanding by Convolutional Neural Network

Scene understanding for autonomous vehicles is a challenging computer vi...

Deep Surface Normal Estimation with Hierarchical RGB-D Fusion

The growing availability of commodity RGB-D cameras has boosted the appl...

Semantic Dense Reconstruction with Consistent Scene Segments

In this paper, a method for dense semantic 3D scene reconstruction from ...

EfficientPS: Efficient Panoptic Segmentation

Understanding the scene in which an autonomous robot operates is critica...
