TBN-ViT: Temporal Bilateral Network with Vision Transformer for Video Scene Parsing

12/02/2021
by Bo Yan, et al.

Video scene parsing in the wild, across diverse scenarios, is a challenging task of great significance, especially given the rapid development of autonomous driving techniques. The Video Scene Parsing in the Wild (VSPW) dataset contains well-trimmed, long-temporal, densely annotated, high-resolution clips. Based on VSPW, we design a Temporal Bilateral Network with Vision Transformer. We first design a spatial path with convolutions to generate low-level features that preserve spatial information. Meanwhile, a context path with a vision transformer is employed to obtain sufficient context information. Furthermore, a temporal context module is designed to harness inter-frame contextual information. Finally, the proposed method achieves a mean intersection over union (mIoU) of 49.85% on the VSPW2021 Challenge test dataset.
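For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU can be computed from predicted and ground-truth label maps. It is not the challenge's official evaluation code: the function name `mean_iou`, the flattened-list input format, and the `ignore_index=255` convention for unlabeled pixels are assumptions made for illustration, and a benchmark would normally accumulate the per-class counts over the whole test set rather than a single frame.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class labels of equal length.
    ignore_index: ground-truth label to skip (assumed convention, not from the paper).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, excluded from the metric
        if p == t:
            # Correct pixel: counts once in both intersection and union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the predicted and the true class.
            union[p] += 1
            union[t] += 1
    # Average only over classes that actually appear, to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")  # prints "mIoU = 0.5000"
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so their mean is 0.5. A reported score such as 49.85% is this quantity expressed as a percentage.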

