Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement

01/10/2023
by Jiafan Zhuang, et al.

Video semantic segmentation aims to generate an accurate semantic map for each video frame. To this end, many works integrate diverse information from consecutive frames to enhance the features used for prediction, which usually requires a feature alignment procedure based on estimated optical flow. However, estimated optical flow is inevitably inaccurate, which introduces noise into feature fusion and leads to unsatisfactory segmentation results. In this paper, to tackle the misalignment issue, we propose a spatial-temporal fusion (STF) module that models dense pairwise relationships among multi-frame features. Unlike previous methods, STF uniformly and adaptively fuses features at different spatial and temporal positions, and avoids error-prone optical flow estimation. In addition, we exploit feature refinement within a single frame and propose a novel memory-augmented refinement (MAR) module to tackle difficult predictions at semantic boundaries. Specifically, MAR stores boundary features and prototypes extracted from the training samples, which together form a task-specific memory, and then uses them to refine the features during inference. Essentially, MAR moves hard features closer to the most likely category and thus makes them more discriminative. We conduct extensive experiments on Cityscapes and CamVid, and the results show that our proposed methods significantly outperform previous methods and achieve state-of-the-art performance. Code and pretrained models are available at https://github.com/jfzhuang/ST_Memory.
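
To make the two ideas in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation (which is in the repository linked above). It assumes that "dense pairwise relationships" can be approximated by plain softmax attention between every position of the current frame and every position of all frames, and that memory-based refinement can be approximated by blending each feature with an attention-weighted readout of stored vectors. The function names `fuse_frames` and `refine_with_memory`, the blending weight `alpha`, the scaling by the square root of the channel count, and the absence of learned projections are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_frames(current, previous):
    """Attention-style fusion without optical flow (illustrative only).

    current:  (C, H, W) features of the frame being segmented.
    previous: (T, C, H, W) features of T earlier frames.
    Every position of the current frame attends to every spatial
    position of every frame, so no explicit alignment is needed.
    """
    c, h, w = current.shape
    query = current.reshape(c, h * w).T                          # (HW, C)
    frames = np.concatenate([previous, current[None]], axis=0)   # (T+1, C, H, W)
    keys = frames.transpose(0, 2, 3, 1).reshape(-1, c)           # ((T+1)HW, C)
    weights = softmax(query @ keys.T / np.sqrt(c), axis=-1)      # (HW, (T+1)HW)
    fused = weights @ keys                                       # (HW, C)
    return fused.T.reshape(c, h, w)


def refine_with_memory(features, memory, alpha=0.5):
    """Pull features toward similar memory entries (illustrative only).

    features: (N, C) feature vectors, e.g. at boundary pixels.
    memory:   (M, C) stored vectors (hypothetical stand-in for the
              boundary features and prototypes described in the abstract).
    alpha:    hypothetical blending weight between input and readout.
    """
    scale = np.sqrt(features.shape[1])
    weights = softmax(features @ memory.T / scale, axis=-1)      # (N, M)
    readout = weights @ memory                                   # (N, C)
    return (1.0 - alpha) * features + alpha * readout
```

In this sketch a feature that is only weakly aligned with one stored prototype ends up closer to that prototype after refinement, which mirrors the abstract's description of moving hard features toward the most likely category. A real module would add learned projections and would need to limit the cost of attending over all positions of all frames.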

