Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

by Yubin Hu, et al.

Video semantic segmentation (VSS) is computationally expensive because a prediction must be made for every frame of a high-frame-rate video. Recent work has proposed compact models or adaptive network strategies for efficient VSS. However, these approaches do not consider a crucial factor that affects the computational cost from the input side: the input resolution. In this paper, we propose an altering resolution framework called AR-Seg for compressed videos to achieve efficient VSS. AR-Seg reduces the computational cost by using low resolution for non-keyframes. To prevent the performance degradation caused by downsampling, we design a Cross Resolution Feature Fusion (CReFF) module and supervise it with a novel Feature Similarity Training (FST) strategy. Specifically, CReFF first uses the motion vectors stored in a compressed video to warp features from high-resolution keyframes to low-resolution non-keyframes for better spatial alignment, and then selectively aggregates the warped features with a local attention mechanism. Furthermore, FST supervises the aggregated features with high-resolution features through an explicit similarity loss and an implicit constraint from the shared decoding layer. Extensive experiments on CamVid and Cityscapes show that AR-Seg achieves state-of-the-art performance and is compatible with different segmentation backbones. On CamVid, AR-Seg saves 67% of the computational cost with the PSPNet18 backbone while maintaining high segmentation accuracy. Code: https://github.com/THU-LYJ-Lab/AR-Seg.
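To make the three mechanisms named above more concrete (motion-vector warping, local-attention aggregation, and a feature similarity loss), here is a minimal NumPy sketch. It is not the AR-Seg implementation (the linked repository has that), and several details are assumptions: the codec motion vectors are taken to be already resampled to one integer displacement per feature-map location, pointing from the current frame to the keyframe; both feature maps are assumed to share the same spatial size; the aggregation is a plain scaled dot-product attention over a k x k window with a residual sum; and mean squared error stands in for the similarity loss. All function names are hypothetical.

```python
import numpy as np


def warp_by_motion_vectors(feat, mv):
    """Backward-warp keyframe features with per-location motion vectors.

    feat: (C, H, W) features computed on the high-resolution keyframe.
    mv:   (2, H, W) integer displacements (dy, dx). Assumption: location (y, x)
          of the current frame reads from (y + dy, x + dx) of the keyframe,
          with source coordinates clamped to the feature-map border.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(ys + mv[0], 0, h - 1)
    src_x = np.clip(xs + mv[1], 0, w - 1)
    # Advanced indexing gathers one source vector per target location: (C, H, W).
    return feat[:, src_y, src_x]


def local_attention_fuse(query_feat, warped_feat, k=3):
    """Aggregate warped features into current-frame features with k x k local attention.

    For each location p, the query is query_feat[:, p]; keys and values are the
    warped features in the k x k window centred at p (zero-padded at borders).
    The residual sum at the end is an illustrative choice, not the paper's.
    """
    c, h, w = query_feat.shape
    r = k // 2
    padded = np.pad(warped_feat, ((0, 0), (r, r), (r, r)))
    # Gather the k*k neighbours of every location: shape (k*k, C, H, W).
    neigh = np.stack([padded[:, dy:dy + h, dx:dx + w]
                      for dy in range(k) for dx in range(k)])
    # Scaled dot-product score between the query and each neighbour: (k*k, H, W).
    scores = np.einsum("chw,nchw->nhw", query_feat, neigh) / np.sqrt(c)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=0, keepdims=True)  # softmax over the window
    # Attention-weighted sum of neighbours, added back to the query features.
    return query_feat + np.einsum("nhw,nchw->chw", attn, neigh)


def feature_similarity_loss(fused, hr_feat):
    """Mean squared error between fused features and high-resolution features."""
    return float(np.mean((fused - hr_feat) ** 2))
```

In this sketch, a training step would warp the keyframe features with `warp_by_motion_vectors`, fuse them into the low-resolution branch's features with `local_attention_fuse`, and penalize the distance to features computed on the same frame at high resolution with `feature_similarity_loss`. The warping only copies cached keyframe features along the motion vectors, so it adds little compute compared with running the network at full resolution on every frame.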




U-HRNet: Delving into Improving Semantic Representation of High Resolution Network for Dense Prediction

High resolution and advanced semantic representation are both vital for ...

Real-time Semantic Segmentation with Fast Attention

In deep CNN based models for semantic segmentation, high accuracy relies...

RFC-Net: Learning High Resolution Global Features for Medical Image Segmentation on a Computational Budget

Learning High-Resolution representations is essential for semantic segme...

High Quality Image Interpolation via Local Autoregressive and Nonlocal 3-D Sparse Regularization

In this paper, we propose a novel image interpolation algorithm, which i...

Per-clip adaptive Lagrangian multiplier optimisation with low-resolution proxies

This work focuses on reducing the computational cost of repeated video e...

Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval

Optimizing video inference efficiency has become increasingly important ...

Spectrum-guided Multi-granularity Referring Video Object Segmentation

Current referring video object segmentation (R-VOS) techniques extract c...
