TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge

by Junyi Feng, et al.

Real-time semantic video segmentation is challenging because of its strict inference-speed requirements. Recent approaches focus mainly on reducing model size to improve efficiency. In this paper, we rethink the problem from a different viewpoint: exploiting the knowledge already contained in compressed videos. We propose a simple and effective framework, dubbed TapLab, that taps into resources from the compressed domain. Specifically, we design a fast feature warping module that uses motion vectors for acceleration. To reduce the noise introduced by motion vectors, we design two modules that use residuals: a residual-guided correction module and a residual-guided frame selection module. Compared with state-of-the-art fast semantic image segmentation models, TapLab significantly reduces redundant computation and runs around 3 times faster with comparable accuracy on 1024x2048 videos. The experimental results show that TapLab achieves 70.6 … with a single GPU card. A high-speed version even reaches 160+ FPS.
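The abstract names the three compressed-domain components without giving their details. The Python sketch below illustrates the general ideas under simplifying assumptions that are ours, not the paper's: a single-channel feature map stored as nested lists, one integer motion vector (dy, dx) per square block already scaled to feature-map resolution, and a plain mean-absolute-residual score. The function names and the `threshold` and `top_k` parameters are hypothetical. For correction, the sketch only picks the blocks that would be re-processed; the network that refines them is omitted.

```python
from typing import List, Tuple

# Illustrative types (assumptions): a single-channel H x W feature map and a
# per-block motion field of (dy, dx) offsets in feature-map pixels.
FeatureMap = List[List[float]]
MotionField = List[List[Tuple[int, int]]]


def warp_with_motion_vectors(prev: FeatureMap, mvs: MotionField, block: int) -> FeatureMap:
    """Warp the previous frame's features to the current frame.

    Assumes mvs[by][bx] = (dy, dx) means the block at (by, bx) in the current
    frame was predicted from the location offset by (dy, dx) in `prev`.
    Source coordinates that fall outside the map are clamped to the border.
    """
    h, w = len(prev), len(prev[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = mvs[y // block][x // block]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = prev[sy][sx]
    return out


def residual_energy(residual: FeatureMap) -> float:
    """Mean absolute residual over the whole frame."""
    total = sum(abs(v) for row in residual for v in row)
    return total / (len(residual) * len(residual[0]))


def select_frames(residuals: List[FeatureMap], threshold: float) -> List[bool]:
    """Mark frames to be segmented by the full network (hypothetical rule).

    The first frame is always a keyframe; afterwards a frame becomes a
    keyframe when its residual energy exceeds `threshold`.
    """
    return [i == 0 or residual_energy(r) > threshold for i, r in enumerate(residuals)]


def blocks_to_refine(residual: FeatureMap, block: int, top_k: int) -> List[Tuple[int, int]]:
    """Return (by, bx) indices of the `top_k` blocks with the largest mean |residual|."""
    h, w = len(residual), len(residual[0])
    scores = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            vals = [abs(residual[y][x])
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            scores.append((sum(vals) / len(vals), (by // block, bx // block)))
    # Highest residual first: these regions are where warping is least reliable.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:top_k]]
```

For example, `warp_with_motion_vectors([[1, 2], [3, 4]], [[(0, 1), (0, 1)], [(0, 1), (0, 1)]], 1)` shifts every value one column to the left with border clamping and returns `[[2, 2], [4, 4]]`. Warping is cheap because it only copies existing features, while the residual scores decide where and when that shortcut is likely to fail.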


