Semantic Video CNNs through Representation Warping

08/10/2017
by   Raghudeep Gadde, et al.
0

In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp and we demonstrate its use for a range of network architectures. The main design principle is to use optical flow of adjacent frames for warping internal network representations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to-end training. Experiments validate that the proposed approach incurs only little extra computational cost, while improving performance, when video streams are available. We achieve new state-of-the-art results on the CamVid and Cityscapes benchmark datasets and show consistent improvements over different baseline networks. Our code and models will be available at http://segmentation.is.tue.mpg.de

READ FULL TEXT

page 2

page 6

page 7

page 8

research
05/04/2019

Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading

We focus on the word-level visual lipreading, which requires recognizing...
research
03/24/2020

MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask

Feature warping is a core technique in optical flow estimation; however,...
research
11/20/2021

FlowVOS: Weakly-Supervised Visual Warping for Detail-Preserving and Temporally Consistent Single-Shot Video Object Segmentation

We consider the task of semi-supervised video object segmentation (VOS)....
research
10/08/2018

Inter-BMV: Interpolation with Block Motion Vectors for Fast Semantic Segmentation on Video

Models optimized for accuracy on single images are often prohibitively s...
research
12/19/2018

Light Weight Color Image Warping with Inter-Channel Information

Image warping is a necessary step in many multimedia applications such a...
research
11/05/2019

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

Predicting future video frames is extremely challenging, as there are ma...
research
07/17/2018

Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video

In this paper, we present Accel, a novel semantic video segmentation sys...

Please sign up or login with your details

Forgot password? Click here to reset