Video Scene Parsing with Predictive Feature Learning

12/01/2016
by   Xiaojie Jin, et al.
0

In this work, we address the challenging video scene parsing problem by developing effective representation learning methods given limited parsing annotations. In particular, we contribute two novel methods that constitute a unified parsing framework. (1) Predictive feature learning from nearly unlimited unlabeled video data. Different from existing methods learning features from single frame parsing, we learn spatiotemporal discriminative features by enforcing a parsing network to predict future frames and their parsing maps (if available) given only historical frames. In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations. (2) Prediction steering parsing architecture that effectively adapts the learned spatiotemporal features to scene parsing tasks and provides strong guidance for any off-the-shelf parsing model to achieve better video scene parsing performance. Extensive experiments over two challenging datasets, Cityscapes and Camvid, have demonstrated the effectiveness of our methods by showing significant improvement over well-established baselines.

READ FULL TEXT

page 2

page 5

page 8

page 12

page 13

page 14

page 15

research
09/01/2021

Memory Based Video Scene Parsing

Video scene parsing is a long-standing challenging task in computer visi...
research
12/04/2016

Pyramid Scene Parsing Network

Scene parsing is challenging for unrestricted open vocabulary and divers...
research
11/09/2017

Predicting Scene Parsing and Motion Dynamics in the Future

The ability of predicting the future is important for intelligent system...
research
08/02/2018

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

Beyond the existing single-person and multiple-person human parsing task...
research
08/08/2017

FoveaNet: Perspective-aware Urban Scene Parsing

Parsing urban scene images benefits many applications, especially self-d...
research
12/02/2021

TBN-ViT: Temporal Bilateral Network with Vision Transformer for Video Scene Parsing

Video scene parsing in the wild with diverse scenarios is a challenging ...
research
06/11/2021

Part-aware Panoptic Segmentation

In this work, we introduce the new scene understanding task of Part-awar...

Please sign up or login with your details

Forgot password? Click here to reset