Temporal Transductive Inference for Few-Shot Video Object Segmentation

by Mennatullah Siam, et al.

Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training. In this paper, we present a simple but effective temporal transductive inference (TTI) approach that leverages temporal consistency in the unlabelled video frames during few-shot inference. Key to our approach is the use of both global and local temporal constraints. The objective of the global constraint is to learn consistent linear classifiers for novel classes across the image sequence, whereas the local constraint enforces the proportion of foreground/background regions in each frame to be coherent across a local temporal window. These constraints act as spatiotemporal regularizers during the transductive inference to increase temporal coherence and reduce overfitting on the few-shot support set. Empirically, our model outperforms state-of-the-art meta-learning approaches in terms of mean intersection over union on YouTube-VIS by 2.8%. In addition, we introduce improved benchmarks that are exhaustively labelled (i.e., all object occurrences are labelled, unlike the currently available ones), and present a more realistic evaluation paradigm that targets data distribution shift between training and testing sets. Our empirical results and in-depth analysis confirm the added benefits of the proposed spatiotemporal regularizers to improve temporal coherence and overcome certain overfitting scenarios.
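To make the local constraint more concrete, the following is a minimal sketch of one way a "coherent foreground proportion across a local temporal window" penalty could be computed. It is not the authors' implementation: the function names, the centred window, and the use of a binary KL divergence are all assumptions chosen for illustration, since the abstract does not specify the exact loss.

```python
import numpy as np

def foreground_proportions(probs):
    """Per-frame foreground proportion from soft predictions.

    probs: array of shape (T, H, W) with per-pixel foreground probabilities.
    """
    return probs.reshape(probs.shape[0], -1).mean(axis=1)

def local_proportion_penalty(probs, window=3, eps=1e-8):
    """Hypothetical local temporal regularizer (assumed form, not from the paper).

    For each frame, compares its foreground proportion with the average
    proportion over a centred window of up to `window` frames, using a
    binary KL divergence, and returns the mean over frames.
    """
    p = foreground_proportions(probs)
    num_frames = len(p)
    half = window // 2
    total = 0.0
    for t in range(num_frames):
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        q = p[lo:hi].mean()  # window-averaged foreground proportion
        # Binary KL(p_t || q) over {foreground, background}
        total += (p[t] * np.log((p[t] + eps) / (q + eps))
                  + (1 - p[t]) * np.log((1 - p[t] + eps) / (1 - q + eps)))
    return total / num_frames

# Usage: a temporally stable prediction versus a flickering one.
stable = np.full((4, 8, 8), 0.3)
flicker = np.stack([np.full((8, 8), v) for v in (0.1, 0.9, 0.1, 0.9)])
stable_penalty = local_proportion_penalty(stable)
flicker_penalty = local_proportion_penalty(flicker)
```

Under these assumptions, a sequence whose foreground proportion is constant incurs essentially no penalty, while one whose proportion flickers between frames is penalized, which is the behaviour a temporal-coherence regularizer of this kind is meant to encourage.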




Video Object Segmentation Without Temporal Information

Video Object Segmentation, and video processing in general, has been his...

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Inputs

Significant progress has been made recently in developing few-shot objec...

Domain Adaptive Video Segmentation via Temporal Pseudo Supervision

Video semantic segmentation has achieved great progress under the superv...

One-Shot Weakly Supervised Video Object Segmentation

Conventional few-shot object segmentation methods learn object segmentat...

MetaPix: Few-Shot Video Retargeting

We address the task of unsupervised retargeting of human actions from on...

Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation

Semantic video segmentation is a key challenge for various applications....

Melody Harmonization Using Orderless NADE, Chord Balancing, and Blocked Gibbs Sampling

Coherence and interestingness are two criteria for evaluating the perfor...