Joint Inductive and Transductive Learning for Video Object Segmentation

08/08/2021
by Yunyao Mao, et al.

Semi-supervised video object segmentation is the task of segmenting a target object throughout a video sequence given only a mask annotation in the first frame. The limited information available makes the task extremely challenging. Most of the best-performing previous methods adopt either matching-based transductive reasoning or online inductive learning. However, the former is less discriminative for similar instances, while the latter makes insufficient use of spatio-temporal information. In this work, we propose to integrate transductive and inductive learning into a unified framework that exploits the complementarity between them for accurate and robust video object segmentation. The proposed approach consists of two functional branches. The transduction branch adopts a lightweight transformer architecture to aggregate rich spatio-temporal cues, while the induction branch performs online inductive learning to obtain discriminative target information. To bridge these two diverse branches, a two-head label encoder is introduced to learn a suitable target prior for each of them. The generated mask encodings are further forced to be disentangled so that they better retain their complementarity. Extensive experiments on several prevalent benchmarks show that, without the need for synthetic training data, the proposed approach sets a series of new state-of-the-art records. Code is available at https://github.com/maoyunyao/JOINT.
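To make the two kinds of reasoning concrete, the toy sketch below contrasts them on hand-made per-pixel features. It is not the authors' implementation: the function names (`transduce`, `induce`, `predict`), the logistic target model, the 2-D features, and the averaging used to fuse the two outputs are all assumptions chosen for illustration. The paper itself uses a transformer for transduction and a learned two-head label encoder, neither of which is reproduced here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def transduce(query_feats, memory_feats, memory_labels, scale=5.0):
    """Transductive reasoning (illustrative): each query pixel receives a
    similarity-weighted average of the labels of annotated memory pixels."""
    out = []
    for q in query_feats:
        weights = softmax([scale * dot(q, k) for k in memory_feats])
        out.append(sum(w * y for w, y in zip(weights, memory_labels)))
    return out

def induce(train_feats, train_labels, steps=200, lr=0.5):
    """Inductive learning (illustrative): fit a small logistic target model
    online, by gradient descent on the annotated pixels."""
    dim, n = len(train_feats[0]), len(train_feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(train_feats, train_labels):
            err = 1.0 / (1.0 + math.exp(-(dot(w, x) + b))) - y
            for j in range(dim):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, feats):
    w, b = model
    return [1.0 / (1.0 + math.exp(-(dot(w, x) + b))) for x in feats]

# Hypothetical first-frame pixels: 2-D features with labels 1 = target, 0 = background.
memory_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
memory_labels = [1.0, 1.0, 0.0, 0.0]
# Hypothetical pixels from a later frame.
query_feats = [[0.8, 0.2], [0.2, 0.8]]

trans_scores = transduce(query_feats, memory_feats, memory_labels)
induc_scores = predict(induce(memory_feats, memory_labels), query_feats)

# Assumption: a plain average stands in for the learned fusion of both branches.
fused = [(t + i) / 2.0 for t, i in zip(trans_scores, induc_scores)]
mask = [score > 0.5 for score in fused]
```

In this sketch the transductive path never fits a model; it only propagates labels through feature similarity. The inductive path discards the memory after fitting and relies solely on the learned weights. Thresholding `fused` at 0.5 gives a binary per-pixel decision.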


