Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

by Qing Liu, et al.
Johns Hopkins University

Weakly supervised instance segmentation reduces the cost of the annotations required to train models. However, existing approaches that rely only on image-level class labels predominantly suffer from two kinds of error: (a) partial segmentation of objects and (b) missing object predictions. We show that these issues can be better addressed by training with weakly labeled videos instead of images. In videos, motion and the temporal consistency of predictions across frames provide complementary signals that can help segmentation. We are the first to explore the use of these video signals to tackle weakly supervised instance segmentation. We propose two ways to leverage this information in our model. First, we adapt the inter-pixel relation network (IRN) to effectively incorporate motion information during training. Second, we introduce a new MaskConsist module, which addresses the problem of missing object instances by transferring stable predictions between neighboring frames during training. We demonstrate that both approaches together improve the instance segmentation metric AP_50 on video frames of two datasets, YouTube-VIS and Cityscapes, by 5% and 3%, respectively.
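
To make the idea of transferring predictions between neighboring frames more concrete, here is a minimal sketch. It is not the MaskConsist module itself, whose details the abstract does not give. It assumes, purely for illustration, that objects do not move between the two frames (so no motion compensation is applied), that every neighbor-frame mask is already considered stable, and that a mask is "missing" when no current-frame mask overlaps it with IoU of at least 0.5. The 0.5 value only echoes the matching convention behind AP_50. The names `Mask`, `box`, `mask_iou`, and `transfer_missing` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a binary mask is stored as the set of its foreground (row, col) pixels.
Mask = Set[Tuple[int, int]]


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two masks; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def transfer_missing(
    current: List[Mask], neighbor: List[Mask], iou_thresh: float = 0.5
) -> List[Mask]:
    """Return the current-frame masks plus each neighbor-frame mask that
    no current-frame mask overlaps with IoU >= iou_thresh.

    Illustrative assumption: the neighbor masks are used as-is, without
    warping them by any motion estimate.
    """
    result = list(current)
    for candidate in neighbor:
        if all(mask_iou(candidate, m) < iou_thresh for m in current):
            result.append(candidate)
    return result


# Usage: the second object is predicted only in the neighboring frame.
frame_t = [box(0, 0, 2, 2)]
frame_t_plus_1 = [box(0, 0, 2, 3), box(5, 5, 7, 7)]
augmented = transfer_missing(frame_t, frame_t_plus_1)
```

In this toy case the first neighbor mask overlaps the existing prediction well enough to count as the same instance, so only the second mask is copied over. A training-time variant would use such transferred masks as additional pseudo-labels rather than as final outputs.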

Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration

Instance segmentation in videos, which aims to segment and track multipl...

Weakly Supervised Instance Segmentation using Motion Information via Optical Flow

Weakly supervised instance segmentation has gained popularity because it...

Weakly Supervised Airway Orifice Segmentation in Video Bronchoscopy

Video bronchoscopy is routinely conducted for biopsies of lung tissue su...

Weakly Supervised Multi-Object Tracking and Segmentation

We introduce the problem of weakly supervised Multi-Object Tracking and ...

Mask-Free Video Instance Segmentation

The recent advancement in Video Instance Segmentation (VIS) has largely ...

Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification from the Bottom Up

Given a training dataset composed of images and corresponding category l...

Learning Shadow Correspondence for Video Shadow Detection

Video shadow detection aims to generate consistent shadow predictions am...