1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation

06/12/2021
by Thuy C. Nguyen, et al.

Video Instance Segmentation (VIS) is a multi-task problem that requires detection, segmentation, and tracking to be performed simultaneously. Compared with image-based applications, video data additionally carries temporal information, which, if handled appropriately, is very useful for identifying and predicting object motion. In this work, we design a unified model that learns these tasks jointly. Specifically, we propose two modules, Temporally Correlated Instance Segmentation (TCIS) and Bidirectional Tracking (BiTrack), that exploit the temporal correlation between an object's instance masks across adjacent frames. On the other hand, video data is often redundant because frames overlap. Our analysis shows that this problem is particularly severe for the YouTubeVOS-VIS 2021 data. We therefore propose a Multi-Source Data (MSD) training mechanism to compensate for the data deficiency. Combining these techniques with a bag of tricks significantly improves network performance over the baseline, and the resulting model outperforms other methods by a considerable margin on the YouTubeVOS-VIS 2019 and 2021 datasets.
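The abstract does not spell out how TCIS or BiTrack use the correlation between instance masks in adjacent frames, so the following is only a generic illustration of that idea, not the authors' method. It is a minimal sketch that assumes masks are stored as sets of foreground pixels and that instances are linked by a greedy mask-IoU match; the names `mask_iou`, `match_adjacent`, `box`, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a binary mask is represented as the set of its foreground (row, col) pixels.
Mask = Set[Tuple[int, int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_adjacent(prev: List[Mask], curr: List[Mask], thr: float = 0.5) -> Dict[int, int]:
    """Greedily map each index in `curr` to an index in `prev`, highest IoU first.

    Pairs whose IoU falls below `thr` are left unmatched.
    """
    pairs = sorted(
        ((mask_iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev: Set[int] = set()
    matches: Dict[int, int] = {}
    for iou, i, j in pairs:
        if iou < thr:
            break  # remaining pairs overlap even less
        if i not in used_prev and j not in matches:
            used_prev.add(i)
            matches[j] = i
    return matches


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a filled rectangle covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Two objects in frame t; in frame t+1 both shift right by one pixel
# and the detector happens to list them in the opposite order.
frame_t = [box(0, 0, 4, 4), box(10, 10, 14, 14)]
frame_t1 = [box(10, 11, 14, 15), box(0, 1, 4, 5)]
matches = match_adjacent(frame_t, frame_t1)
print(matches)
```

Because objects usually move only slightly between adjacent frames, their masks overlap strongly, which is what makes even this simple overlap-based association workable in the toy example.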
