Crossover Learning for Fast Online Video Instance Segmentation

04/13/2021
by Shusheng Yang, et al.

Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to localize the same instance pixel-wise in other frames. Unlike previous schemes, crossover learning does not require any additional network parameters for feature enhancement. Integrated with the instance segmentation loss, crossover learning enables efficient cross-frame instance-to-pixel relation learning and brings cost-free improvement during inference. In addition, a global balanced instance embedding branch is proposed for more accurate and more stable online instance association. We conduct extensive experiments on three challenging VIS benchmarks, i.e., YouTube-VIS-2019, OVIS, and YouTube-VIS-2021, to evaluate our method. To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy. Code will be available to facilitate future research.
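
The crossover idea described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each instance is represented by a per-instance 1x1 filter (weights plus a bias) that is applied to a frame's feature map to produce a mask, and it assumes a Dice-style mask loss, neither of which is specified in the abstract. All function and variable names (`apply_instance_filter`, `dice_loss`, `crossover_loss`) are hypothetical. The point it shows is that the filter from one frame is also supervised against the same instance's mask in the other frame, so no extra parameters are introduced.

```python
import math


def apply_instance_filter(instance_filter, features):
    """Apply a per-instance 1x1 filter to a C x H x W feature map.

    instance_filter is a (weights, bias) pair (an assumed representation).
    Returns an H x W grid of mask probabilities.
    """
    weights, bias = instance_filter
    height, width = len(features[0]), len(features[0][0])
    probs = []
    for y in range(height):
        row = []
        for x in range(width):
            logit = bias + sum(w * ch[y][x] for w, ch in zip(weights, features))
            row.append(1.0 / (1.0 + math.exp(-logit)))
        probs.append(row)
    return probs


def dice_loss(pred, target, eps=1e-6):
    """Dice-style loss between a predicted and a ground-truth mask (assumed loss)."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    pred_sq = sum(p * p for pr in pred for p in pr)
    target_sq = sum(t * t for tr in target for t in tr)
    return 1.0 - 2.0 * inter / (pred_sq + target_sq + eps)


def crossover_loss(filter_a, filter_b, feats_a, feats_b, mask_a, mask_b):
    """Mask loss for one instance seen in two frames, A and B.

    Same-frame terms: each frame's filter segments its own frame.
    Crossover terms: each frame's filter must also localize the same
    instance in the other frame.
    """
    same = (dice_loss(apply_instance_filter(filter_a, feats_a), mask_a)
            + dice_loss(apply_instance_filter(filter_b, feats_b), mask_b))
    cross = (dice_loss(apply_instance_filter(filter_a, feats_b), mask_b)
             + dice_loss(apply_instance_filter(filter_b, feats_a), mask_a))
    return same + cross


# Toy data: one feature channel, the instance moves one column between frames.
mask_a = [[1, 0, 0], [1, 0, 0]]
mask_b = [[0, 1, 0], [0, 1, 0]]
feats_a = [[[5.0 if m else -5.0 for m in row] for row in mask_a]]
feats_b = [[[5.0 if m else -5.0 for m in row] for row in mask_b]]

good_filter = ([2.0], 0.0)   # responds to the instance's feature
bad_filter = ([-2.0], 0.0)   # responds to the background instead

loss_good = crossover_loss(good_filter, good_filter, feats_a, feats_b, mask_a, mask_b)
loss_bad = crossover_loss(bad_filter, bad_filter, feats_a, feats_b, mask_a, mask_b)
```

In this toy setup the crossover terms only change the training objective; at inference a filter is applied to its own frame as usual, which is consistent with the abstract's claim that the improvement comes at no extra inference cost.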
