SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation

03/18/2021
by Dongfang Liu, et al.

Video instance segmentation (VIS) is a new and critical task in computer vision. To date, the top-performing VIS methods extend the two-stage Mask R-CNN by adding a tracking branch, which leaves plenty of room for improvement. In contrast, we approach the VIS task from a new perspective and propose a one-stage spatial granularity network (SG-Net). Compared with conventional two-stage methods, SG-Net offers four advantages: 1) it has a compact one-stage architecture in which the task heads (detection, segmentation, and tracking) are designed to be interdependent, so they can share features effectively and be optimized jointly; 2) mask prediction is performed dynamically on sub-regions of each detected instance, yielding high-quality, fine-grained masks; 3) no task prediction relies on expensive proposal-based RoI features, which substantially lowers the runtime cost per instance; 4) the tracking head models the movement of object centers (centerness) for tracking, which makes tracking more robust to variations in object appearance. We evaluate SG-Net on the YouTube-VIS dataset and compare it with state-of-the-art methods. Extensive experiments demonstrate that our compact one-stage method improves both accuracy and inference speed. We hope SG-Net can serve as a strong and flexible baseline for the VIS task. Our code will be made available.
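The abstract states that SG-Net tracks instances by modeling how object centers move rather than by comparing appearance. The sketch below illustrates one simple way center-movement predictions can link detections across frames. It assumes, hypothetically, that for each detection in the current frame a tracking head outputs an estimated displacement of the object's center since the previous frame. The `Detection` fields, the `associate` function, the greedy nearest-center rule, and the `max_dist` threshold are all illustrative assumptions, not SG-Net's published implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    # Center of the detected instance in the current frame (pixels).
    cx: float
    cy: float
    # HYPOTHETICAL tracking-head output: predicted center displacement
    # from the previous frame to the current frame.
    dx: float
    dy: float


def associate(prev_centers, detections, max_dist=32.0):
    """Link current detections to previous tracks by center movement.

    prev_centers: dict mapping track_id -> (x, y) center in the previous frame.
    Returns a dict mapping detection index -> track_id. Unmatched detections
    start new tracks. Greedy matching is an illustrative assumption.
    """
    candidates = []
    for det_idx, det in enumerate(detections):
        # Project the current center back to where it was in the previous frame.
        px, py = det.cx - det.dx, det.cy - det.dy
        for tid, (tx, ty) in prev_centers.items():
            dist = math.hypot(px - tx, py - ty)
            if dist <= max_dist:
                candidates.append((dist, det_idx, tid))

    # Greedily accept the closest pairs first; each track and detection is used once.
    candidates.sort()
    assignment = {}
    used_tracks = set()
    for _, det_idx, tid in candidates:
        if det_idx in assignment or tid in used_tracks:
            continue
        assignment[det_idx] = tid
        used_tracks.add(tid)

    # Any detection left unmatched begins a new track with a fresh id.
    next_id = max(prev_centers, default=-1) + 1
    for det_idx in range(len(detections)):
        if det_idx not in assignment:
            assignment[det_idx] = next_id
            next_id += 1
    return assignment


if __name__ == "__main__":
    prev = {0: (10.0, 10.0), 1: (100.0, 50.0)}
    dets = [
        Detection(105.0, 52.0, 5.0, 2.0),   # moved from track 1's position
        Detection(12.0, 11.0, 2.0, 1.0),    # moved from track 0's position
        Detection(300.0, 300.0, 0.0, 0.0),  # no nearby track, so it starts a new one
    ]
    print(associate(prev, dets))
```

Because matching uses only predicted center positions, a change in an object's appearance does not by itself break the link, which is the intuition the abstract gives for robustness. Greedy matching is a simple choice; an optimal assignment (for example, the Hungarian algorithm) could be substituted.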
