SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation

12/15/2021
by Junfeng Wu, et al.

In this work, we present SeqFormer, a frustratingly simple model for video instance segmentation. SeqFormer follows the vision transformer principle of modeling instance relationships among video frames. Nevertheless, we observe that a stand-alone instance query suffices to capture a time sequence of instances in a video, but attention should be computed on each frame independently. To achieve this, SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to dynamically predict the mask sequence on each frame. Instance tracking is achieved naturally, without tracking branches or post-processing. On the YouTube-VIS dataset, SeqFormer achieves 47.4 AP with a ResNet-50 backbone and 49.0 AP with a ResNet-101 backbone, without bells and whistles. These results exceed the previous state of the art by 4.6 and 4.4 AP, respectively. In addition, when integrated with the recently proposed Swin transformer, SeqFormer achieves a much higher AP of 59.3. We hope SeqFormer can serve as a strong baseline that fosters future research in video instance segmentation and, in the meantime, advances the field with a more robust, accurate, and neat model. The code and pre-trained models are publicly available at https://github.com/wjf5203/SeqFormer.
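To make the idea in the abstract concrete, here is a minimal NumPy sketch of one shared instance query that attends to each frame independently, is aggregated over time into a video-level embedding, and then produces one mask per frame. It is not the authors' implementation. Plain dot-product attention, the softmax-weighted temporal average, and the names `video_level_instance`, `dynamic_masks`, and `w_score` are assumptions chosen for illustration; the released code at the URL above is the reference for the actual architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def video_level_instance(query, frame_feats, w_score):
    """Aggregate one instance over time (illustrative, hypothetical helper).

    query:       (C,)      a single instance query shared by all frames
    frame_feats: (T, N, C) flattened features for T frames, N locations each
    w_score:     (C,)      assumed weights that score each frame's output
    Returns the video-level embedding (C,) and per-frame attention (T, N).
    """
    channels = query.shape[0]
    # Attention is normalized over locations within each frame (axis=1),
    # so no frame competes with another for attention mass.
    logits = frame_feats @ query / np.sqrt(channels)          # (T, N)
    attn = softmax(logits, axis=1)                            # (T, N)
    per_frame = np.einsum("tn,tnc->tc", attn, frame_feats)    # (T, C)
    # Temporal aggregation: an assumed softmax-weighted average over frames.
    frame_weights = softmax(per_frame @ w_score, axis=0)      # (T,)
    video_emb = frame_weights @ per_frame                     # (C,)
    return video_emb, attn


def dynamic_masks(video_emb, mask_feats):
    """Use the video-level embedding as a 1x1 dynamic filter on every frame.

    mask_feats: (T, H, W, C) per-frame mask features
    Returns mask probabilities of shape (T, H, W).
    """
    logits = mask_feats @ video_emb                           # (T, H, W)
    return 1.0 / (1.0 + np.exp(-logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H, W, C = 5, 8, 8, 16
    feats = rng.normal(size=(T, H * W, C))
    emb, attn = video_level_instance(
        rng.normal(size=C), feats, rng.normal(size=C)
    )
    masks = dynamic_masks(emb, feats.reshape(T, H, W, C))
    print(emb.shape, attn.shape, masks.shape)
```

Because the same embedding generates the mask on every frame, the per-frame masks belong to one instance by construction, which is the sense in which tracking needs no separate branch in this sketch.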


