DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

07/22/2022
by Adrià Caelles, et al.

Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. Earlier VIS methods mirrored the fragmentation of these subtasks in their architectural design and therefore missed out on a joint solution. Transformers recently made it possible to cast the entire VIS task as a single set-prediction problem. Nevertheless, the quadratic complexity of existing Transformer-based methods leads to long training times, high memory requirements, and a restriction to low-resolution, single-scale feature maps. Deformable attention provides a more efficient alternative, but its application to the temporal domain or to the segmentation task has not yet been explored. In this work, we present Deformable VIS (DeVIS), a VIS method which capitalizes on the efficiency and performance of deformable Transformers. To reason about all VIS subtasks jointly over multiple frames, we present temporal multi-scale deformable attention with instance-aware object queries. We further introduce a new image and video instance mask head with multi-scale features, and perform near-online video processing with multi-cue clip tracking. DeVIS reduces memory as well as training time requirements, and achieves state-of-the-art results on YouTube-VIS 2021 as well as on the challenging OVIS dataset. Code is available at https://github.com/acaelles97/DeVIS.
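To make the complexity argument in the abstract concrete, the following is a minimal counting sketch, not the DeVIS implementation. It compares the number of query-key interactions in dense self-attention over all tokens of a clip with the number of sampled locations when every query attends to a small, fixed number of points per frame. The function names, the single-scale simplification, and the default of 4 sampling points per frame are assumptions made for illustration only.

```python
def full_attention_pairs(frames: int, height: int, width: int) -> int:
    """Query-key pairs for dense self-attention over every token in a clip.

    Every token attends to every other token, so the cost grows
    quadratically with the token count frames * height * width.
    """
    tokens = frames * height * width
    return tokens * tokens


def deformable_attention_pairs(
    frames: int, height: int, width: int, points_per_frame: int = 4
) -> int:
    """Sampled locations when each query reads a fixed number of points per frame.

    Assumption for illustration: a single feature scale and
    `points_per_frame` sampling points in each frame of the clip.
    The cost grows linearly with the spatial size of the feature map.
    """
    tokens = frames * height * width
    return tokens * frames * points_per_frame


if __name__ == "__main__":
    # Hypothetical clip: 6 frames with a 38 x 50 feature map each.
    dense = full_attention_pairs(6, 38, 50)
    sparse = deformable_attention_pairs(6, 38, 50)
    print(f"dense pairs:  {dense}")
    print(f"sparse pairs: {sparse}")
    print(f"ratio:        {dense / sparse:.1f}x")
```

Under these assumptions, doubling the feature-map resolution in both dimensions multiplies the dense count by 16 but the sampled count only by 4, which is why sparse sampling makes higher-resolution, multi-scale features affordable.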


