Deformable VisTR: Spatio temporal deformable attention for video instance segmentation

03/12/2022
by   Sudhir Yarram, et al.

The video instance segmentation (VIS) task requires classifying, segmenting, and tracking object instances over all frames in a video clip. Recently, VisTR was proposed as an end-to-end transformer-based VIS framework and demonstrated state-of-the-art performance. However, VisTR is slow to converge during training, requiring around 1000 GPU hours due to the high computational cost of its transformer attention module. To improve training efficiency, we propose Deformable VisTR, which leverages a spatio-temporal deformable attention module that attends only to a small fixed set of key spatio-temporal sampling points around a reference point. As a result, the computation in Deformable VisTR scales linearly with the size of the spatio-temporal feature maps. Moreover, it achieves performance on par with the original VisTR using 10× fewer GPU training hours. We validate the effectiveness of our method on the YouTube-VIS benchmark. Code is available at https://github.com/skrya/DefVIS.
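To make the sampling idea concrete, here is a minimal NumPy sketch of one query attending to K spatio-temporal sampling points around its reference point. It is an illustration under stated assumptions, not the authors' implementation: the function names, argument layout, and the nearest-neighbor lookup are hypothetical simplifications, and the offsets and attention logits are passed in directly rather than predicted from the query by learned layers. The second helper only counts query-key pairs to show why a fixed K gives a cost that grows linearly with the feature-map size.

```python
import numpy as np


def deformable_attention_single_query(features, ref_point, offsets, logits):
    """Aggregate K sampled features for one query (hypothetical sketch).

    features:  (T, H, W, C) spatio-temporal feature map
    ref_point: (t, y, x) reference location of the query
    offsets:   (K, 3) sampling offsets (dt, dy, dx); assumed given here
    logits:    (K,) unnormalized attention scores; assumed given here
    """
    T, H, W, _ = features.shape
    # Sampling locations = reference point + offsets.
    # Assumption: nearest-neighbor rounding instead of interpolation.
    locs = np.rint(np.asarray(ref_point, dtype=float) + offsets).astype(int)
    # Clamp each coordinate to the valid range of its axis.
    locs = np.clip(locs, 0, [T - 1, H - 1, W - 1])
    sampled = features[locs[:, 0], locs[:, 1], locs[:, 2]]  # (K, C)
    # Softmax over the K sampling points only.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ sampled  # (C,)


def attention_pair_counts(T, H, W, K):
    """Query-key pairs for dense attention vs. K sampled keys per query."""
    n = T * H * W
    return n * n, n * K
```

For example, with a hypothetical 36 × 10 × 10 feature map and K = 4, `attention_pair_counts(36, 10, 10, 4)` compares 3600² query-key pairs for dense attention against 3600 × 4 for the sampled variant.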

research
07/22/2022

DeVIS: Making Deformable Transformers Work for Video Instance Segmentation

Video Instance Segmentation (VIS) jointly tackles multi-object detection...
research
03/24/2022

Video Instance Segmentation via Multi-scale Spatio-temporal Split Attention Transformer

State-of-the-art transformer-based video instance segmentation (VIS) app...
research
03/21/2023

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers

Accurate 3D mitochondria instance segmentation in electron microscopy (E...
research
12/09/2019

STAGE: Spatio-Temporal Attention on Graph Entities for Video Action Detection

Spatio-temporal action localization is a challenging yet fascinating tas...
research
11/15/2021

D^2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

Despite receiving significant attention from the research community, the...
research
06/30/2021

Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring

Real-time video deblurring still remains a challenging task due to the c...
research
08/21/2020

INSIDE: Steering Spatial Attention with Non-Imaging Information in CNNs

We consider the problem of integrating non-imaging information into segm...
