GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation

05/26/2023
by Tanveer Hannan, et al.

Recent trends in Video Instance Segmentation (VIS) show a growing reliance on online methods to model complex and lengthy video sequences. However, online methods suffer from representation degradation and noise accumulation, especially during occlusions and abrupt changes, which pose substantial challenges. Transformer-based query propagation offers a promising direction at the cost of quadratic memory attention. However, such methods are susceptible to the degradation of instance features under these conditions and suffer from cascading errors. The detection and rectification of such errors remain largely underexplored. To this end, we introduce GRAtt-VIS, Gated Residual Attention for Video Instance Segmentation. First, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Based on the gate activation, we then rectify degraded features using their past representations. Such a residual configuration alleviates the need for dedicated memory and provides a continuous stream of relevant instance features. Second, we propose a novel inter-instance interaction that uses the gate activation as a mask for self-attention. This masking strategy dynamically restricts unrepresentative instance queries in self-attention and preserves vital information for long-term tracking. We refer to this novel combination of Gated Residual Connection and Masked Self-Attention as the GRAtt block, which can easily be integrated into existing propagation-based frameworks. Further, GRAtt blocks significantly reduce the attention overhead and simplify dynamic temporal modeling. GRAtt-VIS achieves state-of-the-art performance on YouTube-VIS and the highly challenging OVIS dataset, significantly improving over previous methods. Code is available at <https://github.com/Tanveer81/GRAttVIS>.
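The abstract describes the GRAtt block only at a high level. What follows is a minimal, dependency-free sketch of its data flow, not the authors' implementation (see the linked repository for that). It rests on several assumptions of our own. First, the per-instance gate logits are simply passed in, since the abstract does not say how they are produced. Second, the gate is a hard 0/1 decision taken from a Gumbel-Softmax sample, where index 0 means "current frame is reliable" (a hypothetical convention). Third, the gated residual is an exact switch between the current query and the previous one. Finally, "masking" means that gated-off queries cannot serve as keys for other queries, although every query may still attend to itself. All function names (`gumbel_softmax`, `hard_gate`, `gated_residual`, `masked_self_attention`, `gratt_block_sketch`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> Vector:
    """Draw a relaxed one-hot sample from `logits` with the Gumbel-Softmax trick."""
    noisy = []
    for logit in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # sample of Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_gate(logits: Sequence[float], tau: float, rng: random.Random) -> int:
    """Return 1 ("current frame looks reliable") or 0 ("fall back to the past").

    Hypothetical convention: logits[0] scores "reliable", logits[1] scores "degraded".
    """
    probs = gumbel_softmax(logits, tau, rng)
    return 1 if probs[0] >= probs[1] else 0


def gated_residual(current: Vector, previous: Vector, gate: int) -> Vector:
    """Keep the freshly updated query when gate == 1, else reuse the previous one."""
    return [gate * c + (1 - gate) * p for c, p in zip(current, previous)]


def masked_self_attention(queries: List[Vector], gates: List[int]) -> List[Vector]:
    """Single-head self-attention over instance queries (no learned projections).

    Gated-off queries are masked out as keys/values for every other query;
    each query may still attend to itself, so no row is fully masked.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, q in enumerate(queries):
        allowed = [j for j in range(len(queries)) if gates[j] == 1 or j == i]
        scores = [scale * sum(a * b for a, b in zip(q, queries[j])) for j in allowed]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed = [0.0] * dim
        for w, j in zip(weights, allowed):
            for k in range(dim):
                mixed[k] += (w / total) * queries[j][k]
        outputs.append(mixed)
    return outputs


def gratt_block_sketch(
    current: List[Vector],
    previous: List[Vector],
    gate_logits: List[Sequence[float]],
    tau: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical GRAtt-style step: gate -> gated residual -> masked self-attention."""
    rng = random.Random(seed)
    gates = [hard_gate(logits, tau, rng) for logits in gate_logits]
    rectified = [gated_residual(c, p, g) for c, p, g in zip(current, previous, gates)]
    return masked_self_attention(rectified, gates), gates


if __name__ == "__main__":
    current = [[1.0, 0.0], [0.0, 1.0]]     # queries updated from frame t
    previous = [[0.5, 0.5], [2.0, 2.0]]    # queries carried over from frame t-1
    logits = [[100.0, 0.0], [0.0, 100.0]]  # instance 1 is flagged as degraded
    out, gates = gratt_block_sketch(current, previous, logits)
    print(gates)  # [1, 0]
```

In this toy run, instance 1 is flagged as degraded. It therefore keeps its previous query, and instance 0 cannot attend to it. Instance 0's output is thus exactly its own query. A trained model would instead use learned query/key/value projections and learn the gate through the relaxed (soft) sample. Lowering `tau` sharpens that sample toward a one-hot vector.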
