Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation

04/22/2022
by Jyoti Kini, et al.

Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence. Most existing methods accomplish this task with a multi-stage top-down approach that usually involves separate networks to detect and segment objects in each frame, followed by a learned tracking head that associates these detections across consecutive frames. In this work, we instead introduce a simple end-to-end trainable bottom-up approach that predicts instance masks at pixel-level granularity rather than relying on region proposals. Unlike contemporary frame-based models, our network pipeline processes an input video clip as a single 3D volume to incorporate temporal information. The central idea of our formulation is to solve the video instance segmentation task as a tag assignment problem, such that generating distinct tag values separates individual object instances across the video sequence (each tag can be any value between 0 and 1). To this end, we propose a novel spatio-temporal tagging loss that enforces sufficient separation between different objects while still distinguishing different instances of the same object. Furthermore, we present a tag-based attention module that improves instance tags while concurrently learning instance propagation within a video. Evaluations demonstrate that our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets and has the lowest run-time among the state-of-the-art methods compared.
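The tag assignment idea can be illustrated with a small sketch. The abstract does not give the form of the spatio-temporal tagging loss, so the code below assumes a generic pull/push formulation in the style of associative embeddings: tags of the same instance are pulled toward that instance's mean tag over the whole clip, and the mean tags of different instances are pushed at least a margin apart. The function name `tagging_loss`, the `margin` parameter, and the flattened-list input layout are all hypothetical choices made for illustration, not the authors' implementation.

```python
from itertools import combinations


def tagging_loss(tags, instance_ids, margin=0.5):
    """Illustrative pull/push loss over a flattened video clip volume.

    tags         -- predicted tag per pixel, each in [0, 1]
    instance_ids -- ground-truth instance id per pixel (0 = background)
    margin       -- hypothetical minimum gap wanted between instance mean tags
    """
    # Collect the predicted tags of each instance across all frames.
    groups = {}
    for tag, inst in zip(tags, instance_ids):
        if inst != 0:
            groups.setdefault(inst, []).append(tag)
    if not groups:
        return 0.0

    means = {inst: sum(v) / len(v) for inst, v in groups.items()}

    # Pull term: tags of one instance should agree with that instance's mean.
    pull = sum(
        sum((t - means[inst]) ** 2 for t in v) / len(v)
        for inst, v in groups.items()
    ) / len(groups)

    # Push term: mean tags of different instances should be at least
    # `margin` apart; closer pairs are penalized quadratically.
    pairs = list(combinations(means.values(), 2))
    push = 0.0
    if pairs:
        push = sum(max(0.0, margin - abs(a - b)) ** 2 for a, b in pairs) / len(pairs)

    return pull + push


# Two frames of four pixels each, flattened into one volume (toy data).
ids = [1, 1, 2, 2, 1, 1, 2, 2]
separated = [0.2, 0.2, 0.8, 0.8, 0.2, 0.2, 0.8, 0.8]
collapsed = [0.5] * 8

print(tagging_loss(separated, ids))  # small: instances have distinct tags
print(tagging_loss(collapsed, ids))  # larger: both instances share one tag
```

Because the means are computed over the flattened clip rather than per frame, a loss of this shape rewards an instance for keeping the same tag in every frame, which is one plausible way tag values could carry identity across time.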


