Efficient Video Instance Segmentation via Tracklet Query and Proposal

03/03/2022
by   Jialian Wu, et al.
0

Video Instance Segmentation (VIS) aims to simultaneously classify, segment, and track multiple object instances in videos. Recent clip-level VIS takes a short video clip as input each time showing stronger performance than frame-level VIS (tracking-by-segmentation), as more temporal context from multiple frames is utilized. Yet, most clip-level methods are neither end-to-end learnable nor real-time. These limitations are addressed by the recent VIS transformer (VisTR) which performs VIS end-to-end within a clip. However, VisTR suffers from long training time due to its frame-wise dense attention. In addition, VisTR is not fully end-to-end learnable in multiple video clips as it requires a hand-crafted data association to link instance tracklets between successive clips. This paper proposes EfficientVIS, a fully end-to-end framework with efficient training and inference. At the core are tracklet query and tracklet proposal that associate and segment regions-of-interest (RoIs) across space and time by an iterative query-video interaction. We further propose a correspondence learning that makes tracklets linking between clips end-to-end learnable. Compared to VisTR, EfficientVIS requires 15x fewer training epochs while achieving state-of-the-art accuracy on the YouTube-VIS benchmark. Meanwhile, our method enables whole video instance segmentation in a single end-to-end pass without data association at all.

READ FULL TEXT

page 3

page 7

page 8

page 11

page 12

research
01/05/2023

InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Video instance segmentation (VIS) aims at segmenting and tracking object...
research
03/22/2023

Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation

The goal of video segmentation is to accurately segment and track every ...
research
06/22/2021

Tracking Instances as Queries

Recently, query based deep networks catch lots of attention owing to the...
research
08/29/2023

NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

Until recently, the Video Instance Segmentation (VIS) community operated...
research
03/25/2021

Video Instance Segmentation with a Propose-Reduce Paradigm

Video instance segmentation (VIS) aims to segment and associate all inst...
research
04/13/2023

DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

Most state-of-the-art instance segmentation methods rely on large amount...
research
11/23/2022

Video Instance Shadow Detection

Video instance shadow detection aims to simultaneously detect, segment, ...

Please sign up or login with your details

Forgot password? Click here to reset