Improving Video Instance Segmentation via Temporal Pyramid Routing

07/28/2021
by Xiangtai Li, et al.

Video Instance Segmentation (VIS) is a new and inherently multi-task problem that aims to detect, segment, and track each instance in a video sequence. Existing approaches are mainly based on single-frame features or on single-scale features of multiple frames, so either temporal or multi-scale information is ignored. To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy that conditionally aligns and aggregates, at the pixel level, the features of a feature pyramid pair from two adjacent frames. Specifically, TPR contains two novel components, Dynamic Aligned Cell Routing (DACR) and Cross Pyramid Routing (CPR): DACR aligns and gates pyramid features across the temporal dimension, while CPR transfers the temporally aggregated features across the scale dimension. Moreover, our approach is a plug-and-play module and can be easily applied to existing instance segmentation methods. Extensive experiments on the YouTube-VIS dataset demonstrate the effectiveness and efficiency of the proposed approach on several state-of-the-art instance segmentation methods. Code and trained models will be made publicly available at <https://github.com/lxtGH/TemporalPyramidRouting> to facilitate future research.
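As a rough illustration of the two-stage idea described above (gated aggregation across time, then transfer across scale), the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names (`gated_fuse`, `upsample2x`, `route_pyramids`), the agreement-based gate, and the FPN-style top-down merge are all assumptions made for illustration. The alignment step of DACR is omitted, so the two frames are assumed to be already aligned, and each feature map is reduced to a single-channel 2D list.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gated_fuse(cur, ref):
    """Pixel-wise gated blend of two same-sized 2D feature maps.

    Hypothetical gate: the reference frame gets weight g in (0, 0.5],
    largest where it agrees with the current frame.
    """
    out = []
    for row_c, row_r in zip(cur, ref):
        out_row = []
        for c, r in zip(row_c, row_r):
            g = sigmoid(-abs(c - r))
            out_row.append((1.0 - g) * c + g * r)
        out.append(out_row)
    return out

def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a 2D feature map."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def route_pyramids(cur_pyr, ref_pyr):
    """Fuse two pyramids level by level, then merge coarse-to-fine.

    Both pyramids are lists of 2D maps, finest level first, with each
    level half the height and width of the previous one (an assumption).
    """
    # Temporal stage: gate and aggregate each pair of levels.
    fused = [gated_fuse(c, r) for c, r in zip(cur_pyr, ref_pyr)]
    # Scale stage: add each upsampled coarser result into the finer level.
    out = [fused[-1]]
    for level in reversed(fused[:-1]):
        up = upsample2x(out[0])
        merged = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(level, up)]
        out.insert(0, merged)
    return out

if __name__ == "__main__":
    cur = [[[1, 1], [1, 1]], [[2]]]
    ref = [[[1, 1], [1, 1]], [[2]]]
    print(route_pyramids(cur, ref))
```

With this gate, a reference pixel that differs strongly from the current frame is almost entirely suppressed, which is one simple way to read "conditionally" aggregating features. A learned gate predicted from both feature maps would be the more realistic choice.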
