Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation

08/25/2023
by   Jiaming Zhang, et al.

Prevailing Video Object Segmentation (VOS) methods usually perform dense matching between the current and reference frames after extracting their features. On one hand, this decoupled modeling restricts the propagation of target information to the high-level feature space. On the other hand, pixel-wise matching leads to a lack of holistic understanding of the targets. To overcome these issues, we propose a unified VOS framework, coined JointFormer, for jointly modeling three elements: feature, correspondence, and a compressed memory. The core design is the Joint Block, which uses the flexibility of attention to simultaneously extract features and propagate target information to the current tokens and the compressed memory token. This scheme allows extensive information propagation and discriminative feature learning. To incorporate long-term temporal target information, we also devise a customized online updating mechanism for the compressed memory token, which prompts information flow along the temporal dimension and thus improves global modeling capability. With this design, our method achieves new state-of-the-art performance on DAVIS 2017 val/test-dev (89.7 ...) and YouTube-VOS 2018/2019 val (87.0 ...), outperforming existing works by a large margin.
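The idea of running feature extraction and target propagation through one attention operation can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the attention layout, so the masking pattern below (reference tokens attend only to each other, current tokens attend to everything, the memory token attends to the reference tokens and itself), the identity query/key/value projections, and the names `masked_attention` and `joint_block` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(tokens, allowed):
    """Single-head self-attention with identity projections (a simplification).

    tokens:  list of d-dimensional vectors (lists of floats).
    allowed: allowed[i][j] is True if token i may attend to token j.
    """
    dim = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        keys = [j for j in range(len(tokens)) if allowed[i][j]]
        scores = [
            sum(q * k for q, k in zip(query, tokens[j])) / math.sqrt(dim)
            for j in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * tokens[j][c] for w, j in zip(weights, keys))
            for c in range(dim)
        ])
    return outputs


def joint_block(reference, current, memory):
    """Hypothetical joint block: one attention pass over all three token groups.

    Assumed information flow (not taken from the paper):
      - reference tokens attend only to reference tokens,
      - current tokens attend to reference, current, and memory tokens,
      - the memory token attends to reference tokens and itself.
    """
    tokens = reference + current + [memory]
    n_ref, n_cur = len(reference), len(current)
    n = len(tokens)
    allowed = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_ref:
                allowed[i][j] = j < n_ref
            elif i < n_ref + n_cur:
                allowed[i][j] = True
            else:
                allowed[i][j] = j < n_ref or j == i
    out = masked_attention(tokens, allowed)
    return out[:n_ref], out[n_ref:n_ref + n_cur], out[-1]
```

Under this assumed mask, target information flows from the reference tokens into the current tokens and the memory token, while the reference features stay unaffected by the current frame. A real block would add learned projections, multiple heads, and the online memory update, none of which are sketched here.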

research
01/19/2020

See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks

We introduce a novel network, called CO-attention Siamese Network (COSNe...
research
09/02/2020

LSMVOS: Long-Short-Term Similarity Matching for Video Object

Objective Semi-supervised video object segmentation refers to segmenting...
research
03/17/2023

Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

The objective of this paper is self-supervised learning of video object ...
research
08/19/2023

Scalable Video Object Segmentation with Simplified Framework

The current popular methods for video object segmentation (VOS) implemen...
research
09/21/2020

Discriminative Segmentation Tracking Using Dual Memory Banks

Existing template-based trackers usually localize the target in each fra...
research
10/10/2020

Hybrid Sequence to Sequence Model for Video Object Segmentation

One-shot Video Object Segmentation (VOS) is the task of pixel-wise track...
research
03/22/2023

Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation

The goal of video segmentation is to accurately segment and track every ...