Spectrum-guided Multi-granularity Referring Video Object Segmentation

07/25/2023
by Bo Miao, et al.

Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features to segment the decoded high-resolution features. We discovered that this causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation, and this degrades their segmentation ability. To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks. In addition, we propose Spectrum-guided Cross-modal Fusion (SCF) to perform intra-frame global interactions in the spectral domain for effective multimodal representation. Finally, we extend SgMg to perform multi-object R-VOS, a new paradigm that enables simultaneous segmentation of multiple referred objects in a video. This makes R-VOS not only faster but also more practical. Extensive experiments show that SgMg achieves state-of-the-art performance on four video benchmark datasets, outperforming the nearest competitor by 2.8% points on Ref-YouTube-VOS. Our extended SgMg enables multi-object R-VOS and runs about 3 times faster while maintaining satisfactory performance. Code is available at https://github.com/bo-miao/SgMg.
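To give some intuition for why interactions in the spectral domain are global within a frame, the following is a minimal sketch and not the authors' SCF module. It assumes a single-channel feature map, a naive 2D discrete Fourier transform, and a hand-picked per-frequency gate; the names `dft2` and `spectral_modulate` are hypothetical, and a learned module would presumably predict the gate rather than fix it by hand.

```python
import cmath


def dft2(x, inverse=False):
    """Naive 2D discrete Fourier transform of an H x W grid (list of lists)."""
    h, w = len(x), len(x[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for i in range(h):
                for j in range(w):
                    angle = sign * 2 * cmath.pi * (u * i / h + v * j / w)
                    total += x[i][j] * cmath.exp(1j * angle)
            # The inverse transform carries the 1 / (H * W) normalisation.
            out[u][v] = total / (h * w) if inverse else total
    return out


def spectral_modulate(feature, gate):
    """Scale each frequency of `feature` by `gate`, then return to the spatial grid.

    `gate` is a hypothetical H x W grid of per-frequency weights (an assumption
    made for illustration only).
    """
    h, w = len(feature), len(feature[0])
    spectrum = dft2(feature)
    gated = [[spectrum[u][v] * gate[u][v] for v in range(w)] for u in range(h)]
    restored = dft2(gated, inverse=True)
    # The inputs used here give real outputs; drop the negligible imaginary part.
    return [[value.real for value in row] for row in restored]


# A single active position in an otherwise empty 4 x 4 feature map.
feature = [[0.0] * 4 for _ in range(4)]
feature[0][0] = 16.0

# Gate that keeps only the zero-frequency (mean) component.
dc_only = [[0.0] * 4 for _ in range(4)]
dc_only[0][0] = 1.0

spread = spectral_modulate(feature, dc_only)
```

With the zero-frequency gate, the one active position is spread evenly over the whole grid, so a single multiplication in the spectral domain changes every spatial location at once. A gate of all ones leaves the feature map unchanged.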


