Associating Objects with Transformers for Video Object Segmentation

06/04/2021
by Zongxin Yang, et al.

This paper investigates how to realize better and more efficient embedding learning for semi-supervised video object segmentation under challenging multi-object scenarios. State-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately in multi-object scenarios, consuming several times the computing resources. To solve this problem, we propose an Associating Objects with Transformers (AOT) approach that matches and decodes multiple objects uniformly. In detail, AOT employs an identification mechanism to associate multiple targets in the same high-dimensional embedding space. Thus, we can process the matching and segmentation decoding of multiple objects simultaneously, as efficiently as processing a single object. To sufficiently model multi-object association, a Long Short-Term Transformer is designed for constructing hierarchical matching and propagation. We conduct extensive experiments on both multi-object and single-object benchmarks to examine AOT variant networks of different complexities. In particular, our AOT-L outperforms all state-of-the-art competitors on three popular benchmarks, including YouTube-VOS (83.7), while keeping better multi-object efficiency. Meanwhile, our AOT-T maintains real-time multi-object speed on the above benchmarks. We ranked 1st in the 3rd Large-scale Video Object Segmentation Challenge. The code will be publicly available at https://github.com/z-x-yang/AOT.
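
To make the identification idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each object label (with 0 as background) is assigned one vector from a shared "identity bank", so that a single multi-object mask becomes one embedding map that can be processed in a single pass. The function names, the bank size, and the use of fixed random vectors in place of learned embeddings are all hypothetical choices made for illustration.

```python
import random

def make_identity_bank(num_ids, dim, seed=0):
    """Hypothetical identity bank: one random vector per object label.
    In a real model these vectors would be learned, not sampled."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]

def embed_mask(label_mask, bank):
    """Replace every pixel's integer label with that label's identity vector,
    so all objects share one embedding space."""
    return [[bank[label] for label in row] for row in label_mask]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def decode_embedding(embedding, bank):
    """Recover per-pixel labels by choosing the nearest identity vector."""
    return [
        [min(range(len(bank)), key=lambda k: squared_distance(vec, bank[k]))
         for vec in row]
        for row in embedding
    ]

# A 2x3 mask containing background (0) and two objects (1 and 2).
mask = [[0, 1, 1],
        [2, 2, 0]]
bank = make_identity_bank(num_ids=3, dim=8)
embedding = embed_mask(mask, bank)       # one map covering every object
recovered = decode_embedding(embedding, bank)
```

In this toy setting the cost of embedding and decoding depends on the number of pixels rather than on running a separate pass per object, which is the efficiency argument the abstract makes; the transformer-based matching and propagation are not modeled here.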

research
03/22/2022

Associating Objects with Scalable Transformers for Video Object Segmentation

This paper investigates how to realize better and more efficient embeddi...
research
05/08/2023

Video Object Segmentation in Panoptic Wild Scenes

In this paper, we introduce semi-supervised video object segmentation (V...
research
10/13/2020

Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration

This paper investigates the principles of embedding learning to tackle t...
research
07/17/2023

Hierarchical Spatiotemporal Transformers for Video Object Segmentation

This paper presents a novel framework called HST for semi-supervised vid...
research
10/18/2022

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

This paper focuses on developing a more effective method of hierarchical...
research
03/18/2020

Collaborative Video Object Segmentation by Foreground-Background Integration

In this paper, we investigate the principles of embedding learning betwe...
research
08/19/2023

Scalable Video Object Segmentation with Simplified Framework

The current popular methods for video object segmentation (VOS) implemen...
