Video Object Segmentation in Panoptic Wild Scenes

05/08/2023
by Yuanyou Xu, et al.

In this paper, we introduce semi-supervised video object segmentation (VOS) to panoptic wild scenes and present a large-scale benchmark and a baseline method for the task. Previous VOS benchmarks are sparsely annotated, so they are not sufficient to train or evaluate a model that must process all possible objects in real-world scenarios. Our new benchmark, VIPOSeg, contains exhaustive object annotations and covers a variety of real-world object categories, which are carefully divided into thing/stuff and seen/unseen subsets for comprehensive evaluation. To address the challenges of panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales. Experimental results show that VIPOSeg not only boosts the performance of VOS models through panoptic training but also evaluates them comprehensively in panoptic scenes. Previous methods for classic VOS still fall short in performance and efficiency when dealing with panoptic scenes, while PAOT achieves state-of-the-art (SOTA) performance with good efficiency on VIPOSeg and on previous VOS benchmarks. PAOT also ranks 1st in the VOT2022 challenge. Our dataset is available at https://github.com/yoxu515/VIPOSeg-Benchmark.
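To make the idea of subset-wise evaluation concrete, the following is a minimal sketch, not the benchmark's official evaluation code. It assumes that each frame is stored as a grid of integer object IDs and that quality is measured with mask IoU (the Jaccard index commonly used as region similarity in VOS). The function names, the `subset_of` mapping, and the subset labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mask_iou(pred, gt, obj_id):
    """Jaccard index between predicted and ground-truth pixels of one object."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            in_pred, in_gt = p == obj_id, g == obj_id
            inter += in_pred and in_gt
            union += in_pred or in_gt
    # Convention assumed here: an object absent from both masks scores 1.0.
    return inter / union if union else 1.0


def subset_scores(pred, gt, subset_of):
    """Average per-object IoU inside each subset (e.g. thing/stuff, seen/unseen)."""
    per_subset = defaultdict(list)
    for obj_id, subset in subset_of.items():
        per_subset[subset].append(mask_iou(pred, gt, obj_id))
    return {name: sum(vals) / len(vals) for name, vals in per_subset.items()}


# Toy single-frame example: object 1 is a seen "thing", object 2 an unseen "stuff".
gt = [[1, 1, 2, 2],
      [1, 1, 2, 2]]
pred = [[1, 1, 1, 2],
        [1, 1, 2, 2]]
subset_of = {1: "thing/seen", 2: "stuff/unseen"}  # hypothetical labels

scores = subset_scores(pred, gt, subset_of)
print(scores)
```

Reporting one score per subset rather than a single global average shows whether a model degrades on stuff regions or on unseen classes, which is the kind of breakdown the thing/stuff and seen/unseen split is designed to expose.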


