Breaking the "Object" in Video Object Segmentation

12/12/2022
by Pavel Tokmakov, et al.

The appearance of an object can be fleeting when it transforms. As eggs are broken or paper is torn, their color, shape and texture can change dramatically, preserving virtually nothing of the original except for the identity itself. Yet, this important phenomenon is largely absent from existing video object segmentation (VOS) benchmarks. In this work, we close the gap by collecting a new dataset for Video Object Segmentation under Transformations (VOST). It consists of more than 700 high-resolution videos, captured in diverse environments, which are 20 seconds long on average and densely labeled with instance masks. A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent. We then extensively evaluate state-of-the-art VOS methods and make a number of important discoveries. In particular, we show that existing methods struggle when applied to this novel task and that their main limitation lies in over-reliance on static appearance cues. This motivates us to propose a few modifications for the top-performing baseline that improve its capabilities by better modeling spatio-temporal information. But more broadly, the hope is to stimulate discussion on learning more robust video object representations.

research
09/18/2017

Video Object Segmentation Without Temporal Information

Video Object Segmentation, and video processing in general, has been his...
research
01/06/2021

Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos

Segmenting objects in videos is a fundamental computer vision task. The ...
research
11/18/2022

LVOS: A Benchmark for Long-term Video Object Segmentation

Existing video object segmentation (VOS) benchmarks focus on short-term ...
research
08/20/2018

Video-to-Video Synthesis

We study the problem of video-to-video synthesis, whose goal is to learn...
research
11/22/2022

Domain Alignment and Temporal Aggregation for Unsupervised Video Object Segmentation

Unsupervised video object segmentation aims at detecting and segmenting ...
research
08/25/2021

Robust High-Resolution Video Matting with Temporal Guidance

We introduce a robust, real-time, high-resolution human video matting me...
research
06/25/2020

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference

Unsupervised multi-object scene decomposition is a fast-emerging problem...
