RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

10/01/2020
by Miriam Bellver, et al.

The task of video object segmentation with referring expressions (language-guided VOS) is to generate binary masks for the object that a given linguistic phrase refers to in a video. We argue that existing benchmarks for this task consist mainly of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the phrases in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial referring expressions (REs), with the non-trivial REs annotated with seven RE semantic categories. We leverage this data to analyze the results of RefVOS, a novel neural network that obtains competitive results for language-guided image segmentation and state-of-the-art results for language-guided VOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
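
The kind of breakdown described above, scoring predicted binary masks separately for trivial and non-trivial phrases, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name an evaluation metric, so intersection over union (IoU) is used as a common stand-in, and the names `iou` and `mean_iou_by_group` are hypothetical helpers, not part of RefVOS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A binary mask as nested lists: 1 = object pixel, 0 = background.
Mask = List[List[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal size.

    Assumption: IoU is used here only as an illustrative metric.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly by convention.
    return 1.0 if union == 0 else inter / union


def mean_iou_by_group(
    samples: Iterable[Tuple[str, Mask, Mask]]
) -> Dict[str, float]:
    """Average IoU per group label (hypothetical helper).

    Each sample is (group label, predicted mask, ground-truth mask);
    the label could be "trivial" / "non-trivial" or an RE category.
    """
    scores: Dict[str, List[float]] = defaultdict(list)
    for group, pred, target in samples:
        scores[group].append(iou(pred, target))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


if __name__ == "__main__":
    # Toy 2x2 masks, made up purely for illustration.
    samples = [
        ("trivial", [[1, 1], [0, 0]], [[1, 1], [0, 0]]),
        ("non-trivial", [[1, 0], [0, 0]], [[1, 1], [0, 0]]),
    ]
    print(mean_iou_by_group(samples))
```

Grouping by a label keeps the sketch independent of the specific categorization: the same function would work for the trivial versus non-trivial split or for any of the semantic categories, provided each phrase carries its label.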


