SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation

06/08/2021
by Ioannis Kazakos, et al.

Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation. However, collecting large datasets for these tasks is expensive in annotation time, which represents a bottleneck. To address this, we propose SynthRef, a novel method for generating synthetic referring expressions for target objects in an image (or video frame), and we present and disseminate the first large-scale dataset with synthetic referring expressions for video object segmentation. Our experiments demonstrate that training with our synthetic referring expressions improves a model's ability to generalize across different datasets, without any additional annotation cost. Moreover, our formulation can be applied to any object detection or segmentation dataset.
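The abstract does not spell out the generation pipeline, but the key idea of producing referring expressions from existing detection/segmentation annotations can be illustrated with a template-based sketch. The function names, templates, and position thresholds below are assumptions for illustration, not the paper's actual implementation; COCO-style `(x, y, w, h)` bounding boxes are assumed.

```python
def relative_position(bbox, image_width):
    """Classify an object's horizontal position from its bounding box.

    bbox is (x, y, w, h) in COCO convention; the one-third split
    thresholds are illustrative, not from the paper.
    """
    x, _, w, _ = bbox
    center = x + w / 2
    if center < image_width / 3:
        return "on the left"
    if center > 2 * image_width / 3:
        return "on the right"
    return "in the middle"


def synthetic_expression(category, bbox, image_width, attribute=None):
    """Fill a simple template: 'the [attribute] [category] [position]'.

    Category labels come for free from any detection/segmentation
    dataset, which is why no extra annotation cost is incurred.
    """
    parts = ["the"]
    if attribute:
        parts.append(attribute)
    parts.append(category)
    parts.append(relative_position(bbox, image_width))
    return " ".join(parts)


# Hypothetical annotation: a dog near the left edge of a 640px-wide frame.
print(synthetic_expression("dog", (20, 100, 120, 90), 640, attribute="brown"))
# -> the brown dog on the left
```

In practice such templates would also need to disambiguate between multiple instances of the same category (e.g. by size or relative position), which is what makes an expression "referring" rather than merely descriptive.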

Related research

03/21/2018 · Video Object Segmentation with Language Referring Expressions
Most state-of-the-art semi-supervised video object segmentation methods ...

12/08/2016 · Learning Video Object Segmentation from Static Images
Inspired by recent advances of deep learning in instance segmentation an...

05/22/2023 · UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model
Unsupervised video object segmentation has made significant progress in ...

04/06/2017 · Semantically-Guided Video Object Segmentation
This paper tackles the problem of semi-supervised video object segmentat...

06/13/2019 · Grounding Object Detections With Transcriptions
A vast amount of audio-visual data is available on the Internet thanks t...

10/01/2020 · RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
The task of video object segmentation with referring expressions (langua...

08/23/2023 · RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
Grounding textual expressions on scene objects from first-person views i...
