Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

03/17/2018
by Syed Ashar Javed, et al.

Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, the lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding that uses concept learning as a proxy task to obtain self-supervision. The intuition behind this idea is simple: encourage the model to localize to regions that can explain some semantic property in the data. In our case, that property is the presence of a concept in a set of images. We present thorough quantitative and qualitative experiments to demonstrate the efficacy of our approach and show a 5.6% improvement over the current state of the art on the Visual Genome dataset, a 5.8% improvement on the ReferItGame dataset, and performance comparable to the state of the art on the Flickr30k dataset.
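The proxy-task idea can be illustrated with a minimal sketch. It assumes precomputed region features and a phrase embedding of the same dimension. Regions are scored against the phrase by dot product, the scores are softmax-normalized into attention weights, and a hypothetical linear concept head is trained on the attended feature with a per-image binary "concept present" label. The function names, the dot-product scoring, and the single-image label are illustrative assumptions, not the authors' architecture.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(phrase_emb, region_feats):
    """Score each region against the phrase (assumed dot-product scoring),
    then return the attention weights and the attention-weighted region feature."""
    weights = softmax([dot(phrase_emb, r) for r in region_feats])
    dim = len(region_feats[0])
    attended = [sum(w * r[d] for w, r in zip(weights, region_feats)) for d in range(dim)]
    return weights, attended


def concept_loss(attended, concept_weights, label):
    """Proxy-task loss (hypothetical head): binary cross-entropy on whether the
    concept is present, computed from the logit in a numerically stable form."""
    z = dot(attended, concept_weights)
    return max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))


def ground(phrase_emb, region_feats):
    """Inference: the predicted grounding is the region with the highest attention."""
    weights, _ = attend(phrase_emb, region_feats)
    return max(range(len(weights)), key=weights.__getitem__)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy region features
    phrase = [2.0, 0.0]                             # toy phrase embedding
    print(ground(phrase, regions))  # prints 0
```

At inference only the attention weights are used; the concept head exists to give the attention a training signal, which is the sense in which concept learning acts as a proxy task for grounding.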
