Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

03/11/2018
by Kan Chen, et al.

Given a natural language query, a phrase grounding system aims to localize the mentioned objects in an image. In the weakly supervised scenario, the mapping between image regions (i.e., proposals) and language is not available in the training set. Previous methods address this deficiency by training a grounding system to reconstruct the language information contained in input queries from predicted proposals. However, the optimization is guided solely by the reconstruction loss from the language modality; it ignores the rich visual information contained in proposals and useful cues from external knowledge. In this paper, we explore the consistency contained in both the visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding. We propose a novel Knowledge Aided Consistency Network (KAC Net), which is optimized by reconstructing both the input query and the proposal's information. To leverage complementary knowledge contained in the visual features, we introduce a Knowledge Based Pooling (KBP) gate to focus on query-related proposals. Experiments show that KAC Net provides a significant improvement on two popular datasets.
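
As a rough illustration of the two ideas named in the abstract, the sketch below shows one way a knowledge score could gate attention over proposals, and one way a language reconstruction loss could be combined with a visual one. It is not the paper's formulation: the function names, the multiplicative gating, the renormalization, the fallback for all-zero knowledge, and the `visual_weight` parameter are all assumptions made for this example. The knowledge scores are taken as given, for instance a similarity in [0, 1] between the query and a category predicted for each proposal by a pretrained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_gated_attention(attn_logits, knowledge_scores, eps=1e-8):
    """Hypothetical gate: scale attention by per-proposal knowledge scores.

    attn_logits      -- one query-conditioned score per proposal
    knowledge_scores -- one score in [0, 1] per proposal (assumed input)
    Returns attention weights that sum to 1.
    """
    if len(attn_logits) != len(knowledge_scores):
        raise ValueError("need one knowledge score per proposal")
    attn = softmax(attn_logits)
    gated = [a * k for a, k in zip(attn, knowledge_scores)]
    total = sum(gated)
    if total < eps:
        # Assumed fallback: if knowledge rules out every proposal,
        # keep the ungated attention rather than dividing by zero.
        return attn
    return [g / total for g in gated]


def combined_loss(language_loss, visual_loss, visual_weight=1.0):
    """Assumed weighted sum of the two reconstruction losses."""
    return language_loss + visual_weight * visual_loss


# Three proposals: the query attends most to the first, but the
# knowledge scores favor the second and rule out the third.
weights = knowledge_gated_attention([2.0, 1.0, 0.5], [0.1, 0.9, 0.0])
```

In this toy case the gate shifts most of the attention mass to the second proposal and removes the third entirely, which is the sense in which such a gate would "focus on query-related proposals".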


