Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding

05/09/2018
by Zhou Yu, et al.

Visual grounding aims to localize the object in an image that a textual query phrase refers to. Various visual grounding approaches have been proposed, and the problem can be modularized into a general framework of three modules: proposal generation, multi-modal feature representation, and proposal ranking. Most existing approaches focus on the latter two modules and generally neglect the importance of proposal generation. In this paper, we rethink what properties make a good proposal generator. We introduce diversity and discrimination simultaneously when generating proposals, and in doing so propose the Diversified and Discriminative Proposal Networks (DDPN) model. Based on the proposals generated by DDPN, we propose a high-performance baseline model for visual grounding and evaluate it on four benchmark datasets. Experimental results demonstrate that our model delivers significant improvements on all the tested datasets (e.g., an 18.8% improvement on ReferItGame and an 8.2% improvement on Flickr30k Entities over the existing state of the art).
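The three-module framework named in the abstract can be illustrated with a toy sketch. This is not the DDPN model or the authors' code: the `Proposal` class, the function names, the precomputed `regions`, the elementwise-product fusion, and the sum-based score are all hypothetical placeholders chosen only to show how proposal generation, multi-modal feature representation, and proposal ranking fit together.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Proposal:
    box: Box
    feature: List[float]  # toy visual feature for the region


def generate_proposals(regions) -> List[Proposal]:
    """Stage 1 (placeholder): wrap precomputed candidate regions.

    A real system would run a trained proposal generator here.
    """
    return [Proposal(box=b, feature=f) for b, f in regions]


def fuse(query_vec: List[float], proposal: Proposal) -> List[float]:
    """Stage 2 (placeholder): combine text and visual features.

    Elementwise product is an assumption made for illustration.
    """
    return [q * v for q, v in zip(query_vec, proposal.feature)]


def rank(query_vec: List[float], proposals: List[Proposal]):
    """Stage 3: score every proposal and return the best box with all scores."""
    scores = [sum(fuse(query_vec, p)) for p in proposals]
    best = max(range(len(proposals)), key=lambda i: scores[i])
    return proposals[best].box, scores


# Hypothetical inputs: two candidate regions and a toy query embedding.
regions = [
    ((0, 0, 50, 50), [1.0, 0.0]),
    ((10, 10, 90, 90), [0.0, 1.0]),
]
query_vec = [0.2, 0.9]

best_box, scores = rank(query_vec, generate_proposals(regions))
print(best_box, scores)
```

In this sketch the ranking stage can only choose among the boxes that the first stage supplies, which is why the quality of the proposals bounds what the later modules can achieve.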


