Open-vocabulary Phrase Detection

11/17/2018
by Bryan A. Plummer, et al.

Most existing work on grounding natural language phrases in images assumes that the phrase in question is relevant to the image. In this paper, we address a more realistic version of the natural language grounding task in which we must both decide whether a phrase is relevant to an image and localize it. This can also be viewed as a generalization of object detection to an open-ended vocabulary, which introduces elements of few- and zero-shot detection. We propose a Phrase R-CNN network for this task that extends Faster R-CNN to relate image regions and phrases. By carefully initializing the classification layers of our network using canonical correlation analysis (CCA), we encourage a solution that is more discerning when reasoning between similar phrases. This more than doubles performance compared to a naive adaptation on two popular phrase grounding datasets, Flickr30K Entities and ReferIt Game, with test-time phrase vocabulary sizes of 5K and 39K, respectively.
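The abstract does not spell out how CCA is used to initialize the classification layers, so the following is only a minimal sketch of textbook linear CCA in NumPy, not the authors' implementation. It assumes region and phrase features are already available as plain matrices (here they are synthetic), and the names `fit_cca`, `inv_sqrt`, and `region_phrase_scores` are hypothetical. Under that assumption, the learned projections `wx` and `wy` could serve as initial weights for the linear layers that embed regions and phrases into a shared space.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(x, y, k, reg=1e-4):
    """Textbook linear CCA between paired rows of x (n, dx) and y (n, dy).

    Returns projections wx (dx, k), wy (dy, k), the top-k canonical
    correlations, and the feature means used for centering.
    `reg` is a small ridge term (an assumption) that keeps the
    covariance matrices invertible.
    """
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    # SVD of the whitened cross-covariance gives the canonical directions.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky, full_matrices=False)
    return kx @ u[:, :k], ky @ vt.T[:, :k], s[:k], x_mean, y_mean


def region_phrase_scores(regions, phrases, wx, wy, x_mean, y_mean):
    """Cosine similarity between every projected region and phrase."""
    r = (regions - x_mean) @ wx
    p = (phrases - y_mean) @ wy
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    return r @ p.T


# Synthetic stand-ins for paired region/phrase features (assumption):
# both views are noisy linear mixtures of a shared 2-D latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
region_feats = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
phrase_feats = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))

wx, wy, corrs, x_mean, y_mean = fit_cca(region_feats, phrase_feats, k=2)
scores = region_phrase_scores(region_feats[:5], phrase_feats[:5],
                              wx, wy, x_mean, y_mean)
print("canonical correlations:", corrs)
print("score matrix shape:", scores.shape)
```

In a detection setting, a relevance decision could then be made by thresholding such scores, although the threshold and the scoring function here are illustrative choices rather than details taken from the paper.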


research
10/23/2022

Extending Phrase Grounding with Pronouns in Visual Dialogues

Conventional phrase grounding aims to localize noun phrases mentioned in...
research
08/20/2019

Zero-Shot Grounding of Objects from Natural Language Queries

A phrase grounding system localizes a particular object in an image refe...
research
11/27/2017

Query-Adaptive R-CNN for Open-Vocabulary Object Detection and Retrieval

We address the problem of open-vocabulary object retrieval and localizat...
research
03/27/2013

How Much More Probable is "Much More Probable"? Verbal Expressions for Probability Updates

Bayesian inference systems should be able to explain their reasoning to ...
research
11/21/2016

Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

This paper presents a framework for localization or grounding of phrases...
research
08/03/2020

PhraseCut: Language-based Image Segmentation in the Wild

We consider the problem of segmenting image regions given a natural lang...
research
11/17/2017

Grounding Visual Explanations (Extended Abstract)

Existing models which generate textual explanations enforce task relevan...
