Query-Adaptive R-CNN for Open-Vocabulary Object Detection and Retrieval

11/27/2017
by Ryota Hinami, et al.

We address the problem of open-vocabulary object retrieval and localization, which is to retrieve and localize objects from a very large-scale image database immediately by a textual query (e.g., a word or phrase). We first propose Query-Adaptive R-CNN, a simple yet strong framework for open-vocabulary object detection. Query-Adaptive R-CNN is a simple extension of Faster R-CNN from closed-vocabulary to open-vocabulary object detection: instead of learning a class-specific classifier and regressor, we learn a detector generator that transforms a text into classifier and regressor weights. All of its components can be learned in an end-to-end manner. Even with its simple architecture, it outperforms all state-of-the-art methods in the Flickr30k Entities phrase localization task. In addition, we propose negative phrase augmentation, a generic approach for exploiting hard negatives in the training of open-vocabulary object detection that significantly improves the discriminative ability of the generated classifier. We show that our system can retrieve and localize objects specified by a textual query from one million images in only 0.5 seconds.
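The central idea in the abstract, a detector generator that turns a text query into classifier weights, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the generator is modeled as a plain linear map, the query is assumed to arrive as a fixed-length embedding, region features are assumed to be precomputed, and the names `generate_classifier`, `score_regions`, and `rank_regions` are hypothetical. The regressor branch and negative phrase augmentation are omitted.

```python
def generate_classifier(generator, text_embedding):
    """Map a query embedding to classifier weights.

    `generator` is a (feat_dim x text_dim) matrix given as a list of rows.
    It stands in for the learned detector generator (assumed linear here).
    """
    return [sum(g * t for g, t in zip(row, text_embedding)) for row in generator]


def score_regions(weights, region_features):
    """Score each region by the dot product of its feature with the weights."""
    return [sum(w * f for w, f in zip(weights, feat)) for feat in region_features]


def rank_regions(scores):
    """Return region indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy usage: a 2-d text embedding, 2-d region features, identity generator.
generator = [[1.0, 0.0],
             [0.0, 1.0]]
query_embedding = [1.0, 0.0]          # hypothetical embedding of the query
regions = [[0.2, 0.9], [0.8, 0.1]]    # hypothetical precomputed region features

weights = generate_classifier(generator, query_embedding)
scores = score_regions(weights, regions)
ranking = rank_regions(scores)
```

Because the classifier is produced from the query at search time, region features can be computed once per database image and reused for any query. The sketch scores every region exhaustively and does not show how the system reaches the reported retrieval speed.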


