Open-Vocabulary DETR with Conditional Matching

03/22/2022
by Yuhang Zang, et al.

Open-vocabulary object detection, the problem of detecting novel objects guided by natural language, has gained increasing attention from the community. Ideally, we would like to extend an open-vocabulary detector so that it can produce bounding box predictions from user inputs in the form of either natural language or an exemplar image. This offers great flexibility and a better user experience for human-computer interaction. To this end, we propose a novel open-vocabulary detector based on DETR, hence the name OV-DETR, which, once trained, can detect any object given its class name or an exemplar image. The biggest challenge in turning DETR into an open-vocabulary detector is that it is impossible to calculate the classification cost matrix of novel classes without access to their labeled images. To overcome this challenge, we formulate the learning objective as a binary matching objective between input queries (class name or exemplar image) and the corresponding objects, which learns a correspondence that generalizes to unseen queries during testing. For training, we condition the Transformer decoder on input embeddings obtained from a pre-trained vision-language model such as CLIP, in order to enable matching for both text and image queries. With extensive experiments on the LVIS and COCO datasets, we demonstrate that OV-DETR, the first end-to-end Transformer-based open-vocabulary detector, achieves non-trivial improvements over the current state of the art.
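The conditional binary matching idea can be illustrated with a minimal sketch. The abstract does not specify the cost function or how the conditioning is applied, so everything below is an assumption made for illustration: the conditional embedding (standing in for a CLIP text or image embedding) is simply added to each object query, the matching cost combines a negative "matches the query" probability with an L1 box distance, the weight `box_weight=5.0` is a placeholder, and a brute-force search replaces the Hungarian algorithm that DETR-style detectors normally use. The function names `condition_queries` and `binary_matching` are hypothetical.

```python
from itertools import permutations


def condition_queries(object_queries, cond_embedding):
    """Add the same conditional embedding to every object query (assumed scheme)."""
    return [[q + c for q, c in zip(query, cond_embedding)]
            for query in object_queries]


def l1_distance(box_a, box_b):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def binary_matching(match_probs, pred_boxes, gt_boxes, box_weight=5.0):
    """Assign each ground-truth box of the queried class to one prediction.

    match_probs[i] is the predicted probability that prediction i matches
    the conditional query (a binary label, not a per-class score).
    Brute force over assignments; only suitable for tiny examples and
    requires len(gt_boxes) <= len(pred_boxes).
    Returns a list of (prediction_index, ground_truth_index) pairs.
    """
    best_cost, best_pairs = float("inf"), []
    for pred_indices in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(
            -match_probs[i] + box_weight * l1_distance(pred_boxes[i], gt_boxes[j])
            for j, i in enumerate(pred_indices)
        )
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(i, j) for j, i in enumerate(pred_indices)]
    return best_pairs


# Toy example: three predictions, two ground-truth boxes for the queried class.
match_probs = [0.9, 0.2, 0.7]
pred_boxes = [[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.1, 0.1], [0.8, 0.8, 0.3, 0.3]]
gt_boxes = [[0.8, 0.8, 0.3, 0.3], [0.5, 0.5, 0.2, 0.2]]
pairs = binary_matching(match_probs, pred_boxes, gt_boxes)
```

Because the cost depends only on whether a prediction matches the given query, and not on a fixed set of class logits, the same matching procedure can in principle be applied to a class name or exemplar that was never seen during training.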


