Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

03/28/2022
by Yu Du, et al.

Recently, vision-language pre-training has shown great potential in open-vocabulary object detection, where detectors trained on base classes are designed to detect new classes. The class text embedding is first generated by feeding prompts to the text encoder of a pre-trained vision-language model. It is then used as the region classifier to supervise the training of a detector. The key to the success of this model is a proper prompt, which requires careful word tuning and ingenious design. To avoid laborious prompt engineering, several prompt representation learning methods have been proposed for the image classification task; however, they are only sub-optimal solutions when applied to the detection task. In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on a pre-trained vision-language model. Unlike previous classification-oriented methods, DetPro has two highlights: 1) a background interpretation scheme that includes proposals in the image background in prompt training; 2) a context grading scheme that separates proposals in the image foreground for tailored prompt training. We assemble DetPro with ViLD, a recent state-of-the-art open-world object detector, and conduct experiments on LVIS as well as transfer learning on the Pascal VOC, COCO, and Objects365 datasets. Experimental results show that DetPro outperforms the baseline ViLD in all settings, e.g., +3.4 APbox and +3.0 APmask improvements on the novel classes of LVIS. Code and models are available at https://github.com/dyabel/detpro.
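To make the "class text embedding as region classifier" idea concrete, the following is a minimal sketch, not the DetPro or ViLD implementation. It assumes that a region is scored by the cosine similarity between its embedding and each class text embedding, followed by a temperature-scaled softmax. The function names, the temperature value, and the toy 3-dimensional vectors are all hypothetical; in practice the text embeddings would come from the text encoder of a pre-trained vision-language model and the region embeddings from the detector.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def classify_region(region_embedding, class_text_embeddings, temperature=0.01):
    """Return class probabilities for one region proposal.

    Assumption: the classifier is cosine similarity against each class text
    embedding, divided by a temperature, then a softmax over classes.
    """
    region = l2_normalize(region_embedding)
    logits = []
    for text_embedding in class_text_embeddings:
        text = l2_normalize(text_embedding)
        cosine = sum(a * b for a, b in zip(region, text))
        logits.append(cosine / temperature)
    # Subtract the maximum logit for a numerically stable softmax.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for embeddings (hypothetical values, not model outputs).
class_names = ["cat", "dog", "bicycle"]
class_text_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
region_embedding = [0.9, 0.1, 0.0]

probs = classify_region(region_embedding, class_text_embeddings)
predicted = class_names[probs.index(max(probs))]
print(predicted, probs)
```

Because the classifier is defined only by the text embeddings, swapping in embeddings for new class names changes the label set without retraining the scoring function, which is what makes the open-vocabulary setting possible. Learning the prompt, as DetPro proposes, would amount to optimizing the inputs that produce `class_text_embeddings` rather than writing them by hand.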

research
11/27/2022

Learning Object-Language Alignments for Open-Vocabulary Object Detection

Existing object detection methods are bounded in a fixed-set vocabulary ...
research
05/12/2022

Localized Vision-Language Matching for Open-vocabulary Object Detection

In this work, we propose an open-world object detection method that, bas...
research
02/27/2023

Aligning Bag of Regions for Open-Vocabulary Object Detection

Pre-trained vision-language models (VLMs) learn to align vision and lang...
research
01/07/2022

Detecting Twenty-thousand Classes using Image-level Supervision

Current object detectors are limited in vocabulary size due to the small...
research
03/22/2022

Open-Vocabulary DETR with Conditional Matching

Open-vocabulary object detection, which is concerned with the problem of...
research
05/16/2023

Mobile User Interface Element Detection Via Adaptively Prompt Tuning

Recent object detection approaches rely on pretrained vision-language mo...
research
03/23/2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Open-vocabulary detection (OVD) is an object detection task aiming at de...
