Scaling Open-Vocabulary Object Detection

06/16/2023
by   Matthias Minderer, et al.

Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data. While detection training data can be expanded by using Web image-text pairs as weak supervision, this has not been done at scales comparable to image-level pretraining. Here, we scale up detection data with self-training, which uses an existing detector to generate pseudo-box annotations on image-text pairs. Major challenges in scaling self-training are the choice of label space, pseudo-annotation filtering, and training efficiency. We present the OWLv2 model and OWL-ST self-training recipe, which address these challenges. OWLv2 already surpasses the performance of previous state-of-the-art open-vocabulary detectors at comparable training scales (~10M examples). However, with OWL-ST, we can scale to over 1B examples, yielding a further large improvement: with an L/14 architecture, OWL-ST improves AP on LVIS rare classes, for which the model has seen no human box annotations, from 31.2 [...] training for open-world localization, similar to what has been seen for image classification and language modelling.
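The self-training loop described in the abstract (an existing detector produces pseudo-box annotations on image-text pairs, which are then filtered) can be illustrated with a minimal Python sketch. This is not the OWL-ST implementation: the `Detector` interface, the `caption_queries` label space (unique caption words), and the `min_score` threshold of 0.3 are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    box: Box
    label: str
    score: float


# Hypothetical detector interface: (image, text queries) -> detections.
Detector = Callable[[object, List[str]], List[Detection]]


def caption_queries(caption: str) -> List[str]:
    """Assumed label space: the unique lowercase words of the caption."""
    seen, queries = set(), []
    for word in caption.lower().split():
        word = word.strip(".,;:!?")
        if word and word not in seen:
            seen.add(word)
            queries.append(word)
    return queries


def pseudo_annotate(
    pairs: Iterable[Tuple[object, str]],
    detector: Detector,
    min_score: float = 0.3,  # assumed filtering threshold
) -> List[Tuple[object, List[Detection]]]:
    """Run the detector on each image-text pair and keep confident boxes."""
    annotated = []
    for image, caption in pairs:
        queries = caption_queries(caption)
        kept = [d for d in detector(image, queries) if d.score >= min_score]
        if kept:  # drop images with no surviving pseudo-annotations
            annotated.append((image, kept))
    return annotated
```

In this sketch, the choice of label space and the filtering rule are isolated in `caption_queries` and `min_score`, so either can be swapped out without touching the annotation loop; the resulting pairs would then serve as training data for a new detector.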


research
09/30/2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

We present F-VLM, a simple open-vocabulary object detection method built...
research
07/07/2022

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

Existing open-vocabulary object detectors typically enlarge their vocabu...
research
04/10/2023

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

This paper presents DetCLIPv2, an efficient and scalable training framew...
research
01/07/2022

Detecting Twenty-thousand Classes using Image-level Supervision

Current object detectors are limited in vocabulary size due to the small...
research
03/23/2023

Open-Vocabulary Object Detection using Pseudo Caption Labels

Recent open-vocabulary detection methods aim to detect novel objects by ...
research
04/12/2022

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

In this paper, we study the challenging instance-wise vision-language ta...
research
03/23/2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Open-vocabulary detection (OVD) is an object detection task aiming at de...
