Detecting Twenty-thousand Classes using Image-level Supervision

01/07/2022
by Xingyi Zhou, et al.

Current object detectors are limited in vocabulary size due to the small scale of detection datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as their datasets are larger and easier to collect. We propose Detic, which simply trains the classifiers of a detector on image classification data and thus expands the vocabulary of detectors to tens of thousands of concepts. Unlike prior work, Detic does not assign image labels to boxes based on model predictions, making it much easier to implement and compatible with a range of detection architectures and backbones. Our results show that Detic yields excellent detectors even for classes without box annotations. It outperforms prior work on both open-vocabulary and long-tail detection benchmarks. Detic provides a gain of 2.4 mAP for all classes and 8.3 mAP for novel classes on the open-vocabulary LVIS benchmark. On the standard LVIS benchmark, Detic reaches 41.7 mAP for all classes and 41.7 mAP for rare classes. For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without fine-tuning. Code is available at https://github.com/facebookresearch/Detic.
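To make the central idea concrete, here is a minimal sketch of supervising a detector's classifier with an image-level label without assigning that label based on model predictions. It assumes a prediction-independent rule, picking the largest proposal, and a binary cross-entropy loss on that proposal's class logits. The function names, the box format `(x1, y1, x2, y2)`, and the plain-list inputs are illustrative assumptions, not the API of the released code.

```python
import math

def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def largest_proposal_index(proposals):
    """Pick a proposal by size alone, without consulting class scores."""
    return max(range(len(proposals)), key=lambda i: box_area(proposals[i]))

def image_label_loss(proposals, class_logits, image_labels):
    """Mean binary cross-entropy between the chosen proposal's logits and the image labels.

    proposals:    list of (x1, y1, x2, y2) boxes (hypothetical format)
    class_logits: one list of per-class logits per proposal
    image_labels: set of class indices present in the image
    """
    logits = class_logits[largest_proposal_index(proposals)]
    loss = 0.0
    for c, z in enumerate(logits):
        target = 1.0 if c in image_labels else 0.0
        # Numerically stable BCE-with-logits: max(z, 0) - z * t + log(1 + exp(-|z|))
        loss += max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))
    return loss / len(logits)

# Three proposals, two classes; the image is labeled with class 0 only.
proposals = [(0, 0, 10, 10), (0, 0, 100, 80), (5, 5, 20, 20)]
uninformed = image_label_loss(proposals, [[0.0, 0.0]] * 3, {0})
confident = image_label_loss(proposals, [[0.0, 0.0], [5.0, -5.0], [0.0, 0.0]], {0})
```

Because the proposal is selected from box geometry only, the loss never depends on which box the classifier currently favors. In the usage above, `confident` is smaller than `uninformed` since the largest proposal scores the labeled class high and the other class low.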


