Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes

11/18/2021
by Mingfei Gao, et al.

Despite great progress in object detection, most existing methods are limited to a small set of object categories because of the tremendous human effort needed for instance-level bounding-box annotation. To alleviate this problem, recent open vocabulary and zero-shot detection methods attempt to detect object categories not seen during training. However, these approaches still rely on manually provided bounding-box annotations for a set of base classes. We propose an open vocabulary detection framework that can be trained without manually provided bounding-box annotations. Our method achieves this by leveraging the localization ability of pre-trained vision-language models to generate pseudo bounding-box labels that can be used directly to train object detectors. Experimental results on COCO, PASCAL VOC, Objects365 and LVIS demonstrate the effectiveness of our method. Specifically, our method outperforms the state-of-the-art (SOTA) methods trained with human-annotated bounding boxes by 3 ... source is not equipped with manual bounding-box labels. When utilizing the manual bounding-box labels as our baselines do, our method surpasses the SOTA by a large margin of 8 ...
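
The abstract says pseudo box labels come from the localization ability of a pre-trained vision-language model, but does not say here how a box is read off that model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the model is assumed to yield a 2-D activation map for one object word from a caption (for example a Grad-CAM style map), candidate boxes are assumed to come from some proposal generator, and each proposal is scored by the activation mass inside it divided by the square root of its area. The names `box_score` and `pseudo_label`, the scoring rule, and the toy inputs are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A box is (x0, y0, x1, y1) in activation-map cells, half-open:
# it covers columns x0..x1-1 and rows y0..y1-1.
Box = Tuple[int, int, int, int]


def box_score(activation: Sequence[Sequence[float]], box: Box) -> float:
    """Activation mass inside `box`, divided by sqrt(box area).

    Assumed scoring rule (illustrative only): dividing by sqrt(area)
    rather than by area stops a tiny box on the peak from always
    winning, while still penalizing boxes that cover the whole image.
    """
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0  # degenerate box: nothing to score
    mass = sum(activation[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return mass / math.sqrt(area)


def pseudo_label(
    activation: Sequence[Sequence[float]],
    proposals: List[Box],
    word: str,
) -> Optional[Tuple[str, Box]]:
    """Pick the best-scoring proposal as the pseudo box for `word`.

    Returns None when there are no proposals to choose from.
    """
    if not proposals:
        return None
    best = max(proposals, key=lambda b: box_score(activation, b))
    return (word, best)


if __name__ == "__main__":
    # Toy 4x4 map for the word "dog": activation concentrated top-left.
    activation = [
        [0.9, 0.8, 0.0, 0.0],
        [0.7, 0.9, 0.1, 0.0],
        [0.0, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    proposals = [(0, 0, 2, 2), (0, 0, 4, 4), (2, 2, 4, 4), (0, 0, 1, 1)]
    print(pseudo_label(activation, proposals, "dog"))  # ('dog', (0, 0, 2, 2))
```

In the toy example the tight 2x2 box scores 3.3 / 2 = 1.65, beating both the single peak cell (0.9) and the full image (3.5 / 4 = 0.875). In a complete pipeline, the resulting (word, box) pairs would presumably be collected over a captioned image corpus and fed to a detector as ordinary box annotations; that wiring is outside this sketch.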

research
11/20/2020

Open-Vocabulary Object Detection Using Captions

Despite the remarkable accuracy of deep neural networks in object detect...
research
07/02/2020

Iterative Bounding Box Annotation for Object Detection

Manual annotation of bounding boxes for object detection in digital imag...
research
07/16/2018

Leveraging Pre-Trained 3D Object Detection Models For Fast Ground Truth Generation

Training 3D object detectors for autonomous driving has been limited to ...
research
03/03/2020

Towards Noise-resistant Object Detection with Noisy Annotations

Training deep object detectors requires significant amount of human-anno...
research
03/23/2023

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Open-vocabulary detection (OVD) is an object detection task aiming at de...
research
04/22/2019

Detecting retail products in situ using CNN without human effort labeling

CNN is a powerful tool for many computer vision tasks, achieving much be...
research
12/10/2018

EDF: Ensemble, Distill, and Fuse for Easy Video Labeling

We present a way to rapidly bootstrap object detection on unseen videos ...