Read, look and detect: Bounding box annotation from image-caption pairs

06/09/2023
by Eduardo Hugo Sanchez, et al.

Various methods have been proposed to detect objects while reducing the cost of data annotation. For instance, weakly supervised object detection (WSOD) methods rely only on image-level annotations during training. Unfortunately, data annotation remains expensive, since annotators must provide the categories describing the content of each image, and labeling is restricted to a fixed set of categories. In this paper, we propose a method to locate and label objects in an image using an even weaker form of supervision: image-caption pairs. By leveraging recent advances in vision-language (VL) models and self-supervised vision transformers (ViTs), our method performs phrase grounding and object detection in a weakly supervised manner. Our experiments demonstrate the effectiveness of our approach, which achieves a phrase grounding score of 47.51 on Flickr30k Entities and establishes a new state of the art in object detection with 21.1 mAP50 and 10.5 mAP50:95 on MS COCO when relying exclusively on image-caption pairs.

