Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

07/23/2019
by   Keren Ye, et al.

Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised object detection (WSOD) methods relax the requirement from boxes to image-level annotations, even cheaper supervision is naturally available in the form of unstructured textual descriptions that users may freely provide when uploading image content. However, straightforward approaches to using such data for WSOD wastefully discard captions that do not exactly match object names. Instead, we show how to squeeze the most information out of these captions by training a text-only classifier that generalizes beyond dataset boundaries. Our discovery provides an opportunity for learning detection models from noisy but more abundant and freely available caption data. We also validate our model on three classic object detection benchmarks and achieve state-of-the-art WSOD performance.
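To make the exact-match limitation concrete, the following is a minimal sketch of the kind of straightforward baseline the abstract criticizes, not the authors' method. The class list, the function name `exact_match_labels`, and the example captions are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical label vocabulary (an assumption); real benchmarks define their own class lists.
CLASS_NAMES = ["person", "dog", "bicycle", "car"]


def exact_match_labels(caption, class_names=CLASS_NAMES):
    """Return the class names that appear verbatim as whole words in the caption."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    return [name for name in class_names if name in tokens]


# Illustrative captions (made up for this sketch).
captions = [
    "A dog chasing a car down the street",
    "A puppy sleeping next to a man on a bike",
]
for caption in captions:
    print(caption, "->", exact_match_labels(caption))
```

The second caption plausibly describes a dog, a person, and a bicycle, yet exact matching yields no labels for it, so the image would contribute nothing to training. A learned text classifier, as proposed in the abstract, is meant to recover labels from captions like this one.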


