Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection

05/09/2022
by Weixin Feng, et al.

Multimodal supervision has achieved promising results in many visual language understanding tasks, where language plays an essential role as a hint or context for recognizing and locating instances. However, due to the defects of human-annotated language corpora, multimodal supervision remains unexplored in fully supervised object detection scenarios. In this paper, we take advantage of language prompts to introduce effective and unbiased linguistic supervision into object detection, and propose a new mechanism called multimodal knowledge learning (MKL), which is designed to learn knowledge from language supervision. Specifically, we design prompts and fill them with the bounding box annotations to generate descriptions containing extensive hints and context for instance recognition and localization. The knowledge from language is then distilled into the detection model by maximizing cross-modal mutual information at both the image level and the object level. Moreover, the generated descriptions are manipulated to produce hard negatives that further boost detector performance. Extensive experiments demonstrate that the proposed method yields a consistent performance gain of 1.6% to 2.1% and achieves state-of-the-art results on the MS-COCO and OpenImages datasets.
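
The prompt-filling and hard-negative steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the actual prompt templates or the manipulation rule, so the `TEMPLATE` string, the `describe` and `hard_negative` helpers, and the category-swap strategy are hypothetical stand-ins, not the authors' method.

```python
from collections import Counter

# Hypothetical template; the abstract does not state the paper's real prompts.
TEMPLATE = "A photo containing {objects}."


def describe(annotations):
    """Fill the template from (category, box) annotations of one image."""
    counts = Counter(category for category, _box in annotations)
    # Sort by category name so the description is deterministic.
    parts = [f"{name} (x{n})" for name, n in sorted(counts.items())]
    return TEMPLATE.format(objects=", ".join(parts))


def hard_negative(annotations, vocabulary):
    """Assumed manipulation: swap one present category for an absent one,
    producing a description that is plausible but wrong for the image."""
    present = {category for category, _box in annotations}
    absent = sorted(set(vocabulary) - present)
    if not annotations or not absent:
        return None  # nothing to swap
    target = annotations[0][0]
    swapped = [(absent[0] if c == target else c, b) for c, b in annotations]
    return describe(swapped)


# Toy annotations: (category, (x1, y1, x2, y2)) pairs for a single image.
annotations = [
    ("dog", (10, 20, 110, 220)),
    ("dog", (150, 30, 260, 240)),
    ("frisbee", (120, 40, 160, 80)),
]
vocabulary = ["cat", "dog", "frisbee"]

positive = describe(annotations)
negative = hard_negative(annotations, vocabulary)
print(positive)
print(negative)
```

In a full pipeline, the positive and negative descriptions would be encoded by a text model and contrasted against image and object features; that part is not sketched here because the abstract gives no detail beyond "maximizing cross-modal mutual information".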

research
11/19/2019

Tell Me What They're Holding: Weakly-supervised Object Detection with Transferable Knowledge from Human-object Interaction

In this work, we introduce a novel weakly supervised object detection (W...
research
10/12/2022

BoxMask: Revisiting Bounding Box Supervision for Video Object Detection

We present a new, simple yet effective approach to uplift video object d...
research
03/03/2020

Towards Noise-resistant Object Detection with Noisy Annotations

Training deep object detectors requires significant amount of human-anno...
research
08/08/2020

Assisting Scene Graph Generation with Self-Supervision

Research in scene graph generation has quickly gained traction in the pa...
research
11/08/2022

Detecting Euphemisms with Literal Descriptions and Visual Imagery

This paper describes our two-stage system for the Euphemism Detection sh...
research
07/23/2019

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

Learning to localize and name object instances is a fundamental problem ...
research
04/03/2018

Transferring Common-Sense Knowledge for Object Detection

We propose the idea of transferring common-sense knowledge from source c...
