Weakly Supervised Attention Learning for Textual Phrases Grounding

05/01/2018
by Zhiyuan Fang, et al.

Grounding textual phrases in visual content is a meaningful yet challenging problem with potential applications such as image-text inference and text-driven multimedia interaction. Most existing methods rely on supervised learning, which requires pixel-level ground truth during training. However, fine-grained ground-truth annotation is time-consuming and severely narrows the scope for more general applications. In this extended abstract, we explore methods to flexibly localize image regions from a top-down signal (in the form of a one-hot label or natural language) with a weakly supervised attention learning mechanism. Our model uses two types of modules: a backbone module that captures visual features, and an attentive module that generates attention maps based on regularized bilinear pooling. The model is built end to end and trained by encouraging the spatial attention map to shift and focus on the region whose visual features best match the top-down signal. We demonstrate preliminary yet promising results on a testbed synthesized from multi-label MNIST data.
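To make the attentive module more concrete, here is a minimal sketch of how a spatial attention map could be computed from a bilinear interaction between visual features and a top-down signal. It is an illustration under stated assumptions, not the authors' implementation: the function name `bilinear_attention`, the weight matrix, the softmax normalization, and the toy values are all hypothetical, and the regularization and weakly supervised training objective mentioned in the abstract are not specified there and are omitted.

```python
import math


def bilinear_attention(features, signal, weight):
    """Forward pass of a hypothetical bilinear attention step.

    features: list of N spatial positions, each a list of C floats
              (stands in for the backbone's feature map, flattened).
    signal:   list of D floats (e.g. a one-hot label or a text embedding).
    weight:   C x D matrix as a list of lists (assumed bilinear parameters).
    Returns (attention, attended): attention weights over the N positions
    and the attention-weighted feature vector of length C.
    """
    # Project the top-down signal into the visual feature space: p = W q.
    projected = [sum(w * q for w, q in zip(row, signal)) for row in weight]

    # Bilinear matching score per position: s_i = f_i^T W q.
    scores = [sum(f * p for f, p in zip(feat, projected)) for feat in features]

    # Softmax over spatial positions (max subtracted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Pool the features with the attention weights.
    channels = len(features[0])
    attended = [
        sum(a * feat[c] for a, feat in zip(attention, features))
        for c in range(channels)
    ]
    return attention, attended


# Toy example: a 2x2 grid flattened to 4 positions with 3 channels each.
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
signal = [0.0, 1.0]  # one-hot label selecting the second class
weight = [
    [2.0, 0.0],
    [0.0, 2.0],
    [0.0, 0.0],
]
attention, attended = bilinear_attention(features, signal, weight)
```

In this toy setup the second and fourth positions carry the channel that the assumed weights associate with the selected label, so they receive the largest and equal attention weights. In a trained model the weights would be learned rather than set by hand.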

research
12/07/2018

PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding

Phrase Grounding aims to detect and localize objects in images that are ...
research
04/07/2019

Modularized Textual Grounding for Counterfactual Resilience

Computer Vision applications often require a textual grounding module wi...
research
05/10/2022

Weakly-supervised segmentation of referring expressions

Visual grounding localizes regions (boxes or segments) in the image corr...
research
05/03/2017

Weakly-supervised Visual Grounding of Phrases with Linguistic Structures

We propose a weakly-supervised approach that takes image-sentence pairs ...
research
09/10/2021

Panoptic Narrative Grounding

This paper proposes Panoptic Narrative Grounding, a spatially fine and g...
research
06/24/2023

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

Instead of relying on human-annotated training samples to build a classi...
research
05/23/2018

Neural Network Interpretation via Fine Grained Textual Summarization

Current visualization based network interpretation methods suffer from la...
