Modularized Textual Grounding for Counterfactual Resilience

04/07/2019
by Zhiyuan Fang et al.

Computer vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries. To achieve high grounding precision, current textual grounding methods rely heavily on large-scale training data with manual pixel-level annotations. Such annotations are expensive to obtain and thus severely narrow the models' scope of real-world applications. Moreover, most of these methods sacrifice interpretability and generalizability, and they neglect the importance of being resilient to counterfactual inputs. To address these issues, we propose a visual grounding system that is 1) end-to-end trainable in a weakly supervised fashion with only image-level annotations, and 2) counterfactually resilient owing to its modular design. Specifically, we decompose textual descriptions into three levels (entity, semantic attribute, and color information) and perform compositional grounding progressively. We validate our model through a series of experiments and demonstrate its improvement over state-of-the-art methods. In particular, our model not only outperforms other weakly supervised and unsupervised methods and even approaches strongly supervised ones, but is also interpretable for decision making and performs much better than all the others when faced with counterfactual classes.
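The abstract does not say how a description is split into the three levels or how the per-level results are combined. The following is a minimal sketch of the general idea only, assuming (hypothetically) a keyword lookup for the entity / semantic attribute / color decomposition and elementwise multiplication of per-module heatmaps, renormalized after each step, for progressive composition. The vocabularies, `decompose`, `compose`, and `argmax_cell` are illustrative names, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical vocabularies for illustration only; the paper's parsing is not given here.
STOPWORDS = {"a", "an", "the"}
COLORS = {"red", "blue", "green", "white", "black", "yellow"}
ATTRIBUTES = {"young", "old", "standing", "sitting", "smiling", "tall"}

Heatmap = List[List[float]]


@dataclass
class Query:
    entity: Optional[str] = None
    attributes: List[str] = field(default_factory=list)
    colors: List[str] = field(default_factory=list)


def decompose(phrase: str) -> Query:
    """Split a phrase into entity / semantic-attribute / color words.

    Assumption: any word that is not a stopword, color, or attribute is
    treated as the entity; if several such words appear, the last one wins.
    """
    q = Query()
    for word in phrase.lower().split():
        if word in STOPWORDS:
            continue
        if word in COLORS:
            q.colors.append(word)
        elif word in ATTRIBUTES:
            q.attributes.append(word)
        else:
            q.entity = word
    return q


def compose(maps: List[Heatmap]) -> Heatmap:
    """Progressively refine the first (entity) heatmap with each later map.

    Each step multiplies elementwise, then rescales so the peak is 1.0.
    """
    result = [row[:] for row in maps[0]]
    for m in maps[1:]:
        result = [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(result, m)]
        peak = max(max(row) for row in result)
        if peak > 0:
            result = [[v / peak for v in row] for row in result]
    return result


def argmax_cell(h: Heatmap) -> Tuple[int, int]:
    """Return the (row, col) of the highest-scoring cell."""
    cells = ((i, j) for i in range(len(h)) for j in range(len(h[0])))
    return max(cells, key=lambda ij: h[ij[0]][ij[1]])


if __name__ == "__main__":
    q = decompose("a red young man")
    print(q)
    # Hypothetical module outputs: the entity map fires on two people,
    # the color map singles out the region that is red.
    entity_map = [[0.0, 0.9, 0.1], [0.0, 0.1, 0.8]]
    color_map = [[0.2, 0.1, 0.1], [0.1, 0.2, 0.9]]
    print(argmax_cell(compose([entity_map, color_map])))
```

In this toy setup the entity map alone prefers cell (0, 1), while the composed map selects (1, 2) because the color module suppresses the non-red candidate. This is one way a modular design can make each refinement step inspectable.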
