Fine-Grained Visual Entailment

03/29/2022
by Christopher Thomas, et al.

Visual entailment is a recently proposed multimodal reasoning task where the goal is to predict the logical relationship of a piece of text to an image. In this paper, we propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image. Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity. Because we lack fine-grained labels to train our method, we propose a novel multi-instance learning approach which learns a fine-grained labeling using only sample-level supervision. We also impose novel semantic structural constraints which ensure that fine-grained predictions are internally semantically consistent. We evaluate our method on a new dataset of manually annotated knowledge elements and show that it achieves 68.18% accuracy on this challenging task while significantly outperforming several strong baselines. Finally, we present extensive qualitative results illustrating our method's predictions and the visual evidence it relied on. Our code and annotated dataset can be found here: https://github.com/SkrighYZ/FGVE.
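
To make the multi-instance learning idea more concrete, here is a minimal sketch of how per-knowledge-element predictions could be combined into a sample-level prediction, so that only the sample-level label is needed for training. It is not taken from the paper or its repository. The aggregation rule (a sample is entailed only if every element is entailed, and contradicted if any element is contradicted), the independence assumption between elements, and the names `aggregate_sample_probs` and `sample_nll` are all illustrative assumptions.

```python
import math
from typing import List, Sequence

# Label order assumed throughout this sketch (hypothetical convention).
LABELS = ("entailment", "neutral", "contradiction")


def aggregate_sample_probs(element_probs: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-element label probabilities into sample-level probabilities.

    Each inner sequence is (p_entailment, p_neutral, p_contradiction) for one
    knowledge element and is assumed to sum to 1. Elements are treated as
    independent, which is a simplifying assumption of this sketch.
    """
    p_all_entailed = 1.0
    p_none_contradicted = 1.0
    for p_ent, _p_neu, p_con in element_probs:
        p_all_entailed *= p_ent
        p_none_contradicted *= 1.0 - p_con
    # Contradiction: at least one element is contradicted.
    p_contradiction = 1.0 - p_none_contradicted
    # Neutral: no element is contradicted, but not every element is entailed.
    p_neutral = p_none_contradicted - p_all_entailed
    return [p_all_entailed, p_neutral, p_contradiction]


def sample_nll(element_probs: Sequence[Sequence[float]], label_index: int) -> float:
    """Negative log-likelihood of the sample-level label under the aggregation."""
    probs = aggregate_sample_probs(element_probs)
    eps = 1e-12  # guards against log(0)
    return -math.log(max(probs[label_index], eps))


if __name__ == "__main__":
    # Two knowledge elements: one likely entailed, one likely contradicted.
    elements = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.5]]
    for name, p in zip(LABELS, aggregate_sample_probs(elements)):
        print(f"{name}: {p:.2f}")
```

Under this rule, minimizing `sample_nll` against the sample-level label pushes gradient signal to the individual elements even though none of them is labeled directly, which is the general shape of a multi-instance learning objective.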

research
01/20/2019

Visual Entailment: A Novel Task for Fine-Grained Image Understanding

Existing visual reasoning datasets such as Visual Question Answering (VQ...
research
03/27/2022

Knowledge Mining with Scene Text for Fine-Grained Recognition

Recently, the semantics of scene text has been proven to be essential in...
research
04/29/2020

Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection

We study the detection of propagandistic text fragments in news articles...
research
11/05/2021

The Curious Layperson: Fine-Grained Image Recognition without Expert Labels

Most of us are not experts in specific fields, such as ornithology. None...
research
12/12/2022

Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments

The spread of misinformation, propaganda, and flawed argumentation has b...
research
12/05/2022

Decoding natural image stimuli from fMRI data with a surface-based convolutional network

Due to the low signal-to-noise ratio and limited resolution of functiona...
research
04/14/2023

CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction

Structured reconstruction is a non-trivial dense prediction problem, whi...
