Visual Entailment Task for Visually-Grounded Language Learning

11/26/2018
by Ning Xie, et al.

We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) in that the premise is an image rather than a natural language sentence. We propose SNLI-VE, a novel dataset for VE built from the Stanford Natural Language Inference (SNLI) corpus and Flickr30K. We also introduce the Explainable Visual Entailment model (EVE), a fully differentiable architecture for the VE problem. EVE and several state-of-the-art visual question answering (VQA) models are evaluated on SNLI-VE, facilitating grounded language understanding and providing insight into how modern VQA models perform on this task.
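SNLI's premises are themselves Flickr30K captions, so SNLI-VE can be assembled by replacing each text premise with the image that caption describes, while the hypothesis and the three-way label (entailment, neutral, contradiction) carry over unchanged. The sketch below illustrates that pairing; the field names and the caption-to-image mapping are illustrative assumptions, not the authors' released format.

# Minimal sketch of an SNLI-VE example (assumed field names, not the official release format).
from dataclasses import dataclass
from typing import Literal

Label = Literal["entailment", "neutral", "contradiction"]

@dataclass
class VEExample:
    image_id: str    # Flickr30K image that serves as the premise
    hypothesis: str  # natural language sentence judged against the image
    label: Label     # three-way gold label inherited from SNLI

def snli_to_ve(snli_example: dict, caption_to_image: dict) -> VEExample:
    """Re-anchor an SNLI pair: drop the text premise (a Flickr30K caption)
    and use the image that caption describes as the new premise."""
    return VEExample(
        image_id=caption_to_image[snli_example["premise"]],
        hypothesis=snli_example["hypothesis"],
        label=snli_example["gold_label"],
    )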


Related research

01/20/2019 - Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Existing visual reasoning datasets such as Visual Question Answering (VQ...

08/21/2015 - A large annotated corpus for learning natural language inference
Understanding entailment and contradiction is fundamental to understandi...

10/09/2017 - Natural Language Inference from Multiple Premises
We define a novel textual entailment task that requires inference over m...

11/16/2022 - AlignVE: Visual Entailment Recognition Based on Alignment Relations
Visual entailment (VE) is to recognize whether the semantics of a hypoth...

06/23/2019 - Investigating Biases in Textual Entailment Datasets
The ability to understand logical relationships between sentences is an ...

04/16/2021 - VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Neural module networks (NMN) have achieved success in image-grounded tas...

05/26/2023 - Entailment as Robust Self-Learner
Entailment has been recognized as an important metric for evaluating nat...
