Multimodal Logical Inference System for Visual-Textual Entailment

06/10/2019
by Riko Suzuki, et al.

A large body of research on multimodal inference across text and vision has recently been developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.
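The underlying idea is that both the image (via structured annotations translated into logical form) and the sentence (via semantic parsing) are mapped to first-order logic formulas, and entailment is then decided by a theorem prover. The following is a minimal sketch of that idea using NLTK's resolution prover as a stand-in prover; the premise and hypothesis formulas are hand-written here for illustration rather than produced by the paper's actual parsing pipeline, and the lexical axioms (e.g., man implies person) are assumptions added for the example.

    # Sketch: logic-based visual-textual entailment with a generic FOL prover.
    # Premises stand in for an image description; the hypothesis stands in
    # for a sentence parsed into first-order logic.
    from nltk.sem import Expression
    from nltk.inference import ResolutionProver

    read_expr = Expression.fromstring

    # Premise: FOL description of an image, e.g. "a man is riding a bicycle".
    image_premises = [
        read_expr(r'exists x y.(man(x) & bicycle(y) & ride(x,y))'),
    ]

    # Illustrative background axioms (assumptions for this example):
    # riding something implies being on it, and a man is a person.
    axioms = [
        read_expr(r'all x y.(ride(x,y) -> on(x,y))'),
        read_expr(r'all x.(man(x) -> person(x))'),
    ]

    # Hypothesis from the sentence "a person is on a bicycle".
    hypothesis = read_expr(r'exists x y.(person(x) & bicycle(y) & on(x,y))')

    # Entailment holds if the hypothesis is provable from premises + axioms.
    entailed = ResolutionProver().prove(hypothesis, image_premises + axioms)
    print('Entailment:', entailed)  # True under these assumptions

In this setup, rejecting an entailment simply corresponds to the prover failing to find a proof; the paper's system applies the same prove-or-fail scheme to formulas derived automatically from images and sentences.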

Related research

04/04/2020 · Evaluating Multimodal Representations on Visual Semantic Textual Similarity
The combination of visual and textual representations has produced excel...

06/14/2018 · Grounded Textual Entailment
Capturing semantic relations between sentences, such as entailment, is a...

06/27/2021 · Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
This paper introduces a new video-and-language dataset with human action...

04/19/2018 · Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning
In formal logic-based approaches to Recognizing Textual Entailment (RTE)...

11/20/2019 · Paraphrasing Verbs for Noun Compound Interpretation
An important challenge for the automatic analysis of English written tex...

04/16/2019 · Unsupervised Discovery of Multimodal Links in Multi-Image, Multi-Sentence Documents
Images and text co-occur everywhere on the web, but explicit links betwe...

05/01/2021 · Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser
While many NLP papers, tasks and pipelines assume raw, clean texts, many...
