Multimodal Logical Inference System for Visual-Textual Entailment

06/10/2019
by Riko Suzuki, et al.

A large body of research on multimodal inference across text and vision has recently been developed to obtain visually grounded word and sentence representations. In this paper, we use logic-based representations as unified meaning representations for texts and images and present an unsupervised multimodal logical inference system that can effectively prove entailment relations between them. We show that by combining semantic parsing and theorem proving, the system can handle semantically complex sentences for visual-textual inference.
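The underlying idea is that both the image (via structured annotations translated into logical form) and the sentence (via semantic parsing) are mapped to first-order logic formulas, and entailment is then decided by a theorem prover. The following is a minimal sketch of that idea using NLTK's resolution prover as a stand-in prover; the premise and hypothesis formulas are hand-written here for illustration rather than produced by the paper's actual parsing pipeline, and the lexical axioms (e.g., man implies person) are assumptions added for the example.

    # Sketch: logic-based visual-textual entailment with a generic FOL prover.
    # Premises stand in for an image description; the hypothesis stands in
    # for a sentence parsed into first-order logic.
    from nltk.sem import Expression
    from nltk.inference import ResolutionProver

    read_expr = Expression.fromstring

    # Premise: FOL description of an image, e.g. "a man is riding a bicycle".
    image_premises = [
        read_expr(r'exists x y.(man(x) & bicycle(y) & ride(x,y))'),
    ]

    # Illustrative background axioms (assumptions for this example):
    # riding something implies being on it, and a man is a person.
    axioms = [
        read_expr(r'all x y.(ride(x,y) -> on(x,y))'),
        read_expr(r'all x.(man(x) -> person(x))'),
    ]

    # Hypothesis from the sentence "a person is on a bicycle".
    hypothesis = read_expr(r'exists x y.(person(x) & bicycle(y) & on(x,y))')

    # Entailment holds if the hypothesis is provable from premises + axioms.
    entailed = ResolutionProver().prove(hypothesis, image_premises + axioms)
    print('Entailment:', entailed)  # True under these assumptions

In this setup, rejecting an entailment simply corresponds to the prover failing to find a proof; the paper's system applies the same prove-or-fail scheme to formulas derived automatically from images and sentences.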

Related research

04/04/2020 · Evaluating Multimodal Representations on Visual Semantic Textual Similarity
The combination of visual and textual representations has produced excel...

06/14/2018 · Grounded Textual Entailment
Capturing semantic relations between sentences, such as entailment, is a...

06/27/2021 · Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
This paper introduces a new video-and-language dataset with human action...

04/19/2018 · Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning
In formal logic-based approaches to Recognizing Textual Entailment (RTE)...

11/20/2019 · Paraphrasing Verbs for Noun Compound Interpretation
An important challenge for the automatic analysis of English written tex...

04/16/2019 · Unsupervised Discovery of Multimodal Links in Multi-Image, Multi-Sentence Documents
Images and text co-occur everywhere on the web, but explicit links betwe...

05/01/2021 · Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser
While many NLP papers, tasks and pipelines assume raw, clean texts, many...
