Premise-based Multimodal Reasoning: A Human-like Cognitive Process

05/15/2021
by Qingxiu Dong, et al.

Reasoning is one of the major challenges for human-like AI and has recently attracted intensive attention from natural language processing (NLP) researchers. However, cross-modal reasoning still needs further research. We observe that most cross-modal reasoning methods fall into shallow feature matching rather than in-depth, human-like reasoning. The reason is that existing cross-modal tasks ask questions directly about an image. In real scenes, however, human reasoning is often carried out under specific background information, a process studied by the ABC theory in social psychology. We propose a shared task named "Premise-based Multimodal Reasoning" (PMR), which requires participating models to reason after establishing a profound understanding of background information. We believe that the proposed PMR task will contribute to, and help shed light on, human-like in-depth reasoning.

research
09/05/2023

A Survey on Interpretable Cross-modal Reasoning

In recent years, cross-modal reasoning (CMR), the process of understandi...
research
04/15/2022

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

Visual Dialog is a challenging vision-language task since the visual dia...
research
01/26/2023

Multimodal Event Transformer for Image-guided Story Ending Generation

Image-guided story ending generation (IgSEG) is to generate a story endi...
research
07/11/2022

Cross-modal Prototype Driven Network for Radiology Report Generation

Radiology report generation (RRG) aims to describe automatically a radio...
research
10/13/2020

Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!

Modeling expressive cross-modal interactions seems crucial in multimodal...
research
06/12/2018

Attentive cross-modal paratope prediction

Antibodies are a critical part of the immune system, having the function...
research
11/09/2020

Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze

When speakers describe an image, they tend to look at objects before men...