DeepAI AI Chat
Log In Sign Up

Towards Solving Multimodal Comprehension

04/20/2021
by   Pritish Sahu, et al.
16

This paper targets the problem of procedural multimodal machine comprehension (M3C). This task requires an AI to comprehend given steps of multimodal instructions and then answer questions. Compared to vanilla machine comprehension tasks where an AI is required only to understand a textual input, procedural M3C is more challenging as the AI needs to comprehend both the temporal and causal factors along with multimodal inputs. Recently Yagcioglu et al. [35] introduced RecipeQA dataset to evaluate M3C. Our first contribution is the introduction of two new M3C datasets- WoodworkQA and DecorationQA with 16K and 10K instructional procedures, respectively. We then evaluate M3C using a textual cloze style question-answering task and highlight an inherent bias in the question answer generation method from [35] that enables a naive baseline to cheat by learning from only answer choices. This naive baseline performs similar to a popular method used in question answering- Impatient Reader [6] that uses attention over both the context and the query. We hypothesized that this naturally occurring bias present in the dataset affects even the best performing model. We verify our proposed hypothesis and propose an algorithm capable of modifying the given dataset to remove the bias elements. Finally, we report our performance on the debiased dataset with several strong baselines. We observe that the performance of all methods falls by a margin of 8 after correcting for the bias. We hope these datasets and the analysis will provide valuable benchmarks and encourage further research in this area.

READ FULL TEXT

page 3

page 5

page 7

04/25/2020

MCQA: Multimodal Co-attention Based Network for Question Answering

We present MCQA, a learning-based algorithm for multimodal question answ...
09/17/2021

CodeQA: A Question Answering Dataset for Source Code Comprehension

We propose CodeQA, a free-form question answering dataset for the purpos...
04/18/2017

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

We publicly release a new large-scale dataset, called SearchQA, for mach...
07/04/2021

Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model

While Machine Comprehension (MC) has attracted extensive research intere...
06/05/2017

A Joint Model for Question Answering and Question Generation

We propose a generative machine comprehension model that learns jointly ...
01/11/2023

Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering

We present a new pre-training method, Multimodal Inverse Cloze Task, for...
03/04/2016

Text Understanding with the Attention Sum Reader Network

Several large cloze-style context-question-answer datasets have been int...