Holistic Multi-modal Memory Network for Movie Question Answering

11/12/2018
by Anran Wang, et al.

Answering questions according to multi-modal context is a challenging problem, as it requires a deep integration of different data sources. Existing approaches employ only partial interactions among data sources in one attention hop. In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework, which fully considers the interactions between different input sources (multi-modal context, question) in each hop. In addition, it takes answer choices into consideration during the context retrieval stage. The proposed framework therefore effectively integrates multi-modal context, question, and answer information, so that more informative context is retrieved for question answering. Our HMMN framework achieves state-of-the-art accuracy on the MovieQA dataset. Extensive ablation studies show the importance of holistic reasoning and the contributions of different attention strategies.
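To make the idea of answer-aware, multi-hop context retrieval more concrete, here is a minimal sketch in plain Python. It is not the HMMN architecture from the paper: the toy vectors, the function names (`attend`, `score_answers`), the choice to form the query by summing question and answer vectors, and the residual query update after each hop are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(query, memory):
    """One attention hop: weight each memory slot by its similarity to the query."""
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(query)
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)]


def score_answers(context, question, answers, hops=2):
    """Score answer choices, letting each choice influence which context is retrieved."""
    scores = []
    for answer in answers:
        # Assumption: the query combines question and answer by simple addition.
        query = [q + a for q, a in zip(question, answer)]
        retrieved = [0.0] * len(question)
        for _ in range(hops):
            retrieved = attend(query, context)
            # Assumption: residual update of the query after each hop.
            query = [x + r for x, r in zip(query, retrieved)]
        # Score the answer by its similarity to the context retrieved for it.
        scores.append(dot(retrieved, answer))
    return softmax(scores)


# Toy example: two context slots support the first answer, one supports the second.
context = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
question = [0.5, 0.5]
answers = [[1.0, 0.0], [0.0, 1.0]]
probs = score_answers(context, question, answers)
```

In this toy setup the first answer choice ends up with the higher probability, because the context it retrieves is dominated by the two slots that match it. A real model would use learned embeddings for subtitles, video frames, questions, and answers rather than hand-written vectors.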


