Progressive Attention Memory Network for Movie Story Question Answering

04/18/2019
by Junyeong Kim, et al.

This paper proposes the progressive attention memory network (PAMN) for movie story question answering (QA). Movie story QA is more challenging than visual question answering (VQA) in two respects: (1) pinpointing the temporal parts relevant to answering the question is difficult, as movies are typically longer than an hour, and (2) movies contain both video and subtitles, and different questions require different modalities to infer the answer. To overcome these challenges, PAMN has three main features: (1) a progressive attention mechanism that uses cues from both the question and the candidate answers to progressively prune irrelevant temporal parts from memory, (2) dynamic modality fusion that adaptively determines the contribution of each modality to answering the current question, and (3) a belief correction answering scheme that successively corrects the prediction score of each candidate answer. Experiments on the publicly available benchmark datasets MovieQA and TVQA demonstrate that each feature contributes to the movie story QA architecture and improves performance, achieving state-of-the-art results. A qualitative analysis that visualizes the inference mechanism of PAMN is also provided.
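To make the idea of dynamic modality fusion more concrete, here is a minimal sketch of one way a per-question weighting between video and subtitle features could work. It is not the formulation used in PAMN, which the abstract does not specify: the dot-product scoring, the softmax weighting, and the names `fuse_modalities`, `softmax`, and `dot` are all assumptions introduced for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_modalities(question, video, subtitle):
    """Weight each modality by its (assumed) relevance to the question.

    Assumption: relevance is the dot product between the question vector
    and each modality vector. The abstract does not state how PAMN
    scores modalities, so this is only an illustrative stand-in.
    """
    weights = softmax([dot(question, video), dot(question, subtitle)])
    # Convex combination of the two modality vectors.
    fused = [weights[0] * v + weights[1] * s for v, s in zip(video, subtitle)]
    return weights, fused


# Toy example: the question vector aligns with the video feature,
# so the video modality receives the larger weight.
weights, fused = fuse_modalities([1.0, 0.0], [2.0, 0.0], [0.0, 2.0])
```

In this toy example the weights sum to one and the video modality dominates, which mirrors the intuition that different questions lean on different modalities.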

research
03/29/2018

Motion-Appearance Co-Memory Networks for Video Question Answering

Video Question Answering (QA) is an important task in understanding vide...
research
09/27/2017

A Read-Write Memory Network for Movie Story Understanding

We propose a novel memory network model named Read-Write Memory Network ...
research
09/21/2018

Multimodal Dual Attention Memory for Video Story Question Answering

We propose a video story question-answering (QA) architecture, Multimoda...
research
04/25/2018

Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents

Movies provide us with a mass of visual content as well as attracting st...
research
07/04/2020

Modality Shifting Attention Network for Multi-modal Video Question Answering

This paper considers a network referred to as Modality Shifting Attentio...
research
07/17/2020

Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions

To understand movies, humans constantly reason over the dialogues and ac...
research
02/01/2018

Adaptive Memory Networks

We present Adaptive Memory Networks (AMN) that processes input-question ...
